Tutorial 1

This tutorial shows how to use the program to scrape the web data you want. We will configure an agent to capture all the product names, prices and images, and save them to an Excel file.

1 Creating New Project

We create a new project for each web site we want to extract data from. It is also possible to create several extraction agents inside one project, but this is less convenient. Click the Project button in the Tool Bar, or choose File > New Project from the menu.

Enter a Project Name in the dialog window and click the Finish button. The new project will appear in the Workspace view in the upper-right corner of the window.

2 Starting Page Navigation

Navigate to the page from which the agent will start working. To do this, enter the URL of the starting page into the Navigation Bar. The target site for this tutorial is http://www.websundew.com/demo/ so type this address, then press Enter or click the button.

Wait until navigation is complete. Now we are ready to create an Agent that will capture product information from the Demo Store.

3 Creating New Agent

Click the Agent button in the Tool Bar, or choose File > New Agent from the menu.

You will see an Agent Configuration Wizard, in which you can enter the Agent name, the list of starting URLs, etc. This extraction project has only one URL, so click the Finish button. You will see the Agent Editor.

4 Configuring the Agent

In the left-hand corner of the Agent Editor there is an Agent Diagram. The Diagram shows the Agent's states. Init State is the initial state from which the Agent starts working; it loads the initial page. The Page 1 state represents the loaded page. To the right there is a Browser Window.

5 Capturing Data

Click the Capture button in the Tool Bar.
To capture the data we first need to create a Data Extraction Pattern. Select Data Iterator Pattern, which extracts data displayed as a list or a table. Click the Finish button. An Iterator Pattern Wizard will appear in the left-hand corner.
Click the item you want to extract (e.g. the product image). It will be highlighted in green.
Click Add.

Repeat this for every required item (that is, the name and the price). You can change the default field names (Field1, Field2, Field3) by clicking a field's name. Now we are ready to look for patterns. Click Find. The program will find patterns that return data extracted from this page.

Select 10 rows. All the items will be highlighted in blue on the page. Click the Next button in the Wizard. Enter a name for the Pattern and click Finish. The Capture Data dialog window appears. Click Finish.
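
To make the idea behind an iterator pattern more concrete, here is a minimal Python sketch of what such a pattern conceptually does: it walks over repeated blocks on a page and returns one row of fields per block. This is not how the program is implemented. The sample markup, the class names (product, name, price) and the names ProductIterator and extract_rows are all assumptions made for illustration; the real Demo Store page may be structured differently.

```python
from html.parser import HTMLParser

# Hypothetical markup, invented for this sketch; not the real Demo Store page.
SAMPLE_HTML = """
<ul>
  <li class="product"><img src="/img/a.png"><span class="name">Alpha</span><span class="price">$10.00</span></li>
  <li class="product"><img src="/img/b.png"><span class="name">Beta</span><span class="price">$12.50</span></li>
</ul>
"""


class ProductIterator(HTMLParser):
    """Collects one row (image, name, price) per <li class="product">."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._current = None  # row being filled, or None outside a product
        self._field = None    # field whose text is currently being read

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        css = attrs.get("class") or ""
        if tag == "li" and css == "product":
            # A new repeated block starts: open a fresh row.
            self._current = {"image": None, "name": None, "price": None}
        elif self._current is not None:
            if tag == "img":
                self._current["image"] = attrs.get("src")
            elif tag == "span" and css in ("name", "price"):
                self._field = css

    def handle_data(self, data):
        # Only keep text that sits inside a name or price span of a product.
        if self._current is not None and self._field:
            self._current[self._field] = data.strip()

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None
        elif tag == "li" and self._current is not None:
            # The repeated block is closed: store the finished row.
            self.rows.append(self._current)
            self._current = None


def extract_rows(html):
    """Return a list of dicts, one per product block found in html."""
    parser = ProductIterator()
    parser.feed(html)
    return parser.rows


rows = extract_rows(SAMPLE_HTML)
```

Each dictionary in rows plays the role of one highlighted row in the Wizard, and its keys correspond to the fields added with Add.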

6 Linked Pages Navigation

As the web site contains linked pages, click the Paginator button in the Tool Bar to create a Paginator, which enables the Agent to visit all the linked pages and extract data from each of them.
Select Simple Next Page Pattern in the dialog that appears and click the Finish button. The Simple Next Page Wizard will appear on the left. Click the Next Page link in the browser window.


Click Next at the bottom of the Simple Next Page Wizard. Enter the name of the Next Page Pattern. Click Finish at the bottom of the Wizard. The Paginator dialog appears. Click the Finish button.
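
Conceptually, a next-page paginator repeats the same extraction on each page and follows the Next Page link until there is none left. The Python sketch below illustrates that loop under stated assumptions: the PAGES dictionary is a made-up in-memory stand-in for a web site, the page URLs are hypothetical, and crawl is an illustrative name rather than anything provided by the program.

```python
# Hypothetical in-memory "site": each page lists its items and an optional
# link to the next page. The URLs are invented for this sketch.
PAGES = {
    "/demo/?page=1": {"items": ["A", "B"], "next": "/demo/?page=2"},
    "/demo/?page=2": {"items": ["C", "D"], "next": "/demo/?page=3"},
    "/demo/?page=3": {"items": ["E"], "next": None},
}


def crawl(start_url, fetch, max_pages=100):
    """Visit start_url, then keep following the 'next' link until none is left.

    fetch is any callable that maps a URL to a dict with 'items' and 'next'.
    The seen set and max_pages guard against link loops and runaway crawls.
    """
    collected = []
    seen = set()
    url = start_url
    while url is not None and url not in seen and len(seen) < max_pages:
        seen.add(url)
        page = fetch(url)
        collected.extend(page["items"])
        url = page["next"]
    return collected


items = crawl("/demo/?page=1", PAGES.__getitem__)
```

The loop stops on the last page because its next link is None, which mirrors the Agent finishing once no Next Page link can be found.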

7 Saving Data

Click the Datasource button in the Tool Bar to create a Datasource.
The Datasource Wizard will appear. Select the format you want; in our case, select Excel. Click the Next button. Select the Agent and mark the fields you want to save.

Click Next. The Excel datasource configuration page will appear.

Click Next if you want to use the default settings. Enter a Datasource name. Click the Finish button.
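
As a rough illustration of what a datasource does with the marked fields, the Python sketch below writes only the selected fields of each extracted row to a tabular file. It uses CSV instead of Excel, because the Python standard library has no Excel writer; CSV files can be opened in Excel. The function name save_rows and the sample rows are assumptions made for this sketch, not part of the program.

```python
import csv
import io


def save_rows(rows, fields, out):
    """Write only the selected fields of each row as CSV, with a header line.

    extrasaction="ignore" drops any field that was not marked for saving.
    """
    writer = csv.DictWriter(out, fieldnames=fields, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(rows)


# Sample rows invented for this sketch.
rows = [
    {"name": "Alpha", "price": "$10.00", "image": "/img/a.png"},
    {"name": "Beta", "price": "$12.50", "image": "/img/b.png"},
]

# An in-memory buffer keeps the example self-contained; to write a real file,
# pass open("products.csv", "w", newline="") instead.
buffer = io.StringIO()
save_rows(rows, ["name", "price"], buffer)
```

Here the image field is left out of the output, in the same way that an unmarked field is left out of the saved file.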

8 Running the Agent

We have created the Data Extraction Agent. Now we can use it to extract and save the data we want. Click the Run button in the Tool Bar.

The Agent will start. Wait until the Agent finishes working. A dialog window will appear.

You can see the results of the Agent's work and the path to the saved file. Select the data source, then click Open to view the result.

Page Modified 03.02.12 12:50