Tutorial 1
This tutorial shows how to use the program to scrape the web data you want. We will configure an agent to capture the product names, prices, and images and save them to an Excel file.
1 Creating New Project
We create a new project for each web site we want to extract data from. It is also possible to create several extraction agents inside one project, but this is less convenient. Click the Project button in the Tool Bar, or choose File > New Project from the menu.
Enter a Project Name in the dialog window and click the Finish button. A new project will appear in the Workspace view in the upper right corner of the window.
2 Starting Page Navigation
Navigate to the starting page from which the agent will start working. To do this, enter the URL of the starting page into the Navigation Bar. The target site URL is http://www.websundew.com/demo/. Type this address, then press Enter or click the navigation button.
Wait until navigation is over. Now we are ready to start creating the Agent that will capture product information from the Demo Store.
3 Creating New Agent
Click the Agent button in the Tool Bar, or choose File > New Agent from the menu.
You will see an Agent Configuration Wizard in which you can enter the Agent name, starting URL list, etc. In this extraction project we have only one URL, so click the Finish button. You will see the Agent Editor.
4 Configuring the Agent
In the left-hand corner of the Agent Editor there is an Agent Diagram. This diagram shows the Agent's states. Init State is the initial state from which the Agent starts working; it loads the initial page. The Page 1 state reflects the loaded page. To the right there is a Browser Window.
5 Capturing Data
Click the Capture button in the Tool Bar.
To capture the data we first need to create a Data Extraction Pattern. Select Data Iterator Pattern, which extracts data displayed as a list or a table. Click the Finish button. An Iterator Pattern Wizard will appear in the left-hand corner.
Click on the item you want to extract (e.g. the product image). It will be highlighted in green. Click Add.
Repeat this for every required item (that is, name and price). You can change the default field names Field1, Field2, Field3 by clicking on a field's name.
Now we are ready to look for patterns. Click Find. The program will find patterns which return data extracted from this page.
Select 10 rows. All the items will be highlighted in blue on the page. Click the Next button in the Wizard. Enter a name for the pattern. Click Finish.
The Capture Data dialog window appears. Click Finish.
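To make the idea of a Data Iterator Pattern more concrete, here is a minimal Python sketch of the same kind of extraction done by hand: one record (image, name, price) is collected for each repeating block on a page. It does not show how the program works internally. The markup in SAMPLE_HTML, the class names product, name and price, and the names ProductIterator and extract_products are all assumptions made for illustration; the real Demo Store page may be structured differently.

```python
from html.parser import HTMLParser

# Hypothetical markup: the real Demo Store page may look different.
SAMPLE_HTML = """
<div class="product"><img src="/img/a.png"><span class="name">Widget A</span><span class="price">$10.00</span></div>
<div class="product"><img src="/img/b.png"><span class="name">Widget B</span><span class="price">$12.50</span></div>
"""


class ProductIterator(HTMLParser):
    """Collects one record per repeating <div class="product"> block.

    Assumes product blocks contain no nested <div> elements.
    """

    def __init__(self):
        super().__init__()
        self.records = []     # finished records, one dict per product
        self._current = None  # record being filled, or None outside a product
        self._field = None    # text field currently open: "name" or "price"

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        css_class = attrs.get("class", "")
        if tag == "div" and css_class == "product":
            # A new repeating block starts: open an empty record.
            self._current = {"image": None, "name": None, "price": None}
        elif self._current is not None:
            if tag == "img":
                self._current["image"] = attrs.get("src")
            elif tag == "span" and css_class in ("name", "price"):
                self._field = css_class

    def handle_data(self, data):
        # Store text only while inside a name or price field of a product.
        if self._current is not None and self._field is not None:
            self._current[self._field] = data.strip()

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None
        elif tag == "div" and self._current is not None:
            # The repeating block is closed: the record is complete.
            self.records.append(self._current)
            self._current = None


def extract_products(html):
    """Return a list of dicts with the keys image, name and price."""
    parser = ProductIterator()
    parser.feed(html)
    parser.close()
    return parser.records
```

Calling extract_products(SAMPLE_HTML) returns one dictionary per product block, which corresponds to the rows that the wizard highlights on the page.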
6 Linked Pages Navigation
As the web site contains linked pages, click the Paginator button in the Tool Bar to create a Paginator, which will enable the Agent to visit all the linked pages and extract data from all of them.
Select Simple Next Page Pattern in the dialog that appears. Click the Finish button. The Simple Next Page Wizard will appear in the left-hand window.
Click the Next Page link in the browser window.
Click Next in the Simple Next Page Wizard at the bottom of the page. Enter the name of the Next Page Pattern.
Click Finish in the Simple Next Page Wizard at the bottom of the page. The Paginator dialog appears. Click the Finish button.
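The idea behind a Simple Next Page Pattern can be sketched in a few lines of Python: starting from the first page, keep following the "next page" link and collect items until no link remains. This is only an illustration under stated assumptions, not the program's own implementation. The PAGES dictionary stands in for a real web site, and its URLs, the crawl function and the fetch parameter are hypothetical names.

```python
# Hypothetical in-memory "site": each URL maps to the items on that page
# and to the URL of the next page (None on the last page).
PAGES = {
    "/demo/?page=1": {"items": ["A", "B"], "next": "/demo/?page=2"},
    "/demo/?page=2": {"items": ["C", "D"], "next": "/demo/?page=3"},
    "/demo/?page=3": {"items": ["E"], "next": None},
}


def crawl(start_url, fetch):
    """Follow next-page links from start_url and return all items in order.

    fetch is any callable that takes a URL and returns a dict with the
    keys "items" and "next". Pages already visited are skipped, so a
    link cycle cannot cause an endless loop.
    """
    seen = set()
    items = []
    url = start_url
    while url is not None and url not in seen:
        seen.add(url)
        page = fetch(url)
        items.extend(page["items"])
        url = page.get("next")
    return items
```

For example, crawl("/demo/?page=1", PAGES.__getitem__) visits the three pages in turn and returns the items from all of them.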
7 Saving Data
Click the Datasource button in the Tool Bar to create a Datasource. The Datasource Wizard will appear. Select the format you want; in our case it will be Excel. Select Excel and click the Next button.
Select the Agent and mark the fields you want to save.
Click Next. The Excel datasource configuration page will appear.
Click Next if you want to use the default settings. Enter a Datasource name. Click the Finish button.
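As a rough illustration of what saving the marked fields means, the following Python sketch writes only the chosen fields of each record to a tabular file. It uses the CSV format from the Python standard library as a stand-in, because writing a real Excel workbook needs a third-party library; Excel can open the resulting file. The function name write_records and the sample record are assumptions made for this example.

```python
import csv
import io


def write_records(records, fields, out):
    """Write the chosen fields of each record to out as CSV.

    records is a list of dicts, fields is the list of keys to save
    (the "marked" fields), and out is any writable text file object.
    Keys that are not listed in fields are silently left out.
    """
    writer = csv.DictWriter(out, fieldnames=fields, extrasaction="ignore")
    writer.writeheader()
    for record in records:
        writer.writerow(record)


# Usage: save only name and price, leaving out the image field.
sample = [{"image": "/img/a.png", "name": "Widget A", "price": "$10.00"}]
buffer = io.StringIO()
write_records(sample, ["name", "price"], buffer)
```

To write to disk instead of an in-memory buffer, pass a file opened with open(path, "w", newline="") as the out argument.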
8 Running the Agent
We have created the Data Extraction Agent. Now we can use it to extract and save the data we want. Click the Run button in the Tool Bar.
The Agent will start. Wait until the Agent finishes working.
A dialog window will appear.
You can see the results of the agent's work and the path to the saved file.
Select the data source, then click Open to view the result.



