Scrape Data From Local Web Files

WebSundew lets you extract data from local HTML files. You can create either an Agent or an Extractor. This walkthrough uses an Agent. If you are not familiar with Agents, Extractors and other concepts, you may read about them here.

If you have not yet downloaded and installed WebSundew, you can find information here and follow the instructions. If WebSundew is already installed, start it by clicking the desktop icon or from the OS menu.

Step 1 - Create New Project

Click New Project in the application toolbar.

Create New Project

Step 2 - Create New Agent

Click New Agent in the application toolbar.

Create New Agent

The New Agent dialog will appear:

Select Startup Mode

Select Local Files. The agent's startup mode will change. Select the folder that contains the target HTML files. To process several folders, click Add Folder for each additional one. You can also configure the Allowed Files filter so that only files matching a pattern are included. In this case, these are files with the html or htm extension.
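For readers who want to see what this kind of folder and filter selection amounts to, here is a minimal Python sketch. It is not WebSundew code and does not use any WebSundew API. The function name `collect_files`, the `*.html` / `*.htm` patterns, the case-insensitive matching, and the choice to search subfolders are all assumptions made for illustration; WebSundew's own matching rules may differ.

```python
from fnmatch import fnmatch
from pathlib import Path


def collect_files(folders, patterns=("*.html", "*.htm")):
    """Return a sorted list of files under the given folders whose names match any pattern.

    Hypothetical helper for illustration only; it is not part of WebSundew.
    """
    matches = []
    for folder in folders:
        # rglob("*") walks the folder and all of its subfolders (an assumption).
        for path in Path(folder).rglob("*"):
            # Lowercase the file name so "PAGE.HTML" matches "*.html" on any OS.
            if path.is_file() and any(fnmatch(path.name.lower(), p) for p in patterns):
                matches.append(path)
    return sorted(matches)
```

Calling `collect_files(["pages", "archive"])` with two hypothetical folder names would return every `.html` and `.htm` file found under either folder, which is roughly the list you would expect to see in a preview of the collected files.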

You can preview the collected files by clicking Preview. Click Finish to complete creating the agent. The agent's editor will open, and the content of the first HTML file will be shown in the browser part of the agent editor.

Other Steps

You have configured an agent that searches for HTML files in the folders you provided. Now you need to capture and export the extracted data. These steps depend on the structure of the HTML files and on the required export format. You can read more about capture and about export. You can also read our tutorials.
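To make the idea of capture and export concrete, the following Python sketch reads the `<title>` of each local HTML file and writes the results to a CSV file. It is only an illustration of the general technique using the Python standard library. It is not how WebSundew captures or exports data, and the names `TitleParser`, `extract_title`, and `export_titles`, as well as the choice of the page title as the captured field, are assumptions.

```python
import csv
from html.parser import HTMLParser
from pathlib import Path


class TitleParser(HTMLParser):
    """Collects the text inside the <title> element (hypothetical example class)."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Only keep text that appears between <title> and </title>.
        if self._in_title:
            self.title += data


def extract_title(html_text):
    """Capture step: return the page title with surrounding whitespace removed."""
    parser = TitleParser()
    parser.feed(html_text)
    parser.close()
    return parser.title.strip()


def export_titles(html_paths, csv_path):
    """Export step: write one CSV row (file name, title) per HTML file."""
    with open(csv_path, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        writer.writerow(["file", "title"])
        for path in html_paths:
            text = Path(path).read_text(encoding="utf-8", errors="replace")
            writer.writerow([Path(path).name, extract_title(text)])
```

In a real project the captured fields depend on the page structure, so the parser would target whatever elements hold the data you need rather than the title.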

Edit Agent

You can modify the folder properties after you have created the agent:

  1. Open the agent for editing.
  2. Select Loop in the agent's graph:

Click Loop

  3. Open the Properties View and modify the folders.

Edit File Iterator