WISE IF


Background

  Today, users rely on crawlers to fetch web pages from the Internet and on separate extractors to pull out
  the information they need. Because web pages come in so many formats, no existing robot does both jobs
  together. Meanwhile, demand is growing for a robot that can crawl the pages a user needs and extract
  accurate information from them.


Overview

  IF is a system designed to meet this requirement. Through user-defined rules, IF knows precisely which
  pages to crawl and what information to extract, making it a powerful tool for finding and collecting
  accurate information.

Special Features

 

 ■ High performance and high quality

    - Gives the best access to information and the best content-gathering functionality

    - Collects the data chosen by the customer with precision

    - Can collect data from various sources, including JavaScript-generated content, pages that require
      authentication, and many other formats

 ■ Convenience of use

    - Convenient interface for management and use

    - Combines rule-based and automatic collection processes

    - Web-based tool for collection, analysis, and storage

 ■ Stability

    - Stable and convenient system

    - Fast processing of large-scale data

    - Dead-link management to reduce errors in the collection process
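
    How IF manages dead links is not described here. As a minimal sketch of one common approach, the
    Python below counts consecutive fetch failures per URL and treats a URL as dead once a threshold is
    reached, so that a crawler could skip it. The class name DeadLinkRegistry, the max_failures
    parameter, and the example URLs are all hypothetical.

```python
class DeadLinkRegistry:
    """Tracks consecutive fetch failures per URL (illustrative only, not IF's actual design)."""

    def __init__(self, max_failures=3):
        self.max_failures = max_failures  # assumed threshold
        self.failures = {}                # url -> consecutive failure count

    def record(self, url, ok):
        """Record the outcome of one fetch attempt."""
        if ok:
            # A successful fetch clears the failure history.
            self.failures.pop(url, None)
        else:
            self.failures[url] = self.failures.get(url, 0) + 1

    def is_dead(self, url):
        """A URL counts as dead after max_failures consecutive failures."""
        return self.failures.get(url, 0) >= self.max_failures


registry = DeadLinkRegistry(max_failures=2)
registry.record("http://example.test/gone", ok=False)
registry.record("http://example.test/gone", ok=False)
registry.record("http://example.test/flaky", ok=False)
registry.record("http://example.test/flaky", ok=True)

gone_is_dead = registry.is_dead("http://example.test/gone")
flaky_is_dead = registry.is_dead("http://example.test/flaky")
```

    Resetting the counter on success keeps a temporarily unreachable page from being dropped for good.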


Main Features

 ■ Rule register

    Through IF, the user decides which kinds of pages to crawl and what information to extract. The rule
    register is an application with a built-in web browser and several supporting components. While browsing
    web pages, the user can easily define crawling rules, extraction rules, and so on.

 ■ Crawler and extractor

    After the user defines the related rules, the crawler crawls pages according to the crawling rules and the
    extractor extracts information according to the extraction rules. The extracted information is saved
    to the database in a predefined format.
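
    The flow described above can be sketched in Python as follows. This is an illustration under stated
    assumptions, not IF's implementation: the document does not specify IF's rule syntax, so the
    regular-expression rules (CRAWL_RULE, EXTRACT_RULES), the items table, and the example.test URLs are
    hypothetical, and an in-memory dictionary stands in for the web so the sketch runs offline.

```python
import re
import sqlite3
from collections import deque

# Hypothetical rule format; the real IF rule syntax is not given in this document.
CRAWL_RULE = re.compile(r"^http://example\.test/(list|item/\d+)$")  # URLs worth crawling
EXTRACT_RULES = {                                                  # field name -> pattern
    "title": re.compile(r"<h1>(.*?)</h1>"),
    "price": re.compile(r'<span class="price">(.*?)</span>'),
}
LINK = re.compile(r'href="(.*?)"')

# Stand-in for the web, so no network access is needed.
FAKE_WEB = {
    "http://example.test/list": '<a href="http://example.test/item/1">1</a>'
                                '<a href="http://example.test/item/2">2</a>'
                                '<a href="http://example.test/about">about</a>',
    "http://example.test/item/1": '<h1>Widget</h1><span class="price">10</span>',
    "http://example.test/item/2": '<h1>Gadget</h1><span class="price">25</span>',
    "http://example.test/about": "<h1>About</h1>",
}


def fetch(url):
    """Return the page body, or None if the page does not exist."""
    return FAKE_WEB.get(url)


def crawl_and_extract(seed, db):
    """Breadth-first crawl from seed, storing one row per page that matches every extraction rule."""
    db.execute("CREATE TABLE IF NOT EXISTS items (url TEXT PRIMARY KEY, title TEXT, price TEXT)")
    queue, seen = deque([seed]), {seed}
    while queue:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        # Extraction rules: keep a record only when every field is found.
        matches = {name: rx.search(html) for name, rx in EXTRACT_RULES.items()}
        if all(matches.values()):
            db.execute("INSERT OR REPLACE INTO items VALUES (?, ?, ?)",
                       (url, matches["title"].group(1), matches["price"].group(1)))
        # Crawling rules: follow only links that match the rule.
        for link in LINK.findall(html):
            if link not in seen and CRAWL_RULE.match(link):
                seen.add(link)
                queue.append(link)
    db.commit()


db = sqlite3.connect(":memory:")
crawl_and_extract("http://example.test/list", db)
rows = db.execute("SELECT title, price FROM items ORDER BY url").fetchall()
crawled_about = db.execute(
    "SELECT COUNT(*) FROM items WHERE url = ?", ("http://example.test/about",)
).fetchone()[0]
```

    In this sketch the "about" page is never fetched because it fails the crawling rule, and the list page
    is crawled but produces no record because it matches no extraction rule.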

 ■ Web management tool

    A web tool for viewing statistics on crawling tasks and related information.

 ■ Exporter

    The user can export information from the database to a file or to another database in a defined format.
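
    As a rough sketch of the export step, the Python below writes selected columns of a database table to
    CSV. The function name export_csv, the items table, and the choice of SQLite and CSV are assumptions
    made for illustration; the document does not say which databases or file formats IF supports.

```python
import csv
import io
import sqlite3


def export_csv(db, table, columns, out):
    """Write the chosen columns of a table to `out` as CSV, header row first.

    Hypothetical helper for illustration; returns the number of data rows written.
    """
    # Identifiers cannot be bound as SQL parameters, so validate them instead.
    for name in (table, *columns):
        if not name.isidentifier():
            raise ValueError(f"unsafe identifier: {name!r}")
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(columns)
    query = f"SELECT {', '.join(columns)} FROM {table} ORDER BY rowid"
    count = 0
    for row in db.execute(query):
        writer.writerow(row)
        count += 1
    return count


# Sample data standing in for previously extracted records.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE items (url TEXT, title TEXT, price TEXT)")
db.executemany(
    "INSERT INTO items VALUES (?, ?, ?)",
    [("http://example.test/item/1", "Widget", "10"),
     ("http://example.test/item/2", "Gadget", "25")],
)

buf = io.StringIO()
exported = export_csv(db, "items", ["title", "price"], buf)
csv_text = buf.getvalue()
```

    Passing the column list explicitly is one way to model "the format defined": the caller chooses which
    fields appear in the output and in what order.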


System Architecture