
Users currently rely on crawlers to fetch web pages from the Internet and on separate extractors to pull
out the information they need. No existing robot does both together, because web pages come in many
different formats. Meanwhile, demand is growing for a robot that can crawl the pages a user needs and
extract accurate information from the pages it has crawled.
■ High performance and high quality
- Gives the best access to information and the best content-gathering functionality
- Collects the data chosen by the customer with precision
- Can collect data from various sources, including JavaScript-driven pages, pages that require authentication, and many other formats
■ Ease of use
- Convenient interface for management and use
- Combination of rule-based and automatic collection processes
- Web-based tool for collection, analysis and storage
■ Stability
- Stable and convenient system
- Fast processing of large-scale data
- Dead-link management to reduce errors in the collection process
■ Rule register
Through IF, the user can decide which kinds of pages to crawl and what information to extract. The rule
register is an application with a built-in web browser, together with some other components. While browsing
web pages, the user can easily define crawling rules, extracting rules and so on.
■ Crawler and extractor
After the user defines the related rules, the crawler crawls the pages according to the crawling rules and
the extractor extracts information according to the extracting rules. The extracted information is saved
to the database in a predefined format.
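The flow described above (crawling rules select pages, extracting rules pull fields, results go to a database) can be sketched as follows. This is only an illustration under stated assumptions, not the product's implementation: the in-memory PAGES dictionary stands in for the web, the rules are plain regular expressions, SQLite stands in for the database, and every name (PAGES, CRAWL_RULE, EXTRACT_RULES, crawl, extract, run, the items table) is hypothetical.

```python
import re
import sqlite3
from collections import deque

# Hypothetical in-memory "web" so the sketch runs without network access.
PAGES = {
    "http://example.test/": (
        '<a href="http://example.test/item/1">1</a> '
        '<a href="http://example.test/about">about</a>'
    ),
    "http://example.test/item/1": (
        '<h1>Red Lamp</h1><span class="price">12.50</span> '
        '<a href="http://example.test/item/2">next</a>'
    ),
    "http://example.test/item/2": '<h1>Blue Chair</h1><span class="price">40.00</span>',
    "http://example.test/about": "<h1>About us</h1>",
}

# Assumed crawling rule: only follow links whose URL matches this pattern.
CRAWL_RULE = re.compile(r"^http://example\.test/item/\d+$")

# Assumed extracting rules: field name -> regex with one capture group.
EXTRACT_RULES = {
    "title": re.compile(r"<h1>(.*?)</h1>"),
    "price": re.compile(r'<span class="price">(.*?)</span>'),
}

LINK_RE = re.compile(r'href="([^"]+)"')


def crawl(start_url):
    """Breadth-first crawl from a seed; yield (url, html) for pages matching CRAWL_RULE."""
    seen = {start_url}
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        html = PAGES.get(url)
        if html is None:  # unknown page: treat as a dead link and skip it
            continue
        if CRAWL_RULE.match(url):
            yield url, html
        for link in LINK_RE.findall(html):
            if link not in seen and CRAWL_RULE.match(link):
                seen.add(link)
                queue.append(link)


def extract(html):
    """Apply every extracting rule; fields whose rule does not match become None."""
    record = {}
    for field, pattern in EXTRACT_RULES.items():
        match = pattern.search(html)
        record[field] = match.group(1) if match else None
    return record


def run(start_url, conn):
    """Crawl, extract, and save the records in a fixed table layout."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS items (url TEXT PRIMARY KEY, title TEXT, price TEXT)"
    )
    for url, html in crawl(start_url):
        rec = extract(html)
        conn.execute(
            "INSERT OR REPLACE INTO items VALUES (?, ?, ?)",
            (url, rec["title"], rec["price"]),
        )
    conn.commit()
    return conn.execute("SELECT url, title, price FROM items ORDER BY url").fetchall()


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    for row in run("http://example.test/", connection):
        print(row)
```

In this sketch the seed page is always fetched, but only links that match the crawling rule are followed, so the "about" page is never visited. Keeping the rules as data separate from the crawl loop mirrors the idea that the user defines them in advance.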
■ Web management tool
A web tool for viewing statistics on crawling tasks and related information.
The user can export the information from the database to a file or to another database in a defined format.
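As an illustration of the export step, the sketch below writes selected columns of a database table to CSV. It assumes SQLite as the database and CSV as the target format; the function name export_to_csv, the items table, and its columns are hypothetical and are not taken from the tool itself.

```python
import csv
import io
import sqlite3


def quote_identifier(name):
    """Quote a table or column name, since identifiers cannot be bound as SQL parameters."""
    return '"' + name.replace('"', '""') + '"'


def export_to_csv(conn, table, columns, out):
    """Write the chosen columns of `table` to `out` as CSV; return the number of data rows."""
    column_sql = ", ".join(quote_identifier(c) for c in columns)
    cursor = conn.execute(
        f"SELECT {column_sql} FROM {quote_identifier(table)} ORDER BY rowid"
    )
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(columns)  # header row
    count = 0
    for row in cursor:
        writer.writerow(row)
        count += 1
    return count


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    connection.execute("CREATE TABLE items (url TEXT, title TEXT, price TEXT)")
    connection.executemany(
        "INSERT INTO items VALUES (?, ?, ?)",
        [("u1", "Red Lamp", "12.50"), ("u2", "Blue, Chair", "40.00")],
    )
    buffer = io.StringIO()
    export_to_csv(connection, "items", ["title", "price"], buffer)
    print(buffer.getvalue())
```

Passing any writable text stream as `out` (an open file instead of the StringIO buffer, for example) would produce a file on disk. Exporting to another database would follow the same read loop with INSERT statements in place of the CSV writer.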
