Friday, October 19, 2007

Web-Harvest is an Open Source Web Data Extraction tool written in Java.



Web-Harvest is an open source web data extraction tool written in Java.
It offers a way to collect the desired web pages and extract useful
data from them. To do that, it leverages well-established
techniques and technologies for text/XML manipulation, such as
XSLT, XQuery and regular expressions. Web-Harvest
mainly focuses on HTML/XML-based web sites, which still make up the vast majority of
Web content. At the same time, it can easily be supplemented with custom Java
libraries to augment its extraction capabilities.
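To give a rough idea of what XML- and regex-based extraction looks like in plain Java, here is a minimal sketch. It does not use Web-Harvest's own API. It relies only on the JDK's standard `javax.xml.xpath` package (XPath is the path language that XSLT and XQuery build on) and `java.util.regex`. The class name `ExtractionSketch`, its methods and the sample page are hypothetical. The sketch assumes the page is already well-formed XHTML with no namespace declaration. Real-world HTML usually has to be cleaned into well-formed XML before XPath can be applied to it.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class illustrating XPath- and regex-based extraction.
public class ExtractionSketch {

    // Evaluates the XPath expression //a/@href against well-formed XHTML
    // (assumed to have no xmlns declaration) and returns every link target.
    public static List<String> extractLinks(String xhtml) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            NodeList nodes = (NodeList) xpath.evaluate(
                    "//a/@href",
                    new InputSource(new StringReader(xhtml)),
                    XPathConstants.NODESET);
            List<String> links = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                // For attribute nodes, getNodeValue() returns the attribute value.
                links.add(nodes.item(i).getNodeValue());
            }
            return links;
        } catch (XPathExpressionException e) {
            // Thrown when the input cannot be parsed as XML.
            throw new IllegalArgumentException("Input is not well-formed XML", e);
        }
    }

    // Uses a regular expression to pull dollar amounts such as "$12.50" or "$7"
    // out of arbitrary text, without any XML parsing at all.
    public static List<String> extractPrices(String text) {
        Matcher m = Pattern.compile("\\$\\d+(?:\\.\\d{2})?").matcher(text);
        List<String> prices = new ArrayList<>();
        while (m.find()) {
            prices.add(m.group());
        }
        return prices;
    }

    public static void main(String[] args) {
        String page = "<html><body><p>Price: $12.50 or $7</p>"
                + "<a href=\"/item/1\">One</a><a href=\"/item/2\">Two</a></body></html>";
        System.out.println(extractLinks(page));
        System.out.println(extractPrices(page));
    }
}
```

The two methods reflect the division of labour described above. XPath is convenient once a page has been turned into a proper XML tree. A regular expression is the simpler choice for patterns buried inside free text, where the markup structure does not help.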
