Sunday, April 22, 2007

How-to: Nutch on Windows

bin/nutch crawl urls -dir crawl -depth 3 >& crawl.log
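The urls argument in the command above points at the seed list for the crawl. As a minimal sketch, assuming a Nutch release in which urls is a directory of plain-text seed files (earlier releases took a single flat file instead), it could be prepared like this before starting the crawl. The file name seed.txt and the address http://www.example.com/ are placeholders, not values from the original setup.

```shell
# Create the seed directory that the crawl command reads from.
# Assumption: this Nutch version expects a directory, not a flat file.
mkdir -p urls

# Write one seed URL per line. seed.txt and example.com are placeholders.
printf '%s\n' 'http://www.example.com/' > urls/seed.txt

# Show what will be crawled.
cat urls/seed.txt
```

Run it from the Nutch installation directory, in the same shell used for bin/nutch.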

Intranet Crawling

Web Interface for Search

Open a web browser, navigate to the Tomcat web application manager (e.g. http://localhost:8080/manager/html), and upload the Nutch WAR file to deploy it.

In your Environment Variables settings, add a new environment variable named NUTCH_JAVA_HOME whose value is the location of your JVM (e.g. C:\j2sdk1.4.2_09).
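For a single shell session, the same variable can also be set by hand. This is only a sketch, assuming bin/nutch is run from a Cygwin bash shell, where drive C: is normally visible as /cygdrive/c. The JDK path is the example location from above, so adjust it to your own installation.

```shell
# Assumed Cygwin-style form of the example JVM location C:\j2sdk1.4.2_09.
export NUTCH_JAVA_HOME="/cygdrive/c/j2sdk1.4.2_09"

# Informational check only: report whether a java executable is present there.
if [ -x "$NUTCH_JAVA_HOME/bin/java" ] || [ -x "$NUTCH_JAVA_HOME/bin/java.exe" ]; then
  echo "JVM found under $NUTCH_JAVA_HOME"
else
  echo "No java executable under $NUTCH_JAVA_HOME; check the path" >&2
fi
```

A variable exported this way lasts only for the current shell, whereas the Environment Variables dialog makes it permanent.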