Friday, October 19, 2007

Using OpenSearch with Nutch

Using the OpenSearch API

OpenSearch is an
extension of RSS 2.0 for publishing search engine results, and was
developed by A9.com, the search engine
owned by Amazon.com. Nutch supports OpenSearch 1.0 out of the box.
The OpenSearch results for the search in Figure 1 can be accessed
by clicking on the RSS link in the bottom right-hand corner of the
page. This is the XML that is returned:

The returned XML is an RSS 2.0 document in which each hit is
represented by an item element. Notice the two extra
namespaces, opensearch and nutch, which
allow search-specific data to be included in the RSS document. For
example, the opensearch:totalResults element tells you
the total number of search results available (not just those returned on
this page). Nutch also defines its own extensions, allowing
consumers of this document to access page metadata or related
resources, such as the cached content of a page, via the URL in the
nutch:cache element.
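
To make the role of the two namespaces concrete, here is a minimal Python sketch of how a consumer might read such a feed. It is only an illustration: the sample feed is hand-written rather than real Nutch output, the example.com URLs are placeholders, the function name parse_results is hypothetical, and the namespace URIs are assumptions that should be checked against the xmlns declarations in the document your own Nutch installation returns.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URIs; verify them against the xmlns attributes
# on the <rss> element of the feed you actually receive.
NS = {
    "opensearch": "http://a9.com/-/spec/opensearchrss/1.0/",
    "nutch": "http://www.nutch.org/opensearchrss/1.0/",
}

# Hand-written sample for illustration only (not real Nutch output).
SAMPLE_FEED = """
<rss version="2.0"
     xmlns:opensearch="http://a9.com/-/spec/opensearchrss/1.0/"
     xmlns:nutch="http://www.nutch.org/opensearchrss/1.0/">
  <channel>
    <title>Sample search results</title>
    <opensearch:totalResults>42</opensearch:totalResults>
    <item>
      <title>First hit</title>
      <link>http://example.com/a.html</link>
      <nutch:cache>http://example.com/cache/a</nutch:cache>
    </item>
    <item>
      <title>Second hit</title>
      <link>http://example.com/b.html</link>
    </item>
  </channel>
</rss>
""".strip()


def parse_results(xml_text):
    """Return (total_results, hits) extracted from an OpenSearch RSS feed."""
    root = ET.fromstring(xml_text)
    channel = root.find("channel")

    # Total number of matches, not just the hits on this page.
    total = int(channel.findtext("opensearch:totalResults", namespaces=NS))

    hits = []
    for item in channel.findall("item"):
        hits.append({
            "title": item.findtext("title"),
            "link": item.findtext("link"),
            # None when the item carries no nutch:cache element.
            "cache": item.findtext("nutch:cache", namespaces=NS),
        })
    return total, hits


if __name__ == "__main__":
    total, hits = parse_results(SAMPLE_FEED)
    print("total results:", total)
    for hit in hits:
        print(hit["title"], hit["link"], hit["cache"])
```

Plain RSS elements such as title and link are looked up without a prefix, while the search-specific elements need the namespace map. That split is what lets an ordinary RSS reader display the hits while a search-aware client also picks up the extra data.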