wiki:waue/2010/0526

Version 1 (modified by waue, 14 years ago)

--

  1. Introduction

The protocol-smb plugin allows you to crawl Microsoft Windows shares. It implements the CIFS/SMB protocol, which is commonly used on Microsoft operating systems. The plugin replicates the behaviour of protocol-file over the CIFS/SMB protocol. It uses the JCifs library and also supports all the properties from the JCifs library.

You can find more information on the following site: http://jcifs.samba.org/

The smb protocol syntax for crawling is as follows: smb://xxxxx (e.g. smb://server/share).
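As an illustration of the URL syntax above, the following is a minimal sketch of how an smb:// seed URL breaks down into a server and a share. It uses only the standard java.net.URI class, not the plugin or JCifs. The class name SmbUrlExample, its helper methods, and the sample path docs/report.txt are hypothetical and chosen for this example only.

```java
import java.net.URI;

// Hypothetical helper for illustration; not part of protocol-smb or JCifs.
public class SmbUrlExample {

    // Returns the server part of an smb:// URL, e.g. "server".
    static String server(String url) {
        URI uri = URI.create(url);
        if (!"smb".equalsIgnoreCase(uri.getScheme())) {
            throw new IllegalArgumentException("not an smb URL: " + url);
        }
        return uri.getHost();
    }

    // Returns the share name (first path segment), or "" if the URL has none.
    static String share(String url) {
        String path = URI.create(url).getPath(); // e.g. "/share/docs/report.txt"
        if (path == null || path.length() <= 1) {
            return "";
        }
        int end = path.indexOf('/', 1);
        return end < 0 ? path.substring(1) : path.substring(1, end);
    }

    public static void main(String[] args) {
        String seed = "smb://server/share/docs/report.txt"; // sample seed URL
        System.out.println("server: " + server(seed));
        System.out.println("share:  " + share(seed));
    }
}
```

A check like this can be useful for validating a seed list before a crawl, since a URL without a share segment is probably not what was intended.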

  2. Installation

1) Binaries only:

The protocol-smb files can be found in the ../plugins directory.

Copy "protocol-smb" to the NUTCHHOME/build/plugins directory.

Put the "smb.properties" file in the NUTCHHOME/conf directory.

Configure the properties in the "smb.properties" file.

Enable the plugin by updating the "nutch-site.xml" file found in the NUTCHHOME/conf directory.

e.g.

<property>
  <name>plugin.includes</name>
  <value>protocol-smb| other plugins...</value>
  <description>
  </description>
</property>
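Nutch documents plugin.includes as a regular expression over plugin directory names, so it can help to test a candidate value before starting a crawl. The sketch below assumes a plugin is enabled when the whole plugin id matches the expression. The class PluginIncludesCheck and the sample value are hypothetical, and the other plugin ids are only illustrative; substitute the ones your installation actually uses.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of Nutch.
public class PluginIncludesCheck {

    // Assumption: a plugin is enabled when its id fully matches the expression.
    static boolean isIncluded(String pluginIncludes, String pluginId) {
        return Pattern.compile(pluginIncludes).matcher(pluginId).matches();
    }

    public static void main(String[] args) {
        // Illustrative value only; list the plugins your crawl really needs.
        String value = "protocol-smb|urlfilter-regex|parse-(text|html)|index-basic";
        System.out.println(isIncluded(value, "protocol-smb"));
        System.out.println(isIncluded(value, "protocol-http"));

        // A space after '|' becomes part of the alternative, so the id
        // "urlfilter-regex" no longer matches " urlfilter-regex".
        System.out.println(isIncluded("protocol-smb| urlfilter-regex", "urlfilter-regex"));
    }
}
```

Because whitespace is significant in a regular expression, it is safest to write the value without spaces around the | separators.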