source: nutchez-0.1/README.nutch @ 172

Last change on this file since 172 was 89, checked in by waue, 15 years ago
Apache Nutch README

Important note: Due to licensing issues we cannot provide two libraries
(jai_core.jar, jai_codec.jar) that are normally provided with PDFBox, the
parser library we use for parsing PDF files. If you encounter unexpected
problems when working with PDF files, please:

1. Download the two missing libraries from:
   http://pdfbox.cvs.sourceforge.net/viewvc/pdfbox/pdfbox/external/

2. Put them in the directory src/plugin/parse-pdf/lib.
3. Follow the instructions in the file src/plugin/parse-pdf/plugin.xml.
4. Rebuild Nutch.
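
Step 2 above can be scripted. The following is a minimal Python sketch, not
part of Nutch: the function name install_pdf_jars and both of its arguments
are hypothetical, and it assumes the two jars have already been downloaded by
hand into a local directory. Only the jar names and the target directory are
taken from this README.

```python
import shutil
from pathlib import Path

# The two jar names and the target directory come from the README text above.
MISSING_JARS = ("jai_core.jar", "jai_codec.jar")
PLUGIN_LIB = Path("src") / "plugin" / "parse-pdf" / "lib"


def install_pdf_jars(download_dir, nutch_home):
    """Copy the two JAI jars from download_dir into the parse-pdf lib directory.

    Both arguments are hypothetical local paths chosen by the caller.
    Returns the list of destination paths that were written.
    """
    download_dir = Path(download_dir)
    target = Path(nutch_home) / PLUGIN_LIB

    # Fail early, before copying anything, if a jar has not been downloaded yet.
    missing = [name for name in MISSING_JARS if not (download_dir / name).is_file()]
    if missing:
        raise FileNotFoundError("not downloaded yet: " + ", ".join(missing))

    target.mkdir(parents=True, exist_ok=True)
    copied = []
    for name in MISSING_JARS:
        destination = target / name
        shutil.copy2(download_dir / name, destination)
        copied.append(destination)
    return copied
```

The sketch covers step 2 only. Steps 3 and 4 still have to be done by hand:
edit plugin.xml as it instructs, then rebuild Nutch with whatever build
command your checkout uses.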

Interesting files include:

  docs/api/index.html
      Javadocs for the Nutch software.

  CHANGES.txt
      Log of changes to Nutch.

For the latest information about Nutch, please visit our website at:

   http://lucene.apache.org/nutch/

and our wiki at:

   http://wiki.apache.org/nutch/

To get started using Nutch, read the tutorial:

   http://lucene.apache.org/nutch/tutorial.html

Export Control

This distribution includes cryptographic software.  The country in which you
currently reside may have restrictions on the import, possession, use, and/or
re-export to another country, of encryption software.  BEFORE using any encryption
software, please check your country's laws, regulations and policies concerning the
import, possession, or use, and re-export of encryption software, to see if this is
permitted.  See <http://www.wassenaar.org/> for more information.

The U.S. Government Department of Commerce, Bureau of Industry and Security (BIS), has
classified this software as Export Commodity Control Number (ECCN) 5D002.C.1, which
includes information security software using or performing cryptographic functions with
asymmetric algorithms.  The form and manner of this Apache Software Foundation
distribution makes it eligible for export under the License Exception ENC Technology
Software Unrestricted (TSU) exception (see the BIS Export Administration Regulations,
Section 740.13) for both object code and source code.

The following provides more details on the included cryptographic software:

Apache Nutch uses the PDFBox API in its parse-pdf plugin for extracting textual content
and metadata from encrypted PDF files. See http://incubator.apache.org/pdfbox/ for more
details on PDFBox.