Index: /nutchez-0.1/CHANGES.txt
===================================================================
--- /nutchez-0.1/CHANGES.txt	(revision 72)
+++ /nutchez-0.1/CHANGES.txt	(revision 73)
@@ -1,1188 +1,2 @@
-Nutch Change Log
+nutchez Change Log
 
-Release 1.0 - 2009-03-23
-
- 1. NUTCH-474 - Fetcher2 crawlDelay and blocking fix (Dogacan Guney via ab)
-
- 2. NUTCH-443 - Allow parsers to return multiple Parse objects.
-    (Dogacan Guney et al, via ab)
-
- 3. NUTCH-393 - Indexer should handle null documents returned by filters.
-    (Eelco Lempsink via ab)
-
- 4. NUTCH-456 - Parse msexcel plugin speedup (Heiko Dietze via siren)
-
- 5. NUTCH-446 - RobotRulesParser should ignore Crawl-delay values of other
-    bots in robots.txt (Dogacan Guney via siren)
-
- 6. NUTCH-482 - Remove redundant plugin lib-log4j (siren)
- 
- 7. NUTCH-483 - Remove redundant commons-logging jar from ontology plugin
-    (siren)
-
- 8. NUTCH-161 - Change Plain text parser to
-    use parser.character.encoding.default property for fall back encoding
-    (KuroSaka TeruHiko, siren)
-
- 9. NUTCH-61 - Support for adaptive re-fetch interval and detection of
-    unmodified content. (ab)
-
-10. NUTCH-392 - OutputFormat implementations should pass on Progressable.
-    (cutting via ab)
-
-11. NUTCH-495 - Unnecessary delays in Fetcher2 (dogacan)
-
-12. NUTCH-443 - allow parsers to return multiple Parse object, this will speed 
-    up the rss parser (dogacan via mattmann). This update is a fix and semantics
-    change from the original patch for NUTCH-443. The original patch did not tell
-    the  Indexer to read crawl_parse too so that it can pickup sub-urls' fetch 
-    datums. This patch addresses that issue. Now, if Fetcher gets a null content, 
-    instead of pushing an empty content, it filters the null content.
-    
-13. NUTCH-485 - Change HtmlParseFilter 's to return ParseResult object instead of 
-    Parse object. (Gal Nitzan via dogacan)
-
-14. NUTCH-489 - URLFilter-suffix management of the url path when the url contains 
-    some query parameters. (Emmanuel Joke via dogacan)
-
-15. NUTCH-502 - Bug in SegmentReader causes infinite loop. 
-    (Ilya Vishnevsky via dogacan)
-    
-16. NUTCH-444 Possibly use a different library to parse RSS feed for improved 
-    performance and compatibility. This patch introduced a new plugin, feed,
-    that includes an index filter and a parse plugin for feeds that uses ROME.
-    There was discussion to remove parse-rss, in light of the feed plugin, 
-    however, this patch does not explicitly remove parse-rss. (dogacan, mattmann)
-
-17. NUTCH-471 - Fix synchronization in NutchBean creation. 
-    (Enis Soztutar via dogacan)
-
-18. Upgrade to Lucene 2.2.0 and Hadoop 0.12.3. (ab)
-
-19. NUTCH-468 - Scoring filter should distribute score to all outlinks at 
-    once. (dogacan)
-
-20. NUTCH-504 - NUTCH-443 broke parsing during fetching. (dogacan)
-
-21. NUTCH-497 -  Extreme Nested Tags causes StackOverflowException in 
-	DomContentUtils...Spider Trap. (kubes)
-
-22. NUTCH-434 - Replace usage of ObjectWritable with something based on 
-    GenericWritable. (dogacan)
-
-23. NUTCH-499 - Refactor LinkDb and LinkDbMerger to reuse code. (dogacan)
-
-24. NUTCH-498 - Use Combiner in LinkDb to increase speed of linkdb generation.
-    (Espen Amble Kolstad via dogacan)
-
-25. NUTCH-507 - lib-lucene-analyzers jar defintion is wrong in plugin.xml.
-    (Emmanuel Joke via dogacan)
-
-26. NUTCH-503 - Generator exits incorrectly for small fetchlists. 
-    (Vishal Shah via dogacan)
-
-27. NUTCH-505 - Outlink urls should be validated. (dogacan)
-
-28. NUTCH-510 - IndexMerger delete working dir. (Enis Soztutar via dogacan)
-
-29. NUTCH-513 - suffix-urlfilter.txt does not have a template. (dogacan)
-
-30. NUTCH-515 - Next fetch time is set incorrectly. (dogacan)
-
-30. NUTCH-506 - Nutch should delegate compression to Hadoop. (dogacan)
-
-31. NUTCH-517 - build encoding should be UTF-8. (Enis Soztutar via dogacan).
-
-32. NUTCH-518 - Fix OpicScoringFilter to respect scoring filter chaining.
-    (Enis Soztutar via dogacan)
-
-33. NUTCH-516 - Next fetch time is not set when it is a 
-    CrawlDatum.STATUS_FETCH_GONE. (Emmanuel Joke via dogacan)
-
-34. NUTCH-525 - DeleteDuplicates generates ArrayIndexOutOfBoundsException 
-    when trying to rerun dedup on a segment. (Vishal Shah via dogacan)
-
-35. NUTCH-514 - Indexer should only index pages with fetch status SUCCESS.
-    (dogacan) Note: There is a bigger problem, i.e how to deal
-    with redirected pages, and this issue can be considered as a band-aid 
-    for the time being. See NUTCH-273 and NUTCH-353 for more details. 
-
-36. NUTCH-533 - LinkDbMerger: url normalized is not updated in the key and 
-    inlinks list. (Emmanuel Joke via dogacan)
-
-37. NUTCH-535 -ParseData's contentMeta accumulates unnecessary values during 
-    parse. (dogacan)
-
-38. NUTCH-522 - Use URLValidator in the Injector. (Emmanuel Joke, dogacan)
-
-39. NUTCH-536 - Reduce number of warnings in nutch core. (dogacan)
-
-40. NUTCH-439 - Top Level Domains Indexing / Scoring. Also adds 
-    domain-related utilities. (Enis Soztutar via dogacan)
-
-41. NUTCH-544 - Upgrade Carrot2 clustering plugin to the newest stable 
-    release (2.1). (Dawid Weiss via dogacan)
-
-42. NUTCH-545 - Configuration and OnlineClusterer get initialized in every
-    request. (Dawid Weiss via dogacan)
-
-43. NUTCH-532 - CrawlDbMerger: wrong computation of last fetch time. 
-    (Emmanuel Joke via dogacan)
-
-44. NUTCH-550 - Parse fails if db.max.outlinks.per.page is -1. (dogacan)
-
-45. NUTCH-546 - file URL are filtered out by the crawler. (dogacan)
-
-46. NUTCH-554 - Generator throws IOException on invalid urls.
-    (Brian Whitman via ab)
-
-47. NUTCH-529 - NodeWalker.skipChildren doesn't work for more than 1 child.
-    (Emmanuel Joke via dogacan)
-
-48. NUTCH-25 - needs 'character encoding' detector.
-    (Doug Cook, dogacan, Marcin Okraszewski, Renaud Richardet via dogacan)
-
-49. NUTCH-508 - ${hadoop.log.dir} and ${hadoop.log.file} are not propagated
-    to the tasktracker. (Mathijs Homminga, Emmanuel Joke via dogacan)
-    
-50. NUTCH-562 - Port mime type framework to use Tika mime detection framework.
-    (mattmann)
-    
-51. NUTCH-488 - Avoid parsing uneccessary links and get a more relevant outlink 
-    list. (Emmanuel Joke, Marcin Okraszewski via kubes)
-
-52. NUTCH-501 -  Implement a different caching mechanism for objects cached in
-    configuration. (dogacan)
-
-53. NUTCH-552 - Upgrade Nutch to Hadoop 0.15.x. (kubes)
-
-54. NUTCH-565 - Arc File to Nutch Segments Converter. (kubes)
-
-55. NUTCH-547 - Redirection handling: YahooSlurp's algorithm.
-    (dogacan, kubes via dogacan)
-
-56. NUTCH-548 - Move URLNormalizer from Outlink to ParseOutputFormat.
-    (Emmanuel Joke via dogacan)
-
-57. NUTCH-538 - Delete unused classes under o.a.n.util. (dogacan)
-
-58. NUTCH-494 - FindBugs: CrawlDbReader and DeleteDuplicates. (dogacan)
-
-59. NUTCH-574 - Including inlink anchor text in index can create irrelevant 
-    search results.  Created index-anchor plugin, removed functionality from 
-    index-basic plugin. For backwards compatibility, add index-anchor plugin to 
-    nutch-site.xml plugin.includes. (kubes)
-
-60. NUTCH-581 - DistributedSearch does not update search servers added to 
-    search-servers.txt on the fly.  (Rohan Mehta via kubes)
-
-61. NUTCH-586 - Add option to run compiled classes without job file
-    (enis via ab)
-
-62. NUTCH-559 - NTLM, Basic and Digest Authentication schemes for web/proxy
-    server. (Susam Pal via dogacan)
-
-63. NUTCH-534 - SegmentMerger: add -normalize option (Emmanuel Joke via ab)
-
-64. NUTCH-528 - CrawlDbReader: add some new stats + dump into a CSV format
-    (Emmanuel Joke via ab)
-
-65. NUTCH-597 - NPE in Fetcher2 (Remco Verhoef via ab)
-
-66. NUTCH-584 - urls missing from fetchlist (Ruslan Ermilov, ab)
-
-67. NUTCH-580 - Remove deprecated hadoop api calls (FS) (siren)
-
-68. NUTCH-587 - Upgrade to Hadoop 0.15.3 (kubes)
-
-69. NUTCH-604 - Upgrade to Lucene 2.3.0 (ab)
-
-70. NUTCH-602 - Allow configurable number of handlers for search servers
-    (hartbecke via kubes)
-
-71. NUTCH-607 - Update build.xml to include tika jar when building war (kubes)
-
-72. NUTCH-608 - Upgrade nutch to use released apache-tika-0.1-incubating (mattmann)
-
-73. NUTCH-606 - Refactoring of Generator, run all urls through checks (kubes)
-
-74. NUTCH-605 - Change deprecated configuration methods for Hadoop (kubes)
-
-75. NUTCH-603 - Add more default url normalizations (kubes)
-
-76. NUTCH-611 - Upgrade Nutch to use Hadoop 0.16 (kubes)
-
-77. NUTCH-44 - Too many search results, limits max results returned from a 
-    single search. (Emilijan Mirceski and Susam Pal via kubes)
-
-78. NUTCH-567 - Proper (?) handling of URIs in TagSoup. TagSoup library is
-    updated to 1.2 version. (dogacan)
-
-79. NUTCH-613 - Empty summaries and cached pages (kubes via ab)
-
-80. NUTCH-612 - URL filtering was disabled in Generator when invoked
-    from Crawl (Susam Pal via ab)
-
-81. NUTCH-601 - Recrawling on existing crawl directory (Susam Pal via ab)
-
-82. NUTCH-575 - NPE in OpenSearchServlet (John H. Lee via ab)
-
-83. NUTCH-126 - Fetching https does not work with a proxy (Fritz Elfert via ab)
-
-84. NUTCH-615 - Redirected URL-s fetched without setting fetchInterval.
-    Guard against reprUrl being null. (Emmanuel Joke, ab)
-
-85. NUTCH-616 - Reset Fetch Retry counter when fetch is successful (Emmanuel
-    Joke, ab)
-
-86. NUTCH-220 - Upgrade to PDFBox 0.7.3 (ab)
-
-87. NUTCH-223 - Crawl.java uses Integer.MAX_VALUE (Jeff Ritchie via ab)
-
-88. NUTCH-598 - Remove deprecated use of ToolBase. Use generics in Hadoop API.
-    (Emmanuel Joke, dogacan, ab)
-
-89. NUTCH-620 - BasicURLNormalizer should collapse runs of slashes with a
-    single slash. (Mark DeSpain via ab)
-
-90. NUTCH-500 - Add hadoop masters configuration file into conf folder. 
-    (Emmanuel Joke via kubes)
-
-91. NUTCH-596 - ParseSegments parse content even if its not
-    CrawlDatum.STATUS_FETCH_SUCCESS (dogacan)
-    
-92. NUTCH-618 - Tika error "Media type alias already exists" (mattmann,kubes)
-
-93. NUTCH-634 - Upgrade Nutch to Hadoop 0.17.1 (Michael Gottesman, Lincoln
-    Ritter, ab)
-
-94. NUTCH-641 - IndexSorter inorrectly copies stored fields (ab)
-
-95. NUTCH-645 - Parse-swf unit test failing (ab)
-
-96. NUTCH-642 - Unit tests fail when run in non-local mode (ab)
-
-97. NUTCH-639 - Change LuceneDocumentWrapper visibility from
-    private to _public_ (Guillaume Smet via dogacan)
-
-98. NUTCH-651 - Remove bin/{start|stop}-balancer.sh from svn
-    tracking. (dogacan)
-
-99. NUTCH-375 - Add support for Content-Encoding: deflated
-    (Pascal Beis, ab)
-
-100. NUTCH-633 - ParseSegment no longer allow reparsing.
-     (dogacan)
-
-101. NUTCH-653 - Upgrade to hadoop 0.18. (dogacan)
-
-102. NUTCH-621 - Nutch needs to declare it's crypto usage (mattmann)
-
-103. NUTCH-654 - urlfilter-regex's main does not work.
-     (dogacan)
-
-104. NUTCH-640 - confusing description "set it to Integer.MAX_VALUE".
-     (dogacan)
-     
-105. NUTCH-662 - Upgrade Nutch to use Lucene 2.4. (kubes)
-
-106. NUTCH-663 - Upgrade Nutch to use Hadoop 0.19 (kubes)
-
-107. NUTCH-647 - Resolve URLs tool (kubes)
-
-108. NUTCH-665 - Search Load Testing Tool (kubes)
-
-109. NUTCH-667 - Input Format for working with Content in Hadoop Streaming
-                 (kubes)
-
-110. NUTCH-635 -  LinkAnalysis Tool for Nutch. (kubes)
-
-111. NUTCH-646 -  New Indexing Framework for Nutch. (kubes)
-
-112. NUTCH-668 -  Domain URL Filter. (kubes)
-
-113. NUTCH-594 -  Serve Nutch search results in multiple formats including 
-                  XML and JSON. (kubes)
-
-114. NUTCH-442 - Integrate Solr/Nutch. (dogacan, original version by siren) 
-
-115. NUTCH-652 - AdaptiveFetchSchedule#setFetchSchedule doesn't calculate
-                 fetch interval correctly. (dogacan)
-
-116. NUTCH-627 - Minimize host address lookup (Otis Gospodnetic)
-
-117. NUTCH-678 - Hadoop 0.19 requires an update of jets3t.
-                 (julien nioche via dogacan)
-
-118. NUTCH-681 - parse-mp3 compilation problem. 
-                 (Wildan Maulana via dogacan)
-
-119. NUTCH-676 - MapWritable is written inefficiently and confusingly.
-                 (dogacan)
-
-120. NUTCH-579 - Feed plugin only indexes one post per feed due to identical
-                 digest. (dogacan)
-
-121. NUTCH-571 - parse-mp3 plugin doesn't always index album of mp3.
-                 (Joseph Chen, dogacan)
-
-122. NUTCH-682 - SOLR indexer does not set boost on the document.
-                 (julien nioche via dogacan)
-
-123. NUTCH-279 - Additions to urlnormalizer-regex (Stefan Neufeind, ab)
-
-124. NUTCH-671 - JSP errors in Nutch searcher webapp (Edwin Chu via ab)
-
-125. NUTCH-643 - ClassCastException in PDF parser (Guillaume Smet, ab)
-
-126. NUTCH-636 - Httpclient plugin https doesn't work on IBM JRE
-     (Curtis d'Entremont, ab)
-
-127. NUTCH-683 - NUTCH-676 broke CrawlDbMerger. (dogacan)
-
-128. NUTCH-631 - MoreIndexingFilter fails with NoSuchElementException
-     (Stefan Will, siren)
-     
-129. NUTCH-691 - Update jakarta poi jars to the most relevant version
-     (Dmitry Lihachev via siren)
-
-130. NUTCH-563 - Include custom fields in BasicQueryFilter
-     (Julien Nioche via siren)
-     
-131. NUTCH-695 - Incorrect mime type detection by MoreIndexingFilter plugin
-     (Dmitry Lihachev via siren)
-     
-132. NUTCH-694 - Distributed Search Server fails (siren)
-
-133. NUTCH-626 - Fetcher2 breaks out the domain with db.ignore.external.links
-     set at cross domain redirects (Remco Verhoef, dogacan via siren)
-
-134. NUTCH-247 - Robot parser to restrict (kubes, siren)
-
-135. NUTCH-698 - CrawlDb is corrupted after a few crawl cycles (dogacan
-     via siren)
-     
-136. NUTCH-699 - Add an "official" solr schema for solr integration (dogacan,
-     Dmitry Lihachev via siren)
-
-137. NUTCH-703 - Upgrade to Hadoop 0.19.1 (ab)
-
-138. NUTCH-419 - Unavailable robots.txt kills fetch (Carsten Lehmann,
-     Doug Cook via ab)
-     
-139. NUTCH-700 - Neko1.9.11 goes into a loop (Julien Nioche, siren)
-
-140. NUTCH-669 - Consolidate code for Fetcher and Fetcher2 (siren)
-
-141. NUTCH-711 - Indexer failing after upgrade to Hadoop 0.19.1 (ab)
-
-142. NUTCH-684 - Dedup support for Solr. (dogacan)
-
-143. NUTCH-715 - Subcollection plugin doesn't work with default
-     subcollections.xml file (Dmitry Lihachev via siren)
-     
-144. NUTCH-722 - Nutch contains JAI jars that we cannot redistribute
-
-Release 0.9 - 2007-04-02
-
- 1. Changed log4j confiquration to log to stdout on commandline
-    tools (siren)
-
- 2. NUTCH-344 - Fix for thread blocking issue (Greg Kim via siren)
- 
- 3. NUTCH-260 - Update hadoop version to 0.5.0 (Renaud Richardet,
-    siren)
-
- 4. Optionally skip pages with abnormally large values of Crawl-Delay
-    (Dennis Kubes via ab)
-
- 5. Change readdb -stats to use CombiningCollector (ab)
-
- 6. NUTCH-348 - Fix Generator to select highest scoring pages (Chris
-    Schneider and Stefan Groschupf via ab)
-
- 7. NUTCH-347 - Adjust plugin build script not to emit warnings when copying
-    dependant jars (siren)
-    
- 8. NUTCH-338 - Remove the text parser as an option for parsing PDF files
-    in parse-plugins.xml (Chris A. Mattmann via siren)
-    
- 9. NUTCH-105 - Network error during robots.txt fetch causes file to
-    be ignored (Greg Kim via siren)
-    
-10. NUTCH-367 - DistributedSearch thown ClassCastException (siren)
-
-11. NUTCH-332 - Fix the problem of doubling scores caused by links pointing
-    to the current page (e.g. anchors). (Stefan Groschupf via ab)
-
-12. NUTCH-365 - Flexible URL normalization (ab)
-
-13. NUTCH-336 - Differentiate between newly discovered pages and newly
-    injected pages (Chris Schneider via ab) NOTE: this changes the
-    scoring API, filter implementations need to be updated.
-
-14. NUTCH-337 - Fetcher ignores the fetcher.parse value (Stefan Groschupf
-    via ab)
-
-15. NUTCH-350 - Urls blocked by http.max.delays incorrectly marked as GONE
-    (Stefan Groschupf via ab)
-
-16. NUTCH-374 - when http.content.limit be set to -1 and  
-    Response.CONTENT_ENCODING is gzip or x-gzip , it can not fetch any thing 
-    (King Kong via pkosiorowski)
-
-17. NUTCH-383 - upgrade to Hadoop 0.7.1 and Lucene 2.0.0. (ab)
-
-  ****************************** WARNING !!! ********************************
-  * This upgrade breaks data format compatibility. A tool 'convertdb'       *
-  * was added to migrate existing CrawlDb-s to the new format. Segment data *
-  * can be partially migrated using 'mergesegs', however segments will      *
-  * require re-parsing (and consequently re-indexing).                      *
-  ****************************** WARNING !!! ********************************
-
-18. NUTCH-371 - DeleteDuplicates now correctly implements both parts of
-    the algorithm. (ab)
-
-19. NUTCH-391 - ParseUtil logs file contents to log file when it cannot
-    find parser (siren)
-
-20. NUTCH-379 - ParseUtil does not pass through the content's URL to the
-    ParserFactory (Chris A. Mattmann via siren)
-
-21. NUTCH-361, NUTCH-136 - When jobtracker is 'local' generate only one
-    partition. (ab)
-
-22. NUTCH-399 - Change CommandRunner to use concurrent api from jdk (siren)
-
-23. NUTCH-395 - Increase fetching speed (siren)
-
-24. NUTCH-388 - nutch-default.xml has outdated example for urlfilter.order
-    (reported by Jared Dunne)
-
-25. NUTCH-404 - Fix LinkDB Usage - implementation mismatch (siren)
-
-26. NUTCH-403 - Make URL filtering optional in Generator (siren)
-
-27. NUTCH-405 - Content object is not properly initialized in map method
-    of ParseSegment (siren)
-
-28. NUTCH-362 - Remove parse-text from unsupported filetypes in
-    parse-plugins.xml (siren)
-    
-29. NUTCH-305 - Update crawl and url filter lists to exclude
-    jpeg|JPEG|bmp|BMP, suffix-urlfilter.txt (contributed by Stefan
-    Neufeind) is also updated (siren)
-    
-30. NUTCH-406 - Metadata tries to write null values (mattmann)
-
-31. NUTCH-415 - Generator should mark selected records in CrawlDb. 
-    Due to increased resource consumption this step is optional. 
-    Application-level locking has been added to prevent concurrent
-    modification of databases. (ab)
-
-32. NUTCH-416 - CrawlDatum status and CrawlDbReducer refactoring. It is
-    now possible to correctly update CrawlDb from multiple segments.
-    Introduce new status codes for temporary and permanent
-    redirection. (ab)
-
-33. NUTCH-322 - Fix Fetcher to store redirected pages and to store
-    protocol-level status. This also should fix NUTCH-273. (ab)
-
-34. Change default Fetcher behavior not to follow redirects immediately.
-    Instead Fetcher will record redirects as new pages to be added to CrawlDb.
-    This also partially addresses NUTCH-273. (ab)
-
-35. Detect and report when Generator creates 0-sized segments. (ab)
-
-36. Fix Injector to preserve already existing CrawlDatum if the seed list
-    being injected also contains such URL. (ab)
-
-37. NUTCH-425, NUTCH-426 - Fix anchors pollution. Continue after
-    skipping bad URLs. (Michael Stack via ab)
-
-38. NUTCH-325 - UrlFilters.java throws NPE in case urlfilter.order contains
-    Filters that are not in plugin.includes (Stefan Groschupf, siren)
-    
-39. NUTCH-421 - Allow predeterminate running order of indexing filters
-    (Alan Tanaman, siren)
-
-40. When indexing pages with redirection, drop all intermediate pages and
-    index only the final page. (ab)
-
-41. Upgrade to Hadoop 0.10.1. (ab)
-
-42. NUTCH-420 - Fix a bug in DeleteDuplicates where results depended on the
-    order in which IndexDoc-s are processed. (Dogacan Guney via ab)
-
-43. NUTCH-428 - NullPointerException thrown when agent name is not
-    configured properly. Changed to throw RuntimeException instead.
-    (siren)
-
-44. NUTCH-430 - Integer overflow in HashComparator.compare (siren)
-
-45. NUTCH-68 - Add a tool to generate arbitrary fetchlists. (ab)
-
-46. NUTCH-433 - java.io.EOFException in newer nightlies in mergesegs
-    or indexing from hadoop.io.DataOutputBuffer (siren)
-
-47. NUTCH-339 - Fetcher2: a queue-based fetcher implementation. (ab)
-
-48. NUTCH-390 - Javadoc warnings (mattmann)
-
-49. NUTCH-449 - Make junit output format configurable. (nigel via cutting)
-
-50. NUTCH-432 - Fix a bug where platform name with spaces would break the
-    bin/nutch script. (Brian Whitman via ab)
-
-51. Upgrade to Hadoop 0.11.2 and Lucene 2.1.0 release. (ab)
-
-52. NUTCH-167 - Observation of robots "noarchive" directive. (ab)
-
-53. NUTCH-384 - Protocol-file plugin does not allow the parse plugins
-    framework to operate properly (Heiko Dietze via mattmann)
-
-54. NUTCH-233 - Wrong regular expression hangs reduce process forever (Stefan
-    Groschupf via kubes)
-    
-55. NUTCH-436 - Incorrect handling of relative paths when the embedded URL 
-    path is empty (kubes)
-
-56. Upgrade to Hadoop 0.12.1 release. (ab)
-
-57. NUTCH-246 - Incorrect segment size being generated due to time
-    synchronization issue (Stefan Groschupf via ab)
-
-58. Upgrade to Hadoop 0.12.2 release. (ab)
-
-59. NUTCH-333 - SegmentMerger and SegmentReader should use NutchJob. (Michael
-    Stack and Dogacan Guney via kubes)
-
-Release 0.8 - 2006-07-25
-
- 0. Totally new architecture, based on hadoop
-    [http://lucene.apache.org/hadoop] (cutting)
-
- 1. NUTCH-107 - Typo in plugin/urlfilter-*/plugin.xml. (Stephen Cross).
-
- 2. NUTCH-108 - Log hosts that exceed generate.max.per.host.
-    (Rod Taylor via cutting)
-
- 3. NUTCH-88 - Enhance ParserFactory plugin selection policy
-    (jerome)
-
- 4. NUTCH-124 - Protocol-httpclient does not follow redirects when 
-    fetching robots.txt (cutting)
-
- 5. NUTCH-130 - Be explicit about target JVM when building (1.4.x?)
-    (stack@archive.org, cutting)
-
- 6. NUTCH-114 -	Getting number of urls and links from crawldb
-    (Stefan Groschupf via ab)
-
- 7. NUTCH-112 - Link in cached.jsp page to cached content is an 
-    absolute link (Chris A. Mattmann via jerome)
-
- 8. NUTCH-135 - Http header meta data are case insensitive in the
-    real world (Stefan Groschupf via jerome)
-
- 9. NUTCH-145 - Build of war file fails on Chinese (zh) .xml files due
-    to UTF-8 BOM (KuroSaka TeruHiko via siren)
-
-10. NUTCH-121 - SegmentReader for mapred (Rod Taylor via ab)
-
-11. Added support for OpenSearch (cutting)
-
-12. NUTCH-142 - NutchConf should use the thread context classloader
-    (Mike Cannon-Brookes via pkosiorowski)
-
-13. NUTCH-160 - Use standard Java Regex library rather than
-    org.apache.oro.text.regex (Rod Taylor via cutting)
-
-14. NUTCH-151 - CommandRunner can hang after the main thread exec is
-    finished and has inefficient busy loop (Paul Baclace via cutting)
-
-15. NUTCH-174 - Problem encountered with ant during compilation
-
-16. NUTCH-190 - ParseUtil drops reason for failed parse
-    (stack@archive.org via ab)
-
-17. NUTCH-169 - Remove static NutchConf (Marko Bauhardt via ab)
-
-18. NUTCH-194 - Nutch-169 introduced two tiny bugs (Marko Bauhardt via ab)
-
-19. NUTCH-178 - in search.jsp must be session creation "false"
-    (YourSoft via siren)
-
-20. NUTCH-200 - OpenSearch Servlet ist broken
-    (Marko Bauhardt via siren)
-
-21. NUTCH-81 - Webapp only works when deployed in root
-    (AJ Banck, Michael Nebel via siren)
-
-22. NUTCH-139 - Standard metadata property names in the ParseData
-    metadata (Chris A. Mattmann, jerome)
-
-23. NUTCH-192 - Meta data support for CrawlDatum
-    (Stefan Groschupf via ab)
-    
-24. NUTCH-52 - Parser plugin for MS Excel files
-    (Rohit Kulkarni via jerome)
-
-25. NUTCH-53 - 	Parser plugin for Zip files
-    (Rohit Kulkarni via jerome)
-
-26. NUTCH-137 - footer is not displayed in search result page
-    (KuroSaka TeruHiko via siren)
-
-27. NUTCH-118 - FAQ link points to invalid URL
-    (Steve Betts via siren)
-
-28. NUTCH-184 - Serbian (sr, Cyrilic) and Serbo-Croatian (sh, Latin)
-    translation (Ivan Sekulovic via siren)
-
-29. NUTCH-211 - FetchedSegments leave readers open (Stefan Groschupf
-    via cutting)
-
-30. NUTCH-140 - Add alias capability in parse-plugins.xml file that
-    allows mimeType->extensionId mapping (Chris A. Mattmann via jerome)
-
-31. NUTCH-214 - Added Links to web site to search mailling list
-    (Jake Vanderdray via jerome)
-
-32. NUTCH-204 - Multiple field values in HitDetails
-    (Stefan Groschupf via jerome)
-
-33. NUTCH-219 - file.content.limit & ftp.content.limit should be changed
-    to -1 to be consistent with http (jerome)
-    
-34. NUTCH-221 - Prepare nutch for upcoming lucene 2.0 (siren)
-
-35. NUTCH-91 - Empty encoding causes exception (Michael Nebel via
-    pkosiorowski)
-
-36. NUTCH-228 - Clustering plugin descriptor broken (Dawid Weiss via
-    jerome)
-
-37. NUTCH-229 - Improved handling of plugin folder configuration
-    (Stefan Groschupf via ab)
-
-38. NUTCH-206 - Search server throws InstantiationException (ab)
-    
-39. NUTCH-203 - ParseSegment throws InstantiationException (Marko Bauhardt
-    via ab)
-
-40. NUTCH-3 - Multi values of header discarded (Stefan Groschupf via ab)
-
-41. Update to lucene 1.9.1 (cutting)
-
-42. NUTCH-235 - Duplicate Inlink values (ab)
-
-43. NUTCH-234 - Clustering extension code cleanups and a real
-    JUnit test case for the current implementation (Dawid Weiss via ab)
-    
-44. NUTCH-210 - Context.xml file for Nutch web application
-    (Chris A. Mattmann via jerome)
-
-45. NUTCH-231 - Invalid CSS entries (AJ Banck via jerome)
-
-46. NUTCH-232 - Search.jsp has multiple search forms creating
-    invalid html / incorrect focus function (jerome)
-    
-47. NUTCH-196 - lib-xml and lib-log4j plugins (ab, jerome)
-
-48. NUTCH-244 - Inconsistent handling of property values
-    boundaries / unable to set db.max.outlinks.per.page to
-    infinite (jerome)
-    
-49. NUTCH-245 -	DTD for plugin.xml configuration files
-    (Chris A. Mattmann via jerome)
-
-50. NUTCH-250 - Generate to log truncation caused by
-    generate.max.per.host (Rod Taylor via cutting)
-    
-51. NUTCH-125 - OpenOffice Parser plugin (ab)
-
-52. Switch from using java.io.File to org.apache.hadoop.fs.Path.
-    (cutting)
-
-53. NUTCH-240 - Scoring API: extension point, scoring filters and
-    an OPIC plugin (ab)
-    
-54. NUTCH-134 - Summarizer doesn't select the best snippets (jerome)
-
-55. NUTCH-268 - Generator and lib-http use different definitions of
-    "unique host" (ab)
-    
-56. NUTCH-280 - Url query causes NullPointerException (Grant Glouser
-    via siren)
-
-57. NUTCH-285 - LinkDb Fails rename doesn't create parent directories
-    (Dennis Kubes via ab)
-
-58. NUTCH-201 - Add support for subcollections
-    (siren)
-
-59. NUTCH-298 - If a 404 for a robots.txt is returned a NPE is thrown
-    (Stefan Groschupf via jerome)
-
-60. NUTCH-275 - Fetcher not parsing XHTML-pages at all (jerome)
-
-61. NUTCH-301 - CommonGrams loads analysis.common.terms.file for each query
-    (Stefan Groschupf via jerome)
-
-62. NUTCH-110 - OpenSearchServlet outputs illegal xml characters
-    (stack@archive.org via siren)
-
-63. NUTCH-292 - OpenSearchServlet: OutOfMemoryError: Java heap space
-    (Stefan Neufeind via siren)
-
-64. NUTCH-307 - Wrong configured log4j.properties (jerome)
-
-65. NUTCH-303 - Logging improvements (jerome)
-
-66. NUTCH-308 - Maximum search time limit (ab)
-
-67. NUTCH-306 - DistributedSearch.Client liveAddresses concurrency
-    problem (Grant Glouser via siren)
-
-68. Update to hadoop-0.4 (Milind Bhandarkar, cutting)
-
-69. NUTCH-317 - Clarify what the queryLanguage argument of
-    Query.parse(...) means (jerome)
-
-70. Added alternative experimental web gui in contrib containing
-    extensions like subcollection, keymatch, user preferences,
-    caching, implemented mainly using tiles and jstl (siren)
-
-71. NUTCH-320 DmozParser does not output list of urls to stdout
-    but to a log file instead. Original functionality restored.
-
-72. NUTCH-271 - Add ability to limit crawling to the set of initially
-    injected hosts (db.ignore.external.links) (Philippe Eugene,
-    Stefan Neufeind via ab)
-
-73. NUTCH-293 - Support for Crawl-Delay (Stefan Groschupf via ab)
-
-74. NUTCH-327 - Fixed logging directory on cygwin (siren)
-
-Release 0.7 - 2005-08-17
-
- 1. Added support for "type:" in queries. Search results are limited/qualified
-    by mimetype or its primary type or sub type. For example,
-    (1) searching with "type:application/pdf" restricts results
-    to pages which were identified to be of mimetype "application/pdf".
-    (2) with "type:application", nutch will return pages of
-    primary type "application".
-    (3) with "type:pdf", only pages of sub type "pdf" will be listed.
-    (John Xing, 20050120)
-
- 2. Added support for "date:" in queries. Last-Modified is indexed.
-    Search results are restricted by lower and upper date (inclusive)
-    as date:yyyymmdd-yyyymmdd. For example, date:20040101-20041231
-    only returns pages with Last-Modified in year 2004.
-    (John Xing, 20050122)
-
- 3. Add URLFilter plugin interface and convert existing url filters into
-    plugins. (John Xing, 20050206)
-
- 4. Add UpdateSegmentsFromDb tool, which updates the scores and
-    anchors of existing segments with the current values in the web
-    db.  This is used by CrawlTool, so that pages are now only fetched
-    once per crawl.  (Doug Cutting, 20050221)
-
- 5. Moved code into org.apache.nutch sub-packages.  Changed license to
-    Apache 2.0.  Removed jar files whose licenses do not permit
-    redistribution by Apache.  Disabled compilation of plugins which
-    require these libraries.  (Doug Cutting 20050301)
-
- 6. Index host and title in separate fields.  Host was indexed
-    previously only as a part of the URL.  Title was indexed as an
-    anchor.  Now boosts for matching these fields may be adjusted
-    separately from boosts for matching anchors and url.  Also: move
-    site indexing to index-basic plugin to minimize the number of
-    times the URL needs to be parsed; and, stop using anchor analyzer
-    for anything but anchors.  (Piotr Kosiorowski via Doug Cutting
-    20050323)
-
- 7. Add servlet Cached.java that serves cached Content of any mime type.
-    Slightly modified are web.xml and cached.jsp.
-    (John Xing, 20050401)
-
- 8. Add skipCompressedByteArray() to WritableUtils.java.
-    (John Xing, 20050402)
-
- 9. Fixes to jsp and static web pages.  These now use relative links,
-    so that the Nutch webapp file can be used in places other than at
-    the root.  Also fixed links to the about and help pages.  Bug #32.
-    (Jerome Charron via cutting, 20050404)
-
-10. Added some features to DistributedSearch: new segments can be added
-    to searchservers without restarting the frontend, defective search
-    servers are not queried until tey come back online, watchdog keeps
-    an eye for your searchservers and writes simple statistics.
-    (Sami Siren, 20050407)
-    
-11. Fix for bug #4 - Unbalanced quote in query eats all resources.
-	(Piotr Kosiorowski, Sami Siren, 20050407)
-
-12. Close Issue #33 - MIME content type detector (using magic char sequences).
-    (Jerome Charron and Hari Kodungallur via John Xing, 20050416)
-
-13. Add a servlet that implements A9's OpenSearch RSS web service.
-    (cutting, 20050418)
-
-14. Remove references to link analysis from tutorial, and enable
-    scoring by link count when generating fetchlists and searching.
-    (cutting, 20040419)
-
-15. Make query boosts for host, title, anchor and phrase matches
-    configurable.  (Piotr Kosiorowski via cutting, 20050419)
-
-16. Add support for sorting search results and search-time deduping by
-    fields other than site.
-
-17. Automatically convert range queries into cached range filters.
-    This improves the performance and scalability of, e.g., date range
-    searching.
-
-18. Several methods have been renamed due to misspellings.  The old
-    methods have been deprecated and will be removed before the 1.0
-    release.
-
-
-Release 0.6
-
- 1. Added clustering-carrot2 plugin, together with introduction of clustering
-    api and modification to search jsp. (Dawid Weiss via John Xing, 20040809)
-
- 2. Make a number of changes to NDFS (Nutch Distributed File System)
-    to fix bugs, add admin tools, etc.
-
-    Also, modify all command line tools so you can indicate whether to
-    use NDFS or the local filesystem.  If you indicate nothing, then
-    it defaults to the local fs.
-
-    I've used this to do a 35m page crawl via NDFS, distributed over a
-    dozen machines.  (Mike Cafarella)
-
- 3. Add support for BASE tags in HTML.  Outlinks are now correctly
-    extracted when a BASE tag is present.  (cutting)
-
- 4. Fix two bugs in result pagination.  When the last hit on a page
-    was the last hit overall, the "next" button was sometimes shown
-    when the "show all" button should be shown instead.  Also, in
-    certain cases, the "show all" button would be shown when the
-    "next" button should have been shown.  (cutting)
-
- 5. Add config parameter "indexer.max.tokens" that determines the
-    maximum number of tokens indexed per field.  (Andy Hedges via cutting)
-
- 6. Add parser for mp3 files.  (Andy Hedges via cutting)
-
- 7. Add RegexUrlNormalizer.  This is useful for things like stripping
-    out session IDs from URLs.  To use it, add values for
-    urlnormalizer.class and urlnormalizer.regex.file to your
-    nutch-site.xml.  The RegexUrlNormalizer class extends the
-    BasicUrlNormalizer, and does basic normalization as well.
-    (Luke Baker via cutting)
-
- 8. Added Swedish translation (Stefan Verzel via Sami Siren, 20040910)
-
- 9. Added Polish translation (Andrzej Bialecki, 20040911)
- 
-10. Added 3 more language profiles to language identifier (ru,hu,pl).
-	Other changes to language identifier: Porfiles converted to utf8,
-	added some test cases, changed the similarity calculation.
-	(Sami Siren, 20040925)
-
-11. Added plugin parse-rtf (Andy Hedges via John Xing, 20040929)
-
-12. Added plugin index-more and more.jsp (John Xing, 20041003)
-
-13. Added "View as Plain Text" feature. A new op OP_PARSETEXT is introduced
-    in DistributedSearch.java. text.jsp is added. (John Xing, 20041006)
-
-14. Fixed a bug that fails cached.jsp, explain.jsp, anchors.jsp and text.jsp
-    (but not search.jsp) with NullPointerException in distributed search.
-    It seems that this bug appears after "hits per site" stuff is added.
-    The fix is done in Hit.java, making sure String site is never null.
-    Hope this fix not have bad effetct on "hits per site" code.
-    (John Xing, 20041006)
-
-15. Fixed a bug that fails fullyDelete() in FileUtil.java for
-    LocalFileSystem.java. This bug also exposes possible incompleteness
-    of NDFSFile.java, where a few methods are not supported, including
-    delete(). Nothing changed in NDFSFile.java though. Leave it for future
-    improvement (John Xing, 20041022).
-
-16. Introduced option -noParsing to Fetcher.java and added ParseSegment.java.
-    A new status code CANT_PARSE is added to FetcherOutput.java.
-    Without option -noParsing , no change in fetcher behavior. With
-    option -noParsing, fetcher does crawls only, no parsing is carried out.
-    Then, ParseSegment.java should be used to parse in separate pass.
-    (John Xing, 20041025)
-
-17. Added ontology plugin. Currently it is used for query refinement, as
-    examplified in refine-query-init.jsp and refine-query.jsp. By default,
-    query refinement is disabled in search.jsp. Please check
-    ./src/plugin/ontology/README.txt for further description.
-    Ontology plugin certainly can be used for many other things.
-    (Michael J. Pan via John Xing, 20041129)
- 
-18. Changed fetcher.server.delay to be a float, so that sub-second
-    delays can be specified.  (cutting)
-
-19. Added plugin.includes config parameter that determines which
-    plugins are included.  By default now only http, html and basic
-    indexing and search plugins are enabled, rather than all plugins.
-    This should make default performance more predictable and reliable
-    going forward. (cutting)
-
-20. Cleaned up some filesystem code, including:
-
-    - Replaced BufferedRandomAccessFile with two simpler utilties,
-      NFSDataInputStream and NFSDataOutputStream.
-
-    - Fixed the bug where SequenceFiles were no longer flushed when
-      created, so that, when fetches crashed, segments were
-      unreadable.  Now segments are always readable after crashes.
-      Only the contents of the last buffer is lost.
-
-    - Simplified the FSOutputStream API to not include seek().  We
-      should never need that functionality.
-
-    - Simplified LocalFileSystem's implementations of FSInputStream
-      and FSOutputStream and optimized FSInputStream.seek().
-
-    (cutting)
-
-21. Fixed BasicUrlNormalizer to better handle relative urls.  The file
-    part of a URL is normalized in the following manner:
-
-      1. "/aa/../" will be replaced by "/" This is done step by step until
	 the url doesn´t change anymore. So we ensure, that
-	 "/aa/bb/../../" will be replaced by "/", too
-
-      2. leading "/../" will be replaced by "/"
-
-    (Sven Wende via cutting)
-
-22. Fix Page constructors so that next fetch date is less likely to be
-    misconstrued as a float.  This patches a problem in WebDBInjector,
-    where new pages were added to the db with nextScore set to the
-    intended nextFetch date.  This, in turn, confused link analysis.
-
-23. In ndfs code, replace addLocalFile(), putToLocalFile() with
-    copyFromLocalFile(), moveFromLocalFile(), copyToLocalFile() and
-    moveToLocalFile(). (John Xing, 20041217)
-
-24. Added new config parameter fetcher.threads.per.host.  This is used
-    by the Http protocol.  When this is one behavior is as before.
-    When this is greater than one then multiple threads are permitted
-    to access a host at once.  Note that fetcher.server.delay is no
-    longer consistently observed when this is greater than one.
-    (Luke Baker via Doug Cutting)
-
-Release 0.5
-
- 1. Changed plugin directory to be a list of directories.
-
- 2. Permit Plugin to be the default plugin implementation.
-
- 3. Added pluggable interface for network protocols in new package
-    net.nutch.protocol.  Moved http code from core into a plugin.
-
- 4. Added pluggable interface for content parsing in new package
-    net.nutch.parse.  Moved html parsing code from core into a
-    plugin.
-
- 5. Fixed a bug in NutchAnalysis where 16-bit characters were not
-    processed correctly.
-
- 6. Fixed bug #971731: random summaries on result page.
-    (Daniel Naber via cutting)
-
- 7. Made Nutch logo transparent. (Daniel Naber via cutting)
-
- 8. Added file protocol plugin.  (John Xing via cutting)
-
- 9. Added ftp protocol plugin.  (John Xing via cutting)
-
-10. Added pdf and msword parser plugins.  (John Xing via cutting)
-
-11. Added pluggable indexing interface.  By default, url, content,
-    anchors and title are indexed, as before, but now one can easily
-    alter this to, e.g., index metadata.  A demonstration is provided
-    which extracts and indexes Creative Commons license urls. (cutting)
-
-12. Add language identification plugin. 
-
-    The process of identification is as follows:
-
-    1. html (html only, HTML 4.0 "lang" attribute)
-    2. meta tags (html only, http-equiv, dc.language)
-    3. http header (Content-Language)
-    4. if all above fail "statistical analysis"
-
-    1 & 2 are run during the fetching phase and 3 & 4 are run on
-    indexing phase.
-
-    Currently supported languages (in "statistical analysis") are
-    da,de,el,en,es,fi,fr,it,nl,sv and pt. The corpus used was grabbed
-    from http://www.isi.edu/~koehn/europarl/ and the profiles were
-    build with tool supplied in patch.
-
-    After indexing the language can be found from field named "lang"
-
-    It's not 100% accurate but it's a start.
-    (Sami Siren)
-
-13. Added SegmentMergeTool and "mergesegs" command, to remove
-    duplicated or otherwise not used content from several segments and
-    joining them together into a single new segment.  The tool also
-    optionally performs several other steps required for proper
-    operation of Nutch - such as indexing segments, deleting
-    duplicates, merging indices, and indexing the new single segment.
-    (Andrzej Bialecki)
-
-14. Add the ability to retrieve ParseData of a search hit. ParseData
-    contains many valuable properties of a search hit.
-
-    This is required (among others) to properly display the cached
-    content because it's not possible to determine the character
-    encoding from the output of the getContent() method (which returns
-    byte[]). The symptoms are that for HTML pages using non-latin1 or
-    non-UTF8 encodings the cached preview will almost certainly look
-    broken. Using the attached patch it is possible to determine the
-    character encoding from the ParseData (for HTTP: Content-Type
-    metadata), and encode the content accordingly. (Andrzej Bialecki)
-
-15. Add a pluggable query interface.  By default, the content, anchor
-    and url fields are searched as before.  A sample plugin indexes
-    the host name and adds a "site:" keyword to query parsing.
-
-16. Added support for "lang:" in queries.  For example, searching with
-    "lang:en" restricts results to pages which were identified to
-    be in English.
-
-17. Automatically optimize field queries to use cached Lucene filters.
-    This makes, for example, searches restricted by languages or sites
-    that are very common much faster.
-
-18. Improved charset handling in jsp pages.  (jshin by cutting)
-
-19. Permit topic filtering when injecting DMOZ pages.  (jshin by cutting)
-
-20. When parsing crawled pages, interpret charset specifications in
-    html meta tags.  (jshin by cutting)
-
-21. Added support for "cc:licensed" in queries, which searches for documents
-    released under Creative Commons licenses.  Attributes of the
-    license may also be queried, with, e.g., "cc:by" for
-    attribution-required licenses, "cc:nc" for non-commercial
-    licenses, etc.
-
-22. Relative paths named in plugin.folders are now searched for on the
-    classpath.  This makes, e.g., deployment in a war file much simpler.
-
-23. Modifications to Fetcher.java.
-
-    1. Make sure it works properly with regard to creation and initialization
-    of plugin instances. The problem was that multiple threads race to
-    startUp() or shutDown() plugin instances. It was solved by synchronizing
-    certain codes in PluginRepository.java and Extension.java.
-    (Stefan Groschupf via John Xing)
-
-    2. Added code to explictly shutDown() plugins. Otherwise FetcherThreads
-    may never return (quit) if there are still data or other structures
-    (e.g., persistent socket connections) associated with plugins. (John Xing)
-    
-    3. Fixed one type of Fetcher "hang" problems by monitoring named
-    FetcherThreads. If all FetcherThreads are gone (finished),
-    Fetcher.java is considered done. The problem was: there could be
-    runaway threads started by external libs via FetcherThreads.
-    Those threads never return, thus keep Fetcher from exiting normally.
-    (John Xing)
-
-24. Eliminate excessive hits from sites.  This is done efficiently by
-    adding the site name to Hit instances, and, when needed,
-    re-querying with too-frequent sites prohibited in the query.
-
-
-Release 0.4
-
- 1. Http class refactored.  (Kevin Smith via Tom Pierce)
-
- 2. Add Finnish translation. (Sampo Syreeni via Doug Cutting)
-
- 3. Added Japanese translation. (Yukio Andoh via Doug Cutting)
-
- 4. Updated Dutch translation. (Ype Kingma via Doug Cutting)
-
- 5. Initial version of Distributed DB code.  (Mike Cafarella)
-
- 6. Make things more tolerant of crashed fetcher output files.
-    (Doug Cutting)
-
- 7. New skin for website. (Frank Henze via Doug Cutting)
-
- 8. Added Spanish translation. (Diego Basch via Doug Cutting)
-
- 9. Add FTP support to fetcher.  (John Xing via Doug Cutting)
-
-10. Added Thai translation. (Pichai Ongvasith via Doug Cutting)
-
-11. Added Robots.txt & throttling support to Fetcher.java.  (Mike
-    Cafarella)
-
-12. Added nightly build. (Doug Cutting)
-
-13. Default all link scores to 1.0. (Doug Cutting)
-
-14. Permit one to keep internal links. (Doug Cutting)
-
-15. Fixed dedup to select shortest URL. (Doug Cutting)
-
-16. Changed index merger so that merged index is written to named
-    directory, rather than to a generated name in that directory.
-    (Doug Cutting)
-
-17. Disable coordination weighting of query clauses and other minor
-    scoring improvements. (Doug Cutting)
-
-18. Added a new command, crawl, that constructs a database, injects a
-    url file and performs a few rounds of generate/fetch/updatedb.
-    This simplifies use for intranet sites.  Changed some defaults to
-    be more intranet friendly.  (Doug Cutting)
-
-19. Fixed a bug where Fetcher.java didn't construct correct relative
-    links when a page was redirected.  (Doug Cutting)
-
-20. Fixed a query parser problem with lookahead over plusses and minuses.
-    (Doug Cutting)
-
-21. Add support for HTTP proxy servers.  (Sami Siren via Doug Cutting)
-
-22. Permit searching while fetching and/or indexing.
-    (Sami Siren via Doug Cutting)
-
-23. Fix a bug when throttling is disabled.  (Sami Siren via Doug Cutting)
-
-24. Updated Bahasa Malaysia translation.  (Michael Lim via Doug Cutting)
-
-25. Added Catalan translation.  (Xavier Guardiola via Doug Cutting)
-
-26. Added brazilian portuguese translation.
-    (A. Moreir via Doug Cutting)
-
-27. Added a french translation.  (Julien Nioche via Doug Cutting)
-
-28. Updated to Lucene 1.4RC3.  (Doug Cutting)
-
-29. Add capability to boost by link count & use it in crawl tool.
-    (Doug Cutting)
-
-30. Added plugin system.  (Stefan Groschupf via Doug Cutting)
-
-31. Add this change log file, for recording significant changes to
-    Nutch.  Populate it with changes from the last few months.
Index: /nutchez-0.1/debian/control
===================================================================
--- /nutchez-0.1/debian/control	(revision 72)
+++ /nutchez-0.1/debian/control	(revision 73)
@@ -1,3 +1,3 @@
-Source: nutch
+Source: nutchez
 Section:devel
 Priority: extra
Index: /nutchez-0.1/debian/files
===================================================================
--- /nutchez-0.1/debian/files	(revision 72)
+++ /nutchez-0.1/debian/files	(revision 73)
@@ -1,1 +1,1 @@
-nutch_1.0-1_i386.deb devel extra
+nutchez_0.1-1_i386.deb devel extra
Index: /nutchez-0.1/debian/nutchez.install
===================================================================
--- /nutchez-0.1/debian/nutchez.install	(revision 72)
+++ /nutchez-0.1/debian/nutchez.install	(revision 73)
@@ -6,5 +6,4 @@
 tomcat		opt/nutch
 plugins		opt/nutch
-urls		opt/nutch
 *.jar		opt/nutch
 *.job		opt/nutch