= 2011-10-28 =

== Semantic Web & Crawlzilla ==

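 * Continuing from [wiki:jazz/10-11-14 2010-11-14]
  * I noticed that Google Reader shows historical entries when you subscribe to a feed, so I have been wondering whether a feed's RSS history could serve as a crawl source (e.g. for Crawlzilla). I looked for similar approaches, and the following article explains a workable one: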
   * [http://googlesystem.blogspot.com/2007/06/reconstruct-feeds-history-using-google.html Reconstruct a Feed's History Using Google Reader]
{{{
http://www.google.com/reader/atom/feed/FEED_URL?r=n&n=NUMBER_OF_ITEMS
}}}
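As a minimal sketch, the URL pattern above could be filled in with a small Python helper. The function name `reader_history_url` is hypothetical, and percent-encoding the feed URL is an assumption; the pattern itself only says to substitute `FEED_URL` and `NUMBER_OF_ITEMS`. The helper only builds the string and makes no network request.

```python
from urllib.parse import quote

# Base of the pattern quoted above.
READER_ATOM_BASE = "http://www.google.com/reader/atom/feed/"


def reader_history_url(feed_url, n_items):
    """Build the feed-history URL for feed_url, asking for n_items entries.

    Assumption: the feed URL is percent-encoded before being placed in the
    path. The source pattern does not say whether encoding is required.
    """
    encoded_feed = quote(feed_url, safe="")
    return f"{READER_ATOM_BASE}{encoded_feed}?r=n&n={int(n_items)}"


if __name__ == "__main__":
    # example.com is a placeholder feed, not one from these notes.
    print(reader_history_url("http://example.com/rss", 100))
```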
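 * Continuing from [wiki:jazz/10-11-15 2010-11-15]
  * [http://readitlaterlist.com/api/docs/ ReadItLater API] - could be used to crawl the URLs I bookmark day to day
 * [http://www.evernote.com/about/developer/api/ Evernote API] - people who take notes in Evernote could probably use it to gather statistics on their note contents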

== Crawlzilla ==

 * From the [http://readitlaterlist.com/ ReadItLater website], I used the export HTML feature to export my unread bookmarks as an HTML file, then uploaded it to http://cloud.nchc.org.tw/~jazz/ril_export.html
 * Set up demo.crawlzilla.info to crawl two levels deep
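
To check which URLs an exported bookmark file would hand to the crawler as starting points, a rough sketch like the one below could list them. It assumes the export consists of ordinary `<a href="...">` anchors (the actual layout of `ril_export.html` is not shown here), and `LinkCollector` and `extract_links` are hypothetical names. This is not how Crawlzilla itself parses pages.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag, in document order."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html_text):
    """Return all link targets found in html_text."""
    parser = LinkCollector()
    parser.feed(html_text)
    parser.close()
    return parser.links


if __name__ == "__main__":
    # Made-up sample standing in for the exported file (assumed structure).
    sample = (
        '<ul><li><a href="http://example.com/a">A</a></li>'
        '<li><a href="http://example.com/b">B</a></li></ul>'
    )
    for url in extract_links(sample):
        print(url)
```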