= 2012-01-04 =

== AJAX Crawler / Crawling AJAX ==

 * [wiki:jazz/10-10-17 2010-10-17]
 * Reference: [http://www.ajaxprojects.com/ajax/newsdetails.php?itemid=178 Crawling AJAX]
{{{
Shreeraj Shah's paper, Crawling Ajax-driven Web 2.0 Applications, does a nice job of describing the "event-driven" approach to web crawling. It has following three key components

 1. Javascript analysis and interpretation with linking to Ajax
 2. DOM event handling and dispatching
 3. Dynamic DOM content extraction

The easiest way to implement an AJAX-enabled, event-driven crawler is to use Watir and Crowbar, that will allow you to control Firefox or IE from code, allowing you to extract page data after it has processed any Javascript.
}}}
 * Usable tools include [http://watir.com/ Watir], a Ruby-based library that can control IE, and [http://simile.mit.edu/wiki/Crowbar Crowbar], which lets you control Firefox through GET/PUT requests. Both are BSD-licensed.
 * [http://code.google.com/intl/zh-TW/web/ajaxcrawling/ Making AJAX Applications Crawlable] - Google proposes a workaround specification that lets AJAX applications and pages be found by search engines.
 * [http://crawljax.com/ crawljax] - an AJAX crawler written in Java, [http://crawljax.com/documentation/publications/ with many published papers]
 * http://watij.com/ - Watij: Web Application Testing in Java
 * http://htmlunit.sourceforge.net/ - !HtmlUnit is a "GUI-Less browser for Java programs"
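
As a small illustration of the "Making AJAX Applications Crawlable" scheme linked above: as far as I understand the specification, a crawler rewrites a "hash-bang" URL (`#!state`) into a request carrying an `_escaped_fragment_` query parameter, and the server answers that request with an HTML snapshot of the AJAX state. The sketch below covers only that URL rewriting step. The function names `escape_fragment` and `to_escaped_fragment_url` are my own (hypothetical), and the set of escaped bytes follows my reading of the specification, so treat it as a rough aid rather than a reference implementation.

```python
from urllib.parse import urlsplit, urlunsplit

# Bytes that must be percent-escaped in the fragment value, per my reading
# of the scheme: '#', '%', '&' and '+', plus anything <= 0x20 or >= 0x7F.
_SPECIAL_BYTES = (0x23, 0x25, 0x26, 0x2B)


def escape_fragment(value):
    """Percent-escape the characters the scheme treats as special."""
    out = []
    for byte in value.encode("utf-8"):
        if byte <= 0x20 or byte >= 0x7F or byte in _SPECIAL_BYTES:
            out.append("%%%02X" % byte)
        else:
            out.append(chr(byte))
    return "".join(out)


def to_escaped_fragment_url(url):
    """Map a '#!' URL to the '_escaped_fragment_' form a crawler would fetch.

    URLs whose fragment does not start with '!' are returned unchanged.
    """
    parts = urlsplit(url)
    if not parts.fragment.startswith("!"):
        return url
    param = "_escaped_fragment_=" + escape_fragment(parts.fragment[1:])
    # Append to an existing query string if there is one.
    query = parts.query + "&" + param if parts.query else param
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, ""))
```

For example, `to_escaped_fragment_url("http://example.com/page#!key=value")` gives `http://example.com/page?_escaped_fragment_=key=value`. Fetching that URL only returns useful content if the site implements the server side of the scheme; otherwise a browser-driving tool such as the ones listed above is still needed.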