Changes between Initial Version and Version 1 of jazz/12-01-04


Timestamp:
Jan 4, 2012, 3:13:10 PM
Author:
jazz

  • jazz/12-01-04

    v1 v1  
     1= 2012-01-04 =
     2
     3== AJAX Crawler / Crawling AJAX ==
     4
     5 * [wiki:jazz/10-10-17 2010-10-17]
     6 * Reference: [http://www.ajaxprojects.com/ajax/newsdetails.php?itemid=178 Crawling AJAX]
     7{{{
     8Shreeraj Shah's paper, Crawling Ajax-driven Web 2.0 Applications, does a nice job of
     9describing the "event-driven" approach to web crawling.
     10
     11It has the following three key components:
     12
     131. Javascript analysis and interpretation with linking to Ajax
     142. DOM event handling and dispatching
     153. Dynamic DOM content extraction
     16
     17The easiest way to implement an AJAX-enabled, event-driven crawler is to use Watir and
     18Crowbar, which let you control Firefox or IE from code and extract page data after
     19the browser has processed any Javascript.
     20}}}
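The three components quoted above can be sketched as a small exploration loop. This is only a minimal illustration, not the method from the paper: the browser is replaced by a toy in-memory model, and the names `TRANSITIONS`, `DOM`, `events_for`, and `crawl` are hypothetical. A real crawler would drive Firefox or IE through a tool such as Watir or Crowbar instead.

```python
from collections import deque

# Toy stand-in for a browser-driven page (assumption, for illustration only):
# (state, event) -> next state, and state -> rendered DOM text.
TRANSITIONS = {
    ("home", "click #news"): "news",
    ("home", "click #about"): "about",
    ("news", "click #more"): "news-page-2",
    ("news", "click #home"): "home",
}
DOM = {
    "home": "<div>Welcome</div>",
    "news": "<div>Headline 1</div>",
    "news-page-2": "<div>Headline 2</div>",
    "about": "<div>About us</div>",
}


def events_for(state):
    """Component 1 stand-in: list the event handlers found in the current state."""
    return sorted(ev for (st, ev) in TRANSITIONS if st == state)


def crawl(start):
    """Breadth-first exploration: fire each event, then extract the resulting DOM."""
    seen = {start}
    extracted = {start: DOM[start]}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        for event in events_for(state):
            nxt = TRANSITIONS[(state, event)]  # component 2: dispatch the event
            if nxt not in seen:
                seen.add(nxt)
                extracted[nxt] = DOM[nxt]      # component 3: extract the dynamic DOM
                queue.append(nxt)
    return extracted


pages = crawl("home")
```

Tracking visited states matters here because events can lead back to a state already seen (for example `click #home`), and without the `seen` set the loop would never terminate.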
     21 * Usable tools include [http://watir.com/ Watir], a Ruby-based library that can control IE, and [http://simile.mit.edu/wiki/Crowbar Crowbar], which can control Firefox via GET/PUT requests. Both are BSD-licensed.
     22 * [http://code.google.com/intl/zh-TW/web/ajaxcrawling/ Making AJAX Applications Crawlable] - Google's proposed workaround specification for making AJAX applications and pages searchable.
     23 * [http://crawljax.com/ crawljax] - an AJAX crawler written in Java, with [http://crawljax.com/documentation/publications/ many published papers]
     24 * http://watij.com/ - Watij – Web Application Testing in Java
     25 * http://htmlunit.sourceforge.net/ - HtmlUnit is a "GUI-Less browser for Java programs"
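As a rough illustration of the "Making AJAX Applications Crawlable" scheme listed above: a crawler rewrites a URL whose fragment starts with `!` (a `#!` URL) into one that carries the state in an `_escaped_fragment_` query parameter, which the server can answer with a static snapshot. The sketch below assumes that mapping. The function name `to_escaped_fragment_url` is hypothetical, and the set of characters left unescaped is an approximation that should be checked against the specification.

```python
from urllib.parse import quote, urlsplit, urlunsplit

# Characters left unescaped in the fragment value (assumption: the scheme only
# requires escaping control characters, space, '#', '%', '&', '+', and non-ASCII).
_SAFE = "!\"$'()*,/:;<=>?@[\\]^`{|}~-._"


def to_escaped_fragment_url(pretty_url):
    """Map a '#!' URL to its '_escaped_fragment_' form; return other URLs unchanged."""
    parts = urlsplit(pretty_url)
    if not parts.fragment.startswith("!"):
        return pretty_url  # not an AJAX-crawlable URL under this scheme
    state = quote(parts.fragment[1:], safe=_SAFE)
    extra = "_escaped_fragment_=" + state
    query = parts.query + "&" + extra if parts.query else extra
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, ""))


# example.com is a placeholder host, not taken from the notes above.
ugly = to_escaped_fragment_url("http://example.com/page#!key=value")
```

A crawler built on this would fetch the rewritten URL instead of the `#!` one, since the fragment part of a URL is never sent to the server.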