= 2012-01-04 =

== AJAX Crawler / Crawling AJAX ==

 * [wiki:jazz/10-10-17 2010-10-17]
 * <Reference> [http://www.ajaxprojects.com/ajax/newsdetails.php?itemid=178 Crawling AJAX]
{{{
Shreeraj Shah's paper, Crawling Ajax-driven Web 2.0 Applications, does a nice job of
describing the "event-driven" approach to web crawling.

It has the following three key components:

1. Javascript analysis and interpretation with linking to Ajax
2. DOM event handling and dispatching
3. Dynamic DOM content extraction

The easiest way to implement an AJAX-enabled, event-driven crawler is to use Watir and
Crowbar, which allow you to control Firefox or IE from code, allowing you to extract
page data after it has processed any Javascript.
}}}
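As a rough illustration of the event-driven approach quoted above, here is a minimal Python sketch of the state-exploration loop such a crawler runs: fire every candidate event, snapshot the resulting DOM, and queue any state not seen before. It does not drive a real browser. `TOY_APP`, `candidate_events` and `fire_event` are hypothetical stand-ins for what Watir, Crowbar or crawljax would do against IE or Firefox, and identifying states by a SHA-1 hash of the raw markup is a simplification.

```python
from collections import deque
from hashlib import sha1

# Hypothetical toy application standing in for a real browser session.
# Each DOM snapshot maps a clickable element id to the DOM that appears
# after a "click" event is dispatched on that element.
HOME = "<div id='home'>Home <a id='news'></a> <a id='about'></a></div>"
NEWS = "<div id='news'>Latest news <a id='home'></a></div>"
ABOUT = "<div id='about'>About us <a id='home'></a></div>"
TOY_APP = {
    HOME: {"news": NEWS, "about": ABOUT},
    NEWS: {"home": HOME},
    ABOUT: {"home": HOME},
}


def candidate_events(dom):
    """Hypothetical: list elements that carry event handlers.
    A real crawler needs JavaScript analysis (component 1) for this."""
    return sorted(TOY_APP.get(dom, {}))


def fire_event(dom, element_id):
    """Hypothetical stand-in for component 2: dispatch a click, return the new DOM."""
    return TOY_APP[dom][element_id]


def state_id(dom):
    """Identify a DOM state by hashing its markup (real crawlers usually
    normalize the DOM before comparing states)."""
    return sha1(dom.encode("utf-8")).hexdigest()


def crawl(initial_dom, max_states=100):
    """Breadth-first exploration of the DOM states reachable by firing events.

    Returns (states, edges): `states` maps state id -> DOM snapshot (the
    extracted content, component 3) and `edges` lists (from_id, element_id,
    to_id) transitions. At most `max_states` states are recorded.
    """
    states = {state_id(initial_dom): initial_dom}
    edges = []
    queue = deque([initial_dom])
    while queue:
        dom = queue.popleft()
        src = state_id(dom)
        for element_id in candidate_events(dom):
            new_dom = fire_event(dom, element_id)
            dst = state_id(new_dom)
            edges.append((src, element_id, dst))
            if dst not in states and len(states) < max_states:
                states[dst] = new_dom
                queue.append(new_dom)
    return states, edges


states, edges = crawl(HOME)
print(len(states), "states,", len(edges), "transitions")  # 3 states, 4 transitions
```

A real browser is stateful, so after firing an event the crawler has to bring the page back to the source state (for example by reloading and replaying the event path) before trying the next candidate; the pure lookup in `fire_event` hides that cost.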
 * Usable tools include [http://watir.com/ Watir], a Ruby library that can drive IE, and [http://simile.mit.edu/wiki/Crowbar Crowbar], which can control Firefox through GET/PUT requests; both are BSD-licensed.
 * [http://code.google.com/intl/zh-TW/web/ajaxcrawling/ Making AJAX Applications Crawlable] - a specification proposed by Google that lets AJAX applications and pages be indexed by search engines.
 * [http://crawljax.com/ crawljax] - an AJAX crawler written in Java; [http://crawljax.com/documentation/publications/ many papers have been published on it].
 * http://watij.com/ - Watij – Web Application Testing in Java
 * http://htmlunit.sourceforge.net/ - HtmlUnit is a "GUI-Less browser for Java programs"
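The core of Google's "Making AJAX Applications Crawlable" proposal is a URL mapping: a "pretty" URL whose fragment starts with `#!` is fetched by the crawler as an "ugly" URL that carries the fragment in an `_escaped_fragment_` query parameter, and the server answers that request with an HTML snapshot. Below is a minimal Python sketch of the mapping as I read the specification (escape bytes %00-20, %23, %25-26, %2B and %7F-FF; join with `&` if a query string already exists, else `?`). The function names are hypothetical, and the reverse function assumes `_escaped_fragment_` is the last query parameter.

```python
from urllib.parse import unquote


def _escape_fragment(fragment):
    """Percent-escape the bytes the specification lists as special:
    %00-20, %23 ('#'), %25-26 ('%', '&'), %2B ('+') and %7F-FF."""
    out = []
    for b in fragment.encode("utf-8"):
        if b <= 0x20 or b in (0x23, 0x25, 0x26, 0x2B) or b >= 0x7F:
            out.append("%{:02X}".format(b))
        else:
            out.append(chr(b))
    return "".join(out)


def to_escaped_fragment_url(pretty_url):
    """Crawler side: map a '#!' URL to the '_escaped_fragment_' URL to fetch."""
    base, bang, fragment = pretty_url.partition("#!")
    if not bang:
        return pretty_url  # no '#!', so the scheme does not apply
    joiner = "&" if "?" in base else "?"
    return base + joiner + "_escaped_fragment_=" + _escape_fragment(fragment)


def from_escaped_fragment_url(ugly_url):
    """Server side: rebuild the '#!' URL from an '_escaped_fragment_' request.
    Assumes _escaped_fragment_ is the last query parameter, as produced above."""
    key = "_escaped_fragment_="
    idx = ugly_url.find(key)
    if idx <= 0:
        return ugly_url
    base = ugly_url[:idx - 1]  # drop the '?' or '&' in front of the key
    return base + "#!" + unquote(ugly_url[idx + len(key):])


print(to_escaped_fragment_url("www.example.com?myquery#!key1=value1&key2=value2"))
# www.example.com?myquery&_escaped_fragment_=key1=value1%26key2=value2
```

The `&` inside the fragment must be escaped, otherwise the server would read `key2=value2` as a separate query parameter. Google deprecated this AJAX crawling scheme in 2015.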