2012-01-04

AJAX Crawler / Crawling AJAX

2010-10-17

Shreeraj Shah's paper, Crawling Ajax-driven Web 2.0 Applications, does a nice job of 
describing the "event-driven" approach to web crawling.

It has the following three key components:

1. JavaScript analysis and interpretation with linking to Ajax
2. DOM event handling and dispatching
3. Dynamic DOM content extraction
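
The three components above can be illustrated with a toy model. This is only a minimal sketch, not the method from Shah's paper: the TOY_APP dictionary is a hypothetical stand-in for a real browser, where each state is a DOM snapshot that lists its text and the events (element id -> resulting state) it exposes. A real crawler would obtain both from a JavaScript-capable browser.

```python
from collections import deque

# Hypothetical stand-in for a browser session: every key is a DOM state,
# "content" is the text visible in that state, and "events" maps a clickable
# element id to the state the page reaches after the handler has run.
TOY_APP = {
    "home": {"content": "Welcome",
             "events": {"btn-news": "news", "btn-about": "about"}},
    "news": {"content": "Latest headlines",
             "events": {"btn-more": "news-page-2", "btn-home": "home"}},
    "news-page-2": {"content": "Older headlines",
                    "events": {"btn-home": "home"}},
    "about": {"content": "About us", "events": {}},
}


def crawl(app, start):
    """Breadth-first, event-driven crawl; returns {state: extracted content}."""
    extracted = {}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        if state in extracted:
            continue  # this DOM state was already visited
        snapshot = app[state]
        # Component 3: extract the content of the current DOM state.
        extracted[state] = snapshot["content"]
        # Components 1 and 2: enumerate the event handlers found in this
        # state and "dispatch" each one by following it to the next state.
        for element_id, next_state in snapshot["events"].items():
            if next_state not in extracted:
                queue.append(next_state)
    return extracted


pages = crawl(TOY_APP, "home")
for state, content in pages.items():
    print(state, "->", content)
```

The visited-state check is what keeps the crawl from looping when an event leads back to a state that was already seen, such as the btn-home handlers here.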

The easiest way to implement an AJAX-enabled, event-driven crawler is to use Watir or 
Crowbar. Both let you control a real browser (Firefox or IE) from code, so you can extract 
page data after the browser has processed any JavaScript.

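Usable tools include Watir, a Ruby-based library that can control IE, and Crowbar, which can control Firefox through GET/PUT requests. Both are BSD-licensed.
Making AJAX Applications Crawlable - Google proposed a specification that lets AJAX applications and pages be found by search engines.
crawljax - an AJAX crawler written in Java, backed by many published papers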
http://watij.com/ - Watij – Web Application Testing in Java
http://htmlunit.sourceforge.net/ - HtmlUnit is a "GUI-Less browser for Java programs"
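
As an illustration of the Google specification listed above: it maps a "pretty" AJAX URL whose fragment starts with #! to an "ugly" URL that carries the same state in an _escaped_fragment_ query parameter, which a crawler can then request from the server. The sketch below shows that mapping. The function names and the example.com URLs are made up for illustration, and the escaping rule is written from the specification as I understand it.

```python
from urllib.parse import urlsplit, urlunsplit


def escape_fragment(fragment):
    """Percent-encode the bytes the scheme requires: %00-20, #, %, &, + and %7F-FF."""
    escaped = []
    for byte in fragment.encode("utf-8"):
        if byte <= 0x20 or byte >= 0x7F or byte in (0x23, 0x25, 0x26, 0x2B):
            escaped.append("%{:02X}".format(byte))
        else:
            escaped.append(chr(byte))
    return "".join(escaped)


def to_escaped_fragment_url(url):
    """Turn a '#!' (hash-bang) URL into its '_escaped_fragment_' form."""
    parts = urlsplit(url)
    if not parts.fragment.startswith("!"):
        return url  # not an AJAX-crawlable URL, leave it unchanged
    token = "_escaped_fragment_=" + escape_fragment(parts.fragment[1:])
    # Append to an existing query string, or start a new one.
    query = parts.query + "&" + token if parts.query else token
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, ""))


print(to_escaped_fragment_url("http://example.com/page#!key=value"))
print(to_escaped_fragment_url("http://example.com/?q=1#!a=b&c=d"))
```

URLs without a #! fragment are returned as-is, so the function can be applied to every link a crawler discovers.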

Last modified on Jan 24, 2012, 9:20:42 AM