= 2010-10-20 =

== Hadoop ==

 * http://github.com/tomwhite/hadoop-book/ - Example code for Tom White's "Hadoop: The Definitive Guide"

 * [http://www.sys-con.com/node/1573226/print NoHadoop: Big Data Requires Not Only Hadoop]
   * This article covers a number of Hadoop alternatives, each aimed at a different use case:
     * [http://cloudscale.com CloudScale] - for realtime data-warehouse requirements
       * !CloudScale calls this Redoop; from [http://www.sys-con.com/node/1572508/print Hadoop and Realtime Cloud Computing] (2010-10-15)
     * MPI, BSP - for supercomputing requirements
     * [http://portal.acm.org/citation.cfm?id=1582723 Pregel (architecture only, no code, Google)] - for running graph algorithms at large scale (graph computing requirements)
     * [http://www.google.com/research/pubs/archive/36726.pdf Percolator (architecture only, no code, Google)] - for when a massive data set must be updated continuously (incrementally update the analytics on a massive data set continuously)
   * [https://www.nytimes.com/external/gigaom/2010/10/23/23gigaom-beyond-hadoop-next-generation-big-data-architectu-81730.html?pagewanted=print Beyond Hadoop: Next-Generation Big Data Architectures]

 * It looks like a lot happened at this year's Hadoop World NYC.
 * First: Oracle and Hadoop integration - a new project called [http://www.questsoftware.in/Ora-Oop/ OraOop]
   * [http://www.ctoedge.com/content/helping-oracle-get-along-hadoop Helping Oracle Get Along with Hadoop]
   * [[Image(http://img.itbe.com/ctoedge/quest5.gif)]]
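 * Second: Membase and Hadoop integration - could this make Hadoop more real-time? It is close to the approach I originally had in mind (e.g. combining Hadoop with ActiveMQ). On reflection, Memcache is also a decent option; it might even work for parameter passing and shared global variables.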
   * [http://www.smartbrief.com/news/aaaa/industryMW-detail.jsp?id=838FF0EC-C664-4F16-9F77-F61CA011BB57 Membase-Cloudera Integration Joins Leading Hadoop Distribution and Real-Time NoSQL Database]
   * [http://blog.membase.com/membase-cloudera-integration Membase and Cloudera Integration]
   * [http://www.h-online.com/open/news/item/Bi-directional-connection-for-Membase-and-Cloudera-Hadoop-1106525.html Bi-directional connection for Membase and Cloudera Hadoop]
{{{
The first consists of a Membase NodeCode module that streams data from Membase to CDH in real-time, 
while the second consists of a Sqoop-derived batch loader utility that allows for the loading of data 
to and from Membase and CDH.
}}}
 * Third: Twitter and Hadoop - Twitter uses Hadoop to store Tweets
   * [http://siliconangle.com/blog/2010/10/12/hadoop-is-a-big-part-of-twitter%E2%80%99s-ecosystem/ Hadoop is a "Big part of Twitter’s ecosystem"]
{{{
For something like Hadoop, the presence of a robust, public tool really helps 
to build a prosperous ecosystem.
}}}
   * [http://www.computerworld.com/s/article/print/9191098/Twitter_solves_its_data_formatting_challenge?taxonomyName=Storage&taxonomyId=19 Twitter solves its data formatting challenge]
{{{
While primary copies of user Tweets are kept in MySQL and Cassandra databases, 
the company is also building a second data repository, running on Hadoop, that 
can be used for analytics and applications. 
}}}

== MapReduce ==

 * [http://cloudcomputing.sys-con.com/node/1528655/print Google Dumps MapReduce]
 * [http://www.theregister.co.uk/2010/09/09/google_caffeine_explained/print.html Google search index splits with MapReduce]
 * [http://www.theregister.co.uk/2010/09/24/google_percolator/print.html Google Percolator – global search jolt sans MapReduce comedown]