Changes between Initial Version and Version 1 of jazz/10-10-20


Timestamp:
Oct 20, 2010, 7:54:19 PM
Author:
jazz

  • jazz/10-10-20

    v1 v1  
     1= 2010-10-20 =
     2
     3== Hadoop ==
     4
     5 * http://github.com/tomwhite/hadoop-book/ - example code for Tom White's "Hadoop: The Definitive Guide"
     6
     7 * [http://www.sys-con.com/node/1573226/print NoHadoop: Big Data Requires Not Only Hadoop]
     8   * This article covers many alternatives to Hadoop, each aimed at a different use case:
     9     * [http://cloudscale.com CloudScale] - for real-time data warehouse requirements
     10       * !CloudScale calls this Redoop, according to [http://www.sys-con.com/node/1572508/print Hadoop and Realtime Cloud Computing] (2010-10-15)
     11     * MPI, BSP - for supercomputing requirements
     12     * Pregel (Google; architecture described, no code available) - for large-scale graph algorithm computation (graph computing requirements)
     13     * Percolator (Google; architecture described, no code available) - for continuous updates against large volumes of data (incrementally update the analytics on a massive data set continuously)
     14
     15 * It looks like a lot happened at this year's Hadoop World NYC.
     16 * First: Oracle and Hadoop integration - a new project called [http://www.questsoftware.in/Ora-Oop/ OraOop]
     17  * [http://www.ctoedge.com/content/helping-oracle-get-along-hadoop Helping Oracle Get Along with Hadoop]
     18  * [[Image(http://img.itbe.com/ctoedge/quest5.gif)]]
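     19 * Second: Membase and Hadoop integration - could this make Hadoop more real-time? It is much like the approach I first had in mind (e.g. combining Hadoop with ActiveMQ). Come to think of it, Memcache would be a decent choice too; memcache might even work for passing parameters and holding global variables.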
     20   * [http://www.smartbrief.com/news/aaaa/industryMW-detail.jsp?id=838FF0EC-C664-4F16-9F77-F61CA011BB57 Membase-Cloudera Integration Joins Leading Hadoop Distribution and Real-Time NoSQL Database]
     21   * [http://blog.membase.com/membase-cloudera-integration Membase and Cloudera Integration]
     22   * [http://www.h-online.com/open/news/item/Bi-directional-connection-for-Membase-and-Cloudera-Hadoop-1106525.html Bi-directional connection for Membase and Cloudera Hadoop]
     23{{{
     24The first consists of a Membase NodeCode module that streams data from Membase to CDH in real-time,
     25while the second consists of a Sqoop-derived batch loader utility that allows for the loading of data
     26to and from Membase and CDH.
     27}}}
     28 * Third: Twitter and Hadoop
     29   * [http://siliconangle.com/blog/2010/10/12/hadoop-is-a-big-part-of-twitter%E2%80%99s-ecosystem/ Hadoop is a "Big part of Twitter’s ecosystem"]
     30{{{
     31For something like Hadoop, the presence of a robust, public tool really helps
     32to build a prosperous ecosystem.
     33}}}
     34   * [http://www.computerworld.com/s/article/print/9191098/Twitter_solves_its_data_formatting_challenge?taxonomyName=Storage&taxonomyId=19 Twitter solves its data formatting challenge]
     35{{{
     36While primary copies of user Tweets are kept in MySQL and Cassandra databases,
     37the company is also building a second data repository, running on Hadoop, that
     38can be used for analytics and applications.
     39}}}