2010-10-20
Hadoop
- http://github.com/tomwhite/hadoop-book/ - Example code from Tom White's "Hadoop: The Definitive Guide"
- NoHadoop: Big Data Requires Not Only Hadoop
- This article covers many alternatives to Hadoop, each aimed at a different use case:
- CloudScale - for realtime data-warehouse requirements
- CloudScale calls this Redoop - from "Hadoop and Realtime Cloud Computing" (2010-10-15)
- MPI, BSP - for supercomputing requirements
- Pregel (architecture only, no code available; Google) - for graph computing requirements at large scale
- Percolator (architecture only, no code available; Google) - for incrementally updating the analytics on a massive data set continuously
- Beyond Hadoop: Next-Generation Big Data Architectures
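- This article covers many alternatives to Hadoop, each aimed at a different use case.
- It looks like a lot happened at this year's Hadoop World NYC.
- First: Oracle and Hadoop integration - a new project called OraOop
- Second: Membase and Hadoop integration - could this make Hadoop more real-time? It is very close to the approach I first had in mind (e.g. combining Hadoop with ActiveMQ). Memcache looks like a good option too; it might even work for passing parameters and holding global variables.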
- Membase-Cloudera Integration Joins Leading Hadoop Distribution and Real-Time NoSQL Database
- Membase and Cloudera Integration
- Bi-directional connection for Membase and Cloudera Hadoop
The first consists of a Membase NodeCode module that streams data from Membase to CDH in real-time, while the second consists of a Sqoop-derived batch loader utility that allows for the loading of data to and from Membase and CDH.
- Third: Twitter and Hadoop - Twitter uses Hadoop to store Tweets
- Hadoop is a "Big part of Twitter’s ecosystem"
For something like Hadoop, the presence of a robust, public tool really helps to build a prosperous ecosystem.
- Twitter solves its data formatting challenge
While primary copies of user Tweets are kept in MySQL and Cassandra databases, the company is also building a second data repository, running on Hadoop, that can be used for analytics and applications.
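The idea noted under the Membase item, using a memcache-style store to pass parameters and hold global variables for a job, can be sketched roughly as follows. This is only an illustration under stated assumptions: `SharedStore` is a hypothetical in-memory stand-in rather than a real memcached or Membase client, the key name `min_word_length` is made up, and the job runs in a single process instead of on a Hadoop cluster.

```python
class SharedStore:
    """Hypothetical in-memory stand-in for a memcache-style key-value store."""

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        self._data[key] = value

    def get(self, key, default=None):
        return self._data.get(key, default)


def map_words(line, store):
    """Map step: emit (word, 1) for words that pass a shared length threshold."""
    # The threshold is a "global variable" read from the shared store,
    # not an argument baked into the job itself.
    min_length = store.get("min_word_length", 1)
    for word in line.split():
        if len(word) >= min_length:
            yield word.lower(), 1


def reduce_counts(pairs):
    """Reduce step: sum the counts for each key."""
    totals = {}
    for key, value in pairs:
        totals[key] = totals.get(key, 0) + value
    return totals


def run_job(lines, store):
    """Run the map and reduce steps in-process over the given lines."""
    pairs = []
    for line in lines:
        pairs.extend(map_words(line, store))
    return reduce_counts(pairs)


if __name__ == "__main__":
    store = SharedStore()
    store.set("min_word_length", 5)  # the driver publishes the parameter
    print(run_job(["Hadoop stores Tweets", "Hadoop and Membase"], store))
```

In a real deployment each task would talk to the store over the network, so lookups would probably be done once per task rather than once per record.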
MapReduce
Last modified on Sep 25, 2011, 11:41:05 AM