= 2010-10-20 =
== Hadoop ==
 * http://github.com/tomwhite/hadoop-book/ - sample code for Tom White's "Hadoop: The Definitive Guide"
 * [http://www.sys-con.com/node/1573226/print NoHadoop: Big Data Requires Not Only Hadoop]
   * This article surveys many Hadoop alternatives, each suited to a different use case:
     * [http://cloudscale.com CloudScale] - for realtime data-warehouse requirements
       * !CloudScale calls this "Redoop"; from [http://www.sys-con.com/node/1572508/print Hadoop and Realtime Cloud Computing] (2010-10-15)
     * MPI, BSP - for supercomputing requirements
     * [http://portal.acm.org/citation.cfm?id=1582723 Pregel (architecture only, no code, Google)] - for large-scale graph computing requirements
     * [http://research.google.com/pubs/archive/36726.pdf Percolator (architecture only, no code, Google)] - when the analytics on a massive data set must be updated incrementally and continuously
 * [https://www.nytimes.com/external/gigaom/2010/10/23/23gigaom-beyond-hadoop-next-generation-big-data-architectu-81730.html?pagewanted=print Beyond Hadoop: Next-Generation Big Data Architectures]
 * A lot seems to have happened at this year's Hadoop World NYC.
   * First: Oracle and Hadoop integration, through a new project called [http://www.questsoftware.in/Ora-Oop/ OraOop]
     * [http://www.ctoedge.com/content/helping-oracle-get-along-hadoop Helping Oracle Get Along with Hadoop]
     * [[Image(http://img.itbe.com/ctoedge/quest5.gif)]]
   * Second: Membase and Hadoop integration, which lets Hadoop work closer to real time (?). This is very close to the approach I first had in mind (e.g., combining Hadoop with ActiveMQ). Come to think of it, Memcache would also be a good choice; it might even work for passing parameters and holding shared global variables.
     * [http://www.smartbrief.com/news/aaaa/industryMW-detail.jsp?id=838FF0EC-C664-4F16-9F77-F61CA011BB57 Membase-Cloudera Integration Joins Leading Hadoop Distribution and Real-Time NoSQL Database]
     * [http://blog.membase.com/membase-cloudera-integration Membase and Cloudera Integration]
     * [http://www.h-online.com/open/news/item/Bi-directional-connection-for-Membase-and-Cloudera-Hadoop-1106525.html Bi-directional connection for Membase and Cloudera Hadoop]
{{{
The first consists of a Membase NodeCode module that streams data from Membase to CDH in real-time, while the second consists of a Sqoop-derived batch loader utility that allows for the loading of data to and from Membase and CDH.
}}}
   * Third: Twitter and Hadoop. Twitter keeps a second copy of its Tweets on Hadoop for analytics and applications.
     * [http://siliconangle.com/blog/2010/10/12/hadoop-is-a-big-part-of-twitter%E2%80%99s-ecosystem/ Hadoop is a "Big part of Twitter’s ecosystem"]
{{{
For something like Hadoop, the presence of a robust, public tool really helps to build a prosperous ecosystem.
}}}
     * [http://www.computerworld.com/s/article/print/9191098/Twitter_solves_its_data_formatting_challenge?taxonomyName=Storage&taxonomyId=19 Twitter solves its data formatting challenge]
{{{
While primary copies of user Tweets are kept in MySQL and Cassandra databases, the company is also building a second data repository, running on Hadoop, that can be used for analytics and applications.
}}}
== MapReduce ==
 * [http://cloudcomputing.sys-con.com/node/1528655/print Google Dumps MapReduce]
 * [http://www.theregister.co.uk/2010/09/09/google_caffeine_explained/print.html Google search index splits with MapReduce]
 * [http://www.theregister.co.uk/2010/09/24/google_percolator/print.html Google Percolator – global search jolt sans MapReduce comedown]