== Social Network: Facebook ==

 * [http://www.linux-mag.com/id/8705/ FBCMD: Command Line for Facebook] - A tool for retrieving Facebook information from the command line; it looks like it makes batch processing much easier.
 * [http://gigaom.com/cloud/how-facebook-brings-a-new-data-center-online/ How Facebook Brings a New Data Center Online] - Facebook has been expanding aggressively lately, and this article mentions several pieces of free software:
   * [https://github.com/facebook/flashcache/ FlashCache] - appears to be a tool for speeding up MySQL databases
{{{
Flashcache with MySQL allows us to achieve twice the throughput on each of
our new MySQL machine...we need to run two MySQL instances on each machine...
}}}
   * There is also a deployment tool called Kobold, but I have not been able to find its code yet.

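!FlashCache is a Linux kernel block cache built on device-mapper: it places an SSD in front of a slower disk, and in write-back mode it acknowledges writes once they reach the SSD and copies dirty blocks to disk later. That deferral is one plausible source of the MySQL throughput gain quoted above. The Python below is a toy sketch of the write-back idea only, assuming an LRU-ordered fast tier and plain dicts standing in for the devices; the class `WriteBackCache` and its methods are hypothetical and do not reproduce !FlashCache's real set-associative design.

{{{
#!python
from collections import OrderedDict


class WriteBackCache:
    """Toy write-back block cache: a small fast tier (think SSD) in front of a
    slow backing store (think HDD). Hypothetical; not FlashCache's real design."""

    def __init__(self, backing, capacity):
        self.backing = backing        # dict: block number -> data, stands in for the disk
        self.capacity = capacity      # maximum number of blocks in the fast tier
        self.cache = OrderedDict()    # block -> (data, dirty), least recently used first
        self.disk_reads = 0
        self.disk_writes = 0

    def _evict_if_full(self):
        while len(self.cache) > self.capacity:
            block, (data, dirty) = self.cache.popitem(last=False)  # evict the LRU block
            if dirty:
                self.backing[block] = data  # the deferred disk write happens only now
                self.disk_writes += 1

    def read(self, block):
        if block in self.cache:
            self.cache.move_to_end(block)  # hit: served from the fast tier
            return self.cache[block][0]
        self.disk_reads += 1               # miss: fetch from the slow tier
        data = self.backing.get(block)
        self.cache[block] = (data, False)
        self._evict_if_full()
        return data

    def write(self, block, data):
        # The write is acknowledged once it is in the fast tier; the disk write is deferred.
        self.cache[block] = (data, True)
        self.cache.move_to_end(block)
        self._evict_if_full()

    def flush(self):
        # Copy every dirty block to the backing store and mark it clean.
        for block, (data, dirty) in list(self.cache.items()):
            if dirty:
                self.backing[block] = data
                self.disk_writes += 1
                self.cache[block] = (data, False)


disk = {}
cache = WriteBackCache(disk, capacity=2)
for i in range(10):
    cache.write(0, f"v{i}")   # ten updates to the same hot block
print(cache.disk_writes)      # 0: every write was absorbed by the fast tier
cache.flush()
print(cache.disk_writes, disk[0])  # 1 v9
}}}

Ten updates to one hot block cost a single disk write at flush time. The trade-off of any write-back cache is that dirty blocks held only in the fast tier are at risk if that tier fails before they are written back.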
== Hadoop ==

 * Most Hadoop patches are contributed by Yahoo.
   * [[Image(http://ydn.zenfs.com/blogs/22/HadoopPatches.png,width=800)]]
 * The next-generation Hadoop !MapReduce architecture - [http://developer.yahoo.com/blogs/hadoop/posts/2011/02/mapreduce-nextgen/ The Next Generation of Apache Hadoop MapReduce]
   * [[Image(http://ydn.zenfs.com/blogs/22/MapReduce_NextGen.jpg)]]

 * [http://blogs.computerworlduk.com/app-dev-and-programme-management/2011/05/big-data-technology-getting-hotter-but-still-too-hard-for-most-developers/ "Big Data" technology: getting hotter, but still too hard]
   * I have recently seen the same complaint in the !LinkedIn community: Hadoop is hard to learn and too complex for ordinary enterprises!

 * [http://nosql.mypopescu.com/post/5397319531/hadoop-ecosystem-emc-netapp-mellanox-snaplogic Hadoop Ecosystem: EMC, NetApp, Mellanox, SnapLogic, DataStax]
   * !DataStax Brisk: Hadoop and Hive on Cassandra (see [wiki:jazz/11-04-01 2011-04-01])
   * [http://www.snaplogic.com/solutions/bigdata/ SnapLogic SnapReduce] - This company aims to make Hadoop simpler and has designed a graphical interface for planning Map/Reduce jobs (see [wiki:jazz/11-05-12 2011-05-12]).
   * [http://www.mellanox.com/content/pages.php?pg=web_2_0 Mellanox Hadoop-Direct] - Mellanox uses hardware to accelerate Hadoop and Memcached (see [wiki:jazz/11-05-12 2011-05-12]).
   * [http://blogs.netapp.com/exposed/2011/05/what-are-hadooplers.html NetApp Hadoop Shared DAS] ([wiki:jazz/11-05-12 2011-05-12] mentions [http://www.netapp.com/us/products/storage-systems/e5400/e5400.html NetApp e5400], purpose-built !NetApp hardware with boosted IOPS for Big Data applications such as Hadoop)
     * [[Image(http://blogs.netapp.com/.a/6a00d8341ca27e53ef01538e5e5c80970b-pi)]]
     * From a quick look, Shared DAS mainly does the following:
{{{
#!text
<1> Offloads background replication work (hardware RAID shortens replication time)
    reduce the amount of background replication tasks by employing highly efficient RAID
<2> Lowers disk I/O latency (raises IOPS in hardware)
    NetApp E-Series Shared DAS enables significantly higher disk I/O bandwidth at lower latency
<3> Reduces the number of replicas (hardware RAID means fewer replicas are needed, freeing disk space; deduplication may also play a role)
    reducing the number of object replicas within a rack
    Fewer replicas mean less disks to buy or more objects stored within the same infrastructure.
}}}
   * [http://www.greenplum.com/products/greenplum-hd EMC Greenplum HD]
{{{
#!text
EMC Greenplum provides fault tolerance for the Name Node and Job Tracker,
both single points of failure in Hadoop.
}}}

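Point <3> of the Shared DAS notes is mostly capacity arithmetic: raw disk needed is roughly data size × HDFS replica count × RAID overhead, where RAID overhead is (data disks + parity disks) / data disks. A minimal Python sketch, assuming a hypothetical comparison of the HDFS default of 3 replicas on plain disks against 2 replicas on RAID 6 groups of 8 data + 2 parity disks; these figures are illustrative, not !NetApp's published configuration, and the function name `raw_capacity_tb` is hypothetical.

{{{
#!python
def raw_capacity_tb(logical_tb, replicas, raid_data_disks=None, raid_parity_disks=0):
    """Raw disk capacity (TB) needed to hold `logical_tb` of HDFS data.

    replicas          -- HDFS replication factor (dfs.replication, 3 by default)
    raid_data_disks   -- data disks per RAID group, or None for plain disks (JBOD)
    raid_parity_disks -- parity disks per RAID group (2 for RAID 6)
    """
    overhead = 1.0
    if raid_data_disks is not None:
        overhead = (raid_data_disks + raid_parity_disks) / raid_data_disks
    return logical_tb * replicas * overhead


# Hypothetical comparison for 100 TB of data.
jbod = raw_capacity_tb(100, replicas=3)          # HDFS default on plain disks
shared_das = raw_capacity_tb(100, replicas=2,
                             raid_data_disks=8,
                             raid_parity_disks=2)  # assumed RAID 6 (8+2) layout
print(jbod, shared_das)  # 300.0 250.0
}}}

Under these assumptions, 100 TB of data needs 300 TB raw versus 250 TB raw, which is the "fewer replicas mean less disks to buy" effect; the sketch ignores temporary space and file-system overhead.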
 * Cascalog - a Hadoop query language written in Clojure that makes it easy for analysts to run analyses with SQL-like or Datalog-like syntax
   * [http://nathanmarz.com/blog/introducing-cascalog-a-clojure-based-query-language-for-hado.html Introducing Cascalog: a Clojure-based query language for Hadoop]
   * https://github.com/nathanmarz/cascalog
   * <Case study> [http://tech.backtype.com/52456836 Why Yieldbot chose Cascalog over Pig for Hadoop processing]

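Cascalog queries are Clojure expressions in which predicates that share a logic variable are implicitly joined on it, much like Datalog rules. The Python below is not Cascalog and does not run on Hadoop; it is a small in-memory sketch of those Datalog-style semantics (nested-loop joins on shared `?variables`, constants acting as selections, then filter functions). The `query` and `unify` helpers and the `age` / `follows` sample tuples are hypothetical.

{{{
#!python
def is_var(term):
    """Terms written as '?name' are logic variables; anything else is a constant."""
    return isinstance(term, str) and term.startswith("?")


def unify(terms, row, binding):
    """Extend `binding` so that `terms` matches `row`; return None on a conflict."""
    if len(terms) != len(row):
        return None
    new = dict(binding)
    for term, value in zip(terms, row):
        if is_var(term):
            if term in new and new[term] != value:
                return None  # shared variable disagrees: this is the implicit join
            new[term] = value
        elif term != value:
            return None  # a constant acts as a selection and must match exactly
    return new


def query(out_vars, predicates, filters=()):
    """Evaluate a conjunctive Datalog-style query with nested-loop joins.

    out_vars   -- variables to return, e.g. ["?p1", "?p2"]
    predicates -- list of (rows, terms) pairs; terms mix '?vars' and constants
    filters    -- callables that take a binding dict and return True to keep it
    """
    bindings = [{}]
    for rows, terms in predicates:
        next_bindings = []
        for binding in bindings:
            for row in rows:
                extended = unify(terms, row, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return sorted(tuple(b[v] for v in out_vars)
                  for b in bindings if all(f(b) for f in filters))


# Hypothetical sample datasets: (person, age) and (follower, followed).
age = [("alice", 28), ("bob", 33), ("chris", 40), ("david", 25)]
follows = [("alice", "david"), ("bob", "alice"), ("chris", "bob"), ("david", "chris")]

# Who follows someone younger than themselves? ?p1 and ?p2 join the three predicates.
print(query(["?p1", "?p2"],
            [(follows, ("?p1", "?p2")), (age, ("?p1", "?a1")), (age, ("?p2", "?a2"))],
            filters=[lambda b: b["?a2"] < b["?a1"]]))
# [('alice', 'david'), ('bob', 'alice'), ('chris', 'bob')]

# A constant term works as a selection: who is exactly 25?
print(query(["?p"], [(age, ("?p", 25))]))  # [('david',)]
}}}

A real engine would plan hash or sort-merge joins rather than nested loops; Cascalog itself compiles queries into Hadoop jobs through Cascading instead of evaluating them like this.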