Changes between Version 4 and Version 5 of jazz/11-05-17


Timestamp: May 18, 2011, 12:48:26 AM
Author: jazz
  • jazz/11-05-17

 * http://code.google.com/intl/zh-TW/apis/predict/ - Google Prediction API

== Social Network : Facebook ==

 * [http://www.linux-mag.com/id/8705/ FBCMD: Command Line for Facebook] - A tool that retrieves Facebook information from the command line; it looks like this should make batch processing simpler.
 * [http://gigaom.com/cloud/how-facebook-brings-a-new-data-center-online/ How Facebook Brings a New Data Center Online] - Facebook has been expanding frequently of late; this article mentions several pieces of free software:
  * [https://github.com/facebook/flashcache/ FlashCache] - Appears to be a tool for speeding up MySQL databases
{{{
Flashcache with MySQL allows us to achieve twice the throughput on each of
our new MySQL machine...we need to run two MySQL instances on each machine...
}}}
  * There is also a deployment tool called Kobold, but no code can be found for it yet.

== Hadoop ==

 * Most Hadoop patches are contributed by Yahoo.
 * [[Image(http://ydn.zenfs.com/blogs/22/HadoopPatches.png,width=800)]]
 * The next-generation Hadoop MapReduce architecture - [http://developer.yahoo.com/blogs/hadoop/posts/2011/02/mapreduce-nextgen/ The Next Generation of Apache Hadoop MapReduce]
 * [[Image(http://ydn.zenfs.com/blogs/22/MapReduce_NextGen.jpg)]]

 * [http://blogs.computerworlduk.com/app-dev-and-programme-management/2011/05/big-data-technology-getting-hotter-but-still-too-hard-for-most-developers/ "Big Data" technology: getting hotter, but still too hard]
   * The same complaint has come up recently in the !LinkedIn community: Hadoop is hard to learn and too complex for the average enterprise!

 * [http://nosql.mypopescu.com/post/5397319531/hadoop-ecosystem-emc-netapp-mellanox-snaplogic Hadoop Ecosystem: EMC, NetApp, Mellanox, SnapLogic, DataStax]
 * DataStax Brisk: Hadoop and Hive on Cassandra (see [wiki:jazz/11-04-01 2011-04-01])
 * [http://www.snaplogic.com/solutions/bigdata/ SnapLogic SnapReduce] - This company aims to make Hadoop simpler and has designed a graphical interface for planning Map / Reduce jobs. (see [wiki:jazz/11-05-12 2011-05-12])
 * [http://www.mellanox.com/content/pages.php?pg=web_2_0 Mellanox Hadoop-Direct] - Mellanox uses hardware to accelerate Hadoop and Memcached (see [wiki:jazz/11-05-12 2011-05-12])
 * [http://blogs.netapp.com/exposed/2011/05/what-are-hadooplers.html NetApp Hadoop Shared DAS] ([wiki:jazz/11-05-12 2011-05-12] mentions !NetApp's purpose-built hardware, the [http://www.netapp.com/us/products/storage-systems/e5400/e5400.html NetApp e5400], with which !NetApp boosts IOPS for Big Data applications such as Hadoop)
 * [[Image(http://blogs.netapp.com/.a/6a00d8341ca27e53ef01538e5e5c80970b-pi)]]
 * A quick look suggests Shared DAS mainly does the following:
{{{
#!text
<1> Helps with background replication work (hardware RAID shortens replication time)
reduce the amount of background replication tasks by employing highly efficient RAID
<2> Lowers disk I/O response time (raises IOPS in hardware)
NetApp E-Series Shared DAS enables significantly higher disk I/O bandwidth at lower latency
<3> Reduces the number of replicas (hardware RAID cuts the replica count and frees up usable disk space; possibly also related to deduplication)
reducing the number of object replicas within a rack
Fewer replicas mean less disks to buy or more objects stored within the same infrastructure.
}}}
 * [http://www.greenplum.com/products/greenplum-hd EMC Greenplum HD]
{{{
#!text
EMC Greenplum provides fault tolerance for the Name Node and Job Tracker,
both single points of failure in Hadoop.
}}}
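
The "fewer replicas" point in the Shared DAS notes above comes down to simple capacity arithmetic. The following is a minimal Python sketch of that arithmetic only. Every number in it is an assumption chosen for illustration (120 TB of raw disk, a replication factor of 3 without RAID, a replication factor of 2 with a 20% RAID parity overhead), and the helper `usable_capacity_tb` is a hypothetical name; none of these come from the NetApp article.

```python
def usable_capacity_tb(raw_tb: float, replicas: int, raid_overhead: float = 0.0) -> float:
    """Return the capacity left for unique data, in TB.

    raw_tb        -- total raw disk capacity (assumed figure)
    replicas      -- number of copies kept of every block
    raid_overhead -- fraction of raw capacity spent on RAID parity (assumed figure)
    """
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    if not 0.0 <= raid_overhead < 1.0:
        raise ValueError("raid_overhead must be in [0, 1)")
    # Parity is paid once on the raw disks; every replica then divides what is left.
    return raw_tb * (1.0 - raid_overhead) / replicas


RAW_TB = 120.0  # assumed raw capacity, for illustration only

# Three software replicas on plain disks (no RAID).
plain = usable_capacity_tb(RAW_TB, replicas=3)

# Two replicas on RAID-protected storage with an assumed 20% parity overhead.
with_raid = usable_capacity_tb(RAW_TB, replicas=2, raid_overhead=0.2)

print(f"3 replicas, no RAID : {plain:.1f} TB usable")
print(f"2 replicas, RAID    : {with_raid:.1f} TB usable")
```

Under these assumed figures the two-replica layout leaves more usable space, but the result depends entirely on the parity overhead chosen, so the comparison should be redone with real numbers before drawing conclusions.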

 * Cascalog - A Hadoop query language written in Clojure that lets analysts run analyses with SQL-like / Datalog syntax
   * [http://nathanmarz.com/blog/introducing-cascalog-a-clojure-based-query-language-for-hado.html Introducing Cascalog: a Clojure-based query language for Hadoop]
   * https://github.com/nathanmarz/cascalog
   * <Case study> [http://tech.backtype.com/52456836 Why Yieldbot chose Cascalog over Pig for Hadoop processing]
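
To give a feel for the Datalog-style queries mentioned above, here is a minimal sketch in plain Python. It is not Cascalog or Clojure code and does not run on Hadoop. The relations `AGE` and `FOLLOWS` and the helper `young_followers` are hypothetical names invented for illustration. The query joins two relations on a shared variable and applies a filter, which is the kind of query such languages let an analyst state declaratively.

```python
from typing import Iterable, List, Tuple

# Two small in-memory relations (made-up sample data).
AGE: List[Tuple[str, int]] = [("alice", 28), ("bob", 33), ("chris", 40)]
FOLLOWS: List[Tuple[str, str]] = [
    ("alice", "bob"),
    ("bob", "chris"),
    ("chris", "alice"),
]


def young_followers(
    age: Iterable[Tuple[str, int]],
    follows: Iterable[Tuple[str, str]],
    max_age: int,
) -> List[Tuple[str, str]]:
    """Return (follower, followee) pairs whose follower is younger than max_age.

    In Datalog-like notation this is roughly:
        result(P, Q) :- follows(P, Q), age(P, A), A < max_age.
    """
    age_of = dict(age)  # index the age relation by person for the join
    return [
        (person, followee)
        for person, followee in follows
        if person in age_of and age_of[person] < max_age
    ]


print(young_followers(AGE, FOLLOWS, 35))
```

A query language aimed at Hadoop would express the same join and filter as a declaration and leave the execution plan to the framework; the list comprehension here only shows the logical shape of the query.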

== Influence ==
