= 2011-05-17 =

== Prediction ==

 * [http://www.bnext.com.tw/print/article/id/18420 Ford adopts Google's data-prediction algorithms to develop energy-efficient smart cars]
{{{
#!text
The secret behind Google Search's dominance: artificial intelligence and machine-learning algorithms
}}}
 * http://code.google.com/intl/zh-TW/apis/predict/ - Google Prediction API

== Social Network : Facebook ==

 * [http://www.linux-mag.com/id/8705/ FBCMD: Command Line for Facebook] - A tool for retrieving Facebook information from the command line. It looks like it makes batch processing much simpler.
 * [http://gigaom.com/cloud/how-facebook-brings-a-new-data-center-online/ How Facebook Brings a New Data Center Online] - Facebook has been expanding aggressively lately. The article mentions a few pieces of free software:
   * [https://github.com/facebook/flashcache/ FlashCache] - An SSD-backed block-device cache for Linux; here it is used to speed up MySQL.
{{{
Flashcache with MySQL allows us to achieve twice the throughput on each of our new MySQL machine...we need to run two MySQL instances on each machine...
}}}
   * There is also a deployment tool called Kobold, but I have not found its code yet.

== Hadoop ==

 * Most Hadoop patches are contributed by Yahoo.
   * [[Image(http://ydn.zenfs.com/blogs/22/HadoopPatches.png,width=800)]]
 * The next-generation Hadoop MapReduce architecture - [http://developer.yahoo.com/blogs/hadoop/posts/2011/02/mapreduce-nextgen/ The Next Generation of Apache Hadoop MapReduce]
   * [[Image(http://ydn.zenfs.com/blogs/22/MapReduce_NextGen.jpg)]]
 * [http://blogs.computerworlduk.com/app-dev-and-programme-management/2011/05/big-data-technology-getting-hotter-but-still-too-hard-for-most-developers/ "Big Data" technology: getting hotter, but still too hard]
   * I have recently seen the same complaint in !LinkedIn groups: Hadoop is hard to learn and too complex for typical enterprises!
 * [http://nosql.mypopescu.com/post/5397319531/hadoop-ecosystem-emc-netapp-mellanox-snaplogic Hadoop Ecosystem: EMC, NetApp, Mellanox, SnapLogic, DataStax]
   * DataStax Brisk: Hadoop and Hive on Cassandra (see [wiki:jazz/11-04-01 2011-04-01])
   * [http://www.snaplogic.com/solutions/bigdata/ SnapLogic SnapReduce] - This company aims to make Hadoop simpler and has built a graphical interface for planning Map/Reduce jobs. (see [wiki:jazz/11-05-12 2011-05-12])
   * [http://www.mellanox.com/content/pages.php?pg=web_2_0 Mellanox Hadoop-Direct] - Mellanox uses hardware to accelerate Hadoop and Memcached. (see [wiki:jazz/11-05-12 2011-05-12])
   * [http://blogs.netapp.com/exposed/2011/05/what-are-hadooplers.html NetApp Hadoop Shared DAS] ([wiki:jazz/11-05-12 2011-05-12] mentions !NetApp's purpose-built hardware, the [http://www.netapp.com/us/products/storage-systems/e5400/e5400.html NetApp E5400], which !NetApp designed to deliver higher IOPS for Big Data workloads such as Hadoop)
     * [[Image(http://blogs.netapp.com/.a/6a00d8341ca27e53ef01538e5e5c80970b-pi)]]
     * From a quick look, Shared DAS mainly does three things:
{{{
#!text
<1> Offloads background replication work (hardware RAID shortens replication time)
    reduce the amount of background replication tasks by employing highly efficient RAID
<2> Lowers disk I/O latency (raises IOPS in hardware)
    NetApp E-Series Shared DAS enables significantly higher disk I/O bandwidth at lower latency
<3> Reduces the number of replicas (hardware RAID lowers the replica count and frees usable disk space; deduplication may also play a part)
    reducing the number of object replicas within a rack
    Fewer replicas mean less disks to buy or more objects stored within the same infrastructure.
}}}
   * [http://www.greenplum.com/products/greenplum-hd EMC Greenplum HD]
{{{
#!text
EMC Greenplum provides fault tolerance for the Name Node and Job Tracker, both single points of failure in Hadoop.
}}}
 * Cascalog - A Clojure-based query language for Hadoop that lets analysts write analyses in SQL-like / Datalog-style syntax
   * [http://nathanmarz.com/blog/introducing-cascalog-a-clojure-based-query-language-for-hado.html Introducing Cascalog: a Clojure-based query language for Hadoop]
   * https://github.com/nathanmarz/cascalog
   * (Case study) [http://tech.backtype.com/52456836 Why Yieldbot chose Cascalog over Pig for Hadoop processing]

== Impact ==

 * The sites our team set up recently accounted for 26.41% of NCHC's web traffic, second only to the Science Volunteer site.
   * [[Image(jazz/11-05-17:11-05-17_nchc_traffic.png)]]
 * Top NCHC search keywords: Clonezilla is #1, 再生龍 (Clonezilla's Chinese name) is #5, and partclone is #10. v86d shows up because of my earlier work chasing jfbterm issues. More interestingly, opennebula made the list.
   * [[Image(jazz/11-05-17:11-05-17_nchc_search_top10.png)]]
   * [[Image(jazz/11-05-17:11-05-17_hadoop.tw_search.png)]]