Changes between Version 4 and Version 5 of jazz/12-10-24


Timestamp: Oct 25, 2012, 12:41:49 PM
Author: jazz

= 2012-10-24 =

== Hadoop World 2012 (Keynotes) ==

 * '''Big Answers''' - Mike Olson (Cloudera)
     
   * Mission > Consumerism
   * Three key missions: (1) Security and freedom of the world (2) (3)

== Hadoop World 2012 (Sessions) ==

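 * 10:50am Wednesday, 10/24/2012
 * '''MapReduce Design Patterns''', Donald Miner (EMC Greenplum)
   * This book looks like a good programming reference guide, especially because it also expresses each pattern's characteristics in SQL or Pig syntax. That helps a lot with communication across teams.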
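
To make the map/shuffle/reduce shape behind these patterns concrete, here is a minimal in-memory Ruby sketch of a numerical-summarization-style job (count, min and max per key), with the roughly equivalent SQL in a comment. It is a hypothetical illustration, not code from the book or from Hadoop's Java API; the record fields (lang, lines) and the data are made up.

{{{#!ruby
# Hypothetical in-memory sketch of a numerical summarization job.
# Roughly equivalent SQL:
#   SELECT lang, COUNT(*), MIN(lines), MAX(lines) FROM commits GROUP BY lang;

# map: one input record -> list of [key, value] pairs
def map_phase(record)
  [[record[:lang], record[:lines]]]
end

# shuffle: group emitted values by key (Hadoop does this between map and reduce)
def shuffle(pairs)
  pairs.group_by(&:first).map { |key, kvs| [key, kvs.map(&:last)] }.to_h
end

# reduce: summarize all values seen for one key
def reduce_phase(_key, values)
  { count: values.size, min: values.min, max: values.max }
end

def run_job(records)
  pairs = records.flat_map { |record| map_phase(record) }
  shuffle(pairs).map { |key, values| [key, reduce_phase(key, values)] }.to_h
end

# Made-up records: programming language and lines changed per commit
commits = [
  { lang: "Ruby", lines: 42 },
  { lang: "Java", lines: 7 },
  { lang: "Ruby", lines: 13 }
]
run_job(commits).each { |lang, stats| puts "#{lang}: #{stats}" }
}}}

Swapping in a different reduce_phase (for example, a sum) changes the summary without touching the map or shuffle steps.
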
 * 11:40am Wednesday, 10/24/2012
 * '''Analyzing Millions of GitHub Commits: What Makes Developers Happy, Angry, and Everything in Between?''', Ilya Grigorik (Google), Brian Doll (GitHub)
   * Ilya is a developer on Google BigQuery (cool!); Brian does marketing at GitHub
   * (Ilya)
   * Motivation: when you follow too many projects, it is hard to see the big picture! '''Global Timeline''' -> only possible if we can access the GitHub archive
   * http://githubarchive.org - data available starting March 2012
   * (Think: in the future, a person's GitHub activity may become something headhunters use to judge their programming ability -> a way for programmers to market themselves)
   * Dremel (paper) -> [http://www.google.com/bigquery BigQuery]
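{{{
# Download one hour of GitHub Archive events (gzipped JSON)
$ wget http://data.githubarchive.org/2012-04-11-15.json.gz
# Flatten the nested JSON events into CSV rows (flatten.rb is not included in these notes)
$ ruby flatten.rb 2012-04-11-15.json.gz > flat.csv.gz
# Load the CSV into the github.timeline table with the BigQuery command-line tool
$ bq load github.timeline flat.csv.gz
}}}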
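
The notes don't include flatten.rb itself. As a rough idea of what a flattening step like this might do, here is a hypothetical Ruby stand-in. It assumes the archive file holds one JSON event per line (an assumption, not checked against the 2012 archive format), joins nested keys with underscores because BigQuery column names have traditionally been limited to letters, digits and underscores, and writes plain CSV. The column list is illustrative only.

{{{#!ruby
require "csv"
require "json"
require "zlib"

# Hypothetical stand-in for flatten.rb (the real script is not in these notes).
# Nested objects become underscore-joined column names, e.g.
# {"repository" => {"name" => "rails"}} -> {"repository_name" => "rails"}.
def flatten_event(hash, prefix = nil, out = {})
  hash.each do |key, value|
    name = prefix ? "#{prefix}_#{key}" : key.to_s
    if value.is_a?(Hash)
      flatten_event(value, name, out) # recurse into nested objects
    else
      out[name] = value
    end
  end
  out
end

# Turn one JSON event (one input line) into one CSV line.
# Columns missing from the event become empty fields.
def to_csv_row(json_line, columns)
  flat = flatten_event(JSON.parse(json_line))
  CSV.generate_line(columns.map { |column| flat[column] })
end

# Usage (assumed): ruby flatten.rb 2012-04-11-15.json.gz > flat.csv
# Output is plain, uncompressed CSV.
if __FILE__ == $PROGRAM_NAME && ARGV[0]
  columns = %w[type created_at repository_name] # hypothetical column choice
  Zlib::GzipReader.open(ARGV[0]) do |gz|
    gz.each_line do |line|
      next if line.strip.empty? # skip blank lines
      $stdout.write(to_csv_row(line, columns))
    end
  end
end
}}}

Flattening to a fixed column list keeps every CSV row aligned with a single table schema, even when some events lack a field.
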
   * BigQuery uses a SQL-like query syntax
   * (Think: BigQuery could be used in some OpenData.TW applications)
   * GitHub + BigQuery + MailChimp -> run BigQuery queries daily via crontab
   * GitHub Data Challenge (Brian)
   * Ex. http://octoboard.com
   * (Think: can an analysis of overall GitHub behavior represent the characteristics of open source activity worldwide? Ex. private vs. public)
   * Emotional comments -> emotional language written in comments
   * A rather interesting analysis - programming language associations
   * Correlation analysis - popular GitHub languages vs. the number of StackOverflow questions
   * Activity by country - commits per 100k people (commits / population, scaled to 100k people)
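
The "commits per 100k people" figure is a per-capita normalization: (commits / population) * 100,000. A minimal Ruby sketch with placeholder countries and numbers (not data from the talk) shows why the normalization matters: a country with more raw commits can still rank lower once population is taken into account.

{{{#!ruby
# Commits per 100k people: normalizes raw commit counts by population so
# countries of different sizes can be compared. Figures are placeholders.
def commits_per_100k(commits, population)
  raise ArgumentError, "population must be positive" unless population.positive?

  commits * 100_000.0 / population
end

# Rank countries by normalized activity, highest first
def rank_by_activity(stats)
  stats.map { |country, s| [country, commits_per_100k(s[:commits], s[:population])] }
       .sort_by { |_country, rate| -rate }
end

stats = {
  "Country A" => { commits: 50_000, population: 10_000_000 }, # 500.0 per 100k
  "Country B" => { commits: 120_000, population: 60_000_000 } # 200.0 per 100k
}
rank_by_activity(stats).each do |country, rate|
  puts format("%-10s %6.1f", country, rate)
end
}}}
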
 * 1:40pm Wednesday, 10/24/2012
 * '''Facebook’s Large Scale Monitoring System Built on HBase''' - Liyin Tang (Facebook), Vinod Venkataraman (Facebook), Charles Thayer (Facebook)
   * Facebook ODS
    * Problems: (1) MySQL table size limitations (2) the sharding scheme created hotspots (3) data growth
    * Lesson learned - locality: splitting HBase and HDFS apart is not good!
   * Pitfalls
   * HBase performance tuning - takeaways 1-5
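
As a generic illustration of how a key or sharding scheme can create hotspots (not necessarily Facebook's design), the Ruby sketch below simulates an HBase-style table split into four key ranges. Row keys that start with a timestamp all fall into one region, while a small hash-derived salt prefix spreads the same writes across regions. The split points, key format and bucket count are hypothetical.

{{{#!ruby
require "zlib"

# Hypothetical region layout: split points "1", "2", "3" give 4 key ranges:
# [start, "1"), ["1", "2"), ["2", "3"), ["3", end)
SPLITS = %w[1 2 3].freeze

# Index (0..3) of the region whose key range contains row_key
def region_for(row_key)
  SPLITS.count { |split| row_key >= split }
end

# Naive key: timestamp first, so writes made at the same time sort together
def naive_key(metric, ts)
  "#{ts}:#{metric}"
end

# Salted key: a stable bucket digit derived from CRC32 of the metric name
def salted_key(metric, ts, buckets = 4)
  "#{Zlib.crc32(metric) % buckets}:#{ts}:#{metric}"
end

# Count how many of the given row keys land in each region
def writes_per_region(keys)
  counts = Array.new(SPLITS.size + 1, 0)
  keys.each { |key| counts[region_for(key)] += 1 }
  counts
end

metrics = (1..100).map { |i| "host#{i}.cpu" } # made-up metric names
ts = 1_351_000_000                            # one shared write timestamp
p writes_per_region(metrics.map { |m| naive_key(m, ts) })  # prints [0, 100, 0, 0]
p writes_per_region(metrics.map { |m| salted_key(m, ts) }) # spread over several regions
}}}

The trade-off of salting is that reading a time range means scanning every salt bucket and merging the results.
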
 * 2:30pm Wednesday, 10/24/2012
 * '''Bringing Real-Time, End-to-End Analytics into Everyday Use''', Greg Khairallah (Intel), Vin Sharma (Intel)
   *