| 52 | |
| 53 | == Hadoop World 2012 (Sessions) == |
| 54 | |
| 55 | * 10:50am Wednesday, 10/24/2012 |
| 56 | * '''MapReduce Design Patterns''', Donald Miner (EMC Greenplum) |
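| 57 | * This book seems like a good programming reference guide, especially because it also presents the characteristics of each pattern in SQL or Pig syntax. That is quite helpful for cross-team communication. |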
| 58 | * 11:40am Wednesday, 10/24/2012 |
| 59 | * '''Analyzing Millions of GitHub Commits: What Makes Developers Happy, Angry, and Everything in Between?''', Ilya Grigorik (Google), Brian Doll (GitHub) |
| 60 | * Ilya is a developer of Google BigQuery (cool!), and Brian works in marketing at GitHub |
| 61 | * (Ilya) |
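| 62 | * Motivation: when following too many projects, it is hard to see the whole picture! '''Global Timeline''' -> only possible if we can access the GitHub archive |
| 63 | * http://githubarchive.org - data starts from March 2012 |
| 64 | * (Think: in the future, a person's activity on GitHub may become a reference for headhunting firms judging that person's programming ability -> a way for programmers to market themselves) |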
| 65 | * Dremel (Paper) -> [http://www.google.com/bigquery BigQuery] |
| 66 | {{{ |
| 67 | $ wget http://data.githubarchive.org/2012-04-11-15.json.gz |
| 68 | $ ruby flatten.rb 2012-04-11-15.json.gz > flat.csv.gz |
| 69 | $ bq load github.timeline flat.csv.gz |
| 70 | }}} |
| 71 | * BigQuery uses a SQL-like query syntax |
| 72 | * (Think: BigQuery could be used for some OpenData.TW applications) |
| 73 | * GitHub + BigQuery + MailChimp -> run BigQuery every day via crontab |
| 74 | * GitHub Data Challenge (Brian) |
| 75 | * Ex. http://octoboard.com |
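| 76 | * (Think: can an analysis of overall behavior on GitHub represent the characteristics of global open source activity? Ex. Private vs Public) |
| 77 | * comment emotion -> emotional language written in comments |
| 78 | * A rather interesting analysis - programming language associations |
| 79 | * Correlation analysis - popular languages on GitHub vs. number of questions on StackOverflow |
| 80 | * activity by country - commits per 100k people (number of commits / population) |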
| 81 | * 1:40pm Wednesday, 10/24/2012 |
| 82 | * '''Facebook’s Large Scale Monitoring System Built on HBase''' - Liyin Tang (Facebook), Vinod Venkataraman (Facebook), Charles Thayer (Facebook) |
| 83 | * Facebook ODS |
| 84 | * Problem : (1) MySQL table size limitations (2) Sharding scheme created hotspots (3) Data growth |
| 85 | * Lesson Learned - Locality : Splitting HBase and HDFS is not good! |
| 86 | * Pitfalls |
| 87 | * HBase performance tuning - Takeaways 1~5 |
| 88 | * 2:30pm Wednesday, 10/24/2012 |
| 89 | * '''Bringing Real-Time, End-to-End Analytics into Everyday Use''', Greg Khairallah (Intel), Vin Sharma (Intel) |
| 90 | * |
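Side note on the "activity by country" metric from the GitHub session above: a minimal sketch of the "commits per 100k people" normalization (number of commits / population, scaled to 100,000 people). The function name `commits_per_100k` and the sample numbers are hypothetical and only for illustration; they are not data from the talk.

```python
def commits_per_100k(commits_by_country, population_by_country):
    """Normalize raw commit counts by population, per 100,000 people."""
    rates = {}
    for country, commits in commits_by_country.items():
        population = population_by_country.get(country)
        if not population:
            # Skip countries whose population is unknown or zero.
            continue
        rates[country] = commits * 100_000 / population
    return rates


# Hypothetical sample data (assumption, not from the talk).
commits = {"A": 5_000, "B": 20_000, "C": 300}
population = {"A": 1_000_000, "B": 10_000_000}

rates = commits_per_100k(commits, population)

# Rank countries by normalized activity rather than raw commit count.
ranking = sorted(rates, key=rates.get, reverse=True)
print(rates)
print(ranking)
```

With these made-up numbers, country B has more raw commits, but country A ranks higher once population is taken into account, which is the reason for normalizing in the first place.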