| | 52 | |
| | 53 | == Hadoop World 2012 (Sessions) == |
| | 54 | |
| | 55 | * 10:50am Wednesday, 10/24/2012 |
| | 56 | * '''MapReduce Design Patterns''', Donald Miner (EMC Greenplum) |
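| | 57 | * This book looks like a good programming reference guide; in particular, it also presents the characteristics of each pattern in SQL or Pig syntax, which is quite helpful for cross-team communication. |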
| | 58 | * 11:40am Wednesday, 10/24/2012 |
| | 59 | * '''Analyzing Millions of GitHub Commits: What Makes Developers Happy, Angry, and Everything in Between?''', Ilya Grigorik (Google), Brian Doll (GitHub) |
| | 60 | * Ilya is a developer of Google BigQuery (Cool~); Brian works in marketing at GitHub |
| | 61 | * (Ilya) |
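| | 62 | * Motivation: when following too many projects, it is hard to see the whole picture! '''Global Timeline''' -> only possible if we can access the GitHub archive |
| | 63 | * http://githubarchive.org - data starts from March 2012 |
| | 64 | * (Think: in the future, a person's activity on GitHub may become a reference for headhunters judging that person's programming ability -> a way for programmers to market themselves) |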
| | 65 | * Dremel (Paper) -> [http://www.google.com/bigquery BigQuery] |
| | 66 | {{{ |
| | 67 | $ wget http://data.githubarchive.org/2012-04-11-15.json.gz |
| | 68 | $ ruby flatten.rb 2012-04-11-15.json.gz > flat.csv.gz |
| | 69 | $ bq load github.timeline flat.csv.gz |
| | 70 | }}} |
| | 71 | * BigQuery uses a SQL-like query syntax |
| | 72 | * (Think: BigQuery could be used for some OpenData.TW applications) |
| | 73 | * GitHub + BigQuery + MailChimp -> run BigQuery every day via crontab |
| | 74 | * GitHub Data Challenge (Brian) |
| | 75 | * Ex. http://octoboard.com |
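| | 76 | * (Think: can an analysis of overall GitHub behavior represent the characteristics of global Open Source activity? Ex. Private vs Public) |
| | 77 | * Emotional comments -> emotional language written in comments |
| | 78 | * An interesting analysis: programming language associations |
| | 79 | * Correlation analysis: popular languages on GitHub vs. the number of questions on StackOverflow |
| | 80 | * Activity by country: commits per 100k people (number of commits / population) |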
| | 81 | * 1:40pm Wednesday, 10/24/2012 |
| | 82 | * '''Facebook’s Large Scale Monitoring System Built on HBase''' - Liyin Tang (Facebook), Vinod Venkataraman (Facebook), Charles Thayer (Facebook) |
| | 83 | * Facebook ODS |
| | 84 | * Problem : (1) MySQL table size limitations (2) Sharding scheme created hotspots (3) Data growth |
| | 85 | * Lesson Learned - Locality: splitting HBase and HDFS is not good! |
| | 86 | * Pitfalls |
| | 87 | * HBase performance tuning - Takeaway 1~5 |
| | 88 | * 2:30pm Wednesday, 10/24/2012 |
| | 89 | * '''Bringing Real-Time, End-to-End Analytics into Everyday Use''', Greg Khairallah (Intel), Vin Sharma (Intel) |
| | 90 | * |
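The "commits per 100k people" metric noted in the GitHub session can be sketched roughly as follows. This is only an illustration of the normalization, not the speakers' actual query: the function names and the sample figures are hypothetical, and real numbers would come from the GitHub Archive data plus a population table.

```python
from typing import Dict, List, Tuple


def commits_per_100k(commits: int, population: int) -> float:
    """Normalize a raw commit count to commits per 100,000 people."""
    if population <= 0:
        raise ValueError("population must be positive")
    # Multiply before dividing so round inputs stay exact in floating point.
    return commits * 100_000 / population


def rank_countries(stats: Dict[str, Tuple[int, int]]) -> List[Tuple[str, float]]:
    """Rank countries by normalized activity, highest first.

    `stats` maps a country name to (commits, population).
    """
    rates = [(name, commits_per_100k(c, p)) for name, (c, p) in stats.items()]
    return sorted(rates, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical sample figures, for illustration only.
    sample = {"A": (500, 1_000_000), "B": (300, 200_000)}
    for name, rate in rank_countries(sample):
        print(f"{name}: {rate:.1f} commits per 100k people")
```

Normalizing by population keeps large countries from dominating the ranking purely because of their size: in the sample above, country B has fewer commits than A but a higher rate.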