Version 6 (modified by jazz, 12 years ago)
2012-10-24
Hadoop World 2012 (Keynotes)
- Big Answers - Mike Olson (Cloudera)
- http://cloudera.com/impla (Impala) - Apache License, real-time query engine
- Applications: agriculture, social security
- The End of the Data Warehouse - Ben Werther (Platfora)
- Adds a data warehouse layer between the web interface and Hadoop
- In-Memory BI
- Moneyball for New York City - Michael Flowers (NYC Mayor's Office of Policy and Strategic Planning)
- Introduced the "Data Sensing Lab" at the venue (based on Arduino)
- Smart City - how to use NYC data more effectively
- Fire Risk Inspection Analytics - predicts how high the fire risk is -> and effectively reduced it
- 25-year-old staff; started in 2005 with PCs and MS Excel
- Use the web (a wall) for collaborative editing
- Grow and enable a data-driven culture
- Thinking Big Together: Driving the Future of Data Science - Annika Jimenez (EMC Greenplum), Anthony Goldbloom (Kaggle)
- Data, Technology, Application, Data Science
- http://openchorus.org
- Kaggle - shortage of professional experts - every talk mentions that the company is hiring, which also shows how scarce talent is in this field.
- http://greenplum.com/communities
- The Composite Database - Rich Hickey (Datomic)
- Big Data magazine -
- Strata - 9:1 -> submissions : acceptances
- Break down the traditional database: storage, indexing, query
- Indexing as a component: Storage -> Indexing -> Ordered Storage
- Query as a component:
- Coordination as a component
- Other components: notification, memory indexes (liveness), caching
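The decomposition above (storage, indexing, and query as separate components) can be sketched as a toy model. This is an illustration under our own assumptions, not Datomic's actual design or API: the `Storage`, `Index`, and `query` names and the (entity, attribute, value, tx) fact shape are hypothetical.

```python
from bisect import bisect_left
from dataclasses import dataclass, field

# Hypothetical fact shape: (entity, attribute, value, transaction id).
Fact = tuple[str, str, str, int]


@dataclass
class Storage:
    """Storage component: an append-only log of immutable facts."""
    log: list[Fact] = field(default_factory=list)

    def append(self, fact: Fact) -> None:
        self.log.append(fact)


@dataclass
class Index:
    """Indexing component: facts re-sorted as (attribute, value, entity, tx)."""
    rows: list[tuple[str, str, str, int]] = field(default_factory=list)

    def rebuild(self, storage: Storage) -> None:
        # Turn the unordered log into ordered storage that supports range lookups.
        self.rows = sorted((a, v, e, tx) for e, a, v, tx in storage.log)


def query(index: Index, attribute: str, value: str) -> list[str]:
    """Query component: answers lookups using only the ordered index."""
    start = bisect_left(index.rows, (attribute, value))
    matches = []
    for a, v, e, _tx in index.rows[start:]:
        if (a, v) != (attribute, value):
            break  # rows are sorted, so no later row can match
        matches.append(e)
    return matches


storage = Storage()
for fact in [("alice", "lang", "clojure", 1), ("bob", "lang", "java", 2), ("carol", "lang", "clojure", 3)]:
    storage.append(fact)
index = Index()
index.rebuild(storage)
print(query(index, "lang", "clojure"))  # ['alice', 'carol']
```

Because the index is derived entirely from the log, each component could in principle be replaced or scaled independently, which is the point of treating them as separate pieces.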
- The Democratization of Big Data: Bringing Hadoop to the Masses - James Markarian (Informatica)
- Used a molecule diagram to illustrate the relationships within the Hadoop ecosystem
- The coding model works, but the Hadoop ecosystem needs a friendly interface
- Big Data Direct – The Era of Self-driven Big Data Exploration - Sharmila Shahani-Mulligan (ClearStory Data)
- Ever-increasing complexity of sources - a growing number of open data APIs - data complexity keeps rising
- Mixing private data with public data yields many insights - gave several cases (e.g., water pipes, website SEO)
- $35 billion industry for databases and data containers
- Future: a new era to analyze big data easily and balance it with judgment and experience
- Solution Must Aid Human Insight - Big Data + Amplified Human Intelligence
- Next Generation Visualization - Only One Part of the Answer (a chart only explains part of an answer)
- Bringing the 'So What' to Big Data - Tim Estes (Digital Reasoning)
- Machine Reading problem
- It's about people understanding data to change and improve their lives
- Big understanding > Big Data
- Key : Machine Learning , Bare Metal, Application
- People > Data
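- Ex. Analytics supporting national security, safeguarding freedom.
- Ex. Finding human-trafficking rings from advertisements, rescuing kidnapped underage girls.
- Ex. Reducing financial risk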
- Mission > Consumerism
- Three key missions: (1) security and freedom of the world (2) (3)
Hadoop World 2012 (Sessions)
- 10:50am Wednesday, 10/24/2012
- MapReduce Design Patterns, Donald Miner (EMC Greenplum)
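- The book looks like a solid programming reference, especially because it also expresses each pattern's characteristics in SQL or Pig syntax. That is quite helpful for communication across teams.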
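To illustrate the idea of pairing a MapReduce pattern with its SQL analogue, here is a single-machine simulation of a numerical summarization (min, max, and count per key). It is plain Python rather than Hadoop's Java API, and the record layout and the `mapper`, `reducer`, and `run_job` names are hypothetical.

```python
from collections import defaultdict

# SQL analogue of this job:
#   SELECT user_id, MIN(length), MAX(length), COUNT(*) FROM comments GROUP BY user_id;


def mapper(record):
    """Emit (key, (min, max, count)) for one hypothetical (user_id, length) record."""
    user_id, length = record
    yield user_id, (length, length, 1)


def reducer(key, values):
    """Combine partial (min, max, count) tuples; also usable as a combiner."""
    mins, maxes, counts = zip(*values)
    return key, (min(mins), max(maxes), sum(counts))


def run_job(records):
    """Simulate map -> shuffle (group by key) -> reduce on a single machine."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return [reducer(key, groups[key]) for key in sorted(groups)]


comments = [("u1", 10), ("u2", 5), ("u1", 30), ("u1", 20)]
print(run_job(comments))  # [('u1', (10, 30, 3)), ('u2', (5, 5, 1))]
```

Emitting (min, max, count) tuples instead of raw values keeps the reducer associative, so the same function can run as a combiner on partial results.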
- 11:40am Wednesday, 10/24/2012
- Analyzing Millions of GitHub Commits: What Makes Developers Happy, Angry, and Everything in Between?, Ilya Grigorik (Google), Brian Doll (GitHub)
- Ilya is a Google BigQuery developer (cool~); Brian works in marketing at GitHub
- (Ilya)
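- Motivation: when you follow too many projects, it is hard to see the big picture! Global Timeline -> only possible if we can access the GitHub archive
- http://githubarchive.org - data starts from March 2012
- (Think: in the future, a person's GitHub activity might become a reference for headhunters assessing their programming ability -> a way for programmers to market themselves)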
- Dremel (Paper) -> BigQuery
$ wget http://data.githubarchive.org/2012-04-11-15.json.gz
$ ruby flatten.rb 2012-04-11-15.json.gz > flat.csv.gz
$ bq load github.timeline flat.csv.gz
- BigQuery uses a SQL-like syntax
- (Think: BigQuery could be used for some OpenData.TW applications)
- GitHub + BigQuery + MailChimp -> run BigQuery daily via crontab
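The `flatten.rb` step turns nested JSON events into flat rows that BigQuery can load. Its actual implementation is not shown in the talk notes, so the following is only a hypothetical Python equivalent. It assumes one JSON event per line, and it joins nested keys with underscores because BigQuery column names cannot contain dots. The `flatten` and `events_to_csv` names and the column list are illustrative.

```python
import csv
import io
import json


def flatten(obj, prefix=""):
    """Flatten nested dicts into one level, joining keys with underscores."""
    flat = {}
    for key, value in obj.items():
        name = f"{prefix}_{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, name))
        else:
            flat[name] = value
    return flat


def events_to_csv(lines, columns):
    """Write one CSV row per JSON line; missing columns become empty fields."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=columns, extrasaction="ignore")
    for line in lines:
        if line.strip():
            writer.writerow(flatten(json.loads(line)))
    return out.getvalue()


event = '{"type": "PushEvent", "actor": {"login": "octocat"}, "repository": {"name": "hello"}}'
print(events_to_csv([event], ["type", "actor_login", "repository_name"]).strip())
```

Fixing the column list up front keeps every row aligned with the table schema even when some events lack a field.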
- GitHub Data Challenge (Brian)
- Ex. http://octoboard.com
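- (Think: does GitHub's overall activity represent the characteristics of open source activity worldwide? E.g., private vs. public repositories)
- Comment emotions -> emotional language written in comments
- A rather interesting analysis - programming language associations
- Correlation analysis - popular GitHub languages vs. the number of StackOverflow questions
- Activity by country - commits per 100k people (commits / population)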
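Normalizing by population keeps large countries from dominating a raw commit count. A minimal sketch of the "commits per 100k people" metric, using made-up country codes and figures for illustration only:

```python
def commits_per_100k(commits, population):
    """Normalize raw commit counts by population: commits * 100,000 / population."""
    return {
        country: commits[country] * 100_000 / population[country]
        for country in commits
        if population.get(country)  # skip unknown or zero populations
    }


def rank(rates):
    """Sort countries from most to least active per capita."""
    return sorted(rates.items(), key=lambda item: item[1], reverse=True)


# Hypothetical figures, for illustration only.
commits = {"A": 5_000, "B": 300}
population = {"A": 2_000_000, "B": 50_000}
print(rank(commits_per_100k(commits, population)))  # [('B', 600.0), ('A', 250.0)]
```

Country B has far fewer commits in absolute terms but ranks first once the counts are scaled by population.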
- 1:40pm Wednesday, 10/24/2012
- Facebook’s Large Scale Monitoring System Built on HBase - Liyin Tang (Facebook), Vinod Venkataraman (Facebook), Charles Thayer (Facebook)
- Facebook ODS
- Problems: (1) MySQL table size limitations (2) sharding scheme created hotspots (3) data growth
- Lesson learned - locality: splitting HBase and HDFS is not good!
- Pitfalls
- HBase performance tuning - Takeaways 1-5
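A common generic way to avoid write hotspots with time-ordered keys in HBase is to prefix each row key with a hash-derived "salt" bucket. This sketch is not Facebook ODS's actual schema. The key layout, the `salted_row_key` name, and the bucket count are assumptions, and MD5 is used only for bucketing, not for security.

```python
import hashlib

NUM_BUCKETS = 16  # hypothetical; often chosen to roughly match the number of regions


def salted_row_key(metric_id: str, timestamp: int, buckets: int = NUM_BUCKETS) -> bytes:
    """Build 'BB|metric|timestamp', where BB is a stable hash bucket of the metric id."""
    bucket = hashlib.md5(metric_id.encode("utf-8")).digest()[0] % buckets
    # Zero-padding keeps lexicographic byte order equal to numeric timestamp order.
    return f"{bucket:02d}|{metric_id}|{timestamp:010d}".encode("utf-8")


# Points of one metric share a prefix, so a time-range scan stays contiguous ...
print(salted_row_key("web01.cpu", 1350000000))
# ... while different metrics are spread across buckets instead of one hot region.
print({salted_row_key(f"web{i:02d}.cpu", 0)[:2] for i in range(100)})
```

Salting on the metric id rather than the timestamp is the design choice here: writes spread out across buckets, but reading one metric's history is still a single range scan.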
- 2:30pm Wednesday, 10/24/2012
- Bringing Real-Time, End-to-End Analytics into Everyday Use, Greg Khairallah (Intel), Vin Sharma (Intel)