
2012-10-24

Hadoop World 2012 (Keynotes)

  • Big Answers - Mike Olson (Cloudera)
  • The End of the Data Warehouse - Ben Werther (Platfora)
    • Add a data warehouse layer between the web interface and Hadoop
    • In-Memory BI
  • Moneyball for New York City - Michael Flowers (NYC Mayor's Office of Policy and Strategic Planning)
    • Introduced the "Data Sensing Lab" at the venue (based on Arduino)
    • Smart City - How to use NYC data more effectively
    • Fire Risk Inspection Analytics - predicts how high the fire risk is -> effectively reduced it
    • Staff around 25 years old, started on 2005 PCs and MS Excel
    • Use the Web (Wall) for collaborative editing
    • Grow and enable a data-driven culture
  • Thinking Big Together: Driving the Future of Data Science - Annika Jimenez (EMC Greenplum), Anthony Goldbloom (Kaggle)
  • The Composite Database - Rich Hickey (Datomic)
    • Big Data magazine -
    • Strata - 9:1 -> ratio of submissions to accepted talks
    • Break down the traditional database : storage, index, query
    • Indexing as a component : Storage -> Indexing -> Ordered Storage
    • Query as a component :
    • Coordination as a component
    • Other components : notification, memory indexes (liveness), caching
  • The Democratization of Big Data: Bringing Hadoop to the Masses - James Markarian (Informatica)
    • The molecule-style diagram illustrates how the components of the Hadoop ecosystem relate to one another
    • The coding model works, but the Hadoop ecosystem needs a friendly interface
  • Big Data Direct – The Era of Self-driven Big Data Exploration - Sharmila Shahani-Mulligan (ClearStory Data)
    • Ever-increasing complexity of sources - growing number of open data APIs - data complexity keeps rising
    • Mixing private data with public data can yield many insights - several cases were given (e.g. water pipes, web site SEO)
    • $35 billion industry for databases and data containers
    • Future : a new era in which big data is analyzed easily and balanced with judgment and experience
    • Solution Must Aid Human Insight - Big Data + Amplified Human Intelligence
    • Next Generation Visualization - Only One Part of the Answer (a single chart only explains one part of an answer)
  • Bringing the 'So What' to Big Data - Tim Estes (Digital Reasoning)
    • Machine Reading problem
    • It's about people understanding data to change and improve their lives
    • Big understanding > Big Data
    • Key : Machine Learning , Bare Metal, Application
    • People > Data
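    • Ex. Analytics that support national security and safeguard freedom.
    • Ex. Identifying human-trafficking rings from advertisements and rescuing abducted underage girls.
    • Ex. Reducing financial risk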
    • Mission > Consumerism
    • Three key missions: (1) Security and freedom of the world (2) (3)

Hadoop World 2012 (Sessions)

  • 10:50am Wednesday, 10/24/2012
  • MapReduce Design Patterns, Donald Miner (EMC Greenplum)
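    • This book looks like a good programming reference guide, especially because it also presents the characteristics of each pattern in SQL or Pig syntax. That helps a lot with cross-team communication.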
  • 11:40am Wednesday, 10/24/2012
  • Analyzing Millions of GitHub Commits: What Makes Developers Happy, Angry, and Everything in Between?, Ilya Grigorik (Google), Brian Doll (GitHub)
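    • Ilya is a developer of Google BigQuery (Cool~), Brian works in Marketing at GitHub
    • (Ilya)
    • Motivation: when following too many projects, it is hard to see the whole picture! Global Timeline -> only if we can access the GitHub archive
    • http://githubarchive.org - data starts from March 2012
    • (Think: in the future, a person's activity on GitHub may become a reference for headhunters judging that person's programming ability -> a way for programmers to market themselves)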
    • Dremel (Paper) -> BigQuery
      $ wget http://data.githubarchive.org/2012-04-11-15.json.gz
      $ ruby flatten.rb 2012-04-11-15.json.gz > flat.csv.gz
      $ bq load github.timeline flat.csv.gz
      
    • BigQuery uses a SQL-like query syntax
    • (Think: BigQuery could be used for some OpenData.TW applications)
    • GitHub + BigQuery + MailChimp -> run BigQuery every day via crontab
    • GitHub Data Challenge (Brian)
    • Ex. http://octoboard.com
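    • (Think: can an analysis of overall behavior on GitHub represent the characteristics of global Open Source activity? Ex. Private vs Public)
    • Emotional comments -> emotional language written in comments
    • A rather interesting analysis - programming language associations
    • Correlation analysis - popular languages on GitHub vs. number of questions on StackOverflow
    • Activity by country - commits per 100k people (number of commits / population)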
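
(Sketch) Two steps noted in this session, flattening nested JSON events into CSV rows before loading them and normalizing commit counts by population, could look roughly like the Python below. This is not the speakers' flatten.rb: the underscore-joined column names, the sample event, and the numbers in the usage part are illustrative assumptions, not real GitHub Archive data or real population figures.

```python
import csv
import io
import json


def flatten(record, prefix=""):
    """Flatten nested dicts into one level, joining keys with underscores.

    The naming scheme is an assumption made for this sketch.
    """
    flat = {}
    for key, value in record.items():
        name = f"{prefix}_{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, name))
        else:
            flat[name] = value
    return flat


def events_to_csv(json_lines, columns):
    """Return CSV text with one row per JSON event, keeping only `columns`."""
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=columns, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    for line in json_lines:
        if line.strip():  # skip blank lines
            writer.writerow(flatten(json.loads(line)))
    return out.getvalue()


def commits_per_100k(commits, population):
    """Normalize a commit count by population, per 100,000 people."""
    return commits * 100_000 / population


if __name__ == "__main__":
    # Hypothetical event; real archive records carry many more fields.
    sample = ['{"type": "PushEvent", "repository": {"name": "demo", "language": "Ruby"}}']
    print(events_to_csv(sample, ["type", "repository_name", "repository_language"]))
    # Made-up numbers, for illustration only.
    print(commits_per_100k(500, 2_000_000))
```

Dividing by population keeps large countries from dominating the ranking simply because they have more people, which is the point of the "commits per 100k people" measure noted above.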
  • 1:40pm Wednesday, 10/24/2012
  • Facebook’s Large Scale Monitoring System Built on HBase - Liyin Tang (Facebook), Vinod Venkataraman (Facebook), Charles Thayer (Facebook)
    • Facebook ODS
      • Problem : (1) MySQL table size limitations (2) Sharding scheme created hotspots (3) Data growth
      • Lesson Learned - Locality : Splitting HBase and HDFS apart is not good!
    • Pitfalls
    • HBase performance tuning - Takeaways 1~5
  • 2:30pm Wednesday, 10/24/2012
  • Bringing Real-Time, End-to-End Analytics into Everyday Use, Greg Khairallah (Intel), Vin Sharma (Intel)
Last modified on Oct 25, 2012, 12:42:21 PM