Changes between Version 1 and Version 2 of jazz/12-10-25


Timestamp:
Oct 31, 2012, 4:53:06 PM
Author:
jazz

  • jazz/12-10-25

    v1 v2  
    55 * 9:10am '''Hadoop: Thinking Big''' - John Schroeder (MapR Technologies)
    66   * MapR breaks the TeraSort benchmark record on Google Compute Engine
    7  * 9:20am Plenary
    8  * '''Beyond Batch''' - Doug Cutting (Cloudera)
     7 * 9:20am '''Beyond Batch''' - Doug Cutting (Cloudera)
    98   * HBase: First Non-Batch Component
    109   * Google gives us the map - 2012 Spanner paper, 26 authors!
    1110   * Cloudera Impala (2012), modeled on Google Dremel (2010): online queries!!
    12  * 9:30am Plenary
    13  * '''Cloud, Mobile and Big Data – How Analytics Provides Value to the Buzzwords''' - Paul Kent (SAS)
     11 * 9:30am '''Cloud, Mobile and Big Data – How Analytics Provides Value to the Buzzwords''' - Paul Kent (SAS)
    1412   * Let enterprises make decisions in a more timely way - Action in Time
    1513   * Predicting future outcomes
    16  * 9:35am Plenary
    17  * '''They Don't Teach You That In School''' - Cathy O'Neil, Julie Steele (O'Reilly Media, Inc.)
     14 * 9:35am '''They Don't Teach You That In School''' - Cathy O'Neil, Julie Steele (O'Reilly Media, Inc.)
    1815   * What a data scientist needs to know - machine learning, statistics
    1916   * Feature selection - machine learning for advertising
    20  * 9:45am Plenary
    21  * '''From Traditional Database to Big Data Platform''' - Irfan Khan (SAP)
    22  * 9:50am Plenary
    23  * '''Of Rocket Ships and Washing Machines: Data Technology for People''' - Joe Hellerstein (Trifacta and UC Berkeley)
     17 * 9:45am '''From Traditional Database to Big Data Platform''' - Irfan Khan (SAP)
     18 * 9:50am '''Of Rocket Ships and Washing Machines: Data Technology for People''' - Joe Hellerstein (Trifacta and UC Berkeley)
    2419   * Like the invention of the dishwasher: data science is still at a very early stage, because 80% of the work is in cleaning the data
    2520   * Develop productivity technology
    2621   * Shreddr - http://www.captricity.com
    2722   * Analytic Trifecta
    28  * 10:00am Plenary
    29  * '''Are We Really Winning the Information Revolution?''' - Samantha Ravich (National Commission for the Review of R&D Programs in the Intelligence Community)
     23 * 10:00am '''Are We Really Winning the Information Revolution?''' - Samantha Ravich (National Commission for the Review of R&D Programs in the Intelligence Community)
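    3024   * Deep down we know the answer is somewhere in that pile of data, but we now have far too much data.
    3125   * With so much data, we have to select and prioritize before we can actually extract insight and make the right decisions.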
     
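    6256   * (Think: this should be considered at the very start of requirements planning. With MapReduce and HDFS people think about how much compute and storage they need, but the network is often overlooked -> switch selection and monitoring support. It's all about SCALE!!)
    6357   * QoS support - these problems only show up in very large environments
    64    * Impact of [http://www.openflow.org OpenFlow] (SDN, Software-Defined Networking) on Hadoop environments - data locality / rack awareness used to require manual configuration
     58   * Impact of OpenFlow (SDN, Software-Defined Networking) on Hadoop environments - data locality / rack awareness used to require manual configuration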
    6559 * 2:30pm '''Is Your Cluster a Leaning Tower of Pisa?''' - Michael Segel (Think Big Analytics)
    66    *
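     60   * Joke: what second-year medical students mainly learn is how to ask patients questions!! Because a good diagnosis comes from good questions!!
     61   * (Think: the example questions here look a lot like the ones commonly seen on forum.hadoop.tw, where it takes two or three round trips to get to the real problem. Sometimes it is not a cluster architecture problem, yet we still tend to assume the environment is at fault)
     62   * (Think: CHUG's logo -> put Taiwan in it to design a Taiwan Hadoop User Group logo)
     63   * (Think: the workflow for enterprise Hadoop adoption: FAQs, vendor supply chain, ..)
     64   * Different types of cluster - from "on-premises" to "CAAS (Cluster as a Service)"
     65    * CAAS - redundant data centers as an option (off-site backup, CDN)
     66   * DR (Disaster Recovery) / BCP (?)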
     67   * Golden Ratio -
     68     * CPU cores to Memory - 4~8 GB RAM per Core
     69     * 1+ Spindles (Hard Drives) per Core
     70     * With more than 4 drives, 1 GbE is not enough (network)
     71     * '''According to Moore's Law -> the optimal ratio will need to be re-evaluated over time.'''
     72   * Think about TCO (Total Cost of Ownership)!!
     73   * Using VMs:
     74     * PRO: Allows multi-tenancy
     75   * In the future - we expect to see more virtualization
     76     * Mesos / Spark - Berkeley
     77     * YARN
     78     * Storm
     79   * Use VMs to keep the ratio 'balanced'!!
    6780 * 4:10pm '''Real-time Big Data Without Streaming''' - Ron Bodkin (Think Big Analytics)
    68  * 5:00pm '''Realtime Processing with Storm''' - Gabriel Eisbruch (mercadolibre.com), Luis Darío Simonassi (mercadolibre.com), Jonathan Leibiusky (mercadolibre.com)
     81   * This is more of a high-level architecture question: which architecture to adopt for applications with different real-time requirements.
     82   * My feeling is that the components are basically the same few (NoSQL, index, search, streaming server); the harder part afterwards is how to connect these components together (e.g. connectors).
     83 * 5:00pm '''Realtime Processing with Storm''' - Gabriel Eisbruch (Mercadolibre.Com), Luis Darío Simonassi (MercadoLibre.Com), Jonathan Leibiusky (MercadoLibre.com)
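
The "Golden Ratio" rules of thumb noted in the Michael Segel session (4~8 GB RAM per core, at least one spindle per core, more than 4 drives needing more than 1 GbE) can be written down as a quick sanity check. The sketch below is only an illustration of those three notes: the names `NodeSpec` and `check_balance` are hypothetical, the thresholds are copied from the notes rather than from any official sizing guide, and, as the notes say, the optimal ratio is expected to shift over time.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class NodeSpec:
    """Hypothetical description of one worker node (names are assumptions)."""
    cores: int
    ram_gb: float
    drives: int          # number of spindles (hard drives)
    network_gbps: float  # link speed, e.g. 1 for 1 GbE, 10 for 10 GbE


def check_balance(node: NodeSpec) -> List[str]:
    """Return warnings where the node departs from the rules of thumb in the notes."""
    warnings = []

    # Rule 1: 4~8 GB RAM per core.
    ram_per_core = node.ram_gb / node.cores
    if not 4 <= ram_per_core <= 8:
        warnings.append(
            f"RAM per core is {ram_per_core:.1f} GB (rule of thumb: 4-8 GB)"
        )

    # Rule 2: at least one spindle per core.
    if node.drives < node.cores:
        warnings.append(
            f"{node.drives} drives for {node.cores} cores (rule of thumb: 1+ per core)"
        )

    # Rule 3: with more than 4 drives, 1 GbE is not enough.
    if node.drives > 4 and node.network_gbps <= 1:
        warnings.append(
            f"{node.drives} drives on a {node.network_gbps:g} Gbps link "
            "(rule of thumb: more than 1 GbE when there are more than 4 drives)"
        )

    return warnings


if __name__ == "__main__":
    balanced = NodeSpec(cores=8, ram_gb=48, drives=8, network_gbps=10)
    leaning = NodeSpec(cores=8, ram_gb=16, drives=6, network_gbps=1)
    for name, node in [("balanced", balanced), ("leaning", leaning)]:
        issues = check_balance(node)
        print(name, "OK" if not issues else issues)
```

A check like this is mainly useful during the planning stage mentioned earlier in the notes, when compute and storage get sized but the network is easy to overlook.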