- Timestamp: Oct 31, 2012, 4:53:06 PM
- Author: jazz
- Comment: --
v1 | v2 | |
5 | 5 | * 9:10am '''Hadoop: Thinking Big''' - John Schroeder (MapR Technologies) |
6 | 6 | * MapR breaks Terasort benchmark record on Google Compute Engine |
7 | | * 9:20am Plenary |
8 | | * '''Beyond Batch''' - Doug Cutting (Cloudera) |
| 7 | * 9:20am '''Beyond Batch''' - Doug Cutting (Cloudera) |
9 | 8 | * HBase: First Non-Batch Component |
10 | 9 | * Google gives us the map - 2012 Spanner paper, 26 authors! |
11 | 10 | * Cloudera Impala (2012) -> Google Dremel (2010): online queries!! |
12 | | * 9:30am Plenary |
13 | | * '''Cloud, Mobile and Big Data – How Analytics Provides Value to the Buzzwords''' - Paul Kent (SAS) |
| 11 | * 9:30am '''Cloud, Mobile and Big Data – How Analytics Provides Value to the Buzzwords''' - Paul Kent (SAS) |
14 | 12 | * Lets enterprises make decisions in a more timely way - Action in Time |
15 | 13 | * Predicting future outcomes |
16 | | * 9:35am Plenary |
17 | | * '''They Don't Teach You That In School''' - Cathy O'Neil, Julie Steele (O'Reilly Media, Inc.) |
| 14 | * 9:35am '''They Don't Teach You That In School''' - Cathy O'Neil, Julie Steele (O'Reilly Media, Inc.) |
18 | 15 | * What a Data Scientist needs to know - Machine Learning, Statistics |
19 | 16 | * Feature Selection - Machine Learning for Ads |
20 | | * 9:45am Plenary |
21 | | * '''From Traditional Database to Big Data Platform''' - Irfan Khan (SAP) |
22 | | * 9:50am Plenary |
23 | | * '''Of Rocket Ships and Washing Machines: Data Technology for People''' - Joe Hellerstein (Trifacta and UC Berkeley) |
| 17 | * 9:45am '''From Traditional Database to Big Data Platform''' - Irfan Khan (SAP) |
| 18 | * 9:50am '''Of Rocket Ships and Washing Machines: Data Technology for People''' - Joe Hellerstein (Trifacta and UC Berkeley) |
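24 | 19 | * As with the invention of the dishwasher, we are still at a very early stage of data science, because 80% of data processing work goes into tidying the data - 80% of the work is in cleaning the data |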
25 | 20 | * Develop productivity technology |
26 | 21 | * Shreddr - http://www.captricity.com |
27 | 22 | * Analytic Trifecta |
28 | | * 10:00am Plenary |
29 | | * '''Are We Really Winning the Information Revolution?''' - Samantha Ravich (National Commission for the Review of R&D Programs in the Intelligence Community) |
| 23 | * 10:00am '''Are We Really Winning the Information Revolution?''' - Samantha Ravich (National Commission for the Review of R&D Programs in the Intelligence Community) |
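30 | 24 | * Deep down we know the answer is somewhere in that pile of data, but now we have far too much data. |
31 | 25 | * With so much data, we have to select and prioritize before we can really gain insight from it and make the right decisions. |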
… |
… |
|
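62 | 56 | * (Think: This is a question to consider at the very start of requirements planning. MapReduce and HDFS get considered -> how much compute and storage, but the network is often overlooked -> switch selection and monitoring support. It's all about SCALE!!) |
63 | 57 | * QoS support - these problems only show up in very large environments |
64 | | * Impact of [http://www.openflow.org OpenFlow] (SDN, Software Defined Networking) on Hadoop environments - Data Locality / Rack Awareness used to require manual configuration |
| 58 | * Impact of OpenFlow (SDN, Software Defined Networking) on Hadoop environments - Data Locality / Rack Awareness used to require manual configuration |
65 | 59 | * 2:30pm '''Is Your Cluster a Leaning Tower of Pisa?''' - Michael Segel (Think Big Analytics) |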
66 | | * |
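| 60 | * Joke: what second-year medical students mainly learn is how to ask patients questions!! Because a good diagnosis comes from good questions!! |
| 61 | * (Think: the example questions given here look a lot like the ones commonly seen on forum.hadoop.tw, where it takes two or three round trips to get to the actual problem. Sometimes it is not a cluster architecture problem, but we still tend to assume it is an environment problem) |
| 62 | * (Think: CHUG's logo -> put Taiwan in one to design a Taiwan Hadoop User Group logo) |
| 63 | * (Think: workflow for enterprises adopting Hadoop. FAQs, vendor supply chain, ..) |
| 64 | * Different types of cluster - from "on premise" to "CAAS (Cluster as a Service)" |
| 65 | * CAAS - Redundant Data Centers as an option (off-site backup, CDN) |
| 66 | * DR (Disaster Recovery) / BCP (?) |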
| 67 | * Golden Ratio - |
| 68 | * CPU cores to Memory - 4~8 GB RAM per Core |
| 69 | * 1+ Spindles (Hard Drives) per Core |
| 70 | * With > 4 drives, 1 GbE is not enough (Network) |
| 71 | * '''According to Moore's Law -> the optimal ratio will need to be re-evaluated.''' |
| 72 | * Think about TCO (Total Cost of Ownership)!! |
| 73 | * Using VMs : |
| 74 | * PRO: Allows multi-tenancy |
| 75 | * In the future - we expect to see more virtualization |
| 76 | * Mesos / Spark - Berkeley |
| 77 | * YARN |
| 78 | * Storm |
| 79 | * Use VMs to keep the ratio 'balanced'!! |
67 | 80 | * 4:10pm '''Real-time Big Data Without Streaming''' - Ron Bodkin (Think Big Analytics) |
68 | | * 5:00pm '''Realtime Processing with Storm''' - Gabriel Eisbruch (mercadolibre.com), Luis Darío Simonassi (mercadolibre.com), Jonathan Leibiusky (mercadolibre.com) |
| 81 | * This is more of a high-level architecture question: which architecture suits applications with different real-time requirements. |
| 82 | * The basic components seem to be the same few (NoSQL, Index, Search, Streaming Server), but the harder part later on is how to connect these components together (e.g. connectors). |
| 83 | * 5:00pm '''Realtime Processing with Storm''' - Gabriel Eisbruch (MercadoLibre.com), Luis Darío Simonassi (MercadoLibre.com), Jonathan Leibiusky (MercadoLibre.com) |
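
The "Golden Ratio" notes from the "Is Your Cluster a Leaning Tower of Pisa?" session (4~8 GB RAM per core, 1+ spindles per core, 1 GbE not enough beyond 4 drives) can be read as a simple sizing check. The sketch below is only an illustration of those three rules of thumb; the `NodeSpec` type, the `check_balance` function and the sample node figures are hypothetical and do not come from the talk.

```python
from dataclasses import dataclass


@dataclass
class NodeSpec:
    """Hypothetical description of one worker node (not from the talk)."""
    cores: int
    ram_gb: float
    drives: int
    nic_gbps: float


def check_balance(node):
    """Return warnings for a node that departs from the noted rules of thumb.

    An empty list means the node fits all three ratios from the notes.
    """
    warnings = []

    # Rule 1: 4~8 GB RAM per core.
    ram_per_core = node.ram_gb / node.cores
    if ram_per_core < 4:
        warnings.append("less than 4 GB RAM per core")
    elif ram_per_core > 8:
        warnings.append("more than 8 GB RAM per core")

    # Rule 2: at least one spindle (hard drive) per core.
    if node.drives < node.cores:
        warnings.append("fewer than 1 spindle per core")

    # Rule 3: with more than 4 drives, 1 GbE is not enough.
    if node.drives > 4 and node.nic_gbps <= 1:
        warnings.append("more than 4 drives on 1 GbE")

    return warnings


if __name__ == "__main__":
    # Sample figures are made up for illustration only.
    for spec in (NodeSpec(8, 48, 8, 10), NodeSpec(8, 16, 4, 1)):
        print(spec, check_balance(spec) or "balanced")
```

As the notes point out, these ratios are expected to shift as hardware changes, so the thresholds are better treated as parameters to revisit than as constants.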