Abstract—MapReduce is an important programming model for building data centers containing tens of thousands of nodes. In a practical data center of that scale, it is common for I/O-bound jobs and CPU-bound jobs, which demand different resources, to run simultaneously in the same cluster. The MapReduce framework has not addressed the parallelization of these two kinds of jobs. In this paper, we give a new view of the MapReduce model and classify MapReduce workloads into three categories based on their CPU and I/O utilization. Using this classification, we design a new dynamic MapReduce workload prediction mechanism, MR-Predict, which detects the workload type on the fly, and we propose a Triple-Queue Scheduler based on it. The Triple-Queue Scheduler improves the usage of both CPU and disk I/O resources under heterogeneous workloads, and raises Hadoop throughput by about 30% under such workloads.
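The abstract's core idea, classifying jobs by CPU and I/O utilization into three queues, can be sketched as follows. This is a hypothetical illustration, not the paper's implementation: the threshold values, the `Job` fields, and all names are assumptions made for the example.

```python
# Hypothetical sketch of the MR-Predict idea: classify each job by its
# observed CPU and disk-I/O utilization, then place it in one of three
# queues (CPU-bound, I/O-bound, mixed) so a triple-queue scheduler can
# pair CPU-heavy and I/O-heavy jobs and keep both resources busy.
# Thresholds (0.7 / 0.3) are illustrative, not the paper's parameters.

from collections import namedtuple

Job = namedtuple("Job", ["name", "cpu_util", "io_util"])  # utilizations in [0, 1]

def classify(job, hi=0.7, lo=0.3):
    """Assign a job to one of the three workload categories."""
    if job.cpu_util >= hi and job.io_util <= lo:
        return "cpu_bound"
    if job.io_util >= hi and job.cpu_util <= lo:
        return "io_bound"
    return "mixed"

def dispatch(jobs):
    """Build the three queues that a triple-queue scheduler would drain."""
    queues = {"cpu_bound": [], "io_bound": [], "mixed": []}
    for job in jobs:
        queues[classify(job)].append(job)
    return queues

jobs = [Job("sort", 0.2, 0.9), Job("pi-estimate", 0.9, 0.1), Job("grep", 0.5, 0.5)]
queues = dispatch(jobs)
print([j.name for j in queues["io_bound"]])   # ['sort']
print([j.name for j in queues["cpu_bound"]])  # ['pi-estimate']
```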
= 2. Designing a high-performance cloud platform =
A New Data Parallelism Approach with High Performance Clouds
* Claims a simpler design and therefore better performance
* Claims to be up to twice as fast as Hadoop in some cases
= 3. A parallel closed cube algorithm =
A Parallel Algorithm for Closed Cube Computation
* Closed cube computation is a hard-to-understand algorithm; the authors design a parallel closed cube algorithm that runs on the MapReduce platform
* They claim the experimental results show a benefit

= 4. Processing satellite data with cloud computing =
Cloud Computing for Satellite Data Processing on High End Compute Clusters
* Processes satellite data with Hadoop on high-end hardware
* The paper compares results with and without MapReduce (the authors say the programs do not differ much)

= 5. An integrated computation and data management system =
Clustera: An Integrated Computation And Data Management System
* Introduces a data management system with two key features
* First, it is scalable and can manage data for a wide range of jobs, using minimal SQL queries to reduce I/O
* Second, it is built from modern software building blocks, which makes it possible to observe performance and utilization data inside the application server or relational database
* Finally, Clustera is compared against Hadoop and Condor

= 6. High-performance data mining with Sector =
Data Mining Using High Performance Data Clouds

= 7. Mining console logs to detect large-scale system problems =
Detecting Large-Scale System Problems by Mining Console Logs
* Mines log files to detect possible runtime problems in the system
* Experiments run on Hadoop logs and the DarkStar online game

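The general idea of that paper can be sketched in miniature: collapse log lines into message templates and flag the rare ones as potential problems. Note this is an illustrative assumption, not the paper's actual method (which builds feature vectors from parsed logs and applies statistical anomaly detection); the regexes and `max_count` threshold here are invented for the example.

```python
# Minimal sketch of console-log mining: mask variable fields (numbers,
# hex ids, paths) so lines collapse into message templates, count template
# frequencies, and flag rare templates as potential anomalies.
# The masking rules and rarity threshold are illustrative assumptions.

import re
from collections import Counter

def template(line):
    """Reduce a log line to a message template by masking variable fields."""
    line = re.sub(r"/[\w./-]+", "<PATH>", line)        # file-system paths
    line = re.sub(r"0x[0-9a-fA-F]+", "<HEX>", line)    # hex identifiers
    line = re.sub(r"\d+", "<NUM>", line)               # numeric fields
    return line.strip()

def rare_templates(lines, max_count=1):
    """Return templates that occur at most max_count times."""
    counts = Counter(template(l) for l in lines)
    return {t for t, c in counts.items() if c <= max_count}

logs = [
    "INFO Receiving block blk_123 src: 10.0.0.1",
    "INFO Receiving block blk_456 src: 10.0.0.2",
    "ERROR Exception while serving blk_789 to 10.0.0.3",
]
# The two INFO lines collapse into one common template; the ERROR line
# yields a rare template and gets flagged.
print(rare_templates(logs))
```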
= 8. An experimental paper on Disco =

= 9. The Sphere paper =

= 10. Computing user habits with Hadoop =
Extraction of User Profile Based on the Hadoop
* The architecture is very similar to icas; sure enough, it is from mainland Chinese authors
* The paper uses Hadoop to find user habits, but in practice it is only doing MapReduce word counting
* It also plots single-node versus multi-node Hadoop performance; since they only have 80 MB of data, one node is fastest and three nodes are slowest

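Since the paper's core computation is plain MapReduce word counting, the whole pipeline fits in a short sketch. This runs locally without Hadoop; on a cluster, `mapper` and `reducer` would correspond to the map and reduce steps of a streaming job (the local `sorted` call stands in for Hadoop's shuffle phase).

```python
# Word count expressed as map and reduce steps, runnable locally.
# mapper emits (word, 1) pairs; a sort stands in for the shuffle;
# reducer sums counts per word from the sorted pair stream.

from itertools import groupby

def mapper(line):
    # emit (word, 1) for every token in the input line
    for word in line.split():
        yield word.lower(), 1

def reducer(pairs):
    # pairs arrive sorted by key, as Hadoop's shuffle would deliver them
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield word, sum(count for _, count in group)

lines = ["the quick brown fox", "the lazy dog"]
shuffled = sorted(kv for line in lines for kv in mapper(line))
print(dict(reducer(shuffled)))
# {'brown': 1, 'dog': 1, 'fox': 1, 'lazy': 1, 'quick': 1, 'the': 2}
```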