Abstract—MapReduce is an important programming model for building data centers containing tens of thousands of nodes. In a practical data center of that scale, it is common for I/O-bound jobs and CPU-bound jobs, which demand different resources, to run simultaneously in the same cluster. The MapReduce framework has not addressed the parallelization of these two kinds of jobs. In this paper, we give a new view of the MapReduce model and classify MapReduce workloads into three categories based on their CPU and I/O utilization. Using this classification, we design a new dynamic MapReduce workload prediction mechanism, MR-Predict, which detects the workload type on the fly, and we propose a Triple-Queue Scheduler based on it. The Triple-Queue Scheduler improves the usage of both CPU and disk I/O resources under heterogeneous workloads, and raises Hadoop throughput by about 30% under such workloads.
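The abstract's core idea, classifying jobs by CPU and I/O utilization into three queues, can be sketched as follows. This is a hypothetical illustration, not the paper's implementation: the threshold values, the `Job` fields, and all names are assumptions made for the example.

```python
# Hypothetical sketch of the MR-Predict idea: classify each job by its
# observed CPU and disk-I/O utilization, then place it in one of three
# queues (CPU-bound, I/O-bound, mixed) so a triple-queue scheduler can
# pair CPU-heavy and I/O-heavy jobs and keep both resources busy.
# Thresholds (0.7 / 0.3) are illustrative, not the paper's parameters.

from collections import namedtuple

Job = namedtuple("Job", ["name", "cpu_util", "io_util"])  # utilizations in [0, 1]

def classify(job, hi=0.7, lo=0.3):
    """Assign a job to one of the three workload categories."""
    if job.cpu_util >= hi and job.io_util <= lo:
        return "cpu_bound"
    if job.io_util >= hi and job.cpu_util <= lo:
        return "io_bound"
    return "mixed"

def dispatch(jobs):
    """Build the three queues that a triple-queue scheduler would drain."""
    queues = {"cpu_bound": [], "io_bound": [], "mixed": []}
    for job in jobs:
        queues[classify(job)].append(job)
    return queues

jobs = [Job("sort", 0.2, 0.9), Job("pi-estimate", 0.9, 0.1), Job("grep", 0.5, 0.5)]
queues = dispatch(jobs)
print([j.name for j in queues["io_bound"]])   # ['sort']
print([j.name for j in queues["cpu_bound"]])  # ['pi-estimate']
```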
= 2. Designing a high-performance cloud platform =
A New Data Parallelism Approach with High Performance Clouds
* Claims a simpler design and therefore better performance
* Claims to be up to twice as fast as Hadoop in some cases
= 3. A parallel closed cube algorithm =
A Parallel Algorithm for Closed Cube Computation
* Closed cube computation is a hard-to-understand algorithm; the authors design a parallel closed cube algorithm that runs on the MapReduce platform
* They claim the experimental results show a benefit

= 4. Processing satellite data with cloud computing =
Cloud Computing for Satellite Data Processing on High End Compute Clusters
* Processes satellite data with Hadoop on high-end hardware
* The paper compares results with and without MapReduce (the authors say the programs do not differ much)

= 5. An integrated computation and data management system =
Clustera: An Integrated Computation And Data Management System
* Introduces a data management system with two key features
* First, it is scalable and can manage data for a wide range of jobs, using minimal SQL queries to reduce I/O
* Second, it is built from modern software building blocks, which makes it possible to observe performance and utilization data inside the application server or relational database
* Finally, Clustera is compared against Hadoop and Condor

= 6. High-performance data mining with Sector =
Data Mining Using High Performance Data Clouds

= 7. Mining console logs to detect large-scale system problems =
Detecting Large-Scale System Problems by Mining Console Logs
* Mines log files to detect possible runtime problems in the system
* Experiments run on Hadoop logs and the DarkStar online game

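The general idea of that paper can be sketched in miniature: collapse log lines into message templates and flag the rare ones as potential problems. Note this is an illustrative assumption, not the paper's actual method (which builds feature vectors from parsed logs and applies statistical anomaly detection); the regexes and `max_count` threshold here are invented for the example.

```python
# Minimal sketch of console-log mining: mask variable fields (numbers,
# hex ids, paths) so lines collapse into message templates, count template
# frequencies, and flag rare templates as potential anomalies.
# The masking rules and rarity threshold are illustrative assumptions.

import re
from collections import Counter

def template(line):
    """Reduce a log line to a message template by masking variable fields."""
    line = re.sub(r"/[\w./-]+", "<PATH>", line)        # file-system paths
    line = re.sub(r"0x[0-9a-fA-F]+", "<HEX>", line)    # hex identifiers
    line = re.sub(r"\d+", "<NUM>", line)               # numeric fields
    return line.strip()

def rare_templates(lines, max_count=1):
    """Return templates that occur at most max_count times."""
    counts = Counter(template(l) for l in lines)
    return {t for t, c in counts.items() if c <= max_count}

logs = [
    "INFO Receiving block blk_123 src: 10.0.0.1",
    "INFO Receiving block blk_456 src: 10.0.0.2",
    "ERROR Exception while serving blk_789 to 10.0.0.3",
]
# The two INFO lines collapse into one common template; the ERROR line
# yields a rare template and gets flagged.
print(rare_templates(logs))
```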
= 8. An experimental paper on Disco =

= 9. The Sphere paper =

= 10. Computing user habits with Hadoop =
Extraction of User Profile Based on the Hadoop
* The architecture is very similar to icas; sure enough, it is from mainland Chinese authors
* The paper uses Hadoop to find user habits, but in practice it is only doing MapReduce word counting
* It also plots single-node versus multi-node Hadoop performance; since they only have 80 MB of data, one node is fastest and three nodes are slowest

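Since the paper's core computation is plain MapReduce word counting, the whole pipeline fits in a short sketch. This runs locally without Hadoop; on a cluster, `mapper` and `reducer` would correspond to the map and reduce steps of a streaming job (the local `sorted` call stands in for Hadoop's shuffle phase).

```python
# Word count expressed as map and reduce steps, runnable locally.
# mapper emits (word, 1) pairs; a sort stands in for the shuffle;
# reducer sums counts per word from the sorted pair stream.

from itertools import groupby

def mapper(line):
    # emit (word, 1) for every token in the input line
    for word in line.split():
        yield word.lower(), 1

def reducer(pairs):
    # pairs arrive sorted by key, as Hadoop's shuffle would deliver them
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield word, sum(count for _, count in group)

lines = ["the quick brown fox", "the lazy dog"]
shuffled = sorted(kv for line in lines for kv in mapper(line))
print(dict(reducer(shuffled)))
# {'brown': 1, 'dog': 1, 'fox': 1, 'lazy': 1, 'quick': 1, 'the': 2}
```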