Changes between Version 3 and Version 4 of waue/2009/1119


Timestamp:
Nov 19, 2009, 6:23:21 PM (14 years ago)
Author:
waue
Comment:

--

  • waue/2009/1119

    v3 v4  
    99[[PageOutline]]
    1010
    11  = 1.  =
    12  * A Dynamic MapReduce Scheduler for Heterogeneous Workloads
     11 = 1.  A Dynamic Scheduling System for Heterogeneous Environments =
     12 A Dynamic MapReduce Scheduler for Heterogeneous Workloads
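     13 * Establishes three scheduling methods for heterogeneous environments
     14 * Based on simulation, claims that Hadoop runs about 30% faster with the scheduler than without it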
    1315
    14 Abstract—MapReduce is an important programming model for building data centers containing ten of thousands of nodes. In a
    15 practical data center of that scale, it is a common case that I/O-bound
    16 jobs and CPU-bound jobs, which demand different
    17 resources, run simultaneously in the same cluster. In the
    18 MapReduce framework, parallelization of these two kinds of job
    19 has not been concerned. In this paper, we give a new view of the
    20 MapReduce model, and classify the MapReduce workloads into
    21 three categories based on their CPU and I/O utilization. With
    22 workload classification, we design a new dynamic MapReduce
    23 workload predict mechanism, MR-Predict, which detects the
    24 workload type on the fly. We propose a Triple-Queue Scheduler
    25 based on the MR-Predict mechanism. The Triple-Queue
    26 scheduler could improve the usage of both CPU and disk I/O
    27 resources under heterogeneous workloads. And it could improve
    28 the Hadoop throughput by about 30% under heterogeneous
    29 workloads.
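
The abstract describes classifying workloads into three categories by CPU and I/O utilization and feeding them into a Triple-Queue Scheduler. A minimal sketch of that idea follows. It is not the paper's MR-Predict mechanism: the thresholds, the category names (`cpu_bound`, `io_bound`, `mixed`), the `Job` fields, and the one-job-per-queue dispatch policy are all assumptions made for illustration.

```python
from collections import deque
from dataclasses import dataclass

# Hypothetical cut-offs; the notes do not give the paper's actual criteria.
CPU_THRESHOLD = 0.7
IO_THRESHOLD = 0.7


@dataclass
class Job:
    name: str
    cpu_util: float  # assumed: observed CPU utilization in [0, 1]
    io_util: float   # assumed: observed disk I/O utilization in [0, 1]


def classify(job):
    """Place a job into one of three illustrative categories."""
    if job.cpu_util >= CPU_THRESHOLD and job.io_util < IO_THRESHOLD:
        return "cpu_bound"
    if job.io_util >= IO_THRESHOLD and job.cpu_util < CPU_THRESHOLD:
        return "io_bound"
    return "mixed"


class TripleQueue:
    """Three FIFO queues, one per workload category."""

    def __init__(self):
        self.queues = {"cpu_bound": deque(), "io_bound": deque(), "mixed": deque()}

    def submit(self, job):
        self.queues[classify(job)].append(job)

    def next_batch(self):
        """Take one job from each non-empty queue so that CPU-heavy and
        I/O-heavy work can run side by side."""
        return [q.popleft() for q in self.queues.values() if q]


tq = TripleQueue()
tq.submit(Job("sort", cpu_util=0.2, io_util=0.9))
tq.submit(Job("pi", cpu_util=0.95, io_util=0.1))
tq.submit(Job("grep", cpu_util=0.3, io_util=0.3))
batch = tq.next_batch()
```

Pulling from every queue on each round is only one possible policy; it is used here because it makes the intended overlap of CPU-bound and I/O-bound jobs easy to see.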
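     16  = 2.  Designing a High-Performance Cloud Platform =
     17 A New Data Parallelism Approach with High Performance Clouds
     18 * Claims the design is simpler and therefore performs better
     19 * Claims to be twice as fast as Hadoop in some cases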
    3020
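     21 = 3.  The Parallel Closed Cube Algorithm =
     22 A Parallel Algorithm for Closed Cube Computation
     23 * Parallel closed cube is a hard-to-understand algorithm; the authors design a parallel closed cube algorithm that runs on a MapReduce platform
     24 * They claim the experimental results show a benefit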
     25
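     26 = 4. Processing Satellite Data with Cloud Computing =
     27 Cloud Computing for Satellite Data Processing on High End Compute Clusters
     28 * Processes satellite data with Hadoop on high-end hardware
     29 * The data compare runs with and without MapReduce (the authors say the programs do not differ much)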
     30
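     31 = 5. An Integrated Computation and Data Management System =
     32 Clustera: An Integrated Computation And Data Management System
     33 * Introduces a data management system with two distinguishing features
     34   * Feature one: it is extensible and can handle a wide range of job data, and it reduces I/O by using minimal SQL queries
     35   * Feature two: it is built from up-to-date software building blocks, which makes it possible to observe data such as performance and utilization inside the application server or relational database
     36 * Finally, compares Clustera with Hadoop and Condor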
     37 
     38 = 6. High-Performance Data Mining with Sector =
     39Data Mining Using High Performance Data Clouds
     40
     41 = 7. Mining Logs to Detect Large-Scale System Problems =
     42 Detecting Large-Scale System Problems by Mining Console Logs
     43 * Mines log files to detect system runtime problems that may occur
     44 * Experiments use Hadoop logs and the DarkStar online game
     45 
     46 
     47 = 8. Experimental Paper on Disco =
     48 
     49 = 9. The Sphere Paper =
     50 
     51 = 10. Computing User Habits with Hadoop =
     52Extraction of User Profile Based on the Hadoop
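     53 * The architecture closely resembles icas; as expected, the authors are from mainland China
     54 * This paper uses Hadoop to find user habits, but in practice it only does a MapReduce word count
     55 * It also plots a performance comparison between a single Hadoop node and multiple nodes; because they have only 80 MB of data, one node is fastest and three nodes are slowest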
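
For reference, the MapReduce word count mentioned in the notes above can be sketched in a few lines of plain Python. This is a single-process illustration of the map and reduce steps, not the paper's Hadoop job; the function names are hypothetical.

```python
from collections import Counter
from itertools import chain


def map_words(line):
    """Map step: emit a (word, 1) pair for every whitespace-separated token."""
    return [(word.lower(), 1) for word in line.split()]


def reduce_counts(pairs):
    """Reduce step: sum the counts emitted for each word."""
    totals = Counter()
    for word, count in pairs:
        totals[word] += count
    return dict(totals)


def word_count(lines):
    """Run the map step on every line, then reduce all emitted pairs."""
    return reduce_counts(chain.from_iterable(map_words(line) for line in lines))


counts = word_count(["a b a", "B c"])
```

With an input as small as the 80 MB mentioned in the notes, work this light finishes quickly on one machine, which is consistent with the observation that adding nodes did not help.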
     56 
     57