Cloud Computing Technologies and Their Bioinformatics Applications

Course Info

  • Date and Time: every Monday, 15:30 to 17:20, from 9 May 2011 to 6 June 2011
  • Location: R401 PC Room, Library and Information Building, National Yang-Ming University
  • Department announcement: http://bmi.ym.edu.tw/wp/?p=4532

Online Discussion: Web IRC chatroom

Course Outline

05/09 Introduction
  • Topics: HPC for Bioinformatics; PC Cluster 101; Parallel Programming Model
  • Slides: part-1, part-2
  • Hands-On: Lab 1
  • Notes: Intel on the importance of multi-core; About Amdahl's Law

05/16 Introduction
  • Topics: Cloud Computing Architecture; Introduction to Hadoop; Hadoop Distributed File System; Hands-on: HDFS commands
  • Slides: part-3, part-4
  • Hands-On: Lab 2, Lab 3
  • Notes: Hadoop single-node installation (for Windows XP)

05/23 ----
  • Class postponed by one week because the campus's external network connection was unstable

05/30 Hands-On
  • Topics: MapReduce Algorithm; Hands-on: Running MapReduce Examples; Introduction to Hadoop Ecosystem
  • Slides: part-4, part-5
  • Hands-On: Lab 4, Lab 5, Lab 6
  • Notes: MapReduce implementations in different languages

06/06 ----
  • Dragon Boat Festival; class postponed by one week

06/13 Hands-On
  • Topics: Large Scale Website and HBase distributed datastore; Introduction to Pig
  • Slides: part-5, part-6
  • Hands-On: Lab 7, Lab 8
  • Notes: 1. bio4j - a bioinformatics graph based DB; 2. sqoop; 3. Hive - the data warehouse for Hadoop

06/20 Hands-On
  • Topics: Bioinformatics Apps using Hadoop; Hands-on:
  • Slides: part-7
  • Hands-On: Lab 9, Lab 10
  • Notes: 1. CloudBurst; 2. Crossbow; 3. Contrail

Public Cluster

Homework 1

  • Task: Modify WordCount2.java from Lab 5 into a reverse index program named ReverseIndex.java, so that ReverseIndex outputs lines of the form "keyword<TAB>file names (separated by commas)". Run it the same way as the last step of Lab 5, skipping the period (\.) and comma (\,) patterns and ignoring case (case.sensitive=false).
  • Reference steps:
    $ wget http://hadoop.nchc.org.tw/WordCount2.java -O ReverseIndex.java
    $ vi ReverseIndex.java #### DO YOUR MODIFICATION
    $ mkdir -p MyJava3
    $ javac -classpath hadoop-core.jar -d MyJava3 ReverseIndex.java
    $ jar -cvf reverseindex.jar -C MyJava3 .
    $ hadoop jar reverseindex.jar ReverseIndex -Dwordcount.case.sensitive=false lab6_input lab6_out4 -skip pattern.txt
    $ hadoop fs -cat lab6_out4/part-00000
    
  • Reference result (any path format is acceptable for the file names):
    and     input2
    cloud   input1,input2
    course  input1,input2,input2
    enjoy   input2
    i       input1,input2
    like    input1,input2
    nctu    input1,input2
    this    input2
    we      input2
    
  • Due date: 11:59 AM, Monday, June 13, 2011
  • Submission: e-mail the source code and the report as attachments to jazz _AT_ nchc _DOT_ org _DOT_ tw: (1) one copy of the Java source code, compressed and named ${student ID}.zip; (2) one report (doc or PDF), named ${student ID}.
  • Hint:
    • Change the (Key, Value) types of the Mapper output and of the Reducer input and output from (Text, IntWritable) to (Text, Text).
  • Extra credit:
    • Add the per-file occurrence count to the result, so that the reference result becomes:
      and     input2(1)
      cloud   input1(1),input2(1)
      course  input1(1),input2(2)
      enjoy   input2(1)
      i       input1(1),input2(1)
      like    input1(1),input2(1)
      nctu    input1(1),input2(1)
      this    input2(1)
      we      input2(1)
      
  • Grading:
    • Standard task source code: 60%
    • Report: 20%
      • The report should include the following items:
      • Cover: your name and student ID
      • Screenshot of your program running on hadoop.nchc.org.tw
      • The result of your program
    • Extra credit: 20%
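
The reverse index logic described in Homework 1 can be sketched in plain Java, without Hadoop, to show what the map and reduce steps have to produce. This is only an illustration under stated assumptions, not the course's reference solution: the class name ReverseIndexSketch, its method names, and the two sample sentences are hypothetical (the sentences were chosen so that the output is consistent with the reference result above). In a real Hadoop job the String keys and values would be Text objects, as the hint says, and the current file name would have to come from the Hadoop API, which is not shown here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical plain-Java simulation of the map, shuffle, and reduce steps.
class ReverseIndexSketch {

    // Map step: emit one (word, fileName) pair per token, ignoring case
    // and skipping the "." and "," patterns.
    static List<String[]> map(String fileName, String line) {
        List<String[]> pairs = new ArrayList<>();
        String cleaned = line.toLowerCase().replace(".", "").replace(",", "");
        for (String word : cleaned.split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(new String[] {word, fileName});
            }
        }
        return pairs;
    }

    // Shuffle step: group the emitted file names by word, with words sorted.
    static Map<String, List<String>> shuffle(List<String[]> pairs) {
        Map<String, List<String>> grouped = new TreeMap<>();
        for (String[] pair : pairs) {
            grouped.computeIfAbsent(pair[0], k -> new ArrayList<>()).add(pair[1]);
        }
        return grouped;
    }

    // Reduce step (standard task): join the file names with commas.
    static String reduce(List<String> fileNames) {
        return String.join(",", fileNames);
    }

    // Reduce step (extra credit): count occurrences per file, e.g. "input2(2)".
    static String reduceWithCounts(List<String> fileNames) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String fileName : fileNames) {
            counts.merge(fileName, 1, Integer::sum);
        }
        StringBuilder joined = new StringBuilder();
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            if (joined.length() > 0) {
                joined.append(',');
            }
            joined.append(entry.getKey()).append('(').append(entry.getValue()).append(')');
        }
        return joined.toString();
    }

    public static void main(String[] args) {
        // Hypothetical sample input; the real Lab 5 input files are not shown here.
        List<String[]> pairs = new ArrayList<>();
        pairs.addAll(map("input1", "I like NCTU Cloud Course."));
        pairs.addAll(map("input2", "I like nctu cloud course, and we enjoy this course."));
        for (Map.Entry<String, List<String>> entry : shuffle(pairs).entrySet()) {
            System.out.println(entry.getKey() + "\t" + reduce(entry.getValue()));
        }
    }
}
```

With this sample input, the line for "course" is joined as input1,input2,input2 by reduce and as input1(1),input2(2) by reduceWithCounts. The sketch keeps duplicate file names in the standard output because the reference result does the same.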

Student Background

            Biology  C  Perl  R  Java
qulqul      O        X  O     O  O
@ne_        X        O  O     O  O
vincentt    X        O  X     X  O
Rodney_     X        O  O     O  X
sunny       X        X  X     X  O
wally       O        X  O     X  X
clair       X        X  O     X  X
Jason2      O        O  O     O  X
Angela      O        X  O     X  X
chenf       X        O  X     X  X
Yen-Kuang   O        X  O     X  O
Eric        X        O  O     O  X
Tony-Chang  O        X  O     O  O
lcyang      ?        O  X     X  X
Microarray  ?        O  O     O  X
Total O     6        8  11    7  6
Total X     9        7  4     8  9

Supplement: Data Integration and Data Warehouse

Last modified on Jun 20, 2011, 1:41:46 PM
