Cloud Computing Technologies and Their Bioinformatics Applications
Course Info
- Date and Time: every Monday, 15:30 to 17:20, from 9 May 2011 to 6 June 2011
- Location: R401 PC Room, Library and Information Building, National Yang-Ming University
- Department announcement: http://bmi.ym.edu.tw/wp/?p=4532
Online Discussion: Web IRC chatroom
Course Outline
Date | Section | Topics | Slides | Hands-On | Notes |
05/09 | Introduction | - HPC for Bioinformatics - PC Cluster 101 - Parallel Programming Model | part-1 part-2 | Lab 1 | Intel on the importance of multi-core; About Amdahl's Law |
05/16 | Introduction | - Cloud Computing Architecture - Introduction to Hadoop - Hadoop Distributed File System - Hands-on: HDFS commands | part-3 part-4 | Lab 2 Lab 3 | Hadoop single-node installation (for Windows XP) |
05/23 | ---- | Class postponed by one week because of unstable campus Internet connectivity | |||
05/30 | Hands-On | - MapReduce Algorithm - Hands-on: Running MapReduce Examples - Introduction to Hadoop Ecosystem | part-4 part-5 | Lab 4 Lab 5 Lab 6 | MapReduce implementations in different languages |
06/06 | ---- | Dragon Boat Festival; class postponed by one week | |||
06/13 | Hands-On | - Large Scale Website and HBase distributed datastore - Introduction to Pig | part-5 part-6 | Lab 7 Lab 8 | 1. bio4j - a bioinformatics graph based DB 2. sqoop 3. Hive - a data warehouse for Hadoop |
06/20 | Hands-On | - Bioinformatics Apps using Hadoop - Hands-on: | part-7 | Lab 9 Lab 10 | 1. CloudBurst 2. Crossbow 3. Contrail |
Public Cluster
- http://hadoop.nchc.org.tw - Portal of the experimental cluster
- http://hadoop.nchc.org.tw/ganglia - Load status of the experimental cluster
- http://hadoop.nchc.org.tw:50030 - Running and completed jobs on the experimental cluster
- http://hadoop.nchc.org.tw:50070 - Disk space status of the experimental cluster
- http://hadoop.nchc.org.tw/hadoop-doc - Hadoop documentation
- http://hadoop.nchc.org.tw/hadoop-doc/api/index.html - Hadoop 0.20.2 javadoc
- http://forum.hadoop.tw - Taiwan Hadoop users' forum
Homework 1
- Assignment: Modify the WordCount2.java downloaded in Lab 5 into a reverse index (also known as an inverted index) program named ReverseIndex.java. ReverseIndex should output each record as "keyword<TAB>file names (separated by commas)". Run it the same way as the last step of Lab 5: skip the period (\.) and comma (\,) patterns, and ignore case (-Dwordcount.case.sensitive=false).
- Reference steps:
    $ wget http://hadoop.nchc.org.tw/WordCount2.java -O ReverseIndex.java
    $ vi ReverseIndex.java
    #### DO YOUR MODIFICATION - edit the corresponding code
    $ mkdir -p MyJava3
    $ javac -classpath hadoop-core.jar -d MyJava3 ReverseIndex.java
    $ jar -cvf reverseindex.jar -C MyJava3 .
    $ hadoop jar reverseindex.jar ReverseIndex -Dwordcount.case.sensitive=false lab6_input lab6_out4 -skip pattern.txt
    $ hadoop fs -cat lab6_out4/part-00000
- The reference result should look like the following (the file path may be shown in any format):
    and     input2
    cloud   input1,input2
    course  input1,input2,input2
    enjoy   input2
    i       input1,input2
    like    input1,input2
    nctu    input1,input2
    this    input2
    we      input2
- Due date: 11:59 AM, Monday, June 13, 2011
- Submission: e-mail the Java source code and the report (doc or PDF) as attachments to jazz _AT_ nchc _DOT_ org _DOT_ tw: (1) one copy of the source code, compressed and named ${StudentID}.zip; (2) one report, named ${StudentID}.
- Hint:
  - Change the (Key, Value) types of the Mapper output and of the Reducer input and output from the original (Text, IntWritable) to (Text, Text).
- Extra credit:
  - Add the occurrence count for each file to the result, so that the reference result looks like the following:
      and     input2(1)
      cloud   input1(1),input2(1)
      course  input1(1),input2(2)
      enjoy   input2(1)
      i       input1(1),input2(1)
      like    input1(1),input2(1)
      nctu    input1(1),input2(1)
      this    input2(1)
      we      input2(1)
- Grading:
  - Source code for the standard problem: 60%
  - Report: 20%
    - The report should include the following items:
      - Cover: your name and student ID
      - Screenshot of your program running on hadoop.nchc.org.tw
      - The result of your program
  - Extra credit: 20%
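The data flow behind the hint and the extra-credit item can be illustrated without a cluster. The following is only a minimal sketch in plain Java, not the Hadoop API and not a reference solution: the class name ReverseIndexSketch, its method names, and the two sample sentences are hypothetical, chosen so that the output has the same shape as the reference results above. In an actual job the map and reduce steps would be written as a Mapper and a Reducer that emit (Text, Text) pairs, and Hadoop itself would perform the grouping step.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, Hadoop-free illustration of a reverse (inverted) index.
public class ReverseIndexSketch {

    // "Map" step: emit one (keyword, fileName) pair per token. Tokens are
    // lower-cased and periods/commas are stripped, mirroring the
    // case-insensitive run with the "\." and "\," skip patterns.
    public static List<String[]> map(String fileName, String text) {
        List<String[]> pairs = new ArrayList<>();
        String cleaned = text.toLowerCase(Locale.ROOT).replace(".", "").replace(",", "");
        for (String token : cleaned.split("\\s+")) {
            if (!token.isEmpty()) {
                pairs.add(new String[] {token, fileName});
            }
        }
        return pairs;
    }

    // "Shuffle" step: group file names by keyword. Hadoop does this between
    // map and reduce; a TreeMap is used here so keywords come out sorted.
    public static Map<String, List<String>> shuffle(List<String[]> pairs) {
        Map<String, List<String>> grouped = new TreeMap<>();
        for (String[] pair : pairs) {
            grouped.computeIfAbsent(pair[0], k -> new ArrayList<>()).add(pair[1]);
        }
        return grouped;
    }

    // "Reduce" step, standard problem: join the file names with commas.
    // The names keep the order in which they were emitted in this sketch;
    // a real Hadoop job does not guarantee the order of values.
    public static String reduce(List<String> fileNames) {
        return String.join(",", fileNames);
    }

    // "Reduce" step, extra credit: count occurrences per file and format
    // them as name(count), for example "input1(1),input2(2)".
    public static String reduceWithCounts(List<String> fileNames) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String name : fileNames) {
            counts.merge(name, 1, Integer::sum);
        }
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (sb.length() > 0) {
                sb.append(',');
            }
            sb.append(e.getKey()).append('(').append(e.getValue()).append(')');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical file contents; the real lab6_input files may differ.
        Map<String, String> files = new LinkedHashMap<>();
        files.put("input1", "I like NCTU Cloud Course.");
        files.put("input2", "I like nctu cloud course, and we enjoy this course.");

        List<String[]> pairs = new ArrayList<>();
        for (Map.Entry<String, String> f : files.entrySet()) {
            pairs.addAll(map(f.getKey(), f.getValue()));
        }
        // One line per keyword: keyword, plain file list, file list with counts.
        for (Map.Entry<String, List<String>> e : shuffle(pairs).entrySet()) {
            System.out.println(e.getKey() + "\t" + reduce(e.getValue())
                    + "\t" + reduceWithCounts(e.getValue()));
        }
    }
}
```

Compiling and running it with javac ReverseIndexSketch.java followed by java ReverseIndexSketch prints one tab-separated line per keyword. Keeping the value as a plain file name in the map step is what makes (Text, Text) the natural pair type: the reducer then only has to concatenate or count strings.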
Participant Background
Participant | Biology | C | Perl | R | Java |
qulqul | O | X | O | O | O |
@ne_ | X | O | O | O | O |
vincentt | X | O | X | X | O |
Rodney_ | X | O | O | O | X |
sunny | X | X | X | X | O |
wally | O | X | O | X | X |
clair | X | X | O | X | X |
Jason2 | O | O | O | O | X |
Angela | O | X | O | X | X |
chenf | X | O | X | X | X |
Yen-Kuang | O | X | O | X | O |
Eric | X | O | O | O | X |
Tony-Chang | O | X | O | O | O |
lcyang | ? | O | X | X | X |
Microarray | ? | O | O | O | X |
O | 6 | 8 | 11 | 7 | 6 |
X | 9 | 7 | 4 | 8 | 9 |
Supplement: Data Integration and Data Warehouse
Last modified on Jun 20, 2011, 1:41:46 PM
Attachments (8)
- part-1.pdf (1.4 MB) - added by jazz 14 years ago.
- part-2.pdf (186.5 KB) - added by jazz 14 years ago.
- part-3.pdf (2.5 MB) - added by jazz 14 years ago.
- part-4.pdf (2.4 MB) - added by jazz 14 years ago.
- part-5.pdf (1.3 MB) - added by jazz 14 years ago.
- part-6.pdf (1.2 MB) - added by jazz 14 years ago.
- 11-05-09_bio_cloud.log (3.8 KB) - added by jazz 14 years ago.
- part-7.pdf (2.4 MB) - added by jazz 14 years ago.