{{{
#!html
雲端運算技術與生物資訊應用
Cloud Computing Technologies and Their Bioinformatics Applications
}}}
[[PageOutline]]

= 課程資訊 Course Info. =

 * 上課時間: 2011/05/09 (一) ~ 2011/06/06 (一) 15:30 ~ 17:20
 * Date and Time: every Monday, 15:30 to 17:20, from 9 May 2011 to 6 June 2011
 * 上課地點: 陽明大學 圖資大樓 電腦教室 R401
 * Location: PC Room R401, Library and Information Building, National Yang-Ming University
 * 系所公告 Department announcement: http://bmi.ym.edu.tw/wp/?p=4532

= 線上討論 Web IRC Chatroom =

 * http://webchat.freenode.net?channels=ymu110509&uio=d4

= 課程大綱 Course Outline =

|| 時段[[BR]]Date || 分類[[BR]]Section || 課程內容 Topics || 投影片[[BR]]Slides || 實作[[BR]]Hands-On || 補充資料[[BR]]Notes ||
|| 05/09 || Introduction || - 高速運算於生物資訊之應用 [[BR]] - HPC for Bioinformatics [[BR]] - PC Cluster 101 [[BR]] - 平行運算程式的種類 [[BR]] - Parallel Programming Models || [raw-attachment:wiki:YMU110509:part-1.pdf part-1][[BR]][raw-attachment:wiki:YMU110509:part-2.pdf part-2] || [wiki:YMU110509/Lab1 實作一 Lab1] || [http://www.zdnet.com.tw/white_board/intel/video-1.htm Intel 談多核心的重要性 (Intel on why multi-core matters)][[BR]][http://www.zdnet.com.tw/white_board/intel/video-2.htm 關於 Amdahl’s Law (On Amdahl’s Law)] ||
|| 05/16 || Introduction || - Cloud Computing Architecture [[BR]] - Introduction to Hadoop [[BR]] - Hadoop Distributed File System [[BR]] - Hands-on: HDFS commands || [raw-attachment:wiki:YMU110509:part-3.pdf part-3][[BR]][raw-attachment:wiki:YMU110509:part-4.pdf part-4] || [wiki:YMU110509/Lab2 實作二 Lab2][[BR]][wiki:YMU110509/Lab3 實作三 Lab3] || [wiki:Hadoop4Win Hadoop 單機安裝 Single-node Hadoop Installation (for Windows XP)] ||
|| 05/23 || ---- || 因校園連外網路品質不穩,順延一週授課[[BR]]Postponed one week because the campus Internet connection was unstable || || || ||
|| 05/30 || Hands-On || - MapReduce Algorithm [[BR]] - Hands-on: Running MapReduce Examples [[BR]] - Hadoop 相關專案簡介 [[BR]] - Introduction to the Hadoop Ecosystem || [raw-attachment:wiki:YMU110509:part-4.pdf part-4][[BR]][raw-attachment:wiki:YMU110509:part-5.pdf part-5] || [wiki:YMU110509/Lab4 實作四 Lab4][[BR]][wiki:YMU110509/Lab5 實作五 Lab5][[BR]][wiki:YMU110509/Lab6 實作六 Lab6] || [grid:wiki:jazz/09-04-14#MapReduce 不同語言的 MapReduce 實作 (MapReduce in different languages)] ||
|| 06/06 || ---- || 端午節,順延一週授課[[BR]]Dragon Boat Festival holiday; postponed one week || || || ||
|| 06/13 || Hands-On || - 大型網站架構與 HBase 分散式資料庫 [[BR]] - Large-Scale Website Architecture and the HBase Distributed Datastore [[BR]] - Pig 簡介 [[BR]] - Introduction to Pig || [raw-attachment:wiki:YMU110509:part-5.pdf part-5][[BR]][raw-attachment:wiki:YMU110509:part-6.pdf part-6] || [wiki:YMU110509/Lab7 實作七 Lab7][[BR]][wiki:YMU110509/Lab8 實作八 Lab8] || [wiki:YMU110509/velvet 用 hadoop streaming 跑 velvet (Running Velvet with Hadoop Streaming)] ||
|| 06/20 || Hands-On || - Bioinformatics Apps using Hadoop [[BR]] - Hands-on: || || || ||

= 公用環境 Public Cluster =

 * http://hadoop.nchc.org.tw - 實驗叢集入口網站 (cluster portal)
 * http://hadoop.nchc.org.tw/ganglia - 實驗叢集負載狀態 (cluster load status)
 * http://hadoop.nchc.org.tw:50030 - 實驗叢集正在執行與執行完畢的任務 (running and completed jobs)
 * http://hadoop.nchc.org.tw:50070 - 實驗叢集的硬碟空間狀態 (disk space status)
 * http://hadoop.nchc.org.tw/hadoop-doc - Hadoop 相關說明文件 (Hadoop documentation)
 * http://hadoop.nchc.org.tw/hadoop-doc/api/index.html - Hadoop 0.20.2 javadoc 文件 (Hadoop 0.20.2 Javadoc)
 * http://forum.hadoop.tw - 台灣 Hadoop 使用者討論區 (Taiwan Hadoop User Forum)

= 作業一 Homework 1 =

 * 題目:請嘗試將 [wiki:YMU110509/Lab5 實作五] 的 WordCount2.java 改成逆向索引(Reverse Index) !ReverseIndex.java。使 !ReverseIndex 執行之結果為「"關鍵字"\t"檔案名稱(用逗點隔開)"」型態。以實作五最後的執行方法,忽略句點(\.)與逗點(\,),並且忽略大小寫(case.sensitive=false)。
 * Task: Modify WordCount2.java from [wiki:YMU110509/Lab5 Lab5] into a reverse index program and rename it !ReverseIndex.java. Its output should have the form "keyword\tfile names (separated by commas)". Run it the same way as at the end of Lab5, ignoring the "\." and "\," patterns and ignoring case (case.sensitive=false).
 * 參考步驟:[[BR]]Reference steps:
{{{
#!sh
$ wget http://hadoop.nchc.org.tw/WordCount2.java -O ReverseIndex.java
$ vi ReverseIndex.java
#### DO YOUR MODIFICATION - 修改對應的程式碼 (modify the corresponding code)
$ mkdir -p MyJava3
$ javac -classpath hadoop-core.jar -d MyJava3 ReverseIndex.java
$ jar -cvf reverseindex.jar -C MyJava3 .
$ hadoop jar reverseindex.jar ReverseIndex -Dwordcount.case.sensitive=false lab6_input lab6_out4 -skip pattern.txt
$ hadoop fs -cat lab6_out4/part-00000
}}}
 * 參考結果應該為:(路徑不限)[[BR]]The reference result should be as follows (the format of the paths is not restricted):
{{{
and	input2
cloud	input1,input2
course	input1,input2,input2
enjoy	input2
i	input1,input2
like	input1,input2
nctu	input1,input2
this	input2
we	input2
}}}
 * 繳交期限:2011年6月13日(一) 上午 11:59
 * Due date: 11:59 AM, Monday, June 13, 2011
 * 繳交方式:將原始碼與報告以附件方式寄至 jazz _AT_ nchc _DOT_ org _DOT_ tw
   * (1) 程式原始碼一份:以 ${學號}.zip 方式壓縮與命名
   * (2) 報告一份:以 ${學號} 命名。
 * Submission: e-mail the following as attachments to jazz _AT_ nchc _DOT_ org _DOT_ tw:
   * (1) the Java source code, compressed and named ${StudentID}.zip
   * (2) the report (DOC or PDF), named ${StudentID}
 * 提示:[[BR]]Hint:
   * 請將 Mapper 輸出、Reducer 輸入輸出的 (Key,Value) 由原本的 (Text, !IntWritable) 改成 (Text, Text)
   * Change the (Key, Value) types of the Mapper output and of the Reducer input and output from (Text, !IntWritable) to (Text, Text).
 * 加分題:(Extra)
   * 試將出現次數統計加入結果,亦即參考結果如下:[[BR]]Add the number of occurrences in each file to the result; the reference result should then be as follows:
{{{
and	input2(1)
cloud	input1(1),input2(1)
course	input1(1),input2(2)
enjoy	input2(1)
i	input1(1),input2(1)
like	input1(1),input2(1)
nctu	input1(1),input2(1)
this	input2(1)
we	input2(1)
}}}
 * 配分比例:[[BR]]Grading:
   * 標準題原始碼 Source Code:60%
   * 報告 Report:20%
     * 參考內容如下:[[BR]]Reference items that should appear in your report:
       * 封面 Cover:姓名、學號 (your name and student ID)
       * 於 hadoop.nchc.org.tw 執行的擷圖 (screenshot of your program running on hadoop.nchc.org.tw)
       * 執行結果 (the result of your program)
   * 加分題 Extra:20%

= 學員背景 Student Background =

|| || 生物 Biology || C || Perl || R || Java ||
|| qulqul || O || X || O || O || O ||
|| @ne_ || X || O || O || O || O ||
|| vincentt || X || O || X || X || O ||
|| Rodney_ || X || O || O || O || X ||
|| sunny || X || X || X || X || O ||
|| wally__ || O || X || O || X || X ||
|| clair || X || X || O || X || X ||
|| Jason2 || O || O || O || O || X ||
|| Angela || O || X || O || X || X ||
|| chenf || X || O || X || X || X ||
|| Yen-Kuang || O || X || O || X || O ||
|| Eric || X || O || O || O || X ||
|| Tony-Chang || O || X || O || O || O ||
|| lcyang || ? || O || X || X || X ||
|| Microarray || ? || O || O || O || X ||
|| || || || || || ||
|| O || 6 || 8 || 11 || 7 || 6 ||
|| X || 9 || 7 || 4 || 8 || 9 ||

= 補充 Supplement:資料整合 (Data Integration) 與資料倉儲 (Data Warehouse) =

 * [http://en.wikipedia.org/wiki/Data_integration 維基百科 Wikipedia:Data Integration]
 * [[Image(http://upload.wikimedia.org/wikipedia/en/d/db/Datawarehouse.png,width=480)]]
 * [[Image(http://upload.wikimedia.org/wikipedia/en/0/0c/Dataintegration.png,width=480)]]
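
The Homework 1 task above builds a reverse (inverted) index with MapReduce. The following is a minimal Java sketch (Java 8 or later assumed) that illustrates only the data flow. It runs in memory and does not use Hadoop: it simulates the map step, the shuffle (grouping by key), and the reduce step. It is neither the Hadoop API nor the reference solution. The class `ReverseIndexSketch`, its methods `map`, `reduce` and `buildIndex`, and the sample text are all hypothetical. Removing `.` and `,` and converting to lower case stand in for the `-skip pattern.txt` and `-Dwordcount.case.sensitive=false` options.

{{{
#!java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, Hadoop-free sketch of the reverse-index data flow.
// It does NOT use the Hadoop API; class and method names are illustrative only.
public class ReverseIndexSketch {

    // Map step: for every word in one line of a file, emit (word, fileName).
    // Removing '.' and ',' and lower-casing mimic the homework's
    // "-skip pattern.txt" and "-Dwordcount.case.sensitive=false" options.
    static List<String[]> map(String fileName, String line) {
        List<String[]> pairs = new ArrayList<>();
        String cleaned = line.replaceAll("[.,]", "").toLowerCase(Locale.ROOT);
        for (String word : cleaned.split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(new String[] {word, fileName});
            }
        }
        return pairs;
    }

    // Reduce step: join all file names received for one word with commas.
    // One entry per occurrence is kept, like "course input1,input2,input2"
    // in the reference result.
    static String reduce(List<String> fileNames) {
        return String.join(",", fileNames);
    }

    // Driver: map every line of every file, group values by key (the "shuffle"),
    // then reduce each group. TreeMap keeps the keys sorted, similar to the
    // sort Hadoop performs before the reduce step.
    static Map<String, String> buildIndex(Map<String, String> files) {
        Map<String, List<String>> grouped = new TreeMap<>();
        for (Map.Entry<String, String> file : files.entrySet()) {
            for (String line : file.getValue().split("\n")) {
                for (String[] pair : map(file.getKey(), line)) {
                    grouped.computeIfAbsent(pair[0], k -> new ArrayList<>()).add(pair[1]);
                }
            }
        }
        Map<String, String> index = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> group : grouped.entrySet()) {
            index.put(group.getKey(), reduce(group.getValue()));
        }
        return index;
    }

    public static void main(String[] args) {
        // Hypothetical sample input; NOT the actual lab6_input files.
        Map<String, String> files = new LinkedHashMap<>();
        files.put("input1", "Hadoop runs MapReduce.");
        files.put("input2", "MapReduce, HDFS and Hadoop Hadoop.");
        for (Map.Entry<String, String> entry : buildIndex(files).entrySet()) {
            System.out.println(entry.getKey() + "\t" + entry.getValue());
        }
    }
}
}}}

With this sample input, `main` prints one "keyword\tfile names" line per word. For example, `hadoop` maps to `input1,input2,input2`. The reducer keeps one file name per occurrence because the reference result does the same. Collecting the names into a set instead would list each file only once. Hadoop does not guarantee the order of values within a key, so the file order in real job output may differ from this sketch.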