= 2009-04-06 =

 * Talk preparation - collecting reference material on HPC, Cloud Computing, Map/Reduce, and their applications in bioinformatics
 * [https://sourceforge.net/projects/octaveoncloud/ Octave as Cloud Service] - Octave is free software similar to Matlab. The idea behind this project is quite good: if teaching relied more on Octave, schools and labs would avoid Matlab licensing and infringement problems. I also think anyone familiar with web design and VNC should be able to build such a service without much difficulty.
 * [http://developer.amazonwebservices.com/connect/entry.jspa?externalID=609&categoryID=88 Elasticfox] - Firefox Extension for Amazon EC2
 * [http://www.debian.org/devel/wnpp/ Debian Work-Needing and Prospective Packages] - how to file Debian packaging requests

== Clustering / Distributed Computing / Bioinformatics ==

 * [https://sourceforge.net/search/index.php?words=trove%3A(252)+AND+trove%3A(141)&type_of_search=soft&pmode=0&filtermanip=1&gfload=&gfnew=&inex=1&sortselect=has_file%3Atrue&registration_date__0=&trove__225=456&trove__274=369&trove__160=584&trove__199=426&trove__13=14&trove__1=534&trove__6=7&trove__496=499&newfilter=Apply SourceForge projects categorized under both Bioinformatics and Clustering that also have files available for download]
   * [http://lucene.apache.org/java/2_3_1/queryparsersyntax.html SourceForge search keywords use Lucene query syntax]
 * [http://hubris.ucsd.edu/dapper/ Dapper] - The Distributed and Parallel Program Execution Runtime
   * Ranks quite high among the software under the Clustering topic on !SourceForge.
 * [http://proactive.inria.fr/ ProActive] - Open Source middleware (OW2 consortium) for parallel, distributed, multi-core computing. - A French project that apparently aims to simplify building parallel / distributed / multi-core clusters
 * [http://gridder.sourceforge.net/ Gridder] - A Globus-based Web Service Portlet ... much like Grid Portal :p
 * [http://www.uscms.org/SoftwareComputing/Grid/WMS/glideinWMS/ glideinWMS] - The glidein based WMS

== Open Source : Bioinformatics ==

 * [http://libseq.sourceforge.net/ C++ Bio Sequence Library] - a sequence analysis library in C++
 * [http://biospice.sourceforge.net/ Bio-SPICE] - Biological Simulation Program for Intra- and Inter-Cellular Evaluation
   * In electrical and electronic engineering, SPICE is a very important simulation package. Bio-SPICE is presumably simulation software that hopes to hold a similar place in biology.
 * [http://bioera.net/ BioEra] is a DSP visual designer that can be used to create interfaces between a human being and a machine using bio signals such as EEG, QEEG, HEG, EMG, ECG, GSR, EOG, visual/sound entrainment and others. - BioEra appears to be closely related to brain science; its main function is signal processing.
 * [https://sourceforge.net/projects/ncbiviewer/ NCBI Viewer] - [http://ncbiviewer.bravehost.com/ new official website] - NCBI
 * Found several more free software projects for bioinformatics via [http://www.bioinformatics.org/ Bioinformatics Organization]:
   * [http://www.open-bio.org/ Open Bioinformatics Foundation]
   * [http://www.bioperl.org/ BioPerl]
   * [http://biopython.org/ Biopython]
   * [http://biophp.org/ BioPHP] - PHP for Bioinformatics
   * [http://www.bioruby.org/ BioRuby] - Open source bioinformatics library for Ruby
   * [http://biojava.org/ BioJava]

== Hadoop / !MapReduce ==

 * About !MapReduce:
   * [http://developer.amazonwebservices.com/connect/entry.jspa?externalID=2297&categoryID=265 Introduction to Amazon Elastic MapReduce]
   * [http://developer.amazonwebservices.com/connect/entry.jspa?externalID=2294&categoryID=265 Finding Similar Items with Amazon Elastic MapReduce, Python, and Hadoop Streaming] - similarity analysis with Hadoop; applicable to bioinformatics.
   * [http://hadoop.apache.org/core/docs/current/cn/mapred_tutorial.html Hadoop Map/Reduce tutorial (Chinese edition)]
   * [http://www.eecs.berkeley.edu/Pubs/TechRpts/2008/EECS-2008-99.html Improving MapReduce Performance in Heterogeneous Environments] - research on improving Map/Reduce performance in heterogeneous environments
   * [http://labs.google.com/papers/mapreduce.html MapReduce: Simplified Data Processing on Large Clusters] - Google's Map/Reduce paper
   * [http://wiki.apache.org/hadoop/AmazonEC2 Hadoop Wiki instructions for using Amazon EC2]
 * '''Limitation:''' Hadoop 0.18.3 does not support numerical sorting of keys under Streaming - [https://issues.apache.org/jira/browse/HADOOP-2302 Streaming should provide an option for numerical sort of keys]

== Cloud Computing and Science ==

 * Last year I watched recordings of some talks from [wiki:jazz/09-01-10 eScience 2008]; the papers can now be found on [http://ieeexplore.ieee.org/xpl/tocresult.jsp?isnumber=4736722 IEEE Xplore] as well.
   * These two papers describe real applications of !MapReduce in bioinformatics:
     * [http://www.cs.umd.edu/Grad/scholarlypapers/papers/MichaelSchatz.pdf BlastReduce: High Performance Short Read Mapping with MapReduce]
     * [http://www.ece.rutgers.edu/~parashar/Classes/08-09/ece572/readings/cloudblast-escience-08.pdf CloudBLAST: Combining MapReduce and Virtualization on Distributed Resources for Bioinformatics Applications]
   * Related to virtualization, bio-imaging, bioinformatics, and microarrays:
     * [http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=4736753&isnumber=4736722 BioVLAB-Microarray: Microarray Data Analysis in Virtual Environment]
     * [http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=4736757&isnumber=4736722 An Extensible, Scalable Architecture for Managing Bioinformatics Data and Analyses]
     * [http://ieeexplore.ieee.org/xpls/abs_all.jsp?isnumber=4736722&arnumber=4736800&count=180&index=70 Imagery Data Mining: The IDEC Experiment]
     * [http://ieeexplore.ieee.org/xpls/abs_all.jsp?isnumber=4736722&arnumber=4736741&count=180&index=11 Lowering the Barriers to Cancer Imaging]
   * Several are related to !MapReduce, for example:
     * [http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=4736760&isnumber=4736722 MRPGA: An Extension of MapReduce for Parallelizing Genetic Algorithms]
     * [http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=4736768&isnumber=4736722 MapReduce for Data Intensive Scientific Analyses]
     * [http://ieeexplore.ieee.org/xpls/abs_all.jsp?isnumber=4736722&arnumber=4736879&count=180&index=149 MRGIS: A MapReduce-Enabled High Performance Workflow System for GIS] - uses Map/Reduce to process GIS workflows
   * Related to virtualization:
     * [http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=4736771&isnumber=4736722 Contextualization: Providing One-Click Virtual Clusters]
     * [http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=4736759&isnumber=4736722 Characterizing User-level Network Virtualization: Performance, Overheads and Limits]
   * Related to biostatistics / R:
     * [http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=4736775&isnumber=4736722 Biocep, Towards a Federative, Collaborative, User-Centric, Grid-Enabled and Cloud-Ready Computational Open Platform]
   * On the relationship between Web 2.0 and bioinformatics:
     * [http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=4736827&isnumber=4736722 BioMashups: the new world of exploratory bioinformatics?]
     * [http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=4736842&isnumber=4736722 Bio-Sense: A System for Supporting Sharing and Exploration in Bioinformatics Using SemanticWeb Services]
   * Other interesting topics:
     * [http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=4736849&isnumber=4736722 An Educator's Perspective on Cyberinfrastructure] - Cyberinfrastructure from an educator's point of view: how do they use these resources in the classroom?
     * [http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=4736747&isnumber=4736722 Scalable Semantics – the Silver Lining of Cloud Computing] - purely about the commercial future of Cloud Computing
     * [http://ieeexplore.ieee.org/xpls/abs_all.jsp?isnumber=4736722&arnumber=4736810&count=180&index=80 Mobile e-Lab: A Mobile Personalized Virtual Research Computing Environment] - related to Mobile Computing
     * '''[http://ieeexplore.ieee.org/xpls/abs_all.jsp?isnumber=4736722&arnumber=4736854&count=180&index=124 Why Good Software Sometimes Dies – And How to Save It]''' - I find this one the most interesting: why does good software sometimes die?! Under commercial competition, a lot of software wins praise but not users, while marketing-focused 'second-rate products' often thrive and take a very large share of the market.
     * [http://ieeexplore.ieee.org/xpls/abs_all.jsp?isnumber=4736722&arnumber=4736855&count=180&index=125 Towards Making BOINC and EGEE Interoperable] - describes how to integrate BOINC (Desktop Grid) with EGEE (Service Grid)
 * In addition, there are some application projects on !SourceForge:
   * [http://apps.sourceforge.net/mediawiki/cloudburst-bio/ CloudBurst] - [http://developer.amazonwebservices.com/connect/entry.jspa?externalID=2272&categoryID=263 Amazon's page introducing the CloudBurst project]
 * What concerns me most, of course, is how Amazon's move to "[http://blog.wired.com/wiredscience/2008/12/massive-amounts.html host public data sets]" will affect the academic ecosystem as a whole. The move closely resembles the role NCHC (National Center for High-performance Computing) plays in providing scientific databases. Amazon faces localization hurdles in promoting its services in Taiwan, but if we are serious about working at an international level, students in Taiwan should learn to use these services and run larger-scale computations. [http://news.ycombinator.com/item?id=543069 The discussion here] provides many reference links.
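
A note on the Streaming limitation listed under Hadoop / !MapReduce (no numerical sort of keys in Hadoop 0.18.3): one possible workaround is to have the mapper emit zero-padded keys, so that the default text sort orders them the same way a numeric sort would. The following is only a minimal sketch, assuming tab-separated records whose keys are non-negative integers; `KEY_WIDTH`, `pad_key`, and `map_lines` are hypothetical names chosen for illustration and are not part of Hadoop.

```python
"""Sketch: zero-pad integer keys so a plain text sort matches numeric order."""

# Assumption: every key is a non-negative integer with at most 12 digits.
KEY_WIDTH = 12


def pad_key(raw_key, width=KEY_WIDTH):
    """Return the key as a fixed-width, zero-padded decimal string."""
    value = int(raw_key)
    if value < 0:
        # Negative numbers would need a different encoding; out of scope here.
        raise ValueError("negative keys are not handled by this sketch")
    padded = str(value).zfill(width)
    if len(padded) > width:
        raise ValueError("key does not fit in %d digits" % width)
    return padded


def map_lines(lines):
    """Rewrite 'key<TAB>value' records so each key is zero-padded."""
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank input lines
        key, _, value = line.partition("\t")
        yield "%s\t%s" % (pad_key(key), value)
```

In a Streaming mapper script, `map_lines` would be fed `sys.stdin` and each yielded record printed to standard output; the reducer can convert the key back with `int(key)`. The fixed width is a design choice: it must be at least as wide as the largest expected key, otherwise the text ordering breaks.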