= 2011-05-12 =
== Information Security ==
* http://www.virustotal.com/ - Upload a file and get the combined scan results from many antivirus engines.
== NTU CMLab ==
* A few interesting papers from the CMLab at National Taiwan University (NTU):
* [http://www.cmlab.csie.ntu.edu.tw/roi/ A Collaborative Benchmark for Region of Interest Detection Algorithms]
* [[Image(jazz/11-05-12:ROI.png)]]
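* [http://www.cmlab.csie.ntu.edu.tw/TilingSlideshow/ Tiling Slideshow] - Audiovisual Slideshow: Present Your Journey by Photos - Follows the rhythm of the music, clusters the photos by their features (time, similarity, landscape vs. portrait orientation, etc.), computes the regions of interest, and fits the photos into templates automatically. An interesting piece of research, and a handy tool for turning travel or wedding photos into a video. The [http://www.cmlab.csie.ntu.edu.tw/TilingSlideshow/slideshow.wmv demo video] gives a better feel for it.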
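One of the grouping steps mentioned for Tiling Slideshow, clustering photos by time, could look roughly like the sketch below. This is only an illustration and not the authors' algorithm: the function name `cluster_by_time`, the 30-minute gap threshold, and the sample timestamps are all assumptions.

{{{
#!python
from datetime import datetime, timedelta


def cluster_by_time(timestamps, max_gap=timedelta(minutes=30)):
    """Group timestamps into clusters.

    A new cluster starts whenever the gap to the previous photo
    exceeds max_gap (a hypothetical threshold chosen for illustration).
    """
    clusters = []
    for ts in sorted(timestamps):
        if clusters and ts - clusters[-1][-1] <= max_gap:
            # Close enough to the previous photo: same cluster.
            clusters[-1].append(ts)
        else:
            # First photo, or a long pause: start a new cluster.
            clusters.append([ts])
    return clusters


# Made-up capture times for five photos taken on one day.
photos = [
    datetime(2011, 5, 12, 9, 0),
    datetime(2011, 5, 12, 9, 10),
    datetime(2011, 5, 12, 13, 0),
    datetime(2011, 5, 12, 13, 20),
    datetime(2011, 5, 12, 18, 5),
]
groups = cluster_by_time(photos)
print([len(g) for g in groups])
}}}

Each resulting group could then be laid out on one slide; similarity and orientation would need further features that this sketch does not cover.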
== Hadoop HDFSProxy ==
* http://hadoop.apache.org/hdfs/docs/r0.21.0/hdfsproxy.html
* [[Image(http://hadoop.apache.org/hdfs/docs/r0.21.0/images/hdfsproxy-server.jpg)]]
== Hadoop ==
* [http://www.mellanox.com/content/pages.php?pg=web_2_0 The Mellanox Solution for Web 2.0 and Big Data] - Mellanox uses hardware to accelerate Hadoop and Memcached
{{{
#!html
Mellanox Hadoop-Direct accelerates Hadoop networks and improves
the scaling of Hadoop clusters executing data analytics intensive
applications. A novel data moving protocol, which uses RDMA in
combination with an efficient merge-sort algorithm, enables Hadoop
clusters based on Mellanox InfiniBand and 10GbE with RoCE (RDMA
over Converged Ethernet) adapter cards to efficiently move data
between servers, accelerating the Hadoop framework.
}}}
* [http://www.netapp.com/us/products/storage-systems/e5400/e5400.html NetApp e5400] - !NetApp boosts IOPS for Big Data applications (e.g. Hadoop)
* [http://www.snaplogic.com/solutions/bigdata/ SnapLogic's SnapReduce] - This company aims to make Hadoop simpler to use and has designed a graphical interface for planning Map/Reduce jobs.
* [http://www.youtube.com/watch?v=J9fSPwHT8o8 SnapReduce demo video]
* [[Image(jazz/11-05-12:SnapReduce.png)]]
* HStack - http://hstack.org/ - A free software project run by people at Adobe
* Integrates Hadoop + HBase + !ZooKeeper + Puppet
* Source code - https://github.com/hstack/puppet
* [http://hstack.org/hbase-performance-testing HBase Performance Testing]
* [[Image(http://hstack.org/wp-content/uploads/2010/04/gets_300_threads.png)]]
* [[Image(http://hstack.org/wp-content/uploads/2010/04/gets_zoomed_300_threads.png)]]
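The Mellanox quote earlier in this section mentions a merge-sort step. As a rough illustration of that general idea (not Mellanox's or Hadoop's actual implementation), the merge phase combines several already-sorted runs of key/value pairs into a single sorted stream. The name `merge_sorted_runs` and the sample data are made up for this sketch.

{{{
#!python
import heapq


def merge_sorted_runs(runs):
    """Merge already-sorted lists of (key, value) pairs by key.

    heapq.merge reads the runs lazily and always yields the smallest
    remaining key, so no run has to be re-sorted.
    """
    return list(heapq.merge(*runs, key=lambda kv: kv[0]))


# Hypothetical sorted outputs from three map tasks.
runs = [
    [("apple", 1), ("cat", 1)],
    [("bat", 1), ("cat", 1), ("dog", 1)],
    [("ant", 1)],
]
merged = merge_sorted_runs(runs)
print([key for key, _ in merged])
}}}

Because equal keys end up next to each other in the merged stream, a reducer can process one key group at a time.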
== Virtualization ==
* http://www.linux-kvm.org/page/Management_Tools - Management tools for KVM
* http://libvirt.org/apps.html - Applications using libvirt - Various applications built on libvirt, including management interfaces
== libvirt & opennebula ==
* [[Image(http://opennebula.org/_media/documentation:rel1.4:arch.jpg)]]
* The [http://opennebula.org/documentation:archives:rel1.4:libvirtapi OpenNebula 1.4 Libvirt API documentation] shows how the two relate.
== Data Science ==
* ([wiki:jazz/11-04-06 2011-04-06])
* [http://www.information-management.com/blogs/data_science_integration_statistics_databases-10020194-1.html Data Science – Part 1]
* [http://www.information-management.com/blogs/data_science_BI_analytics_big_data_visualizations-10020259-1.html Data Science – Part 2]
* [http://radar.oreilly.com/2010/06/what-is-data-science.html What is data science?] - Analysis: The future belongs to the companies and people that turn data into products.
* [http://tdwi.org/Articles/2011/01/05/Rise-of-Data-Science.aspx?p=1 The Rise of Data Science]
== R packages & RHIPE & !BioConductor ==
* [http://cran.r-project.org/web/packages/bigmemory/ bigmemory]: Manage massive matrices with shared memory and memory-mapped files
* Earlier microarray runs tended to hit the memory limit; this is one way around it.
* http://www.bigmemory.org/
* <Official site> [http://www.stat.purdue.edu/~sguha/rhipe/ RHIPE - R and Hadoop Integrated Processing Environment]
* [http://www.sdtimes.com/link/34792 RHIPE combines Hadoop and the R analytics language]
* [http://bioconductor.org/help/course-materials/2010/BioC2010/ BioC 2010] has tutorials on !BioConductor and R, in particular the Efficient R Programming part.
* <Book> [http://oreilly.com/catalog/9780596804787 Data Mashups in R]
* <Book> [http://flowingdata.com/book/ Visualize This: The FlowingData Guide to Design, Visualization, and Statistics]
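bigmemory, listed above, manages large matrices with memory-mapped files. The underlying idea is sketched below in Python using only the standard library; this is not bigmemory's API. The class name `MmapMatrix`, the row-major float64 file layout, and the file name `matrix.bin` are all assumptions made for illustration.

{{{
#!python
import mmap
import os
import struct
import tempfile


class MmapMatrix:
    """Hypothetical file-backed matrix of float64 values (row-major)."""

    ITEM = struct.calcsize("d")  # bytes per float64 value

    def __init__(self, path, rows, cols):
        self.rows, self.cols = rows, cols
        size = rows * cols * self.ITEM
        # Create a fresh backing file of the full size.
        with open(path, "wb") as f:
            f.truncate(size)
        self._file = open(path, "r+b")
        # Map the file; the OS pages data in and out on demand.
        self._map = mmap.mmap(self._file.fileno(), size)

    def _offset(self, r, c):
        if not (0 <= r < self.rows and 0 <= c < self.cols):
            raise IndexError((r, c))
        return (r * self.cols + c) * self.ITEM

    def __getitem__(self, rc):
        return struct.unpack_from("d", self._map, self._offset(*rc))[0]

    def __setitem__(self, rc, value):
        struct.pack_into("d", self._map, self._offset(*rc), value)

    def close(self):
        self._map.flush()
        self._map.close()
        self._file.close()


path = os.path.join(tempfile.mkdtemp(), "matrix.bin")
m = MmapMatrix(path, 1000, 1000)
m[2, 3] = 1.5
value = m[2, 3]
untouched = m[0, 0]
m.close()
print(value, untouched, os.path.getsize(path))
}}}

The matrix lives in the file rather than on the interpreter's heap, so its size is bounded by disk space and address space rather than by available RAM.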
== reCAPTCHA & phpBB2 ==
* http://code.google.com/intl/zh-TW/apis/recaptcha/docs/phpbb.html