= 2011-05-12 =

== Security ==

 * http://www.virustotal.com/ - Upload a file and get a combined report of the scan results from many antivirus engines.

== NTU CMLab ==

 * A few interesting papers from NTU CMLab:
   * [http://www.cmlab.csie.ntu.edu.tw/roi/ A Collaborative Benchmark for Region of Interest Detection Algorithms]
     * [[Image(ROI.png)]]
   * [http://www.cmlab.csie.ntu.edu.tw/TilingSlideshow/ Tiling Slideshow] - Audiovisual Slideshow: Present Your Journey by Photos - Follows the rhythm of the music, clusters photos by their features (time, similarity, landscape vs. portrait orientation, etc.), computes each photo's region of interest, and fits the results into templates automatically. Quite an interesting piece of research, and a nice tool for turning travel or wedding photos into a video. The [http://www.cmlab.csie.ntu.edu.tw/TilingSlideshow/slideshow.wmv demo video] gives a better feel for it.

== Hadoop HDFSProxy ==

 * http://hadoop.apache.org/hdfs/docs/r0.21.0/hdfsproxy.html
   * [[Image(http://hadoop.apache.org/hdfs/docs/r0.21.0/images/hdfsproxy-server.jpg)]]

== Hadoop ==

 * [http://www.mellanox.com/content/pages.php?pg=web_2_0 The Mellanox Solution for Web 2.0 and Big Data] - Mellanox uses hardware to accelerate Hadoop and Memcached.
{{{
#!html
Mellanox Hadoop-Direct accelerates Hadoop networks and improves the scaling of Hadoop clusters executing data analytics intensive applications. A novel data moving protocol, which uses RDMA in combination with an efficient merge-sort algorithm, enables Hadoop clusters based on Mellanox InfiniBand and 10GbE with RoCE (RDMA over Converged Ethernet) adapter cards to efficiently move data between servers, accelerating the Hadoop framework.
}}}
 * [http://www.netapp.com/us/products/storage-systems/e5400/e5400.html NetApp e5400] - !NetApp boosts IOPS for Big Data applications (e.g. Hadoop).
 * [http://www.snaplogic.com/solutions/bigdata/ SnapLogic's SnapReduce] - This company aims to make Hadoop simpler and has designed a graphical interface for planning Map/Reduce jobs.
   * [http://www.youtube.com/watch?v=J9fSPwHT8o8 SnapReduce demo video]
   * [[Image(SnapReduce.png)]]
 * HStack - http://hstack.org/ - A free software project put together by people at Adobe.
   * Integrates Hadoop + HBase + !ZooKeeper + Puppet
   * Source code - https://github.com/hstack/puppet
   * [http://hstack.org/hbase-performance-testing HBase Performance Testing]
     * [[Image(http://hstack.org/wp-content/uploads/2010/04/gets_300_threads.png)]]
     * [[Image(http://hstack.org/wp-content/uploads/2010/04/gets_zoomed_300_threads.png)]]

== Virtualization ==

 * http://www.linux-kvm.org/page/Management_Tools - Management interfaces for KVM
 * http://libvirt.org/apps.html - Applications using libvirt - Various applications built on libvirt, including management interfaces

== libvirt & opennebula ==

 * [[Image(http://opennebula.org/_media/documentation:rel1.4:arch.jpg)]]
 * [http://opennebula.org/documentation:archives:rel1.4:libvirtapi OpenNebula's Libvirt API 1.4 documentation] shows how the two relate to each other.

== Data Science ==

 * [http://www.information-management.com/blogs/data_science_integration_statistics_databases-10020194-1.html Data Science – Part 1]
 * [http://www.information-management.com/blogs/data_science_BI_analytics_big_data_visualizations-10020259-1.html Data Science – Part 2]
 * [http://radar.oreilly.com/2010/06/what-is-data-science.html What is data science?] - Analysis: The future belongs to the companies and people that turn data into products.
 * [http://tdwi.org/Articles/2011/01/05/Rise-of-Data-Science.aspx?p=1 The Rise of Data Science]

== R packages & RHIPE & !BioConductor ==

 * [http://cran.r-project.org/web/packages/bigmemory/ bigmemory]: Manage massive matrices with shared memory and memory-mapped files
   * When I ran microarray analyses earlier, it was easy to hit the memory limit; this is one way around it.
   * http://www.bigmemory.org/
 * <Official site> [http://www.stat.purdue.edu/~sguha/rhipe/ RHIPE - R and Hadoop Integrated Processing Environment]
   * [http://www.sdtimes.com/link/34792 RHIPE combines Hadoop and the R analytics language]
 * [http://bioconductor.org/help/course-materials/2010/BioC2010/ BioC 2010] has tutorials on !BioConductor and R, notably the Efficient R Programming part.
 * <Book> [http://oreilly.com/catalog/9780596804787 Data Mashups in R]
 * <Book> [http://flowingdata.com/book/ Visualize This: The FlowingData Guide to Design, Visualization, and Statistics]

== reCAPTCHA & phpBB2 ==

 * http://code.google.com/intl/zh-TW/apis/recaptcha/docs/phpbb.html