Version 11 (modified by jazz, 14 years ago)
2010-11-24
Hadoop & GPFS
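- IBM Builds on Hadoop with New Storage Architecture - IBM enables GPFS to work alongside HDFS, providing a more highly available storage service. The architecture is called "General Parallel File System-Shared Nothing Cluster (GPFS-SNC)".
- IBM announces new storage architecture GPFS-SNC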
Public Large Data Sets
- Wikipedia - http://en.wikipedia.org/wiki/Wikipedia:Database_download
- Public data sets provided by Amazon - http://aws.amazon.com/publicdatasets/
- Includes genomic data (e.g., the 1000 Genomes Project)
- http://www.statmt.org/europarl/
- http://www.opendatacenteralliance.org/
- Data.gov - U.S. public-sector data - for Taiwan, the National Archives Administration (檔案管理局) should be a comparable source (2010-11-08)
- http://ckan.org/
- http://stat-computing.org/dataexpo/2009/ - Flight records - airline performance data for 1987-2008. 1GB, 12M records.
- http://kdd.ics.uci.edu/ - UCI Knowledge Discovery in Databases Archive - The data sets are neatly categorized and are useful for many machine-learning exercises with Hadoop/Mahout.
- A list of about 70 data sets compiled for Open Data Day: http://www.opendataday.org/wiki/Data
- A list of 70 datasets from Technology Review: https://www.technologyreview.com/blog/arxiv/26097/
- Stanford large network database collection: http://snap.stanford.edu/data/index.html
- Digging into Data dataset list: http://www.diggingintodata.org/Repositories/tabid/167/Default.aspx
- Many Eyes from IBM: http://www-958.ibm.com/software/data/cognos/manyeyes/
- Data360: http://www.data360.org/index.aspx
- UN Data Explorer: http://data.un.org/Explorer.aspx
- OECD Data Sets: http://stats.oecd.org/index.aspx
- Freebase data dumps: http://wiki.freebase.com/wiki/Data_dumps
- http://infochimps.org - They have over a billion tweets.