| 1 | ([wiki:jazz/10-11-24 2010-11-24]) |
| 2 | == Public Large Data Sets == |
| 3 | |
| 4 | * Wikipedia - http://en.wikipedia.org/wiki/Wikipedia:Database_download |
| 5 | * Public data sets provided by Amazon - http://aws.amazon.com/publicdatasets/ |
| 6 | * Includes genomic data (e.g., the 1000 Genomes Project) |
| 7 | * http://www.statmt.org/europarl/ |
| 8 | * http://www.opendatacenteralliance.org/ |
| 9 | * [http://www.data.gov/raw/92 Data.gov] - U.S. public-sector data; for Taiwan, the [http://www.archives.gov.tw/ National Archives Administration] is probably the place to look ([wiki:jazz/10-11-08 2010-11-08]) |
| 10 | * http://ckan.org/ |
| 11 | * http://stat-computing.org/dataexpo/2009/ - Flight records: airline on-time performance data for 1987-2008, nearly 120 million records (about 12 GB uncompressed). |
| 12 | * http://kdd.ics.uci.edu/ - UCI Knowledge Discovery in Databases Archive - The data sets are neatly categorized and well suited to machine-learning exercises with Hadoop / Mahout. |
| 13 | * A list of about 70 data sets compiled for Open Data Day: http://www.opendataday.org/wiki/Data |
| 14 | * A list of 70 data sets from Technology Review: https://www.technologyreview.com/blog/arxiv/26097/ |
| 15 | * Stanford large network database collection: http://snap.stanford.edu/data/index.html |
| 16 | * Digging into data dataset list: http://www.diggingintodata.org/Repositories/tabid/167/Default.aspx |
| 17 | * Many Eyes from IBM: http://www-958.ibm.com/software/data/cognos/manyeyes/ |
| 18 | * Data360: http://www.data360.org/index.aspx |
| 19 | * UN Data Explorer: http://data.un.org/Explorer.aspx |
| 20 | * OECD Data Sets: http://stats.oecd.org/index.aspx |
| 21 | * Freebase data dumps: http://wiki.freebase.com/wiki/Data_dumps |
| 22 | * http://infochimps.org - They have over a billion tweets. |
| 23 | * http://snap.stanford.edu/data/ - A number of data sets from Stanford University |
| 24 | * http://data.stackexchange.com/ |
| 25 | * http://ogdisdk.cloudapp.net/ - Open Government Data Initiative (OGDI) |
| 26 | * http://data.worldbank.org/ - A number of data sets from the World Bank |
| 27 | * [http://creativecommons.org.tw/blog/archives/000237.html Free access to and reuse of public-sector information] |
| 28 | * http://opengovernmentdata.org/ |
| 29 | * http://www.opengovdata.org/ |
| 30 | * http://data.gov.uk/ |
| 31 | * http://OpenFlights.org - a crowd-sourced database of airlines, flights, airports etc. |
| 32 | * ([wiki:jazz/11-09-13 2011-09-13]) |
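| 33 | * [http://www.jtv.com.tw/AppStar/ App Star 高手爭霸戰 (App Star contest)] |
| 34 | * [http://www.opendata.tw/open-data/tpe-od-platform/ Taipei City Government vs Data.Taipei] |
| 35 | * http://taipeiogdi.cloudapp.net/DataCatalog/DataSetList - Taipei City open government data platform, developer-zone data service (OGDI) |
| 36 | * The cloudapp.net domain is used by the Microsoft Azure platform, so it looks like the Taipei City Government's open data platform was adapted from http://ogdisdk.cloudapp.net/ |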
| 37 | * (2012-01-17) [http://www.findbestopensource.com/article-detail/free-large-data-corpus 10 sites to get the large data set or data corpus for free] |