Public Large Data Sets
- Wikipedia - http://en.wikipedia.org/wiki/Wikipedia:Database_download
- Public data sets provided by Amazon - http://aws.amazon.com/publicdatasets/
- Includes genomic data (e.g. the 1000 Genomes Project)
- http://www.statmt.org/europarl/
- http://www.opendatacenteralliance.org/
- Data.gov - U.S. public-sector data - for Taiwan, the National Archives Administration (檔案管理局) is probably the place to look (2010-11-08)
- http://ckan.org/
- http://stat-computing.org/dataexpo/2009/ - Flight records: airline performance data for 1987-2008. 1GB, 12M records.
- http://kdd.ics.uci.edu/ - UCI Knowledge Discovery in Databases Archive - The data sets are categorized neatly. These are very useful for many machine-learning exercises with the use of Hadoop / Mahout.
- A list of about 70 data sets compiled for Open Data Day: http://www.opendataday.org/wiki/Data
- A list of 70 datasets from Technology Review: https://www.technologyreview.com/blog/arxiv/26097/
- Stanford large network database collection: http://snap.stanford.edu/data/index.html
- Digging into data dataset list: http://www.diggingintodata.org/Repositories/tabid/167/Default.aspx
- Many Eyes from IBM: http://www-958.ibm.com/software/data/cognos/manyeyes/
- Data360: http://www.data360.org/index.aspx
- UN Data Explorer: http://data.un.org/Explorer.aspx
- OECD Data Sets: http://stats.oecd.org/index.aspx
- Freebase data dumps: http://wiki.freebase.com/wiki/Data_dumps
- http://infochimps.org - They have over a billion tweets.
- http://snap.stanford.edu/data/ - Some data sets from Stanford University
- http://data.stackexchange.com/
- http://ogdisdk.cloudapp.net/ - Open Government Data Initiative (OGDI)
- http://data.worldbank.org/ - Some data sets from the World Bank
- Free access to and reuse of public-sector information
- http://opengovernmentdata.org/
- http://www.opengovdata.org/
- http://data.gov.uk/
- http://OpenFlights.org - a crowd-sourced database of airlines, flights, airports etc.
- (2011-09-13)
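- App Star 高手爭霸戰 (App Star developer contest)
- Taipei City Government vs Data.Taipei
- http://taipeiogdi.cloudapp.net/DataCatalog/DataSetList - Taipei City open government data platform: data services in the developer zone (OGDI)
- The cloudapp.net domain is used by the Microsoft Azure platform, so the Taipei City Government's open data platform appears to be adapted from http://ogdisdk.cloudapp.net/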
- (2012-01-17) 10 sites to get the large data set or data corpus for free
- (2013-08-21) Datasets for Data Mining, Analytics and Knowledge Discovery
- http://data.g0v.tw/
- (2013-09-25) Datasets released by Google
- (2014-01-19)
- http://www.opendata500.com
- http://www.bigdata-startups.com/public-data
- http://www.quandl.com
- http://www.kdnuggets.com/2011/02/free-public-datasets.html
- http://www.pewresearch.org/data/download-datasets
- http://www.rdatamining.com/resources/data
- http://freegisdata.rtwilson.com - Free GIS Dataset
- http://datahub.io
- http://datamarket.azure.com/browse/data - Data sets on the Windows Azure Marketplace
- http://www.githubarchive.org/ - A good data set if you want to analyze GitHub :) (2014-02-05)
- http://gdeltproject.org/ - Global Database of Events, Language, and Tone (2014-06-03)
- Yahoo releases a Flickr data set (2014-07-03)
- You must sign in with a Yahoo account and apply through the Yahoo WebScope site
- I3 - Yahoo Flickr Creative Commons 100M (14G) (hosted on AWS)
- An e-mail address at a U.S. school (*@*.edu) is required to apply XD
- Some Datasets Available on the Web (2008~2009)
- 20 Big Data Repositories You Should Check Out (2014-09-24)
- https://archive.ics.uci.edu/ml/datasets.html ( 2015-07-20 )
- http://labrosa.ee.columbia.edu/millionsong/ ( 2015-07-20 )
- https://open.fda.gov/ ( 2015-07-20 )
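- https://quickdraw.withgoogle.com/data - The AutoDraw data set published by Google - "Machine learning is impressive! A few quick strokes and Google's AutoDraw knows what you want to draw" (2018-03-22)
- https://www.kaggle.com/datasets - Data sets used in Kaggle competitions (2018-03-30)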
- https://www.microsoft.com/en-us/research/publication/t-drive-trajectory-data-sample/ - T-Drive trajectory data sample (2011-08)
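Several of the sets listed above (the flight records, for example) are CSV files too large to load comfortably in one go. As a rough sketch only, the snippet below streams rows one at a time and computes a mean per group. The column names `UniqueCarrier` and `ArrDelay`, the `NA` marker for missing values, and the helper `mean_delay_by_carrier` are assumptions made for illustration; check the header of the file you actually download.

```python
import csv
import io
from collections import defaultdict


def mean_delay_by_carrier(lines, carrier_col="UniqueCarrier", delay_col="ArrDelay"):
    """Stream CSV rows and return the mean delay per carrier.

    `lines` is any iterable of text lines (an open file works), so the
    whole data set never has to sit in memory at once.
    Column names are assumed, not taken from any particular data set.
    """
    totals = defaultdict(lambda: [0.0, 0])  # carrier -> [sum of delays, row count]
    for row in csv.DictReader(lines):
        try:
            delay = float(row.get(delay_col, ""))
        except (TypeError, ValueError):
            continue  # skip missing or non-numeric values such as "NA"
        entry = totals[row[carrier_col]]
        entry[0] += delay
        entry[1] += 1
    return {carrier: total / count for carrier, (total, count) in totals.items()}


# Tiny in-memory sample standing in for a real download.
sample = io.StringIO("UniqueCarrier,ArrDelay\nAA,10\nAA,20\nUA,NA\nUA,-4\n")
result = mean_delay_by_carrier(sample)
print(result)
```

For a real file, pass an open file object instead of the sample, e.g. `open(path, newline="")`, where `path` is wherever you saved the download.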
Last modified on Sep 5, 2022, 1:24:01 PM