Version 39 (modified by jazz, 14 years ago)

NoSQL

History

Definition

Primer

  • Is NoSQL only for BigData? (2010-11-01) - This article names three conditions under which NoSQL is appropriate; at least two of them should be met.
        * Velocity - The data needs to be processed in a timely manner
        * Volume - The amount of data is large (this fits the traditional Big Data use case)
        * Variety - The data varies and is not a good fit for a fixed schema
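
The "at least two of three" rule above can be written as a small check. This is only an illustrative sketch: the function name and its boolean parameters are hypothetical and do not come from the article.

```python
def nosql_is_candidate(velocity: bool, volume: bool, variety: bool) -> bool:
    """Return True when at least two of the three conditions hold."""
    # Booleans sum as 0/1, so this counts how many conditions are met.
    return sum([velocity, volume, variety]) >= 2


# Large, schema-less data with no timeliness requirement still qualifies.
print(nosql_is_candidate(velocity=False, volume=True, variety=True))   # True
# A timeliness requirement alone does not.
print(nosql_is_candidate(velocity=True, volume=False, variety=False))  # False
```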
    

Papers and Websites

Trend Observation

  • Of course, even though MongoDB and CouchDB already rank highly, they are still far behind SQLite. Google Trends search interest shows SQLite > Google Gears > CouchDB, which suggests that distributed databases have not yet become mainstream. (2009-09-23)

Comparison

Distributed:
* Amazon Dynamo
* Amazon S3
* Scalaris
* Voldemort
* CouchDB (thru Lounge)
* Riak
* MongoDB (in alpha)
* BigTable
* Cassandra
* HyperTable
* HBase

Not Distributed (responsibility on client):
* Redis
* Tokyo Tyrant
* MemcacheDB
* Amazon SimpleDB

Open Source Projects

  • gears-dblib - A simple abstraction on top of the Database object in Gears (2009-09-23)
  • orient - A light, portable, and fast NoSQL document database. Supports ACID transactions, indexes, asynchronous queries, an SQL layer, clustering, etc. (2009-09-23)

NoSQL : HBase

  • 11:10–12:00 Upcoming improvements for HBase - Andrew Purtell (Trend Micro) (2010-04-24)
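    • Needed for Big Data and Medium Data alike
    • Cloud Computing - Scale Free
    • Disk seek time remains nearly constant, so index (B-tree) lookups and seeks in an RDBMS are slow
    • No distributed transactions, no complex locking, no waits or deadlocks
    • Do not think of HBase as a spreadsheet; thinking of it in terms of tags may work better.
    • HBase and BigTable are both CP systems: they favor Consistency and Partition Tolerance, so by the CAP theorem they cannot guarantee Availability. They would rather interrupt service than serve incorrect data.
    • HDFS-200 (working append) will be supported in HBase 0.20.5, adding the ability to append to data continuously.
    • ACID? - atomicity, consistency, isolation, durability
    • New features:
      • Cross-data-center backup - via log shipping, HBASE-1295: Multi data center replication
      • Security hardening - support for authentication and authorization. Yahoo! has written a lot of new security support, including Kerberos authentication, data isolation at the HDFS layer, and secure RPC. New roles therefore have to be added for access control (Access Control Role).
      • Coprocessor - a new feature inspired by BigTable's coprocessors; adds RegionObserver (need to spend more time working out what it is for)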
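
The advice to think in tags rather than spreadsheet cells can be pictured as a sparse nested map in which each row carries only the columns it actually has. The sketch below is plain Python, not the HBase client API; the row keys, the `tag:` column names, and the `put`/`get` helpers are all hypothetical.

```python
from collections import defaultdict

# row key -> {"family:qualifier" -> value}; absent columns take no space.
table = defaultdict(dict)


def put(row, column, value):
    """Attach a column (a 'tag') to a row; rows need not share columns."""
    table[row][column] = value


def get(row, column, default=None):
    """Look up one cell, returning default when the row lacks that column."""
    return table.get(row, {}).get(column, default)


put("page1", "tag:nosql", "1")
put("page1", "tag:hbase", "1")
put("page2", "tag:graph", "1")

print(sorted(table["page1"]))      # ['tag:hbase', 'tag:nosql']
print(get("page2", "tag:nosql"))   # None
```

Unlike a spreadsheet, there is no fixed set of columns here: "page1" and "page2" simply carry different tags, and a missing tag costs nothing to store.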

NoSQL : Cassandra

  • Cassandra - a highly scalable, eventually consistent, distributed, structured key-value store. (2009-09-18)
    • Looking to the future with Cassandra
    • Cassandra was open sourced by Facebook in 2008
    • Cassandra brings together the distributed systems technologies from Dynamo and the data model from Google's BigTable.
  • 09:00–09:50 nosql cassandra - Gasol (Pixnet) (2010-04-24)
    • http://cassandra.apache.org/
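    • Data is replicated. Writes are appended to a commit log for durability and held in memory before later being flushed to disk. The architecture is fully peer-to-peer, so there is no single point of failure like the Hadoop NameNode.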
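
The combination of an in-memory table and a commit log can be illustrated with a toy store. This is a minimal sketch, not Cassandra's implementation or API: the `ToyStore` class and its methods are hypothetical, a Python list stands in for the on-disk log, and the write ordering shown is an assumption of the sketch.

```python
class ToyStore:
    """Toy key-value store: append-only log plus an in-memory table."""

    def __init__(self):
        self.commit_log = []   # stands in for an on-disk, append-only log
        self.memtable = {}     # in-memory view of the latest values

    def write(self, key, value):
        # Ordering is an assumption of this sketch: log first, then memory,
        # so memory contents can be rebuilt by replaying the log.
        self.commit_log.append((key, value))
        self.memtable[key] = value

    def recover(self):
        """Rebuild the in-memory table by replaying the log in order."""
        self.memtable = {}
        for key, value in self.commit_log:
            self.memtable[key] = value


store = ToyStore()
store.write("user:1", "alice")
store.write("user:1", "bob")
store.memtable.clear()   # simulate losing the in-memory contents
store.recover()
print(store.memtable)    # {'user:1': 'bob'}
```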

Use Case

Installation

NoSQL : MongoDB

NoSQL : CouchDB

NoSQL : Voldemort

  • Voldemort - a distributed key-value storage system (2009-09-18)
    • used at LinkedIn for certain high-scalability storage problems

NoSQL : Redis

NoSQL : Neo4j

  • Graph database - used to store graphs of relationships
  • http://neo4j.org/
  • Neo4j has been in commercial development for 10 years and in production for over 7 years. - Wow, it has been in development for quite a while!
  • So far I have seen a bioinformatics database that uses it - http://bio4j.com/ - Bio4j is a bioinformatics graph-based DB including most data available in UniProt (SwissProt + TrEMBL) plus Gene Ontology (GO) and UniRef (50,90,100).
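
To make "storing relationships as a graph" concrete, here is a minimal in-memory sketch of typed, directed relationships between nodes. It is plain Python and does not use the Neo4j API; the node names, relationship types, and the `relate`/`neighbors` helpers are made up for illustration.

```python
from collections import defaultdict

# node -> list of (relationship type, target node)
edges = defaultdict(list)


def relate(source, rel_type, target):
    """Store a directed, typed relationship between two nodes."""
    edges[source].append((rel_type, target))


def neighbors(node, rel_type):
    """Return the targets reachable from node via one rel_type edge."""
    return [t for r, t in edges.get(node, []) if r == rel_type]


relate("alice", "KNOWS", "bob")
relate("alice", "KNOWS", "carol")
relate("bob", "WORKS_AT", "acme")

print(neighbors("alice", "KNOWS"))   # ['bob', 'carol']
```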

NoSQL : Trinity

  • Trinity - a graph database and computation platform over distributed memory cloud. A graph database developed by Microsoft.
  • The article mentions Trinity alongside Neo4j, HyperGraphDB, InfiniteGraph, and Google's Pregel (2010-10-20)

Use Case
