Changes between Version 1 and Version 2 of waue/2010/0128


Timestamp:
Jan 29, 2010, 2:06:11 PM (14 years ago)
Author:
waue

  • waue/2010/0128

HBase is primarily a sorted distributed hash map, but it does support secondary keys through a contrib package called Transactional HBase. The secondary keys are provided by a component called TableIndexed.

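As a rough illustration of why a secondary index is useful on top of a sorted map, the sketch below models a table as an in-memory `TreeMap` in plain Java. It does not use any HBase API; the class name `SortedMapSketch`, its helper methods, and the sample row "row3" are hypothetical and exist only for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy, single-process stand-in for a sorted row store; it does not use any HBase API.
class SortedMapSketch {

    // Rows are kept sorted by row key; each row maps a column name to a value.
    static TreeMap<String, Map<String, String>> sampleTable() {
        TreeMap<String, Map<String, String>> table = new TreeMap<>();
        table.put("row1", row("one", "abc"));
        table.put("row2", row("one", "ghi"));
        table.put("row3", row("two", "mno")); // hypothetical extra row
        return table;
    }

    static Map<String, String> row(String keycol, String col2) {
        Map<String, String> columns = new TreeMap<>();
        columns.put("info:keycol", keycol);
        columns.put("info:col2", col2);
        return columns;
    }

    // Lookup by a non-key column: without an index, every row has to be inspected.
    static List<String> scanForValue(TreeMap<String, Map<String, String>> table,
                                     String column, String value) {
        List<String> matches = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> e : table.entrySet()) {
            if (value.equals(e.getValue().get(column))) {
                matches.add(e.getKey());
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        TreeMap<String, Map<String, String>> table = sampleTable();
        // Direct lookup by the primary (row) key.
        System.out.println(table.get("row2").get("info:col2"));
        // Range scan over a contiguous run of row keys.
        System.out.println(table.subMap("row1", true, "row2", true).keySet());
        // Lookup by column value falls back to a full scan.
        System.out.println(scanForValue(table, "info:keycol", "one"));
    }
}
```

Lookups and range scans by row key use the sort order directly, while a lookup by column value has to walk the whole map. That gap is what a secondary index table is meant to close.
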
[http://mail-archives.apache.org/mod_mbox/hadoop-hbase-user/200908.mbox/%3C73d592f60908180339j3231f7f5w688260109ab3463@mail.gmail.com%3E]

{{{
#!text
> If you're using the default key generator
> (org.apache.hadoop.hbase.client.tableindexed.SimpleIndexKeyGenerator),
> it actually appends the base table row key for you.  So even though
> the column value may be the same for multiple rows, the secondary
> index table will still have 1 row for each row with the value in the
> original table.  Here is the relevant method from SimpleIndexKeyGenerator:
>
>  public byte[] createIndexKey(byte[] rowKey, Map<byte[], byte[]> columns) {
>    return Bytes.add(columns.get(column), rowKey);
>  }
>
> So, say you have a table "mytable", with the columns:
>    info:keycol       (say this is the one you want to index)
>    info:col2
>    info:col3
>
> If you define your table with the index specification -- new
> IndexSpecification("keycol", Bytes.toBytes("info:keycol")) -- then
> HBase will create the secondary index table named "mytable-by_keycol".
>
> Then, say you add the following rows to "mytable":
>
> "row1":  info:keycol="one", info:col2="abc", info:col3="def"
> "row2":  info:keycol="one", info:col2="ghi", info:col3="jkl"
>
> At this point, your index table ("mytable-by_keycol") will have the
> following rows:
>
> "onerow1": info:keycol="one", __INDEX__:ROW="row1"
> "onerow2": info:keycol="one", __INDEX__:ROW="row2"
>
> So you wind up with 2 rows in the index table (with unique row keys)
> pointing back at the original table rows, even though we've only
> stored a single distinct value for info:keycol.
>
> To access the rows by the secondary index, create a scanner using
> IndexedTable.getIndexedScanner(...).  I don't think there's support
> for using the indexes when performing a random read with
> HTable.getRow()/HTable.get().  (But maybe I'm wrong?)
}}}
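The key layout described in the quoted message can be sketched in plain Java without any HBase dependency. The following is a minimal model, not the TableIndexed implementation: `createIndexKey` only imitates the concatenation that `Bytes.add(columns.get(column), rowKey)` performs, and the class `IndexKeySketch`, the helpers `buildIndex` and `rowsWithPrefix`, and the sample row "row3" are hypothetical names introduced for illustration. Keys are held as strings in a `TreeMap`, which sorts the same way as byte order only for ASCII data like this example.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java model of the index layout from the quoted message; no HBase classes are used.
class IndexKeySketch {

    // Imitates Bytes.add(indexedValue, rowKey): indexed value first, base row key second.
    static byte[] createIndexKey(byte[] indexedValue, byte[] rowKey) {
        byte[] key = new byte[indexedValue.length + rowKey.length];
        System.arraycopy(indexedValue, 0, key, 0, indexedValue.length);
        System.arraycopy(rowKey, 0, key, indexedValue.length, rowKey.length);
        return key;
    }

    static byte[] utf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Sorted stand-in for "mytable-by_keycol": index key -> base table row key.
    static TreeMap<String, String> buildIndex(Map<String, String> keycolByRow) {
        TreeMap<String, String> index = new TreeMap<>();
        for (Map.Entry<String, String> e : keycolByRow.entrySet()) {
            byte[] key = createIndexKey(utf8(e.getValue()), utf8(e.getKey()));
            index.put(new String(key, StandardCharsets.UTF_8), e.getKey());
        }
        return index;
    }

    // Walk the sorted index from the prefix onward and stop at the first non-matching key.
    // With no delimiter in the key, this also matches values that merely start with the prefix.
    static List<String> rowsWithPrefix(TreeMap<String, String> index, String prefix) {
        List<String> rows = new ArrayList<>();
        for (Map.Entry<String, String> e : index.tailMap(prefix, true).entrySet()) {
            if (!e.getKey().startsWith(prefix)) {
                break;
            }
            rows.add(e.getValue());
        }
        return rows;
    }

    public static void main(String[] args) {
        Map<String, String> keycolByRow = new TreeMap<>();
        keycolByRow.put("row1", "one");
        keycolByRow.put("row2", "one");
        keycolByRow.put("row3", "two"); // hypothetical extra row

        TreeMap<String, String> index = buildIndex(keycolByRow);
        System.out.println(index);                        // {onerow1=row1, onerow2=row2, tworow3=row3}
        System.out.println(rowsWithPrefix(index, "one")); // [row1, row2]
    }
}
```

Because the base row key is appended, two rows that share the value "one" still get distinct index keys, and those keys sit next to each other in sort order. That adjacency is why reading by a secondary value is naturally a scan over a key range rather than a single random read.
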

Hadoop HBase data import and index testing
[http://bbs.telewiki.cn/blog/qianling/entry/hadoop_hbase%E6%95%B0%E6%8D%AE%E5%AF%BC%E5%85%A5%E5%92%8C%E7%B4%A2%E5%BC%95%E6%B5%8B%E8%AF%95]