0128

Timestamp: Jan 29, 2010, 2:06:11 PM
Author: waue
Comment: --

waue/2010/0128

-                      v1
+                      v2
 HBase is primarily a sorted distributed hash map, but it does support secondary keys through a contrib package called Transactional HBase. The secondary keys are provided by a component called TableIndexed.
+[http://mail-archives.apache.org/mod_mbox/hadoop-hbase-user/200908.mbox/%3C73d592f60908180339j3231f7f5w688260109ab3463@mail.gmail.com%3E]
+{{{
+#!text
+> If you're using the default key generator
+> (org.apache.hadoop.hbase.client.tableindexed.SimpleIndexKeyGenerator),
+> it actually appends the base table row key for you.  So even though
+> the column value may be the same for multiple rows, the secondary
+> index table will still have 1 row for each row with the value in the
+> original table.  Here is the relevant method from SimpleIndexKeyGenerator:
+>
+>  public byte[] createIndexKey(byte[] rowKey, Map<byte[], byte[]> columns) {
+>    return Bytes.add(columns.get(column), rowKey);
+>  }
+>
+> So, say you have a table "mytable", with the columns:
+>    info:keycol       (say this is the one you want to index)
+>    info:col2
+>    info:col3
+>
+> If you define your table with the index specification -- new
+> IndexSpecification("keycol", Bytes.toBytes("info:keycol")) -- then
+> HBase will create the secondary index table named "mytable-by_keycol".
+>
+> Then, say you add the following rows to "mytable":
+>
+> "row1":  info:keycol="one", info:col2="abc", info:col3="def"
+> "row2":  info:keycol="one", info:col2="ghi", info:col3="jkl"
+>
+> At this point, your index table ("mytable-by_keycol") will have the
+> following rows:
+>
+> "onerow1": info:keycol="one", __INDEX__:ROW="row1"
+> "onerow2": info:keycol="one", __INDEX__:ROW="row2"
+>
+> So you wind up with 2 rows in the index table (with unique row keys)
+> pointing back at the original table rows, even though we've only
+> stored a single distinct value for info:keycol.
+>
+> To access the rows by the secondary index, create a scanner using
+> IndexedTable.getIndexedScanner(...).  I don't think there's support
+> for using the indexes when performing a random read with
+> HTable.getRow()/HTable.get().  (But maybe I'm wrong?)
+}}}
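The key layout described in the quoted message can be tried out without an HBase cluster. The following is a minimal sketch in plain Java, not the HBase implementation: the class `IndexKeySketch`, the helpers `buildIndex` and `lookup`, and the use of a `TreeMap` as a stand-in for the sorted index table are all assumptions made for illustration. Only the idea of concatenating the indexed column value with the base row key comes from the quoted `createIndexKey` method.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class IndexKeySketch {

    // Mirrors the quoted Bytes.add(columns.get(column), rowKey):
    // the index key is the indexed column value followed by the base row key.
    static byte[] createIndexKey(byte[] rowKey, byte[] columnValue) {
        byte[] key = new byte[columnValue.length + rowKey.length];
        System.arraycopy(columnValue, 0, key, 0, columnValue.length);
        System.arraycopy(rowKey, 0, key, columnValue.length, rowKey.length);
        return key;
    }

    // Hypothetical stand-in for the index table: a sorted map from index key
    // to base row key. Keys are ASCII here, so String order matches byte order.
    static TreeMap<String, String> buildIndex(Map<String, String> baseRows) {
        TreeMap<String, String> index = new TreeMap<>();
        for (Map.Entry<String, String> row : baseRows.entrySet()) {
            byte[] key = createIndexKey(
                    row.getKey().getBytes(StandardCharsets.UTF_8),
                    row.getValue().getBytes(StandardCharsets.UTF_8));
            index.put(new String(key, StandardCharsets.UTF_8), row.getKey());
        }
        return index;
    }

    // Hypothetical lookup: scan the sorted index from the first key >= value
    // and stop at the first key that no longer starts with value.
    // Caveat: a bare prefix scan also matches longer values such as "oner".
    static List<String> lookup(TreeMap<String, String> index, String value) {
        List<String> rows = new ArrayList<>();
        for (Map.Entry<String, String> e : index.tailMap(value, true).entrySet()) {
            if (!e.getKey().startsWith(value)) {
                break;
            }
            rows.add(e.getValue());
        }
        return rows;
    }

    public static void main(String[] args) {
        // Base table rows from the quoted example: row key -> info:keycol value.
        Map<String, String> base = new TreeMap<>();
        base.put("row1", "one");
        base.put("row2", "one");

        TreeMap<String, String> index = buildIndex(base);
        System.out.println(index);                // {onerow1=row1, onerow2=row2}
        System.out.println(lookup(index, "one")); // [row1, row2]
    }
}
```

Because the base row key is appended, two rows that share the value "one" still produce two distinct index keys, which is why a lookup by value is a range scan over the index and not a single get.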
+Hadoop HBase data import and index testing
+[http://bbs.telewiki.cn/blog/qianling/entry/hadoop_hbase%E6%95%B0%E6%8D%AE%E5%AF%BC%E5%85%A5%E5%92%8C%E7%B4%A2%E5%BC%95%E6%B5%8B%E8%AF%95]