[http://mail-archives.apache.org/mod_mbox/hadoop-hbase-user/200908.mbox/%3C73d592f60908180339j3231f7f5w688260109ab3463@mail.gmail.com%3E]

{{{
#!text
> If you're using the default key generator
> (org.apache.hadoop.hbase.client.tableindexed.SimpleIndexKeyGenerator),
> it actually appends the base table row key for you. So even though
> the column value may be the same for multiple rows, the secondary
> index table will still have 1 row for each row with the value in the
> original table. Here is the relevant method from SimpleIndexKeyGenerator:
>
> public byte[] createIndexKey(byte[] rowKey, Map<byte[], byte[]> columns) {
>   return Bytes.add(columns.get(column), rowKey);
> }
>
> So, say you have a table "mytable", with the columns:
> info:keycol (say this is the one you want to index)
> info:col2
> info:col3
>
> If you define your table with the index specification -- new
> IndexSpecification("keycol", Bytes.toBytes("info:keycol")) -- then
> HBase will create the secondary index table named "mytable-by_keycol".
>
> Then, say you add the following rows to "mytable":
>
> "row1": info:keycol="one", info:col2="abc", info:col3="def"
> "row2": info:keycol="one", info:col2="ghi", info:col3="jkl"
>
> At this point, your index table ("mytable-by_keycol") will have the
> following rows:
>
> "onerow1": info:keycol="one", __INDEX__:ROW="row1"
> "onerow2": info:keycol="one", __INDEX__:ROW="row2"
>
> So you wind up with 2 rows in the index table (with unique row keys)
> pointing back at the original table rows, even though we've only
> stored a single distinct value for info:keycol.
>
> To access the rows by the secondary index, you create a scanner using
> IndexedTable.getIndexedScanner(...). I don't think there's support
> for using the indexes when performing a random read with
> HTable.getRow()/HTable.get(). (But maybe I'm wrong?)
}}}

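The index row keys in the example above follow directly from the concatenation that SimpleIndexKeyGenerator performs. A minimal stand-alone sketch (plain Java, no HBase dependency; the `add` helper here merely emulates HBase's `Bytes.add`):

```java
import java.nio.charset.StandardCharsets;

public class SimpleIndexKeyDemo {

    // Local stand-in for org.apache.hadoop.hbase.util.Bytes.add:
    // concatenates two byte arrays.
    static byte[] add(byte[] a, byte[] b) {
        byte[] out = new byte[a.length + b.length];
        System.arraycopy(a, 0, out, 0, a.length);
        System.arraycopy(b, 0, out, a.length, b.length);
        return out;
    }

    // Mirrors the logic of SimpleIndexKeyGenerator.createIndexKey for a
    // single indexed column: index key = indexed column value + base row key.
    static byte[] createIndexKey(byte[] columnValue, byte[] rowKey) {
        return add(columnValue, rowKey);
    }

    public static void main(String[] args) {
        byte[] value = "one".getBytes(StandardCharsets.UTF_8);
        for (String row : new String[] {"row1", "row2"}) {
            byte[] indexKey = createIndexKey(value, row.getBytes(StandardCharsets.UTF_8));
            // Prints "onerow1" then "onerow2" -- the two index-table row keys
            // shown in the quoted example.
            System.out.println(new String(indexKey, StandardCharsets.UTF_8));
        }
    }
}
```

Because the base row key is appended to the (possibly non-unique) column value, every base-table row yields a distinct index row key, which is why two rows with the same info:keycol value produce two index rows rather than one.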
Hadoop HBase data import and index testing (in Chinese):
[http://bbs.telewiki.cn/blog/qianling/entry/hadoop_hbase%E6%95%B0%E6%8D%AE%E5%AF%BC%E5%85%A5%E5%92%8C%E7%B4%A2%E5%BC%95%E6%B5%8B%E8%AF%95]