{{{ #!html
HBase Primary Table vs. Indexed Table
Indexed Table = Secondary Index = Transactional HBase
}}}
[[PageOutline]]

= what is a secondary index (as distinguished from a primary index) =

First, what primary and secondary indexes are in general. When you activate an object (say, an ODS/DSO), the system automatically generates an index based on the key fields; this is the primary index. Any further indexes you choose to create are called secondary indexes. The primary index contains the key fields of the table and a pointer to the non-key fields, and it is created automatically when the table is created in the database. Secondary indexes are needed when the table is frequently accessed in a way that does not take advantage of the sorting of the primary index. Different indexes on the same table are distinguished by a three-character index identifier.
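To make the distinction concrete, here is a minimal, HBase-independent sketch using plain Java collections (the class and column names are illustrative only, not any HBase API): the primary index maps the row key straight to the row, while a secondary index maps a non-key column value back to the row keys that carry it.

{{{
#!java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Conceptual sketch only: plain Java maps, not the HBase API.
public class IndexConceptSketch {
    public static void main(String[] args) {
        // "Primary index": row key -> row (the table itself is keyed and sorted this way).
        Map<String, Map<String, String>> table = new TreeMap<>();
        table.put("row1", Map.of("keycol", "one", "col2", "abc"));
        table.put("row2", Map.of("keycol", "one", "col2", "ghi"));

        // "Secondary index": column value -> set of row keys holding that value.
        Map<String, Set<String>> byKeycol = new TreeMap<>();
        table.forEach((rowKey, row) ->
                byKeycol.computeIfAbsent(row.get("keycol"), v -> new TreeSet<>()).add(rowKey));

        // A point lookup by row key uses the primary index...
        System.out.println(table.get("row1"));
        // ...while "find all rows where keycol = one" uses the secondary index.
        System.out.println(byKeycol.get("one")); // prints [row1, row2]
    }
}
}}}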
= secondary index on HBase =
[http://kdpeterson.net/blog/2009/09/using-hbase-tableindexed-from-thrift-with-unique-keys.html]

HBase is primarily a sorted, distributed hash map, but it does support secondary keys through a contrib package called Transactional HBase. The secondary keys are provided by a component called TableIndexed.

= HBase's secondary index in detail =
[http://mail-archives.apache.org/mod_mbox/hadoop-hbase-user/200908.mbox/%3C73d592f60908180339j3231f7f5w688260109ab3463@mail.gmail.com%3E]
{{{
#!text
> If you're using the default key generator
> (org.apache.hadoop.hbase.client.tableindexed.SimpleIndexKeyGenerator),
> it actually appends the base table row key for you. So even though
> the column value may be the same for multiple rows, the secondary
> index table will still have 1 row for each row with the value in the
> original table. Here is the relevant method from SimpleIndexKeyGenerator:
>
> public byte[] createIndexKey(byte[] rowKey, Map columns) {
>   return Bytes.add(columns.get(column), rowKey);
> }
>
> So, say you have a table "mytable", with the columns:
> info:keycol (say this is the one you want to index)
> info:col2
> info:col3
>
> If you define your table with the index specification -- new
> IndexSpecification("keycol", Bytes.toBytes("info:keycol")) -- then
> HBase will create the secondary index table named "mytable-by_keycol".
>
> Then, say you add the following rows to "mytable":
>
> "row1": info:keycol="one", info:col2="abc", info:col3="def"
> "row2": info:keycol="one", info:col2="ghi", info:col3="jkl"
>
> At this point, your index table ("mytable-by_keycol") will have the
> following rows:
>
> "onerow1": info:keycol="one", __INDEX__:ROW="row1"
> "onerow2": info:keycol="one", __INDEX__:ROW="row2"
>
> So you wind up with 2 rows in the index table (with unique row keys)
> pointing back at the original table rows, even though we've only
> stored a single distinct value for info:keycol.
>
> To access the rows by the secondary index, you create a scanner using
> IndexedTable.getIndexedScanner(...). I don't think there's support
> for using the indexes when performing a random read with
> HTable.getRow()/HTable.get(). (But maybe I'm wrong?)
}}}
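The key scheme described in the quoted post is easy to check in isolation. Below is a minimal sketch that reimplements the concatenation SimpleIndexKeyGenerator performs (indexed column value followed by the base-table row key) in plain Java; the class name and the sample values mirror the example above, but this is an illustration, not the actual HBase class.

{{{
#!java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of the index-row-key scheme: value bytes + base row-key bytes, so two
// rows sharing the same indexed value still get unique index rows.
public class IndexKeySketch {

    // Mirrors what SimpleIndexKeyGenerator.createIndexKey does (see the quoted post).
    static byte[] createIndexKey(byte[] rowKey, byte[] columnValue) {
        byte[] key = new byte[columnValue.length + rowKey.length];
        System.arraycopy(columnValue, 0, key, 0, columnValue.length);
        System.arraycopy(rowKey, 0, key, columnValue.length, rowKey.length);
        return key;
    }

    public static void main(String[] args) {
        // Two base-table rows that share the same indexed value "one".
        Map<String, String> keycolByRow = new LinkedHashMap<>();
        keycolByRow.put("row1", "one");
        keycolByRow.put("row2", "one");

        // Expected index row keys: "onerow1" and "onerow2", one per base row.
        for (Map.Entry<String, String> e : keycolByRow.entrySet()) {
            byte[] indexKey = createIndexKey(e.getKey().getBytes(), e.getValue().getBytes());
            System.out.println(new String(indexKey) + "  ->  __INDEX__:ROW=" + e.getKey());
        }
    }
}
}}}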
= Secondary indexes in HBase: code walkthrough (v0.19) =

== conf/hbase-site.xml ==

Open $HBASE_INSTALL_DIR/conf/hbase-site.xml and add the following properties to it:

{{{
#!text
<property>
  <name>hbase.regionserver.class</name>
  <value>org.apache.hadoop.hbase.ipc.IndexedRegionInterface</value>
</property>
<property>
  <name>hbase.regionserver.impl</name>
  <value>org.apache.hadoop.hbase.regionserver.tableindexed.IndexedRegionServer</value>
</property>
}}}

== Adding a secondary index while creating a table ==

{{{
#!java
HBaseConfiguration conf = new HBaseConfiguration();
conf.addResource(new Path("/opt/hbase-0.19.3/conf/hbase-site.xml"));

HTableDescriptor desc = new HTableDescriptor("test_table");
desc.addFamily(new HColumnDescriptor("columnfamily1:"));
desc.addFamily(new HColumnDescriptor("columnfamily2:"));

// Index column1 and column2 of columnfamily1.
desc.addIndex(new IndexSpecification("column1", Bytes.toBytes("columnfamily1:column1")));
desc.addIndex(new IndexSpecification("column2", Bytes.toBytes("columnfamily1:column2")));

IndexedTableAdmin admin = new IndexedTableAdmin(conf);
admin.createTable(desc);
}}}

== Adding an index to an existing table ==

{{{
#!java
HBaseConfiguration conf = new HBaseConfiguration();
conf.addResource(new Path("/opt/hbase-0.19.3/conf/hbase-site.xml"));

IndexedTableAdmin admin = new IndexedTableAdmin(conf);
admin.addIndex(Bytes.toBytes("test_table"),
    new IndexSpecification("column2", Bytes.toBytes("columnfamily1:column2")));
}}}

== Deleting an existing index from a table ==

{{{
#!java
HBaseConfiguration conf = new HBaseConfiguration();
conf.addResource(new Path("/opt/hbase-0.19.3/conf/hbase-site.xml"));

IndexedTableAdmin admin = new IndexedTableAdmin(conf);
admin.removeIndex(Bytes.toBytes("test_table"), "column2");
}}}

== Reading from a secondary index: get a scanner for the index and scan through the data ==

{{{
#!java
HBaseConfiguration conf = new HBaseConfiguration();
conf.addResource(new Path("/opt/hbase-0.19.3/conf/hbase-site.xml"));

IndexedTable table = new IndexedTable(conf, Bytes.toBytes("test_table"));

// You need to specify which base-table columns to get back.
Scanner scanner = table.getIndexedScanner("column1",
    HConstants.EMPTY_START_ROW, null, null,
    new byte[][] { Bytes.toBytes("columnfamily1:column1"),
                   Bytes.toBytes("columnfamily1:column2") });

for (RowResult rowResult : scanner) {
  String value1 = new String(
      rowResult.get(Bytes.toBytes("columnfamily1:column1")).getValue());
  String value2 = new String(
      rowResult.get(Bytes.toBytes("columnfamily1:column2")).getValue());
  System.out.println(value1 + ", " + value2);
}

table.close();
}}}

== Getting a scanner over a subset of the rows by specifying a column filter ==

{{{
#!java
// Continues from the previous snippet: reuses the same IndexedTable instance
// (run this before table.close() is called).
ColumnValueFilter filter =
    new ColumnValueFilter(Bytes.toBytes("columnfamily1:column1"),
        CompareOp.LESS, Bytes.toBytes("value1-10"));

scanner = table.getIndexedScanner("column1",
    HConstants.EMPTY_START_ROW, null, filter,
    new byte[][] { Bytes.toBytes("columnfamily1:column1"),
                   Bytes.toBytes("columnfamily1:column2") });

for (RowResult rowResult : scanner) {
  String value1 = new String(
      rowResult.get(Bytes.toBytes("columnfamily1:column1")).getValue());
  String value2 = new String(
      rowResult.get(Bytes.toBytes("columnfamily1:column2")).getValue());
  System.out.println(value1 + ", " + value2);
}
}}}

== Secondary indexes in HBase: complete example code ==
[http://rajeev1982.blogspot.com/2009/06/secondary-indexes-in-hbase.html]

{{{
#!java
import java.io.IOException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HConstants;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.Scanner;
import org.apache.hadoop.hbase.client.tableindexed.IndexSpecification;
import org.apache.hadoop.hbase.client.tableindexed.IndexedTable;
import org.apache.hadoop.hbase.client.tableindexed.IndexedTableAdmin;
import org.apache.hadoop.hbase.filter.ColumnValueFilter;
import org.apache.hadoop.hbase.filter.ColumnValueFilter.CompareOp;
import org.apache.hadoop.hbase.io.BatchUpdate;
import org.apache.hadoop.hbase.io.RowResult;
import org.apache.hadoop.hbase.util.Bytes;

public class SecondaryIndexTest {

  public void writeToTable() throws IOException {
    HBaseConfiguration conf = new HBaseConfiguration();
    conf.addResource(new Path("/opt/hbase-0.19.3/conf/hbase-site.xml"));

    IndexedTable table = new IndexedTable(conf, Bytes.toBytes("test_table"));

    // Load 100 rows; the index tables are maintained automatically on commit.
    String row = "test_row";
    BatchUpdate update = null;
    for (int i = 0; i < 100; i++) {
      update = new BatchUpdate(row + i);
      update.put("columnfamily1:column1", Bytes.toBytes("value1-" + i));
      update.put("columnfamily1:column2", Bytes.toBytes("value2-" + i));
      table.commit(update);
    }
    table.close();
  }

  public void readAllRowsFromSecondaryIndex() throws IOException {
    HBaseConfiguration conf = new HBaseConfiguration();
    conf.addResource(new Path("/opt/hbase-0.19.3/conf/hbase-site.xml"));

    IndexedTable table = new IndexedTable(conf, Bytes.toBytes("test_table"));

    Scanner scanner = table.getIndexedScanner("column1",
        HConstants.EMPTY_START_ROW, null, null,
        new byte[][] { Bytes.toBytes("columnfamily1:column1"),
                       Bytes.toBytes("columnfamily1:column2") });

    for (RowResult rowResult : scanner) {
      System.out.println(Bytes.toString(
          rowResult.get(Bytes.toBytes("columnfamily1:column1")).getValue())
          + ", "
          + Bytes.toString(
              rowResult.get(Bytes.toBytes("columnfamily1:column2")).getValue()));
    }
    table.close();
  }

  public void readFilteredRowsFromSecondaryIndex() throws IOException {
    HBaseConfiguration conf = new HBaseConfiguration();
    conf.addResource(new Path("/opt/hbase-0.19.3/conf/hbase-site.xml"));

    IndexedTable table = new IndexedTable(conf, Bytes.toBytes("test_table"));

    ColumnValueFilter filter =
        new ColumnValueFilter(Bytes.toBytes("columnfamily1:column1"),
            CompareOp.LESS, Bytes.toBytes("value1-40"));

    Scanner scanner = table.getIndexedScanner("column1",
        HConstants.EMPTY_START_ROW, null, filter,
        new byte[][] { Bytes.toBytes("columnfamily1:column1"),
                       Bytes.toBytes("columnfamily1:column2") });

    for (RowResult rowResult : scanner) {
      System.out.println(Bytes.toString(
          rowResult.get(Bytes.toBytes("columnfamily1:column1")).getValue())
          + ", "
          + Bytes.toString(
              rowResult.get(Bytes.toBytes("columnfamily1:column2")).getValue()));
    }
    table.close();
  }

  public void createTableWithSecondaryIndexes() throws IOException {
    HBaseConfiguration conf = new HBaseConfiguration();
    conf.addResource(new Path("/opt/hbase-0.19.3/conf/hbase-site.xml"));

    HTableDescriptor desc = new HTableDescriptor("test_table");
    // Declare the column family once; column1 and column2 are qualifiers within it.
    desc.addFamily(new HColumnDescriptor("columnfamily1:"));
    desc.addIndex(new IndexSpecification("column1",
        Bytes.toBytes("columnfamily1:column1")));
    desc.addIndex(new IndexSpecification("column2",
        Bytes.toBytes("columnfamily1:column2")));

    IndexedTableAdmin admin = new IndexedTableAdmin(conf);

    // Drop the base table and its column1 index table if they already exist.
    if (admin.tableExists(Bytes.toBytes("test_table"))) {
      if (admin.isTableEnabled("test_table")) {
        admin.disableTable(Bytes.toBytes("test_table"));
      }
      admin.deleteTable(Bytes.toBytes("test_table"));
    }
    if (admin.tableExists(Bytes.toBytes("test_table-column1"))) {
      if (admin.isTableEnabled("test_table-column1")) {
        admin.disableTable(Bytes.toBytes("test_table-column1"));
      }
      admin.deleteTable(Bytes.toBytes("test_table-column1"));
    }
    admin.createTable(desc);
  }

  public void addSecondaryIndexToExistingTable() throws IOException {
    HBaseConfiguration conf = new HBaseConfiguration();
    conf.addResource(new Path("/opt/hbase-0.19.3/conf/hbase-site.xml"));

    IndexedTableAdmin admin = new IndexedTableAdmin(conf);
    admin.addIndex(Bytes.toBytes("test_table"),
        new IndexSpecification("column2", Bytes.toBytes("columnfamily1:column2")));
  }

  public void removeSecondaryIndexFromExistingTable() throws IOException {
    HBaseConfiguration conf = new HBaseConfiguration();
    conf.addResource(new Path("/opt/hbase-0.19.3/conf/hbase-site.xml"));

    IndexedTableAdmin admin = new IndexedTableAdmin(conf);
    admin.removeIndex(Bytes.toBytes("test_table"), "column2");
  }

  public static void main(String[] args) throws IOException {
    SecondaryIndexTest test = new SecondaryIndexTest();
    test.createTableWithSecondaryIndexes();
    test.writeToTable();
    test.addSecondaryIndexToExistingTable();
    test.removeSecondaryIndexFromExistingTable();
    test.readAllRowsFromSecondaryIndex();
    test.readFilteredRowsFromSecondaryIndex();
    System.out.println("Done!");
  }
}
}}}
= Hadoop HBase data import and indexing test =
[http://bbs.telewiki.cn/blog/qianling/entry/hadoop_hbase%E6%95%B0%E6%8D%AE%E5%AF%BC%E5%85%A5%E5%92%8C%E7%B4%A2%E5%BC%95%E6%B5%8B%E8%AF%95]
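The linked post is in Chinese (the title translates to "Hadoop HBase data import and index test") and is external to this page. As a rough stand-in, here is a minimal sketch of that kind of measurement, reusing the SecondaryIndexTest class from the complete example above; the timing approach and the choice of steps are assumptions, not taken from the linked post.

{{{
#!java
import java.io.IOException;

// Minimal import-and-index timing sketch. Assumption: SecondaryIndexTest (defined
// above) is on the classpath; this simply wraps its steps with wall-clock timers.
public class ImportAndIndexTimingSketch {
    public static void main(String[] args) throws IOException {
        SecondaryIndexTest test = new SecondaryIndexTest();
        test.createTableWithSecondaryIndexes();

        long start = System.currentTimeMillis();
        test.writeToTable();                        // data import (100 BatchUpdate commits)
        long afterImport = System.currentTimeMillis();

        test.readAllRowsFromSecondaryIndex();       // full scan through the column1 index
        test.readFilteredRowsFromSecondaryIndex();  // filtered scan through the column1 index
        long afterReads = System.currentTimeMillis();

        System.out.println("import: " + (afterImport - start) + " ms, "
                + "indexed reads: " + (afterReads - afterImport) + " ms");
    }
}
}}}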