What is a secondary index (as distinguished from the primary index)
An index built on another field that is not the primary key.
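It is a second table whose contents resemble the original table but which uses a different key, which makes it easy to order the contents by that key.
For example, table1 (primary):
id  name    price  description
1   apple   10     xxx
2   orig    5      ooo
3   banana  15     vvvv
4   tomato  8      uuuu
If users want to sort by price, a secondary index table can be built:
price  name    description
5      orig    ooo
8      tomato  uuuu
10     apple   xxx
15     banana  vvvv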
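The price-ordered table can be sketched in plain Java. This is only an illustration, not HBase code: the class `PriceIndexSketch` and its methods are made up for this example, and a `TreeMap` stands in for the sorted index table. The index key is assumed to be the zero-padded price followed by the primary key, so two items with the same price would still get distinct index entries.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class PriceIndexSketch {
    // Primary table: id -> {name, price, description}, sorted by id.
    static Map<Integer, String[]> primaryTable() {
        Map<Integer, String[]> table = new TreeMap<>();
        table.put(1, new String[] {"apple", "10", "xxx"});
        table.put(2, new String[] {"orig", "5", "ooo"});
        table.put(3, new String[] {"banana", "15", "vvvv"});
        table.put(4, new String[] {"tomato", "8", "uuuu"});
        return table;
    }

    // Secondary index: (price, id) -> id, sorted by price first.
    static TreeMap<String, Integer> buildPriceIndex(Map<Integer, String[]> table) {
        TreeMap<String, Integer> index = new TreeMap<>();
        for (Map.Entry<Integer, String[]> row : table.entrySet()) {
            int price = Integer.parseInt(row.getValue()[1]);
            // Zero-pad both parts so that string order matches numeric order.
            String indexKey = String.format("%010d-%010d", price, row.getKey());
            index.put(indexKey, row.getKey());
        }
        return index;
    }

    // Walk the index in key order and look each row up in the primary table.
    static List<String> namesByPrice() {
        Map<Integer, String[]> table = primaryTable();
        List<String> names = new ArrayList<>();
        for (int id : buildPriceIndex(table).values()) {
            names.add(table.get(id)[0]);
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(namesByPrice()); // [orig, tomato, apple, banana]
    }
}
```

The primary table is never reordered; only the index is sorted by price, and each index entry points back to a row through its primary key.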
What primary and secondary indexes are
When you activate an object such as an ODS/DSO, the system automatically generates an index based on the key fields; this is the primary index.
Any additional indexes you create are called secondary indexes.
The primary index is distinguished from the secondary indexes of a table. The primary index contains the key fields of the table and a pointer to the non-key fields of the table. The primary index is created automatically when the table is created in the database.
You can also create further indexes on a table. These are called secondary indexes. This is necessary if the table is frequently accessed in a way that does not take advantage of the sorting of the primary index for the access. Different indexes on the same table are distinguished with a three-place index identifier.
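As a rough illustration of why a secondary index helps when access does not follow the primary key, here is a minimal in-memory sketch. Everything in it is hypothetical (the class `IndexLookupSketch`, the `city` field, the sample rows); it does not model any particular database. The primary index maps the key to the non-key fields, and the secondary index maps a non-key field back to the keys.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class IndexLookupSketch {
    // Primary index: key field -> the non-key fields {name, city} of the row.
    private final TreeMap<Integer, String[]> primary = new TreeMap<>();
    // Secondary index: non-key field (city) -> keys of the rows holding that value.
    private final Map<String, List<Integer>> byCity = new HashMap<>();

    // Assumes each key is inserted once; updates and deletes are left out.
    void insert(int key, String name, String city) {
        primary.put(key, new String[] {name, city});
        byCity.computeIfAbsent(city, c -> new ArrayList<>()).add(key);
    }

    // Without the secondary index: every row has to be visited.
    List<String> namesInCityByScan(String city) {
        List<String> names = new ArrayList<>();
        for (String[] fields : primary.values()) {
            if (fields[1].equals(city)) {
                names.add(fields[0]);
            }
        }
        return names;
    }

    // With the secondary index: go straight to the matching keys.
    List<String> namesInCity(String city) {
        List<String> names = new ArrayList<>();
        for (int key : byCity.getOrDefault(city, Collections.emptyList())) {
            names.add(primary.get(key)[0]);
        }
        return names;
    }

    // Hypothetical sample data.
    static IndexLookupSketch sample() {
        IndexLookupSketch table = new IndexLookupSketch();
        table.insert(1, "alice", "Taipei");
        table.insert(2, "bob", "Tainan");
        table.insert(3, "carol", "Taipei");
        return table;
    }

    public static void main(String[] args) {
        System.out.println(sample().namesInCity("Taipei")); // [alice, carol]
    }
}
```

The price of the faster lookup is that every write has to maintain both structures.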
Secondary index on HBase
http://kdpeterson.net/blog/2009/09/using-hbase-tableindexed-from-thrift-with-unique-keys.html
HBase is primarily a sorted distributed hash map, but it does support secondary keys through a contrib package called Transactional HBase. The secondary keys are provided by a component called TableIndexed.
Details of HBase's secondary index
> If you're using the default key generator
> (org.apache.hadoop.hbase.client.tableindexed.SimpleIndexKeyGenerator),
> it actually appends the base table row key for you. So even though
> the column value may be the same for multiple rows, the secondary
> index table will still have 1 row for each row with the value in the
> original table. Here is the relevant method from SimpleIndexKeyGenerator:
>
> public byte[] createIndexKey(byte[] rowKey, Map<byte[], byte[]> columns) {
>   return Bytes.add(columns.get(column), rowKey);
> }
>
> So, say you have a table "mytable", with the columns:
> info:keycol (say this is the one you want to index)
> info:col2
> info:col3
>
> If you define your table with the index specification -- new
> IndexSpecification("keycol", Bytes.toBytes("info:keycol")) -- then
> HBase will create the secondary index table named "mytable-by_keycol".
>
> Then, say you add the following rows to "mytable":
>
> "row1": info:keycol="one", info:col2="abc", info:col3="def"
> "row2": info:keycol="one", info:col2="ghi", info:col3="jkl"
>
> At this point, your index table ("mytable-by_keycol") will have the
> following rows:
>
> "onerow1": info:keycol="one", __INDEX__:ROW="row1"
> "onerow2": info:keycol="one", __INDEX__:ROW="row2"
>
> So you wind up with 2 rows in the index table (with unique row keys)
> pointing back at the original table rows, even though we've only
> stored a single distinct value for info:keycol.
>
> To access the rows by the secondary index, create a scanner using
> IndexedTable.getIndexedScanner(...). I don't think there's support
> for using the indexes when performing a random read with
> HTable.getRow()/HTable.get(). (But maybe I'm wrong?)
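The key construction described in the quote can be tried out without an HBase installation. The sketch below only mimics the idea of `createIndexKey` (indexed column value first, base row key appended); `IndexKeySketch` and `indexKeyFor` are names invented here, and plain array copying stands in for `Bytes.add`.

```java
import java.nio.charset.StandardCharsets;
import java.util.TreeMap;

class IndexKeySketch {
    // Same idea as the quoted createIndexKey: column value, then base row key.
    static byte[] createIndexKey(byte[] rowKey, byte[] columnValue) {
        byte[] key = new byte[columnValue.length + rowKey.length];
        System.arraycopy(columnValue, 0, key, 0, columnValue.length);
        System.arraycopy(rowKey, 0, key, columnValue.length, rowKey.length);
        return key;
    }

    // String convenience wrapper, for readability of the example only.
    static String indexKeyFor(String rowKey, String columnValue) {
        byte[] key = createIndexKey(
            rowKey.getBytes(StandardCharsets.UTF_8),
            columnValue.getBytes(StandardCharsets.UTF_8));
        return new String(key, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Stand-in for the index table: index row key -> base table row key.
        TreeMap<String, String> indexTable = new TreeMap<>();
        indexTable.put(indexKeyFor("row1", "one"), "row1");
        indexTable.put(indexKeyFor("row2", "one"), "row2");
        System.out.println(indexTable); // {onerow1=row1, onerow2=row2}
    }
}
```

Because the base row key is part of the index key, two rows sharing the value "one" still produce two distinct index rows, and all rows for one value sit next to each other in sorted order.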
Secondary indexes in HBase: code walkthrough (v0.19)
conf/hbase-site.xml
Edit $HBASE_INSTALL_DIR/conf/hbase-site.xml and add the following properties to it.
<property>
  <name>hbase.regionserver.class</name>
  <value>org.apache.hadoop.hbase.ipc.IndexedRegionInterface</value>
</property>
<property>
  <name>hbase.regionserver.impl</name>
  <value>org.apache.hadoop.hbase.regionserver.tableindexed.IndexedRegionServer</value>
</property>
Adding a secondary index while creating a table
HBaseConfiguration conf = new HBaseConfiguration();
conf.addResource(new Path("/opt/hbase-0.19.3/conf/hbase-site.xml"));

// Describe the table and its column families.
HTableDescriptor desc = new HTableDescriptor("test_table");
desc.addFamily(new HColumnDescriptor("columnfamily1:"));
desc.addFamily(new HColumnDescriptor("columnfamily2:"));

// Declare one index per indexed column: (index id, indexed column).
desc.addIndex(new IndexSpecification("column1", Bytes.toBytes("columnfamily1:column1")));
desc.addIndex(new IndexSpecification("column2", Bytes.toBytes("columnfamily1:column2")));

// Create the table through IndexedTableAdmin so the indexes are set up with it.
IndexedTableAdmin admin = new IndexedTableAdmin(conf);
admin.createTable(desc);
Adding an index to an existing table
HBaseConfiguration conf = new HBaseConfiguration();
conf.addResource(new Path("/opt/hbase-0.19.3/conf/hbase-site.xml"));

// Add an index with id "column2" on columnfamily1:column2 of test_table.
IndexedTableAdmin admin = new IndexedTableAdmin(conf);
admin.addIndex(Bytes.toBytes("test_table"),
    new IndexSpecification("column2", Bytes.toBytes("columnfamily1:column2")));
Deleting an existing index from a table
HBaseConfiguration conf = new HBaseConfiguration();
conf.addResource(new Path("/opt/hbase-0.19.3/conf/hbase-site.xml"));

// Remove the index with id "column2" from test_table.
IndexedTableAdmin admin = new IndexedTableAdmin(conf);
admin.removeIndex(Bytes.toBytes("test_table"), "column2");
To read from a secondary index, get a scanner for the index and scan through the data
HBaseConfiguration conf = new HBaseConfiguration();
conf.addResource(new Path("/opt/hbase-0.19.3/conf/hbase-site.xml"));
IndexedTable table = new IndexedTable(conf, Bytes.toBytes("test_table"));

// You need to specify which columns to get.
Scanner scanner = table.getIndexedScanner("column1",
    HConstants.EMPTY_START_ROW, null, null,
    new byte[][] {
        Bytes.toBytes("columnfamily1:column1"),
        Bytes.toBytes("columnfamily1:column2") });

// Rows come back in the order of the "column1" index.
for (RowResult rowResult : scanner) {
    String value1 = new String(
        rowResult.get(Bytes.toBytes("columnfamily1:column1")).getValue());
    String value2 = new String(
        rowResult.get(Bytes.toBytes("columnfamily1:column2")).getValue());
    System.out.println(value1 + ", " + value2);
}
table.close();
To get a scanner over a subset of the rows, specify a column filter
// Reuses the table and scanner variables from the previous snippet.
// Keep only rows whose columnfamily1:column1 value is less than "value1-10".
ColumnValueFilter filter = new ColumnValueFilter(
    Bytes.toBytes("columnfamily1:column1"),
    CompareOp.LESS, Bytes.toBytes("value1-10"));
scanner = table.getIndexedScanner("column1",
    HConstants.EMPTY_START_ROW, null, filter,
    new byte[][] {
        Bytes.toBytes("columnfamily1:column1"),
        Bytes.toBytes("columnfamily1:column2") });
for (RowResult rowResult : scanner) {
    String value1 = new String(
        rowResult.get(Bytes.toBytes("columnfamily1:column1")).getValue());
    String value2 = new String(
        rowResult.get(Bytes.toBytes("columnfamily1:column2")).getValue());
    System.out.println(value1 + ", " + value2);
}
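One thing worth checking before relying on a filter like this is how the comparison behaves. The sketch below assumes the filter compares the stored bytes lexicographically (byte by byte, not numerically) and that the table holds the values value1-0 to value1-99 written by the complete example on this page. It runs without HBase; `LexicographicFilterSketch` and its methods are names invented here.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class LexicographicFilterSketch {
    // Unsigned byte-by-byte comparison; an array that is a prefix of the other sorts first.
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    // Of the values "value1-0" .. "value1-(count-1)", keep those sorting strictly before bound.
    static List<String> valuesLessThan(String bound, int count) {
        byte[] boundBytes = bound.getBytes(StandardCharsets.UTF_8);
        List<String> matches = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            String value = "value1-" + i;
            if (compareBytes(value.getBytes(StandardCharsets.UTF_8), boundBytes) < 0) {
                matches.add(value);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        System.out.println(valuesLessThan("value1-10", 100)); // [value1-0, value1-1]
    }
}
```

Under that assumption, "less than value1-10" matches only value1-0 and value1-1, not value1-0 through value1-9, because "value1-2" sorts after "value1-10" as bytes. Zero-padding the numbers when writing (for example value1-02) would make byte order agree with numeric order.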
Secondary indexes in HBase: complete code
http://rajeev1982.blogspot.com/2009/06/secondary-indexes-in-hbase.html
import java.io.IOException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HConstants;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.Scanner;
import org.apache.hadoop.hbase.client.tableindexed.IndexSpecification;
import org.apache.hadoop.hbase.client.tableindexed.IndexedTable;
import org.apache.hadoop.hbase.client.tableindexed.IndexedTableAdmin;
import org.apache.hadoop.hbase.filter.ColumnValueFilter;
import org.apache.hadoop.hbase.filter.ColumnValueFilter.CompareOp;
import org.apache.hadoop.hbase.io.BatchUpdate;
import org.apache.hadoop.hbase.io.RowResult;
import org.apache.hadoop.hbase.util.Bytes;

public class SecondaryIndexTest {

    // Writes 100 rows (test_row0 .. test_row99) with two columns each.
    public void writeToTable() throws IOException {
        HBaseConfiguration conf = new HBaseConfiguration();
        conf.addResource(new Path("/opt/hbase-0.19.3/conf/hbase-site.xml"));
        IndexedTable table = new IndexedTable(conf, Bytes.toBytes("test_table"));
        String row = "test_row";
        BatchUpdate update = null;
        for (int i = 0; i < 100; i++) {
            update = new BatchUpdate(row + i);
            update.put("columnfamily1:column1", Bytes.toBytes("value1-" + i));
            update.put("columnfamily1:column2", Bytes.toBytes("value2-" + i));
            table.commit(update);
        }
        table.close();
    }

    // Scans every row through the "column1" index.
    public void readAllRowsFromSecondaryIndex() throws IOException {
        HBaseConfiguration conf = new HBaseConfiguration();
        conf.addResource(new Path("/opt/hbase-0.19.3/conf/hbase-site.xml"));
        IndexedTable table = new IndexedTable(conf, Bytes.toBytes("test_table"));
        Scanner scanner = table.getIndexedScanner("column1",
            HConstants.EMPTY_START_ROW, null, null,
            new byte[][] {
                Bytes.toBytes("columnfamily1:column1"),
                Bytes.toBytes("columnfamily1:column2") });
        for (RowResult rowResult : scanner) {
            System.out.println(
                Bytes.toString(rowResult.get(Bytes.toBytes("columnfamily1:column1")).getValue())
                + ", "
                + Bytes.toString(rowResult.get(Bytes.toBytes("columnfamily1:column2")).getValue()));
        }
        table.close();
    }

    // Scans through the "column1" index, keeping rows whose column1 value is less than "value1-40".
    public void readFilteredRowsFromSecondaryIndex() throws IOException {
        HBaseConfiguration conf = new HBaseConfiguration();
        conf.addResource(new Path("/opt/hbase-0.19.3/conf/hbase-site.xml"));
        IndexedTable table = new IndexedTable(conf, Bytes.toBytes("test_table"));
        ColumnValueFilter filter = new ColumnValueFilter(
            Bytes.toBytes("columnfamily1:column1"),
            CompareOp.LESS, Bytes.toBytes("value1-40"));
        Scanner scanner = table.getIndexedScanner("column1",
            HConstants.EMPTY_START_ROW, null, filter,
            new byte[][] {
                Bytes.toBytes("columnfamily1:column1"),
                Bytes.toBytes("columnfamily1:column2") });
        for (RowResult rowResult : scanner) {
            System.out.println(
                Bytes.toString(rowResult.get(Bytes.toBytes("columnfamily1:column1")).getValue())
                + ", "
                + Bytes.toString(rowResult.get(Bytes.toBytes("columnfamily1:column2")).getValue()));
        }
        table.close();
    }

    // Drops any previous test tables, then creates test_table with two indexes.
    public void createTableWithSecondaryIndexes() throws IOException {
        HBaseConfiguration conf = new HBaseConfiguration();
        conf.addResource(new Path("/opt/hbase-0.19.3/conf/hbase-site.xml"));
        HTableDescriptor desc = new HTableDescriptor("test_table");
        // Family names end with ':' (same families as in the walkthrough snippet);
        // the original listing passed "columnfamily1:column1" and "columnfamily1:column2" here.
        desc.addFamily(new HColumnDescriptor("columnfamily1:"));
        desc.addFamily(new HColumnDescriptor("columnfamily2:"));
        desc.addIndex(new IndexSpecification("column1", Bytes.toBytes("columnfamily1:column1")));
        desc.addIndex(new IndexSpecification("column2", Bytes.toBytes("columnfamily1:column2")));
        IndexedTableAdmin admin = new IndexedTableAdmin(conf);
        if (admin.tableExists(Bytes.toBytes("test_table"))) {
            if (admin.isTableEnabled("test_table")) {
                admin.disableTable(Bytes.toBytes("test_table"));
            }
            admin.deleteTable(Bytes.toBytes("test_table"));
        }
        if (admin.tableExists(Bytes.toBytes("test_table-column1"))) {
            if (admin.isTableEnabled("test_table-column1")) {
                admin.disableTable(Bytes.toBytes("test_table-column1"));
            }
            admin.deleteTable(Bytes.toBytes("test_table-column1"));
        }
        admin.createTable(desc);
    }

    public void addSecondaryIndexToExistingTable() throws IOException {
        HBaseConfiguration conf = new HBaseConfiguration();
        conf.addResource(new Path("/opt/hbase-0.19.3/conf/hbase-site.xml"));
        IndexedTableAdmin admin = new IndexedTableAdmin(conf);
        admin.addIndex(Bytes.toBytes("test_table"),
            new IndexSpecification("column2", Bytes.toBytes("columnfamily1:column2")));
    }

    public void removeSecondaryIndexToExistingTable() throws IOException {
        HBaseConfiguration conf = new HBaseConfiguration();
        conf.addResource(new Path("/opt/hbase-0.19.3/conf/hbase-site.xml"));
        IndexedTableAdmin admin = new IndexedTableAdmin(conf);
        admin.removeIndex(Bytes.toBytes("test_table"), "column2");
    }

    public static void main(String[] args) throws IOException {
        SecondaryIndexTest test = new SecondaryIndexTest();
        test.createTableWithSecondaryIndexes();
        test.writeToTable();
        test.addSecondaryIndexToExistingTable();
        test.removeSecondaryIndexToExistingTable();
        test.readAllRowsFromSecondaryIndex();
        test.readFilteredRowsFromSecondaryIndex();
        System.out.println("Done!");
    }
}