wiki:waue/2010/0128

Version 4 (modified by waue, 11 years ago)

--

HBase Primary Table vs. Indexed Table
Indexed Table = Secondary Index = Transactional HBase

What is a secondary index (as distinguished from a primary index)

Explain what primary and secondary indexes are.

When you activate an object, say an ODS / DSO, the system automatically generates an index based on the key fields; this is the primary index.

If you create additional indexes, they are called secondary indexes.

The primary index is distinguished from the secondary indexes of a table. The primary index contains the key fields of the table and a pointer to the non-key fields of the table. The primary index is created automatically when the table is created in the database.

You can also create further indexes on a table. These are called secondary indexes. They are needed when the table is frequently accessed in a way that does not take advantage of the sort order of the primary index. Different indexes on the same table are distinguished by a three-character index identifier.
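The distinction can be sketched with two sorted maps. This is only an illustrative in-memory model, not SAP or HBase code: the class name IndexSketch, the "city" field, and the sample rows are all made up for the example. The primary index maps the key field to the row, and the secondary index maps a non-key field value back to the keys of the rows that hold it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical in-memory table, used only to illustrate the two kinds of index.
class IndexSketch {
    // Primary index: key field -> non-key field, kept sorted by key.
    private final TreeMap<String, String> rowsByKey = new TreeMap<>();
    // Secondary index: non-key field value -> keys of the rows holding it.
    private final TreeMap<String, List<String>> keysByCity = new TreeMap<>();

    void put(String key, String city) {
        String oldCity = rowsByKey.put(key, city);
        if (oldCity != null) {
            // The row already existed: drop its stale secondary index entry.
            List<String> keys = keysByCity.get(oldCity);
            keys.remove(key);
            if (keys.isEmpty()) {
                keysByCity.remove(oldCity);
            }
        }
        keysByCity.computeIfAbsent(city, c -> new ArrayList<>()).add(key);
    }

    // Lookup through the primary index.
    String getByKey(String key) {
        return rowsByKey.get(key);
    }

    // Lookup through the secondary index, avoiding a scan of every row.
    List<String> findKeysByCity(String city) {
        return keysByCity.getOrDefault(city, new ArrayList<>());
    }

    public static void main(String[] args) {
        IndexSketch table = new IndexSketch();
        table.put("row1", "Taipei");
        table.put("row2", "Hsinchu");
        table.put("row3", "Taipei");
        System.out.println(table.getByKey("row2"));         // Hsinchu
        System.out.println(table.findKeysByCity("Taipei")); // [row1, row3]
    }
}
```

Note that every write has to update both maps, which is the maintenance cost a secondary index adds.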

Secondary index on HBase

http://kdpeterson.net/blog/2009/09/using-hbase-tableindexed-from-thrift-with-unique-keys.html

HBase is primarily a sorted distributed hash map, but it does support secondary keys through a contrib package called Transactional HBase. The secondary keys are provided by a component called TableIndexed.

HBase secondary index details

http://mail-archives.apache.org/mod_mbox/hadoop-hbase-user/200908.mbox/%3C73d592f60908180339j3231f7f5w688260109ab3463@mail.gmail.com%3E

> If you're using the default key generator
> (org.apache.hadoop.hbase.client.tableindexed.SimpleIndexKeyGenerator),
> it actually appends the base table row key for you.  So even though
> the column value may be the same for multiple rows, the secondary
> index table will still have 1 row for each row with the value in the
> original table.  Here is relevant method from SimpleIndexKeyGenerator:
>
>  public byte[] createIndexKey(byte[] rowKey, Map<byte[], byte[]> columns) {
>    return Bytes.add(columns.get(column), rowKey);
>  }
>
> So, say you have a table "mytable", with the columns:
>    info:keycol       (say this is the one you want to index)
>    info:col2
>    info:col3
>
> If you define your table with the index specification -- new
> IndexSpecification("keycol", Bytes.toBytes("info:keycol")) -- then
> HBase will create the secondary index table named "mytable-by_keycol".
>
> Then, say you add the following rows to "mytable":
>
> "row1":  info:keycol="one", info:col2="abc", info:col3="def"
> "row2":  info:keycol="one", info:col2="ghi", info:col3="jkl"
>
> At this point, your index table ("mytable-by_keycol") will have the
> following rows:
>
> "onerow1": info:keycol="one", __INDEX__:ROW="row1"
> "onerow2": info:keycol="one", __INDEX__:ROW="row2"
>
> So you wind up with 2 rows in the index table (with unique row keys)
> pointing back at the original table rows, even though we've only
> stored a single distinct value for info:keycol.
>
> To access the rows by the secondary index, create a scanner using
> IndexedTable.getIndexedScanner(...).  I don't think there's support
> for using the indexes when performing a random read with
> HTable.getRow()/HTable.get().  (But maybe I'm wrong?)
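The key construction described in the message can be tried out without a cluster. The sketch below is plain Java and does not use any HBase classes: the class IndexKeySketch and its helper methods are hypothetical, and a TreeMap stands in for the index table. It only reproduces the concatenation that Bytes.add(columns.get(column), rowKey) performs in the quoted method.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for the key generator; no HBase dependency.
class IndexKeySketch {
    // Concatenate the indexed column value and the base table row key,
    // mirroring the quoted Bytes.add(columns.get(column), rowKey).
    static byte[] createIndexKey(byte[] columnValue, byte[] rowKey) {
        byte[] key = new byte[columnValue.length + rowKey.length];
        System.arraycopy(columnValue, 0, key, 0, columnValue.length);
        System.arraycopy(rowKey, 0, key, columnValue.length, rowKey.length);
        return key;
    }

    // Convenience wrapper so the example can work with readable strings.
    static String indexKey(String columnValue, String rowKey) {
        byte[] key = createIndexKey(
            columnValue.getBytes(StandardCharsets.UTF_8),
            rowKey.getBytes(StandardCharsets.UTF_8));
        return new String(key, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Sorted map as a stand-in for the index table:
        // index row key -> base table row key (the __INDEX__:ROW pointer).
        Map<String, String> indexTable = new TreeMap<>();
        indexTable.put(indexKey("one", "row1"), "row1");
        indexTable.put(indexKey("one", "row2"), "row2");

        // Two distinct index rows even though both base rows hold "one".
        System.out.println(indexTable); // {onerow1=row1, onerow2=row2}
    }
}
```

Because the base row key is appended, index rows for equal column values stay unique and sort next to each other, so a scan over the index returns them together.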

Secondary indexes in HBase: code walkthrough (v0.19)

conf/hbase-site.xml

Edit $HBASE_INSTALL_DIR/conf/hbase-site.xml and add the following properties to it.

    <property>
        <name>hbase.regionserver.class</name>
        <value>org.apache.hadoop.hbase.ipc.IndexedRegionInterface</value>
    </property>

    <property>
        <name>hbase.regionserver.impl</name>
        <value>org.apache.hadoop.hbase.regionserver.tableindexed.IndexedRegionServer</value>
    </property>

Adding a secondary index while creating a table

    HBaseConfiguration conf = new HBaseConfiguration();
    conf.addResource(new Path("/opt/hbase-0.19.3/conf/hbase-site.xml"));

    HTableDescriptor desc = new HTableDescriptor("test_table");

    desc.addFamily(new HColumnDescriptor("columnfamily1:"));
    desc.addFamily(new HColumnDescriptor("columnfamily2:"));

    desc.addIndex(new IndexSpecification("column1",
        Bytes.toBytes("columnfamily1:column1")));
    desc.addIndex(new IndexSpecification("column2",
        Bytes.toBytes("columnfamily1:column2")));

    // IndexedTableAdmin creates the index tables along with the base table
    IndexedTableAdmin admin = new IndexedTableAdmin(conf);

    admin.createTable(desc);

Adding an index to an existing table

    HBaseConfiguration conf = new HBaseConfiguration();
    conf.addResource(new Path("/opt/hbase-0.19.3/conf/hbase-site.xml"));

    IndexedTableAdmin admin = new IndexedTableAdmin(conf);

    // Index id "column2" on the column columnfamily1:column2
    admin.addIndex(Bytes.toBytes("test_table"), new IndexSpecification("column2",
        Bytes.toBytes("columnfamily1:column2")));

Deleting an existing index from a table

    HBaseConfiguration conf = new HBaseConfiguration();
    conf.addResource(new Path("/opt/hbase-0.19.3/conf/hbase-site.xml"));

    IndexedTableAdmin admin = new IndexedTableAdmin(conf);

    admin.removeIndex(Bytes.toBytes("test_table"), "column2");

To read from a secondary index, get a scanner for the index and scan through the data

    HBaseConfiguration conf = new HBaseConfiguration();
    conf.addResource(new Path("/opt/hbase-0.19.3/conf/hbase-site.xml"));

    IndexedTable table = new IndexedTable(conf, Bytes.toBytes("test_table"));

    // You need to specify which columns to get
    Scanner scanner = table.getIndexedScanner("column1",
        HConstants.EMPTY_START_ROW, null, null, new byte[][] {
        Bytes.toBytes("columnfamily1:column1"),
        Bytes.toBytes("columnfamily1:column2") });

    for (RowResult rowResult : scanner) {
        String value1 = new String(
            rowResult.get(Bytes.toBytes("columnfamily1:column1")).getValue());
        String value2 = new String(
            rowResult.get(Bytes.toBytes("columnfamily1:column2")).getValue());
        System.out.println(value1 + ", " + value2);
    }

    table.close();

To get a scanner over a subset of the rows, specify a column filter

    ColumnValueFilter filter =
        new ColumnValueFilter(Bytes.toBytes("columnfamily1:column1"),
        CompareOp.LESS, Bytes.toBytes("value1-10"));

    scanner = table.getIndexedScanner("column1", HConstants.EMPTY_START_ROW,
        null, filter, new byte[][] { Bytes.toBytes("columnfamily1:column1"),
        Bytes.toBytes("columnfamily1:column2") });

    for (RowResult rowResult : scanner) {
        String value1 = new String(
            rowResult.get(Bytes.toBytes("columnfamily1:column1")).getValue());
        String value2 = new String(
            rowResult.get(Bytes.toBytes("columnfamily1:column2")).getValue());
        System.out.println(value1 + ", " + value2);
    }
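One thing to keep in mind with a bound such as "value1-10" is that the values are stored as bytes. The sketch below assumes the filter compares those bytes lexicographically, which is an assumption about ColumnValueFilter rather than something stated in these notes. It is plain Java with no HBase classes, and the class LexicographicFilterSketch and its methods are hypothetical. It lists which of the values "value1-0" to "value1-99" sort below the bound under that assumption.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; shows byte-wise ordering only, not HBase's filter code.
class LexicographicFilterSketch {
    // Unsigned byte-by-byte comparison; on a shared prefix the shorter array sorts first.
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    // Values "value1-0" .. "value1-(count-1)" that sort strictly below the bound.
    static List<String> valuesLessThan(String bound, int count) {
        byte[] boundBytes = bound.getBytes(StandardCharsets.UTF_8);
        List<String> kept = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            String value = "value1-" + i;
            if (compareBytes(value.getBytes(StandardCharsets.UTF_8), boundBytes) < 0) {
                kept.add(value);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Not the ten values a numeric comparison would give.
        System.out.println(valuesLessThan("value1-10", 100)); // [value1-0, value1-1]
    }
}
```

If numeric ordering is wanted, one common approach is to zero-pad the numbers (for example "value1-010") so that byte order and numeric order agree.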

Secondary indexes in HBase: complete code

http://rajeev1982.blogspot.com/2009/06/secondary-indexes-in-hbase.html

import java.io.IOException;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HConstants;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.Scanner;
import org.apache.hadoop.hbase.client.tableindexed.IndexSpecification;
import org.apache.hadoop.hbase.client.tableindexed.IndexedTable;
import org.apache.hadoop.hbase.client.tableindexed.IndexedTableAdmin;
import org.apache.hadoop.hbase.filter.ColumnValueFilter;
import org.apache.hadoop.hbase.filter.ColumnValueFilter.CompareOp;
import org.apache.hadoop.hbase.io.BatchUpdate;
import org.apache.hadoop.hbase.io.RowResult;
import org.apache.hadoop.hbase.util.Bytes;

public class SecondaryIndexTest {
    public void writeToTable() throws IOException {
        HBaseConfiguration conf = new HBaseConfiguration();
        conf.addResource(new Path("/opt/hbase-0.19.3/conf/hbase-site.xml"));

        IndexedTable table = new IndexedTable(conf, Bytes.toBytes("test_table"));

        String row = "test_row";
        BatchUpdate update = null;

        for (int i = 0; i < 100; i++) {
            update = new BatchUpdate(row + i);
            update.put("columnfamily1:column1", Bytes.toBytes("value1-" + i));
            update.put("columnfamily1:column2", Bytes.toBytes("value2-" + i));
            table.commit(update);
        }

        table.close();
    }

    public void readAllRowsFromSecondaryIndex() throws IOException {
        HBaseConfiguration conf = new HBaseConfiguration();
        conf.addResource(new Path("/opt/hbase-0.19.3/conf/hbase-site.xml"));

        IndexedTable table = new IndexedTable(conf, Bytes.toBytes("test_table"));

        Scanner scanner = table.getIndexedScanner("column1",
            HConstants.EMPTY_START_ROW, null, null, new byte[][] {
            Bytes.toBytes("columnfamily1:column1"),
                Bytes.toBytes("columnfamily1:column2") });

        for (RowResult rowResult : scanner) {
            System.out.println(Bytes.toString(
                rowResult.get(Bytes.toBytes("columnfamily1:column1")).getValue())
                + ", " + Bytes.toString(rowResult.get(
                Bytes.toBytes("columnfamily1:column2")).getValue()
                ));
        }

        table.close();
    }

    public void readFilteredRowsFromSecondaryIndex() throws IOException {
        HBaseConfiguration conf = new HBaseConfiguration();
        conf.addResource(new Path("/opt/hbase-0.19.3/conf/hbase-site.xml"));

        IndexedTable table = new IndexedTable(conf, Bytes.toBytes("test_table"));

        ColumnValueFilter filter = 
            new ColumnValueFilter(Bytes.toBytes("columnfamily1:column1"), 
            CompareOp.LESS, Bytes.toBytes("value1-40"));

        Scanner scanner = table.getIndexedScanner("column1", 
            HConstants.EMPTY_START_ROW, null, filter, 
            new byte[][] { Bytes.toBytes("columnfamily1:column1"),
                Bytes.toBytes("columnfamily1:column2")
            });

        for (RowResult rowResult : scanner) {
            System.out.println(Bytes.toString(
                rowResult.get(Bytes.toBytes("columnfamily1:column1")).getValue())
                + ", " + Bytes.toString(rowResult.get(
                Bytes.toBytes("columnfamily1:column2")).getValue()
                ));
        }

        table.close();
    }

    public void createTableWithSecondaryIndexes() throws IOException {
        HBaseConfiguration conf = new HBaseConfiguration();
        conf.addResource(new Path("/opt/hbase-0.19.3/conf/hbase-site.xml"));

        HTableDescriptor desc = new HTableDescriptor("test_table");

        // Family names end with ':'; qualifiers such as column1 are given at write time
        desc.addFamily(new HColumnDescriptor("columnfamily1:"));
        desc.addFamily(new HColumnDescriptor("columnfamily2:"));

        desc.addIndex(new IndexSpecification("column1",
            Bytes.toBytes("columnfamily1:column1")));
        desc.addIndex(new IndexSpecification("column2",
            Bytes.toBytes("columnfamily1:column2")));

        IndexedTableAdmin admin = new IndexedTableAdmin(conf);

        if (admin.tableExists(Bytes.toBytes("test_table"))) {
            if (admin.isTableEnabled("test_table")) {
                admin.disableTable(Bytes.toBytes("test_table"));
            }

            admin.deleteTable(Bytes.toBytes("test_table"));
        }

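        // Drop index tables left over from an earlier run, assuming they are
        // named "<base table>-<index id>" as in the original check.
        for (String indexTable : new String[] {
                "test_table-column1", "test_table-column2" }) {
            if (admin.tableExists(Bytes.toBytes(indexTable))) {
                if (admin.isTableEnabled(indexTable)) {
                    admin.disableTable(Bytes.toBytes(indexTable));
                }

                admin.deleteTable(Bytes.toBytes(indexTable));
            }
        }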

        admin.createTable(desc);
    }

    public void addSecondaryIndexToExistingTable() throws IOException {
        HBaseConfiguration conf = new HBaseConfiguration();
        conf.addResource(new Path("/opt/hbase-0.19.3/conf/hbase-site.xml"));

        IndexedTableAdmin admin = new IndexedTableAdmin(conf);

        admin.addIndex(Bytes.toBytes("test_table"), 
            new IndexSpecification("column2", 
            Bytes.toBytes("columnfamily1:column2")));
    }

    public void removeSecondaryIndexToExistingTable() throws IOException {
        HBaseConfiguration conf = new HBaseConfiguration();
        conf.addResource(new Path("/opt/hbase-0.19.3/conf/hbase-site.xml"));

        IndexedTableAdmin admin = new IndexedTableAdmin(conf);

        admin.removeIndex(Bytes.toBytes("test_table"), "column2");
    }

    public static void main(String[] args) throws IOException {
        SecondaryIndexTest test = new SecondaryIndexTest();

        test.createTableWithSecondaryIndexes();
        test.writeToTable();
        test.addSecondaryIndexToExistingTable();
        test.removeSecondaryIndexToExistingTable();
        test.readAllRowsFromSecondaryIndex();
        test.readFilteredRowsFromSecondaryIndex();

        System.out.println("Done!");
    }
}

Hadoop HBase data import and index testing

http://bbs.telewiki.cn/blog/qianling/entry/hadoop_hbase%E6%95%B0%E6%8D%AE%E5%AF%BC%E5%85%A5%E5%92%8C%E7%B4%A2%E5%BC%95%E6%B5%8B%E8%AF%95