{{{ #!html
Hive
}}} [[PageOutline]] = 簡介 = * Hive 是由facebook所捐贈給hadoop的項目 * 功能類似前一天測試的cloudbase * 更簡便的一點是,Hive 已經整合在 Hadoop 0.19.1的版本內(但 0.20.0 卻又不見了),幾乎不用特別座設定,並且也有提供自己的交互查詢模式,就不用特別在安裝其他東西了 * 感覺上操作起來有點像hbase !! * 缺點是:沒有help,不知道到底有什麼語法可用 = 安裝 = 在 hadoop 0.19.1 的 contrib 內已有 hive 的專案,先匯入環境變數後在進到此資料夾執行hive就可以使用了 {{{ $ export HADOOP=/opt/hadoop/ $ export HIVE_HOME=/opt/hadoop/contrib/hive/ $ cd /opt/hadoop/contrib/hive $ bin/hive hive> }}} = 測試 = {{{ hive> CREATE TABLE pokes (foo INT, bar STRING); }}} 輸出如: {{{ #!sh OK Time taken: 0.251 seconds }}} * 執行 {{{ hive> CREATE TABLE invites (foo INT, bar STRING) PARTITIONED BY (ds STRING); }}} 輸出如: {{{ #!sh OK Time taken: 0.106 seconds }}} * 執行 {{{ hive> SHOW TABLES; }}} 輸出如: {{{ #!sh OK invites pokes Time taken: 0.107 seconds }}} * 執行 {{{ hive> DESCRIBE invites; }}} 輸出如: {{{ #!sh OK foo int bar string ds string Time taken: 0.151 seconds }}} * 執行 {{{ hive> ALTER TABLE pokes ADD COLUMNS (new_col INT); }}} 輸出如: {{{ #!sh OK Time taken: 0.117 seconds }}} * 執行 {{{ hive> ALTER TABLE invites ADD COLUMNS (new_col2 INT COMMENT 'a comment'); }}} 輸出如: {{{ #!sh OK Time taken: 0.152 seconds }}} * 執行 {{{ hive> LOAD DATA LOCAL INPATH './examples/files/kv1.txt' OVERWRITE INTO TABLE pokes; }}} 輸出如: {{{ #!sh Copying data from file:/home/hadoop/hadoop-0.19.1/contrib/hive/examples/files/kv1.txt Loading data to table pokes OK Time taken: 0.288 seconds }}} * 執行 {{{ hive> LOAD DATA LOCAL INPATH './examples/files/kv2.txt' OVERWRITE INTO TABLE invites PARTITION (ds='2008-08-15'); }}} 輸出如: {{{ #!sh Copying data from file:/home/hadoop/hadoop-0.19.1/contrib/hive/examples/files/kv2.txt Loading data to table invites partition {ds=2008-08-15} OK Time taken: 0.524 seconds }}} * 執行 {{{ hive> LOAD DATA LOCAL INPATH './examples/files/kv3.txt' OVERWRITE INTO TABLE invites PARTITION (ds='2008-08-08'); }}} 輸出如: {{{ #!sh Copying data from file:/home/hadoop/hadoop-0.19.1/contrib/hive/examples/files/kv3.txt Loading data to table invites partition {ds=2008-08-08} OK Time taken: 0.406 seconds }}} * 執行 {{{ hive> INSERT OVERWRITE DIRECTORY '/tmp/hdfs_out' SELECT a.* FROM invites a; }}} 輸出如: {{{ #!sh Total MapReduce jobs = 1 Starting Job = job_200902261245_0002, Tracking URL = http://gp1:50030/jobdetails.jsp?jobid=job_200902261245_0002 Kill Command = /home/hadoop/hadoop-0.19.1/bin/hadoop job -Dmapred.job.tracker=gp1:9001 -kill job_200902261245_0002 map = 0%, reduce =0% map = 50%, reduce =0% map = 100%, reduce =0% Ended Job = job_200902261245_0002 Moving data to: /tmp/hdfs_out OK Time taken: 18.551 seconds }}} * 執行 {{{ hive> select count(1) from pokes; }}} 輸出如: {{{ #!sh Total MapReduce jobs = 2 Number of reducers = 1 In order to change numer of reducers use: set mapred.reduce.tasks = Starting Job = job_200902261245_0003, Tracking URL = http://gp1:50030/jobdetails.jsp?jobid=job_200902261245_0003 Kill Command = /home/hadoop/hadoop-0.19.1/bin/hadoop job -Dmapred.job.tracker=gp1:9001 -kill job_200902261245_0003 map = 0%, reduce =0% map = 50%, reduce =0% map = 100%, reduce =0% map = 100%, reduce =17% map = 100%, reduce =100% Ended Job = job_200902261245_0003 Starting Job = job_200902261245_0004, Tracking URL = http://gp1:50030/jobdetails.jsp?jobid=job_200902261245_0004 Kill Command = /home/hadoop/hadoop-0.19.1/bin/hadoop job -Dmapred.job.tracker=gp1:9001 -kill job_200902261245_0004 map = 0%, reduce =0% map = 50%, reduce =0% map = 100%, reduce =0% map = 100%, reduce =100% Ended Job = job_200902261245_0004 OK 500 Time taken: 57.285 seconds }}} * 執行 {{{ hive> INSERT OVERWRITE DIRECTORY '/tmp/hdfs_out' SELECT a.* FROM invites a; }}} 輸出如: {{{ #!sh Total MapReduce jobs = 1 Starting Job = job_200902261245_0005, Tracking URL = http://gp1:50030/jobdetails.jsp?jobid=job_200902261245_0005 Kill Command = /home/hadoop/hadoop-0.19.1/bin/hadoop job -Dmapred.job.tracker=gp1:9001 -kill job_200902261245_0005 map = 0%, reduce =0% map = 50%, reduce =0% map = 100%, reduce =0% Ended Job = job_200902261245_0005 Moving data to: /tmp/hdfs_out OK Time taken: 18.349 seconds }}} * 執行 {{{ hive> INSERT OVERWRITE DIRECTORY '/tmp/reg_5' SELECT COUNT(1) FROM invites a; }}} 輸出如: {{{ #!sh Total MapReduce jobs = 2 Number of reducers = 1 In order to change numer of reducers use: set mapred.reduce.tasks = Starting Job = job_200902261245_0006, Tracking URL = http://gp1:50030/jobdetails.jsp?jobid=job_200902261245_0006 Kill Command = /home/hadoop/hadoop-0.19.1/bin/hadoop job -Dmapred.job.tracker=gp1:9001 -kill job_200902261245_0006 map = 0%, reduce =0% map = 50%, reduce =0% map = 100%, reduce =0% map = 100%, reduce =17% map = 100%, reduce =100% Ended Job = job_200902261245_0006 Starting Job = job_200902261245_0007, Tracking URL = http://gp1:50030/jobdetails.jsp?jobid=job_200902261245_0007 Kill Command = /home/hadoop/hadoop-0.19.1/bin/hadoop job -Dmapred.job.tracker=gp1:9001 -kill job_200902261245_0007 map = 0%, reduce =0% map = 50%, reduce =0% map = 100%, reduce =0% map = 100%, reduce =17% map = 100%, reduce =100% Ended Job = job_200902261245_0007 Moving data to: /tmp/reg_5 OK Time taken: 70.956 seconds }}} = 資料庫內的資料 = * 存在於 hdfs://user/hive/warehouse 內,一個table為一個資料夾