首页 > 技术文章 > Sqoop导入mysql数据到Hbase

ggzone 2016-01-30 13:04 原文

sqoop import --driver com.mysql.jdbc.Driver --connect "jdbc:mysql://11.143.18.29:3306/db_1"  --username root --password root --query "select rowkey,dtTime from tb1  where 1=1 and \$CONDITIONS" --hbase-table tb1--column-family t --hbase-row-key id  -split-by dtTime -m 2

导入大表时一定要注意查看数据是否分布均匀,否则很容易出现java heap space错误

sqoop导入mysql数据到hbase时,如果带有--hbase-create-table选项出现如下错误,据说是兼容问题:

16/01/24 13:46:33 INFO zookeeper.ClientCnxn: Session establishment complete on server slave1/192.168.2.80:2181, sessionid = 0x252720e0fac0008, negotiated timeout = 40000

16/01/24 13:46:33 INFO mapreduce.HBaseImportJob: Creating missing HBase table exchangeshopflow

Exception in thread "main" java.lang.NoSuchMethodError: org.apache.hadoop.hbase.HTableDescriptor.addFamily(Lorg/apache/hadoop/hbase/HColumnDescriptor;)V
    at org.apache.sqoop.mapreduce.HBaseImportJob.jobSetup(HBaseImportJob.java:222)
    at org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:264)
    at org.apache.sqoop.manager.SqlManager.importQuery(SqlManager.java:729)
    at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:499)
    at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:605)
    at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
    at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
    at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
    at org.apache.sqoop.Sqoop.main(Sqoop.java:236)

解决方法:
手动建hbase表,然后使用sqoop导入

包含自定义field:

sqoop import --driver com.mysql.jdbc.Driver --connect "jdbc:mysql://192.168.2.80:3306/db_1"  --username root --password root --query "select concat(concat(md5(concat(PID,vid)),UNIX_TIMESTAMP(dtTime)), 100000+right(\`__#alibaba_rds_row_id#__\`,5)) as id,dttime,vid,pid,zoneid,type,count,reason,\`__#alibaba_rds_row_id#__\` from achi where  \$CONDITIONS" --hbase-table achi --column-family a --hbase-row-key id  -split-by \`__#alibaba_rds_row_id#__\`

直接导入:

sqoop import --driver com.mysql.jdbc.Driver --connect jdbc:mysql://192.168.2.80:3306/db_1"  --username root --password root --table achi --hbase-table achi --column-family a --hbase-row-key dtTime  -split-by dtTime --hbase-create-table

定时增量导入

sqoop job --create t11 -- import --driver com.mysql.jdbc.Driver --connect jdbc:mysql://192.168.2.80:3306/db_1"  --username root --password root --table t2 --hbase-table t007117 --column-family a --hbase-row-key id  -split-by id --hbase-create-table --incremental append --check-column id  --last-value 0;

定时任务

sqoop job --create t117 -- import --driver com.mysql.jdbc.Driver --connect jdbc:mysql://192.168.2.80:3306/db_1"  --username root --password root --query "select concat(concat(md5(concat(PID,vid)),UNIX_TIMESTAMP(dtTime)), 100000+right(id,5)) as key_id,t.*  from t2 as t where \$CONDITIONS" --hbase-table t007117 --column-family a --hbase-row-key id  -split-by id --hbase-create-table --incremental append --check-column id  --last-value 0;

使用–query时,–incremental append –check-column id –last-value 0;不起作用

推荐阅读