首页 > 技术文章 > DataX 实战案例 -- 使用datax实现将hdfs数据导入到mysql表中

tenic 2021-06-19 00:23 原文

  • 需求: 将hdfs上数据文件 user.txt 导入到mysql数据库的user表中。

  • 1、创建作业的配置文件(json格式)

    • 查看配置模板,执行脚本命令
      [hadoop@hadoop03 ~]$ cd /bigdata/install/datax
      [hadoop@hadoop03 datax]$ python bin/datax.py -r hdfsreader -w mysqlwriter
      
      DataX (DATAX-OPENSOURCE-3.0), From Alibaba !
      Copyright (C) 2010-2017, Alibaba Group. All Rights Reserved.
      
      Please refer to the hdfsreader document:
           https://github.com/alibaba/DataX/blob/master/hdfsreader/doc/hdfsreader.md 
      
      Please refer to the mysqlwriter document:
           https://github.com/alibaba/DataX/blob/master/mysqlwriter/doc/mysqlwriter.md 
       
      Please save the following configuration as a json file and  use
           python {DATAX_HOME}/bin/datax.py {JSON_FILE_NAME}.json 
      to run the job.
      
      {
          "job": {
              "content": [
                  {
                      "reader": {
                          "name": "hdfsreader", 
                          "parameter": {
                          "column": [], 
                          "defaultFS": "", 
                              "encoding": "UTF-8", 
                              "fieldDelimiter": ",", 
                              "fileType": "orc", 
                              "path": ""
                          }
                      }, 
                      "writer": {
                          "name": "mysqlwriter", 
                          "parameter": {
                              "column": [], 
                              "connection": [
                                  {
                                      "jdbcUrl": "", 
                                      "table": []
                                  }
                              ], 
                              "password": "", 
                              "preSql": [], 
                              "session": [], 
                              "username": "", 
                              "writeMode": ""
                          }
                      }
                  }
              ], 
              "setting": {
                  "speed": {
                      "channel": ""
                  }
              }
          }
      }
      
    • 其中hdfsreader插件文档
  • 2、根据模板写配置文件

    • 进入到 /bigdata/install/datax/job 目录,然后创建配置文件 hdfs2mysql.json, 文件内容如下:
      {
          "job": {
              "setting": {
                  "speed": {
                       "channel":1
                  }
              },
              "content": [
                  {
                      "reader": {
                          "name": "hdfsreader",
                          "parameter": {
                          "defaultFS": "hdfs://hadoop01:8020",
                              "path": "/user.txt",                  
                              "fileType": "text",
                              "encoding": "UTF-8",
                              "fieldDelimiter": "\t",
                              "column": [
                                     {
                                      "index": 0,
                                      "type": "long"
                                     },
                                     {
                                      "index": 1,
                                      "type": "string"
                                     },
                                     {
                                      "index": 2,
                                      "type": "long"
                                     }
                              ]
                            }
                        },
                     "writer": {
                          "name": "mysqlwriter",
                          "parameter": {
                              "writeMode": "insert",
                              "username": "root",
                              "password": "123456",
                              "column": [
                                  "id",
                                  "name",
                                  "age"
                              ],
                              "preSql": [
                                  "delete from user"
                              ],
                              "connection": [
                                  {
                                      "jdbcUrl": "jdbc:mysql://hadoop02:3306/datax?useUnicode=true&characterEncoding=utf-8",
                                      "table": [
                                          "user"
                                      ]
                                  }
                              ]
                          }
                      }
                  }
              ]
          }
      }
      
      
  • 3、准备HDFS上测试数据文件 user.txt

    • user.txt文件内容如下
      1	zhangsan  20
      2	lisi  29
      3	wangwu  25
      4	zhaoliu  35
      5	kobe  40
      
    • 文件中每列字段通过\t 制表符进行分割,上传文件到hdfs上
      [hadoop@hadoop03 ~]$ hdfs dfs -put user.txt /
      
  • 4、创建目标表

    mysql> create table datax.user(id int,name varchar(20),age int);
    
  • 5、启动DataX

    [hadoop@hadoop03 ~]$ cd /bigdata/install/datax
    [hadoop@hadoop03 bin]$ python bin/datax.py job/hdfs2mysql.json 
    
  • 6、观察控制台输出结果

    同步结束,显示日志如下:
    
    任务启动时刻                    : 2021-06-18 12:02:47
    任务结束时刻                    : 2021-06-18 12:02:58
    任务总计耗时                    :                 11s
    任务平均流量                    :                4B/s
    记录写入速度                    :              0rec/s
    读出记录总数                    :                   5
    读写失败总数                    :                   0
    
  • 7、查看user表数据

推荐阅读