HDFS Commands: An Ongoing Collection

youngchaolin 2020-02-12 19:16

HDFS has many commonly used commands; this post keeps an ongoing record of them.

Basic commands

Basic commands start with either hadoop fs or hdfs dfs, and the two forms are equivalent. Use 'hadoop fs -help <command>' or 'hdfs dfs -help <command>' to see the explanation of a specific command.

[hadoop@node01 ~]$ hadoop fs
Usage: hadoop fs [generic options]
    [-appendToFile <localsrc> ... <dst>]
    [-cat [-ignoreCrc] <src> ...]
    [-checksum <src> ...]
    [-chgrp [-R] GROUP PATH...]
    [-chmod [-R] <MODE[,MODE]... | OCTALMODE> PATH...]
    [-chown [-R] [OWNER][:[GROUP]] PATH...]
    [-copyFromLocal [-f] [-p] [-l] <localsrc> ... <dst>]
    [-copyToLocal [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]
    [-count [-q] [-h] [-v] [-x] <path> ...]
    [-cp [-f] [-p | -p[topax]] <src> ... <dst>]
    [-createSnapshot <snapshotDir> [<snapshotName>]]
    [-deleteSnapshot <snapshotDir> <snapshotName>]
    [-df [-h] [<path> ...]]
    [-du [-s] [-h] [-x] <path> ...]
    [-expunge]
    [-find <path> ... <expression> ...]
    [-get [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]
    [-getfacl [-R] <path>]
    [-getfattr [-R] {-n name | -d} [-e en] <path>]
    [-getmerge [-nl] <src> <localdst>]
    [-help [cmd ...]]
    [-ls [-C] [-d] [-h] [-q] [-R] [-t] [-S] [-r] [-u] [<path> ...]]
    [-mkdir [-p] <path> ...]
    [-moveFromLocal <localsrc> ... <dst>]
    [-moveToLocal <src> <localdst>]
    [-mv <src> ... <dst>]
    [-put [-f] [-p] [-l] <localsrc> ... <dst>]
    [-renameSnapshot <snapshotDir> <oldName> <newName>]
    [-rm [-f] [-r|-R] [-skipTrash] <src> ...]
    [-rmdir [--ignore-fail-on-non-empty] <dir> ...]
    [-setfacl [-R] [{-b|-k} {-m|-x <acl_spec>} <path>]|[--set <acl_spec> <path>]]
    [-setfattr {-n name [-v value] | -x name} <path>]
    [-setrep [-R] [-w] <rep> <path> ...]
    [-stat [format] <path> ...]
    [-tail [-f] <file>]
    [-test -[defsz] <path>]
    [-text [-ignoreCrc] <src> ...]
    [-touchz <path> ...]
    [-usage [cmd ...]]

Generic options supported are
-conf <configuration file>     specify an application configuration file
-D <property=value>            use value for given property
-fs <local|namenode:port>      specify a namenode
-jt <local|resourcemanager:port>    specify a ResourceManager
-files <comma separated list of files>    specify comma separated files to be copied to the map reduce cluster
-libjars <comma separated list of jars>    specify comma separated jar files to include in the classpath.
-archives <comma separated list of archives>    specify comma separated archives to be unarchived on the compute machines.

The general command line syntax is
bin/hadoop command [genericOptions] [commandOptions]


The most frequently used of these are -ls, -put, -get, -cat, -rm, -mkdir, and so on.
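For reference, a minimal sketch of these everyday operations (the /demo directory and the local file local.txt are assumed examples, not from the session above):

# create a directory, including missing parents
hadoop fs -mkdir -p /demo
# upload a local file to HDFS
hadoop fs -put local.txt /demo/
# list the directory
hadoop fs -ls /demo
# print the file content
hadoop fs -cat /demo/local.txt
# download the file back to the local filesystem
hadoop fs -get /demo/local.txt ./local.copy.txt
# delete the file (moved to trash if the trash is enabled)
hadoop fs -rm /demo/local.txt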

(1) Set the replication factor. The command below changes it for files that already exist; to change the default for newly written files, configure dfs.replication in hdfs-site.xml.

# set 5 replicas
[hadoop@node01 ~]$ hadoop fs -setrep -R 5 /readme.txt
20/02/12 19:11:09 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Replication 5 set: /readme.txt
# replication factor changed from the default 3 to 5
[hadoop@node01 ~]$ hadoop fs -ls /readme.txt
20/02/12 19:11:18 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
-rw-r--r--   5 hadoop supergroup         36 2019-10-23 22:58 /readme.txt
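The replication factor can also be set at write time with the generic -D option instead of changing files afterwards; a small sketch, assuming a local file named local.txt:

# upload with 2 replicas for this one file only
hadoop fs -D dfs.replication=2 -put local.txt /tmp/local.txt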

...TOADD

Using hdfs getconf

(1) Get the namenode hostname(s)

[hadoop@node01 ~]$ hdfs getconf -namenodes
20/02/12 18:50:37 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
node01

(2) Get the HDFS minimum block size

[hadoop@node01 ~]$ hdfs getconf -confKey dfs.namenode.fs-limits.min-block-size
20/02/12 18:51:41 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
1048576

(3) Get the namenode RPC address

[hadoop@node01 ~]$ hdfs getconf -nnRpcAddresses
20/02/12 18:52:43 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
node01:8020
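-confKey works for any configuration key, so other defaults can be read the same way, for example:

# default block size in bytes
hdfs getconf -confKey dfs.blocksize
# default replication factor
hdfs getconf -confKey dfs.replication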

...TOADD

Using hdfs dfsadmin

dfsadmin provides the administrative commands; as the usage note below says, they can only be run as the HDFS superuser.

[hadoop@node01 ~]$ hdfs dfsadmin
Usage: hdfs dfsadmin
Note: Administrative commands can only be run as the HDFS superuser.
    [-report [-live] [-dead] [-decommissioning]]
    [-safemode <enter | leave | get | wait>]
    [-saveNamespace]
    [-rollEdits]
    [-restoreFailedStorage true|false|check]
    [-refreshNodes]
    [-setQuota <quota> <dirname>...<dirname>]
    [-clrQuota <dirname>...<dirname>]
    [-setSpaceQuota <quota> <dirname>...<dirname>]
    [-clrSpaceQuota <dirname>...<dirname>]
    [-finalizeUpgrade]
    [-rollingUpgrade [<query|prepare|finalize>]]
    [-refreshServiceAcl]
    [-refreshUserToGroupsMappings]
    [-refreshSuperUserGroupsConfiguration]
    [-refreshCallQueue]
    [-refresh <host:ipc_port> <key> [arg1..argn]
    [-reconfig <datanode|...> <host:ipc_port> <start|status|properties>]
    [-printTopology]
    [-refreshNamenodes datanode_host:ipc_port]
    [-deleteBlockPool datanode_host:ipc_port blockpoolId [force]]
    [-setBalancerBandwidth <bandwidth in bytes per second>]
    [-fetchImage <local directory>]
    [-allowSnapshot <snapshotDir>]
    [-disallowSnapshot <snapshotDir>]
    [-shutdownDatanode <datanode_host:ipc_port> [upgrade]]
    [-getDatanodeInfo <datanode_host:ipc_port>]
    [-metasave filename]
    [-triggerBlockReport [-incremental] <datanode_host:ipc_port>]
    [-listOpenFiles]
    [-help [cmd]]

Generic options supported are
-conf <configuration file>     specify an application configuration file
-D <property=value>            use value for given property
-fs <local|namenode:port>      specify a namenode
-jt <local|resourcemanager:port>    specify a ResourceManager
-files <comma separated list of files>    specify comma separated files to be copied to the map reduce cluster
-libjars <comma separated list of jars>    specify comma separated jar files to include in the classpath.
-archives <comma separated list of archives>    specify comma separated archives to be unarchived on the compute machines.

The general command line syntax is
bin/hadoop command [genericOptions] [commandOptions]


(1) Check whether the cluster is in safe mode; if safe mode has been entered manually, it must also be left manually.

# show the help text
[hadoop@node01 ~]$ hdfs dfsadmin -help safemode
-safemode <enter|leave|get|wait>:  Safe mode maintenance command.
        Safe mode is a Namenode state in which it
            1.  does not accept changes to the name space (read-only)
            2.  does not replicate or delete blocks.
        Safe mode is entered automatically at Namenode startup, and
        leaves safe mode automatically when the configured minimum
        percentage of blocks satisfies the minimum replication
        condition.  Safe mode can also be entered manually, but then
        it can only be turned off manually as well.
# enter safe mode
[hadoop@node01 ~]$ hdfs dfsadmin -safemode enter
20/02/12 18:58:51 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Safe mode is ON
# leave safe mode
[hadoop@node01 ~]$ hdfs dfsadmin -safemode leave
20/02/12 18:58:57 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Safe mode is OFF
# check the current state
[hadoop@node01 ~]$ hdfs dfsadmin -safemode get
20/02/12 18:59:05 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Safe mode is OFF
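There is also a wait subcommand that blocks until safe mode is off, which is handy in startup scripts; a small sketch (the step afterwards is a placeholder):

# block until the namenode has left safe mode
hdfs dfsadmin -safemode wait
# ... then run whatever needs a writable HDFS (placeholder)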

(2) Combined with hdfs dfs, dfsadmin is used to manage file snapshots.

About snapshots:

① snapshots can be taken of a directory or of the whole HDFS namespace

② if an important file is deleted by mistake, it can be restored from a snapshot

③ a snapshot does not copy block data; it only records the block list and file sizes

Allow snapshots

# enable snapshots on the directory
[root@hadoop01 ~]# hdfs dfsadmin -allowSnapshot /testSnapshot
20/02/16 21:35:02 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Allowing snaphot on /testSnapshot succeeded

Create a snapshot

# create a snapshot named mysnapshot
[root@hadoop01 ~]# hdfs dfs -createSnapshot /testSnapshot mysnapshot
20/02/16 21:37:03 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Created snapshot /testSnapshot/.snapshot/mysnapshot

List snapshots

# snapshots live under the hidden .snapshot directory
[root@hadoop01 ~]# hadoop fs -ls /testSnapshot/.snapshot
20/02/16 21:37:31 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 1 items
drwxr-xr-x   - root supergroup          0 2020-02-16 21:37 /testSnapshot/.snapshot/mysnapshot

Rename a snapshot

[root@hadoop01 ~]# hdfs dfs -renameSnapshot /testSnapshot mysnapshot newnameSnapshot
20/02/16 21:38:56 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[root@hadoop01 ~]# hadoop fs -ls /testSnapshot/.snapshot
20/02/16 21:39:00 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 1 items
drwxr-xr-x   - root supergroup          0 2020-02-16 21:37 /testSnapshot/.snapshot/newnameSnapshot

Simulate an accidental deletion

[root@hadoop01 ~]# hadoop fs -rm /testSnapshot/log.txt
20/02/16 21:39:48 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
20/02/16 21:39:49 INFO fs.TrashPolicyDefault: Namenode trash configuration: Deletion interval = 1440 minutes, Emptier interval = 0 minutes.
Moved: 'hdfs://hadoop01:9000/testSnapshot/log.txt' to trash at: hdfs://hadoop01:9000/user/root/.Trash/Current

Restore from the snapshot

# confirm that log.txt has been deleted
[root@hadoop01 ~]# hadoop fs -ls /testSnapshot
20/02/16 21:40:05 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
# the file to restore is still in the snapshot
[root@hadoop01 ~]# hadoop fs -ls /testSnapshot/.snapshot/newnameSnapshot
20/02/16 21:40:38 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 1 items
-rw-r--r--   1 root supergroup         12 2020-02-16 21:31 /testSnapshot/.snapshot/newnameSnapshot/log.txt
# restore by copying out of the snapshot
[root@hadoop01 ~]# hadoop fs -cp /testSnapshot/.snapshot/newnameSnapshot/log.txt /testSnapshot
20/02/16 21:43:13 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
20/02/16 21:43:14 WARN hdfs.DFSClient: DFSInputStream has been closed already
# the file is back
[root@hadoop01 ~]# hadoop fs -ls /testSnapshot
20/02/16 21:43:28 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 1 items
-rw-r--r--   1 root supergroup         12 2020-02-16 21:43 /testSnapshot/log.txt

Delete a snapshot

# delete the snapshot
[root@hadoop01 ~]# hdfs dfs -deleteSnapshot /testSnapshot newnameSnapshot
20/02/16 21:57:30 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
# the snapshot is gone
[root@hadoop01 ~]# hadoop fs -ls /testSnapshot/.snapshot
20/02/16 21:57:51 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
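Once all snapshots of a directory are deleted, snapshotting can be switched off again; a sketch continuing the example above:

# disallow snapshots on the directory (fails if snapshots still exist)
hdfs dfsadmin -disallowSnapshot /testSnapshot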

...TOADD

Using hdfs fsck

Check the command usage first; it lists the options for getting the key information.

[hadoop@node01 ~]$ hdfs fsck
Usage: DFSck <path> [-list-corruptfileblocks | [-move | -delete | -openforwrite] [-files [-blocks [-locations | -racks]]]] [-maintenance]
    <path>    start checking from this path
    -move    move corrupted files to /lost+found
    -delete    delete corrupted files
    -files    print out files being checked
    -openforwrite    print out files opened for write
    -includeSnapshots    include snapshot data if the given path indicates a snapshottable directory or there are snapshottable directories under it
    -list-corruptfileblocks    print out list of missing blocks and files they belong to
    -blocks    print out block report
    -locations    print out locations for every block
    -racks    print out network topology for data-node locations

    -maintenance    print out maintenance state node details
    -blockId    print out which file this blockId belongs to, locations (nodes, racks) of this block, and other diagnostics info (under replicated, corrupted or not, etc)

Please Note:
    1. By default fsck ignores files opened for write, use -openforwrite to report such files. They are usually  tagged CORRUPT or HEALTHY depending on their block allocation status
    2. Option -includeSnapshots should not be used for comparing stats, should be used only for HEALTH check, as this may contain duplicates if the same file present in both original fs tree and inside snapshots.

Generic options supported are
-conf <configuration file>     specify an application configuration file
-D <property=value>            use value for given property
-fs <local|namenode:port>      specify a namenode
-jt <local|resourcemanager:port>    specify a ResourceManager
-files <comma separated list of files>    specify comma separated files to be copied to the map reduce cluster
-libjars <comma separated list of jars>    specify comma separated jar files to include in the classpath.
-archives <comma separated list of archives>    specify comma separated archives to be unarchived on the compute machines.

The general command line syntax is
bin/hadoop command [genericOptions] [commandOptions]

(1) Show block information for a file

[hadoop@node01 ~]$ hdfs fsck  /readme.txt -files -blocks -locations
20/02/12 19:03:00 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Connecting to namenode via http://node01:50070/fsck?ugi=hadoop&files=1&blocks=1&locations=1&path=%2Freadme.txt
FSCK started by hadoop (auth:SIMPLE) from /192.168.200.100 for path /readme.txt at Wed Feb 12 19:03:01 CST 2020
/readme.txt 36 bytes, 1 block(s):  OK
# block pool info and replica locations on the datanodes
0. BP-1783492158-192.168.200.100-1571483500510:blk_1073741846_1022 len=36 Live_repl=3 [DatanodeInfoWithStorage[192.168.200.100:50010,DS-76c283ae-9025-4959-98bc-69d064c3f3ef,DISK], DatanodeInfoWithStorage[192.168.200.120:50010,DS-467b3182-8386-4beb-ad37-14392958e81a,DISK], DatanodeInfoWithStorage[192.168.200.110:50010,DS-ba33d23f-c8b4-4d4d-bd64-78a86e6dc2ac,DISK]]

Status: HEALTHY
 Total size:    36 B
 Total dirs:    0
 Total files:    1
 Total symlinks:        0
 Total blocks (validated):    1 (avg. block size 36 B)
 Minimally replicated blocks:    1 (100.0 %)
 Over-replicated blocks:    0 (0.0 %)
 Under-replicated blocks:    0 (0.0 %)
 Mis-replicated blocks:        0 (0.0 %)
 Default replication factor:    3
 Average block replication:    3.0
 Corrupt blocks:        0
 Missing replicas:        0 (0.0 %)
 Number of data-nodes:        3
 Number of racks:        1
FSCK ended at Wed Feb 12 19:03:01 CST 2020 in 4 milliseconds

# healthy
The filesystem under path '/readme.txt' is HEALTHY

(2) Check whether any files are corrupt

[hadoop@node01 ~]$ hdfs fsck -list-corruptfileblocks
20/02/12 19:13:24 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Connecting to namenode via http://node01:50070/fsck?ugi=hadoop&listcorruptfileblocks=1&path=%2F
The filesystem under path '/' has 0 CORRUPT files
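A couple of other fsck switches from the usage above are also worth knowing; a sketch (the block id below is simply the one from the /readme.txt output earlier, shown for illustration):

# files, blocks, replica locations and rack topology for the whole namespace
hdfs fsck / -files -blocks -locations -racks
# find out which file a block belongs to and where its replicas live
hdfs fsck -blockId blk_1073741846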

...TOADD

Other commands

(1) Check which native/compression libraries are available locally. Everything shows false below because the native hadoop library could not be loaded, as the WARN line indicates.

[hadoop@node01 ~]$ hadoop checknative
20/02/12 19:06:16 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Native library checking:
hadoop:  false
zlib:    false
snappy:  false
lz4:     false
bzip2:   false
openssl: false

(2) Format the namenode; normally only done once, when the cluster is first set up

hadoop namenode -format
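In newer releases the hadoop namenode entry point is deprecated; the equivalent hdfs form is:

hdfs namenode -format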

(3) Run a jar

hadoop jar <jar file> <fully qualified class name> [args]
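For example, running the wordcount example that ships with Hadoop (the jar path and version below are assumptions and depend on the installation):

# adjust the jar path/version to match the local installation
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar wordcount /input /output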

(4) Archiving and extracting small files

For small-file management there are generally two approaches: packing files into a HAR archive, or using SequenceFiles. Creating or extracting a HAR archive uses the commands below.

# prepare some files first
[root@hadoop01 /home]# hadoop fs -ls -R /archive
20/02/16 10:41:55 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
drwxr-xr-x   - root supergroup          0 2020-02-16 10:41 /archive/th1
-rw-r--r--   1 root supergroup          0 2020-02-16 10:40 /archive/th1/1.txt
-rw-r--r--   1 root supergroup          0 2020-02-16 10:41 /archive/th1/2.txt
drwxr-xr-x   - root supergroup          0 2020-02-16 10:41 /archive/th2
-rw-r--r--   1 root supergroup          0 2020-02-16 10:41 /archive/th2/3.txt
-rw-r--r--   1 root supergroup          0 2020-02-16 10:41 /archive/th2/4.txt
# usage hint for creating a har archive
# -p   parent directory
# -r   replication factor
# src  subdirectories to archive
# dest output path
[root@hadoop01 /home]# hadoop archive
archive -archiveName <NAME>.har -p <parent path> [-r <replication factor>]<src>* <dest>

Invalid usage.
# archive both th1 and th2 under /archive
[root@hadoop01 /home]# hadoop archive -archiveName test.har -p /archive -r 1 th1 th2 /outhar
20/02/16 10:44:04 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
20/02/16 10:44:05 INFO client.RMProxy: Connecting to ResourceManager at hadoop01/192.168.200.140:8032
20/02/16 10:44:07 INFO client.RMProxy: Connecting to ResourceManager at hadoop01/192.168.200.140:8032
20/02/16 10:44:07 INFO client.RMProxy: Connecting to ResourceManager at hadoop01/192.168.200.140:8032
20/02/16 10:44:07 INFO mapreduce.JobSubmitter: number of splits:1
20/02/16 10:44:07 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1581820473891_0001
20/02/16 10:44:08 INFO impl.YarnClientImpl: Submitted application application_1581820473891_0001
20/02/16 10:44:08 INFO mapreduce.Job: The url to track the job: http://hadoop01:8088/proxy/application_1581820473891_0001/
20/02/16 10:44:08 INFO mapreduce.Job: Running job: job_1581820473891_0001
20/02/16 10:44:16 INFO mapreduce.Job: Job job_1581820473891_0001 running in uber mode : true
20/02/16 10:44:16 INFO mapreduce.Job:  map 0% reduce 0%
20/02/16 10:44:19 INFO mapreduce.Job:  map 100% reduce 100%
20/02/16 10:44:19 INFO mapreduce.Job: Job job_1581820473891_0001 completed successfully
20/02/16 10:44:19 INFO mapreduce.Job: Counters: 52
    File System Counters
        FILE: Number of bytes read=1014
        FILE: Number of bytes written=1537
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=1196
        HDFS: Number of bytes written=257101
        HDFS: Number of read operations=67
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=15
    Job Counters
        Launched map tasks=1
        Launched reduce tasks=1
        Other local map tasks=1
        Total time spent by all maps in occupied slots (ms)=820
        Total time spent by all reduces in occupied slots (ms)=422
        TOTAL_LAUNCHED_UBERTASKS=2
        NUM_UBER_SUBMAPS=1
        NUM_UBER_SUBREDUCES=1
        Total time spent by all map tasks (ms)=820
        Total time spent by all reduce tasks (ms)=422
        Total vcore-seconds taken by all map tasks=820
        Total vcore-seconds taken by all reduce tasks=422
        Total megabyte-seconds taken by all map tasks=839680
        Total megabyte-seconds taken by all reduce tasks=432128
    Map-Reduce Framework
        Map input records=7
        Map output records=7
        Map output bytes=471
        Map output materialized bytes=491
        Input split bytes=116
        Combine input records=0
        Combine output records=0
        Reduce input groups=7
        Reduce shuffle bytes=491
        Reduce input records=7
        Reduce output records=0
        Spilled Records=14
        Shuffled Maps =1
        Failed Shuffles=0
        Merged Map outputs=1
        GC time elapsed (ms)=126
        CPU time spent (ms)=1200
        Physical memory (bytes) snapshot=516005888
        Virtual memory (bytes) snapshot=5989122048
        Total committed heap usage (bytes)=262676480
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters
        Bytes Read=467
    File Output Format Counters
        Bytes Written=0
# inspect the archive contents
[root@hadoop01 /home]# hdfs dfs -ls -R har:///outhar/test.har
20/02/16 10:44:48 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
20/02/16 10:44:49 WARN hdfs.DFSClient: DFSInputStream has been closed already
drwxr-xr-x   - root supergroup          0 2020-02-16 10:41 har:///outhar/test.har/th1
-rw-r--r--   1 root supergroup          0 2020-02-16 10:40 har:///outhar/test.har/th1/1.txt
-rw-r--r--   1 root supergroup          0 2020-02-16 10:41 har:///outhar/test.har/th1/2.txt
drwxr-xr-x   - root supergroup          0 2020-02-16 10:41 har:///outhar/test.har/th2
-rw-r--r--   1 root supergroup          0 2020-02-16 10:41 har:///outhar/test.har/th2/3.txt
-rw-r--r--   1 root supergroup          0 2020-02-16 10:41 har:///outhar/test.har/th2/4.txt
# extract sequentially with cp
[root@hadoop01 /home]# hdfs dfs -cp har:///outhar/test.har/th1 hdfs:/unarchive1
20/02/16 10:45:36 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
20/02/16 10:45:37 WARN hdfs.DFSClient: DFSInputStream has been closed already
20/02/16 10:45:37 WARN hdfs.DFSClient: DFSInputStream has been closed already
20/02/16 10:45:37 WARN hdfs.DFSClient: DFSInputStream has been closed already
# verify the extraction succeeded
[root@hadoop01 /home]# hadoop fs -ls /unarchive1
20/02/16 10:45:53 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 2 items
-rw-r--r--   1 root supergroup          0 2020-02-16 10:45 /unarchive1/1.txt
-rw-r--r--   1 root supergroup          0 2020-02-16 10:45 /unarchive1/2.txt
# extract in parallel with distcp (runs a MapReduce job)
[root@hadoop01 /home]# hadoop distcp har:///outhar/test.har/th2 hdfs:/unarchive2
20/02/16 10:46:26 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
20/02/16 10:46:26 INFO tools.DistCp: Input Options: DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false, ignoreFailures=false, maxMaps=20, sslConfigurationFile='null', copyStrategy='uniformsize', sourceFileListing=null, sourcePaths=[har:/outhar/test.har/th2], targetPath=hdfs:/unarchive2, targetPathExists=false, preserveRawXattrs=false}
20/02/16 10:46:26 INFO client.RMProxy: Connecting to ResourceManager at hadoop01/192.168.200.140:8032
20/02/16 10:46:27 WARN hdfs.DFSClient: DFSInputStream has been closed already
20/02/16 10:46:27 INFO Configuration.deprecation: io.sort.mb is deprecated. Instead, use mapreduce.task.io.sort.mb
20/02/16 10:46:27 INFO Configuration.deprecation: io.sort.factor is deprecated. Instead, use mapreduce.task.io.sort.factor
20/02/16 10:46:27 INFO client.RMProxy: Connecting to ResourceManager at hadoop01/192.168.200.140:8032
20/02/16 10:46:28 INFO mapreduce.JobSubmitter: number of splits:1
20/02/16 10:46:28 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1581820473891_0002
20/02/16 10:46:28 INFO impl.YarnClientImpl: Submitted application application_1581820473891_0002
20/02/16 10:46:28 INFO mapreduce.Job: The url to track the job: http://hadoop01:8088/proxy/application_1581820473891_0002/
20/02/16 10:46:28 INFO tools.DistCp: DistCp job-id: job_1581820473891_0002
20/02/16 10:46:28 INFO mapreduce.Job: Running job: job_1581820473891_0002
20/02/16 10:46:35 INFO mapreduce.Job: Job job_1581820473891_0002 running in uber mode : true
20/02/16 10:46:35 INFO mapreduce.Job:  map 0% reduce 0%
20/02/16 10:46:36 INFO mapreduce.Job:  map 100% reduce 0%
20/02/16 10:46:37 INFO mapreduce.Job: Job job_1581820473891_0002 completed successfully
20/02/16 10:46:37 INFO mapreduce.Job: Counters: 35
    File System Counters
        FILE: Number of bytes read=0
        FILE: Number of bytes written=0
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=1077
        HDFS: Number of bytes written=126136
        HDFS: Number of read operations=78
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=10
    Job Counters
        Launched map tasks=1
        Other local map tasks=1
        Total time spent by all maps in occupied slots (ms)=1451
        Total time spent by all reduces in occupied slots (ms)=0
        TOTAL_LAUNCHED_UBERTASKS=1
        NUM_UBER_SUBMAPS=1
        Total time spent by all map tasks (ms)=1451
        Total vcore-seconds taken by all map tasks=1451
        Total megabyte-seconds taken by all map tasks=1485824
    Map-Reduce Framework
        Map input records=3
        Map output records=0
        Input split bytes=135
        Spilled Records=0
        Failed Shuffles=0
        Merged Map outputs=0
        GC time elapsed (ms)=51
        CPU time spent (ms)=560
        Physical memory (bytes) snapshot=155303936
        Virtual memory (bytes) snapshot=2993799168
        Total committed heap usage (bytes)=25538560
    File Input Format Counters
        Bytes Read=461
    File Output Format Counters
        Bytes Written=0
    org.apache.hadoop.tools.mapred.CopyMapper$Counter
        BYTESCOPIED=0
        BYTESEXPECTED=0
        COPY=3
# extraction succeeded
[root@hadoop01 /home]# hadoop fs -ls /unarchive2
20/02/16 10:47:02 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 2 items
-rw-r--r--   1 root supergroup          0 2020-02-16 10:46 /unarchive2/3.txt
-rw-r--r--   1 root supergroup          0 2020-02-16 10:46 /unarchive2/4.txt
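Files inside the archive can also be read directly through the har:// filesystem without extracting anything, for example (the files in this demo are empty, so no content is printed):

# read a file inside the archive directly
hadoop fs -cat har:///outhar/test.har/th1/1.txt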


That's all for now; more will be added over time.
