HDFS has many commonly used commands; this post records them on an ongoing basis.
Basic commands
Basic commands start with either hadoop fs or hdfs dfs; for HDFS the two have the same effect. Use 'hadoop fs -help <command>' or 'hdfs dfs -help <command>' to see the explanation of a specific command.
[hadoop@node01 ~]$ hadoop fs
Usage: hadoop fs [generic options]
	[-appendToFile <localsrc> ... <dst>]
	[-cat [-ignoreCrc] <src> ...]
	[-checksum <src> ...]
	[-chgrp [-R] GROUP PATH...]
	[-chmod [-R] <MODE[,MODE]... | OCTALMODE> PATH...]
	[-chown [-R] [OWNER][:[GROUP]] PATH...]
	[-copyFromLocal [-f] [-p] [-l] <localsrc> ... <dst>]
	[-copyToLocal [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]
	[-count [-q] [-h] [-v] [-x] <path> ...]
	[-cp [-f] [-p | -p[topax]] <src> ... <dst>]
	[-createSnapshot <snapshotDir> [<snapshotName>]]
	[-deleteSnapshot <snapshotDir> <snapshotName>]
	[-df [-h] [<path> ...]]
	[-du [-s] [-h] [-x] <path> ...]
	[-expunge]
	[-find <path> ... <expression> ...]
	[-get [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]
	[-getfacl [-R] <path>]
	[-getfattr [-R] {-n name | -d} [-e en] <path>]
	[-getmerge [-nl] <src> <localdst>]
	[-help [cmd ...]]
	[-ls [-C] [-d] [-h] [-q] [-R] [-t] [-S] [-r] [-u] [<path> ...]]
	[-mkdir [-p] <path> ...]
	[-moveFromLocal <localsrc> ... <dst>]
	[-moveToLocal <src> <localdst>]
	[-mv <src> ... <dst>]
	[-put [-f] [-p] [-l] <localsrc> ... <dst>]
	[-renameSnapshot <snapshotDir> <oldName> <newName>]
	[-rm [-f] [-r|-R] [-skipTrash] <src> ...]
	[-rmdir [--ignore-fail-on-non-empty] <dir> ...]
	[-setfacl [-R] [{-b|-k} {-m|-x <acl_spec>} <path>]|[--set <acl_spec> <path>]]
	[-setfattr {-n name [-v value] | -x name} <path>]
	[-setrep [-R] [-w] <rep> <path> ...]
	[-stat [format] <path> ...]
	[-tail [-f] <file>]
	[-test -[defsz] <path>]
	[-text [-ignoreCrc] <src> ...]
	[-touchz <path> ...]
	[-usage [cmd ...]]

Generic options supported are
-conf <configuration file>     specify an application configuration file
-D <property=value>            use value for given property
-fs <local|namenode:port>      specify a namenode
-jt <local|resourcemanager:port>    specify a ResourceManager
-files <comma separated list of files>    specify comma separated files to be copied to the map reduce cluster
-libjars <comma separated list of jars>    specify comma separated jar files to include in the classpath.
-archives <comma separated list of archives>    specify comma separated archives to be unarchived on the compute machines.

The general command line syntax is
bin/hadoop command [genericOptions] [commandOptions]
The most commonly used subcommands are -ls, -put, -get, -cat, -rm, -mkdir, and so on.
(1) Set the replication factor. The following command changes it for existing files; to make the setting permanent for newly written files, configure it in hdfs-site.xml.
# set 5 replicas
[hadoop@node01 ~]$ hadoop fs -setrep -R 5 /readme.txt
Replication 5 set: /readme.txt
# the replica count changes from the default 3 to 5
[hadoop@node01 ~]$ hadoop fs -ls /readme.txt
-rw-r--r--   5 hadoop supergroup         36 2019-10-23 22:58 /readme.txt
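For the permanent setting, a minimal hdfs-site.xml sketch is below; the value 5 mirrors the example above (the shipped default is 3), and it only applies to files written after the change:

```xml
<!-- hdfs-site.xml: default replication factor for newly written files -->
<property>
  <name>dfs.replication</name>
  <value>5</value>
</property>
```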
...TOADD
Using hdfs getconf
(1) Get the NameNode hostnames
[hadoop@node01 ~]$ hdfs getconf -namenodes
node01
(2) Get the minimum HDFS block size
[hadoop@node01 ~]$ hdfs getconf -confKey dfs.namenode.fs-limits.min-block-size
1048576
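The value printed above is in bytes; a quick shell sanity check (pure arithmetic, no cluster needed) confirms it is exactly 1 MiB:

```shell
# dfs.namenode.fs-limits.min-block-size as returned by hdfs getconf
min_block_size=1048576
# integer division: 1048576 / 1024 / 1024 = 1
echo "$((min_block_size / 1024 / 1024)) MiB"   # prints "1 MiB"
```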
(3) Get the NameNode RPC addresses
[hadoop@node01 ~]$ hdfs getconf -nnRpcAddresses
node01:8020
...TOADD
Using hdfs dfsadmin
hdfs dfsadmin provides the administrative operations; note that they can only be run as the HDFS superuser.
[hadoop@node01 ~]$ hdfs dfsadmin
Usage: hdfs dfsadmin
Note: Administrative commands can only be run as the HDFS superuser.
	[-report [-live] [-dead] [-decommissioning]]
	[-safemode <enter | leave | get | wait>]
	[-saveNamespace]
	[-rollEdits]
	[-restoreFailedStorage true|false|check]
	[-refreshNodes]
	[-setQuota <quota> <dirname>...<dirname>]
	[-clrQuota <dirname>...<dirname>]
	[-setSpaceQuota <quota> <dirname>...<dirname>]
	[-clrSpaceQuota <dirname>...<dirname>]
	[-finalizeUpgrade]
	[-rollingUpgrade [<query|prepare|finalize>]]
	[-refreshServiceAcl]
	[-refreshUserToGroupsMappings]
	[-refreshSuperUserGroupsConfiguration]
	[-refreshCallQueue]
	[-refresh <host:ipc_port> <key> [arg1..argn]
	[-reconfig <datanode|...> <host:ipc_port> <start|status|properties>]
	[-printTopology]
	[-refreshNamenodes datanode_host:ipc_port]
	[-deleteBlockPool datanode_host:ipc_port blockpoolId [force]]
	[-setBalancerBandwidth <bandwidth in bytes per second>]
	[-fetchImage <local directory>]
	[-allowSnapshot <snapshotDir>]
	[-disallowSnapshot <snapshotDir>]
	[-shutdownDatanode <datanode_host:ipc_port> [upgrade]]
	[-getDatanodeInfo <datanode_host:ipc_port>]
	[-metasave filename]
	[-triggerBlockReport [-incremental] <datanode_host:ipc_port>]
	[-listOpenFiles]
	[-help [cmd]]

Generic options supported are
-conf <configuration file>     specify an application configuration file
-D <property=value>            use value for given property
-fs <local|namenode:port>      specify a namenode
-jt <local|resourcemanager:port>    specify a ResourceManager
-files <comma separated list of files>    specify comma separated files to be copied to the map reduce cluster
-libjars <comma separated list of jars>    specify comma separated jar files to include in the classpath.
-archives <comma separated list of archives>    specify comma separated archives to be unarchived on the compute machines.

The general command line syntax is
bin/hadoop command [genericOptions] [commandOptions]
(1) Check whether the NameNode is in safe mode. If safe mode was entered manually, it must also be left manually.
# view the help text
[hadoop@node01 ~]$ hdfs dfsadmin -help safemode
-safemode <enter|leave|get|wait>:  Safe mode maintenance command.
Safe mode is a Namenode state in which it
	1.  does not accept changes to the name space (read-only)
	2.  does not replicate or delete blocks.
Safe mode is entered automatically at Namenode startup, and
leaves safe mode automatically when the configured minimum
percentage of blocks satisfies the minimum replication
condition.  Safe mode can also be entered manually, but then
it can only be turned off manually as well.
# enter safe mode
[hadoop@node01 ~]$ hdfs dfsadmin -safemode enter
Safe mode is ON
# leave safe mode
[hadoop@node01 ~]$ hdfs dfsadmin -safemode leave
Safe mode is OFF
# check the status
[hadoop@node01 ~]$ hdfs dfsadmin -safemode get
Safe mode is OFF
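Besides polling -safemode get by hand, a startup script can block until the NameNode leaves safe mode. A minimal sketch of that loop follows; `safemode_status` is a hypothetical stand-in whose body would be `hdfs dfsadmin -safemode get` on a real cluster (stubbed here so the logic runs without one):

```shell
# Stand-in for `hdfs dfsadmin -safemode get`; on a real cluster replace the
# echo with that command (hypothetical helper, for illustration only).
safemode_status() {
  echo "Safe mode is OFF"
}

# Poll until safe mode is off, giving up after 10 checks.
wait_for_safemode_off() {
  tries=0
  while [ "$tries" -lt 10 ]; do
    if safemode_status | grep -q "Safe mode is OFF"; then
      echo "NameNode has left safe mode"
      return 0
    fi
    tries=$((tries + 1))
    sleep 1
  done
  echo "still in safe mode after $tries checks" >&2
  return 1
}

wait_for_safemode_off
```

In practice `hdfs dfsadmin -safemode wait` (listed in the help text above) does this blocking wait natively; the loop is only worth writing when you also need a timeout or logging.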
(2) hdfs dfsadmin can be combined with hdfs dfs to work with file snapshots.
File snapshots:
① Snapshots can be taken of a directory or of the entire HDFS namespace.
② A file deleted by mistake can be recovered from a snapshot.
③ A snapshot does not copy block data; it only records the block list and file sizes.
Allow snapshots

# enable snapshots on a directory
[root@hadoop01 ~]# hdfs dfsadmin -allowSnapshot /testSnapshot
Allowing snaphot on /testSnapshot succeeded
Create a snapshot

# create a snapshot named mysnapshot
[root@hadoop01 ~]# hdfs dfs -createSnapshot /testSnapshot mysnapshot
Created snapshot /testSnapshot/.snapshot/mysnapshot
View snapshots

# snapshots live under the hidden directory .snapshot
[root@hadoop01 ~]# hadoop fs -ls /testSnapshot/.snapshot
Found 1 items
drwxr-xr-x   - root supergroup          0 2020-02-16 21:37 /testSnapshot/.snapshot/mysnapshot
Rename a snapshot

[root@hadoop01 ~]# hdfs dfs -renameSnapshot /testSnapshot mysnapshot newnameSnapshot
[root@hadoop01 ~]# hadoop fs -ls /testSnapshot/.snapshot
Found 1 items
drwxr-xr-x   - root supergroup          0 2020-02-16 21:37 /testSnapshot/.snapshot/newnameSnapshot
Simulate an accidental delete

[root@hadoop01 ~]# hadoop fs -rm /testSnapshot/log.txt
20/02/16 21:39:49 INFO fs.TrashPolicyDefault: Namenode trash configuration: Deletion interval = 1440 minutes, Emptier interval = 0 minutes.
Moved: 'hdfs://hadoop01:9000/testSnapshot/log.txt' to trash at: hdfs://hadoop01:9000/user/root/.Trash/Current
Recover from the snapshot
# confirm that log.txt is gone
[root@hadoop01 ~]# hadoop fs -ls /testSnapshot
# the file to recover is still in the snapshot
[root@hadoop01 ~]# hadoop fs -ls /testSnapshot/.snapshot/newnameSnapshot
Found 1 items
-rw-r--r--   1 root supergroup         12 2020-02-16 21:31 /testSnapshot/.snapshot/newnameSnapshot/log.txt
# copy the file back out of the snapshot
[root@hadoop01 ~]# hadoop fs -cp /testSnapshot/.snapshot/newnameSnapshot/log.txt /testSnapshot
# the file is restored
[root@hadoop01 ~]# hadoop fs -ls /testSnapshot
Found 1 items
-rw-r--r--   1 root supergroup         12 2020-02-16 21:43 /testSnapshot/log.txt
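The restore step is just a -cp out of the read-only .snapshot directory, so it is easy to wrap in a small helper. A sketch with a hypothetical name follows; since it needs a running cluster it is only defined here, not executed. -ptopax (from the -cp usage) preserves timestamps, ownership, permissions, ACLs and XAttrs on the restored copy:

```shell
# Hypothetical helper: copy one file back out of a snapshot.
# Usage: restore_from_snapshot /testSnapshot newnameSnapshot log.txt
restore_from_snapshot() {
  dir=$1
  snapshot=$2
  file=$3
  # -ptopax preserves timestamps, ownership, permission, ACL, XAttr
  hadoop fs -cp -ptopax "$dir/.snapshot/$snapshot/$file" "$dir/"
}
```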
Delete a snapshot

# delete the snapshot
[root@hadoop01 ~]# hdfs dfs -deleteSnapshot /testSnapshot newnameSnapshot
# the snapshot is gone
[root@hadoop01 ~]# hadoop fs -ls /testSnapshot/.snapshot
...TOADD
Using hdfs fsck
Look at the usage text first; it lists the key diagnostic options.
[hadoop@node01 ~]$ hdfs fsck
Usage: DFSck <path> [-list-corruptfileblocks | [-move | -delete | -openforwrite] [-files [-blocks [-locations | -racks]]]] [-maintenance]
	<path>	start checking from this path
	-move	move corrupted files to /lost+found
	-delete	delete corrupted files
	-files	print out files being checked
	-openforwrite	print out files opened for write
	-includeSnapshots	include snapshot data if the given path indicates a snapshottable directory or there are snapshottable directories under it
	-list-corruptfileblocks	print out list of missing blocks and files they belong to
	-blocks	print out block report
	-locations	print out locations for every block
	-racks	print out network topology for data-node locations
	-maintenance	print out maintenance state node details
	-blockId	print out which file this blockId belongs to, locations (nodes, racks) of this block, and other diagnostics info (under replicated, corrupted or not, etc)

Please Note:
	1. By default fsck ignores files opened for write, use -openforwrite to report such files. They are usually tagged CORRUPT or HEALTHY depending on their block allocation status
	2. Option -includeSnapshots should not be used for comparing stats, should be used only for HEALTH check, as this may contain duplicates if the same file present in both original fs tree and inside snapshots.

Generic options supported are
-conf <configuration file>     specify an application configuration file
-D <property=value>            use value for given property
-fs <local|namenode:port>      specify a namenode
-jt <local|resourcemanager:port>    specify a ResourceManager
-files <comma separated list of files>    specify comma separated files to be copied to the map reduce cluster
-libjars <comma separated list of jars>    specify comma separated jar files to include in the classpath.
-archives <comma separated list of archives>    specify comma separated archives to be unarchived on the compute machines.

The general command line syntax is
bin/hadoop command [genericOptions] [commandOptions]
(1) Show block information for a file
[hadoop@node01 ~]$ hdfs fsck /readme.txt -files -blocks -locations
Connecting to namenode via http://node01:50070/fsck?ugi=hadoop&files=1&blocks=1&locations=1&path=%2Freadme.txt
FSCK started by hadoop (auth:SIMPLE) from /192.168.200.100 for path /readme.txt at Wed Feb 12 19:03:01 CST 2020
/readme.txt 36 bytes, 1 block(s):  OK
# block pool info and the datanodes holding each replica
0. BP-1783492158-192.168.200.100-1571483500510:blk_1073741846_1022 len=36 Live_repl=3 [DatanodeInfoWithStorage[192.168.200.100:50010,DS-76c283ae-9025-4959-98bc-69d064c3f3ef,DISK], DatanodeInfoWithStorage[192.168.200.120:50010,DS-467b3182-8386-4beb-ad37-14392958e81a,DISK], DatanodeInfoWithStorage[192.168.200.110:50010,DS-ba33d23f-c8b4-4d4d-bd64-78a86e6dc2ac,DISK]]

Status: HEALTHY
 Total size:	36 B
 Total dirs:	0
 Total files:	1
 Total symlinks:	0
 Total blocks (validated):	1 (avg. block size 36 B)
 Minimally replicated blocks:	1 (100.0 %)
 Over-replicated blocks:	0 (0.0 %)
 Under-replicated blocks:	0 (0.0 %)
 Mis-replicated blocks:	0 (0.0 %)
 Default replication factor:	3
 Average block replication:	3.0
 Corrupt blocks:	0
 Missing replicas:	0 (0.0 %)
 Number of data-nodes:	3
 Number of racks:	1
FSCK ended at Wed Feb 12 19:03:01 CST 2020 in 4 milliseconds

# healthy
The filesystem under path '/readme.txt' is HEALTHY
(2) Check whether any files are corrupt
[hadoop@node01 ~]$ hdfs fsck -list-corruptfileblocks
Connecting to namenode via http://node01:50070/fsck?ugi=hadoop&listcorruptfileblocks=1&path=%2F
The filesystem under path '/' has 0 CORRUPT files
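For monitoring scripts, the fsck summary is easy to scrape with awk. A sketch follows, run against a trimmed copy of the report shown earlier (the sample text is hard-coded so it works without a cluster; in practice you would pipe the output of `hdfs fsck /` in):

```shell
# Trimmed fsck summary, copied from the report above.
report=' Total size: 36 B
 Corrupt blocks: 0
 Missing replicas: 0 (0.0 %)'

# Split on ":" and strip whitespace to extract the corrupt-block count.
corrupt=$(printf '%s\n' "$report" | awk -F':' '/Corrupt blocks/ {gsub(/[ \t]/, "", $2); print $2}')
echo "corrupt blocks: $corrupt"   # prints "corrupt blocks: 0"
```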
...TOADD
Other commands
(1) Check which native compression libraries are available locally
[hadoop@node01 ~]$ hadoop checknative
Native library checking:
hadoop:  false
zlib:    false
snappy:  false
lz4:     false
bzip2:   false
openssl: false
(2) Format the NameNode; normally done only once, when the cluster is first set up.
hdfs namenode -format    # 'hadoop namenode -format' is the older, deprecated spelling
(3) Run a jar
hadoop jar <jar file> <fully qualified main class> [args]
(for example, running the wordcount class from Hadoop's bundled examples jar, with input and output directories as the arguments)
(4) Archiving and extracting small files
Small-file management generally takes one of two forms: packing files into a har (Hadoop Archive) or using sequence files. Packing into and extracting from a har uses the commands below.
# first prepare some files
[root@hadoop01 /home]# hadoop fs -ls -R /archive
drwxr-xr-x   - root supergroup          0 2020-02-16 10:41 /archive/th1
-rw-r--r--   1 root supergroup          0 2020-02-16 10:40 /archive/th1/1.txt
-rw-r--r--   1 root supergroup          0 2020-02-16 10:41 /archive/th1/2.txt
drwxr-xr-x   - root supergroup          0 2020-02-16 10:41 /archive/th2
-rw-r--r--   1 root supergroup          0 2020-02-16 10:41 /archive/th2/3.txt
-rw-r--r--   1 root supergroup          0 2020-02-16 10:41 /archive/th2/4.txt
# usage hint for hadoop archive:
#   -p    parent directory
#   -r    replication factor
#   src   subdirectories to pack
#   dest  output path
[root@hadoop01 /home]# hadoop archive
archive -archiveName <NAME>.har -p <parent path> [-r <replication factor>]<src>* <dest>
Invalid usage.
# pack both th1 and th2 under /archive
[root@hadoop01 /home]# hadoop archive -archiveName test.har -p /archive -r 1 th1 th2 /outhar
20/02/16 10:44:05 INFO client.RMProxy: Connecting to ResourceManager at hadoop01/192.168.200.140:8032
20/02/16 10:44:08 INFO impl.YarnClientImpl: Submitted application application_1581820473891_0001
20/02/16 10:44:19 INFO mapreduce.Job: Job job_1581820473891_0001 completed successfully
# (MapReduce counter output omitted)
# inspect the archive
[root@hadoop01 /home]# hdfs dfs -ls -R har:///outhar/test.har
drwxr-xr-x   - root supergroup          0 2020-02-16 10:41 har:///outhar/test.har/th1
-rw-r--r--   1 root supergroup          0 2020-02-16 10:40 har:///outhar/test.har/th1/1.txt
-rw-r--r--   1 root supergroup          0 2020-02-16 10:41 har:///outhar/test.har/th1/2.txt
drwxr-xr-x   - root supergroup          0 2020-02-16 10:41 har:///outhar/test.har/th2
-rw-r--r--   1 root supergroup          0 2020-02-16 10:41 har:///outhar/test.har/th2/3.txt
-rw-r--r--   1 root supergroup          0 2020-02-16 10:41 har:///outhar/test.har/th2/4.txt
# extract sequentially with -cp
[root@hadoop01 /home]# hdfs dfs -cp har:///outhar/test.har/th1 hdfs:/unarchive1
# extraction ok
[root@hadoop01 /home]# hadoop fs -ls /unarchive1
Found 2 items
-rw-r--r--   1 root supergroup          0 2020-02-16 10:45 /unarchive1/1.txt
-rw-r--r--   1 root supergroup          0 2020-02-16 10:45 /unarchive1/2.txt
# extract in parallel with distcp (runs a MapReduce job)
[root@hadoop01 /home]# hadoop distcp har:///outhar/test.har/th2 hdfs:/unarchive2
20/02/16 10:46:26 INFO tools.DistCp: Input Options: DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false, ignoreFailures=false, maxMaps=20, sslConfigurationFile='null', copyStrategy='uniformsize', sourceFileListing=null, sourcePaths=[har:/outhar/test.har/th2], targetPath=hdfs:/unarchive2, targetPathExists=false, preserveRawXattrs=false}
20/02/16 10:46:28 INFO impl.YarnClientImpl: Submitted application application_1581820473891_0002
20/02/16 10:46:28 INFO tools.DistCp: DistCp job-id: job_1581820473891_0002
20/02/16 10:46:37 INFO mapreduce.Job: Job job_1581820473891_0002 completed successfully
# (MapReduce counter output omitted)
# extraction ok
[root@hadoop01 /home]# hadoop fs -ls /unarchive2
Found 2 items
-rw-r--r--   1 root supergroup          0 2020-02-16 10:46 /unarchive2/3.txt
-rw-r--r--   1 root supergroup          0 2020-02-16 10:46 /unarchive2/4.txt
That's all for now; more will be added over time.