Why does the namenode allocate only two replicas instead of 3, even though dfs.replication is set to 3?

Problem description

I've noticed that even though my replication factor is set to 3, the namenode sometimes allocates 2 replicas and sometimes 3 during uploads from a client. Is there a way to always enforce 3 replicas? I found that the dfs.replication.min property is deprecated in Hadoop 2.7.3.
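For context: in Hadoop 2.x the deprecated `dfs.replication.min` key was renamed to `dfs.namenode.replication.min` (a namenode-side setting, default 1), which controls how many replicas must be acknowledged before a block write is considered complete. A minimal hdfs-site.xml fragment, assuming one wanted to raise it (note that a higher value makes writes fail when that many datanodes cannot be reached):

```xml
<!-- hdfs-site.xml on the namenode; replacement for the deprecated
     dfs.replication.min. Writes fail unless at least this many
     replicas can be written in the pipeline. -->
<property>
  <name>dfs.namenode.replication.min</name>
  <value>2</value>
</property>
```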

Can I set this just on my hdfs-client, or do I need to set it on the client, the namenode, and the secondary namenode, and then restart the NN and SNN?

In hdfs-site.xml I have it set to 3 on the namenode, the secondary namenode, and the hdfs-client machine (my local machine):

  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>

Hadoop version info:

> hadoop version
20/08/26 10:57:36 DEBUG util.VersionInfo: version: 2.7.3
Hadoop 2.7.3
Subversion https://git-wip-us.apache.org/repos/asf/hadoop.git -r baa91f7c6bc9cb92be5982de4719c1c8af91ccff
Compiled by root on 2016-08-18T01:41Z
Compiled with protoc 2.5.0

I see the same behavior with dfs.replication=2: sometimes only 1 replica is allocated for the write, sometimes 2.

By the way, I am checking the blocks and their locations with the fsck command:

> hdfs fsck /tmp/file1.txt -files -locations -blocks
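Since fsck's per-block line embeds the actual replica count as `repl=N` (see the outputs below), a small helper can pull that number out of the output. This is an illustrative sketch, assuming the 2.7.x line format shown in this question:

```python
import re

def replica_count(fsck_block_line: str) -> int:
    """Extract the actual replica count from an fsck per-block line,
    e.g. '0. BP-...:blk_1141207020_67468539 len=40 repl=2 [...]'."""
    match = re.search(r"\brepl=(\d+)\b", fsck_block_line)
    if match is None:
        raise ValueError("no repl= field found in fsck line")
    return int(match.group(1))

line = "0. BP-378822342:blk_1141207020_67468539 len=40 repl=2 [...]"
assert replica_count(line) == 2
```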

Update #2

> hdfs fsck /tmp/25082020_test/88.txt -files -locations -blocks
FSCK started by sharad.mishra (auth:SIMPLE) from /10.3.61.108 for path /tmp/25082020_test/88.txt at Thu Aug 27 09:30:29 EDT 2020
/tmp/25082020_test/88.txt 40 bytes, 1 block(s):  OK
0. BP-378822342-x.x.x.x-1515189431494:blk_1141207020_67468539 len=40 repl=2 [DatanodeInfoWithStorage[x.x.x.x:50010,DS-9eca5bb6-5d91-400c-8d59-ea0ed44a330d,DISK], DatanodeInfoWithStorage[x.x.x.x:50010,DS-d4b510b6-a6aa-4139-b9d0-64576ef2de6f,DISK]]

Status: HEALTHY
 Total size:    40 B
 Total dirs:    0
 Total files:   1
 Total symlinks:                0
 Total blocks (validated):      1 (avg. block size 40 B)
 Minimally replicated blocks:   1 (100.0 %)
 Over-replicated blocks:        0 (0.0 %)
 Under-replicated blocks:       0 (0.0 %)
 Mis-replicated blocks:         0 (0.0 %)
 Default replication factor:    3
 Average block replication:     2.0
 Corrupt blocks:                0
 Missing replicas:              0 (0.0 %)
 Number of data-nodes:          42
 Number of racks:               2
FSCK ended at Thu Aug 27 09:30:29 EDT 2020 in 1 milliseconds


The filesystem under path '/tmp/25082020_test/88.txt' is HEALTHY

Update #4

During the copy, the namenode allocated only one replica (a single DatanodeInfoWithStorage entry), and not in a rack-aware way; the second replica was then created asynchronously, rack-aware:

❯ hdfs dfs -Ddfs.replication=2 -copyFromLocal file1.txt /tmp/25082020_test/85.txt
20/08/26 14:08:16 DEBUG util.Shell: setsid is not available on this machine. So not using it.
20/08/26 14:08:16 DEBUG util.Shell: setsid exited with exit code 0
20/08/26 14:08:16 DEBUG conf.Configuration: parsing URL jar:file:/Users/sharad.mishra/Library/hadoop/hadoop-2.7.3/share/hadoop/common/hadoop-common-2.7.3.jar!/core-default.xml
20/08/26 14:08:16 DEBUG conf.Configuration: parsing input stream sun.net.www.protocol.jar.JarURLConnection$JarURLInputStream@4b952a2d
20/08/26 14:08:16 DEBUG conf.Configuration: parsing URL file:/Users/sharad.mishra/Library/hadoop/hadoop-2.7.3/etc/hadoop/core-site.xml
20/08/26 14:08:16 DEBUG conf.Configuration: parsing input stream java.io.BufferedInputStream@528931cf
20/08/26 14:08:16 DEBUG lib.MutableMetricsFactory: field org.apache.hadoop.metrics2.lib.MutableRate org.apache.hadoop.security.UserGroupInformation$UgiMetrics.loginSuccess with annotation @org.apache.hadoop.metrics2.annotation.Metric(always=false, about=, sampleName=Ops, type=DEFAULT, valueName=Time, value=[Rate of successful kerberos logins and latency (milliseconds)])
20/08/26 14:08:16 DEBUG lib.MutableMetricsFactory: field org.apache.hadoop.metrics2.lib.MutableRate org.apache.hadoop.security.UserGroupInformation$UgiMetrics.loginFailure with annotation @org.apache.hadoop.metrics2.annotation.Metric(always=false, about=, sampleName=Ops, type=DEFAULT, valueName=Time, value=[Rate of failed kerberos logins and latency (milliseconds)])
20/08/26 14:08:16 DEBUG lib.MutableMetricsFactory: field org.apache.hadoop.metrics2.lib.MutableRate org.apache.hadoop.security.UserGroupInformation$UgiMetrics.getGroups with annotation @org.apache.hadoop.metrics2.annotation.Metric(always=false, about=, sampleName=Ops, type=DEFAULT, valueName=Time, value=[GetGroups])
20/08/26 14:08:16 DEBUG impl.MetricsSystemImpl: UgiMetrics, User and group related metrics
20/08/26 14:08:16 DEBUG util.KerberosName: Kerberos krb5 configuration not found, setting default realm to empty
20/08/26 14:08:16 DEBUG security.Groups: Creating new Groups object
20/08/26 14:08:16 DEBUG util.NativeCodeLoader: Trying to load the custom-built native-hadoop library...
20/08/26 14:08:16 DEBUG util.NativeCodeLoader: Failed to load native-hadoop with error: java.lang.UnsatisfiedLinkError: no hadoop in java.library.path
20/08/26 14:08:16 DEBUG util.NativeCodeLoader: java.library.path=/Users/sharad.mishra/Library/hadoop/hadoop-2.7.3/lib/native
20/08/26 14:08:16 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
20/08/26 14:08:16 DEBUG util.PerformanceAdvisory: Falling back to shell based
20/08/26 14:08:16 DEBUG security.JniBasedUnixGroupsMappingWithFallback: Group mapping impl=org.apache.hadoop.security.ShellBasedUnixGroupsMapping
20/08/26 14:08:16 DEBUG security.Groups: Group mapping impl=org.apache.hadoop.security.JniBasedUnixGroupsMappingWithFallback; cacheTimeout=300000; warningDeltaMs=5000
20/08/26 14:08:16 DEBUG security.UserGroupInformation: hadoop login
20/08/26 14:08:16 DEBUG security.UserGroupInformation: hadoop login commit
20/08/26 14:08:16 DEBUG security.UserGroupInformation: using local user:UnixPrincipal: sharad.mishra
20/08/26 14:08:16 DEBUG security.UserGroupInformation: Using user: "UnixPrincipal: sharad.mishra" with name sharad.mishra
20/08/26 14:08:16 DEBUG security.UserGroupInformation: User entry: "sharad.mishra"
20/08/26 14:08:16 DEBUG security.UserGroupInformation: UGI loginUser:sharad.mishra (auth:SIMPLE)
20/08/26 14:08:16 DEBUG hdfs.BlockReaderLocal: dfs.client.use.legacy.blockreader.local = false
20/08/26 14:08:16 DEBUG hdfs.BlockReaderLocal: dfs.client.read.shortcircuit = true
20/08/26 14:08:16 DEBUG hdfs.BlockReaderLocal: dfs.client.domain.socket.data.traffic = false
20/08/26 14:08:16 DEBUG hdfs.BlockReaderLocal: dfs.domain.socket.path = /var/lib/hadoop-hdfs/dn_socket
20/08/26 14:08:16 WARN hdfs.DFSUtil: Namenode for eventlog-dev-nameservice remains unresolved for ID nn1. Check your hdfs-site.xml file to ensure namenodes are configured properly.
20/08/26 14:08:16 WARN hdfs.DFSUtil: Namenode for eventlog-dev-nameservice remains unresolved for ID nn2. Check your hdfs-site.xml file to ensure namenodes are configured properly.
20/08/26 14:08:16 DEBUG hdfs.HAUtil: No HA service delegation token found for logical URI hdfs://sample-nameservice
20/08/26 14:08:16 DEBUG hdfs.BlockReaderLocal: dfs.client.use.legacy.blockreader.local = false
20/08/26 14:08:16 DEBUG hdfs.BlockReaderLocal: dfs.client.read.shortcircuit = true
20/08/26 14:08:16 DEBUG hdfs.BlockReaderLocal: dfs.client.domain.socket.data.traffic = false
20/08/26 14:08:16 DEBUG hdfs.BlockReaderLocal: dfs.domain.socket.path = /var/lib/hadoop-hdfs/dn_socket
20/08/26 14:08:16 DEBUG retry.RetryUtils: multipleLinearRandomRetry = null
20/08/26 14:08:16 DEBUG ipc.Server: rpcKind=RPC_PROTOCOL_BUFFER, rpcRequestWrapperClass=class org.apache.hadoop.ipc.ProtobufRpcEngine$RpcRequestWrapper, rpcInvoker=org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker@70e9c95d
20/08/26 14:08:16 DEBUG ipc.Client: getting client out of cache: org.apache.hadoop.ipc.Client@4145bad8
20/08/26 14:08:17 WARN shortcircuit.DomainSocketFactory: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
20/08/26 14:08:17 DEBUG sasl.DataTransferSaslUtil: DataTransferProtocol not using SaslPropertiesResolver, no QOP found in configuration for dfs.data.transfer.protection
20/08/26 14:08:17 DEBUG ipc.Client: The ping interval is 60000 ms.
20/08/26 14:08:17 DEBUG ipc.Client: Connecting to sample-hw-namenode.casalemedia.com/x.x.x.220:8020
20/08/26 14:08:17 DEBUG ipc.Client: IPC Client (1030684756) connection to sample-hw-namenode.casalemedia.com/x.x.x.220:8020 from sharad.mishra: starting, having connections 1
20/08/26 14:08:17 DEBUG ipc.Client: IPC Client (1030684756) connection to sample-hw-namenode.casalemedia.com/x.x.x.220:8020 from sharad.mishra sending #0
20/08/26 14:08:17 DEBUG ipc.Client: IPC Client (1030684756) connection to sample-hw-namenode.casalemedia.com/x.x.x.220:8020 from sharad.mishra got value #0
20/08/26 14:08:17 DEBUG ipc.ProtobufRpcEngine: Call: getFileInfo took 149ms
20/08/26 14:08:17 DEBUG ipc.Client: IPC Client (1030684756) connection to sample-hw-namenode.casalemedia.com/x.x.x.220:8020 from sharad.mishra sending #1
20/08/26 14:08:17 DEBUG ipc.Client: IPC Client (1030684756) connection to sample-hw-namenode.casalemedia.com/x.x.x.220:8020 from sharad.mishra got value #1
20/08/26 14:08:17 DEBUG ipc.ProtobufRpcEngine: Call: getFileInfo took 39ms
20/08/26 14:08:17 DEBUG ipc.Client: IPC Client (1030684756) connection to sample-hw-namenode.casalemedia.com/x.x.x.220:8020 from sharad.mishra sending #2
20/08/26 14:08:17 DEBUG ipc.Client: IPC Client (1030684756) connection to sample-hw-namenode.casalemedia.com/x.x.x.220:8020 from sharad.mishra got value #2
20/08/26 14:08:17 DEBUG ipc.ProtobufRpcEngine: Call: getFileInfo took 38ms
20/08/26 14:08:17 DEBUG hdfs.DFSClient: /tmp/25082020_test/85.txt._COPYING_: masked=rw-r--r--
20/08/26 14:08:17 DEBUG ipc.Client: IPC Client (1030684756) connection to sample-hw-namenode.casalemedia.com/x.x.x.220:8020 from sharad.mishra sending #3
20/08/26 14:08:17 DEBUG ipc.Client: IPC Client (1030684756) connection to sample-hw-namenode.casalemedia.com/x.x.x.220:8020 from sharad.mishra got value #3
20/08/26 14:08:17 DEBUG ipc.ProtobufRpcEngine: Call: create took 39ms
20/08/26 14:08:17 DEBUG hdfs.DFSClient: computePacketChunkSize: src=/tmp/25082020_test/85.txt._COPYING_, chunkSize=516, chunksPerPacket=126, packetSize=65016
20/08/26 14:08:17 DEBUG hdfs.LeaseRenewer: Lease renewer daemon for [DFSClient_NONMAPREDUCE_1494735345_1] with renew id 1 started
20/08/26 14:08:17 DEBUG ipc.Client: IPC Client (1030684756) connection to sample-hw-namenode.casalemedia.com/x.x.x.220:8020 from sharad.mishra sending #4
20/08/26 14:08:17 DEBUG ipc.Client: IPC Client (1030684756) connection to sample-hw-namenode.casalemedia.com/x.x.x.220:8020 from sharad.mishra got value #4
20/08/26 14:08:17 DEBUG ipc.ProtobufRpcEngine: Call: getFileInfo took 36ms
20/08/26 14:08:17 DEBUG hdfs.DFSClient: DFSClient writeChunk allocating new packet seqno=0, src=/tmp/25082020_test/85.txt._COPYING_, packetSize=65016, chunksPerPacket=126, bytesCurBlock=0
20/08/26 14:08:17 DEBUG hdfs.DFSClient: Queued packet 0
20/08/26 14:08:17 DEBUG hdfs.DFSClient: Queued packet 1
20/08/26 14:08:17 DEBUG hdfs.DFSClient: Allocating new block
20/08/26 14:08:17 DEBUG hdfs.DFSClient: Waiting for ack for: 1
20/08/26 14:08:17 DEBUG ipc.Client: IPC Client (1030684756) connection to sample-hw-namenode.casalemedia.com/x.x.x.220:8020 from sharad.mishra sending #5
20/08/26 14:08:17 DEBUG ipc.Client: IPC Client (1030684756) connection to sample-hw-namenode.casalemedia.com/x.x.x.220:8020 from sharad.mishra got value #5
20/08/26 14:08:17 DEBUG ipc.ProtobufRpcEngine: Call: addBlock took 43ms
20/08/26 14:08:17 DEBUG hdfs.DFSClient: pipeline = DatanodeInfoWithStorage[x.x.x.231:50010,DS-9eca5bb6-5d91-400c-8d59-ea0ed44a330d,DISK]
20/08/26 14:08:17 DEBUG hdfs.DFSClient: Connecting to datanode x.x.x.231:50010
20/08/26 14:08:17 DEBUG hdfs.DFSClient: Send buf size 131072
20/08/26 14:08:17 DEBUG ipc.Client: IPC Client (1030684756) connection to sample-hw-namenode.casalemedia.com/x.x.x.220:8020 from sharad.mishra sending #6
20/08/26 14:08:17 DEBUG ipc.Client: IPC Client (1030684756) connection to sample-hw-namenode.casalemedia.com/x.x.x.220:8020 from sharad.mishra got value #6
20/08/26 14:08:17 DEBUG ipc.ProtobufRpcEngine: Call: getServerDefaults took 35ms
20/08/26 14:08:17 DEBUG sasl.SaslDataTransferClient: SASL client skipping handshake in unsecured configuration for addr = /x.x.x.231, datanodeId = DatanodeInfoWithStorage[x.x.x.231:50010,DS-9eca5bb6-5d91-400c-8d59-ea0ed44a330d,DISK]
20/08/26 14:08:17 DEBUG hdfs.DFSClient: DataStreamer block BP-378822342-x.x.x.220-1515189431494:blk_1141207054_67468573 sending packet packet seqno: 0 offsetInBlock: 0 lastPacketInBlock: false lastByteOffsetInBlock: 40
20/08/26 14:08:18 DEBUG hdfs.DFSClient: DFSClient seqno: 0 reply: SUCCESS downstreamAckTimeNanos: 0 flag: 0
20/08/26 14:08:18 DEBUG hdfs.DFSClient: DataStreamer block BP-378822342-x.x.x.220-1515189431494:blk_1141207054_67468573 sending packet packet seqno: 1 offsetInBlock: 40 lastPacketInBlock: true lastByteOffsetInBlock: 40
20/08/26 14:08:18 DEBUG hdfs.DFSClient: DFSClient seqno: 1 reply: SUCCESS downstreamAckTimeNanos: 0 flag: 0
20/08/26 14:08:18 DEBUG hdfs.DFSClient: Closing old block BP-378822342-x.x.x.220-1515189431494:blk_1141207054_67468573
20/08/26 14:08:18 DEBUG ipc.Client: IPC Client (1030684756) connection to sample-hw-namenode.casalemedia.com/x.x.x.220:8020 from sharad.mishra sending #7
20/08/26 14:08:18 DEBUG ipc.Client: IPC Client (1030684756) connection to sample-hw-namenode.casalemedia.com/x.x.x.220:8020 from sharad.mishra got value #7
20/08/26 14:08:18 DEBUG ipc.ProtobufRpcEngine: Call: complete took 37ms
20/08/26 14:08:18 DEBUG ipc.Client: IPC Client (1030684756) connection to sample-hw-namenode.casalemedia.com/x.x.x.220:8020 from sharad.mishra sending #8
20/08/26 14:08:19 DEBUG ipc.Client: IPC Client (1030684756) connection to sample-hw-namenode.casalemedia.com/x.x.x.220:8020 from sharad.mishra got value #8
20/08/26 14:08:19 DEBUG ipc.ProtobufRpcEngine: Call: rename took 875ms
20/08/26 14:08:19 DEBUG ipc.Client: stopping client from cache: org.apache.hadoop.ipc.Client@4145bad8
20/08/26 14:08:19 DEBUG ipc.Client: removing client from cache: org.apache.hadoop.ipc.Client@4145bad8
20/08/26 14:08:19 DEBUG ipc.Client: stopping actual client because no more references remain: org.apache.hadoop.ipc.Client@4145bad8
20/08/26 14:08:19 DEBUG ipc.Client: Stopping client
20/08/26 14:08:19 DEBUG ipc.Client: IPC Client (1030684756) connection to sample-hw-namenode.casalemedia.com/x.x.x.220:8020 from sharad.mishra: closed
20/08/26 14:08:19 DEBUG ipc.Client: IPC Client (1030684756) connection to sample-hw-namenode.casalemedia.com/x.x.x.220:8020 from sharad.mishra: stopped, remaining connections 0

Output of the fsck command:

❯ hdfs fsck /tmp/25082020_test/85.txt -files -locations -blocks
20/08/27 13:01:37 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
20/08/27 13:01:38 WARN hdfs.DFSUtil: Namenode for eventlog-dev-nameservice remains unresolved for ID nn1.  Check your hdfs-site.xml file to ensure namenodes are configured properly.
20/08/27 13:01:38 WARN hdfs.DFSUtil: Namenode for eventlog-dev-nameservice remains unresolved for ID nn2.  Check your hdfs-site.xml file to ensure namenodes are configured properly.
20/08/27 13:01:38 WARN shortcircuit.DomainSocketFactory: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
Connecting to namenode via http://sample-hw-namenode.casalemedia.com:50070/fsck?ugi=sharad.mishra&files=1&locations=1&blocks=1&path=%2Ftmp%2F25082020_test%2F85.txt
FSCK started by sharad.mishra (auth:SIMPLE) from /10.3.61.108 for path /tmp/25082020_test/85.txt at Thu Aug 27 13:01:38 EDT 2020
/tmp/25082020_test/85.txt 40 bytes, 1 block(s):  OK
0. BP-378822342-x.x.x.220-1515189431494:blk_1141207054_67468573 len=40 repl=2 [DatanodeInfoWithStorage[x.x.x.231:50010,DS-05f16460-cb85-41e3-98e1-f6f7366b2738,DISK], DatanodeInfoWithStorage[10.7.24.197:50010,DS-fa4ebf78-9bfc-404f-b1d6-909098c0b394,DISK]]

Status: HEALTHY
 Total size:    40 B
 Total dirs:    0
 Total files:   1
 Total symlinks:                0
 Total blocks (validated):      1 (avg. block size 40 B)
 Minimally replicated blocks:   1 (100.0 %)
 Over-replicated blocks:        0 (0.0 %)
 Under-replicated blocks:       0 (0.0 %)
 Mis-replicated blocks:         0 (0.0 %)
 Default replication factor:    3
 Average block replication:     2.0
 Corrupt blocks:                0
 Missing replicas:              0 (0.0 %)
 Number of data-nodes:          42
 Number of racks:               2
FSCK ended at Thu Aug 27 13:01:38 EDT 2020 in 0 milliseconds


The filesystem under path '/tmp/25082020_test/85.txt' is HEALTHY
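The asynchronous catch-up observed in update #4 is consistent with HDFS's minimum-replication rule: a write pipeline only has to reach `dfs.namenode.replication.min` replicas (default 1) for the close to succeed, and the namenode schedules any remaining replicas in the background, which is why fsck can transiently report `repl=2` against a target of 3. A toy model of that rule (not HDFS source; the names and values are illustrative):

```python
# Hedged sketch: models why a write can succeed with fewer replicas
# than dfs.replication, and how many replicas the namenode still owes.

DFS_REPLICATION = 3           # target replica count (client-side setting)
NAMENODE_REPLICATION_MIN = 1  # replicas required for a successful close

def close_succeeds(acked_replicas: int) -> bool:
    """A file close is accepted once the minimum replica count is met."""
    return acked_replicas >= NAMENODE_REPLICATION_MIN

def pending_async_replicas(acked_replicas: int) -> int:
    """Replicas the namenode still schedules in the background."""
    return max(0, DFS_REPLICATION - acked_replicas)

# Two replicas acked by the pipeline: the write succeeds, and one more
# replica is expected to be created asynchronously later.
assert close_succeeds(2)
assert pending_async_replicas(2) == 1
```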

Tags: hadoop, hdfs, replication
