HDFS bugs encountered in Hadoop 2.6.0 and the patches that fix them

Posted on 2016-4-22 11:42:32



https://issues.apache.org/jira/browse/HDFS-7489  [Resolved]
Bug description
Incorrect locking in FsVolumeList#checkDirs can hang datanodes
After upgrading to 2.5.0 (CDH 5.2.1), we started to see datanodes hang both their heartbeats and requests from clients. After some digging, I identified the culprit as the checkDiskError() call triggered by catching IOExceptions (in our case, SocketExceptions triggered on one datanode by ReplicaAlreadyExistsExceptions on another datanode).
Thread dumps reveal that the checkDiskErrors() thread is holding a lock on the FsVolumeList:
"Thread-409" daemon prio=10 tid=0x00007f4e50200800 nid=0x5b8e runnable [0x00007f4e2f855000]
   java.lang.Thread.State: RUNNABLE
        at java.io.UnixFileSystem.list(Native Method)
        at java.io.File.list(File.java:973)
        at java.io.File.listFiles(File.java:1051)
        at org.apache.hadoop.util.DiskChecker.checkDirs(DiskChecker.java:89)
        at org.apache.hadoop.util.DiskChecker.checkDirs(DiskChecker.java:91)
        at org.apache.hadoop.util.DiskChecker.checkDirs(DiskChecker.java:91)
        at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.checkDirs(BlockPoolSlice.java:257)
        at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.checkDirs(FsVolumeImpl.java:210)
        at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList.checkDirs(FsVolumeList.java:180)
        - locked <0x000000063b182ea0> (a org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList)
        at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.checkDataDir(FsDatasetImpl.java:1396)
        at org.apache.hadoop.hdfs.server.datanode.DataNode$5.run(DataNode.java:2832)
        at java.lang.Thread.run(Thread.java:662)
Other threads would then lock the FsDatasetImpl while waiting for the FsVolumeList, e.g.:
"DataXceiver for client  at /10.10.0.52:46643 [Receiving block BP-1573746465-127.0.1.1-1352244533715:blk_1073770670_1099996962574]" daemon prio=10 tid=0x00007f4e55561000 nid=0x406d waiting for monitor entry [0x00007f4e3106d000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList.getNextVolume(FsVolumeList.java:64)
        - waiting to lock <0x000000063b182ea0> (a org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList)
        at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createTemporary(FsDatasetImpl.java:927)
        - locked <0x000000063b1f9a48> (a org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
        at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createTemporary(FsDatasetImpl.java:101)
        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.<init>(BlockReceiver.java:167)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:604)
        at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:126)
        at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:72)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:225)
        at java.lang.Thread.run(Thread.java:662)
That lock on the FsDatasetImpl then causes other threads to block:
"Thread-127" daemon prio=10 tid=0x00007f4e4c67d800 nid=0x2e02 waiting for monitor entry [0x00007f4e33390000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at org.apache.hadoop.hdfs.server.datanode.BlockSender.<init>(BlockSender.java:228)
        - waiting to lock <0x000000063b1f9a48> (a org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
        at org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner.verifyBlock(BlockPoolSliceScanner.java:436)
        at org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner.verifyFirstBlock(BlockPoolSliceScanner.java:523)
        at org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner.scan(BlockPoolSliceScanner.java:684)
        at org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner.scanBlockPoolSlice(BlockPoolSliceScanner.java:650)
        at org.apache.hadoop.hdfs.server.datanode.DataBlockScanner.run(DataBlockScanner.java:101)

"DataNode: [[[DISK]file:/data/0/dfs/dn/, [DISK]file:/data/1/dfs/dn/, [DISK]file:/data/2/dfs/dn/, [DISK]file:/data/3/dfs/dn/, [DISK]file:/data/4/dfs/dn/, [DISK]file:/data/5/dfs/dn/, [DISK]file:/data/6/dfs/dn/, [DISK]file:/data/7/dfs/dn/, [DISK]file:/data/8/dfs/dn/, [DISK]file:/data/9/dfs/dn/, [DISK]file:/data/10/dfs/dn/, [DISK]file:/data/11/dfs/dn/, [DISK]file:/data/12/dfs/dn/, [DISK]file:/data/13/dfs/dn/, [DISK]file:/data/14/dfs/dn/, [DISK]file:/data/15/dfs/dn/, [DISK]file:/data/16/dfs/dn/, [DISK]file:/data/17/dfs/dn/, [DISK]file:/data/18/dfs/dn/, [DISK]file:/data/19/dfs/dn/, [DISK]file:/data/20/dfs/dn/, [DISK]file:/data/21/dfs/dn/, [DISK]file:/data/22/dfs/dn/, [DISK]file:/data/23/dfs/dn/, [DISK]file:/data/24/dfs/dn/, [DISK]file:/data/25/dfs/dn/, [DISK]file:/data/26/dfs/dn/, [DISK]file:/data/27/dfs/dn/, [DISK]file:/data/28/dfs/dn/, [DISK]file:/data/29/dfs/dn/, [DISK]file:/data/30/dfs/dn/, [DISK]file:/data/31/dfs/dn/, [DISK]file:/data/32/dfs/dn/, [DISK]file:/data/33/dfs/dn/, [DISK]file:/data/34/dfs/dn/, [DISK]file:/data/35/dfs/dn/]]  heartbeating to bigdata-01.sc-chi-int.37signals.com/10.10.0.211:8020" daemon prio=10 tid=0x00007f4e553a5800 nid=0x2d66 waiting for monitor entry [0x00007f4e361be000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.getDfsUsed(FsVolumeImpl.java:116)
        - waiting to lock <0x000000063b1f9a48> (a org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
        at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getStorageReports(FsDatasetImpl.java:132)
        - locked <0x000000063b182d80> (a java.lang.Object)
        at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.sendHeartBeat(BPServiceActor.java:572)
        at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:677)
        at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:847)
        at java.lang.Thread.run(Thread.java:662)

"DataXceiver for client DFSClient_NONMAPREDUCE_-1948416574_103 at /10.10.0.169:44229 [Receiving block BP-1573746465-127.0.1.1-1352244533715:blk_1073776794_1099996963072]" daemon prio=10 tid=0x00007f4e55431000 nid=0x4ab8 waiting for monitor entry [0x00007f4e2a448000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:782)
        - waiting to lock <0x000000063b1f9a48> (a org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
        at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:101)
        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.<init>(BlockReceiver.java:171)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:604)
        at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:126)
        at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:72)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:225)
        at java.lang.Thread.run(Thread.java:662)
This occurs, I believe, because FsVolumeList#checkDirs() locks the entire FsVolumeList to prevent multiple simultaneous checkDirs() calls. On a slow system under high IO or with many disks, the checkDirs() call can take dozens of seconds or longer (empirically).
I think this can be improved by holding a separate mutex limited to checkDirs (much like datanode.checkDiskErrors() does) and only locking the full FsVolumeList when the volume list needs to be updated because disks were removed. I'll attach a patch that does this.
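The shape of that change can be sketched roughly as follows. This is only a minimal illustration of the locking strategy described above, not the actual HDFS-7489 patch; VolumeList, Volume, isHealthy() and checkDirsMutex are simplified stand-ins for the real FsVolumeList/FsVolumeImpl code:

import java.util.ArrayList;
import java.util.List;

// Minimal sketch, NOT the actual HDFS-7489 patch: names are simplified stand-ins.
// checkDirs() serializes on its own small mutex, so a slow disk scan no longer
// holds the monitor that getNextVolume() and other readers need; the shared
// volume list is only locked briefly when failed volumes have to be removed.
class VolumeList {
    interface Volume { boolean isHealthy(); }                // hypothetical stand-in for FsVolumeImpl

    private final Object checkDirsMutex = new Object();      // guards checkDirs() only
    private final List<Volume> volumes = new ArrayList<>();

    List<Volume> checkDirs() {
        synchronized (checkDirsMutex) {                       // long-running scan holds only this local lock
            List<Volume> failed = new ArrayList<>();
            for (Volume v : snapshot()) {
                if (!v.isHealthy()) {                         // may take seconds per disk
                    failed.add(v);
                }
            }
            if (!failed.isEmpty()) {
                synchronized (this) {                         // short critical section on the list itself
                    volumes.removeAll(failed);
                }
            }
            return failed;
        }
    }

    synchronized Volume getNextVolume() {                     // no longer blocked behind a slow checkDirs()
        return volumes.isEmpty() ? null : volumes.get(0);
    }

    private synchronized List<Volume> snapshot() {
        return new ArrayList<>(volumes);                      // copy, so the scan iterates without the list lock
    }
}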
We're running this patch in production and it's working as expected – a lock is held locally instead of on the entire FsVolumeList while checkDirs() is running:
"Thread-614" daemon prio=10 tid=0x000000004037b000 nid=0x7331 runnable [0x00007f4d45391000]
   java.lang.Thread.State: RUNNABLE
        at java.io.UnixFileSystem.createDirectory(Native Method)
        at java.io.File.mkdir(File.java:1157)
        at org.apache.hadoop.util.DiskChecker.mkdirsWithExistsCheck(DiskChecker.java:67)
        at org.apache.hadoop.util.DiskChecker.checkDir(DiskChecker.java:104)
        at org.apache.hadoop.util.DiskChecker.checkDirs(DiskChecker.java:88)
        at org.apache.hadoop.util.DiskChecker.checkDirs(DiskChecker.java:91)
        at org.apache.hadoop.util.DiskChecker.checkDirs(DiskChecker.java:91)
        at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.checkDirs(BlockPoolSlice.java:257)
        at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.checkDirs(FsVolumeImpl.java:210)
        at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList.checkDirs(FsVolumeList.java:182)
        - locked <0x000000063b24f540> (a java.lang.Object)
        at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.checkDataDir(FsDatasetImpl.java:1396)
        at org.apache.hadoop.hdfs.server.datanode.DataNode$5.run(DataNode.java:2832)
        at java.lang.Thread.run(Thread.java:662)
Feedback would be appreciated!


https://issues.apache.org/jira/browse/HDFS-7999 [Resolved]
Bug description
FsDatasetImpl#createTemporary sometimes holds the FSDatasetImpl lock for a very long time

I'm using 2.6.0 and noticed that the DN's heartbeats were sometimes delayed for a very long time, say more than 100 seconds. I captured jstack output twice, and it looks like the heartbeat threads are all blocked (at getStorageReports) on the dataset lock, which is held by a thread calling createTemporary, which in turn is blocked waiting for the earlier incarnation of the writer to exit.
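The general shape of the problem, and of the kind of fix applied for HDFS-7999, can be sketched as follows. This is only an illustration under simplified stand-in names (Dataset, Replica, takeWriter()), not the actual FsDatasetImpl code; the real patch also re-checks the replica state after re-acquiring the lock:

// Simplified sketch, not the actual FsDatasetImpl code. The point is that
// Thread.join() on the old writer must not happen while the dataset-wide
// monitor is held, otherwise heartbeats and other readers block behind it.
class Dataset {
    static class Replica {
        private Thread writer;
        Replica(Thread writer) { this.writer = writer; }
        synchronized Thread takeWriter() { Thread w = writer; writer = null; return w; }
    }

    private final Object datasetLock = new Object();          // stand-in for the FsDatasetImpl monitor

    Replica createTemporary(Replica existing) throws InterruptedException {
        Thread oldWriter;
        synchronized (datasetLock) {                           // short critical section: inspect state only
            oldWriter = existing.takeWriter();
        }
        if (oldWriter != null) {
            oldWriter.interrupt();
            oldWriter.join();                                  // potentially slow; done WITHOUT the dataset lock
        }
        synchronized (datasetLock) {                           // re-acquire briefly to install the new replica
            return new Replica(Thread.currentThread());
        }
    }
}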

The heartbeat thread stack:
   java.lang.Thread.State: BLOCKED (on object monitor)
        at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.getDfsUsed(FsVolumeImpl.java:152)
        - waiting to lock <0x00000007b01428c0> (a org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
        at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getStorageReports(FsDatasetImpl.java:144)
        - locked <0x00000007b0140ed0> (a java.lang.Object)
        at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.sendHeartBeat(BPServiceActor.java:575)
        at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:680)
        at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:850)
        at java.lang.Thread.run(Thread.java:662)

The DataXceiver thread holds the dataset lock:
"DataXceiver for client at XXXXX" daemon prio=10 tid=0x00007f14041e6480 nid=0x52bc in Object.wait() [0x00007f11d78f7000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        at java.lang.Thread.join(Thread.java:1194)
        - locked <0x00000007a33b85d8> (a org.apache.hadoop.util.Daemon)
        at org.apache.hadoop.hdfs.server.datanode.ReplicaInPipeline.stopWriter(ReplicaInPipeline.java:183)
        at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createTemporary(FsDatasetImpl.java:1231)
        - locked <0x00000007b01428c0> (a org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
        at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createTemporary(FsDatasetImpl.java:114)
        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.<init>(BlockReceiver.java:179)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:615)
        at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137)
        at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235)
        at java.lang.Thread.run(Thread.java:662)





https://issues.apache.org/jira/browse/YARN-3197  [Resolved]
Bug description
Confusing log generated by CapacityScheduler
2015-02-12 20:35:39,968 INFO capacity.CapacityScheduler (CapacityScheduler.java:completedContainer(1190)) - Null container completed...
2015-02-12 20:35:39,968 INFO capacity.CapacityScheduler (CapacityScheduler.java:completedContainer(1190)) - Null container completed...
2015-02-12 20:35:39,968 INFO capacity.CapacityScheduler (CapacityScheduler.java:completedContainer(1190)) - Null container completed...
2015-02-12 20:35:40,960 INFO capacity.CapacityScheduler (CapacityScheduler.java:completedContainer(1190)) - Null container completed...
2015-02-12 20:35:40,960 INFO capacity.CapacityScheduler (CapacityScheduler.java:completedContainer(1190)) - Null container completed...


https://issues.apache.org/jira/browse/HDFS-7060 [Still being fixed at the time of this post]
Bug description
Avoid taking locks when sending heartbeats from the DataNode
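The JIRA title is the only detail given here, but the motivation is visible in the stack traces above: the heartbeat thread should not have to contend for the dataset-wide lock just to assemble a storage report. A rough sketch of that general idea using thread-safe counters, with purely illustrative names (StorageReporter, VolumeUsage), and not the actual HDFS-7060 patch:

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

// Hedged sketch of the general idea only, not the actual HDFS-7060 patch:
// keep per-volume usage in thread-safe counters so the heartbeat thread can
// build its storage report without acquiring the dataset-wide lock.
class StorageReporter {
    static class VolumeUsage {
        final String storageId;
        final AtomicLong dfsUsed = new AtomicLong();           // updated by writers as blocks are finalized

        VolumeUsage(String storageId) { this.storageId = storageId; }
    }

    private final List<VolumeUsage> volumes = new ArrayList<>();

    void addVolume(String storageId) {                         // called once per data directory at startup
        volumes.add(new VolumeUsage(storageId));
    }

    void addBytes(int volumeIndex, long bytes) {
        volumes.get(volumeIndex).dfsUsed.addAndGet(bytes);     // lock-free update from writer threads
    }

    // Called from the heartbeat thread: only atomic reads of the per-volume
    // counters, no dataset-wide monitor is taken.
    List<String> getStorageReports() {
        List<String> reports = new ArrayList<>(volumes.size());
        for (VolumeUsage v : volumes) {
            reports.add(v.storageId + " used=" + v.dfsUsed.get());
        }
        return reports;
    }
}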

