Strange error with a wait statement in a synchronized block

Problem description

An excerpt from Worker.java:

public class Worker extends Thread {

    public void run() {
        // Worker Thread periodically does its job.

        Master.getInstance().decrementNumOfWorkingWorkers();
        // This is the reporting part of the thread.
        // Aimed to wait other threads finish their job.
        synchronized (Master.getInstance().allFinished) {
            while (Master.getInstance().getNumOfWorkingWorkers() > 0) {
                try {
                    Master.getInstance().allFinished.wait();
                } catch (Exception e) {
                    // TODO Auto-generated catch block
                    e.printStackTrace();
                }
            }
            Main.printSync("Worker Thread-" + getPId() + " worked on");
        }
    }
}

And this is from Master.java:

import java.util.LinkedList;
import java.util.Timer;
import java.util.TimerTask;

public class Master extends Timer {

    AllFinished allFinished;
    int day;
    public TimerTask task;
    LinkedList<Worker> Workers;
    private static Master instance = null;
    int numOfWorkingWorkers = 0;

    public class AllFinished
    {

    }

    public class PeriodicIncrement extends TimerTask {
        // Complete this class

        public void run() {

            Main.printSync("Day " + day + ":");
            Main.printSync("Queue: " + TaskQueue.getInstance().ConvertToString());

            day++;
            for (int i = 0; i < Workers.size(); i++) {

                synchronized (Workers.get(i)) {
                    Workers.get(i).notify();
                }
            }

            if (0 == numOfWorkingWorkers) {

                synchronized (allFinished) {
                    allFinished.notifyAll();
                }
                cancel(); // Terminate the timer thread
            }
        }
    }

    private Master(LinkedList<Worker> Workers) {
        super();
        this.task = new PeriodicIncrement();
        day = 0;
        allFinished = new AllFinished();
        this.Workers = Workers;
        numOfWorkingWorkers = this.Workers.size();
        this.schedule(task, 100, 100);
    }

}

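For a test with 4 worker threads, everything was fine until I added the excerpted part to Worker.java. I added that part so that each worker reports what it did once all workers have finished. The algorithm is very simple. When a worker finishes its job, it checks whether there is any work left in the TaskQueue and ProductOwner. If there is none, it breaks out of its loop, decrements the working-worker counter in Master by 1, and then calls wait on Master's AllFinished field. The run() method of PeriodicIncrement checks this counter, and if it is 0 (meaning all workers have finished their jobs), it calls notifyAll() on AllFinished.

The problem is that sometimes two threads enter the excerpted block in Worker.java while the remaining threads never do, so the working-worker counter never drops to 0 and my program never finishes. If I simply comment out the excerpted part of Worker.java, everything is fine, except that the workers finish and report in random order. So the excerpted part seems to be the problem.

Can you help me figure it out?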

Tags: java, multithreading

Solution


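This was fun to debug after not having used low-level concurrency primitives for so long. The trick to finding the root cause is the jstack tool that ships with the JDK.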

$ sudo jstack -l 63978
2020-03-29 21:26:01
Full thread dump Java HotSpot(TM) 64-Bit Server VM (25.231-b11 mixed mode):

"DestroyJavaVM" #18 prio=5 os_prio=31 tid=0x00007ffa91491000 nid=0x1803 waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE

"Timer-0" #17 prio=5 os_prio=31 tid=0x00007ffa91f36800 nid=0x5903 waiting for monitor entry [0x0000700009aa4000]
java.lang.Thread.State: BLOCKED (on object monitor)
at com.amazon.adnumsmissionmanagerservice.homework.ScrumMaster$PeriodicIncrement.run(ScrumMaster.java:42)
- waiting to lock <0x000000076bc70db8> (a com.amazon.adnumsmissionmanagerservice.homework.Programmer)
at java.util.TimerThread.mainLoop(Timer.java:555)
at java.util.TimerThread.run(Timer.java:505)

"Programmer-4" #15 prio=5 os_prio=31 tid=0x00007ffa92cc6800 nid=0x5603 in Object.wait() [0x000070000989e000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
at java.lang.Object.wait(Object.java:502)
at com.amazon.adnumsmissionmanagerservice.homework.Programmer.completeTasksUntilNoneAvailable(Programmer.java:230)
at com.amazon.adnumsmissionmanagerservice.homework.Programmer.work(Programmer.java:165)
- locked <0x000000076bc71a58> (a com.amazon.adnumsmissionmanagerservice.homework.Programmer)
at com.amazon.adnumsmissionmanagerservice.homework.Programmer.run(Programmer.java:241)

"Programmer-3" #14 prio=5 os_prio=31 tid=0x00007ffa923af800 nid=0x5503 in Object.wait() [0x000070000979b000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
at java.lang.Object.wait(Object.java:502)
at com.amazon.adnumsmissionmanagerservice.homework.Programmer.work(Programmer.java:176)
- locked <0x000000076bcc8ac0> (a com.amazon.adnumsmissionmanagerservice.homework.ScrumMaster$AllFinished)
- locked <0x000000076bc70db8> (a com.amazon.adnumsmissionmanagerservice.homework.Programmer)
at com.amazon.adnumsmissionmanagerservice.homework.Programmer.run(Programmer.java:241)

"Programmer-2" #13 prio=5 os_prio=31 tid=0x00007ffa92c25000 nid=0x3f03 in Object.wait() [0x0000700009698000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
at java.lang.Object.wait(Object.java:502)
at com.amazon.adnumsmissionmanagerservice.homework.Programmer.work(Programmer.java:176)
- locked <0x000000076bcc8ac0> (a com.amazon.adnumsmissionmanagerservice.homework.ScrumMaster$AllFinished)
- locked <0x000000076bc5fca0> (a com.amazon.adnumsmissionmanagerservice.homework.Programmer)
at com.amazon.adnumsmissionmanagerservice.homework.Programmer.run(Programmer.java:241)

"Programmer-1" #12 prio=5 os_prio=31 tid=0x00007ffa91fab800 nid=0x4203 in Object.wait() [0x0000700009595000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
at java.lang.Object.wait(Object.java:502)
at com.amazon.adnumsmissionmanagerservice.homework.Programmer.completeTasksUntilNoneAvailable(Programmer.java:230)
at com.amazon.adnumsmissionmanagerservice.homework.Programmer.work(Programmer.java:165)
- locked <0x000000076bc43c80> (a com.amazon.adnumsmissionmanagerservice.homework.Programmer)
at com.amazon.adnumsmissionmanagerservice.homework.Programmer.run(Programmer.java:241)

"Service Thread" #11 daemon prio=9 os_prio=31 tid=0x00007ffa91f30800 nid=0x4403 runnable [0x0000000000000000]
java.lang.Thread.State: RUNNABLE

"C1 CompilerThread2" #10 daemon prio=9 os_prio=31 tid=0x00007ffa9227c000 nid=0x3c03 waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE

"C2 CompilerThread1" #9 daemon prio=9 os_prio=31 tid=0x00007ffa9227b000 nid=0x4603 waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE

"C2 CompilerThread0" #8 daemon prio=9 os_prio=31 tid=0x00007ffa92272800 nid=0x4803 waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE

"JDWP Command Reader" #7 daemon prio=10 os_prio=31 tid=0x00007ffa9200f000 nid=0x3a03 runnable [0x0000000000000000]
java.lang.Thread.State: RUNNABLE

"JDWP Event Helper Thread" #6 daemon prio=10 os_prio=31 tid=0x00007ffa91019800 nid=0x4a03 runnable [0x0000000000000000]
java.lang.Thread.State: RUNNABLE

"JDWP Transport Listener: dt_socket" #5 daemon prio=10 os_prio=31 tid=0x00007ffa9181a000 nid=0x4b07 runnable [0x0000000000000000]
java.lang.Thread.State: RUNNABLE

"Signal Dispatcher" #4 daemon prio=9 os_prio=31 tid=0x00007ffa9180d800 nid=0x3603 waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE

"Finalizer" #3 daemon prio=8 os_prio=31 tid=0x00007ffa91002000 nid=0x3003 in Object.wait() [0x0000700008b77000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x000000076ab08ed8> (a java.lang.ref.ReferenceQueue$Lock)
at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:144)
- locked <0x000000076ab08ed8> (a java.lang.ref.ReferenceQueue$Lock)
at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:165)
at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:216)

"Reference Handler" #2 daemon prio=10 os_prio=31 tid=0x00007ffa92006800 nid=0x2e03 in Object.wait() [0x0000700008a74000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x000000076ab06c00> (a java.lang.ref.Reference$Lock)
at java.lang.Object.wait(Object.java:502)
at java.lang.ref.Reference.tryHandlePending(Reference.java:191)
- locked <0x000000076ab06c00> (a java.lang.ref.Reference$Lock)
at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:153)

"VM Thread" os_prio=31 tid=0x00007ffa90843000 nid=0x2d03 runnable

"GC task thread#0 (ParallelGC)" os_prio=31 tid=0x00007ffa91001800 nid=0x2307 runnable

"GC task thread#1 (ParallelGC)" os_prio=31 tid=0x00007ffa91801800 nid=0x2a03 runnable

"GC task thread#2 (ParallelGC)" os_prio=31 tid=0x00007ffa91802000 nid=0x5303 runnable

"GC task thread#3 (ParallelGC)" os_prio=31 tid=0x00007ffa91802800 nid=0x5203 runnable

"VM Periodic Task Thread" os_prio=31 tid=0x00007ffa91357800 nid=0x3d03 waiting on condition

JNI global references: 2236

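A few observations:

  • The thread Timer-0 (this is your periodic task) is BLOCKED, waiting to lock 0x000000076bc70db8, which is a Programmer instance.
  • There are 4 programmers:
    • 2 of them are still doing some work and each holds one lock of type Programmer (they are in fact holding a lock on themselves).
    • The other 2 programmers are done and show two locks: one on themselves and one on AllFinished. Programmer-3 is an example of such a thread.

Because the periodic task tries to acquire Programmer-3's lock before notifying it, it has to wait for Programmer-3 to release its own lock. Programmer-3 cannot do that, because it is waiting for all tasks to finish: Object.wait() releases only the monitor it is called on (AllFinished here), not the Programmer's own monitor. Deadlock!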
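The shape of this hang can be reproduced in isolation. The following is a minimal sketch, not the original program: `NestedMonitorDemo`, `workerLock`, and `awaitState` are hypothetical names, with `workerLock` standing in for the Programmer instance and `allFinished` for the AllFinished object. It assumes the worker enters its wait before the notifier starts, which the demo enforces by polling thread states.

```java
class NestedMonitorDemo {

    /** Polls until the thread reaches the wanted state or about 5 s pass. */
    private static boolean awaitState(Thread t, Thread.State wanted) throws InterruptedException {
        long deadline = System.nanoTime() + 5_000_000_000L;
        while (t.getState() != wanted) {
            if (System.nanoTime() - deadline > 0) {
                return false;
            }
            Thread.sleep(10);
        }
        return true;
    }

    /** Builds the hang, records the notifier's state, then cleans up. */
    static Thread.State reproduce() {
        final Object workerLock = new Object();   // stands in for the Programmer instance
        final Object allFinished = new Object();  // stands in for AllFinished

        Thread worker = new Thread(() -> {
            synchronized (workerLock) {           // like "public synchronized void work()"
                synchronized (allFinished) {
                    try {
                        while (true) {
                            allFinished.wait();   // releases allFinished only, not workerLock
                        }
                    } catch (InterruptedException e) {
                        // interrupted by the cleanup below: fall through and release both locks
                    }
                }
            }
        }, "worker");

        Thread notifier = new Thread(() -> {
            synchronized (workerLock) {           // blocks while the worker owns this monitor
                workerLock.notify();
            }
            synchronized (allFinished) {
                allFinished.notifyAll();          // not reached while the worker is stuck
            }
        }, "notifier");

        try {
            worker.start();
            awaitState(worker, Thread.State.WAITING);
            notifier.start();
            awaitState(notifier, Thread.State.BLOCKED);
            Thread.State observed = notifier.getState();

            worker.interrupt();                   // break the hang so the demo can exit
            worker.join();
            notifier.join();
            return observed;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("demo interrupted", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("notifier state: " + reproduce());
    }
}
```

The notifier corresponds to Timer-0 in the dump: it is stuck at monitor entry for an object whose owner is itself parked in wait() on a different object, so neither side can make progress without outside help (here, the interrupt).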

The reason your programmers lock themselves is:

public synchronized void work()

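This puts the entire work method inside a synchronized block whose monitor belongs to this. Since the Programmer class is stateless and most of its work does not interact with other threads, you can synchronize just a small part of the work method. So you need to make two changes:

  • Remove synchronized from the signature of work
  • Synchronize the call to wait inside the work method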

    synchronized (this) {
        wait();
    }
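To see the two changes in context, here is a small self-contained simulation, offered as a sketch under assumptions rather than the original program. `NarrowLockSketch`, `tick`, `ticks`, `days`, and `runAll` are hypothetical names; the ticker loop stands in for the Timer task, and a `ticks` counter is added as a guard condition so a notification that arrives before the worker waits is not lost (the original code has no such guard).

```java
import java.util.ArrayList;
import java.util.List;

class NarrowLockSketch {

    static final Object allFinished = new Object(); // stands in for Master.allFinished
    static int working;                             // guarded by allFinished

    static class Worker extends Thread {
        private final int days;   // hypothetical workload: days this worker stays busy
        private int ticks;        // guarded by this

        Worker(int days) { this.days = days; }

        /** Called by the ticker; holds this worker's monitor only briefly. */
        void tick() {
            synchronized (this) {
                ticks++;
                notifyAll();
            }
        }

        @Override
        public void run() {       // deliberately NOT synchronized
            try {
                for (int day = 1; day <= days; day++) {
                    synchronized (this) {          // lock only around the wait
                        while (ticks < day) {
                            wait();
                        }
                    }
                    // the day's actual work would run here, outside any lock
                }
                synchronized (allFinished) {       // this worker's own monitor is free here
                    working--;
                    while (working > 0) {
                        allFinished.wait();
                    }
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    /** Runs the simulation and returns how many days the ticker needed. */
    static int runAll(int workers, int days) {
        List<Worker> pool = new ArrayList<>();
        synchronized (allFinished) {
            working = workers;
        }
        for (int i = 0; i < workers; i++) {
            Worker w = new Worker(days + i);       // staggered so workers finish on different days
            pool.add(w);
            w.start();
        }
        int day = 0;
        try {
            while (true) {
                day++;
                for (Worker w : pool) {
                    w.tick();                      // cannot be blocked by a worker waiting on allFinished
                }
                synchronized (allFinished) {
                    if (working == 0) {
                        allFinished.notifyAll();   // release the workers waiting to report
                        break;
                    }
                }
                Thread.sleep(5);
            }
            for (Worker w : pool) {
                w.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted", e);
        }
        return day;
    }

    public static void main(String[] args) {
        System.out.println("all workers reported after day " + runAll(4, 3));
    }
}
```

Locking on the Worker instance itself mirrors the question's code. A dedicated private lock object is a common alternative, since Thread.join also waits on the thread object's monitor.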
    

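One lesson to take from this is that when using synchronized blocks you always want to synchronize as little code as possible. Anything that does not need synchronization should stay outside the block. This maximizes parallelism (everything that happens inside the block is sequential), may reduce how often the lock is needed (there may be conditions that let you skip lock acquisition altogether if you place it at the lowest level, which reduces synchronization overhead), and in some cases, like this one, avoids deadlocks.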
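As a generic illustration of keeping the synchronized region small, the sketch below computes outside the lock and takes the lock only for the shared update. `CriticalSectionSketch`, `expensive`, `record`, and `run` are hypothetical names, and the arithmetic is only a stand-in for work that touches no shared state.

```java
import java.util.ArrayList;
import java.util.List;

class CriticalSectionSketch {

    private final List<Long> results = new ArrayList<>();   // guarded by "results"

    /** Stand-in for work that touches no shared state. */
    private static long expensive(int n) {
        long sum = 0;
        for (int i = 1; i <= n; i++) {
            sum += (long) i * i;
        }
        return sum;
    }

    void record(int n) {
        long value = expensive(n);      // outside the lock: threads compute in parallel
        synchronized (results) {        // inside the lock: only the shared update
            results.add(value);
        }
    }

    /** Starts the given number of threads and returns how many results were recorded. */
    static int run(int threads) {
        CriticalSectionSketch sketch = new CriticalSectionSketch();
        List<Thread> pool = new ArrayList<>();
        for (int i = 0; i < threads; i++) {
            final int n = 1_000 * (i + 1);
            Thread t = new Thread(() -> sketch.record(n));
            pool.add(t);
            t.start();
        }
        try {
            for (Thread t : pool) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted", e);
        }
        synchronized (sketch.results) {
            return sketch.results.size();
        }
    }

    public static void main(String[] args) {
        System.out.println(run(8) + " results recorded");
    }
}
```

Had record been declared synchronized as a whole, the calls to expensive would run one at a time even though they share nothing.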

