Writing to the same fifo pipe from multiple processes started via xargs causes lines to be lost

Problem description

I have a script in which I parallelize job execution while monitoring the progress. I do this with xargs and a named fifo pipe. My problem is that although xargs itself behaves well, some of the lines written to the pipe get lost. Any idea what the problem is?

For example, the following script (essentially my script with dummy data) produces the output below and then hangs at the end, waiting for the missing lines:

$ bash test2.sh 
Progress: 0 of 99
DEBUG: Processed data 0 in separate process
Progress: 1 of 99
DEBUG: Processed data 1 in separate process
Progress: 2 of 99
DEBUG: Processed data 2 in separate process
Progress: 3 of 99
DEBUG: Processed data 3 in separate process
Progress: 4 of 99
DEBUG: Processed data 4 in separate process
Progress: 5 of 99
DEBUG: Processed data 5 in separate process
DEBUG: Processed data 6 in separate process
DEBUG: Processed data 7 in separate process
DEBUG: Processed data 8 in separate process
Progress: 6 of 99
DEBUG: Processed data 9 in separate process
Progress: 7 of 99
##### Script is hanging here (Could happen for any line) #####

test2.sh:

#!/bin/bash
clear

printStateInLoop() {
  local pipe="$1"
  local total="$2"
  local finished=0

  echo "Progress: $finished of $total"
  while true; do
    if [ $finished -ge $total ]; then
      break
    fi

    read line <"$pipe"
    let finished++
    # In final script I would need to do more than just logging
    echo "Progress: $finished of $total"
  done
}

processData() {
  local number=$1
  local pipe=$2

  sleep 1 # Work needs time
  echo "$number" >"$pipe"
  echo "DEBUG: Processed data $number in separate process"
}
export -f processData

process() {
  TMP_DIR=$(mktemp -d)
  PROGRESS_PIPE="$TMP_DIR/progress-pipe"
  mkfifo "$PROGRESS_PIPE"

  DATA_VECTOR=($(seq 0 1 99)) # A bunch of data
  printf '%s\0' "${DATA_VECTOR[@]}" | xargs -0 --max-args=1 --max-procs=5 -I {} bash -c "processData \$@ \"$PROGRESS_PIPE\"" _ {} &

  printStateInLoop "$PROGRESS_PIPE" ${#DATA_VECTOR[@]}
}

process
rm -Rf "$TMP_DIR"

In another post I got the suggestion to switch to while read line; do … done < "$pipe" (the function below) instead of while true; do … read line < "$pipe" … done, so that the pipe is not closed after every line that is read. This reduced the frequency of the problem, but it still happens: some lines are missing, and sometimes xargs: bash: terminated by signal 13 shows up.

printStateInLoop() {
  local pipe="$1"
  local total="$2"
  local finished=0

  echo "Progress: $finished of $total"
  while [ $finished -lt $total ]; do
    while read line; do
      let finished++
      # In final script I would need to do more than just logging
      echo "Progress: $finished of $total"
    done <"$pipe"
  done
}
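The signal 13 mentioned above is SIGPIPE: a writer that still has the fifo open gets killed when it writes after the reader has closed the read end. Below is a minimal standalone sketch (not taken from the script above, purely illustrative) that reproduces the symptom:

#!/bin/bash
# Illustrative only: reproduce the "terminated by signal 13" symptom.
tmp=$(mktemp -d)
mkfifo "$tmp/p"

# Reader: consume exactly one line, then close its end of the fifo.
( read line <"$tmp/p"; echo "reader got: $line" ) &

# Writer: keeps the fifo open; the second echo writes after the reader
# is gone, so the subshell is killed by SIGPIPE.
( exec >"$tmp/p"; echo "line 1"; sleep 1; echo "line 2" )
echo "writer exit status: $?" # 141 = 128 + 13 (SIGPIPE)

rm -rf "$tmp"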

Many people on SO suggest using parallel or pv for this. Unfortunately, these tools are not available on the very limited target platforms. Instead, my script relies on xargs.

Tags: bash, shell, concurrency, xargs, mkfifo

Solution

The solution (as pointed out by @markp-fuso and @Dale) is to create a file lock.

Instead of:

echo "$number" >"$pipe"

I now use flock to create/wait for the lock first:

flock "$pipe.lock" echo "$number" >"$pipe"
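For context, this is how that line slots into the processData function from the script above (a sketch assuming the rest of the script stays unchanged):

processData() {
  local number=$1
  local pipe=$2

  sleep 1 # Work needs time
  # flock waits for an exclusive lock on "$pipe.lock" before running echo,
  # so at most one xargs worker writes its line into the fifo at a time.
  flock "$pipe.lock" echo "$number" >"$pipe"
  echo "DEBUG: Processed data $number in separate process"
}
export -f processData

flock creates "$pipe.lock" if it does not exist yet, and since it sits next to the fifo in $TMP_DIR, the rm -Rf "$TMP_DIR" at the end of the script removes it as well.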
