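linux - How to check whether a file exists in a bash script without creating a race condition?
Problem description
Correct me if I'm wrong, but as far as I understand race conditions and TOCTOU (time-of-check to time-of-use) bugs, checking whether a file exists this way:
if [ -f /path/to/file ]; then
    # File exists, do some operations on it
fi
creates a race condition and a TOCTOU bug. So is there another way to check whether a file or directory exists without creating a race condition, or to simply try to open the file and handle the error if it does not exist?
I know that in most scripts the approach above is probably not a big deal, but I would rather make a habit of avoiding it.
Thanks for your help.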
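Solution
To avoid the race condition, you can rename the file as a first step and use that rename as the lock. On many file systems this is an "atomic" operation (a single inode write) that two processes cannot both complete. Note that mv only performs a plain rename when source and destination are on the same file system; across file systems it falls back to copy and delete, which is not atomic.
This way, if the rename succeeds, you know that the file existed and that none of your other processes is using it under its original name.
For example, rename the file using the PID of the current process: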
mv /path/to/file /path/to/file.$$
if [ $? = 0 ] ; then
    # Success: we can work on /path/to/file.$$ and, from our processes'
    # point of view, we are the only one doing so.
    cat /path/to/file.$$ # do something with the file
    # At the end, we can rename/move the file as 'processed'
    mv /path/to/file.$$ /processed_path/to/file
fi
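To see the lock at work, here is a minimal sketch in which five background workers race to claim the same file. The helper claim_file, the worker suffixes and the temporary directory are hypothetical names used only for illustration. Each worker gets its own suffix because inside a ( ... ) subshell $$ still expands to the PID of the parent shell. Assuming everything lives on one file system, exactly one worker should win the rename.

```bash
#!/bin/bash
# Sketch: several workers race to claim the same file with mv.
# claim_file and the worker suffixes are hypothetical, for illustration only.

# Try to claim "$1" by renaming it to "$1.$2".
# The rename succeeds for at most one caller; the others find the source gone.
claim_file() {
    mv -- "$1" "$1.$2" 2>/dev/null
}

workdir="$(mktemp -d)"
echo "some data" > "${workdir}/job.txt"

# Start five competing workers; a worker leaves a marker file if it wins.
for i in 1 2 3 4 5; do
    (
        if claim_file "${workdir}/job.txt" "worker${i}"; then
            : > "${workdir}/winner.${i}"
        fi
    ) &
done
wait

# Count the marker files; the arithmetic expansion strips wc's padding.
winners="$(find "${workdir}" -name 'winner.*' | wc -l)"
winners=$((winners))
echo "workers that claimed the file: ${winners}"
rm -rf "${workdir}"
```

The losing workers do not need a separate existence check: the failed mv is itself the answer to "does the file still exist and is it free?".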
This way, you can also run a recovery process over the files that carry a PID as their extension.
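As a rough sketch of that idea, the hypothetical helper list_orphans below prints the claimed files whose PID extension no longer belongs to a process the caller can signal. It relies on kill -0, which sends no signal and only tests the PID. This is an assumption-laden simplification: a recycled PID would look alive, and a process owned by another user would look dead, which is why the full script further down compares /proc/<PID>/comm instead. The PID 99999999 is assumed not to exist on the test machine.

```bash
#!/bin/bash
# Sketch: list claimed files ("<name>.<PID>") whose claiming process is gone.
# list_orphans is a hypothetical helper, not part of the answer's script.

# Print every file in directory "$1" whose last extension is a PID
# that cannot be signalled any more.
list_orphans() {
    for f in "$1"/*.[0-9]*; do
        [ -e "$f" ] || continue          # the glob matched nothing
        pid="${f##*.}"                   # text after the last dot
        case "$pid" in
            ''|*[!0-9]*) continue ;;     # extension is not purely numeric
        esac
        # kill -0 delivers nothing; it only reports whether the PID can be signalled
        if ! kill -0 "$pid" 2>/dev/null; then
            printf '%s\n' "$f"
        fi
    done
}

workdir="$(mktemp -d)"
: > "${workdir}/a.txt.$$"          # claimed by this shell, which is alive
: > "${workdir}/b.txt.99999999"    # PID assumed not to exist
orphans="$(list_orphans "${workdir}")"
echo "orphaned: ${orphans}"
rm -rf "${workdir}"
```

A recovery pass would then decide, per orphan, whether to rename it back to its original name or to finish the interrupted work.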
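Edit: as advocated by @Thomas, here is a basic implementation of this solution as a bash script, process. It expects a directory tree such as:
[`process` current directory]
 |-->[input] input directory where the script looks for "*.txt" files to process
 |-->[input_path_etl] input directory where the script puts the files for ETL processing
The script requires the /proc file system for a simple process check. For vertical readability, SC2181 has not been applied.
The script processes files with ./process and can recover after a crash with ./process -r, run from its current path. This is only an example to illustrate how to use the mv lock. Here, processing a .txt file consists of a first step, an imaginary load of the file's data into a database, and a second step that produces an imaginary file for an ETL processor.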
#!/bin/bash
# process factory paths, should be read from a config file, LDAP source, whatever...
process_input_path="input"
process_input_etl_path="input_path_etl"
# Example of an imaginary process that loads stdin into a db
load_into_db() {
    return 0;
}
# Example of an imaginary process that cleans the data in the db for recovery
# Parameters: { filename }
# Returns: 0 on successful recovery, 1 otherwise
# filename: path and name of the file that requires cleaning in the db, mandatory
# stderr: potential cleaning errors
clean_db() {
    if [ $# != 1 ] ; then
        echo "ERROR: clean_db, wrong parameters" >&2
        return 1;
    fi
    return 0;
}
# Example of an imaginary process that loads a file into a db
# Parameters: { filename }
# Returns: 0 if successful, 1 if failed
# filename: path and name of the file to process, mandatory
# stderr: potential processing errors
process_first_step() {
    if [ $# != 1 ] ; then
        echo "ERROR: process_first_step, wrong parameters" >&2
        return 1;
    fi
    # first example step, load things from the file into a db
    load_into_db < "$1.$$_1"
    if [ $? = 0 ] ; then
        # first rename the file to mark that step 1 was successfully done,
        # then go on to the second step
        mv "$1.$$_1" "$1.$$_2"
        if [ $? = 0 ] ; then
            # success, the file is ready for step 2
            return 0;
        fi
    fi
    # If we're here, something went wrong in step 1, exiting with error
    return 1;
}
# Example of an imaginary process that puts a file into the input path of an ETL
# Parameters: { filename }
# Returns: 0 if successful, 1 if failed
# filename: path and name of the file to process, mandatory
# stderr: potential processing errors
process_second_step() {
    if [ $# != 1 ] ; then
        echo "ERROR: process_second_step, wrong parameters" >&2
        return 1;
    fi
    # the file is ready for step 2, we create the appropriate input
    # for the ETL with some sed transformation beforehand
    sed 's/line/lInE/g' "$1.$$_2" > "$1.$$_2.etl"
    if [ $? = 0 ] ; then
        # Success, the file is ready for the ETL factory process,
        # we move it in with an atomic mv to make it visible to
        # the ETL factory process
        mv "$1.$$_2.etl" "${process_input_etl_path}/"
        if [ $? = 0 ] ; then
            # Successful, step 2 is done
            return 0;
        fi
    fi
    # If we're here, something went wrong in step 2, exiting with error
    return 1;
}
# Example of an imaginary file processor that conducts all the
# required steps on the provided file
# Parameters: { filename }
# Returns: 0 if successful, 1 otherwise
# filename: path and name of the file to process, mandatory
# stderr: potential processing errors
process_file() {
    if [ $# != 1 ] ; then
        echo "ERROR: process_file, wrong parameters" >&2
        return 1;
    fi
    # Lock the file for processing step one
    mv "$1" "$1.$$_1"
    if [ $? = 0 ] ; then
        # ok, we have the file to ourselves
        # first example step, load things from the file into a db
        process_first_step "$1"
        if [ $? = 0 ] ; then
            # first step is successful, so continue the process
            # next example step, add the loaded lines with transformations into the input path of another process factory (like an ETL)
            process_second_step "$1"
            if [ $? = 0 ] ; then
                # Second step is successful, we can now rename the file with
                # a suffix meaning it was fully processed, a filename that is
                # not visible to the factory process
                mv "$1.$$_2" "${1}_processed"
                if [ $? != 0 ] ; then
                    # if this failed, we have to return an error,
                    # the current file name would be $1.$$_2, not visible
                    # to the process factory, and the error message will mean
                    # that the file was fully processed but can't be renamed
                    # at the end, so no recovery is required
                    echo "ERROR: process_file, $1 can't be renamed as fully processed." >&2
                    return 1;
                fi
                # if we're here, the file was fully processed and renamed accordingly,
                # we return a success status
                return 0;
            fi
        fi
    fi
    # If we're here, something went wrong in the process, we exit with an error.
    # The actual filename will be $1.$$_1 or $1.$$_2 depending on where it was
    # in the processing chain; it will not be visible to the main
    # process factory and the recovery process can then handle it accordingly
    return 1;
}
# Example of an imaginary recovery process for files orphaned by a crash,
# power outage, unexpected reboot, CTRL^C, etc.
# Returns: 0 for success, 1 if error(s)
# stdout: recovery operation info, if any
# stderr: potential error(s)
process_recovery() {
    if [ $# != 0 ] ; then
        echo "ERROR: process_recovery, wrong parameters." >&2
        return 1;
    fi
    # local variables
    local process_FILE=""
    local process_PID=""
    local process_STEP=""
    local process_CMD=""
    # per-file flag that means:
    # 0 : do not recover
    # 1 : recover
    # 2 : can't recover
    # 3 : recovery successful, rename to put the file back in the process
    # 4 : recovery successful, rename it as fully processed
    local recover_status=0
    # flag to check whether the whole recovery process is successful,
    # 0: success
    # 1: error(s)
    local recovery_status=0
    # We can only have one recovery process at a time, check for the corresponding lock, we use an atomic mkdir for that
    mkdir "${process_input_path}/recover" &>/dev/null
    if [ $? != 0 ] ; then
        # if it fails, it means there is probably already a running recovery
        echo "ERROR: process_recovery, a recovery seems to be still in progress." >&2
        echo " If no recovery is running any more (crash)," >&2
        echo " release the lock manually by removing the recover folder." >&2
        echo " Also check that the input folder is writable by the script." >&2
        return 1;
    fi
    # We first have to check every file in the input path that matches
    # a *.txt.<PID>_<step> pattern
    find "${process_input_path}/" -name '*.txt.[0-9]*_[12]' | ( while read -r file_to_check || exit ${recovery_status}; do
        # By default, do not recover
        recover_status=0
        # Get the PID and check if there is a corresponding running process
        process_PID="$(echo "${file_to_check}" | sed 's/^.*\.txt\.\([^_]*\)_[0-9]*$/\1/')"
        if [[ $? != 0 || "${process_PID}" = "${file_to_check}" ]] ; then
            # Something went wrong, we output an error on stderr and set the flag
            echo "ERROR: process_recovery, failed to parse pid from file name ${file_to_check}" >&2
            recovery_status=1;
            recover_status=2;
        else
            # We look up the process through /proc and check that it is one of ours
            process_CMD="$(cat "/proc/${process_PID}/comm" 2>/dev/null)"
            if [[ $? = 0 && "$(echo "${process_CMD}" | grep process.sh)" != "" ]] ; then
                # There is a process.sh with the same PID, no recovery needed
                echo "File ${file_to_check} is processed by PID ${process_PID}..."
            else
                # There is no corresponding process, but it could have finished during
                # our operations, so we check if the file is still here
                if [ -e "${file_to_check}" ] ; then
                    # The file is still here, so we need to recover
                    recover_status=1;
                fi
            fi
        fi
        if [ "${recover_status}" = "1" ] ; then
            # The file should be recovered, signal it
            echo "Recovering file ${file_to_check}..."
            # Get the original file name
            process_FILE="$(echo "${file_to_check}" | sed 's/^\(.*\.txt\)\.[^_]*_[0-9]*$/\1/')"
            if [[ $? != 0 || "${process_FILE}" = "${file_to_check}" ]] ; then
                # Something went wrong, we output an error on stderr and set the flag
                echo "ERROR: process_recovery, failed to parse original name from file name ${file_to_check}" >&2
                recovery_status=1;
                recover_status=2;
            else
                # We need to know at which step it was
                process_STEP="$(echo "${file_to_check}" | sed 's/^.*\.txt\.[^_]*_\([0-9]*\)$/\1/')"
                if [[ $? != 0 || "${process_STEP}" = "${file_to_check}" ]] ; then
                    # Something went wrong, we output an error on stderr and set the flag
                    echo "ERROR: process_recovery, failed to parse step from file name ${file_to_check}" >&2
                    recovery_status=1;
                    recover_status=2;
                fi
            fi
            # Still ok to recover?
            if [ "${recover_status}" = "1" ] ; then
                # check the step
                case "${process_STEP}" in
                    "1")
                        # Do database cleaning for the file, we will revert and rename the file
                        # so it will be processed next by the factory process
                        clean_db "${file_to_check}"
                        if [ $? != 0 ]; then
                            # The cleaning process has failed, signal it
                            echo "ERROR: process_recovery, failed to clean the db for ${file_to_check}" >&2
                            recovery_status=1;
                            recover_status=2;
                        else
                            # Cleaning was successful, rename the file so it will be
                            # visible again to the process factory
                            recover_status=3;
                        fi
                        ;;
                    "2")
                        # If the file is still here, check whether its .etl output already reached the ETL input path
                        # or whether the ETL is processing/has already processed it. process_second_step moves
                        # "<file>.<PID>_2.etl" there, so we look for the base name of the file being checked
                        if [[ -e "${process_input_etl_path}/${file_to_check##*/}.etl" || -e "${process_input_etl_path}/${file_to_check##*/}.etl_processed" ]] ; then
                            # The file has fully completed step 2 and should be marked as processed
                            recover_status=4;
                        else
                            # If the file has not reached the ETL input path, we just have to launch step 2 for the file.
                            # If there is a local .etl file, we aren't sure it was completed before the crash, so a redo of the step will simply overwrite it;
                            # as it is a local file in the current path, it has never been seen by the ETL.
                            # We rename it for processing with the recovery PID
                            echo "Recovering ${file_to_check} on step 2 as ${process_FILE}.$$_2..."
                            mv "${file_to_check}" "${process_FILE}.$$_2"
                            if [ $? != 0 ]; then
                                # The renaming failed, signal it. The file is still here, so a future recovery can handle it, nothing more to do
                                echo "ERROR: process_recovery, failed to rename file ${file_to_check} for step 2" >&2
                                recovery_status=1;
                                recover_status=2;
                            else
                                # File is ready for step 2
                                process_second_step "${process_FILE}"
                                if [ $? != 0 ]; then
                                    # The step 2 redo failed, signal it. The file is still here, so a future recovery can handle it, nothing more to do
                                    echo "ERROR: process_recovery, failed to redo step 2 for ${file_to_check}" >&2
                                    recovery_status=1;
                                    recover_status=2;
                                else
                                    # The file has fully completed step 2 and should be marked as processed
                                    recover_status=4;
                                    # Needed so that the 'processed' part deals with the new filename
                                    file_to_check="${process_FILE}.$$_2"
                                fi
                            fi
                        fi
                        ;;
                    *)
                        # Abnormal situation, unknown step, signal it
                        echo "ERROR: process_recovery, unknown step for ${file_to_check}" >&2
                        recovery_status=1;
                        recover_status=2;
                        ;;
                esac;
                # If the recovery operations were successful, we can now rename the file accordingly
                case "${recover_status}" in
                    "3")
                        # Rename it 'back' so the file will be processed by the process factory next
                        mv "${file_to_check}" "${process_FILE}"
                        if [ $? != 0 ]; then
                            # The renaming failed, signal it. The file is still here, so a future recovery can handle it, nothing more to do
                            echo "ERROR: process_recovery, failed to put back the file ${file_to_check}" >&2
                            recovery_status=1;
                            recover_status=2;
                        else
                            echo "Recovering ${file_to_check}...done, reverted."
                        fi
                        ;;
                    "4")
                        # Rename as already fully processed
                        mv "${file_to_check}" "${process_FILE}_processed"
                        if [ $? != 0 ]; then
                            # The renaming failed, signal it. The file is still here, so a future recovery can handle it, nothing more to do
                            echo "ERROR: process_recovery, failed to rename the fully processed file ${file_to_check}" >&2
                            recovery_status=1;
                            recover_status=2;
                        else
                            echo "Recovering ${file_to_check}...done, processed."
                        fi
                        ;;
                esac;
            fi
        fi
    done )
    if [ $? != 0 ] ; then
        # the recovery processing met errors, we have to exit with error
        recovery_status=1;
    fi
    # Finished, we can remove the recovery lock; there will be no race condition if a second recovery process starts now
    rmdir "${process_input_path}/recover" &>/dev/null
    if [ $? != 0 ] ; then
        echo "ERROR: process_recovery, can't remove the recovery lock, you'll have to remove it manually." >&2
        recovery_status=1;
    fi
    # Return status
    return ${recovery_status};
}
# Example of an imaginary file processing factory
# this factory will look for all files matching '*.txt' in its input path
# Parameters: [ -r ]
# Returns: 0 if all matching files in the input path were processed,
# 1 otherwise
# -r: instead of processing files, launch the recovery process, optional
# stdout: processing log
# stderr: potential processing errors
process_files() {
    if [ $# -gt 1 ]; then
        echo "ERROR: process_files, wrong parameters" >&2
        return 1;
    fi
    if [[ $# = 1 && "$1" = "-r" ]] ; then
        # launch the recovery process and return its exit status
        process_recovery
        return $?
    fi
    if [ $# != 0 ] ; then
        echo "ERROR: process_files, unknown parameters: $*" >&2
        return 1;
    fi
    # Parameter(s) have been processed, we are now looking for files to process
    local process_status=0;
    find "${process_input_path}/" -name '*.txt' | ( while read -r file_to_process || exit ${process_status}; do
        echo "Processing ${file_to_process}..."
        process_file "${file_to_process}"
        if [ $? != 0 ] ; then
            # Something went wrong, signal it on stderr
            echo "Processing ${file_to_process} failed, the file may have been locked by another process or may be in the wrong format." >&2
            # We set the flag to signal trouble but we continue to process
            # the following files
            process_status=1;
        else
            echo "Processing ${file_to_process}...done."
        fi
    done )
    if [ $? != 0 ] ; then
        # the factory processing met errors, we have to exit with error
        return 1;
    fi
    # All matching files were correctly processed or there were no
    # matching files to process, we return a success
    return 0;
}
# The main entry point
# check that we have paths before anything harmful happens
if [[ -z "${process_input_path}" || -z "${process_input_etl_path}" ]] ; then
    echo "ERROR: $0, configuration missing..." >&2
    exit 1;
fi
# Before processing any file, we check for /proc
if [ ! -e "/proc/$$" ] ; then
    echo "ERROR: $0, /proc is required..." >&2
    exit 2;
fi
# We force a common identifier for the processing script, process.sh, so recovery can easily check for running processes
echo "process.sh" > "/proc/$$/comm"
if [ $? != 0 ] ; then
    echo "ERROR: $0, can't set /proc/$$/comm..." >&2
    exit 3;
fi
process_files "$@"
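The recovery function above guards itself with an atomic mkdir and asks for manual cleanup if the lock is left behind. The following sketch shows one possible refinement, not part of the original script: releasing the lock through a trap so that a normal exit or an interrupt removes it. The helper names acquire_lock and release_lock and the temporary lock path are hypothetical. A trap does not run on SIGKILL or a power loss, so the manual procedure described in the error message remains necessary.

```bash
#!/bin/bash
# Sketch: mkdir as a lock, released through a trap.
# acquire_lock, release_lock and the lock path are hypothetical names.

# mkdir fails if the directory already exists, so only one caller succeeds.
acquire_lock() {
    mkdir "$1" 2>/dev/null
}

release_lock() {
    rmdir "$1" 2>/dev/null
}

workdir="$(mktemp -d)"
lock="${workdir}/recover"

# Run the guarded work in a subshell so its EXIT trap fires when it ends.
second_attempt="$(
    acquire_lock "${lock}" || exit 1
    trap 'release_lock "${lock}"' EXIT
    trap 'exit 1' INT TERM
    # While the lock is held, a second attempt must be refused.
    if acquire_lock "${lock}"; then echo "acquired"; else echo "refused"; fi
)"
echo "second attempt while locked: ${second_attempt}"

# The trap has removed the lock, so it can be taken again.
if acquire_lock "${lock}"; then
    after_release="acquired"
    release_lock "${lock}"
else
    after_release="refused"
fi
echo "attempt after release: ${after_release}"
rmdir "${workdir}"
```

The INT and TERM traps only call exit, which in turn triggers the EXIT trap, so the release logic lives in one place.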