Bash string substitution with %

Problem description

I have a list of files named with this format:

S2_7-CHX-2-5_Chr5.bed
S2_7-CHX-2-13_Chr27.bed
S2_7-CHX-2-0_Chr1.bed 

I need to loop through each file to perform a task. Previously, I had named them without the step 2 indicator ("S2"), and this format had worked perfectly:

for FASTQ in *_clean.bam; do
  SAMPLE=${FASTQ%_clean.bam}
  echo $SAMPLE
  echo $(samtools view -c ${SAMPLE}_clean.bam)
done

But now that I have the S2 preceding what I would like to set as the variable, this returns a list of empty "SAMPLE" variables. How can I rewrite the following code to specify only S2_*.bed?

for FASTQ in S2_*.bed; do
  SAMPLE=${S2_FASTQ%.bed}
  echo $SAMPLE
done

Edit: I'm trying to isolate the unique name from each file, for example "7-CHX-2-13_Chr27" so that I can refer to it later. I can't use the "S2" as part of this because I want to rename the file with "S3" for the next step, and so on.

Example of what I'm trying to use it for:

for FASTQ in S2_*.bed; do
  SAMPLE=${S2_FASTQ%.bed}
  echo $SAMPLE
  #rename each mapping position with UCSC chromosome name using sed
  while IFS=, read -r f1 f2; do
    #rename each file
    echo "  sed "s/${f1}.1/chr${f2}/g" S2_${SAMPLE}_Chr${f2}.bed > S3_${SAMPLE}_Chr${f2}.bed" >> $SCRIPT
  done < $INPUT
done

Tags: bash, variables

Solution


The variable is still named FASTQ; the S2_ is part of its value, not part of its name. ${S2_FASTQ%.bed} expands a different, unset variable called S2_FASTQ, which is why SAMPLE comes out empty.

sample=${FASTQ%.bed}
#        ~~~~~|~~~~
#          |  |  |
#    Variable |  What to remove
#     name    |
#           Remove
#        from the right
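
As a quick check of the suffix removal, the sketch below applies it to one of the file names from the question. The name is assigned by hand here (an assumption for illustration), so no files need to exist.

```shell
#!/usr/bin/env bash
# One of the file names from the question, assigned directly for illustration.
FASTQ="S2_7-CHX-2-13_Chr27.bed"

# % removes the shortest match of the pattern from the end of the value.
sample=${FASTQ%.bed}
printf '%s\n' "$sample"   # prints S2_7-CHX-2-13_Chr27
```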

If you also want to remove the S2_ prefix from $sample, use removal from the left:

sample=${sample#S2_}

The two removals can't be combined in a single expansion; you have to proceed in two steps.
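
Put together, the two steps might look like the sketch below. The helper name sample_name and the S3_ output name are hypothetical, chosen only to illustrate the renaming described in the question; the loop just prints the names and does not touch any files.

```shell
#!/usr/bin/env bash
# sample_name is a hypothetical helper, not part of the original answer.
# It strips the .bed suffix and then the S2_ prefix, in two separate steps.
sample_name() {
  local name="${1%.bed}"   # step 1: remove the suffix from the right
  name="${name#S2_}"       # step 2: remove the prefix from the left
  printf '%s\n' "$name"
}

# Loop over the matching files in the current directory.
for fastq in S2_*.bed; do
  [ -e "$fastq" ] || continue   # skip the literal pattern if nothing matches
  sample=$(sample_name "$fastq")
  echo "$fastq -> S3_${sample}.bed"
done
```

For S2_7-CHX-2-13_Chr27.bed the helper yields 7-CHX-2-13_Chr27, which can then be reused with whatever step prefix comes next.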

Note that I use lowercase variable names. By convention, uppercase names should be reserved for environment and internal shell variables.

