bash - How to split a column by strings using bash?
Question
Given a tab-delimited file with eight columns:
22 51244237 rs575160859 C T 100 PASS AC=19;AF=0.00379393;AN=5008;NS=2504;DP=13345;EAS_AF=0;AMR_AF=0.0043;AFR_AF=0;EUR_AF=0.0099;SAS_AF=0.0061;AA=.|||;VT=SNP
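How can I use bash to create, from the information in the eighth column, a new tab-delimited file containing the following columns: AF;EAS_AF;AMR_AF;AFR_AF;EUR_AF;SAS_AF and their corresponding values?
I.e.: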
#AF EAS_AF AMR_AF AFR_AF EUR_AF SAS_AF
0.00379393 0 0.0043 0 0.0099 0.0061
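I know I can split the eighth column on ";" (https://unix.stackexchange.com/questions/156919/splitting-a-column-using-awk) and then remove the unneeded columns and text strings (i.e. "AF="), but is there a more efficient way to do this?
Thanks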
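The split-then-strip idea described in the question can be done in a single awk pass. The sketch below is untested against real VCF files, and `extract_freqs` and `sample_line` are hypothetical helper names. It splits column 8 on ";", splits each piece at its first "=", stores the values by key, and prints the requested keys tab-separated under one "#"-prefixed header. It assumes the input really is tab-delimited and that column 8 holds the semicolon-separated INFO string; a key missing from a line yields an empty field.

```shell
# Hypothetical helper: print AF and the population AFs from column 8, tab-separated.
# Assumes tab-delimited input with the semicolon-separated INFO string in column 8.
extract_freqs() {
  awk -F'\t' -v OFS='\t' '
    BEGIN {
      # Keys to report, in output order
      nkeys = split("AF EAS_AF AMR_AF AFR_AF EUR_AF SAS_AF", keys, " ")
      # One header line, prefixed with "#" as in the question
      hdr = "#" keys[1]
      for (k = 2; k <= nkeys; k++) hdr = hdr OFS keys[k]
      print hdr
    }
    {
      split("", info)                  # portable way to empty the array
      n = split($8, pairs, ";")        # "AC=19", "AF=0.00379393", ...
      for (i = 1; i <= n; i++) {
        eq = index(pairs[i], "=")      # split each pair at its first "="
        if (eq > 0) info[substr(pairs[i], 1, eq - 1)] = substr(pairs[i], eq + 1)
      }
      line = info[keys[1]]
      for (k = 2; k <= nkeys; k++) line = line OFS info[keys[k]]
      print line
    }' "$@"
}

# Hypothetical sample: the line from the question, with real tabs
sample_line() {
  printf '22\t51244237\trs575160859\tC\tT\t100\tPASS\tAC=19;AF=0.00379393;AN=5008;NS=2504;DP=13345;EAS_AF=0;AMR_AF=0.0043;AFR_AF=0;EUR_AF=0.0099;SAS_AF=0.0061;AA=.|||;VT=SNP\n'
}

sample_line | extract_freqs
```

Because values are looked up by exact key name, AF cannot be confused with EAS_AF, and the output is tab-delimited directly, without column -t.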
Solution
Could you please try the following.
awk '
{
match($0,/AF[^;]*/)
af=substr($0,RSTART,RLENGTH)
match($0,/EAS_AF[^;]*/)
eas=substr($0,RSTART,RLENGTH)
match($0,/AMR_AF[^;]*/)
amr=substr($0,RSTART,RLENGTH)
match($0,/AFR_AF[^;]*/)
afr=substr($0,RSTART,RLENGTH)
match($0,/EUR_AF[^;]*/)
eur=substr($0,RSTART,RLENGTH)
match($0,/SAS_AF[^;]*/)
sas=substr($0,RSTART,RLENGTH)
VAL=af OFS ac OFS eas OFS amr OFS afr OFS eur OFS sas
split(VAL,array,"[= ]")
print array[1],array[4],array[6],array[8],array[10],array[12] ORS array[2],array[5],array[7],array[9],array[11],array[13]
}' Input_file | column -t
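The same match()/substr() idea can also be driven by a list of keys instead of one hand-written pair of lines per key. In this sketch (hypothetical helper names `extract_by_match` and `sample_line`; it assumes tab-delimited input with INFO in column 8), the search runs on column 8 only, and each key is matched as ";KEY=" after a ";" is prepended to the column. The unanchored /AF[^;]*/ above finds the leftmost "AF" anywhere in the line, so it relies on AF= appearing before EAS_AF= and on no earlier column containing "AF"; anchoring removes that dependency. It prints one tab-separated row of values per input line, with no header; a missing key leaves an empty field.

```shell
# Hypothetical helper: match()/substr() per key, anchored on ";KEY=" in column 8.
extract_by_match() {
  awk -F'\t' -v OFS='\t' '
    BEGIN { nkeys = split("AF EAS_AF AMR_AF AFR_AF EUR_AF SAS_AF", keys, " ") }
    {
      # Prepend ";" so every key, including the first one, is preceded by ";".
      # A search for ";AF=" then cannot hit the "AF=" inside "EAS_AF=".
      s = ";" $8
      line = ""
      for (k = 1; k <= nkeys; k++) {
        val = ""
        if (match(s, ";" keys[k] "=[^;]*")) {
          # Skip the leading ";", the key itself and the "="
          klen = length(keys[k])
          val = substr(s, RSTART + klen + 2, RLENGTH - klen - 2)
        }
        line = (k == 1) ? val : (line OFS val)
      }
      print line
    }' "$@"
}

# Hypothetical sample: the line from the question, with real tabs
sample_line() {
  printf '22\t51244237\trs575160859\tC\tT\t100\tPASS\tAC=19;AF=0.00379393;AN=5008;NS=2504;DP=13345;EAS_AF=0;AMR_AF=0.0043;AFR_AF=0;EUR_AF=0.0099;SAS_AF=0.0061;AA=.|||;VT=SNP\n'
}

sample_line | extract_by_match
```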
Explanation: here is the same code with comments added.
awk '
{
match($0,/AF[^;]*/) ##Use the built-in awk function match() to find the leftmost "AF" followed by everything up to the next semicolon.
af=substr($0,RSTART,RLENGTH) ##Store the matched text (here AF=0.00379393) in af; substr() starts at RSTART and takes RLENGTH characters.
match($0,/EAS_AF[^;]*/) ##Match EAS_AF up to the next semicolon.
eas=substr($0,RSTART,RLENGTH) ##Store the matched text (e.g. EAS_AF=0) in eas.
match($0,/AMR_AF[^;]*/) ##Match AMR_AF up to the next semicolon.
amr=substr($0,RSTART,RLENGTH) ##Store the matched text in amr.
match($0,/AFR_AF[^;]*/) ##Match AFR_AF up to the next semicolon.
afr=substr($0,RSTART,RLENGTH) ##Store the matched text in afr.
match($0,/EUR_AF[^;]*/) ##Match EUR_AF up to the next semicolon.
eur=substr($0,RSTART,RLENGTH) ##Store the matched text in eur.
match($0,/SAS_AF[^;]*/) ##Match SAS_AF up to the next semicolon.
sas=substr($0,RSTART,RLENGTH) ##Store the matched text in sas.
VAL=af OFS ac OFS eas OFS amr OFS afr OFS eur OFS sas ##Join all matched KEY=value strings with OFS (a space by default). ac is never assigned, so it contributes an empty field between af and eas.
split(VAL,array,"[= ]") ##Split VAL on "=" or space into array, so keys and values land in alternating elements (array[3] is the empty field from ac).
print array[1],array[4],array[6],array[8],array[10],array[12] ORS array[2],array[5],array[7],array[9],array[11],array[13] ##Print the keys on one line and, after ORS (a newline), their values on the next, as the OP asked.
af=ac=eas=amr=afr=eur=sas="" ##Reset all variables before the next input line.
}' Input_file | column -t ##Read Input_file and pipe the awk output to column -t, which aligns it into columns (padded with spaces, not tabs).
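The comments above lean on how match() reports its result. A small illustration on a shortened, made-up INFO string (`demo_match` is a hypothetical name): on success, RSTART is the 1-based start of the leftmost match and RLENGTH its length, which is exactly what substr() needs; on failure, match() returns 0 and sets RSTART to 0 and RLENGTH to -1.

```shell
# Hypothetical demo of match()/RSTART/RLENGTH on a shortened INFO string
demo_match() {
  printf 'AC=19;AF=0.0038;EAS_AF=0\n' | awk '{
    # Success: RSTART = 1-based start, RLENGTH = length of the leftmost match
    match($0, /EAS_AF[^;]*/)
    print RSTART, RLENGTH, substr($0, RSTART, RLENGTH)   # 17 8 EAS_AF=0
    # Failure: match() returns 0, RSTART = 0, RLENGTH = -1
    r = match($0, /SAS_AF[^;]*/)
    print r, RSTART, RLENGTH                             # 0 0 -1
  }'
}

demo_match
```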