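regex - Populate a CSV column based on the value of another column
Problem description
In a Bash script, I want to populate a currently empty column (column 5) based on the value of another column (column 1).
I thought I could use awk to get the desired result, but I'm running into syntax problems: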
awk -F, '
$1~/sld_[a-z]{3}[0-9]{4}_[0-9]{4}_f_[0-9]{3}[a-z]\.tif$/{$5="Text"}
$1~/sld_[a-z]{3}[0-9]{4}_[0-9]{4}_[a-b]_[0-9]{1,3}[a-z]?\.tif$/{$5="Front matter"}
$1~/sld_[a-z]{3}[0-9]{4}_[0-9]{4}_y_[0-9]{1,3}[a-z]?\.tif$/{$5="Back matter"}
$1~/sld_[a-z]{3}[0-9]{4}_[0-9]{4}_z_1[a-z]?\.tif$/{$5="Back matter"}
' file.csv
My input looks like this:
File Name,Item Sequence,Visibility,Title
Masters/sinaimasters/ara/arabic_0695/sld_arb0695_0001_a_1.tif,1,discovery,Front Board Outside
Masters/sinaimasters/ara/arabic_0695/sld_arb0695_0002_a_1a.tif,2,discovery,Front Board Outside
Masters/sinaimasters/ara/arabic_0695/sld_arb0695_0003_b_000.tif,3,discovery,Front Board Inside
Masters/sinaimasters/ara/arabic_0695/sld_arb0695_0009_b_003v.tif,9,discovery,Flyleaf 003v
Masters/sinaimasters/ara/arabic_0695/sld_arb0695_0010_f_001r.tif,10,discovery,f. 001r
Masters/sinaimasters/ara/arabic_0695/sld_arb0695_0060_y_001r.tif,60,discovery,Flyleaf 001r
Masters/sinaimasters/ara/arabic_0695/sld_arb0695_0070_y_999.tif,70,discovery,Back Board Inside
Masters/sinaimasters/ara/arabic_0695/sld_arb0695_0071_z_1.tif,71,discovery,Back Board Outside
Masters/sinaimasters/ara/arabic_0695/sld_arb0695_0072_z_1a.tif,72,discovery,Back Board Outside
Masters/sinaimasters/ara/arabic_0695/sld_arb0695_0073_z_2.tif,73,discovery,Spine
Masters/sinaimasters/ara/arabic_0695/sld_arb0695_0074_z_3.tif,74,discovery,Fore edge
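The desired result should look like this, with the fifth column (IIIF Range) populated with the values I assigned above (Front matter, Text, Back matter, or blank) based on the value of column 1 (File Name):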
File Name,Item Sequence,Visibility,Title,IIIF Range
Masters/sinaimasters/ara/arabic_0695/sld_arb0695_0001_a_1.tif,1,discovery,Front Board Outside,Front matter
Masters/sinaimasters/ara/arabic_0695/sld_arb0695_0002_a_1a.tif,2,discovery,Front Board Outside,Front matter
Masters/sinaimasters/ara/arabic_0695/sld_arb0695_0003_b_000.tif,3,discovery,Front Board Inside,Front matter
Masters/sinaimasters/ara/arabic_0695/sld_arb0695_0009_b_003v.tif,9,discovery,Flyleaf 003v,Front matter
Masters/sinaimasters/ara/arabic_0695/sld_arb0695_0010_f_001r.tif,10,discovery,f. 001r,Text
Masters/sinaimasters/ara/arabic_0695/sld_arb0695_0060_y_001r.tif,60,discovery,Flyleaf 001r,Back matter
Masters/sinaimasters/ara/arabic_0695/sld_arb0695_0070_y_999.tif,70,discovery,Back Board Inside,Back matter
Masters/sinaimasters/ara/arabic_0695/sld_arb0695_0071_z_1.tif,71,discovery,Back Board Outside,Back matter
Masters/sinaimasters/ara/arabic_0695/sld_arb0695_0072_z_1a.tif,72,discovery,Back Board Outside,Back matter
Masters/sinaimasters/ara/arabic_0695/sld_arb0695_0073_z_2.tif,73,discovery,Spine,,
Masters/sinaimasters/ara/arabic_0695/sld_arb0695_0074_z_3.tif,74,discovery,Fore edge,,
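Solution
You can use the ~ operator to match a string against a regular expression pattern, as your attempt already does. What it lacks is an output field separator and a print action. Assigning to $5 makes awk rebuild the record using OFS, which defaults to a single space, so set OFS="," in a BEGIN block. The trailing 1 is an always-true pattern with the default action, so every line is printed: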
awk -F, 'BEGIN{OFS=","}
$1~/sld_[a-z]{3}[0-9]{4}_[0-9]{4}_f_[0-9]{3}[a-z]\.tif$/{$5="Text"}
$1~/sld_[a-z]{3}[0-9]{4}_[0-9]{4}_[a-b]_[0-9]{1,3}[a-z]?\.tif$/{$5="Front matter"}
$1~/sld_[a-z]{3}[0-9]{4}_[0-9]{4}_y_[0-9]{1,3}[a-z]?\.tif$/{$5="Back matter"}
$1~/sld_[a-z]{3}[0-9]{4}_[0-9]{4}_z_1[a-z]?\.tif$/{$5="Back matter"}
1' file.csv
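The command above leaves the header row and the unmatched rows with only four fields. A minimal sketch of one way to also append the IIIF Range header and give unmatched rows an empty fifth field follows. The function name fill_iiif_range and the variables pre, num, ext and label are hypothetical names introduced here, not part of the original answer. The sketch assumes the input has four columns as in the sample and that no field contains an embedded comma. The patterns are written without {n} interval expressions because some awk implementations (older mawk builds, for instance) do not support them.

```shell
#!/usr/bin/env bash
# Sketch only: fill_iiif_range is a hypothetical helper, not from the answer.
# Reads CSV from the named files (or stdin) and writes the result to stdout.
fill_iiif_range() {
  awk -F, -v OFS=, '
    BEGIN {
      # Shared pieces of the file-name pattern, spelled out without {n}.
      d4  = "[0-9][0-9][0-9][0-9]"
      pre = "sld_[a-z][a-z][a-z]" d4 "_" d4 "_"
      num = "[0-9][0-9]?[0-9]?[a-z]?"   # 1 to 3 digits, optional letter
      ext = "\\.tif$"                   # literal ".tif" at end of field
    }
    NR == 1 { print $0, "IIIF Range"; next }   # extend the header row
    { label = "" }                             # default: column 5 stays blank
    $1 ~ (pre "f_[0-9][0-9][0-9][a-z]" ext) { label = "Text" }
    $1 ~ (pre "[ab]_" num ext)              { label = "Front matter" }
    $1 ~ (pre "y_" num ext)                 { label = "Back matter" }
    $1 ~ (pre "z_1[a-z]?" ext)              { label = "Back matter" }
    { $5 = label; print }                      # record is rebuilt with OFS
  ' "$@"
}
```

It could be called as fill_iiif_range file.csv > out.csv. Unmatched rows such as the Spine and Fore edge lines end with a single trailing comma, meaning an empty fifth field.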