python - Python: Use regex to extract a column of a file
Question
I am currently extracting columns in a file by using awk in os.system():
os.system("awk '{print $'%i'}' < infile > outfile"%some_column)
np.loadtxt('outfile')
Is there an equivalent way to accomplish this using regex?
Thanks.
Edit: I want to clarify that I am looking for the most optimal way to extract specific columns of large files.
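Solution
Depending on what delimiter your data uses, a regex may be overkill. If the delimiter is simple (whitespace or a specific character/string), you can separate the columns with the str.split method.
Here is a sample program that shows how it works: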
column = 0  # First column
with open("data.txt") as file:
    data = file.readlines()
columns = list(map(lambda x: x.strip().split()[column], data))
To break this down:
column = 0
# Read a file named "data.txt" into a list of lines
with open("data.txt") as file:
    data = file.readlines()
# This is where we will store the column values as we extract them
columns = []
# Iterate over each line in the file
for line in data:
    # Strip the whitespace (including the trailing newline character) from the
    # start and end of the string
    line = line.strip()
    # Split the line, using the standard delimiter (arbitrary number of
    # whitespace characters)
    line = line.split()
    # Extract the column data from the desired index and store it in our list
    columns.append(line[column])
# columns now holds a list of strings extracted from that column
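Since the question mentions large files and asks about a regex equivalent, here is a minimal sketch of both approaches written as generators, so the whole file never has to sit in memory at once. The function names iter_column and iter_column_regex, the delimiter parameter, and the sample lines are all hypothetical and chosen for illustration; the regex version assumes whitespace-separated fields, like awk's default.

```python
import re
from typing import Iterable, Iterator, Optional


def iter_column(lines: Iterable[str], column: int,
                delimiter: Optional[str] = None) -> Iterator[str]:
    """Yield the field at index `column` from each line, one line at a time.

    `delimiter=None` splits on runs of whitespace, like awk's default.
    Blank lines and lines with too few fields are skipped.
    """
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        fields = line.rstrip("\r\n").split(delimiter)
        if len(fields) > column:
            yield fields[column]


def iter_column_regex(lines: Iterable[str], column: int) -> Iterator[str]:
    """Regex variant: skip `column` whitespace-separated fields, capture the next."""
    pattern = re.compile(r"^\s*(?:\S+\s+){%d}(\S+)" % column)
    for line in lines:
        match = pattern.match(line)
        if match:
            yield match.group(1)


# Hypothetical sample data standing in for the lines of a real file.
sample_lines = ["1.0 2.5 3.0\n", "4.0 5.5 6.0\n", "\n"]

split_result = list(iter_column(sample_lines, 1))
regex_result = list(iter_column_regex(sample_lines, 1))
numbers = [float(value) for value in iter_column(sample_lines, 1)]
```

With a real file, an open file object can be passed directly in place of sample_lines, for example inside "with open("data.txt") as file:", because iterating over a file yields its lines lazily. Note that column here is zero-based, whereas awk's $1 is the first field. For plain whitespace-separated data the split version is usually the simpler choice; the regex version is shown mainly to answer the original question.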