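python - Parsing a generated file in Python
Problem description
I am trying to parse a generated file into a list of objects.
Unfortunately, the structure of the generated files is not always the same, but they all contain the same fields (along with a lot of other garbage).
For example: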
function foo(); # Don't Care
function maybeanotherfoo(); # Don't Care
int maybemoregarbage; # Don't Care
product_serial = "CDE1102"; # I want this <---------------------
unnecessary_info1 = 10; # Don't Care
unnecessary_info2 = "red" # Don't Care
product_id = 1134412; # I want this <---------------------
unnecessary_info3 = "88" # Don't Care
product_serial = "DD1232"; # I want this <---------------------
product_id = 3345111; # I want this <---------------------
unnecessary_info1 = "22" # Don't Care
unnecessary_info2 = "panda" # Don't Care
product_serial = "CDE1102"; # I want this <---------------------
unnecessary_info1 = 10; # Don't Care
unnecessary_info2 = "red" # Don't Care
unnecessary_info3 = "bear" # Don't Care
unnecessary_info4 = 119 # Don't Care
product_id = 1112331; # I want this <---------------------
unnecessary_info5 = "jj" # Don't Care
I want a list of objects, where each object has a serial number and an ID.
I tried the following:
import re

class Product:
    def __init__(self, id, serial):
        self.product_id = id
        self.product_serial = serial

linenum = 0
first_string = "product_serial"
second_string = "product_id"
with open('products.txt', "r") as products_file:
    for line in products_file:
        linenum += 1
        if line.find(first_string) != -1:
            product_serial = re.search('\"([^"]+)', line).group(1)
            # How do I proceed?
Any suggestions would be greatly appreciated. Thanks!
Solution
I'm using inline data wrapped in io.StringIO() here, but you can replace data with your products_file.
The idea is to collect key/value pairs into current_object. Once it holds everything a single object needs (both keys), we push it onto the objects list and start a new current_object.
You could use something like if line.startswith('product_serial') instead of the admittedly complex regular expression.
import io
import re
data = io.StringIO("""
function foo();
function maybeanotherfoo();
int maybemoregarbage;
product_serial = "CDE1102";
unnecessary_info1 = 10;
unnecessary_info2 = "red"
product_id = 1134412;
unnecessary_info3 = "88"
product_serial = "DD1232";
product_id = 3345111;
unnecessary_info1 = "22"
unnecessary_info2 = "panda"
product_serial = "CDE1102";
unnecessary_info1 = 10;
unnecessary_info2 = "red"
unnecessary_info3 = "bear"
unnecessary_info4 = 119
product_id = 1112331;
unnecessary_info5 = "jj"
""")
objects = []
current_object = {}
for line in data:
    line = line.strip()  # Remove leading and trailing whitespace
    # Match only the two wanted keys; the value is either digits or a quoted string
    m = re.match(r'^(product_id|product_serial)\s*=\s*(\d+|"(?:.+?)");?$', line)
    if m:
        key, value = m.groups()
        current_object[key] = value.strip('"')
        if len(current_object) == 2:  # Got the two keys we want, ship the object
            objects.append(current_object)
            current_object = {}
print(objects)
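
For comparison, here is a minimal sketch of the same collect-and-ship idea using plain string methods instead of the regular expression, and producing Product objects as the question asked rather than dicts. It is not part of the original answer. The parse_products function, the WANTED tuple, and the dataclass form of Product are illustrative names of my own. It uses str.partition with an exact key comparison rather than startswith, so that a key such as product_id_old would not be picked up by accident. It also assumes that the wanted lines carry no trailing comments and that product_id is always an integer.

```python
import io
from dataclasses import dataclass


@dataclass
class Product:
    product_id: int
    product_serial: str


# The two keys that make up one complete product (assumption: exactly these two).
WANTED = ("product_id", "product_serial")


def parse_products(lines):
    """Collect wanted key/value pairs and emit a Product once both are seen."""
    products = []
    current = {}
    for line in lines:
        key, sep, value = line.partition("=")
        key = key.strip()
        if not sep or key not in WANTED:
            continue  # Not an assignment, or not a field we care about
        # Drop surrounding whitespace, the optional trailing ';' and any quotes
        current[key] = value.strip().rstrip(";").strip().strip('"')
        if len(current) == len(WANTED):
            products.append(
                Product(int(current["product_id"]), current["product_serial"])
            )
            current = {}
    return products


sample = io.StringIO("""
function foo();
product_serial = "CDE1102";
unnecessary_info1 = 10;
product_id = 1134412;
product_serial = "DD1232";
product_id = 3345111;
unnecessary_info2 = "panda"
""")

products = parse_products(sample)
print(products)
```

As with the regex version, this pairs up fields purely by order of appearance, so a record missing one of the two keys would cause later records to be mismatched. An open file object can be passed to parse_products in place of sample.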