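python - Creating a loop to process multiple files
Question
I have written the code below, but at the moment I have to retype the same steps for every file, and with more than 100 files that is not ideal.
I can't work out how to do this with a loop that reads all of these files and filters out the values in MP (their unique values, as in the code below), while also adding two new columns to each filtered file. Writing it out by hand as shown below is the only way I currently know. I am trying to get a new combined data frame that contains all of the filtered files together with their added columns.
Please suggest a way to do this with a loop: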
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from scipy import signal
df1 = pd.read_csv(r'E:\Unmanned Cars\Unmanned Cars\2017040810_052.csv')
df2 = pd.read_csv(r'E:\Unmanned Cars\Unmanned Cars\2017040901_052.csv')
df3 = pd.read_csv(r'E:\Unmanned Cars\Unmanned Cars\2017040902_052.csv')
df1 =df1["MP"].unique()
df1=pd.DataFrame(df1, columns=['MP'])
df1["Dates"] = "2017-04-08"
df1["Inspection"] = "10"
##
df2 =df2["MP"].unique()
df2=pd.DataFrame(df2, columns=['MP'])
df2["Dates"] = "2017-04-09"
df2["Inspection"] = "01"
##
df3 =df3["MP"].unique()
df3=pd.DataFrame(df3, columns=['MP'])
df3["Dates"] = "2017-04-09"
df3["Inspection"] = "02"
Final = pd.concat([df1, df2, df3], axis=0, sort=False)
Answer
Perhaps this sample code will help you.
#!/usr/bin/env python3
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from scipy import signal
from os import path
import glob
import re

def process_file(file_path):
    result = None
    # Normalize Windows separators so path.basename works on any OS.
    file_path = file_path.replace("\\", "/")
    filename = path.basename(file_path)
    # File names start with YYYYMMDD followed by a two-digit inspection number.
    regex = re.compile(r"^(\d{4})(\d{2})(\d{2})(\d{2})")
    match = regex.match(filename)
    if match:
        date = "%s-%s-%s" % (match[1], match[2], match[3])
        inspection = match[4]
        df1 = pd.read_csv(file_path)
        # Keep only the unique MP values, then tag them with date and inspection.
        df1 = df1["MP"].unique()
        df1 = pd.DataFrame(df1, columns=['MP'])
        df1["Dates"] = date
        df1["Inspection"] = inspection
        result = df1
    return result

def main():
    # files_list = [
    #     r'E:\Unmanned Cars\Unmanned Cars\2017040810_052.csv',
    #     r'E:\Unmanned Cars\Unmanned Cars\2017040901_052.csv',
    #     r'E:\Unmanned Cars\Unmanned Cars\2017040902_052.csv'
    # ]
    directory = 'E:\\Unmanned Cars\\Unmanned Cars\\'
    files_list = [f for f in glob.glob(directory + "*_052.csv")]
    result_list = [process_file(filename) for filename in files_list]
    # pd.concat skips None entries, so files whose names did not match are ignored.
    Final = pd.concat(result_list, axis=0, sort=False)
    return Final

if __name__ == "__main__":
    main()
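I created a process_file function to handle each file. A regular expression extracts the date and inspection number from the file name. In addition, the glob module is used to list the files in a directory by pattern matching and expansion.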
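To check the file-name parsing on its own, without reading any CSV files, something like the following minimal sketch may be useful. The helper name parse_file_name and the constant NAME_PATTERN are hypothetical and not part of the answer above. The sketch assumes file names always begin with YYYYMMDD followed by a two-digit inspection number, as in the three example paths, and it uses pathlib.PureWindowsPath instead of the manual backslash replacement.

```python
import re
from pathlib import PureWindowsPath

# Hypothetical names for illustration; the pattern mirrors the one in the answer:
# four-digit year, two-digit month, two-digit day, two-digit inspection number.
NAME_PATTERN = re.compile(r"^(\d{4})(\d{2})(\d{2})(\d{2})")

def parse_file_name(file_path):
    """Return (date, inspection) parsed from the file name, or None if it does not match."""
    # PureWindowsPath splits on both "\" and "/", whatever OS the script runs on.
    name = PureWindowsPath(file_path).name
    match = NAME_PATTERN.match(name)
    if match is None:
        return None
    year, month, day, inspection = match.groups()
    return f"{year}-{month}-{day}", inspection

print(parse_file_name(r"E:\Unmanned Cars\Unmanned Cars\2017040810_052.csv"))
# ('2017-04-08', '10')
print(parse_file_name("notes.txt"))
# None
```

Returning None for a non-matching name keeps the behaviour of process_file, which also yields None in that case.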