python - Is there a way to find the intersection of the headers across multiple CSV files?
Problem description
I am trying to loop through all CSV files in a folder and find the header names that appear in every file. I think the code would start like this, though it certainly needs work.
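import glob
import pandas as pd

# The original 'C:\\my_path\' escaped the closing quote; the trailing backslash must be doubled.
csvs = glob.glob('C:\\my_path\\*.csv')

# Note: this starts empty, so every intersection below stays empty.
master_set = set()
for file in csvs:
    this_df = pd.read_csv(file)
    cols = set(this_df.columns)
    master_set = master_set.intersection(cols)
print(master_set)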
So far this just loops through the files in a folder. What I want is to compare the headers of all CSV files in one folder, find the headers common to all of them (the intersection), and print that result. I will need to do a UNION of all these files at some point, so I am trying to determine the best way to collect the common headers, which are the lowest common denominator of the whole data series.
So, if I have 4 files with this schema:
colA colB colC colD colE
And, I have one file with this schema:
colA colC colE colX colX
Then, this is what I want to see:
colA colC colE
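Solution
Yes, you can do this, but you need to loop over the list of files and store the result as you go. For the example in the question, the code for two files looks like this.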
import pandas as pd
df1 = pd.read_csv("File1.csv")
df2 = pd.read_csv("File2.csv")
setA = set(df1.columns)
setB = set(df2.columns)
common = setA.intersection(setB)
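
The two-file example can be extended to a whole folder. The following is a minimal sketch, not a definitive implementation: it reads only the first row of each file with the standard csv module instead of pandas, so no data rows are loaded, and it seeds the running result from the first file, because intersecting with an empty set always gives an empty set. The helper names read_header and common_headers and the folder name my_path are assumptions made for illustration.

```python
import csv
import glob
import os


def read_header(path):
    """Return the first row of a CSV file as a list of column names."""
    with open(path, newline="") as handle:
        # next(..., []) yields an empty header for an empty file.
        return next(csv.reader(handle), [])


def common_headers(paths):
    """Return the set of column names present in every file in `paths`."""
    common = None  # None means "no file seen yet"
    for path in paths:
        cols = set(read_header(path))
        # Seed from the first file, then intersect with each later file.
        common = cols if common is None else common & cols
    return common if common is not None else set()


if __name__ == "__main__":
    # Hypothetical folder; replace with the real location of the CSV files.
    folder = "my_path"
    csv_paths = glob.glob(os.path.join(folder, "*.csv"))
    print(sorted(common_headers(csv_paths)))
```

If you prefer to stay in pandas, pd.read_csv(path, nrows=0).columns gives the header of a file without reading its rows, and the same loop applies. The resulting set can then be used to select only the shared columns from each file before combining them.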