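python - How do I merge multiple csv files side by side in chronological order? (Python)
Problem description
I have downloaded 120 csv files of data (10 years, one file per month).
I am using the code below to combine them all into a single document in chronological order, e.g. from 1/1/09 to 1/1/19.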
from glob import glob

files = sorted(glob('*.csv'))
with open('cat.csv', 'w') as fi_out:
    for i, fname_in in enumerate(files):
        with open(fname_in, 'r') as fi_in:
            for i_line, line in enumerate(fi_in):
                # Keep the header line only from the first file.
                if i_line > 0 or i == 0:
                    fi_out.write(line)
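This all works fine, but I have now also downloaded the same type of data for a different product. I want to sort this new data chronologically as well, but place it side by side with the old dataset.
I get an error like this:
Any help would be appreciated.
Edit 1: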
Traceback (most recent call last):
File "/Users/myname/Desktop/collate/asdas.py", line 4, in <module>
result = pd.merge(data1[['REGION', 'TOTALDEMAND', 'RRP']], data2, on='SETTLEMENTDATE')
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/core/reshape/merge.py", line 61, in merge
validate=validate)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/core/reshape/merge.py", line 551, in __init__
self.join_names) = self._get_merge_keys()
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/core/reshape/merge.py", line 871, in _get_merge_keys
lk, stacklevel=stacklevel))
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/core/generic.py", line 1382, in _get_label_or_level_values
raise KeyError(key)
KeyError: 'SETTLEMENTDATE'
Edit 2:
import pandas as pd
df1 = pd.read_csv("product1.csv")
df2 = pd.read_csv("product2.csv")
combine = pd.merge(df1, df2, on='DATE', how='outer')
combine.columns = ['product1_price', 'REGION1', 'DATE', 'product2_price', 'REGION2']
combine[['DATE','product1_price','product2_price']]
combine.to_csv("combine.csv",index=False)
Error:
Traceback (most recent call last):
File "/Users/george/Desktop/collate/asdas.py", line 5, in <module>
combine.columns = ['VICRRP', 'REGION1', 'SETTLEMENTDATE', 'QLD1RRP', 'REGION2']
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/core/generic.py", line 4389, in __setattr__
return object.__setattr__(self, name, value)
File "pandas/_libs/properties.pyx", line 69, in pandas._libs.properties.AxisProperty.__set__
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/core/generic.py", line 646, in _set_axis
self._data.set_axis(axis, labels)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/core/internals.py", line 3323, in set_axis
'values have {new} elements'.format(old=old_len, new=new_len))
ValueError: Length mismatch: Expected axis has 9 elements, new values have 5 elements
Solution
Load the data into data frames
import pandas as pd
data1 = pd.read_csv("filename1.csv")
data2 = pd.read_csv("filename2.csv")
Merge the two data frames on SETTLEMENTDATE
result = pd.merge(data1, data2, on='SETTLEMENTDATE')
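This assumes that SETTLEMENTDATE has a one-to-one relationship between the two data frames. If it does not, the result will contain duplicated rows.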
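To illustrate the one-to-one assumption, here is a minimal sketch using small made-up frames in place of the real CSV files (the timestamps and prices are invented for illustration). It shows how a repeated key multiplies rows, and how the `validate` argument of `pd.merge` can be used to fail loudly instead.

```python
import pandas as pd

# Hypothetical toy data standing in for the two products' CSV files.
data1 = pd.DataFrame({
    "SETTLEMENTDATE": ["2009/01/01 00:30:00", "2009/01/01 01:00:00"],
    "RRP": [20.5, 21.0],
})
data2 = pd.DataFrame({
    # The same timestamp appears twice, so the key is not unique here.
    "SETTLEMENTDATE": ["2009/01/01 00:30:00", "2009/01/01 00:30:00"],
    "RRP": [18.0, 18.5],
})

# The duplicated key in data2 yields two output rows for one timestamp.
result = pd.merge(data1, data2, on="SETTLEMENTDATE")
n_rows = len(result)

# validate="one_to_one" makes pandas raise instead of silently duplicating.
try:
    pd.merge(data1, data2, on="SETTLEMENTDATE", validate="one_to_one")
    is_one_to_one = True
except pd.errors.MergeError:
    is_one_to_one = False
```

If the check fails on the real data, calling `drop_duplicates(subset="SETTLEMENTDATE")` on each frame before merging is one possible way to restore the one-to-one relationship, depending on which rows should be kept.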
Edit: to drop the "PERIOD TYPE" column, do
result = pd.merge(data1[['REGION', 'TOTALDEMAND', 'RRP', 'SETTLEMENTDATE']], data2, on='SETTLEMENTDATE')
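The length mismatch in Edit 2 of the question appears to come from assigning 5 names to a merged frame that has 9 columns. One way around it, sketched here under assumptions, is to select the needed columns from both frames before merging and let `suffixes` name the overlapping ones. The column names follow the tracebacks in the question; the values, the region codes, and the `_1`/`_2` suffixes are made up for illustration.

```python
import pandas as pd

# Hypothetical toy frames; real data would come from pd.read_csv(...).
data1 = pd.DataFrame({
    "REGION": ["VIC1", "VIC1"],
    "SETTLEMENTDATE": ["2009/01/01 01:00:00", "2009/01/01 00:30:00"],
    "TOTALDEMAND": [5100.0, 5000.0],
    "RRP": [21.0, 20.5],
    "PERIOD TYPE": ["TRADE", "TRADE"],
})
data2 = pd.DataFrame({
    "REGION": ["QLD1", "QLD1"],
    "SETTLEMENTDATE": ["2009/01/01 00:30:00", "2009/01/01 01:00:00"],
    "TOTALDEMAND": [6000.0, 6100.0],
    "RRP": [18.0, 18.5],
    "PERIOD TYPE": ["TRADE", "TRADE"],
})

# Keep only the columns of interest, including the merge key, on both sides.
cols = ["SETTLEMENTDATE", "REGION", "RRP"]
combine = pd.merge(
    data1[cols], data2[cols],
    on="SETTLEMENTDATE", how="outer",
    suffixes=("_1", "_2"),  # names for columns present in both frames
)

# Parse the key as datetimes so the rows sort chronologically.
combine["SETTLEMENTDATE"] = pd.to_datetime(combine["SETTLEMENTDATE"])
combine = combine.sort_values("SETTLEMENTDATE").reset_index(drop=True)
column_names = list(combine.columns)
```

Because the key column is included in both selections, this also avoids the `KeyError: 'SETTLEMENTDATE'` shown in Edit 1, which is raised when the key is missing from one side of the merge.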