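pandas - How to insert multiple rows and columns into a data frame
Problem description
I have a data frame containing exchange rate data. I want to insert the base currency (Norwegian krone) with a unit value of 1 for the entire date range (from the minimum date to the maximum date).
I tried merging data frames, but had no luck with my current skills. The data is needed for further calculations in another task.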
Currency Date Rate UoM
0 Swedish krona 2016-01-05 1.0395 Hundreds
1 Swedish krona 2016-01-06 1.0422 Hundreds
2 Swedish krona 2016-01-07 1.0452 Hundreds
3 Swedish krona 2016-01-08 1.0450 Hundreds
4 Swedish krona 2016-01-11 1.0437 Hundreds
5 Swedish krona 2016-01-12 1.0422 Hundreds
6 Swedish krona 2016-01-13 1.0338 Hundreds
7 Swedish krona 2016-01-14 1.0347 Hundreds
8 Swedish krona 2016-01-15 1.0279 Hundreds
9 Swedish krona 2016-01-18 1.0371 Hundreds
... ... ... ... ...
3313 US dollar 2019-03-15 8.5674 Units
3314 US dollar 2019-03-18 8.5223 Units
3315 US dollar 2019-03-19 8.5178 Units
3316 US dollar 2019-03-20 8.5358 Units
3317 US dollar 2019-03-21 8.4463 Units
3318 US dollar 2019-03-22 8.5315 Units
3319 US dollar 2019-03-25 8.5289 Units
The expected output is new rows in the data frame, i.e.
3320 Norwegian krone 2016-01-06 1 Units
3321 Norwegian krone 2016-01-07 1 Units
3322 Norwegian krone 2016-01-08 1 Units
3323 Norwegian krone 2016-01-11 1 Units
... ... ... ... ...
XXXX Norwegian krone 2019-03-21 1 Units
XXXX Norwegian krone 2019-03-22 1 Units
XXXX Norwegian krone 2019-03-25 1 Units
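Solution
The trick is to take the date range exactly as it appears in the source data, gaps included, and then construct the repeated rows efficiently so they can be appended and sorted. When building the new data frame, a single dictionary of scalar values is enough to populate every row.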
import pandas as pd
# pandas.compat.StringIO is not available in newer pandas releases; io.StringIO works on Python 3.
from io import StringIO
print(pd.__version__)
csvdata = StringIO("""Currency,Date,Rate,UoM
Swedish krona,2016-01-05,1.0395,Hundreds
Swedish krona,2016-01-06,1.0422,Hundreds
Swedish krona,2016-01-07,1.0452,Hundreds
Swedish krona,2016-01-08,1.0450,Hundreds
Swedish krona,2016-01-11,1.0437,Hundreds
Swedish krona,2016-01-12,1.0422,Hundreds
Swedish krona,2016-01-13,1.0338,Hundreds
Swedish krona,2016-01-14,1.0347,Hundreds
Swedish krona,2016-01-15,1.0279,Hundreds
Swedish krona,2016-01-18,1.0371,Hundreds
US dollar,2019-03-15,8.5674,Units
US dollar,2019-03-18,8.5223,Units
US dollar,2019-03-19,8.5178,Units
US dollar,2019-03-20,8.5358,Units
US dollar,2019-03-21,8.4463,Units
US dollar,2019-03-22,8.5315,Units
US dollar,2019-03-25,8.5289,Units""")
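df = pd.read_csv(csvdata, sep=",")
# Index by date so the krone rows can reuse the dates present in the data.
df = df.set_index(['Date'])
# unique() keeps one entry per date when several currencies share that date.
date_range = df.index.unique().values
# Scalar values in the dict are broadcast to every date in the index.
nk_df = pd.DataFrame(index=date_range, data={'Currency':'Norwegian krone', 'Rate':1, 'UoM':'Units'})
df = pd.concat([df, nk_df])
print(df.sort_index().head(10))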
which produces
0.24.2
Currency Rate UoM
2016-01-05 Swedish krona 1.0395 Hundreds
2016-01-05 Norwegian krone 1.0000 Units
2016-01-06 Swedish krona 1.0422 Hundreds
2016-01-06 Norwegian krone 1.0000 Units
2016-01-07 Norwegian krone 1.0000 Units
2016-01-07 Swedish krona 1.0452 Hundreds
2016-01-08 Swedish krona 1.0450 Hundreds
2016-01-08 Norwegian krone 1.0000 Units
2016-01-11 Norwegian krone 1.0000 Units
2016-01-11 Swedish krona 1.0437 Hundreds
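The expected output in the question keeps Date as an ordinary column and continues the integer row numbers, whereas the answer above moves Date into the index. A minimal sketch of a variant that keeps the original layout follows. The helper name `add_base_currency` and its parameters are hypothetical (they are not part of pandas), and the sample rows are a subset of the rows shown in the question.

```python
import pandas as pd
from io import StringIO


def add_base_currency(df, currency="Norwegian krone", rate=1.0, uom="Units"):
    """Return df plus one base-currency row per distinct Date.

    Hypothetical helper for illustration; not part of pandas.
    """
    # One row per distinct date, even when several currencies share a date.
    dates = df["Date"].drop_duplicates().sort_values()
    base = pd.DataFrame({
        "Currency": currency,      # scalar, broadcast to every row
        "Date": dates.to_numpy(),  # the array sets the number of rows
        "Rate": rate,
        "UoM": uom,
    })
    # ignore_index=True renumbers the rows 0..n-1, so the new rows
    # continue after the last existing row.
    return pd.concat([df, base], ignore_index=True)


# A few rows taken from the question's data.
csvdata = StringIO("""Currency,Date,Rate,UoM
Swedish krona,2016-01-05,1.0395,Hundreds
Swedish krona,2016-01-06,1.0422,Hundreds
US dollar,2019-03-22,8.5315,Units
US dollar,2019-03-25,8.5289,Units""")
df = pd.read_csv(csvdata, parse_dates=["Date"])

result = add_base_currency(df)
print(result.tail(4))
```

This reuses only the dates that already occur in the data, so weekends and holidays stay absent. If a gap-free business-day calendar is wanted instead, `pd.date_range(df["Date"].min(), df["Date"].max(), freq="B")` could replace the `dates` line, assuming missing holidays do not matter for the later calculations.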