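python - Match two Pandas DataFrames of different sizes on a shared column, combine them, and replace NaN values with the nearest preceding data
Problem description
I am new to Python and Stack Overflow, and I have a problem processing my data. I have two datasets of different sizes: df1 has a size of 1000 and df2 has a size of 100000. Here are samples of df1 and df2.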
df1=
Date x y
0 2020-01-01 01:01 1.1 2.4
1 2020-01-01 01:05 4.2 5.5
2 2020-01-01 01:08 7.3 8.6
df2=
Date x y
0 2020-01-01 01:00 NaN NaN
1 2020-01-01 01:01 NaN NaN
2 2020-01-01 01:02 NaN NaN
3 2020-01-01 01:03 NaN NaN
4 2020-01-01 01:04 NaN NaN
5 2020-01-01 01:05 NaN NaN
6 2020-01-01 01:06 NaN NaN
7 2020-01-01 01:07 NaN NaN
8 2020-01-01 01:08 NaN NaN
9 2020-01-01 01:09 NaN NaN
10 2020-01-01 01:10 NaN NaN
I want to combine them into a new DataFrame so that, wherever df1['Date'] == df2['Date'], df3 holds the values from df1, as shown below.
df3=
Date x y
0 2020-01-01 01:00 NaN NaN
1 2020-01-01 01:01 1.1 2.4
2 2020-01-01 01:02 NaN NaN
3 2020-01-01 01:03 NaN NaN
4 2020-01-01 01:04 NaN NaN
5 2020-01-01 01:05 4.2 5.5
6 2020-01-01 01:06 NaN NaN
7 2020-01-01 01:07 NaN NaN
8 2020-01-01 01:08 7.3 8.6
9 2020-01-01 01:09 NaN NaN
10 2020-01-01 01:10 NaN NaN
Then each NaN value should be set to the nearest value above it:
df3=
Date x y
0 2020-01-01 01:00 NaN NaN
1 2020-01-01 01:01 1.1 2.4
2 2020-01-01 01:02 1.1 2.4
3 2020-01-01 01:03 1.1 2.4
4 2020-01-01 01:04 1.1 2.4
5 2020-01-01 01:05 4.2 5.5
6 2020-01-01 01:06 4.2 5.5
7 2020-01-01 01:07 4.2 5.5
8 2020-01-01 01:08 7.3 8.6
9 2020-01-01 01:09 7.3 8.6
10 2020-01-01 01:10 7.3 8.6
Thanks a lot!
Solution
One way would be to use update on your complete df (assuming its index includes every index value of the smaller one), then forward-fill so that each missing value takes the previous value:
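import numpy as np
import pandas as pd

# a: the smaller frame, holding the known values (with one gap)
a = pd.DataFrame(
    {
        "date": pd.date_range(start="2020-01-01", periods=3),
        "x": [1, np.nan, 3],
        "y": [5, np.nan, 6],
    }
).set_index("date")
# b: the complete frame, covering every date but holding only NaN
b = pd.DataFrame(
    {
        "date": pd.date_range(start="2020-01-01", periods=5),
        "x": [np.nan] * 5,
        "y": [np.nan] * 5,
    }
).set_index("date")
print(a, b)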
| date | x | y |
|:--------------------|----:|----:|
| 2020-01-01 00:00:00 | 1 | 5 |
| 2020-01-02 00:00:00 | nan | nan |
| 2020-01-03 00:00:00 | 3 | 6 |
| date | x | y |
|:--------------------|----:|----:|
| 2020-01-01 00:00:00 | nan | nan |
| 2020-01-02 00:00:00 | nan | nan |
| 2020-01-03 00:00:00 | nan | nan |
| 2020-01-04 00:00:00 | nan | nan |
| 2020-01-05 00:00:00 | nan | nan |
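b.update(a)  # overwrite b in place with the non-NaN values of a, aligned on the index
b = b.ffill()  # forward-fill; fillna(method="ffill") is deprecated in recent pandas versions
print(b)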
| date | x | y |
|:--------------------|----:|----:|
| 2020-01-01 00:00:00 | 1 | 5 |
| 2020-01-02 00:00:00 | 1 | 5 |
| 2020-01-03 00:00:00 | 3 | 6 |
| 2020-01-04 00:00:00 | 3 | 6 |
| 2020-01-05 00:00:00 | 3 | 6 |
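For the data shapes in the question, where Date is an ordinary column rather than the index, the same idea could also be written as a left merge followed by a forward fill. The following is only a sketch under assumptions: the frames are rebuilt here from the samples in the question, the Date columns are assumed to be real datetimes that are already sorted, and the names grid and alt are made up for illustration.

```python
import numpy as np
import pandas as pd

# Rebuild the question's samples: df2 is the full one-minute grid, df1 is sparse.
grid = pd.date_range("2020-01-01 01:00", periods=11, freq="min")
df1 = pd.DataFrame(
    {
        "Date": grid[[1, 5, 8]],  # 01:01, 01:05, 01:08
        "x": [1.1, 4.2, 7.3],
        "y": [2.4, 5.5, 8.6],
    }
)
df2 = pd.DataFrame({"Date": grid, "x": np.nan, "y": np.nan})

# Step 1: keep every Date from df2 and pull in x/y from df1 where Date matches.
df3 = df2[["Date"]].merge(df1, on="Date", how="left")

# Step 2: forward-fill so each gap takes the last known value above it.
# Rows before the first match (01:00 here) have nothing above them and stay NaN.
df3[["x", "y"]] = df3[["x", "y"]].ffill()
print(df3)

# Alternative in one step: merge_asof matches each df2 row to the most recent
# df1 row at or before it (direction="backward" is the default).
# Both frames must be sorted by Date for this to work.
alt = pd.merge_asof(df2[["Date"]], df1, on="Date")
print(alt)
```

The two variants agree on this sample, but they can differ if df1 itself contains NaN: merge_asof copies the matched row as it is, while the forward fill keeps looking further up for a non-missing value.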