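python - How do I reshape and join tables that have different layouts?
Problem description
I want to join two tables on the same index, but the tables have different layouts. How can I convert them to the same format and then join them?
One table looks like this: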
close date
0 1658.92 2009-02-01
1 1835.84 2009-03-01
2 2057.33 2009-04-01
3 2120.32 2009-05-01
4 2174.52 2009-06-01
5 2348.48 2009-07-01
6 2378.73 2009-08-01
7 2510.82 2009-09-01
8 2417.32 2009-10-01
9 2532.77 2009-11-01
10 2684.40 2009-12-01
The other looks like this:
Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec year
0 7.8 8.3 8.7 9.0 9.4 9.5 9.5 9.6 9.8 10.0 9.9 9.9 2009
1 9.8 9.8 9.9 9.9 9.6 9.4 9.4 9.5 9.5 9.4 9.8 9.3 2010
2 9.1 9.0 9.0 9.1 9.0 9.1 9.0 9.0 9.0 8.8 8.6 8.5 2011
3 8.3 8.3 8.2 8.2 8.2 8.2 8.2 8.1 7.8 7.8 7.7 7.9 2012
4 8.0 7.7 7.5 7.6 7.5 7.5 7.3 7.2 7.2 7.2 6.9 6.7 2013
5 6.6 6.7 6.7 6.2 6.3 6.1 6.2 6.1 5.9 5.7 5.8 5.6 2014
6 5.7 5.5 5.4 5.4 5.6 5.3 5.2 5.1 5.0 5.0 5.1 5.0 2015
7 4.9 4.9 5.0 5.0 4.8 4.9 4.8 4.9 5.0 4.9 4.7 4.7 2016
8 4.7 4.7 4.4 4.4 4.4 4.3 4.3 4.4 4.2 4.1 4.2 4.1 2017
9 4.1 4.1 4.0 3.9 3.8 4.0 3.9 3.8 3.7 3.8 3.7 3.9 2018
10 4.0 3.8 3.8 3.6 -1 -1 -1 -1 -1 -1 -1 -1 2019
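I am new to Python and not familiar with data processing, so any suggestions are welcome. Thank you.
I want to combine them into one table whose columns could be 'year', 'month', 'data1', 'data2'.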
Solution
Here is a solution that iterates over every row of the DataFrame. It is not the most efficient approach, but it is readable.
import pandas as pd
df = pd.DataFrame([[7.8, 8.3, 8.7, 9.0, 9.4, 9.5, 9.5, 9.6, 9.8, 10.0, 9.9, 9.9, 2009],
                   [9.8, 9.8, 9.9, 9.9, 9.6, 9.4, 9.4, 9.5, 9.5, 9.4, 9.8, 9.3, 2010],
                   [9.1, 9.0, 9.0, 9.1, 9.0, 9.1, 9.0, 9.0, 9.0, 8.8, 8.6, 8.5, 2011],
                   [8.3, 8.3, 8.2, 8.2, 8.2, 8.2, 8.2, 8.1, 7.8, 7.8, 7.7, 7.9, 2012],
                   [8.0, 7.7, 7.5, 7.6, 7.5, 7.5, 7.3, 7.2, 7.2, 7.2, 6.9, 6.7, 2013],
                   [6.6, 6.7, 6.7, 6.2, 6.3, 6.1, 6.2, 6.1, 5.9, 5.7, 5.8, 5.6, 2014],
                   [5.7, 5.5, 5.4, 5.4, 5.6, 5.3, 5.2, 5.1, 5.0, 5.0, 5.1, 5.0, 2015],
                   [4.9, 4.9, 5.0, 5.0, 4.8, 4.9, 4.8, 4.9, 5.0, 4.9, 4.7, 4.7, 2016],
                   [4.7, 4.7, 4.4, 4.4, 4.4, 4.3, 4.3, 4.4, 4.2, 4.1, 4.2, 4.1, 2017],
                   [4.1, 4.1, 4.0, 3.9, 3.8, 4.0, 3.9, 3.8, 3.7, 3.8, 3.7, 3.9, 2018],
                   [4.0, 3.8, 3.8, 3.6, -1, -1, -1, -1, -1, -1, -1, -1, 2019]],
                  columns=["Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec", "year"],
                  )
df1 = pd.DataFrame([[1658.92, "2009-02-01"],
                    [1835.84, "2009-03-01"],
                    [2057.33, "2009-04-01"],
                    [2120.32, "2009-05-01"],
                    [2174.52, "2009-06-01"],
                    [2348.48, "2009-07-01"],
                    [2378.73, "2009-08-01"],
                    [2510.82, "2009-09-01"],
                    [2417.32, "2009-10-01"],
                    [2532.77, "2009-11-01"],
                    [2684.40, "2009-12-01"]],
                   columns=["close", "date"])
# Rename the month columns to their month numbers
df.columns = ["1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "year"]
# Convert to datetime so the year and the month can be extracted
df1["date"] = pd.to_datetime(df1["date"])
df1["month"] = df1.date.dt.month.astype(int)
# Use dt.year here: dt.to_period('Y').astype(int) yields the period ordinal
# (39 for 2009, i.e. years since 1970), so the merge keys would never match
df1["year"] = df1.date.dt.year.astype(int)
df1 = df1[["close", "month", "year"]]
# Collect one record per (month, year) cell of the wide table
records = []
for index, row in df.iterrows():
    for month in df.columns[:12]:
        records.append([int(month), int(row["year"]), row[month]])
# Build the long DataFrame in one step so "month" and "year" stay integers
# (the "df1" column holds the monthly values taken from df)
new_df = pd.DataFrame(records, columns=["month", "year", "df1"])
# Add the close column to "new_df"; a left merge keeps the row order of new_df
new_df = pd.merge(new_df, df1, on=["month", "year"], how="left")
print(new_df.head())
#    month  year  df1    close
# 0      1  2009  7.8      NaN
# 1      2  2009  8.3  1658.92
# 2      3  2009  8.7  1835.84
# 3      4  2009  9.0  2057.33
# 4      5  2009  9.4  2120.32
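As a possible alternative to the explicit loop, the wide table can be reshaped with `DataFrame.melt` and then merged on `year` and `month`. The following is only a minimal sketch on a cut-down copy of the data. The names `wide`, `prices`, `long_df`, `data1` and `data2` are chosen here for illustration, and treating `-1` as a missing value is an assumption about what that placeholder means.

```python
import numpy as np
import pandas as pd

# Cut-down stand-ins for the two tables from the question.
wide = pd.DataFrame(
    [[7.8, 8.3, 8.7, 9.0, 9.4, 9.5, 9.5, 9.6, 9.8, 10.0, 9.9, 9.9, 2009],
     [4.0, 3.8, 3.8, 3.6, -1, -1, -1, -1, -1, -1, -1, -1, 2019]],
    columns=["Jan", "Feb", "Mar", "Apr", "May", "Jun",
             "Jul", "Aug", "Sep", "Oct", "Nov", "Dec", "year"],
)
prices = pd.DataFrame({
    "close": [1658.92, 1835.84, 2057.33],
    "date": ["2009-02-01", "2009-03-01", "2009-04-01"],
})

# Map month names to month numbers, based on column order (Jan=1 ... Dec=12).
month_names = [c for c in wide.columns if c != "year"]
month_number = {name: i + 1 for i, name in enumerate(month_names)}

# Wide -> long: one row per (year, month) pair.
long_df = wide.melt(id_vars=["year"], var_name="month", value_name="data2")
long_df["month"] = long_df["month"].map(month_number)
# Assumption: -1 is a placeholder for "no value yet", so mark it as missing.
long_df["data2"] = long_df["data2"].replace(-1, np.nan)

# Split the date into year and month so both tables share the same keys.
dates = pd.to_datetime(prices["date"])
prices_long = pd.DataFrame({
    "year": dates.dt.year,
    "month": dates.dt.month,
    "data1": prices["close"],
})

# Left merge keeps every (year, month) row of the reshaped table.
result = (
    long_df.merge(prices_long, on=["year", "month"], how="left")
    .sort_values(["year", "month"])
    .reset_index(drop=True)
    [["year", "month", "data1", "data2"]]
)
print(result.head())
```

A left merge is used so that months without a `close` value are kept with `NaN` in `data1`. Use `how="inner"` instead if only months present in both tables are wanted.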