首页 > 解决方案 > 如何在不同的表之间进行转换和连接?

问题描述

我想根据相同的索引连接两个表。但表格格式不同。如何将它们转换为相同的格式然后连接?

一张桌子是这样的:

close     date

0  1658.92   2009-02-01

1  1835.84   2009-03-01

2  2057.33   2009-04-01

3  2120.32   2009-05-01

4  2174.52   2009-06-01

5  2348.48   2009-07-01

6  2378.73   2009-08-01

7  2510.82   2009-09-01

8  2417.32   2009-10-01

9  2532.77   2009-11-01

10  2684.40  2009-12-01

另一个是这样的:

Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec year

0   7.8 8.3 8.7 9.0 9.4 9.5 9.5 9.6 9.8 10.0    9.9 9.9 2009

1   9.8 9.8 9.9 9.9 9.6 9.4 9.4 9.5 9.5 9.4 9.8 9.3 2010

2   9.1 9.0 9.0 9.1 9.0 9.1 9.0 9.0 9.0 8.8 8.6 8.5 2011

3   8.3 8.3 8.2 8.2 8.2 8.2 8.2 8.1 7.8 7.8 7.7 7.9 2012

4   8.0 7.7 7.5 7.6 7.5 7.5 7.3 7.2 7.2 7.2 6.9 6.7 2013

5   6.6 6.7 6.7 6.2 6.3 6.1 6.2 6.1 5.9 5.7 5.8 5.6 2014

6   5.7 5.5 5.4 5.4 5.6 5.3 5.2 5.1 5.0 5.0 5.1 5.0 2015

7   4.9 4.9 5.0 5.0 4.8 4.9 4.8 4.9 5.0 4.9 4.7 4.7 2016

8   4.7 4.7 4.4 4.4 4.4 4.3 4.3 4.4 4.2 4.1 4.2 4.1 2017

9   4.1 4.1 4.0 3.9 3.8 4.0 3.9 3.8 3.7 3.8 3.7 3.9 2018

10  4.0 3.8 3.8 3.6 -1  -1  -1  -1  -1  -1  -1  -1  2019

我是 python 新手,不熟悉数据处理。请给我一些建议和意见。感谢你们。

我想将它们组合到一个表中,列可以是'year', 'month', 'data1','data2'

标签: python

解决方案


这是一种遍历所有数据帧的解决方案。它不是最有效的,但它是可读的。

import pandas as pd

df = pd.DataFrame([[7.8, 8.3, 8.7, 9.0, 9.4, 9.5, 9.5, 9.6, 9.8, 10.0, 9.9, 9.9, 2009],
                   [9.8, 9.8, 9.9, 9.9, 9.6, 9.4, 9.4, 9.5, 9.5, 9.4, 9.8, 9.3, 2010],
                   [9.1, 9.0, 9.0, 9.1, 9.0, 9.1, 9.0, 9.0, 9.0, 8.8, 8.6, 8.5, 2011],
                   [8.3, 8.3, 8.2, 8.2, 8.2, 8.2, 8.2, 8.1, 7.8, 7.8, 7.7, 7.9, 2012],
                   [8.0, 7.7, 7.5, 7.6, 7.5, 7.5, 7.3, 7.2, 7.2, 7.2, 6.9, 6.7, 2013],
                   [6.6, 6.7, 6.7, 6.2, 6.3, 6.1, 6.2, 6.1, 5.9, 5.7, 5.8, 5.6, 2014],
                   [5.7, 5.5, 5.4, 5.4, 5.6, 5.3, 5.2, 5.1, 5.0, 5.0, 5.1, 5.0, 2015],
                   [4.9, 4.9, 5.0, 5.0, 4.8, 4.9, 4.8, 4.9, 5.0, 4.9, 4.7, 4.7, 2016],
                   [4.7, 4.7, 4.4, 4.4, 4.4, 4.3, 4.3, 4.4, 4.2, 4.1, 4.2, 4.1, 2017],
                   [4.1, 4.1, 4.0, 3.9, 3.8, 4.0, 3.9, 3.8, 3.7, 3.8, 3.7, 3.9, 2018],
                   [4.0, 3.8, 3.8, 3.6, -1, -1, -1, -1, -1, -1, -1, -1, 2019]],
    columns=["Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec", "year"],
)

df1 = pd.DataFrame([[1658.92, "2009-02-01"],
                    [1835.84, "2009-03-01"],
                    [2057.33 ,"2009-04-01"],
                    [2120.32, "2009-05-01"],
                    [2174.52, "2009-06-01"],
                    [2348.48, "2009-07-01"],
                    [2378.73 ,"2009-08-01"],
                    [2510.82, "2009-09-01"],
                    [2417.32, "2009-10-01"],
                    [2532.77, "2009-11-01"],
                    [2684.40, "2009-12-01"]],
    columns=["close"   ,  "date"])

# Rename columns
df.columns = ["1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "year"]

# Convert to datetime for having the year and the month
df1["date"] = pd.to_datetime(df1["date"])
df1["month"] = df1.date.dt.month.astype(int)
df1["year"] = df1.date.dt.to_period('Y').astype(int)
df1= df1[["close", "month", "year"]]

# Create a new DataFrame
new_df = pd.DataFrame(columns=["month", "year", "df1"])
# Iterate over all the dataFrame
for index, row in df.iterrows():
    for i, cell in enumerate(row[["1", "2", "3", "4", "5", "6",
                     "7", "8", "9", "10", "11", "12"]]):
        new_df.loc[index*12 + i] = [int(df.columns[i]),
                                    int(row.year), cell]

# Add the close column to the "new_df"
new_df = pd.merge(new_df, df1, how='outer')

print(new_df)
#      month    year   df1    close
# 0      1.0  2009.0   7.8      NaN
# 1      2.0  2009.0   8.3      NaN
# 2      3.0  2009.0   8.7      NaN
# 3      4.0  2009.0   9.0      NaN
# 4      5.0  2009.0   9.4      NaN
# 5      6.0  2009.0   9.5      NaN
# 6      7.0  2009.0   9.5      NaN
# 7      8.0  2009.0   9.6      NaN
# 8      9.0  2009.0   9.8      NaN
# 9     10.0  2009.0  10.0      NaN
# 10    11.0  2009.0   9.9      NaN
# ..     ...     ...   ...      ...
# 128    9.0  2019.0 - 1.0      NaN
# 129   10.0  2019.0 - 1.0      NaN
# 130   11.0  2019.0 - 1.0      NaN
# 131   12.0  2019.0 - 1.0      NaN
# 132    2.0    39.0   NaN  1658.92
# 133    3.0    39.0   NaN  1835.84
# 134    4.0    39.0   NaN  2057.33
# 135    5.0    39.0   NaN  2120.32
# 136    6.0    39.0   NaN  2174.52
# 137    7.0    39.0   NaN  2348.48
# 138    8.0    39.0   NaN  2378.73
# 139    9.0    39.0   NaN  2510.82
# 140   10.0    39.0   NaN  2417.32
# 141   11.0    39.0   NaN  2532.77
# 142   12.0    39.0   NaN  2684.40

推荐阅读