Python: adding multiple columns to a DataFrame

Problem description

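I am trying to create a dataframe with 9 different columns, all taken from 1 column of a source dataframe. I can't figure out what I am doing wrong. The first assignment always works, but the rest do not. When I tried orange and yellow first, the data for the first one was inserted, but the second command and all the following ones were inserted as NaN.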

lastdate = '10/23/2021'
column_names = ["Red", "Orange", "Yellow","Green","Blue","Violet","Black","Brown"]
dftcolorAgg = pd.DataFrame(columns = column_names)
dftcolorAgg['Red'] = df[ (df['Color'] == 'Red') & (df['Date'] == lastdate)].Height
dftcolorAgg['Orange'] = df[ (df['Color'] == 'Orange') & (df['Date'] == lastdate)].Height
dftcolorAgg['Yellow'] = df[ (df['Color'] == 'Yellow') & (df['Date'] == lastdate)].Height
dftcolorAgg['Green'] = df[ (df['Color'] == 'Green') & (df['Date'] == lastdate)].Height
dftcolorAgg['Blue'] = df[ (df['Color'] == 'Blue') & (df['Date'] == lastdate)].Height
dftcolorAgg['Indigo'] = df[ (df['Color'] == 'Indigo') & (df['Date'] == lastdate)].Height
dftcolorAgg['Violet'] = df[ (df['Color'] == 'Violet') & (df['Date'] == lastdate)].Height
dftcolorAgg['Black'] = df[ (df['Color'] == 'Black') & (df['Date'] == lastdate)].Height
dftcolorAgg['Brown'] = df[ (df['Color'] == 'Brown') & (df['Date'] == lastdate)].Height
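A likely cause of the NaNs is index alignment: assigning a Series to a DataFrame column matches rows by index label, not by position, and a boolean filter such as df[...].Height keeps the row labels the rows had in df. The minimal sketch below uses made-up values and labels. The empty frame takes its index from the first Series assigned to it, so a later Series whose labels are different gets aligned to labels it does not have and comes out as NaN.

```python
import pandas as pd

# An empty frame with predefined columns, like dftcolorAgg in the question
agg = pd.DataFrame(columns=["red", "orange"])

# Hypothetical filtered Series: each keeps the row labels it had in the
# source frame (0-1 for red, 2-3 for orange in this sketch)
red = pd.Series([15.0, 0.0], index=[0, 1])
orange = pd.Series([16.0, 19.9], index=[2, 3])

# First assignment: the frame has no rows yet, so it adopts red's index (0, 1)
agg["red"] = red

# Second assignment: orange is aligned on labels 0-1, which it does not have
agg["orange"] = orange
print(agg)  # the orange column is all NaN here
all_nan = agg["orange"].isna().all()

# Assigning a plain array ignores the labels and fills by position;
# this assumes every color has exactly as many rows as the frame
agg["orange"] = orange.to_numpy()
print(agg)
```

The positional assignment only works when every color has the same number of rows (8 in the question); otherwise pandas raises a length mismatch error.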

Second approach

red = df[ (df['Color'] == 'Red') & (df['Date'] == lastdate)].Height
orange = df[ (df['Color'] == 'Orange') & (df['Date'] == lastdate)].Height
yellow = df[ (df['Color'] == 'Yellow') & (df['Date'] == lastdate)].Height
green = df[ (df['Color'] == 'Green') & (df['Date'] == lastdate)].Height
blue = df[ (df['Color'] == 'Blue') & (df['Date'] == lastdate)].Height
indigo = df[ (df['Color'] == 'Indigo') & (df['Date'] == lastdate)].Height
violet = df[ (df['Color'] == 'Violet') & (df['Date'] == lastdate)].Height
black = df[ (df['Color'] == 'Black') & (df['Date'] == lastdate)].Height
brown = df[ (df['Color'] == 'Brown') & (df['Date'] == lastdate)].Height

dftcolorAgg = pd.concat([red, orange,yellow,green,blue,indigo,violet,black,brown], axis=1)

The second approach just adds NaNs, without any values. On top of that, each statement adds another 8 rows, so I end up with 72 rows, all NaN.
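The same index alignment seems to explain the 72 rows: pd.concat(..., axis=1) performs an outer join on the row labels, and since each filtered Series keeps its own labels from the source frame, 9 Series of 8 rows give 9 × 8 = 72 rows, each holding at most one value. A small sketch with three hypothetical Series of two rows each:

```python
import pandas as pd

# Hypothetical filtered Series, each keeping its original row labels
red = pd.Series([15.0, 0.0], index=[0, 1], name="red")
orange = pd.Series([16.0, 19.9], index=[2, 3], name="orange")
yellow = pd.Series([0.0, 10.9], index=[4, 5], name="yellow")

# axis=1 joins on the index: 3 Series x 2 rows = 6 rows,
# every row has one value and NaN in the other two columns
stacked = pd.concat([red, orange, yellow], axis=1)
print(stacked.shape)  # (6, 3)

# Resetting each Series to a default 0..n-1 index lines the rows up
aligned = pd.concat([s.reset_index(drop=True) for s in (red, orange, yellow)], axis=1)
print(aligned)
```

Resetting the index before concatenating pairs the values by position, which again assumes that every color has the same number of rows.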

I want each statement to insert its values into the new dataframe, so that every column holds its values in the same 8 rows.

Here is some sample data:

Date,color,Height
10/25/2021,red,15
10/25/2021,red,0
10/25/2021,red,15
10/25/2021,red,17.5
10/25/2021,red,4.5
10/25/2021,red,18
10/25/2021,red,9
10/25/2021,red,18
10/25/2021,orange,16
10/25/2021,orange,19.9
10/25/2021,orange,17.8
10/25/2021,orange,16
10/25/2021,orange,.1
10/25/2021,orange,6.5
10/25/2021,orange,13
10/25/2021,orange,0
10/25/2021,yellow,0
10/25/2021,yellow,10.9
10/25/2021,yellow,12
10/25/2021,yellow,18
10/25/2021,yellow,16.5
10/25/2021,yellow,16
10/25/2021,yellow,8
10/25/2021,yellow,14.6

Expected result:

Red orange  yellow
15    16      0
0     19.9    10.9
15    17.8    12
17.5  16      18
4.5   .1      16.5
18    6.5     16
9     13      8
18    0       14.6

Tags: python, python-3.x

Solution


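From your question it is not clear how the Height column should be handled, i.e. which aggregation has to be applied. In any case, pandas has several useful functions for this, for example pivot.

This example uses pivot to create new columns from the values in a column.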

import pandas as pd

color = ['red','yellow','red','blue','blue']
height = range(len(color))
df = pd.DataFrame({'color':color, 'height':height})

df.pivot(columns='color')


    height
color   blue    red     yellow
0   NaN     0.0     NaN
1   NaN     NaN     1.0
2   NaN     2.0     NaN
3   3.0     NaN     NaN
4   4.0     NaN     NaN

As you can see, pivot creates new columns based on the values of the column named color.
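If the Height values do have to be aggregated, pivot_table accepts an aggfunc. A sketch assuming the goal is one mean Height per Date and color; the second date and all values are invented for illustration:

```python
import pandas as pd

# Small made-up sample with two dates (hypothetical values)
df = pd.DataFrame({
    "Date": ["10/25/2021"] * 4 + ["10/26/2021"] * 2,
    "color": ["red", "red", "orange", "orange", "red", "orange"],
    "Height": [15, 0, 16, 19.9, 9, 13],
})

# One row per Date, one column per color, mean Height in each cell
mean_height = df.pivot_table(index="Date", columns="color",
                             values="Height", aggfunc="mean")
print(mean_height)
```

Any other reduction ("sum", "max", ...) can be passed as aggfunc in the same way.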

Another approach is to set an index and use the unstack function.
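A minimal sketch of that route on a shortened copy of the sample data, assuming the rows are first numbered within each color with groupby(...).cumcount() so that every (row number, color) pair is unique, which unstack requires:

```python
import pandas as pd

df = pd.DataFrame({
    "color": ["red", "red", "orange", "orange"],
    "Height": [15, 0, 16, 19.9],
})

# Number the rows within each color: 0, 1, ... restarting per color
df["idx"] = df.groupby("color").cumcount()

# Move (idx, color) into a MultiIndex, then turn the color level into columns
wide = df.set_index(["idx", "color"])["Height"].unstack("color")
print(wide)
```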

Update

With the given data, the pivot approach looks like this:

import pandas as pd
from io import StringIO
strm = StringIO("""Date,color,Height
10/25/2021,red,15
10/25/2021,red,0
10/25/2021,red,15
10/25/2021,red,17.5
10/25/2021,red,4.5
10/25/2021,red,18
10/25/2021,red,9
10/25/2021,red,18
10/25/2021,orange,16
10/25/2021,orange,19.9
10/25/2021,orange,17.8
10/25/2021,orange,16
10/25/2021,orange,.1
10/25/2021,orange,6.5
10/25/2021,orange,13
10/25/2021,orange,0
10/25/2021,yellow,0
10/25/2021,yellow,10.9
10/25/2021,yellow,12
10/25/2021,yellow,18
10/25/2021,yellow,16.5
10/25/2021,yellow,16
10/25/2021,yellow,8
10/25/2021,yellow,14.6""")

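df = pd.read_csv(strm, sep=",")
# Number the rows within each color (0..7) so rows of different colors can be matched up
df['idx'] = df.groupby('color')['color'].cumcount()
# idx becomes the row index, each color becomes a column, Height fills the cells
df.pivot(columns=['color'], index='idx', values='Height').reset_index(drop=True)

#alternative (values='Height' keeps the string Date column out of the aggregation)
#pd.pivot_table(df, columns=['color'], index='idx', values='Height').reset_index(drop=True)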


# output
color   orange  red     yellow
0   16.0    15.0    0.0
1   19.9    0.0     10.9
2   17.8    15.0    12.0
3   16.0    17.5    18.0
4   0.1     4.5     16.5
5   6.5     18.0    16.0
6   13.0    9.0     8.0
7   0.0     18.0    14.6
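pivot orders the new columns alphabetically (orange, red, yellow), while the expected result lists red first. A sketch of selecting the columns in the desired order afterwards, on a shortened copy of the sample data:

```python
import pandas as pd

df = pd.DataFrame({
    "color": ["red", "red", "orange", "orange", "yellow", "yellow"],
    "Height": [15, 0, 16, 19.9, 0, 10.9],
})
df["idx"] = df.groupby("color").cumcount()
result = df.pivot(index="idx", columns="color", values="Height")

# pivot sorts the new columns alphabetically; select them in the order
# of the expected result, then drop the helper idx index
result = result[["red", "orange", "yellow"]].reset_index(drop=True)
print(result)
```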
