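python - Adding multiple columns to a DataFrame in Python
Problem description
I am trying to create a DataFrame with 9 different columns, all derived from a single column of a source DataFrame. I can't figure out what I am doing wrong. The first assignment always works, but the rest do not. I tried Orange and Yellow first and their data was inserted, but the second command and all the following ones were inserted as NaN.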
lastdate = '10/23/2021'
column_names = ["Red", "Orange", "Yellow","Green","Blue","Violet","Black","Brown"]
dftcolorAgg = pd.DataFrame(columns = column_names)
dftcolorAgg['Red'] = df[ (df['Color'] == 'Red') & (df['Date'] == lastdate)].Height
dftcolorAgg['Orange'] = df[ (df['Color'] == 'Orange') & (df['Date'] == lastdate)].Height
dftcolorAgg['Yellow'] = df[ (df['Color'] == 'Yellow') & (df['Date'] == lastdate)].Height
dftcolorAgg['Green'] = df[ (df['Color'] == 'Green') & (df['Date'] == lastdate)].Height
dftcolorAgg['Blue'] = df[ (df['Color'] == 'Blue') & (df['Date'] == lastdate)].Height
dftcolorAgg['Indigo'] = df[ (df['Color'] == 'Indigo') & (df['Date'] == lastdate)].Height
dftcolorAgg['Violet'] = df[ (df['Color'] == 'Violet') & (df['Date'] == lastdate)].Height
dftcolorAgg['Black'] = df[ (df['Color'] == 'Black') & (df['Date'] == lastdate)].Height
dftcolorAgg['Brown'] = df[ (df['Color'] == 'Brown') & (df['Date'] == lastdate)].Height
Second approach
red = df[ (df['Color'] == 'Red') & (df['Date'] == lastdate)].Height
orange = df[ (df['Color'] == 'Orange') & (df['Date'] == lastdate)].Height
yellow = df[ (df['Color'] == 'Yellow') & (df['Date'] == lastdate)].Height
green = df[ (df['Color'] == 'Green') & (df['Date'] == lastdate)].Height
blue = df[ (df['Color'] == 'Blue') & (df['Date'] == lastdate)].Height
indigo = df[ (df['Color'] == 'Indigo') & (df['Date'] == lastdate)].Height
violet = df[ (df['Color'] == 'Violet') & (df['Date'] == lastdate)].Height
black = df[ (df['Color'] == 'Black') & (df['Date'] == lastdate)].Height
brown = df[ (df['Color'] == 'Brown') & (df['Date'] == lastdate)].Height
dftcolorAgg = pd.concat([red, orange,yellow,green,blue,indigo,violet,black,brown], axis=1)
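The second approach only adds NaNs and no values. In addition, each statement adds another 8 rows, so I end up with 72 rows, all NaN.
I want each statement to insert its values into the new DataFrame, with all values sitting in the same 8 rows of each column.
Here is some sample data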
Date,color,Height
10/25/2021,red,15
10/25/2021,red,0
10/25/2021,red,15
10/25/2021,red,17.5
10/25/2021,red,4.5
10/25/2021,red,18
10/25/2021,red,9
10/25/2021,red,18
10/25/2021,orange,16
10/25/2021,orange,19.9
10/25/2021,orange,17.8
10/25/2021,orange,16
10/25/2021,orange,.1
10/25/2021,orange,6.5
10/25/2021,orange,13
10/25/2021,orange,0
10/25/2021,yellow,0
10/25/2021,yellow,10.9
10/25/2021,yellow,12
10/25/2021,yellow,18
10/25/2021,yellow,16.5
10/25/2021,yellow,16
10/25/2021,yellow,8
10/25/2021,yellow,14.6
Expected result
Red orange yellow
15 16 0
0 19.9 10.9
15 17.8 12
17.5 16 18
4.5 .1 16.5
18 6.5 16
9 13 8
18 0 14.6
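Solution
It is not clear from your question how the Height column should be handled, i.e. which aggregation has to be applied. In any case, pandas offers several useful functions for this: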
pivot
This example uses pivot to create new columns from the values in a column.
import pandas as pd
color = ['red','yellow','red','blue','blue']
height = range(len(color))
df = pd.DataFrame({'color':color, 'height':height})
df.pivot(columns='color')
height
color blue red yellow
0 NaN 0.0 NaN
1 NaN NaN 1.0
2 NaN 2.0 NaN
3 3.0 NaN NaN
4 4.0 NaN NaN
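As you can see, pivot creates new columns based on the values of the column named color.
Another approach is to use an index together with the unstack function.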
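The unstack route can be sketched roughly as follows. This is a minimal illustration on made-up data, not the asker's real DataFrame; the helper column idx and the variable name wide are assumptions introduced here. The idea is to number the rows within each color, move that counter and the color into the index, and then unstack the color level into columns.

```python
import pandas as pd

# Small made-up sample shaped like the question's data (assumption).
df = pd.DataFrame({
    "color": ["red", "red", "orange", "orange", "yellow", "yellow"],
    "Height": [15.0, 0.0, 16.0, 19.9, 0.0, 10.9],
})

# Number the rows within each color: 0, 1, ... for red, 0, 1, ... for orange, etc.
# "idx" is a hypothetical helper column, not part of the original data.
df["idx"] = df.groupby("color").cumcount()

# Move (idx, color) into the index, then unstack the color level into columns.
wide = df.set_index(["idx", "color"])["Height"].unstack("color")

# unstack sorts the new columns; restore the order in which colors first appear.
wide = wide[df["color"].unique().tolist()]
print(wide)
```

Without the per-color counter, every row would keep its own unique index label and the result would be mostly NaN, which is the same effect seen in the plain pivot example above.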
Update
With the given data, the pivot approach looks like this:
import pandas as pd
from io import StringIO
strm = StringIO("""Date,color,Height
10/25/2021,red,15
10/25/2021,red,0
10/25/2021,red,15
10/25/2021,red,17.5
10/25/2021,red,4.5
10/25/2021,red,18
10/25/2021,red,9
10/25/2021,red,18
10/25/2021,orange,16
10/25/2021,orange,19.9
10/25/2021,orange,17.8
10/25/2021,orange,16
10/25/2021,orange,.1
10/25/2021,orange,6.5
10/25/2021,orange,13
10/25/2021,orange,0
10/25/2021,yellow,0
10/25/2021,yellow,10.9
10/25/2021,yellow,12
10/25/2021,yellow,18
10/25/2021,yellow,16.5
10/25/2021,yellow,16
10/25/2021,yellow,8
10/25/2021,yellow,14.6""")
df = pd.read_csv(strm, sep=",")
df['idx'] = df.groupby('color')['color'].cumcount()
df.pivot(columns=['color'], index='idx', values='Height').reset_index(drop=True)
# alternative (values='Height' keeps the non-numeric Date column out of the aggregation)
#pd.pivot_table(df, columns=['color'], index='idx', values='Height').reset_index(drop=True)
# output
color orange red yellow
0 16.0 15.0 0.0
1 19.9 0.0 10.9
2 17.8 15.0 12.0
3 16.0 17.5 18.0
4 0.1 4.5 16.5
5 6.5 18.0 16.0
6 13.0 9.0 8.0
7 0.0 18.0 14.6
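As for why the two attempts in the question end up with NaN: pandas aligns on index labels both when a Series is assigned to a column and in pd.concat(..., axis=1). Each filtered Series keeps the row labels it had in the source DataFrame, so the labels of different colors never overlap. The sketch below reproduces this on made-up data and shows that dropping the original labels with reset_index(drop=True) lines the values up. The names misaligned and aligned are chosen here for illustration only.

```python
import pandas as pd

# Made-up sample; column names follow the sample data in the question.
df = pd.DataFrame({
    "color": ["red", "red", "orange", "orange"],
    "Height": [15.0, 0.0, 16.0, 19.9],
})

red = df.loc[df["color"] == "red", "Height"]        # keeps index labels 0, 1
orange = df.loc[df["color"] == "orange", "Height"]  # keeps index labels 2, 3

# The labels do not overlap, so concat yields 4 rows, each half-filled with NaN.
misaligned = pd.concat([red, orange], axis=1, keys=["red", "orange"])

# After dropping the original labels both Series are indexed 0, 1 and line up.
aligned = pd.concat(
    [red.reset_index(drop=True), orange.reset_index(drop=True)],
    axis=1,
    keys=["red", "orange"],
)
print(misaligned)
print(aligned)
```

This keeps the one-Series-per-color style of the question. For many colors, the pivot version above is shorter, because it avoids writing one filter per color.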