python - pandas: assign value based on equivalent value in another row with lookup
Problem description
I have a dataframe with this structure:
import pandas as pd

df = pd.DataFrame([
    {"state": "CA", "region": "West", "total": 2, "week": 10},
    {"state": "UT", "region": "Midwest", "total": 7, "week": 10},
    {"state": "CA", "region": "CA", "total": 14, "week": 11},
    {"state": "UT", "region": "UT", "total": 18, "week": 11},
    {"state": "CA", "region": "West", "total": 21, "week": 12},
    {"state": "UT", "region": "Midwest", "total": 30, "week": 12},
])
(My real dataset has all the states in!)
All the region values for rows with week set to 11 have been set wrongly, so I would like to replace them with the value for the same state in week 10.
I tried this:
df[df.week == 11].region = df[df.week == 10].region
But it's reliant on the rows being in the same order, which they might not be. Is there a way I can do this while matching on state name?
Solution
The following is untested against your dataframe, but you can:
- merge the week 10 subset of the dataframe (relabelled as week 11) back onto the dataframe itself, joining on state and week (row order does not matter, since a merge matches on keys)
- use np.where() to overwrite region with the new region_x column for the week 11 rows
- then drop region_x
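import numpy as np

# Take the week 10 rows and relabel them as week 11 so they join onto the rows to fix.
df = df.merge(df.loc[df['week'] == 10, ['state', 'region', 'week']].assign(week=11),
              how='left',
              on=['state', 'week'], suffixes=('', '_x'))
# Only week 11 rows take the looked-up region; all other rows keep their own.
df['region'] = np.where(df['week'] == 11, df['region_x'], df['region'])
df = df.drop('region_x', axis=1)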
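As a quick check of the merge steps above, here is a minimal, self-contained sketch that runs them on the sample data from the question, with the rows deliberately shuffled to confirm that order does not matter. The names lookup and fixed are introduced here for illustration only, and the sketch assumes every state has exactly one week 10 row.

```python
import numpy as np
import pandas as pd

# Sample data from the question, with the rows deliberately shuffled.
df = pd.DataFrame([
    {"state": "UT", "region": "UT", "total": 18, "week": 11},
    {"state": "CA", "region": "West", "total": 2, "week": 10},
    {"state": "CA", "region": "CA", "total": 14, "week": 11},
    {"state": "UT", "region": "Midwest", "total": 7, "week": 10},
    {"state": "CA", "region": "West", "total": 21, "week": 12},
    {"state": "UT", "region": "Midwest", "total": 30, "week": 12},
])

# Week 10 regions, relabelled as week 11 so they join onto the rows to fix.
# (Assumption: one week 10 row per state; duplicates would multiply rows.)
lookup = df.loc[df["week"] == 10, ["state", "region"]].assign(week=11)

# Left merge keeps every original row; non-week-11 rows get NaN in region_x.
fixed = df.merge(lookup, how="left", on=["state", "week"], suffixes=("", "_x"))

# Overwrite region only where week is 11, then drop the helper column.
fixed["region"] = np.where(fixed["week"] == 11, fixed["region_x"], fixed["region"])
fixed = fixed.drop(columns="region_x")

print(fixed)
```

If some state has no week 10 row, its week 11 region would come out as NaN here, so it may be worth checking for missing values afterwards.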
If you don't mind sorting your dataframe, you could also use ffill(). Make sure your 'region' column doesn't already have NaN values though. In case it does, I included this line of code: df['region'] = df['region'].replace(np.nan, 'N/A') so that existing NaN values aren't forward-filled unintentionally, but perhaps you don't need that line of code:
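df = df.sort_values(['state', 'week'])
df['region'] = df['region'].replace(np.nan, 'N/A')  # optional: protect existing NaN values
df['region'] = np.where(df['week'] == 11, np.nan, df['region'])  # blank the wrong week 11 values
# Forward-fill within each state so a value never leaks in from a different state.
df['region'] = df.groupby('state')['region'].ffill()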
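The forward-fill variant can be sketched end to end in the same way. This is only an illustration on the question's sample data: the name out is hypothetical, the fill is grouped by state as a precaution (an assumption beyond the plain ffill() call), and the final sort_index() is an optional extra that restores the original row order.

```python
import numpy as np
import pandas as pd

# Sample data from the question, with the rows deliberately shuffled.
df = pd.DataFrame([
    {"state": "UT", "region": "UT", "total": 18, "week": 11},
    {"state": "CA", "region": "West", "total": 2, "week": 10},
    {"state": "CA", "region": "CA", "total": 14, "week": 11},
    {"state": "UT", "region": "Midwest", "total": 7, "week": 10},
    {"state": "CA", "region": "West", "total": 21, "week": 12},
    {"state": "UT", "region": "Midwest", "total": 30, "week": 12},
])

# Sort so each state's week 10 row sits directly above its week 11 row.
out = df.sort_values(["state", "week"]).copy()

# Blank out the wrong week 11 regions.
out["region"] = np.where(out["week"] == 11, np.nan, out["region"])

# Fill each blank from the previous row of the same state.
out["region"] = out.groupby("state")["region"].ffill()

# Optional: put the rows back in their original order.
out = out.sort_index()

print(out)
```

Because the sort keeps the original index labels, sort_index() is enough to undo it; no extra helper column is needed.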