python - Pandas: going row by row with an if statement

Problem description

I'm not sure this is the best title. If anyone has a better idea, I'm happy to hear suggestions.

Suppose I have a dataframe that looks like this:

df2

             A     section
0      <fruit>
1        apple
2       orange
3         pear
4   watermelon
5     </fruit>
6  <furniture>
7        chair
8         sofa
9        table
10        desk
11 </furniture>

What I want is a dataframe that looks like this:

             A     section
0      <fruit>       fruit
1        apple       fruit
2       orange       fruit
3         pear       fruit
4   watermelon       fruit
5     </fruit>       fruit
6  <furniture>   furniture
7        chair   furniture
8         sofa   furniture
9        table   furniture
10        desk   furniture
11 </furniture>  furniture

Is there a way to do this? I thought about going row by row with an if statement, but I ran into trouble with the boolean logic when I tried it.
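For comparison, the row-by-row if-statement approach the question mentions can work if the loop carries the current section name in a variable from one iteration to the next. This is a minimal sketch, assuming the tag rows are exactly the values that start with "<" and end with ">"; the variable names sections and current are illustrative only.

```python
import pandas as pd

# Sample data from the question (column "A" only)
df = pd.DataFrame({"A": ["<fruit>", "apple", "orange", "pear", "watermelon", "</fruit>",
                         "<furniture>", "chair", "sofa", "table", "desk", "</furniture>"]})

sections = []
current = None  # name of the section the loop is currently inside
for value in df["A"]:
    # An opening or closing tag such as "<fruit>" or "</fruit>" starts with "<" and ends with ">"
    if value.startswith("<") and value.endswith(">"):
        # strip() removes any of the characters "<", "/", ">" from both ends
        current = value.strip("</>")
    sections.append(current)

df["section"] = sections
print(df)
```

This is easy to follow, but a Python-level loop is slower than the vectorized answers on large frames.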

Edit #1:

The solution posted below solved my problem.

Solution:

df['section'] = pd.Series(np.where(df.A.str.contains('<'), df.A.str.replace('<|>|/', '', regex=True), np.nan), index=df.index).ffill()

What if my data looks like this instead? I want the same result.

                                       A          section
0                                 <fruit>
1                <fruit_1>apple</fruit_1>
2               <fruit_2>orange</fruit_2>
3                 <fruit_3>pear</fruit_3>
4           <fruit_4>watermelon</fruit_4>
5                                </fruit>
6                             <furniture>
7        <furniture_1>chair</furniture_1>
8         <furniture_2>sofa</furniture_2>
9        <furniture_3>table</furniture_3>
10        <furniture_4>desk</furniture_4>
11                           </furniture>
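With this second layout, the contains('<') test from the solution above matches every row, because the item rows carry their own tags too. One possible way around that, sketched here as an assumption and not taken from the thread, is to treat only rows that consist of a single tag as section markers and pull the tag name out with Series.str.extract.

```python
import pandas as pd

# Second layout from the question: item rows are wrapped in their own tags
df = pd.DataFrame({"A": [
    "<fruit>", "<fruit_1>apple</fruit_1>", "<fruit_2>orange</fruit_2>",
    "<fruit_3>pear</fruit_3>", "<fruit_4>watermelon</fruit_4>", "</fruit>",
    "<furniture>", "<furniture_1>chair</furniture_1>", "<furniture_2>sofa</furniture_2>",
    "<furniture_3>table</furniture_3>", "<furniture_4>desk</furniture_4>", "</furniture>",
]})

# Only rows that are exactly one tag, e.g. "<fruit>" or "</fruit>", match the pattern;
# the tag name is captured and every other row becomes NaN.
names = df["A"].str.extract(r"^</?([^<>/]+)>$", expand=False)

# Carry each section name down to the rows that follow it
df["section"] = names.ffill()
print(df)
```

The anchors ^ and $ are what reject rows like "<fruit_1>apple</fruit_1>": after the first ">" there is more text, so the pattern cannot reach the end of the string.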

Tags: python, python-3.x, pandas

IIUC, use contains to find the tag rows, np.where to assign the tag name on those rows (NaN everywhere else), then ffill to fill in the NaNs.

df['section'] = pd.Series(np.where(df.A.str.contains('<'), df.A.str.replace('<|>|/', '', regex=True), np.nan), index=df.index).ffill()
df
Out[1003]: 
               A    section
0        <fruit>      fruit
1          apple      fruit
2         orange      fruit
3           pear      fruit
4     watermelon      fruit
5       </fruit>      fruit
6    <furniture>  furniture
7          chair  furniture
8           sofa  furniture
9          table  furniture
10          desk  furniture
11  </furniture>  furniture
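A self-contained version of the answer above, with the sample data rebuilt from the question and the steps split out. It passes regex=True to str.replace explicitly: since pandas 2.0 the default is regex=False, which would treat '<|>|/' as a literal string and leave the brackets in place. Passing index=df.index keeps the new Series aligned even if df does not have a default 0..n-1 index.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": ["<fruit>", "apple", "orange", "pear", "watermelon", "</fruit>",
                         "<furniture>", "chair", "sofa", "table", "desk", "</furniture>"]})

is_tag = df["A"].str.contains("<", regex=False)          # True on "<fruit>", "</fruit>", ...
tag_name = df["A"].str.replace("<|>|/", "", regex=True)   # "<fruit>" -> "fruit"

# Keep the tag name on tag rows, NaN elsewhere, then forward-fill the gaps
df["section"] = pd.Series(np.where(is_tag, tag_name, np.nan), index=df.index).ffill()
print(df)
```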

If you want to be more precise/strict, you can also check the start and end of the string with startswith and endswith.

df['section'] = pd.Series(np.where(df.A.str.startswith('<') & df.A.str.endswith('>'), df.A.str.replace('<|>|/', '', regex=True), np.nan), index=df.index).ffill()
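The stricter variant can also be written without NumPy, using Series.where to blank out the non-tag rows before stripping the brackets. This is a sketch of an equivalent rewrite, not the answerer's code; it keeps the same startswith/endswith test and the same regex.

```python
import pandas as pd

df = pd.DataFrame({"A": ["<fruit>", "apple", "orange", "pear", "watermelon", "</fruit>",
                         "<furniture>", "chair", "sofa", "table", "desk", "</furniture>"]})

# Stricter test: the whole value starts with "<" and ends with ">"
is_tag = df["A"].str.startswith("<") & df["A"].str.endswith(">")

# Series.where keeps values where is_tag is True and puts NaN elsewhere, so the
# result stays aligned with df without wrapping np.where in pd.Series
df["section"] = (
    df["A"].where(is_tag)
    .str.replace("<|>|/", "", regex=True)  # NaN rows stay NaN
    .ffill()
)
print(df)
```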
