Python: Pandas: Find availability of text in Pandas (DataFrame)

Problem description

I have two columns, ColA and ColB, in a pandas DataFrame. I want to compare ColB against ColA: if ColA contains a word that matches ColB, ColC should be set to available (AVB).

If it does not match, ColC should be set to not available (NAVB).
ColA                                                            ColB  
You can extract_insights on product reception                   insights
user various sources like extract_insights etc.                 insights   
some other sourced mail by using signals from state art         text       

Note: even if ColA contains special characters, the ColB text should still be recognized.

Expected output:

ColA                                                           ColB     ColC
You can extract_insights on product reception                  insights AVB
user various sources like extract_insights etc.                insights AVB  
some other sourced mail by using signals from state art        text     NAVB  

Tags: python, regex, python-3.x, pandas

Solution


Try the following:

import pandas as pd

# Initialize example dataframe
data = [
    ["You can extract_insights on product reception", "insights"],
    ["user various sources like extract_insights etc.", "insights"],
    ["some other sourced mail by using signals from state art", "text"],
]
df = pd.DataFrame(data=data, columns=["ColA", "ColB"])

# Create column C with comparison results
df["ColC"] = [
    "AVB" if (b in a) else "NAVB"
    for (a, b) in zip(df["ColA"], df["ColB"])
]

print(df)
# Output:
#                                                 ColA      ColB  ColC
# 0      You can extract_insights on product reception  insights   AVB
# 1    user various sources like extract_insights etc.  insights   AVB
# 2  some other sourced mail by using signals from ...      text  NAVB
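
The comparison above is a plain substring test, so special characters in ColA are matched literally. It is, however, case-sensitive, and `b in a` raises a TypeError when a cell holds a missing value. If either of those matters for your data, one possible variant is sketched below. The helper name `availability` and the altered sample rows (a differently cased match and a missing value) are assumptions made for illustration, not part of the original question.

```python
import pandas as pd


def availability(text, keyword):
    """Return "AVB" if keyword occurs anywhere in text (ignoring case), else "NAVB".

    Hypothetical helper, introduced only for this sketch.
    """
    # Missing values (None/NaN) can never match, so report them as not available.
    if pd.isna(text) or pd.isna(keyword):
        return "NAVB"
    # Plain substring test: no regex is involved, so characters such as
    # "_" or "." in either column are compared literally.
    return "AVB" if str(keyword).lower() in str(text).lower() else "NAVB"


# Sample rows modified from the question to exercise case and missing values.
df = pd.DataFrame(
    {
        "ColA": [
            "You can extract_insights on product reception",
            "user various sources like Extract_Insights etc.",
            None,
        ],
        "ColB": ["insights", "insights", "text"],
    }
)

df["ColC"] = [availability(a, b) for a, b in zip(df["ColA"], df["ColB"])]
print(df)
```

Lower-casing both sides keeps the check a literal substring match, which avoids having to escape ColB before using it as a regular expression.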
