首页 > 解决方案 > 比较两个 DataFrame 中的两列并创建一个布尔列

问题描述

我是熊猫新手。我想比较按每个 DataFrame 的 2 列分组的两个数据帧。在df1我们有BrandSignal_range作为列。在中,df2我们有order和作为列。我需要将df2 的2和df1 中的 2 进行比较。BrandsignalBrandsignalSinal_range

df1

+-------+-------------------+
|Brand  | Signal_range 
+-------+-------------------+
| AB    | {"Weak": {"low": 15, "high": 120}, "Strong": {"low": null, "high": -240}, "Average": {"low": -240, "high": 15}}
| CA    |  
| FZ    | {"Weak": {"low": 10, "high": 100}, "Strong": {"low": null, "high": -200}, "Average": {"low": -200, "high": 10}}
+-------+-------+------------+

df2

+-------+-------+--------+
|order  | Brand | signal | 
+-------+-------+--------+
| 1233  | AB    |  -250  |
| 34565 | AB    |   100  |
| 34552 | FZ    |   5    |
+-------+-------+--------+

我需要将signal某个品牌的's 纳入df2其中并将其与该品牌df1的 's进行比较Signal_range。如果信号落在 df1 的范围内,Signal_range则在 df1 中创建一个新列作为布尔值,如下所示

+-------+-------+--------------------+-------------+------------+
|order  | Brand | signal |  weak_ind | Average_ind | Strong_ind |
+-------+-------+--------------------+-------------+------------+
| 1233  | AB    |  -250  |  0        |    0        |   1        |
| 34565 | AB    |   100  |  1        |    0        |   0        |
| 34552 | FZ    |   5    |  0        |    1        |   0        |
+-------+-------+--------+-----------+-------------+------------+

我们有什么方法可以轻松实现这一目标?

标签: pythonpandasdataframe

解决方案


一切都是为了准备。将嵌入的 JSON 转换为列,然后它就是直接的逻辑。你还没有定义你的逻辑,所以我为了这个例子的目的而对它进行了测试。


sr = """
|Brand|Signal_range
| AB    | {"Weak": {"low": 15, "high": 120}, "Strong": {"low": null, "high": -240}, "Average": {"low": -240, "high": 15}}
| CA    |  
| FZ    | {"Weak": {"low": 10, "high": 100}, "Strong": {"low": null, "high": -200}, "Average": {"low": -200, "high": 10}}"""
df = pd.read_csv(io.StringIO(sr), sep="|", engine="python").drop(columns="Unnamed: 0")
# cleanup the data and convert JSON to columns
df["Brand"] = df["Brand"].str.strip()
df["Signal_range"] = df["Signal_range"].str.strip().replace("", "{}")
df["Signal_range"] = df["Signal_range"].apply(lambda x: json.loads(x))
df = pd.json_normalize(df.to_dict(orient="records"))
df1 = pd.read_csv(io.StringIO("""|order|Brand|signal
| 1233  | AB    |  -250  
| 34565 | AB    |   100  
| 34552 | FZ    |   5    """), sep="|").drop(columns="Unnamed: 0")
df1["Brand"] = df1["Brand"].str.strip()
dfm = df1.merge(df, on="Brand")
# derive columns...
dfm = dfm.assign(
    weak_ind=np.where(1, (dfm["Signal_range.Weak.low"]<=dfm["signal"])&(dfm["signal"]<=dfm["Signal_range.Weak.high"]), 0),
    average_ind=np.where(1, (dfm["Signal_range.Average.low"]<=dfm["signal"])&(dfm["signal"]<=dfm["Signal_range.Average.high"]), 0),
    strong_ind=np.where(1, (dfm["Signal_range.Strong.low"]<=dfm["signal"])&(dfm["signal"]<=dfm["Signal_range.Strong.high"]), 0),

          )
dfr = dfm.copy()
print(dfr.drop(columns=[c for c in dfr.columns if "Signal" in c]).to_string(index=False))

输出

 order Brand  signal  weak_ind  average_ind  strong_ind
  1233    AB    -250         0            0           0
 34565    AB     100         1            0           0
 34552    FZ       5         0            1           0

推荐阅读