Extracting numbers from a column

Problem description

I have a dataset with many columns. I want to search one of them, shown below, for any numbers:

Column_to_look_at

10 days ago I was ...
How old are you?
I am 24 years old
I do not know. Maybe 23.12?
I could21n  .... 

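I need to create two columns: one that extracts the numbers contained in that column, and another that holds only a Boolean flag (1 if the row contains a number, 0 if it does not).

My expected output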

Column_to_look_at                 Numbers          Bool

10 days ago I was ...               [10]            1
How old are you?                    []              0
I am 24 years old                   [24]            1
I do not know. Maybe 23.12 or 23.14?   [23.12, 23.14]  1
I could21n  ....                     [21]           1

The code I used to select the numbers is this:

df[df.applymap(np.isreal).all(1)]

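But this does not actually give me the expected output (at least for selecting the numbers). Any suggestions on how to extract numbers from this column would be greatly appreciated. Thanks.

Tags: python, pandas

Solution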


This will do it:

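import re

def checknum(row):
    # Find every integer or decimal (optionally signed) in the text.
    # re.findall returns the matches as strings, e.g. ['23.12'].
    return re.findall(r"[+-]?\d+(?:\.\d+)?", row['Column_to_look_at'])

df['Numbers'] = df.apply(checknum, axis=1)
# 1 if at least one number was found in the row, otherwise 0
df['Bool'] = df['Numbers'].apply(lambda nums: 1 if len(nums) > 0 else 0)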
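
As a self-contained check, here is a minimal sketch of the same idea on the sample rows from the question. It uses the same regular expression, but applies it through pandas' `Series.str.findall` instead of a row-wise `apply`. The names `NUMBER_PATTERN` and `Numbers_as_float` are illustrative and not part of the original answer, and the conversion to floats is an optional extra in case numeric values rather than strings are wanted.

```python
import pandas as pd

# Sample rows copied from the question
df = pd.DataFrame({
    "Column_to_look_at": [
        "10 days ago I was ...",
        "How old are you?",
        "I am 24 years old",
        "I do not know. Maybe 23.12?",
        "I could21n  ....",
    ]
})

# Same pattern as in the answer: optional sign, digits, optional decimal part
NUMBER_PATTERN = r"[+-]?\d+(?:\.\d+)?"

# Series.str.findall runs re.findall on every string in the column
df["Numbers"] = df["Column_to_look_at"].str.findall(NUMBER_PATTERN)

# 1 when the list of matches is non-empty, otherwise 0
df["Bool"] = df["Numbers"].map(len).gt(0).astype(int)

# Optional (assumption: numeric values are wanted rather than strings)
df["Numbers_as_float"] = df["Numbers"].map(lambda nums: [float(n) for n in nums])

print(df)
```

Note that the matches in `Numbers` are strings such as `'23.12'`. Because the pattern allows a leading sign, text like `3-4` would yield `'3'` and `'-4'`; drop `[+-]?` from the pattern if that is not desired.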
