How to filter only the float values from a pandas column

Problem description

I have a column that looks like this:

col1
20.5
21.2
21.2
17315/06/2021 09:06:481032.14310134.91082996.3001047998.93380132341231
0060232346956263174
$365140110030
$36516011007C27
$3651501100E743

I want only the float values to remain in the column. I have tried various replace approaches, with no luck:

df['col1'] = df['col1'].replace(r'/ [ ^\d.] / g', '', regex=True, inplace=False)

It seems to do nothing.

or

df['Temp'] = df['Temp'].replace(r'/ [ ^\d.] / g', '', regex=True, inplace=True)

sets all values to NaN.
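A likely explanation for both results, shown as a minimal sketch on three hypothetical string values taken from the column above: the `/.../g` delimiters and the spaces come from JavaScript regex syntax, and Python's `re` treats them as literal characters, so the pattern never matches. Separately, `Series.replace(..., inplace=True)` modifies the Series and returns `None`, so assigning that result back to the column leaves no usable values. The names `s`, `t`, `unchanged`, `stripped` and `returned` are illustrative only.

```python
import pandas as pd

# Hypothetical sample: three values from the column above, stored as strings
s = pd.Series(["20.5", "$365140110030", "17315/06/2021 09:06:48"])

# 1) In Python, "/", " " and "g" are literal characters, so this JavaScript-style
#    pattern only matches text such as "/ 5 / g"; nothing in s is changed.
unchanged = s.replace(r'/ [ ^\d.] / g', '', regex=True)

# 2) The Python equivalent of JavaScript's /[^\d.]/g is r'[^\d.]';
#    regex replacement already substitutes every match, so no "g" flag is needed.
stripped = s.replace(r'[^\d.]', '', regex=True)

# 3) With inplace=True, replace() modifies the Series itself and returns None,
#    which is what ends up in the column when the result is assigned back.
t = s.copy()
returned = t.replace(r'[^\d.]', '', regex=True, inplace=True)

print(unchanged.tolist())  # ['20.5', '$365140110030', '17315/06/2021 09:06:48']
print(stripped.tolist())   # ['20.5', '365140110030', '17315062021090648']
print(returned)            # None
```

Even with the corrected pattern, `$365140110030` becomes the digit string `365140110030` instead of being rejected, so stripping characters alone cannot separate the genuine float values from the other entries.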

Tags: python pandas

Solution


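One option is to find all "digits.digits" sequences in each element of the column and convert the element to a number only if there is exactly one match: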

import pandas as pd

df = pd.DataFrame({"col1": [
            20.5,
            21.2,
            21.2,
            "17315/06/2021 09:06:481032.14310134.91082996.3001047998.93380132341231",
            "0060232346956263174",
            "$365140110030",
            "$36516011007C27",
            "$3651501100E743",
            "This is a cell with a float 5.4",
            -50.0 ]})

# with an apply/lambda
# df['floats'] = df['col1'].astype(str).str.findall(r"-?\d+\.\d+").apply(lambda x: pd.to_numeric(*x) if len(x)==1 else None)

# you can also avoid the apply/lambda with a temporary series:
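s = df['col1'].astype(str).str.findall(r"-?\d+\.\d+")  # list of all "digits.digits" runs per cell
df['floats'] = pd.to_numeric(s[s.str.len() == 1].str[0])  # rows without exactly one match become NaN on assignment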

print(df['floats'])
0    20.5
1    21.2
2    21.2
3     NaN
4     NaN
5     NaN
6     NaN
7     NaN
8     5.4
9   -50.0
Name: floats, dtype: float64
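To check the "exactly one match" rule cell by cell, the sketch below counts the matches of the same pattern with `Series.str.count` on a few sample values copied from the example (the names `FLOAT_PATTERN`, `n_matches` and `strict` are illustrative). The long date/number string contains two `digits.digits` runs, which is why it ends up as NaN. It also shows a stricter, hypothetical variant in case only cells that consist entirely of a float should survive: `Series.str.fullmatch` then drops text such as "This is a cell with a float 5.4" as well.

```python
import pandas as pd

FLOAT_PATTERN = r"-?\d+\.\d+"  # same pattern as in the answer

values = pd.Series([
    20.5,
    "17315/06/2021 09:06:481032.14310134.91082996.3001047998.93380132341231",
    "0060232346956263174",
    "$36516011007C27",
    "This is a cell with a float 5.4",
    -50.0,
])
text = values.astype(str)

# Number of "digits.digits" runs per cell; the answer keeps cells where this is 1
n_matches = text.str.count(FLOAT_PATTERN)

# Stricter, hypothetical variant: keep a cell only if the WHOLE cell is a float literal
whole_cell = text.str.fullmatch(FLOAT_PATTERN)
strict = pd.to_numeric(text.where(whole_cell))

print(n_matches.tolist())  # [1, 2, 0, 0, 1, 1]
print(strict.tolist())     # [20.5, nan, nan, nan, nan, -50.0]
```

Both rules work on the `str()` form of each cell, so a float that prints in scientific notation (for example, `str(1e-05)` is `'1e-05'`) or an integer-looking string such as `"42"` would not match `-?\d+\.\d+`.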
