How do I convert a column from object dtype to numeric and then find the mean for each row in pandas? For example, convert '<17,500, >=15,000' to 16250 (the mean).

Problem description

data['family_income'].value_counts()
>=35,000             2517
<27,500, >=25,000    1227
<30,000, >=27,500     994
<25,000, >=22,500     833
<20,000, >=17,500     683
<12,500, >=10,000     677
<17,500, >=15,000     634
<15,000, >=12,500     629
<22,500, >=20,000     590
<10,000, >= 8,000     563
< 8,000, >= 4,000     402
< 4,000               278
Unknown               128

The data column that I want to show as mean values instead of ranges:

data['family_income']
    0        <17,500, >=15,000
    1        <27,500, >=25,000
    2        <30,000, >=27,500
    3        <15,000, >=12,500
    4        <30,000, >=27,500
                   ...        
    10150    <30,000, >=27,500
    10151    <25,000, >=22,500
    10152             >=35,000
    10153    <10,000, >= 8,000
    10154    <27,500, >=25,000
    Name: family_income, Length: 10155, dtype: object

Desired output: the imputed mean of each range

0      16250
1      26250
2      28750
     ...
10152  35000
10153   9000
10154  26250


data['family_income']=data['family_income'].str.replace(',', ' ').str.replace('<',' ')
data[['income1','income2']] = data['family_income'].apply(lambda x: pd.Series(str(x).split(">=")))

data['income1']=pd.to_numeric(data['income1'], errors='coerce')

data['income1']
        0       NaN
        1       NaN
        2       NaN
        3       NaN
        4       NaN
                 ..
        10150   NaN
        10151   NaN
        10152   NaN
        10153   NaN
        10154   NaN
        Name: income1, Length: 10155, dtype: float64

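The conversion from object to numeric does not seem to work here, because every value comes back as NaN. So how do I convert the column to a numeric dtype and compute the imputed mean of each range?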

Tags: python, pandas

Solution
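
Before the fix, it may help to see why the attempt in the question returns only NaN. The sketch below replays that cleaning step on two sample values from the question; the variable names `s`, `cleaned`, `first_part` and `income1` are illustrative only. Replacing the commas with spaces leaves strings such as ' 17 500  ', which `pd.to_numeric` cannot parse, so `errors='coerce'` turns them into NaN.

```python
import pandas as pd

# Two sample values taken from the question
s = pd.Series(['<17,500, >=15,000', '>=35,000'])

# The cleaning step from the question: commas and '<' become spaces
cleaned = s.str.replace(',', ' ', regex=False).str.replace('<', ' ', regex=False)
print(cleaned.tolist())

# Splitting on '>=' leaves the upper bound (still containing spaces) first;
# for '>=35,000' the first piece is an empty string
first_part = cleaned.str.split('>=').str[0]
print(first_part.tolist())

# Neither piece is a valid number, so coercion yields NaN for both rows
income1 = pd.to_numeric(first_part, errors='coerce')
print(income1.tolist())
```

Removing the thousands separators instead of replacing them with spaces avoids this problem.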


You can use the following snippet:

# Importing dependencies
import pandas as pd
import string

# Replicating your data
data = ['<17,500, >=15,000', '<27,500, >=25,000', '< 4,000 ', '>=35,000']
df = pd.DataFrame(data, columns=['family_income'])

# Removing punctuation from the family_income column, then trimming the leftover whitespace
df['family_income'] = df['family_income'].apply(lambda x: x.translate(str.maketrans('', '', string.punctuation))).str.strip()

# Splitting ranges on whitespace into two columns A and B
# (n must be passed by keyword in pandas 2.x)
df[['A', 'B']] = df['family_income'].str.split(n=1, expand=True)

# Converting cols A and B to numbers; non-numeric values such as 'Unknown' become NaN
df[['A', 'B']] = df[['A', 'B']].apply(pd.to_numeric, errors='coerce')

# Creating mean column from A and B (NaN is skipped, so a single bound is kept as-is)
df['mean'] = df[['A', 'B']].mean(axis=1)

# Input DataFrame
family_income
0   <17,500, >=15,000
1   <27,500, >=25,000
2   < 4,000
3   >=35,000

# Result DataFrame
mean
0   16250.0
1   26250.0
2   4000.0
3   35000.0
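
As an alternative, here is a minimal sketch that pulls the numbers out with a regular expression, so that every label spelling listed in the question's `value_counts()` output is handled, including '< 8,000, >= 4,000' and 'Unknown'. The helper name `bracket_midpoint` is hypothetical. The sketch assumes that an open-ended bracket is represented by its single bound (matching the 35000 in the desired output) and that 'Unknown' should become NaN.

```python
import re

import pandas as pd

# Label spellings taken from the question's value_counts() output
raw = pd.Series([
    '<17,500, >=15,000',
    '<10,000, >= 8,000',
    '< 8,000, >= 4,000',
    '< 4,000',
    '>=35,000',
    'Unknown',
])

def bracket_midpoint(label):
    """Hypothetical helper: mean of the numeric bounds found in one label."""
    no_commas = label.replace(',', '')  # '17,500' -> '17500'
    bounds = [float(m) for m in re.findall(r'\d+', no_commas)]
    # Labels without any number (e.g. 'Unknown') map to NaN
    return sum(bounds) / len(bounds) if bounds else float('nan')

midpoints = raw.map(bracket_midpoint)
print(midpoints.tolist())
# [16250.0, 9000.0, 6000.0, 4000.0, 35000.0, nan]
```

With the question's data this could be applied as `data['family_income'].map(bracket_midpoint)`, provided the column contains only strings.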
