python - 使用 loc 标准化 pandas 中的列
问题描述
我有一列似乎有 4 种不同格式的数据,我创建了一个带有数组的小片段来说明我正在使用的内容
ex_array = np.array(['100X172',
'78X120',
'1 ac',
'76,666',
'85X175',
'19,928',
'14810',
'3 ac',
'90X181',
'38X150',
'19040',
'8265',
'100X125',
'6000',
'8,750',
'.448 ac'])
ex_df = pd.DataFrame(data=ex_array, columns=['ex_col'])
这将按预期输出以下内容:
ex_col
0 100X172
1 78X120
2 1 ac
3 76,666
4 85X175
5 19,928
6 14810
7 3 ac
8 90X181
9 38X150
10 19040
11 8265
12 100X125
13 6000
14 8,750
15 .448 ac
目标是标准化列,其中所有内容都以英亩为单位,所需的输出如下
ex_df['acreage'] =
acreage
0 .394858
1 .214876
2 1
3 1.76
4 .341483
5 .457484
6 .339991
7 3
8 .373967
9 .130854
10 .437098
11 .189738
12 .284665
13 .137741
14 .200872
15 .448
我在 pandas 中的想法是创建 3 个布尔列来处理不同类型的数据
ex_df['hasX'] = ex_df['ex_col'].str.contains('X')
ex_df['has_ac'] = ex_df['ex_col'].str.contains('ac')
ex_df['has_comma'] = ex_df['ex_col'].str.contains(',')
这按预期输出
ex_df
ex_col hasX has_ac has_comma
0 100X172 True False False
1 78X120 True False False
2 1 ac False True False
3 76,666 False False True
4 85X175 True False False
5 19,928 False False True
6 14810 False False False
7 3 ac False True False
8 90X181 True False False
9 38X150 True False False
10 19040 False False False
11 8265 False False False
12 100X125 True False False
13 6000 False False False
14 8,750 False False True
15 .448 ac False True False
接下来我尝试了多个 loc 操作,如下所示
ex_df.loc[(ex_df['hasX']==True), 'acreage']= ex_df['ex_col'].apply(lambda x: float(((int(x.split('X')[0]))*(int(x.split('X')[-1])))/43560))
ex_df.loc[(ex_df['has_ac']==True), 'acreage'] = ex_df['ex_col'].apply(lambda x: float(x.split()[0]))
ex_df.loc[(ex_df['has_comma']==True), 'acreage'] = ex_df['ex_col'].apply(lambda x: float(x.replace(',','')))
ex_df.loc[((ex_df['hasX']==False) & (ex_df['has_ac']==False) & (ex_df['has_comma']==False)), 'acreage'] = ex_df['ex_col'].apply(lambda x: float(x))
这会输出以下错误:
<ipython-input-40-e9eb1eacbacb> in <module>
----> 1 ex_df.loc[(ex_df['hasX']==True), 'acreage']= ex_df['ex_col'].apply(lambda x: float(((int(x.split('X')[0]))*(int(x.split('X')[-1])))/43560))
2 ex_df.loc[(ex_df['has_ac']==True), 'acreage'] = ex_df['ex_col'].apply(lambda x: x.split()[0])
3 ex_df.loc[(ex_df['has_comma']==True), 'acreage'] = ex_df['ex_col'].apply(lambda x: float(x.replace(',','')))
4 ex_df.loc[((ex_df['hasX']==False) & (ex_df['has_ac']==False) & (ex_df['has_comma']==False)), 'acreage'] = ex_df['ex_col'].apply(lambda x: float(x))
4198 else:
4199 values = self.astype(object)._values
-> 4200 mapped = lib.map_infer(values, f, convert=convert_dtype)
4201
4202 if len(mapped) and isinstance(mapped[0], Series):
pandas\_libs\lib.pyx in pandas._libs.lib.map_infer()
<ipython-input-40-e9eb1eacbacb> in <lambda>(x)
----> 1 ex_df.loc[(ex_df['hasX']==True), 'acreage']= ex_df['ex_col'].apply(lambda x: float(((int(x.split('X')[0]))*(int(x.split('X')[-1])))/43560))
2 ex_df.loc[(ex_df['has_ac']==True), 'acreage'] = ex_df['ex_col'].apply(lambda x: x.split()[0])
3 ex_df.loc[(ex_df['has_comma']==True), 'acreage'] = ex_df['ex_col'].apply(lambda x: float(x.replace(',','')))
4 ex_df.loc[((ex_df['hasX']==False) & (ex_df['has_ac']==False) & (ex_df['has_comma']==False)), 'acreage'] = ex_df['ex_col'].apply(lambda x: float(x))
ValueError: invalid literal for int() with base 10: '1 ac'
解决方案
让我们尝试extract
拆分您的数据列并用于np.select
映射:
data = (ex_df['ex_col'].str.replace(',','')
.str.extract('([\.\d]+)\s?(ac|X)?([\.\d,]+)?')
)
data[[0,2]] = data[[0,2]].astype(float)
ex_df['area'] = np.select((data[1].eq('X'), data[1].eq('ac')),
(data[0]* data[2]/43560,data[0]),
data[0]/43560 )
输出:
ex_col area
0 100X172 0.394858
1 78X120 0.214876
2 1 ac 1.000000
3 76,666 1.760009
4 85X175 0.341483
5 19,928 0.457484
6 14810 0.339991
7 3 ac 3.000000
8 90X181 0.373967
9 38X150 0.130854
10 19040 0.437098
11 8265 0.189738
12 100X125 0.286961
13 6000 0.137741
14 8,750 0.200872
15 .448 ac 0.448000
推荐阅读
- snowflake-cloud-data-platform - 雪花程序因任务执行而失败
- arrays - 初始化 UIimages 数组
- python - Selenium - 单击 SVG 下拉菜单以打开扩展的信息部分
- angularjs - 当另一个元素具有焦点时,使用 ng-focus 将焦点设置到一个元素
- spring - 使用 curl 获取 Spring Initializr 的所有依赖项列表
- jupyter-notebook - 即使在清除所有输出后,Jupyter Notebook 仍超过 100 MB
- c# - 重定向页面返回认证失败
- html - How to make dynamic width in flex box row with two columns?
- c - 在 VSCode 中查看 C 文档?
- reactjs - 如何同时更新两个不同组件上的相同 redux 状态?