Converting strings to integers within a DataFrame column (5 star rating = 5)

Problem description

I want to convert a column of review strings such as "5.0 out of 5 stars" into integers.

0    5.0 out of 5 stars
1    2.0 out of 5 stars
2    5.0 out of 5 stars
3    5.0 out of 5 stars
4    5.0 out of 5 stars
5    5.0 out of 5 stars
6    4.0 out of 5 stars
7    5.0 out of 5 stars
8    5.0 out of 5 stars
9    5.0 out of 5 stars
Name: StarRating, dtype: object

I am familiar with iterating over rows and columns, and tried

df[["StarRating"]] = df[["StarRating"]].apply(pd.to_numeric)

but got the following error:

ValueError: Unable to parse string "5.0 out of 5 stars" at position 0

I also tried:

for col in df.StarRating()
    if df['StarRating'] = (df['StarRating'] !='5.0 out of 5 stars').astype(int, 5.0)
    if df['StarRating'] = (df['StarRating'] !='4.0 out of 4 stars').astype(int, 4.0)
    if df['StarRating'] = (df['StarRating'] !='3.0 out of 3 stars').astype(int, 3.0)
    if df['StarRating'] = (df['StarRating'] !='2.0 out of 2 stars').astype(int, 2.0)
    if df['StarRating'] = (df['StarRating'] !='1.0 out of 1 stars').astype(int, 1.0)
    print(StarInt)

but got this error:

File "<ipython-input-43-e2e6fd3fae34>", line 1
    for col in df.StarRating()
                              ^
SyntaxError: invalid syntax

Any suggestions would be appreciated. Thanks.

Tags: python, pandas, for-loop

Solution

Try splitting each string on whitespace and converting the first element to a float:

df['StarRatingNumeric'] = df.StarRating.apply(lambda r: float(r.split()[0]))

Or, if you need an integer dtype:

df['StarRatingNumeric'] = df.StarRating.apply(lambda r: int(float(r.split()[0])))
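
For reference, here is a minimal, self-contained sketch of the split approach above, next to a vectorized alternative that is not part of the original answer. The sample rows are copied from the question. The column name StarRatingExtracted and the regular expression are illustrative assumptions. The alternative assumes every rating begins with a number and that the numbers are whole (a value such as 4.5 would make the final cast fail).

```python
import pandas as pd

# Sample data mirroring the StarRating column shown in the question.
df = pd.DataFrame({"StarRating": [
    "5.0 out of 5 stars",
    "2.0 out of 5 stars",
    "4.0 out of 5 stars",
]})

# Approach from the answer: split on whitespace, take the first token,
# then go through float because int("5.0") would raise ValueError.
df["StarRatingNumeric"] = df["StarRating"].apply(
    lambda r: int(float(r.split()[0]))
)

# Vectorized alternative (assumption, not from the original answer):
# capture the leading number with a regex instead of a Python-level lambda.
# "StarRatingExtracted" is a hypothetical column name.
leading = df["StarRating"].str.extract(r"^(\d+(?:\.\d+)?)", expand=False)

# errors="coerce" turns unparseable entries into NaN; the nullable
# "Int64" dtype keeps those as <NA> rather than failing on them.
df["StarRatingExtracted"] = pd.to_numeric(leading, errors="coerce").astype("Int64")

print(df)
```

The split version raises an exception on rows that are missing or do not start with a number, while the extract version leaves such rows as missing values, which may be preferable for scraped review data.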
