Counting characters in a pandas column

Question

Is there a way to explicitly count the character positions of the strings in a pandas column and group them by the word they belong to?

df["text"]=[["Hello how are you?"],["I am fine"]]
Then the counter should be
df["count"]= [[0-4 6-8 10-12 14-16 17],[0 2-3 5-8]]

Tags: python, pandas

Solution


As far as I know, pandas has no built-in function for what you are asking, but you can do it like this:

import re
import pandas as pd

# setup
df = pd.DataFrame(data=[["Hello how are you?"], ["I am fine"]], columns=['text'])


def extract_spans(m):
    """Convert span to required string representation"""
    start, end = m.span()
    return f'{start}-{end - 1}' if end - start > 1 else f'{start}'


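# create count column: each match is either a run of word characters (\w+)
# or a single punctuation character; the spans of all matches are joined with spaces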
df['count'] = [' '.join([extract_spans(m) for m in re.finditer(r'([^\w\s_]|\w+)', v)]) for v in df['text'].tolist()]
print(df)

Output

                 text                   count
0  Hello how are you?  0-4 6-8 10-12 14-16 17
1           I am fine               0 2-3 5-8
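
The same idea can also be wrapped in a small reusable helper. What follows is only a minimal sketch, not part of the original answer: the name `word_spans` is hypothetical, and it assumes every value passed in is a string. It uses the regular expression `\w+|[^\w\s]`, which matches the same tokens as the pattern above.

```python
import re

# Hypothetical helper (not a pandas API). A token is either a run of word
# characters or a single punctuation character; whitespace is skipped.
TOKEN = re.compile(r"\w+|[^\w\s]")


def word_spans(text):
    """Return the inclusive character span of every token in text."""
    parts = []
    for m in TOKEN.finditer(text):
        start, end = m.span()  # end is exclusive
        last = end - 1         # convert to an inclusive end index
        parts.append(str(start) if last == start else f"{start}-{last}")
    return " ".join(parts)


print(word_spans("Hello how are you?"))  # 0-4 6-8 10-12 14-16 17
print(word_spans("I am fine"))           # 0 2-3 5-8
```

Assuming the column holds only strings, the helper could then be applied with `df['count'] = df['text'].apply(word_spans)`.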
