Word Counter loop keeps loading in Python

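Problem description

I have a DataFrame named comments, shown below. I want to build a Counter of the words in its Text field. I have made a list of the UserIds whose word counts are needed; those UserIds are stored in gold_users. But the loop that creates the Counter just keeps loading. Please help me fix this.

comments (this is only part of the DataFrame; the original has many more rows):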

Id|                    Text                             |    UserId  
 6|  Before the 2006 course, there was Allen Knutso...  |    3   
 8|  Also, Theo Johnson-Freyd has some notes from M...  |    1  

Code

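import re
import string
from collections import Counter

from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer

# Text cleaning

punct = set(string.punctuation)
stopword = set(stopwords.words('english'))
lm = WordNetLemmatizer()

def clean_text(text):
    # Lowercase the text and strip punctuation characters
    text = ''.join(char.lower() for char in text if char not in punct)
    tokens = re.split(r'\W+', text)
    # Drop empty tokens and stopwords, then lemmatize what remains
    text = [lm.lemmatize(word) for word in tokens if word and word not in stopword]
    return tuple(text)         # Writing only `return text` was giving unhashable error 'list'

comments['Text'] = comments['Text'].apply(clean_text)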

    
for index,rows in comments.iterrows():
      gold_comments = rows[comments.Text.loc[comments.UserId.isin(gold_users)]]
      Counter(gold_comments)

Expected output

[['scholar',20],['school',18],['bus',15],['class',14],['teacher',14],['bell',13],['time',12],['books',11],['bag',9],['student',7],......]

Tags: python, python-3.x, pandas, list, nlp

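Solution

Assuming the DataFrame you pass in has already been filtered down to the IDs and text of your gold_users, the following pure-Python function will return what you need: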

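def word_count(df):
    # Assumes df['Text'] holds plain strings (not the token tuples from clean_text)
    counts = dict()
    for text in df['Text']:      # named `text` to avoid shadowing the built-in `str`
        words = text.split()
        for word in words:
            if word in counts:
                counts[word] += 1
            else:
                counts[word] = 1
    return list(counts.items())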
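
If the Text column has already been run through clean_text (so each row is a tuple of tokens) and has not been filtered yet, a variant along the following lines may be closer to the expected output. This is only a sketch: the function name count_gold_words and the sample data are made up for illustration, and it swaps the hand-written dict for collections.Counter. It filters the rows once instead of looping with iterrows, then counts all tokens in a single pass.

```python
from collections import Counter
from itertools import chain

def count_gold_words(token_rows, user_ids, gold_users):
    """Count tokens across all rows whose user id is in gold_users.

    token_rows: iterable of token sequences (e.g. the tuples from clean_text)
    user_ids:   iterable of user ids, aligned row by row with token_rows
    """
    gold = set(gold_users)  # set lookup avoids rescanning a list for every row
    gold_tokens = chain.from_iterable(
        tokens for tokens, uid in zip(token_rows, user_ids) if uid in gold
    )
    counts = Counter(gold_tokens)
    # most_common() orders by descending count; emit [word, count] pairs
    return [[word, n] for word, n in counts.most_common()]

# Hypothetical sample data standing in for comments['Text'] and comments['UserId']
sample_tokens = [("school", "bus", "school"), ("teacher", "school"), ("bus",)]
sample_user_ids = [3, 1, 7]
result = count_gold_words(sample_tokens, sample_user_ids, gold_users=[1, 3])
print(result)
```

With the DataFrame from the question, the call would presumably be count_gold_words(comments['Text'], comments['UserId'], gold_users), since iterating a pandas Series yields its values row by row.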

Hope this helps!

