Python & Pandas: 'Series' objects are mutable, thus they cannot be hashed

Problem description

With Python and Pandas, I'm trying to write a script that takes the data from the text column, evaluates that text with the textstat module, and then writes the results back into the CSV under the word_count column.

Here is the structure of the csv:

 user_id         text text_number  word_count
0       10  test text A      text_0         NaN
1       11          NaN         NaN         NaN
2       12          NaN         NaN         NaN
3       13          NaN         NaN         NaN
4       14          NaN         NaN         NaN
5       15  test text B      text_1         NaN

Here is my attempt to loop over the text column and pass each value to textstat:

df = pd.read_csv("texts.csv").fillna('')
text_data = df["text"]
length1 = len(text_data)

for x in range(length1):
    (text_data[x])

    #this is the textstat word count operation
    word_count = textstat.lexicon_count(text_data, removepunct=True)
    output_df = pd.DataFrame({"word_count":[word_count]})
    output_df.to_csv('texts.csv', mode="a", header=False, index=False)

However, I receive this error:

TypeError: 'Series' objects are mutable, thus they cannot be hashed

Any suggestions on how to proceed? All assistance appreciated.

Tags: python, pandas, csv

Solution


A more idiomatic pandas approach is to use fillna + apply, then write the resulting Series directly with to_csv:

(
    df["text"].fillna('')  # Replace NaN with empty String
        .apply(textstat.lexicon_count,
               removepunct=True)  # Call lexicon_count on each value
        .rename('word_count')  # Rename Series
        .to_csv('texts.csv', mode="a", index=False)  # Write to csv
)

texts.csv:

word_count
1
0
0
0
0
1
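To try fillna + apply without installing textstat, here is a minimal sketch that swaps in a hypothetical stand-in, count_words, which roughly mimics lexicon_count by stripping punctuation with a regex and counting whitespace-separated tokens (textstat's own punctuation handling may differ). It shows that Series.apply forwards extra keyword arguments such as removepunct=True to the function it calls:

```python
import re

import numpy as np
import pandas as pd


def count_words(text, removepunct=True):
    """Hypothetical stand-in for textstat.lexicon_count."""
    if removepunct:
        text = re.sub(r"[^\w\s]", "", text)  # drop punctuation characters
    return len(text.split())  # count whitespace-separated tokens


texts = pd.Series(["test text A", np.nan, "Hello, world!"], name="text")

# fillna('') turns NaN into an empty string (zero words), and apply()
# passes removepunct=True through to count_words for every element.
word_count = (
    texts.fillna("")
    .apply(count_words, removepunct=True)
    .rename("word_count")
)
print(word_count.tolist())  # [3, 0, 2]
```

Because apply calls the function once per element, each call receives a plain string rather than the whole Series.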

To fill the word_count column of the existing DataFrame and rewrite the CSV, instead of appending rows to the end of the file, assign the result back to the column:

df['word_count'] = (
    df["text"].fillna('')  # Replace NaN with empty String
        .apply(textstat.lexicon_count,
               removepunct=True)  # Call lexicon_count on each value
)

df.to_csv('texts.csv', index=False)  # Write to csv

texts.csv:

user_id,text,text_number,word_count
text,A,text_0,1
,,,0
,,,0
,,,0
,,,0
text,B,text_1,1
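The same column assignment can be checked end to end with the question's own table. This sketch is a self-contained round trip, assuming an in-memory io.StringIO buffer in place of texts.csv and the same hypothetical count_words stand-in for textstat.lexicon_count:

```python
import io
import re

import pandas as pd


def count_words(text, removepunct=True):
    """Hypothetical stand-in for textstat.lexicon_count."""
    if removepunct:
        text = re.sub(r"[^\w\s]", "", text)  # drop punctuation characters
    return len(text.split())  # count whitespace-separated tokens


# In-memory copy of the question's table, standing in for texts.csv.
csv_in = io.StringIO(
    "user_id,text,text_number,word_count\n"
    "10,test text A,text_0,\n"
    "11,,,\n"
    "12,,,\n"
    "13,,,\n"
    "14,,,\n"
    "15,test text B,text_1,\n"
)
df = pd.read_csv(csv_in)

# Overwrite the existing (all-NaN) word_count column.
df["word_count"] = df["text"].fillna("").apply(count_words, removepunct=True)

# Write the whole table back; with a real file: df.to_csv("texts.csv", index=False)
csv_out = io.StringIO()
df.to_csv(csv_out, index=False)
print(csv_out.getvalue())
```

The two non-empty rows get a count of 3 and the empty rows get 0, and the file keeps its original four columns instead of growing extra rows.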

To fix the original loop instead, also use fillna, pass each individual value x to lexicon_count (rather than the whole text_data Series), and write the header only on the first iteration:

text_data = df["text"].fillna('')

for i, x in enumerate(text_data):
    # this is the textstat word count operation
    word_count = textstat.lexicon_count(x, removepunct=True)
    output_df = pd.DataFrame({"word_count": [word_count]})
    output_df.to_csv('texts.csv', mode="a", header=(i == 0), index=False)

texts.csv:

word_count
1
0
0
0
0
1

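Imports and the sample DataFrame used to produce the outputs above. Note that this DataFrame does not match the question's table exactly: the text column holds only 'A' and 'B' instead of 'test text A' and 'test text B', which is why the non-empty rows count 1 word rather than 3: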

import pandas as pd
import textstat
from numpy import nan

df = pd.DataFrame({
    'user_id': ['text', nan, nan, nan, nan, 'text'],
    'text': ['A', nan, nan, nan, nan, 'B'],
    'text_number': ['text_0', nan, nan, nan, nan, 'text_1'],
    'word_count': [nan, nan, nan, nan, nan, nan]
})
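If textstat is not a hard requirement, a rough vectorized alternative (a different technique from the apply-based answer, and only an approximation of lexicon_count's rules) can count words with pandas string methods alone, avoiding a Python function call per row:

```python
import numpy as np
import pandas as pd

text = pd.Series(["test text A", np.nan, "Hello, world!"])

# Strip punctuation, split on whitespace, and take each list's length,
# all with vectorized .str methods.
word_count = (
    text.fillna("")
    .str.replace(r"[^\w\s]", "", regex=True)
    .str.split()
    .str.len()
)
print(word_count.tolist())  # [3, 0, 2]
```

Whether this matches textstat exactly depends on how textstat treats punctuation such as apostrophes, so compare the two on real data before switching.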
