TypeError: expected string or bytes-like object

Problem description

import re

def preprocess_text(sen):
    # Remove HTML tags
    sentence = remove_tags(sen)

    # Remove punctuation and numbers
    sentence = re.sub('[^a-zA-Z]', ' ', sentence)

    # Remove single characters
    sentence = re.sub(r"\s+[a-zA-Z]\s+", ' ', sentence)

    # Collapse multiple spaces
    sentence = re.sub(r'\s+', ' ', sentence)

    return sentence

TAG_RE = re.compile(r'<[^>]+>')

def remove_tags(text):
    return TAG_RE.sub('', text)

reviews = []
sentences = list(renamed_df['CanonSkillClusters'])
for sen in sentences:
    reviews.append(preprocess_text(sen))
print(renamed_df.columns.values)

Tags: python, pandas, bert-language-model

Solution
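
No answer is recorded on this page, so what follows is only a sketch of one likely cause and fix, not a confirmed solution. `re.sub` raises "TypeError: expected string or bytes-like object" whenever it is handed something that is not a `str` or `bytes`. In a pandas column this often means a missing value, which is stored as the float `NaN`. The sketch assumes that is what happens in `CanonSkillClusters` and that non-string entries can be treated as empty text. The sample list `sentences` is a hypothetical stand-in for the real column.

```python
import re

TAG_RE = re.compile(r'<[^>]+>')


def remove_tags(text):
    return TAG_RE.sub('', text)


def preprocess_text(sen):
    # Assumption: anything that is not a string (NaN, None, numbers)
    # is treated as empty text instead of being passed to re.sub.
    if not isinstance(sen, str):
        return ''

    # Remove HTML tags
    sentence = remove_tags(sen)
    # Remove punctuation and numbers
    sentence = re.sub('[^a-zA-Z]', ' ', sentence)
    # Remove single characters
    sentence = re.sub(r"\s+[a-zA-Z]\s+", ' ', sentence)
    # Collapse multiple spaces
    sentence = re.sub(r'\s+', ' ', sentence)
    return sentence


# Hypothetical stand-in for list(renamed_df['CanonSkillClusters']);
# float('nan') mimics how pandas represents a missing value.
sentences = ['<b>Data</b> Science 101', float('nan'), None]

reviews = []
for sen in sentences:
    reviews.append(preprocess_text(sen))
```

If the column really does contain missing values, another option is to clean it before the loop, for example with `renamed_df['CanonSkillClusters'].fillna('').astype(str)`, and leave `preprocess_text` unchanged. Checking `renamed_df['CanonSkillClusters'].map(type).value_counts()` first would show whether non-string values are actually present.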

