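python - ValueError is raised even though the code has the given values
Problem description
I am trying to write some code that removes all conjunctions, pronouns, punctuation marks, and so on from a text.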
macbeth = open("macbeth.txt", "r")
contents = macbeth.read()
contents = contents.split()

def remove_uninteresting_stuff(file_contents):
    # Here is a list of punctuations and uninteresting words you can use to process your text
    punctuations = '''!()-[]{};:'"\,<>./?@#$%^&*_~'''
    uninteresting_words = ["the", "a", "to", "if", "is", "it", "of", "and", "or", "an", "as", "i", "me", "my", \
    "we", "our", "ours", "you", "your", "yours", "he", "she", "him", "his", "her", "hers", "its", "they", "them", \
    "their", "what", "which", "who", "whom", "this", "that", "am", "are", "was", "were", "be", "been", "being", \
    "have", "has", "had", "do", "does", "did", "but", "at", "by", "with", "from", "here", "when", "where", "how", \
    "all", "any", "both", "each", "few", "more", "some", "such", "no", "nor", "too", "very", "can", "will", "just"]
    for x in punctuations:
        file_contents.remove(x)
    for x in uninteresting_words:
        file_contents.remove(x)
    return file_contents

print(remove_uninteresting_stuff(contents))
This code raises the following error:
Traceback (most recent call last):
  File "testingFile.py", line 33, in <module>
    print(remove_uninteresting_stuff(contents))
  File "testingFile.py", line 25, in remove_uninteresting_stuff
    file_contents.remove(x)
ValueError: list.remove(x): x not in list
Now, obviously, in a work like Shakespeare's Macbeth these words will be present.
Can someone explain this error and help me fix it?
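Solution
You are assuming that every word and punctuation mark in your lists occurs in Macbeth as a separate list element, but that is not the case. list.remove(x) raises ValueError as soon as it is asked to remove a value that is not in the list.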
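To see the failure in isolation, here is a minimal sketch. The token list is made up for illustration and is not taken from macbeth.txt. It shows two things: list.remove(x) deletes only the first matching element, and it raises ValueError when no element equals x.

```python
# Hypothetical tokens standing in for contents.split(); not from macbeth.txt.
tokens = ["the", "king", "and", "the", "queen!"]

tokens.remove("the")       # removes only the FIRST "the"
print(tokens)              # ['king', 'and', 'the', 'queen!']

try:
    # "!" is glued to "queen!", so no list element is equal to "!"
    tokens.remove("!")
except ValueError as err:
    print("ValueError:", err)
```

So even for words that do occur in the text, a single remove() call would leave every later occurrence in place.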
Another way to write it that might work is:
# Read the file and close it again as soon as the text is in memory
with open("macbeth.txt", "r") as macbeth:
    contents = macbeth.read().split()

def remove_uninteresting_stuff(file_contents):
    # Here is a list of punctuations and uninteresting words you can use to process your text
    # (the backslash is doubled so it is not read as an invalid escape sequence)
    punctuations = '''!()-[]{};:'"\\,<>./?@#$%^&*_~'''
    uninteresting_words = ["the", "a", "to", "if", "is", "it", "of", "and", "or", "an", "as", "i", "me", "my", \
    "we", "our", "ours", "you", "your", "yours", "he", "she", "him", "his", "her", "hers", "its", "they", "them", \
    "their", "what", "which", "who", "whom", "this", "that", "am", "are", "was", "were", "be", "been", "being", \
    "have", "has", "had", "do", "does", "did", "but", "at", "by", "with", "from", "here", "when", "where", "how", \
    "all", "any", "both", "each", "few", "more", "some", "such", "no", "nor", "too", "very", "can", "will", "just"]
    # Keep only the tokens that are neither an uninteresting word nor punctuation
    file_contents = [word for word in file_contents if word not in uninteresting_words and word not in punctuations]
    return file_contents

print(remove_uninteresting_stuff(contents))
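The difference is that here you check whether each word is absent from your list of unwanted words, instead of removing every unwanted word from the contents regardless of whether it is there.
Since you cannot be sure that an unwanted word occurs in the contents, you have to check that it exists before removing it. That is equivalent to keeping only the words that are not in the unwanted-word list, which is what the snippet above does.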
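The equivalence between the two approaches can be sketched as follows. The helper names remove_if_present and keep_wanted and the sample data are hypothetical, chosen only for this illustration.

```python
def remove_if_present(tokens, unwanted):
    """Check-then-remove: work on a copy and guard every remove() call."""
    result = list(tokens)
    for word in unwanted:
        while word in result:   # remove() is only called when it will succeed
            result.remove(word)
    return result


def keep_wanted(tokens, unwanted):
    """Filter: keep only the tokens that are not in the unwanted collection."""
    return [word for word in tokens if word not in unwanted]


# Made-up sample data; "of" is deliberately absent from tokens.
tokens = ["the", "king", "and", "the", "queen"]
unwanted = ["the", "and", "of"]

print(remove_if_present(tokens, unwanted))  # ['king', 'queen']
print(keep_wanted(tokens, unwanted))        # ['king', 'queen']
```

Both give the same result, but the filter makes a single pass over the tokens, while the guarded loop rescans the list for every unwanted word.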
Update
The snippet above will not work if the punctuation you want to remove is attached to a word (surprise!).
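A small sketch of that limitation, using a made-up sentence and deliberately shortened lists (both are assumptions for illustration only): after split(), punctuation stays attached to its word, and the comparison is case-sensitive.

```python
# Shortened, hypothetical lists; the real ones are in the snippet above.
punctuations = "!,.;"
uninteresting_words = ["the", "and"]

tokens = "The king, and the queen!".split()
print(tokens)  # ['The', 'king,', 'and', 'the', 'queen!']

kept = [word for word in tokens
        if word not in uninteresting_words and word not in punctuations]
print(kept)    # ['The', 'king,', 'queen!']
```

"The" survives because of its capital letter, and "king," and "queen!" keep their punctuation because the filter only drops whole tokens.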
The following version, on the other hand, does work:
contents = "The the a to To IF is OF and and or here when where where how all ANY any both few whom who wHo!! -;."
contents = contents.split()

def remove_uninteresting_stuff(file_contents):
    # Here is a list of punctuations and uninteresting words you can use to process your text
    # (the backslash is doubled so it is not read as an invalid escape sequence)
    punctuations = '''!()-[]{};:'"\\,<>./?@#$%^&*_~'''
    uninteresting_words = ["the", "a", "to", "if", "is", "it", "of", "and", "or", "an", "as", "i", "me", "my", \
    "we", "our", "ours", "you", "your", "yours", "he", "she", "him", "his", "her", "hers", "its", "they", "them", \
    "their", "what", "which", "who", "whom", "this", "that", "am", "are", "was", "were", "be", "been", "being", \
    "have", "has", "had", "do", "does", "did", "but", "at", "by", "with", "from", "here", "when", "where", "how", \
    "all", "any", "both", "each", "few", "more", "some", "such", "no", "nor", "too", "very", "can", "will", "just"]
    # Strip every punctuation character from every token
    file_contents = [word.translate(str.maketrans('', '', punctuations)) for word in file_contents]
    # Drop tokens that became empty (pure punctuation) and, ignoring case, the uninteresting words
    file_contents = [word for word in file_contents if word and word.lower() not in uninteresting_words]
    return file_contents

# Every token in the sample is an uninteresting word or pure punctuation, so this prints []
print(remove_uninteresting_stuff(contents))
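As a possible variation on the same idea, here is a sketch that uses the standard library's string.punctuation and a set for the word lookup. The names STOP_WORDS and clean_tokens, the shortened word set, and the sample sentence are assumptions for illustration. Note that string.punctuation contains a few more characters (such as + and =) than the hand-written string above.

```python
import string

# Shortened stop-word set for illustration; the full list would go here.
STOP_WORDS = frozenset({"the", "a", "to", "and", "of", "who"})

# Translation table that deletes every character in string.punctuation.
STRIP_PUNCT = str.maketrans("", "", string.punctuation)


def clean_tokens(text):
    """Split text, strip punctuation, drop empty tokens and stop words."""
    cleaned = (word.translate(STRIP_PUNCT) for word in text.split())
    return [w for w in cleaned if w and w.lower() not in STOP_WORDS]


print(clean_tokens("The king and the Queen!! -;. who?"))  # ['king', 'Queen']
```

Building the translation table once and testing membership in a frozenset avoids repeating that work for every token, which may matter for a text the size of a whole play.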