ValueError is raised even though the code has the given values

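Problem description

I am trying to write a piece of code that helps me remove all conjunctions, pronouns, punctuation marks, and so on.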

macbeth = open("macbeth.txt", "r")

contents = macbeth.read()

contents = contents.split()  

def remove_uninteresting_stuff(file_contents):
    # Here is a list of punctuations and uninteresting words you can use to process your text
    punctuations = '''!()-[]{};:'"\,<>./?@#$%^&*_~'''
    uninteresting_words = ["the", "a", "to", "if", "is", "it", "of", "and", "or", "an", "as", "i", "me", "my", \
    "we", "our", "ours", "you", "your", "yours", "he", "she", "him", "his", "her", "hers", "its", "they", "them", \
    "their", "what", "which", "who", "whom", "this", "that", "am", "are", "was", "were", "be", "been", "being", \
    "have", "has", "had", "do", "does", "did", "but", "at", "by", "with", "from", "here", "when", "where", "how", \
    "all", "any", "both", "each", "few", "more", "some", "such", "no", "nor", "too", "very", "can", "will", "just"]

    for x in punctuations:
        file_contents.remove(x)
        
        
    for x in uninteresting_words:
        file_contents.remove(x)

    return file_contents

print(remove_uninteresting_stuff(contents))

This code raises the following error:

Traceback (most recent call last):
  File "testingFile.py", line 33, in <module>
    print(remove_uninteresting_stuff(contents))
  File "testingFile.py", line 25, in remove_uninteresting_stuff
    file_contents.remove(x)
ValueError: list.remove(x): x not in list

Now, obviously, in a novel like Macbeth (Shakespeare's), these words will be present.

Can someone explain this error and help me fix it?

Tags: python, list, valueerror

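Solution


You are assuming that every word and punctuation mark in your lists occurs in Macbeth, but that is not the case. list.remove(x) raises ValueError when x is not an element of the list, so the first missing item stops the function. Because split() only breaks the text on whitespace, punctuation usually stays attached to a word and rarely appears as an element on its own.

Another way of writing it that may work: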
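To see the failure in isolation, here is a minimal sketch. The sample sentence is a made-up stand-in, not text read from macbeth.txt. It shows that split() leaves punctuation glued to the words, so asking remove() for a bare "!" fails.

```python
# Hypothetical sample text, used only for illustration.
sample = "Out, damned spot! Out, I say!"

# split() breaks on whitespace only, so punctuation stays attached to words.
tokens = sample.split()
print(tokens)

try:
    tokens.remove("!")  # "!" is not a standalone element of the list
    error_message = None
except ValueError as exc:
    error_message = str(exc)

# A ValueError message is printed because "!" was never in the list.
print(error_message)
```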

# Use a context manager so the file is closed after reading
with open("macbeth.txt", "r") as macbeth:
    contents = macbeth.read()

contents = contents.split()

def remove_uninteresting_stuff(file_contents):
    # Here is a list of punctuations and uninteresting words you can use to process your text
    punctuations = r'''!()-[]{};:'"\,<>./?@#$%^&*_~'''
    uninteresting_words = ["the", "a", "to", "if", "is", "it", "of", "and", "or", "an", "as", "i", "me", "my", \
    "we", "our", "ours", "you", "your", "yours", "he", "she", "him", "his", "her", "hers", "its", "they", "them", \
    "their", "what", "which", "who", "whom", "this", "that", "am", "are", "was", "were", "be", "been", "being", \
    "have", "has", "had", "do", "does", "did", "but", "at", "by", "with", "from", "here", "when", "where", "how", \
    "all", "any", "both", "each", "few", "more", "some", "such", "no", "nor", "too", "very", "can", "will", "just"]

    file_contents = [word for word in file_contents if word not in uninteresting_words and word not in punctuations]

    return file_contents

print(remove_uninteresting_stuff(contents))

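The difference is that here you check whether each word is absent from your list of unwanted words, instead of removing every unwanted word from your contents regardless of whether it is there.

Since you cannot be sure that an unwanted word occurs in the contents, you would have to check that it exists before removing it. That amounts to keeping only the words that do not appear in the unwanted-word list (as I have done in the snippet).

Update

If the punctuation you want to remove is part of a word, the snippet above will not work (surprise!)

This, on the other hand, does work: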
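As a small sketch of that equivalence, the two approaches can be compared side by side. The word lists here are shortened, hypothetical stand-ins. Note that the check-then-remove version needs an inner loop, because list.remove() deletes only the first matching element.

```python
# Shortened, hypothetical inputs for illustration only.
words = ["the", "king", "and", "the", "queen"]
unwanted = ["the", "and", "whom"]  # "whom" does not occur in words

# Check-then-remove: guard each removal, and repeat because
# list.remove() deletes only the first matching element.
by_removal = list(words)
for w in unwanted:
    while w in by_removal:
        by_removal.remove(w)

# Filtering: keep only the words that are not in the unwanted list.
by_filter = [w for w in words if w not in unwanted]

print(by_removal)
print(by_filter)
```

Both versions leave the same words, but the filtering version does it in a single pass and never calls remove() on a missing value.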
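A quick sketch of why the membership test misses such tokens, using a made-up token list and shortened stand-ins for the two lists: the comparison is against the whole token, so a word with trailing punctuation matches neither list, and capitalised words slip through as well.

```python
# Made-up, shortened inputs for illustration only.
uninteresting_words = ["the", "who"]
punctuations = "!?.,;:"

tokens = ["who!!", "The", "who", "!", "witches"]
kept = [t for t in tokens
        if t not in uninteresting_words and t not in punctuations]

# "who!!" and "The" survive; only the exact matches "who" and "!" are dropped.
print(kept)
```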

contents = "The the a to To IF is OF and and or here when where where how all ANY any both few whom who wHo!! -;."

contents = contents.split()  

def remove_uninteresting_stuff(file_contents):
    # Here is a list of punctuations and uninteresting words you can use to process your text
    punctuations = r'''!()-[]{};:'"\,<>./?@#$%^&*_~'''
    uninteresting_words = ["the", "a", "to", "if", "is", "it", "of", "and", "or", "an", "as", "i", "me", "my", \
    "we", "our", "ours", "you", "your", "yours", "he", "she", "him", "his", "her", "hers", "its", "they", "them", \
    "their", "what", "which", "who", "whom", "this", "that", "am", "are", "was", "were", "be", "been", "being", \
    "have", "has", "had", "do", "does", "did", "but", "at", "by", "with", "from", "here", "when", "where", "how", \
    "all", "any", "both", "each", "few", "more", "some", "such", "no", "nor", "too", "very", "can", "will", "just"]

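    # Build the translation table once; it deletes every punctuation character
    table = str.maketrans('', '', punctuations)
    file_contents = [word.translate(table) for word in file_contents]
    # Drop tokens that became empty after stripping, then filter case-insensitively
    file_contents = [word for word in file_contents if word and word.lower() not in uninteresting_words]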

    return file_contents

print(remove_uninteresting_stuff(contents))
