I can't figure out what's wrong with my code

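Problem description

I am trying to scrape data from HTML files for the top 10 movies (here is the link). I want to extract the title, audience_score, and tomato_meter score for each movie. But every time I run this code: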

    df_list = []
    for movie_html in os.listdir(folder):
        with open(os.path.join(folder, movie_html)) as file:
            soup = BeautifulSoup(file, 'utf-8')
            title = soup.find('h1')
            audience_score = soup.find('div', class_="score-icon-audience left").find_all('span')[1].contents[0][:-1]
            tomato_meter = soup.find('div', class_="score-icon-critic right").find_all('span')[1].contents[0][:-1]
            df_list.append({'title': title, 'audience_score': audience_score, 'tomato_meter': tomato_meter})
    df = pd.DataFrame(df_list, columns=['title', 'audience_score', 'tomato_meter'])

Here, folder = 'rtmovies_html' is the directory containing the 10 HTML files.

Unfortunately, I get this error:

    FeatureNotFound                           Traceback (most recent call last)
    <ipython-input-3-96935eee023c> in <module>
          2 for movie_html in os.listdir(folder):
          3     with open(os.path.join(folder, movie_html)) as file:
    ----> 4         soup = BeautifulSoup(file, 'utf-8')
          5         title = soup.find('h1')
          6         audience_score = soup.find('div', class_="score-icon-audience left").find_all('span')[1].contents[0][:-1]

    ~\Anaconda3\lib\site-packages\bs4\__init__.py in __init__(self, markup, features, builder, parse_only, from_encoding, exclude_encodings, element_classes, **kwargs)
        241             builder_class = builder_registry.lookup(*features)
        242             if builder_class is None:
    --> 243                 raise FeatureNotFound(
        244                     "Couldn't find a tree builder with the features you "
        245                     "requested: %s. Do you need to install a parser library?"

    FeatureNotFound: Couldn't find a tree builder with the features you requested: utf-8. Do you need to install a parser library?

Can anyone help me?

Tags: pandas, beautifulsoup, jupyter-notebook, operating-system

Solution


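You appear to be missing a valid parser for BeautifulSoup to work with. The second argument to BeautifulSoup names a parser, not an encoding, so 'utf-8' does not match any tree builder and FeatureNotFound is raised.

Try "pip install -U lxml" and change line 4 like this: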

soup = BeautifulSoup(file, 'lxml')
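
If installing lxml is not an option, one possible workaround is to fall back to Python's built-in 'html.parser'. The sketch below is only an illustration: make_soup is a hypothetical helper, the fallback is an addition that the answer does not mention, and the sample HTML is made up.

```python
from bs4 import BeautifulSoup, FeatureNotFound


def make_soup(markup):
    """Hypothetical helper: parse with lxml if installed, else with html.parser."""
    try:
        return BeautifulSoup(markup, 'lxml')
    except FeatureNotFound:
        # lxml is not installed; html.parser ships with Python itself.
        return BeautifulSoup(markup, 'html.parser')


# Made-up sample markup, not taken from the real pages.
html = '<html><body><h1>Example Movie</h1></body></html>'
soup = make_soup(html)
print(soup.find('h1').get_text(strip=True))
```

Either parser should find the h1 tag here, though the two can build slightly different trees for malformed HTML.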

Also, try changing line 3 as follows:

with open(os.path.join(folder, movie_html), encoding='utf-8', errors='ignore') as file:
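
To see what encoding='utf-8' and errors='ignore' do, here is a small sketch using only the standard library. The file name and its contents are invented for the demonstration. Without an explicit encoding, open() uses the platform's default, which on Windows is often not UTF-8.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical sample file; the byte 0xff is never valid in UTF-8.
    path = os.path.join(tmp, 'sample.html')
    with open(path, 'wb') as f:
        f.write(b'<h1>Caf\xc3\xa9 \xff Movie</h1>')

    # errors='ignore' drops the undecodable byte instead of
    # raising UnicodeDecodeError.
    with open(path, encoding='utf-8', errors='ignore') as file:
        text = file.read()

print(repr(text))
```

Note that errors='ignore' discards bad bytes silently, so it can hide real encoding problems. errors='replace' is an alternative that marks each bad byte with U+FFFD instead.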

