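python - Beginner Python web scraping problem
Question
I'm new to web scraping and would really appreciate some help! I want to run a search and return its results, but it raises a runtime error. My current code is shown below: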
from googlesearch import search
import requests
from bs4 import BeautifulSoup
print('Please enter your first name')
firstName = input()
print('Please enter your surname')
secondName = input()
query = firstName + ' ' + secondName
print('Please enter language ex:[en,fr,ar,jp,cn...]: ')
lang = input()
# requests
url = 'https://www.google.com/search?hl={}&q={}&start=3i&num=10&ie=UTF-8'.format(lang, query)
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:50.0) Gecko/20100101 Firefox/50.0'}
# url source
source = requests.get(url, headers=headers).text
# BeautifulSoup
soup = BeautifulSoup(source, 'lxml')
# find all divs that contain search result
search_div = soup.find_all(class_='rc')
for result in search_div:
    # loop over the result list
    # get the h3 title
    print('Title: %s' % result.h3.string)
    print('\n')
    # get the a.href link
    print('Url: %s' % result.a.get('href'))
    print('\n')
    # description
    print('Description: %s' % result.find(class_='st').text)
    print('\n###############\n')
However, I get this error:
Traceback (most recent call last):
  File "/Users/axy/PycharmProjects/Name_Search/main.py", line 20, in <module>
    soup = BeautifulSoup(source, 'lxml')
  File "/Users/axy/PycharmProjects/Name_Search/venv/lib/python3.8/site-packages/bs4/__init__.py", line 242, in __init__
    raise FeatureNotFound(
bs4.FeatureNotFound: Couldn't find a tree builder with the features you requested: lxml. Do you need to install a parser library?
Process finished with exit code 1
I'm a beginner at this and would really appreciate some guidance. I'd also like someone to explain what this line means:
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:50.0) Gecko/20100101 Firefox/50.0'}
Thanks!
Solution
It looks like you need to install lxml.
Just run pip install lxml in the same virtual environment the script uses, and it should work.
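As a complement to installing lxml, a script can check whether the parser module is importable before passing its name to BeautifulSoup. The following is a minimal sketch, not part of the original answer: the helper name pick_parser is hypothetical, and the fallback assumes that Python's built-in 'html.parser' is acceptable for the page being parsed.

```python
import importlib.util


def pick_parser(preferred='lxml', fallback='html.parser'):
    """Return `preferred` if a module of that name is importable, else `fallback`.

    Hypothetical helper for illustration; it only checks that the module
    can be found and does not import or validate it.
    """
    if importlib.util.find_spec(preferred) is not None:
        return preferred
    return fallback


if __name__ == '__main__':
    # Prints 'lxml' when lxml is installed, otherwise 'html.parser'.
    print(pick_parser())
```

The result could then be passed as the second argument, for example BeautifulSoup(source, pick_parser()), so the script keeps running with the built-in parser when lxml is missing. The two parsers can build slightly different trees for malformed HTML, so installing lxml remains the more direct fix.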