python - Building a Python web scraper function with a user-defined URL and file name
Question
I want the user to enter the URL and the CSV file name for this scraper.
#Dependencies
from lxml import html
import requests
import pandas as pd
x =input('https://web.archive.org/web/20170111201527/https://www.yellowpages.com/nashville-tn/air-conditioning-service-repair')
def Scraper(x):
    #URL
    url = x
    #Use Requests to retrieve html
    resp = requests.get(url)
    #Create Tree from Request Response
    tree = html.fromstring(resp.content)
    #Path to Website Link
    elements = tree.xpath('//*[starts-with(@id,"lid-")]/div/div/div[2]/div[2]/div[2]/a[1]')
    websites = []
    for element in elements:
        try:
            websites.append("http"+element.attrib['href'].split("http")[2])
        except:
            continue
#Create Pandas Dataframe
webdf= pd.DataFrame(websites,columns =['Links']).drop_duplicates()
print(webdf)
#Export as CSV
y=input()
webdf.to_csv(y+".csv")
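My output is "NameError: name 'websites' is not defined", even though the name is clearly defined in the code. I even tried adding it as an empty list before the function, but that did not work.
Answer
You never call the Scraper function, and it does not return a value, so the list websites, which exists only inside the function, is never available at module level. First change the function so that it returns the list, for example: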
#Dependencies
from lxml import html
import requests
import pandas as pd
x =input('https://web.archive.org/web/20170111201527/https://www.yellowpages.com/nashville-tn/air-conditioning-service-repair')
def Scraper(x):
    #URL
    url = x
    #Use Requests to retrieve html
    resp = requests.get(url)
    #Create Tree from Request Response
    tree = html.fromstring(resp.content)
    #Path to Website Link
    elements = tree.xpath('//*[starts-with(@id,"lid-")]/div/div/div[2]/div[2]/div[2]/a[1]')
    websites = []
    for element in elements:
        try:
            websites.append("http"+element.attrib['href'].split("http")[2])
        except (KeyError, IndexError):
            #Skip anchors without an href or without an embedded target URL
            continue
    #Hand the list back to the caller
    return websites
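Why the return statement matters can be seen in a stripped-down sketch. The function names and the example.com URL below are hypothetical and only illustrate the scoping rule: a name assigned inside a function is local to it, so the caller sees the list only if the function returns it and the result is bound to a name.

```python
def scraper_without_return():
    # "websites" is a local name; it disappears when the function ends.
    websites = ["http://example.com"]


def scraper_with_return():
    websites = ["http://example.com"]
    return websites  # hand the list back to the caller


scraper_without_return()
try:
    websites  # no such name exists outside the function yet
    error_message = None
except NameError as error:
    error_message = str(error)
print(error_message)

# Binding the return value is what creates the name for the caller.
websites = scraper_with_return()
print(websites)
```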
Then call the function and build the DataFrame from its return value:
websites = Scraper(x)
webdf = pd.DataFrame(websites,columns =['Links']).drop_duplicates()
print(webdf)
#Export as CSV
y=input()
webdf.to_csv(y+".csv")
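Since the goal is a scraper driven by a user-supplied URL and file name, the pieces above could be folded into one function that takes both as parameters. The following is only a sketch under assumptions: the names extract_target_url, csv_filename and scrape_to_csv are hypothetical, the timeout value is arbitrary, and the XPath is copied unchanged from the code above and may not match other pages. Note also that the string passed to input() is just the prompt shown to the user, not a default value, so a short prompt such as "Enter the URL: " is usually clearer than pasting the URL there.

```python
def extract_target_url(href):
    """Return the site URL embedded in an archived link, or None if absent.

    Mirrors the expression "http" + href.split("http")[2] used above.
    """
    parts = href.split("http")
    if len(parts) < 3:
        return None
    return "http" + parts[2]


def csv_filename(name):
    """Append ".csv" unless the user already typed the extension."""
    name = name.strip()
    return name if name.lower().endswith(".csv") else name + ".csv"


def scrape_to_csv(url, csv_name):
    """Fetch url, collect the website links, and write them to a CSV file."""
    # Third-party imports (the same libraries as above), kept local so the
    # two helpers can be used without them being installed.
    import pandas as pd
    import requests
    from lxml import html

    resp = requests.get(url, timeout=30)  # timeout value is an assumption
    resp.raise_for_status()  # fail loudly on HTTP errors
    tree = html.fromstring(resp.content)
    elements = tree.xpath(
        '//*[starts-with(@id,"lid-")]/div/div/div[2]/div[2]/div[2]/a[1]'
    )
    websites = []
    for element in elements:
        target = extract_target_url(element.get("href", ""))
        if target is not None:
            websites.append(target)
    webdf = pd.DataFrame(websites, columns=["Links"]).drop_duplicates()
    webdf.to_csv(csv_filename(csv_name))
    return webdf
```

It could then be called as scrape_to_csv(input("Enter the URL: "), input("Enter the CSV name: ")), which prompts for both values and returns the DataFrame for printing.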