dataframe - BeautifulSoup: how to deliberately return None if an element is not found
Problem description
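How can I deliberately add None when an element is not found? I have an element that sometimes exists and sometimes doesn't (link here).
Below is the current output df: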
name                       tag
ZX Torsion                 Releasing Soon
Campus                     Restock
Campus                     Restock
Consortium Runner Mid 4D   Sold out
Ozweego                    Sold out
Ozweego                    Sold out
Yeezy Boost 350 V2 Infant  Sold out
Yeezy Boost 350 V2 Kids    Sold out
Yeezy Boost 350 V2         Sold out
Yung-1                     Sold out
Yung 1                     Sold out
A.R. Trainer               Sold out
A.R. Trainer               Sold out
Desired output:
name                       tag
ZX Torsion                 Releasing Soon
Campus                     Restock
Campus                     Restock
Consortium Runner Mid 4D   null
Ozweego                    null
Ozweego                    null
Yeezy Boost 350 V2 Infant  Sold out
Yeezy Boost 350 V2 Kids    Sold out
Yeezy Boost 350 V2         Sold out
Yung-1                     null
Yung 1                     null
A.R. Trainer               null
A.R. Trainer               null
...and so on
Working code:
import requests
import pandas as pd
from bs4 import BeautifulSoup as bs
from selenium import webdriver
from selenium.webdriver.chrome.service import Service

urls = [
    'https://www.nakedcph.com/sneakers-by-adidas/s/37'
]
baseURL = 'https://www.nakedcph.com'
final = []
with requests.Session() as s:
    for url in urls:
        # Selenium 4 style: the chromedriver path is passed through a Service object
        driver = webdriver.Chrome(service=Service('/Users/Documents/python/Selenium/bin/chromedriver'))
        driver.get(url)
        soup = bs(driver.page_source, 'lxml')
        # One <div> per product card
        items = soup.find_all("div", {"class": lambda L: L and L.startswith('col-6 col-md-3 mb-5')})
        name = [item.find('span', {'class': 'product-name d-block'}).text.strip() for item in items]
        # Tags are collected from every ribbon on the page, independently of `items`
        tag = [item.find('svg').next_sibling.strip() for item in soup.find_all('div', {'class': 'card-ribbon'})]
        results = list(zip(name, tag))
        df = pd.DataFrame(results)
        driver.quit()
print(df)
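Why the tags end up on the wrong rows: name is built from the product cards, but tag comes from a separate, page-wide search over card-ribbon divs, and zip pairs the two lists purely by position, stopping at the shorter one. The sketch below reproduces this on a tiny HTML snippet; the card / product-name / card-ribbon markup is a hypothetical simplification, not the real nakedcph.com page, and it uses the built-in html.parser so lxml is not required.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup (not the real site):
# cards A and C carry a ribbon, card B has none.
HTML = """
<div class="card"><span class="product-name">A</span>
  <div class="card-ribbon"><svg></svg>Sold out</div></div>
<div class="card"><span class="product-name">B</span></div>
<div class="card"><span class="product-name">C</span>
  <div class="card-ribbon"><svg></svg>Restock</div></div>
"""

soup = BeautifulSoup(HTML, "html.parser")

# Names come from every card...
names = [card.find("span", class_="product-name").text.strip()
         for card in soup.find_all("div", class_="card")]

# ...but tags come from a separate search over ribbons only,
# so this list is shorter and no longer lines up with `names`.
tags = [ribbon.find("svg").next_sibling.strip()
        for ribbon in soup.find_all("div", class_="card-ribbon")]

# zip pairs by position: B receives C's tag and C is dropped.
pairs = list(zip(names, tags))
print(pairs)  # [('A', 'Sold out'), ('B', 'Restock')]
```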
Solution
You can use try/except. I have never worked it into a list comprehension (I might go back and try that), so here it is as a plain loop:
import requests
import pandas as pd
from bs4 import BeautifulSoup as bs
from selenium import webdriver
from selenium.webdriver.chrome.service import Service

urls = [
    'https://www.nakedcph.com/sneakers-by-adidas/s/37'
]
baseURL = 'https://www.nakedcph.com'
final = []
with requests.Session() as s:
    for url in urls:
        driver = webdriver.Chrome(service=Service('C:/chromedriver_win32/chromedriver.exe'))
        driver.get(url)
        soup = bs(driver.page_source, 'lxml')
        items = soup.find_all("div", {"class": lambda L: L and L.startswith('col-6 col-md-3 mb-5')})
        name = []
        tag = []
        # Look up the name and the tag inside the same product card,
        # so each name stays paired with its own tag
        for each in items:
            name.append(each.find('span', {'class': 'product-name d-block'}).text.strip())
            try:
                tag.append(each.find('svg').next_sibling.strip())
            except (AttributeError, TypeError):
                # No <svg> in this card, or no text node after it: record a missing tag
                tag.append(None)
        results = list(zip(name, tag))
        df = pd.DataFrame(results)
        driver.quit()
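Since the answer mentions never folding this into a list comprehension, here is one possible way to do it: move the "maybe missing" lookup into a small helper that returns None, then call it inside the comprehension. The helper name ribbon_text and the simplified markup are hypothetical; the helper checks explicitly instead of using try/except, so unrelated errors are not silently swallowed.

```python
from bs4 import BeautifulSoup, NavigableString

# Hypothetical, simplified markup standing in for the product grid.
HTML = """
<div class="card"><span class="product-name">A</span>
  <div class="card-ribbon"><svg></svg>Sold out</div></div>
<div class="card"><span class="product-name">B</span></div>
<div class="card"><span class="product-name">C</span>
  <div class="card-ribbon"><svg></svg>Restock</div></div>
"""


def ribbon_text(card):
    """Return the text node right after the card's <svg>, or None if absent."""
    svg = card.find("svg")
    if svg is None:
        return None  # no ribbon on this card
    sibling = svg.next_sibling
    if not isinstance(sibling, NavigableString):
        return None  # nothing (or another tag) follows the <svg>
    return sibling.strip() or None  # treat whitespace-only text as missing


soup = BeautifulSoup(HTML, "html.parser")
cards = soup.find_all("div", class_="card")

# One pass per card keeps each name paired with its own tag (or None).
rows = [(card.find("span", class_="product-name").get_text(strip=True),
         ribbon_text(card))
        for card in cards]
print(rows)  # [('A', 'Sold out'), ('B', None), ('C', 'Restock')]
```

The resulting rows can be passed straight to pd.DataFrame(rows, columns=["name", "tag"]).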
Output:
print(df)
                            0               1
0                  ZX Torsion  Releasing Soon
1                      Campus         Restock
2                      Campus         Restock
3    Consortium Runner Mid 4D            None
4                     Ozweego            None
5                     Ozweego            None
6   Yeezy Boost 350 V2 Infant        Sold out
7     Yeezy Boost 350 V2 Kids        Sold out
8          Yeezy Boost 350 V2        Sold out
9                      Yung-1            None
10                     Yung 1            None
11               A.R. Trainer            None
12               A.R. Trainer            None
13             Adilette Pride            None
14                 Supercourt            None
15              Supercourt RX            None
16                 ZX 4000 4D            None
17         Yeezy Boost 700 V2        Sold out
18  Yeezy Boost 350 V2 Infant        Sold out
19    Yeezy Boost 350 V2 Kids        Sold out
20         Yeezy Boost 350 V2        Sold out
21         Yeezy Boost 700 V2        Sold out
22    Yeezy Boost 700 V2 Kids        Sold out
23  Yeezy Boost 700 V2 Infant        Sold out
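The question's desired output shows null where this answer prints None. In an object column pandas already treats None as a missing value, so untagged products can be filtered with isna(), and a literal placeholder string can be filled in if that is what the display or export needs. A small sketch using a few rows from the output above; the column names name and tag are an assumption (the answer's DataFrame uses the default 0 and 1).

```python
import pandas as pd

# A few rows taken from the output above; None marks a product without a tag.
rows = [
    ("ZX Torsion", "Releasing Soon"),
    ("Consortium Runner Mid 4D", None),
    ("Yeezy Boost 350 V2", "Sold out"),
]
df = pd.DataFrame(rows, columns=["name", "tag"])

# None in an object column counts as missing, so isna() finds untagged products.
untagged = df[df["tag"].isna()]
print(untagged["name"].tolist())  # ['Consortium Runner Mid 4D']

# If a literal placeholder string is preferred, fill it in explicitly.
labelled = df.fillna({"tag": "null"})
print(labelled["tag"].tolist())  # ['Releasing Soon', 'null', 'Sold out']
```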