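pandas - Why am I getting the "'NoneType' object has no attribute 'absolute_links'" error?
Problem description
I'm a newbie to Python and to coding in general, so please go easy on me in your replies. I don't know much of the terminology, so answers in simple terms would help, haha.
I'm trying to scrape a site using code that I've used successfully on other sites, but it doesn't work for this one. It says 'NoneType' object has no attribute 'absolute_links', but I don't know why. I've tried several different classes and sections for the "jobs" string, and I believe they are correct because they contain the a/hrefs I need. Can anyone tell me where I'm going wrong and how to correct it? Here is the error: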
for item in jobs.absolute_links:
AttributeError: 'NoneType' object has no attribute 'absolute_links'
Here is my code. I've removed most of the category lists so it isn't as long.
from requests_html import HTMLSession
import re
import pandas as pd
url = 'https://jobs.zalando.com/en/jobs/1621977-maintenance-shift-leader-in-intralogistics/?gh_src=22377bdd1us'
departmentcategories = {
    "android": "Software Development",
    "social media": "Marketing",
    "content ": "Marketing",
    "sales": "Sales",
    "ecommerce": "Ecommerce",
}
languagecategories = {
    " and ": "English",
    " und ": "German",
    " et ": "French",
    " y ": "Spanish",
    " e ": "Spanish",
    "German": "German",
    "Italian": "Italian",
    "French": "French",
    "Spanish": "Spanish",
    "Dutch": "Dutch",
}
experiencecategories = {
    "senior": "Mid Senior Level",
    "Junior": "Entry Level",
    "VP ": "Executive",
    "Director": "Director",
    "Head of ": "Mid Senior Level",
}
s = HTMLSession()
r = s.get(url)
r.html.render(sleep=1)
jobs = r.html.find('ul.cards-container', first=True)
#Section for reviewing department, language, categories
def get_department_categories(department):
    depcats = []
    for k, v in departmentcategories.items():
        if re.search(k, department, re.IGNORECASE):
            depcats.append(v)
    return depcats
def get_language_categories(language):
    langcats = []
    for k, v in languagecategories.items():
        if re.search(k, language, re.IGNORECASE):
            langcats.append(v)
    return langcats
def get_experience_categories(experience):
    expcats = []
    for k, v in experiencecategories.items():
        if re.search(k, experience, re.IGNORECASE):
            expcats.append(v)
    return expcats
#Section for job title, city, and country
jobtitles=[]
cities=[]
countries=[]
departments=[]
experiencelevels=[]
jobpostlinks=[]
languages=[]
urllinks=[]
for item in jobs.absolute_links:
    r = s.get(item)
    urllinks.append(item)
    job_title = r.html.xpath('//*[@id="root"]/div/div[2]/div[3]/div[1]/div[1]/h1', first=True).text
    jobtitles.append(job_title)
    city = r.html.xpath('/html/body/div[1]/div/div[2]/div[3]/div[2]/div[2]/div/div[1]/div[1]', first=True).text
    cities.append(city)
    country = r.html.xpath('/html/body/div[1]/div/div[2]/div[3]/div[2]/div[2]/div/div[1]/div[1]', first=True).text
    if country == ('Berlin, Germany'):
        country = 'Germany'
    countries.append(country)
    #Section for the department, languages, and experience level
    #Department section and job title
    department = r.html.xpath('/html/body/div[1]/div/div[2]/div[3]/div[2]/div[2]/div/div[1]/div[5]', first=True).text and r.html.xpath('//*[@id="root"]/div/div[2]/div[3]/div[1]/div[1]/h1', first=True).text
    department_cats = get_department_categories(department)
    departments.append(department_cats)
    #What we're looking for section
    language = r.html.xpath('/html/body/div[1]/div/div[2]/div[3]/div[2]/div[1]/div/ul[2]', first=True).text
    language_cats = get_language_categories(language)
    languages.append(language_cats)
    #experience section
    experience = r.html.xpath('/html/body/div[1]/div/div[2]/div[3]/div[2]/div[2]/div/div[1]/div[4]', first=True).text and r.html.xpath('//*[@id="job"]/div[1]/div[1]/div[1]/h1', first=True).text
    experience_cats = get_experience_categories(experience)
    experiencelevels.append(experience_cats)
    print("-"*10)
    print(job_title, city, country, "Zalando", ", ".join(department_cats), ", ".join(experience_cats), ", ".join(language_cats), "Fashion", item)
df = pd.DataFrame({'Job Title':jobtitles, 'City':cities, 'Country':countries, 'Department Tags':departments, 'Language Tags':languages, 'Experience Tags':experiencelevels, 'Link':urllinks})
df.to_csv("zalando.csv", encoding='utf-8')
Solution
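'jobs' itself appears to be None, which is why reading 'jobs.absolute_links' fails. With first=True, r.html.find() returns None when nothing on the page matches the selector, so 'ul.cards-container' was probably not found. The URL in the code also looks like a single job posting rather than a listing page, which may be the reason.
To debug, add this before the line that raises the error:
print(jobs)
and check whether it prints None.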
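A guard along these lines might make the failure easier to spot. This is only a minimal sketch: links_or_empty is a hypothetical helper name, not part of requests_html, and fake_match is a stand-in object used so the example runs without a network request.

```python
from types import SimpleNamespace


def links_or_empty(container):
    """Return the container's absolute links, or an empty set if it is None.

    `container` is assumed to be whatever r.html.find(..., first=True)
    returned: an element with an `absolute_links` attribute, or None.
    """
    if container is None:
        # No element matched the selector, so there is nothing to loop over.
        return set()
    return set(container.absolute_links)


# Stand-in for a matched element (hypothetical data, not scraped from the site).
fake_match = SimpleNamespace(absolute_links={"https://example.com/job/1"})

print(links_or_empty(fake_match))  # links from the stand-in element
print(links_or_empty(None))        # empty set instead of an AttributeError
```

In the original script, the loop could then read "for item in links_or_empty(jobs):", which simply does nothing when the selector finds no match. The selector or URL still needs to be corrected for any links to be collected.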