Why am I getting a "'NoneType' object has no attribute 'absolute_links'" error?

Problem description

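I'm a beginner with Python and with coding in general, so please go easy on me in your replies. I don't know much of the terminology, so please answer in simple terms.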

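I'm trying to scrape a site using code that has worked on other sites, but it doesn't work on this one. It says 'NoneType' object has no attribute 'absolute_links', and I don't know why. I've tried several different classes and sections for the "jobs" selector string, which I believe are correct because they contain the a/hrefs I need. Can anyone tell me where I'm going wrong and how to fix it? This is the error: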

 for item in jobs.absolute_links:
AttributeError: 'NoneType' object has no attribute 'absolute_links'

Here is my code. I've removed most of the category lists so it isn't so long.

from requests_html import HTMLSession
import re
import pandas as pd

url = 'https://jobs.zalando.com/en/jobs/1621977-maintenance-shift-leader-in-intralogistics/?gh_src=22377bdd1us'
departmentcategories = {
    "android": "Software Development",
    "social media": "Marketing",
    "content ": "Marketing",
    "sales": "Sales",
    "ecommerce": "Ecommerce",
}

languagecategories = {
    " and ": "English",
    " und ": "German",
    " et ": "French",
    " y ": "Spanish",
    " e ": "Spanish",
    "German": "German",
    "Italian": "Italian",
    "French": "French",
    "Spanish": "Spanish",
    "Dutch": "Dutch",
}

experiencecategories = {
    "senior": "Mid Senior Level",
    "Junior": "Entry Level",
    "VP ": "Executive",
    "Director": "Director",
    "Head of ": "Mid Senior Level",
}

s = HTMLSession()
r = s.get(url)

r.html.render(sleep=1)

jobs = r.html.find('ul.cards-container', first=True)

#Section for reviewing department, language, categories

def get_department_categories(department):
    depcats = []
    for k, v in departmentcategories.items():
        if re.search(k, department, re.IGNORECASE):
            depcats.append(v)
    return depcats
 
def get_language_categories(language):
    langcats = []
    for k, v in languagecategories.items():
        if re.search(k, language, re.IGNORECASE):
            langcats.append(v)
    return langcats 

def get_experience_categories(experience):
    expcats = []
    for k, v in experiencecategories.items():
        if re.search(k, experience, re.IGNORECASE):
            expcats.append(v)
    return expcats 

#Section for job title, city, and country

jobtitles=[]
cities=[]
countries=[]
departments=[]
experiencelevels=[]
jobpostlinks=[]
languages=[]
urllinks=[]

for item in jobs.absolute_links:
    r = s.get(item)
    urllinks.append(item)

    job_title = r.html.xpath('//*[@id="root"]/div/div[2]/div[3]/div[1]/div[1]/h1', first=True).text

    jobtitles.append(job_title)

    city = r.html.xpath('/html/body/div[1]/div/div[2]/div[3]/div[2]/div[2]/div/div[1]/div[1]', first=True).text

    cities.append(city)  
        
    country = r.html.xpath('/html/body/div[1]/div/div[2]/div[3]/div[2]/div[2]/div/div[1]/div[1]', first=True).text

    if country == ('Berlin, Germany'):
        country = 'Germany'

    countries.append(country)  
    
    #Section for the department, languages, and experience level

    #Department section and job title
    department = r.html.xpath('/html/body/div[1]/div/div[2]/div[3]/div[2]/div[2]/div/div[1]/div[5]', first=True).text and r.html.xpath('//*[@id="root"]/div/div[2]/div[3]/div[1]/div[1]/h1', first=True).text
    department_cats = get_department_categories(department)
    departments.append(department_cats)

    #What we're looking for section
    language = r.html.xpath('/html/body/div[1]/div/div[2]/div[3]/div[2]/div[1]/div/ul[2]', first=True).text
    language_cats = get_language_categories(language)
    languages.append(language_cats)

    #experience section
    experience = r.html.xpath('/html/body/div[1]/div/div[2]/div[3]/div[2]/div[2]/div/div[1]/div[4]', first=True).text and r.html.xpath('//*[@id="job"]/div[1]/div[1]/div[1]/h1', first=True).text
    experience_cats = get_experience_categories(experience)
    experiencelevels.append(experience_cats)

print("-"*10)    
print(job_title, city, country, "Zalando", ", ".join(department_cats), ", ".join(experience_cats), ", ".join(language_cats), "Fashion", item)
    
df = pd.DataFrame({'Job Title':jobtitles, 'City':cities, 'Country':countries, 'Department Tags':departments, 'Language Tags':languages, 'Experience Tags':experiencelevels, 'Link':urllinks})
df.to_csv("zalando.csv", encoding='utf-8')

Tags: pandas, web-scraping, attributeerror, python-requests-html

Solution


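The error means that 'jobs' itself is None, not that 'jobs.absolute_links' is empty. With first=True, r.html.find('ul.cards-container', first=True) returns None when no element on the page matches the selector, and None has no 'absolute_links' attribute.

To debug, add this before the line that raises the error:

print(jobs)

If it prints None, the selector did not match anything in the rendered page. The URL in the question appears to point to a single job posting rather than a listing page, so it may not contain a 'ul.cards-container' element at all. Check both the URL and the selector against the page's actual HTML.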
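
Rather than letting the loop crash, the lookup can be wrapped in a check for None. The following is only a minimal sketch: collect_links is a hypothetical helper name that does not appear in the question, and it assumes that page is the r.html object from the question's code (anything with a find(selector, first=True) method that returns None when nothing matches).

```python
def collect_links(page, selector):
    """Return the absolute links inside the first element matching `selector`.

    `page` is assumed to behave like `r.html` from requests_html:
    `page.find(selector, first=True)` returns one element, or None
    when nothing on the page matches.
    """
    container = page.find(selector, first=True)
    if container is None:
        # Nothing matched: report it and return an empty set so the
        # caller's for-loop simply runs zero times instead of raising.
        print(f"No element matches {selector!r}; check the URL and the selector.")
        return set()
    return container.absolute_links
```

In the question's code this would replace the direct attribute access, for example: for item in collect_links(r.html, 'ul.cards-container'): ... If the message is printed, the next step is still to fix the URL or the selector; the guard only makes the cause visible.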

