Problem appending a list to a DataFrame

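Problem description

I ran into a problem while processing several tags/attributes in a loop and appending them to a DataFrame. More specifically, it concerns the Place loop: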

for car_item in soup2.findAll('ul', {'class': 'seller-info-links'}):
    place = car_item.find('h3', {'class':'heading'}).text.strip()
    places.append(place)

Appending it to the DataFrame produces only 1 of the expected 30 results.

Thank you in advance.

import requests
import bs4
import pandas as pd

frames = []

for pagenumber in range (0,2):
        url = 'https://www.marktplaats.nl/l/auto-s/p/'
        txt = requests.get(url + str(pagenumber))
        soup = bs4.BeautifulSoup(txt.text, 'html.parser')
        soup_table = soup.find('ul', 'mp-Listings mp-Listings--list-view')

        for car in soup_table.findAll('li'):

            link = car.find('a')
            sub_url = 'https://www.marktplaats.nl/' + link.get('href')

            sub_soup = requests.get(sub_url)
            sub_soup_txt = bs4.BeautifulSoup(sub_soup.text, 'html.parser')

            soup1 = sub_soup_txt.find('div', {'id': 'car-attributes'})
            soup2 = sub_soup_txt.find('div', {'id': 'vip-seller'})

            tmp = []
            places = []

            for car_item in soup1.findAll('div', {'class': 'spec-table-item'}):

                key = car_item.find('span', {'class': 'key'}).text
                value = car_item.find('span', {'class': 'value'}).text
                tmp.append([key, value])

            for car_item in soup2.findAll('ul', {'class': 'seller-info-links'}):
                place = car_item.find('h3', {'class':'heading'}).text.strip()
                places.append(place)

            frames.append(pd.DataFrame(tmp).set_index(0))

df_final = pd.concat((tmp_df for tmp_df in frames), axis=1, join='outer').reset_index()
df_final = df_final.T
df_final.columns = df_final.loc["index"].values
df_final.drop("index", inplace=True)
df_final.reset_index(inplace=True, drop=True)

df_final['Places'] = pd.Series(places)

df_final.to_csv('auto_database.csv')

Tags: python, web-scraping, beautifulsoup

Solution


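Since you add places to the final DataFrame only once, after all the loops have finished, this line (currently inside for pagenumber in ... for car in ...):

places = []

should move all the way up, out of the main for loop, here: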

frames = []
places = []
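
The effect of where the list is initialised can be seen in a minimal sketch. It assumes a hypothetical in-memory list, pages, standing in for the scraped listings, and two illustrative helper functions that are not part of the original script; no requests, BeautifulSoup, or pandas calls are involved.

```python
# Hypothetical stand-in for the scraped data: one inner list of places per page.
pages = [["Amsterdam", "Utrecht"], ["Rotterdam"]]


def collect_reset_inside(pages):
    """Pattern from the question: the list is re-created for every car."""
    places = []
    for page in pages:
        for place in page:
            places = []  # wipes everything collected so far
            places.append(place)
    return places


def collect_reset_outside(pages):
    """Suggested fix: create the list once, before the main loop."""
    places = []
    for page in pages:
        for place in page:
            places.append(place)
    return places


print(collect_reset_inside(pages))   # ['Rotterdam']
print(collect_reset_outside(pages))  # ['Amsterdam', 'Utrecht', 'Rotterdam']
```

In the original script the same thing happens to places: by the time df_final['Places'] = pd.Series(places) runs, the list holds only what was collected for the last car. Assigning a Series to a column aligns on the index, so the remaining rows are filled with NaN.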
