Having issues with my first python web scraper part 2 (The Sequel)

Problem description

I am trying to write a web scraper to pull information from a database of Supreme clothing items, supremecommunity.com. I made a post about it earlier when it was not working, got some great help, and now it is almost working.

The code works for the most part, but it starts having issues after Fall-Winter '17.

This is the error message I got in my Jupyter notebook:

UnicodeEncodeError                        Traceback (most recent call last)
<ipython-input> in <module>
     24         upvote = card.select_one('.progress-bar-success > span').get_text(strip=True)
     25         downvote = card.select_one('.progress-bar-danger > span').get_text(strip=True)
---> 26         writer.writerow([item_name,item_image,upvote,downvote])
     27         print(item_name,item_image,upvote,downvote)

~\Anaconda3\lib\encodings\cp1252.py in encode(self, input, final)
     17 class IncrementalEncoder(codecs.IncrementalEncoder):
     18     def encode(self, input, final=False):
---> 19         return codecs.charmap_encode(input,self.errors,encoding_table)[0]
     20
     21 class IncrementalDecoder(codecs.IncrementalDecoder):

UnicodeEncodeError: 'charmap' codec can't encode character '\u0392' in position 0: character maps to <undefined>
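The failure can be reproduced in isolation, independent of the scraper: '\u0392' is GREEK CAPITAL LETTER BETA, which the Windows cp1252 codec has no mapping for, so any attempt to encode it with that codec raises exactly this error.

```python
# '\u0392' (Greek capital Beta) has no representation in cp1252,
# so encoding it with that codec raises UnicodeEncodeError.
try:
    "\u0392".encode("cp1252")
except UnicodeEncodeError as e:
    print(e.reason)  # character maps to <undefined>
```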

Any advice would be greatly appreciated.

import csv
import requests
from bs4 import BeautifulSoup

base = 'https://www.supremecommunity.com{}'
links = ['https://www.supremecommunity.com/season/fall-winter2011/overview/','https://www.supremecommunity.com/season/spring-summer2012/overview/','https://www.supremecommunity.com/season/fall-winter2012/overview/',
         'https://www.supremecommunity.com/season/spring-summer2013/overview/','https://www.supremecommunity.com/season/fall-winter2013/overview/','https://www.supremecommunity.com/season/spring-summer2014/overview/',
         'https://www.supremecommunity.com/season/fall-winter2014/overview/','https://www.supremecommunity.com/season/spring-summer2015/overview/','https://www.supremecommunity.com/season/fall-winter2015/overview/',
         'https://www.supremecommunity.com/season/spring-summer2016/overview/','https://www.supremecommunity.com/season/fall-winter2016/overview/','https://www.supremecommunity.com/season/spring-summer2017/overview/',
         'https://www.supremecommunity.com/season/fall-winter2017/overview/', 'https://www.supremecommunity.com/season/spring-summer2018/overview/','https://www.supremecommunity.com/season/fall-winter2018/overview/',
         'https://www.supremecommunity.com/season/spring-summer2019/overview/','https://www.supremecommunity.com/season/fall-winter2019/overview/']


with open("supremecommunity.csv","w",newline="") as f:     
    writer = csv.writer(f)
    writer.writerow(['item_name','item_image','upvote','downvote'])
    for link in links:
        r = requests.get(link,headers={"User-Agent":"Mozilla/5.0"})
        soup = BeautifulSoup(r.text,"lxml")

        for card in soup.select('[class$="d-card"]'):
            item_name = card.select_one('.card__top')['data-itemname']
            item_image = base.format(card.select_one('img.prefill-img').get('data-src'))
            upvote = card.select_one('.progress-bar-success > span').get_text(strip=True)
            downvote = card.select_one('.progress-bar-danger > span').get_text(strip=True)
            writer.writerow([item_name,item_image,upvote,downvote])
            print(item_name,item_image,upvote,downvote)
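As a side note, the hand-written list of season URLs could also be generated programmatically. This is a sketch that assumes supremecommunity.com keeps the same "spring-summer&lt;year&gt;" / "fall-winter&lt;year&gt;" URL pattern for every season:

```python
# Generate the season overview URLs instead of listing them by hand.
# Assumes the site's URL pattern holds for every season from 2011 on.
base_url = 'https://www.supremecommunity.com/season/{}{}/overview/'
links = [base_url.format(season, year)
         for year in range(2011, 2020)
         for season in ('spring-summer', 'fall-winter')]
# The scrape starts at Fall-Winter 2011, so drop spring-summer2011.
links = links[1:]
```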

Tags: python, web, web-scraping

Solution
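The traceback points at `writer.writerow`, not at the scraping itself: on Windows, `open()` without an `encoding` argument defaults to the locale codec (cp1252 here), which cannot encode characters such as '\u0392' that appear in some item names from Fall-Winter 17 onward. Passing an explicit UTF-8 encoding when opening the CSV fixes it. A minimal sketch (the item row is made-up sample data, not real scraper output):

```python
import csv

# Open the CSV with an explicit UTF-8 encoding so item names that
# contain characters outside cp1252 (e.g. Greek letters) can be written.
with open("supremecommunity.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(['item_name', 'item_image', 'upvote', 'downvote'])
    # Sample row with the exact character from the traceback:
    writer.writerow(['\u0392 Logo Tee', 'https://example.com/img.png', '10', '2'])
```

Only the `encoding="utf-8"` argument needs to change in the original script; the rest of the loop stays the same. If the file will be opened in Excel, `encoding="utf-8-sig"` writes a BOM so Excel detects the encoding automatically.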

