How do I remove \n from the output when scraping a web page?

Question

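I'm scraping a web page, and the results look fine except for the card name column: every card name comes out with a \n in front of it. How can I keep it out of the output?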

    # Scraping
    def parse(self, response):
        item = GameItem()
        saved_name = None  # last card name seen, reused when a row has none

        item["Category"] = response.css("span.titletext::text").extract()
        for game in response.css("tr[class^=deckdbbody]"):
            item["card_name"] = game.css("a.card_popup::text").extract_first()
            if item["card_name"] is not None:
                saved_name = item["card_name"]
            else:
                # No name in this row: fall back to the previous one
                item["card_name"] = saved_name

            item["Condition"] = game.css("td[class^=deckdbbody].search_results_7 a::text").get()
            item["stock"] = game.css("td[class^=deckdbbody].search_results_8::text").extract_first()
            item["Price"] = game.css("td[class^=deckdbbody].search_results_9::text").extract_first()

            yield item

Sample output

{"Category": ["Duel Decks: Venser vs. Koth"], "card_name": "\nAether Membrane", "Condition": "NM/M", "stock": "93", "Price": "$0.59"},
{"Category": ["Duel Decks: Venser vs. Koth"], "card_name": "\nAether Membrane", "Condition": "PL", "stock": "59", "Price": "$0.49"},
{"Category": ["Duel Decks: Venser vs. Koth"], "card_name": "\nAngelic Shield", "Condition": "NM/M", "stock": "35", "Price": "$0.25"},
{"Category": ["Duel Decks: Venser vs. Koth"], "card_name": "\nAnger", "Condition": "NM/M", "stock": "9", "Price": "$1.49"},
{"Category": ["Duel Decks: Venser vs. Koth"], "card_name": "\nAnger", "Condition": "PL", "stock": "49", "Price": "$1.19"},

Tags: python, web-scraping, scrapy, splash-screen, scrapy-splash

Answer


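The built-in string method strip() (str.strip()) removes leading and trailing whitespace, including newline characters such as \n. Call it on the extracted card name before storing it in the item.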
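A minimal sketch of how strip() could be applied to the scraped values. The helper name clean_text is hypothetical, not part of Scrapy; the None check is there because extract_first() and get() return None when the selector matches nothing, and None has no strip() method.

```python
from typing import Optional


def clean_text(value: Optional[str]) -> Optional[str]:
    """Strip leading/trailing whitespace; pass None through unchanged.

    Hypothetical helper for illustration, not a Scrapy API.
    """
    return value.strip() if value is not None else None


# Example with a value shaped like the one in the sample output
raw_name = "\nAether Membrane"
print(repr(clean_text(raw_name)))
```

In the spider, the card name line could then read item["card_name"] = clean_text(game.css("a.card_popup::text").extract_first()), which leaves the existing None check and the saved_name fallback working as before.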

