python - How do I remove \n from the output when scraping a web page?
Problem description
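I'm scraping a web page, and when I get the results everything looks fine except for my card name column: every card name comes back with a \n in front of it. How can I prevent it from appearing in the output?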
# Scraping
def parse(self, response):
    item = GameItem()
    item["Category"] = response.css("span.titletext::text").extract()
    for game in response.css("tr[class^=deckdbbody]"):
        item["card_name"] = game.css("a.card_popup::text").extract_first()
        if item["card_name"] is not None:
            saved_name = item["card_name"]
        else:
            # Rows without a card link reuse the name from the previous row
            item["card_name"] = saved_name
        item["Condition"] = game.css("td[class^=deckdbbody].search_results_7 a::text").get()
        item["stock"] = game.css("td[class^=deckdbbody].search_results_8::text").extract_first()
        item["Price"] = game.css("td[class^=deckdbbody].search_results_9::text").extract_first()
        yield item
Sample output
{"Category": ["Duel Decks: Venser vs. Koth"], "card_name": "\nAether Membrane", "Condition": "NM/M", "stock": "93", "Price": "$0.59"},
{"Category": ["Duel Decks: Venser vs. Koth"], "card_name": "\nAether Membrane", "Condition": "PL", "stock": "59", "Price": "$0.49"},
{"Category": ["Duel Decks: Venser vs. Koth"], "card_name": "\nAngelic Shield", "Condition": "NM/M", "stock": "35", "Price": "$0.25"},
{"Category": ["Duel Decks: Venser vs. Koth"], "card_name": "\nAnger", "Condition": "NM/M", "stock": "9", "Price": "$1.49"},
{"Category": ["Duel Decks: Venser vs. Koth"], "card_name": "\nAnger", "Condition": "PL", "stock": "49", "Price": "$1.19"},
Solution
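Use the built-in string method strip() (str.strip()). It removes leading and trailing whitespace, which includes the \n in front of each card name. Call it on the value returned by extract_first() before storing it in item["card_name"], and only when that value is not None, since extract_first() returns None when nothing matches.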
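As a minimal sketch of the fix, the hypothetical helper clean_card_names below (not part of Scrapy or the original code) applies str.strip() to a sequence of raw extracted names and keeps the question's "reuse the previous name" fallback. It assumes the raw values are whatever extract_first() returned, i.e. a string or None.

```python
from typing import Iterable, List, Optional


def clean_card_names(raw_names: Iterable[Optional[str]]) -> List[str]:
    """Hypothetical helper: strip surrounding whitespace (including the
    leading newline) from each card name, reusing the previous name for
    rows where no name was extracted (None)."""
    cleaned: List[str] = []
    saved_name: Optional[str] = None
    for raw in raw_names:
        if raw is not None:
            # str.strip() removes leading/trailing spaces, tabs, newlines, etc.
            saved_name = raw.strip()
        # Skip leading rows that have no name and nothing to fall back on
        if saved_name is not None:
            cleaned.append(saved_name)
    return cleaned


raw = ["\nAether Membrane", None, "\nAngelic Shield", "\nAnger", None]
print(clean_card_names(raw))
# ['Aether Membrane', 'Aether Membrane', 'Angelic Shield', 'Anger', 'Anger']
```

Inside the spider the same idea reduces to stripping right after extraction, e.g. `saved_name = name.strip()` where `name` is the non-None result of `game.css("a.card_popup::text").extract_first()`. Note that strip() only touches the ends of the string; whitespace inside a name is left alone.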