How to remove \n from a string scraped from a website

Problem description

This is the text scraped from a website (itemHtml.text):

 dolar amerykański 1 USD 3.8436
 euro 1 EUR 4.2989
 funt szterling 1 GBP 4.8768

How can I remove \n from this text? I tried this:

import requests
from bs4 import BeautifulSoup

url = "https://www.nbp.pl/home.aspx?f=/kursy/kursya.html"
response = requests.get(url)

soup = BeautifulSoup(response.text, "html.parser")

# Keywords that identify the currency rows of interest
currency = ['amerykański', 'euro', 'szterling']

for itemHtml in soup.select('.pad5 tr'):
    # Only consider rows that contain at least one <td> cell
    if itemHtml.find('td'):
        if any(cur in itemHtml.text for cur in currency):
            dane_comma = itemHtml.text
            # Use a dot instead of a comma as the decimal separator
            dane_dot = dane_comma.replace(',', '.')
            # Attempt to replace newlines with spaces
            dane = dane_dot.replace('\n', ' ')
            print(dane)




Thanks for your help.

Tags: python, python-3.x

Solution


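There are no newline characters (\n) in that text.
What you are seeing is three print statements, each of which produces one line of output.
For example: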

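import requests
from bs4 import BeautifulSoup

url = "https://www.nbp.pl/home.aspx?f=/kursy/kursya.html"
response = requests.get(url)
soup = BeautifulSoup(response.text, "html.parser")

# Keywords that identify the currency rows of interest
currency = ['amerykański', 'euro', 'szterling']
single_line = ""  # accumulates all matching rows in one string
cnt = 0           # counts how many times print() is called in the loop

for itemHtml in soup.select('.pad5 tr'):
    # Only consider rows that contain at least one <td> cell
    if itemHtml.find('td'):
        if any(cur in itemHtml.text for cur in currency):
            dane = itemHtml.text
            # Use a dot instead of a comma as the decimal separator
            dane = dane.replace(',', '.')
            single_line += " " + dane
            cnt += 1
            # Each print() call ends its output with a newline of its own
            print("Print count", cnt, dane)
# Everything collected so far is printed as a single line
print(single_line.strip())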

This gives:

Print count 1  dolar amerykański 1 USD 3.8436 
Print count 2  euro 1 EUR 4.2989 
Print count 3  funt szterling 1 GBP 4.8768 
dolar amerykański 1 USD 3.8436   euro 1 EUR 4.2989   funt szterling 1 GBP 4.8768

The code makes no attempt to remove newline characters (single_line.strip() is only there to remove leading and trailing whitespace).
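
The same point can be checked without any network access. The following is a minimal sketch, not part of the original answer: the rows list is hypothetical sample data modelled on the text in the question, and normalize is an illustrative helper name. It captures what print() writes, to show that the line breaks come from print() itself, and then collapses any remaining whitespace with str.split() and str.join().

```python
import io
from contextlib import redirect_stdout

# Hypothetical sample rows modelled on the scraped text in the question;
# they contain spaces but no "\n".
rows = [
    " dolar amerykański 1 USD 3.8436 ",
    " euro 1 EUR 4.2989 ",
    " funt szterling 1 GBP 4.8768 ",
]


def normalize(text):
    """Collapse every run of whitespace (spaces, tabs, newlines) into one space."""
    return " ".join(text.split())


# Capture what print() writes so the output can be inspected as a string.
buffer = io.StringIO()
with redirect_stdout(buffer):
    for row in rows:
        print(row)  # print() appends "\n" after each call by default
printed = buffer.getvalue()

# Join the cleaned rows into one string with single spaces between them.
single_line = " ".join(normalize(row) for row in rows)
print(single_line)
```

If the scraped text did contain newlines or repeated spaces, normalize would remove them as well, because str.split() with no arguments splits on any run of whitespace. Passing end=" " to print() is another option when the goal is only to keep successive prints on one line.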

