Removing HTML tags when writing a DataFrame to a CSV file

Problem description

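I've run into a problem. I'm doing some web scraping and want to strip out all the HTML tags. Printing the scraped values shows plain text, and the DataFrame is written to the CSV file without errors, but when I open the CSV file the HTML tags have not been removed.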

import pandas as pd
import requests
from bs4 import BeautifulSoup

# Browser-like request headers
headers = {
    'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36',
    'Referer': 'https://google.com',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
    'Accept-Encoding': 'gzip, deflate, br',
    'Accept-Language': 'en-US,en;q=0.9',
    'Pragma': 'no-cache'}

# Download and parse the article page
link = 'https://www.bloomberg.com/news/articles/2021-10-10/stocks-set-for-mixed-start-as-traders-mull-growth-markets-wrap?srnd=premium-asia'
r = requests.get(link, headers=headers)
soup = BeautifulSoup(r.content, 'html.parser')
soup.get_text()

# Find each element of interest and print its text
title = soup.find('title')
print(title.get_text())
par = soup.find(class_='body-copy-v2')
print(par.get_text())
meta = soup.find(class_='article-timestamp')
print(meta.get_text())
print(link)

# Build the DataFrame from the scraped values and write it to CSV
news = {'Title': [title], 'Description': [par], 'Published': [meta], 'Link': [link]}
important = pd.DataFrame(news)
important.to_csv("E:\\Scraping\\hello.csv")
Could someone please help me?

Tags: html

Solution
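The most likely cause is that the DataFrame is built from the BeautifulSoup Tag objects themselves (title, par, meta) rather than from their text. get_text() returns a new string and does not modify the tag, so calling it only inside print() leaves the stored objects unchanged, and when pandas writes a Tag to CSV it writes its string form, which is the full markup. The sketch below stores the text instead. It is a minimal illustration rather than a definitive fix: SAMPLE_HTML, the tag_text helper, and the example.com link are hypothetical stand-ins so the code runs without a network request, and the class names are simply the ones used in the question.

```python
import io

import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded page, so the sketch runs offline.
SAMPLE_HTML = """
<html><head><title>Stocks Set for Mixed Start</title></head>
<body>
<div class="article-timestamp">October 10, 2021</div>
<div class="body-copy-v2"><p>Markets were <b>mixed</b> on Monday.</p></div>
</body></html>
"""


def tag_text(tag):
    """Return the tag's text without markup, or an empty string if the tag is missing."""
    return tag.get_text(" ", strip=True) if tag is not None else ""


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Store plain strings in the DataFrame, not Tag objects.
news = {
    "Title": [tag_text(soup.find("title"))],
    "Description": [tag_text(soup.find(class_="body-copy-v2"))],
    "Published": [tag_text(soup.find(class_="article-timestamp"))],
    "Link": ["https://example.com/article"],  # placeholder URL
}
important = pd.DataFrame(news)

# Written to an in-memory buffer here; pass a file path to to_csv() to save to disk.
buffer = io.StringIO()
important.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)
```

In the original script the equivalent change would be to pass title.get_text(), par.get_text() and meta.get_text() into the dictionary. The None check in tag_text is there because soup.find() returns None when nothing matches, which would otherwise raise an AttributeError.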

