Scraping a table of dates from the web with Beautiful Soup

Problem description

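I want to scrape tables of dates from different HTML web pages into a CSV file, but the dates come out garbled, as if they were written in the wrong encoding.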

I am using Beautiful Soup in Python 3, and I open the file for the HTML page with UTF-8 encoding. I am trying to import the table from the page https://www.timeanddate.com/holidays/india/2010

Sample code:

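import csv

# `table` is the <table> element located earlier with Beautiful Soup (not shown)
rows = table.find_all('tr')

# The with statement closes the file even if writing a row fails
with open("test12.csv", "w+", newline='', encoding="utf-8") as csvFile:
    writer = csv.writer(csvFile)
    for row in rows:
        csvRow = []
        # Collect the text of every data cell and header cell in the row
        for cell in row.find_all(['td', 'th']):
            csvRow.append(cell.get_text())
        writer.writerow(csvRow)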

I get the following result. The dates are not imported in the correct format:

Date

1 जनवरी रविवार 5 जनवरी गà¥à¤ °à¥à¤µà¤¾à¤° 14 जनवरी शनिवार 15 जनवठ°à¥€ रविवार 23 जनवरी सोमवार 26 जनवरी गà¥à¤°à¥à¤µà¤¾à¤° 28 जनवरी शन िवार

Tags: python, html, class, web-scraping, beautifulsoup

Solution


Let pandas do all the work:

import pandas as pd

url = 'https://www.timeanddate.com/holidays/india/2010'

# Gets all tables from site and stores as list of dataframes
table = pd.read_html(url)

# Get the dataframe in index position 0
table = table[0]

# Drop the rows with nulls
table = table.dropna(axis=0)

# Write to file
table.to_csv('file.csv', index=False)

This can be condensed into one line:

pd.read_html('https://www.timeanddate.com/holidays/india/2010')[0].dropna(axis=0).to_csv('file.csv', index=False)

Output:

print (table.head(10).to_string())
      Date Unnamed: 1_level_0                                  Name                Type
      Date Unnamed: 1_level_1                                  Name                Type
0    Jan 1             Friday                        New Year's Day  Restricted Holiday
1    Jan 5            Tuesday             Guru Govind Singh Jayanti  Restricted Holiday
2   Jan 14           Thursday                                Pongal  Restricted Holiday
3   Jan 20          Wednesday                       Vasant Panchami  Restricted Holiday
4   Jan 26            Tuesday                          Republic Day    Gazetted Holiday
6    Feb 8             Monday  Maharishi Dayanand Saraswati Jayanti  Restricted Holiday
7   Feb 12             Friday            Maha Shivaratri/Shivaratri    Gazetted Holiday
8   Feb 14             Sunday                      Chinese New Year          Observance
9   Feb 14             Sunday                       Valentine's Day          Observance
10  Feb 19             Friday                       Shivaji Jayanti  Restricted Holiday
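
The garbled dates in the question look like UTF-8 bytes that were decoded with a single-byte code page somewhere along the way. The question does not say how the CSV was viewed, so the following is only a sketch under that assumption: it reproduces the same style of garbling with the standard library, shows that it is reversible, and writes the CSV with the "utf-8-sig" codec, whose byte order mark helps some spreadsheet programs recognise the file as UTF-8. The sample text, the variable names, and the file name dates_demo.csv are illustrative and not taken from the original post.

```python
import csv
import os
import tempfile

# Illustrative Hindi date text of the kind shown in the question
original = "1 जनवरी रविवार"

# Decoding UTF-8 bytes as Latin-1 produces the same style of garbling
garbled = original.encode("utf-8").decode("latin-1")

# The damage can be undone as long as no bytes were dropped along the way
repaired = garbled.encode("latin-1").decode("utf-8")

# Write the CSV with a UTF-8 byte order mark (hypothetical demo file)
path = os.path.join(tempfile.mkdtemp(), "dates_demo.csv")
with open(path, "w", newline="", encoding="utf-8-sig") as csv_file:
    csv.writer(csv_file).writerows([["Date"], [original]])

# Read the first three raw bytes back to confirm the marker is present
with open(path, "rb") as raw:
    first_bytes = raw.read(3)
```

If the data in the file is already correct and only the viewer misreads it, switching the encoding argument of open() in the question's code from "utf-8" to "utf-8-sig" may be enough. If the text is already garbled before it is written, the page was probably decoded incorrectly earlier in the script, and the writing step is not the place to fix it.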
