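python - How do I split a string in a CSV and remove incomplete rows using Python?

Problem description

I am currently using Scrapy in Python to extract an HTML table from a website and write it to a CSV. I have managed to get the data I want, but the final format is not 100% what I need. The link is https://fbref.com/en/comps/9/schedule/Premier-League-Scores-and-Fixtures, and this is my code: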

import scrapy

class XGSpider(scrapy.Spider):

    name = 'expectedGoals'

    start_urls = [
        'https://fbref.com/en/comps/9/schedule/Premier-League-Scores-and-Fixtures',
    ]

    def parse(self, response):

        for row in response.xpath('//*[@id="sched_ks_3232_1"]//tbody/tr'):
            yield {
                'home': row.xpath('td[4]//text()').extract_first(),
                'homeXg': row.xpath('td[5]//text()').extract_first(),
                'score': row.xpath('td[6]//text()').extract_first(),
                'awayXg': row.xpath('td[7]//text()').extract_first(),
                'away': row.xpath('td[8]//text()').extract_first()
            }

To save the output to a CSV, I enter the following in the terminal:

scrapy crawl expectedGoals --output exG.csv

I get this CSV:

home,homeXg,score,awayXg,away
Liverpool,1.9,4–1,1.0,Norwich City
West Ham,0.7,0–5,3.3,Manchester City
Burnley,0.7,3–0,0.8,Southampton
Watford,0.9,0–3,0.7,Brighton
Bournemouth,1.0,1–1,1.0,Sheffield Utd
Crystal Palace,0.8,0–0,0.9,Everton
Tottenham,2.6,3–1,0.6,Aston Villa
Newcastle Utd,0.5,0–1,0.9,Arsenal
Leicester City,0.6,0–0,0.6,Wolves
Manchester Utd,2.1,4–0,0.8,Chelsea
,,,,
Arsenal,1.0,2–1,1.3,Burnley

.
.
.
.

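I would like to split the score field into homeScore and awayScore fields, using - as the separator. I would also like to work out how to drop a row entirely when its fields are empty, like the one above. How can I do this?

Tags: python html csv scrapy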

First, load your CSV into a pandas DataFrame:

import pandas as pd

df = pd.read_csv("exG.csv")
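
The empty row from the question (the line containing only commas) is read by pandas as a row of missing values, so it can be dropped right after loading. The following is a minimal sketch, not a definitive implementation: the in-memory sample is an assumed stand-in for exG.csv built from two rows shown in the question, and it assumes that "incomplete" means every field in the row is empty.

```python
import io

import pandas as pd

# Assumed stand-in for exG.csv: two real rows from the question
# plus the comma-only row that should be removed.
sample = io.StringIO(
    "home,homeXg,score,awayXg,away\n"
    "Manchester Utd,2.1,4–0,0.8,Chelsea\n"
    ",,,,\n"
    "Arsenal,1.0,2–1,1.3,Burnley\n"
)

df = pd.read_csv(sample)

# Drop rows in which every column is missing, then renumber the index.
df = df.dropna(how="all").reset_index(drop=True)
print(df)
```

If rows with only some fields missing should also go, dropna(subset=["score"]) would remove every row that has no score; which rule is appropriate depends on the data.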

As a stand-in for the CSV, I create an artificial data_csv and convert it to a pandas DataFrame:

# Artificial rows standing in for the contents of exG.csv
data_csv = [{'home': 'Liverpool', 'score': '4-1', 'away': 'Norwich City'},
            {'home': 'West Ham', 'score': '9-5', 'away': 'Manchester City'},
            {'home': 'Burnley', 'score': '3-0', 'away': 'Southampton'}]

df = pd.DataFrame(data_csv)

To split the score into homeScore and awayScore fields using the - separator:

df[['homeScore', 'awayScore']] = df['score'].str.split("-", expand=True)
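
One caveat: the scores in the scraped CSV (for example 4–1) appear to use an en dash (U+2013) rather than the plain hyphen in the artificial data, so splitting on "-" would leave those values unsplit. The sketch below is one possible way to handle both characters. It assumes every score has the form digits, separator, digits; the sample rows are taken from the question.

```python
import pandas as pd

# Scores as they appear in the scraped CSV: the separator is an en dash.
df = pd.DataFrame(
    {
        "home": ["Liverpool", "West Ham", "Burnley"],
        "score": ["4–1", "0–5", "3–0"],
        "away": ["Norwich City", "Manchester City", "Southampton"],
    }
)

# A multi-character pattern is treated as a regular expression by str.split,
# so this character class matches either an en dash or a plain hyphen.
# n=1 splits at most once, which yields exactly two columns.
parts = df["score"].str.split(r"[–-]", n=1, expand=True)

# Convert the two halves from strings to integers.
df["homeScore"] = parts[0].astype(int)
df["awayScore"] = parts[1].astype(int)
print(df)
```

The integer conversion fails on missing scores, so empty rows need to be dropped before this step.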
