Web scraping a specific table

Problem description

My task is to scrape the first and last records from the table and save them to Excel. The expected result is:

'07-28 03:17', '3.90', '1.97', '2.75'
'07-29 18:41', '3.90', '1.97', '2.75'

Here is the code:

import pandas as pd
import datetime
import requests
from bs4 import BeautifulSoup

url = 'https://g10oal.com/match/116539/odds'
r = requests.get(url)
data = BeautifulSoup(r.text, 'lxml')
fha = data.find_all('table')[1]  # second table on the page: first-half Home/Away/Draw odds
rows = fha.find_all('tr')
with open("c:/logs/link/G10oal-fha.txt", "a+") as file2:
    for row in rows:
        # text of every <td> cell in this row
        cols = [x.text.strip() for x in row.find_all('td')]
        print(cols)
        file2.write(str(cols) + "\n")  # one row per line
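The loop above handles every `tr`, including the header row, whose cells are presumably `th` rather than `td` and which therefore comes out as an empty list. A minimal sketch of keeping only the data rows in a list, so that the first and last records are just the two ends of that list. It runs on a made-up HTML sample instead of the live page: the `Time`/`Home`/`Away`/`Draw` header labels and the `...` placeholder row are hypothetical, and it assumes the table lists the newest record first, as the solved code further down implies.

```python
from bs4 import BeautifulSoup

# Hypothetical sample: a first table to skip, then the odds table with an
# assumed <th> header row and data rows listed newest first.
SAMPLE_HTML = """
<table><tr><td>other table</td></tr></table>
<table>
  <tr><th>Time</th><th>Home</th><th>Away</th><th>Draw</th></tr>
  <tr><td>07-29 18:41</td><td>3.90</td><td>1.97</td><td>2.75</td></tr>
  <tr><td>...</td><td>...</td><td>...</td><td>...</td></tr>
  <tr><td>07-28 03:17</td><td>3.90</td><td>1.97</td><td>2.75</td></tr>
</table>
"""


def data_rows(table):
    """Return the cell texts of every row that has <td> cells."""
    records = []
    for row in table.find_all("tr"):
        cols = [td.get_text(strip=True) for td in row.find_all("td")]
        if cols:  # a header row made of <th> cells yields [] and is skipped
            records.append(cols)
    return records


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
records = data_rows(soup.find_all("table")[1])
print(records[-1])  # earliest record (newest-first order assumed)
print(records[0])   # latest record
```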

The problem was solved with the following code:

data = BeautifulSoup(r.content, 'html.parser')
fha = data.find_all('table')[1]  # second table: first-half Home/Away/Draw odds
rows = fha.find_all('tr')
# rows[0] is the header; the newest record comes first, so
# rows[-1] is the earliest record and rows[1] the latest
early_row = rows[-1].text.split()
last_row = rows[1].text.split()
result = early_row, last_row
# link, mth and day are defined elsewhere in the script (not shown)
print(link)
print(result)
with open("c:/logs/history/HKJC-FHA-2021-" + mth + day + ".txt", "a+") as file3:
    file3.write(str(result) + "\n")
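The solved code still writes a `.txt` file, while the stated goal was Excel. One low-dependency option, sketched here, is the standard-library `csv` module: Excel opens CSV files directly, whereas a real `.xlsx` would go through `pandas.DataFrame.to_excel`, which needs an engine such as openpyxl installed. The helper name `save_first_and_last` and the file name are hypothetical, and each record is assumed to be one list per row with one string per cell. Note that `row.text.split()` in the code above splits the timestamp `07-28 03:17` into two tokens, while collecting each `td`'s text keeps it whole.

```python
import csv
import os
import tempfile


def save_first_and_last(records, path):
    """Write the earliest and latest records to a CSV file that Excel can open.

    Assumes records are listed newest first, so records[-1] is the earliest.
    """
    if not records:
        raise ValueError("no data rows found")
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(records[-1])  # earliest record
        writer.writerow(records[0])   # latest record


# The two records from the expected output, plus a placeholder middle row.
records = [
    ["07-29 18:41", "3.90", "1.97", "2.75"],
    ["...", "...", "...", "..."],
    ["07-28 03:17", "3.90", "1.97", "2.75"],
]
path = os.path.join(tempfile.gettempdir(), "G10oal-fha.csv")  # hypothetical name
save_first_and_last(records, path)

# Read the file back to check what was written.
with open(path, newline="", encoding="utf-8") as f:
    saved = list(csv.reader(f))
print(saved)
```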

Tags: python, beautifulsoup

Solution


Maybe put your data into a list and grab the first and last items?

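import requests
from bs4 import BeautifulSoup
r = requests.get('https://g10oal.com/match/116539/odds')
data = BeautifulSoup(r.text, 'lxml')
result = data.select('body > div.container.match-odds > div.match-info-header > div > div:nth-child(2) > div > p')  # select() already returns a list
print("First: %s\nLast: %s" % (result[0], result[-1]))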
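The selector in this answer points at `p` elements inside the match header rather than at the odds table. Applying the same idea (build a list, then take `[0]` and `[-1]`) to the table rows could look like the sketch below, which again uses a hypothetical inline HTML sample in place of the live page. `tr:has(> td)` is a CSS selector that BeautifulSoup's `select()` accepts (through soupsieve); it keeps only rows that have `td` children, so the header row drops out. The `if rows:` check avoids an `IndexError` when nothing matches. With the newest-first order assumed here, `First` is the latest record.

```python
from bs4 import BeautifulSoup

# Hypothetical two-table page standing in for the live odds page.
html = """
<table><tr><td>match info</td></tr></table>
<table>
  <tr><th>Time</th><th>Home</th><th>Away</th><th>Draw</th></tr>
  <tr><td>07-29 18:41</td><td>3.90</td><td>1.97</td><td>2.75</td></tr>
  <tr><td>...</td><td>...</td><td>...</td><td>...</td></tr>
  <tr><td>07-28 03:17</td><td>3.90</td><td>1.97</td><td>2.75</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
# select() returns a list; ":has(> td)" keeps only rows with data cells.
rows = soup.select("table")[1].select("tr:has(> td)")
first = last = None
if rows:
    first = [td.get_text(strip=True) for td in rows[0].select("td")]
    last = [td.get_text(strip=True) for td in rows[-1].select("td")]
    print("First: %s\nLast: %s" % (first, last))
```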
