BeautifulSoup: extracting attributes from an element?

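Question

I searched Stack Overflow for this but could not adapt what I found to my code. Could someone help me?

I am trying to get the "team1", "team2" and "bettext" values from the onclick attribute in this HTML: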

<table class="sportbet_extra_list_table" id="mc-ga312004790">
    <tbody>
        <tr>
            <td class="sportbet_extra_c0"></td>
            <td class="sportbet_extra_c1"><span>
                <a class="combi_1"></a>
                Hvem vinder kampen?                            </span></td>
            <td class="sportbet_extra_c2">
			                <div id="mc-ti312004790_1" class="js-ti312004790_1 sportbet_extra_rate_content" onclick="Bettingslip.addBet({type: 'N', team1: 'Rusland', team2: 'Saudi Arabien', bettext: 'Hvem vinder kampen?', combi_cat: 1, sub_group: 0, game: 312004790, groupId:461392, leagueId:30124, odd: 138, odd_id: 312004790, tiptext: '1', tip: 1, betstyle: 2224})">
                    <div class="sportbet_content_rate_left">1</div>
                    <div class="sportbet_content_rate_right">1,38</div>
                </div>
				
            </td>

So far, this is the code I use to extract information from sportbet_extra_list_table:

    import requests
    from bs4 import BeautifulSoup

    REQUEST = requests.get('https://www.cashpoint.dk/en/?r=bets/xtra&group=461392&game=312004790').text
    SOUP = BeautifulSoup(REQUEST, 'lxml')
    # find_all to extract all
    SCRAPE = SOUP.find('table', class_='sportbet_extra_list_table')

    for CLEAN in SCRAPE:
        CLEANER = BeautifulSoup(str(CLEAN), 'lxml').text
        STRIP = " ".join(line.strip() for line in CLEANER.split("\n"))
        print(STRIP)

I tried adding

SOUP.find('table', class_='sportbet_extra_list_table', attrs={"onclick": "team1"})

but it did not work.

Tags: python, beautifulsoup

Answer


Try the following to get the output in the form you described in your post:

import json
import requests 
from bs4 import BeautifulSoup

url = "https://www.cashpoint.dk/en/?r=bets/xtra&group=461392&game=312004790"

res = requests.get(url)
soup = BeautifulSoup(res.text,'lxml')

dataset = []
for items in soup.select("#container_xtra [id^='mc-ti']"):
    # Keep the argument text that follows "Bettingslip.addBet("
    data = items.get("onclick").split("Bettingslip.addBet(")[1]

    # Each value is the first single-quoted string after its key
    d = {key: data.split(key + ":")[1].split("'")[1]
         for key in ("team1", "team2", "bettext")}
    # Skip entries that were already collected
    if d not in dataset:
        dataset.append(d)

print(json.dumps(dataset,indent=4))

Partial output:

[
    {
        "team1": "Rusland",
        "team2": "Saudi Arabien",
        "bettext": "Hvem vinder kampen?"
    },
    {
        "team1": "Rusland",
        "team2": "Saudi Arabien",
        "bettext": "Dobbeltchance"
    },
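The chained `split` calls above depend on the exact order of quotes and commas in the `onclick` string. As an alternative, here is a minimal sketch that pulls the quoted fields out with a regular expression. The helper name `parse_addbet` and the pattern `PAIR_RE` are hypothetical names introduced for this illustration, and the sketch assumes every value of interest is single-quoted and contains no escaped quotes. The sample string is a shortened copy of the `onclick` attribute from the question.

```python
import re

# Matches `key: 'value'` pairs inside the addBet({...}) call.
# Assumption: values are single-quoted and contain no escaped quotes.
PAIR_RE = re.compile(r"(\w+)\s*:\s*'([^']*)'")


def parse_addbet(onclick, keys=("team1", "team2", "bettext")):
    """Return the requested quoted fields from an onclick string.

    Keys that do not appear in the string map to None.
    """
    found = dict(PAIR_RE.findall(onclick))
    return {key: found.get(key) for key in keys}


# Shortened copy of the onclick attribute shown in the question.
onclick = (
    "Bettingslip.addBet({type: 'N', team1: 'Rusland', "
    "team2: 'Saudi Arabien', bettext: 'Hvem vinder kampen?', "
    "combi_cat: 1, sub_group: 0, game: 312004790})"
)

print(parse_addbet(onclick))
# {'team1': 'Rusland', 'team2': 'Saudi Arabien', 'bettext': 'Hvem vinder kampen?'}
```

Inside the loop above, this would be called as `parse_addbet(items.get("onclick"))`. It may also help to know why the attempt in the question found nothing: a plain string passed as an attribute filter has to equal the whole attribute value, and the `onclick` attribute sits on the inner `div`, not on the `table`. A compiled pattern such as `soup.find_all("div", onclick=re.compile(r"Bettingslip\.addBet"))` matches on part of the value instead.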
