Cannot access the first div in my BeautifulSoup result

Problem description

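I'm trying my hand at some web scraping (newbie here!). It has gone well so far, but I'm stuck on this one. I query the soup like this (the result is at the end):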

'''

events = soup.findAll("div", {"class": "detailMS__incidentRow"})

'''

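Then I iterate over it to pull out the information I need. I've now realized that I also need one piece of information from the first (outermost) div of each row, and I don't know how to access it: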

'''

event_lst = []
for e in events:   
    instance = []
    for c in e.contents:
        instance.append(c.string)
        if "icon-box" in c['class']:
            instance.append(c['class'][1])
            
    del instance[1] #this one is always empty
    
    event_lst.append(instance)

'''

A sample result:

["34'", 'substitution-in', 'Skov R.', None]

Expected result:

["detailMS__incidentRow incidentRow--home odd", "34'", 'substitution-in', 'Skov R.', None]

I hope this makes sense and that someone can help.

events:

[<div class="detailMS__incidentRow incidentRow--home odd"><div class="time-box">34'</div><div class="icon-box substitution-in" onmouseout="tt.hide(this)" onmouseover="tt.show(this, event, true)" title="Sebastian Hoeness is forced to make a change.&lt;br /&gt;Stefan Posch is unable to continue due to&lt;br /&gt;injury, Robert Skov (Hoffenheim) comes on."><span class="icon substitution-in"> </span></div><span class="substitution-in-name"><a href="#" onclick="window.open('/en/player/skov-robert/SdlhgTvE/'); return false;">Skov R.</a></span><span class="substitution-out-name"><span class="icon substitution-out"> </span><a href="#" onclick="window.open('/en/player/posch-stefan/nsVWqZE2/'); return false;">Posch S.</a></span></div>, <div class="detailMS__incidentRow incidentRow--home even"><div class="time-box">58'</div><div class="icon-box r-card" onmouseout="tt.hide(this)" onmouseover="tt.show(this, event, true)" title="The referee goes straight to his pocket&lt;br /&gt;and shows Robert Skov (Hoffenheim) a red&lt;br /&gt;card. No point in arguing. He has to leave&lt;br /&gt;the pitch."><span class="icon r-card"> </span></div><span class="participant-name"><a href="#" onclick="window.open('/en/player/skov-robert/SdlhgTvE/'); return false;">Skov R.</a></span><span class="subincident-penalty subincident-name">(Foul)</span></div>, <div class="detailMS__incidentRow incidentRow--away odd"><div class="time-box">60'</div><div class="icon-box soccer-ball" onmouseout="tt.hide(this)" onmouseover="tt.show(this, event, false)" title="Goal! 
Max Kruse (Union Berlin) wins the&lt;br /&gt;battle of wills and sends an unstoppable&lt;br /&gt;penalty past Oliver Baumann into the bottom&lt;br /&gt;right corner."><span class="icon soccer-ball"> </span></div><span class="subincident-name">(Penalty)</span><span class="participant-name"><a href="#" onclick="window.open('/en/player/kruse-max/Cz8ZjrLB/'); return false;">Kruse M.</a></span></div>, <div class="detailMS__incidentRow incidentRow--home even"><div class="time-box">62'</div><div class="icon-box substitution-in" onmouseout="tt.hide(this)" onmouseover="tt.show(this, event, true)" title="The referee stops play so that a substitution&lt;br /&gt;can be made and Dennis Geiger trots off&lt;br /&gt;the pitch and is replaced by Christoph Baumgartner&lt;br /&gt;(Hoffenheim)."><span class="icon substitution-in"> </span></div><span class="substitution-in-name"><a href="#" onclick="window.open('/en/player/baumgartner-christoph/rqcu4a9O/'); return false;">Baumgartner C.</a></span><span class="substitution-out-name"><span class="icon substitution-out"> </span><a href="#" onclick="window.open('/en/player/geiger-dennis/GtZAyRi8/'); return false;">Geiger D.</a></span></div>, <div class="detailMS__incidentRow incidentRow--home odd"><div class="time-box">62'</div><div class="icon-box substitution-in" onmouseout="tt.hide(this)" onmouseover="tt.show(this, event, true)" title="Sebastian Hoeness decides to make a substitution.&lt;br /&gt;Ishak Belfodil will be replaced by Sargis&lt;br /&gt;Adamyan (Hoffenheim)."><span class="icon substitution-in"> </span></div><span class="substitution-in-name"><a href="#" onclick="window.open('/en/player/adamyan-sargis/6JByionA/'); return false;">Adamyan S.</a></span><span class="substitution-out-name"><span class="icon substitution-out"> </span><a href="#" onclick="window.open('/en/player/belfodil-ishak/zJpmU9tM/'); return false;">Belfodil I.</a></span></div>, <div class="detailMS__incidentRow incidentRow--away even"><div 
class="time-box">63'</div><div class="icon-box y-card" onmouseout="tt.hide(this)" onmouseover="tt.show(this, event, false)" title="Andreas Luthe (Union Berlin) goes in the&lt;br /&gt;referee's notebook after being shown a yellow&lt;br /&gt;card."><span class="icon y-card"> </span></div><span class="subincident-penalty subincident-name">(Delay of game)</span><span class="participant-name"><a href="#" onclick="window.open('/en/player/luthe-andreas/CrnFB7xg/'); return false;">Luthe A.</a></span></div>, <div class="detailMS__incidentRow incidentRow--away odd"><div class="time-box">71'</div><div class="icon-box y-card"><span class="icon y-card"> </span></div><span class="subincident-penalty subincident-name">(Tripping)</span><span class="participant-name"><a href="#" onclick="window.open('/en/player/griesbeck-sebastian/S02VCxKU/'); return false;">Griesbeck S.</a></span></div>, <div class="detailMS__incidentRow incidentRow--away even"><div class="time-box">77'</div><div class="icon-box substitution-in" onmouseout="tt.hide(this)" onmouseover="tt.show(this, event, false)" title="It's time for a substitution. Joel Pohjanpalo&lt;br /&gt;(Union Berlin) comes on in place of Taiwo&lt;br /&gt;Awoniyi."><span class="icon substitution-in"> </span></div><span class="substitution-in-name"><a href="#" onclick="window.open('/en/player/pohjanpalo-joel/WfrgzpCh/'); return false;">Pohjanpalo J.</a></span><span class="substitution-out-name"><a href="#" onclick="window.open('/en/player/awoniyi-taiwo/OWFf8WFm/'); return false;">Awoniyi T.</a><span class="icon substitution-out"> </span></span></div>, <div class="detailMS__incidentRow incidentRow--home odd"><div class="time-box">80'</div><div class="icon-box soccer-ball" onmouseout="tt.hide(this)" onmouseover="tt.show(this, event, true)" title="GOAL! Superb work from Sargis Adamyan, who&lt;br /&gt;plays a vital role in the build-up. 
He squares&lt;br /&gt;it to Munas Dabbur (Hoffenheim), who beats&lt;br /&gt;Andreas Luthe with a brilliant shot into&lt;br /&gt;the top of the net. 1:1."><span class="icon soccer-ball"> </span></div><span class="participant-name"><a href="#" onclick="window.open('/en/player/dabbur-munas/vcya3M3U/'); return false;">Dabbur M.</a></span><span class="assist note-name">(<a href="#" onclick="window.open('/en/player/adamyan-sargis/6JByionA/'); return false;">Adamyan S.</a>)</span></div>, <div class="detailMS__incidentRow incidentRow--home even"><div class="time-box">81'</div><div class="icon-box substitution-in" onmouseout="tt.hide(this)" onmouseover="tt.show(this, event, true)" title="It will be a substitution. Mijat Gacinovic&lt;br /&gt;is ready to enter the pitch as Florian Grillitsch&lt;br /&gt;(Hoffenheim) walks off."><span class="icon substitution-in"> </span></div><span class="substitution-in-name"><a href="#" onclick="window.open('/en/player/klauss-joao/ILFmc7Lp/'); return false;">Klauss J.</a></span><span class="substitution-out-name"><span class="icon substitution-out"> </span><a href="#" onclick="window.open('/en/player/dabbur-munas/vcya3M3U/'); return false;">Dabbur M.</a></span></div>, <div class="detailMS__incidentRow incidentRow--home odd"><div class="time-box">81'</div><div class="icon-box substitution-in" onmouseout="tt.hide(this)" onmouseover="tt.show(this, event, true)" title="The substitution has been made. 
Joao Klauss&lt;br /&gt;has replaced Munas Dabbur (Hoffenheim)."><span class="icon substitution-in"> </span></div><span class="substitution-in-name"><a href="#" onclick="window.open('/en/player/gacinovic-mijat/nBEIWHXE/'); return false;">Gacinovic M.</a></span><span class="substitution-out-name"><span class="icon substitution-out"> </span><a href="#" onclick="window.open('/en/player/grillitsch-florian/WfACar9B/'); return false;">Grillitsch F.</a></span></div>, <div class="detailMS__incidentRow incidentRow--away even"><div class="time-box">83'</div><div class="icon-box substitution-in" onmouseout="tt.hide(this)" onmouseover="tt.show(this, event, false)" title="Today's match ends for Christian Gentner&lt;br /&gt;who will be replaced by Cedric Teuchert&lt;br /&gt;(Union Berlin)."><span class="icon substitution-in"> </span></div><span class="substitution-in-name"><a href="#" onclick="window.open('/en/player/teuchert-cedric/6FRvEoh1/'); return false;">Teuchert C.</a></span><span class="substitution-out-name"><a href="#" onclick="window.open('/en/player/gentner-christian/GxIiEgQ2/'); return false;">Gentner C.</a><span class="icon substitution-out"> </span></span></div>, <div class="detailMS__incidentRow incidentRow--away odd"><div class="time-box">85'</div><div class="icon-box soccer-ball" onmouseout="tt.hide(this)" onmouseover="tt.show(this, event, false)" title="What a hit! 
Joel Pohjanpalo (Union Berlin)&lt;br /&gt;shows good composure inside the box to fire&lt;br /&gt;a first-time shot into the bottom left corner.&lt;br /&gt;1:2."><span class="icon soccer-ball"> </span></div><span class="assist note-name">(<a href="#" onclick="window.open('/en/player/kruse-max/Cz8ZjrLB/'); return false;">Kruse M.</a>)</span><span class="participant-name"><a href="#" onclick="window.open('/en/player/pohjanpalo-joel/WfrgzpCh/'); return false;">Pohjanpalo J.</a></span></div>, <div class="detailMS__incidentRow incidentRow--away even"><div class="time-box">87'</div><div class="icon-box substitution-in" onmouseout="tt.hide(this)" onmouseover="tt.show(this, event, false)" title="Marius Bulter (Union Berlin) joins the action&lt;br /&gt;as a substitute, replacing Sebastian Griesbeck."><span class="icon substitution-in"> </span></div><span class="substitution-in-name"><a href="#" onclick="window.open('/en/player/bulter-marius/QHFJHvmo/'); return false;">Bulter M.</a></span><span class="substitution-out-name"><a href="#" onclick="window.open('/en/player/griesbeck-sebastian/S02VCxKU/'); return false;">Griesbeck S.</a><span class="icon substitution-out"> </span></span></div>, <div class="detailMS__incidentRow incidentRow--away odd"><div class="time-box">87'</div><div class="icon-box substitution-in" onmouseout="tt.hide(this)" onmouseover="tt.show(this, event, false)" title="Urs Fischer prepares a substitution. 
Julian&lt;br /&gt;Ryerson is replaced by Niko Giesselmann&lt;br /&gt;(Union Berlin)."><span class="icon substitution-in"> </span></div><span class="substitution-in-name"><a href="#" onclick="window.open('/en/player/giesselmann-niko/vgFlmAm5/'); return false;">Giesselmann N.</a></span><span class="substitution-out-name"><a href="#" onclick="window.open('/en/player/ryerson-julian/8nv0ZZYK/'); return false;">Ryerson J.</a><span class="icon substitution-out"> </span></span></div>, <div class="detailMS__incidentRow incidentRow--home even"><div class="time-box">90'</div><div class="icon-box y-card"><span class="icon y-card"> </span></div><span class="participant-name"><a href="#" onclick="window.open('/en/player/baumgartner-christoph/rqcu4a9O/'); return false;">Baumgartner C.</a></span><span class="subincident-penalty subincident-name">(Tripping)</span></div>, <div class="detailMS__incidentRow incidentRow--away odd"><div class="time-box-wide">90+4'</div><div class="icon-box soccer-ball" onmouseout="tt.hide(this)" onmouseover="tt.show(this, event, false)" title="GOAL! Superb work from Max Kruse, who plays&lt;br /&gt;a vital role in the build-up. He squares&lt;br /&gt;it to Cedric Teuchert (Union Berlin), who&lt;br /&gt;beats Oliver Baumann with a brilliant shot&lt;br /&gt;into the bottom left corner. 1:3."><span class="icon soccer-ball"> </span></div><span class="assist note-name">(<a href="#" onclick="window.open('/en/player/kruse-max/Cz8ZjrLB/'); return false;">Kruse M.</a>)</span><span class="participant-name"><a href="#" onclick="window.open('/en/player/teuchert-cedric/6FRvEoh1/'); return false;">Teuchert C.</a></span></div>]

Tags: beautifulsoup, python-3.7

Solution


Yes, this one is a little odd. But give this a try:

from bs4 import BeautifulSoup
import re
import time

from selenium import webdriver

# Raw string, so the backslashes in the Windows path are not read as escape sequences.
# Passing the driver path positionally is Selenium 3 style; newer Selenium 4 releases expect a Service object instead.
driver = webdriver.Chrome(r'C:\chromedriver_win32\chromedriver.exe')
url = 'https://www.scoreboard.com/en/match/UgLQho4p/#match-summary'
driver.get(url)

time.sleep(5)  # crude wait for the page to finish loading
response = driver.page_source

soup = BeautifulSoup(response, 'html.parser')

regex = re.compile('.*detailMS__incidentRow.*')
events = soup.find_all("div", {"class": regex})
event_lst = []
for e in events:
    instance = []
    # e is the outer row div itself, so its class list can be read directly
    instance.append(' '.join(e['class']))
    for c in e.contents:
        instance.append(c.string)
        if "icon-box" in c['class']:
            instance.append(c['class'][1])

    del instance[2]  # the icon-box text, which is always empty

    event_lst.append(instance)

driver.quit()  # end the browser session

Output:

print(event_lst)
[['detailMS__incidentRow incidentRow--home odd', "34'", 'substitution-in', 'Skov R.', None], ['detailMS__incidentRow incidentRow--home even', "58'", 'r-card', 'Skov R.', '(Foul)'], ['detailMS__incidentRow incidentRow--away odd', "60'", 'soccer-ball', '(Penalty)', 'Kruse M.'], ['detailMS__incidentRow incidentRow--home even', "62'", 'substitution-in', 'Baumgartner C.', None], ['detailMS__incidentRow incidentRow--home odd', "62'", 'substitution-in', 'Adamyan S.', None], ['detailMS__incidentRow incidentRow--away even', "63'", 'y-card', '(Delay of game)', 'Luthe A.'], ['detailMS__incidentRow incidentRow--away odd', "71'", 'y-card', '(Tripping)', 'Griesbeck S.'], ['detailMS__incidentRow incidentRow--away even', "77'", 'substitution-in', 'Pohjanpalo J.', None], ['detailMS__incidentRow incidentRow--home odd', "80'", 'soccer-ball', 'Dabbur M.', None], ['detailMS__incidentRow incidentRow--home even', "81'", 'substitution-in', 'Klauss J.', None], ['detailMS__incidentRow incidentRow--home odd', "81'", 'substitution-in', 'Gacinovic M.', None], ['detailMS__incidentRow incidentRow--away even', "83'", 'substitution-in', 'Teuchert C.', None], ['detailMS__incidentRow incidentRow--away odd', "85'", 'soccer-ball', None, 'Pohjanpalo J.'], ['detailMS__incidentRow incidentRow--away even', "87'", 'substitution-in', 'Bulter M.', None], ['detailMS__incidentRow incidentRow--away odd', "87'", 'substitution-in', 'Giesselmann N.', None], ['detailMS__incidentRow incidentRow--home even', "90'", 'y-card', 'Baumgartner C.', '(Tripping)'], ['detailMS__incidentRow incidentRow--away odd', "90+4'", 'soccer-ball', None, 'Teuchert C.']]
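
The key point is that every element returned by find_all is the outer row div itself, so its attributes can be read straight from it. Here is a minimal sketch of that idea which runs without Selenium. The html string is a trimmed-down copy of two rows from the question's markup, and parse_row is a hypothetical helper name, not part of BeautifulSoup. It assumes every icon-box div carries the incident type as its second class, as in the markup shown in the question.

```python
from bs4 import BeautifulSoup

# Trimmed-down copy of two rows from the question's markup (test input only).
html = """
<div class="detailMS__incidentRow incidentRow--home odd"><div class="time-box">34'</div><div class="icon-box substitution-in"><span class="icon substitution-in"> </span></div><span class="substitution-in-name"><a href="#">Skov R.</a></span><span class="substitution-out-name"><span class="icon substitution-out"> </span><a href="#">Posch S.</a></span></div>
<div class="detailMS__incidentRow incidentRow--away odd"><div class="time-box">60'</div><div class="icon-box soccer-ball"><span class="icon soccer-ball"> </span></div><span class="subincident-name">(Penalty)</span><span class="participant-name"><a href="#">Kruse M.</a></span></div>
"""

soup = BeautifulSoup(html, "html.parser")


def parse_row(row):
    """Hypothetical helper: build one result list for a single incident row."""
    # The row's own class attribute is a list; join it back into one string.
    instance = [" ".join(row["class"])]
    # find_all(True, recursive=False) yields only direct child tags,
    # so stray text nodes between the children are skipped.
    for child in row.find_all(True, recursive=False):
        classes = child.get("class", [])
        if "icon-box" in classes:
            # Assumption: the second class names the incident type.
            instance.append(classes[1])
        else:
            # .string is None when the child holds more than one node.
            instance.append(child.string)
    return instance


event_lst = [
    parse_row(row)
    for row in soup.find_all("div", class_="detailMS__incidentRow")
]
print(event_lst)
```

Because the icon-box text is never appended here, there is no empty entry to delete afterwards, and the index bookkeeping from the loop above goes away.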
