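beautifulsoup - Can't access the first div in my BeautifulSoup
Problem description
I'm trying to do some web scraping (newbie here!). So far it has gone well, but I'm stuck on this. I call the soup like this (the result is at the end):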
'''
events = soup.findAll("div", {"class": "detailMS__incidentRow"})
'''
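Then I iterate over it to find the information I need. But I've now realized that I also need a piece of information from the first div, and I don't know how to access it: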
'''
event_lst = []
for e in events:
    instance = []
    for c in e.contents:
        instance.append(c.string)
        if "icon-box" in c['class']:
            instance.append(c['class'][1])
    del instance[1]  # this one is always empty
    event_lst.append(instance)
'''
A sample result:
["34'", 'substitution-in', 'Skov R.', None]
Expected result:
["detailMS__incidentRow incidentRow--home odd", "34'", 'substitution-in', 'Skov R.', None]
I hope this makes sense and that someone can help.
events:
[<div class="detailMS__incidentRow incidentRow--home odd"><div class="time-box">34'</div><div class="icon-box substitution-in" onmouseout="tt.hide(this)" onmouseover="tt.show(this, event, true)" title="Sebastian Hoeness is forced to make a change.<br />Stefan Posch is unable to continue due to<br />injury, Robert Skov (Hoffenheim) comes on."><span class="icon substitution-in"> </span></div><span class="substitution-in-name"><a href="#" onclick="window.open('/en/player/skov-robert/SdlhgTvE/'); return false;">Skov R.</a></span><span class="substitution-out-name"><span class="icon substitution-out"> </span><a href="#" onclick="window.open('/en/player/posch-stefan/nsVWqZE2/'); return false;">Posch S.</a></span></div>, <div class="detailMS__incidentRow incidentRow--home even"><div class="time-box">58'</div><div class="icon-box r-card" onmouseout="tt.hide(this)" onmouseover="tt.show(this, event, true)" title="The referee goes straight to his pocket<br />and shows Robert Skov (Hoffenheim) a red<br />card. No point in arguing. He has to leave<br />the pitch."><span class="icon r-card"> </span></div><span class="participant-name"><a href="#" onclick="window.open('/en/player/skov-robert/SdlhgTvE/'); return false;">Skov R.</a></span><span class="subincident-penalty subincident-name">(Foul)</span></div>, <div class="detailMS__incidentRow incidentRow--away odd"><div class="time-box">60'</div><div class="icon-box soccer-ball" onmouseout="tt.hide(this)" onmouseover="tt.show(this, event, false)" title="Goal! 
Max Kruse (Union Berlin) wins the<br />battle of wills and sends an unstoppable<br />penalty past Oliver Baumann into the bottom<br />right corner."><span class="icon soccer-ball"> </span></div><span class="subincident-name">(Penalty)</span><span class="participant-name"><a href="#" onclick="window.open('/en/player/kruse-max/Cz8ZjrLB/'); return false;">Kruse M.</a></span></div>, <div class="detailMS__incidentRow incidentRow--home even"><div class="time-box">62'</div><div class="icon-box substitution-in" onmouseout="tt.hide(this)" onmouseover="tt.show(this, event, true)" title="The referee stops play so that a substitution<br />can be made and Dennis Geiger trots off<br />the pitch and is replaced by Christoph Baumgartner<br />(Hoffenheim)."><span class="icon substitution-in"> </span></div><span class="substitution-in-name"><a href="#" onclick="window.open('/en/player/baumgartner-christoph/rqcu4a9O/'); return false;">Baumgartner C.</a></span><span class="substitution-out-name"><span class="icon substitution-out"> </span><a href="#" onclick="window.open('/en/player/geiger-dennis/GtZAyRi8/'); return false;">Geiger D.</a></span></div>, <div class="detailMS__incidentRow incidentRow--home odd"><div class="time-box">62'</div><div class="icon-box substitution-in" onmouseout="tt.hide(this)" onmouseover="tt.show(this, event, true)" title="Sebastian Hoeness decides to make a substitution.<br />Ishak Belfodil will be replaced by Sargis<br />Adamyan (Hoffenheim)."><span class="icon substitution-in"> </span></div><span class="substitution-in-name"><a href="#" onclick="window.open('/en/player/adamyan-sargis/6JByionA/'); return false;">Adamyan S.</a></span><span class="substitution-out-name"><span class="icon substitution-out"> </span><a href="#" onclick="window.open('/en/player/belfodil-ishak/zJpmU9tM/'); return false;">Belfodil I.</a></span></div>, <div class="detailMS__incidentRow incidentRow--away even"><div class="time-box">63'</div><div class="icon-box y-card" 
onmouseout="tt.hide(this)" onmouseover="tt.show(this, event, false)" title="Andreas Luthe (Union Berlin) goes in the<br />referee's notebook after being shown a yellow<br />card."><span class="icon y-card"> </span></div><span class="subincident-penalty subincident-name">(Delay of game)</span><span class="participant-name"><a href="#" onclick="window.open('/en/player/luthe-andreas/CrnFB7xg/'); return false;">Luthe A.</a></span></div>, <div class="detailMS__incidentRow incidentRow--away odd"><div class="time-box">71'</div><div class="icon-box y-card"><span class="icon y-card"> </span></div><span class="subincident-penalty subincident-name">(Tripping)</span><span class="participant-name"><a href="#" onclick="window.open('/en/player/griesbeck-sebastian/S02VCxKU/'); return false;">Griesbeck S.</a></span></div>, <div class="detailMS__incidentRow incidentRow--away even"><div class="time-box">77'</div><div class="icon-box substitution-in" onmouseout="tt.hide(this)" onmouseover="tt.show(this, event, false)" title="It's time for a substitution. Joel Pohjanpalo<br />(Union Berlin) comes on in place of Taiwo<br />Awoniyi."><span class="icon substitution-in"> </span></div><span class="substitution-in-name"><a href="#" onclick="window.open('/en/player/pohjanpalo-joel/WfrgzpCh/'); return false;">Pohjanpalo J.</a></span><span class="substitution-out-name"><a href="#" onclick="window.open('/en/player/awoniyi-taiwo/OWFf8WFm/'); return false;">Awoniyi T.</a><span class="icon substitution-out"> </span></span></div>, <div class="detailMS__incidentRow incidentRow--home odd"><div class="time-box">80'</div><div class="icon-box soccer-ball" onmouseout="tt.hide(this)" onmouseover="tt.show(this, event, true)" title="GOAL! Superb work from Sargis Adamyan, who<br />plays a vital role in the build-up. He squares<br />it to Munas Dabbur (Hoffenheim), who beats<br />Andreas Luthe with a brilliant shot into<br />the top of the net. 
1:1."><span class="icon soccer-ball"> </span></div><span class="participant-name"><a href="#" onclick="window.open('/en/player/dabbur-munas/vcya3M3U/'); return false;">Dabbur M.</a></span><span class="assist note-name">(<a href="#" onclick="window.open('/en/player/adamyan-sargis/6JByionA/'); return false;">Adamyan S.</a>)</span></div>, <div class="detailMS__incidentRow incidentRow--home even"><div class="time-box">81'</div><div class="icon-box substitution-in" onmouseout="tt.hide(this)" onmouseover="tt.show(this, event, true)" title="It will be a substitution. Mijat Gacinovic<br />is ready to enter the pitch as Florian Grillitsch<br />(Hoffenheim) walks off."><span class="icon substitution-in"> </span></div><span class="substitution-in-name"><a href="#" onclick="window.open('/en/player/klauss-joao/ILFmc7Lp/'); return false;">Klauss J.</a></span><span class="substitution-out-name"><span class="icon substitution-out"> </span><a href="#" onclick="window.open('/en/player/dabbur-munas/vcya3M3U/'); return false;">Dabbur M.</a></span></div>, <div class="detailMS__incidentRow incidentRow--home odd"><div class="time-box">81'</div><div class="icon-box substitution-in" onmouseout="tt.hide(this)" onmouseover="tt.show(this, event, true)" title="The substitution has been made. 
Joao Klauss<br />has replaced Munas Dabbur (Hoffenheim)."><span class="icon substitution-in"> </span></div><span class="substitution-in-name"><a href="#" onclick="window.open('/en/player/gacinovic-mijat/nBEIWHXE/'); return false;">Gacinovic M.</a></span><span class="substitution-out-name"><span class="icon substitution-out"> </span><a href="#" onclick="window.open('/en/player/grillitsch-florian/WfACar9B/'); return false;">Grillitsch F.</a></span></div>, <div class="detailMS__incidentRow incidentRow--away even"><div class="time-box">83'</div><div class="icon-box substitution-in" onmouseout="tt.hide(this)" onmouseover="tt.show(this, event, false)" title="Today's match ends for Christian Gentner<br />who will be replaced by Cedric Teuchert<br />(Union Berlin)."><span class="icon substitution-in"> </span></div><span class="substitution-in-name"><a href="#" onclick="window.open('/en/player/teuchert-cedric/6FRvEoh1/'); return false;">Teuchert C.</a></span><span class="substitution-out-name"><a href="#" onclick="window.open('/en/player/gentner-christian/GxIiEgQ2/'); return false;">Gentner C.</a><span class="icon substitution-out"> </span></span></div>, <div class="detailMS__incidentRow incidentRow--away odd"><div class="time-box">85'</div><div class="icon-box soccer-ball" onmouseout="tt.hide(this)" onmouseover="tt.show(this, event, false)" title="What a hit! 
Joel Pohjanpalo (Union Berlin)<br />shows good composure inside the box to fire<br />a first-time shot into the bottom left corner.<br />1:2."><span class="icon soccer-ball"> </span></div><span class="assist note-name">(<a href="#" onclick="window.open('/en/player/kruse-max/Cz8ZjrLB/'); return false;">Kruse M.</a>)</span><span class="participant-name"><a href="#" onclick="window.open('/en/player/pohjanpalo-joel/WfrgzpCh/'); return false;">Pohjanpalo J.</a></span></div>, <div class="detailMS__incidentRow incidentRow--away even"><div class="time-box">87'</div><div class="icon-box substitution-in" onmouseout="tt.hide(this)" onmouseover="tt.show(this, event, false)" title="Marius Bulter (Union Berlin) joins the action<br />as a substitute, replacing Sebastian Griesbeck."><span class="icon substitution-in"> </span></div><span class="substitution-in-name"><a href="#" onclick="window.open('/en/player/bulter-marius/QHFJHvmo/'); return false;">Bulter M.</a></span><span class="substitution-out-name"><a href="#" onclick="window.open('/en/player/griesbeck-sebastian/S02VCxKU/'); return false;">Griesbeck S.</a><span class="icon substitution-out"> </span></span></div>, <div class="detailMS__incidentRow incidentRow--away odd"><div class="time-box">87'</div><div class="icon-box substitution-in" onmouseout="tt.hide(this)" onmouseover="tt.show(this, event, false)" title="Urs Fischer prepares a substitution. 
Julian<br />Ryerson is replaced by Niko Giesselmann<br />(Union Berlin)."><span class="icon substitution-in"> </span></div><span class="substitution-in-name"><a href="#" onclick="window.open('/en/player/giesselmann-niko/vgFlmAm5/'); return false;">Giesselmann N.</a></span><span class="substitution-out-name"><a href="#" onclick="window.open('/en/player/ryerson-julian/8nv0ZZYK/'); return false;">Ryerson J.</a><span class="icon substitution-out"> </span></span></div>, <div class="detailMS__incidentRow incidentRow--home even"><div class="time-box">90'</div><div class="icon-box y-card"><span class="icon y-card"> </span></div><span class="participant-name"><a href="#" onclick="window.open('/en/player/baumgartner-christoph/rqcu4a9O/'); return false;">Baumgartner C.</a></span><span class="subincident-penalty subincident-name">(Tripping)</span></div>, <div class="detailMS__incidentRow incidentRow--away odd"><div class="time-box-wide">90+4'</div><div class="icon-box soccer-ball" onmouseout="tt.hide(this)" onmouseover="tt.show(this, event, false)" title="GOAL! Superb work from Max Kruse, who plays<br />a vital role in the build-up. He squares<br />it to Cedric Teuchert (Union Berlin), who<br />beats Oliver Baumann with a brilliant shot<br />into the bottom left corner. 1:3."><span class="icon soccer-ball"> </span></div><span class="assist note-name">(<a href="#" onclick="window.open('/en/player/kruse-max/Cz8ZjrLB/'); return false;">Kruse M.</a>)</span><span class="participant-name"><a href="#" onclick="window.open('/en/player/teuchert-cedric/6FRvEoh1/'); return false;">Teuchert C.</a></span></div>]
Solution
Yes, this is a bit odd. But give this a try:
from bs4 import BeautifulSoup
import re
import time
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

# Raw string so the backslashes in the Windows path are not read as escapes.
# Newer Selenium releases may expect the driver path to be passed differently.
driver = webdriver.Chrome(r'C:\chromedriver_win32\chromedriver.exe')
url = 'https://www.scoreboard.com/en/match/UgLQho4p/#match-summary'
driver.get(url)
time.sleep(5)  # give the JavaScript-rendered page time to load
response = driver.page_source
soup = BeautifulSoup(response, 'html.parser')

regex = re.compile('.*detailMS__incidentRow.*')
events = soup.find_all("div", {"class": regex})

event_lst = []
for e in events:
    instance = []
    # The element parsed just before the row's first child div is the row itself,
    # so this reads the row's own class list.
    instance.append(' '.join(e.find('div').previous['class']))
    for c in e.contents:
        instance.append(c.string)
        if "icon-box" in c['class']:
            instance.append(c['class'][1])
    del instance[2]  # this one is always empty
    event_lst.append(instance)

driver.close()
Output:
print(event_lst)
[['detailMS__incidentRow incidentRow--home odd', "34'", 'substitution-in', 'Skov R.', None], ['detailMS__incidentRow incidentRow--home even', "58'", 'r-card', 'Skov R.', '(Foul)'], ['detailMS__incidentRow incidentRow--away odd', "60'", 'soccer-ball', '(Penalty)', 'Kruse M.'], ['detailMS__incidentRow incidentRow--home even', "62'", 'substitution-in', 'Baumgartner C.', None], ['detailMS__incidentRow incidentRow--home odd', "62'", 'substitution-in', 'Adamyan S.', None], ['detailMS__incidentRow incidentRow--away even', "63'", 'y-card', '(Delay of game)', 'Luthe A.'], ['detailMS__incidentRow incidentRow--away odd', "71'", 'y-card', '(Tripping)', 'Griesbeck S.'], ['detailMS__incidentRow incidentRow--away even', "77'", 'substitution-in', 'Pohjanpalo J.', None], ['detailMS__incidentRow incidentRow--home odd', "80'", 'soccer-ball', 'Dabbur M.', None], ['detailMS__incidentRow incidentRow--home even', "81'", 'substitution-in', 'Klauss J.', None], ['detailMS__incidentRow incidentRow--home odd', "81'", 'substitution-in', 'Gacinovic M.', None], ['detailMS__incidentRow incidentRow--away even', "83'", 'substitution-in', 'Teuchert C.', None], ['detailMS__incidentRow incidentRow--away odd', "85'", 'soccer-ball', None, 'Pohjanpalo J.'], ['detailMS__incidentRow incidentRow--away even', "87'", 'substitution-in', 'Bulter M.', None], ['detailMS__incidentRow incidentRow--away odd', "87'", 'substitution-in', 'Giesselmann N.', None], ['detailMS__incidentRow incidentRow--home even', "90'", 'y-card', 'Baumgartner C.', '(Tripping)'], ['detailMS__incidentRow incidentRow--away odd', "90+4'", 'soccer-ball', None, 'Teuchert C.']]
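As a side note, the row's classes are attributes of the row tag itself, so they can probably be read straight from `e["class"]` without going through the first child. The sketch below is only an illustration under a few assumptions: it parses two rows trimmed by hand from the `events` dump in the question (tooltips and `onclick` handlers removed) instead of loading the live page, and it swaps `.string` for `get_text(strip=True)`, which is a change from the code above. That swap is why the `None` entries in the output above turn into text here.

```python
from bs4 import BeautifulSoup

# Two rows trimmed from the `events` dump in the question
# (tooltips and onclick handlers removed for brevity; this is an assumption,
# not the live page).
html = """
<div class="detailMS__incidentRow incidentRow--home odd"><div class="time-box">34'</div><div class="icon-box substitution-in"><span class="icon substitution-in"> </span></div><span class="substitution-in-name"><a href="#">Skov R.</a></span><span class="substitution-out-name"><span class="icon substitution-out"> </span><a href="#">Posch S.</a></span></div>
<div class="detailMS__incidentRow incidentRow--away odd"><div class="time-box">60'</div><div class="icon-box soccer-ball"><span class="icon soccer-ball"> </span></div><span class="subincident-name">(Penalty)</span><span class="participant-name"><a href="#">Kruse M.</a></span></div>
"""

soup = BeautifulSoup(html, "html.parser")

# class_ matches any single class of a multi-class tag, so no regex is needed.
events = soup.find_all("div", class_="detailMS__incidentRow")

event_lst = []
for e in events:
    # The row's classes live on the row tag itself, not in its children.
    instance = [" ".join(e["class"])]
    for c in e.find_all(True, recursive=False):  # direct child tags only
        classes = c.get("class", [])
        if "icon-box" in classes:
            # The second class names the incident type, e.g. "substitution-in".
            instance.append(classes[1])
        else:
            # get_text() joins all nested strings; .string would return None
            # for a tag that has more than one child.
            instance.append(c.get_text(strip=True))
    event_lst.append(instance)

for row in event_lst:
    print(row)
```

With these two rows, the first list starts with `'detailMS__incidentRow incidentRow--home odd'` and ends with `'Posch S.'` rather than `None`, because the substituted-out player's name is now collected as well. If you need exactly the expected result from the question, keep `c.string` instead.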