python - How do I scrape the text of a paragraph tag when the paragraph also contains other tags?

Problem description

I want to scrape information from paragraph tags. Each <p> tag also contains some other tags, as the markup below shows.

Here is the HTML page to be scraped:

<div class="thecontent">
<p>Here&rsquo;s the schedule of matches for the weekend.</p>
<p>&nbsp;</p>
<p><strong>Saturday, August 17</strong></p>

<p>Achara vs. Buad, <a href="@">ftv</a>, <a href="https://someothertv">HTlive</a>, <a href="http://www.anothertv target="_blank">Se</a> &mdash;&nbsp;Have enjoy it and celebrate it</p>

<p>pritos vs. baola, <a href="@">ftv</a>, <a href="https://someothertv">HTlive</a>, <a href="http://www.anothertv target="_blank">Se</a> &mdash;&nbsp;Have enjoy it and celebrate it</p>


<p>timpao vs. quadrsa, <a href="@">ftv</a>, <a href="https://someothertv">HTlive</a>, <a href="http://www.anothertv target="_blank">Se</a> &mdash;&nbsp;Have enjoy it and celebrate it</p>

<p><strong>Sunday, August 18</strong></p>



<p>Achara vs. timpao, <a href="@">ftv</a>, <a href="https://someothertv">HTlive</a>, <a href="http://www.anothertv target="_blank">Se</a> &mdash;&nbsp;Have enjoy it and celebrate it</p>

<p>pritos vs. qaudra, <a href="@">ftv</a>, <a href="https://someothertv">HTlive</a>, <a href="http://www.anothertv target="_blank">Se</a> &mdash;&nbsp;Have enjoy it and celebrate it</p>


<p>timpao vs. Buad, <a href="@">ftv</a>, <a href="https://someothertv">HTlive</a>, <a href="http://www.anothertv target="_blank">Se</a> &mdash;&nbsp;Have enjoy it and celebrate it</p>
<p>&nbsp;</p>
<p><strong>Monday, August 19</strong></p>


<p>Achara vs. Buad, <a href="@">ftv</a>, <a href="https://someothertv">HTlive</a>, <a href="http://www.anothertv target="_blank">Se</a> &mdash;&nbsp;Have enjoy it and celebrate it</p>
</p>
<p>&nbsp;</p></div></body></html>

I used the following Python code:

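import bs4
import requests

# Download and parse the page ('https://url' is the placeholder used in the question)
getnwp = requests.get('https://url')
nwpcontent = getnwp.content
sp2 = bs4.BeautifulSoup(nwpcontent, 'html5lib')

# Collect every <p> inside the content div
pta = sp2.find('div', class_='thecontent').find_all('p')

# Print the full text of each paragraph that mentions a match
for p in pta:
    if 'vs' in p.get_text():
        print(p.get_text())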

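From the page above, I want to extract only the matches between teams, the date on which each one takes place, and the short message that follows, like this: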

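Saturday, August 17

Achara vs. timpao, — Have enjoy it and celebrate it

pritos vs. baola, — Have enjoy it and celebrate it

timpao vs. quadrsa — Have enjoy it and celebrate it

Sunday, August 18

Achara vs. timpao, — Have enjoy it and celebrate it

pritos vs. qaudra, — Have enjoy it and celebrate it

timpao vs. Buad — Have enjoy it and celebrate it

Monday, August 19

Achara vs. Buad, — Have enjoy it and celebrate it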

In other words, I do not want the information about the TV broadcasts (that is, the text inside the anchor tags).
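One possible approach, offered as a minimal sketch rather than a definitive answer: remove each <a> element from a paragraph with BeautifulSoup's `decompose()` before reading its text, then tidy up the commas left behind. The names `SAMPLE_HTML` and `extract_schedule` are hypothetical. The sample is a trimmed copy of the page above with the attribute quoting corrected, and it is parsed with the built-in `html.parser` instead of `html5lib`. The sketch assumes that date headings are the paragraphs containing a <strong> tag and that an em dash separates the teams from the message.

```python
from bs4 import BeautifulSoup

DASH = "\u2014"  # the em dash produced by &mdash; on the page

# Trimmed copy of the page from the question (assumed to match the real page's shape).
SAMPLE_HTML = """
<div class="thecontent">
<p>Here&rsquo;s the schedule of matches for the weekend.</p>
<p>&nbsp;</p>
<p><strong>Saturday, August 17</strong></p>
<p>Achara vs. Buad, <a href="@">ftv</a>, <a href="https://someothertv">HTlive</a> &mdash;&nbsp;Have enjoy it and celebrate it</p>
<p><strong>Sunday, August 18</strong></p>
<p>pritos vs. qaudra, <a href="@">ftv</a>, <a href="https://someothertv">HTlive</a> &mdash;&nbsp;Have enjoy it and celebrate it</p>
</div>
"""


def extract_schedule(html):
    """Return date headings and match lines with the <a> link text removed."""
    soup = BeautifulSoup(html, "html.parser")
    content = soup.find("div", class_="thecontent")
    lines = []
    for p in content.find_all("p"):
        if p.find("strong") is not None:
            # Date heading such as "Saturday, August 17".
            lines.append(p.get_text(strip=True))
            continue
        if "vs" not in p.get_text():
            continue  # intro sentence or empty spacer paragraph
        # Drop every anchor tag together with its text.
        for a in p.find_all("a"):
            a.decompose()
        # Turn non-breaking spaces into ordinary spaces.
        text = p.get_text().replace("\xa0", " ")
        # Split teams from the message at the em dash, if there is one.
        teams, sep, message = text.partition(DASH)
        teams = teams.strip(" ,")  # remove the commas that separated the links
        lines.append(f"{teams} {DASH} {message.strip()}" if sep else teams)
    return lines


for line in extract_schedule(SAMPLE_HTML):
    print(line)
```

For a live page, the same function could be called on `requests.get(...).content` in place of `SAMPLE_HTML`. Note that `decompose()` modifies the parsed tree in place, so the links are no longer available in `soup` afterwards.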

Tags: python, web-scraping, beautifulsoup
