BeautifulSoup soup.select cuts off child tags

Problem description

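When I run my script to retrieve all blockquote tags with the class "FlatParagraph", some of the child tags inside the blockquotes seem to get cut off. Is there a query that includes all child tags? The problem seems to be related to one group of <blockquote><i><a>text</a></i> tags, so it isn't a problem with all of the children.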

I'm using the following code:

from urllib.request import urlopen
from bs4 import BeautifulSoup

# Download the raw HTML of the regulation
fhand = urlopen('https://www.legislation.qld.gov.au/view/whole/html/2018-07-01/sl-2006-0200').read()

# Parse with Python's built-in parser
soup = BeautifulSoup(fhand, 'html.parser')
# Exact attribute match: the whole class string must be "FlatParagraph"
fp = soup.select('blockquote[class="FlatParagraph"]')
for i in fp:
    print(i.text)
    print('---------')

Then I use a for loop to retrieve the text from each block:

# Replace non-breaking spaces (&nbsp;) with regular spaces and store UTF-8 bytes
changedfplist = []
for i in fp:
    changedfplist.append(i.text.replace('\xa0', ' ').encode('utf-8'))
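A slightly more robust version of that cleanup is sketched below; clean_text is a hypothetical helper name, not part of BeautifulSoup. get_text(' ', strip=True) inserts a space between adjacent text nodes, so a list number such as (a) does not run into the following words, and str.split() with no arguments treats the non-breaking space '\xa0' produced by &nbsp; as whitespace, so re-joining the pieces normalizes it as well. This version keeps str values instead of encoding them to UTF-8 bytes, which is usually easier to work with in Python 3.

```python
from bs4 import BeautifulSoup


def clean_text(tag):
    """Return the tag's text with single spaces between pieces (hypothetical helper)."""
    # Separate adjacent text nodes with a space and strip each one
    text = tag.get_text(' ', strip=True)
    # split() with no argument also splits on '\xa0', so this collapses
    # non-breaking spaces and runs of whitespace into single spaces
    return ' '.join(text.split())


# Hypothetical sample modelled on one list item of the page
html = ('<blockquote class="FlatParagraph"><span class="ListNumber">(a)</span>'
        '<a href="#sec.61">section&nbsp;61</a> applies</blockquote>')
soup = BeautifulSoup(html, 'html.parser')
changedfplist = [clean_text(i) for i in soup.select('blockquote[class="FlatParagraph"]')]
print(changedfplist)
```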

Here is a sample of what I'm parsing:

<blockquote class="FlatParagraph"><blockquote class="Paragraph"><span class="ListNumber">(1)</span>This section applies if—<blockquote class="Paragraph List"><span class="ListNumber">(a)</span>before the commencement—<blockquote class="Paragraph List"><span class="ListNumber">(i)</span>a person applied under <a href="#sec.28">section&nbsp;28</a>(1) of the repealed regulation for approval of a proposed fire engineering design brief for stated building work; and</blockquote>
<blockquote class="Paragraph List"><span class="ListNumber">(ii)</span>an authorised representative of the service attended a former fire engineering brief meeting relating to the approval of the proposed fire engineering design brief; and</blockquote>
<blockquote class="Paragraph List"><span class="ListNumber">(iii)</span>the service had not decided whether or not to approve the proposed fire engineering design brief; and</blockquote>
</blockquote>
<blockquote class="Paragraph List"><span class="ListNumber">(b)</span>the person has not paid the former fire engineering design brief meeting fee for the attendance of the representative of the service at the former fire engineering brief meeting.</blockquote>
</blockquote><blockquote class="Paragraph"><span class="ListNumber">(2)</span>For assessing the fire engineering design brief for the stated building work—<blockquote class="Paragraph List"><span class="ListNumber">(a)</span><a href="#sec.61">section&nbsp;61</a> applies as if the reference to a fire engineering brief were a reference to the proposed fire engineering design brief; and</blockquote>
<blockquote class="Paragraph List"><span class="ListNumber">(b)</span><a href="#sec.62">section&nbsp;62</a>(1)(d) applies as if the reference to each fire engineering brief meeting included a reference to each former fire engineering brief meeting; and</blockquote>
<blockquote class="Paragraph List"><span class="ListNumber">(c)</span><a href="#sch.2">schedule&nbsp;2</a>, <a href="#sch.2-pt.3">part&nbsp;3</a>, item 3 applies as if a reference to a meeting included a reference to a former fire engineering brief meeting.</blockquote>
</blockquote><blockquote class="Paragraph"><span class="ListNumber">(3)</span>In this section—<blockquote class="Paragraph-No-Number"><b><i><a name="sec.90-ssec.3-def.formerfireengineeringbriefmeeting"></a>former fire engineering brief meeting</i></b> means a fire engineering brief meeting under <a href="#sec.28">section&nbsp;28</a>(2)(d) of the repealed regulation.</blockquote><blockquote class="Paragraph-No-Number"><b><i><a name="sec.90-ssec.3-def.formerfireengineeringdesignbriefmeetingfee"></a>former fire engineering design brief meeting fee</i></b> means the fire engineering design brief meeting fee stated in <a href="#sch.3">schedule&nbsp;3</a> of the repealed regulation.</blockquote></blockquote></blockquote>

When I parse this, I get:

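(1)This section applies if—(a)before the commencement—(i)a person applied under section 28(1) of the repealed regulation for approval of a proposed fire engineering design brief for stated building work; and

(ii)an authorised representative of the service attended a former fire engineering brief meeting relating to the approval of the proposed fire engineering design brief; and

(iii)the service had not decided whether or not to approve the proposed fire engineering design brief; and

(b)the person has not paid the former fire engineering design brief meeting fee for the attendance of the representative of the service at the former fire engineering brief meeting.

(2)For assessing the fire engineering design brief for the stated building work—(a)section 61 applies as if the reference to a fire engineering brief were a reference to the proposed fire engineering design brief; and

(b)section 62(1)(d) applies as if the reference to each fire engineering brief meeting included a reference to each former fire engineering brief meeting; and

(c)schedule 2, part 3, item 3 applies as if a reference to a meeting included a reference to a former fire engineering brief meeting.

(3)In this section—former fire engineering brief meeting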

The text at the end of the last line is missing. It gets cut off right after this part:

<blockquote class="Paragraph-No-Number"><b><i><a name="sec.90-ssec.3-def.formerfireengineeringbriefmeeting"></a>former fire engineering brief meeting</i></b> 
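For reference, the selector itself should not drop nested children: select() returns whole Tag objects, and .text concatenates every descendant text node, including text inside <b><i><a>. The minimal sketch below checks this on a hypothetical snippet modelled on the definition paragraph above; if text still goes missing on the real page, that points at how the parser built the tree rather than at the query.

```python
from bs4 import BeautifulSoup

# Hypothetical snippet modelled on the definition paragraphs of the page
snippet = (
    '<blockquote class="FlatParagraph">'
    '<blockquote class="Paragraph-No-Number"><b><i><a name="def"></a>'
    'former meeting</i></b> means a meeting under section 28.</blockquote>'
    '</blockquote>'
)

soup = BeautifulSoup(snippet, 'html.parser')
tag = soup.select('blockquote[class="FlatParagraph"]')[0]

# Every text node below the matched tag, in document order
strings = list(tag.strings)
# .text joins those same text nodes with no separator
full_text = tag.text

print(strings)
print(full_text)
```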

Update: I'm trying to exclude one class, which is why using .FlatParagraph didn't work. I want to avoid class="FlatParagraph view-history-note", which is the class used on some child tags of the FlatParagraph tags.
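If the goal is to keep the FlatParagraph blockquotes while skipping those that also carry view-history-note, CSS can express that with :not(); BeautifulSoup's select() is backed by the Soup Sieve library, which supports it. Because the notes are nested inside the FlatParagraph blockquotes, though, leaving them out of the selection does not remove their text from the parent's .text, so the sketch also removes them from the tree with decompose() first. The markup is hypothetical and assumes the notes are blockquotes too.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: a paragraph containing a nested history note
html = '''
<blockquote class="FlatParagraph">
  <span class="ListNumber">(1)</span>This section applies.
  <blockquote class="FlatParagraph view-history-note">Amended 2018.</blockquote>
</blockquote>
'''
soup = BeautifulSoup(html, 'html.parser')

# Tags with the FlatParagraph class but without the view-history-note class
selected = soup.select('blockquote.FlatParagraph:not(.view-history-note)')
# The exact attribute match compares the full class string, so it skips the note too
exact = soup.select('blockquote[class="FlatParagraph"]')

# The note is still a descendant of the selected paragraph, so remove it
# from the tree before extracting text
for note in soup.select('.view-history-note'):
    note.decompose()

texts = [' '.join(tag.get_text(' ', strip=True).split()) for tag in selected]
print(len(selected), len(exact), texts)
```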

I've tried the code above with both lxml and html.parser: lxml gives me all of the text, while html.parser truncates it. If anyone knows why, I'd love to hear it!
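One way to narrow down a parser difference like this is to parse the same bytes with each installed parser and compare how much text each one keeps for the same selector. The sketch below is a diagnostic aid under that assumption, not an explanation of the html.parser behaviour; text_lengths is a hypothetical helper, and parsers that are not installed (lxml, html5lib) are skipped by catching bs4's FeatureNotFound.

```python
from bs4 import BeautifulSoup, FeatureNotFound


def text_lengths(markup, selector='blockquote[class="FlatParagraph"]',
                 parsers=('html.parser', 'lxml', 'html5lib')):
    """Map each available parser to the total text length of the selected tags."""
    results = {}
    for parser in parsers:
        try:
            soup = BeautifulSoup(markup, parser)
        except FeatureNotFound:
            # Parser library not installed; skip it
            continue
        results[parser] = sum(len(tag.get_text()) for tag in soup.select(selector))
    return results


sample = '<blockquote class="FlatParagraph">(1)<b><i>defined term</i></b> means x.</blockquote>'
print(text_lengths(sample))
```

For the real page, passing the downloaded bytes (fhand in the question's code) instead of sample shows directly whether the parsers disagree, and by how much.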

Tags: python, python-3.x, beautifulsoup

Solution


You can use either select() or find(). With the code below, which parses with lxml, I get the full text:

from bs4 import BeautifulSoup

html = '''
<blockquote class="FlatParagraph"><blockquote class="Paragraph"><span class="ListNumber">(1)</span>This section applies if—<blockquote class="Paragraph List"><span class="ListNumber">(a)</span>before the commencement—<blockquote class="Paragraph List"><span class="ListNumber">(i)</span>a person applied under <a href="#sec.28">section&nbsp;28</a>(1) of the repealed regulation for approval of a proposed fire engineering design brief for stated building work; and</blockquote>
<blockquote class="Paragraph List"><span class="ListNumber">(ii)</span>an authorised representative of the service attended a former fire engineering brief meeting relating to the approval of the proposed fire engineering design brief; and</blockquote>
<blockquote class="Paragraph List"><span class="ListNumber">(iii)</span>the service had not decided whether or not to approve the proposed fire engineering design brief; and</blockquote>
</blockquote>
<blockquote class="Paragraph List"><span class="ListNumber">(b)</span>the person has not paid the former fire engineering design brief meeting fee for the attendance of the representative of the service at the former fire engineering brief meeting.</blockquote>
</blockquote><blockquote class="Paragraph"><span class="ListNumber">(2)</span>For assessing the fire engineering design brief for the stated building work—<blockquote class="Paragraph List"><span class="ListNumber">(a)</span><a href="#sec.61">section&nbsp;61</a> applies as if the reference to a fire engineering brief were a reference to the proposed fire engineering design brief; and</blockquote>
<blockquote class="Paragraph List"><span class="ListNumber">(b)</span><a href="#sec.62">section&nbsp;62</a>(1)(d) applies as if the reference to each fire engineering brief meeting included a reference to each former fire engineering brief meeting; and</blockquote>
<blockquote class="Paragraph List"><span class="ListNumber">(c)</span><a href="#sch.2">schedule&nbsp;2</a>, <a href="#sch.2-pt.3">part&nbsp;3</a>, item 3 applies as if a reference to a meeting included a reference to a former fire engineering brief meeting.</blockquote>
</blockquote><blockquote class="Paragraph"><span class="ListNumber">(3)</span>In this section—<blockquote class="Paragraph-No-Number"><b><i><a name="sec.90-ssec.3-def.formerfireengineeringbriefmeeting"></a>former fire engineering brief meeting</i></b> means a fire engineering brief meeting under <a href="#sec.28">section&nbsp;28</a>(2)(d) of the repealed regulation.</blockquote><blockquote class="Paragraph-No-Number"><b><i><a name="sec.90-ssec.3-def.formerfireengineeringdesignbriefmeetingfee"></a>former fire engineering design brief meeting fee</i></b> means the fire engineering design brief meeting fee stated in <a href="#sch.3">schedule&nbsp;3</a> of the repealed regulation.</blockquote></blockquote></blockquote>
'''
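# Parse with lxml rather than html.parser
soup = BeautifulSoup(html, 'lxml')
# Class selector: matches any tag whose class list contains FlatParagraph
fp = soup.select('.FlatParagraph')
for i in fp:
    print(i.text)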

Or:

# find() returns only the first matching blockquote
fp = soup.find('blockquote', attrs={'class': 'FlatParagraph'})
print(fp.text)
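Note that the two snippets are not equivalent on a full page: find() returns only the first matching tag (or None), while select() and find_all() return a list of every match. Like .FlatParagraph, matching class this way also accepts tags that carry additional classes such as view-history-note. A minimal sketch with hypothetical markup:

```python
from bs4 import BeautifulSoup

# Hypothetical page with two FlatParagraph blockquotes
html = ('<blockquote class="FlatParagraph">first</blockquote>'
        '<blockquote class="FlatParagraph">second</blockquote>')
soup = BeautifulSoup(html, 'html.parser')

first = soup.find('blockquote', attrs={'class': 'FlatParagraph'})      # first match only
every = soup.find_all('blockquote', attrs={'class': 'FlatParagraph'})  # list of all matches
via_css = soup.select('blockquote.FlatParagraph')                      # same tags via CSS

print(first.text, [t.text for t in every], [t.text for t in via_css])
```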

Output:

(1)This section applies if—(a)before the commencement—(i)a person applied under section 28(1) of the repealed regulation for approval of a proposed fire engineering design brief for stated building work; and
(ii)an authorised representative of the service attended a former fire engineering brief meeting relating to the approval of the proposed fire engineering design brief; and
(iii)the service had not decided whether or not to approve the proposed fire engineering design brief; and

(b)the person has not paid the former fire engineering design brief meeting fee for the attendance of the representative of the service at the former fire engineering brief meeting.
(2)For assessing the fire engineering design brief for the stated building work—(a)section 61 applies as if the reference to a fire engineering brief were a reference to the proposed fire engineering design brief; and
(b)section 62(1)(d) applies as if the reference to each fire engineering brief meeting included a reference to each former fire engineering brief meeting; and
(c)schedule 2, part 3, item 3 applies as if a reference to a meeting included a reference to a former fire engineering brief meeting.
(3)In this section—former fire engineering brief meeting means a fire engineering brief meeting under section 28(2)(d) of the repealed regulation.former fire engineering design brief meeting fee means the fire engineering design brief meeting fee stated in schedule 3 of the repealed regulation.
