Python bs4: getting stray text

Problem description

    <li><a class="atc-group atc-group-active" href="" data-url="/atc-kodlari/1">
                                    <i class="fa fa-lg fa-pulse fa-spinner atc-group-loading" style="margin-right: 5px; display: none;"></i>

                                    <span class="lists-rundown-no">(16)</span>
                                </a>
    <i class="fa fa-lg fa-pulse fa-spinner atc-group-loading" style="margin-right: 5px; display: none;"></i>




<span class="lists-rundown-no">(16)</span>
<a class="atc-group atc-group-active" href="" data-url="/atc-kodlari/1">
                                    <i class="fa fa-lg fa-pulse fa-spinner atc-group-loading" style="margin-right: 5px; display: none;"></i>
                                    HERE!!
                                    <span class="lists-rundown-no">(16)</span>
                                </a></li>

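I need to get the part marked HERE!! using Beautiful Soup in Python, but it is stray text, so it has no selector or anything else to target. Is it possible to get it?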

This is what I tried:

import requests
from bs4 import BeautifulSoup

r = requests.get('anywebsite')
source = BeautifulSoup(r.content,"lxml")

for child in source.select("#atc-wrapper > ul"):
    for child2 in child.findChildren():
        print(child2)

Tags: python, html, beautifulsoup

Solution


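You can use the CSS selector a:last-of-type i to select the <i> element inside the last <a> element. Then call find_next() on it with text=True to get the text node that follows it (recent Beautiful Soup releases prefer the equivalent string=True).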

data = '''    <li><a class="atc-group atc-group-active" href="" data-url="/atc-kodlari/1">
                                    <i class="fa fa-lg fa-pulse fa-spinner atc-group-loading" style="margin-right: 5px; display: none;"></i>
                                    A - Gastrointestinal kanal ve metabolizma
                                    <span class="lists-rundown-no">(16)</span>
                                </a>
    <i class="fa fa-lg fa-pulse fa-spinner atc-group-loading" style="margin-right: 5px; display: none;"></i>


                                    A - Gastrointestinal kanal ve metabolizma

<span class="lists-rundown-no">(16)</span>
<a class="atc-group atc-group-active" href="" data-url="/atc-kodlari/1">
                                    <i class="fa fa-lg fa-pulse fa-spinner atc-group-loading" style="margin-right: 5px; display: none;"></i>
                                    HERE!!
                                    <span class="lists-rundown-no">(16)</span>
                                </a></li>'''

from bs4 import BeautifulSoup

soup = BeautifulSoup(data, 'lxml')

# select last i
i = soup.select_one('a:last-of-type i')

# select next text
print(i.find_next(text=True).strip())

Prints:

HERE!!
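
As an alternative sketch, not part of the original answer: if the order of the tags inside the <a> is not guaranteed, you could collect only the text nodes that are direct children of the <a>, skipping anything nested in <i> or <span>. The trimmed-down HTML, the a.atc-group selector, and the names link and own_text are assumptions made for illustration; html.parser is used here only to avoid the lxml dependency.

```python
from bs4 import BeautifulSoup, NavigableString

# Trimmed-down version of the markup from the question (illustrative only).
html = """<li><a class="atc-group" href="" data-url="/atc-kodlari/1">
    <i class="fa atc-group-loading" style="display: none;"></i>
    HERE!!
    <span class="lists-rundown-no">(16)</span>
</a></li>"""

soup = BeautifulSoup(html, "html.parser")
link = soup.select_one("a.atc-group")

# Keep only the text nodes that are direct children of <a>;
# text inside <i> or <span> and whitespace-only nodes are skipped.
own_text = [
    node.strip()
    for node in link.children
    if isinstance(node, NavigableString) and node.strip()
]

print(own_text)  # ['HERE!!']
```

This returns a list because an element may contain several stray text nodes; take own_text[0] if you expect exactly one.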

Further reading:

CSS Selectors Reference

