How to efficiently extract the most inner content inside this class?

Problem description

I want to replace the value of href with the inner text of each element with the class lienarticle in the following HTML:

<a class="lienarticle" href="/dictionnaires/francais/aimer/1925">mono</a>
<a class="lienarticle" href="/dictionnaires/francais/aimer/1925"><i>aimer</i></a>
<a class="lienarticle" href="/dictionnaires/francais/aimer/1925"><b>you</b></a>

My current approach is rudimentary:

from bs4 import BeautifulSoup

text = '''
<a class="lienarticle" href="/dictionnaires/francais/aimer/1925">mono</a>
<a class="lienarticle" href="/dictionnaires/francais/aimer/1925"><i>aimer</i></a>
<a class="lienarticle" href="/dictionnaires/francais/aimer/1925"><b>you</b></a>
'''

soup = BeautifulSoup(text, 'html.parser')

for a in soup.select('.lienarticle'):
    a['href'] = 'entry://' + str(a.contents[0]).replace('<b>', '').replace('</b>', '').replace('<i>', '').replace('</i>', '')

The desired result is

<a class="lienarticle" href="entry://mono">mono</a>
<a class="lienarticle" href="entry://aimer"><i>aimer</i></a>
<a class="lienarticle" href="entry://you"><b>you</b></a>

I would like to ask for a more efficient way to do this, rather than chaining string replacements as I do now. Thank you so much!

Tags: python-3.x, beautifulsoup

Solution


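Here is one approach, using the .text attribute, which returns the text of a tag and all of its descendants with the markup removed.

Example: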

from bs4 import BeautifulSoup

text = '''
<a class="lienarticle" href="/dictionnaires/francais/aimer/1925">mono</a>
<a class="lienarticle" href="/dictionnaires/francais/aimer/1925"><i>aimer</i></a>
<a class="lienarticle" href="/dictionnaires/francais/aimer/1925"><b>you</b></a>
'''

soup = BeautifulSoup(text, 'html.parser')

for a in soup.select('.lienarticle'):
    a['href'] = f'entry://{a.text}'
    print(a)

Output:

<a class="lienarticle" href="entry://mono">mono</a>
<a class="lienarticle" href="entry://aimer"><i>aimer</i></a>
<a class="lienarticle" href="entry://you"><b>you</b></a>
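
If the real markup contains whitespace or line breaks around the inner tags, .text would carry them into the href. A minimal sketch of a variant of the loop above, using get_text(strip=True) to trim each text fragment first; the helper name rewrite_hrefs, its prefix parameter, and the sample string are illustrative assumptions, not part of the original answer:

```python
from bs4 import BeautifulSoup


def rewrite_hrefs(html, prefix='entry://'):
    """Hypothetical helper: set each .lienarticle href to prefix + inner text."""
    soup = BeautifulSoup(html, 'html.parser')
    for a in soup.select('a.lienarticle'):
        # get_text(strip=True) joins all descendant strings, trimming each
        # one and dropping fragments that are only whitespace.
        a['href'] = prefix + a.get_text(strip=True)
    return str(soup)


# Assumed sample with stray whitespace around the inner tag.
sample = '<a class="lienarticle" href="/x"> <i>aimer</i>\n</a>'
result = rewrite_hrefs(sample)
print(result)
```

Only the href value changes; the content inside the tag, including the <i> element, is left untouched.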
