How to change and update source code using BeautifulSoup

Problem description

soup=BeautifulSoup(page,'html.parser')
result=soup.find('div',attrs={'class':'entry-text my-2 px-2 px-sm-4'}).text
print(soup.prettify())

n=list(d.values())[0]
print(n)
result=n
soup.find('div',attrs={'class':'entry-text my-2 px-2 px-sm-4'}).text=result
#print(soup.prettify())

I get the following error:

soup.find('div',attrs={'class':'entry-text my-2 px-2 px-sm-4'}).text=result
     64 #print(soup.prettify())

**AttributeError: can't set attribute**

Basically, I want to make some changes to the page source and then update the URL with the modified source.

Is that possible?

Tags: python, web-scraping, beautifulsoup

Solution


I think you want to do something like this to replace the string element:

soup=BeautifulSoup(page,'html.parser')
result=soup.find('div',attrs={'class':'entry-text my-2 px-2 px-sm-4'}).text
print(soup.prettify())

n=list(d.values())[0]
print(n)
result=n
ele = soup.find('div',attrs={'class':'entry-text my-2 px-2 px-sm-4'}).findChildren()[-1]
ele.string.replace_with(n)
#print(soup.prettify())
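The original error comes from the fact that .text is a read-only property on a tag, while .string can be assigned to. As a minimal sketch (the one-line page below is a made-up stand-in for the real page, and the replacement value "Bar" is arbitrary), the difference looks like this:

```python
from bs4 import BeautifulSoup

# Stand-in markup for illustration only; the real page is not shown in the question.
page = '<div class="entry-text my-2 px-2 px-sm-4">Foo</div>'
soup = BeautifulSoup(page, "html.parser")
div = soup.find("div", attrs={"class": "entry-text my-2 px-2 px-sm-4"})

# .text is a read-only property, so assigning to it raises AttributeError.
try:
    div.text = "Bar"
    text_assignment_failed = False
except AttributeError:
    text_assignment_failed = True

# .string has a setter: it replaces the tag's contents with the new string.
div.string = "Bar"

new_html = str(soup)
print(new_html)
```

Note that assigning to .string replaces everything inside the tag, including any child tags, so it is best suited to a tag that only holds text.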

Example 1:

from bs4 import BeautifulSoup

d = {0:'Bar',2:'Baz',3:'Qux'}


page = """
<div class="entry-text my-2 px-2 px-sm-4">Foo</div>
"""

soup=BeautifulSoup(page,'html.parser')
result=soup.find('div',attrs={'class':'entry-text my-2 px-2 px-sm-4'}).text
print(soup.prettify())

n=list(d.values())[0]
#print(n)
result=n
ele = soup.find('div',attrs={'class':'entry-text my-2 px-2 px-sm-4'})
ele.string.replace_with(n)
print(soup.prettify())

Output:

<div class="entry-text my-2 px-2 px-sm-4">
 Foo
</div>

<div class="entry-text my-2 px-2 px-sm-4">
 Bar
</div>

Example 2:

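This raises an error because ele.string returns None: <article> contains more than one child node (whitespace text plus the <div>), so the .string attribute cannot be resolved.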

from bs4 import BeautifulSoup

d = {0:'Bar',2:'Baz',3:'Qux'}


page = """
<article> <div><p>Hello</p></div> </article> 
"""

soup=BeautifulSoup(page,'html.parser')
result=soup.find('article').text
print(soup.prettify())

n=list(d.values())[0]
#print(n)
result=n
ele = soup.find('article')
ele.string.replace_with(n)
print(soup.prettify())
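To see the failure in isolation, here is a small sketch using the same <article> markup. The variable names are only for illustration; the point is that .string is None whenever a tag has more than one child, and calling replace_with on None is what actually raises.

```python
from bs4 import BeautifulSoup

page = "<article> <div><p>Hello</p></div> </article>"
soup = BeautifulSoup(page, "html.parser")

article = soup.find("article")

# <article> holds two whitespace strings plus a <div>, so .string is None.
article_string = article.string
child_count = len(article.contents)

# Calling replace_with on None is what produces the AttributeError.
try:
    article.string.replace_with("Bar")
    error_message = ""
except AttributeError as exc:
    error_message = str(exc)

print(article_string, child_count, error_message)
```

Checking whether .string is None before calling replace_with is a simple way to avoid the exception.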

But this will work, because ele.string returns 'Hello':

from bs4 import BeautifulSoup

d = {0:'Bar',2:'Baz',3:'Qux'}


page = """
<article> <div><p>Hello</p></div> </article> 
"""

soup=BeautifulSoup(page,'html.parser')
result=soup.find('article').text
print(soup.prettify())

n=list(d.values())[0]
#print(n)
result=n
ele = soup.find('p')
ele.string.replace_with(n)
print(soup.prettify())

Finally, you can use .findChildren() to reach that leaf node.
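A rough sketch of that last step, again with the <article> markup from Example 2: findChildren() is the older name for find_all(), and with no arguments it returns every descendant tag in document order, so the last entry is the innermost tag here. Taking [-1] is an assumption that fits this markup; on a real page you may need a more specific selector.

```python
from bs4 import BeautifulSoup

page = "<article> <div><p>Hello</p></div> </article>"
soup = BeautifulSoup(page, "html.parser")

# All descendant tags of <article> in document order: [<div>, <p>].
descendants = soup.find("article").findChildren()
descendant_names = [tag.name for tag in descendants]

# The last descendant is the innermost <p>, which holds a single string.
leaf = descendants[-1]
leaf.string.replace_with("Bar")

# str(soup) serializes the modified tree back to HTML.
updated_html = str(soup)
print(updated_html)
```

The string from str(soup) can be written to a local file to keep the modified source. BeautifulSoup only edits the parsed copy; pushing the result back to a live URL requires write access to the server that hosts the page.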

