Beautiful Soup does not work more than once in a for loop

Problem description

I am trying to run a for loop to pull more information out of a result set, but it breaks after the first iteration.

from bs4 import BeautifulSoup

classid = ["ccid_10132191", "ccid_10132192", 'ccid_9829339', 'ccid_9829337']

with open('output.html') as fp:
    soup = BeautifulSoup(fp, 'html.parser')
    for classes in classid:
        grade = str(soup.find_all(id = '' + classes))
        soup = BeautifulSoup(grade, 'html.parser')
        print(soup.find(class_='bold').contents)

When I run it, I get the desired result on the first pass, and then grade = str(soup.find_all(id = '' + classes)) stops working and returns "[]".

str(soup.find_all(id = '' + classes)) finds blocks of HTML all similar to:

<tr class="center" id="ccid_10132191">
<td>1(A)</td>
<td>&nbsp;</td><td>&nbsp;</td><td>&nbsp;</td><td>&nbsp;</td><td>&nbsp;</td><td>&nbsp;</td><td>&nbsp;</td><td>&nbsp;</td><td>&nbsp;</td><td>&nbsp;</td>
<td align="left">CERAMICS BEGINNING&nbsp;<br><a href="teacherinfo.html?frn=0019424&nolink=true" title="Details about" class="button mini dialogM"><em class="ui-icon ui-icon-white ui-icon-contact"></em></a>&nbsp;<a href="mailto:">Email</a>&nbsp;-&nbsp;Rm: 156</td>
<td class="colorMyGrade"><a href="scores.html?frn=004615841&begdate=09/01/2024&enddate=11/14/2026&fg=Q1&schoolid=" class="bold">A<br>100</a></td>
<td class="colorMyGrade"><a href="scores.html?frn=0046041&fg=Q2&schoolid=">[ i ]</a></td>
<td class="colorMyGrade"><a href="scores.html?frn=004615041&begdate=09/01/2021&enddate=01/26/2022&fg=S1&schoolid=" class="bold">A<br>100</a></td>
<td class="notInSession">&nbsp;<span class="screen_readers_only">Not available</span></td><td class="notInSession">&nbsp;<span class="screen_readers_only">Not available</span></td><td class="notInSession">&nbsp;<span class="screen_readers_only">Not available</span></td><td>0</td>
<td>0</td>
</tr>

The only differences between the blocks are the id and the text content.

Tags: python, for-loop, beautifulsoup

Solution


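As @BoarGules said, this happens because soup is reassigned inside the for loop. After the first iteration, soup holds only the HTML of the first matching row, so searches for the remaining ids return an empty list.

The solution is the following code: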
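The effect can be reproduced in isolation. The following is only a minimal sketch: the two-row HTML string and the ids row_1 and row_2 are made up for illustration and stand in for output.html.

```python
from bs4 import BeautifulSoup

# Hypothetical two-row document standing in for output.html.
html = """
<table>
<tr id="row_1"><td><a class="bold">A</a></td></tr>
<tr id="row_2"><td><a class="bold">B</a></td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
first = str(soup.find_all(id="row_1"))

# Rebinding soup: from here on it holds only the markup for row_1.
soup = BeautifulSoup(first, "html.parser")

# row_2 was never part of the re-parsed fragment, so nothing matches.
second = soup.find_all(id="row_2")
print(second)
```

The second lookup comes back empty because the name soup no longer refers to the full document.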

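from bs4 import BeautifulSoup

classid = ["ccid_10132191", "ccid_10132192", 'ccid_9829339', 'ccid_9829337']
classhtml = []

with open('output.html') as fp:
    soup = BeautifulSoup(fp, 'html.parser')
    # Collect the HTML for every id first; soup keeps pointing at the full document.
    for classes in classid:
        grade = str(soup.find_all(id = '' + classes))
        classhtml.append(grade)

# Parse each collected fragment separately, after the searches are done.
for classs in classhtml:
    soup = BeautifulSoup(classs, 'html.parser')
    print(soup.find(class_='bold').contents)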
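A possible simplification, not part of the original answer, is to skip the round trip through str() and a second parse altogether: the Tag returned by soup.find() can itself be searched. The sketch below assumes each id occurs at most once; the helper name bold_contents, the sample markup, and the ids ccid_1 to ccid_3 are hypothetical and only for illustration.

```python
from bs4 import BeautifulSoup


def bold_contents(html, row_ids):
    """Map each row id to the contents of its first class="bold" element.

    Hypothetical helper: rows or links that are missing map to None.
    """
    soup = BeautifulSoup(html, "html.parser")
    results = {}
    for row_id in row_ids:
        # Search the full document for the row; soup is never rebound.
        row = soup.find(id=row_id)
        # Search inside the row itself instead of re-parsing its string form.
        link = row.find(class_="bold") if row is not None else None
        results[row_id] = link.contents if link is not None else None
    return results


# Made-up sample shaped like the rows in the question.
sample = """
<table>
<tr id="ccid_1"><td><a class="bold">A<br>100</a></td></tr>
<tr id="ccid_2"><td><a>[ i ]</a></td></tr>
</table>
"""

grades = bold_contents(sample, ["ccid_1", "ccid_2", "ccid_3"])
print(grades)
```

Returning None for rows without a class="bold" link also avoids the AttributeError that soup.find(class_='bold').contents would raise when nothing matches.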
