python-3.x - BeautifulSoup: extracting text from a list of results
Problem description
Data =
<div class="dojoxGridView" id="dojox_grid__View_1" role="presentation" style="width: 1900px; height: 721px; left: 1px; top: 0px;" widgetid="dojox_grid__View_1">
<input class="dojoxGridHiddenFocus" dojoattachpoint="hiddenFocusNode" role="presentation" type="checkbox"/>
<input class="dojoxGridHiddenFocus" role="presentation" type="checkbox"/>
<div class="dojoxGridScrollbox" dojoattachpoint="scrollboxNode" role="presentation" style="height: 721px;">
<div class="dojoxGridContent" dojoattachpoint="contentNode" hidefocus="hidefocus" role="presentation" style="height: 504px; width: 1900px;">
<div role="presentation" style="position: absolute; left: 0px; top: 0px;">
<div aria-selected="false" class="dojoxGridRow" role="row" style="">
<table border="0" cellpadding="0" cellspacing="0" class="dojoxGridRowTable" role="presentation" style="width: 1900px;">
<tbody>
<tr>
<td class="dojoxGridCell" idx="0" role="gridcell" style="display:none;width:100px;" tabindex="-1">
78126
</td>
<td class="dojoxGridCell" idx="1" role="gridcell" style="width:10%;" tabindex="-1">
Approved Plan
</td>
<td class="dojoxGridCell" idx="2" role="gridcell" style="width:10%;" tabindex="-1">
G-10
</td>
<td class="dojoxGridCell" idx="3" role="gridcell" style="width:40%;" tabindex="-1">
ROOF PLAN
</td>
</tr>
</tbody>
</table>
</div>
Input =
source = driver.page_source
soup = BeautifulSoup(source, "lxml")
print(soup.prettify())
for article in soup.find_all('div', class_='dojoxGridContent'):
    drawing_no = article.find_all('td', class_='dojoxGridCell', idx='3')
    # -> need one more line to extract text
    print(drawing_no)
Output =
<td class="dojoxGridCell" idx="3" role="gridcell" style="width:40%;" tabindex="-1">ROOF PLAN</td> ...
I only want to extract "ROOF PLAN". How should I edit my code? I tried drawing_no.text and drawing_no.value, but it says "has no attribute". Thanks for your help!
Solution
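Try the following code. find_all returns a ResultSet (a list of tags), and a list has no .text attribute; find returns a single tag, so get_text() can be called on it directly.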
source="""<div class="dojoxGridView" id="dojox_grid__View_1" role="presentation" style="width: 1900px; height: 721px; left: 1px; top: 0px;" widgetid="dojox_grid__View_1">
<input class="dojoxGridHiddenFocus" dojoattachpoint="hiddenFocusNode" role="presentation" type="checkbox"/>
<input class="dojoxGridHiddenFocus" role="presentation" type="checkbox"/>
<div class="dojoxGridScrollbox" dojoattachpoint="scrollboxNode" role="presentation" style="height: 721px;">
<div class="dojoxGridContent" dojoattachpoint="contentNode" hidefocus="hidefocus" role="presentation" style="height: 504px; width: 1900px;">
<div role="presentation" style="position: absolute; left: 0px; top: 0px;">
<div aria-selected="false" class="dojoxGridRow" role="row" style="">
<table border="0" cellpadding="0" cellspacing="0" class="dojoxGridRowTable" role="presentation" style="width: 1900px;">
<tbody>
<tr>
<td class="dojoxGridCell" idx="0" role="gridcell" style="display:none;width:100px;" tabindex="-1">
78126
</td>
<td class="dojoxGridCell" idx="1" role="gridcell" style="width:10%;" tabindex="-1">
Approved Plan
</td>
<td class="dojoxGridCell" idx="2" role="gridcell" style="width:10%;" tabindex="-1">
G-10
</td>
<td class="dojoxGridCell" idx="3" role="gridcell" style="width:40%;" tabindex="-1">
ROOF PLAN
</td>
</tr>
</tbody>
</table>
</div>"""
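from bs4 import BeautifulSoup

soup = BeautifulSoup(source, "html.parser")
for article in soup.find_all('div', class_='dojoxGridContent'):
    # find returns the first matching tag (or None), not a list
    drawing_no = article.find('td', class_='dojoxGridCell', idx='3')
    if drawing_no:
        # strip=True drops the newlines surrounding the cell text
        print(drawing_no.get_text(strip=True))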
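If the grid has more than one row, find only gives you the first matching cell. A minimal sketch of keeping find_all and reading the text from each item instead is shown here. The trimmed-down HTML and the second row ("G-11", "FLOOR PLAN") are made up for illustration and are not from the original page.

```python
from bs4 import BeautifulSoup

# Simplified stand-in for the grid markup; the second row is hypothetical.
html = """
<div class="dojoxGridContent">
  <table><tbody>
    <tr>
      <td class="dojoxGridCell" idx="2">G-10</td>
      <td class="dojoxGridCell" idx="3">
        ROOF PLAN
      </td>
    </tr>
    <tr>
      <td class="dojoxGridCell" idx="2">G-11</td>
      <td class="dojoxGridCell" idx="3">FLOOR PLAN</td>
    </tr>
  </tbody></table>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# find_all returns a ResultSet (a list of Tag objects),
# so get_text has to be called on each item, not on the list itself.
cells = soup.find_all("td", class_="dojoxGridCell", idx="3")
titles = [cell.get_text(strip=True) for cell in cells]

print(titles)  # ['ROOF PLAN', 'FLOOR PLAN']
```

With the real page you would pass driver.page_source instead of the html string; whether every row exposes the title under idx="3" is an assumption based on the snippet in the question.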