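python - How to search for predefined strings and return the whole line if a match is found
Problem description
This snippet partially works, in that it produces some results, but I need help getting it to work fully. I'm searching the page at each URL for the strings in queries, and if a partial match is found, the whole line should be returned.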
from bs4 import BeautifulSoup as bs
import requests
addrlist = ['0xe56842ed550ff2794f010738554db45e60730371',
            '0xe1fd7b4c9debac3c490d8a553c455da4979482e4',
            '0x88c20beda907dbc60c56b71b102a133c1b29b053']
queries = ["Website", "Telegram", "https://www.", "Twitter", "https://t.me"]
url = "https://bscscan.com/address/"
for i in addrlist:
    url = str(url) + str(i)
    r = requests.get(url)
    soup = bs(r.text,'lxml')
    pre = soup.select_one('pre.js-sourcecopyarea.editor')
    ss = (list(pre.stripped_strings)[0]).split('*')
    for s in ss:
        for query in queries:
            if query in s:
                print(s)
Current output:
Website: https://binemon.io
Telegram: https://t.me/binemonchat
Twitter: https://twitter.com/binemonnft
AttributeError: 'NoneType' object has no attribute 'stripped_strings'
Desired output:
Website: https://binemon.io
Telegram: https://t.me/binemonchat
Twitter: https://twitter.com/binemonnft
// Telegram : https://t.me/stackdogebsc
// Website : https://www.stack-doge.com
*Website: www.shibuttinu.com
*Telegram: https://t.me/Shibuttinu
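Solution
The problem is the url variable. It is reassigned on every iteration, so each address in addrlist is appended to the previous URL instead of to the base URL: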
# 1st iteration:
https://bscscan.com/address/0xe56842ed550ff2794f010738554db45e60730371
# 2nd iteration:
https://bscscan.com/address/0xe56842ed550ff2794f010738554db45e607303710xe1fd7b4c9debac3c490d8a553c455da4979482e4
# 3rd iteration:
https://bscscan.com/address/0xe56842ed550ff2794f010738554db45e607303710xe1fd7b4c9debac3c490d8a553c455da4979482e40x88c20beda907dbc60c56b71b102a133c1b29b053
Change your code like this:
# url = "https://bscscan.com/address/"
baseurl = "https://bscscan.com/address/"
# url = str(url) + str(i)
url = str(baseurl) + str(i)
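A minimal offline sketch of the bug and the fix, using the addresses from the question. No request is sent, and the function names buggy_urls and fixed_urls are hypothetical, chosen only for illustration:

```python
BASE_URL = "https://bscscan.com/address/"
ADDRESSES = [
    "0xe56842ed550ff2794f010738554db45e60730371",
    "0xe1fd7b4c9debac3c490d8a553c455da4979482e4",
    "0x88c20beda907dbc60c56b71b102a133c1b29b053",
]


def buggy_urls(addresses):
    """Reproduce the original loop: url is reassigned to itself plus the address."""
    url = BASE_URL
    urls = []
    for addr in addresses:
        url = url + addr  # the string keeps growing on every iteration
        urls.append(url)
    return urls


def fixed_urls(addresses):
    """Build each URL from the unchanged base URL."""
    return [BASE_URL + addr for addr in addresses]


for bad, good in zip(buggy_urls(ADDRESSES), fixed_urls(ADDRESSES)):
    print("buggy:", bad)
    print("fixed:", good)
```

Only the first URL is correct in the buggy version; from the second iteration on, the address is appended to the previous URL, the resulting page has no source-code block, and select_one returns None, which is where the AttributeError comes from.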
Update
Use a regular expression to extract the information.
Full code:
from bs4 import BeautifulSoup as bs
import requests
import re
addrlist = ['0xe56842ed550ff2794f010738554db45e60730371',
            '0xe1fd7b4c9debac3c490d8a553c455da4979482e4',
            '0x88c20beda907dbc60c56b71b102a133c1b29b053']
baseurl = "https://bscscan.com/address/"
pattern = r'(Website|Telegram|Twitter)\s*:\s*([^\s]+)'
for i in addrlist:
    url = str(baseurl) + str(i)
    r = requests.get(url)
    soup = bs(r.text,'lxml')
    pre = soup.select_one('pre.js-sourcecopyarea.editor')
    print(url)
    for match in re.findall(pattern, str(pre)):
        print(f"{match[0]}: {match[1]}")
    print()
Output:
https://bscscan.com/address/0xe56842ed550ff2794f010738554db45e60730371
Website: https://binemon.io
Telegram: https://t.me/binemonchat
Twitter: https://twitter.com/binemonnft
https://bscscan.com/address/0xe1fd7b4c9debac3c490d8a553c455da4979482e4
Telegram: https://t.me/stackdogebsc
Website: https://www.stack-doge.com
https://bscscan.com/address/0x88c20beda907dbc60c56b71b102a133c1b29b053
Website: www.shibuttinu.com
Telegram: https://t.me/Shibuttinu
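To check the pattern without requesting bscscan.com, it can be run against a sample string. The sample below is a hypothetical snippet assembled from the desired-output lines in the question, not the real contract source; it shows that the \s* on both sides of the colon also matches "// Telegram : ..." and that the leading * or // is not part of the match:

```python
import re

pattern = r'(Website|Telegram|Twitter)\s*:\s*([^\s]+)'

# Hypothetical sample text built from the lines quoted in the question
sample = """
 * Website: https://binemon.io
 * Telegram: https://t.me/binemonchat
// Telegram : https://t.me/stackdogebsc
*Website: www.shibuttinu.com
"""

# With two capture groups, findall returns a list of (name, link) tuples
matches = re.findall(pattern, sample)
for name, link in matches:
    print(f"{name}: {link}")
```

Because the full code passes str(pre), the regex sees the raw HTML, and [^\s]+ stops only at whitespace, so a link immediately followed by a tag would pick up the tag as well. str(None) is simply "None", so a missing pre element yields no matches instead of an exception; switching to pre.get_text() would need an explicit None check.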
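If you want the whole matching line, as the title asks, rather than a reformatted name: link pair, a plain line filter is an alternative to the regex. This is a sketch under assumptions: the function name matching_lines and the sample text are hypothetical, and it takes the source text as a string so it can be tried offline. It splits on newlines rather than on '*', so lines that use // comments come back individually, and it returns an empty list when there is nothing to search:

```python
QUERIES = ["Website", "Telegram", "https://www.", "Twitter", "https://t.me"]


def matching_lines(source_text, queries=QUERIES):
    """Return each whole line of source_text that contains any of the queries."""
    if not source_text:
        # Nothing to search, e.g. the <pre> block was not found on the page
        return []
    return [
        line.strip()
        for line in source_text.splitlines()
        if any(query in line for query in queries)  # each line is kept at most once
    ]


# Hypothetical contract source used only for illustration
sample = """pragma solidity ^0.8.0;
// Telegram : https://t.me/stackdogebsc
// Website : https://www.stack-doge.com
contract Example {}
"""
for line in matching_lines(sample):
    print(line)
```

In the scraping loop it could be called as matching_lines(pre.get_text() if pre is not None else None), using BeautifulSoup's get_text() to strip the HTML from the pre element.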