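python - How to search a URL for a string and return the entire line if a match is found
Problem description
This snippet partly works, but it also produces redundant output. I need help getting it to work completely. I am searching a page for a set of strings, and if a partial or full match is found, the entire line should be returned.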
from bs4 import BeautifulSoup as bs
import requests
addrlist = ['0xe56842ed550ff2794f010738554db45e60730371',
            '0xe1fd7b4c9debac3c490d8a553c455da4979482e4',
            '0x88c20beda907dbc60c56b71b102a133c1b29b053']
queries = ["Website", "Telegram", "https://www.", "Twitter", "https://t.me"]
baseurl = "https://bscscan.com/address/"
for i in addrlist:
    url = str(baseurl) + str(i)
    r = requests.get(url)
    soup = bs(r.text,'lxml')
    pre = soup.select_one('pre.js-sourcecopyarea.editor')
    ss = (list(pre.stripped_strings)[0]).split('*')
    for s in ss:
        for query in queries:
            if query in s:
                print(s)
Current output:
Website: https://binemon.io #output repeated 4x in actual run
Telegram: https://t.me/binemonchat
Twitter: https://twitter.com/binemonnft
// SPDX-License-Identifier: UNLICENSED #output repeated 4x in actual run
// IERC20.sol
Website: www.shibuttinu.com #output repeated 1x only
Telegram: https://t.me/Shibuttinu
Desired output:
Website: https://binemon.io
Telegram: https://t.me/binemonchat
Twitter: https://twitter.com/binemonnft
// Telegram : https://t.me/stackdogebsc
// Website : https://www.stack-doge.com
*Website: www.shibuttinu.com
*Telegram: https://t.me/Shibuttinu
Solution
You can use a regular expression that matches each query term together with the rest of its line:
import re
import requests
from bs4 import BeautifulSoup as bs
addrlist = [
    "0xe56842ed550ff2794f010738554db45e60730371",
    "0xe1fd7b4c9debac3c490d8a553c455da4979482e4",
    "0x88c20beda907dbc60c56b71b102a133c1b29b053",
]
queries = ["Website", "Telegram", "https://www.", "Twitter", "https://t.me"]
baseurl = "https://bscscan.com/address/"
# One alternation: each escaped query followed by the rest of its line.
r_pat = re.compile("|".join("{}.*".format(re.escape(q)) for q in queries))
for i in addrlist:
    url = str(baseurl) + str(i)
    r = requests.get(url)
    soup = bs(r.text, "lxml")
    # The <pre> element that holds the contract source code.
    pre = soup.select_one("pre.js-sourcecopyarea.editor")
    print(url)
    print()
    # findall returns non-overlapping matches, so each line is printed once.
    for m in r_pat.findall(pre.string):
        print(m.strip())
    print("-" * 80)
Prints:
https://bscscan.com/address/0xe56842ed550ff2794f010738554db45e60730371
Website: https://binemon.io
Telegram: https://t.me/binemonchat
Twitter: https://twitter.com/binemonnft
--------------------------------------------------------------------------------
https://bscscan.com/address/0xe1fd7b4c9debac3c490d8a553c455da4979482e4
Telegram : https://t.me/stackdogebsc
Website : https://www.stack-doge.com
--------------------------------------------------------------------------------
https://bscscan.com/address/0x88c20beda907dbc60c56b71b102a133c1b29b053
Website: www.shibuttinu.com
Telegram: https://t.me/Shibuttinu
--------------------------------------------------------------------------------
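The regex above returns the text from the matched term to the end of the line, so leading characters such as // or * are dropped. If the complete line is wanted, as in the desired output, one option is to test each line once with any(). The nested loop in the question printed a chunk once for every query it contained, whereas any() stops at the first hit, so each line appears only once. The following is a minimal sketch that runs offline: SAMPLE is a made-up stand-in for the contract source on the page, and matching_lines and regex_matches are hypothetical helper names, not part of the original answer.

```python
import re

QUERIES = ["Website", "Telegram", "https://www.", "Twitter", "https://t.me"]

# Hypothetical sample standing in for the text of the <pre> element.
SAMPLE = """/**
 *Website: www.shibuttinu.com
 *Telegram: https://t.me/Shibuttinu
 */
// SPDX-License-Identifier: UNLICENSED
pragma solidity ^0.8.0;
"""


def matching_lines(text, queries=QUERIES):
    """Return every line that contains at least one query, once, in order."""
    return [
        line.strip()
        for line in text.splitlines()
        if any(q in line for q in queries)
    ]


def regex_matches(text, queries=QUERIES):
    """Return the text from the first query hit to the end of each line."""
    pattern = re.compile("|".join("{}.*".format(re.escape(q)) for q in queries))
    return [m.strip() for m in pattern.findall(text)]


if __name__ == "__main__":
    # Whole lines, including the leading "*".
    for line in matching_lines(SAMPLE):
        print(line)
    print("-" * 40)
    # Same lines as matched by the regex, without the leading "*".
    for match in regex_matches(SAMPLE):
        print(match)
```

In the scraping loop, matching_lines(pre.get_text()) could stand in for the r_pat.findall(pre.string) call, assuming the selector finds the element on the page.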