python - See what for loop has checked
Problem description
I don't really know what to call this issue, sorry for the undescriptive title.
My program checks whether an element exists on multiple paths of a website. It builds each URL from a base URL plus a path taken from a JSON file (name.json).
In its current state, the program prints 1 if the element is found and 2 if not. I want it to print the URL instead of 1 or 2. My problem is that the ids are consumed before the final for loop, so when I try to print fullurl there, I only get the last id in my JSON file printed multiple times (because the earlier values aren't kept), instead of each unique URL.
import json
import grequests
from bs4 import BeautifulSoup

idlist = json.loads(open('name.json').read())
baseurl = 'https://steamcommunity.com/id/'
complete_urls = []

for uid in idlist:
    fullurl = baseurl + uid
    complete_urls.append(fullurl)

rs = (grequests.get(fullurl) for fullurl in complete_urls)
resp = grequests.map(rs)

for r in resp:
    soup = BeautifulSoup(r.text, 'lxml')
    if soup.find('span', class_='actual_persona_name'):
        print('1')
    else:
        print('2')
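Solution
fullurl is reassigned on every pass of the first loop, so once that loop finishes it only holds the last URL. Since grequests.map returns responses in the same order as the requests, you can use enumerate to match each response to the full URL it was requested from.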
import json
import grequests
from bs4 import BeautifulSoup

# Load the list of profile ids to check
with open('name.json') as f:
    idlist = json.load(f)

baseurl = 'https://steamcommunity.com/id/'
complete_urls = [baseurl + uid for uid in idlist]

# Send the requests; the responses come back in request order
rs = (grequests.get(url) for url in complete_urls)
resp = grequests.map(rs)

for index, r in enumerate(resp):  # use enumerate to get the index of each response
    print(complete_urls[index])  # use that index to look up the matching URL
    if r is None:  # grequests.map yields None for a request that failed
        print('request failed')
        continue
    soup = BeautifulSoup(r.text, 'lxml')
    if soup.find('span', class_='actual_persona_name'):
        print('1')
    else:
        print('2')
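As a variation on the index lookup above, the URLs and responses can also be paired directly with zip, which avoids indexing altogether. The sketch below is only an illustration under some assumptions: check_profiles and PersonaFinder are hypothetical names, the responses are stand-in objects with a .text attribute rather than real grequests results, and the standard-library HTMLParser replaces BeautifulSoup so the example runs without network access or extra packages.

```python
from html.parser import HTMLParser
from types import SimpleNamespace


class PersonaFinder(HTMLParser):
    """Sets .found once a <span> with class actual_persona_name is seen."""

    def __init__(self):
        super().__init__()
        self.found = False

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "span" and "actual_persona_name" in classes:
            self.found = True


def check_profiles(urls, responses):
    """Pair each URL with its response by position; return (url, found) tuples."""
    results = []
    for url, resp in zip(urls, responses):
        if resp is None:  # a failed request has no response to inspect
            results.append((url, None))
            continue
        finder = PersonaFinder()
        finder.feed(resp.text)
        results.append((url, finder.found))
    return results


# Hypothetical stand-in data; in the real program these would be
# complete_urls and the list returned by grequests.map(rs).
urls = [
    "https://steamcommunity.com/id/alice",
    "https://steamcommunity.com/id/bob",
    "https://steamcommunity.com/id/carol",
]
fake_responses = [
    SimpleNamespace(text='<span class="actual_persona_name">Alice</span>'),
    SimpleNamespace(text="<html><body>No profile here</body></html>"),
    None,
]

for url, found in check_profiles(urls, fake_responses):
    print(url, found)
```

In the original program the same pairing would be written as for url, r in zip(complete_urls, resp), keeping the BeautifulSoup check inside the loop.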