python - Tracking progress through a namedtuple CSV loop in Python
Question
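Using collections.namedtuple, the following Python code processes records in a database via a CSV file of identifiers (integers in a column named ContentItemId). An example record is https://api.aucklandmuseum.com/id/library/ephemera/21291.
The aim is to check the HTTP status for a given id and write it to disk: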
import requests
from collections import namedtuple
import csv

with open('in.csv', mode='r') as f:
    reader = csv.reader(f)
    all_records = namedtuple('rec', next(reader))
    records = [all_records._make(row) for row in reader]

# Create output file
with open('out.csv', mode='w+') as o:
    w = csv.writer(o)
    w.writerow(["ContentItemId","code"])
    count = 1
    for r in records:
        id = r.ContentItemId
        url = "https://api.aucklandmuseum.com/id/library/ephemera/" + id
        req = requests.get(url, allow_redirects=False)
        code = req.status_code
        w.writerow([id, code])
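How can I print the code's progress to the console from the latter loop (ideally at the 25%, 50% and 75% marks)? Also, if I add an unindented print("Complete") at the bottom, will that line be reached?
Thanks in advance.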
Edit: Thanks for all the help. My (working!) code now looks like this:
import csv
import requests
import pandas
import time
from collections import namedtuple
from tqdm import tqdm

with open('active_true_pub_no.csv', mode='r') as f:
    reader = csv.reader(f)
    all_records = namedtuple('rec', next(reader))
    records = [all_records._make(row) for row in reader]

with open('out.csv', mode='w+') as o:
    w = csv.writer(o)
    w.writerow(["ContentItemId","code"])
    num = len(records)
    print("Checking {} records...\n".format(num))
    with tqdm(total=num, bar_format="{percentage:3.0f}% {bar} [{n_fmt}/{total_fmt}] ", ncols=64) as pbar:
        for r in records:
            pbar.update(1)
            id = r.ContentItemId
            url = "https://api.aucklandmuseum.com/id/library/ephemera/" + id
            req = requests.get(url, allow_redirects=False)
            code = req.status_code
            w.writerow([id, code])
            # time.sleep(.25)

print('\nSummary: ')
df = pandas.read_csv("out.csv")
print(df['code'].value_counts())
I use pandas' value_counts to summarize the results at the end.
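For readers without pandas installed, a comparable tally can be sketched with the standard library. This swaps in collections.Counter for value_counts; it is not the code used above. The function name count_status_codes is hypothetical, and the sample rows (including ids 21292 and 21293 and all status codes) are made up for illustration.

```python
import csv
import io
from collections import Counter


def count_status_codes(csv_file):
    """Tally the 'code' column of an open CSV file with a header row."""
    reader = csv.DictReader(csv_file)
    # Values stay as strings, because the csv module does not convert types
    return Counter(row["code"] for row in reader)


# Made-up sample data in the same layout as out.csv
sample = io.StringIO("ContentItemId,code\n21291,200\n21292,404\n21293,200\n")
counts = count_status_codes(sample)

# most_common() lists codes from most to least frequent
for code, n in counts.most_common():
    print(code, n)
```

To run it against the real output, pass an open file instead of the in-memory sample, for example with open('out.csv', newline='') as f: count_status_codes(f).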
Answer
I assume you mean the percentage of records processed. You can also do the print("Complete") inside the loop.
count = 0
# Report roughly every quarter of the records. max() keeps the step at 1
# or more so the modulo below cannot divide by zero on very short lists
step = max(1, round(len(records) / 4))
for r in records:
    id = r.ContentItemId
    url = "https://api.aucklandmuseum.com/id/library/ephemera/" + id
    req = requests.get(url, allow_redirects=False)
    code = req.status_code
    w.writerow([id, code])
    count += 1
    if count == len(records):
        print("Complete")
    elif count % step == 0:
        # Convert the fraction processed to a whole-number percentage
        progress = round(count / len(records) * 100)
        print("{}%".format(progress))
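When the number of records is not divisible by 4, a modulo check like the one above reports approximate percentages (for 10 records it fires at 20%, 40%, 60% and 80%). One possible variation, sketched here under assumptions, precomputes the record counts at which 25%, 50% and 75% are first reached. The names progress_milestones, process_with_progress, handle and report are hypothetical, and handle stands in for the HTTP request and CSV write so the sketch runs without network access.

```python
import math


def progress_milestones(total, fractions=(0.25, 0.5, 0.75)):
    """Map the record count at which each fraction is first reached
    to its percentage label."""
    milestones = {}
    for fraction in fractions:
        # ceil: report once at least this share of the records is done
        count = math.ceil(total * fraction)
        # Skip milestones that coincide with the end; "Complete" covers those
        if 0 < count < total:
            # If two fractions land on the same count, the larger label wins
            milestones[count] = round(fraction * 100)
    return milestones


def process_with_progress(records, handle, report=print):
    """Call handle() on every record, reporting progress at the milestones."""
    milestones = progress_milestones(len(records))
    for done, record in enumerate(records, start=1):
        handle(record)  # hypothetical stand-in for the request and writerow
        if done in milestones:
            report("{}%".format(milestones[done]))
    # Runs after the loop has finished, so it is always reached
    report("Complete")
```

In the script above, handle would wrap the body of the for loop, and the call would sit inside the with open('out.csv', ...) block so the writer is still open. Because "Complete" is reported after the loop rather than inside it, it also prints for an empty list of records.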