Tracking progress through a namedtuple csv loop in Python

Question

Using collections.namedtuple, the following Python code processes records in a database by way of a csv file of identifiers (integers in a column named ContentItemId). An example record is https://api.aucklandmuseum.com/id/library/ephemera/21291

The aim is to check the HTTP status for each id and write it to disk:

import requests
from collections import namedtuple
import csv

with open('in.csv', mode='r') as f:
    reader = csv.reader(f)

    all_records = namedtuple('rec', next(reader))
    records = [all_records._make(row) for row in reader]

    #Create output file
    with open('out.csv', mode='w+') as o:
        w = csv.writer(o)
        w.writerow(["ContentItemId","code"])

        count = 1
        for r in records:
            id   = r.ContentItemId
            url  = "https://api.aucklandmuseum.com/id/library/ephemera/" + id
            req  = requests.get(url, allow_redirects=False)
            code = req.status_code
            w.writerow([id, code])

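How can I print the code's progress through the latter loop to the console (ideally at the 25%, 50% and 75% marks)? Also, if I add an unindented print("Complete") at the bottom, will that line be reached?

Thanks in advance.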


Edit: Thanks for all the help. My (working!) code now looks like this:

import csv
import requests
import pandas
import time
from collections import namedtuple
from tqdm import tqdm

with open('active_true_pub_no.csv', mode='r') as f:
    reader = csv.reader(f)

    all_records = namedtuple('rec', next(reader))
    records = [all_records._make(row) for row in reader]

    with open('out.csv', mode='w+') as o:
        w = csv.writer(o)
        w.writerow(["ContentItemId","code"])

        num = len(records)
        print("Checking {} records...\n".format(num))

        with tqdm(total=num, bar_format="{percentage:3.0f}% {bar} [{n_fmt}/{total_fmt}]  ", ncols=64) as pbar:
            for r in records:
                pbar.update(1)
                id   = r.ContentItemId
                url  = "https://api.aucklandmuseum.com/id/library/ephemera/" + id
                req  = requests.get(url, allow_redirects=False)
                code = req.status_code
                w.writerow([id, code])
                # time.sleep(.25)

print ('\nSummary: ')
df = pandas.read_csv("out.csv")
print(df['code'].value_counts())

I use pandas' value_counts to summarise the final results.
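
For anyone who would rather not install pandas just for the summary, a rough standard-library equivalent might look like the sketch below. It assumes the same two-column layout as out.csv; the count_codes helper and the sample rows are made up for illustration, and the codes come back as strings rather than integers.

```python
import csv
import io
from collections import Counter


def count_codes(csv_file):
    """Tally the values in the 'code' column of an open csv file."""
    reader = csv.DictReader(csv_file)
    return Counter(row["code"] for row in reader)


# Made-up sample in the same shape as out.csv (not real results)
sample = io.StringIO("ContentItemId,code\n21291,200\n21292,404\n21293,200\n")
counts = count_codes(sample)

# Most frequent status code first, similar in spirit to value_counts()
for code, n in counts.most_common():
    print(code, n)
```

With the real file, the call would be something like count_codes(open("out.csv", newline="")), ideally inside a with block.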

Tags: python, loops, count, percentage

Answer


I assume you mean the percentage of records processed. You can also do the print("Complete") inside the loop.

total = len(records)
# Report every quarter of the way through. max(1, ...) keeps the step from
# being 0 (which would raise ZeroDivisionError) when there are very few records
step = max(1, round(total / 4))
count = 0
for r in records:
    id   = r.ContentItemId
    url  = "https://api.aucklandmuseum.com/id/library/ephemera/" + id
    req  = requests.get(url, allow_redirects=False)
    code = req.status_code
    w.writerow([id, code])
    count += 1
    if count == total:
        print("Complete")
    # The step is rounded, so when the number of records isn't divisible by 4
    # the printed values are only approximately 25%, 50% and 75%
    elif count % step == 0:
        # Integer percentage of records processed so far
        progress = round(100 * count / total)
        print("{}%".format(progress))
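
If the printout should say exactly 25%, 50% and 75% even when the number of records isn't divisible by 4, one option is to check which thresholds each record crosses instead of using a modulo step. The following is only a sketch: crossed_milestones is a hypothetical helper name, and the list of integers stands in for the real records. It also shows that an unindented print("Complete") placed after the loop is reached once the loop (and any enclosing with blocks) finishes, provided no exception propagates out first.

```python
def crossed_milestones(count, total, milestones=(25, 50, 75)):
    """Return the milestone percentages first reached by record number `count`.

    `count` is 1-based: the number of records processed so far.
    """
    if total <= 0:
        return []
    before = 100 * (count - 1) // total  # whole percent done before this record
    after = 100 * count // total         # whole percent done after this record
    return [m for m in milestones if before < m <= after]


records = list(range(10))  # stand-in for the real list of records

for count, r in enumerate(records, start=1):
    # ... the requests.get() call and w.writerow() would go here ...
    for m in crossed_milestones(count, len(records)):
        print("{}%".format(m))

# Not indented, so it runs once after the loop has finished
print("Complete")
```

With very short lists, a single record can cross several thresholds at once, in which case they are all printed for that record.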
