Asynchronous DNS lookups, but results go to the same file

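Question

I need to perform a large number of DNS NAPTR lookups (think thousands per minute). I run a Python script that uses dnspython, reads from one file, and writes the results to another. It handles about 300 requests per second. I tried asynchronous DNS with Python's aiodns, but the numbers are the same. My script may be flawed; see below. This is Python 3.4.

But if the results have to go back into a single file, is it even possible to do the lookups asynchronously?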

import asyncio
import aiodns

...

loop = asyncio.get_event_loop()
resolver = aiodns.DNSResolver(loop=loop)
resolver.nameservers = ['x.y.w.z']

...

@asyncio.coroutine
def getsip(number):

    try:
        strQuery = str(dns.e164.from_e164("+" + number))
        answer = yield from resolver.query(strQuery, 'NAPTR')

        for rdata in answer:
            return rdata.regex

    except:
        return ""

with open(filename, 'r') as fread, open(filenameOut, 'w') as fwrite:
    reader = csv.DictReader(fread, delimiter='|', quoting=csv.QUOTE_NONE)
    reader.fieldnames = fieldnamesIn

    writer = csv.DictWriter(fwrite, fieldnames = fieldnamesOut, delimiter='|')

    for row in reader:
        sys.stdout.write("Processing record number: %d \r" % (total) )
        sys.stdout.flush()
        total+=1
        answer = loop.run_until_complete(getsip(row['NUM']))
        if answer == "":
            missingAnswers+=1

        writer.writerow({'NUM': row['NUM'], 'SIP': answer})

print("Records not found: " + str(missingAnswers) + " of total " +  str(total) + " records.")

Tags: python, performance, lookup, python-asyncio

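Answer

But if the results have to go back into a single file, is it even possible to do the lookups asynchronously?

If you don't care about the order of the results, asynchronous lookups are straightforward to implement. For example, you can use asyncio.as_completed to schedule all the coroutines to run concurrently and be notified as each one finishes: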

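# Reuses getsip, filename, filenameOut, fieldnamesIn and fieldnamesOut
# from the question.

@asyncio.coroutine
def lookup(row):
    # as_completed yields results in completion order, so return the row
    # together with its answer to keep the two paired.
    answer = yield from getsip(row['NUM'])
    return row, answer

@asyncio.coroutine
def process():
    # Read all input rows up front so every lookup can be scheduled at once.
    with open(filename, 'r') as fread:
        reader = csv.DictReader(fread, delimiter='|', quoting=csv.QUOTE_NONE)
        reader.fieldnames = fieldnamesIn
        rows = list(reader)

    with open(filenameOut, 'w') as fwrite:
        writer = csv.DictWriter(fwrite, fieldnames=fieldnamesOut, delimiter='|')
        missingAnswers = 0

        loop = asyncio.get_event_loop()
        tasks = [loop.create_task(lookup(row)) for row in rows]
        for done_coro in asyncio.as_completed(tasks):
            row, answer = yield from done_coro
            # getsip returns "" on error and None when the answer is empty.
            if not answer:
                missingAnswers += 1
            writer.writerow({'NUM': row['NUM'], 'SIP': answer})

    print("Records not found: %d of total %d records"
          % (missingAnswers, len(rows)))

loop = asyncio.get_event_loop()
loop.run_until_complete(process())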
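
If the output does need to follow the input order, one option is asyncio.gather, which returns results in the order of the awaitables passed to it, regardless of which lookup finishes first. The sketch below is only an illustration under some assumptions: it uses the newer async/await syntax and asyncio.run (Python 3.7+, not the 3.4 of the question), and fake_lookup is a hypothetical stand-in for getsip that sleeps instead of querying DNS, so it runs without a resolver. The numbers and the example.invalid addresses are made up.

```python
import asyncio
import csv
import io


async def fake_lookup(number):
    # Hypothetical stand-in for getsip(): sleeps instead of querying DNS,
    # with a delay that differs per number so completions are out of order.
    await asyncio.sleep(0.01 * (int(number) % 3))
    # Pretend that numbers ending in "0" have no NAPTR record.
    return "" if number.endswith("0") else "sip:" + number + "@example.invalid"


async def lookup_all(numbers):
    # All lookups run concurrently; gather() returns the results in the
    # order of its arguments, not in completion order.
    return await asyncio.gather(*(fake_lookup(n) for n in numbers))


def write_results(numbers, answers, out):
    # Write one NUM|SIP row per input number and count the empty answers.
    writer = csv.DictWriter(out, fieldnames=["NUM", "SIP"], delimiter="|")
    missing = 0
    for number, answer in zip(numbers, answers):
        if not answer:
            missing += 1
        writer.writerow({"NUM": number, "SIP": answer})
    return missing


numbers = ["4670", "4671", "4672", "4673"]
answers = asyncio.run(lookup_all(numbers))
out = io.StringIO()  # stands in for the output file
missing = write_results(numbers, answers, out)
print(out.getvalue())
print("Records not found: %d of total %d records" % (missing, len(numbers)))
```

The trade-off is that nothing is written until every lookup has finished, whereas the as_completed version writes each row as soon as its answer arrives.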
