Loop not working properly and KeyError in output

Problem description

OK, I'll start by showing my code:

import requests
import json
import csv
import pandas as pd

with open('AcoesURLJsonCompleta.csv', newline='') as csvfile:
    urlreader = csv.reader(csvfile, delimiter=',')
    for obj_id in urlreader:

headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.149 Safari/537.36'}
jsonData = requests.get(row, headers=headers).json()

mapper = (
    ('Ticker', 'ric'),
    ('Beta', 'beta'),
    ('DY', 'current_dividend_yield_ttm'),
    ('VOL', 'share_volume_3m'),
    ('P/L', 'pe_normalized_annual'),
    ('Cresc5A', 'eps_growth_5y'),
    ('LPA', 'eps_normalized_annual'),
    ('VPA', 'book_value_share_quarterly'),
    ('LAST', 'last')
)

data = {}
for dataKey, jsonDataKey in mapper: 
    d = jsonData.get(jsonDataKey, '') 
    try:
        flt_d = float(d)
    except ValueError:
        d = ''
    finally:
        data[dataKey] = [d]

table = pd.DataFrame(data, columns=['Ticker', 'Beta', 'DY', 'VOL', 'P/L', 'Cresc5A', 'LPA', 'VPA', 'Last'])
table.index = table.index + 1
table.to_csv('CompleteData.csv', sep=',', encoding='utf-8', index=False)
print(table)

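OK, here are my questions:

  1. Is my first loop, for rows in Urls, correct? I want to iterate over the URLs stored in my CSV file, but I don't know whether I am using split and strip correctly.
  2. Is my JSON request OK?
  3. If any of the jsonData requests returns NaN or null, or finds nothing, how should I write my code so that it moves on to the next URL and appends "" (nothing)?

The output of the whole code is: line 25, in <module> Beta = jsonData['beta'] KeyError: 'beta'

Thanks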

Tags: python, json, loops, csv, keyerror

Solution

Updated code

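I took the rows of URLs you provided, ran the following code against them, and printed the results. This version uses multiple threads to fetch the URLs, along with a requests Session. This speeds up processing considerably.

Near the top of the code is a constant, NUMBER_OF_CONCURRENT_URL_REQUESTS, which determines how many URL requests are issued concurrently. I tried various values from 8 to 30. Here is what I learned (or what appears to be true):

  1. Regardless of the setting of NUMBER_OF_CONCURRENT_URL_REQUESTS, if you run the program twice in quick succession you get the same results. It looks like the server caches request results for a while.
  2. However, if you wait long enough that caching no longer comes into play, you get different results, that is, different errors in terms of which data is missing. Why this is so, I cannot say.
  3. The larger the value of NUMBER_OF_CONCURRENT_URL_REQUESTS, the faster the program runs. There may be some value so large that the server gets upset and decides you are attempting a denial-of-service attack. I see no reason to make this value larger than 30.
  4. Is there a correlation between larger values of NUMBER_OF_CONCURRENT_URL_REQUESTS and the likelihood of missing data? I can't say for sure, but it seems so, which makes no sense to me. You can try different values and see for yourself one way or the other.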
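
To get a feel for how the worker count affects run time without hitting the real server, here is a minimal sketch that replaces the HTTP call with a short sleep. The names fake_fetch and timed_run, the 0.05 second delay, and the example.invalid URLs are all assumptions made up for illustration; they are not part of the code below.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_fetch(url, delay=0.05):
    """Stand-in for session.get(url).json(): sleeps to simulate network latency."""
    time.sleep(delay)
    return url.rsplit('/', 1)[-1]  # the last path segment, used as the ticker

def timed_run(urls, workers):
    """Fetch every URL with the given number of threads; return (results, seconds)."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as executor:
        results = list(executor.map(fake_fetch, urls))
    return results, time.perf_counter() - start

# Hypothetical URLs; nothing is actually requested.
urls = [f'https://example.invalid/quote/TICK{i}.sa' for i in range(16)]
for workers in (1, 4, 8):
    results, elapsed = timed_run(urls, workers)
    print(f'{workers:2d} workers: {elapsed:.2f}s for {len(results)} URLs')
```

Because the simulated work is pure waiting, the elapsed time should shrink roughly in proportion to the number of workers. A real server adds its own limits (caching, throttling), which this sketch does not model.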

The code:

import csv, requests, pandas as pd
from decimal import Decimal, DecimalException
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from functools import partial
from time import sleep

NUMBER_OF_CONCURRENT_URL_REQUESTS = 8

headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.149 Safari/537.36'}

def request_getter(session, url):
    ric = url.split('/')[-1] # in case results does not contain 'ric' key
    # delay before each attempt, in milliseconds (0 = first attempt, no wait)
    for t in (0, 1000, 2000, 4000, 4000):
        if t:
            sleep(t / 1000) # time.sleep() expects seconds
            print(f"Retrying request '{ric}' ...", flush=True)
        data = session.get(url, headers=headers).json()
        if 'retry' not in data:
            break
    return ric, data


mapper = (
    ('Ticker', 'ric'),
    ('Beta', 'beta'),
    ('DY', 'current_dividend_yield_ttm'),
    ('VOL', 'share_volume_3m'),
    ('P/L', 'pe_normalized_annual'),
    ('Cresc5A', 'eps_growth_5y'),
    ('LPA', 'eps_normalized_annual'),
    ('VPA', 'book_value_share_quarterly'),
    ('LAST', 'last')
)

data = defaultdict(list)
with open('AcoesURLJsonCompleta.csv', newline='') as csvfile:
    urlreader = csv.reader(csvfile, delimiter=',')
    # run up to NUMBER_OF_CONCURRENT_URL_REQUESTS fetches at once and share one requests Session for even more performance
    with ThreadPoolExecutor(max_workers=NUMBER_OF_CONCURRENT_URL_REQUESTS) as executor, requests.Session() as session:
        request_getter_with_session = partial(request_getter, session)
        for ric, results in executor.map(request_getter_with_session, (row[0] for row in urlreader)):
            if 'market_data' not in results:
                print(f"Missing 'market_data' key for request '{ric}'", flush=True)
                for k, v in results.items():
                    print(f'    {repr(k)} -> {repr(v)}', flush=True)
                print(flush=True)
                continue
            market_data = results['market_data']
            if 'ric' not in market_data:
                # see if any of the mapper keys are present:
                found = False
                for _, jsonDataKey in mapper:
                    if jsonDataKey in market_data:
                        found = True
                        break
                if not found:
                    print(f"Request '{ric}' has nothing recognizable in market_data:", flush=True)
                    for k, v in market_data.items():
                        print(f'    {repr(k)} -> {repr(v)}', flush=True)
                    print(flush=True)
                    continue
                # We have at least one data value present
                print(f"Results missing 'ric' key; inferring 'ric' value '{ric}' from request URL.", flush=True)
                market_data['ric'] = ric
            for dataKey, jsonDataKey in mapper: # for example, 'Ticker', 'ric'
                d = market_data.get(jsonDataKey)
                if d is None:
                    print(f"Data missing for request = '{ric}', key = '{jsonDataKey}'", flush=True)
                    d = '' if jsonDataKey == 'ric' else Decimal('NaN')
                else:
                    try:
                        if jsonDataKey != 'ric': d = Decimal(d)
                    except DecimalException:
                        print(f"Bad value for '{jsonDataKey}': {repr(d)}", flush=True)
                        d = Decimal('NaN') # Decimal class has its own NaN value
                data[dataKey].append(d) # add to data

table = pd.DataFrame(data)
table.index = table.index + 1
table.to_csv('CompleteData.csv', sep=',', encoding='utf-8', index=False)
print(table)
"""
# to read back table:
table2 = pd.read_csv('CompleteData.csv', sep=',', encoding='utf-8', converters={
    'Ticker': str,
    'Beta': Decimal,
    'DY': Decimal,
    'VOL': Decimal,
    'P/L': Decimal,
    'Cresc5A': Decimal,
    'LPA': Decimal,
    'VPA': Decimal,
    'LAST': Decimal
})
print(table2)
"""

Prints:

Missing 'market_data' key for request CPLE6.sa
status -> {}
message -> service returned code:
rcom_service_message -> None

Missing 'market_data' key for request EQMA3B.sa
status -> {}
message -> service returned code:
rcom_service_message -> None

Data missing for ric GNDI3.sa, key beta
Data missing for ric GNDI3.sa, key current_dividend_yield_ttm
Data missing for ric GNDI3.sa, key share_volume_3m
Data missing for ric GNDI3.sa, key pe_normalized_annual
Data missing for ric GNDI3.sa, key eps_growth_5y
Data missing for ric GNDI3.sa, key eps_normalized_annual
Data missing for ric GNDI3.sa, key book_value_share_quarterly
Missing 'market_data' key for request MDNE3.sa
status -> {}
message -> service returned code:
rcom_service_message -> None

Missing 'market_data' key for request MMXM11.sa
status -> {}
message -> service returned code:
rcom_service_message -> None

Missing 'market_data' key for request PCAR3.sa
status -> {}
message -> service returned code:
rcom_service_message -> None

Results missing ric key; inferring ric value from request URL.
Data missing for ric RAIL3.sa, key last
Results missing ric key; inferring ric value from request URL.
Data missing for ric SANB4.sa, key last
Missing 'market_data' key for request TIMP3.sa
status -> {}
message -> service returned code:
rcom_service_message -> None

Missing 'market_data' key for request VIVT3.sa
status -> {}
message -> service returned code:
rcom_service_message -> None

       Ticker     Beta        DY        VOL           P/L       Cresc5A       LPA       VPA       LAST
1    AALR3.sa  1.04339   0.80591   11.00223      26.44449  -99999.99000   0.39668  10.83966  10.490000
2    ABCB4.sa  1.20526   7.34780   18.61900       5.78866       5.42894   2.46862  18.87782  14.290000
3    ABEV3.sa  0.46311   4.32628  688.21043      15.04597      -0.71223   0.75369   3.89563  11.340000
4    ADHM3.sa  1.69780   0.00000    2.36460  -99999.99000  -99999.99000  -0.65331  -2.61497   2.480000
5    AGRO3.sa  0.35568   4.53332    2.54323      41.17127  -99999.99000   0.49792  17.47838  20.500000
..        ...      ...       ...        ...           ...           ...       ...       ...        ...
255  WEGE3.sa  0.50580   1.02429  165.72543      50.11481      17.06485   0.79697   4.59658  39.940000
256  WHRL3.sa  0.59263   8.86991    1.24990      12.72584       0.65648   0.50920   2.00868   6.700000
257  WHRL4.sa  0.59263   8.86991    1.24990      12.72584       0.65648   0.50920   2.00868   6.480000
258  WIZS3.sa  0.76719  12.18673   19.00407       6.67135      21.23109   1.36704   1.16978   9.120000
259  YDUQ3.sa  1.42218   1.68099   94.00410      13.83419       9.13751   2.19384  10.31845  30.350000

[259 rows x 9 columns]

Next run:

Missing 'market_data' key for request CPLE6.sa
status -> {}
message -> service returned code:
rcom_service_message -> None

Missing 'market_data' key for request EQMA3B.sa
status -> {}
message -> service returned code:
rcom_service_message -> None

Missing 'market_data' key for request MDNE3.sa
status -> {}
message -> service returned code:
rcom_service_message -> None

Missing 'market_data' key for request MMXM11.sa
status -> {}
message -> service returned code:
rcom_service_message -> None

Missing 'market_data' key for request PCAR3.sa
status -> {}
message -> service returned code:
rcom_service_message -> None

Missing 'market_data' key for request TIMP3.sa
status -> {}
message -> service returned code:
rcom_service_message -> None

Missing 'market_data' key for request VIVT3.sa
status -> {}
message -> service returned code:
rcom_service_message -> None

       Ticker     Beta        DY        VOL           P/L       Cresc5A       LPA       VPA       LAST
1    AALR3.sa  1.04339   0.80591   11.00223      26.44449  -99999.99000   0.39668  10.83966  10.490000
2    ABCB4.sa  1.20526   7.34780   18.61900       5.78866       5.42894   2.46862  18.87782  14.290000
3    ABEV3.sa  0.46311   4.32628  688.21043      15.04597      -0.71223   0.75369   3.89563  11.340000
4    ADHM3.sa  1.69780   0.00000    2.36460  -99999.99000  -99999.99000  -0.65331  -2.61497   2.480000
5    AGRO3.sa  0.35568   4.53332    2.54323      41.17127  -99999.99000   0.49792  17.47838  20.500000
..        ...      ...       ...        ...           ...           ...       ...       ...        ...
255  WEGE3.sa  0.50580   1.02429  165.72543      50.11481      17.06485   0.79697   4.59658  39.940000
256  WHRL3.sa  0.59263   8.86991    1.24990      12.72584       0.65648   0.50920   2.00868   6.700000
257  WHRL4.sa  0.59263   8.86991    1.24990      12.72584       0.65648   0.50920   2.00868   6.480000
258  WIZS3.sa  0.76719  12.18673   19.00407       6.67135      21.23109   1.36704   1.16978   9.120000
259  YDUQ3.sa  1.42218   1.68099   94.00410      13.83419       9.13751   2.19384  10.31845  30.350000

[259 rows x 9 columns]

Discussion

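Using threads and a requests Session object makes the code more complicated, but that complexity is necessary to greatly reduce the program's running time.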
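
Independently of the threading, the part that avoids the KeyError from the question is the per-field lookup: each key is read with dict.get and converted defensively. Below is a minimal sketch of just that step. The helper name to_decimal and the sample market_data dictionary are made up for illustration and are not a real server response.

```python
from decimal import Decimal, DecimalException

def to_decimal(value):
    """Return value as a Decimal, or Decimal('NaN') if it is missing or not numeric."""
    if value is None:
        return Decimal('NaN')
    try:
        return Decimal(str(value))
    except DecimalException:
        return Decimal('NaN')

# Hypothetical response fragment: 'beta' is absent and 'last' is not a number.
market_data = {
    'ric': 'XXXX3.sa',
    'current_dividend_yield_ttm': '4.32628',
    'last': 'n/a',
}

row = {
    'Ticker': market_data.get('ric', ''),
    'Beta': to_decimal(market_data.get('beta')),  # missing key -> NaN, no KeyError
    'DY': to_decimal(market_data.get('current_dividend_yield_ttm')),
    'LAST': to_decimal(market_data.get('last')),  # non-numeric value -> NaN
}
print(row)
```

Using market_data['beta'] here would raise KeyError, as in the question; get returns None instead, which the helper maps to NaN so that every row ends up with the same set of columns.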

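To understand the code, you need to understand the built-in map function (the ThreadPoolExecutor.map method is a variation of it that assigns a thread to execute each function call) and functools.partial. partial is needed because map expects its function argument to take a single argument, while we need request_getter to be called with two: a requests Session object, which never changes, and a URL. partial lets us turn a function that takes two arguments into one that takes a single argument and supplies the other automatically. For example: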

from functools import partial

def foo(x, y):
    return x + y

foo7 = partial(foo, 7) # the first argument to foo now will always be 7

foo7(9) # equivalent to foo(7, 9)
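
The same idea combined with ThreadPoolExecutor.map can be sketched as follows. Here getter is a simplified stand-in for request_getter, the string 'session-placeholder' takes the place of a real requests.Session, and the example.invalid URLs are made up; no network access happens.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import partial

def getter(session, url):
    """Stand-in for request_getter: takes the shared object first, then the URL."""
    ticker = url.rsplit('/', 1)[-1]
    return session, ticker

shared = 'session-placeholder'  # a real run would pass a requests.Session here
urls = [
    'https://example.invalid/q/AAA3.sa',
    'https://example.invalid/q/BBB4.sa',
]

# Bind the first argument so that map only has to supply the URL.
getter_with_session = partial(getter, shared)

with ThreadPoolExecutor(max_workers=2) as executor:
    # map yields results in the order of the input URLs, not completion order.
    results = list(executor.map(getter_with_session, urls))

print(results)
```

Because map preserves input order, each result lines up with the row of the CSV file it came from, even though the calls run on different threads.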

To read the CSV file back:

from decimal import Decimal
import pandas as pd

table = pd.read_csv('CompleteData.csv', sep=',', encoding='utf-8', converters={
    'Ticker': str,
    'Beta': Decimal,
    'DY': Decimal,
    'VOL': Decimal,
    'P/L': Decimal,
    'Cresc5A': Decimal,
    'LPA': Decimal,
    'VPA': Decimal,
    'LAST': Decimal    
})
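
One thing to watch for when reading values back: Decimal raises InvalidOperation for an empty or non-numeric string, so a blank cell would stop a plain Decimal converter. Below is a minimal sketch of a more tolerant converter, shown with the standard csv module so that it runs without pandas. The function name decimal_or_nan and the sample data are assumptions for illustration. A function like this takes one string and returns one value, which is the shape a converters entry expects, so it could presumably be used there in place of Decimal.

```python
import csv
import io
from decimal import Decimal, InvalidOperation

def decimal_or_nan(text):
    """Convert one CSV cell to Decimal; blank or malformed cells become Decimal('NaN')."""
    try:
        return Decimal(text)
    except InvalidOperation:
        return Decimal('NaN')

# Made-up sample in the same layout as CompleteData.csv, with fewer columns.
csv_text = 'Ticker,Beta,LAST\nAAA3.sa,1.04339,10.49\nBBB4.sa,,n/a\n'

rows = []
for record in csv.DictReader(io.StringIO(csv_text)):
    rows.append({
        'Ticker': record['Ticker'],
        'Beta': decimal_or_nan(record['Beta']),
        'LAST': decimal_or_nan(record['LAST']),
    })
print(rows)
```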
