"DataFrame constructor not properly called!" in a dict comprehension with a for loop

Problem description

Edited the question to present the problem better.

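I'm learning data analysis and can't figure out what is wrong here.

I fetch data through an API and build a df from it, where each row represents a match and one of the columns contains various information about all players in a Dota match, as a list of nested dictionaries (the original dictionary is rather huge, so I don't know how to include it here if needed).

What I want to do is create a df with detailed stats for a specific player in each game. To do that, I'm trying to: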

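  1. Loop over each row of the "players" column in the original df (each row represents one game)
  2. Create a df for each row and store them in a dictionary (now we have a dict of dfs, each with 10 rows for the 10 players in the game and columns for their stats)
  3. Iterate over the stored dfs to find the desired row (by player_id) and append it to the final df.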

Now for the problem. This:

pd.DataFrame(in_df.players[1])

works on its own and creates a df.

{i: pd.DataFrame(in_df.players[i]) for i in range(10)}

also works as expected. But this:

names_for_dfs = [i for i in range(len(in_df))]
{name: pd.DataFrame(in_df.players[name]) for name in names_for_dfs}

does not work. The function in question:

def get_player_stats(in_df, cols_to_keep, player_id):
    # create a df from the 'players' column for each game (row); it contains 10 rows for 10 players
    # find the row with player_id in each game (each df) and append it to out_df
    out_df = pd.DataFrame()

    names_for_dfs = [row for row in range(len(in_df))]

    dfs = {
        name: pd.DataFrame(in_df.loc[name, 'players'])
        for name in names_for_dfs
    }

    for name, df in dfs.items():
        out_df = out_df.append(df[df.account_id.isin([player_id])], ignore_index=True)  # get a row by id and append to final df
    return out_df[cols_to_keep]

I get an error:

    ---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-27-1a40ba2737e6> in <module>
      7     return dfs
      8 
----> 9 dfs = get_player_stats(matches_data, core_stats, 34505203)
     10 dfs

<ipython-input-27-1a40ba2737e6> in get_player_stats(in_df, cols_to_keep, player_id)
      3     dfs = {
      4     name : pd.DataFrame(in_df.loc[name, 'players'])
----> 5     for name in names_for_dfs
      6     }
      7     return dfs

<ipython-input-27-1a40ba2737e6> in <dictcomp>(.0)
      3     dfs = {
      4     name : pd.DataFrame(in_df.loc[name, 'players'])
----> 5     for name in names_for_dfs
      6     }
      7     return dfs

~\miniconda3\lib\site-packages\pandas\core\frame.py in __init__(self, data, index, columns, dtype, copy)
    507                 )
    508             else:
--> 509                 raise ValueError("DataFrame constructor not properly called!")
    510 
    511         NDFrame.__init__(self, mgr, fastpath=True)

ValueError: DataFrame constructor not properly called!
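
For context, this ValueError is what pandas raises when the constructor receives a bare scalar (for example NaN or a string) with no index or columns. One plausible cause, not confirmed by the data shown here, is therefore that some rows of `players` hold something other than a list of dicts. A minimal sketch of that behaviour; `can_build_frame` is a hypothetical helper name:

```python
import pandas as pd

def can_build_frame(value):
    """Return True if pd.DataFrame(value) succeeds, False if it raises ValueError."""
    try:
        pd.DataFrame(value)
        return True
    except ValueError:
        return False

# A list of dicts (the shape of a well-formed `players` cell) is accepted.
list_ok = can_build_frame([{"account_id": 1, "kills": 5}])

# Bare scalars are rejected with "DataFrame constructor not properly called!".
nan_ok = can_build_frame(float("nan"))
str_ok = can_build_frame("some error message")

print(list_ok, nan_ok, str_ok)
```

If the real column contains such cells, the comprehension fails on the first one it reaches, which would be consistent with `range(10)` working while the full range does not.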

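So I started building test data so that you can reproduce the problem. I took a sample of the original df and called .to_dict() on it to get a better sense of how it is structured. Based on that, I came up with the following sample data: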

data = {'match_id': {0: 5490791923.0, 1: 5490651026.0, 2: 5490555360.0},
 'players': {0: [{'match_id': 5490791923, 'stat1': 101, 'stat2': [1, 2, 3], 'stat3': {1: 1, 2: 2, 3: [1, 2, 3]}}],
             1: [{'match_id': 5490791923, 'stat1': 101, 'stat2': [1, 2, 3], 'stat3': {1: 1, 2: 2, 3: [1, 2, 3]}}],
             2: [{'match_id': 5490791923, 'stat1': 101, 'stat2': [1, 2, 3], 'stat3': {1: 1, 2: 2, 3: [1, 2, 3]}}]
            }
       }

Then I created a df from it, which looks somewhat like the original.

stats = pd.DataFrame(data = data)

Then I ran the same steps as above to check that everything behaves the same, but it all went smoothly with no errors.

in_df = stats
names_for_dfs = [i for i in range(len(in_df))]
dfs = {name: pd.DataFrame(in_df.loc[name, 'players']) for name in names_for_dfs}

This prints:

{0:      match_id  stat1      stat2                       stat3
 0  5490791923    101  [1, 2, 3]  {1: 1, 2: 2, 3: [1, 2, 3]},
 1:      match_id  stat1      stat2                       stat3
 0  5490791923    101  [1, 2, 3]  {1: 1, 2: 2, 3: [1, 2, 3]},
 2:      match_id  stat1      stat2                       stat3
 0  5490791923    101  [1, 2, 3]  {1: 1, 2: 2, 3: [1, 2, 3]}}

So now I'm wondering what difference kept the solution from working in the first place. The code I use to get the original data:

import pandas as pd
import requests

def get_player_ids(team_id: int):
    players = requests.get(f'https://api.opendota.com/api/teams/{team_id}/players').json()
    ids = []
    keys = ['account_id', 'name']
    for player in players:
        for k, v in player.items():
            if k in keys:
                ids.append({k: v})
    print(ids)
    return ids

def get_team_id(team_name: str):
    teams = pd.DataFrame(requests.get('https://api.opendota.com/api/teams').json())
    team_id = int(teams.team_id[teams.name.str.lower() == team_name.lower()])
    get_player_ids(team_id)
    return team_id

columns = ['match_id', 'duration', 'radiant_score', 'dire_score', 'radiant_gold_adv',
           'radiant_xp_adv', 'radiant_team', 'dire_team', 'players', 'league', 'patch', 'start_time']
def get_match_data_for_team(team_id: int):    
    l = requests.get(f'https://api.opendota.com/api/teams/{team_id}/matches').json()
    match_ids = [d['match_id'] for d in l]
    matches_data = []
    for m_id in match_ids:
        matches_data.append(requests.get('http://api.opendota.com/api/matches/' + f'{m_id}').json())
    
    return pd.DataFrame(matches_data)[columns]

matches_data = get_match_data_for_team(get_team_id('nigma'))
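
One way to look for the difference is to check which rows of `players` are not lists at all. The sketch below runs on a made-up frame, not on the real API data; `find_bad_player_rows` is a hypothetical helper, and the NaN cell only mimics a match whose response lacked a `players` list, which is an assumption about the real data:

```python
import pandas as pd

def find_bad_player_rows(in_df, column="players"):
    """Return the index labels whose cell in `column` is not a list."""
    is_list = in_df[column].map(lambda cell: isinstance(cell, list))
    return in_df.index[(~is_list).to_numpy()].tolist()

# Made-up frame: row 1 stands in for a match without a usable `players` list.
sample = pd.DataFrame({
    "match_id": [1, 2, 3],
    "players": [[{"account_id": 10}], float("nan"), [{"account_id": 11}]],
})

bad_rows = find_bad_player_rows(sample)
print(bad_rows)
```

Running the same helper on `matches_data` would show whether any such rows exist and where they are.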

Edit: fixed, the following code now works:

def get_player_stats(in_df, cols_to_keep, player_id):
    #create a df from 'players' column for each game (row) - it contains 10 rows for 10 players
    #find a row with player_id for MC in each game (df) and append it to out_df
    out_df = pd.DataFrame()
    dfs = {}

    names_for_dfs = [row for row in range(len(in_df))]
    for name in names_for_dfs:
        for player_dict in in_df.players[name]:
            if isinstance(player_dict, dict) and player_dict['account_id'] == player_id:
                df = pd.DataFrame({key: [value] for key, value in player_dict.items()})
                dfs.update({name: df})

    for name, df in dfs.items():
        out_df = out_df.append(df)
        
    return out_df[cols_to_keep]

But it seems I'm losing some rows with this condition:

if isinstance(player_dict, dict) and player_dict['account_id'] == player_id:

because matches_data has 193 rows but out_df has only 143. With this:

out_df = pd.DataFrame()
dfs = {}
for match_number in range(len(matches_data)):
    for player_dict in matches_data.players[match_number]:
        if isinstance(player_dict, dict):
            df = pd.DataFrame({key: [value] for key, value in player_dict.items()})
            dfs.update({match_number: df})
for name, df in dfs.items():
    out_df = out_df.append(df[df.account_id.isin([34505203])], ignore_index=True)

I get even fewer: 138 rows. How do I correctly search these nested structures for the desired player?
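
One detail of the second loop is worth noting: `dfs.update({match_number: df})` is keyed by match, so every player overwrites the previous one and only the last player dict of each match survives until the filter runs. A minimal sketch that avoids the intermediate dict by collecting the matching player dicts in a list, shown on made-up data; `collect_player_rows` is a hypothetical name, and skipping non-list cells is an assumption about what the real column may contain:

```python
import pandas as pd

def collect_player_rows(in_df, player_id, column="players"):
    """Build one row per match in which `player_id` appears."""
    rows = []
    for match_label, players in in_df[column].items():
        if not isinstance(players, list):
            continue  # skip cells that are not a list of player dicts
        for player in players:
            if isinstance(player, dict) and player.get("account_id") == player_id:
                rows.append({"match": match_label, **player})
    return pd.DataFrame(rows)

# Made-up data: player 1 appears in matches 0 and 2; match 1 has no player list.
sample = pd.DataFrame({
    "match_id": [101, 102, 103],
    "players": [
        [{"account_id": 1, "kills": 5}, {"account_id": 2, "kills": 3}],
        float("nan"),
        [{"account_id": 2, "kills": 7}, {"account_id": 1, "kills": 9}],
    ],
})

stats = collect_player_rows(sample, player_id=1)
print(stats)
```

With this shape the row count can only fall short of the match count when the player is absent from a match or the cell is not a list, which makes any gap easier to explain.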

Tags: python, python-3.x, pandas

Solution


I would try:

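  • Simplifying your function to something like the one below, which meets your goal of building a DataFrame of a given player's compiled stats across all matches.
  • Dictionary logic eases some of the indexing complexity when extracting data from a DataFrame of nested dicts, so this function takes a DataFrame as in_df and converts it to a dict with the DataFrame.to_dict() method.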
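
To illustrate the second point, a tiny made-up frame shows the shape that `DataFrame.to_dict()` returns with its default orientation, `{column: {index: value}}`. The data here is invented for the example:

```python
import pandas as pd

# One match, one player; the values are invented for illustration.
frame = pd.DataFrame({
    "date": {"match1": "2020-06-01"},
    "players": {"match1": {"player1": {"account_id": "A", "cs": 1}}},
})

as_dict = frame.to_dict()
players_by_match = as_dict["players"]  # {index label: cell value}
print(players_by_match)
```

Each cell comes back as the original nested dict, so plain `.items()` loops can walk matches and then players.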

Code:

import pandas as pd

def get_player_stats(in_df, player_id):
    
    df = pd.DataFrame()

    for match, players in in_df.to_dict()['players'].items():
        # {match1: {players}}

        for player, info in players.items():
            # {player1: {info}}

            if info['account_id'] == player_id:
                # {player1: {'account_id': player_id}}

                # Note: DataFrame.append was removed in pandas 2.0, so this line needs pandas < 2.0
                df = df.append(pd.Series(data=info, name=match))

    cols_to_keep = [col for col in df.columns if col != 'account_id']

    return df[cols_to_keep]

# I assume your data looks something like this:
matches_2020 = {

    'date': {
        'match1': '2020-06-01',
        'match2': '2020-06-02'
    },
    'players': {
        'match1': {
            'player1': {'account_id': 'FAKER', 'cs': 700, 'champ': 'Zoe'},
            'player2': {'account_id': 'BJERGSON', 'cs': 500, 'champ': 'Talon'}
        },
        'match2': {
            'player1': {'account_id': 'FAKER', 'cs': 800, 'champ': 'Syndra'},
            'player2': {'account_id': 'REDMERCY', 'cs': 500, 'champ': 'Zed'}
        }
    }
}

in_df = pd.DataFrame(matches_2020)

# Let's pull Faker's stats:
faker = get_player_stats(in_df, 'FAKER')
print(faker)

Output:

         champ     cs
match1     Zoe  700.0
match2  Syndra  800.0
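
On pandas 2.0 and later, where `DataFrame.append` no longer exists, the same idea can be sketched by collecting the matching rows first and building the frame once. This is an adaptation rather than the answer's original code; `get_player_stats_v2` is a hypothetical name, and the data is the same assumed dict-of-dicts shape as above:

```python
import pandas as pd

def get_player_stats_v2(in_df, player_id):
    """Collect one info dict per match for `player_id`, then build the frame once."""
    rows = {}
    for match, players in in_df["players"].items():
        for info in players.values():
            if info["account_id"] == player_id:
                rows[match] = info
    df = pd.DataFrame.from_dict(rows, orient="index")
    # errors="ignore" keeps this safe when no match was found and df is empty
    return df.drop(columns="account_id", errors="ignore")

matches_2020 = {
    "date": {"match1": "2020-06-01", "match2": "2020-06-02"},
    "players": {
        "match1": {
            "player1": {"account_id": "FAKER", "cs": 700, "champ": "Zoe"},
            "player2": {"account_id": "BJERGSON", "cs": 500, "champ": "Talon"},
        },
        "match2": {
            "player1": {"account_id": "FAKER", "cs": 800, "champ": "Syndra"},
            "player2": {"account_id": "REDMERCY", "cs": 500, "champ": "Zed"},
        },
    },
}

faker = get_player_stats_v2(pd.DataFrame(matches_2020), "FAKER")
print(faker)
```

Building the frame in one step also avoids copying the growing frame on every iteration, and column order and dtypes may differ slightly from the output shown above.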
