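python - "DataFrame constructor not properly called!" in a dict comprehension with a for loop
Question
Edited the question to present the problem better.
I am learning data analysis and can't figure out what is wrong here.
I fetch data through an API and build a df from it, where each row represents one match and one column holds various information about all players in a Dota match as a list of nested dictionaries (the original dictionary is rather huge, so I am not sure how to include it here if needed).
What I want is a df with detailed stats for one specific player across all games. To do that, I am trying to:
- loop over every row of the 'players' column in the original df (each row represents one game)
- create a df for each game and store them in a dictionary (so now we have a dictionary of dfs, each df consisting of 10 rows for the 10 players in the game, with columns for their stats)
- iterate over those stored dfs to find the required row (by player_id) and append it to the final df.
Now here is the problem:
This: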
pd.DataFrame(in_df.players[1])
works on its own and creates a df.
{i: pd.DataFrame(in_df.players[i]) for i in range(10)}
also works as expected. But this:
names_for_dfs = [i for i in range(len(in_df))]
{name: pd.DataFrame(in_df.players[name]) for name in names_for_dfs}
does not work. The function in question:
def get_player_stats(in_df, cols_to_keep, player_id):
    # create a df from 'players' column for each game (row) - it contains 10 rows for 10 players
    # find a row with player_id for player in each game (each df) and append it to out_df
    out_df = pd.DataFrame()
    names_for_dfs = [row for row in range(len(in_df))]
    dfs = {
        name: pd.DataFrame(in_df.loc[name, 'players'])
        for name in names_for_dfs
    }
    for name, df in dfs.items():
        # get a row by id and append to final df
        out_df = out_df.append(df[df.account_id.isin([player_id])], ignore_index=True)
    return out_df[cols_to_keep]
I get an error:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-27-1a40ba2737e6> in <module>
7 return dfs
8
----> 9 dfs = get_player_stats(matches_data, core_stats, 34505203)
10 dfs
<ipython-input-27-1a40ba2737e6> in get_player_stats(in_df, cols_to_keep, player_id)
3 dfs = {
4 name : pd.DataFrame(in_df.loc[name, 'players'])
----> 5 for name in names_for_dfs
6 }
7 return dfs
<ipython-input-27-1a40ba2737e6> in <dictcomp>(.0)
3 dfs = {
4 name : pd.DataFrame(in_df.loc[name, 'players'])
----> 5 for name in names_for_dfs
6 }
7 return dfs
~\miniconda3\lib\site-packages\pandas\core\frame.py in __init__(self, data, index, columns, dtype, copy)
507 )
508 else:
--> 509 raise ValueError("DataFrame constructor not properly called!")
510
511 NDFrame.__init__(self, mgr, fastpath=True)
ValueError: DataFrame constructor not properly called!
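The traceback points at pd.DataFrame(in_df.loc[name, 'players']). pandas raises this particular ValueError when the constructor receives a scalar (for example NaN or a string) without an index and columns. One plausible explanation, and it is only an assumption because the raw data is not shown, is that some rows of 'players' beyond the first ten do not hold a list of dicts. A minimal sketch for locating such rows follows; find_bad_player_rows and the demo frame are hypothetical names and data made up for illustration.

```python
import pandas as pd


def find_bad_player_rows(in_df, col="players"):
    """Return index labels whose `col` value is not a list of dicts."""
    bad = []
    for label, value in in_df[col].items():
        is_list_of_dicts = isinstance(value, list) and all(
            isinstance(p, dict) for p in value
        )
        if not is_list_of_dicts:
            bad.append(label)
    return bad


# Hypothetical demo data: row 1 holds NaN instead of a player list.
demo = pd.DataFrame({
    "match_id": [1, 2, 3],
    "players": [[{"account_id": 10}], float("nan"), [{"account_id": 11}]],
})

bad_rows = find_bad_player_rows(demo)
print(bad_rows)

# Passing the scalar from the bad row to the constructor fails.
try:
    pd.DataFrame(demo.loc[1, "players"])
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message)
```

Running find_bad_player_rows(matches_data) on the real frame would show whether any rows need to be skipped or re-fetched before building the per-game dfs.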
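So I set out to create test data for you to reproduce the problem. I took a sample of the original df and called .to_dict() on it to get a better idea of how it is structured. Based on that, I came up with the following sample data: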
data = {'match_id': {0: 5490791923.0, 1: 5490651026.0, 2: 5490555360.0},
'players': {0: [{'match_id': 5490791923, 'stat1': 101, 'stat2': [1, 2, 3], 'stat3': {1: 1, 2: 2, 3: [1, 2, 3]}}],
1: [{'match_id': 5490791923, 'stat1': 101, 'stat2': [1, 2, 3], 'stat3': {1: 1, 2: 2, 3: [1, 2, 3]}}],
2: [{'match_id': 5490791923, 'stat1': 101, 'stat2': [1, 2, 3], 'stat3': {1: 1, 2: 2, 3: [1, 2, 3]}}]
}
}
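Then I created a df from it, which looks somewhat like the original.
stats = pd.DataFrame(data=data)
Then I applied the same steps as above to check that everything works, and it all ran smoothly with no errors.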
in_df = stats
names_for_dfs = [i for i in range(len(in_df))]
dfs = {name: pd.DataFrame(in_df.loc[name, 'players']) for name in names_for_dfs}
This prints:
{0: match_id stat1 stat2 stat3
0 5490791923 101 [1, 2, 3] {1: 1, 2: 2, 3: [1, 2, 3]},
1: match_id stat1 stat2 stat3
0 5490791923 101 [1, 2, 3] {1: 1, 2: 2, 3: [1, 2, 3]},
2: match_id stat1 stat2 stat3
0 5490791923 101 [1, 2, 3] {1: 1, 2: 2, 3: [1, 2, 3]}}
So now I am wondering what difference prevents the solution from working on the original data. The code I use to get the original data:
import pandas as pd
import requests

def get_player_ids(team_id: int):
    players = requests.get(f'https://api.opendota.com/api/teams/{team_id}/players').json()
    ids = []
    keys = ['account_id', 'name']
    for player in players:
        for k, v in player.items():
            if k in keys:
                ids.append({k: v})
    print(ids)
    return ids

def get_team_id(team_name: str):
    teams = pd.DataFrame(requests.get('https://api.opendota.com/api/teams').json())
    team_id = int(teams.team_id[teams.name.str.lower() == team_name.lower()])
    get_player_ids(team_id)
    return team_id

columns = ['match_id', 'duration', 'radiant_score', 'dire_score', 'radiant_gold_adv',
           'radiant_xp_adv', 'radiant_team', 'dire_team', 'players', 'league', 'patch', 'start_time']

def get_match_data_for_team(team_id: int):
    l = requests.get(f'https://api.opendota.com/api/teams/{team_id}/matches').json()
    match_ids = [d['match_id'] for d in l]
    matches_data = []
    for m_id in match_ids:
        matches_data.append(requests.get('http://api.opendota.com/api/matches/' + f'{m_id}').json())
    return pd.DataFrame(matches_data)[columns]

matches_data = get_match_data_for_team(get_team_id('nigma'))
Edit: fixed, the following code now works:
def get_player_stats(in_df, cols_to_keep, player_id):
    # create a df from 'players' column for each game (row) - it contains 10 rows for 10 players
    # find a row with player_id for MC in each game (df) and append it to out_df
    out_df = pd.DataFrame()
    dfs = {}
    names_for_dfs = [row for row in range(len(in_df))]
    for name in names_for_dfs:
        for player_dict in in_df.players[name]:
            if isinstance(player_dict, dict) and player_dict['account_id'] == player_id:
                df = pd.DataFrame({key: [value] for key, value in player_dict.items()})
                dfs.update({name: df})
    for name, df in dfs.items():
        out_df = out_df.append(df)
    return out_df[cols_to_keep]
But I am missing some rows. The cause seems to be this condition:
if isinstance(player_dict, dict) and player_dict['account_id'] == player_id:
because matches_data has 193 rows, but out_df has only 143. With this variant:
out_df = pd.DataFrame()
dfs = {}
for match_number in range(len(matches_data)):
    for player_dict in matches_data.players[match_number]:
        if isinstance(player_dict, dict):
            df = pd.DataFrame({key: [value] for key, value in player_dict.items()})
            dfs.update({match_number: df})
for name, df in dfs.items():
    out_df = out_df.append(df[df.account_id.isin([34505203])], ignore_index=True)
I get even fewer: 138 rows. How do I search these nested structures correctly for the required player?
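Before changing the search itself, it may help to find out why rows go missing. One thing worth checking in the second variant is that dfs is keyed by match number, so each player's one-row df replaces the previous one and only the last player of a match survives. Separately, a match can legitimately lack the player, or its 'players' value may not be a list at all. The sketch below counts the three cases; classify_matches and the demo frame are hypothetical and only illustrate the idea.

```python
from collections import Counter

import pandas as pd


def classify_matches(in_df, player_id, col="players"):
    """Count rows as 'found', 'absent' (valid list without the player) or 'invalid'."""
    counts = Counter()
    for players in in_df[col]:
        if not isinstance(players, list):
            counts["invalid"] += 1
        elif any(
            isinstance(p, dict) and p.get("account_id") == player_id
            for p in players
        ):
            counts["found"] += 1
        else:
            counts["absent"] += 1
    return dict(counts)


# Hypothetical demo data covering all three cases.
demo = pd.DataFrame({
    "match_id": [1, 2, 3],
    "players": [
        [{"account_id": 7}, {"account_id": 8}],  # player 7 is present
        [{"account_id": 8}],                     # player 7 is absent
        float("nan"),                            # no usable player list
    ],
})

summary = classify_matches(demo, player_id=7)
print(summary)
```

If the 'found' count for the real data equals the 143 rows already obtained, the remaining matches simply do not contain that account_id and nothing is being lost by the filter.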
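Answer
I would try:
- Simplifying your function to something like the one shown below, which meets your goal of producing a DataFrame of a given player's compiled stats across all matches.
- Using dictionary logic, which helps reduce some of the indexing complexity when extracting data from a DataFrame of nested dicts. The function therefore takes a DataFrame as in_df but converts it to a dict with the DataFrame.to_dict() method.
Code: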
import pandas as pd

# Note: DataFrame.append was removed in pandas 2.0; this code targets pandas 1.x.
def get_player_stats(in_df, player_id):
    df = pd.DataFrame()
    for match, players in in_df.to_dict()['players'].items():
        # {match1: {players}}
        for player, info in players.items():
            # {player1: {info}}
            if info['account_id'] == player_id:
                # {player1: {'account_id': player_id}}
                df = df.append(pd.Series(data=info, name=match))
    cols_to_keep = [col for col in df.columns if col != 'account_id']
    return df[cols_to_keep]

# I assume your data looks something like this:
matches_2020 = {
    'date': {
        'match1': '2020-06-01',
        'match2': '2020-06-02'
    },
    'players': {
        'match1': {
            'player1': {'account_id': 'FAKER', 'cs': 700, 'champ': 'Zoe'},
            'player2': {'account_id': 'BJERGSON', 'cs': 500, 'champ': 'Talon'}
        },
        'match2': {
            'player1': {'account_id': 'FAKER', 'cs': 800, 'champ': 'Syndra'},
            'player2': {'account_id': 'REDMERCY', 'cs': 500, 'champ': 'Zed'}
        }
    }
}
in_df = pd.DataFrame(matches_2020)

# Let's pull Faker's stats:
faker = get_player_stats(in_df, 'FAKER')
print(faker)
Output:
champ cs
match1 Zoe 700.0
match2 Syndra 800.0
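The sample data in this answer stores each match's players as a dict of dicts, whereas the question's own sample shows 'players' as a list of dicts per row. In addition, DataFrame.append no longer exists in pandas 2.0 and later. Below is a hedged variant of the same idea adapted to the list-of-dicts shape: it collects matching records in a plain list and builds the result once. The column names kills and deaths and the demo frame are made up for illustration, and rows without a usable list are skipped on the assumption that they are bad data.

```python
import pandas as pd


def get_player_stats(in_df, cols_to_keep, player_id):
    """Collect one row per match for `player_id` from a 'players' column of lists of dicts."""
    records = []
    index = []
    for match_label, players in in_df["players"].items():
        if not isinstance(players, list):
            continue  # assumption: non-list values are bad rows and can be skipped
        for player in players:
            if isinstance(player, dict) and player.get("account_id") == player_id:
                records.append(player)
                index.append(match_label)
    # reindex keeps the requested columns even when no record was found
    return pd.DataFrame(records, index=index).reindex(columns=cols_to_keep)


# Hypothetical demo data in the list-of-dicts shape.
stats = pd.DataFrame({
    "match_id": [1, 2, 3],
    "players": [
        [{"account_id": 7, "kills": 3, "deaths": 1},
         {"account_id": 8, "kills": 0, "deaths": 5}],
        [{"account_id": 8, "kills": 2, "deaths": 2}],
        [{"account_id": 7, "kills": 9, "deaths": 0}],
    ],
})

result = get_player_stats(stats, ["kills", "deaths"], player_id=7)
print(result)
```

Keeping the original row label as the index makes it easy to join the player's stats back onto the match-level columns afterwards.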