How to find similarities between baseball players using Python pandas or dask?

Problem description

Here is the rule-based formula for finding the similarity between baseball players:

One point for each difference of 20 games played.
One point for each difference of 75 at bats.
One point for each difference of 10 runs scored.
One point for each difference of 15 hits.
One point for each difference of 5 doubles.
One point for each difference of 4 triples.
One point for each difference of 2 home runs.
One point for each difference of 10 RBI.
One point for each difference of 25 walks.
One point for each difference of 150 strikeouts.
One point for each difference of 20 stolen bases.
One point for each difference of .001 in batting average.
One point for each difference of .002 in slugging percentage.

I have some players like:

[Image: some player values]

Can you explain how to implement this formula in Python using pandas or dask? I'm stuck on it.

Tags: python, pandas, dask

Solution


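I'm not entirely sure what you're asking, so I'll assume these "points" apply to each player/row; you can then find similar players by looking for players whose scores are close.

To do that, just set up a loop to compute it (I do it one column at a time, vectorized, because that is faster than iterating over rows).

You may need to adjust the column names in the `rules` dictionary I created so that they match the column names in your dataset.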
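As a minimal sketch of that adjustment: the dataset, its column names `G` and `AB`, and the `column_map` dictionary below are all made up for illustration. The idea is to rename your columns to the keys used in `rules` and then check that no key is left without a matching column.

```python
import pandas as pd

# Hypothetical dataset whose column names differ from the keys in `rules`.
df = pd.DataFrame({'G': [150, 140], 'AB': [600, 520], 'homeRuns': [30, 25]})

# A subset of the rules, keyed by the column names the scoring loop expects.
rules = {'gamesPlayed': 20, 'atBats': 75, 'homeRuns': 2}

# Columns the rules need but the raw dataset does not have yet.
missing_before = [col for col in rules if col not in df.columns]

# Hypothetical mapping from this dataset's names to the rule keys.
column_map = {'G': 'gamesPlayed', 'AB': 'atBats'}
df = df.rename(columns=column_map)

# After renaming, every rule key should have a matching column.
missing_after = [col for col in rules if col not in df.columns]
print(missing_before, missing_after)
```

Running a check like this before the scoring loop turns a mismatched name into an explicit list of missing columns instead of a `KeyError` partway through the loop.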

import pandas as pd
import requests
import numpy as np


# Sample Data
mlb_api = 'https://bdfed.stitch.mlbinfra.com/bdfed/stats/player'
payload = {
    'stitch_env': 'prod',
    'season': '2020',
    'sportId': '1',
    'stats': 'season',
    'group': 'hitting',
    'gameType': 'R',
    'offset': '0',
    'sortStat': 'onBasePlusSlugging',
    'order': 'desc'}

jsonData = requests.get(mlb_api, params=payload).json()
df = pd.DataFrame(jsonData['stats'])


# Here is the Code you'll need
rules = {
    'gamesPlayed': 20,
    'atBats': 75,
    'runs': 10,
    'hits': 15,
    'doubles': 5,
    'triples': 4,
    'homeRuns': 2,
    'rbi': 10,
    'baseOnBalls': 25,
    'strikeOuts': 150,
    'stolenBases': 20,
    'avg': .001,
    'slg': .002}


df['Score'] = 0
for col, points in rules.items():
    df['Score'] += np.floor(df[col].astype(float) / points)

df = df.sort_values('Score', ascending=False).reset_index(drop=True)

Output:

print(df[['playerName','Score']])
            playerName  Score
0            Juan Soto  719.0
1      Freddie Freeman  691.0
2        Marcell Ozuna  687.0
3          DJ LeMahieu  680.0
4           Jose Abreu  658.0
5          Trea Turner  657.0
6        Dominic Smith  646.0
7         Jose Ramirez  623.0
8          Nelson Cruz  623.0
9         Corey Seager  623.0
10       Manny Machado  622.0
11           Wil Myers  614.0
12           Luke Voit  610.0
13          Mike Trout  607.0
14    Mike Yastrzemski  602.0
15    Michael Conforto  600.0
16   Teoscar Hernandez  600.0
17        Eloy Jimenez  598.0
18        Mookie Betts  597.0
19  Fernando Tatis Jr.  590.0
20        Brandon Lowe  568.0
21    Ronald Acuna Jr.  562.0
22        Bryce Harper  561.0
23     George Springer  556.0
24      Anthony Rendon  552.0
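
The rules in the question speak of a "difference", so another possible reading is that the points are counted between two players rather than for each player alone. The following is a rough sketch of that reading, not something the answer above does: the three players and their stat lines are invented, only four of the rules are used to keep it short, and `difference_points` is a hypothetical helper name. Under this reading a lower total means a more similar player.

```python
import numpy as np
import pandas as pd

# Subset of the question's rules: one point per this much difference in each stat.
rules = {'gamesPlayed': 20, 'atBats': 75, 'homeRuns': 2, 'avg': .001}

# Hypothetical players with made-up stat lines, for illustration only.
players = pd.DataFrame(
    {'gamesPlayed': [150, 140, 60],
     'atBats': [600, 520, 200],
     'homeRuns': [30, 25, 5],
     'avg': [.300, .290, .250]},
    index=['Player A', 'Player B', 'Player C'])


def difference_points(df, reference, rules):
    """Total rule points between the `reference` row and every row of `df`."""
    total = pd.Series(0.0, index=df.index)
    for col, unit in rules.items():
        values = df[col].astype(float)
        diff = (values - values.loc[reference]).abs()
        # Round before flooring so float noise (e.g. 9.9999999) does not lose a point.
        total += np.floor((diff / unit).round(6))
    return total.sort_values()


points = difference_points(players, 'Player A', rules)
print(points)
```

With these made-up numbers, Player B ends up 13 points away from Player A and Player C 71 points away, so Player B is the closer match. The loop is still column-wise, as in the answer's code, so it stays vectorized over all players for a given reference player.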
