Creating DataFrames from a groupby DataFrame

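Problem description

I have this DataFrame with stock data, and I want to iterate over it to create one df per ticker (for example PETR4.SA). I have done it manually with groupby and .get_group, but I don't know how to do it with a for loop or a def. The code so far is: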

import pandas as pd
import yfinance as yf
import pandas_datareader.data as pdr

yf.pdr_override()
tickers = ['PETR4.SA', 'FLRY3.SA', 'ODPV3.SA', 'CREM3.SA', 'BPHA3.SA']
acoes = pdr.get_data_yahoo(tickers) 

# From here to the end: move the ticker level of the MultiIndex columns into a regular column
acoes.index.name = 'date'
long_form = acoes.reset_index().melt('date', var_name=['var', 'ticker'])
df = long_form.pivot_table(index=['date', 'ticker'], columns='var', values='value').reset_index()

# Group the df by ticker
grupos = df.groupby('ticker')
grupos.groups  # The groups come back as a dict; not sure whether this helps

Tags: python, pandas, loops, dataframe, data-science

Solution


You can do this with the help of the groups attribute of the groupby object. groups returns a dict of the form {group name -> group labels}.

import pandas as pd

ipl_data = {'Team': ['Riders', 'Riders', 'Devils', 'Devils', 'Kings',
                     'kings', 'Kings', 'Kings', 'Riders', 'Royals', 'Royals', 'Riders'],
            'Rank': [1, 2, 2, 3, 3, 4, 1, 1, 2, 4, 1, 2],
            'Year': [2014, 2015, 2014, 2015, 2014, 2015, 2016, 2017, 2016, 2014, 2015, 2017],
            'Points': [876, 789, 863, 673, 741, 812, 756, 788, 694, 701, 804, 690]}
df = pd.DataFrame(ipl_data)

Now apply groupby:

>>> grouped = df.groupby('Year')


>>> grouped.groups
{2014: Int64Index([0, 2, 4, 9], dtype='int64'),
 2015: Int64Index([1, 3, 5, 10], dtype='int64'),
 2016: Int64Index([6, 8], dtype='int64'),
 2017: Int64Index([7, 11], dtype='int64')}

Now you can iterate over these groups and create a separate DataFrame for each one.

>>> a = [grouped.get_group(group) for group in grouped.groups]

Now a contains a separate DataFrame for each year.

>>> a[0]
     Team  Rank  Year  Points
0  Riders     1  2014     876
2  Devils     2  2014     863
4   Kings     3  2014     741
9  Royals     4  2014     701


>>> type(a[0])
<class 'pandas.core.frame.DataFrame'>
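
Since the question asks for a for loop or a def, here is a minimal sketch of the same idea wrapped in a function. It iterates over the groupby object directly, which yields (key, sub-DataFrame) pairs, and returns a dict so each frame stays attached to its ticker. The helper name split_by and the small prices table are hypothetical and made up for illustration; only the 'date' and 'ticker' column names follow the question.

```python
import pandas as pd

# Hypothetical long-form price data. The 'date' and 'ticker' columns mirror
# the layout in the question; the values are invented for illustration.
prices = pd.DataFrame({
    "date": pd.to_datetime(["2021-01-04", "2021-01-04", "2021-01-05", "2021-01-05"]),
    "ticker": ["PETR4.SA", "FLRY3.SA", "PETR4.SA", "FLRY3.SA"],
    "Close": [28.0, 26.5, 28.9, 26.1],
})


def split_by(frame, column):
    """Return {group key: sub-DataFrame} for each distinct value in `column`."""
    # Iterating over a GroupBy object yields (key, sub-frame) pairs.
    # reset_index(drop=True) gives each sub-frame its own 0..n-1 index.
    return {key: group.reset_index(drop=True) for key, group in frame.groupby(column)}


per_ticker = split_by(prices, "ticker")

# Plain for loop over the result: one DataFrame per ticker.
for ticker, ticker_df in per_ticker.items():
    print(ticker, ticker_df.shape)
```

With the frame from the question, the call would presumably be split_by(df, 'ticker'), after which per_ticker['PETR4.SA'] is that ticker's DataFrame. A dict is used here instead of a list so the frames can be looked up by name rather than by position.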

Hope this solves your problem.

