Select the first 50 rows for each user in a pandas DataFrame - Python 3.x

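Problem description

Sorry if this is a duplicate question; I could not find anyone else with a similar solution.

I have a very large pandas DataFrame named csv_table.

print(csv_table.shape) yields (1155522, 6)

The DataFrame looks like this: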

                username                                              tweet following followers is_retweet is_bot
0             narutouz16  RT @GetMadz: Sound design in this game is 10/1...        59        20          1      0
1             narutouz16                         @hbthen3rd I know I don't.        59        20          0      0
2             narutouz16  @TonyKelly95 I'm still not satisfied in the en...        59        20          0      0

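What I need to do is create a smaller DataFrame containing only the first 20 rows for each username, skipping any username that does not have at least 20 rows.

I have looked at this question, which suggests using the following: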

df.groupby('username').head(20).reset_index(drop=True)

This yields fairly good results:

              username                                              tweet following followers is_retweet is_bot
0           narutouz16  RT @GetMadz: Sound design in this game is 10/1...        59        20          1      0
1           narutouz16                         @hbthen3rd I know I don't.        59        20          0      0
2           narutouz16  @TonyKelly95 I'm still not satisfied in the en...        59        20          0      0
3           narutouz16  I'm currently in second place in my leaderboar...        59        20          0      0
4           narutouz16  @TheRealRotimi live footage of us at spin. htt...        59        20          0      0
5           narutouz16  Duolingo has more content than I thought, add ...        59        20          0      0
6           narutouz16                       @TonyKelly95 It dont go down        59        20          0      0
7           narutouz16  This is my meme day, where I explore the inter...        59        20          0      0
8           narutouz16            RT @DitzyFlama: ygHeAZSQkA        59        20          1      0
9           narutouz16  When you turn around and someone went from sin...        59        20          0      0
10          narutouz16  How dare you leave me with a cliffhanger in ch...        59        20          0      0
11          narutouz16                     I'm entering my popular phase.        59        20          0      0
12          narutouz16  RT @gurugurugravity: #ThankYouGameFreak for th...        59        20          1      0
13          narutouz16  @TonyKelly95 I'm pretty sure that guy was just...        59        20          0      0
14          narutouz16  Yeah, when christmas time comes, I'm about to ...        59        20          0      0
15          narutouz16  I don't like higher education, because the las...        59        20          0      0
16          narutouz16  I found a spotify playlist called childhood Bo...        59        20          0      0
17          narutouz16  Theres two type of people in this world. Peopl...        59        20          0      0
18          narutouz16  I just want to let people know just dance 2020...        59        20          0      0
19          narutouz16           RT @AAAAAGGHHHH: PxD2vdLelo        59        20          1      0
20       GamerGrowthHQ  RT @zFakes_: Looking for an editor to make My ...     73508    130115          1      0
21       GamerGrowthHQ  RT @Ltdanmagicleg: I don't just want you in my...     73508    130115          1      0
22       GamerGrowthHQ  RT @MissAliCatt: I'm so tired of people's dram...     73508    130115          1      0
23       GamerGrowthHQ    RT @FrostedCaribou: NEW VIDEO\n\nPulling MOA...     73508    130115          1      0
24       GamerGrowthHQ  RT @adron_foe: People get so up in arms about ...     73508    130115          1      0
25       GamerGrowthHQ  RT @guccipoptart346: Jumping on the #ModernWar...     73508    130115          1      0
26       GamerGrowthHQ  RT @adron_foe: If my dick and my hand are frie...     73508    130115          1      0
27       GamerGrowthHQ  RT @lebazmada: Time for my #livestream on #twi...     73508    130115          1      0
28       GamerGrowthHQ  RT @GamerGrowthHQ: What is your favorite game ...     73508    130115          1      0
29       GamerGrowthHQ  What is your favorite game to play when ur on ...     73508    130115          0      0

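What I can't figure out is how to add a check so that usernames with fewer than 20 rows in the DataFrame are left out.

Tags: python, pandas, dataframe

Solution

We can do this with transform: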

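n = 20
# For every row, count how many rows share that row's username
s = df.groupby('username').username.transform('count')
# Drop users with fewer than n rows, then keep the first n rows of each remaining user
yourdf = df[s >= n].groupby('username').head(n).reset_index(drop=True)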
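
To see the effect on something small, here is a minimal sketch of the same transform-then-head idea on a toy DataFrame. The helper name first_n_per_user and the toy data are made up for illustration and are not part of the original question; n is set to 2 so the result is easy to check by eye.

```python
import pandas as pd

def first_n_per_user(df, n, key="username"):
    """Keep the first n rows of each group that has at least n rows."""
    # Row-aligned group sizes: each row gets the row count of its own user
    sizes = df.groupby(key)[key].transform("count")
    # Filter out small groups first, then take the first n rows per user
    return df[sizes >= n].groupby(key).head(n).reset_index(drop=True)

# Hypothetical toy data: 'a' has 3 rows, 'b' has 1 row, 'c' has 2 rows
toy = pd.DataFrame({
    "username": ["a", "a", "b", "a", "c", "c"],
    "tweet": ["a1", "a2", "b1", "a3", "c1", "c2"],
})

result = first_n_per_user(toy, n=2)
print(result)
```

User 'b' is dropped because it has only one row, and 'a' is cut down to its first two rows. An alternative would be groupby('username').filter(lambda g: len(g) >= n), but that calls a Python function once per group and is likely to be slower on a frame with over a million rows and many users.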
