python - Select the first 50 rows for each user in a pandas DataFrame - Python 3.x
Problem description
Sorry if this is a duplicate question; I couldn't find anyone else with a similar solution.
I have a very large pandas DataFrame called csv_table:
print(csv_table.shape)
outputs (1155522, 6).
The DataFrame looks like this:
username tweet following followers is_retweet is_bot
0 narutouz16 RT @GetMadz: Sound design in this game is 10/1... 59 20 1 0
1 narutouz16 @hbthen3rd I know I don't. 59 20 0 0
2 narutouz16 @TonyKelly95 I'm still not satisfied in the en... 59 20 0 0
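What I need to do is create a smaller DataFrame that contains only the first 20 rows for each username, skipping any username that has fewer than 20 rows.
I looked at this question, which suggests the following: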
df.groupby('username').head(20).reset_index(drop=True)
This produces fairly good results:
username tweet following followers is_retweet is_bot
0 narutouz16 RT @GetMadz: Sound design in this game is 10/1... 59 20 1 0
1 narutouz16 @hbthen3rd I know I don't. 59 20 0 0
2 narutouz16 @TonyKelly95 I'm still not satisfied in the en... 59 20 0 0
3 narutouz16 I'm currently in second place in my leaderboar... 59 20 0 0
4 narutouz16 @TheRealRotimi live footage of us at spin. htt... 59 20 0 0
5 narutouz16 Duolingo has more content than I thought, add ... 59 20 0 0
6 narutouz16 @TonyKelly95 It dont go down 59 20 0 0
7 narutouz16 This is my meme day, where I explore the inter... 59 20 0 0
8 narutouz16 RT @DitzyFlama: ygHeAZSQkA 59 20 1 0
9 narutouz16 When you turn around and someone went from sin... 59 20 0 0
10 narutouz16 How dare you leave me with a cliffhanger in ch... 59 20 0 0
11 narutouz16 I'm entering my popular phase. 59 20 0 0
12 narutouz16 RT @gurugurugravity: #ThankYouGameFreak for th... 59 20 1 0
13 narutouz16 @TonyKelly95 I'm pretty sure that guy was just... 59 20 0 0
14 narutouz16 Yeah, when christmas time comes, I'm about to ... 59 20 0 0
15 narutouz16 I don't like higher education, because the las... 59 20 0 0
16 narutouz16 I found a spotify playlist called childhood Bo... 59 20 0 0
17 narutouz16 Theres two type of people in this world. Peopl... 59 20 0 0
18 narutouz16 I just want to let people know just dance 2020... 59 20 0 0
19 narutouz16 RT @AAAAAGGHHHH: PxD2vdLelo 59 20 1 0
20 GamerGrowthHQ RT @zFakes_: Looking for an editor to make My ... 73508 130115 1 0
21 GamerGrowthHQ RT @Ltdanmagicleg: I don't just want you in my... 73508 130115 1 0
22 GamerGrowthHQ RT @MissAliCatt: I'm so tired of people's dram... 73508 130115 1 0
23 GamerGrowthHQ RT @FrostedCaribou: NEW VIDEO\n\nPulling MOA... 73508 130115 1 0
24 GamerGrowthHQ RT @adron_foe: People get so up in arms about ... 73508 130115 1 0
25 GamerGrowthHQ RT @guccipoptart346: Jumping on the #ModernWar... 73508 130115 1 0
26 GamerGrowthHQ RT @adron_foe: If my dick and my hand are frie... 73508 130115 1 0
27 GamerGrowthHQ RT @lebazmada: Time for my #livestream on #twi... 73508 130115 1 0
28 GamerGrowthHQ RT @GamerGrowthHQ: What is your favorite game ... 73508 130115 1 0
29 GamerGrowthHQ What is your favorite game to play when ur on ... 73508 130115 0 0
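A minimal sketch on a hypothetical toy DataFrame (the users "a" and "b" and the tweets are made up) showing why groupby().head(n) alone does not meet the requirement: it keeps up to n rows per group, so a user with fewer than n rows is kept whole rather than dropped.

```python
import pandas as pd

# Hypothetical toy data: user "a" has 3 rows, user "b" has only 1
df = pd.DataFrame({
    "username": ["a", "a", "b", "a"],
    "tweet": ["t1", "t2", "t3", "t4"],
})

# head(2) keeps at most 2 rows per username, in original row order;
# a group with fewer rows (user "b") is kept as-is instead of being skipped
first_two = df.groupby("username").head(2).reset_index(drop=True)
print(first_two)
```

Here user "b" still appears in the result even though it has only one row, which is exactly the case the question wants to exclude.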
What I can't figure out is how to add a check that excludes usernames with fewer than 20 rows in the DataFrame.
Solution
We can do this with transform:
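n = 20
# Per-row count of how many rows share that row's username
s = df.groupby('username').username.transform('count')
# Keep only users with at least n rows, then take each user's first n rows
yourdf = df[s >= n].groupby('username').head(n).reset_index(drop=True)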
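The transform approach above can be wrapped in a small function and checked on toy data. The function names first_n_per_user and first_n_per_user_filter and the sample rows are hypothetical. The second function swaps in GroupBy.filter as an alternative for comparison; it should give the same rows, but it calls a Python function once per group, so it may be slower on a frame with as many users as csv_table.

```python
import pandas as pd

def first_n_per_user(df: pd.DataFrame, n: int) -> pd.DataFrame:
    """First n rows of every username that has at least n rows (transform approach)."""
    # transform('count') broadcasts each group's row count back onto every row,
    # producing a Series aligned with df's index that can be used as a mask
    counts = df.groupby("username")["username"].transform("count")
    return df[counts >= n].groupby("username").head(n).reset_index(drop=True)

def first_n_per_user_filter(df: pd.DataFrame, n: int) -> pd.DataFrame:
    """Same result via GroupBy.filter (hypothetical alternative, not from the answer)."""
    # filter drops whole groups whose lambda returns False, keeping row order
    kept = df.groupby("username").filter(lambda g: len(g) >= n)
    return kept.groupby("username").head(n).reset_index(drop=True)

# Hypothetical toy data: "a" has 3 rows, "b" has 1, "c" has 2
df = pd.DataFrame({
    "username": ["a", "a", "b", "a", "c", "c"],
    "tweet": ["t1", "t2", "t3", "t4", "t5", "t6"],
})

result = first_n_per_user(df, 2)
print(result)
```

With n = 2, user "b" is skipped, "a" is cut to its first two rows, and "c" is kept whole, so the result holds tweets t1, t2, t5 and t6.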