pandas - How to keep a Dask DataFrame after a groupby followed by compute()
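Question
I have a Dask DataFrame. After a groupby, I found that the group-key columns are not available for further processing until I call compute(). But compute() turns the Dask DataFrame into a pandas DataFrame, which loses the advantages of Dask. I would like the result to stay a Dask DataFrame throughout. Details: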
import pandas as pd
import dask.dataframe as dd
df = pd.DataFrame({"name": ["Jack", "Jack", "Willom", "Willom", "James", "James", "Morgan"],
                   "fix_num": [50, 50, 70, 70, 90, 90, 100],
                   "score1": [50, 60, 70, 80, 90, 40, 60],
                   "score2": [90, 50, 30, 40, 100, 80, 80]})
ddf = dd.from_pandas(df, npartitions=1)
ddf.compute()
     name  fix_num  score1  score2
0    Jack       50      50      90
1    Jack       50      60      50
2  Willom       70      70      30
3  Willom       70      80      40
4   James       90      90     100
5   James       90      40      80
6  Morgan      100      60      80
def _element_coment(t):
    a = t["score1"].sum()
    b = t["score2"].sum()
    return pd.Series((a, b), index=['sum_1', 'sum_2'])
grp = ddf.groupby(['name', 'fix_num'])\
    .apply(_element_coment, meta={'sum_1': int, 'sum_2': int})\
    .reset_index()
judg = grp.fix_num <= grp.sum_2
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "D:\ProgramData\Anaconda3\lib\site-packages\dask\dataframe\core.py", line 3387, in __getattr__
raise AttributeError("'DataFrame' object has no attribute %r" % key)
AttributeError: 'DataFrame' object has no attribute 'fix_num'
grp.columns  # fix_num is missing from the columns
Index(['index', 'sum_1', 'sum_2'], dtype='object')
grp_2 = grp.compute()
grp_2
     name  fix_num  sum_1  sum_2
0    Jack       50    110    140
1   James       90    130    180
2  Morgan      100     60     80
3  Willom       70    150     70
# grp_2 has fix_num in its columns, but grp_2 is a pandas DataFrame
judg = grp_2.fix_num <= grp_2.sum_2
grp_2.dtypes
name       object
fix_num     int64
sum_1       int64
sum_2       int64
dtype: object
So how can I keep working with a Dask DataFrame?
Solution