python - How does apply work in Dask?
Problem description
I have a DataFrame:
import numpy as np
import pandas as pd
import dask.dataframe as dd
a = {'b':['cat','bat','cat','cat','bat','No Data','bat','No Data'],
'c':['str1','str2','str3', 'str4','str5','str6','str7', 'str8']
}
df11 = pd.DataFrame(a,index=['x1','x2','x3','x4','x5','x6','x7','x8'])
I tried using a lambda function to extract each element row by row on the plain pandas DataFrame, as follows:
def elementsearch(term1, term2):
    print(term1, term2)
    return term1
df11.apply(lambda x: elementsearch(x.b,x.c), axis =1)
This works fine. But when I use the dask library:
ddf = dd.from_pandas(df11,npartitions=8)
ddf.map_partitions(lambda df : df.apply(lambda x : elementsearch((x.b,x.c),axis=1)))
It throws the following error:
ValueError: Metadata inference failed in `lambda`.
You have supplied a custom function and Dask is unable to
determine the type of output that that function returns.
To resolve this please provide a meta= keyword.
The docstring of the Dask function you ran should have more information.
Original error is below:
------------------------
AttributeError("'Series' object has no attribute 'c'", 'occurred at index b')
Traceback:
---------
File "/opt/conda/lib/python3.6/site-packages/dask/dataframe/utils.py", line 137, in raise_on_meta_error
yield
File "/opt/conda/lib/python3.6/site-packages/dask/dataframe/core.py", line 3477, in _emulate
return func(*_extract_meta(args, True), **_extract_meta(kwargs, True))
File "<ipython-input-198-8857a48ba1e5>", line 2, in <lambda>
ddf.map_partitions(lambda df : df.apply(lambda x : elementsearch((x.b,x.c),axis=1)))
File "/opt/conda/lib/python3.6/site-packages/pandas/core/frame.py", line 6014, in apply
return op.get_result()
File "/opt/conda/lib/python3.6/site-packages/pandas/core/apply.py", line 318, in get_result
return super(FrameRowApply, self).get_result()
File "/opt/conda/lib/python3.6/site-packages/pandas/core/apply.py", line 142, in get_result
return self.apply_standard()
File "/opt/conda/lib/python3.6/site-packages/pandas/core/apply.py", line 248, in apply_standard
self.apply_series_generator()
File "/opt/conda/lib/python3.6/site-packages/pandas/core/apply.py", line 277, in apply_series_generator
results[i] = self.f(v)
File "<ipython-input-198-8857a48ba1e5>", line 2, in <lambda>
ddf.map_partitions(lambda df : df.apply(lambda x : elementsearch((x.b,x.c),axis=1)))
File "/opt/conda/lib/python3.6/site-packages/pandas/core/generic.py", line 4376, in __getattr__
return object.__getattribute__(self, name)
I looked at this question on Stack Overflow, but it did not work for me: "On Dask DataFrame.apply(), receiving n rows of value 1 before actual rows processed"
How can I fix this?
Solution
I suggest simply calling the apply method on the dask DataFrame itself, just as you did in your pandas code:
ddf.apply(lambda x: elementsearch(x.b, x.c), axis=1)