dataframe - compute() in dask is not working
Question
I am trying a simple parallel computation in Dask. Here is my code.
import time
import dask as dask
import dask.distributed as distributed
import dask.dataframe as dd
import dask.delayed as delayed
from dask.distributed import Client,progress
client = Client('localhost:8786')
df = dd.read_csv('file.csv')
ddf = df.groupby(['col1'])[['col2']].sum()
ddf = ddf.compute()
print ddf
According to the documentation this looks fine, but when I run it I get this:
Traceback (most recent call last):
File "dask_prg1.py", line 17, in <module>
ddf = ddf.compute()
File "/usr/local/lib/python2.7/site-packages/dask/base.py", line 156, in compute
(result,) = compute(self, traverse=False, **kwargs)
File "/usr/local/lib/python2.7/site-packages/dask/base.py", line 402, in compute
results = schedule(dsk, keys, **kwargs)
File "/usr/local/lib/python2.7/site-packages/distributed/client.py", line 2159, in get
direct=direct)
File "/usr/local/lib/python2.7/site-packages/distributed/client.py", line 1562, in gather
asynchronous=asynchronous)
File "/usr/local/lib/python2.7/site-packages/distributed/client.py", line 652, in sync
return sync(self.loop, func, *args, **kwargs)
File "/usr/local/lib/python2.7/site-packages/distributed/utils.py", line 275, in sync
six.reraise(*error[0])
File "/usr/local/lib/python2.7/site-packages/distributed/utils.py", line 260, in f
result[0] = yield make_coro()
File "/usr/local/lib/python2.7/site-packages/tornado/gen.py", line 1099, in run
value = future.result()
File "/usr/local/lib/python2.7/site-packages/tornado/concurrent.py", line 260, in result
raise_exc_info(self._exc_info)
File "/usr/local/lib/python2.7/site-packages/tornado/gen.py", line 1107, in run
yielded = self.gen.throw(*exc_info)
File "/usr/local/lib/python2.7/site-packages/distributed/client.py", line 1439, in _gather
traceback)
File "/usr/local/lib/python2.7/site-packages/dask/bytes/core.py", line 122, in read_block_from_file
with lazy_file as f:
File "/usr/local/lib/python2.7/site-packages/dask/bytes/core.py", line 166, in __enter__
f = SeekableFile(self.fs.open(self.path, mode=mode))
File "/usr/local/lib/python2.7/site-packages/dask/bytes/local.py", line 58, in open
return open(self._normalize_path(path), mode=mode)
IOError: [Errno 2] No such file or directory: 'file.csv'
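I don't understand what went wrong. Please help me solve this. Thanks in advance.
Answer
You may want to pass an absolute file path to read_csv. The reason is that you are handing the work of opening and reading the file to the dask workers, and those workers may not have been started in the same working directory as your script/session. A relative path such as 'file.csv' is resolved by each worker against its own working directory, which is why the IOError is raised on the worker side.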
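As a rough illustration of that advice, here is a minimal sketch that resolves the path on the client side before handing it to read_csv. The helper names to_absolute, grouped_sum and worker_cwds are hypothetical and not part of dask; the column names and 'file.csv' are taken from the question. It assumes the workers can see the same filesystem as the client (same machine or a shared mount).

```python
import os


def to_absolute(path):
    # Resolve the path against the client's current working directory,
    # so the task no longer depends on where each worker was started.
    return os.path.abspath(path)


def grouped_sum(csv_path):
    # Hypothetical helper, not part of dask. dask is imported lazily so
    # that to_absolute() can be used and checked without a cluster.
    import dask.dataframe as dd

    df = dd.read_csv(to_absolute(csv_path))
    return df.groupby(['col1'])[['col2']].sum().compute()


def worker_cwds(client):
    # Client.run calls the function on every worker and returns a dict
    # keyed by worker address, showing each worker's working directory.
    return client.run(os.getcwd)
```

With the client from the question already created, calling grouped_sum('file.csv') would stand in for the read_csv, groupby and compute lines, and worker_cwds(client) can be used to check whether the workers' directories really differ from the script's. If the workers run on other machines without a shared filesystem, an absolute path alone is not enough, and the file has to live somewhere every worker can reach.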