compute() in dask not working

Problem description

I'm trying a simple parallel computation in Dask. Here is my code:

  import time
  import dask
  import dask.distributed as distributed
  import dask.dataframe as dd
  import dask.delayed as delayed
  from dask.distributed import Client, progress

  # connect to the already running scheduler
  client = Client('localhost:8786')
  df = dd.read_csv('file.csv')
  # lazily build the grouped sum; nothing runs until compute()
  ddf = df.groupby(['col1'])[['col2']].sum()
  ddf = ddf.compute()
  print(ddf)

According to the documentation this looks fine, but when I run it I get this:

    Traceback (most recent call last):
      File "dask_prg1.py", line 17, in <module>
        ddf = ddf.compute()
      File "/usr/local/lib/python2.7/site-packages/dask/base.py", line 156, in compute
        (result,) = compute(self, traverse=False, **kwargs)
      File "/usr/local/lib/python2.7/site-packages/dask/base.py", line 402, in compute
        results = schedule(dsk, keys, **kwargs)
      File "/usr/local/lib/python2.7/site-packages/distributed/client.py", line 2159, in get
        direct=direct)
      File "/usr/local/lib/python2.7/site-packages/distributed/client.py", line 1562, in gather
        asynchronous=asynchronous)
      File "/usr/local/lib/python2.7/site-packages/distributed/client.py", line 652, in sync
        return sync(self.loop, func, *args, **kwargs)
      File "/usr/local/lib/python2.7/site-packages/distributed/utils.py", line 275, in sync
        six.reraise(*error[0])
      File "/usr/local/lib/python2.7/site-packages/distributed/utils.py", line 260, in f
        result[0] = yield make_coro()
      File "/usr/local/lib/python2.7/site-packages/tornado/gen.py", line 1099, in run
        value = future.result()
      File "/usr/local/lib/python2.7/site-packages/tornado/concurrent.py", line 260, in result
        raise_exc_info(self._exc_info)
      File "/usr/local/lib/python2.7/site-packages/tornado/gen.py", line 1107, in run
        yielded = self.gen.throw(*exc_info)
      File "/usr/local/lib/python2.7/site-packages/distributed/client.py", line 1439, in _gather
        traceback)
      File "/usr/local/lib/python2.7/site-packages/dask/bytes/core.py", line 122, in read_block_from_file
        with lazy_file as f:
      File "/usr/local/lib/python2.7/site-packages/dask/bytes/core.py", line 166, in __enter__
        f = SeekableFile(self.fs.open(self.path, mode=mode))
      File "/usr/local/lib/python2.7/site-packages/dask/bytes/local.py", line 58, in open
        return open(self._normalize_path(path), mode=mode)
    IOError: [Errno 2] No such file or directory: 'file.csv'

I don't understand what is going wrong. Please help me fix this. Thanks in advance.

Tags: dataframe, dask, dask-distributed

Solution


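You probably want to pass an absolute file path to read_csv. The reason is that, with the distributed scheduler, the work of opening and reading the file is handed off to the dask workers, and a relative path such as 'file.csv' is resolved against each worker's current working directory. You probably did not start the workers from the same working directory as your script/session, so they cannot find the file.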
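A minimal sketch of that fix, assuming the scheduler address, 'file.csv', and the col1/col2 column names from the question; absolute_csv_path and main are hypothetical helper names, not part of dask. It resolves the path on the client with os.path.abspath before handing it to dd.read_csv, and uses Client.run(os.getcwd) to print each worker's working directory so you can compare it with the script's. Note that an absolute path only helps if the file exists at that same path on every worker machine (for example a local cluster or a shared filesystem).

```python
import os


def absolute_csv_path(path):
    """Return `path` as an absolute path, resolved against the client's cwd.

    Dask workers open the file themselves, so a relative path such as
    'file.csv' would otherwise be resolved against each worker's own
    working directory, which may differ from the script's.
    """
    abs_path = os.path.abspath(path)
    if not os.path.exists(abs_path):
        # Fail early on the client instead of inside a worker task.
        # This only checks the client machine; remote workers still need
        # the file at the same absolute path.
        raise IOError("No such file: %r" % abs_path)
    return abs_path


def main(scheduler_address="localhost:8786", csv_path="file.csv"):
    # Imported here so the path helper above works without dask installed.
    import dask.dataframe as dd
    from dask.distributed import Client

    client = Client(scheduler_address)

    # Diagnostic: compare the client's working directory with the workers'.
    # Client.run calls the function on every worker and returns a dict
    # mapping worker address -> return value.
    print("client cwd: {}".format(os.getcwd()))
    print("worker cwds: {}".format(client.run(os.getcwd)))

    df = dd.read_csv(absolute_csv_path(csv_path))
    result = df.groupby(["col1"])[["col2"]].sum().compute()
    print(result)
```

Call main() (or, for example, main("localhost:8786", "/path/to/file.csv")) from the script in place of the original code. If the printed worker directories differ from the client's, that confirms why the relative path failed.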

