Problem building a Docker image with the pyspark lib

Problem description

I am trying to build a Docker image using s2i and Jenkins. My requirements.txt file lists the following dependencies:

scikit-learn==0.21.2
scipy==0.18.1
pandas==0.24.2
seldon-core==0.3.0
pypandoc
pyspark==2.4.1

However, the build fails when it tries to install pyspark, with the following error message:

Downloading https://repo.company.com/repository/pypi-all/packages/f2/64/a1df4440483df47381bbbf6a03119ef66515cf2e1a766d9369811575454b/pyspark-2.4.1.tar.gz (215.7MB)
Complete output from command python setup.py egg_info:
Could not import pypandoc - required to package PySpark
Download error on https://pypi.org/simple/pypandoc/: [Errno 97] Address family not supported by protocol -- Some packages may not be found!
Couldn't find index page for 'pypandoc' (maybe misspelled?)
Download error on https://pypi.org/simple/: [Errno 97] Address family not supported by protocol -- Some packages may not be found!
No local packages or working download links found for pypandoc
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/tmp/pip-install-dra7nhke/pyspark/setup.py", line 224, in <module>
'Programming Language :: Python :: Implementation :: PyPy']
File "/usr/local/lib/python3.6/site-packages/setuptools/__init__.py", line 144, in setup
_install_setup_requires(attrs)
File "/usr/local/lib/python3.6/site-packages/setuptools/__init__.py", line 139, in _install_setup_requires
...

I listed pypandoc before pyspark in requirements.txt, but it looks like pypandoc, which pyspark depends on, has not been installed yet when pyspark is installed. What is the problem?

Tags: docker, jenkins, pyspark, pip, s2i

Solution


I got past this error by running:

pip install pypandoc

before installing pyspark. I tried many times to pin pypandoc==1.4 in requirements.txt instead, but that did not work.
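The reason the order in requirements.txt does not help is probably that pip prepares every source package (the `python setup.py egg_info` step in the log) before it installs any of them, so pyspark's setup.py runs while pypandoc is still missing. Below is a minimal sketch of the workaround as a two-phase install. The helper names (`parse_requirement`, `split_requirements`, `install_in_two_phases`) and the `BUILD_PREREQS` set are hypothetical, and the parsing assumes plain `name==version` lines like the ones in the question.

```python
import subprocess
import sys

# Assumption: packages that must already be installed before the rest
# of the file is handed to pip. Only pypandoc is known from the question.
BUILD_PREREQS = {"pypandoc"}


def parse_requirement(line):
    """Return (name, spec) for a simple requirement line, else None.

    Blank lines, comments, and option lines (starting with "-") are
    skipped; this sketch does not forward pip options from the file.
    """
    spec = line.split("#", 1)[0].strip()
    if not spec or spec.startswith("-"):
        return None
    name = spec
    for index, char in enumerate(spec):
        if char in "=<>!~[; ":
            name = spec[:index]
            break
    return name.lower(), spec


def split_requirements(lines, first=BUILD_PREREQS):
    """Split requirement lines into (install-first, install-after) lists."""
    early, late = [], []
    for line in lines:
        parsed = parse_requirement(line)
        if parsed is None:
            continue
        name, spec = parsed
        (early if name in first else late).append(spec)
    return early, late


def install_in_two_phases(path="requirements.txt"):
    """Install the prerequisites with one pip call, then everything else."""
    with open(path) as handle:
        early, late = split_requirements(handle.readlines())
    for batch in (early, late):
        if batch:
            subprocess.check_call(
                [sys.executable, "-m", "pip", "install", *batch]
            )
```

Calling `install_in_two_phases()` from the build step has the same effect as running `pip install pypandoc` by hand and then installing the remaining requirements. Both pip calls use whatever index pip is already configured with.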

These sources use the same approach: https://hub.docker.com/r/takaomag/test-0/dockerfile, https://www.ibm.com/support/knowledgecenter/el/SSWTQQ_2.0.3/install/t_si_pythonpackagesoffline.html, and "Unable to install pyspark".

