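pip - Airflow 1.10 installation fails
Problem description
I have a working Airflow environment running Airflow 1.9 on an Amazon EC2 instance. I need to upgrade to the latest version of Airflow, 1.10. I can either upgrade from 1.9 or do a fresh install of 1.10 on a new server. Airflow 1.10 is not listed on pip, so I installed it from Git with this command,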
pip-3.6 install git+git://github.com/apache/incubator-airflow.git@v1-10-stable
This command installs Airflow 1.10 successfully. You can confirm this by running
airflow version
and looking at the output,
  ____________       _____________
 ____    |__( )_________  __/__  /________      __
____  /| |_  /__  ___/_  /_ __  /_  __ \_ | /| / /
___  ___ |  / _  /   _  __/ _  / / /_/ /_ |/ |/ /
 _/_/  |_/_/  /_/    /_/    /_/  \____/____/|__/
v1.10.0
When I try to start the Airflow scheduler,
airflow scheduler
I get the following exception,
ModuleNotFoundError: No module named 'MySQLdb'
[2018-08-14 14:03:16,195] {celery_executor.py:112} ERROR - Error syncing the celery executor, ignoring it:
[2018-08-14 14:03:16,195] {celery_executor.py:113} ERROR - No module named 'MySQLdb'
Traceback (most recent call last):
File "/usr/local/lib/python3.6/site-packages/airflow/executors/celery_executor.py", line 94, in sync
state = task.state
File "/usr/local/lib/python3.6/site-packages/celery/result.py", line 471, in state
return self._get_task_meta()['status']
File "/usr/local/lib/python3.6/site-packages/celery/result.py", line 410, in _get_task_meta
return self._maybe_set_cache(self.backend.get_task_meta(self.id))
File "/usr/local/lib/python3.6/site-packages/celery/backends/base.py", line 365, in get_task_meta
meta = self._get_task_meta_for(task_id)
File "/usr/local/lib/python3.6/site-packages/celery/backends/database/__init__.py", line 53, in _inner
return fun(*args, **kwargs)
File "/usr/local/lib/python3.6/site-packages/celery/backends/database/__init__.py", line 122, in _get_task_meta_for
session = self.ResultSession()
File "/usr/local/lib/python3.6/site-packages/celery/backends/database/__init__.py", line 99, in ResultSession
**self.engine_options)
File "/usr/local/lib/python3.6/site-packages/celery/backends/database/session.py", line 59, in session_factory
engine, session = self.create_session(dburi, **kwargs)
File "/usr/local/lib/python3.6/site-packages/celery/backends/database/session.py", line 45, in create_session
engine = self.get_engine(dburi, **kwargs)
File "/usr/local/lib/python3.6/site-packages/celery/backends/database/session.py", line 42, in get_engine
return create_engine(dburi, poolclass=NullPool)
File "/usr/local/lib64/python3.6/site-packages/sqlalchemy/engine/__init__.py", line 391, in create_engine
return strategy.create(*args, **kwargs)
File "/usr/local/lib64/python3.6/site-packages/sqlalchemy/engine/strategies.py", line 80, in create
dbapi = dialect_cls.dbapi(**dbapi_args)
File "/usr/local/lib64/python3.6/site-packages/sqlalchemy/dialects/mysql/mysqldb.py", line 110, in dbapi
return __import__('MySQLdb')
ModuleNotFoundError: No module named 'MySQLdb'
[2018-08-14 14:03:16,196] {celery_executor.py:112} ERROR - Error syncing the celery executor, ignoring it:
[2018-08-14 14:03:16,196] {celery_executor.py:113} ERROR - No module named 'MySQLdb'
Traceback (most recent call last):
File "/usr/local/lib/python3.6/site-packages/airflow/executors/celery_executor.py", line 94, in sync
state = task.state
File "/usr/local/lib/python3.6/site-packages/celery/result.py", line 471, in state
return self._get_task_meta()['status']
File "/usr/local/lib/python3.6/site-packages/celery/result.py", line 410, in _get_task_meta
return self._maybe_set_cache(self.backend.get_task_meta(self.id))
File "/usr/local/lib/python3.6/site-packages/celery/backends/base.py", line 365, in get_task_meta
meta = self._get_task_meta_for(task_id)
File "/usr/local/lib/python3.6/site-packages/celery/backends/database/__init__.py", line 53, in _inner
return fun(*args, **kwargs)
File "/usr/local/lib/python3.6/site-packages/celery/backends/database/__init__.py", line 122, in _get_task_meta_for
session = self.ResultSession()
File "/usr/local/lib/python3.6/site-packages/celery/backends/database/__init__.py", line 99, in ResultSession
**self.engine_options)
File "/usr/local/lib/python3.6/site-packages/celery/backends/database/session.py", line 59, in session_factory
engine, session = self.create_session(dburi, **kwargs)
File "/usr/local/lib/python3.6/site-packages/celery/backends/database/session.py", line 45, in create_session
engine = self.get_engine(dburi, **kwargs)
File "/usr/local/lib/python3.6/site-packages/celery/backends/database/session.py", line 42, in get_engine
return create_engine(dburi, poolclass=NullPool)
File "/usr/local/lib64/python3.6/site-packages/sqlalchemy/engine/__init__.py", line 391, in create_engine
return strategy.create(*args, **kwargs)
File "/usr/local/lib64/python3.6/site-packages/sqlalchemy/engine/strategies.py", line 80, in create
dbapi = dialect_cls.dbapi(**dbapi_args)
File "/usr/local/lib64/python3.6/site-packages/sqlalchemy/dialects/mysql/mysqldb.py", line 110, in dbapi
return __import__('MySQLdb')
ModuleNotFoundError: No module named 'MySQLdb'
[2018-08-14 14:03:16,197] {celery_executor.py:112} ERROR - Error syncing the celery executor, ignoring it:
[2018-08-14 14:03:16,197] {celery_executor.py:113} ERROR - No module named 'MySQLdb'
Traceback (most recent call last):
File "/usr/local/lib/python3.6/site-packages/airflow/executors/celery_executor.py", line 94, in sync
state = task.state
File "/usr/local/lib/python3.6/site-packages/celery/result.py", line 471, in state
return self._get_task_meta()['status']
File "/usr/local/lib/python3.6/site-packages/celery/result.py", line 410, in _get_task_meta
return self._maybe_set_cache(self.backend.get_task_meta(self.id))
File "/usr/local/lib/python3.6/site-packages/celery/backends/base.py", line 365, in get_task_meta
meta = self._get_task_meta_for(task_id)
File "/usr/local/lib/python3.6/site-packages/celery/backends/database/__init__.py", line 53, in _inner
return fun(*args, **kwargs)
File "/usr/local/lib/python3.6/site-packages/celery/backends/database/__init__.py", line 122, in _get_task_meta_for
session = self.ResultSession()
File "/usr/local/lib/python3.6/site-packages/celery/backends/database/__init__.py", line 99, in ResultSession
**self.engine_options)
File "/usr/local/lib/python3.6/site-packages/celery/backends/database/session.py", line 59, in session_factory
engine, session = self.create_session(dburi, **kwargs)
File "/usr/local/lib/python3.6/site-packages/celery/backends/database/session.py", line 45, in create_session
engine = self.get_engine(dburi, **kwargs)
File "/usr/local/lib/python3.6/site-packages/celery/backends/database/session.py", line 42, in get_engine
return create_engine(dburi, poolclass=NullPool)
File "/usr/local/lib64/python3.6/site-packages/sqlalchemy/engine/__init__.py", line 391, in create_engine
return strategy.create(*args^C[2018-08-14 14:03:16,424] {jobs.py:1585} INFO - Exited execute loop
[2018-08-14 14:03:16,433] {jobs.py:1599} INFO - Terminating child PID: 13615
Here are the contents of my lib folder,
[/usr/local/lib/python3.6/site-packages]# cd /usr/local/lib64/python3.6/site-packages/sqlalchemy/
root@ip-1-2-3-4
[/usr/local/lib64/python3.6/site-packages/sqlalchemy]# ll
total 320
drwxr-xr-x 3 root root 4096 Aug 13 17:17 connectors
-rwxr-xr-x 1 root root 40456 Aug 13 17:17 cprocessors.cpython-36m-x86_64-linux-gnu.so
-rwxr-xr-x 1 root root 51408 Aug 13 17:17 cresultproxy.cpython-36m-x86_64-linux-gnu.so
-rwxr-xr-x 1 root root 21944 Aug 13 17:17 cutils.cpython-36m-x86_64-linux-gnu.so
drwxr-xr-x 3 root root 4096 Aug 13 17:17 databases
drwxr-xr-x 10 root root 4096 Aug 13 17:17 dialects
drwxr-xr-x 3 root root 4096 Aug 13 17:17 engine
drwxr-xr-x 3 root root 4096 Aug 13 17:17 event
-rwxr-xr-x 1 root root 49746 Mar 6 14:01 events.py
-rwxr-xr-x 1 root root 12030 Mar 6 14:01 exc.py
drwxr-xr-x 4 root root 4096 Aug 13 17:17 ext
-rwxr-xr-x 1 root root 2249 Mar 6 14:01 __init__.py
-rwxr-xr-x 1 root root 3093 Mar 6 14:01 inspection.py
-rwxr-xr-x 1 root root 10967 Mar 6 14:01 interfaces.py
-rwxr-xr-x 1 root root 6712 Mar 6 14:01 log.py
drwxr-xr-x 3 root root 4096 Aug 13 17:17 orm
-rwxr-xr-x 1 root root 49883 Mar 6 14:01 pool.py
-rwxr-xr-x 1 root root 5217 Mar 6 14:01 processors.py
drwxr-xr-x 2 root root 4096 Aug 13 17:17 __pycache__
-rwxr-xr-x 1 root root 1200 Mar 6 14:01 schema.py
drwxr-xr-x 3 root root 4096 Aug 13 17:17 sql
drwxr-xr-x 5 root root 4096 Aug 13 17:17 testing
-rwxr-xr-x 1 root root 1713 Mar 6 14:01 types.py
drwxr-xr-x 3 root root 4096 Aug 13 17:17 util
root@ip-1-2-3-4
[/usr/local/lib64/python3.6/site-packages/sqlalchemy]# pwd
/usr/local/lib64/python3.6/site-packages/sqlalchemy
root@ip-1-2-3-4
[/usr/local/lib64/python3.6/site-packages/sqlalchemy]# cd /usr/local/lib/python3.6/site-packages/sqlalchemy/
bash: cd: /usr/local/lib/python3.6/site-packages/sqlalchemy/: No such file or directory
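I am confused about why the Airflow installation did not take care of all the dependencies it needs. Did I install Airflow incorrectly? I really need to be on 1.10, because 1.9 has a major bug, as reported here and here.
Solution
When doing a fresh install, there are a number of install extras ("optional dependencies") you can supply. Airflow does not install all of them by default because there are dozens, and some need special dependencies such as Mesos or Kubernetes.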
https://airflow.readthedocs.io/en/stable/installation.html#extra-packages
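To see which extras an installed distribution declares, you can inspect its package metadata. This is only a minimal sketch: it assumes Python 3.8+ for importlib.metadata (the Python 3.6 in the question would need the importlib_metadata backport or pkg_resources instead), and declared_extras is a hypothetical helper name, not part of Airflow or pip.

```python
from importlib import metadata


def declared_extras(dist_name):
    """Return the extras a distribution declares, or [] if it is not installed.

    Hypothetical helper for illustration; not an Airflow or pip API.
    """
    try:
        meta = metadata.metadata(dist_name)
    except metadata.PackageNotFoundError:
        return []
    # Each extra appears as one "Provides-Extra" header in the metadata.
    return sorted(meta.get_all("Provides-Extra") or [])


if __name__ == "__main__":
    # Prints an empty list when apache-airflow is not installed.
    print(declared_extras("apache-airflow"))
```

The list only tells you which extras exist; it does not tell you which ones were requested when the package was installed.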
Note that for 1.10.0-1.10.2 you now need to prefix the install command with this env var, or export it first:
export SLUGIFY_USES_TEXT_UNIDECODE=yes
This is no longer needed for 1.10.3 and later.
Once 1.10 is released, you will be able to install extras like this:
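If you script the installation, the version rule above can be encoded so the variable is set only for the affected releases. A rough sketch, assuming a plain three-part version string; install_env and AFFECTED are hypothetical names introduced here for illustration.

```python
import os

# Releases that, per the note above, need the variable set at install time.
AFFECTED = {(1, 10, 0), (1, 10, 1), (1, 10, 2)}


def install_env(version, base_env=None):
    """Return a copy of the environment to use when installing `version`."""
    env = dict(os.environ if base_env is None else base_env)
    parts = tuple(int(p) for p in version.split(".")[:3])
    # Treat "1.10" as "1.10.0" by padding with zeros.
    parts = parts + (0,) * (3 - len(parts))
    if parts in AFFECTED:
        env["SLUGIFY_USES_TEXT_UNIDECODE"] = "yes"
    return env


if __name__ == "__main__":
    print("SLUGIFY_USES_TEXT_UNIDECODE" in install_env("1.10.2", {}))
    print("SLUGIFY_USES_TEXT_UNIDECODE" in install_env("1.10.3", {}))
```

The returned dict can be passed as the env argument of subprocess.check_call when invoking pip from a script.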
pip install apache-airflow[celery,devel,postgres]
When installing from git, the pip syntax for installing extras is a little more involved:
pip install git+git://github.com/apache/incubator-airflow.git@v1-10-stable#egg=apache-airflow[celery,devel,postgres]
If you are trying to install Airflow with MySQL support, you can include the mysql extra:
pip install git+git://github.com/apache/incubator-airflow.git@v1-10-stable#egg=apache-airflow[mysql]
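After reinstalling, you can check whether the module named in the traceback is now importable without starting the scheduler. A small sketch using only the standard library; missing_modules is a hypothetical helper, and the assumption here is that the mysql extra pulls in a driver that provides the MySQLdb module.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # MySQLdb is the module the traceback in the question fails to import.
    missing = missing_modules(["MySQLdb"])
    print("missing:", ", ".join(missing) if missing else "none")
```

Run it with the same interpreter that runs Airflow (python3.6 in the question); otherwise the result reflects a different site-packages directory.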
If you really do want to install every extra, you can use the all extra:
pip install git+git://github.com/apache/incubator-airflow.git@v1-10-stable#egg=apache-airflow[all]
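The three commands above follow one pattern: repository URL, branch, egg name, then the extras in square brackets. A sketch that assembles the string, with the repository and branch from this answer as defaults; git_requirement is a hypothetical helper name.

```python
def git_requirement(extras, ref="v1-10-stable",
                    repo="git://github.com/apache/incubator-airflow.git",
                    egg="apache-airflow"):
    """Build the 'git+<repo>@<ref>#egg=<name>[extras]' string used above."""
    spec = "git+{}@{}#egg={}".format(repo, ref, egg)
    if extras:
        # Extras are comma-separated, with no spaces, inside square brackets.
        spec += "[{}]".format(",".join(extras))
    return spec


if __name__ == "__main__":
    print(git_requirement(["celery", "devel", "postgres"]))
```

If you paste the result into a shell that treats square brackets as glob characters (zsh, for example), wrap it in quotes.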
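Note: if you previously installed apache-airflow 1.9 from PyPI with any extras, you need to supply them again here when installing 1.10 from GitHub, because pip does not associate the GitHub repository with the PyPI package.
Questions
- Are you running Python 3.6.5?
- If you include the mysql extra when installing, do you still get the same error?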