Can Featuretools be used on a vaex DataFrame?

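Problem description

I'm trying to use automated feature engineering. I have it working on plain DataFrames, but I'm not sure how to do it on out-of-core DataFrames such as vaex. My goal is to find a way to use automated feature engineering when the DataFrame exceeds memory.

I'm wondering whether anyone has had success with this. Here is what I'm doing (code):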

#playing with vaex
#install items
# !pip install vaex
# !pip install --upgrade ipython
# !pip install numpy --upgrade
#if using colab you may have to restart your runtime
!pip install featuretools

#import items
import featuretools as ft
import vaex
import pandas as pd
vaex.multithreading.thread_count_default = 8
import vaex.ml

# Load the titanic dataset
df = vaex.ml.datasets.load_titanic()

# See the description
df.info()

# let's try to use featuretools to scale out the features on vaex
es = ft.EntitySet(id = 'titanic_data')
es = es.entity_from_dataframe(entity_id = 'df', dataframe = df.drop(['Survived']), 
                              variable_types = 
                              {
                                  'Embarked': ft.variable_types.Categorical,
                                  'Sex': ft.variable_types.Boolean,
                                  'Title': ft.variable_types.Categorical,
                                  'Family_Size': ft.variable_types.Numeric,
                                  'LastName': ft.variable_types.Categorical
                              },
                              index = 'PassengerId')

I get this error:

KeyError                                  Traceback (most recent call last)
<ipython-input-6-55607b93fccd> in <module>
      1 # let's try to use featuretools to scale out the features on vaex
      2 es = ft.EntitySet(id = 'titanic_data')
----> 3 es = es.entity_from_dataframe(entity_id = 'df', dataframe = df.drop(['Survived']), 
      4                               variable_types =
      5                               {

1 frames
/usr/local/lib/python3.7/dist-packages/vaex/dataframe.py in drop(self, columns, inplace, check)
   4758                 df._hide_column(column)
   4759             else:
-> 4760                 df._real_drop(column)
   4761         return df
   4762 

/usr/local/lib/python3.7/dist-packages/vaex/dataframe.py in _real_drop(self, item)
   4733             self.column_names.remove(name)
   4734         else:
-> 4735             raise KeyError('no such column or virtual_columns named %r' % name)
   4736         self.signal_column_changed.emit(self, name, "delete")
   4737         if hasattr(self, name):

KeyError: "no such column or virtual_columns named 'Survived'"

Is it possible to do what I'm doing? Is there a mistake in my approach, or is there another way to do this?

Tags: python, pandas, featuretools, vaex

Solution


Currently, Featuretools does not work with vaex DataFrames. There is some support for Dask and Koalas DataFrames.
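
Separately, the traceback in the question is raised inside vaex's own `drop` call, so it occurs before Featuretools receives any data: the DataFrame simply has no column named exactly 'Survived'. One possible cause is a name that differs only in case. The following is a minimal plain-Python sketch of checking requested names against the names a DataFrame reports. The helper `find_missing_columns` and the sample column list are hypothetical and only for illustration; with vaex, the real list could be obtained from `df.get_column_names()`.

```python
def find_missing_columns(requested, available):
    """Map each requested name that is absent from `available` to a
    case-insensitive match if one exists, otherwise to None."""
    # Index the available names by their lowercase form for lookup.
    by_lower = {name.lower(): name for name in available}
    missing = {}
    for name in requested:
        if name not in available:
            missing[name] = by_lower.get(name.lower())
    return missing


# Hypothetical column names, used here only to illustrate the check.
available = ["pclass", "survived", "name", "sex", "age"]

report = find_missing_columns(["Survived", "Title"], available)
for wanted, suggestion in report.items():
    if suggestion is None:
        print(f"{wanted!r} not found")
    else:
        print(f"{wanted!r} not found; did you mean {suggestion!r}?")
```

Running a check like this before calling `drop` shows which names need correcting. It does not change the answer above: even with valid column names, Featuretools would still need a supported DataFrame type.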

