Why does an entity index show up as Id instead of Index?

Problem description

The code below defines an EntitySet. I have declared did as the Index on the transactions table (tx), but it registers as Id rather than Index. Why is that?

The goal is to get rid of the warning shown further down.

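Under what circumstances is an Index assignment overridden to Id (primary key vs. foreign key?), and is the fact that did registers as Id related to the warning?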

One uid can have multiple dids in the tx table.

import featuretools as ft

es = ft.EntitySet(id="the_entity_set")

# hse
es = es.entity_from_dataframe(entity_id="hse",
                              dataframe=hse,
                              index="uid",
                              variable_types={"Gender": ft.variable_types.Categorical,
                                              "Income": ft.variable_types.Numeric,
                                              "dob"   : ft.variable_types.Datetime})

# types
es = es.entity_from_dataframe(entity_id="types",
                              dataframe=types,
                              index="type_id",
                              variable_types={"type": ft.variable_types.Categorical})

# files
es = es.entity_from_dataframe(entity_id="files",
                              dataframe=files,
                              index="file_id",
                              variable_types={"file": ft.variable_types.Categorical})

# uid_donations
es = es.entity_from_dataframe(entity_id="uid_txlup",
                              dataframe=uid_txlup,
                              index="did",
                              variable_types={"uid": ft.variable_types.Categorical})

# transactions
es = es.entity_from_dataframe(entity_id="tx",
                              dataframe=tx,
                              index="did",
                              time_index="dt",
                              variable_types={"file_id": ft.variable_types.Categorical,
                                              "type_id": ft.variable_types.Categorical,
                                              "amt":     ft.variable_types.Numeric})

rels = [
    ft.Relationship(es["files"]["file_id"],es["tx"]["file_id"]),
    ft.Relationship(es["types"]["type_id"],es["tx"]["type_id"]),
    ft.Relationship(es["hse"]["uid"],      es["uid_txlup"]["uid"]),
    ft.Relationship(es["uid_txlup"]["did"],es["tx"]["did"])
]

es.add_relationships( rels )

This is what the EntitySet looks like:

Entityset: the_entity_set
  Entities:
    hse [Rows: 100, Columns: 4]
    types [Rows: 8, Columns: 2]
    files [Rows: 2, Columns: 2]
    uid_txlup [Rows: 336, Columns: 2]
    tx [Rows: 336, Columns: 5]
  Relationships:
    tx.file_id -> files.file_id
    tx.type_id -> types.type_id
    uid_txlup.uid -> hse.uid
    tx.did -> uid_txlup.did


es.entities

[Entity: hse
   Variables:
     uid (dtype: index)
     Gender (dtype: categorical)
     Income (dtype: numeric)
     dob (dtype: datetime)
   Shape:
     (Rows: 100, Columns: 4), Entity: types
   Variables:
     type_id (dtype: index)
     type (dtype: categorical)
   Shape:
     (Rows: 8, Columns: 2), Entity: files
   Variables:
     file_id (dtype: index)
     file (dtype: categorical)
   Shape:
     (Rows: 2, Columns: 2), Entity: uid_txlup
   Variables:
     did (dtype: index)
     uid (dtype: categorical)
   Shape:
     (Rows: 336, Columns: 2), Entity: tx
   Variables:
     did (dtype: id)            ### <<< external key ???
     dt (dtype: datetime)
     file_id (dtype: categorical)
     type_id (dtype: categorical)
     amt (dtype: numeric)
   Shape:
     (Rows: 336, Columns: 5)]

Why does did show up as Id rather than Index in the listing above?

This is the warning:

feature_matrix, feature_defs = ft.dfs(entityset=es,
                                      target_entity="hse",
                                      agg_primitives=["sum","mode","percent_true"],
                                      where_primitives=["count", "avg_time_between"],
                                      max_depth=2)

feature_defs


.../anaconda3/lib/python3.6/site-packages/featuretools-0.2.1-py3.6.egg/featuretools/entityset/entityset.py:432: FutureWarning: 'did' is both an index level and a column label.
Defaulting to column, but this will raise an ambiguity error in a future version
  end_entity_id=child_eid)

Tags: featuretools

Solution


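A relationship in an entity set is always between an Index variable in the parent entity and an Id variable in the child entity. So regardless of what you specify, when you add the relationship, featuretools automatically converts the variable in the child entity to type Id.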
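To see which side of a relationship is the parent, it can help to check key uniqueness with plain pandas before building the EntitySet. The sketch below uses small made-up frames (hypothetical data, not the asker's actual hse and uid_txlup tables): the parent key must be unique so it can act as the Index, while the child key may repeat and therefore plays the Id (foreign key) role.

```python
import pandas as pd

# Hypothetical stand-ins for the asker's hse and uid_txlup frames.
hse = pd.DataFrame({"uid": [1, 2, 3], "Income": [10.0, 20.0, 30.0]})
uid_txlup = pd.DataFrame({"did": [100, 101, 102, 103], "uid": [1, 1, 2, 3]})

# Parent side: the key is unique, so it can serve as the Index.
parent_key_unique = hse["uid"].is_unique

# Child side: the same key may repeat, so it acts as an Id (foreign key).
child_key_unique = uid_txlup["uid"].is_unique

# Every child key should point at an existing parent row.
all_children_have_parent = bool(uid_txlup["uid"].isin(hse["uid"]).all())

print(parent_key_unique, child_key_unique, all_children_have_parent)
```

If the child key turns out to be unique as well, the two tables are one-to-one on that key.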

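If there is a one-to-one relationship between two entities, a variable could be both an Index and an Id. In that case, you should merge the two entities into one.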
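As a minimal sketch of that merge, assuming uid_txlup and tx really are one-to-one on did (the row counts in the question, 336 and 336, suggest so but do not prove it), the lookup table can be folded into tx with pandas before any entity is created. The frames below are hypothetical stand-ins with invented values.

```python
import pandas as pd

# Hypothetical stand-ins for the asker's uid_txlup and tx frames.
uid_txlup = pd.DataFrame({"did": [100, 101, 102], "uid": [1, 1, 2]})
tx = pd.DataFrame({
    "did": [100, 101, 102],
    "dt": pd.to_datetime(["2018-01-01", "2018-01-02", "2018-01-03"]),
    "amt": [5.0, 7.5, 3.0],
})

# validate="one_to_one" makes pandas raise if did repeats on either side.
tx_merged = tx.merge(uid_txlup, on="did", how="left", validate="one_to_one")

# did is still unique, so it can stay the Index of the merged tx entity,
# while uid becomes the column that relates tx directly to hse.
merged_columns = tx_merged.columns.tolist()
print(merged_columns)
```

With the merged frame, the separate uid_txlup entity and the uid_txlup.did to tx.did relationship would no longer be needed. Following the call style used in the question, a relationship such as ft.Relationship(es["hse"]["uid"], es["tx"]["uid"]) would presumably take their place, leaving did as the Index of tx only.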

