Extracting smaller tables from a pandas pivot table

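Problem description

I want to split the following pivot table into training and test sets (to evaluate a recommender system), and I am thinking of extracting two tables with non-overlapping index values (userID) and column values (ISBN). How can I split it correctly? Thank you.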

Tags: python, pandas, pivot-table, train-test-split

Solution


As suggested by @moys, you can use train_test_split from scikit-learn after first splitting your DataFrame's columns into groups with non-overlapping column names.

Example:

import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split

Generate data:

df = pd.DataFrame(np.random.randint(0,100,size=(100, 4)), columns=list('ABCD'))

Split the df columns in some way, e.g. in half:

cols = len(df.columns) // 2
df_A = df.iloc[:, :cols]
df_B = df.iloc[:, cols:]

Use train_test_split:

train_A, test_A = train_test_split(df_A, test_size=0.33)
train_B, test_B = train_test_split(df_B, test_size=0.33)
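Note that the two calls above shuffle the rows independently, so a pair such as train_A and test_B has different columns but may still share row labels. If the goal is a train table and a test table that overlap in neither index nor columns, one possible variation is to split the row labels once and the column labels once, then select the two blocks with .loc. The following is only a sketch under assumptions: the name pivot, the user_ labels, and the random_state values are hypothetical stand-ins, not part of the original answer.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy stand-in for the pivot table (hypothetical data):
# rows play the role of userID, columns the role of ISBN.
rng = np.random.default_rng(0)
pivot = pd.DataFrame(
    rng.integers(0, 100, size=(100, 4)),
    index=[f"user_{i}" for i in range(100)],
    columns=list("ABCD"),
)

# Split the row labels once and the column labels once.
train_rows, test_rows = train_test_split(
    pivot.index.to_numpy(), test_size=0.33, random_state=0
)
train_cols, test_cols = train_test_split(
    pivot.columns.to_numpy(), test_size=0.5, random_state=0
)

# Select the two blocks; they share neither row labels nor column labels.
train = pivot.loc[train_rows, train_cols]
test = pivot.loc[test_rows, test_cols]
```

The trade-off is that the cells in the two off-diagonal blocks (train rows with test columns, and test rows with train columns) are left out of both tables, so whether this fits depends on how the recommender is evaluated.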

