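python - How to split data into train/test sets based on labels?
Problem description
How can I split data into training and test sets based on the label? The labels are 1 and 0, and I want to use all rows labeled 1 as the training set and all rows labeled 0 as the test set. The csv file looks like this: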
1 Pixar classic is one of the best kids' movies of all time.
1 Apesar de representar um imenso avanço tecnológico, a força do filme reside no carisma de seus personagens e no charme de sua história.
1 When Woody perks up in the opening scene, it's not only the toy cowboy who comes alive - we're watching the rebirth of an art form.
0 The humans are wooden, the computer-animals have that floating, jerky gait of animated fauna.
1 Introduced not one but two indelible characters to the pop culture pantheon: cowboy rag-doll Woody (Tom Hanks) and plastic space ranger Buzz Lightyear (Tim Allen). [Blu-ray]
1 it is easy to see how virtually everything that is good in animation right now has some small seed in Toy Story
0 All the effects in the world can't disguise the thin plot.
1 Though some of the animation seems dated compared to later Pixar efforts and not nearly as detailed, what's here is done impeccably well.
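Taken literally, the request needs no random splitting at all: boolean indexing on the label column is enough. The sketch below assumes each line of the file is a 0/1 label followed by whitespace and then the review text (the actual delimiter is not visible in the sample, so this is an assumption). It reads from an in-memory string holding four of the sample lines so that it runs as-is; the real file name is not given in the question.

```python
import io

import pandas as pd

# Stand-in for the file contents (four lines taken from the sample above).
raw = io.StringIO(
    "1 Pixar classic is one of the best kids' movies of all time.\n"
    "0 The humans are wooden, the computer-animals have that floating, jerky gait of animated fauna.\n"
    "1 it is easy to see how virtually everything that is good in animation right now has some small seed in Toy Story\n"
    "0 All the effects in the world can't disguise the thin plot.\n"
)

# Assumption: the label is the first whitespace-separated token on each line
# and everything after it is the review text.
rows = [line.rstrip("\n").split(maxsplit=1) for line in raw if line.strip()]
df = pd.DataFrame(rows, columns=["label", "text"])
df["label"] = df["label"].astype(int)

# Boolean masks: every 1 goes to the training set, every 0 to the test set
train = df[df["label"] == 1]
test = df[df["label"] == 0]

print(len(train), len(test))  # → 2 2
```

If the real file separates the label from the text with a tab, `pd.read_csv(path, sep="\t", header=None, names=["label", "text"])` would likely replace the manual parsing.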
Solution
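You usually don't want to split this way: a model trained only on rows labeled 1 never sees a 0, so it cannot learn to tell the two classes apart. Still, the following solution works. It separates the rows by label and then splits each group further with train_test_split. I only tried it on a very small DataFrame, but it seems to do the job.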
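import pandas as pd
from sklearn.model_selection import train_test_split

# Small example DataFrame: 'S' and 'P' stand in for the two labels
Df = pd.DataFrame()
Df['label'] = ['S', 'S', 'S', 'P', 'P', 'S', 'P', 'S']
Df['value'] = [1, 2, 3, 4, 5, 6, 7, 8]
Df  # displays the DataFrame in a notebook

# Separate the rows by label
X = Df[Df.label == 'S']
Y = Df[Df.label == 'P']

# train_test_split on a single DataFrame returns [train part, test part]:
# xtrain/ytrain are both drawn from the 'S' rows, xtest/ytest from the 'P' rows
xtrain, ytrain = train_test_split(X, test_size=0.3, random_state=25, shuffle=True)
xtest, ytest = train_test_split(Y, test_size=0.3, random_state=25, shuffle=True)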
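The variable names in the answer's code are easy to misread, because train_test_split given one DataFrame returns a two-element list `[train part, test part]` of that same DataFrame. A sketch of the same steps with descriptive, hypothetical names (`s_train_part`, `s_test_part`, and so on) makes this explicit and checks that the two parts of the 'S' rows together cover every 'S' row exactly once:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Same toy data as in the answer
df = pd.DataFrame({
    "label": ["S", "S", "S", "P", "P", "S", "P", "S"],
    "value": [1, 2, 3, 4, 5, 6, 7, 8],
})

s_rows = df[df.label == "S"]
p_rows = df[df.label == "P"]

# One input DataFrame -> a two-element list: [train part, test part]
s_train_part, s_test_part = train_test_split(s_rows, test_size=0.3, random_state=25, shuffle=True)
p_train_part, p_test_part = train_test_split(p_rows, test_size=0.3, random_state=25, shuffle=True)

# Both parts come from the S rows; together they cover every S row exactly once
print(sorted(s_train_part.index.tolist() + s_test_part.index.tolist()))  # → [0, 1, 2, 5, 7]
```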
I got the following result:
xtrain
label value
5 S 6
2 S 3
7 S 8
xtest
label value
6 P 7
3 P 4
ytest
label value
4 P 5
ytrain
label value
0 S 1
1 S 2
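The row counts above (3 and 2 for the five 'S' rows, 2 and 1 for the three 'P' rows) follow from how a float `test_size` is rounded: the test part gets ceil(test_size × n) rows and the training part gets the remainder when `train_size` is left unset. The helper below is a hypothetical sketch that mirrors that rounding from scikit-learn's source; it is not a public scikit-learn function.

```python
import math


def split_sizes(n_samples: int, test_size: float) -> tuple:
    """Return (n_train, n_test) for a float test_size with train_size unset:
    the test count is rounded up and the training part gets the remainder."""
    n_test = math.ceil(test_size * n_samples)
    return n_samples - n_test, n_test


print(split_sizes(5, 0.3))  # → (3, 2): the 5 'S' rows
print(split_sizes(3, 0.3))  # → (2, 1): the 3 'P' rows
```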