python - Splitting JSON data into training and test sets
Question
I am trying to fit a CNN to the HuffPost news dataset (https://www.kaggle.com/rmisra/news-category-dataset). The dataset is in JSON format. My data looks like this:
[{"category": "CRIME", "headline": "There Were 2 Mass Shootings In Texas Last Week, But Only 1 On TV", "authors": "Melissa Jeltsen", "link": "https://www.huffingtonpost.com/entry/texas-amanda-painter-mass-shooting_us_5b081ab4e4b0802d69caad89", "short_description": "She left her husband. He killed their children. Just another day in America.", "date": "2018-05-26"} , {"category": "ENTERTAINMENT", "headline": "Will Smith Joins Diplo And Nicky Jam For The 2018 World Cup's Official Song", "authors": "Andy McDonald", "link": "https://www.huffingtonpost.com/entry/will-smith-joins-diplo-and-nicky-jam-for-the-official-2018-world-cup-song_us_5b09726fe4b0fdb2aa541201", "short_description": "Of course it has a song.", "date": "2018-05-26"} ]
This is the code I am trying. It is based on https://www.kaggle.com/kredy10/simple-lstm-for-text-classification, and I want to fit an LSTM on this data.
import pandas as pd
import json
with open('News_Category_Dataset_v2.json', 'r') as f:
    train = json.load(f)
Now I want to split the data into training and test sets, but I don't know how to build the arrays to pass to the split. Can anyone help?
X_train,X_test,Y_train,Y_test = train_test_split(X,Y,test_size=0.15)
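Answer
This is how I did it: I first use train_test_split to separate training (70%) from test (30%), then run the same command on the test portion to divide it into test (50%) and validation (50%). The result is roughly 70% training, 15% validation, and 15% test.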
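from sklearn.model_selection import train_test_split

with open('file_name') as f:
    lines = f.readlines()  # each element is one line of the file

# 70% training, 30% held out
train, test = train_test_split(lines, test_size=0.3)
# split the held-out 30% in half: validation and test
val, test = train_test_split(test, test_size=0.5)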
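If you need separate X and Y arrays, as in the question, the same two-step split can be applied to parallel lists. The following is only a minimal sketch, not the original code: it assumes the records are already parsed into a list of dicts, uses "headline" as the feature and "category" as the label (my choice for illustration), and builds a small synthetic list in place of the real file. The random_state value is an arbitrary seed. If the file stores one JSON object per line rather than a single array, each line would need to be parsed with json.loads instead of calling json.load on the whole file.

```python
import json
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the parsed file: a list of dicts with the
# same keys as the sample records shown in the question.
raw = json.dumps([
    {"category": "CRIME" if i % 2 == 0 else "ENTERTAINMENT",
     "headline": f"headline {i}"}
    for i in range(20)
])
records = json.loads(raw)

# Features and labels as parallel lists (assumed field choice).
X = [r["headline"] for r in records]
Y = [r["category"] for r in records]

# Step 1: 70% training, 30% held out.
X_train, X_rest, Y_train, Y_rest = train_test_split(
    X, Y, test_size=0.3, random_state=0)

# Step 2: split the held-out part in half into validation and test.
X_val, X_test, Y_val, Y_test = train_test_split(
    X_rest, Y_rest, test_size=0.5, random_state=0)

print(len(X_train), len(X_val), len(X_test))
```

Passing X and Y together to train_test_split keeps each headline aligned with its label in every subset. For imbalanced categories, the stratify parameter of train_test_split may be worth considering.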
Hope this helps!