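python - Pandas data to PyTorch tensor
Problem description
I'm trying to convert a pandas DataFrame into a PyTorch tensor to run an LSTM model, but I keep getting a ValueError saying that the shape of an object of type 'Series' could not be determined. The traceback points to the following code: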
class MicroESDataset(Dataset):
    def __init__(self, sequences):
        self.sequences = sequences

    def __len__(self):
        return len(self.sequences)

    def __getitem__(self, idx):
        sequence, label = self.sequences[idx]
        return dict(
            sequence=torch.Tensor(sequence.to_numpy()),
            label=torch.tensor(label).float()
        )
Am I missing something completely obvious? Thanks.
Here is the exact error message and traceback:
ValueError Traceback (most recent call last)
<ipython-input-46-fb5c7eb803e1> in <module>()
----> 1 for item in data_module.train_dataloader():
2 print(item["sequence"].shape)
3 print(item["label"].shape)
4 # print(item["label"])
5 break
3 frames
/usr/local/lib/python3.7/dist-packages/torch/_utils.py in reraise(self)
427 # have message field
428 raise self.exc_type(message=msg)
--> 429 raise self.exc_type(msg)
430
431
ValueError: Caught ValueError in DataLoader worker process 0.
Original Traceback (most recent call last):
File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/_utils/worker.py", line 202, in _worker_loop
data = fetcher.fetch(index)
File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
data = [self.dataset[idx] for idx in possibly_batched_index]
File "<ipython-input-30-36c44aae196d>", line 13, in __getitem__
label = torch.tensor(label).float()
ValueError: could not determine the shape of object type 'Series'
Solution
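Two columns
First, the idx in a Dataset should refer to a row inside the pd.DataFrame.
The way to get that row is df.iloc[idx] rather than df[idx]: df[idx] selects the column whose label is idx, which is probably not what you want (if it is, you should transpose your data).
Given this, we can do the following (with a dummy pd.DataFrame of only 2 columns; see the code comments):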
import pandas as pd
import torch

class MicroESDataset(torch.utils.data.Dataset):
    def __init__(self):
        # Dummy sequences dataframe
        self.sequences = pd.DataFrame({"col1": [1, 2], "col2": [3, 4]})

    def __len__(self):
        return len(self.sequences)

    def __getitem__(self, idx):
        sequence, label = self.sequences.iloc[idx]
        return dict(
            # torch.tensor infers dtype, torch.Tensor is always float
            sequence=torch.tensor(sequence),
            label=torch.tensor(label).float(),
        )

dataset = MicroESDataset()
print(dataset[0])
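To make the row/column distinction explained above concrete, here is a small sketch (not part of the original answer) on the same dummy 2-column frame. It shows that df.iloc[0] returns a row as a Series indexed by column name, that df[0] is a column lookup which fails here because no column is labelled 0, and how torch.tensor and torch.Tensor treat the unpacked values. The printed dtypes assume PyTorch's default dtype has not been changed.

```python
import pandas as pd
import torch

# Same dummy 2-column frame as in the answer: "col1" is the input, "col2" the label
df = pd.DataFrame({"col1": [1, 2], "col2": [3, 4]})

# df.iloc[0] selects the first *row*, returned as a Series indexed by column name
row = df.iloc[0]
print(row.index.tolist())  # ['col1', 'col2']

# df[0] is a *column* lookup by label; no column is labelled 0, so pandas raises KeyError
try:
    df[0]
except KeyError:
    print("df[0] looks for a column named 0")

# Unpacking a 2-element row yields two NumPy scalars
sequence, label = row
print(torch.tensor(sequence).dtype)       # torch.int64 (inferred from the value)
print(torch.tensor(label).float().dtype)  # torch.float32
print(torch.Tensor([1, 2]).dtype)         # torch.float32 (the default dtype)

# Pitfall: torch.Tensor(n) with an int allocates an uninitialised tensor of *size* n
print(torch.Tensor(3).shape)  # torch.Size([3])
```

The last line is one reason to prefer torch.tensor when converting single values: torch.Tensor treats a bare integer as a shape, not as data.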
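More columns
If you have more columns (assuming that by "series" you mean multiple values), you have to:
- get the row first
- slice it by the appropriate columns
Given the above, one could do the following (here with 4 columns, the last one being the label; see the code comments):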
import pandas as pd
import torch

class MicroESDataset(torch.utils.data.Dataset):
    def __init__(self):
        # Dummy sequences dataframe
        self.sequences = pd.DataFrame(
            {"col1": [1, 2], "col2": [3, 4], "col3": [5, 6], "col4": [7, 8]}
        )

    def __len__(self):
        return len(self.sequences)

    def __getitem__(self, idx):
        # No magic unpacking here!
        row = self.sequences.iloc[idx]
        # row is a Series indexed by column name, so we can slice it by position
        # One could also slice by label with row.loc[:"col3"], but I think positional slicing suits your case better
        sequence, label = row.iloc[:-1], row.iloc[-1]
        return dict(
            # sequence is still a Series; passing a Series straight to torch.tensor
            # is what raised the error in the question, so convert it to NumPy first
            sequence=torch.tensor(sequence.to_numpy()),
            label=torch.tensor(label).float(),
        )

dataset = MicroESDataset()
print(dataset[0])
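If the whole DataFrame is numeric, an alternative design (a swapped-in technique, not the answer's approach) is to convert it to tensors once in __init__ and index plain tensors in __getitem__, so no pandas Series is built per item. The sketch below assumes every column is numeric and the last column is the label, uses a hypothetical FrameDataset class name, and iterates a DataLoader the same way the question's loop does to show the batched shapes.

```python
import pandas as pd
import torch
from torch.utils.data import DataLoader, Dataset


class FrameDataset(Dataset):
    """Hypothetical variant: convert the whole DataFrame to tensors once, up front.

    Assumes every column is numeric and the last column is the label.
    """

    def __init__(self, df: pd.DataFrame):
        # Positional slicing: all columns but the last are features, the last is the label
        self.features = torch.tensor(df.iloc[:, :-1].to_numpy(), dtype=torch.float32)
        self.labels = torch.tensor(df.iloc[:, -1].to_numpy(), dtype=torch.float32)

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        # Plain tensor indexing; no pandas objects reach torch.tensor here
        return dict(sequence=self.features[idx], label=self.labels[idx])


df = pd.DataFrame({"col1": [1, 2], "col2": [3, 4], "col3": [5, 6], "col4": [7, 8]})
loader = DataLoader(FrameDataset(df), batch_size=2)

# Same inspection loop as in the question
for item in loader:
    print(item["sequence"].shape)  # torch.Size([2, 3])
    print(item["label"].shape)     # torch.Size([2])
    break
```

The default collate function stacks the per-item dicts, so each batch is again a dict of tensors. For an LSTM you would still need to reshape sequence into whatever input layout your model expects.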