Error reading an array from a CSV file with NumPy in Python

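Problem description

I am having trouble reading the first column of a CSV file with numpy. Every value in the first column comes back as nan instead of [ 2. 4. 1120.] and so on.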

import genfromtxt from numpy 
my_data = genfromtxt('input.csv', delimiter=',')
first_column = len(my_data[:,0]) - 1 

In the CSV file:

[   2.    4. 1120.],67.8,63.7,-676.1,-365.2,0.0,0.0,0.0,0.0,0.0,-608.3000000000001,-301.5
[  2.    4.5 100. ],0.0,0.0,-0.30000000000000004,-0.7,0.0,0.0,99.7002,0.0,0.0,-0.30000000000000004,-0.7
[   2.    4. 1130.],70.8,52.2,-672.7,-346.5,0.0,0.0,0.0,0.0,0.0,-601.9000000000001,-294.3
[  2.    4.5 110. ],0.0,0.2,-0.7,-0.1,0.0,0.0,99.3010995,0.0,0.0,-0.7,0.1

Tags: python, arrays, numpy, csv, file

Solution


First, your import statement is inverted. It should be:

from numpy import genfromtxt

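Second, genfromtxt() cannot convert the string '[ 2. 4. 1120.]' to a float the way it does for the other values in the array, which is why it returns nan for that column. numpy.loadtxt() cannot parse that column either (it raises a ValueError instead of returning nan).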
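The behaviour can be reproduced without a file on disk. The snippet below is a minimal sketch that feeds two shortened rows (an assumption made for illustration, trimmed from the question's data) to genfromtxt() through io.StringIO, then checks which columns came back as nan.

```python
import io
import numpy as np

# Two shortened rows in the same shape as the question's CSV (illustrative only).
csv_text = (
    "[   2.    4. 1120.],67.8,63.7\n"
    "[  2.    4.5 100. ],0.0,0.0\n"
)

# genfromtxt accepts a file-like object; with the default float dtype,
# any field it cannot parse as a number becomes nan.
data = np.genfromtxt(io.StringIO(csv_text), delimiter=',')

first_col_all_nan = bool(np.isnan(data[:, 0]).all())
rest_has_nan = bool(np.isnan(data[:, 1:]).any())
print(data)
print(first_col_all_nan, rest_has_nan)
```

Only the bracketed column is affected; the plain numeric columns are parsed normally.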

One option that does not "lose" these values is to read the CSV file with pandas:

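import numpy as np
import pandas as pd

# Note: read_csv treats the first line as a header by default, so the first
# data row is missing from the output below; pass header=None to keep it.
my_data = pd.read_csv('data.csv').to_numpy()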

where my_data contains:

array([['[  2.    4.5 100. ]', 0.0, 0.0, -0.30000000000000004, -0.7, 0.0,
        0.0, 99.7002, 0.0, 0.0, -0.30000000000000004, -0.7],
       ['[   2.    4. 1130.]', 70.8, 52.2, -672.7, -346.5, 0.0, 0.0, 0.0,
        0.0, 0.0, -601.9000000000002, -294.3],
       ['[  2.    4.5 110. ]', 0.0, 0.2, -0.7, -0.1, 0.0, 0.0,
        99.3010995, 0.0, 0.0, -0.7, 0.1]], dtype=object)

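You still need to parse each value in the first column to convert it to a numpy array, though. You can use np.fromstring for this, but the bracket characters have to be stripped first for it to work as expected.

If the brackets are left in, you will see a warning: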

np.fromstring(my_data[:, 0], sep=' ')
<ipython-input-65-7d75c8d121f5>:1: DeprecationWarning: string or file could not be read to its end due to unmatched data; this will raise a ValueError in the future.
  np.fromstring(my_data[:, 0], sep=' ')

Unfortunately, stripping the brackets requires looping over the array:

for i, row in enumerate(my_data[:, 0]):
    # row[1:-1] drops the leading '[' and trailing ']' before parsing
    my_data[i, 0] = np.fromstring(row[1:-1], sep=' ').astype(np.float32)

Indexing with [1:-1] "removes" the bracket characters before the value is passed to np.fromstring.
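To make the slicing step concrete, here is a small sketch that parses a single cell from the question's data. Note that it swaps in str.split() plus np.array() for the number parsing rather than np.fromstring; that substitution is a choice made for this illustration, not part of the answer above.

```python
import numpy as np

# One cell exactly as it appears in the first CSV column.
cell = '[   2.    4. 1120.]'

# [1:-1] drops the first and last character, i.e. the '[' and ']'.
inner = cell[1:-1]

# split() with no argument splits on any run of whitespace,
# so the uneven padding between the numbers does not matter.
values = np.array(inner.split(), dtype=np.float32)
print(inner)
print(values)
```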

After that, my_data holds numpy arrays in its first column:

array([[array([  2. ,   4.5, 100. ], dtype=float32), 0.0, 0.0,
        -0.30000000000000004, -0.7, 0.0, 0.0, 99.7002, 0.0, 0.0,
        -0.30000000000000004, -0.7],
       [array([   2.,    4., 1130.], dtype=float32), 70.8, 52.2, -672.7,
        -346.5, 0.0, 0.0, 0.0, 0.0, 0.0, -601.9000000000002, -294.3],
       [array([  2. ,   4.5, 110. ], dtype=float32), 0.0, 0.2, -0.7,
        -0.1, 0.0, 0.0, 99.3010995, 0.0, 0.0, -0.7, 0.1]], dtype=object)

So the first column is:

print(my_data[:, 0])
array([array([  2. ,   4.5, 100. ], dtype=float32),
       array([   2.,    4., 1130.], dtype=float32),
       array([  2. ,   4.5, 110. ], dtype=float32)], dtype=object)

It is a somewhat convoluted solution, but it works. There may be a better or simpler way to do the conversion without looping over the array.
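One possible alternative, sketched here under assumptions rather than offered as the definitive approach, is to skip the object array altogether: read the rows with the standard-library csv module and build two plain numeric arrays, one for the bracketed vectors and one for the remaining columns. The names csv_text, vectors and scalars are hypothetical, and the sample rows are shortened from the question's data. The comprehension still visits every row, but the result has ordinary float dtypes.

```python
import csv
import io
import numpy as np

# Shortened sample rows (illustrative); with a real file, pass an open
# file handle to csv.reader instead of io.StringIO.
csv_text = (
    "[   2.    4. 1120.],67.8,63.7\n"
    "[  2.    4.5 100. ],0.0,0.0\n"
)

rows = list(csv.reader(io.StringIO(csv_text)))

# First column: strip the brackets, split on whitespace, convert to float32.
vectors = np.array([row[0].strip('[]').split() for row in rows],
                   dtype=np.float32)

# Remaining columns: numeric strings, converted directly to float.
scalars = np.array([row[1:] for row in rows], dtype=float)

print(vectors.shape, scalars.shape)
```

This assumes every bracketed cell holds the same number of values, as in the question's data; rows with differing lengths would not form a rectangular array.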

