python - Error when reading an array from a csv file in Numpy Python
Problem description
I have a problem reading the first column of a csv file with numpy. All values in the first column come back as nan instead of [ 2. 4. 1120.] and so on.
import genfromtxt from numpy
my_data = genfromtxt('input.csv', delimiter=',')
first_column = len(my_data[:,0]) - 1
In the csv file:
[ 2. 4. 1120.],67.8,63.7,-676.1,-365.2,0.0,0.0,0.0,0.0,0.0,-608.3000000000001,-301.5
[ 2. 4.5 100. ],0.0,0.0,-0.30000000000000004,-0.7,0.0,0.0,99.7002,0.0,0.0,-0.30000000000000004,-0.7
[ 2. 4. 1130.],70.8,52.2,-672.7,-346.5,0.0,0.0,0.0,0.0,0.0,-601.9000000000001,-294.3
[ 2. 4.5 110. ],0.0,0.2,-0.7,-0.1,0.0,0.0,99.3010995,0.0,0.0,-0.7,0.1
Solution
First, your import statement is inverted. It should be:
from numpy import genfromtxt
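Second, genfromtxt() cannot convert a string such as '[ 2. 4. 1120.]' to a float the way it converts every other value in the file, which is why it returns nan for that column. numpy.loadtxt() has the same problem with this field, except that it raises an error instead of returning nan.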
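Here is a small reproduction of the nan behaviour. It uses io.StringIO in place of input.csv, and the two sample lines are copied from the question and trimmed to three columns. With its default settings, genfromtxt replaces any field it cannot convert to a float with nan, so only the bracketed field is lost:

```python
import io

import numpy as np

# Two lines from the question's input.csv, trimmed to three columns
csv_text = "[ 2. 4. 1120.],67.8,63.7\n[ 2. 4.5 100. ],0.0,0.0\n"

data = np.genfromtxt(io.StringIO(csv_text), delimiter=",")
# The bracketed field is not a valid float, so column 0 becomes nan;
# the ordinary numeric columns are parsed normally.
print(data[:, 0])
print(data[:, 1:])
```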
One option that does not "lose" these values is to read the csv file with pandas:
import numpy as np
import pandas as pd
# header=None: treat the first line as data rather than as column names
my_data = pd.read_csv('input.csv', header=None).to_numpy()
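where my_data contains the following. The [ 2. 4. 1120.] row is missing from this output because, by default, read_csv uses the first line of the file as column names: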
array([['[ 2. 4.5 100. ]', 0.0, 0.0, -0.30000000000000004, -0.7, 0.0,
0.0, 99.7002, 0.0, 0.0, -0.30000000000000004, -0.7],
['[ 2. 4. 1130.]', 70.8, 52.2, -672.7, -346.5, 0.0, 0.0, 0.0,
0.0, 0.0, -601.9000000000002, -294.3],
['[ 2. 4.5 110. ]', 0.0, 0.2, -0.7, -0.1, 0.0, 0.0,
99.3010995, 0.0, 0.0, -0.7, 0.1]], dtype=object)
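However, you still need to parse each value in the first column to turn it into a numpy array. You can use np.fromstring for this, but you must strip the bracket characters first for it to work as expected.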
If you don't strip the brackets, you get a warning:
np.fromstring(my_data[0, 0], sep=' ')  # a single cell, still including the brackets
DeprecationWarning: string or file could not be read to its end due to unmatched data; this will raise a ValueError in the future.
Unfortunately, to strip the brackets you need to loop over the array:
for i, row in enumerate(my_data[:, 0]):
    # row[1:-1] drops the leading '[' and trailing ']' before parsing
    my_data[i, 0] = np.fromstring(row[1:-1], sep=' ').astype(np.float32)
Indexing with [1:-1] "removes" the bracket characters before the value is passed to np.fromstring.
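The sketch below applies the [1:-1] slice and np.fromstring to a single cell. The second variant uses str.strip and split instead of np.fromstring. It is an alternative added here, not part of the original answer, and is shown only as a comparison:

```python
import numpy as np

cell = "[ 2. 4.5 100. ]"  # one value from the first column

inner = cell[1:-1]  # ' 2. 4.5 100. ' with the brackets removed
vec = np.fromstring(inner, sep=" ").astype(np.float32)

# Alternative without np.fromstring: strip the brackets, split on whitespace,
# and let np.array convert the resulting strings to float32
vec_alt = np.array(cell.strip("[]").split(), dtype=np.float32)
```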
After that, my_data contains numpy arrays in its first column:
array([[array([ 2. , 4.5, 100. ], dtype=float32), 0.0, 0.0,
-0.30000000000000004, -0.7, 0.0, 0.0, 99.7002, 0.0, 0.0,
-0.30000000000000004, -0.7],
[array([ 2., 4., 1130.], dtype=float32), 70.8, 52.2, -672.7,
-346.5, 0.0, 0.0, 0.0, 0.0, 0.0, -601.9000000000002, -294.3],
[array([ 2. , 4.5, 110. ], dtype=float32), 0.0, 0.2, -0.7,
-0.1, 0.0, 0.0, 99.3010995, 0.0, 0.0, -0.7, 0.1]], dtype=object)
So the first column contains:
my_data[:, 0]
array([array([ 2. , 4.5, 100. ], dtype=float32),
array([ 2., 4., 1130.], dtype=float32),
array([ 2. , 4.5, 110. ], dtype=float32)], dtype=object)
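A regular 2-D array may be more convenient than an object array of arrays. If every per-row vector has the same length (three here), the vectors can be stacked into one. In this minimal sketch, col is a hypothetical stand-in for my_data[:, 0] after the loop:

```python
import numpy as np

# Build an object array of float32 vectors. np.empty + item assignment is used
# because np.array([...], dtype=object) on equal-length arrays would create a
# 2-D object array instead of a 1-D array of arrays.
col = np.empty(3, dtype=object)
col[0] = np.array([2.0, 4.5, 100.0], dtype=np.float32)
col[1] = np.array([2.0, 4.0, 1130.0], dtype=np.float32)
col[2] = np.array([2.0, 4.5, 110.0], dtype=np.float32)

# Stack the per-row vectors into one regular (n_rows, 3) float32 array
first_column = np.vstack(list(col))
```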
It's a convoluted solution, but it works. There may be a better or simpler way to do the conversion without looping over the array.
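One possible alternative is to rewrite each line before parsing, so that the bracketed vector becomes ordinary comma-separated columns. genfromtxt then returns a plain 2-D float array instead of an object array. This is a different technique from the one above, and read_bracketed_csv is a hypothetical helper, not a NumPy function. The sketch assumes every row starts with exactly one bracketed, space-separated vector followed by "],". It still loops over the lines in Python, but only once, while reading:

```python
import numpy as np


def read_bracketed_csv(lines):
    """Hypothetical helper: parse rows like '[ 2. 4. 1120.],67.8,...' into floats.

    Assumes each non-empty row starts with one bracketed, space-separated
    vector followed by '],' and then the remaining comma-separated columns.
    """
    cleaned = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        vec, _, rest = line.partition("],")
        # '[ 2. 4. 1120.' -> '2.,4.,1120.'
        vec_fields = ",".join(vec.lstrip("[").split())
        cleaned.append(vec_fields + "," + rest)
    # genfromtxt accepts a list of strings as well as a filename
    return np.genfromtxt(cleaned, delimiter=",")


sample = [
    "[ 2. 4. 1120.],67.8,63.7,-676.1,-365.2,0.0,0.0,0.0,0.0,0.0,-608.3000000000001,-301.5",
    "[ 2. 4.5 100. ],0.0,0.0,-0.30000000000000004,-0.7,0.0,0.0,99.7002,0.0,0.0,-0.30000000000000004,-0.7",
]
data = read_bracketed_csv(sample)
vectors = data[:, :3]  # the former first column, now shape (n_rows, 3)
```

With the real file, the call would be something like `with open('input.csv') as f: data = read_bracketed_csv(f)`.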