How to add a column to a numpy recarray

Problem description

I want to add an extra column of reformatted dates to my numpy recarray.

I have a CSV:

<DATE>  <TIME>  <OPEN>  <HIGH>  <LOW>   <CLOSE> <TICKVOL>   <VOL>   <SPREAD>
2020.08.17  00:00:00    44.920  44.920  44.900  44.910  4   0   10
2020.08.17  00:01:00    44.910  44.910  44.850  44.860  10  0   10
2020.08.17  00:02:00    44.860  44.870  44.860  44.860  3   0   10
2020.08.17  00:03:00    44.860  44.860  44.850  44.850  2   0   10

My code:

def datetostr(datenp):
    ts = pd.to_datetime(str(datenp)) 
    d = ts.strftime('%Y.%m.%d %H:%M:%S')
    return d

colnames = ['Date', 'Time', 'Open', 'High', 'Low', 'Close', 'Tickvol', 'Vol', 'Spread']
stocks = pd.read_csv(infile, sep='\t', parse_dates=[['Date', 'Time']], header=0, names=colnames).to_records(index=False)
plotly_date = np.array([datetostr(xi) for xi in stocks['Date_Time']])

In the stocks array:

('Date_Time', 'Open', 'High', 'Low', 'Close', 'Tickvol', 'Vol', 'Spread')
initial_array :  [('2020-08-14T00:00:00.000000000', 44.96, 45.  , 44.94, 44.97, 14, 0, 10)
 ('2020-08-14T00:01:00.000000000', 44.97, 44.99, 44.92, 44.95, 19, 0, 10)
 ('2020-08-14T00:02:00.000000000', 44.94, 44.94, 44.89, 44.91, 16, 0, 10)

In plotly_date:

plotly_date_array :  ['2020.08.14 00:00:00' '2020.08.14 00:01:00' '2020.08.14 00:02:00' ...
 '2020.08.18 20:57:00' '2020.08.18 20:58:00' '2020.08.18 20:59:00']

I want to add a new column to stocks containing the text-formatted data stored in plotly_date:

result = np.column_stack((stocks, plotly_date)) 

It gives me an error:

TypeError: invalid type promotion

What am I doing wrong? And how do I correctly add a new column named "Date"?

Tags: python, numpy

Solution
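
The question asks what goes wrong with np.column_stack. A likely explanation: a recarray is one-dimensional, with each record counting as a single element, so column_stack has to find a common dtype for a whole record and a string, and no such dtype exists. The sketch below reproduces this on a small stand-in array (the two-field stocks array here is made up for illustration, not the asker's data). The exact exception class and message depend on the NumPy version, so it only catches TypeError.

```python
import numpy as np

# Hypothetical stand-in for `stocks`: a structured array with two fields.
stocks = np.array([(44.92, 4), (44.91, 10)],
                  dtype=[('Open', '<f8'), ('Tickvol', '<i8')])
plotly_date = np.array(['2020.08.17 00:00:00', '2020.08.17 00:01:00'])

# Each record is one element, so the array is 1-D despite having two fields.
print(stocks.shape)

try:
    np.column_stack((stocks, plotly_date))
    stacking_failed = False
except TypeError as exc:
    # No common dtype exists for a record and a string.
    stacking_failed = True
    print(type(exc).__name__, exc)
```

Adding a field therefore means building a new dtype that includes it, which is what the steps below do.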


# convert plotly_date from an ndarray into a recarray
# (passing the array in a list works for any number of rows, not just 4)
plotly_date_rec = np.rec.fromarrays([plotly_date], names='pd', formats='<U19')

# create a new dtype, with stocks dtype + plotly_date_rec dtype
new_dt = np.dtype(stocks.dtype.descr + [('pd', '<U19')])

# create a structured array of zeros with the new dtype
result = np.zeros(stocks.shape, dtype=new_dt)

# fill the zeros with data from stocks
for name in stocks.dtype.names:
    result[name] = stocks[name]

# add the plotly_date_rec data
result['pd'] = plotly_date_rec['pd']

# print(result)
array([('2020-08-17T00:00:00.000000000', 44.92, 44.92, 44.9 , 44.91,  4, 0, 10, '2020.08.17 00:00:00'),
       ('2020-08-17T00:01:00.000000000', 44.91, 44.91, 44.85, 44.86, 10, 0, 10, '2020.08.17 00:01:00'),
       ('2020-08-17T00:02:00.000000000', 44.86, 44.87, 44.86, 44.86,  3, 0, 10, '2020.08.17 00:02:00'),
       ('2020-08-17T00:03:00.000000000', 44.86, 44.86, 44.85, 44.85,  2, 0, 10, '2020.08.17 00:03:00')],
      dtype=[('Date_Time', '<M8[ns]'), ('Open', '<f8'), ('High', '<f8'), ('Low', '<f8'), ('Close', '<f8'), ('Tickvol', '<i8'), ('Vol', '<i8'), ('Spread', '<i8'), ('pd', '<U19')])
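
As an alternative to building the dtype by hand, NumPy ships a helper module, numpy.lib.recfunctions, whose append_fields function performs roughly the same steps. The following is a minimal sketch on a reduced, hand-written copy of the data (three fields instead of eight, and the plotly_date strings typed in directly), so treat the array contents as assumptions rather than the asker's actual input.

```python
import numpy as np
from numpy.lib import recfunctions as rfn

# Reduced stand-in for `stocks`, keeping one datetime, one float and one int field.
stocks = np.array(
    [('2020-08-17T00:00:00', 44.92, 4),
     ('2020-08-17T00:01:00', 44.91, 10)],
    dtype=[('Date_Time', '<M8[ns]'), ('Open', '<f8'), ('Tickvol', '<i8')],
)
plotly_date = np.array(['2020.08.17 00:00:00', '2020.08.17 00:01:00'])

# usemask=False returns a plain (unmasked) array; asrecarray=True returns a recarray.
result = rfn.append_fields(stocks, 'pd', plotly_date,
                           usemask=False, asrecarray=True)

print(result.dtype.names)
print(result['pd'])
```

By default append_fields returns a masked array, which is why usemask=False is passed here.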

Using pandas

  • This is easier
# create dataframe
colnames = ['Date', 'Time', 'Open', 'High', 'Low', 'Close', 'Tickvol', 'Vol', 'Spread']
stocks = pd.read_csv('test.csv', sep='\\s+', parse_dates=[['Date', 'Time']], header=0, names=colnames)

# add plotly_dates column
stocks['plotly_date'] = stocks.Date_Time.dt.strftime('%Y.%m.%d %H:%M:%S')

# create a numpy recarray of the dataframe with all columns
result = stocks.to_records(index=False)

# create a numpy recarray of the dataframe without Date_Time
results = stocks.iloc[:, 1:].to_records(index=False)  # optional depending on your needs


# print(result)  # shown with all columns

rec.array([('2020-08-17T00:00:00.000000000', 44.92, 44.92, 44.9 , 44.91,  4, 0, 10, '2020.08.17 00:00:00'),
           ('2020-08-17T00:01:00.000000000', 44.91, 44.91, 44.85, 44.86, 10, 0, 10, '2020.08.17 00:01:00'),
           ('2020-08-17T00:02:00.000000000', 44.86, 44.87, 44.86, 44.86,  3, 0, 10, '2020.08.17 00:02:00'),
           ('2020-08-17T00:03:00.000000000', 44.86, 44.86, 44.85, 44.85,  2, 0, 10, '2020.08.17 00:03:00')],
          dtype=[('Date_Time', '<M8[ns]'), ('Open', '<f8'), ('High', '<f8'), ('Low', '<f8'), ('Close', '<f8'), ('Tickvol', '<i8'), ('Vol', '<i8'), ('Spread', '<i8'), ('plotly_date', 'O')])
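
Two details of the pandas route may matter in practice. First, as far as I know, recent pandas releases deprecate combining columns through a nested parse_dates list, so the sketch below combines the Date and Time columns with pd.to_datetime instead. Second, the plotly_date field above comes out with dtype 'O'; passing column_dtypes to to_records is one way to get a fixed-width '<U19' field like the NumPy version. The inline csv_text is a two-row sample in the same layout as the question's file, used here only so the example is self-contained.

```python
import io
import pandas as pd

# Two sample rows in the same layout as the question's file.
csv_text = """<DATE> <TIME> <OPEN> <HIGH> <LOW> <CLOSE> <TICKVOL> <VOL> <SPREAD>
2020.08.17 00:00:00 44.920 44.920 44.900 44.910 4 0 10
2020.08.17 00:01:00 44.910 44.910 44.850 44.860 10 0 10
"""

colnames = ['Date', 'Time', 'Open', 'High', 'Low', 'Close', 'Tickvol', 'Vol', 'Spread']
stocks = pd.read_csv(io.StringIO(csv_text), sep=r'\s+', header=0,
                     names=colnames, dtype={'Date': str, 'Time': str})

# Combine the two text columns into a single datetime column.
date_time = pd.to_datetime(stocks['Date'] + ' ' + stocks['Time'],
                           format='%Y.%m.%d %H:%M:%S')
stocks = stocks.drop(columns=['Date', 'Time'])
stocks.insert(0, 'Date_Time', date_time)

# Add the text-formatted dates and export with a fixed-width string field.
stocks['plotly_date'] = stocks['Date_Time'].dt.strftime('%Y.%m.%d %H:%M:%S')
result = stocks.to_records(index=False, column_dtypes={'plotly_date': '<U19'})

print(result.dtype)
```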
