Adding column to numpy array based on if/then of data in array

Question

I have a multidimensional numpy array like so:

np.array([("a",1,"x"),("b",2,"y"),("c",1,"z")])

I need to add a fourth "column" to the array based on an if/then test of, for example, the 2nd column:

If [:,1] == 1 then newcolumn = 'Wow' else 'Dud'

So that it returns something like:

[("a",1,"x","Wow"),("b",2,"y","Dud"),("c",1,"z","Wow")]
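A minimal NumPy-only sketch, assuming the data stays in a plain NumPy array rather than moving to pandas. Note that `np.array` on mixed tuples coerces every field to a single Unicode string dtype, so the second column (index 1) holds the string "1", not the integer 1, and must be compared as a string. `np.where` evaluates the condition for all rows at once instead of looping in Python:

```python
import numpy as np

arr = np.array([("a", 1, "x"), ("b", 2, "y"), ("c", 1, "z")])
# Mixed tuples are coerced to one string dtype, so the integer 1 is stored as "1".
print(arr.dtype)

# Vectorized if/then on the second column (index 1), compared as a string.
new_col = np.where(arr[:, 1] == "1", "Wow", "Dud")

# Append the result as a fourth column; every field is still a string.
result = np.column_stack((arr, new_col))
print(result)
```

Because everything is stored as a string, the numbers in the result are "1" and "2" rather than integers; a pandas DataFrame keeps per-column types.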

As I'm going to be processing around 100 million rows of data, speed is of the essence here.

Thanks in advance for any help.

Tags: python, numpy

Answer


Try pandas

>>> import pandas as pd
>>> df = pd.DataFrame([("a",1,"x"),("b",2,"y"),("c",1,"z")], columns=['col1', 'col2', 'col3'])
>>> df
  col1  col2 col3
0    a     1    x
1    b     2    y
2    c     1    z

Create a function that operates on a single row (it doesn't have to be a lambda) and use apply with axis=1 (rows). This gives you the new column.

>>> b = lambda row: "Wow" if row['col2'] == 1 else "Dud"
>>> new_col = df.apply(b, axis=1)
>>> new_col
0    Wow
1    Dud
2    Wow
dtype: object
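apply with axis=1 calls the Python function once per row, which is likely to be slow at around 100 million rows. A vectorized alternative (a different technique from the apply approach) is np.where over the whole column; a minimal sketch, reusing the same column names:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([("a", 1, "x"), ("b", 2, "y"), ("c", 1, "z")],
                  columns=["col1", "col2", "col3"])

# Compare the whole integer column at once; no per-row Python function call.
df["new_col"] = np.where(df["col2"] == 1, "Wow", "Dud")
print(df)
```

Here col2 keeps its integer dtype, so the comparison is against the integer 1.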

Add the new column to the DataFrame:

>>> df['new_col'] = new_col
>>> df
  col1  col2 col3 new_col
0    a     1    x     Wow
1    b     2    y     Dud
2    c     1    z     Wow

Then convert back to a list of tuples:

>>> tuples = [tuple(x) for x in df[['col1','col2','col3','new_col']].to_numpy()]
>>> tuples
[('a', 1, 'x', 'Wow'), ('b', 2, 'y', 'Dud'), ('c', 1, 'z', 'Wow')]
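pandas can also produce the list of tuples directly with DataFrame.itertuples; a minimal sketch, with the DataFrame rebuilt here so the block runs on its own:

```python
import pandas as pd

df = pd.DataFrame([("a", 1, "x", "Wow"), ("b", 2, "y", "Dud"), ("c", 1, "z", "Wow")],
                  columns=["col1", "col2", "col3", "new_col"])

# index=False drops the row index; name=None yields plain tuples instead of namedtuples.
tuples = list(df.itertuples(index=False, name=None))
print(tuples)
```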

Suggestion: don't use lists of tuples. Use DataFrames instead, especially for large data.

