python - Adding column to numpy array based on if/then of data in array
Problem description
I have a multidimensional numpy array like so:
np.array([("a",1,"x"),("b",2,"y"),("c",1,"z")])
I need to create a fourth "column" in the array based on an if/then on the 2nd column, for example:
If [:,1] == 1
then newcolumn = 'Wow' else 'Dud'
So that it returns something like:
[("a",1,"x","Wow"),("b",2,"y","Dud"),("c",1,"z","Wow")]
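A minimal NumPy-only sketch, assuming the array is built exactly as in the question. Because `np.array` coerces mixed tuples to a single string dtype, the second column holds the string "1" rather than the integer 1, so the comparison below is against "1". `np.where` evaluates the condition over the whole column at once, with no Python-level loop, and `np.column_stack` appends the result as a fourth column.

```python
import numpy as np

arr = np.array([("a", 1, "x"), ("b", 2, "y"), ("c", 1, "z")])
# Mixed tuples are coerced to one common string dtype,
# so the numbers in column 1 are stored as strings like "1".
print(arr.dtype)

# Vectorized if/then over the second column (index 1).
new_col = np.where(arr[:, 1] == "1", "Wow", "Dud")

# Append the 1-D result as a fourth column.
result = np.column_stack((arr, new_col))
print(result)
```

If you need column 1 to stay numeric, a pandas DataFrame (or a NumPy structured array) keeps a separate dtype per column.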
As I'm going to be processing around 100 million rows of data, speed is of the essence here.
Thanks in advance for any help.
Solution
Try pandas
>>> import pandas as pd
>>> df = pd.DataFrame([("a",1,"x"),("b",2,"y"),("c",1,"z")], columns=['col1', 'col2', 'col3'])
>>> df
  col1  col2 col3
0    a     1    x
1    b     2    y
2    c     1    z
Create a function that operates on a single row (it doesn't have to be a lambda) and call apply with axis=1 so that it runs once per row. This gives you the new column:
>>> b = lambda row: "Wow" if row['col2'] == 1 else "Dud"
>>> new_col = df.apply(b, axis=1)
>>> new_col
0    Wow
1    Dud
2    Wow
dtype: object
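Note that `apply` with `axis=1` calls the Python function once per row, which is likely to be slow at around 100 million rows. A vectorized alternative, sketched here by swapping `apply` for NumPy's `np.where` on a boolean mask, builds the same column in one pass over the data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([("a", 1, "x"), ("b", 2, "y"), ("c", 1, "z")],
                  columns=["col1", "col2", "col3"])

# Boolean mask computed over the whole column at once (no per-row Python call).
mask = df["col2"] == 1

# np.where picks "Wow" where the mask is True and "Dud" elsewhere.
df["new_col"] = np.where(mask, "Wow", "Dud")
print(df)
```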
Add the new column to the DataFrame:
>>> df['new_col'] = new_col
>>> df
  col1  col2 col3 new_col
0    a     1    x     Wow
1    b     2    y     Dud
2    c     1    z     Wow
Then convert back to a list of tuples:
>>> tuples = [tuple(x) for x in df[['col1', 'col2', 'col3', 'new_col']].to_numpy()]
>>> tuples
[('a', 1, 'x', 'Wow'), ('b', 2, 'y', 'Dud'), ('c', 1, 'z', 'Wow')]
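If you do need the list of tuples, a sketch of an alternative conversion uses `DataFrame.itertuples` instead of going through `to_numpy()`. With `index=False` the row index is dropped, and `name=None` yields plain tuples rather than namedtuples:

```python
import pandas as pd

df = pd.DataFrame(
    [("a", 1, "x", "Wow"), ("b", 2, "y", "Dud"), ("c", 1, "z", "Wow")],
    columns=["col1", "col2", "col3", "new_col"],
)

# Plain tuples, one per row, without the index.
tuples = list(df.itertuples(index=False, name=None))
print(tuples)
```

Either way this materializes one Python object per row, which is costly at 100 million rows.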
Suggestion: don't use lists of tuples; use DataFrames instead, especially for large data.