python - How to apply a masked array to a very large JSON fast
Problem description
The Data
I am currently working on very large JSON files formatted as follows:
{key: [1000+ * arrays of length 241],
key2: [1000+ * arrays of length 241],
(...repeat 5-8 times...)}
The data is structured so that the nth element in each key's array belongs to the nth entity. Think of each key as a descriptor such as 'height' or 'pressure'. To get an entity's 'height' and 'pressure', you access that entity's index n in all the arrays. All the keys' arrays therefore have the same length Z.
This, as you can imagine, is a pain to work with as a whole. Therefore, whenever I perform any data manipulation I return a masked array of length Z populated with 1's and 0's (1 means the data at that index in every key is to be kept, and 0 means it should be omitted).
The Problem
Once all of my data manipulation has been performed, I need to apply the masked array to the data and return a copy of the original JSON data in which the length Z of each key's array equals the number of 1's in the masked array. If the element of the masked array at index n is 0, the element at index n is removed from every key's array; if it is 1, the element is kept.
My attempt
# mask: masked array
# d: data to apply the mask to
def apply_mask(mask, d):
    keys = d.keys()
    print(keys)
    rem = []  # List of indices to remove
    for i in range(len(mask)):
        if mask[i] == 0:
            rem.append(i)  # Populate 'rem'
    for k in keys:
        d[k] = [elem for elem in d[k] if not d[k].index(elem) in rem]
    return d
This works as intended but takes a while on such large JSON data.
Question
I hope everything above was clear and helps you to understand my question:
Is there a more optimal/quicker way to apply a masked array to data such as this shown above?
Cheers
Solution
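This is going to be slow because
    d[k] = [elem for elem in d[k] if not d[k].index(elem) in rem]
completely recreates the inner list every time. On top of that, d[k].index(elem) searches the list from the start for every element, and the "in rem" test scans the rem list each time.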
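Besides the cost, looking positions up with index() has a correctness pitfall: it returns the position of the first equal element, so repeated values are all judged by the mask entry of their first occurrence. The snippet below is a small sketch of that behaviour; the function name apply_mask_index_based and the sample values are made up for illustration, and the logic mirrors the question's attempt without the print call.

```python
def apply_mask_index_based(mask, d):
    # Same approach as the question's attempt (hypothetical name).
    rem = [i for i, keep in enumerate(mask) if keep == 0]
    for k in d:
        # index(elem) always finds the FIRST equal element, not the current one.
        d[k] = [elem for elem in d[k] if d[k].index(elem) not in rem]
    return d


data = {"height": [5, 7, 5, 9]}   # made-up sample with a repeated value
mask = [0, 1, 1, 1]               # only index 0 should be dropped

result = apply_mask_index_based(mask, data)
print(result)  # {'height': [7, 9]} although [7, 5, 9] was wanted
```

Both 5s are dropped because each of them resolves to index 0, which is in the removal list.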
Since you're already modifying d in-place, you could just delete the respective elements:
def apply_mask(mask, d):
    for i, keep in enumerate(mask):
        if not keep:
            for key in d:
                del d[key][i - len(mask)]
    return d
(Negative indices i - len(mask) are used because positive indices no longer point at the right element once earlier deletions have shortened the list. Counting from the end still works, since everything after position i is untouched so far.)
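If a copy is wanted rather than an in-place change, as the question describes, one possible alternative is to build each filtered list in a single pass with itertools.compress from the standard library. This is only a sketch under the assumption that the mask and every array have the same length; the name apply_mask_copy and the sample data are hypothetical. Each del on a list shifts all the elements after it, so a single pass may be quicker when many entries are removed, but that has not been measured here.

```python
from itertools import compress


def apply_mask_copy(mask, d):
    """Return a new dict keeping only the entries whose mask value is truthy."""
    # compress(values, mask) yields values[i] wherever mask[i] is truthy.
    return {key: list(compress(values, mask)) for key, values in d.items()}


# Made-up sample data in the same layout as the question's JSON.
data = {"height": [5, 7, 5, 9], "pressure": [1.0, 1.1, 1.2, 1.3]}
mask = [0, 1, 1, 0]

filtered = apply_mask_copy(mask, data)
print(filtered)  # {'height': [7, 5], 'pressure': [1.1, 1.2]}
```

The original dict is left unchanged, and repeated values are handled by position rather than by value.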