python - Quickest way to assign cell values in Pandas
Problem description
I have a list of tuples:
d = [("a", "x"), ("b", "y"), ("a", "y")]
and the DataFrame:
y x
b 0.0 0.0
a 0.0 0.0
I would like to replace any 0s with 1s if the row and column labels correspond to a tuple in d, such that the new DataFrame is:
y x
b 1.0 0.0
a 1.0 1.0
I am currently using:
for i, j in d:
    df.loc[i, j] = 1.0
This seems to me the most "pythonic" approach, but for a DataFrame of shape 20000 * 20000 and a list of length 10000, this process takes an extremely long time. There must be a better way of accomplishing this. Any ideas?
Thanks
Solution
Approach #1: No bad entries in d
Here's one NumPy based method -
import numpy as np

def assign_val(df, d, newval=1):
    # Get d-rows,cols as arrays for efficient usage later on
    di,dc = np.array([j[0] for j in d]), np.array([j[1] for j in d])

    # Get col and index data
    i,c = df.index.values.astype(di.dtype),df.columns.values.astype(dc.dtype)

    # Locate row indexes from d back to df
    sidx_i = i.argsort()
    I = sidx_i[np.searchsorted(i,di,sorter=sidx_i)]

    # Locate column indexes from d back to df
    sidx_c = c.argsort()
    C = sidx_c[np.searchsorted(c,dc,sorter=sidx_c)]

    # Assign into array data with new values
    df.values[I,C] = newval
    # Use df.to_numpy(copy=False)[I,C] = newval on newer pandas versions
    return df
Sample run -
In [21]: df = pd.DataFrame(np.zeros((2,2)), columns=['y','x'], index=['b','a'])
In [22]: d = [("a", "x"), ("b", "y"), ('a','y')]
In [23]: assign_val(df, d, newval=1)
Out[23]:
y x
b 1.0 0.0
a 1.0 1.0
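To make the lookup step in Approach #1 easier to follow, here is a minimal sketch of the argsort plus searchsorted idiom on its own, using the row labels from the sample run. The variable names (labels, wanted, sorter, positions) are chosen for illustration and are not part of the answer's code; it assumes every wanted label actually exists in labels.

```python
import numpy as np

# Unsorted labels, in the same order as the sample DataFrame's index
labels = np.array(["b", "a"])
# Row labels taken from the tuples in d
wanted = np.array(["a", "b", "a"])

# Permutation that would sort `labels`
sorter = labels.argsort()

# Position of each wanted label within the *sorted* view of `labels`
pos_in_sorted = np.searchsorted(labels, wanted, sorter=sorter)

# Map those sorted positions back to positions in the original array
positions = sorter[pos_in_sorted]

# Sanity check: looking the positions up again gives back the wanted labels
print(positions)
print(labels[positions])
```

The same idiom is applied once to the index and once to the columns, which yields the integer row and column positions used for the single vectorized assignment.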
Approach #2: Generic one
If there are any bad entries in d (labels that do not exist in the DataFrame), we need to filter those out. So, a modified version for that generic case would be -
def ssidx(i,di):
    sidx_i = i.argsort()
    idx_i = np.searchsorted(i,di,sorter=sidx_i)
    # Labels sorting past the end have no match; point them at 0 for now
    invalid_mask = idx_i==len(sidx_i)
    idx_i[invalid_mask] = 0
    I = sidx_i[idx_i]
    # Anything that does not match its looked-up label is also invalid
    invalid_mask |= i[I]!=di
    return I,invalid_mask

# Assumes df, d and newval are already defined, as in Approach #1

# Get d-rows,cols as arrays for efficient usage later on
di,dc = np.array([j[0] for j in d]), np.array([j[1] for j in d])

# Get col and index data
i,c = df.index.values.astype(di.dtype),df.columns.values.astype(dc.dtype)

# Locate row indexes from d back to df
I,badmask_I = ssidx(i,di)

# Locate column indexes from d back to df
C,badmask_C = ssidx(c,dc)

badmask = badmask_I | badmask_C
goodmask = ~badmask

df.values[I[goodmask],C[goodmask]] = newval
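For reference, the generic approach can be wrapped into a function and tried on the sample data with one deliberately bad entry. This is only a sketch under a few assumptions: the names lookup_positions and assign_val_generic are hypothetical, all labels are strings, d is non-empty, and the frame has a single numeric dtype. Unlike the code above, it writes into a copy of the data and returns a new DataFrame, because in-place writes through df.values may not reach the frame on every pandas version.

```python
import numpy as np
import pandas as pd

def lookup_positions(labels, wanted):
    """Return the position of each wanted label and a mask of misses."""
    sorter = labels.argsort()
    pos = np.searchsorted(labels, wanted, sorter=sorter)
    # Labels that sort past the end cannot match; park them at 0 for now
    invalid = pos == len(labels)
    pos[invalid] = 0
    found = sorter[pos]
    # Any remaining mismatch means the label is absent from `labels`
    invalid |= labels[found] != wanted
    return found, invalid

def assign_val_generic(df, d, newval=1):
    rows = np.array([r for r, _ in d])
    cols = np.array([c for _, c in d])
    I, bad_rows = lookup_positions(df.index.to_numpy().astype(str), rows)
    C, bad_cols = lookup_positions(df.columns.to_numpy().astype(str), cols)
    good = ~(bad_rows | bad_cols)
    # Work on a copy and rebuild the frame (assumption: one numeric dtype)
    arr = df.to_numpy(copy=True)
    arr[I[good], C[good]] = newval
    return pd.DataFrame(arr, index=df.index, columns=df.columns)

df = pd.DataFrame(np.zeros((2, 2)), columns=["y", "x"], index=["b", "a"])
# ("z", "x") is a bad entry: there is no row labelled "z"
d = [("a", "x"), ("b", "y"), ("a", "y"), ("z", "x")]
out = assign_val_generic(df, d, newval=1)
print(out)
```

The bad entry is skipped silently, so the three valid cells are set and the rest of the frame is left unchanged.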