Multiplying arrays of digits for working with big integers

Question

I work with very large numbers, integers with 10000 digits, so I split each number into an array of digits.

Small data sample:

import numpy as np

# all combinations of length 3 of the values in list L
N = 3
L = [[1,9,0]]*N
a = np.array(np.meshgrid(*L)).T.reshape(-1,N)
# these are numbers, so drop rows with a leading 0; the last digit is always 0
a = a[(a[:, 0] != 0) & (a[:, -1] == 0)]
print (a)
[[1 1 0]
 [1 9 0]
 [1 0 0]
 [9 1 0]
 [9 9 0]
 [9 0 0]]

Then I need to multiply the numbers by the scalar 1.1. For better understanding:

# join the digit arrays into numbers
b = np.array([int(''.join(x)) for x in a.astype(str)])[:, None]
print (b)
[[110]
 [190]
 [100]
 [910]
 [990]
 [900]]

# multiply by the constant
c = b * 1.1
print (c)
[[ 121.]
 [ 209.]
 [ 110.]
 [1001.]
 [1089.]
 [ 990.]]

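But because the real numbers have 10000 digits, this solution is not possible because of rounding. So I need a solution that multiplies the digit arrays directly.

What I tried: move the last 0 'column' to the front, then sum: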

a1 = np.hstack((a[:, [-1]] , a[:, :-1] ))
print (a1)
[[0 1 1]
 [0 1 9]
 [0 1 0]
 [0 9 1]
 [0 9 9]
 [0 9 0]]

print (a1 + a)
[[ 1  2  1]
 [ 1 10  9]
 [ 1  1  0]
 [ 9 10  1]
 [ 9 18  9]
 [ 9  9  0]]

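But the problem is that when a value is greater than 9, the carry has to be moved to the next digit (like old-school addition on paper). The expected output is: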

c1 = np.array([list(str(x).split('.')[0].zfill(4)) for x in np.ravel(c)]).astype(int)
print (c1)
[[0 1 2 1]
 [0 2 0 9]
 [0 1 1 0]
 [1 0 0 1]
 [1 0 8 9]
 [0 9 9 0]]

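Is it possible to use some fast vectorized solution to generate the c1 array from the a array?

EDIT: I tested @yatu's solution with other data and it raises an error:

ValueError: cannot convert float NaN to integer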


import numpy as np
from itertools import product, zip_longest

def grouper(iterable, n, fillvalue=None):
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)

#real data
#M = 100000
#N = 500
#loop by chunks by length 5
M = 20
N = 5
v = [0]*M
for i in grouper(product([9, 0], repeat=M), N, v):
    a = np.array(i)
#    print (a)
    #it is number so removed first 0 and also last value is always 0
    a = a[(a[:, 0] != 0) & (a[:, -1] == 0)]
    print (a)
#

    s = np.arange(a.shape[1]-1, -1, -1)
    # concat digits along cols, and multiply
    b = (a * 10**s).sum(1)*1.1
    # highest amount of digits in b
    n_cols = int(np.log10(b.max()))
    # broadcast division to reverse
    c = b[:, None] // 10**np.arange(n_cols, -1, -1)
    # keep only last digit
    c1 = (c%10).astype(int)
    print (c1)

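Tags: python, arrays, performance, numpy, constants

Answer


Here's a vectorized approach starting from a. The idea is to multiply each column by 10**s, where s is a range up to the number of columns, in decreasing order. Once we take the sum along the second axis, this acts as a concatenation of the digits along the columns. Finally, after multiplying by 1.1, we can reverse the process by applying the same logic, but dividing instead and broadcasting to the resulting shape, then taking the result modulo 10 to keep only the last digit: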

s = np.arange(a.shape[1]-1, -1, -1, dtype=np.float64)
# concat digits along cols, and multiply
b = (a * 10**s).sum(1)*1.1
# highest amount of digits in b
n_cols = int(np.log10(b.max()))
# broadcast division to reverse
c = b[:, None] // 10**np.arange(n_cols, -1, -1, dtype=np.float64)
# keep only last digit
c1 = (c%10).astype(int)

print(c1)

[[0 1 2 1]
 [0 2 0 9]
 [0 1 1 0]
 [1 0 0 1]
 [1 0 8 9]
 [0 9 9 0]]

Update:

The above approach works for integers no larger than what int64 supports, i.e.:

np.iinfo(np.int64).max
# 9223372036854775807
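
As a rough illustration of why these limits matter for 10000-digit numbers, the sketch below compares the fixed-width types with plain Python ints. The variable names are mine, and the 10000-digit value is just an arbitrary example ending in 0.

```python
import numpy as np

# int64 tops out at a 19-digit value, far short of 10000 digits.
int64_max = np.iinfo(np.int64).max
digits_in_int64_max = len(str(int64_max))
print(digits_in_int64_max)  # 19

# float64 has a 53-bit significand, so neighbouring integers
# above 2**53 can no longer be told apart.
collapses = float(2**53) == float(2**53 + 1)
print(collapses)  # True

# Plain Python ints have arbitrary precision and keep every digit.
# Example value: 9999 nines followed by a 0 (10000 digits in total).
big = 10**10000 - 10
exact = big * 11 // 10  # exact "times 1.1", since big ends in 0
print(exact * 10 == big * 11)  # True
```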

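However, what can be done in this case is to keep the array values as Python ints rather than a NumPy dtype. So we can define both np.arange calls with dtype=object, and the above should work for the shared example: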

s = np.arange(a.shape[1]-1, -1, -1, dtype=object)
# concat digits along cols, and multiply
b = (a * 10**s).sum(1)*1.1
# highest amount of digits in b
n_cols = int(np.log10(b.max()))
# broadcast division to reverse
c = b[:, None] // 10**np.arange(n_cols, -1, -1, dtype=object)
# keep only last digit
c1 = (c%10).astype(int)
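
Note that multiplying by the float 1.1 still passes through a Python float, which holds only about 15 to 17 significant digits. As a possible alternative for the 10000-digit case, here is a minimal sketch that never joins the digits into one number. It builds on the shifted sum from the question (x * 1.1 == x + x // 10 when the last digit is 0) and then propagates the carries column by column. The function name times_eleven_tenths is hypothetical, and the loop runs over digit positions while staying vectorized across rows, so this is not a fully vectorized solution.

```python
import numpy as np

def times_eleven_tenths(a):
    """Multiply each row of decimal digits by 1.1, digit by digit.

    Assumes every row ends in 0, so x * 1.1 == x + x // 10 is an integer.
    Returns an array with one extra leading column for a possible carry.
    """
    a = np.asarray(a, dtype=np.int64)
    zeros = np.zeros((a.shape[0], 1), dtype=np.int64)

    # x // 10 is the row shifted one place to the right (last 0 dropped).
    shifted = np.hstack((zeros, a[:, :-1]))

    # Column-wise sum, with a spare leading column for the final carry.
    total = np.hstack((zeros, a + shifted))

    # Propagate carries from the rightmost column to the left,
    # like addition on paper; each step handles all rows at once.
    for j in range(total.shape[1] - 1, 0, -1):
        carry = total[:, j] // 10
        total[:, j] %= 10
        total[:, j - 1] += carry
    return total

# The sample from the question.
a = np.array([[1, 1, 0], [1, 9, 0], [1, 0, 0],
              [9, 1, 0], [9, 9, 0], [9, 0, 0]])
c1 = times_eleven_tenths(a)
print(c1)
# [[0 1 2 1]
#  [0 2 0 9]
#  [0 1 1 0]
#  [1 0 0 1]
#  [1 0 8 9]
#  [0 9 9 0]]
```

Because every intermediate value stays below 20, the arithmetic fits comfortably in int64 regardless of how many digits each row has.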
