How to convert selected data to the same length (shape)

Problem description

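I am reading several .csv files as pandas DataFrames that all have the same shape. For some indices, some of the values are zero. For each index I want the selected values from every DataFrame to have the same shape: wherever any one of them has a zero, I set the value at that position to zero in the others as well, and then delete the zeros so that all rows end up with the same length: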

import numpy as np
import pandas as pd

# read_csv already returns a DataFrame, so no extra pd.DataFrame() wrapper is needed
a = pd.read_csv("path_a", index_col=0)
b = pd.read_csv("path_b", index_col=0)
c = pd.read_csv("path_c", index_col=0)
print(a, "\n", b, "\n", c)
L = np.array(a.shape)
X = L[0]
d = a.index.values
a = np.array(a)
b = np.array(b)
c = np.array(c)
for i in range(0, X):
    xdata  = a[i]
    xdata1 = b[i]
    xdata2 = c[i]
    # propagate zeros: a zero in any one row zeroes that position in the other two
    xdata  = np.where(xdata2==0,0,xdata)
    xdata1 = np.where(xdata2==0,0,xdata1)
    xdata1 = np.where(xdata==0,0,xdata1)
    xdata2 = np.where(xdata==0,0,xdata2)
    xdata  = np.where(xdata1==0,0,xdata)
    xdata2 = np.where(xdata1==0,0,xdata2)
    # locate the zeros and delete them so all three rows have the same length
    indexX  = np.argwhere(xdata==0)
    index1X = np.argwhere(xdata1==0)
    index2X = np.argwhere(xdata2==0)
    xdata  = np.delete(xdata,indexX)
    xdata1 = np.delete(xdata1,index1X)
    xdata2 = np.delete(xdata2,index2X)
    print(d[i], "\n", xdata, "\n", xdata1, "\n", xdata2)

Printed DataFrames a, b and c, followed by the output of the loop:
     1980  1985  1990  1995  2000  2005  2010
ISO3                                          
AFG    0.0   0.0   3.8   0.0   0.0   9.8   0.0
AGO    2.0   0.0   3.0   4.0   0.0   0.0   0.0
ALB    0.0   0.2   0.5   0.2   1.3   1.6   2.7
AND    0.0   0.0   0.0   0.0   0.0   0.0   0.0
ARE    0.7   0.8   0.9   1.7   2.3   2.7   3.0
ARG    3.1   6.7   5.3  15.1  17.2  18.2  18.7
ARM    0.4   0.5   0.5   0.5   0.4   1.2   1.3 
      1980  1985  1990  1995  2000  2005  2010
ISO3                                          
AFG    2.5   0.0   0.0   4.7   0.0   0.0   0.0
AGO   13.1  14.9  15.8  16.4  16.9  17.6  18.1
ALB    1.4   1.5   1.6   1.6   1.6   1.6   1.7
AND    0.2   0.2   0.2   0.2   0.1   0.4   0.6
ARE    0.0   0.0   0.0   0.0   0.0   0.0   0.0
ARG    1.8   1.8   1.7   1.8   1.8   1.9   1.9
ARM    1.8   1.8   1.7   0.0   1.8   1.9   1.5 
      1980  1985  1990  1995  2000  2005  2010
ISO3                                          
AFG    0.0   0.0   0.0   0.0   0.0   0.0   0.0
AGO    0.0   0.0   4.7   5.8   6.0   0.0   0.0
ALB    0.0   0.2   0.5   0.2   1.3   1.6   2.7
AND    1.4   1.8   2.3   3.7   0.0   0.0   5.4
ARE    0.7   0.8   0.9   1.7   2.3   2.7   3.0
ARG    3.1   6.7   5.3  15.1  17.2  18.2  18.7
ARM    0.4   0.5   0.5   0.5   0.4   1.2   1.3

AFG 
[] 
[] 
[]
AGO 
[ 3.  4.] 
[ 15.8  16.4] 
[ 4.7  5.8]
ALB 
[ 0.2  0.5  0.2  1.3  1.6  2.7] 
[ 1.5  1.6  1.6  1.6  1.6  1.7] 
[ 0.2  0.5  0.2  1.3  1.6  2.7]
AND 
[] 
[] 
[]
ARE 
[] 
[] 
[]
ARG 
[  3.1   6.7   5.3  15.1  17.2  18.2  18.7] 
[ 1.8  1.8  1.7  1.8  1.8  1.9  1.9] 
[  3.1   6.7   5.3  15.1  17.2  18.2  18.7]
ARM 
[ 0.4  0.5  0.5  0.4  1.2  1.3] 
[ 1.8  1.8  1.7  1.8  1.9  1.5] 
[ 0.4  0.5  0.5  0.4  1.2  1.3]

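This code works, but it is a makeshift approach and becomes inefficient when there is a lot of data. Could you suggest a more efficient way, and how to select the data according to the index with the minimum length?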

Tags: python, arrays, pandas, dataframe

Solution


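One idea is to multiply all 3 arrays together and test the product against 0: the product is nonzero only at positions where all three values are nonzero. The 3 arrays can also be put in a list, L1, and looped over. The selection logic changes as well: values are picked with a boolean mask instead of np.argwhere followed by np.delete.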

L = np.array(a.shape)
X = L[0]
d = a.index.values
a = np.array(a)
b = np.array(b)
c = np.array(c)
m = (a * b * c) != 0
L1 = [a,b,c]

for i in range (0,X):
    for arr in L1:
        xdata  = arr[i][m[i]]
        print (xdata)
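
To see what the mask does, here is a minimal self-contained sketch using two rows (AGO and ARM) copied from the sample tables in the question. It also builds the mask a second way, with element-wise comparisons joined by `&`. That variant is my own assumption rather than part of the answer; it gives the same mask here and does not depend on the product, so it is unaffected by floating-point underflow when very small values are multiplied.

```python
import numpy as np

# Rows AGO and ARM from the three sample tables in the question.
a = np.array([[2.0, 0.0, 3.0, 4.0, 0.0, 0.0, 0.0],
              [0.4, 0.5, 0.5, 0.5, 0.4, 1.2, 1.3]])
b = np.array([[13.1, 14.9, 15.8, 16.4, 16.9, 17.6, 18.1],
              [1.8, 1.8, 1.7, 0.0, 1.8, 1.9, 1.5]])
c = np.array([[0.0, 0.0, 4.7, 5.8, 6.0, 0.0, 0.0],
              [0.4, 0.5, 0.5, 0.5, 0.4, 1.2, 1.3]])

# Mask from the answer: the product is nonzero only where all three are nonzero.
m_product = (a * b * c) != 0

# Alternative (an assumption of this sketch): combine per-array comparisons.
m_compare = (a != 0) & (b != 0) & (c != 0)

# Boolean indexing keeps only the positions where the mask is True,
# so the three selected rows always have the same length.
ago = [arr[0][m_compare[0]] for arr in (a, b, c)]
arm = [arr[1][m_compare[1]] for arr in (a, b, c)]
print(ago)
print(arm)
```

Because the same row of the mask is applied to all three arrays, no separate zero-propagation step with np.where is needed.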

If you use pandas 0.24+, it is better to convert to NumPy arrays with to_numpy:

L = np.array(a.shape)
X = L[0]
d = a.index.to_numpy()
a = a.to_numpy()
b = b.to_numpy()
c = c.to_numpy()
m = (a * b * c) != 0
L1 = [a,b,c]

for i in range (0,X):
    for arr in L1:
        xdata  = arr[i][m[i]]
        print (xdata)
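
If the year labels should be kept alongside the values, the same mask can also be built without leaving pandas. The following is only a sketch under assumptions: the small frames are typed in by hand from the first four year columns of rows AGO and ARM in the question, and the `kept` dictionary and the row labels "a", "b", "c" are names invented for illustration.

```python
import pandas as pd

years = [1980, 1985, 1990, 1995]
idx = pd.Index(["AGO", "ARM"], name="ISO3")
# First four year columns of rows AGO and ARM from the question's tables.
a = pd.DataFrame([[2.0, 0.0, 3.0, 4.0], [0.4, 0.5, 0.5, 0.5]], index=idx, columns=years)
b = pd.DataFrame([[13.1, 14.9, 15.8, 16.4], [1.8, 1.8, 1.7, 0.0]], index=idx, columns=years)
c = pd.DataFrame([[0.0, 0.0, 4.7, 5.8], [0.4, 0.5, 0.5, 0.5]], index=idx, columns=years)

# True where all three frames are nonzero; index and columns are preserved.
mask = a.ne(0) & b.ne(0) & c.ne(0)

kept = {}  # hypothetical container: ISO3 code -> small DataFrame
for code in mask.index:
    # Year labels at which all three sources are nonzero for this code.
    cols = mask.columns[mask.loc[code].to_numpy()]
    # Rows are the three sources, columns are the surviving years.
    kept[code] = pd.DataFrame(
        {"a": a.loc[code, cols], "b": b.loc[code, cols], "c": c.loc[code, cols]}
    ).T
    print(code)
    print(kept[code])
```

The column labels of each small frame show which years survived for that code, information that is lost once the data is converted to plain arrays.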

Edit: to stack the selected values of the three arrays into one 2D array per index:

L = np.array(a.shape)
X = L[0]
d = a.index.to_numpy()
a = a.to_numpy()
b = b.to_numpy()
c = c.to_numpy()
m = (a * b * c) != 0
L1 = [a,b,c]

for i in range (0,X):
    out = []
    for arr in L1:
        xdata  = arr[i][m[i]]
        out.append(xdata)
    data = np.vstack((out))
    print (data)

[]
[[ 3.   4. ]
 [15.8 16.4]
 [ 4.7  5.8]]
[[0.2 0.5 0.2 1.3 1.6 2.7]
 [1.5 1.6 1.6 1.6 1.6 1.7]
 [0.2 0.5 0.2 1.3 1.6 2.7]]
[]
[]
[[ 3.1  6.7  5.3 15.1 17.2 18.2 18.7]
 [ 1.8  1.8  1.7  1.8  1.8  1.9  1.9]
 [ 3.1  6.7  5.3 15.1 17.2 18.2 18.7]]
[[0.4 0.5 0.5 0.4 1.2 1.3]
 [1.8 1.8 1.7 1.8 1.9 1.5]
 [0.4 0.5 0.5 0.4 1.2 1.3]]
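
As a possible variation on the loop above, the three arrays can be stacked once into a 3D array so that each per-index result is a single indexing step and the results can be looked up by code afterwards. This is a sketch under assumptions: the data is rows AFG and AGO typed in from the question, and `codes`, `stacked` and `result` are illustrative names that do not appear in the original answer.

```python
import numpy as np

codes = ["AFG", "AGO"]
# Rows AFG and AGO from the three sample tables in the question.
a = np.array([[0.0, 0.0, 3.8, 0.0, 0.0, 9.8, 0.0],
              [2.0, 0.0, 3.0, 4.0, 0.0, 0.0, 0.0]])
b = np.array([[2.5, 0.0, 0.0, 4.7, 0.0, 0.0, 0.0],
              [13.1, 14.9, 15.8, 16.4, 16.9, 17.6, 18.1]])
c = np.array([[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
              [0.0, 0.0, 4.7, 5.8, 6.0, 0.0, 0.0]])

stacked = np.stack([a, b, c])      # shape (3, n_rows, n_cols)
m = (stacked != 0).all(axis=0)     # True where all three arrays are nonzero

# For each code, take its row from all three arrays, then keep masked columns.
result = {code: stacked[:, i, :][:, m[i]] for i, code in enumerate(codes)}

for code, data in result.items():
    print(code, data.shape)
    print(data)
```

An index with no position where all three values are nonzero (AFG here) yields an array of shape (3, 0), which NumPy prints as `[]`, consistent with the empty entries in the output above.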
