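python - How to convert selected data to the same length (shape)
Problem description
I am reading several .csv files as pandas DataFrames that all have the same shape. For some indices, some of the values are zero. For each index I want to select the values so that the rows end up with the same shape: wherever any of the files has a zero at a position, I set that position to zero in all of them and then delete the zeros: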
```python
import numpy as np
import pandas as pd

# read_csv already returns a DataFrame, so no pd.DataFrame(...) wrapper is needed
a = pd.read_csv("path_a", index_col=0)
b = pd.read_csv("path_b", index_col=0)
c = pd.read_csv("path_c", index_col=0)
print(a, b, c, sep="\n")

X = a.shape[0]        # number of rows (countries)
d = a.index.values    # row labels (ISO3 codes)
a = np.array(a)
b = np.array(b)
c = np.array(c)
for i in range(0, X):
    xdata = a[i]
    xdata1 = b[i]
    xdata2 = c[i]
    # Propagate zeros between the three rows so that each row ends up
    # with a zero wherever any of them has a zero.
    xdata = np.where(xdata2 == 0, 0, xdata)
    xdata1 = np.where(xdata2 == 0, 0, xdata1)
    xdata1 = np.where(xdata == 0, 0, xdata1)
    xdata2 = np.where(xdata == 0, 0, xdata2)
    xdata = np.where(xdata1 == 0, 0, xdata)
    xdata2 = np.where(xdata1 == 0, 0, xdata2)
    # Find the zero positions and delete them from each row.
    indexX = np.argwhere(xdata == 0)
    index1X = np.argwhere(xdata1 == 0)
    index2X = np.argwhere(xdata2 == 0)
    xdata = np.delete(xdata, indexX)
    xdata1 = np.delete(xdata1, index1X)
    xdata2 = np.delete(xdata2, index2X)
    print(d[i], xdata, xdata1, xdata2, sep="\n")
```
Output:

```
      1980  1985  1990  1995  2000  2005  2010
ISO3
AFG    0.0   0.0   3.8   0.0   0.0   9.8   0.0
AGO    2.0   0.0   3.0   4.0   0.0   0.0   0.0
ALB    0.0   0.2   0.5   0.2   1.3   1.6   2.7
AND    0.0   0.0   0.0   0.0   0.0   0.0   0.0
ARE    0.7   0.8   0.9   1.7   2.3   2.7   3.0
ARG    3.1   6.7   5.3  15.1  17.2  18.2  18.7
ARM    0.4   0.5   0.5   0.5   0.4   1.2   1.3
      1980  1985  1990  1995  2000  2005  2010
ISO3
AFG    2.5   0.0   0.0   4.7   0.0   0.0   0.0
AGO   13.1  14.9  15.8  16.4  16.9  17.6  18.1
ALB    1.4   1.5   1.6   1.6   1.6   1.6   1.7
AND    0.2   0.2   0.2   0.2   0.1   0.4   0.6
ARE    0.0   0.0   0.0   0.0   0.0   0.0   0.0
ARG    1.8   1.8   1.7   1.8   1.8   1.9   1.9
ARM    1.8   1.8   1.7   0.0   1.8   1.9   1.5
      1980  1985  1990  1995  2000  2005  2010
ISO3
AFG    0.0   0.0   0.0   0.0   0.0   0.0   0.0
AGO    0.0   0.0   4.7   5.8   6.0   0.0   0.0
ALB    0.0   0.2   0.5   0.2   1.3   1.6   2.7
AND    1.4   1.8   2.3   3.7   0.0   0.0   5.4
ARE    0.7   0.8   0.9   1.7   2.3   2.7   3.0
ARG    3.1   6.7   5.3  15.1  17.2  18.2  18.7
ARM    0.4   0.5   0.5   0.5   0.4   1.2   1.3
AFG
[]
[]
[]
AGO
[ 3. 4.]
[ 15.8 16.4]
[ 4.7 5.8]
ALB
[ 0.2 0.5 0.2 1.3 1.6 2.7]
[ 1.5 1.6 1.6 1.6 1.6 1.7]
[ 0.2 0.5 0.2 1.3 1.6 2.7]
AND
[]
[]
[]
ARE
[]
[]
[]
ARG
[ 3.1 6.7 5.3 15.1 17.2 18.2 18.7]
[ 1.8 1.8 1.7 1.8 1.8 1.9 1.9]
[ 3.1 6.7 5.3 15.1 17.2 18.2 18.7]
ARM
[ 0.4 0.5 0.5 0.4 1.2 1.3]
[ 1.8 1.8 1.7 1.8 1.9 1.5]
[ 0.4 0.5 0.5 0.4 1.2 1.3]
```
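This code works, but it is a makeshift approach and it is not efficient when there is a lot of data. Could you suggest a more efficient way, and how to select the data based on the minimum-length index?
Solution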
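One idea is to multiply all 3 arrays together and test the product against `0`, and to loop over the 3 arrays stored in the list `L1`. The logic also changes: select the values with the boolean mask instead of using `np.argwhere` and `np.delete`: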
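```python
import numpy as np

# a, b, c are the DataFrames read from the CSV files in the question
X = a.shape[0]
d = a.index.values
a = np.array(a)
b = np.array(b)
c = np.array(c)

# A cell is kept only if it is non-zero in all three arrays.
m = (a * b * c) != 0
L1 = [a, b, c]
for i in range(0, X):
    for arr in L1:
        # Boolean indexing keeps only the columns where the mask is True.
        xdata = arr[i][m[i]]
        print(xdata)
```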
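The mask idea can be checked on a small, self-contained example. This sketch uses two rows (AGO and ARM) copied from the question's data and compares the product-based mask with an equivalent mask built from explicit `!= 0` comparisons; the names `m_product`, `m_explicit` and `filtered` are illustrative only. The explicit form is shown as an option because a product of very small non-zero floats can underflow to `0.0`.

```python
import numpy as np

# Two rows (AGO, ARM) copied from the question; columns are 1980..2010.
a = np.array([[2.0, 0.0, 3.0, 4.0, 0.0, 0.0, 0.0],
              [0.4, 0.5, 0.5, 0.5, 0.4, 1.2, 1.3]])
b = np.array([[13.1, 14.9, 15.8, 16.4, 16.9, 17.6, 18.1],
              [1.8, 1.8, 1.7, 0.0, 1.8, 1.9, 1.5]])
c = np.array([[0.0, 0.0, 4.7, 5.8, 6.0, 0.0, 0.0],
              [0.4, 0.5, 0.5, 0.5, 0.4, 1.2, 1.3]])

# Mask from the answer: a cell survives only if the product is non-zero.
m_product = (a * b * c) != 0

# Equivalent mask from explicit comparisons; it is not affected by
# floating-point underflow (e.g. 1e-200 * 1e-200 * 1e-200 == 0.0).
m_explicit = (a != 0) & (b != 0) & (c != 0)

filtered = []
for i in range(a.shape[0]):
    # Boolean indexing keeps the same columns in every array,
    # so the three filtered rows always have equal length.
    rows = [arr[i][m_explicit[i]] for arr in (a, b, c)]
    filtered.append(rows)
    print([r.tolist() for r in rows])
```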
If you use pandas 0.24+, it is better to convert to NumPy arrays with `to_numpy`:
```python
import numpy as np

# a, b, c are the DataFrames read in the question (to_numpy needs pandas 0.24+)
X = a.shape[0]
d = a.index.to_numpy()
a = a.to_numpy()
b = b.to_numpy()
c = c.to_numpy()

m = (a * b * c) != 0
L1 = [a, b, c]
for i in range(0, X):
    for arr in L1:
        xdata = arr[i][m[i]]
        print(xdata)
```
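The same mask can also be built directly on the DataFrames, which keeps the year labels on the result. This is a minimal sketch, not part of the original answer, assuming every frame has the same index and columns (as the question states). The sample rows are copied from the question's data; `frames` and `result` are hypothetical names used for illustration.

```python
import pandas as pd

years = [1980, 1985, 1990, 1995, 2000, 2005, 2010]
idx = pd.Index(["AGO", "ARM"], name="ISO3")

# Sample rows copied from the question; the real frames come from read_csv.
a = pd.DataFrame([[2.0, 0.0, 3.0, 4.0, 0.0, 0.0, 0.0],
                  [0.4, 0.5, 0.5, 0.5, 0.4, 1.2, 1.3]], index=idx, columns=years)
b = pd.DataFrame([[13.1, 14.9, 15.8, 16.4, 16.9, 17.6, 18.1],
                  [1.8, 1.8, 1.7, 0.0, 1.8, 1.9, 1.5]], index=idx, columns=years)
c = pd.DataFrame([[0.0, 0.0, 4.7, 5.8, 6.0, 0.0, 0.0],
                  [0.4, 0.5, 0.5, 0.5, 0.4, 1.2, 1.3]], index=idx, columns=years)

# True where the cell is non-zero in every frame (DataFrame.ne means "!=").
mask = a.ne(0) & b.ne(0) & c.ne(0)

frames = {"a": a, "b": b, "c": c}
result = {}
for label in mask.index:
    keep = mask.loc[label]  # boolean Series indexed by year
    # One row per frame, restricted to the years kept for this label.
    result[label] = pd.DataFrame(
        {name: df.loc[label, keep] for name, df in frames.items()}
    ).T
    print(label)
    print(result[label])
```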
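Edit: to get one 2D array per index, collect the filtered rows in a list and stack them with `np.vstack`: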
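```python
import numpy as np

# a, b, c are the DataFrames read in the question
X = a.shape[0]
d = a.index.to_numpy()
a = a.to_numpy()
b = b.to_numpy()
c = c.to_numpy()
m = (a * b * c) != 0
L1 = [a, b, c]
for i in range(0, X):
    out = []
    for arr in L1:
        xdata = arr[i][m[i]]
        out.append(xdata)
    # Stack the three filtered rows into one 2D array of shape (3, k).
    data = np.vstack(out)
    print(data)
```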
```
[]
[[ 3.   4. ]
 [15.8 16.4]
 [ 4.7  5.8]]
[[0.2 0.5 0.2 1.3 1.6 2.7]
 [1.5 1.6 1.6 1.6 1.6 1.7]
 [0.2 0.5 0.2 1.3 1.6 2.7]]
[]
[]
[[ 3.1  6.7  5.3 15.1 17.2 18.2 18.7]
 [ 1.8  1.8  1.7  1.8  1.8  1.9  1.9]
 [ 3.1  6.7  5.3 15.1 17.2 18.2 18.7]]
[[0.4 0.5 0.5 0.4 1.2 1.3]
 [1.8 1.8 1.7 1.8 1.9 1.5]
 [0.4 0.5 0.5 0.4 1.2 1.3]]
```
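The per-row loop over `L1` can also be replaced by stacking the three arrays into one 3D array and indexing it once per row. This is a sketch of that alternative, not taken from the original answer; it assumes all arrays have the same shape, and it uses two sample rows (AGO, ARM) from the question with the hypothetical names `labels`, `stacked` and `result`.

```python
import numpy as np

# Two sample rows (AGO, ARM) from the question; in practice use
# a.to_numpy(), b.to_numpy(), c.to_numpy() and the frame's index.
labels = ["AGO", "ARM"]
a = np.array([[2.0, 0.0, 3.0, 4.0, 0.0, 0.0, 0.0],
              [0.4, 0.5, 0.5, 0.5, 0.4, 1.2, 1.3]])
b = np.array([[13.1, 14.9, 15.8, 16.4, 16.9, 17.6, 18.1],
              [1.8, 1.8, 1.7, 0.0, 1.8, 1.9, 1.5]])
c = np.array([[0.0, 0.0, 4.7, 5.8, 6.0, 0.0, 0.0],
              [0.4, 0.5, 0.5, 0.5, 0.4, 1.2, 1.3]])

# Shape (3, n_rows, n_cols): one layer per source file.
stacked = np.stack([a, b, c])

# True where all three layers are non-zero; shape (n_rows, n_cols).
m = (stacked != 0).all(axis=0)

# stacked[:, i, m[i]] has shape (3, k): the same k columns taken from
# every layer, like the np.vstack result in the loop above.
result = {label: stacked[:, i, m[i]] for i, label in enumerate(labels)}
for label, data in result.items():
    print(label)
    print(data)
```

A row whose mask is all False gives an array of shape `(3, 0)`, which prints as `[]`, the same as `np.vstack` on three empty rows.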