python-3.x - 用相对于行排名的列名替换数据框中的空值
问题描述
我为我的问题设置了以下设置。我有一个数据框GEO
,,
{'index': {0: 2,
1: 3,
2: 25,
3: 26,
4: 27,
5: 28,
6: 32,
7: 33,
8: 78,
9: 83,
10: 91,
11: 96,
12: 98,
13: 99,
14: 138,
15: 155,
16: 156,
17: 1790},
'id': {0: 'A001DS007',
1: 'A001DS01',
2: 'A001GS10',
3: 'A001GS11',
4: 'A001GS12',
5: 'A001GS13',
6: 'A001GS17',
7: 'A001GS18',
8: 'A003DS05',
9: 'A003DS10',
10: 'A003GS02',
11: 'A003GS07',
12: 'A003GS09',
13: 'A003GS10',
14: 'A005DS03',
15: 'A005GS01',
16: 'A005GS02',
17: 'R13C1G'},
'geometry_zone': {0: <shapely.geometry.polygon.Polygon at 0x1602b9d3fa0>,
1: <shapely.geometry.polygon.Polygon at 0x1602b9d3af0>,
2: <shapely.geometry.polygon.Polygon at 0x16026d9cee0>,
3: <shapely.geometry.polygon.Polygon at 0x16026d9cc40>,
4: <shapely.geometry.polygon.Polygon at 0x16026d9c940>,
5: <shapely.geometry.polygon.Polygon at 0x1601dd8e1f0>,
6: <shapely.geometry.polygon.Polygon at 0x1602b9f86a0>,
7: <shapely.geometry.polygon.Polygon at 0x1602b9f8e20>,
8: <shapely.geometry.polygon.Polygon at 0x16026dae790>,
9: <shapely.geometry.polygon.Polygon at 0x16026dae1f0>,
10: <shapely.geometry.polygon.Polygon at 0x16023fd80d0>,
11: <shapely.geometry.polygon.Polygon at 0x16026279d00>,
12: <shapely.geometry.polygon.Polygon at 0x160262797c0>,
13: <shapely.geometry.polygon.Polygon at 0x16026279880>,
14: <shapely.geometry.polygon.Polygon at 0x16026da3820>,
15: <shapely.geometry.polygon.Polygon at 0x1602be25f70>,
16: <shapely.geometry.polygon.Polygon at 0x1602be3dd60>,
17: <shapely.geometry.polygon.Polygon at 0x1602bafdbb0>},
'centroid': {0: <shapely.geometry.point.Point at 0x16029e24e80>,
1: <shapely.geometry.point.Point at 0x16029e24c10>,
2: <shapely.geometry.point.Point at 0x160299e6220>,
3: <shapely.geometry.point.Point at 0x160299e6850>,
4: <shapely.geometry.point.Point at 0x16029befe50>,
5: <shapely.geometry.point.Point at 0x16029befdf0>,
6: <shapely.geometry.point.Point at 0x16029bef1f0>,
7: <shapely.geometry.point.Point at 0x16029befd90>,
8: <shapely.geometry.point.Point at 0x1602bce6160>,
9: <shapely.geometry.point.Point at 0x1602bcc3790>,
10: <shapely.geometry.point.Point at 0x1602bcc3640>,
11: <shapely.geometry.point.Point at 0x1602bcc36a0>,
12: <shapely.geometry.point.Point at 0x1602bcc3730>,
13: <shapely.geometry.point.Point at 0x1602bcc3670>,
14: <shapely.geometry.point.Point at 0x16025cc3af0>,
15: <shapely.geometry.point.Point at 0x1602b9f8100>,
16: <shapely.geometry.point.Point at 0x1602b9f8d30>,
17: <shapely.geometry.point.Point at 0x1601e48eb50>},
'A001DS007': {0: 1,
1: 10,
2: 8,
3: 8,
4: 7,
5: 6,
6: 3,
7: 3,
8: 6,
9: 5,
10: 12,
11: 2,
12: 5,
13: 5,
14: 10,
15: 10,
16: 10,
17: 13},
'A001DS01': {0: 13,
1: 1,
2: 3,
3: 4,
4: 5,
5: 4,
6: 13,
7: 13,
8: 10,
9: 14,
10: 2,
11: 14,
12: 14,
13: 14,
14: 7,
15: 7,
16: 8,
17: 3},
'A001GS10': {0: 14,
1: 2,
2: 1,
3: 3,
4: 4,
5: 5,
6: 14,
7: 14,
8: 18,
9: 17,
10: 7,
11: 16,
12: 16,
13: 16,
14: 11,
15: 11,
16: 16,
17: 6},
'A001GS11': {0: 12,
1: 3,
2: 2,
3: 1,
4: 3,
5: 3,
6: 11,
7: 11,
8: 16,
9: 15,
10: 5,
11: 12,
12: 13,
13: 13,
14: 12,
15: 12,
16: 12,
17: 8},
'A001GS12': {0: 9,
1: 5,
2: 4,
3: 2,
4: 1,
5: 2,
6: 8,
7: 9,
8: 14,
9: 12,
10: 6,
11: 11,
12: 10,
13: 10,
14: 13,
15: 13,
16: 11,
17: 9},
'A001GS13': {0: 7,
1: 7,
2: 5,
3: 5,
4: 2,
5: 1,
6: 7,
7: 7,
8: 11,
9: 10,
10: 8,
11: 9,
12: 9,
13: 9,
14: 14,
15: 14,
16: 14,
17: 11},
'A001GS17': {0: 3,
1: 13,
2: 11,
3: 10,
4: 10,
5: 8,
6: 1,
7: 2,
8: 15,
9: 8,
10: 14,
11: 6,
12: 7,
13: 7,
14: 17,
15: 17,
16: 17,
17: 15},
'A001GS18': {0: 4,
1: 14,
2: 12,
3: 12,
4: 12,
5: 10,
6: 2,
7: 1,
8: 17,
9: 7,
10: 16,
11: 8,
12: 6,
13: 6,
14: 18,
15: 18,
16: 18,
17: 16},
'A003DS05': {0: 8,
1: 8,
2: 10,
3: 11,
4: 11,
5: 12,
6: 10,
7: 10,
8: 1,
9: 6,
10: 4,
11: 4,
12: 8,
13: 8,
14: 6,
15: 6,
16: 5,
17: 10},
'A003DS10': {0: 10,
1: 18,
2: 18,
3: 17,
4: 17,
5: 16,
6: 9,
7: 8,
8: 9,
9: 1,
10: 18,
11: 7,
12: 3,
13: 3,
14: 15,
15: 15,
16: 13,
17: 18},
'A003GS02': {0: 11,
1: 4,
2: 7,
3: 6,
4: 6,
5: 7,
6: 12,
7: 12,
8: 4,
9: 11,
10: 1,
11: 10,
12: 11,
13: 11,
14: 5,
15: 5,
16: 6,
17: 2},
'A003GS07': {0: 2,
1: 9,
2: 9,
3: 9,
4: 9,
5: 9,
6: 4,
7: 6,
8: 2,
9: 4,
10: 11,
11: 1,
12: 4,
13: 4,
14: 8,
15: 8,
16: 7,
17: 12},
'A003GS09': {0: 5,
1: 15,
2: 15,
3: 13,
4: 13,
5: 13,
6: 5,
7: 4,
8: 5,
9: 3,
10: 15,
11: 3,
12: 1,
13: 2,
14: 9,
15: 9,
16: 9,
17: 14},
'A003GS10': {0: 6,
1: 17,
2: 16,
3: 16,
4: 14,
5: 14,
6: 6,
7: 5,
8: 8,
9: 2,
10: 17,
11: 5,
12: 2,
13: 1,
14: 16,
15: 16,
16: 15,
17: 17},
'A005DS03': {0: 18,
1: 16,
2: 17,
3: 18,
4: 18,
5: 18,
6: 18,
7: 18,
8: 12,
9: 16,
10: 13,
11: 18,
12: 18,
13: 18,
14: 1,
15: 3,
16: 3,
17: 7},
'A005GS01': {0: 17,
1: 11,
2: 13,
3: 15,
4: 16,
5: 17,
6: 17,
7: 17,
8: 7,
9: 13,
10: 10,
11: 15,
12: 15,
13: 15,
14: 2,
15: 1,
16: 2,
17: 4},
'A005GS02': {0: 15,
1: 12,
2: 14,
3: 14,
4: 15,
5: 15,
6: 16,
7: 15,
8: 3,
9: 9,
10: 9,
11: 13,
12: 12,
13: 12,
14: 3,
15: 2,
16: 1,
17: 5},
'R13C1G': {0: 16,
1: 6,
2: 6,
3: 7,
4: 8,
5: 11,
6: 15,
7: 16,
8: 13,
9: 18,
10: 3,
11: 17,
12: 17,
13: 17,
14: 4,
15: 4,
16: 4,
17: 1}}
其中列名与 相同id
,每行中的值指示列名中的项目有多接近id
(它对所有计算的距离进行排名)。
GEO =
index id geometry_zone \
0 2 A001DS007 POLYGON ((53.16200 50.20131, 52.84363 48.45026...
1 3 A001DS01 POLYGON ((59.04953 49.34561, 58.77158 47.52346...
2 25 A001GS10 POLYGON ((58.72525 45.79395, 59.72898 45.57777...
centroid A001DS007 A001DS01 A001GS10 A001GS11 \
0 POINT (53.49869 49.22928) 1 13 14 12
1 POINT (59.29040 48.38586) 10 1 2 3
2 POINT (59.35477 46.54529) 8 3 1 2
A001GS12 A001GS13 A001GS17 A001GS18 A003DS05 A003DS10 A003GS02 \
0 9 7 3 4 8 10 11
1 5 7 13 14 8 18 4
2 4 5 11 12 10 18 7
A003GS07 A003GS09 A003GS10 A005DS03 A005GS01 A005GS02 R13C1G
0 2 5 6 18 17 15 16
1 9 15 17 16 11 12 6
2 9 15 16 17 13 14 6
同时,我有另一个数据框,
item = {'Code': {0: 'A1',
1: 'A2',
2: 'A3',
3: 'A4',
4: 'A5',
5: 'A6',
6: 'A7',
7: 'A8',
8: 'A9',
9: 'A10',
10: 'A11',
11: 'A12',
12: 'A13',
13: 'A14',
14: 'A15',
15: 'A16',
16: 'A17',
17: 'A18',
18: 'A19',
19: 'A20',
20: 'A21',
21: 'A22',
22: 'A23',
23: 'A24',
24: 'A25',
25: 'A26'},
'Manual': {0: 'A003GS07',
1: 'A003GS07',
2: 'A003GS07',
3: 'A003GS02',
4: 'A001GS10',
5: 'A001GS11',
6: 'A001GS18',
7: 'A001GS12',
8: 'R13C1G',
9: 'A001GS12',
10: 'A001GS18',
11: 'A001GS17',
12: 'A001GS13',
13: 'A001GS13',
14: 'A003DS10',
15: 'A005GS02',
16: 'A003DS05',
17: 'A005GS01',
18: 'A003GS09',
19: 'A003GS10',
20: 'A003GS10',
21: 'A003GS10',
22: 'A003GS10',
23: 'A003GS10',
24: 'A003GS10',
25: 'A005DS03'},
'closest_1': {0: '',
1: '',
2: '',
3: '',
4: '',
5: '',
6: '',
7: '',
8: '',
9: '',
10: '',
11: '',
12: '',
13: '',
14: '',
15: '',
16: '',
17: '',
18: '',
19: '',
20: '',
21: '',
22: '',
23: '',
24: '',
25: ''},
'closest_2': {0: '',
1: '',
2: '',
3: '',
4: '',
5: '',
6: '',
7: '',
8: '',
9: '',
10: '',
11: '',
12: '',
13: '',
14: '',
15: '',
16: '',
17: '',
18: '',
19: '',
20: '',
21: '',
22: '',
23: '',
24: '',
25: ''},
'closest_3': {0: '',
1: '',
2: '',
3: '',
4: '',
5: '',
6: '',
7: '',
8: '',
9: '',
10: '',
11: '',
12: '',
13: '',
14: '',
15: '',
16: '',
17: '',
18: '',
19: '',
20: '',
21: '',
22: '',
23: '',
24: '',
25: ''},
'closest_4': {0: '',
1: '',
2: '',
3: '',
4: '',
5: '',
6: '',
7: '',
8: '',
9: '',
10: '',
11: '',
12: '',
13: '',
14: '',
15: '',
16: '',
17: '',
18: '',
19: '',
20: '',
21: '',
22: '',
23: '',
24: '',
25: ''}}
Code Manual closest_1 closest_2 closest_3 closest_4
0 A1 A003GS07
1 A2 A003GS07
2 A3 A003GS07
3 A4 A003GS02
4 A5 A001GS10
5 A6 A001GS11
6 A7 A001GS18
7 A8 A001GS12
8 A9 R13C1G
9 A10 A001GS12
10 A11 A001GS18
11 A12 A001GS17
12 A13 A001GS13
13 A14 A001GS13
14 A15 A003DS10
15 A16 A005GS02
16 A17 A003DS05
17 A18 A005GS01
18 A19 A003GS09
19 A20 A003GS10
20 A21 A003GS10
21 A22 A003GS10
22 A23 A003GS10
23 A24 A003GS10
24 A25 A003GS10
25 A26 A005DS03
现在,我想用四个最接近的项目逐行替换所有空列,所以第一行是
Code Manual closest_1 closest_2 closest_3 closest_4
0 A1 A003GS07 A001DS007 A003GS07 A001GS17 A001GS18
即第一行1、2、3、4排位对应的列名。
我已经尝试合并、加入、为每一行创建字典,然后在该字典中删除任何大于 4 的元素......但没有任何尝试接近工作。
如果有人对如何实现这一目标有任何想法,我将不胜感激。
解决方案
让我们调用第一个 DataFramedf1
和第二个df2
。您可以先编译一个 DataFrame,每行有 4 个最近的点,然后用它来替换df2
. 我使用绝对位置来识别以“A00…”开头的列,但您也可以根据您的用例对名称使用过滤器。
subs = (df1.set_index('id')
.apply(lambda x: pd.Series(x.iloc[2:]
.sort_values()
.iloc[:4]
.index),
axis=1))
df2[['closest_1', 'closest_2', 'closest_3', 'closest_4']] = df2['Manual'].apply(lambda x: subs.loc[x])
输出:
Code Manual closest_1 closest_2 closest_3 closest_4
0 A1 A003GS07 A003GS07 A001DS007 A003GS09 A003DS05
1 A2 A003GS07 A003GS07 A001DS007 A003GS09 A003DS05
2 A3 A003GS07 A003GS07 A001DS007 A003GS09 A003DS05
3 A4 A003GS02 A003GS02 A001DS01 R13C1G A003DS05
4 A5 A001GS10 A001GS10 A001GS11 A001DS01 A001GS12
5 A6 A001GS11 A001GS11 A001GS12 A001GS10 A001DS01
6 A7 A001GS18 A001GS18 A001GS17 A001DS007 A003GS09
…
推荐阅读
- python - TensorFlow How to Initialize Global Step
- angular - canDeactivate 与 Observable
- c++ - 为什么在 C/C++ 中可以多次包含标头?
- python - 从 VBA 启动时导入 python 模块
- react-native - 手机APP认证匹配手机号有哪些安全隐患?
- mysql - 删除此查询中的联合,仍然得到相同的结果
- azure - PowerShell script for restoring the Azure SQL Database from Azure File share
- c++ - 从字符串中读取变量数据
- javascript - 正则表达式 Javascript 转义特殊字符加字符限制
- database-design - Is this a correct database design? (pk fk set and references)