首页 > 解决方案 > 复杂聚合:将三列分为两列,其中包含事件数

问题描述

对于我的短论文,我需要对恐怖主义进行网络分析。要执行此分析,我必须首先将现有数据形成更令人愉悦的外观。任务是对列中每年相同攻击的gname, gname2, gname3总和和计数进行聚合。所以如果两个事件在同一年由同一组两次发生,则为2。nkilln_attacksn_attacks

输入数据

|    |   iyear | gname                                                        | gname2                                    | gname3                                    |   nkill |
|---:|--------:|:-------------------------------------------------------------|:------------------------------------------|:------------------------------------------|--------:|
|  0 |    2017 | Lashkar-e-Jhangvi                                            | Tehrik-i-Taliban Pakistan (TTP)           | Aalmi Majlis-i-Tahfuz-i-Khatam-i-Nubuwaat |       1 |
|  1 |    2010 | Indian Mujahideen                                            | Students Islamic Movement of India (SIMI) | 313 Brigade                               |      17 |
|  2 |    2011 | Zeliangrong United Front                                     | Zeliangrong United Front                  | nan                                       |       7 |
|  3 |    2015 | National Socialist Council of Nagaland-Khaplang (NSCN-K)     | Zeliangrong United Front                  | nan                                       |       2 |
|  4 |    2008 | Communist Party of Nepal- Unified Marxist-Leninist (CPN-UML) | Young Communist League                    | nan                                       |       1 |
|  5 |    2015 | Abu Sayyaf Group (ASG)                                       | Waning Abdusalam Group (WAG)              | nan                                       |       4 |
|  6 |    2015 | Abu Sayyaf Group (ASG)                                       | Waning Abdusalam Group (WAG)              | nan                                       |       6 |
|  7 |    2015 | Baba Ladla Gang                                              | Uzair Baloch Gang                         | nan                                       |       7 |
|  8 |    2014 | Taliban                                                      | Tehrik-i-Taliban Pakistan (TTP)           | nan                                       |      24 |

CSV

示例输出

|    |   iyear | source                                                       | target                                    |   nkill |   n_attacks |
|---:|--------:|:-------------------------------------------------------------|:------------------------------------------|--------:|------------:|
|  0 |    2017 | Lashkar-e-Jhangvi                                            | Tehrik-i-Taliban Pakistan (TTP)           |       1 |           1 |
|  1 |    2010 | Indian Mujahideen                                            | Students Islamic Movement of India (SIMI) |      17 |           1 |
|  2 |    2011 | Zeliangrong United Front                                     | Zeliangrong United Front                  |       7 |           1 |
|  3 |    2015 | National Socialist Council of Nagaland-Khaplang (NSCN-K)     | Zeliangrong United Front                  |       2 |           1 |
|  4 |    2008 | Communist Party of Nepal- Unified Marxist-Leninist (CPN-UML) | Young Communist League                    |       1 |           1 |
|  5 |    2015 | Abu Sayyaf Group (ASG)                                       | Waning Abdusalam Group (WAG)              |      10 |           2 |
|  6 |    2015 | Baba Ladla Gang                                              | Uzair Baloch Gang                         |       7 |           1 |
|  7 |    2014 | Taliban                                                      | Tehrik-i-Taliban Pakistan (TTP)           |      24 |           1 |
|  8 |    2017 | Aalmi Majlis-i-Tahfuz-i-Khatam-i-Nubuwaat                    | Lashkar-e-Jhangvi                         |       1 |           1 |
|  9 |    2017 | Aalmi Majlis-i-Tahfuz-i-Khatam-i-Nubuwaat                    | Tehrik-i-Taliban Pakistan (TTP)           |       1 |           1 |
| 10 |    2010 | 313 Brigade                                                  | Students Islamic Movement of India (SIMI) |      17 |           1 |
| 11 |    2010 | 313 Brigade                                                  | Indian Mujahideen                         |      17 |           1 |

其中
iyear = 事件发生的年份;
来源 = 来源恐怖组织;
目标 = 盟友恐怖组织;
nkill = 恐怖袭击中的受害者人数;
n_attacks = 同一年源和目标执行的攻击次数。

问题是我不知道如何使用 Pandas 执行此类操作。我想过只是 for 循环和半手动进行聚合,但老实说我不知道​​。任何提示或代码片段都会有所帮助。谢谢。
UPD: 可重复的示例

|    |   year | gname   | gname2   | gname3   |   nkill |
|---:|-------:|:--------|:---------|:---------|--------:|
|  0 |   1991 | A       | B        | C        |       3 |
|  1 |   1991 | C       | A        | nan      |       1 |
|  2 |   1991 | B       | C        | nan      |       2 |
|  3 |   1991 | D       | A        | B        |       5 |

样本输出

year | source | target | nkill   | n_attacks
1991 |    A   |   B    |    8    |    2
1991 |    B   |   C    |    5    |    2
1991 |    C   |   A    |    4    |    2
1991 |    D   |   A    |    5    |    1
etc...

标签: pythonpandaspandas-groupbyaggregate

解决方案


修改后的答案,基于更新的问题

好的,基于带有示例数据的更新示例,我终于弄清楚了 OP 想要什么:所有 3gname列的 2 排列。

from itertools import permutations

combos = [{x: 'source', y: 'target', z: 'ignored'} for x, y, z in permutations('gname gname2 gname3'.split())]
out = pd.concat([df.rename(columns=mapper) for mapper in combos]).groupby(
    ['iyear', 'source', 'target']
)['nkill'].agg([sum, 'count']).rename(
    columns={'sum':'nkill', 'count':'n_attacks'})

out较小样本输入的结果 ( ):

                     nkill  n_attacks
iyear source target                  
1991  A      B           8          2
             C           4          2
             D           5          1
      B      A           8          2
             C           5          2
             D           5          1
      C      A           4          2
             B           5          2
      D      A           5          1
             B           5          1

(纯粹主义者的注意事项:为什么不更精确... for x, y in permutations(..., r=2)呢?因为那样我们最终会得到一个gname没有重命名的列,并且concat剩下的列会变得更混乱gname——它的工作原理是一样的,但是concat更宽) .

初始答案(对于初始问题...)

尝试这个:

df.fillna('').groupby(
    'iyear gname gname2 gname3'.split()
)['nkill'].agg([sum, 'count'])

输出:

                                                                      sum  count
iyear gname                gname2               gname3                          
2008  Communist Party o... Young Communist L...                         1      1
2010  Indian Mujahideen    Students Islamic ... 313 Brigade            17      1
2011  Zeliangrong Unite... Zeliangrong Unite...                         7      1
2014  Taliban              Tehrik-i-Taliban ...                        24      1
2015  Abu Sayyaf Group ... Waning Abdusalam ...                        10      2
      Baba Ladla Gang      Uzair Baloch Gang                            7      1
      National Socialis... Zeliangrong Unite...                         2      1
2017  Lashkar-e-Jhangvi    Tehrik-i-Taliban ... Aalmi Majlis-i-Ta...    1      1

说明:

  • .fillna('')是因为您在其他文本列中有 NaN;
  • groupby(...)只是表达你想要的;
  • .agg()在一列或多列上接受多个聚合。

PS:顺便说一句,您可以通过添加:.rename(columns={'sum':'nkill', 'count':'n_attacks'}).


推荐阅读