首页 > 解决方案 > 如果是熊猫数据框中的条件并提取列值

问题描述

我有这个数据框(df),看起来像

+-----------------+-----------+----------------+---------------------+--------------+-------------+
|      Gene       | Gene name |     Tissue     |      Cell type      |    Level     | Reliability |
+-----------------+-----------+----------------+---------------------+--------------+-------------+
| ENSG00000001561 | ENPP4     | adipose tissue | adipocytes          | Low          | Approved    |
| ENSG00000001561 | ENPP4     | adrenal gland  | glandular cells     | High         | Approved    |
| ENSG00000001561 | ENPP4     | appendix       | glandular cells     | Medium       | Approved    |
| ENSG00000001561 | ENPP4     | appendix       | lymphoid tissue     | Low          | Approved    |
| ENSG00000001561 | ENPP4     | bone marrow    | hematopoietic cells | Medium       | Approved    |
| ENSG00000002586 | CD99      | adipose tissue | adipocytes          | Low          | Supported   |
| ENSG00000002586 | CD99      | adrenal gland  | glandular cells     | Medium       | Supported   |
| ENSG00000002586 | CD99      | appendix       | glandular cells     | Not detected | Supported   |
| ENSG00000002586 | CD99      | appendix       | lymphoid tissue     | Not detected | Supported   |
| ENSG00000002586 | CD99      | bone marrow    | hematopoietic cells | High         | Supported   |
| ENSG00000002586 | CD99      | breast         | adipocytes          | Not detected | Supported   |
| ENSG00000003056 | M6PR      | adipose tissue | adipocytes          | High         | Approved    |
| ENSG00000003056 | M6PR      | adrenal gland  | glandular cells     | High         | Approved    |
| ENSG00000003056 | M6PR      | appendix       | glandular cells     | High         | Approved    |
| ENSG00000003056 | M6PR      | appendix       | lymphoid tissue     | High         | Approved    |
| ENSG00000003056 | M6PR      | bone marrow    | hematopoietic cells | High         | Approved    |
+-----------------+-----------+----------------+---------------------+--------------+-------------+

预期输出:


+-----------+--------+-------------------------------+
| Gene name | Level  |            Tissue             |
+-----------+--------+-------------------------------+
| ENPP4     | Low    | adipose tissue, appendix      |
| ENPP4     | High   | adrenal gland, bronchus       |
| ENPP4     | Medium | appendix, breast, bone marrow |
| CD99      | Low    | adipose tissue, appendix      |
| CD99      | High   | bone marrow                   |
| CD99      | Medium | adrenal gland                 |
| ...       | ...    | ...                           |
+-----------+--------+-------------------------------+

使用的代码(从pandas 数据框中的多个 if else 条件中获取帮助并派生多个列):

def text_df(df):
    if (df[df['Level'].str.match('High')]):
        return (df.assign(Level='High') + df['Tissue'].astype(str))
    elif (df[df['Level'].str.match('Medium')]):
        return (df.assign(Level='Medium') + df['Tissue'].astype(str))
    elif (df[df['Level'].str.match('Low')]):
        return (df.assign(Level='Low') + df['Tissue'].astype(str))

df = df.apply(text_df, axis = 1)

错误:KeyError: ('Level', 'occurred at index 172')我不明白我做错了什么。有什么建议吗?

标签: pythonpandasdataframeif-statement

解决方案


尝试:

df.groupby(['Gene name','Level'], as_index=False)['Cell type'].agg(', '.join)

输出:

|    | Gene name   | Level        | Cell type                                                                                                       |
|---:|:------------|:-------------|:----------------------------------------------------------------------------------------------------------------|
|  0 | CD99        | High         | hematopoietic cells                                                                                             |
|  1 | CD99        | Low          | adipocytes                                                                                                      |
|  2 | CD99        | Medium       | glandular cells                                                                                                 |
|  3 | CD99        | Not detected | glandular cells     ,  lymphoid tissue     ,  adipocytes                                                        |
|  4 | ENPP4       | High         | glandular cells                                                                                                 |
|  5 | ENPP4       | Low          | adipocytes          ,  lymphoid tissue                                                                          |
|  6 | ENPP4       | Medium       | glandular cells     ,  hematopoietic cells                                                                      |
|  7 | M6PR        | High         | adipocytes          ,  glandular cells     ,  glandular cells     ,  lymphoid tissue     ,  hematopoietic cells |

根据以下评论添加更新:

(df.groupby(['Gene name','Level'], as_index=False)['Cell type']
   .agg(','.join).set_index(['Gene name','Level'])['Cell type']
   .unstack().reset_index())

输出:

| Gene name   |  High                                                                                                           |  Low                                   |  Medium                                    |  Not detected                                            |
|:------------|:----------------------------------------------------------------------------------------------------------------|:---------------------------------------|:-------------------------------------------|:---------------------------------------------------------|
| CD99        | hematopoietic cells                                                                                             | adipocytes                             | glandular cells                            | glandular cells     ,  lymphoid tissue     ,  adipocytes |
| ENPP4       | glandular cells                                                                                                 | adipocytes          ,  lymphoid tissue | glandular cells     ,  hematopoietic cells | nan                                                      |
| M6PR        | adipocytes          ,  glandular cells     ,  glandular cells     ,  lymphoid tissue     ,  hematopoietic cells | nan                                    | nan                                        | nan                                                      |

推荐阅读