python - If condition in a pandas DataFrame and extracting column values
Question
I have this DataFrame (df), which looks like this:
+-----------------+-----------+----------------+---------------------+--------------+-------------+
| Gene | Gene name | Tissue | Cell type | Level | Reliability |
+-----------------+-----------+----------------+---------------------+--------------+-------------+
| ENSG00000001561 | ENPP4 | adipose tissue | adipocytes | Low | Approved |
| ENSG00000001561 | ENPP4 | adrenal gland | glandular cells | High | Approved |
| ENSG00000001561 | ENPP4 | appendix | glandular cells | Medium | Approved |
| ENSG00000001561 | ENPP4 | appendix | lymphoid tissue | Low | Approved |
| ENSG00000001561 | ENPP4 | bone marrow | hematopoietic cells | Medium | Approved |
| ENSG00000002586 | CD99 | adipose tissue | adipocytes | Low | Supported |
| ENSG00000002586 | CD99 | adrenal gland | glandular cells | Medium | Supported |
| ENSG00000002586 | CD99 | appendix | glandular cells | Not detected | Supported |
| ENSG00000002586 | CD99 | appendix | lymphoid tissue | Not detected | Supported |
| ENSG00000002586 | CD99 | bone marrow | hematopoietic cells | High | Supported |
| ENSG00000002586 | CD99 | breast | adipocytes | Not detected | Supported |
| ENSG00000003056 | M6PR | adipose tissue | adipocytes | High | Approved |
| ENSG00000003056 | M6PR | adrenal gland | glandular cells | High | Approved |
| ENSG00000003056 | M6PR | appendix | glandular cells | High | Approved |
| ENSG00000003056 | M6PR | appendix | lymphoid tissue | High | Approved |
| ENSG00000003056 | M6PR | bone marrow | hematopoietic cells | High | Approved |
+-----------------+-----------+----------------+---------------------+--------------+-------------+
Expected output:
+-----------+--------+-------------------------------+
| Gene name | Level | Tissue |
+-----------+--------+-------------------------------+
| ENPP4 | Low | adipose tissue, appendix |
| ENPP4 | High | adrenal gland, bronchus |
| ENPP4 | Medium | appendix, breast, bone marrow |
| CD99 | Low | adipose tissue, appendix |
| CD99 | High | bone marrow |
| CD99 | Medium | adrenal gland |
| ... | ... | ... |
+-----------+--------+-------------------------------+
Code used (adapted from "Multiple if else conditions in pandas dataframe and derive multiple columns"):
def text_df(df):
    if (df[df['Level'].str.match('High')]):
        return (df.assign(Level='High') + df['Tissue'].astype(str))
    elif (df[df['Level'].str.match('Medium')]):
        return (df.assign(Level='Medium') + df['Tissue'].astype(str))
    elif (df[df['Level'].str.match('Low')]):
        return (df.assign(Level='Low') + df['Tissue'].astype(str))

df = df.apply(text_df, axis = 1)
Error: KeyError: ('Level', 'occurred at index 172')
I don't understand what I'm doing wrong. Any suggestions?
Answer
Try:
df.groupby(['Gene name','Level'], as_index=False)['Cell type'].agg(', '.join)
Output:
| | Gene name | Level | Cell type |
|---:|:------------|:-------------|:----------------------------------------------------------------------------------------------------------------|
| 0 | CD99 | High | hematopoietic cells |
| 1 | CD99 | Low | adipocytes |
| 2 | CD99 | Medium | glandular cells |
| 3 | CD99 | Not detected | glandular cells , lymphoid tissue , adipocytes |
| 4 | ENPP4 | High | glandular cells |
| 5 | ENPP4 | Low | adipocytes , lymphoid tissue |
| 6 | ENPP4 | Medium | glandular cells , hematopoietic cells |
| 7 | M6PR | High | adipocytes , glandular cells , glandular cells , lymphoid tissue , hematopoietic cells |
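The expected output in the question lists the Tissue column rather than Cell type, and each tissue appears only once per row. The same groupby pattern can be pointed at Tissue instead. Note that a function passed to apply with axis=1 only ever sees one row at a time, so it cannot collect values from several rows, whereas groupby can. The following is a minimal sketch, assuming a hand-built subset of the question's rows; the helper name join_unique is hypothetical and not part of the original answer.

```python
import pandas as pd

# A small, hand-built subset of the question's data (for illustration only).
df = pd.DataFrame(
    {
        "Gene name": ["ENPP4", "ENPP4", "ENPP4", "ENPP4", "CD99", "CD99", "CD99"],
        "Tissue": ["adipose tissue", "adrenal gland", "appendix", "appendix",
                   "appendix", "appendix", "bone marrow"],
        "Level": ["Low", "High", "Medium", "Low",
                  "Not detected", "Not detected", "High"],
    }
)

def join_unique(values):
    # Hypothetical helper: dict.fromkeys drops repeated tissues
    # while keeping the order in which they first appear.
    return ", ".join(dict.fromkeys(values))

# One row per (Gene name, Level) pair, with the tissues joined into a string.
result = (
    df.groupby(["Gene name", "Level"], as_index=False)["Tissue"]
      .agg(join_unique)
)
print(result)
```

If the real data contains stray whitespace around the values, calling .str.strip() on the Tissue column before grouping may be worth considering.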
Update added in response to the comments below:
(df.groupby(['Gene name','Level'], as_index=False)['Cell type']
   .agg(','.join).set_index(['Gene name','Level'])['Cell type']
   .unstack().reset_index())
Output:
| Gene name | High | Low | Medium | Not detected |
|:------------|:----------------------------------------------------------------------------------------------------------------|:---------------------------------------|:-------------------------------------------|:---------------------------------------------------------|
| CD99 | hematopoietic cells | adipocytes | glandular cells | glandular cells , lymphoid tissue , adipocytes |
| ENPP4 | glandular cells | adipocytes , lymphoid tissue | glandular cells , hematopoietic cells | nan |
| M6PR | adipocytes , glandular cells , glandular cells , lymphoid tissue , hematopoietic cells | nan | nan | nan |
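The wide layout can also be reached a little more directly: if as_index is left at its default, the grouped result is already a Series indexed by (Gene name, Level), so the set_index step is not needed before unstack. Below is a sketch under the assumptions that the Tissue column is the one wanted and that empty strings are preferable to NaN for missing combinations; the sample rows are a hand-built subset.

```python
import pandas as pd

# Hand-built sample rows (illustrative subset, not the full data set).
df = pd.DataFrame(
    {
        "Gene name": ["ENPP4", "ENPP4", "ENPP4", "ENPP4",
                      "CD99", "CD99", "CD99", "M6PR", "M6PR"],
        "Tissue": ["adipose tissue", "adrenal gland", "appendix", "bone marrow",
                   "adipose tissue", "adrenal gland", "bone marrow",
                   "adipose tissue", "adrenal gland"],
        "Level": ["Low", "High", "Medium", "Medium",
                  "Low", "Medium", "High", "High", "High"],
    }
)

wide = (
    df.groupby(["Gene name", "Level"])["Tissue"]
      .agg(", ".join)   # Series indexed by (Gene name, Level)
      .unstack()        # each Level value becomes its own column
      .fillna("")       # assumption: blank cells instead of NaN
      .reset_index()
)
print(wide)
```

Genes that have no rows at a given level, such as M6PR at Low or Medium in this sample, end up with an empty string in that column.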