python - How to add a dataset identifier (like an id column) when appending two or more datasets?
Problem description
I have multiple datasets in CSV format that I would like to import and append to one another. Each dataset has the same column names (fields), but different values and a different number of rows.
For example:
df1
date name surname age address
...
df2
date name surname age address
...
I would like to have
df=df1+df2
date name surname age address dataset
(df1) 1
... 1
(df2) 2
... 2
That is, I would like to add a new column that identifies the dataset each row comes from (dataset 1 or dataset 2).
How can I do it?
Solution
Is this what you're looking for?
Note: The example has fewer columns than yours, but the method is the same.
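import pandas as pd

# Two small frames with the same columns but different lengths
df1 = pd.DataFrame({
    'name': [f'Name{i}' for i in range(5)],
    'age': range(10, 15)
})
df2 = pd.DataFrame({
    'name': [f'Name{i}' for i in range(20, 22)],
    'age': range(20, 22)
})

# concat keeps the rows in order: all of df1 first, then all of df2
combined = pd.concat([df1, df2])
# Label each row with its source: 1 for df1's rows, 2 for df2's rows
combined['dataset'] = [1] * len(df1) + [2] * len(df2)
print(combined)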
Output
     name  age  dataset
0   Name0   10        1
1   Name1   11        1
2   Name2   12        1
3   Name3   13        1
4   Name4   14        1
0  Name20   20        2
1  Name21   21        2
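With more than two datasets, building the label list by hand gets tedious. One possible variation, shown here as a minimal sketch rather than the only way to do it, is to tag each frame with `DataFrame.assign` before concatenating. The helper name `concat_with_id` and its `column` parameter are made up for this illustration; `ignore_index=True` is an optional extra that renumbers the rows instead of repeating each frame's own index.

```python
import pandas as pd


def concat_with_id(frames, column="dataset"):
    """Concatenate frames, tagging each row with the 1-based position of its source frame.

    Hypothetical helper for illustration; not part of pandas.
    """
    tagged = [
        frame.assign(**{column: i})  # assign returns a copy, inputs stay untouched
        for i, frame in enumerate(frames, start=1)
    ]
    # ignore_index=True gives a fresh 0..n-1 index instead of repeated labels
    return pd.concat(tagged, ignore_index=True)


df1 = pd.DataFrame({"name": ["Name0", "Name1"], "age": [10, 11]})
df2 = pd.DataFrame({"name": ["Name20"], "age": [20]})

combined = concat_with_id([df1, df2])
print(combined)
```

For CSV files, the same helper should work on a list such as `[pd.read_csv(p) for p in paths]`, where `paths` is your own list of file names.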