python-3.x - How to select the rows and columns that match a condition from a list
Problem description
Suppose I have a pandas DataFrame like this:
import pandas as pd

df1 = pd.DataFrame({"Item ID": ["A", "B", "C", "D", "E"], "Value1": [1, 2, 3, 4, 0],
                    "Value2": [4, 5, 1, 8, 7], "Value3": [3, 8, 1, 2, 0], "Value4": [4, 5, 7, 9, 4]})
print(df1)
  Item ID  Value1  Value2  Value3  Value4
0       A       1       4       3       4
1       B       2       5       8       5
2       C       3       1       1       7
3       D       4       8       2       9
4       E       0       7       0       4
Now I have a second DataFrame, like this:
df2 = pd.DataFrame({"Item ID": ["A", "C", "D"], "Value5": [4, 5, 7]})
print(df2)
  Item ID  Value5
0       A       4
1       C       5
2       D       7
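What I want to do is find where the Item IDs match between the two DataFrames, and then add the "Value5" value to those matching rows, but only in columns Value1 and Value2 of df1 (the columns can change on each iteration, so they need to be held in a variable).
My output should show:
- 4 added to row A, columns "Value1" and "Value2"
- 5 added to row C, columns "Value1" and "Value2"
- 7 added to row D, columns "Value1" and "Value2"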
  Item ID  Value1  Value2  Value3  Value4
0       A       5       8       3       4
1       B       2       5       8       5
2       C       8       6       1       7
3       D      11      15       2       9
4       E       0       7       0       4
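Of course, my real data has thousands of rows. I could do this with a for loop, but that takes too long. I would like to vectorize it somehow. Any ideas?
This is what I ended up doing, based on @sammywemmy's suggestion: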
# Names of the columns to update, kept in a list so they can change each iteration
names = ['Value1', 'Value2']
# Merge df1 and df2 on 'Item ID'; a left merge keeps every row of df1
merged = df1.merge(df2, on='Item ID', how='left')
# Rows without a match in df2 get NaN for Value5, so treat them as 0
merged['Value5'] = merged['Value5'].fillna(0).astype(int)
for name in names:
    # Using assign with **, the column name can come from a variable.
    # Then add the Value5 column to it.
    merged = merged.assign(**{name: lambda x, name=name: x[name] + x['Value5']})
# Only keep the columns up to and including 'Value4'
df1 = merged.loc[:, :'Value4']
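The loop over column names in the merge approach can itself be avoided by adding the merged column to all selected columns at once. The following is only a minimal sketch, assuming the sample frames from the question and that rows without a match should stay unchanged; the names `cols`, `addend` and `updated` are illustrative and not from the original post.

```python
import pandas as pd

df1 = pd.DataFrame({"Item ID": ["A", "B", "C", "D", "E"],
                    "Value1": [1, 2, 3, 4, 0],
                    "Value2": [4, 5, 1, 8, 7],
                    "Value3": [3, 8, 1, 2, 0],
                    "Value4": [4, 5, 7, 9, 4]})
df2 = pd.DataFrame({"Item ID": ["A", "C", "D"], "Value5": [4, 5, 7]})

# Columns to update (assumed to vary between runs, hence a variable)
cols = ["Value1", "Value2"]

# A left merge keeps every row of df1 in its original order
merged = df1.merge(df2, on="Item ID", how="left")

# Unmatched rows have NaN in Value5; treat them as "add nothing"
addend = merged["Value5"].fillna(0).astype(int)

# Add the same per-row amount to every selected column in one step
merged[cols] = merged[cols].add(addend, axis="index")

# Drop the helper column so only df1's columns remain
updated = merged.drop(columns="Value5")
print(updated)
```

Because the addition is applied to the whole column block, the cost does not depend on a Python-level loop over rows or columns.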
Solution
Try this:
# set 'Item ID' as the index on both frames
df1 = df1.set_index('Item ID')
df2 = df2.set_index('Item ID')
# list the columns you want to add to
list_of_cols = ['Value1', 'Value2']
# split df1 into two separate dataframes:
# unselected holds the columns that stay unchanged
unselected = df1.drop(list_of_cols, axis=1)
# selected holds the columns that Value5 is added to
selected = df1.filter(list_of_cols)
# reindex df2 so it has the same index as df1,
# filling the rows missing from df2 with 0,
# then take 'Value5' as a series
A = df2.reindex(index=selected.index, fill_value=0).loc[:, 'Value5']
# add the series A to every column of selected, aligned on the index
selected = selected.add(A, axis='index')
# combine unselected and selected into one dataframe
result = pd.concat([unselected, selected], axis=1)
# optional: restore the original column order.
# This assumes the columns are named Value1, Value2, ... so that
# sorting on the last character gives 1 < 2 < 3 < 4.
# With other column names a different sort key is needed;
# alternatively, record the column order (for example a mapping of
# column name to position) before the calculation and use it
# to restore the dataframe afterwards.
columns = sorted(result.columns, key=lambda x: x[-1])
# reindex the columns back to the original order
result = result.reindex(columns, axis='columns')
print(result)
         Value1  Value2  Value3  Value4
Item ID
A             5       8       3       4
B             2       5       8       5
C             8       6       1       7
D            11      15       2       9
E             0       7       0       4
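The comments in the answer note that sorting on the last character only works for names like Value1, Value2, and so on. A possible variation, sketched here under the assumption that the sample frames from the question are used, updates the selected columns in place and reuses df1's original column list, so no name-based sorting is needed. The names `original_order`, `left` and `right` are illustrative, not part of the answer.

```python
import pandas as pd

df1 = pd.DataFrame({"Item ID": ["A", "B", "C", "D", "E"],
                    "Value1": [1, 2, 3, 4, 0],
                    "Value2": [4, 5, 1, 8, 7],
                    "Value3": [3, 8, 1, 2, 0],
                    "Value4": [4, 5, 7, 9, 4]})
df2 = pd.DataFrame({"Item ID": ["A", "C", "D"], "Value5": [4, 5, 7]})

list_of_cols = ["Value1", "Value2"]

# Remember the column order before any calculation
original_order = df1.columns.tolist()

# Index both frames by 'Item ID' so the addition aligns on it
left = df1.set_index("Item ID")
right = df2.set_index("Item ID")

# Align Value5 with df1's index; IDs missing from df2 contribute 0
A = right["Value5"].reindex(left.index, fill_value=0)

# Update only the selected columns, leaving the others untouched
left[list_of_cols] = left[list_of_cols].add(A, axis="index")

# Bring 'Item ID' back as a column and restore the recorded order
result = left.reset_index()[original_order]
print(result)
```

This keeps the reindex-and-add idea of the answer while dropping the assumption about how the columns are named.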
An alternative solution, using Python's built-in dictionaries:
# start from the original df1 and df2,
# with 'Item ID' still a regular column
# create dictionaries keyed by 'Item ID'
dict1 = (df1
         # create a temporary column
         # and set it as the index
         .assign(temp=df1['Item ID'])
         .set_index('temp')
         .to_dict('index')
         )
dict2 = (df2
         .assign(temp=df2['Item ID'])
         .set_index('temp')
         .to_dict('index')
         )
list_of_cols = ['Value1', 'Value2']
# find the keys that are in both dict1 and dict2
intersected_keys = dict1.keys() & dict2.keys()
# pair every shared key with every column to update
key_value_pair = [(key, col) for key in intersected_keys
                  for col in list_of_cols]
# loop through dict1 and add the values from dict2
# this could be written as a dict comprehension;
# left as a loop for clarity
for key, val in key_value_pair:
    dict1[key][val] = dict1[key][val] + dict2[key]['Value5']
print(dict1)
{'A': {'Item ID': 'A', 'Value1': 5, 'Value2': 8, 'Value3': 3, 'Value4': 4},
 'B': {'Item ID': 'B', 'Value1': 2, 'Value2': 5, 'Value3': 8, 'Value4': 5},
 'C': {'Item ID': 'C', 'Value1': 8, 'Value2': 6, 'Value3': 1, 'Value4': 7},
 'D': {'Item ID': 'D', 'Value1': 11, 'Value2': 15, 'Value3': 2, 'Value4': 9},
 'E': {'Item ID': 'E', 'Value1': 0, 'Value2': 7, 'Value3': 0, 'Value4': 4}}
# create the dataframe
pd.DataFrame.from_dict(dict1, orient='index').reset_index(drop=True)
  Item ID  Value1  Value2  Value3  Value4
0       A       5       8       3       4
1       B       2       5       8       5
2       C       8       6       1       7
3       D      11      15       2       9
4       E       0       7       0       4
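The answer mentions that the loop could be written as a dict comprehension. One way that might look is sketched below, using plain dictionaries that mirror the sample data; the names `value5` and `updated` are illustrative, and unlike the loop this builds a new dictionary instead of modifying dict1 in place.

```python
# Rows keyed by 'Item ID', mirroring dict1 from the answer
dict1 = {
    "A": {"Item ID": "A", "Value1": 1, "Value2": 4, "Value3": 3, "Value4": 4},
    "B": {"Item ID": "B", "Value1": 2, "Value2": 5, "Value3": 8, "Value4": 5},
    "C": {"Item ID": "C", "Value1": 3, "Value2": 1, "Value3": 1, "Value4": 7},
    "D": {"Item ID": "D", "Value1": 4, "Value2": 8, "Value3": 2, "Value4": 9},
    "E": {"Item ID": "E", "Value1": 0, "Value2": 7, "Value3": 0, "Value4": 4},
}
# Value5 per 'Item ID' (a flattened stand-in for dict2)
value5 = {"A": 4, "C": 5, "D": 7}

list_of_cols = ["Value1", "Value2"]

# For each row, add Value5 to the selected columns only;
# keys missing from value5 fall back to 0 and stay unchanged
updated = {
    key: {
        col: (val + value5.get(key, 0)) if col in list_of_cols else val
        for col, val in row.items()
    }
    for key, row in dict1.items()
}
print(updated["D"])
```

The resulting `updated` dictionary has the same shape as dict1 in the answer, so it could be passed to `pd.DataFrame.from_dict(updated, orient='index')` in the same way.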