python - JupyterLab / Python / Pandas - Comparing two Dataframes
Problem description
I am trying to import two files and compare the value counts in df1 (data by state) with the number in the row for that state in df2.
In other words, in one Excel file I have something that looks like this:
State Food
Arizona Bananas
Arizona Pears
Arizona Pickles
Connecticut Potatoes
Connecticut Apples
Etc.
From this file I am interested in how many times each state appears, i.e. its value count.
In another file I have a column of the 50 states and a number for each.
What I am trying to do is create a dataframe that displays, by state, the number of times that state appears in df1 (so here Arizona would be 3) divided by the number for that state in the second dataframe. Does that make sense?
The second dataframe contains the total population of each state, so the output of 3/n above would be fruit per capita.
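Solution
The following will work: count the rows per state in df1, join the populations from df2 on the state, and divide one column by the other.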
import pandas as pd

# Sample data standing in for the two Excel files
df1 = pd.DataFrame({'state': ['Arizona', 'Arizona', 'Arizona',
                              'Connecticut', 'Connecticut'],
                    'food': ['Bananas', 'Pears', 'Pickles', 'Potatoes', 'Apples']})
df2 = pd.DataFrame({'state': ['Arizona', 'Connecticut'],
                    'population': [7300000, 3565000]})

# Count the rows per state, then attach each state's population by index
df1 = df1.groupby('state').count().merge(df2.set_index('state'),
                                         how='left', left_index=True, right_index=True)
# Number of occurrences divided by population
df1['result'] = df1['food'] / df1['population']
df1
             food  population        result
state
Arizona         3     7300000  4.109589e-07
Connecticut     2     3565000  5.610098e-07
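As a possible alternative to the groupby/merge above, the same ratio can be sketched with `value_counts` and index-aligned division. This is only a sketch under assumptions: the extra Delaware row and its population of 1000000 are hypothetical placeholders, added to show what happens to a state that appears in df2 but never in df1, and the names `counts`, `population` and `per_capita` are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({
    'state': ['Arizona', 'Arizona', 'Arizona', 'Connecticut', 'Connecticut'],
    'food': ['Bananas', 'Pears', 'Pickles', 'Potatoes', 'Apples'],
})
df2 = pd.DataFrame({
    # Delaware and its population are hypothetical placeholders, not real data
    'state': ['Arizona', 'Connecticut', 'Delaware'],
    'population': [7300000, 3565000, 1000000],
})

# Number of rows per state in df1, indexed by state name
counts = df1['state'].value_counts()

# Population as a Series indexed by state name
population = df2.set_index('state')['population']

# Align the counts to every state in df2; states absent from df1 get 0
per_capita = counts.reindex(population.index, fill_value=0) / population
print(per_capita)
```

Because both Series are indexed by state, the division matches rows by label rather than by position, so the row order of the two files does not matter. Without the `reindex` step, a state missing from df1 would come out as NaN instead of 0.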