首页 > 解决方案 > JupyterLab / Python / Pandas - Comparing two Dataframes

问题描述

I am trying to import two files and compare value counts in df1 (data by state) and a number in a row for that state in df2.

In other words, in one Excel file I have something that looks like this:

State      Food
Arizona    Bananas
Arizona    Pears
Arizona    Pickles
Connecticut Potatoes
Connecticut Apples

Etc.

So from there I am interested in how many times the state appears, the value count.

In another file I have a column of the 50 states and a number.

What I am trying to do is basically create a dataframe that displays, by state, the number of times said state appears in df1 (so here Arizona would be 3) divided by the number in the column corresponding to Arizona in the second data frame. Does that make sense?

The second dataframe contains a total population for each state, so the output of 3/n above would be fruit per capita.

标签: pythonpandasdataframe

解决方案


The following will work

import numpy as np
import pandas as pd

df1 = pd.DataFrame({'state': ['Arizona', 'Arizona', 'Arizona', 
                              'Connecticut', 'Connecticut'],
                  'food': ['Bananas', 'Pears', 'Pickles', 'Potatoes', 'Apples']})
df2 = pd.DataFrame({'state':['Arizona', 'Connecticut'],
                  'population': [7300000, 3565000 ]})

df1 = df1.groupby('state').count().merge(df2.set_index('state'), 
                                      how = 'left', left_index = True, right_index = True)
df1['result'] = df1['food']/df1['population']
df1


          food      population   result
state           
Arizona     3       7300000      4.109589e-07
Connecticut 2       3565000      5.610098e-07

推荐阅读