Calculate the median age for each region from a frequency table with Python

Problem description

I have a dataframe that is similar to:

[image of the frequency table not preserved]

I would like to calculate the median age for each city, but because the data is a frequency table I'm finding it somewhat tricky. Is there a function in pandas or another library that would help me achieve this?

Tags: python, pandas, frequency, median

Solution


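For each row, find the total number of people by summing the counts across all ages. Divide that total by 2 to get the position of the middle person, then walk through the ages in ascending order, keeping a running total of the counts, until the running total exceeds that position. The age at which this happens is the median.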

For example, for the row 'alabama', you would add 34 + 67 + ... + 23 = 5463. That, divided by 2, is 2731.5, which rounds down to 2731. Then, checking each age group in turn, determine where the 2731st person falls.

  • At age 1, since 2731 > 34, check the next.
  • At age 2, since 2731 > 34 + 67, check the next.
  • At age 3, since 2731 > 34 + 67 + 89, check the next.
  • At age 4, since 2731 > 34 + 67 + 89 + 89, check the next.
  • At age 5, since 2731 > 34 + 67 + 89 + 89 + 67, check the next.
  • At age 6, since 2731 > 34 + 67 + 89 + 89 + 67 + 545, check the next.
  • At age 7, since 2731 < 34 + 67 + 89 + 89 + 67 + 545 + 4546, the median has to be in this age group.
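The walk above can be sketched for a single row in plain Python. This is only an illustration: the function name `median_from_counts` and the toy counts are made up (they are not the table from the question), and the sketch follows the answer's rule of rounding the halfway position down, so for an even total it picks the upper of the two middle people.

```python
from itertools import accumulate

def median_from_counts(ages, counts):
    """Return the age group that contains the middle person.

    `ages` must be sorted ascending and aligned with `counts`.
    """
    target = sum(counts) // 2  # e.g. 5463 // 2 == 2731
    # accumulate() yields the running total after each age group
    for age, running in zip(ages, accumulate(counts)):
        if target < running:
            return age
    raise ValueError("counts must contain at least one person")

# Toy data (hypothetical): 12 people in total, so target == 6
ages = [1, 2, 3, 4, 5]
counts = [2, 3, 5, 1, 1]
result = median_from_counts(ages, counts)  # running totals: 2, 5, 10, ...
```

Here the running total first exceeds 6 at age 3, so `result` is 3.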

Do this repeatedly for each city/state, and you should get the median for each one.
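One way to do this for every row at once in pandas is to combine `cumsum` with a row-wise `apply`. The sketch below assumes a layout the question does not confirm: one row per city, one column per age (as integers, in ascending order), each cell holding a count. The city names, the data, and the helper `median_age` are all hypothetical.

```python
import pandas as pd

# Hypothetical frequency table: rows are cities, columns are ages,
# cells are the number of people of that age in that city.
df = pd.DataFrame(
    {1: [2, 10], 2: [3, 1], 3: [5, 1], 4: [1, 1], 5: [1, 8]},
    index=["city_a", "city_b"],
)

def median_age(counts: pd.Series) -> int:
    """Return the first age whose cumulative count exceeds half the total."""
    cumulative = counts.cumsum()
    target = counts.sum() // 2  # halfway position, rounded down
    return int(cumulative[cumulative > target].index[0])

# axis=1 passes each row (one city) to median_age as a Series
medians = df.apply(median_age, axis=1)
```

With this toy data, `medians` holds 3 for `city_a` and 2 for `city_b`. If the real table keeps the city name in an ordinary column rather than the index, it would need to be moved to the index first (for example with `set_index`) so that only the count columns are summed.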

