How to get three-column output from a dataset in a Jupyter notebook in Python

Problem description

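Question: Relabel the marital status variable DMDMARTL so that it has short but informative character labels. Then build a frequency table of these values for everyone, then for women only, and then for men only. Then build the same three frequency tables using only people aged 30 to 40. So far I have done everything except the 30-to-40 tables of DMDMARTL split into males and females. Below is the entire code so far, and here is the link to the dataset: https://raw.githubusercontent.com/Mauliklm10/Cartwheel.csv/master/datasetNHANES.csv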

%matplotlib inline
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import statsmodels.api as sm
import numpy as np

da = pd.read_csv("nhanes_2015_2016.csv") # local copy of the dataset; the URL above can be passed to pd.read_csv instead

# frequency table of the raw codes (value_counts sorts by count, descending)
da.DMDMARTL.value_counts()

# Give the numeric codes meaningful labels.
# The relabeled variable is stored as strings in a new column, DMDMARTLV2,
# so codes such as 1, 2, 3 become Married, Widowed, Divorced, etc.
da["DMDMARTLV2"] = da.DMDMARTL.replace({1:"Married",2:"Widowed",3:"Divorced",4:"Separated",5:"Never_Married",
                                     6:"Living_With_Partner",77:"Refused",99:"Dont_Know"})
da.DMDMARTLV2.value_counts()

# Count how many values are missing (NaN)
pd.isnull(da.DMDMARTLV2).sum()

# Relabel the gender variable too, since we will group by it.
# Writing to a new column leaves the original RIAGENDR column unchanged,
# and turns the codes 1 and 2 into the meaningful labels Male and Female
da["RIAGENDRV2"] = da.RIAGENDR.replace({1: "Male", 2: "Female"})

# The counts don't add up to the number of rows, so some values are missing;
# .fillna labels them "Missing" so they show up in the frequency table
da["DMDMARTLV2"] = da.DMDMARTLV2.fillna("Missing")
da.DMDMARTLV2.value_counts()

# this is to get the frequency table for Females and Males individually
da.groupby("RIAGENDRV2")["DMDMARTLV2"].value_counts()

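# this is to get the age group 30 to 40
# note: bins [30, 40] give the single right-closed interval (30, 40], so ages
# outside it (including exactly 30) become NaN and groupby drops them
da["agegrp"] = pd.cut(da.RIDAGEYR, [30, 40])
da.groupby("agegrp")["DMDMARTLV2"].value_counts()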
# this is to get the age group 30 to 40 with males and females
# (agegrp was already created above; this line raises the TypeError)
da.groupby("agegrp")("RIAGENDRV2")["DMDMARTLV2"].value_counts()

The code above gives me TypeError: 'DataFrameGroupBy' object is not callable.
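The error comes from the extra pair of parentheses: da.groupby("agegrp") returns a DataFrameGroupBy object, and writing ("RIAGENDRV2") directly after it tries to call that object like a function. A minimal reproduction, assuming a small made-up DataFrame that only borrows the question's column names:

```python
import pandas as pd

# Hypothetical two-row frame; only the column names mirror the question
df = pd.DataFrame({"agegrp": ["30-40", "30-40"],
                   "RIAGENDRV2": ["Male", "Female"],
                   "DMDMARTLV2": ["Married", "Divorced"]})

grouped = df.groupby("agegrp")   # a DataFrameGroupBy object, not a function
try:
    grouped("RIAGENDRV2")        # "(...)" tries to call it, as in the question
except TypeError as exc:
    message = str(exc)

print(message)  # 'DataFrameGroupBy' object is not callable
```

Square brackets (grouped["DMDMARTLV2"]) select a column from the grouped data; parentheses never do.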

Tags: python, pandas, jupyter-notebook

Solution


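I got the answer, so this post doesn't need any more answers. The line of code is: da.groupby(["agegrp", "RIAGENDRV2"])["DMDMARTLV2"].value_counts()

GroupBy objects can't be called, so instead of chaining a second pair of parentheses, both keys are passed as a list to a single groupby call, which groups by age group and gender together.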
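For reference, here is a sketch of all six frequency tables the exercise asks for, run on a small hypothetical DataFrame (the values are made up; only the column names and codes follow the question, and names such as marital_labels, three_col and ct are illustrative). It filters ages with Series.between, which is an assumption about how "30 to 40" should be read: between includes both ends, while pd.cut(da.RIDAGEYR, [30, 40]) gives the interval (30, 40] and leaves out 30-year-olds.

```python
import numpy as np
import pandas as pd

# Hypothetical toy rows standing in for the NHANES columns used above;
# the codes follow the question, the values themselves are made up.
da = pd.DataFrame({
    "DMDMARTL": [1, 5, 1, 2, 6, np.nan, 1, 3, 5, 77],
    "RIAGENDR": [1, 2, 2, 2, 1, 1, 2, 1, 1, 2],
    "RIDAGEYR": [35, 30, 41, 70, 33, 12, 40, 38, 31, 29],
})

# Relabel the codes as in the question, then label missing values explicitly
marital_labels = {1: "Married", 2: "Widowed", 3: "Divorced", 4: "Separated",
                  5: "Never_Married", 6: "Living_With_Partner",
                  77: "Refused", 99: "Dont_Know"}
da["DMDMARTLV2"] = da.DMDMARTL.replace(marital_labels).fillna("Missing")
da["RIAGENDRV2"] = da.RIAGENDR.replace({1: "Male", 2: "Female"})

# Tables 1-3: everyone, females only, males only
overall = da.DMDMARTLV2.value_counts()
by_gender = da.groupby("RIAGENDRV2")["DMDMARTLV2"].value_counts()
females = by_gender.loc["Female"]
males = by_gender.loc["Male"]

# Tables 4-6: the same, restricted to ages 30-40 (between() is inclusive)
da3040 = da[da.RIDAGEYR.between(30, 40)]
overall_3040 = da3040.DMDMARTLV2.value_counts()
by_gender_3040 = da3040.groupby("RIAGENDRV2")["DMDMARTLV2"].value_counts()

# The answer's approach: one groupby call with a list of keys. observed=True
# keeps only the age interval that actually occurs, and reset_index turns
# the MultiIndex result into flat columns plus a "count" column.
da["agegrp"] = pd.cut(da.RIDAGEYR, [30, 40])  # interval (30, 40]
three_col = (da.groupby(["agegrp", "RIAGENDRV2"], observed=True)["DMDMARTLV2"]
             .value_counts()
             .reset_index(name="count"))

# Alternative layout: marital status by gender, 0 where a pair never occurs
ct = pd.crosstab(da3040.DMDMARTLV2, da3040.RIAGENDRV2)

print(three_col)
```

With the real data, the toy DataFrame would be replaced by pd.read_csv on the dataset link. groupby(...).value_counts() lists only the combinations that occur, whereas pd.crosstab lays the same counts out as a marital-status-by-gender table with explicit zeros.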

