Why does the sum of counts from df.value_counts() differ from the total number of rows in df?

Problem description

I have a DataFrame day_1 and want to count how many times each unique row occurs, so I called day_1.value_counts(). Strangely, day_1.shape[0] differs from np.sum(day_1.value_counts()).

Could you please explain why this happens?

import numpy as np
import pandas as pd

# Import dataset
path = r'https://raw.githubusercontent.com/leanhdung1994/BigData/main/2_days.csv'
trends = pd.read_csv(path, header = 0, low_memory = False)

# Subset to a specific day
day_1 = trends[trends.date == '2021-01-01']

# Remove unused columns
columns_to_drop = ['date', 'hour', 'id', 'year', 'month', 'sentence',
                   'offset', 'span', 'value', 'container']
day_1 = day_1.drop(columns = columns_to_drop)

print('The total number of rows is', day_1.shape[0])
print('The total number of rows of all groups is', np.sum(day_1.value_counts()))

Tags: python, pandas

Solution
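A likely cause, assuming the remaining columns of day_1 contain missing values: DataFrame.value_counts() uses dropna=True by default, so any row with a NaN in at least one column is left out of the counts. The sketch below reproduces the effect on a small made-up frame (the column names a and b and their values are hypothetical, not taken from the dataset in the question). Passing dropna=False, available in pandas 1.3 and later, keeps those rows.

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical, not the asker's dataset): the last row has a missing value.
df = pd.DataFrame({
    "a": ["x", "x", "y", "z"],
    "b": [1.0, 1.0, 2.0, np.nan],
})

# Default behaviour: rows containing NaN in any column are excluded from the counts.
counts_default = df.value_counts()

# Keep rows with NaN as their own groups (requires pandas >= 1.3).
counts_with_nan = df.value_counts(dropna=False)

# Number of rows that have at least one missing value.
rows_with_nan = int(df.isna().any(axis=1).sum())

print("Total rows:", len(df))
print("Sum of counts (default):", int(counts_default.sum()))
print("Sum of counts (dropna=False):", int(counts_with_nan.sum()))
print("Rows with at least one NaN:", rows_with_nan)
```

If this is what is happening in the question, day_1.isna().any(axis=1).sum() should equal the gap between day_1.shape[0] and the sum of day_1.value_counts(), and day_1.value_counts(dropna=False).sum() should match day_1.shape[0].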

