Using groupby on datetime values in pandas

Problem description

I am using this code to group my data by year:

import pandas as pd

df = pd.read_csv('../input/companies-info-wikipedia-2021/sparql_2021-11-03_22-25-45Z.csv')
df_duplicate_name = df[df.duplicated(['name'])]
df = df.drop_duplicates(subset='name').reset_index()
df = df.drop(['a','type','index'],axis=1).reset_index()
df = df[~df['foundation'].str.contains('[A-Za-z]', na=False)]
df = df.drop([140,214,220])
df['foundation'] = df['foundation'].fillna(0)
df['foundation'] = pd.to_datetime(df['foundation'])
df['foundation'] = df['foundation'].dt.year
df = df.groupby('foundation')

But the result is not grouped by the foundation value:

0   0   Deutsche EuroShop AG    1999    http://dbpedia.org/resource/Germany Investment in shopping centers  http://dbpedia.org/resource/Real_property   4   2.964E9 1.25E9  2.241E8 8.04E7
1   1   Industry of Machinery and Tractors  1996    http://dbpedia.org/resource/Belgrade    http://dbpedia.org/resource/Tractors    http://dbpedia.org/resource/Agribusiness    4   4.648E7 0.0 30000.0 -€0.47 million
2   2   TelexFree Inc.  2012    http://dbpedia.org/resource/Massachusetts   99  http://dbpedia.org/resource/Multi-level_marketing   7   did not disclose    did not disclose    did not disclose    did not disclose
3   3   (prev. Common Cents Communications Inc.)    2012    http://dbpedia.org/resource/United_States   99  http://dbpedia.org/resource/Multi-level_marketing   7   did not disclose    did not disclose    did not disclose    did not disclose
4   4   Bionor Holding AS   1993    http://dbpedia.org/resource/Oslo    http://dbpedia.org/resource/Health_care http://dbpedia.org/resource/Biotechnology   18  NOK 253 395 million NOK 203 320 million 1.09499E8   NOK 49 020 million
... ... ... ... ... ... ... ... ... ... ... ...
255 255 Ageas SA/NV 1990    http://dbpedia.org/resource/Belgium http://dbpedia.org/resource/Insurance   http://dbpedia.org/resource/Financial_services  45000   1.0872E11   1.348E10    1.112E10    9.792E8
256 256 Sharp Corporation   1912    http://dbpedia.org/resource/Japan   Televisions, audiovisual, home appliances, inf...   http://dbpedia.org/resource/Consumer_electronics    52876   NaN NaN NaN NaN
257 257 Erste Group Bank AG 2008    Vienna, Austria Retail and commercial banking, investment and ...   http://dbpedia.org/resource/Financial_services  47230   2.71983E11  1.96E10 6.772E9 1187000.0
258 258 Manulife Financial Corporation  1887    200 Asset management, Commercial banking, Commerci...   http://dbpedia.org/resource/Financial_services  34000   750300000000    47200000000 39000000000 4800000000
259 259 BP plc  1909    London, England, UK http://dbpedia.org/resource/Natural_gas http://dbpedia.org/resource/Petroleum_industry

I also tried applying pd.to_datetime again and sorting by dt.year, but still without success.

Column names:

Index(['index', 'name', 'foundation', 'location', 'products', 'sector',
   'employee', 'assets', 'equity', 'revenue', 'profit'],
  dtype='object')

Tags: pandas, dataframe

Solution


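I think you have misunderstood how groupby() works.

You can't write df = df.groupby('foundation') and expect a DataFrame back, because groupby() does not return a new DataFrame. Instead, it returns a GroupBy object, which is essentially a mapping from each distinct value of the grouping column to a DataFrame containing all the rows that share that value.

For example, you can print how many rows are in each group with the following code: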

groups = df.groupby('foundation')
for val, sub_df in groups:
    print(f'{val}: {sub_df.shape[0]} rows')
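
To get an ordinary pandas object back out of the GroupBy, you usually aggregate it or pull out a single group. The sketch below uses a small made-up DataFrame in place of the CSV from the question (the rows and the variable names counts, founded_2012 and sorted_df are assumptions for illustration only). It also shows that if all you wanted was the rows ordered by year, sort_values() may be enough on its own.

```python
import pandas as pd

# Hypothetical toy data standing in for the CSV from the question
df = pd.DataFrame({
    'name': ['A', 'B', 'C', 'D'],
    'foundation': [1999, 2012, 2012, 1993],
})

# groupby() returns a DataFrameGroupBy object, not a DataFrame
groups = df.groupby('foundation')

# size() aggregates each group to its row count and returns a Series
counts = groups.size()

# get_group() returns the sub-DataFrame for a single key
founded_2012 = groups.get_group(2012)

# If the goal is only to order the rows by year, no grouping is needed
sorted_df = df.sort_values('foundation')

print(counts)
print(founded_2012)
print(sorted_df)
```

Keeping the original DataFrame in df and storing the GroupBy under a separate name, as the answer's snippet does, means the original data stays available for further steps.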
