pandas - 在 pandas 中使用 groupby 作为日期时间值
问题描述
我正在使用此代码按年份对我的数据进行分组 df = pd.read_csv('../input/companies-info-wikipedia-2021/sparql_2021-11-03_22-25-45Z.csv')
df = pd.read_csv('../input/companies-info-wikipedia-2021/sparql_2021-11-03_22-25-45Z.csv')
df_duplicate_name = df[df.duplicated(['name'])]
df = df.drop_duplicates(subset='name').reset_index()
df = df.drop(['a','type','index'],axis=1).reset_index()
df = df[~df['foundation'].str.contains('[A-Za-z]', na=False)]
df = df.drop([140,214,220])
df['foundation'] = df['foundation'].fillna(0)
df['foundation'] = pd.to_datetime(df['foundation'])
df['foundation'] = df['foundation'].dt.year
df = df.groupby('foundation')
但因此它没有按基础值对其进行分组:
0 0 Deutsche EuroShop AG 1999 http://dbpedia.org/resource/Germany Investment in shopping centers http://dbpedia.org/resource/Real_property 4 2.964E9 1.25E9 2.241E8 8.04E7
1 1 Industry of Machinery and Tractors 1996 http://dbpedia.org/resource/Belgrade http://dbpedia.org/resource/Tractors http://dbpedia.org/resource/Agribusiness 4 4.648E7 0.0 30000.0 -€0.47 million
2 2 TelexFree Inc. 2012 http://dbpedia.org/resource/Massachusetts 99 http://dbpedia.org/resource/Multi-level_marketing 7 did not disclose did not disclose did not disclose did not disclose
3 3 (prev. Common Cents Communications Inc.) 2012 http://dbpedia.org/resource/United_States 99 http://dbpedia.org/resource/Multi-level_marketing 7 did not disclose did not disclose did not disclose did not disclose
4 4 Bionor Holding AS 1993 http://dbpedia.org/resource/Oslo http://dbpedia.org/resource/Health_care http://dbpedia.org/resource/Biotechnology 18 NOK 253 395 million NOK 203 320 million 1.09499E8 NOK 49 020 million
... ... ... ... ... ... ... ... ... ... ... ...
255 255 Ageas SA/NV 1990 http://dbpedia.org/resource/Belgium http://dbpedia.org/resource/Insurance http://dbpedia.org/resource/Financial_services 45000 1.0872E11 1.348E10 1.112E10 9.792E8
256 256 Sharp Corporation 1912 http://dbpedia.org/resource/Japan Televisions, audiovisual, home appliances, inf... http://dbpedia.org/resource/Consumer_electronics 52876 NaN NaN NaN NaN
257 257 Erste Group Bank AG 2008 Vienna, Austria Retail and commercial banking, investment and ... http://dbpedia.org/resource/Financial_services 47230 2.71983E11 1.96E10 6.772E9 1187000.0
258 258 Manulife Financial Corporation 1887 200 Asset management, Commercial banking, Commerci... http://dbpedia.org/resource/Financial_services 34000 750300000000 47200000000 39000000000 4800000000
259 259 BP plc 1909 London, England, UK http://dbpedia.org/resource/Natural_gas http://dbpedia.org/resource/Petroleum_industry
我还尝试再次制作 pd.to_datetime 并按 dt.year 排序 - 但仍然不成功。
列名:
Index(['index', 'name', 'foundation', 'location', 'products', 'sector',
'employee', 'assets', 'equity', 'revenue', 'profit'],
dtype='object')
解决方案
我认为你误解了它的groupby()
工作原理。
你不能这样做df = df.groupby('foundation')
。groupby()
不返回新的DataFrame
. 相反,它返回 a GroupBy
,它本质上只是从值分组到包含所有共享指定列的值的行的数据帧的映射。
例如,您可以使用以下代码打印每组中有多少行:
groups = df.groupby('foundation')
for val, sub_df in groups:
print(f'{val}: {sub_df.shape[0]} rows')
推荐阅读
- oracle - R2DBC - Oracle 数据库支持
- java - 流如何在内部处理数据?
- angular - 从Angular8中的链接中删除Html编码字符
- cmake - CMake set_property() 似乎不适用于源文件
- javascript - PinchZoom.js 与 iOS 设备上的 Owl Carousel 不兼容
- c# - jqGrid 树网格版本 4.5.2 无法让树工作
- angular - MAT_DATEPICKER_VALIDATORS 的错误代码是什么?
- sql-server - Docker 上 SQLserver 中 FILESTREAM 功能的替代方案
- javascript - Fabric.js loadFromJSON 将对象分配给未正确转换的自定义属性
- postgresql - PostgreSQL 将 bytea 数据直接转储到文件