python - pandas: Aggregating the values of a Series using the keys of a dictionary
Question
I am working with a DataFrame like this one:
DF_draft = pd.DataFrame(data={"subdivision_name" : ["01","02","03","04","05"],
"day": ["2020-03-18","2020-03-18","2020-03-18","2020-03-18","2020-03-18"],
"data":[0,11,12,13,2]})
  subdivision_name         day  data
0               01  2020-03-18     0
1               02  2020-03-18    11
2               03  2020-03-18    12
3               04  2020-03-18    13
4               05  2020-03-18     2
I am trying to keep only some of the subdivision_name rows, based on a dictionary like this:
reference_dict = {"Area Alpha": ["01","03","04","15","26","38","42","43","63","69","73","74"],
"Area Beta" : ["21","25","39","58","70","71","89","90"],
"Area Gaga" : ["02","01","07","57","88","67","68","54"]}
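My goal is to put one or more keys of reference_dict in a list and pass that list as an argument to a function. For example, the result for ["Area Alpha"] would be a DF like:
   Area Alpha         day
           25  2020-03-18
The key of reference_dict becomes the name of the Series, and the values of subdivisions "01", "03" and "04", which are listed in reference_dict["Area Alpha"], are summed (0 + 12 + 13 = 25).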
I started testing with this function:
def area_cluster(DF, one_list):
    for area in one_list:
        if area in reference_dict:
            condition = DF["subdivision_name"].any() in reference_dict[area]
            new_DF = DF[condition]
        else:
            print("Wrong orthograph")
    return new_DF
I tested it with
DF = area_cluster(DF_draft, ["Area Alpha"])
and got this error:
KeyError Traceback (most recent call last)
<ipython-input-131-46075ea2a017> in <module>
----> 1 DF = area_cluster(DF_draft, ["Area Alpha"])
<ipython-input-129-a86847d436dd> in area_cluster(DF, one_list)
3 if area in subdivision_dict:
4 condition = DF["subdivision_name"].any() in subdivision_dict[area]
----> 5 new_DF = DF[condition]
6 else:
7 print("Wrong orthograph")
c:\users\raphael\appdata\local\programs\python\python39\lib\site-packages\pandas\core\frame.py in __getitem__(self, key)
2904 if self.columns.nlevels > 1:
2905 return self._getitem_multilevel(key)
-> 2906 indexer = self.columns.get_loc(key)
2907 if is_integer(indexer):
2908 indexer = [indexer]
c:\users\raphael\appdata\local\programs\python\python39\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
2895 return self._engine.get_loc(casted_key)
2896 except KeyError as err:
-> 2897 raise KeyError(key) from err
2898
2899 if tolerance is not None:
KeyError: True
Maybe this is an indexing problem, but I don't know how to solve it.
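A note on the traceback: the `in` operator always produces a single Python bool, so `condition` is a scalar rather than a per-row mask, and `DF[condition]` is then treated as a lookup of a column labelled `True`. The sketch below reproduces this on a small stand-in frame (`df` and `wanted` are illustrative names, not taken from the post) and contrasts it with `Series.isin`, which yields one boolean per row.

```python
import pandas as pd

# Small stand-in frame; the names here are illustrative only.
df = pd.DataFrame({"subdivision_name": ["01", "02", "03"], "data": [0, 11, 12]})
wanted = ["01", "03"]

# Whatever value stands on the left of `in`, the result is one Python bool.
scalar_condition = "01" in wanted
is_scalar = isinstance(scalar_condition, bool)

# Indexing with a scalar is a column lookup, so a bool raises KeyError.
try:
    df[scalar_condition]
    raised_key_error = False
except KeyError:
    raised_key_error = True

# Series.isin tests every row and returns a boolean Series usable as a mask.
mask = df["subdivision_name"].isin(wanted)
filtered = df[mask]
print(filtered)
```

With the mask, `df[mask]` keeps only the rows whose subdivision appears in `wanted`.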
Solution
IIUC, you can do this:
import pandas as pd

DF_draft = pd.DataFrame(data={"subdivision_name": ["01", "02", "03", "04", "05"],
                              "day": ["2020-03-18", "2020-03-18", "2020-03-18", "2020-03-18", "2020-03-18"],
                              "data": [0, 11, 12, 13, 2]})

reference_dict = {"Area Alpha": ["01", "03", "04", "15", "26", "38", "42", "43", "63", "69", "73", "74"],
                  "Area Beta": ["21", "25", "39", "58", "70", "71", "89", "90"],
                  "Area Gaga": ["02", "01", "07", "57", "88", "67", "68", "54"]}

def area_cluster(df, one_list):
    ss = []
    for area in one_list:
        if area in reference_dict:
            # Label each row with the area if its subdivision belongs to it (NaN otherwise)
            r = df.assign(area=df["subdivision_name"].map(dict.fromkeys(reference_dict[area], area)))
            # Drop the rows outside the area, then sum "data" per area and day
            r = r.dropna().groupby(['area', 'day'])['data'].sum().reset_index()
            ss.append(r)
    # Stack the per-area results into a single frame
    return pd.concat(ss, ignore_index=True)

res = area_cluster(DF_draft, ['Area Alpha', "Area Beta", "Area Gaga"])
print(res)
Output
         area         day  data
0  Area Alpha  2020-03-18    25
1   Area Gaga  2020-03-18    11
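If the shape shown in the question is preferred (one column named after the area, next to day), a variant based on a boolean mask may be simpler. This is only a sketch: `area_totals` and its `mapping` parameter are hypothetical names introduced here, and the dictionary is shortened to the entries that matter for the sample data.

```python
import pandas as pd

DF_draft = pd.DataFrame({"subdivision_name": ["01", "02", "03", "04", "05"],
                         "day": ["2020-03-18"] * 5,
                         "data": [0, 11, 12, 13, 2]})

# Shortened copy of the question's dictionary, for illustration only.
reference_dict = {"Area Alpha": ["01", "03", "04", "15"],
                  "Area Beta": ["21", "25"],
                  "Area Gaga": ["02", "01", "07"]}

def area_totals(df, area, mapping):
    """Sum `data` per day for the subdivisions listed under `area` (hypothetical helper)."""
    if area not in mapping:
        raise KeyError(f"Unknown area: {area!r}")
    # One boolean per row: True where the subdivision belongs to the area.
    mask = df["subdivision_name"].isin(mapping[area])
    totals = df.loc[mask].groupby("day", as_index=False)["data"].sum()
    # Name the summed column after the area and put it first.
    return totals.rename(columns={"data": area})[[area, "day"]]

result = area_totals(DF_draft, "Area Alpha", reference_dict)
print(result)
```

For "Area Alpha" this gives a single row with 25 for 2020-03-18, since subdivisions "01", "03" and "04" contribute 0, 12 and 13.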