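python - Average of dictionaries
Problem description
Using the iris dataset from sklearn. I am applying a Perceptron to splits of the data and recording the scores in a dictionary that maps the sample size used to fit the model (the key) to the corresponding scores (the training and test scores, as a tuple).
Running the loop 3 times gives 3 dictionaries. How can I find the average scores over the 3 iterations? I tried storing the dictionaries in a list and averaging, but it did not work.
For example, if the dictionaries are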
{21: (0.85, 0.82), 52: (0.80, 0.62), 73: (0.82, 0.45), 94: (0.81, 0.78)}
{21: (0.95, 0.91), 52: (0.80, 0.89), 73: (0.84, 0.87), 94: (0.79, 0.41)}
{21: (0.809, 0.83), 52: (0.841, 0.77), 73: (0.84, 0.44), 94: (0.79, 0.33)}
The output should be {21: (0.869, 0.853), 52.....}
For key 21, the first element of the value is (0.85+0.95+0.809)/3 and the second element is (0.82+0.91+0.83)/3.
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.linear_model import Perceptron
from sklearn.model_selection import train_test_split
score_list=shape_list=[]
iris = load_iris()
props=[0.2,0.5,0.7,0.9]
df = pd.DataFrame(data= np.c_[iris['data'], iris['target']], columns= iris['feature_names'] + ['target'])
y=df[list(df.loc[:,df.columns.values =='target'])]
X=df[list(df.loc[:,df.columns.values !='target'])]
# number of trials
for i in range(3):
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, train_size=0.7)
    results = {}
    for i in props:
        size = int(i*len(X_train))
        ix = np.random.choice(X_train.index, size=size, replace = False)
        sampleX = X_train.loc[ix]
        sampleY = y_train.loc[ix]
        #apply model
        modelP = Perceptron(tol=1e-3)
        modelP.fit(sampleX, sampleY)
        train_score = modelP.score(sampleX,sampleY)
        test_score = modelP.score(X_test,y_test)
        #store in dictionary
        results[size] = (train_score, test_score)
    print(results)
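Also, if anyone knows statistics: is there a way to find the standard error across the trials and print the average standard error for each sample size (dictionary key)?
Solution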
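- Update the existing loop to save results to a list, rl.
- Load rl into a dataframe, since you are already using pandas.
- Expand the columns of tuples into separate columns.
- Use .agg to get the metrics.
- Tested in python 3.8 and pandas 1.3.1
- f-strings (e.g. f'TrS{c}', f'TeS{c}') require python >= 3.6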
Updates to the existing code
# select columns for X and y
y = df.loc[:, 'target']
X = df.loc[:, iris['feature_names']]
# number of trials
rl = list() # add: save results to a list
for i in range(3):
    ...
    results = {}
    for i in props:
        ...
        ...
    rl.append(results) # add: append results
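New code to get the metrics
- Converting metrics to a list of tuples is easier than converting it to a tuple of tuples, because a tuple cannot be changed once it is created. This means tuples can be appended to an existing list, but not to an existing tuple.
- Therefore, it is easier to use defaultdict to build a list of tuples, and then convert each value to a tuple with map.
- k[3:] requires that the number always begins at index 3 of the column name.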
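The list-then-tuple idea described above can be sketched in isolation. This is a minimal illustration, not the answer's exact code: metrics_dict is a hand-written, hypothetical stand-in for metrics.to_dict(), filled with a few of the rounded values shown in the output further down, and the names grouped and result are made up for the example.

```python
from collections import defaultdict

# Hypothetical stand-in for metrics.to_dict():
# column label -> {'mean': ..., 'std': ...}
metrics_dict = {
    'TrS21': {'mean': 0.667, 'std': 0.126},
    'TeS21': {'mean': 0.630, 'std': 0.068},
    'TrS52': {'mean': 0.679, 'std': 0.059},
    'TeS52': {'mean': 0.637, 'std': 0.078},
}

grouped = defaultdict(list)
for label, stats in metrics_dict.items():
    # 'TrS21' -> 21; this only works because the prefix is exactly 3 characters
    size = int(label[3:])
    # a list can grow in place, which a tuple cannot
    grouped[size].append(tuple(stats.values()))

# freeze each list of tuples into a tuple of tuples
result = {size: tuple(pairs) for size, pairs in grouped.items()}
print(result)
```

Each key ends up holding the training (mean, std) pair followed by the test (mean, std) pair, because the TrS column is visited before the TeS column for every sample size.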
from collections import defaultdict
# convert rl to a dataframe
rl = [{21: (0.5714285714285714, 0.6888888888888889), 52: (0.6153846153846154, 0.7111111111111111), 73: (0.7123287671232876, 0.6222222222222222), 94: (0.7127659574468085, 0.6)}, {21: (0.6190476190476191, 0.6444444444444445), 52: (0.6923076923076923, 0.6444444444444445), 73: (0.3698630136986301, 0.35555555555555557), 94: (0.7978723404255319, 0.7777777777777778)}, {21: (0.8095238095238095, 0.5555555555555556), 52: (0.7307692307692307, 0.5555555555555556), 73: (0.7534246575342466, 0.5777777777777777), 94: (0.6170212765957447, 0.7555555555555555)}]
df = pd.DataFrame(rl)
# display(df)
21 52 73 94
0 (0.5714285714285714, 0.6888888888888889) (0.6153846153846154, 0.7111111111111111) (0.7123287671232876, 0.6222222222222222) (0.7127659574468085, 0.6)
1 (0.6190476190476191, 0.6444444444444445) (0.6923076923076923, 0.6444444444444445) (0.3698630136986301, 0.35555555555555557) (0.7978723404255319, 0.7777777777777778)
2 (0.8095238095238095, 0.5555555555555556) (0.7307692307692307, 0.5555555555555556) (0.7534246575342466, 0.5777777777777777) (0.6170212765957447, 0.7555555555555555)
# expand the tuples
for c in df.columns:
    df[[f'TrS{c}', f'TeS{c}']] = pd.DataFrame(df[c].tolist(), index= df.index)
    df.drop(c, axis=1, inplace=True)
# get the mean and std
metrics = df.agg(['mean', 'std']).round(3)
# display(metrics)
TrS21 TeS21 TrS52 TeS52 TrS73 TeS73 TrS94 TeS94
mean 0.667 0.630 0.679 0.637 0.612 0.519 0.709 0.711
std 0.126 0.068 0.059 0.078 0.211 0.143 0.090 0.097
# convert to dict
dd = defaultdict(list)
for k, v in metrics.to_dict().items():
    dd[int(k[3:])].append(tuple(v.values()))
dd = dict(zip(dd, map(tuple, dd.values())))
print(dd)
[out]:
{21: ((0.667, 0.126), (0.63, 0.068)),
52: ((0.679, 0.059), (0.637, 0.078)),
73: ((0.612, 0.211), (0.519, 0.143)),
94: ((0.709, 0.09), (0.711, 0.097))}
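The question also asks about the standard error, while the code above reports the standard deviation. The following is a minimal sketch using only the standard library, under the assumption that "standard error" means the standard error of the mean, i.e. the sample standard deviation divided by the square root of the number of trials. The summarize function and the trials list are names introduced for this example; the data are the three example dictionaries from the question.

```python
from math import sqrt
from statistics import mean, stdev

# The three example dictionaries from the question
trials = [
    {21: (0.85, 0.82), 52: (0.80, 0.62), 73: (0.82, 0.45), 94: (0.81, 0.78)},
    {21: (0.95, 0.91), 52: (0.80, 0.89), 73: (0.84, 0.87), 94: (0.79, 0.41)},
    {21: (0.809, 0.83), 52: (0.841, 0.77), 73: (0.84, 0.44), 94: (0.79, 0.33)},
]


def summarize(trials):
    """Return {size: ((train_mean, train_sem), (test_mean, test_sem))}."""
    n = len(trials)  # stdev needs at least 2 trials
    summary = {}
    for size in trials[0]:
        # zip(*...) regroups the (train, test) pairs into two columns
        train, test = zip(*(t[size] for t in trials))
        summary[size] = tuple(
            (mean(col), stdev(col) / sqrt(n)) for col in (train, test)
        )
    return summary


summary = summarize(trials)
for size, ((tr_mean, tr_sem), (te_mean, te_sem)) in summary.items():
    print(f'{size}: train {tr_mean:.3f} ± {tr_sem:.3f}, test {te_mean:.3f} ± {te_sem:.3f}')
```

pandas also has a DataFrame.sem method, so swapping 'std' for 'sem' in the .agg list should give the same quantity directly from the dataframe.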