Speeding up evaluation of a multi-label classifier

Problem description

I have a spaCy text classifier for a multi-label classification problem. Evaluating it takes a long time.

It's not getting the probabilities that takes long, but computing the log loss, precision, recall, and F-score.

from sklearn.metrics import precision_recall_fscore_support
from sklearn.metrics import log_loss
import numpy as np
import time


def evaluate(
    nlp, texts, cats, labels, threshold=0.3, beta=0.5, batch_size=8
):
    t0 = time.time()
    docs = nlp.pipe(texts, batch_size=batch_size)
    t1 = time.time()
    # Collect the predicted probability of every label for every doc
    pred_probs = np.array([list(doc.cats.values()) for doc in docs])

    avg_log_loss = log_loss(cats, pred_probs)
    results = {'log_loss': avg_log_loss}
    # Binarize the probabilities at the given threshold
    y_pred = pred_probs > threshold
    prc, rec, fscore, _ = precision_recall_fscore_support(
        y_true=cats, y_pred=y_pred, beta=beta, average='micro', warn_for=set()
    )
    results[f'f{beta}_{threshold}'] = fscore
    results[f'prc_{threshold}'] = prc
    results[f'rec_{threshold}'] = rec
    t2 = time.time()
    print(f"Used {t1-t0} on predicting; {t2-t1} on scoring")
    return results

This prints Used 4.76837158203125e-06 on predicting; 377.1225287914276 on scoring. Is there any way to speed this up?
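
One detail worth checking when interpreting those timings: spaCy's nlp.pipe returns a lazy generator, so the documents are only actually processed when the list comprehension iterates over them, which pushes essentially all of the prediction work into the "scoring" interval. Below is a minimal timing sketch (reusing the nlp, texts, and batch_size from the function above; the helper name time_prediction is just for illustration) that forces the pipeline to run before the second timestamp:

import time
import numpy as np


def time_prediction(nlp, texts, batch_size=8):
    # nlp.pipe is lazy: wrapping it in list() forces the pipeline to run
    # here, so t1 - t0 really measures prediction time.
    t0 = time.time()
    docs = list(nlp.pipe(texts, batch_size=batch_size))
    t1 = time.time()
    # Extracting the per-label probabilities is cheap by comparison.
    pred_probs = np.array([list(doc.cats.values()) for doc in docs])
    t2 = time.time()
    print(f"predicting: {t1 - t0:.2f}s; extracting probabilities: {t2 - t1:.2f}s")
    return pred_probs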

Tags: python-3.x, scikit-learn, spacy

Solution

