Avoiding duplicate equivalent lines

Problem description

import time

from transformers import BertTokenizerFast

def tokenized_dataset(self, dataset):
    tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")

    print("\n"+"="*10, "Start Tokenizing", "="*10)
    start = time.process_time()
    train_articles = [self.encode(document, tokenizer) for document in dataset["train"]["article"]]
    test_articles = [self.encode(document, tokenizer) for document in dataset["test"]["article"]]
    val_articles = [self.encode(document, tokenizer) for document in dataset["val"]["article"]]
    train_abstracts = [self.encode(document, tokenizer) for document in dataset["train"]["abstract"]]
    test_abstracts = [self.encode(document, tokenizer) for document in dataset["test"]["abstract"]]
    val_abstracts = [self.encode(document, tokenizer) for document in dataset["val"]["abstract"]]

    print("Time:", time.process_time() - start)
    print("=" * 10, "End Tokenizing", "="*10+"\n")

    return {"train": (dataset["train"]["id"], train_articles, train_abstracts),
            "test": (dataset["test"]["id"], test_articles, test_abstracts),
            "val": (dataset["val"]["id"], val_articles, val_abstracts)}

I have this code, and I just realized I've repeated the same (or equivalent) line six times, namely `[self.encode(document, tokenizer) for document in dataset...]`. Is there a way to replace that block of six equivalent lines with something more natural and less repetitive?

Tags: python

Solution


You can do this easily with a Python helper function. Define it inside `tokenized_dataset` so it closes over `self`, `tokenizer`, and `dataset`:

def get_values(split, field):
    return [self.encode(document, tokenizer) for document in dataset[split][field]]
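To show how the helper collapses all six lines, here is a minimal, self-contained sketch. It uses a stand-in `encode` function and a toy dataset in place of `self.encode`, `BertTokenizerFast`, and the real data, which are assumptions for illustration; with the helper in place, a single dict comprehension over the three splits builds the whole return value.

```python
def encode(document, tokenizer):
    # Stand-in for the real self.encode / tokenizer call.
    return document.split()

def tokenized_dataset(dataset, tokenizer=None):
    def get_values(split, field):
        # The one list comprehension that previously appeared six times.
        return [encode(document, tokenizer) for document in dataset[split][field]]

    # One dict comprehension replaces the six near-identical lines.
    return {split: (dataset[split]["id"],
                    get_values(split, "article"),
                    get_values(split, "abstract"))
            for split in ("train", "test", "val")}

# Toy dataset with the same layout as in the question.
dataset = {
    "train": {"id": [1], "article": ["a b"], "abstract": ["c"]},
    "test":  {"id": [2], "article": ["d"],   "abstract": ["e f"]},
    "val":   {"id": [3], "article": ["g"],   "abstract": ["h"]},
}
result = tokenized_dataset(dataset)
```

Iterating over the split names also makes it impossible to mix up splits (e.g. returning the train ids under the `"test"` key), which is easy to do when each line is written out by hand.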
