ScispaCy in Google Colab

Problem description

I'm trying to build an NER model for clinical data using ScispaCy in Colab. I installed the packages like this:

!pip install spacy
!pip install scispacy
!pip install https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.2.4/en_core_sci_md-0.2.4.tar.gz  # pip install <Model URL>

Then I import them:

import scispacy
import spacy
import en_core_sci_md

and then use the following code to display the sentences and entities:

nlp = spacy.load("en_core_sci_md")
text ="""Myeloid derived suppressor cells (MDSC) are immature myeloid cells with immunosuppressive activity. They accumulate in tumor-bearing mice and humans with different types of cancer, including hepatocellular carcinoma (HCC)""" 
doc = nlp(text)
print(list(doc.sents))
print(doc.ents)

I get the following error:

OSError: [E050] Can't find model 'en_core_sci_md'. It doesn't seem to be a shortcut link, a Python package or a valid path to a data directory.

I don't know why this error occurs; I followed all the code from the official ScispaCy GitHub page. Any help would be appreciated. Thanks in advance.

Tags: python, nlp, spacy, named-entity-recognition

Solution


I hope I'm not too late. I believe you were very close to the right approach.

I'll write my answer step by step, so you can choose where to stop.

Step 1)

# Install the en_core_sci_lg package (large model) from the scispaCy release bucket; you can also use en_core_sci_md for the medium model.
!pip install https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.2.4/en_core_sci_lg-0.2.4.tar.gz 
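Since E050 means spaCy could not locate the model, a quick sanity check after installing is to ask Python whether the package is importable in the current runtime at all. This is a minimal sketch using only the standard library; the helper name `model_installed` is hypothetical. If it still reports `False` right after a successful `pip install`, restarting the Colab runtime is a common remedy.

```python
import importlib
import importlib.util


def model_installed(package_name: str) -> bool:
    """Return True if `package_name` can be found by the import system."""
    # Packages installed after the interpreter started may be hidden by
    # cached directory listings, so clear the import system's caches first.
    importlib.invalidate_caches()
    return importlib.util.find_spec(package_name) is not None


# Check both model sizes mentioned in this thread (hypothetical usage).
for name in ("en_core_sci_md", "en_core_sci_lg"):
    print(name, "installed:", model_installed(name))
```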

Step 2)

# Import the large model package
import en_core_sci_lg

Step 3)

# Identify entities
from spacy import displacy  # needed for the visualisation below

nlp = en_core_sci_lg.load()
doc = nlp(text)  # `text` is the string defined in the question
displacy.render(doc, jupyter=True, style="ent")
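Steps 2 and 3 avoid the name lookup that raises E050 by importing the model package and calling its `load()` function, rather than passing the name to `spacy.load()`. A minimal sketch that tries both routes is below; the helper name `load_scispacy_model` is hypothetical, and it assumes the model package exposes a module-level `load()` function, as the `en_core_sci_lg.load()` call above does.

```python
import importlib


def load_scispacy_model(package_name: str):
    """Try spacy.load() first; fall back to importing the model package
    and calling its load() function (hypothetical helper)."""
    try:
        import spacy

        return spacy.load(package_name)
    except (ImportError, OSError):
        # E050 is raised as an OSError. Importing the package directly
        # mirrors `import en_core_sci_lg; en_core_sci_lg.load()`.
        module = importlib.import_module(package_name)
        return module.load()


# Usage (requires the model package installed in Step 1):
# nlp = load_scispacy_model("en_core_sci_lg")
# doc = nlp(text)
```

If neither route works, the final `import_module` call raises `ModuleNotFoundError`, which points back at the installation step rather than at spaCy.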

Step 4)

#Print only the entities
print(doc.ents)

Step 5)

# Save the result 
save_res = [doc.ents]
save_res

Step 6)

# Save the results to a DataFrame
import pandas as pd

df_save_res = pd.DataFrame(save_res)
df_save_res
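Steps 5 and 6 put the whole `doc.ents` tuple into a single DataFrame row, with one column per entity. If one row per entity is more convenient, a minimal sketch is to flatten each entity span into a plain record first. The helper name `entities_to_records` is hypothetical; it relies only on the spaCy `Span` attributes `text`, `label_`, `start_char` and `end_char`.

```python
from typing import Any, Dict, Iterable, List


def entities_to_records(ents: Iterable[Any]) -> List[Dict[str, Any]]:
    """Flatten entity spans into one dict per entity (hypothetical helper)."""
    return [
        {
            "text": ent.text,              # surface string of the entity
            "label": ent.label_,           # entity label string
            "start_char": ent.start_char,  # character offset where it starts
            "end_char": ent.end_char,      # character offset where it ends
        }
        for ent in ents
    ]


# Usage, assuming `doc` from Step 3 and pandas imported as pd:
# df_entities = pd.DataFrame(entities_to_records(doc.ents))
```

Converting to plain dicts keeps the DataFrame free of spaCy objects, so it can be saved with `to_csv` without further processing.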

Step 7)

# In case you want to visualise the dependency parse
displacy.render(doc, jupyter=True, style="dep")
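If you want the dependency parse as text rather than an image (for example, outside a notebook), a minimal sketch is to list each token with its dependency label and head. The helper name `dependency_triples` is hypothetical; it uses the spaCy `Token` attributes `text`, `dep_` and `head`.

```python
from typing import Any, Iterable, List, Tuple


def dependency_triples(tokens: Iterable[Any]) -> List[Tuple[str, str, str]]:
    """Return (token, dependency label, head token) for every token."""
    return [(tok.text, tok.dep_, tok.head.text) for tok in tokens]


# Usage, assuming `doc` from Step 3 (iterating a Doc yields its tokens):
# for child, dep, head in dependency_triples(doc):
#     print(f"{child:<15} {dep:<10} {head}")
```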
