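r - How to find specific words in strings and merge variables by those words (2)
Question
I asked this same question before, but I am still having some problems with it.
Suppose I have a dataset A like: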
**Name**
Liver cell carcinoma
Stomach, unspecified
Malignant neoplasm of rectum
Lumbar and other intervertebral disc disorders with radiculopathy
Bronchus or lung, unspecified
Cerebral infarction, unspecified
Pneumonia, unspecified
Headache
Spinal stenosis, lumbar region
Other specified intervertebral disc displacement
Sigmoid colon
Calculus of ureter
Colon, unspecified
Concussion, without open intracranial wound
Malignant neoplasm of thyroid gland
Breast, unspecified
Other and unspecified cirrhosis of liver
Chronic viral hepatitis B without delta- agent
Dizziness and giddiness
Tension-type headache
Malignant neoplasm of stomach, unspecified, unspecified
Cervical disc disorder with radiculopathy
Malignant neoplasm of bronchus or lung, unspecified, unspecified side
Chest pain, unspecified
Gastroenteritis and colitis of unspecified origin
Bronchiectasis
Concussion
Body of stomach
Acute tubulo-interstitial nephritis
Traumatic subdural haemorrhage, without open intracranial wound
Abnormal findings on diagnostic imaging of lung
Angina pectoris, unspecified
Other disorders of lung
Ascending colon
Essential(primary) hypertension
Pyloric antrum
Intrahepatic bile duct carcinoma
Cervix uteri, unspecified
Gastro-oesophageal reflux disease with oesophagitis
Liver
Fracture of nasal bone, closed
Malignant neoplasm of rectosigmoid junction
Open wound of scalp
Other cerebral infarction
Cerebral aneurysm, nonruptured
Malignant neoplasm of kidney, except renal pelvis
Malignant neoplasm of prostate
Unspecified abdominal pain
And dataset B looks like:
Part Key
Abdominal abdomen
Abdominal abdominal
Other acute myeloblastic leukaemia
Abdominal adrenal
Head allergic rhinitis
Head Alzheimer's
Abdominal ampulla
Abdominal aneurysm
Chest angina
Abdominal antrum
Chest aorta
Abdominal appendicitis
Head arteries
Abdominal ascites
Chest asthma
Abdominal back
other b-cell lymphoma
Abdominal bile duct
Abdominal biliary tract
Abdominal bladder
Head brain
Chest breast
Chest Bronchiectasis
Chest bronchitis
Chest bronchopneumonia
Chest bronchus
Abdominal C64
Abdominal caecum
Abdominal cardia
Head cavity
Head cerebral
Chest cerebrovascular
Head cerebrovascular
Abdominal cervical
Abdominal cervix
Other chemotherapy session for neoplasm
Chest chest
Abdominal cholangitis
Abdominal cholecystitis
Chest circulatorycomplications
Abdominal colon
Head concussion
other connective and soft tissue, unspecified
Head convulsions
Chest Cough
Lung cough
I ran the following code:
result <- A %>%
  mutate(key = gsub(paste0(".*(", paste(B$key, collapse = "|"), ").*"), "\\1", tolower(A$NAME))) %>%
  left_join(B)
The result has some duplicated rows.
What is the best code to create the dataset I want? I would like my result table to look like this:
Name Key Part
Liver cell carcinoma liver Abdominal
Stomach, unspecified stomach Abdominal
Answer
Using the data posted here, and staying in the dplyr world, you can apply the distinct function:
tmp %>%
  mutate(key = gsub(paste0(".*(", paste(tmp2$key, collapse = "|"), ").*"), "\\1", tolower(tmp$Disease_name))) %>%
  left_join(tmp2) %>%
  distinct()
Joining, by = "key"
Disease_name key parts
1 (J189)Pneumonia, unspecified pneumonia Chest
2 (R51)Headache headache Head
3 (M4806)Spinal stenosis, lumbar region spinal Abdominal
4 (M512)Other specified intervertebral disc displacement intervertebral Abdominal
5 (C187)Sigmoid colon colon Abdominal
6 (N201)Calculus of ureter ureter Abdominal
7 (C189)Colon, unspecified colon Abdominal
8 (S0600)Concussion, without open intracranial wound concussion Head
9 (C73)Malignant neoplasm of thyroid gland thyroid Neck
10 (C509)Breast, unspecified breast Chest
11 (K746)Other and unspecified cirrhosis of liver liver Abdominal
12 (B181)Chronic viral hepatitis B without delta- agent hepatitis Abdominal
13 (R42)Dizziness and giddiness giddiness Head
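One caveat, offered as a hedged sketch rather than a definitive fix: distinct() only removes rows that are identical in every column. If a key appears in B under more than one part (in the sample above, cerebrovascular is listed under both Chest and Head), the join still returns one row per part. A possible workaround is to reduce B to one row per key before joining. The toy data below is hypothetical: it reuses a few names from the question, plus one invented name ("Cerebrovascular disease") to trigger the duplicate-key case. Using stringr::str_extract instead of gsub is my own substitution, not part of the original answer; it returns NA when no key matches, whereas gsub returns the whole lowercased string unchanged.

```r
library(dplyr)
library(stringr)

# Hypothetical toy data: names taken from dataset A, plus one invented row
# ("Cerebrovascular disease") so that a key with two parts is exercised.
A <- data.frame(
  Name = c("Liver cell carcinoma", "Sigmoid colon", "Sigmoid colon",
           "Cerebrovascular disease", "Dizziness and giddiness"),
  stringsAsFactors = FALSE
)
B <- data.frame(
  Part = c("Abdominal", "Abdominal", "Chest", "Head"),
  Key  = c("liver", "colon", "cerebrovascular", "cerebrovascular"),
  stringsAsFactors = FALSE
)

# Keep a single Part per key so the join cannot multiply rows of A.
# Which Part survives (the first one listed) is an arbitrary choice here.
B_unique <- B %>%
  mutate(Key = tolower(Key)) %>%
  distinct(Key, .keep_all = TRUE)

# One alternation pattern over all keys. This assumes the keys contain
# no regex metacharacters; otherwise they would need escaping first.
pattern <- paste(B_unique$Key, collapse = "|")

result <- A %>%
  mutate(Key = str_extract(tolower(Name), pattern)) %>%  # NA when no key matches
  left_join(B_unique, by = "Key") %>%
  distinct()                                             # drops exact duplicate rows

print(result)
```

In this toy example the repeated "Sigmoid colon" rows collapse to one, the invented cerebrovascular row appears only once, and the name with no matching key gets NA for both Key and Part. Whether keeping the first Part is acceptable depends on the data; if both parts are meaningful, B itself probably needs cleaning instead.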