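r - Comparing strings in one dataset's columns against the columns of another dataset in R
Problem description
I have two datasets and need to check, in R, whether the strings in the columns of one dataset appear in the text columns of the other dataset.
The details are below. Case can be ignored.
Can anyone help me solve this?
First dataset: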
<table><tbody><tr><th>instancename</th><th>hostname</th><th>sid</th></tr><tr><td>instance1</td><td>server1</td><td>sid1</td></tr><tr><td>instance2</td><td>server2</td><td>sid2</td></tr><tr><td>instance3</td><td>server3</td><td>sid3</td></tr><tr><td>instance4</td><td>server4</td><td>sid4</td></tr><tr><td>instance5</td><td>server5</td><td>sid5</td></tr><tr><td>instance6</td><td>server6</td><td>sid6</td></tr></tbody></table>
Second dataset:
<table><tbody><tr><th>short_description</th><th>description</th></tr><tr><td>Kindly activate Server1 information</td><td>Kindly activate all sid3 and there is issue with instance3</td></tr><tr><td>server2: issue on instance2</td><td>find a sloution for this issue</td></tr><tr><td>Please fix the issue</td><td>issue is on Sid6</td></tr><tr><td>can you please check instance5 on server5</td><td>Sid5. Please look into this issue asap.</td></tr><tr><td>sid1: performance issue</td><td>server1 and sid1. Performance issue</td></tr><tr><td>Can you please check the issue</td><td>Can you please check the issues</td></tr></tbody></table>
I need a final dataset like the one below:
<table><tbody><tr><th>short_description</th><th>description</th><th>Final_output</th></tr><tr><td>Kindly activate Server1 information</td><td>Kindly activate all sid3 and there is issue with instance3</td><td>Server1,sid3,instance3</td></tr><tr><td>server2: issue on instance2</td><td>find a sloution for this issue</td><td>server2,instance2</td></tr><tr><td>Please fix the issue</td><td>issue is on Sid6</td><td>Sid6</td></tr><tr><td>can you please check instance5 on server5</td><td>Sid5. Please look into this issue asap.</td><td>server5,Sid5</td></tr><tr><td>sid1: performance issue</td><td>server1 and sid1. Performance issue</td><td>sid1,server1</td></tr><tr><td>Can you please check the issue</td><td>Can you please check the issues</td><td>no matches found</td></tr></tbody></table>
Solution
Since you provided the data as HTML, I first had to read it into R as a table:
b ="<table><tbody><tr><th>short_description</th><th>description</th></tr><tr><td>Kindly activate Server1 information</td><td>Kindly activate all sid3 and there is issue with instance3</td></tr><tr><td>server2: issue on instance2</td><td>find a sloution for this issue</td></tr><tr><td>Please fix the issue</td><td>issue is on Sid6</td></tr><tr><td>can you please check instance5 on server5</td><td>Sid5. Please look into this issue asap.</td></tr><tr><td>sid1: performance issue</td><td>server1 and sid1. Performance issue</td></tr><tr><td>Can you please check the issue</td><td>Can you please check the issues</td></tr></tbody></table>"
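library(magrittr)  # provides the %>% pipe

# Parse the HTML string and keep the first table it contains
dat2 = xml2::read_html(b) %>%
  rvest::html_table() %>%
  {.[[1]]}

# Paste the columns of each row together, then keep only the
# server/instance/sid tokens (matched case-insensitively)
serv_instance = gsub("(?|.*?((?i)server\\d+|instance\\d+|sid\\d+)|.+)", "\\1", do.call(paste, dat2), perl = TRUE)
# Insert ", " between adjacent tokens; rows without any token get "No match found"
final_output = replace(gsub("(?<=\\d)(?=[A-Za-z])", ", ", serv_instance, perl = TRUE), !nchar(serv_instance), "No match found")
cbind(dat2, final_output)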
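<table><tbody><tr><th> </th><th>short_description</th><th>description</th><th>final_output</th></tr><tr><td>1</td><td>Kindly activate Server1 information</td><td>Kindly activate all sid3 and there is issue with instance3</td><td>Server1, sid3, instance3</td></tr><tr><td>2</td><td>server2: issue on instance2</td><td>find a sloution for this issue</td><td>server2, instance2</td></tr><tr><td>3</td><td>Please fix the issue</td><td>issue is on Sid6</td><td>Sid6</td></tr><tr><td>4</td><td>can you please check instance5 on server5</td><td>Sid5. Please look into this issue asap.</td><td>instance5, server5, Sid5</td></tr><tr><td>5</td><td>sid1: performance issue</td><td>server1 and sid1. Performance issue</td><td>sid1, server1, sid1</td></tr><tr><td>6</td><td>Can you please check the issue</td><td>Can you please check the issues</td><td>No match found</td></tr></tbody></table>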
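The regex above hard-codes the server/instance/sid prefixes and keeps repeated tokens (row 5 lists sid1 twice). As an alternative, here is a minimal base-R sketch that builds the search pattern from the first dataset and drops case-insensitive duplicates. It is not part of the original answer. The helper name `find_matches` is hypothetical, both data frames are typed in by hand from the tables in the question, and the sketch assumes the identifiers contain no regex metacharacters.

```r
# First dataset: the known identifiers (typed in from the question's table)
dat1 <- data.frame(
  instancename = paste0("instance", 1:6),
  hostname     = paste0("server", 1:6),
  sid          = paste0("sid", 1:6),
  stringsAsFactors = FALSE
)

# Second dataset: the free-text columns to search
dat2 <- data.frame(
  short_description = c(
    "Kindly activate Server1 information",
    "server2: issue on instance2",
    "Please fix the issue",
    "can you please check instance5 on server5",
    "sid1: performance issue",
    "Can you please check the issue"
  ),
  description = c(
    "Kindly activate all sid3 and there is issue with instance3",
    "find a sloution for this issue",
    "issue is on Sid6",
    "Sid5. Please look into this issue asap.",
    "server1 and sid1. Performance issue",
    "Can you please check the issues"
  ),
  stringsAsFactors = FALSE
)

# One alternation of all known identifiers; \\b keeps "server1" from
# matching inside a longer token such as "server10"
keys    <- unlist(dat1, use.names = FALSE)
pattern <- paste0("\\b(", paste(keys, collapse = "|"), ")\\b")

# Hypothetical helper: return the identifiers found in one string,
# in order of appearance, without case-insensitive duplicates
find_matches <- function(text) {
  hits <- regmatches(text, gregexpr(pattern, text, ignore.case = TRUE, perl = TRUE))[[1]]
  hits <- hits[!duplicated(tolower(hits))]
  if (length(hits) == 0) "No match found" else paste(hits, collapse = ", ")
}

# Search both text columns of each row together
dat2$final_output <- vapply(
  paste(dat2$short_description, dat2$description),
  find_matches, character(1), USE.NAMES = FALSE
)
print(dat2)
```

Because the pattern comes from the first dataset, only identifiers that actually exist there are reported. Row 4 still includes instance5, as in the answer's output, since that string does occur in the short description.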