regex - Best way to extract data from a string in VBA
Problem description
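Hi all. Like many people, I am turning my COVID time into coding time. Comorbid? :)
I need to extract substrings from a string using VBA in Excel, and I would like advice on the available approaches. I assumed regular expressions would be the way to go, but I am not sure: I am fairly new to regex, and it can get complicated. There may be a simpler solution I don't know about, so any suggestions are very welcome.
The strings come from the English language files of the open-source MyBB forum software. I am building an Excel workbook to help translators, and I will donate it to their community.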
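What I need is to strip the HTML and put each remaining substring into adjacent cells. The pattern is:
- the text before a <
- the text between a > and a <
- the text after a >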
Example strings (without the first and last quotation marks):
Example 1:
"You are currently viewing a stripped down version of our content. <a href=\"{1}\">View the full version</a> with proper formatting."
- String 1 =
"You are currently viewing a stripped down version of our content. "
- String 2 =
"View the full version"
- String 3 =
" with proper formatting."
Example 2:
"<b>Private</b> Only you will be able to view this event. (Registered Users Only)."
- String 1 =
"Private"
- String 2 =
" Only you will be able to view this event. (Registered Users Only)."
Example 3:
" This day does not have any events associated with it.<p><a href=\'calendar.php?action=addevent&calendar={1}&day={2}&month={3}&year={4}\'>Post an Event</a>.</p>"
- String 1 =
"This day does not have any events associated with it."
- String 2 =
"Post an Event"
- String 3 =
"."
Example 4 (the largest one I have seen):
"<p><br />[list]<br />[*]List Item #1<br />[*]List Item #2<br />[*]List Item #3<br />[/list]<br /><ul><li>List item #1</li><li>List item #2</li><li>List Item #3</li>"
- String 1 =
"[list]"
- String 2 =
"[*]List Item #1"
- String 3 =
"[*]List Item #2"
- String 4 =
"[*]List Item #3"
- String 5 =
"[/list]"
- String 6 =
"List item #1"
- String 7 =
"List item #2"
- String 8 =
"List item #3"
Any suggestions would be greatly appreciated.
Edit: adding more samples
<span title=\"{1}\">Today</span>
<span title=\"{1}\">Yesterday</span>
<span title=\"{5}{6}\">{1}{2} {3} {4}</span>
You are currently using <strong>{1}</strong>.
<br /><br />You are encouraged to register; once you register you will be able to post messages, set your own preferences, and maintain a profile.
<br /><br />Some of the features that generally require registration are subscriptions, changing of styles, accessing of your Personal Notepad and emailing forum members.
<br /><br />Cookies are small text documents stored on your computer; the cookies set by this forum can only be used on this website and pose no security risk.
<br /><br />Cookies on this forum also track the specific topics you have read and when you last read them.
<p><br />[url]http://www.example.com/[/url]<br /> <a href=\"http://www.example.com/\">http://www.example.com/</a>
<p>[url=http://www.example.com/]Example.com[/url]<br /> <a href=\"http://www.example.com/\">Example.com</a>
<p>[email]example@example.com[/email]<br /> <a href=\"mailto:example@example.com\">example@example.com</a>
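Solution
Assuming your source strings are in column A, the following macro writes each text fragment into the cells to the right of its source string, starting in column B: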
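Sub Demo()
    Dim i As Long, r As Long, c As Long
    Dim StrIn As String, StrOut As String
    Dim Parts() As String
    With ActiveSheet
        For r = 1 To .UsedRange.SpecialCells(xlCellTypeLastCell).Row
            ' .Value reads the stored string; .Text returns the formatted display text
            StrIn = .Range("A" & r).Value
            c = 1
            ' Split on ">" once per row; each piece is plain text optionally followed by "<tag..."
            Parts = Split(StrIn, ">")
            For i = 0 To UBound(Parts)
                If Parts(i) <> "" Then
                    ' Keep the text before "<" and discard the tag that follows it
                    StrOut = Split(Parts(i), "<")(0)
                    If StrOut <> "" Then
                        c = c + 1
                        .Cells(r, c).Value = StrOut
                    End If
                End If
            Next i
        Next r
    End With
End Sub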
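Since the question asks about regular expressions, here is a rough sketch of the same extraction done with a regex instead of Split. It is not part of the original answer. It assumes Excel on Windows, where the late-bound "VBScript.RegExp" object is available, and the names ExtractTextParts and DemoRegex are made up for illustration. The pattern captures any run of characters that contains no angle brackets and sits either at the start of the string or directly after a ">".

```vba
' Hypothetical helper: returns the text fragments found outside HTML tags.
Function ExtractTextParts(ByVal Source As String) As Collection
    Dim re As Object, m As Object
    Dim result As New Collection
    ' Late binding, so no reference to the VBScript Regular Expressions library is needed
    Set re = CreateObject("VBScript.RegExp")
    re.Global = True
    ' Start of string or ">", then one or more characters that are neither "<" nor ">"
    re.Pattern = "(?:^|>)([^<>]+)"
    For Each m In re.Execute(Source)
        result.Add m.SubMatches(0)
    Next m
    Set ExtractTextParts = result
End Function

' Hypothetical driver: same layout as the macro above (source in column A, output from column B).
Sub DemoRegex()
    Dim r As Long, c As Long, part As Variant
    With ActiveSheet
        For r = 1 To .UsedRange.SpecialCells(xlCellTypeLastCell).Row
            c = 1
            For Each part In ExtractTextParts(CStr(.Range("A" & r).Value))
                c = c + 1
                .Cells(r, c).Value = part
            Next part
        Next r
    End With
End Sub
```

Like the Split version, this keeps fragments that consist only of spaces, such as the single space between "<br />" and "<a" in the [url] sample. If those should be skipped, the fragment could be tested with Trim before it is written to the sheet.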