Parsing HTML with VBA to remove text

Question

I have the following HTML in a variable named Description:

<STYLE>
#productDescription h3 {
margin: 0.75em 0px 0.375em -1px;
}
</STYLE>
</DIV></DIV></DIV>
<DIV id=detailBullets class=celwidget data-feature-name="detailBullets">
<STYLE type=text/css>
.detail-bullets-wrapper ul.detail-bullet-list {
margin: 0 0 1px 18px;
}
.detail-bullets-wrapper ul li {
margin-bottom: 5.5px;
}
.detail-bullets-wrapper:last-child {
margin-bottom: 4.5px;
}
</STYLE>

<DIV id=detailBulletsWrapper_feature_div class="a-section feature detail-bullets-wrapper bucket" data-feature-name="detailBullets" data-template-name="detailBullets">
<HR class="a-divider-normal bucketDivider">

<H2>Product details</H2><SPAN class=disclaim></SPAN>
<DIV id=detailBullets_feature_div>
<UL class="a-unordered-list a-nostyle a-vertical a-spacing-none detail-bullet-list">
<LI><SPAN class=a-list-item><SPAN class=a-text-bold>Department : </SPAN><SPAN>Womens</SPAN> </SPAN></LI>
<LI><SPAN class=a-list-item><SPAN class=a-text-bold>Date First Available : </SPAN><SPAN>May 21, 2018</SPAN> </SPAN></LI>
<LI><SPAN class=a-list-item><SPAN class=a-text-bold>ASIN : </SPAN><SPAN>B07D6WDLLL</SPAN> </SPAN></LI></UL></DIV>
<UL class="a-unordered-list a-nostyle a-vertical a-spacing-none detail-bullet-list">
<DIV id=dpx-amazon-sales-rank_feature_div>
<DIV id=dpx-amazon-sales-rank_feature_div>
<STYLE type=text/css>
.zg_hrsr_item {list-style : none};
</STYLE>

<LI id=SalesRank style="LIST-STYLE-TYPE: none"><B>Amazon Best Sellers Rank:</B> #2,757,302 in Clothing, Shoes &amp; Jewelry (<A href="about:/gp/bestsellers/fashion/ref=pd_zg_ts_fashion">See Top 100 in Clothing, Shoes &amp; Jewelry</A>) 
<STYLE type=text/css>.zg_hrsr { margin: 0; padding: 0; list-style-type: none; }.zg_hrsr_item { margin: 0 0 0 10px; }.zg_hrsr_rank { display: inline-block; width: 80px; text-align: right; }</STYLE>

<UL class=zg_hrsr>
<LI class=zg_hrsr_item><SPAN class=zg_hrsr_rank>#1198384</SPAN> <SPAN class=zg_hrsr_ladder>in&nbsp;<A href="about:/gp/bestsellers/fashion/7581668011/ref=pd_zg_hrsr_fashion">Women's Shops</A></SPAN> </LI>
<LI class=zg_hrsr_item><SPAN class=zg_hrsr_rank>#262103</SPAN> <SPAN class=zg_hrsr_ladder>in&nbsp;<A href="about:/gp/bestsellers/fashion/9056923011/ref=pd_zg_hrsr_fashion">Women's Novelty T-Shirts</A></SPAN> </LI></UL></LI></DIV></DIV></UL>
<UL class="a-unordered-list a-nostyle a-vertical a-spacing-none detail-bullet-list">
<LI><SPAN class=a-list-item><SPAN class=a-text-bold>Customer Reviews: </SPAN>
<STYLE type=text/css>
/*
* Fix for UDP-1061. Average customer reviews has a small extra line on hover
* https://omni-grok.amazon.com/xref/src/appgroup/websiteTemplates/retail/SoftlinesDetailPageAssets/udp-intl-lock/src/legacy.cssindexName=WebsiteTemplates#40
*/
.noUnderline a:hover {
text-decoration: none;
}
</STYLE>

<DIV id=detailBullets_averageCustomerReviews data-ref="dpx_acr_pop_" data-asin="B07D6VPP5K"><SPAN class=a-declarative data-action="acrStarsLink-click-metrics" data-acrStarsLink-click-metrics="{}"><SPAN id=acrPopover title="5.0 out of 5 stars" class="reviewCountTextLinkedHistogram noUnderline"><SPAN class=a-declarative data-action="a-popover" data-a-popover='{"max-width":"700","closeButton":"false","position":"triggerBottom","url":"/gp/customer-reviews/widgets/average-customer-review/popover/ref=dpx_acr_pop_contextId=dpx&amp;asin=B07D6VPP5K"}'><A class="a-popover-trigger a-declarative" href="javascript:void(0)"><I class="a-icon a-icon-star a-star-5"><SPAN class=a-icon-alt>5.0 out of 5 stars</SPAN></I> <I class="a-icon a-icon-popover"></I></A></SPAN><SPAN class=a-letter-space></SPAN></SPAN></SPAN><SPAN class=a-letter-space></SPAN><SPAN class=a-declarative data-action="acrLink-click-metrics" data-acrLink-click-metrics="{}"><A id=acrCustomerReviewLink class=a-link-normal href="about:blank#customerReviews"><SPAN id=acrCustomerReviewText class=a-size-base>3 ratings</SPAN> </A></SPAN>
<SCRIPT type=text/javascript>
P.when('A', 'ready').execute(function(A) {
A.declarative('acrLink-click-metrics', 'click', { "allowLinkDefault" : true }, function(event){
if(window.ue) {
ue.count("acrLinkClickCount", (ue.count("acrLinkClickCount") || 0) + 1);
}
});
});
</SCRIPT>

<SCRIPT type=text/javascript>
P.when('A', 'cf').execute(function(A) {
A.declarative('acrStarsLink-click-metrics', 'click', { "allowLinkDefault" : true },  function(event){
if(window.ue) {
ue.count("acrStarsLinkWithPopoverClickCount", (ue.count("acrStarsLinkWithPopoverClickCount") || 0) + 1);
}
});
});
</SCRIPT>
</DIV></SPAN></LI></UL>
<DIV class=a-row></DIV>
<DIV class=a-row></DIV></DIV></DIV>
<DIV id=cpsiaProductSafetyWarning_feature_div class=celwidget data-feature-name="cpsiaProductSafetyWarning"></DIV><br/>

In an HTML viewer it renders as follows: [image not included]

I need to remove these lines:

Amazon Best Sellers Rank: #2,757,302 in Clothing, Shoes & Jewelry (See Top 100 in Clothing, Shoes & Jewelry)
#1198384 in Women's Shops
#262103 in Women's Novelty T-Shirts

I want to remove all of the text that follows "Amazon Best Sellers Rank" so that I end up with the following.

[image not included]

I am not sure how to go about this.

Edit:

My skill set is limited to finding the position of the first occurrence of "Amazon Best Seller Rank", like this:

InStr(Description, "Amazon Best Seller Rank")

Once I have the position of "Amazon Best Seller Rank" I could use it to cut off the rest of the text, but that would break the HTML.
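One thing to note about the attempt above: the sample HTML contains "Amazon Best Sellers Rank" (with an "s"), so a search for "Amazon Best Seller Rank" would return 0. The sketch below is only an illustration of how the text position could be turned into a tag position, so that a cut starts at a tag boundary instead of in the middle of an element. The function name FindEnclosingTagStart and its parameters are hypothetical and not part of the original post.

```vba
' Sketch: find the visible text, then search backwards for the nearest
' opening tag that precedes it. Returns 0 if either search fails.
Function FindEnclosingTagStart(ByVal html As String, _
                               ByVal needle As String, _
                               ByVal tagOpen As String) As Long
    Dim textPos As Long

    ' Position of the visible text, compared case-insensitively
    textPos = InStr(1, html, needle, vbTextCompare)
    If textPos = 0 Then Exit Function

    ' Last occurrence of the opening tag at or before the text position
    FindEnclosingTagStart = InStrRev(html, tagOpen, textPos, vbTextCompare)
End Function
```

A call such as FindEnclosingTagStart(Description, "Amazon Best Sellers Rank", "<LI") should land on the start of the SalesRank list item in the sample. A bare "<LI" prefix would also match a tag such as "<LINK", so this is a heuristic rather than real parsing, and it still leaves the problem of finding where the element ends.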

Tags: excel, vba

Solution


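Your sample HTML is not valid, so it is hard to confirm that this answer works. Perhaps you did not paste all of it.

Still, I believe the following VBA function will do what you want: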

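Function ParseYourHTML$(ByVal description$)
    Dim p1&, p2&
    ' Start of the sales-rank list item to remove
    Const P1Marker = "<LI id=SalesRank style="
    ' Closing tags that are kept after the removed span
    Const P2Marker = "</DIV></SPAN></LI></UL>"

    ' Return the input unchanged if either marker is missing
    ParseYourHTML = description

    p1 = InStr(description, P1Marker)
    If p1 = 0 Then Exit Function

    ' Look for the end marker only after the start marker;
    ' InStr raises an error when its start argument is 0
    p2 = InStr(p1, description, P2Marker)
    If p2 = 0 Then Exit Function

    ' Keep everything before P1Marker and everything from P2Marker onward
    ParseYourHTML = Left$(description, p1 - 1) & Mid$(description, p2)
End Function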

You can call the function like this:

sHTML = ParseYourHTML(description)

After that line runs, sHTML should hold the parsed result you are looking for.
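As an alternative to matching marker strings, the markup could be handed to an HTML parser and the element removed by its id. The sketch below is not part of the original answer. It assumes Windows with the MSHTML component available through CreateObject("htmlfile"), and the function name RemoveElementById is hypothetical. The parser re-serializes the markup, so the returned HTML may differ from the input in more than just the removed element.

```vba
' Sketch: remove one element by id using the MSHTML parser (late bound).
Function RemoveElementById(ByVal html As String, _
                           ByVal elementId As String) As String
    Dim doc As Object
    Dim node As Object

    ' Fall back to the input unchanged if the element is not found
    RemoveElementById = html

    ' Late-bound MSHTML document; no project reference required
    Set doc = CreateObject("htmlfile")
    doc.body.innerHTML = html

    Set node = doc.getElementById(elementId)
    If node Is Nothing Then Exit Function

    ' Detach the element together with everything nested inside it
    node.parentNode.removeChild node
    RemoveElementById = doc.body.innerHTML
End Function
```

For the sample in the question, the call would be sHTML = RemoveElementById(description, "SalesRank"). Because the parser repairs invalid markup on its own terms, it is worth comparing the output with the input before relying on it.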

