Excel - Using VBA to scrape HTML source code layout

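Problem description

I have some badly designed HTML that I am trying to scrape data from so that it is easier to read.

I can get the information using innerHTML, but unfortunately it comes out as a wall of text.

Here is a sample of the page source as it appears when I "view source" (words changed for privacy):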

<td nowrap valign="top"><b>Logger Notes</b></td>
    <td valign="top">Hi,
Person needs a full breakdown Important information.
Would also would like confirmation in a letter about what kinds of assistance 
she is not eligible for if possible.
Would prefer sent to email.
Thanks&nbsp;</td>

However, when I retrieve the data, it shows up as a wall of text, like this:

Hi, Person needs a full breakdown Important information. Would also would like confirmation in a letter about what kinds of assistance  she is not eligible for if possible. Would prefer sent to email. Thanks

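This is obviously much harder to read.

When I use innerHTML and inspect the string, all of the line breaks are actually space characters, so I can't use Replace on them.

I have searched and tried many different things, but I can't find a way to display the text so that it is easy to read.

The web page is on our work intranet and sits behind a login (multiple people using the spreadsheet will use that login, so I can't automate this).

Example of preferred output: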

Hi,

Person needs a full breakdown Important information.

Would also would like confirmation in a letter about what kinds of assistance she is not eligible for if possible.

Would prefer sent to email.

Any advice would be appreciated.

Tags: vba, excel

Solution


Perhaps add a line break after each occurrence of a period?

For example, replace "." with "." & vbCrLf.
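
The idea could be sketched in VBA roughly as follows. This is only a sketch under stated assumptions: BreakAfterPeriods and DemoBreakAfterPeriods are hypothetical names, the scraped text is assumed to already be held in a string, and the trailing &nbsp; is assumed to arrive as character 160. It replaces ". " (a period followed by a space) rather than every period, which departs slightly from the literal suggestion so that values such as 3.14 are not split.

```vba
Option Explicit

' Hypothetical helper: inserts a line break after each period that is
' followed by a space. Assumes sentences in the scraped text end with ". ".
Public Function BreakAfterPeriods(ByVal rawText As String) As String
    Dim result As String

    ' Assumption: &nbsp; arrives as character 160; treat it as a normal space.
    result = Replace(rawText, ChrW(160), " ")

    ' Drop leading and trailing spaces left over from the HTML layout.
    result = Trim$(result)

    ' Replace ". " (not "." alone) so decimals such as 3.14 stay intact.
    result = Replace(result, ". ", "." & vbCrLf)

    BreakAfterPeriods = result
End Function

' Hypothetical demo using part of the sample text from the question.
Public Sub DemoBreakAfterPeriods()
    Dim sample As String

    sample = "Hi, Person needs a full breakdown Important information. " & _
             "Would prefer sent to email. Thanks"

    ' Prints the text to the Immediate window, one sentence per line.
    Debug.Print BreakAfterPeriods(sample)
End Sub
```

This will not reproduce the break after "Hi," in the preferred output, because that line ends with a comma rather than a period. If the result is written to a worksheet cell rather than printed, vbLf with wrap text enabled may display more cleanly than vbCrLf.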

