javascript - 试图从 HTML 块中提取文本字符串
问题描述
我正在尝试从一段 HTML 中提取文本字符串(文章标题)。在这种情况下,它是“记者据称以滑稽愚蠢的方式监视竞争对手的 Zoom 会议”。
问题是,标题没有我能看到的任何标识符。它在 HTML 中的一些地方,但它所在的 div 没有稳定的名称。
我试过:
var url = $(uCW).find('[href^="https://l.facebook"]').text();
但是得到错误的文本块。(uCW 是我给包含在其中的 div 的变量名——它可以很好地在此处获取其他信息)。真的,我很难弄清楚如何选择它——理论上我可以指定所有东西所在的确切孩子,但孩子变化很大,我想使用更稳定的方法。
<div class="_1dwg _1w_m _q7o" data-vc-ignore-dynamic="1">
<div></div>
<div class="_4r_y">
<div class="_6a uiPopover _5pbi _cmw _b1e _1wbl" id="u_fetchstream_4_6"><a aria-label="Story options" data-testid="post_chevron_button" class="_4xev _p" aria-haspopup="true" aria-expanded="false" rel="toggle" href="#" role="button" id="u_fetchstream_4_7"></a></div>
</div>
<div>
<div class="v_zhq0t5rr6 _5eit p_zhq0tbcbb clearfix">
<div class="clearfix c_zhq0t5thj">
<a target="" class="_5pb8 u_zhq0tbcb8 _8o _8s lfloat _ohe" title="Person" aria-hidden="true" tabindex="-1" data-ft="{"tn":"m"}" href="https://www.facebook.com/j.newsham?fref=nf&__tn__=%2Cdm-R-R&eid=ARBC4Tpii73ko-nTTzvjgbhv8Uvq1GIHitUe_IHE0Ksi1su-LTuENPi9GCWskRJMLwp4VMol7R2filWQ" data-hovercard="/ajax/hovercard/user.php?id=675172323&extragetparams=%7B%22__tn__%22%3A%22%2Cdm-R-R%22%2C%22eid%22%3A%22ARBC4Tpii73ko-nTTzvjgbhv8Uvq1GIHitUe_IHE0Ksi1su-LTuENPi9GCWskRJMLwp4VMol7R2filWQ%22%7D" data-hovercard-prefer-more-content-show="1">
<div class="_38vo">
<!-- react-mount-point-unstable -->
<div><img class="_s0 _4ooo _5xib _5sq7 _44ma _rw img" src="https://scontent-ort2-2.xx.fbcdn.net/v/t1.0-1/p112x112/38427941_10156325214622324_8412493305270501376_n.jpg?_nc_cat=110&_nc_sid=dbb9e7&_nc_ohc=e5WgZHVuabcAX-4npCK&_nc_ht=scontent-ort2-2.xx&_nc_tp=6&oh=eb95679d9ee7fb5be65b6bdb23dcf7b2&oe=5ECE2ADB" alt="" aria-label="Person" role="img"></div>
</div>
</a>
<div class="clearfix _42ef">
<div class="rfloat _ohf"></div>
<div class="l_zhq0t5thg">
<div>
<div class="_6a _5u5j">
<div class="_6a _6b" style="height:40px"></div>
<div class="_6a _5u5j _6b">
<h5 class="_7tae _14f3 _14f5 _5pbw _5vra" data-ft="{"tn":"C"}" id="js_9e"><span class="fwn fcg"><span class="fwb fcg" data-ft="{"tn":";"}"><a title="Person" href="https://www.facebook.com/j.newsham?__tn__=%2CdC-R-R&eid=ARBQdCphQpNyE52IVRqnH7bi35xke_7h8ucoRhm-SykkuyeLTHQwdjplzLmwjPJI_2_SlLcyDWm9pGoB&hc_ref=ARRHpOlTgvosbrodRaKBuoiUQmaEP0kbw6SoEqUpbxJ-qgG56wADKG8zO652g3vacIc&fref=nf" data-hovercard="/ajax/hovercard/user.php?id=675172323&extragetparams=%7B%22__tn__%22%3A%22%2CdC-R-R%22%2C%22eid%22%3A%22ARBQdCphQpNyE52IVRqnH7bi35xke_7h8ucoRhm-SykkuyeLTHQwdjplzLmwjPJI_2_SlLcyDWm9pGoB%22%2C%22hc_ref%22%3A%22ARRHpOlTgvosbrodRaKBuoiUQmaEP0kbw6SoEqUpbxJ-qgG56wADKG8zO652g3vacIc%22%2C%22fref%22%3A%22nf%22%7D" data-hovercard-prefer-more-content-show="1" data-hovercard-referer="ARRHpOlTgvosbrodRaKBuoiUQmaEP0kbw6SoEqUpbxJ-qgG56wADKG8zO652g3vacIc">Person</a></span></span></h5>
<div class="_5pcp _5lel _2jyu _232_" id="feed_subtitle_675172323:7304407797214710582" data-testid="story-subtitle">
<span class="z_zhq0t6o5b"><span class="fsm fwn fcg"><a class="_5pcq" href="/j.newsham/posts/10157963951497324" target=""><abbr data-utime="1588120184" title="Tuesday, April 28, 2020 at 7:29 PM" data-shorten="1" class="_5ptz timestamp livetimestamp"><span class="timestampContent" id="js_9f">16 hrs</span></abbr></a></span></span><span class="_6spk" role="presentation" aria-hidden="true"> · </span>
<div class="_6a _29ee _4f-9 _43_1" data-hover="tooltip" data-tooltip-content="Shared with: Person's friends" role="img" aria-label="Shared with: Person's friends"><span><i class="_1lbg img sp_Ke6ZUJH-N4S_1_5x sx_73b6dc"></i></span></div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
<div class="userContent"></div>
<div class="_3x-2" data-ft="{"tn":"H"}">
<div data-ft="{"tn":"H"}">
<div class="mtm">
<div id="u_fetchstream_4_1" class="_6m2 _1zpr clearfix _dcs _4_w4 _41u- _59ap _2bf7 _64lx _3eqz _20pq _3eqw _2rk1 _359m _3n1j _5qqr" data-ft="{"tn":"H"}">
<div class="clearfix _2r3x">
<div class="lfloat _ohe">
<span class="_3m6-">
<div class="_63yw">
<div class="_6ks">
<a href="https://gizmodo.com/journalist-allegedly-spied-on-zoom-meetings-of-rivals-i-1843125262?utm_campaign=Gizmodo&utm_content&utm_medium=SocialMarketing&utm_source=facebook&fbclid=IwAR3MOk2OqjX3z6DNKgdmlVDtcYQz4xIx-CRsQOuV39hVGZR_U-TjgqTKSHQ" aria-describedby="u_fetchstream_4_3" aria-label="Journalist Allegedly Spied on Zoom Meetings of Rivals in Hilariously Dumb Ways" tabindex="-1" target="_blank" rel="noopener nofollow" data-lynx-mode="asynclazy" data-lynx-uri="https://l.facebook.com/l.php?u=https%3A%2F%2Fgizmodo.com%2Fjournalist-allegedly-spied-on-zoom-meetings-of-rivals-i-1843125262%3Futm_campaign%3DGizmodo%26utm_content%26utm_medium%3DSocialMarketing%26utm_source%3Dfacebook%26fbclid%3DIwAR3MOk2OqjX3z6DNKgdmlVDtcYQz4xIx-CRsQOuV39hVGZR_U-TjgqTKSHQ&h=AT0v6E7lQPPlUT-t8yQbu0DBEukuzdXli3s4pdRZxCF9EVtUE0omFYcc-fOtFYQJIHWOVgDfrGhVsH4T3uqimv560qNSBhRnwdM_iCwl4BQJ1f9r5rrk9K1zibH3nA9ZhUT6-YdcIkm7lBZtJYn6SKbWmmPzJsBUI-LcjNoQHXw">
<div class="accessible_elem inlineBlock" id="u_fetchstream_4_3">Financial Times reporter Mark Di Stefano allegedly spied on Zoom meetings at rival newspapers the Independent and the Evening Standard to get scoops on staff cuts and furloughs due to the coronavirus pandemic, according to a report from the UK’s Independent. And he did a comically bad job of cover...</div>
<div class="_6l- __c_">
<div class="uiScaledImageContainer _6m5 fbStoryAttachmentImage" style="width:514px;height:268.42222222222px;"><img class="scaledImageFitWidth img" src="https://external-ort2-2.xx.fbcdn.net/safe_image.php?d=AQCX1CnigNk3SZXL&w=540&h=282&url=https%3A%2F%2Fi.kinja-img.com%2Fgawker-media%2Fimage%2Fupload%2Fc_fill%2Cf_auto%2Cfl_progressive%2Cg_center%2Ch_675%2Cpg_1%2Cq_80%2Cw_1200%2Fy3dmfzz6ktqefakczlow.jpg&cfs=1&upscale=1&fallback=news_d_placeholder_publisher&_nc_hash=AQAApLwXk6n73twX" data-src="https://external-ort2-2.xx.fbcdn.net/safe_image.php?d=AQCX1CnigNk3SZXL&w=540&h=282&url=https%3A%2F%2Fi.kinja-img.com%2Fgawker-media%2Fimage%2Fupload%2Fc_fill%2Cf_auto%2Cfl_progressive%2Cg_center%2Ch_675%2Cpg_1%2Cq_80%2Cw_1200%2Fy3dmfzz6ktqefakczlow.jpg&cfs=1&upscale=1&fallback=news_d_placeholder_publisher&_nc_hash=AQAApLwXk6n73twX" style="top:0px;" alt="" width="514" height="269" aria-label="photo of Journalist Allegedly Spied on Zoom Meetings of Rivals in Hilariously Dumb Ways image"></div>
</div>
</a>
</div>
<a class="_34js _8o63 _1kaa _34jt _34ju _2cpc" ajaxify="/feed/article_context/dialog/?share_id=10157963951502324&entry_type=news_feed_learn_more&trigger_log_id=bd3a8ea2-29fb-4ee2-b335-c27d26be3c85&ft_msg=mf_story_key.10157963951497324%3Atop_level_post_id.10157963951497324%3Atl_objid.10157963951497324%3Acontent_owner_id_new.675172323%3Athrowback_story_fbid.10157963951497324%3Astory_location.4%3Astory_attachment_style.share" href="#" rel="dialog-post" data-ft="{"tn":"-T"}" role="button" data-hover="tooltip" data-tooltip-content="Show more information about this link" data-tooltip-alignh="right" id="u_fetchstream_4_8"><i class="_34k2"></i></a>
</div>
<div class="_3ekx _29_4">
<div class="_6m3 _--6">
<div class="_59tj _2iau">
<div>
<div class="_6lz _6mb _1t62 ellipsis">gizmodo.com</div>
<div class=""></div>
</div>
</div>
<div class="_3n1k">
<div class="mbs _6m6 _2cnj _5s6c"><a href="https://l.facebook.com/l.php?u=https%3A%2F%2Fgizmodo.com%2Fjournalist-allegedly-spied-on-zoom-meetings-of-rivals-i-1843125262%3Futm_campaign%3DGizmodo%26utm_content%26utm_medium%3DSocialMarketing%26utm_source%3Dfacebook%26fbclid%3DIwAR20yuiGAWmKatwN2MwmTXyBmz529Gwnb-h604xwyDNop7FiMX_hTwNDlE8&h=AT0XhG7ILFntZMvv9JimeFCtFMKTLchXKAVbYAyo7kl_dEkPltCRPbpLOroCd6pbCd0hzuD0Mvogr-cL0SEFRrLD0kkhcBkp6GrpjoTaYQwUSt7ReNTshqXkHGCYhAm6hb8qDKcZm3O0mEWUtgLM7_ALdSGyX9DyclB6OlIgsXg" rel="noopener nofollow" target="_blank" data-lynx-mode="asynclazy">Journalist Allegedly Spied on Zoom Meetings of Rivals in Hilariously Dumb Ways</a></div>
<div class="_6m7 _3bt9">Financial Times reporter Mark Di Stefano allegedly spied on Zoom meetings at rival newspapers the Independent and the Evening Standard to get scoops on staff cuts and furloughs due to the coronavirus pandemic, according to a report from the UK’s Independent. And he did a comically bad job of cover...</div>
</div>
</div>
<a href="https://l.facebook.com/l.php?u=https%3A%2F%2Fgizmodo.com%2Fjournalist-allegedly-spied-on-zoom-meetings-of-rivals-i-1843125262%3Futm_campaign%3DGizmodo%26utm_content%26utm_medium%3DSocialMarketing%26utm_source%3Dfacebook%26fbclid%3DIwAR2sRF3AjujE4KgspWs5ltmxgtABX46iAmdHGCVxDmWSzYu93cO_d1EMMfc&h=AT1duKty7qVugflB4dskMMBn6j1M0FJ-cneezEPDTrI6c2IcEKkCT1YZ6-8Bw2oad-n0gZZBFU5Mk-iTNkLo-up1anlYj_l_pIvZEVXz-2WPYAeQrILewicbiMd8Gj6ziLDys5z7PLZy2syfD1-HTufQ12efucyRp3hHa8mCcvGyPH1jtw" aria-label="Journalist Allegedly Spied on Zoom Meetings of Rivals in Hilariously Dumb Ways" aria-describedby="u_fetchstream_4_2" rel="noopener nofollow" tabindex="-1" target="_blank" class="_52c6" data-lynx-mode="asynclazy">
<div class="accessible_elem" id="u_fetchstream_4_2">Financial Times reporter Mark Di Stefano allegedly spied on Zoom meetings at rival newspapers the Independent and the Evening Standard to get scoops on staff cuts and furloughs due to the coronavirus pandemic, according to a report from the UK’s Independent. And he did a comically bad job of cover...</div>
</a>
</div>
</span>
</div>
<div class="_42ef"><span class="_3c21"></span></div>
</div>
</div>
</div>
</div>
</div>
<div></div>
</div>
</div>
解决方案
例如,您可以通过选择以 l.facebook 开头的 href 来获取文本,该 href 包含一个具有类名的元素,accessible_elem
因为:has()
该元素包含文本。
var copy = $(uCW).find('[href^="https://l.facebook"]:has(".accessible_elem")')
.find(".accessible_elem").text();
更新:如评论所述,这不针对想要的文本。相反,可以读出此链接的 aria-label 属性,因为它包含正确的文本:
var copy = $(ucw).find('[href^="https://l.facebook"]:has(".accessible_elem")').attr("aria-label");
推荐阅读
- html - 在不使用引导程序的情况下在导航标签中设置文本颜色的正确 CSS 语法是什么?
- azure - Azure B2C 自定义策略/自定义 ui - 从不同页面嵌入“密码重置”和“注册”按钮
- python - WSGI 服务器忽略 Flask config.from_object
- java - 如何使用多个索引查询 DynamoDB?
- sql - 如何为每个国家/地区选择最大的邮政编码
- flutter - 如何存储和检索列表
达到坚持 - docker - Nginx 重定向到后端-nginx 服务器返回错误的端口
- javascript - socket.io房间名称如何从前端传递到后端?
- django - ValidationError 未显示在模板中(尽管字段验证错误)
- reactjs - 如何将评级保存到本地存储,使其在刷新后不会消失?