Problem extracting JSON from a cURLed HTML file

Problem description

I can't post the entire code example, because the cURL request requires authentication against a system I can't share access to.

The cURL request works fine. The page is UTF-8 encoded. I pass the HTML to simple_html_dom and extract the JSON, which is stored in an attribute of an HTML element.

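However, when I try to json_decode it, I'm told there is a syntax error, even though JSONLint says it is valid. From what I've been able to find, it's some kind of encoding mismatch. The JSON appears to be ASCII (after coming out of simple_html_dom), and I've tried everything I could find (iconv, mb_convert_encoding, utf8_encode), but to no avail.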

Here is the HAR file from devtools, for anyone willing to help work this out: https://fitaf570.com/sud/thrivecart.com_Archive%20[21-05-26%2009-30-28].har

The JSON we are trying to extract is:

{"optin":{"icon":"/static/images/autoresponder.check.png","events":[]},"abandon":{"icon":"/static/images/autoresponder.abandon.png","events":{"1182":{"action":"abandon","geo":"NON-EU","list":"START Abandon MACB","mode":"add","provider":"integration.activecampaign","subtype":"tag","trigger_days":"1"},"4257":{"action":"abandon","geo":"NON-EU","list":"25","mode":"add","provider":"integration.activecampaign","subtype":"list","trigger_days":"1"}}},"purchase":{"icon":"/static/images/autoresponder.purchase.png","events":{"3135":{"action":"purchase","geo":"NON-EU","list":"6","mode":"add","provider":"integration.activecampaign","subtype":"list","trigger_days":"1"},"5581":{"action":"purchase","geo":"NON-EU","list":"PURCHASE: MACB","list_freeform":"1","mode":"add","provider":"integration.activecampaign","subtype":"new-tag","trigger_days":"1"},"9922":{"action":"purchase","geo":"NON-EU","list":"25","mode":"add","provider":"integration.activecampaign","subtype":"list","trigger_days":"1"}}},"bump":{"icon":"/static/images/autoresponder.bumppurchase.png","events":{"3883":{"action":"bump","geo":"NON-EU","list":"PURCHASE: Parents Insight 
PDF","list_freeform":"1","mode":"add","provider":"integration.activecampaign","subtype":"new-tag","trigger_days":"1"}}},"affiliate_signup":{"icon":"/static/images/autoresponder.affiliatesignup.png","events":[]},"refund":{"icon":"/static/images/autoresponder.refund.png","events":[]},"refund_bump":{"icon":"/static/images/autoresponder.refund.png","events":[]},"decline":{"icon":"/static/images/autoresponder.decline.png","events":[]},"refund_recur":{"icon":"/static/images/autoresponder.refund_recur.png","events":[]},"recur_fail":{"icon":"/static/images/autoresponder.recur_fail.png","events":[]},"recur_fail_1":{"icon":"/static/images/autoresponder.recur_fail.png","events":[]},"recur_fail_2":{"icon":"/static/images/autoresponder.recur_fail.png","events":[]},"recur_fail_3":{"icon":"/static/images/autoresponder.recur_fail.png","events":[]},"recur_success":{"icon":"/static/images/autoresponder.recur_success.png","events":[]},"recur_cancel":{"icon":"/static/images/autoresponder.recur_cancel.png","events":[]},"recur_finish":{"icon":"/static/images/autoresponder.recur_success.png","events":[]},"dunning_pre":{"icon":"/static/images/autoresponder.dunning_pre.png","events":[]},"dunning_due":{"icon":"/static/images/autoresponder.dunning_due.png","events":[]},"dunning_post":{"icon":"/static/images/autoresponder.dunning_post.png","events":[]},"expiry_pre":{"icon":"/static/images/autoresponder.dunning_pre.png","events":[]},"expiry_post":{"icon":"/static/images/autoresponder.dunning_post.png","events":[]}}

Here is what I've tried:

$dom = new simple_html_dom();
$dom->load($html); //cURL result
$ar=$dom->find('.ui-autoresponder-provider');
$act=stripslashes($ar[0]->attr['data-actions']);

$act=iconv(mb_detect_encoding($act, mb_detect_order(), true), "UTF-8", $act); 
$act=iconv(mb_detect_encoding($act), "UTF-8//TRANSLIT//IGNORE",$act);
$act=mb_convert_encoding($act,"UTF-8");

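I've tried the last 3 lines individually and in sequence, along with other lines I discarded before posting. json_decode always results in a syntax error.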
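
Before trying further encoding conversions, it may help to look at what json_decode is actually rejecting. The following is a minimal diagnostic sketch, not code from the original post: `diagnose_json` is a hypothetical helper name, the sample input is an assumed example of an attribute value with HTML entities left encoded, and the mbstring extension is assumed to be available.

```php
<?php
// Hypothetical helper (not from the original post): reports why json_decode
// rejects a string and flags a few common culprits.
function diagnose_json(string $raw): array
{
    json_decode($raw);
    return [
        // Human-readable reason for the most recent json_decode failure.
        'error'        => json_last_error_msg(),
        // HTML entities such as &quot; remain if the attribute was never entity-decoded.
        'has_entities' => preg_match('/&(?:quot|amp|#\d+|#x[0-9a-f]+);/i', $raw) === 1,
        // Raw control characters (for example a literal newline) are only a hint:
        // they are legal between tokens but not inside JSON string values.
        'has_control'  => preg_match('/[\x00-\x1F]/', $raw) === 1,
        // False would point at a genuine encoding problem.
        'valid_utf8'   => mb_check_encoding($raw, 'UTF-8'),
    ];
}

// Assumed sample: an attribute value whose quotes are still HTML-encoded.
$sample = '{&quot;list&quot;:&quot;25&quot;}';
$report = diagnose_json($sample);
print_r($report);
```

If `valid_utf8` is true while `has_entities` or `has_control` is set, the problem is probably not a character-encoding mismatch, and iconv or mb_convert_encoding would not be expected to help.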

Any help is greatly appreciated.

Tags: php, json, curl, character-encoding, simple-html-dom

Solution
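
One possibility worth checking, stated as an assumption because the raw attribute value is not shown in the post: JSON embedded in an HTML attribute normally has its quotes written as HTML entities (`&quot;`), and simple_html_dom may hand the attribute back without decoding them. In that case json_decode reports a syntax error even though the visible text validates in JSONLint. The sketch below decodes the entities first and leaves out stripslashes, since backslashes are part of JSON's own escaping. `decode_attr_json` is a hypothetical helper name, the sample attribute is a shortened stand-in for the real `data-actions` value, and `JSON_THROW_ON_ERROR` assumes PHP 7.3 or later.

```php
<?php
// Hypothetical helper (not from the original post): decode a JSON document
// that was stored in an HTML attribute.
function decode_attr_json(string $attr): array
{
    // Undo HTML entity encoding (&quot; -> ", &amp; -> &) before parsing as JSON.
    $json = html_entity_decode($attr, ENT_QUOTES | ENT_HTML5, 'UTF-8');

    // No stripslashes() here: removing backslashes can corrupt valid JSON escapes.
    // JSON_THROW_ON_ERROR (PHP 7.3+) raises JsonException instead of returning null.
    return json_decode($json, true, 512, JSON_THROW_ON_ERROR);
}

// Assumed sample: a shortened attribute value with entity-encoded quotes.
$attr = '{&quot;optin&quot;:{&quot;icon&quot;:&quot;/static/images/autoresponder.check.png&quot;,&quot;events&quot;:[]}}';
$data = decode_attr_json($attr);
echo $data['optin']['icon'], PHP_EOL;
```

If the attribute also contains a raw line break inside a string value, json_decode fails with a control character error rather than a syntax error, and that would need to be handled separately.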

