How to Parse JSON in HTML with BeautifulSoup

Question

I currently have the following:

<script>window.__NUXT__=(function(a,b,c,d,e,f,g,h,i){i.date="2020-11-05 09:22:56.000000";i.timezone_type=d;i.timezone="UTC";return {layout:"default",data:[{}],error:a,state:{languages:{text:{},javascript:{mime:"text\u002Fjavascript"},css:{mime:"text\u002Fcss"},html:{directory:"htmlmixed",mime:"text\u002Fhtml"},vue:{directory:"vue",mime:"text\u002Fx-vue"},php:{directory:"php",mime:"application\u002Fx-httpd-php"},c:{directory:b,mime:"text\u002Fx-c++src"},csharp:{directory:b,mime:"text\u002Fx-csharp"},java:{directory:b,mime:"text\u002Fx-java"},lua:{directory:"lua",mime:"text\u002Fx-lua"},golang:{directory:"go",mime:"text\u002Fx-go"},dockerfile:{directory:"dockerfile",mime:"text\u002Fx-dockerfile"}},pastes:{liWq2S3:{status:c,id:e,title:f,paste:g,views:d,syntax:a,size:h,created_at:i}},paste:{title:f,status:c,id:e,paste:g,views:d,syntax:a,size:h,created_at:i}},serverRendered:c}}(null,"clike",true,3,"liWq2S3","Soup","Apple\nOrange\nCake\nPizza",23,{}));</script><script src="/assets/2135194c1f343036c318.js" defer></script><script src="/assets/fbb38f3d2f4d64c9376c.js" defer></script><script src="/assets/ad6678a738ac39c210fc.js" defer></script><script src="/assets/a93fe408ab9aa8104b17.js" defer></script><script src="/assets/f78a487814d7850007a6.js" defer></script>

I want to extract the title (Soup) and the description (Apple\nOrange\nCake\nPizza), but I can't find any resource that helps, and I've been stuck for a while. I stumbled upon this, but I still can't work out a solution myself.

My code:

import requests
from bs4 import BeautifulSoup
import json

ExampleSite = "https://throwbin.io/liWq2S3"

r1 = requests.get(ExampleSite)
r1text = r1.text

soup = BeautifulSoup(r1text,features="html.parser")

ParsedSoup = soup.find_all('script')[1]

print(ParsedSoup)

Tags: python, python-3.x, beautifulsoup

Answer


This doesn't look like JSON to me, but I could be wrong. However, you can use a regex to pull those values out.

import re

import requests
from bs4 import BeautifulSoup

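# Fetch the page and parse it.
soup = BeautifulSoup(requests.get("https://throwbin.io/liWq2S3").text, features="html.parser")
# Capture the function call's argument list: everything between the first "}(" and the last "));".
bins_contents = re.search(r"}\((.*)\)\);", str(soup), re.S).group(1)
# Drop the double quotes, split on commas, and print the slice holding the two fields of interest.
print(bins_contents.replace('"', "").split(",")[4:6])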

Output:

['Soup', 'Apple\\nOrange\\nCake\\nPizza']
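
Splitting on commas breaks as soon as a title or paste contains a comma, and it leaves `\n` as two literal characters. As a rough sketch of a sturdier variant, the captured argument list can be wrapped in brackets and handed to `json.loads`. This assumes every argument is a JSON-compatible literal, which holds for the snippet quoted in the question but is not guaranteed for other pages. `SAMPLE_HTML` is a shortened stand-in for the real page and `extract_nuxt_args` is a hypothetical helper name; positions 5 and 6 are where the title and paste sit in the quoted snippet, so the live page may differ.

```python
import json
import re

# Shortened stand-in for the inline script from the question (same argument list).
SAMPLE_HTML = (
    '<script>window.__NUXT__=(function(a,b,c,d,e,f,g,h,i){'
    'return {state:{paste:{title:f,paste:g}}}}'
    '(null,"clike",true,3,"liWq2S3","Soup","Apple\\nOrange\\nCake\\nPizza",23,{}));'
    '</script>'
)


def extract_nuxt_args(html):
    """Return the call arguments of the __NUXT__ function as a Python list.

    Hypothetical helper: it only works when every argument is a
    JSON-compatible literal (null, true, numbers, strings, {} and so on).
    """
    match = re.search(r"\}\((.*)\)\);", html, re.S)
    if match is None:
        raise ValueError("no __NUXT__ argument list found")
    # Wrapping the comma-separated arguments in [] makes them a JSON array.
    return json.loads("[" + match.group(1) + "]")


args = extract_nuxt_args(SAMPLE_HTML)
# Assumption: in the quoted snippet the title is argument 5 and the paste is argument 6.
title, paste = args[5], args[6]
print(title)
print(paste.splitlines())
```

Because `json.loads` decodes the string escapes, `paste` contains real newlines and `splitlines()` yields one item per line of the paste.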
