python - How to parse JSON embedded in HTML with BeautifulSoup
Question
I currently have the following:
<script>window.__NUXT__=(function(a,b,c,d,e,f,g,h,i){i.date="2020-11-05 09:22:56.000000";i.timezone_type=d;i.timezone="UTC";return {layout:"default",data:[{}],error:a,state:{languages:{text:{},javascript:{mime:"text\u002Fjavascript"},css:{mime:"text\u002Fcss"},html:{directory:"htmlmixed",mime:"text\u002Fhtml"},vue:{directory:"vue",mime:"text\u002Fx-vue"},php:{directory:"php",mime:"application\u002Fx-httpd-php"},c:{directory:b,mime:"text\u002Fx-c++src"},csharp:{directory:b,mime:"text\u002Fx-csharp"},java:{directory:b,mime:"text\u002Fx-java"},lua:{directory:"lua",mime:"text\u002Fx-lua"},golang:{directory:"go",mime:"text\u002Fx-go"},dockerfile:{directory:"dockerfile",mime:"text\u002Fx-dockerfile"}},pastes:{liWq2S3:{status:c,id:e,title:f,paste:g,views:d,syntax:a,size:h,created_at:i}},paste:{title:f,status:c,id:e,paste:g,views:d,syntax:a,size:h,created_at:i}},serverRendered:c}}(null,"clike",true,3,"liWq2S3","Soup","Apple\nOrange\nCake\nPizza",23,{}));</script><script src="/assets/2135194c1f343036c318.js" defer></script><script src="/assets/fbb38f3d2f4d64c9376c.js" defer></script><script src="/assets/ad6678a738ac39c210fc.js" defer></script><script src="/assets/a93fe408ab9aa8104b17.js" defer></script><script src="/assets/f78a487814d7850007a6.js" defer></script>
I want to parse the title (Soup) and the description (Apple\nOrange\nCake\nPizza), but I can't find any resource that helps. I have been stuck for a while. I stumbled upon this, but I still can't work out the solution myself.
My code:
import requests
from bs4 import BeautifulSoup
import json
ExampleSite = "https://throwbin.io/liWq2S3"
r1 = requests.get(ExampleSite)
r1text = r1.text
soup = BeautifulSoup(r1text, features="html.parser")
# Take the second <script> tag on the page
ParsedSoup = soup.find_all('script')[1]
print(ParsedSoup)
Solution
This doesn't look like JSON to me, but I could be wrong. Either way, you can use a regex to pull those values out.
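import re
import requests
from bs4 import BeautifulSoup
soup = BeautifulSoup(requests.get("https://throwbin.io/liWq2S3").text, features="html.parser")
# Capture the argument list passed to the immediately invoked function: }(...));
bins_contents = re.search(r"}\((.*)\)\);", str(soup), re.S).group(1)
# In the script shown in the question, the title is argument 5 and the paste text is argument 6
print(bins_contents.replace('"', "").split(",")[5:7])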
Output:
['Soup', 'Apple\\nOrange\\nCake\\nPizza']
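Splitting on commas breaks as soon as a value contains a comma, and it leaves the `\n` escapes undecoded. As a rough sketch of an alternative: the argument list in the script above happens to consist only of JSON-compatible literals (`null`, `true`, numbers, strings, `{}`), so it can be wrapped in brackets and handed to `json.loads`, then matched to the function's parameter names. The names `SAMPLE_HTML`, `extract_nuxt_args`, and `extract_paste` are made up for this illustration, and `SAMPLE_HTML` is a shortened copy of the script from the question rather than the live page. The sketch assumes the fields of interest are bound to parameters (as in `title:f`) and that no string argument contains `));`.

```python
import json
import re

# Shortened copy of the <script> from the question (hypothetical stand-in
# for the text returned by requests.get(...).text).
SAMPLE_HTML = (
    '<script>window.__NUXT__=(function(a,b,c,d,e,f,g,h,i){'
    'return {state:{paste:{title:f,status:c,id:e,paste:g,views:d,'
    'syntax:a,size:h,created_at:i}},serverRendered:c}}'
    '(null,"clike",true,3,"liWq2S3","Soup",'
    '"Apple\\nOrange\\nCake\\nPizza",23,{}));</script>'
)


def extract_nuxt_args(html):
    """Map each parameter name of the wrapped function to its argument value."""
    params = re.search(r"function\(([^)]*)\)", html).group(1).split(",")
    # Non-greedy capture of everything between "}(" and the first "));"
    raw_args = re.search(r"\}\((.*?)\)\);", html, re.S).group(1)
    # Assumption: every argument is a valid JSON literal.
    values = json.loads("[" + raw_args + "]")
    return dict(zip(params, values))


def extract_paste(html):
    """Return (title, paste text) by following title:<var> and paste:<var>."""
    args = extract_nuxt_args(html)
    title_var = re.search(r"\btitle:(\w+)", html).group(1)
    # "paste:{" is skipped because "{" is not a word character
    paste_var = re.search(r"\bpaste:(\w+)", html).group(1)
    return args[title_var], args[paste_var]


title, paste = extract_paste(SAMPLE_HTML)
print(title)               # Soup
print(paste.splitlines())  # ['Apple', 'Orange', 'Cake', 'Pizza']
```

Looking the values up by parameter name rather than by position means the result should survive a reordering of the arguments, though it would still fail if the page started inlining literals or passing non-JSON values such as `undefined`.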