json - Grep command questions - Grep text from program output?
Problem description
I'm trying to extract information from the JSON file written by youtube-dl and grep some of it into a .txt file.
For example, this is the output from youtube-dl when downloading a video:
[info] Writing video description to: /Users/ACCOUNT/Downloads/Rick Astley - Never Gonna Give You Up (Video).description
[info] Writing video description metadata as JSON to: /Users/ACCOUNT/Downloads/Rick Astley - Never Gonna Give You Up (Video).info.json
My thinking
- Grep the .json and .description file paths so they can be used in later grep commands.
- Run a working version of the script below so that it adds the new text above the description text in the .description file.
- (Rename .description to .txt)
I prefer this method because youtube-dl only needs to run once.
If there are other universal commands that work on both Mac and Linux, as grep does, and make this simpler, then I see no problem with using them instead of grep.
QUESTIONS
- How do I grep the file paths and use them in the other commands shown in the script example below?
- How do I run the script below so that it adds all that information above the current description text in that text file?
- When it gets information from the JSON file it also gets the " before and after. So a video name becomes "VIDEO NAME", but I want VIDEO NAME only.
- How do I grep the TAGS from the JSON file? Tags look like this in the .json file: "tags": ["music", "video", "classic"]. I want to get "music", "video", "classic".
Script example
txtfile="$GREP_DESCRIPTION_FROM_YOUTUBE-DL_OUTPUT"
jsonfile="$GREP_JSON_FROM_YOUTUBE-DL_OUTPUT"
echo TITLE >> $txtfile
grep -o '"title": *"[^"]*"' $jsonfile | grep -o '"[^"]*"$' >> $txtfile
echo \ >> $txtfile
echo CHANNEL >> $txtfile
grep -o '"uploader": *"[^"]*"' $jsonfile | grep -o '"[^"]*"$' >> $txtfile
echo \ >> $txtfile
echo CHANNEL URL >> $txtfile
grep -o '"uploader_url": *"[^"]*"' $jsonfile | grep -o '"[^"]*"$' >> $txtfile
echo \ >> $txtfile
echo UPLOAD DATE >> $txtfile
grep -o '"upload_date": *"[^"]*"' $jsonfile | grep -o '"[^"]*"$' >> $txtfile
echo \ >> $txtfile
echo TAGS >> $txtfile
grep -o '"tags": *"[^"]*"' $jsonfile | grep -o '"[^"]*"$' >> $txtfile
echo \ >> $txtfile
echo URL >> $txtfile
echo $url >> $txtfile
echo \ >> $txtfile
echo DESCRIPTION >> $txtfile
Solution
$ youtube-dl --help | grep "dump-json"
-j, --dump-json Simulate, quiet but print JSON information.
With this option there is no need to download the video at all. Just pipe the output of youtube-dl to an appropriate JSON parser. I would recommend xidel.
$ youtube-dl -j https://www.youtube.com/watch?v=dQw4w9WgXcQ | xidel -se '
$json/(
"- TITLE -",title,"",
"- CHANNEL -",uploader,"",
"- CHANNEL URL -",uploader_url,"",
"- UPLOAD DATE -",upload_date,"",
"- URL -",webpage_url,"",
"- TAGS -",translate(serialize(tags,{"method":"json"}),"[]",""),"",
"- DESCRIPTION -",description
)
'
(An alternative way to format the tags: join((tags)() ! x""{.}"",",") )
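If installing xidel is not an option, the quote and tag problems from the question can also be handled with grep and sed alone. The following is only a rough sketch: the sample JSON, the file name and the helper names json_string and json_array are made up for illustration, and it assumes flat values without escaped quotes. Matching JSON with regular expressions is fragile, and a key that occurs more than once in the file yields several lines, so a real JSON parser remains the safer choice.

```shell
#!/bin/sh
# Hypothetical sample data standing in for a real .info.json file.
workdir=$(mktemp -d "${TMPDIR:-/tmp}/ytdl-demo.XXXXXX")
jsonfile="$workdir/sample.info.json"
printf '%s\n' '{"title": "VIDEO NAME", "uploader": "CHANNEL NAME", "tags": ["music", "video", "classic"]}' > "$jsonfile"

# json_string KEY FILE: print a simple string value without the surrounding quotes.
# Assumes the value contains no escaped quotes (\"), like the grep in the question.
json_string() {
    grep -o "\"$1\": *\"[^\"]*\"" "$2" | sed 's/^"[^"]*": *"//; s/"$//'
}

# json_array KEY FILE: print the contents of a flat array without the brackets.
json_array() {
    grep -o "\"$1\": *\[[^]]*]" "$2" | sed 's/^"[^"]*": *\[//; s/]$//'
}

title=$(json_string title "$jsonfile")
tags=$(json_array tags "$jsonfile")

printf 'TITLE\n%s\n\nTAGS\n%s\n' "$title" "$tags"

# Clean up the throwaway sample directory.
rm -r "$workdir"
```

The first sed expression removes the key and the opening quote, the second removes the closing quote, so only the bare value is left.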
If you have already downloaded the video and the JSON (with --write-info-json, I assume), then you can retrieve the file name with --get-filename:
$ youtube-dl --get-filename https://www.youtube.com/watch?v=dQw4w9WgXcQ
Rick Astley - Never Gonna Give You Up (Video)-dQw4w9WgXcQ.mp4
$ jsonfile=$(youtube-dl --get-filename https://www.youtube.com/watch?v=dQw4w9WgXcQ)
$ xidel -s "${jsonfile/.mp4/.info}.json" -e '
$json/(
[...]
)
' > "${jsonfile/.mp4/.info}.txt"
Output of the command, or the contents of "Rick Astley - Never Gonna Give You Up (Video)-dQw4w9WgXcQ.info.txt":
- TITLE -
Rick Astley - Never Gonna Give You Up (Video)
- CHANNEL -
RickAstleyVEVO
- CHANNEL URL -
http://www.youtube.com/user/RickAstleyVEVO
- UPLOAD DATE -
20091024
- URL -
https://www.youtube.com/watch?v=dQw4w9WgXcQ
- TAGS -
"the boys soundtrack","the boys amazon prime","Never gonna give you up the boys","RickAstleyvevo","vevo","official","Rick Roll","video","music video","Rick Astley album","rick astley official","single","album","together forever","Never Gonna Give You Up","Whenever You Need Somebody","pop","rickrolled","WRECK-IT RALPH 2","Fortnite song Fortnite item shop Fortnite time shop today Fortnite montage","Fortnite event","Fortnite dance","fortnite never gonna give you up"
- DESCRIPTION -
Rick Astley's official music video for "Never Gonna Give You Up" Listen to Rick Astley: https://RickAstley.lnk.to/_listenYD Subscribe to the official Rick As...
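The question also asked how to put the new text above the existing text of the .description file and rename the result to .txt. A minimal sketch of one way to do that, assuming placeholder values for the title and URL and a throwaway sample file (none of these names come from youtube-dl itself):

```shell
#!/bin/sh
# Throwaway sample file standing in for the real .description file.
workdir=$(mktemp -d "${TMPDIR:-/tmp}/ytdl-demo.XXXXXX")
descfile="$workdir/sample.description"
printf '%s\n' 'Original description text.' > "$descfile"

# Placeholder values; in practice they would be read from the JSON file.
title='VIDEO NAME'
url='https://www.youtube.com/watch?v=dQw4w9WgXcQ'

# Strip the .description suffix and append .txt (POSIX parameter expansion).
outfile="${descfile%.description}.txt"

# Write the headers first, then the existing description, into the new file.
# The old file is removed only if the new one was written successfully.
{
    printf 'TITLE\n%s\n\n' "$title"
    printf 'URL\n%s\n\n' "$url"
    printf 'DESCRIPTION\n'
    cat "$descfile"
} > "$outfile" && rm "$descfile"

result=$(cat "$outfile")
printf '%s\n' "$result"

# Clean up the throwaway sample directory.
rm -r "$workdir"
```

Writing to a differently named file avoids reading and truncating the same file in one redirection, which would lose the original description.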
Actually, youtube-dl is not needed at all if this information is all you are after. Parsing the HTML source is enough:
$ xidel -s https://www.youtube.com/watch?v=dQw4w9WgXcQ -e '
"- TITLE -",//meta[@itemprop="name"]/@content,"",
"- CHANNEL -",//meta[@itemprop="channelId"]/@content,"",
"- CHANNEL URL -",//span[@itemprop="author"]/link/@href,"",
"- UPLOAD DATE -",//meta[@itemprop="datePublished"]/@content,"",
"- URL -",//meta[@property="og:url"]/@content,"",
"- TAGS -",join(//meta[@property="og:video:tag"]/x""{@content}"",","),"",
"- DESCRIPTION -",//meta[@itemprop="description"]/@content
'
The HTML source also contains a huge JSON with all the information you need. It is a bit harder to extract, but it can be done:
$ xidel -s https://www.youtube.com/watch?v=dQw4w9WgXcQ -e '
parse-json(//script/extract(.,"ytInitialPlayerResponse = (\{.+\})",1))/(
"- TITLE -",videoDetails/title,"",
"- CHANNEL -",videoDetails/channelId,"",
"- CHANNEL URL -",microformat//ownerProfileUrl,"",
"- UPLOAD DATE -",microformat//publishDate,"",
"- URL -","https://www.youtube.com/watch?v="||videoDetails/videoId,"",
"- TAGS -",translate(serialize(videoDetails/keywords,{"method":"json"}),"[]",""),"",
"- DESCRIPTION -",x:lines(videoDetails/shortDescription)[1]
)
'
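For completeness, the original idea of pulling the file paths out of youtube-dl's own output can be sketched with sed instead of grep. This assumes the log lines look exactly like the two quoted in the question (the wording may differ between versions); here they are stored in a variable rather than captured from a real run.

```shell
#!/bin/sh
# Sample log lines copied from the question. A real run would capture
# youtube-dl's output instead of using this hard-coded text.
log='[info] Writing video description to: /Users/ACCOUNT/Downloads/Rick Astley - Never Gonna Give You Up (Video).description
[info] Writing video description metadata as JSON to: /Users/ACCOUNT/Downloads/Rick Astley - Never Gonna Give You Up (Video).info.json'

# sed -n prints only the lines where the substitution matched, with the
# message prefix removed, which leaves just the path.
txtfile=$(printf '%s\n' "$log" | sed -n 's/^\[info] Writing video description to: //p')
jsonfile=$(printf '%s\n' "$log" | sed -n 's/^\[info] Writing video description metadata as JSON to: //p')

# The paths contain spaces, so the variables must stay quoted wherever they are used.
printf 'description: %s\njson: %s\n' "$txtfile" "$jsonfile"
```

The resulting variables can then replace the two placeholders at the top of the script in the question, as long as every later use is written as "$txtfile" and "$jsonfile" with the quotes.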