Grep command questions - Grep text from program output?

Question

I'm trying to extract information from the JSON file that youtube-dl writes and grep some of it into a .txt file.

Example output from youtube-dl when downloading a video:

[info] Writing video description to: /Users/ACCOUNT/Downloads/Rick Astley - Never Gonna Give You Up (Video).description
[info] Writing video description metadata as JSON to: /Users/ACCOUNT/Downloads/Rick Astley - Never Gonna Give You Up (Video).info.json

My thinking

  1. Grep the .json and .description file paths from that output so they can be used in later grep commands.
  2. Run a working version of the script below, which adds the new text above the description text in the .description file.
  3. (Rename .description to .txt.)
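Step 1 needs nothing beyond sed, used here in a way that behaves the same with GNU sed (Linux) and BSD sed (macOS). A minimal sketch, assuming the youtube-dl messages have already been captured in a shell variable. The variable below is simply a hard-coded copy of the two example lines; in practice it might come from something like output=$(youtube-dl [your options] URL 2>&1).

```bash
#!/usr/bin/env bash
# Hard-coded copy of the two example log lines; in practice this would be
# the captured youtube-dl output (hypothetical capture, see above).
output='[info] Writing video description to: /Users/ACCOUNT/Downloads/Rick Astley - Never Gonna Give You Up (Video).description
[info] Writing video description metadata as JSON to: /Users/ACCOUNT/Downloads/Rick Astley - Never Gonna Give You Up (Video).info.json'

# sed -n prints nothing by default; the p flag prints a line only when the
# substitution succeeded, i.e. only the matching log line, minus its prefix.
descfile=$(printf '%s\n' "$output" | sed -n 's/^\[info] Writing video description to: //p')
jsonfile=$(printf '%s\n' "$output" | sed -n 's/^\[info] Writing video description metadata as JSON to: //p')

# Always quote the results: the paths contain spaces and parentheses.
printf 'description: %s\n' "$descfile"
printf 'json:        %s\n' "$jsonfile"
```

The two variables can then stand in for the two placeholder paths at the top of the script below.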

I prefer this method because youtube-dl only needs to run once.

If there are other universal commands that, like grep, work on both macOS and Linux and would make this simpler, I have no problem using them instead of grep.


QUESTIONS


Script example

    # Placeholders for the two paths taken from the youtube-dl output
    # (a hyphen is not valid in a shell variable name)
    txtfile="$DESCRIPTION_FILE_PATH"
    jsonfile="$JSON_FILE_PATH"

    # Quote every path: the file names contain spaces and parentheses
    echo TITLE >> "$txtfile"
    grep -o '"title": *"[^"]*"' "$jsonfile" | grep -o '"[^"]*"$' >> "$txtfile"
    echo >> "$txtfile"

    echo CHANNEL >> "$txtfile"
    grep -o '"uploader": *"[^"]*"' "$jsonfile" | grep -o '"[^"]*"$' >> "$txtfile"
    echo >> "$txtfile"

    echo CHANNEL URL >> "$txtfile"
    grep -o '"uploader_url": *"[^"]*"' "$jsonfile" | grep -o '"[^"]*"$' >> "$txtfile"
    echo >> "$txtfile"

    echo UPLOAD DATE >> "$txtfile"
    grep -o '"upload_date": *"[^"]*"' "$jsonfile" | grep -o '"[^"]*"$' >> "$txtfile"
    echo >> "$txtfile"

    echo TAGS >> "$txtfile"
    # "tags" is a JSON array: match from "[" to the first "]", then strip the brackets
    grep -o '"tags": *\[[^]]*]' "$jsonfile" | sed 's/^"tags": *\[//; s/]$//' >> "$txtfile"
    echo >> "$txtfile"

    echo URL >> "$txtfile"
    echo "$url" >> "$txtfile"
    echo >> "$txtfile"

    echo DESCRIPTION >> "$txtfile"
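The script above appends with >>, which always writes at the end of the file, so the header lands after the description instead of above it as step 2 intends. A minimal sketch of one way to prepend, using a temporary file. It assumes the .info.json is a single line (as youtube-dl normally writes it) and that string values contain no escaped quotes; both demo files and the field helper are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: write the header ABOVE the existing description text.
# ASSUMPTIONS: one-line .info.json, no escaped quotes inside string values.
set -eu

# Hypothetical demo files standing in for the real .info.json and .description
jsonfile=$(mktemp)
txtfile=$(mktemp)
printf '%s' '{"title": "Demo title", "uploader": "Demo channel"}' > "$jsonfile"
printf 'Original description text\n' > "$txtfile"

# Print the value of a string field without its quotes
# (every occurrence of the key in the file is printed)
field() {
  grep -o "\"$1\": *\"[^\"]*\"" "$jsonfile" | sed 's/^"[^"]*": *"//; s/"$//'
}

# ">>" always appends at the end, so build the header in a temporary file,
# append the original description to it, then replace the original file.
tmpfile=$(mktemp)
{
  echo TITLE;   field title;    echo
  echo CHANNEL; field uploader; echo
  echo DESCRIPTION
  cat "$txtfile"
} > "$tmpfile"
mv "$tmpfile" "$txtfile"

cat "$txtfile"
```

To use it for real, point jsonfile and txtfile at the two paths from the youtube-dl output and drop the two demo printf lines.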

Tags: json, bash, youtube, grep, youtube-dl

Answer


$ youtube-dl --help | grep "dump-json"
    -j, --dump-json                  Simulate, quiet but print JSON information.

With this option there's no need to download the video at all. Just pipe youtube-dl's output into a proper JSON parser. I'd recommend xidel:

$ youtube-dl -j https://www.youtube.com/watch?v=dQw4w9WgXcQ | xidel -se '
  $json/(
    "- TITLE -",title,"",
    "- CHANNEL -",uploader,"",
    "- CHANNEL URL -",uploader_url,"",
    "- UPLOAD DATE -",upload_date,"",
    "- URL -",webpage_url,"",
    "- TAGS -",translate(serialize(tags,{"method":"json"}),"[]",""),"",
    "- DESCRIPTION -",description
  )
'

(Alternative formatting for the tags: join((tags)() ! x""{.}"",","))

If you've already downloaded the video and the JSON (with --write-info-json, I assume), you can retrieve the filename with --get-filename:

$ youtube-dl --get-filename https://www.youtube.com/watch?v=dQw4w9WgXcQ
Rick Astley - Never Gonna Give You Up (Video)-dQw4w9WgXcQ.mp4

$ jsonfile=$(youtube-dl --get-filename https://www.youtube.com/watch?v=dQw4w9WgXcQ)

$ xidel -s "${jsonfile/.mp4/.info}.json" -e '
  $json/(
    [...]
  )
' > "${jsonfile/.mp4/.info}.txt"
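The substitution ${jsonfile/.mp4/.info} only works when youtube-dl picked an .mp4 container; with .webm or .mkv the pattern does not match, the name is left unchanged, and the command looks for a "....webm.json" file that does not exist. A small sketch that strips whatever extension --get-filename reports, using the example name above (the variable names are illustrative):

```bash
#!/usr/bin/env bash
# Sketch: derive the .info.json and .txt names from the media file name
# printed by `youtube-dl --get-filename`, whatever its extension is.
# The value below is the example output quoted above.
videofile='Rick Astley - Never Gonna Give You Up (Video)-dQw4w9WgXcQ.mp4'

base=${videofile%.*}          # remove the shortest ".ext" suffix
jsonfile="$base.info.json"
txtfile="$base.info.txt"

printf '%s\n' "$jsonfile" "$txtfile"
```

Both names can then go straight into the xidel call, e.g. xidel -s "$jsonfile" -e '...' > "$txtfile". ${var%.*} removes only the last dot-suffix, so dots elsewhere in the title are kept.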

The command's output, i.e. the contents of "Rick Astley - Never Gonna Give You Up (Video)-dQw4w9WgXcQ.info.txt":

- TITLE -
Rick Astley - Never Gonna Give You Up (Video)

- CHANNEL -
RickAstleyVEVO

- CHANNEL URL -
http://www.youtube.com/user/RickAstleyVEVO

- UPLOAD DATE -
20091024

- URL -
https://www.youtube.com/watch?v=dQw4w9WgXcQ

- TAGS -
"the boys soundtrack","the boys amazon prime","Never gonna give you up the boys","RickAstleyvevo","vevo","official","Rick Roll","video","music video","Rick Astley album","rick astley official","single","album","together forever","Never Gonna Give You Up","Whenever You Need Somebody","pop","rickrolled","WRECK-IT RALPH 2","Fortnite song Fortnite item shop Fortnite time shop today Fortnite montage","Fortnite event","Fortnite dance","fortnite never gonna give you up"

- DESCRIPTION -
Rick Astley's official music video for "Never Gonna Give You Up" Listen to Rick Astley: https://RickAstley.lnk.to/_listenYD Subscribe to the official Rick As...

Actually, if this information is all you're after, you don't need youtube-dl at all. Parsing the HTML source is enough:

$ xidel -s https://www.youtube.com/watch?v=dQw4w9WgXcQ -e '
  "- TITLE -",//meta[@itemprop="name"]/@content,"",
  "- CHANNEL -",//meta[@itemprop="channelId"]/@content,"",
  "- CHANNEL URL -",//span[@itemprop="author"]/link/@href,"",
  "- UPLOAD DATE -",//meta[@itemprop="datePublished"]/@content,"",
  "- URL -",//meta[@property="og:url"]/@content,"",
  "- TAGS -",join(//meta[@property="og:video:tag"]/x""{@content}"",","),"",
  "- DESCRIPTION -",//meta[@itemprop="description"]/@content
'
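If xidel isn't available (the question asks for tools that work on both macOS and Linux), a much cruder grep/sed sketch can pull a single itemprop value out of the page. It is not an HTML parser: it assumes each tag is written as <meta itemprop="NAME" content="VALUE"> with the attributes in that order, which is an assumption about YouTube's markup rather than something verified here, and it does not decode HTML entities. The meta_itemprop function and the sample HTML are hypothetical.

```bash
#!/usr/bin/env bash
# Rough sketch: print the content="..." value of <meta itemprop="NAME">
# from HTML on stdin, using only grep and sed.
# ASSUMPTIONS: itemprop comes before content inside the tag, values contain
# no double quotes, HTML entities are left undecoded.
meta_itemprop() {
  grep -o "<meta itemprop=\"$1\" content=\"[^\"]*\"" | sed 's/.*content="//; s/"$//'
}

# Hypothetical HTML fragment standing in for the downloaded page
html='<meta itemprop="name" content="Demo title"><meta itemprop="datePublished" content="2000-01-01">'

printf '%s\n' "$html" | meta_itemprop name
printf '%s\n' "$html" | meta_itemprop datePublished
```

Against the live page it would be fed from something like curl -s 'https://www.youtube.com/watch?v=dQw4w9WgXcQ' | meta_itemprop name, though the markup YouTube serves can change at any time, which is why a real parser such as xidel is the sturdier choice.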

The HTML source also contains a huge JSON object with all the information you need. Extracting it is a bit harder, but it can be done:

$ xidel -s https://www.youtube.com/watch?v=dQw4w9WgXcQ -e '
  parse-json(//script/extract(.,"ytInitialPlayerResponse = (\{.+\})",1))/(
    "- TITLE -",videoDetails/title,"",
    "- CHANNEL -",videoDetails/channelId,"",
    "- CHANNEL URL -",microformat//ownerProfileUrl,"",
    "- UPLOAD DATE -",microformat//publishDate,"",
    "- URL -","https://www.youtube.com/watch?v="||videoDetails/videoId,"",
    "- TAGS -",translate(serialize(videoDetails/keywords,{"method":"json"}),"[]",""),"",
    "- DESCRIPTION -",x:lines(videoDetails/shortDescription)[1]
  )
'
