How to grep only the match at a desired position in a single line with multiple matches, using regex?

Problem description

I have a file with hundreds of links of the form: https://file1.mp4" target='_blank'>HD-MQ</a> | <a href="https://file1_v2.mkv

And, sometimes, the end of the line has mp4 instead of mkv, like below: https://file1.mp4" target='_blank'>HD-MQ</a> | <a href="https://file1_v2.mp4

I already tried the pattern 'http.+mp4' (and the same with mkv at the end) to get a single URL, but it keeps printing the whole line, because '.+' is greedy: it matches everything from the first http to the last mp4 on the line.

How could I specify the regex (using grep) to match only one of the URLs, without the HTML garbage in the middle?

The final result needs to be https://file1.mp4 or https://file1_v2.mkv, with me specifying which one I want.

Tags: regex, grep

Solution


You can exclude the double quote from the pattern, so that a match can never run past the end of a URL:

grep -o 'https://[^"]*\.mp4' file
grep -o 'https://[^"]*\.mkv' file
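As a rough illustration of why the negated class helps, the sketch below runs both patterns against the second sample line from the question. The variable names are only for illustration, and `-E` is assumed for the greedy pattern so that `+` acts as a repetition operator.

```shell
# Sample line copied from the question (both URLs end in .mp4).
line='https://file1.mp4" target='\''_blank'\''>HD-MQ</a> | <a href="https://file1_v2.mp4'

# Greedy: .+ runs across the quote, so the match extends to the LAST "mp4" on the line.
greedy=$(printf '%s\n' "$line" | grep -oE 'http.+mp4')

# Negated class: the match cannot cross a double quote, so each URL is matched separately.
bounded=$(printf '%s\n' "$line" | grep -o 'https://[^"]*\.mp4')

echo "$greedy"    # the whole line
echo "$bounded"   # one URL per output line
```

With `-o`, grep prints each match on its own output line, so the second command yields the two URLs separately.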

Or, for both types:

grep -E -o 'https://[^"]*\.(mp4|mkv)' file
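When both URLs on a line share the same extension, the pattern alone cannot say which one is wanted. One possible way to pick by position is sketched below. It swaps grep for awk, and `nth_url` is a hypothetical helper name, not part of the original answer.

```shell
# Sample line copied from the question (both URLs end in .mp4).
line='https://file1.mp4" target='\''_blank'\''>HD-MQ</a> | <a href="https://file1_v2.mp4'

# Hypothetical helper: print the Nth URL found on each input line (reads stdin).
nth_url() {
  awk -v n="$1" '{
    rest = $0; count = 0
    # Repeatedly find the next URL, then continue searching after it.
    while (match(rest, /https:\/\/[^"]*\.(mp4|mkv)/)) {
      count++
      if (count == n) { print substr(rest, RSTART, RLENGTH); break }
      rest = substr(rest, RSTART + RLENGTH)
    }
  }'
}

first=$(printf '%s\n' "$line" | nth_url 1)
second=$(printf '%s\n' "$line" | nth_url 2)
echo "$first"
echo "$second"
```

Under these assumptions, `nth_url 2 < file` would print the second URL of every line that contains at least two.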
