regex - How do I extract text from awk results?
Problem description
This command returns the entire line where the word "About" is found.
! lynx -source google.com/search?q=india | awk '/About */'
This will return something like this...
<div id=gbar><nobr><b class=gb1>Search</b> <a class=gb1 href="http://www.google.com/search?hl=en&tbm=isch&source=og&tab=wi">Images</a> <a class=gb1 href="http://maps.google.com/maps?hl=en&tab=wl">Maps</a> <a class=gb1 href="https://play.google.com/?hl=en&tab=w8">Play</a> <a class=gb1 href="http://www.youtube.com/results?gl=US&tab=w1">YouTube</a> <a class=gb1 href="http://news.google.com/nwshp?hl=en&tab=wn">News</a> <a class=gb1 href="https://mail.google.com/mail/?tab=wm">Gmail</a> <a class=gb1 href="https://drive.google.com/?tab=wo">Drive</a> <a class=gb1 style="text-decoration:none" href="https://www.google.com/intl/en/about/products?tab=wh"><u>More</u> »</a></nobr></div><div id=guser width=100%><nobr><span id=gbn class=gbi></span><span id=gbf class=gbf></span><span id=gbe></span><a href="http://www.google.com/history/optout?hl=en" class=gb4>Web History</a> | <a href="/preferences?hl=en" class=gb4>Settings</a> | <a target=_top id=gb_70 href="https://accounts.google.com/ServiceLogin?hl=en&passive=true&continue=http://www.google.com/search%3Fq%3Dindia" class=gb4>Sign in</a></nobr></div><div class=gbh style=left:0></div><div class=gbh style=right:0></div><font size="-2"><br clear="all"></font><table border="0" cellpadding="3" cellspacing="0"><tr><td valign="top"><a href="/webhp?hl="><img src="/images/branding/searchlogo/1x/googlelogo_desk_heirloom_color_150x55dp.gif" height="55" width="150" border="0"></a></td><td valign="bottom"><nobr><form name="gs" method="GET" action="/search"><input type="text" name="q" maxlength="2048" title="Search" value="india" size="41"><font size="-1"> </font><input type="Submit" name="btnG" value="Search"><font size="-1"> </font></form></nobr></td><td width="100%" valign="middle"><nobr><font size="-2"><a href="/advanced_search?q=india&hl=">Advanced Search</a><br><a href="/preferences?q=india&hl=">Preferences</a></font></nobr></td></tr></table><table width="100%" border="0" cellpadding="0" cellspacing="0"><tr><td bgcolor="#3366CC" 
height="1"><img height="1" width="1" alt=""></td></tr></table><table width="100%" border="0" cellpadding="0" cellspacing="0" bgcolor="#D5DDF3"><tr><td><img height="2" width="1" alt=""></td></tr><tr><td><table width="100%" border="0" cellpadding="0" cellspacing="4"><tr><td nowrap><font size="-1"><strong>Web</strong></font></td><td nowrap align="right"><font size="-1">About 10,730,000,000 results (<b>0.34</b> seconds)</font></td></tr></table></td></tr><tr><td><img height="1" width="1" alt=""></td></tr></table><p><a href="/url?q=https://en.wikipedia.org/wiki/India&sa=U&ved=2ahUKEwiih9Hri9niAhWvuVkKHXWUCLkQFjAAegQIDBAH&usg=AOvVaw2q-I4x7L6MSaWE9ziLkwjR"><b>India</b> - Wikipedia</a><table cellpadding="0" cellspacing="0" border="0"><tr><td class="j"><font size="-1"><b>India</b> (ISO: Bhārat), also known as the Republic of <b>India</b> (ISO: Bhārat Gaṇarājya), <br>
But I want to return only the number of results. For example:
Expected result:
About 7,970,000,000 results
(Or only the number without words like About)
What I am actually looking for is the number of commas in the count. For example, the count above has 3 commas, which means it is "very high"; if there is no comma (fewer than 1,000 results), the result should be "low".
Solution
Use -dump instead of -source. This strips out the HTML code:
$ lynx -dump google.com/search?q=india | awk -F, '/About/'
Web About 7,410,000,000 results (0.33 seconds)
To print only the number:
$ lynx -dump google.com/search?q=india | awk '/About/{print $3}'
7,410,000,000
To print only the number of commas:
$ lynx -dump google.com/search?q=india | awk -F, '/About/{print NF-1}'
3
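The question's end goal, turning the comma count into a label, could be sketched roughly as follows. This is only a sketch under stated assumptions: `classify_count` is a hypothetical helper name, treating 3 or more commas as "very high" and one or two commas as "medium" is an assumption (the question only defines "very high" for 3 commas and "low" for none), and like the command above it counts every comma on the matching line, not just those in the number.

```shell
# classify_count: hypothetical helper, not part of lynx or awk.
# Reads `lynx -dump` output on stdin and prints a label for the first "About" line.
classify_count() {
  awk -F, '/About/ {
    commas = NF - 1                 # with -F, the field count is commas + 1
    if (commas >= 3)       print "very high"
    else if (commas == 0)  print "low"
    else                   print "medium"   # assumed label for 1 or 2 commas
    exit                            # stop after the first matching line
  }'
}

# Offline check using the sample line shown earlier
echo '   Web About 7,410,000,000 results (0.33 seconds)' | classify_count
# prints: very high
```

With live data it would presumably be used as `lynx -dump 'google.com/search?q=india' | classify_count`.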