regex - How to extract substrings and numbers from curl result using grep or other method
Problem description
This is my first question/post, and I’m very new to using regular expressions. Despite lots of searching and experimenting (e.g., -o and -w options), I can’t seem to make the following work (and I'm too embarrassed to post all of my failed attempts, but see the end of the post). I’m trying to pull some weather details (status, temperature, and wind information) from a web site.
I’m using the following statement to extract the appropriate information into a text file, which I then want to grep to extract the information. Current weather is listed at the top, so I only need the first few lines (head -n 7). You can visit the site (https://wttr.in/[city]) and enter a [city] to see the diversity of results.
curl -s wttr.in/fargo | head -n 7 > ~/Downloads/weather.cache
Here are the problems/challenges I’ve faced:
- There is some “stick” art on every line, which is color-coded. These codes get pulled into the text file, along with the “sticks” text.
- Current weather status could be one word (Sunny) or multiple words (Partly cloudy). I want everything.
- Temperature could be a single number (5 °F), a range (0-15 °F), and of course negative numbers are possible (-10--5 °F). I need all the information.
- Wind direction and speed (↘ 8 mph). Again, speed can be a range (5-16 mph). Wind direction is a special/unicode character, which I want to capture.
- I want to assign each item (#2-4) to its own variable without any extra stuff from the line.
Ideal results from my above example, which will be used in a status bar, would be as follows.
Weather = “Sunny”
Temp = “-22--5 °F”
Wind = “↘ 8 mph”
Any assistance would be most appreciated. Apologies in advance as I struggled to correctly format this post.
Background
Actual website view is below, but without the color-coding for the "Sun" stick figure and "8" (wind speed). Note: the color-coding isn't right, due to the posting software (and probably my lack of knowledge). So, it might be helpful to go to the original site (https://wttr.in/fargo).
Weather report: Fargo, United States of America
\ / Sunny
.-. -22--5 °F
- ( ) - ↘ 8 mph
`_' 9 mi
/ \ 0.0 in
Curl result is below, which is being stored in the weather cache file I'm working with.
Weather report: Fargo, United States of America
[38;5;226m \ / [0m Sunny
[38;5;226m .-. [0m [38;5;021m-22[0m-[38;5;021m-5[0m °F[0m
[38;5;226m ― ( ) ― [0m [1m↘[0m [38;5;226m8[0m mph[0m
[38;5;226m `-’ [0m 9 mi[0m
[38;5;226m / \ [0m 0.0 in[0m
===
Some Attempts
As an example with temperature, here's the closest I've come.
egrep --regexp='-?[[:digit:]].*°F'
.-. -22--5 °F
Failed attempts include (also tried -w option).
grep -m 1 -Eo -e '-?[[:digit:]].*°F'
38;5;226m .-. -22--5 °F
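Solution
Would it be boring to point out that the API lets you download the data in other ways?
For example, there are various abbreviated formats, such as: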
curl "http://wttr.in/Fargo?format=4"
curl "http://wttr.in/Fargo?format=%l:%c:%t:%w"
Or HTML:
curl -H 'User-Agent: mozilla/compatible' http://wttr.in/Fargo
The latter helpfully inserts logical markup.
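As a rough sketch of how the custom format string above could feed the three variables from the question: the function name parse_weather_line is hypothetical, and %C is assumed here to be wttr.in's placeholder for the textual condition name (the example above uses %c). The exact spacing of the returned fields may differ from the sample in the comment.

```shell
#!/bin/sh
# Hypothetical helper: split a one-line, colon-separated report into
# the three variables named in the question.
# Assumed input shape: "<condition>:<temperature>:<wind>"
parse_weather_line() {
    # IFS is changed only for this read; the here-document keeps read
    # in the current shell, so the variables survive the call.
    IFS=: read -r Weather Temp Wind <<EOF
$1
EOF
}

# Usage (requires network access; %C is an assumption, see above):
#   parse_weather_line "$(curl -s 'wttr.in/Fargo?format=%C:%t:%w')"
#   printf '%s\n' "$Weather" "$Temp" "$Wind"
```

Because the fields arrive already separated, no escape codes or ASCII art have to be removed in this variant.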
Another way to strip the ANSI escapes is:
curl -s http://wttr.in/Fargo | head -7 | colorize --clean-all
if you have the colorize utility (available for various Linux distributions).
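If colorize is not installed, a plain sed substitution may be enough. The following is only a sketch under several assumptions: the helper names strip_ansi and parse_report are made up for illustration, the color codes are all of the form ESC [ ... m, the condition is on the first non-blank line after the header, the stick art is separated from the text by at least two spaces, and the units are °F and mph as in the question.

```shell
#!/bin/sh
# Remove ANSI color sequences (ESC [ ... m) from standard input.
strip_ansi() {
    esc=$(printf '\033')
    sed "s/${esc}\[[0-9;]*m//g"
}

# Set Weather, Temp and Wind from an already cleaned report passed as $1.
parse_report() {
    # Condition: first non-blank line after the header; drop trailing
    # spaces, then everything up to the last run of two spaces (the art).
    Weather=$(printf '%s\n' "$1" | awk 'NR > 1 && NF { print; exit }' |
        sed -e 's/ *$//' -e 's/.*  //')
    # Temperature: optional sign, number, optional "-<number>" range, unit.
    Temp=$(printf '%s\n' "$1" |
        grep -Eo -e '[-+]?[0-9]+(-[-+]?[0-9]+)? °F' | head -n 1)
    # Wind: direction arrow (any run of non-spaces), speed or range, unit.
    Wind=$(printf '%s\n' "$1" |
        grep -Eo -e '[^ ]+ [0-9]+(-[0-9]+)? mph' | head -n 1)
}

# Usage (requires network access):
#   parse_report "$(curl -s wttr.in/fargo | head -n 7 | strip_ansi)"
#   printf '%s\n' "$Weather" "$Temp" "$Wind"
```

Stripping the escapes first matters because the question's pattern `-?[[:digit:]].*°F` otherwise starts matching at the first digit inside a color code such as 38;5;226m, which is what the failed attempt shows.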