How to extract substrings and numbers from curl result using grep or other method

Question

This is my first question/post, and I’m very new to using regular expressions. Despite lots of searching and experimenting (e.g., -o and -w options), I can’t seem to make the following work (and I'm too embarrassed to post all of my failed attempts, but see the end of the post). I’m trying to pull some weather details (status, temperature, and wind information) from a web site.

I’m using the following statement to extract the appropriate information into a text file, which I then want to grep to extract the information. Current weather is listed at the top, so I only need the first few lines (head -n 7). You can visit the site (https://wttr.in/[city]) and enter a [city] to see the diversity of results.

curl -s wttr.in/fargo | head -n 7 > ~/Downloads/weather.cache

Here are the problems/challenges I’ve faced:

  1. There is some “stick” art on every line, which is color-coded. These codes get pulled into the text file, along with the “sticks” text.
  2. Current weather status could be one word (Sunny) or multiple words (Partly cloudy). I want everything.
  3. Temperature could be a single number (5 °F), a range (0-15 °F), and of course negative numbers are possible (-10--5 °F ). I need all the information.
  4. Wind direction and speed (↘ 8 mph). Again, speed can be a range (5-16 mph). Wind direction is a special/unicode character, which I want to capture.
  5. I want to assign each item (#2-4) to its own variable without any extra stuff from the line.

Ideal results from my above example, which will be used in a status bar, would be as follows.

Weather = “Sunny”

Temp = “-22--5 °F”

Wind = “↘ 8 mph”

Any assistance would be most appreciated. Apologies in advance as I struggled to correctly format this post.

Background

Actual website view is below, but without the color-coding for the "Sun" stick figure and "8" (wind speed). Note: the color-coding isn't right, due to the posting software (and probably my lack of knowledge). So, it might be helpful to go to the original site (https://wttr.in/fargo).

Weather report: Fargo, United States of America

         \   /     Sunny
          .-.      -22--5 °F      
       - (   ) -   ↘ 8 mph        
          `_'      9 mi           
         /   \     0.0 in 



Curl result is below, which is being stored in the weather cache file I'm working with.

Weather report: Fargo, United States of America

 [38;5;226m    \   /    [0m Sunny
 [38;5;226m     .-.     [0m [38;5;021m-22[0m-[38;5;021m-5[0m °F[0m      
 [38;5;226m  ― (   ) ―  [0m [1m↘[0m [38;5;226m8[0m mph[0m        
 [38;5;226m     `-’     [0m 9 mi[0m           
 [38;5;226m    /   \    [0m 0.0 in[0m

===

Some Attempts

As an example with temperature, here's the closest I've come.

egrep --regexp='-?[[:digit:]].*°F'


  .-.      -22--5 °F

Failed attempts include (also tried -w option).

    grep -m 1 -Eo -e '-?[[:digit:]].*°F'

38;5;226m     .-.      -22--5 °F

Tags: regex, curl, awk, grep, extract

Answer


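Would it be boring to point out that the API allows the data to be downloaded in other ways?

For example, various abbreviated formats, such as: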

curl "http://wttr.in/Fargo?format=4"
curl "http://wttr.in/Fargo?format=%l:%c:%t:%w"

or HTML:

curl -H 'User-Agent: mozilla/compatible' http://wttr.in/Fargo

The latter helpfully inserts logical markup.

Another way to strip the ANSI escapes is:
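The one-line formats can be split straight into the three variables the question asks for. Below is a minimal sketch, not a definitive solution. It assumes that %C is the wttr.in format code for the textual condition (the examples above use %c, which I believe returns an icon instead), and that ":" never appears inside a field. The function name parse_wttr_line and the sample reply are made up for illustration. Note that, as far as I can tell, the one-line format reports a single temperature rather than a range.

```shell
# Split a "condition:temperature:wind" line into three variables.
# parse_wttr_line is a hypothetical helper name, not part of wttr.in.
parse_wttr_line() {
    IFS=':' read -r Weather Temp Wind <<EOF
$1
EOF
}

# Live call (needs network access; %C is assumed to be the text condition):
#   line=$(curl -s 'https://wttr.in/Fargo?format=%C:%t:%w')
# Made-up sample reply so the sketch also runs offline:
line='Sunny:-22°F:↘8mph'

parse_wttr_line "$line"
printf 'Weather = "%s"\nTemp = "%s"\nWind = "%s"\n' "$Weather" "$Temp" "$Wind"
```

Because the server already returns plain text here, no ANSI stripping or regular expressions are needed.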

curl -s http://wttr.in/Fargo | head -7 | colorize --clean-all

if you have the colorize utility (available for various Linux distributions).
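If colorize is not installed, sed can remove the color sequences, and grep can then pick out each field. The following is only a sketch under a few assumptions: the sample input is rebuilt from the curl result quoted in the question (with the ESC bytes restored), the condition text sits on the third line of the output, and the temperature and wind look like the examples in the question. The helper names strip_ansi and sample are made up for illustration.

```shell
# ESC byte, produced portably with printf.
esc=$(printf '\033')

# Remove ANSI color sequences (ESC [ ... m) from stdin.
strip_ansi() {
    sed "s/${esc}\[[0-9;]*m//g"
}

# Offline sample rebuilt from the curl result quoted in the question.
# Live data would come from: curl -s wttr.in/fargo | head -n 7
sample() {
    printf 'Weather report: Fargo, United States of America\n\n'
    printf ' \033[38;5;226m    \\   /    \033[0m Sunny\n'
    printf ' \033[38;5;226m     .-.     \033[0m \033[38;5;021m-22\033[0m-\033[38;5;021m-5\033[0m °F\033[0m      \n'
    printf ' \033[38;5;226m  ― (   ) ―  \033[0m \033[1m↘\033[0m \033[38;5;226m8\033[0m mph\033[0m        \n'
}

clean=$(sample | strip_ansi)

# Condition: trailing run of words on line 3, after trimming whitespace.
Weather=$(printf '%s\n' "$clean" | sed -n '3p' | sed 's/[[:space:]]*$//' |
    grep -Eo '[[:alpha:]][[:alpha:] ,]*$')

# Temperature: optional minus, number, optional "-number" range part, unit.
Temp=$(printf '%s\n' "$clean" | grep -Eo -m 1 -e '-?[0-9]+(--?[0-9]+)? °[CF]')

# Wind: the token before the speed (the arrow), speed or range, unit.
Wind=$(printf '%s\n' "$clean" | grep -Eo -m 1 '[^ ]+ [0-9]+(-[0-9]+)? (mph|km/h)')

printf 'Weather = "%s"\nTemp = "%s"\nWind = "%s"\n' "$Weather" "$Temp" "$Wind"
```

Stripping the escapes first matters: the attempt in the question matched from the first digit of a color code (38;5;226m), because -?[[:digit:]].* starts at the earliest digit on the line and .* then runs on to °F. The patterns here match on the shape of each field rather than on column positions, so they do not depend on how wide the art is.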

