Extracting data from data files

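Problem description

I have 31 files and I want to extract specific data from them, either writing it to a text file or editing the same file in place. An example file:

Please download the file "codg0010.18i.Z".

The data looks like this: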

   2018     1     1     0     0     0                        EPOCH OF CURRENT MAP
   ...
   45.0-180.0 180.0   5.0 450.0                            LAT/LON1/LON2/DLON/H
   59   63   69   76   83   90   96  100  100   93   81   68   55   46   39   34
   31   29   28   28   26   25   24   26   32   40   48   54   56   54   50   46
   43   42   42   44   46   48   51   54   57   59   59   58   55   51   48   47
   48   50   53   56   58   61   63   65   66   66   65   65   66   68   72   76
   82   86   88   87   81   72   64   59   59
    42.5-180.0 180.0   5.0 450.0                            LAT/LON1/LON2/DLON/H
   63   67   74   80   88   97  107  115  116  109   95   79   64   53   45   40
   37   36   36   38   39   39   40   43   48   54   60   63   62   59   54   50
   47   45   45   46   47   49   51   54   57   60   60   59   57   54   53   54
   56   60   62   64   65   67   69   72   74   74   74   73   72   72   74   77
   82   86   90   89   84   77   68   63   63
    40.0-180.0 180.0   5.0 450.0                            LAT/LON1/LON2/DLON/H
   71   75   80   84   90  100  112  123  127  122  108   91   75   64   56   50
   46   45   47   51   54   57   59   61   65   70   72   72   69   64   58   53
   50   48   47   46   46   47   49   52   56   59   61   61   59   58   58   60
   63   66   68   68   68   70   73   77   80   82   82   81   80   79   79   81
   84   89   93   94   91   84   76   72   71
    37.5-180.0 180.0   5.0 450.0                            LAT/LON1/LON2/DLON/H
   82   84   86   87   89   96  108  122  130  128  116  101   87   77   70   63
   58   56   59   64   69   73   76   79   82   84   83   80   74   67   60   55
   53   51   49   47   45   44   47   51   55   59   62   63   63   62   62   64
   67   69   69   68   67   69   74   81   86   89   90   89   88   88   88   89
   92   96  100  103  100   94   87   83   82
    35.0-180.0 180.0   5.0 450.0                            LAT/LON1/LON2/DLON/H
   94   95   94   89   84   86   96  111  122  125  118  108   98   92   85   77
   70   67   69   74   81   86   89   92   93   93   91   85   77   68   61   57
   55   53   51   48   45   44   46   51   56   61   64   66   66   66   66   67
   68   68   67   64   64   68   75   83   90   95   97   98   98   99   99  101
  103  107  112  115  113  108  100   95   94
    32.5-180.0 180.0   5.0 450.0                            LAT/LON1/LON2/DLON/H
  109  109  104   94   81   75   80   93  107  114  113  109  106  104   98   90
   80   75   76   82   88   93   95   97   98   97   93   86   77   68   61   58
   57   56   55   51   48   47   49   54   59   63   67   69   70   70   68   67
   65   64   61   59   61   67   76   86   94   99  103  105  108  111  114  116
  119  123  128  130  129  123  116  110  109
    30.0-180.0 180.0   5.0 450.0                            LAT/LON1/LON2/DLON/H
  127  127  121  106   85   69   66   75   88   98  103  106  110  112  108   98
   87   80   80   84   90   93   94   95   95   93   89   82   74   67   61   58
   58   59   59   57   54   52   54   58   63   66   68   71   72   70   67   63
   59   56   54   54   58   67   78   89   97  102  107  111  117  123  129  135
  139  143  147  149  147  141  133  127  127
  ...
  1                                                      END OF TEC MAP

The data starts with "START OF TEC MAP" and ends with "END OF RMS MAP". To avoid dealing with the header, I used:

sed -n -i '/START OF TEC MAP/,/END OF RMS MAP/p'
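As written, the sed command has no file operand. A minimal sketch of the same range selection on a made-up sample file follows; the sample content and the variable names `sample` and `body` are assumptions for illustration only. It leaves out `-i` so the file is not modified and the selected lines go to standard output.

```shell
# Hypothetical sample standing in for one decompressed data file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
header line to skip
     1                                                      START OF TEC MAP
   59   63   69
     1                                                      END OF TEC MAP
     1                                                      START OF RMS MAP
   10   11   12
     1                                                      END OF RMS MAP
trailer line to skip
EOF

# -n suppresses automatic printing; p prints only the lines inside the range.
# Without -i the file stays untouched and the result is written to stdout.
body=$(sed -n '/START OF TEC MAP/,/END OF RMS MAP/p' "$sample")
printf '%s\n' "$body"
```

Adding `-i` back would rewrite the file itself, so it is worth checking the output on stdout first.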

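I am trying to get the last five values of the second line of each block, for the blocks starting at 45.0-180.0 and ending at 25.0-180.0. So the result should look like this: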

   2018     1     1     0     0     0                        EPOCH OF CURRENT MAP
   ...
   45.0-180.0 180.0   5.0 450.0                            LAT/LON1/LON2/DLON/H
   54   56   54   50   46     
    42.5-180.0 180.0   5.0 450.0                            LAT/LON1/LON2/DLON/H       
   63   62   59   54   50
    40.0-180.0 180.0   5.0 450.0                            LAT/LON1/LON2/DLON/H       
   72   69   64   58   53
    37.5-180.0 180.0   5.0 450.0                            LAT/LON1/LON2/DLON/H
   80   74   67   60   55
    35.0-180.0 180.0   5.0 450.0                            LAT/LON1/LON2/DLON/H
   85   77   68   61   57
    32.5-180.0 180.0   5.0 450.0                            LAT/LON1/LON2/DLON/H
   86   77   68   61   58
    30.0-180.0 180.0   5.0 450.0                            LAT/LON1/LON2/DLON/H
   82   74   67   61   58
  ...
  1                                                      END OF TEC MAP

Regular expressions are complicated for a beginner like me.

Tags: bash, awk, sed

Solution

The following AWK should solve this:

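awk '
  # Always print the map header lines.
  /(START|END|EPOCH) OF (TEC|RMS|CURRENT) MAP/
  # Start extracting at the 45.0 block; stop before the 25.0 block.
  $1 == "45.0-180.0" {p=1}
  $1 == "25.0-180.0" {p=0}
  # For each block header inside the range: print it, advance to the
  # second data line, and print that line's last five fields.
  p && $0 ~ "LAT/LON1/LON2/DLON/H" {
    print; getline; getline
    print $(NF-4)" "$(NF-3)" "$(NF-2)" "$(NF-1)" "$NF
  }
' < FILE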

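Explanation:

  • The first line of the script prints every line that matches the regex /(START|END|EPOCH) OF (TEC|RMS|CURRENT) MAP/; a pattern with no action prints the line by default. This covers all the header lines you always want to keep.

  • The next two lines set the flag p to true or false (1 or 0), depending on the content of the first field, $1. Because p is cleared on the 25.0-180.0 line before the last rule runs, that block itself is not printed.

  • Writing the match as $0 ~ "LAT/LON1/LON2/DLON/H" lets me match the / characters without escaping them, as would be needed inside an AWK /.../ regex literal. The full rule p && $0 ~ "LAT/LON1/LON2/DLON/H" { ... } means: if p is true and the line matches the pattern, run the steps inside the block { ... }.

  • Inside the block, I print the line and then call getline twice to read two more lines.

  • Then I print the fifth-last, fourth-last, third-last, second-last and last fields. They are addressed relative to AWK's special variable NF, which holds the number of fields, so $NF is the last field.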
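The getline and NF mechanics described above can be checked on a toy input. This is only a sketch: the data values are made up, and `toy` and `result` are hypothetical names. It prints NF followed by the last five fields of the second data row.

```shell
# Toy input: one block header followed by two data rows (values are made up).
toy=$(mktemp)
cat > "$toy" <<'EOF'
    45.0-180.0 180.0   5.0 450.0    LAT/LON1/LON2/DLON/H
   1   2   3   4   5   6   7   8
  11  12  13  14  15  16  17  18
EOF

result=$(awk '
  $0 ~ "LAT/LON1/LON2/DLON/H" {
    getline            # $0 is now the first data row
    getline            # $0 is now the second data row; NF is recomputed
    # NF is the field count; $(NF-4) .. $NF are the last five fields.
    print NF ": " $(NF-4), $(NF-3), $(NF-2), $(NF-1), $NF
  }
' "$toy")
printf '%s\n' "$result"   # prints: 8: 14 15 16 17 18
```

Each plain getline replaces $0 with the next input line and re-splits the fields, which is why the field references after the second call see the second data row.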

Testing:

▶ awk '
    /(START|END|EPOCH) OF (TEC|RMS|CURRENT) MAP/
    $1 == "45.0-180.0" {p=1}
    $1 == "25.0-180.0" {p=0}
    p && $0 ~ "LAT/LON1/LON2/DLON/H" {
      print; getline; getline
      print $(NF-4)" "$(NF-3)" "$(NF-2)" "$(NF-1)" "$NF
    }
  ' < c1pg0010.18i

I get:

     1                                                      START OF TEC MAP
  2018     1     1     0     0     0                        EPOCH OF CURRENT MAP
    45.0-180.0 180.0   5.0 450.0                            LAT/LON1/LON2/DLON/H
44 45 44 43 42
    42.5-180.0 180.0   5.0 450.0                            LAT/LON1/LON2/DLON/H
53 54 54 52 51
    40.0-180.0 180.0   5.0 450.0                            LAT/LON1/LON2/DLON/H
61 62 61 59 57
    37.5-180.0 180.0   5.0 450.0                            LAT/LON1/LON2/DLON/H
67 66 65 63 61
    35.0-180.0 180.0   5.0 450.0                            LAT/LON1/LON2/DLON/H
71 70 68 67 65
    32.5-180.0 180.0   5.0 450.0                            LAT/LON1/LON2/DLON/H
75 72 71 71 69
    30.0-180.0 180.0   5.0 450.0                            LAT/LON1/LON2/DLON/H
79 76 75 76 75
    27.5-180.0 180.0   5.0 450.0                            LAT/LON1/LON2/DLON/H
85 82 81 82 83
     1                                                      END OF TEC MAP
     2                                                      START OF TEC MAP
...
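The output stops at the 27.5 block because p is reset as soon as the 25.0-180.0 line is read. If the 25.0 block should be included as well (an assumption about the intent; the question says the range ends at 25.0-180.0), one possible variant resets the flag only after that block has been printed. The sample input and the names `demo`, `inclusive` and `last` are made up for illustration.

```shell
# Made-up sample with four blocks; only 45.0 through 25.0 should be kept.
demo=$(mktemp)
cat > "$demo" <<'EOF'
    47.5-180.0 180.0   5.0 450.0    LAT/LON1/LON2/DLON/H
   1   2   3   4   5   6
   7   8   9  10  11  12
    45.0-180.0 180.0   5.0 450.0    LAT/LON1/LON2/DLON/H
  11  12  13  14  15  16
  21  22  23  24  25  26
    25.0-180.0 180.0   5.0 450.0    LAT/LON1/LON2/DLON/H
  31  32  33  34  35  36
  41  42  43  44  45  46
    22.5-180.0 180.0   5.0 450.0    LAT/LON1/LON2/DLON/H
  51  52  53  54  55  56
  61  62  63  64  65  66
EOF

inclusive=$(awk '
  $1 == "45.0-180.0" {p=1}
  p && $0 ~ "LAT/LON1/LON2/DLON/H" {
    last = ($1 == "25.0-180.0")   # remember before getline overwrites $1
    print; getline; getline
    print $(NF-4), $(NF-3), $(NF-2), $(NF-1), $NF
    if (last) p = 0               # stop only after the 25.0 block is printed
  }
' "$demo")
printf '%s\n' "$inclusive"
```

The header-printing rule is left out here to keep the sketch short; it can be added back as the first line of the AWK program unchanged.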

If you then need code to process all 31 files from the FTP directory, wrap the AWK in this Bash code:

for f in *.Z ; do
  # gunzip can also unpack compress-style .Z files; it replaces each
  # archive with the decompressed file.
  gunzip "$f"
  decompressed=${f%.Z}
  awk '
    /(START|END|EPOCH) OF (TEC|RMS|CURRENT) MAP/
    $1 == "45.0-180.0" {p=1}
    $1 == "25.0-180.0" {p=0}
    p && $0 ~ "LAT/LON1/LON2/DLON/H" {
      print; getline; getline
      print $(NF-4)" "$(NF-3)" "$(NF-2)" "$(NF-1)" "$NF
    }
  ' < "$decompressed" > "$decompressed.edited"
done

My assumption is that you run this from a directory containing all the files with the extension .Z.
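The loop above decompresses each file in place, so the .Z archives are gone afterwards and a second run finds nothing to process. If keeping the archives matters, a possible alternative is to stream each file through gunzip -c instead. The function name `extract_all` and its directory argument are hypothetical, introduced only for this sketch.

```shell
# Hypothetical helper: extract_all DIR
# Streams each DIR/*.Z through the AWK filter and writes DIR/<name>.edited,
# leaving the compressed originals in place.
extract_all() {
  dir=${1:-.}
  for f in "$dir"/*.Z; do
    [ -e "$f" ] || continue     # the glob matched nothing
    # gunzip -c writes the decompressed data to stdout instead of a file.
    gunzip -c < "$f" | awk '
      /(START|END|EPOCH) OF (TEC|RMS|CURRENT) MAP/
      $1 == "45.0-180.0" {p=1}
      $1 == "25.0-180.0" {p=0}
      p && $0 ~ "LAT/LON1/LON2/DLON/H" {
        print; getline; getline
        print $(NF-4)" "$(NF-3)" "$(NF-2)" "$(NF-1)" "$NF
      }
    ' > "${f%.Z}.edited"
  done
}
```

Calling `extract_all .` from the directory that holds the .Z files would produce one .edited file per archive, named the same way as in the loop above.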

