Extract orders from two files and match them with trades

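Problem description

I have two attached files (orders1.txt and trades1.txt). I need to write a Bash script (possibly awk?) that extracts the orders and matches them with the trades.

The output should be a report that prints comma-separated values with the columns "ClientID, OrderID, Price, Volume".

In addition, I need to print the total volume and turnover for each client (turnover is the sum of price * volume over that client's trades).

Can someone help me with a bash script that does this using the attached files?

Any help would be greatly appreciated.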

orders1.txt

Entry Time, Client ID, Security ID, Order ID
25455410,DOLR,XGXUa,DOLR1435804437
25455410,XFKD,BUP3d,XFKD4746464646
25455413,QOXA,AIDl,QOXA7176202067
25455415,QOXA,IRUXb,QOXA6580494597
25455417,YXKH,OBWQs,YXKH4575139017
25455420,JBDX,BKNs,JBDX6760353333
25455428,DOLR,AOAb,DOLR9093170513
25455429,JBDX,QMP1Sh,JBDX2756804453
25455431,QOXA,QIP1Sh,QOXA6563975285
25455434,QOXA,XMUp,QOXA5569701531
25455437,XFKD,QLOJc,XFKD8793976660
25455438,YXKH,MRPp,YXKH2329856527
25455442,JBDX,YBPu,JBDX0100506066
25455450,QOXA,BUPYd,QOXA5832015401
25455451,QOXA,SIOQz,QOXA3909507967
25455451,DOLR,KID1Sh,DOLR2262067037
25455454,DOLR,JJHi,DOLR9923665017
25455461,YXKH,KBAPBa,YXKH2637373848
25455466,DOLR,EPYp,DOLR8639062962
25455468,DOLR,UQXKz,DOLR4349482234
25455474,JBDX,EFNs,JBDX7268036859
25455481,QOXA,XCB1Sh,QOXA4105943392
25455486,YXKH,XBAFp,YXKH0242733672
25455493,JBDX,BIF1Sh,JBDX2840241688
25455500,DOLR,QSOYp,DOLR6265839896
25455503,YXKH,IIYz,YXKH8505951163
25455504,YXKH,ZOIXp,YXKH2185348861
25455513,YXKH,MBOOp,YXKH4095442568
25455515,JBDX,P35p,JBDX9945514579
25455524,QOXA,YXOKz,QOXA1900595629
25455528,JBDX,XEQl,JBDX0126452783
25455528,XFKD,FJJMp,XFKD4392227425
25455535,QOXA,EZIp,QOXA4277118682
25455543,QOXA,YBPFa,QOXA6510879584
25455551,JBDX,EAMp,JBDX8924251479
25455552,QOXA,JXIQp,QOXA4360008399
25455554,DOLR,LISXPh,DOLR1853653280
25455557,XFKD,LOX14p,XFKD1759342196
25455558,JBDX,YXYb,JBDX8177118129
25455567,YXKH,MZQKl,YXKH6485420018
25455569,JBDX,ZPIMz,JBDX2010952336
25455573,JBDX,COPe,JBDX1612537068
25455582,JBDX,HFKAp,JBDX2409813753
25455589,QOXA,XFKm,QOXA9692126523
25455593,XFKD,OFYp,XFKD8556940415
25455601,XFKD,FKQLb,XFKD4861992028
25455606,JBDX,RIASp,JBDX0262502677
25455608,DOLR,HRKKz,DOLR1739013513
25455615,DOLR,ZZXp,DOLR6727725911
25455623,JBDX,CKQPp,JBDX2587184235
25455630,YXKH,ZLQQp,YXKH6492126889
25455632,QOXA,ORPz,QOXA3594333316
25455640,XFKD,HPIXSh,XFKD6780729432
25455648,QOXA,ABOJe,QOXA6661411952
25455654,XFKD,YLIp,XFKD6374702721
25455654,DOLR,BCFp,DOLR8012564477
25455658,JBDX,ZMDKz,JBDX6885176695
25455665,JBDX,CBOe,JBDX8942732453
25455670,JBDX,FRHMl,JBDX5424320405
25455679,DOLR,YFJm,DOLR8212353717
25455680,XFKD,XAFp,XFKD4132890550
25455681,YXKH,PBIBOp,YXKH6106504736
25455684,DOLR,IFDu,DOLR8034515043
25455687,JBDX,JACe,JBDX8243949318
25455688,JBDX,ZFZKz,JBDX0866225752
25455693,QOXA,XOBm,QOXA5011416607
25455694,QOXA,IDQe,QOXA7608439570
25455698,JBDX,YBIDb,JBDX8727773702
25455705,YXKH,MXOp,YXKH7747780955
25455710,YXKH,PBZRYs,YXKH7353828884
25455719,QOXA,QFDb,QOXA2477859437
25455720,XFKD,PZARp,XFKD4995735686
25455722,JBDX,ZLKKb,JBDX3564523161
25455730,XFKD,QFH1Sh,XFKD6181225566
25455733,JBDX,KWVJYc,JBDX7013108210
25455733,YXKH,ZQI1Sh,YXKH7095815077
25455739,YXKH,XIJp,YXKH0497248757
25455739,YXKH,ZXJp,YXKH5848658513
25455747,JBDX,XASd,JBDX4986246117
25455751,XFKD,XQIKz,XFKD5919379575
25455760,JBDX,IBXPb,JBDX8168710376
25455763,XFKD,EVAOi,XFKD8175209012
25455765,XFKD,JXKp,XFKD2750952933
25455773,XFKD,PTBAXs,XFKD8139382011
25455778,QOXA,XJp,QOXA8227838196
25455783,QOXA,CYBIp,QOXA2072297264
25455792,JBDX,PZI1Sh,JBDX7022115629
25455792,XFKD,XIKQl,XFKD6434550362
25455792,DOLR,YKPm,DOLR6394606248
25455796,QOXA,JXOXPh,QOXA9672544909
25455797,YXKH,YIWm,YXKH5946342983
25455803,YXKH,JZEm,YXKH5317189370
25455810,QOXA,OBMFz,QOXA0985316706
25455810,QOXA,DAJPp,QOXA6105975858
25455810,JBDX,FBBJl,JBDX1316207043
25455819,XFKD,YXKm,XFKD6946276671
25455821,YXKH,UIAUs,YXKH6010226371
25455828,DOLR,PTJXs,DOLR1387517499
25455836,DOLR,DCEi,DOLR3854078054
25455845,YXKH,NYQe,YXKH3727923537
25455853,XFKD,TAEc,XFKD5377097556
25455858,XFKD,LMBOXo,XFKD4452678489
25455858,XFKD,AIQXp,XFKD5727938304

trades1.txt

# The first 8 characters is execution time in microseconds since midnight 
# The next 14 characters is the order ID
# The next 8 characters is the zero padded price
# The next 8 characters is the zero padded volume
25455416QOXA6580494597      0000013800001856
25455428JBDX6760353333      0000007000002458
25455434DOLR9093170513      0000000400003832
25455435QOXA6563975285      0000034700009428
25455449QOXA5569701531      0000007500009023
25455447YXKH2329856527      0000038300009947
25455451QOXA5832015401      0000039900006432
25455454QOXA3909507967      0000026900001847
25455456DOLR2262067037      0000034700002732
25455471YXKH2637373848      0000010900006105
25455480DOLR8639062962      0000027500001975
25455488JBDX7268036859      0000005200004986
25455505JBDX2840241688      0000037900002029
25455521YXKH4095442568      0000046400002150
25455515JBDX9945514579      0000040800005904
25455535QOXA1900595629      0000015200006866
25455533JBDX0126452783      0000001700006615
25455542XFKD4392227425      0000035500009948
25455570XFKD1759342196      0000025700007816
25455574JBDX8177118129      0000022400000427
25455567YXKH6485420018      0000039000008327
25455573JBDX1612537068      0000013700001422
25455584JBDX2409813753      0000016600003588
25455603XFKD4861992028      0000017600004552
25455611JBDX0262502677      0000007900003235
25455625JBDX2587184235      0000024300006723
25455658XFKD6374702721      0000046400009451
25455673JBDX6885176695      0000010900009258
25455671JBDX5424320405      0000005400003618
25455679DOLR8212353717      0000041100003633
25455697QOXA5011416607      0000018800007376
25455696QOXA7608439570      0000013000007463
25455716YXKH7747780955      0000037000006357
25455719QOXA2477859437      0000039300009840
25455723XFKD4995735686      0000045500009858
25455727JBDX3564523161      0000021300000639
25455742YXKH7095815077      0000023000003945
25455739YXKH5848658513      0000042700002084
25455766XFKD5919379575      0000022200003603
25455777XFKD8175209012      0000033300006350
25455788XFKD8139382011      0000034500007461
25455793QOXA8227838196      0000011600007081
25455784QOXA2072297264      0000017000004429
25455800XFKD6434550362      0000030000002409
25455801QOXA9672544909      0000039600001033
25455815QOXA6105975858      0000034800008373
25455814JBDX1316207043      0000026500005237
25455831YXKH6010226371      0000011400004945
25455838DOLR1387517499      0000046200006129
25455847YXKH3727923537      0000037400008061
25455873XFKD5727938304      0000048700007298

I have the following script:

#!/bin/bash
declare -A volumes
declare -A turnovers
declare -A orders

# Read the first file, remembering for each order the client id
while read -r line
do
        # Jump over comments
        if [[ ${line:0:1} == "#" ]] ; then continue; fi;

        details=($(echo $line | tr ',' " "))
        order_id=${details[3]}
        client_id=${details[1]}
        orders[$order_id]=$client_id
done < "$1"

echo "ClientID,OrderID,Price,Volume"
while read -r line
do
        # Jump over comments
        if [[ ${line:0:1} == "#" ]] ; then continue; fi;

        order_id=$(echo ${line:8:20} | tr -d '[:space:]')
        client_id=${orders[$order_id]}
        price=${line:28:8}
        volume=${line: -8}

        echo "$client_id,$order_id,$price,$volume"
        price=$(echo $price | awk '{printf "%d", $0}')
        volume=$(echo $volume | awk '{printf "%d", $0}')
        order_turnover=$(($price*$volume))

        old_turnover=${turnovers[$client_id]}
        [[ -z "$old_turnover" ]] && old_turnover=0
        total_turnover=$(($old_turnover+$order_turnover))
        turnovers[$client_id]=$total_turnover

        old_volumes=${volumes[$client_id]}
        [[ -z "$old_volumes" ]] && old_volumes=0
        total_volume=$((old_volumes+volume))
        volumes[$client_id]=$total_volume
done < "$2"

echo "ClientID,Volume,Turnover"
for client_id in "${!volumes[@]}"
do
        volume=${volumes[$client_id]}
        turnover=${turnovers[$client_id]}
        echo "$client_id,$volume,$turnover"
done

Can anyone think of something more elegant?

Thanks in advance

C

Tags: linux, awk

Solution


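Assumption 1: both files are ordered, so line x represents an earlier operation than line x+1, and trades appear in the same relative order as their orders. If that does not hold, further work is needed.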
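If the two files are not ordered the same way, one option is to drop the positional assumption and look each trade up by its Order ID instead. The following is a minimal sketch, not the method used in this answer. It reads the raw fixed-width trades format from the question and uses a small inline excerpt of the data. The temporary file names and the `report` variable are illustrative assumptions.

```shell
#!/bin/bash
# Sketch: join trades to orders by Order ID without relying on line order.
# The sample files are a short excerpt of the question's data; the file
# names under $tmp are hypothetical.
tmp=$(mktemp -d)

cat > "$tmp/orders.txt" <<'EOF'
Entry Time, Client ID, Security ID, Order ID
25455415,QOXA,IRUXb,QOXA6580494597
25455420,JBDX,BKNs,JBDX6760353333
EOF

# Trades deliberately listed in the opposite order to the orders file.
cat > "$tmp/trades.txt" <<'EOF'
# comment lines start with '#'
25455428JBDX6760353333      0000007000002458
25455416QOXA6580494597      0000013800001856
EOF

report=$(awk -F, '
    # First file (orders): remember the client for each order ID.
    NR == FNR { if (FNR > 1) client[$4] = $2; next }
    # Second file (trades): skip comments, then cut the fixed-width fields.
    /^#/ { next }
    {
        id     = substr($0, 9, 14)       # 14-character order ID
        price  = substr($0, 29, 8) + 0   # "+ 0" drops the zero padding
        volume = substr($0, 37, 8) + 0
        if (id in client)
            print client[id] "," id "," price "," volume
    }
' "$tmp/orders.txt" "$tmp/trades.txt")

echo "$report"
rm -rf "$tmp"
```

Because the lookup table is keyed by Order ID, the order of lines in either file no longer matters, at the cost of holding every order in memory.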

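Assumption 1 makes our work easier. First, change the separator in the trades file (called traders.txt in this answer) to a comma: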

sed -i 's/      /,/g' traders.txt

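For simplicity, this is done in place. You now have a comma-separated traders file, just like the orders file. This is assumption 2.

Continuing with the traders file: split out all the columns and add a header line (see note 1). More on why later. The script further down appears to assume that the leading # comment lines have already been removed from traders.txt, since it expects the first data line right after the header.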

gawk -i inplace -v INPLACE_SUFFIX=.bak 'BEGINFILE{FS=",";OFS=",";print "execution time,order ID,price,volume";}{print substr($1,1,8),substr($1,9),substr($2,1,8),substr($2,9)}' traders.txt
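If modifying the input in place is undesirable, the two preparation steps could be collapsed into a single non-destructive pass that cuts the fields by their documented widths (8, 14, 8 and 8 characters, with 6 padding spaces after the order ID). This is a hedged sketch using portable awk rather than the gawk inplace extension. The sample file and the `csv` variable are illustrative assumptions.

```shell
#!/bin/bash
# Sketch: convert the fixed-width trades file to CSV without touching the original.
tmp=$(mktemp -d)

cat > "$tmp/trades1.txt" <<'EOF'
# The first 8 characters is execution time in microseconds since midnight
25455416QOXA6580494597      0000013800001856
25455428JBDX6760353333      0000007000002458
EOF

csv=$(awk '
    BEGIN { OFS = ","; print "execution time,order ID,price,volume" }
    /^#/ { next }        # drop the comment lines
    NF {                 # ignore blank lines
        print substr($0, 1, 8), substr($0, 9, 14), substr($0, 29, 8), substr($0, 37, 8)
    }
' "$tmp/trades1.txt")

echo "$csv"
rm -rf "$tmp"
```

Redirecting this output to a new file gives the comma-separated form the rest of the answer expects, while the original stays available for re-runs.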

Ugly, but it works. Now process your data with the following awk script:

BEGIN { 
    FS=","
    OFS=","
}

{
    if (1 == NR) {
        getline line < TRADERS                              # consume title line
        print "Client ID,Order ID,Price,Volume,Turnover"    # print the output header; remove this print to omit it

        getline line < TRADERS                      # reads first data line
        split(line, transaction, ",")

        next
    }       

    if (transaction[2] == $4) {
        print $2, $4, transaction[3], transaction[4], transaction[3]*transaction[4]
        getline line < TRADERS                      # reads new data line
        split(line, transaction, ",")
    }

}

Invoke it with:

gawk -f script -v TRADERS=traders.txt orders.txt

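There you have it. Some caveats:

  1. Check the numbers, because gawk's implicit numeric conversion of zero-padded values may not give what you expect. There is a workaround if needed;

  2. getline may misbehave if we run out of lines in the traders file. I did not add any checks; that is up to you;

  3. There is no check on timestamps. Matching is based on Order ID only.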
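The first two caveats can be addressed with small changes. The sketch below is one possible variant, not the answer's original script: it checks the return value of getline so the merge stops cleanly at end of file, and adds `+ 0` to force numeric values without zero padding. The helper `next_trade`, the flag `have_trade`, and the sample files are assumptions made for illustration.

```shell
#!/bin/bash
# Sketch: the sequential merge with a guarded getline and explicit numeric conversion.
tmp=$(mktemp -d)

cat > "$tmp/orders.txt" <<'EOF'
Entry Time, Client ID, Security ID, Order ID
25455413,QOXA,AIDl,QOXA7176202067
25455415,QOXA,IRUXb,QOXA6580494597
25455420,JBDX,BKNs,JBDX6760353333
25455428,DOLR,AOAb,DOLR9093170513
EOF

# Trades in the comma-separated form with a header line, as prepared earlier.
cat > "$tmp/traders.txt" <<'EOF'
execution time,order ID,price,volume
25455416,QOXA6580494597,00000138,00001856
25455428,JBDX6760353333,00000070,00002458
EOF

matched=$(awk -F, -v OFS=, -v TRADERS="$tmp/traders.txt" '
    # Read the next trade into t[]; have_trade becomes 0 once the file is exhausted.
    function next_trade(   line) {
        have_trade = ((getline line < TRADERS) > 0)
        if (have_trade) split(line, t, ",")
    }
    NR == 1 { next_trade(); next_trade(); next }   # skip both headers, load the first trade
    have_trade && t[2] == $4 {
        # "+ 0" forces a number, which strips the zero padding.
        print $2, $4, t[3] + 0, t[4] + 0, t[3] * t[4]
        next_trade()
    }
' "$tmp/orders.txt")

echo "$matched"
rm -rf "$tmp"
```

Orders without a trade are skipped, and once the trades run out no further comparisons are made against stale data.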

Output file:

Client ID,Order ID,Price,Volume,Turnover
QOXA,QOXA6580494597,00000138,00001856,256128
JBDX,JBDX6760353333,00000070,00002458,172060
DOLR,DOLR9093170513,00000004,00003832,15328
QOXA,QOXA6563975285,00000347,00009428,3271516
QOXA,QOXA5569701531,00000075,00009023,676725
YXKH,YXKH2329856527,00000383,00009947,3809701
QOXA,QOXA5832015401,00000399,00006432,2566368
QOXA,QOXA3909507967,00000269,00001847,496843
DOLR,DOLR2262067037,00000347,00002732,948004
YXKH,YXKH2637373848,00000109,00006105,665445
DOLR,DOLR8639062962,00000275,00001975,543125
JBDX,JBDX7268036859,00000052,00004986,259272
JBDX,JBDX2840241688,00000379,00002029,768991
YXKH,YXKH4095442568,00000464,00002150,997600
JBDX,JBDX9945514579,00000408,00005904,2408832
QOXA,QOXA1900595629,00000152,00006866,1043632
JBDX,JBDX0126452783,00000017,00006615,112455
XFKD,XFKD4392227425,00000355,00009948,3531540
XFKD,XFKD1759342196,00000257,00007816,2008712
JBDX,JBDX8177118129,00000224,00000427,95648
YXKH,YXKH6485420018,00000390,00008327,3247530
JBDX,JBDX1612537068,00000137,00001422,194814
JBDX,JBDX2409813753,00000166,00003588,595608
XFKD,XFKD4861992028,00000176,00004552,801152
JBDX,JBDX0262502677,00000079,00003235,255565
JBDX,JBDX2587184235,00000243,00006723,1633689
XFKD,XFKD6374702721,00000464,00009451,4385264
JBDX,JBDX6885176695,00000109,00009258,1009122
JBDX,JBDX5424320405,00000054,00003618,195372
DOLR,DOLR8212353717,00000411,00003633,1493163
QOXA,QOXA5011416607,00000188,00007376,1386688
QOXA,QOXA7608439570,00000130,00007463,970190
YXKH,YXKH7747780955,00000370,00006357,2352090
QOXA,QOXA2477859437,00000393,00009840,3867120
XFKD,XFKD4995735686,00000455,00009858,4485390
JBDX,JBDX3564523161,00000213,00000639,136107
YXKH,YXKH7095815077,00000230,00003945,907350
YXKH,YXKH5848658513,00000427,00002084,889868
XFKD,XFKD5919379575,00000222,00003603,799866
XFKD,XFKD8175209012,00000333,00006350,2114550
XFKD,XFKD8139382011,00000345,00007461,2574045
QOXA,QOXA8227838196,00000116,00007081,821396
QOXA,QOXA2072297264,00000170,00004429,752930
XFKD,XFKD6434550362,00000300,00002409,722700
QOXA,QOXA9672544909,00000396,00001033,409068
QOXA,QOXA6105975858,00000348,00008373,2913804
JBDX,JBDX1316207043,00000265,00005237,1387805
YXKH,YXKH6010226371,00000114,00004945,563730
DOLR,DOLR1387517499,00000462,00006129,2831598
YXKH,YXKH3727923537,00000374,00008061,3014814
XFKD,XFKD5727938304,00000487,00007298,3554126
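The question also asks for the total volume and turnover per client, which the report above does not include. A minimal sketch of that aggregation follows, assuming rows in the "ClientID,OrderID,Price,Volume" form requested in the question. The three sample rows are taken from the question's data, and the `rows` and `totals` variables are illustrative names.

```shell
#!/bin/bash
# Sketch: per-client totals from ClientID,OrderID,Price,Volume rows.
rows='QOXA,QOXA6580494597,00000138,00001856
JBDX,JBDX6760353333,00000070,00002458
QOXA,QOXA6563975285,00000347,00009428'

totals=$(printf '%s\n' "$rows" | awk -F, '
    # Accumulate volume and turnover (price * volume) per client ID.
    { volume[$1] += $4; turnover[$1] += $3 * $4 }
    END {
        for (c in volume)
            printf "%s,%d,%d\n", c, volume[c], turnover[c]
    }
' | sort)   # awk iterates arrays in no particular order, so sort for stable output

echo "ClientID,Volume,Turnover"
echo "$totals"
```

The same two accumulator lines could be added to the matching script itself, with the END block printing the summary after the last matched trade.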

1: Requires gawk 4.1.0 or later

