Extracting relevant information from text strings in Python

Problem description

I have a bunch of text strings in the following format:

b' ABC Singapore Pte Ltd\n SG-Singapore - Co-Load\n No 30, Carta Avenue\n #01-06 and 02-06\n 4981 SINGAPORE    \n Singapore  \n Tel : +65 1234 56778  \n\nINVOICE ABC00518941 Copy\n\ABC PROJECT\n9 CHANGI SOUTH \nTAMPINES 486345 \n \n \n\nINVOICE DATE\nCUSTOMER ID\nSHIPMENT\nDUE DATE\nTERMS\nINCOTERM\n\nPage 1 of 1\n\n10-Oct-19 \nSGARSCDISIN\nSSISA017170\n09-Nov-19 \n30 days from Inv. Date\nDAP - Delivered At Place\nC06273660\n\n**Try the ABC e-Booking tool available for you through the IRIS portal** CONSOL NUMBER\n\n********************************************************************************************************************\n\nSHIPMENT DETAILS\nSHIPPER\nKUSU PTE LTD C/O ABC PTE LTD\nREFERENCE\n2500440249, 227364\nGOODS DESCRIPTION\nCABLE, 2M C14-C13 WHITE/   CABLE, 2M C14-C13 BLACK/   ROUTER, PTX1000 SYSTEM WITH 72-PORT 40G QSFP+ / 24-PORT 100G \nQSFP28 / 288-PORT 10G SFP+ WITH 4 1600W AC POWER SUPPLIES AND 3 FAN TRAYS/   INVOICE NO: 227364\nIMPORT CUSTOMS BROKER\n\nCONSIGNEE\nXYZ HONG KONG LIMITED\n\nIssued By John John@ABC.com\n\nWEIGHT\n90.000 KG\n\nVOLUME\n0.834 M3\nMAWB\n61872336191\n\nCHARGEABLE\n139.000 KG\n\nPACKAGES\n1 PLT\n\nHAWB\nSISA19017170\n\nETD 20-Sep-19\nETD 21-09-19 14:30\n\nGOODS DELIVERED TO\n\nDESTINATION\n\nETA 25-09-19 10:00\nHKHKG = Hong Kong, Hong Kong ETA 21-09-19 18:40\n\nFLIGHT / DATE\n / SQ865 / \n\nGOODS COLLECTED FROM SGSIN = Singapore , Singapore\nSGSIN = Singapore, Singapore\nORIGIN\nCHARGES\n\nDESCRIPTION\n\nFreight charge - 139 KG @ USD 0.70/KG \n\nWarehouse Handling - Gateway Fee \n\nHandling - Origin Handling \n\nDelivery Cartage \n\nDocumentation fee \n\nGST IN USD\n\nCHARGES IN USD\n\nZero Rated\n\nZero Rated\n\nZero Rated\n\nZero Rated\n\nZero Rated\n\n97.30\n\n50.00\n\n65.00\n\n75.00\n\n32.50\n\nTOTAL CHARGES\nPlease contact us within 7 days should there be any discrepancies.\n\nInterest rate of 1.5% per month will be charged on overdue invoices.\n\nNEW: Payment available via HSBC PayNow\n\nSUBTOTAL\nADD GST\n\nTOTAL 
USD\n\n319.80\n0.00\n\n319.80\n

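I want to extract certain fields, each consisting of a header name and its corresponding value.

For example:

I want (some of the fields):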

**INVOICE** ABC00518941
**CUSTOMER ID**  SGARSCDISIN
**SHIPMENT** SSISA017170
**WEIGHT** 90.000
**VOLUME** 0.834
**CHARGEABLE** 139.000 KG
**PACKAGES** 1 PLT
**FREIGHT CHARGE** 97.30
**WAREHOUSE HANDLING- GATEWAY FEE** 50.00
**HANDLING- ORIGIN HANDLING** 65.00
**DELIVERY CHARGE** 75.00

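The first item in each pair is the actual field name (the asterisks are only there for bolding), and the second is that field's value. As long as I can extract pairs like these, a list of lists, a list of tuples, or a dictionary would all be fine. I can format them later and store them as columns and values.

Thanks

Tags: python, regex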

Solution
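
One possible approach, offered as a minimal sketch rather than a definitive answer: the invoice text mixes several layouts, so no single regular expression covers every field. The sketch below handles four cases seen in the sample: a value on the same line as its label (`INVOICE`), a value on the line after its label (`WEIGHT`, `VOLUME`, `CHARGEABLE`, `PACKAGES`), a block of labels followed later by a block of values in the same order (`CUSTOMER ID`, `SHIPMENT`), and a list of charge descriptions followed by a list of amounts. The function name `extract_fields` is hypothetical, `sample` is a shortened excerpt of the string in the question, and the code assumes every invoice follows the same layout as that one example.

```python
import re

# Shortened excerpt of the invoice text from the question.
sample = (
    "INVOICE ABC00518941 Copy\n\n"
    "INVOICE DATE\nCUSTOMER ID\nSHIPMENT\nDUE DATE\nTERMS\nINCOTERM\n\n"
    "Page 1 of 1\n\n"
    "10-Oct-19 \nSGARSCDISIN\nSSISA017170\n09-Nov-19 \n"
    "30 days from Inv. Date\nDAP - Delivered At Place\nC06273660\n\n"
    "WEIGHT\n90.000 KG\n\nVOLUME\n0.834 M3\nMAWB\n61872336191\n\n"
    "CHARGEABLE\n139.000 KG\n\nPACKAGES\n1 PLT\n\n"
    "CHARGES\n\nDESCRIPTION\n\n"
    "Freight charge - 139 KG @ USD 0.70/KG \n\n"
    "Warehouse Handling - Gateway Fee \n\n"
    "Handling - Origin Handling \n\n"
    "Delivery Cartage \n\n"
    "Documentation fee \n\n"
    "GST IN USD\n\nCHARGES IN USD\n\n"
    "Zero Rated\n\nZero Rated\n\nZero Rated\n\nZero Rated\n\nZero Rated\n\n"
    "97.30\n\n50.00\n\n65.00\n\n75.00\n\n32.50\n\n"
    "TOTAL CHARGES\n"
)


def extract_fields(text):
    """Return a dict of field name -> value (assumes the sample's layout)."""
    if isinstance(text, bytes):
        text = text.decode("utf-8", errors="replace")
    fields = {}

    # Case 1: value on the same line as the label. Requiring a digit in the
    # value keeps the "INVOICE DATE" header line from matching.
    m = re.search(r"^INVOICE[ \t]+([A-Z]*\d\S*)", text, flags=re.M)
    if m:
        fields["INVOICE"] = m.group(1)

    # Case 2: label alone on one line, value on the next line.
    for label in ("WEIGHT", "VOLUME", "CHARGEABLE", "PACKAGES"):
        pattern = rf"^{re.escape(label)}[ \t]*\n[ \t]*(\S.*?)[ \t]*$"
        m = re.search(pattern, text, flags=re.M)
        if m:
            fields[label] = m.group(1)

    # Case 3: a block of labels, then a later block of values in the same
    # order. Blocks are runs of non-blank lines separated by blank lines.
    blocks = [
        [ln.strip() for ln in block.splitlines() if ln.strip()]
        for block in re.split(r"\n\s*\n", text)
    ]
    for i, block in enumerate(blocks):
        if "CUSTOMER ID" in block:
            # Take the first later block that is long enough to hold the
            # values (this skips short blocks such as "Page 1 of 1").
            for later in blocks[i + 1:]:
                if len(later) >= len(block):
                    fields.update(zip(block, later))
                    break
            break

    # Case 4: charge descriptions and amounts listed separately, same order.
    names = re.search(r"^DESCRIPTION[ \t]*\n(.*?)^GST IN USD", text,
                      flags=re.M | re.S)
    amounts = re.search(r"^CHARGES IN USD[ \t]*\n(.*?)^TOTAL CHARGES", text,
                        flags=re.M | re.S)
    if names and amounts:
        name_list = [ln.strip() for ln in names.group(1).splitlines()
                     if ln.strip()]
        amount_list = re.findall(r"^[ \t]*(\d[\d,]*\.\d{2})[ \t]*$",
                                 amounts.group(1), flags=re.M)
        fields.update(zip(name_list, amount_list))

    return fields


result = extract_fields(sample)
for name, value in result.items():
    print(f"{name}: {value}")
```

The result is a plain dictionary, so `list(result.items())` gives the list of tuples mentioned in the question. The charge names are kept exactly as they appear in the text (for example `Delivery Cartage`), and values keep their units; both can be normalized afterwards. Because cases 3 and 4 pair items purely by position, an invoice with a missing or extra line would be paired incorrectly, so the output is worth spot-checking against a few real documents.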

