How to read HTML-formatted text data into a Pandas DataFrame?

Question

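I have the sample data below. I need to extract all fields between <NetworkPayeeAddManager></NetworkPayeeAddManager> and between <PayeeAddManager></PayeeAddManager>, and save all of the information to a Pandas DataFrame. I want the output DataFrame to have columns from the first "TenantId" through the last "AccountNumber", with values such as 13744 and XX2222. What is the best way to do this?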

OrderedDict([('_raw', '2021-11-08 08:58:23,832 [42] INFO  FiservLog.stdlog - <NetworkPayeeAddManager><TenantId>13744</TenantId><UserId>999176993878</UserId><SourceMethodName>LogInfoSecure</SourceMethodName><SourceLineNumber>234</SourceLineNumber><Message>NetworkPayee was added successfully</Message><Timestamp>2021-11-08T13:58:23.831628Z</Timestamp><Exception /><AdditionalInformation><SessionId>F7E65ED4D8C74E6699C62F23ECF5D000200TWNQ9X1AA1754513234A6367FEE06</SessionId><Timestamp>11/8/2021 1:58:23 PM</Timestamp><CorrelationId>2461b5d9839a46739e9a3e918ca0681b-01</CorrelationId><PayeeName>Louisville fire brick</PayeeName><Address>{"Address1":"Po 9229","Address2":null,"City":"Louisville","State":"KY","Zip5":"40209","Zip4":null,"Zip2":null}</Address><PayeeType>UnManagedPayee</PayeeType><AccountNumber>XX2222</AccountNumber></AdditionalInformation></NetworkPayeeAddManager>')])
OrderedDict([('_raw', '2021-11-08 08:58:24,783 [105] INFO  FiservLog.stdlog - <PayeeAddManager><TenantId>DI737</TenantId><UserId>344801483</UserId> <SourceMethodName>LogInfoSecure</SourceMethodName><SourceLineNumber>234</SourceLineNumber><Message>Payee was added successfully</Message><Timestamp>2021-11-08T13:58:24.7831103Z</Timestamp><Exception /><AdditionalInformation><SessionId>7FC6442718864CE4838E50B026C8D0A0000TWNXSV1721BE0D804F295706DD39E</SessionId><Timestamp>11/8/2021 1:58:24 PM</Timestamp><CorrelationId>ab33b59c-756e-4144-ad62-6f0afadbe8eb</CorrelationId><PayeeName>Gail Nezworski</PayeeName><Address>{"Address1":"2280 S 460 E","Address2":null,"City":"LaGrange","State":"IN","Zip5":"46761","Zip4":null,"Zip2":null}</Address><PayeeType>UnManagedPayee</PayeeType><AccountNumber>XXXXX1888</AccountNumber></AdditionalInformation></PayeeAddManager>')])

Tags: dataframe

Solution
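A minimal sketch, assuming each record is an OrderedDict with a '_raw' string exactly as in the sample data: pull the XML fragment out of the log line with a regular expression, parse it with the standard-library xml.etree.ElementTree, and flatten the leaf elements into one dict per record. The helper names (XML_RE, flatten, parse_raw) and the extra RecordType column are hypothetical additions for illustration. Because <Timestamp> appears both at the top level and inside <AdditionalInformation>, the nested one is stored as AdditionalInformation_Timestamp so it does not overwrite the first.

```python
import re
import xml.etree.ElementTree as ET
from collections import OrderedDict

import pandas as pd

# The two sample records from the question (assumed input shape: OrderedDict with '_raw').
records = [
    OrderedDict([('_raw', '2021-11-08 08:58:23,832 [42] INFO  FiservLog.stdlog - <NetworkPayeeAddManager><TenantId>13744</TenantId><UserId>999176993878</UserId><SourceMethodName>LogInfoSecure</SourceMethodName><SourceLineNumber>234</SourceLineNumber><Message>NetworkPayee was added successfully</Message><Timestamp>2021-11-08T13:58:23.831628Z</Timestamp><Exception /><AdditionalInformation><SessionId>F7E65ED4D8C74E6699C62F23ECF5D000200TWNQ9X1AA1754513234A6367FEE06</SessionId><Timestamp>11/8/2021 1:58:23 PM</Timestamp><CorrelationId>2461b5d9839a46739e9a3e918ca0681b-01</CorrelationId><PayeeName>Louisville fire brick</PayeeName><Address>{"Address1":"Po 9229","Address2":null,"City":"Louisville","State":"KY","Zip5":"40209","Zip4":null,"Zip2":null}</Address><PayeeType>UnManagedPayee</PayeeType><AccountNumber>XX2222</AccountNumber></AdditionalInformation></NetworkPayeeAddManager>')]),
    OrderedDict([('_raw', '2021-11-08 08:58:24,783 [105] INFO  FiservLog.stdlog - <PayeeAddManager><TenantId>DI737</TenantId><UserId>344801483</UserId> <SourceMethodName>LogInfoSecure</SourceMethodName><SourceLineNumber>234</SourceLineNumber><Message>Payee was added successfully</Message><Timestamp>2021-11-08T13:58:24.7831103Z</Timestamp><Exception /><AdditionalInformation><SessionId>7FC6442718864CE4838E50B026C8D0A0000TWNXSV1721BE0D804F295706DD39E</SessionId><Timestamp>11/8/2021 1:58:24 PM</Timestamp><CorrelationId>ab33b59c-756e-4144-ad62-6f0afadbe8eb</CorrelationId><PayeeName>Gail Nezworski</PayeeName><Address>{"Address1":"2280 S 460 E","Address2":null,"City":"LaGrange","State":"IN","Zip5":"46761","Zip4":null,"Zip2":null}</Address><PayeeType>UnManagedPayee</PayeeType><AccountNumber>XXXXX1888</AccountNumber></AdditionalInformation></PayeeAddManager>')]),
]

# Match either root element, non-greedily, up to its own closing tag (\1).
XML_RE = re.compile(r"<(NetworkPayeeAddManager|PayeeAddManager)>.*?</\1>", re.DOTALL)


def flatten(elem, row, parent=None):
    """Copy leaf elements into `row`; a repeated tag under a nested parent gets a prefix."""
    for child in elem:
        if len(child):  # element has sub-elements, e.g. <AdditionalInformation>
            flatten(child, row, parent=child.tag)
        else:
            key = child.tag
            if key in row and parent is not None:
                key = f"{parent}_{key}"  # e.g. AdditionalInformation_Timestamp
            row[key] = child.text  # empty elements such as <Exception /> give None
    return row


def parse_raw(raw):
    """Return one flat dict for a log line, or None if it has no matching XML."""
    match = XML_RE.search(raw)
    if match is None:
        return None
    root = ET.fromstring(match.group(0))
    return flatten(root, {"RecordType": root.tag})


rows = [r for r in (parse_raw(rec["_raw"]) for rec in records) if r is not None]
df = pd.DataFrame(rows)
print(df[["RecordType", "TenantId", "AccountNumber"]])
```

Every value comes out as a string (for example, TenantId is '13744' and AccountNumber is 'XX2222' in the first row), so convert types with pd.to_numeric or pd.to_datetime where needed. The Address column still holds the raw JSON text; json.loads could split it into separate fields if required.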

