dataframe - How to read HTML-formatted text data into a Pandas DataFrame?
Problem description
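I have the following sample data. I need to extract all the fields between <NetworkPayeeAddManager> and </NetworkPayeeAddManager>, and between <PayeeAddManager> and </PayeeAddManager>, and save all of the information to a Pandas DataFrame. I want the output DataFrame to have columns from the first "TenantId" through the last "AccountNumber", with values such as 13744 and XX2222. What is the best way to do this?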
OrderedDict([('_raw', '2021-11-08 08:58:23,832 [42] INFO FiservLog.stdlog - <NetworkPayeeAddManager><TenantId>13744</TenantId><UserId>999176993878</UserId><SourceMethodName>LogInfoSecure</SourceMethodName><SourceLineNumber>234</SourceLineNumber><Message>NetworkPayee was added successfully</Message><Timestamp>2021-11-08T13:58:23.831628Z</Timestamp><Exception /><AdditionalInformation><SessionId>F7E65ED4D8C74E6699C62F23ECF5D000200TWNQ9X1AA1754513234A6367FEE06</SessionId><Timestamp>11/8/2021 1:58:23 PM</Timestamp><CorrelationId>2461b5d9839a46739e9a3e918ca0681b-01</CorrelationId><PayeeName>Louisville fire brick</PayeeName><Address>{"Address1":"Po 9229","Address2":null,"City":"Louisville","State":"KY","Zip5":"40209","Zip4":null,"Zip2":null}</Address><PayeeType>UnManagedPayee</PayeeType><AccountNumber>XX2222</AccountNumber></AdditionalInformation></NetworkPayeeAddManager>')])
OrderedDict([('_raw', '2021-11-08 08:58:24,783 [105] INFO FiservLog.stdlog - <PayeeAddManager><TenantId>DI737</TenantId><UserId>344801483</UserId> <SourceMethodName>LogInfoSecure</SourceMethodName><SourceLineNumber>234</SourceLineNumber><Message>Payee was added successfully</Message><Timestamp>2021-11-08T13:58:24.7831103Z</Timestamp><Exception /><AdditionalInformation><SessionId>7FC6442718864CE4838E50B026C8D0A0000TWNXSV1721BE0D804F295706DD39E</SessionId><Timestamp>11/8/2021 1:58:24 PM</Timestamp><CorrelationId>ab33b59c-756e-4144-ad62-6f0afadbe8eb</CorrelationId><PayeeName>Gail Nezworski</PayeeName><Address>{"Address1":"2280 S 460 E","Address2":null,"City":"LaGrange","State":"IN","Zip5":"46761","Zip4":null,"Zip2":null}</Address><PayeeType>UnManagedPayee</PayeeType><AccountNumber>XXXXX1888</AccountNumber></AdditionalInformation></PayeeAddManager>')])
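Both sample lines share one layout: a log prefix ending in "FiservLog.stdlog - ", then a single XML element whose root tag is either NetworkPayeeAddManager or PayeeAddManager, with some fields nested inside <AdditionalInformation>. The sketch below uses only Python's standard library (xml.etree.ElementTree) to flatten one such line into a dict. The helper names flatten and parse_log_line are hypothetical, the code assumes the log prefix never contains a "<" character, and the sample string is a shortened copy of the first line above.

```python
import json
import xml.etree.ElementTree as ET


def flatten(elem: ET.Element, out: dict) -> dict:
    """Copy the text of every leaf element under `elem` into `out`."""
    for child in elem:
        if len(child):
            # Container such as <AdditionalInformation>: recurse into it.
            flatten(child, out)
        else:
            # <Timestamp> occurs at the top level and again inside
            # <AdditionalInformation>; qualify the repeat with its parent tag.
            key = child.tag if child.tag not in out else f"{elem.tag}.{child.tag}"
            out[key] = child.text  # None for empty tags such as <Exception />
    return out


def parse_log_line(raw: str) -> dict:
    """Hypothetical helper: turn one '_raw' log line into a flat dict."""
    # Assumption: the log prefix contains no '<', so the XML starts at the first '<'.
    root = ET.fromstring(raw[raw.index("<"):])
    return flatten(root, {"RecordType": root.tag})


# Shortened copy of the first sample line (some fields omitted).
sample = (
    "2021-11-08 08:58:23,832 [42] INFO FiservLog.stdlog - "
    "<NetworkPayeeAddManager><TenantId>13744</TenantId>"
    "<Timestamp>2021-11-08T13:58:23.831628Z</Timestamp><Exception />"
    "<AdditionalInformation><Timestamp>11/8/2021 1:58:23 PM</Timestamp>"
    '<Address>{"Address1":"Po 9229","City":"Louisville","State":"KY"}</Address>'
    "<AccountNumber>XX2222</AccountNumber></AdditionalInformation>"
    "</NetworkPayeeAddManager>"
)

record = parse_log_line(sample)
address = json.loads(record["Address"])  # the <Address> text is a JSON object
print(record["TenantId"], record["AccountNumber"], address["City"])
# prints: 13744 XX2222 Louisville
```

Because the text of <Address> is itself JSON, json.loads turns it into a dict whose keys (Address1, City, State, Zip5 and so on in the full lines) could become extra columns if needed.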
Solution
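A minimal sketch of one possible approach, assuming the records arrive as a list of OrderedDicts keyed by '_raw' like the samples in the question: a regular expression picks out every leaf element of the form <Tag>text</Tag>, which yields the fields in document order from TenantId through AccountNumber, and pandas.DataFrame then builds one row per record. The names leaf_fields and to_frame and the Timestamp_2 column name are hypothetical choices of this sketch. A regex is a shortcut rather than a real XML parser: it does not decode entities such as &amp; and would miss values containing "<", so xml.etree.ElementTree is the safer choice if the payloads vary.

```python
import re
from collections import OrderedDict

import pandas as pd

# Matches leaf elements only: <Tag>text</Tag> with no '<' inside the text.
# The root tag, <AdditionalInformation> and self-closing tags such as
# <Exception /> never match, so the fields come out in document order.
LEAF = re.compile(r"<(\w+)>([^<]*)</\1>")


def leaf_fields(raw: str) -> dict:
    """Hypothetical helper: map leaf tag -> text for one log line."""
    fields = {}
    for tag, text in LEAF.findall(raw):
        key, n = tag, 2
        while key in fields:  # the second <Timestamp> becomes Timestamp_2
            key, n = f"{tag}_{n}", n + 1
        fields[key] = text
    return fields


def to_frame(rows) -> pd.DataFrame:
    """Assumes each row is a mapping holding the log line under '_raw'."""
    return pd.DataFrame([leaf_fields(row["_raw"]) for row in rows])


# Shortened copies of the two sample lines (most fields omitted).
rows = [
    OrderedDict([("_raw",
        "2021-11-08 08:58:23,832 [42] INFO FiservLog.stdlog - "
        "<NetworkPayeeAddManager><TenantId>13744</TenantId>"
        "<Timestamp>2021-11-08T13:58:23.831628Z</Timestamp><Exception />"
        "<AdditionalInformation><Timestamp>11/8/2021 1:58:23 PM</Timestamp>"
        "<AccountNumber>XX2222</AccountNumber></AdditionalInformation>"
        "</NetworkPayeeAddManager>")]),
    OrderedDict([("_raw",
        "2021-11-08 08:58:24,783 [105] INFO FiservLog.stdlog - "
        "<PayeeAddManager><TenantId>DI737</TenantId>"
        "<Timestamp>2021-11-08T13:58:24.7831103Z</Timestamp><Exception />"
        "<AdditionalInformation><Timestamp>11/8/2021 1:58:24 PM</Timestamp>"
        "<AccountNumber>XXXXX1888</AccountNumber></AdditionalInformation>"
        "</PayeeAddManager>")]),
]

df = to_frame(rows)
print(df)
```

With the full log lines, the Address column holds JSON strings; if separate address columns are wanted, pd.json_normalize(df["Address"].map(json.loads).tolist()) (after import json) expands them into Address1, City, State and so on, which can then be concatenated onto df.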