How do I remove repeated text from a given txt file in Python?

Problem description

I'm new to Python. I have a txt file that I read with pd.read_csv('transactions.txt'). It has more than 80,000 rows like these (I'm showing 2 rows here):

{"accountNumber": "737265056", "customerId": "737265056", "creditLimit": 5000.0, "availableMoney": 5000.0, "transactionDateTime": "2016-08-13T14:27:32", "transactionAmount": 98.55, "merchantName": "Uber", "acqCountry": "US", "merchantCountryCode": "US", "posEntryMode": "02", "posConditionCode": "01", "merchantCategoryCode": "rideshare", "currentExpDate": "06/2023", "accountOpenDate": "2015-03-14", "dateOfLastAddressChange": "2015-03-14", "cardCVV": "414", "enteredCVV": "414", "cardLast4Digits": "1803", "transactionType": "PURCHASE", "echoBuffer": "", "currentBalance": 0.0, "merchantCity": "", "merchantState": "", "merchantZip": "", "cardPresent": false, "posOnPremises": "", "recurringAuthInd": "", "expirationDateKeyInMatch": false, "isFraud": false}

{"accountNumber": "737265056", "customerId": "737265056", "creditLimit": 5000.0, "availableMoney": 5000.0, "transactionDateTime": "2016-10-11T05:05:54", "transactionAmount": 74.51, "merchantName": "AMC #191138", "acqCountry": "US", "merchantCountryCode": "US", "posEntryMode": "09", "posConditionCode": "01", "merchantCategoryCode": "entertainment", "cardPresent": true, "currentExpDate": "02/2024", "accountOpenDate": "2015-03-14", "dateOfLastAddressChange": "2015-03-14", "cardCVV": "486", "enteredCVV": "486", "cardLast4Digits": "767", "transactionType": "PURCHASE", "echoBuffer": "", "currentBalance": 0.0, "merchantCity": "", "merchantState": "", "merchantZip": "", "posOnPremises": "", "recurringAuthInd": "", "expirationDateKeyInMatch": false, "isFraud": false}

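As you can see, each row is enclosed in {}.

Please help me remove the variable names that are repeated in every row, while keeping those variable names as the column headers.

Thanks in advance.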

Tags: python, pandas

Solution


I bet this isn't a CSV. If it is, it is very badly formatted, because every row is just a single column holding one huge JSON payload.
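A quick way to test that guess is to parse a line with the standard json module. The sketch below uses shortened, hypothetical stand-ins for the two sample rows (only a few of the fields) rather than the real file:

```python
import json

# Shortened stand-ins for two rows of the file (hypothetical, only a few fields).
sample_lines = [
    '{"accountNumber": "737265056", "transactionAmount": 98.55, "merchantName": "Uber", "isFraud": false}',
    '',  # blank separator line, as in the question
    '{"accountNumber": "737265056", "transactionAmount": 74.51, "merchantName": "AMC #191138", "isFraud": false}',
]

# Parse every non-empty line as its own JSON document.
records = [json.loads(line) for line in sample_lines if line.strip()]

# The keys of each object are the would-be column headers.
columns = list(records[0].keys())
print(columns)
```

If every line parses this way, the file is most likely in the "one JSON object per line" layout rather than CSV.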

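Assuming this is actually a JSON file with one object per line, as in your sample, use read_json with lines=True instead:

df = pandas.read_json('file.txt', lines=True)  # lines=True: parse one JSON object per line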

You will get the expected output, with the field names as the columns.
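As a minimal sketch of what that looks like, the example below feeds two shortened, hypothetical rows through an in-memory buffer instead of the real file. It assumes the one-object-per-line layout shown in the question, which is why lines=True is passed:

```python
import io

import pandas as pd

# Two shortened rows in "one JSON object per line" form (hypothetical sample data).
data = io.StringIO(
    '{"accountNumber": "737265056", "transactionAmount": 98.55, "merchantName": "Uber", "isFraud": false}\n'
    '{"accountNumber": "737265056", "transactionAmount": 74.51, "merchantName": "AMC #191138", "isFraud": false}\n'
)

# lines=True tells pandas to treat each line as a separate JSON record.
df = pd.read_json(data, lines=True)

# The JSON keys appear once, as column headers, instead of in every row.
print(df.columns.tolist())
print(df.shape)
```

Note that read_json infers dtypes by default, so string fields that look numeric (for example accountNumber or posEntryMode) may come back as numbers; passing dtype=False or a per-column dtype dict is one way to keep them as strings.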

Reference: https://pandas.pydata.org/pandas-docs/version/0.24.2/reference/api/pandas.read_json.html

