pandas - Importing a JSON-like file with read_csv
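Problem description
I want to import data from a .txt file into a DataFrame. I can't import it with the classic pd.read_csv, and trying different sep values raises errors. The data I want to import, Cell_Phones_&_Accessories.txt.gz, is in the following format: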
product/productId: B000JVER7W
product/title: Mobile Action MA730 Handset Manager - Bluetooth Data Suite
product/price: unknown
review/userId: A1RXYH9ROBAKEZ
review/profileName: A. Igoe
review/helpfulness: 0/0
review/score: 1.0
review/time: 1233360000
review/summary: Don't buy!
review/text: First of all, the company took my money and sent me an email telling me the product was shipped. A week and a half later I received another email telling me that they are sorry, but they don't actually have any of these items, and if I received an email telling me it has shipped, it was a mistake.When I finally got my money back, I went through another company to buy the product and it won't work with my phone, even though it depicts that it will. I have sent numerous emails to the company - I can't actually find a phone number on their website - and I still have not gotten any kind of response. What kind of customer service is that? No one will help me with this problem. My advice - don't waste your money!
product/productId: B000JVER7W
product/title: Mobile Action MA730 Handset Manager - Bluetooth Data Suite
product/price: unknown
....
Solution
You can use ¥ as the separator, so that each line is read as a single field, then split on the first : and pivot:
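import pandas as pd

# Read each line as one field; '¥' is assumed not to occur anywhere in the data.
df = pd.read_csv('Cell_Phones_&_Accessories.txt', sep='¥', names=['data'], engine='python')
# Split on the first ':' only, so colons inside values are preserved.
df1 = df.pop('data').str.split(':', n=1, expand=True)
df1.columns = ['a', 'b']
# A new record starts at every 'product/productId' line.
df1 = df1.assign(c=(df1['a'] == 'product/productId').cumsum())
df1 = df1.pivot(index='c', columns='a', values='b')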
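To see what the split, cumsum and pivot steps produce, here is a minimal sketch on an in-memory sample. The sample is a shortened copy of the records in the question (the second record is cut off there, so it has no review fields), and the names sample, raw, parts and wide are only illustrative.

```python
import io
import pandas as pd

# Shortened copy of the question's data: the second record lacks review fields.
sample = io.StringIO(
    "product/productId: B000JVER7W\n"
    "product/price: unknown\n"
    "review/score: 1.0\n"
    "product/productId: B000JVER7W\n"
    "product/price: unknown\n"
)

# One field per line, because '¥' does not occur in the sample.
raw = pd.read_csv(sample, sep="¥", names=["data"], engine="python")

# Split each line into key and value at the first ':'.
parts = raw["data"].str.split(":", n=1, expand=True)
parts.columns = ["a", "b"]
parts["b"] = parts["b"].str.strip()  # drop the space that follows the colon

# Number the records: the counter increases at every 'product/productId' line.
parts["c"] = (parts["a"] == "product/productId").cumsum()

# One row per record, one column per key; missing keys become NaN.
wide = parts.pivot(index="c", columns="a", values="b")
print(wide)
```

Keys that a record lacks simply come out as NaN, so this approach should tolerate incomplete records as long as every record starts with product/productId.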
A pure Python solution for better performance, using defaultdict and the DataFrame constructor:
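from collections import defaultdict
import pandas as pd

data = defaultdict(list)
with open("Cell_Phones_&_Accessories.txt") as f:
    for line in f:
        # Skip blank lines between records.
        if len(line) > 1:
            # Split on the first ':' only; the value keeps any later colons.
            key, value = line.strip().split(':', 1)
            data[key].append(value.strip())
# The constructor needs every key to occur the same number of times.
df = pd.DataFrame(data)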
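Since the file in the question is a .txt.gz, a variant that reads the archive directly and builds one dict per record may be useful. This is only a sketch under some assumptions: parse_records and read_gz are hypothetical helper names, the encoding is assumed to be UTF-8, every record is assumed to start with product/productId, and each field is assumed to fit on one line.

```python
import gzip

def parse_records(lines, first_key="product/productId"):
    """Group 'key: value' lines into one dict per record."""
    records = []
    current = None
    for line in lines:
        line = line.strip()
        if ":" not in line:
            continue  # skip blank or malformed lines
        key, value = line.split(":", 1)
        # Start a new record at the first line and at every first_key line.
        if key == first_key or current is None:
            current = {}
            records.append(current)
        current[key] = value.strip()
    return records

def read_gz(path, encoding="utf-8"):
    """Parse a gzip-compressed file without unpacking it first."""
    # The encoding is an assumption; undecodable bytes are replaced, not fatal.
    with gzip.open(path, "rt", encoding=encoding, errors="replace") as f:
        return parse_records(f)
```

With these helpers, pd.DataFrame(read_gz("Cell_Phones_&_Accessories.txt.gz")) should give one row per review, and records that lack a field get NaN in that column instead of breaking the constructor.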