python - Python Split on \t 'fooled' by text with )
Problem description
I have some Python code that downloads an Amazon report, gets it as a bytes object, and parses it into individual lines by looking for \n. It mostly works well, but one block of text seems to fool the line split: it appears to break at the text (120ml).
Code
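import pprint

pp = pprint.PrettyPrinter()

# report_api and ReportID are assumed to be set up elsewhere
report = report_api.get_report(report_id=ReportID)
report_as_dict = report.parsed  # a bytes object, despite the name
pp.pprint(report_as_dict)
line_split = report_as_dict.split(b'\n')
for line in line_split[1:]:  # skip the header row
    pp.pprint(line)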
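To check whether the split itself keeps the (120ml) text together, here is a minimal, self-contained sketch of the same approach on a hypothetical, shortened report. The column names item-name, seller-sku and price and all row values are made up for illustration; the sketch only assumes the report is tab-separated with a header row, as the sample suggests.

```python
# Hypothetical, shortened stand-in for report.parsed: a header row plus two
# tab-separated data rows, each terminated by b'\n'.
raw = (
    b'item-name\tseller-sku\tprice\n'
    b'Multi-Purpose Solution, ONE 8 fl oz (120ml) bottle\t0P-avac2A-38\t19.99\n'
    b'Cat Treats, 6 pack\tSKU-2\t4.99\n'
)

# Strip the final b'\n' so split() does not leave an empty last element.
lines = raw.rstrip(b'\n').split(b'\n')
header = lines[0].split(b'\t')

# Map each data row's tab-separated fields onto the header names.
rows = [dict(zip(header, line.split(b'\t'))) for line in lines[1:]]

print(len(rows))  # 2 data rows
for row in rows:
    # print() shows each bytes value on one line; it does not wrap long values.
    print(row[b'item-name'])
```

The whole item name, including "(120ml) bottle", ends up in a single field.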
Sample of the pprint output for report_as_dict:
b'elete\tpending-quantity\tfulfillment-channel\tmerchant-shipping-group\nMenic'
b'on Unique ab Multi-Purpose Solution + abc Case, ONE 8 fl oz (120ml) bot'
b'tle\t\t012312VTS55\t0P-avac2A-38\t19.99\t\t2019-03-19 13:43:38 PDT\t\ty\t'
b'1\t\t\t11\t\t\t\tB00E3GXZJA\t\t\t\t\t\tB00E3GXZJA\t\t\t\tAMAZON_NA\tMigrat'
b'ed Template\nRed Barn Naturals Cat Treats, 6 pack\t\t0'
Sample of the split: it mostly splits properly on \n, but there appears to be one extra split around the text that was (120ml). The ') bottle' should be part of the line above:
[b'Menion on Unique ab Multi-Purpose Solution + abc Case, ONE 8 fl oz (120ml'
b') bottle',
b'',
b'012312VTS55',
b'0P-avac2A-38',
Solution
There's no actual extra split there. That's just pprint doing something confusing.
See how there's no comma between ...(120ml' and b') bottle'? In Python source code, two bytestring literals with no other tokens between them are implicitly concatenated into a single bytestring. (This also happens with regular Unicode strings.) Try it for yourself:
>>> b'a' b'b'
b'ab'
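A slightly longer sketch of the same rule, runnable as a script: adjacent literals become one object when the source is compiled, so a list written this way has fewer elements than it has literals. It also shows that bytes and str literals cannot be concatenated this way. The variable names are only for illustration.

```python
# Adjacent bytes literals are merged by the compiler into ONE object.
joined = b'(120ml' b') bottle'
print(joined)  # b'(120ml) bottle'

# Three literals, but only two list elements: the first two are merged.
items = [b'(120ml' b') bottle', b'']
print(len(items))  # 2

# The same rule applies to str literals.
assert 'a' 'b' == 'ab'

# Mixing a bytes literal and a str literal is rejected at compile time.
try:
    compile("b'a' 'b'", "<example>", "eval")
    mixing_allowed = True
except SyntaxError:
    mixing_allowed = False
print("bytes + str literals allowed:", mixing_allowed)  # False
```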
pprint has decided that the first bytestring in the split output is too long to print on one line, so it displays it as two implicitly concatenated bytestrings. split() didn't produce an extra element.
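One way to confirm this is to read pprint's output back as Python. The sketch below uses the first four fields from the question's sample; ast.literal_eval turns the wrapped text back into the same list, and a larger width (pprint's default is 80) avoids the wrapping altogether. Printing with repr() or print() instead of pprint would also avoid it.

```python
import ast
import pprint

# First four tab-separated fields of the report row shown in the question.
line = b'Menion on Unique ab Multi-Purpose Solution + abc Case, ONE 8 fl oz (120ml) bottle'
fields = [line, b'', b'012312VTS55', b'0P-avac2A-38']

# With the default width of 80 the long first element does not fit,
# so pprint displays it as several adjacent literals on separate lines.
narrow = pprint.pformat(fields)
print(narrow)

# Evaluating the printed text gives back the same 4-element list,
# so the wrapping is purely cosmetic.
parsed_back = ast.literal_eval(narrow)
print(len(parsed_back))  # 4

# A larger width keeps each element in one piece.
wide = pprint.pformat(fields, width=200)
print(wide)
```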