python - Pandas - Replace values that match a pattern
Question
In my DataFrame:
df = pd.DataFrame(zip(datetimes, from_, message), columns=['timestamp', 'sender', 'message'])
df['timestamp'] = pd.to_datetime(df.timestamp, format='%d/%m/%Y, %I:%M %p')
there are some problematic values that follow a clear pattern:
timestamp sender message
113381 2020-06-04 11:59:24 Jose bom te ver feliz\r\n
113382 2020-06-04 11:59:29 Jose ❤\r\n
113383 2020-06-04 11:59:40 Maria Estar bem com você me faz feliz\r\n
113384 2020-06-04 12:00:57 Maria Estava falando com uma amiga de infância aque...
113385 2020-06-04 12:01:14 Maria Ela teve uma briga feia com o marido\r\n
113386 2020-06-04 12:01:24 Maria: <attached 00113509-PHOTO-2020-06-04-12-01-25.jpg>\r\n
113387 2020-06-04 12:02:54 Maria e assim leva-se a vida, um\n
113388 2020-06-04 12:03:21 Maria Pelo menos ela riu isso ajuda\r\n
113389 2020-06-04 13:06:39 Jose: <attached 00113512-PHOTO-2020-06-04-13-06-40.jpg>\r\n
The names will always vary, so the values could be any of:
John
John: <attached
Mary
Mary: <attached
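but the suffix ": <attached" will always be present.
How can I perform a string replacement that fixes this regardless of the name, so that the final result is: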
timestamp sender message
113381 2020-06-04 11:59:24 Jose bom te ver feliz\r\n
113382 2020-06-04 11:59:29 Jose ❤\r\n
113383 2020-06-04 11:59:40 Maria Estar bem com você me faz feliz\r\n
113384 2020-06-04 12:00:57 Maria Estava falando com uma amiga de infância aque...
113385 2020-06-04 12:01:14 Maria Ela teve uma briga feia com o marido\r\n
113386 2020-06-04 12:01:24 Maria 00113509-PHOTO-2020-06-04-12-01-25.jpg>\r\n
113387 2020-06-04 12:02:54 Maria e assim leva-se a vida, um\n
113388 2020-06-04 12:03:21 Maria Pelo menos ela riu isso ajuda\r\n
113389 2020-06-04 13:06:39 Jose 00113512-PHOTO-2020-06-04-13-06-40.jpg>\r\n
Answer
Data
df = pd.DataFrame({'sender': ['Jose','Jose','Maria','Maria','Maria','Maria: <attached','Maria','Maria','Jose: <attached']})
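Solution
Split each sender on ": <attached" and keep the first piece. Values that do not contain the suffix are returned unchanged.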
df.sender = df.sender.str.split(': <attached').str[0]
sender
0 Jose
1 Jose
2 Maria
3 Maria
4 Maria
5 Maria
6 Maria
7 Maria
8 Jose
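As an alternative to the split above, the suffix can also be removed with a regular expression. The following is only a minimal sketch, assuming the unwanted text is always the literal ": <attached" at the very end of the sender value; the shortened sample data and the name cleaned are illustrative and not part of the original answer.

```python
import pandas as pd

# Shortened, illustrative sample of the 'sender' column
df = pd.DataFrame({'sender': ['Jose', 'Maria: <attached', 'Maria', 'Jose: <attached']})

# Remove ': <attached' only when it ends the value ($ anchors the match);
# values without the suffix pass through unchanged.
cleaned = df['sender'].str.replace(r': <attached$', '', regex=True)

print(cleaned.tolist())  # ['Jose', 'Maria', 'Maria', 'Jose']
```

Anchoring the pattern with $ means a name that happens to contain the same text somewhere in the middle would be left alone, whereas the split approach would cut it at the first occurrence. For the data shown in the question, both approaches give the same result.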