How do I remove dates and times from strings using regular expressions?

Problem description

I have a DataFrame (df) with the following values:

      0                                             |    1
 Thanks $.728.98 in nyc on 2018-04-22:11:09:35      |   7812
 Rs.999.98 in shop 1872 mumbai on 2018-04-22        |   8574
 INR.999.98 in shop 1872 mumbai on 2018-04          |   79821
 Thanks $.4728.98 in nyc on 2018-04-22 sat 11:09:35 |   7818

The dates in these strings come in several different formats. How can I remove them using a regular expression?

The output should be

Tags: python, regex, python-3.x, pandas, dataframe

Solution


If your dates are consistent and always come after the last word "on", you can try the code below to split off the date and parse it:

from datetime import datetime

from dateutil.parser import parse

def parse_custom_string(mystr):
    # Keep everything before the last " on ", i.e. the text without the date.
    return mystr.rpartition(" on ")[0]

def parse_date_custom_string(mystr):
    # Parse whatever follows the last " on " as a datetime. With
    # fuzzy_with_tokens=True, parse() returns (datetime, skipped_tokens).
    return parse(timestr=mystr.rpartition(" on ")[2], dayfirst=False, fuzzy_with_tokens=True)[0]

assert (parse_custom_string('Thanks $.728.98 in nyc on 2018-04-22:11:09:35')  == "Thanks $.728.98 in nyc" )
assert (type(parse_date_custom_string('Thanks $.728.98 in nyc on 2018-04-22:11:09:35')) == datetime)


assert (parse_custom_string('Rs.999.98 in shop 1872 mumbai on 2018-04-22')  == "Rs.999.98 in shop 1872 mumbai" )
assert (type(parse_date_custom_string('Rs.999.98 in shop 1872 mumbai on 2018-04-22')) == datetime)

assert (parse_custom_string('INR.999.98 in shop 1872 mumbai on 2018-04')  == "INR.999.98 in shop 1872 mumbai" )
assert (type(parse_date_custom_string('INR.999.98 in shop 1872 mumbai on 2018-04')) == datetime)

assert (parse_custom_string('Thanks $.4728.98 in nyc on 2018-04-22 sat 11:09:35')  == "Thanks $.4728.98 in nyc" )
assert (type(parse_date_custom_string('Thanks $.4728.98 in nyc on 2018-04-22 sat 11:09:35')) == datetime)
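
Since the question asks specifically for a regular expression, here is a minimal sketch of that route. It is not part of the answer above: the pattern `DATE_SUFFIX` and the helper `strip_date` are hypothetical names, and the pattern is written only against the four sample rows, assuming each date follows the word "on" and looks like YYYY-MM, optionally followed by -DD, a three-letter weekday, and HH:MM:SS.

```python
import re

# Hypothetical pattern, tuned to the four sample rows only.
DATE_SUFFIX = re.compile(
    r"""
    \s*\bon\s+                  # the word on, plus surrounding whitespace
    \d{4}-\d{2}(?:-\d{2})?      # YYYY-MM with an optional -DD
    (?:                         # optional time part
        [:\s]+                  # separated by a colon or whitespace
        (?:[A-Za-z]{3}\s+)?     # optional three-letter weekday, e.g. sat
        \d{2}:\d{2}:\d{2}       # HH:MM:SS
    )?
    """,
    re.VERBOSE,
)

def strip_date(text):
    """Return text with the 'on <date>' part removed."""
    return DATE_SUFFIX.sub("", text).strip()

rows = [
    "Thanks $.728.98 in nyc on 2018-04-22:11:09:35",
    "Rs.999.98 in shop 1872 mumbai on 2018-04-22",
    "INR.999.98 in shop 1872 mumbai on 2018-04",
    "Thanks $.4728.98 in nyc on 2018-04-22 sat 11:09:35",
]
cleaned = [strip_date(row) for row in rows]
```

On the DataFrame itself, the same compiled pattern could probably be applied with `df[0].str.replace(DATE_SUFFIX, "", regex=True)`, assuming column `0` holds the strings as in the question. Other date layouts would need the pattern extended.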
