Extract date from a string with a lot of numbers

Problem description

There seem to be quite a few ways to extract datetimes in various formats from a string, but they run into problems when the string contains many numbers and symbols.

Here is an example:

t = 'Annual Transmission Revenue Requirements and Rates Transmission Owner (Transmission Zone) Annual Transmission Revenue Requirement Network Integration Transmission Service Rate    ($/MW-Year) AE (AECO) $136,632,319 $53,775 AEP (AEP)   $1,295,660,732   $59,818.14  AP (APS) $128,000,000 $17,895 ATSI (ATSI) $659,094,666 $54,689.39 BC (BGE) $230,595,535 $35,762 ComEd, Rochelle (CE)  $702,431,433  $34,515.60 Dayton (DAY) $40,100,000 $13,295.76 Duke (DEOK) $121,250,903 $24,077 Duquesne (DLCO) $139,341,808 $51,954.44 Dominion (DOM) $1,031,382,000 $52,457.21 DPL, ODEC (DPL) $163,224,128 $42,812 East Kentucky Power Cooperative (EKPC) $83,267,903 $24,441 MAIT (METED, PENELEC) $150,858,703 $26,069.39 JCPL $135,000,000 $23,597.27  PE (PECO)  $155,439,100 $19,093 PPL, AECoop, UGI (PPL)  $435,349,329  $58,865 PEPCO, SMECO (PEPCO)   $190,876,083  $31,304.21 PS (PSEG) $1,248,819,352 $130,535.22 Rockland (RECO)  $17,724,263 $44,799 TrAILCo  $226,652,117.80  n/a  Effective June 1, 2018 '

import datefinder

# find_dates() yields a datetime object for each substring it interprets as a date
m = datefinder.find_dates(t)
for match in m:
    print(match)

Is there a clean way to extract the date? If nothing better exists, I can fall back to the re module for specific formats. Judging from its GitHub repository, datefinder appears to have been abandoned about a year ago.

Tags: python, string, datetime, python-datetime, datefinder

Solution


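I don't know exactly how your dates are formatted, but here is a regex solution that handles dates separated by "/". It should work whether the month and day are written as a single digit or with a leading zero.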

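If your dates are separated by hyphens instead, replace each / in the pattern with a hyphen. You can also use the character class [/-] to accept either separator.

Edit: I added a second print statement with a stricter regex. This is probably the better approach, because it only accepts months 1 to 12 and days 1 to 31.

import re
mystring = 'joasidj9238nlsd93901/01/2021oijweo8939n'
# Loose: 1-2 digits / 1-2 digits / 2-4 digits. This works in most cases.
print(re.findall(r'\d{1,2}/\d{1,2}/\d{2,4}', mystring))
# Stricter: month 1-12 and day 1-31 (optional leading zero), then a 4- or 2-digit year.
print(re.findall(r'(?:0?[1-9]|1[0-2])/(?:0?[1-9]|[12]\d|3[01])/(?:\d{4}|\d{2})', mystring))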
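A single pattern can accept both slash- and hyphen-separated dates in the same text. The sketch below is one possible approach and is not part of the original answer. It makes these choices:

- It assumes month-first dates, as the patterns above do.
- It captures the first separator in a named group and uses the backreference (?P=sep) so that both separators must match.
- It passes each hit through datetime.strptime, which drops calendar-impossible dates such as 02/30/2021.

The names NUMERIC_DATE and find_numeric_dates are illustrative only.

```python
import re
from datetime import datetime

# Assumption: dates are month-first (M/D/Y), as in the patterns above.
# (?P<sep>[/-]) captures the first separator; (?P=sep) forces the second one to match it,
# so "01/01/2021" and "01-01-2021" are accepted but "01/01-2021" is not.
NUMERIC_DATE = re.compile(
    r'(?P<month>0?[1-9]|1[0-2])(?P<sep>[/-])'
    r'(?P<day>0?[1-9]|[12]\d|3[01])(?P=sep)'
    r'(?P<year>\d{4}|\d{2})'
)


def find_numeric_dates(text):
    """Return datetime objects for every month-first numeric date found in text."""
    results = []
    for m in NUMERIC_DATE.finditer(text):
        # %Y needs exactly four digits, %y exactly two
        year_fmt = '%Y' if len(m.group('year')) == 4 else '%y'
        candidate = f"{m.group('month')}/{m.group('day')}/{m.group('year')}"
        try:
            # strptime rejects dates the regex cannot rule out, e.g. 02/30/2021
            results.append(datetime.strptime(candidate, f'%m/%d/{year_fmt}'))
        except ValueError:
            pass
    return results


print(find_numeric_dates('joasidj9238nlsd93901/01/2021oijweo8939n'))
print(find_numeric_dates('due 12-31-2020, not 02/30/2021 or 01/01-2021'))
```

The regex only checks that each field is within range. The strptime step is what catches combinations like February 30, so the two checks complement each other.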

Edit #2: Here is an approach for dates with a spelled-out month name (full or three-letter abbreviation), followed by the day, a comma, and a 2- or 4-digit year.

import re
mystring = 'Jan 1, 2020'
# Month name (full or 3-letter abbreviation), day, comma, then a 4- or 2-digit year
print(re.findall(r'(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?)\s+\d{1,2},\s+(?:\d{4}|\d{2})', mystring))
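To turn a match into an actual datetime, you can give the month-name pattern named groups and feed the pieces to datetime.strptime. This is a hedged sketch, not the answer's own code. It makes these assumptions and choices:

- find_month_name_dates and MONTH_NAME_DATE are hypothetical names.
- The month alternation includes Oct/October.
- The \b anchors are an addition to avoid matching inside longer words or numbers.
- It assumes an English (default C) locale, so that %b recognizes abbreviations such as Jun.

Run against the tail of the string from the question, it picks out only the "Effective June 1, 2018" date and ignores the dollar amounts.

```python
import re
from datetime import datetime

# Month names (full or 3-letter abbreviation) as in the pattern above, with named groups added
MONTH_NAME_DATE = re.compile(
    r'\b(?P<month>Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|'
    r'Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?)'
    r'\s+(?P<day>\d{1,2}),\s+(?P<year>\d{4}|\d{2})\b'
)


def find_month_name_dates(text):
    """Return datetime objects for 'Month D, YYYY' style dates found in text."""
    results = []
    for m in MONTH_NAME_DATE.finditer(text):
        # Normalize "June"/"Jun" to "Jun" so one %b format covers both spellings.
        # Assumption: %b uses English month abbreviations (true in the default C locale).
        month = m.group('month')[:3]
        year_fmt = '%Y' if len(m.group('year')) == 4 else '%y'
        text_date = f"{month} {m.group('day')} {m.group('year')}"
        try:
            results.append(datetime.strptime(text_date, f'%b %d {year_fmt}'))
        except ValueError:
            pass  # e.g. "Feb 30, 2021" matches the regex but is not a real date
    return results


# Excerpt from the end of the question's string t
excerpt = 'Rockland (RECO)  $17,724,263 $44,799 TrAILCo  $226,652,117.80  n/a  Effective June 1, 2018 '
print(find_month_name_dates(excerpt))
```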

   
