Regex to extract all of the complex date formats from a string in Python

I have the following string:


 dateEntries = "04-20-2009; 04/20/09; 4/20/09; 4/3/09; Mar 20, 2009; March 20, 2009; Mar. 20, 2009; Mar 20 2009; 20 Mar 2009; 20 March 2009; 2 Mar. 2009; 20 March, 2009; Mar 20th, 2009; Mar 21st, 2009; Mar 22nd, 2009; Feb 2009; Sep 2009; Oct 2010; 6/2008; 12/2009; 2009; 2010"

I want to extract every date mentioned in it using a regex. As a first attempt, I wrote the following regex:


import re


regEx = r'(?:\d{1,2}[-/th|st|nd|rd\s]*)?(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z\s,.]*(?:\d{1,2}[-/th|st|nd|rd)\s,]*)?(?:\d{2,4})'


re.findall(regEx, dateEntries)

I expected this to work, but it returns only a subset of the dates:


A = ['Mar 20, 2009',
 'March 20, 2009',
 'Mar. 20, 2009',
 'Mar 20 2009',
 '20 Mar 2009',
 '20 March 2009',
 '2 Mar. 2009',
 '20 March, 2009',
 'Mar 20th, 2009',
 'Mar 21st, 2009',
 'Mar 22nd, 2009',
 'Feb 2009',
 'Sep 2009',
 'Oct 2010']

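I don't understand why it does not return these dates:


B = ['04-20-2009', '04/20/09', '4/20/09', '4/3/09', '6/2008', '12/2009', '2009', '2010']

By extending the regEx I arrived at r'(?:\d{1,2}[-\s\/])?(?:\d{1,2}[-\/\s])?(?:\d{2,4})', which works for set B. However, I have not been able to write a single regEx that yields A + B.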


Can anyone help me build a regex that extracts all of the dates listed in dateEntries?


Note: I want to solve this using regex only.


慕斯709654

翻过高山走不出你

You just left out a ? after the (?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) group to mark it as optional. I also added a + after the last two groups so that the regex does not split a date such as "March 20, 2009" into two separate dates. Full code:

import re

regEx = r'(?:\d{1,2}[-/th|st|nd|rd\s]*)?(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)?[a-z\s,.]*(?:\d{1,2}[-/th|st|nd|rd)\s,]*)+(?:\d{2,4})+'
dateEntries = "04-20-2009; 04/20/09; 4/20/09; 4/3/09; Mar 20, 2009; March 20, 2009; Mar. 20, 2009; Mar 20 2009; 20 Mar 2009; 20 March 2009; 2 Mar. 2009; 20 March, 2009; Mar 20th, 2009; Mar 21st, 2009; Mar 22nd, 2009; Feb 2009; Sep 2009; Oct 2010; 6/2008; 12/2009; 2009; 2010"
result = re.findall(regEx, dateEntries)
print(result)

If your dates have leading whitespace, the results will have leading whitespace too. If you go on to work with the date strings, you can remove it with .strip().
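One detail of the pattern is worth a small illustration: inside square brackets, | is not alternation. A class such as [-/th|st|nd|rd\s] matches any single character from the set, so it does not mean "th or st or nd or rd". The sketch below is only a demonstration; the names char_class and alternation are chosen for this example and do not appear in the question or the answer.

```python
import re

# Inside [...] every character is a literal set member, so "|" is just
# another character and "th|st|nd|rd" collapses to the set {t,h,|,s,n,d,r}.
char_class = re.compile(r'[th|st|nd|rd]+')

# A non-capturing group with | is real alternation between whole suffixes.
alternation = re.compile(r'(?:th|st|nd|rd)')

# "dts|" is not an ordinal suffix, yet the character class accepts it.
print(char_class.fullmatch('dts|') is not None)   # True
print(alternation.fullmatch('dts|') is not None)  # False
print(alternation.fullmatch('nd') is not None)    # True
```

For the sample string the character class happens to be harmless, but a group such as (?:st|nd|rd|th)? states the intent more precisely.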

临摹微笑

Your regex pattern is completely unreadable. Build the pattern from simple building blocks instead; that makes the code much more readable.

import re
import calendar

# dateEntries is the string from the question
full_months = [month for month in calendar.month_name if month]
short_months = [d[:3] for d in full_months]
months = '|'.join(short_months + full_months)
sep = r'[.,]?\s+'               # separator
day = r'\d+'
year = r'\d+'
day_or_year = r'\d+(?:\w+)?'

r = re.compile(rf'(?:{day}{sep})?(?:{months}){sep}{day_or_year}(?:{sep}{year})?')
r.findall(dateEntries)
# ['Mar 20, 2009', 'March 20, 2009', 'Mar. 20, 2009', 'Mar 20 2009', '20 Mar 2009', '20 March 2009', '2 Mar. 2009', '20 March, 2009', 'Mar 20th, 2009', 'Mar 21st, 2009', 'Mar 22nd, 2009', 'Feb 2009', 'Sep 2009', 'Oct 2010']
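The building-block approach above returns set A only. As a minimal sketch of how the same idea might be stretched to cover the numeric formats in set B as well, the pattern below lists one alternative per date shape. The names MONTH, ORDINAL and DATE_RE are hypothetical, and the pattern assumes what the sample string shows: textual dates carry a four-digit year, and a bare four-digit number counts as a year. It has not been checked against other inputs.

```python
import re

date_entries = (
    "04-20-2009; 04/20/09; 4/20/09; 4/3/09; Mar 20, 2009; March 20, 2009; "
    "Mar. 20, 2009; Mar 20 2009; 20 Mar 2009; 20 March 2009; 2 Mar. 2009; "
    "20 March, 2009; Mar 20th, 2009; Mar 21st, 2009; Mar 22nd, 2009; "
    "Feb 2009; Sep 2009; Oct 2010; 6/2008; 12/2009; 2009; 2010"
)

# Building blocks (hypothetical names, chosen for this sketch).
MONTH = r'(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z]*\.?'
ORDINAL = r'(?:st|nd|rd|th)?'

# re.VERBOSE lets each alternative sit on its own commented line.
DATE_RE = re.compile(rf'''
    \b(?:
        \d{{1,2}}[-/]\d{{1,2}}[-/]\d{{2,4}}   # 04-20-2009, 4/3/09
      | \d{{1,2}}/\d{{4}}                     # 6/2008
      | (?:\d{{1,2}}{ORDINAL}\s+)?            # optional day before the month
        {MONTH},?\s+                          # Mar / March, / Mar.
        (?:\d{{1,2}}{ORDINAL},?\s+)?          # optional day after the month
        \d{{4}}                               # four-digit year
      | \d{{4}}                               # bare year such as 2009
    )\b
''', re.VERBOSE)

found = DATE_RE.findall(date_entries)
print(len(found))
print(found == date_entries.split('; '))
```

The more specific alternatives come first so that, for example, 6/2008 is tried as month/year before the bare-year branch can claim 2008 on its own. Because every group is non-capturing, findall returns the full matched text.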

扬帆大鱼

Try this regex: ^(?:\d{1,2}(?:(?:-|/)|(?:th|st|nd|rd)?\s))?(?:(?:(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?)(?:(?:-|/)|(?:,|\.)?\s)?)?(?:\d{1,2}(?:(?:-|/)|(?:th|st|nd|rd)?\s))?)(?:\d{2,4})$
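Because this pattern is anchored with ^ and $, it is meant to validate one date at a time and cannot be used with findall on the whole string. The sketch below copies the pattern as written and applies it to each entry after splitting on "; ". The names ANCHORED, matched and unmatched are hypothetical. By my reading, entries in which a comma directly follows the day, such as "Mar 20, 2009", are not accepted, so it is worth checking the unmatched list before relying on the pattern.

```python
import re

date_entries = (
    "04-20-2009; 04/20/09; 4/20/09; 4/3/09; Mar 20, 2009; March 20, 2009; "
    "Mar. 20, 2009; Mar 20 2009; 20 Mar 2009; 20 March 2009; 2 Mar. 2009; "
    "20 March, 2009; Mar 20th, 2009; Mar 21st, 2009; Mar 22nd, 2009; "
    "Feb 2009; Sep 2009; Oct 2010; 6/2008; 12/2009; 2009; 2010"
)

# Pattern from the answer, split across lines only for readability.
ANCHORED = re.compile(
    r'^(?:\d{1,2}(?:(?:-|/)|(?:th|st|nd|rd)?\s))?'
    r'(?:(?:(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?'
    r'|Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?)'
    r'(?:(?:-|/)|(?:,|\.)?\s)?)?'
    r'(?:\d{1,2}(?:(?:-|/)|(?:th|st|nd|rd)?\s))?)'
    r'(?:\d{2,4})$'
)

# On the full string the anchors would have to span everything, so nothing matches.
whole = ANCHORED.findall(date_entries)

# Applied per entry, the anchors simply require the entry to be one complete date.
entries = date_entries.split('; ')
matched = [e for e in entries if ANCHORED.search(e)]
unmatched = [e for e in entries if not ANCHORED.search(e)]

print(whole)
print(matched)
print(unmatched)
```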