Extract the text between a number and a word

I have a file with the following content:


01009700  Samsung  Samsung SGH-N625  GSM 1900,GSM 900  

01009800  Motorola  Motorola T194 EOTD  GSM 1900  


01009900  Option International  

,GSM 900  

01009901  Option International  


,GSM 1900,GSM 900 01009902 Option International ,GSM 1900,GSM 900 01009903 Option International ,GSM 1900,GSM 900 01009904 Option International ,GSM 1900,GSM 900 01009905 Option International ,GSM 1900,GSM 900 01009906 Option International ,GSM 1900,GSM 900 01009907 Option International ,GSM 1900,GSM 900 01009908 Option International ,GSM 1900,GSM 900 01009909 Option International ,GSM 1900,GSM 900 01009910 Option International ,GSM 1900,GSM 900 01009911 Option International ,GSM 1900,GSM 900 01009912 Option International ,GSM 1900,GSM 900 01009913 Option International ,GSM 1900,GSM 900 01009914 Option International ,GSM 1900,GSM 900 01009915 Option International ,GSM 1900,GSM 900 01009916 Option International ,GSM 1900,GSM 900 01009917 Option International ,GSM 1900,GSM 900 01009918 Option International ,GSM 1900,GSM 900 01009919 Option International ,GSM 1900,GSM 900 


01010000  Sierra Wireless Sierra Wireless Aircard 710  GSM 1900  

01010100  Sierra Wireless Sierra Wireless Aircard 750  GSM 1800,GSM 190  

0,GSM 900 

Using a regular expression, I am trying to extract everything from the 8-digit number up to the first occurrence of GSM, for example:


01009700  Samsung  Samsung SGH-N625

01009800  Motorola  Motorola T194 EOTD

01009900  Option International

01009902  Option International

01009919  Option International

01010000  Sierra Wireless Sierra Wireless Aircard

01010100  Sierra Wireless Sierra Wireless Aircard

I tried \d{8}.+(GSM)? but it does not seem to work.


What is the correct regular expression?


森栏
1 answer

暮色呼如

You can use

re.findall(r'\b(\d{8}.*?)\W*GSM', s)

Details:

\b - a word boundary
(\d{8}.*?) - Group 1: eight digits, then any 0+ characters other than line breaks, as few as possible
\W* - any 0+ non-word characters (spaces, commas, line breaks)
GSM - the literal substring GSM

Because the pattern contains exactly one capturing group, re.findall returns only the Group 1 text for each match.

Python demo:

import re

s = """01009700  Samsung  Samsung SGH-N625  GSM 1900,GSM 900
01009800  Motorola  Motorola T194 EOTD  GSM 1900
01009900  Option International
,GSM 900
01009901  Option International
,GSM 1900,GSM 900 01009902 Option International ,GSM 1900,GSM 900 01009903 Option International ,GSM 1900,GSM 900 01009904 Option International ,GSM 1900,GSM 900 01009905 Option International ,GSM 1900,GSM 900 01009906 Option International ,GSM 1900,GSM 900 01009907 Option International ,GSM 1900,GSM 900 01009908 Option International ,GSM 1900,GSM 900 01009909 Option International ,GSM 1900,GSM 900 01009910 Option International ,GSM 1900,GSM 900 01009911 Option International ,GSM 1900,GSM 900 01009912 Option International ,GSM 1900,GSM 900 01009913 Option International ,GSM 1900,GSM 900 01009914 Option International ,GSM 1900,GSM 900 01009915 Option International ,GSM 1900,GSM 900 01009916 Option International ,GSM 1900,GSM 900 01009917 Option International ,GSM 1900,GSM 900 01009918 Option International ,GSM 1900,GSM 900 01009919 Option International ,GSM 1900,GSM 900
Option Internati. Globetrotter GSM 1800 Option Internati. Globetrotter GSM 1800 Option Internati. Globetrotter GSM 1800 Option Internati. Globetrotter GSM 1800 Option Internati. Globetrotter GSM 1800 Option Internati. Globetrotter GSM 1800 Option Internati. Globetrotter GSM 1800 Option Internati. Globetrotter GSM 1800 Option Internati. Globetrotter GSM 1800 Option Internati. Globetrotter GSM 1800 Option Internati. Globetrotter GSM 1800 Option Internati. Globetrotter GSM 1800 Option Internati. Globetrotter GSM 1800 Option Internati. Globetrotter GSM 1800 Option Internati. Globetrotter GSM 1800 Option Internati. Globetrotter GSM 1800 Option Internati. Globetrotter GSM 1800 Option Internati. Globetrotter GSM 1800 Option Internati. Globetrotter GSM 1800 Option Internati. Globetrotter GSM 1800
01010000  Sierra Wireless Sierra Wireless Aircard 710  GSM 1900
01010100  Sierra Wireless Sierra Wireless Aircard 750  GSM 1800,GSM 190
0,GSM 900 """

# One capturing group, so findall returns the text of Group 1 for every match.
print(re.findall(r"\b(\d{8}.*?)\W*GSM", s))

Output:

['01009700  Samsung  Samsung SGH-N625', '01009800  Motorola  Motorola T194 EOTD', '01009900  Option International', '01009901  Option International', '01009902 Option International', '01009903 Option International', '01009904 Option International', '01009905 Option International', '01009906 Option International', '01009907 Option International', '01009908 Option International', '01009909 Option International', '01009910 Option International', '01009911 Option International', '01009912 Option International', '01009913 Option International', '01009914 Option International', '01009915 Option International', '01009916 Option International', '01009917 Option International', '01009918 Option International', '01009919 Option International', '01010000  Sierra Wireless Sierra Wireless Aircard 710', '01010100  Sierra Wireless Sierra Wireless Aircard 750']
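
To see why the pattern from the question behaves differently, here is a minimal sketch that runs both patterns on a shortened sample. The sample string and the variable names (sample, greedy, lazy) are made up for illustration and are not part of the original data file. The greedy .+ runs to the end of the line and the optional (GSM)? is then free to match nothing, while the lazy .*? followed by \W*GSM stops right before the first GSM, even when a line break sits in between.

```python
import re

# Hypothetical shortened sample modeled on the question's data; the blank
# line before ",GSM" imitates the records that are split across lines.
sample = (
    "01009700  Samsung  Samsung SGH-N625  GSM 1900,GSM 900  \n"
    "01009900  Option International  \n"
    "\n"
    ",GSM 900  \n"
)

# Pattern from the question: greedy .+ consumes the rest of the line,
# then the optional (GSM)? matches the empty string.
greedy = [m.group(0) for m in re.finditer(r"\d{8}.+(GSM)?", sample)]

# Pattern from the answer: lazy .*? expands only as far as needed, and
# \W* absorbs spaces, commas and line breaks in front of GSM.
lazy = re.findall(r"\b(\d{8}.*?)\W*GSM", sample)

print(greedy)
print(lazy)
```

On this sample the greedy pattern returns whole lines, including the GSM bands on the first one, whereas the lazy pattern returns only the number and the name that follows it.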