猿问

Extracting a substring from a string with Python and regular expressions

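I have a pandas DataFrame whose "Page" column holds very long strings, and I'm trying to extract a substring from them:

Example string: /ex/search/!tu/p/z1/zVJdb4IwFP0r88HH0Sp-hK/dz/d5/L2dBISEvZ0FBIS9nQSEh/?s&search_query=example one&y=0&x=0

I'm struggling to work out how to use a regular expression to extract the string between the two & characters and drop every other character of the larger string.

So far my code looks like this: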


import pandas as pd
import re

dataset = pd.read_excel(r'C:\Users\example.xlsx')
dataframe = pd.DataFrame(dataset)

dataframe['Page'] = format = re.search(r'&(.*)&', str(dataframe['Page']))

dataframe.to_excel(r'C:\Users\output.xlsx')

The code above runs, but it doesn't write anything useful to my new spreadsheet.


杨__羊羊
3 Answers

呼唤远方

You can extract the query string from the URL with urllib.parse.urlparse, then parse it with urllib.parse.parse_qs:

>>> from urllib.parse import urlparse, parse_qs
>>> path = '/ex/search/!tu/p/z1/zVJdb4IwFP0r88HH0Sp-hK/dz/d5/L2dBISEvZ0FBIS9nQSEh/?s&search_query=example one&y=0&x=0'
>>> query_string = urlparse(path).query
>>> parse_qs(query_string)
{'search_query': ['example one'], 'y': ['0'], 'x': ['0']}

Edit: to pull the search_query value out of every page in the Page column:

dataframe['Page'] = dataframe['Page'].apply(lambda page: parse_qs(urlparse(page).query)['search_query'][0])
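A minimal sketch of the same urllib approach applied to a whole column, assuming some rows might not carry a search_query parameter at all. The helper name extract_search_query and the second sample row are hypothetical. Using dict.get avoids the KeyError that ['search_query'][0] would raise on such a row:

```python
from urllib.parse import urlparse, parse_qs

import pandas as pd


def extract_search_query(page):
    """Hypothetical helper: first search_query value in a URL, or None if absent."""
    params = parse_qs(urlparse(page).query)  # e.g. {'search_query': ['example one'], 'y': ['0'], 'x': ['0']}
    values = params.get("search_query")
    return values[0] if values else None


df = pd.DataFrame({"Page": [
    "/ex/search/!tu/p/z1/zVJdb4IwFP0r88HH0Sp-hK/dz/d5/L2dBISEvZ0FBIS9nQSEh/?s&search_query=example one&y=0&x=0",
    "/ex/search/?s&y=0&x=0",  # hypothetical row with no search_query parameter
]})

# Write into a new column so the original URLs stay available
df["search_query"] = df["Page"].apply(extract_search_query)
print(df["search_query"].tolist())
```

A side benefit of parse_qs over a hand-written regex or split is that it also decodes percent-escapes and + signs in the values.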

狐的传说

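You can try this:

(?<=&).*?(?=&)

Explanation:
(?<=&) - positive lookbehind: the match must be preceded by &, but the & itself is not part of the result.
.*? - matches any characters except a newline, as few as possible (lazy).
(?=&) - positive lookahead: the match must be followed by &, again without including it.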
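A small sketch of how the lookaround pattern behaves, assuming Python's re module and pandas. It is applied to each row through Series.str.extract (which needs a capture group and returns the first match per row), rather than to str(dataframe['Page']), which is only the printed summary of the whole column. Note that the pattern returns "search_query=example one" with the key included; the last extract line is a variant that is not part of this answer and captures only the value:

```python
import re

import pandas as pd

page = ("/ex/search/!tu/p/z1/zVJdb4IwFP0r88HH0Sp-hK/dz/d5/L2dBISEvZ0FBIS9nQSEh/"
        "?s&search_query=example one&y=0&x=0")

# Lazy .*? stops at the next "&"; the lookarounds keep the "&"s out of each match
between = re.findall(r"(?<=&).*?(?=&)", page)

# The greedy .* from the question runs on to the *last* "&" instead
greedy = re.search(r"&(.*)&", page).group(1)

df = pd.DataFrame({"Page": [page]})
# str.extract works row by row; the body is wrapped in a group so there is something to return
df["between"] = df["Page"].str.extract(r"(?<=&)(.*?)(?=&)", expand=False)
# Variant (not from the answer): capture only the value of search_query
df["query"] = df["Page"].str.extract(r"search_query=([^&]*)", expand=False)
print(between, greedy, df.loc[0, "query"], sep="\n")
```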

陪伴而非守候

A fast, efficient pandas approach.

Sample data:

temp,page
1,  /ex/search/!tu/p/z1/zVJdb4IwFP0r88HH0Sp-hK/dz/d5/L2dBISEvZ0FBIS9nQSEh/?s&search_query=example one&y=0&x=0
2,  /ex/search/!tu/p/z1/zVJdb4IwFP0r88HH0Sp-hK/dz/d5/L2dBISEvZ0FBIS9nQSEh/?s&search_query=example one&y=0&x=0
3,  /ex/search/!tu/p/z1/zVJdb4IwFP0r88HH0Sp-hK/dz/d5/L2dBISEvZ0FBIS9nQSEh/?s&search_query=example one&y=0&x=0

Code:

import pandas as pd

df = pd.read_csv("example.csv")  # hypothetical file holding the sample data above
# Split on "&": field 1 is "search_query=example one"; then keep the part after "="
df["query"] = df['page'].str.split("&", expand=True)[1].str.split("=", expand=True)[1]
print(df)

Sample output:

   temp  \
0     1
1     2
2     3

                                                                                                          page  \
0    /ex/search/!tu/p/z1/zVJdb4IwFP0r88HH0Sp-hK/dz/d5/L2dBISEvZ0FBIS9nQSEh/?s&search_query=example one&y=0&x=0
1    /ex/search/!tu/p/z1/zVJdb4IwFP0r88HH0Sp-hK/dz/d5/L2dBISEvZ0FBIS9nQSEh/?s&search_query=example one&y=0&x=0
2    /ex/search/!tu/p/z1/zVJdb4IwFP0r88HH0Sp-hK/dz/d5/L2dBISEvZ0FBIS9nQSEh/?s&search_query=example one&y=0&x=0

         query
0  example one
1  example one
2  example one

If you want to label your columns by their key=value pairs instead, that takes a different step after the extraction.
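Labelling columns by their key=value pairs, as mentioned at the end of this answer, could be sketched like this. It assumes every URL contains a single "?" and that fields without "=" (such as the bare s) can be dropped; the fields/params names and the dict-building step are illustrative, not part of the answer:

```python
import pandas as pd

df = pd.DataFrame({
    "temp": [1, 2, 3],
    "page": ["/ex/search/!tu/p/z1/zVJdb4IwFP0r88HH0Sp-hK/dz/d5/L2dBISEvZ0FBIS9nQSEh/"
             "?s&search_query=example one&y=0&x=0"] * 3,
})

# Everything after the first "?", then one list of "key=value" fields per row
fields = df["page"].str.split("?", n=1).str[1].str.split("&")

# Turn each list into a dict, skipping fields with no "=" (like the bare "s")
params = fields.apply(lambda fs: dict(f.split("=", 1) for f in fs if "=" in f))

# One column per key, aligned on the original index
out = df.join(pd.DataFrame(params.tolist(), index=df.index))
print(out[["temp", "search_query", "y", "x"]])
```

The new columns hold strings; pd.to_numeric can convert y and x if numbers are needed.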