How to remove a variable-length part of a string

I have a DataFrame in which one column contains strings like these:


Received value 126;AOC;H3498XX from 602

Received value 101;KYL;0IMMM0432 from 229

I want to remove (or replace with nothing) the part after the second semicolon, so that it looks like


Received value 126;AOC; from 602

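But the part I want to remove has a varying, unpredictable length (it is always a combination of A-Z and 0-9). The semicolons and the "from" will always be present as reference points.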


I tried to use a regular expression after studying this link: https://docs.python.org/3/library/re.html


import re

for row in df['column']:
    row = re.sub(';[A-Z0-9] from', '; from', row)

I think [A-Z0-9] fails to account for the varying length I need.
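To illustrate the suspicion above: a character class such as [A-Z0-9] without a quantifier matches exactly one character, while adding + lets it match a run of any length. The following is only a minimal sketch on a plain list (the name rows is made up for illustration), not on the original DataFrame.

```python
import re

rows = [
    "Received value 126;AOC;H3498XX from 602",
    "Received value 101;KYL;0IMMM0432 from 229",
]

# Without a quantifier, [A-Z0-9] matches exactly one character,
# so ";H3498XX from" never matches and each row comes back unchanged.
unchanged = [re.sub(r";[A-Z0-9] from", "; from", row) for row in rows]

# With "+", the class matches one or more characters, whatever the length.
cleaned = [re.sub(r";[A-Z0-9]+ from", "; from", row) for row in rows]

print(unchanged)
print(cleaned)
```

Note also that the loop in the question only rebinds the local name row, so the DataFrame itself is never modified. Assuming the column is a pandas Series of strings, something like df['column'] = df['column'].str.replace(r';[A-Z0-9]+ from', '; from', regex=True) should write the result back.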


胡说叔叔
2 answers

HUH函数

An example using str.replace() with str.split():

s = ['126;AOC;H3498XX from 602', '101;KYL;0IMMM0432 from 229']
for elem in s:
    print(elem.replace(elem.split(";", 2)[-1].split()[0], ''))

Output:

126;AOC; from 602
101;KYL; from 229

Edit: the same also works for the following example:

s = ['Received value 126;AOC;H3498XX from 602', 'Received value 101;KYL;0IMMM0432 from 229']
for elem in s:
    print(elem.replace(elem.split(";", 2)[-1].split()[0], ''))

Output:

Received value 126;AOC; from 602
Received value 101;KYL; from 229
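One caveat with the str.replace() approach above: replace() removes every occurrence of the token, so a token that also appears elsewhere in the string (for example a third field equal to the second one) would be stripped there too. A possible variant, sketched here with a hypothetical helper named drop_third_field, rebuilds the string from its parts instead. It assumes, as the question states, that both semicolons are always present.

```python
def drop_third_field(text):
    """Remove the token that directly follows the second semicolon.

    Hypothetical helper for illustration; raises ValueError if the
    string has fewer than two semicolons.
    """
    first, second, rest = text.split(";", 2)
    # partition(" ") splits at the first space: (token, " ", remainder)
    _token, space, remainder = rest.partition(" ")
    return f"{first};{second};{space}{remainder}"


print(drop_third_field("Received value 126;AOC;H3498XX from 602"))
# A token that repeats earlier in the string is only removed in third position.
print(drop_third_field("Received value 126;AOC;AOC from 602"))
```

On the second input, the replace()-based version would give "Received value 126;; from 602", because both "AOC" occurrences are removed.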

PIPIONE

Use the pattern (Received value \d+;[A-Z]+;)\w+(\s.*?)

Ex:

import re

s = ["Received value 126;AOC;H3498XX from 602", "Received value 101;KYL;0IMMM0432 from 229"]
for i in s:
    print(re.sub(r"(Received value \d+;[A-Z]+;)\w+(\s.*?)", r"\1", i))

Output:

Received value 126;AOC;from 602
Received value 101;KYL;from 229
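The output above has no space before "from", whereas the question asks for "; from". That is because the trailing group (\s.*?) consumes the whitespace and only \1 is written back. A small variation, offered as a sketch rather than as the answerer's intended pattern, drops that trailing group so the space is left in place:

```python
import re

s = [
    "Received value 126;AOC;H3498XX from 602",
    "Received value 101;KYL;0IMMM0432 from 229",
]

# Pattern from the answer: the whitespace after the token is matched too,
# so it disappears from the result.
original = [re.sub(r"(Received value \d+;[A-Z]+;)\w+(\s.*?)", r"\1", i) for i in s]

# Variation: match only the prefix and the token, leaving the space untouched.
variant = [re.sub(r"(Received value \d+;[A-Z]+;)\w+", r"\1", i) for i in s]

print(original)
print(variant)
```

Both patterns assume the fixed "Received value" prefix and an uppercase-only second field, as in the sample rows.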