How do I create multiple lists from one list based on a matching substring?

I have a list of strings in Python made up of various filenames, like this (but much longer):


all_templates = ['fitting_file_expdisk_cutout-IMG-HSC-I-18115-6,3-OBJ-NEP175857.9+655841.2.feedme', 'fitting_file_sersic_cutout-IMG-HSC-I-18115-3,3-OBJ-NEP180508.6+655617.3.feedme', 'fitting_file_sersic_cutout-IMG-HSC-I-18115-1,8-OBJ-NEP180840.8+665226.2.feedme', 'fitting_file_sersic_cutout-IMG-HSC-I-18115-6,7-OBJ-NEP175927.6+664230.2.feedme', 'fitting_file_expdisk_cutout-IMG-HSC-I-18114-0,5-OBJ-zsel56238.feedme', 'fitting_file_devauc_cutout-IMG-HSC-I-18114-0,3-OBJ-NEP175616.1+660601.5.feedme', 'fitting_file_sersic_cutout-IMG-HSC-I-18115-6,4-OBJ-zsel56238.feedme']


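I want to create multiple smaller lists of the elements that share the same object name (the substring that starts with OBJ- and ends with .feedme). So I would have a list like this: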


obj1 = ['fitting_file_expdisk_cutout-IMG-HSC-I-18114-0,5-OBJ-zsel56238.feedme', 'fitting_file_sersic_cutout-IMG-HSC-I-18115-6,4-OBJ-zsel56238.feedme']


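and so on for the other matching "objects". In reality I have more than 900 unique "objects", and the original list all_templates has more than 4000 elements, because each object has 3 or more separate template files (all appearing in random order). So in the end I want more than 900 lists (one per object). How can I do this?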


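Edit: Here is what I tried, but it gives me a list of all the original template filenames in every sublist (each sublist should only contain the filenames for one object name).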


import re

# Break up list into multiple lists according to substring (object name)
obj_list = [re.search(r'.*(OBJ.+)\.feedme', filename)[1] for filename in all_template_files]
obj_list = list(set(obj_list)) # create list of unique objects (remove duplicates)

templates_objs_sorted = [[]]*len(obj_list)
for i in range(len(obj_list)):
    for template in all_template_files:
        if obj_list[i] in template:
            templates_objs_sorted[i].append(template)
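The symptom described above is consistent with how list repetition behaves in Python: `[[]] * n` produces n references to one and the same inner list, so every `append` lands in that single shared list. Below is a minimal sketch of the effect and of one common way around it, a list comprehension that builds a fresh list per slot. The names `shared` and `independent` are illustrative only and do not come from the question.

```python
# [[]] * 3 repeats a reference to ONE inner list three times
shared = [[]] * 3
shared[0].append('a')
# all three slots show the same content because they are the same object
same_object = shared[0] is shared[1] is shared[2]

# a list comprehension evaluates [] once per slot, giving distinct lists
independent = [[] for _ in range(3)]
independent[0].append('a')
```

Under that reading, replacing `[[]]*len(obj_list)` with `[[] for _ in obj_list]` would likely be enough to make the attempt behave as intended, although it still scans the full list once per object.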


饮歌长啸
3 Answers

胡说叔叔

from collections import defaultdict
from pprint import pprint

all_templates = ['fitting_file_expdisk_cutout-IMG-HSC-I-18115-6,3-OBJ-NEP175857.9+655841.2.feedme', 'fitting_file_sersic_cutout-IMG-HSC-I-18115-3,3-OBJ-NEP180508.6+655617.3.feedme', 'fitting_file_sersic_cutout-IMG-HSC-I-18115-1,8-OBJ-NEP180840.8+665226.2.feedme', 'fitting_file_sersic_cutout-IMG-HSC-I-18115-6,7-OBJ-NEP175927.6+664230.2.feedme', 'fitting_file_expdisk_cutout-IMG-HSC-I-18114-0,5-OBJ-zsel56238.feedme', 'fitting_file_devauc_cutout-IMG-HSC-I-18114-0,3-OBJ-NEP175616.1+660601.5.feedme', 'fitting_file_sersic_cutout-IMG-HSC-I-18115-6,4-OBJ-zsel56238.feedme']

# simple helper function to extract the common object name
# you could probably use Regex... but then you'd have 2 problems
def objectName(path):
    start = path.index('-OBJ-')
    stop = path.index('.feedme')
    return path[(start + 5):stop]

# I really wanted to use a one line reduce here, but...
grouped = defaultdict(list)
for each in all_templates:
    grouped[objectName(each)].append(each)

pprint(grouped)

Aside/Tangent

OK, it really bugged me that I couldn't do a simple one-liner with reduce there. In the end, I wish Python had a good groupby function. It has a function by that name, but it only groups consecutive keys. Smalltalk, ObjC and Swift all have groupby mechanisms that basically let you bucket an iterable by an arbitrary transfer function.

My initial attempt looked like:

grouped = reduce(
    lambda accum, each: accum[objectName(each)].append(each),
    all_templates,
    defaultdict(list))

The problem is the lambda. A lambda is limited to a single expression, and for it to work in reduce it has to return the (modified) accumulator argument. But Python doesn't like to return things from functions/methods unless it has to: append returns None. Even if we replaced the append with <accessTheCurrentList> + [each], we would need a dictionary method that updates the value at a key and returns the modified dictionary. I couldn't find such a thing.

What we can do, though, is load more information into our accumulator, such as a tuple. We can use one slot of the tuple to keep passing the defaultdict reference along, and the other to capture the useless None returned by the modifying operation. It ends up pretty ugly, but it is a one-liner:

from functools import reduce

grouped = reduce(
    lambda accum, each: (accum[0], accum[0][objectName(each)].append(each)),
    all_templates,
    (defaultdict(list), None))[0]
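The idea of bucketing an iterable by an arbitrary key function, as discussed in this answer, can be sketched as a small reusable helper. This is only an illustration: `group_by` and `object_name` are hypothetical names and not part of the standard library. The three filenames are taken from the question's list.

```python
from collections import defaultdict

def group_by(items, key):
    """Bucket items into lists keyed by key(item), regardless of input order."""
    buckets = defaultdict(list)
    for item in items:
        buckets[key(item)].append(item)
    return dict(buckets)

def object_name(path):
    # hypothetical helper: text between '-OBJ-' and the '.feedme' suffix
    start = path.index('-OBJ-') + len('-OBJ-')
    return path[start:path.rindex('.feedme')]

templates = [
    'fitting_file_expdisk_cutout-IMG-HSC-I-18114-0,5-OBJ-zsel56238.feedme',
    'fitting_file_devauc_cutout-IMG-HSC-I-18114-0,3-OBJ-NEP175616.1+660601.5.feedme',
    'fitting_file_sersic_cutout-IMG-HSC-I-18115-6,4-OBJ-zsel56238.feedme',
]

grouped = group_by(templates, object_name)
# one list per object, if a plain list of lists is what is needed
list_of_lists = list(grouped.values())
```

Keeping the result as a dictionary keyed by object name is probably more practical than 900 separately named variables, since each group can then be looked up by its object name.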

慕神8447489

You can group the sorted list:

from itertools import groupby
import re

all_templates = ['fitting_file_expdisk_cutout-IMG-HSC-I-18115-6,3-OBJ-NEP175857.9+655841.2.feedme', 'fitting_file_sersic_cutout-IMG-HSC-I-18115-3,3-OBJ-NEP180508.6+655617.3.feedme', 'fitting_file_sersic_cutout-IMG-HSC-I-18115-1,8-OBJ-NEP180840.8+665226.2.feedme', 'fitting_file_sersic_cutout-IMG-HSC-I-18115-6,7-OBJ-NEP175927.6+664230.2.feedme', 'fitting_file_expdisk_cutout-IMG-HSC-I-18114-0,5-OBJ-zsel56238.feedme', 'fitting_file_devauc_cutout-IMG-HSC-I-18114-0,3-OBJ-NEP175616.1+660601.5.feedme', 'fitting_file_sersic_cutout-IMG-HSC-I-18115-6,4-OBJ-zsel56238.feedme']

pattern = re.compile(r'OBJ-.*?\.feedme$')
objs = {name: pattern.search(name)[0] for name in all_templates}
result = [list(g) for k, g in groupby(sorted(all_templates, key=objs.get), key=objs.get)]
print(result)

Output:

[['fitting_file_devauc_cutout-IMG-HSC-I-18114-0,3-OBJ-NEP175616.1+660601.5.feedme'],
 ['fitting_file_expdisk_cutout-IMG-HSC-I-18115-6,3-OBJ-NEP175857.9+655841.2.feedme'],
 ['fitting_file_sersic_cutout-IMG-HSC-I-18115-6,7-OBJ-NEP175927.6+664230.2.feedme'],
 ['fitting_file_sersic_cutout-IMG-HSC-I-18115-3,3-OBJ-NEP180508.6+655617.3.feedme'],
 ['fitting_file_sersic_cutout-IMG-HSC-I-18115-1,8-OBJ-NEP180840.8+665226.2.feedme'],
 ['fitting_file_expdisk_cutout-IMG-HSC-I-18114-0,5-OBJ-zsel56238.feedme',
  'fitting_file_sersic_cutout-IMG-HSC-I-18115-6,4-OBJ-zsel56238.feedme']]
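The sort in this answer matters because `itertools.groupby` only merges adjacent items with equal keys. A minimal sketch of the difference follows; the short `keys` list is made up purely for illustration.

```python
from itertools import groupby

keys = ['b', 'a', 'b']  # illustrative keys, deliberately not sorted

# without sorting, the two 'b' entries end up in separate groups
unsorted_groups = [(k, list(g)) for k, g in groupby(keys)]

# sorting first makes equal keys adjacent, so each key yields one group
sorted_groups = [(k, list(g)) for k, g in groupby(sorted(keys))]
```

Since the question says the files arrive in random order, skipping the sort would most likely split one object's files across several sublists.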