
Expanding phrases that contain coordinating conjunctions such as ( /, and, or, & )

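I have a list of phrases containing coordinating conjunctions such as and, or, / and &. I want to expand each of them into all possible individual phrases. What is the best way to expand phrases containing conjunctions, using an NLP library or a Python function? For example, "alphabet a/b/c can have color red/blue/green" can be expanded into nine phrases: ["alphabet a can have color red", "alphabet a can have color blue", ... "alphabet b can have color blue", ... "alphabet c can have color green"].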
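The nine phrases are the Cartesian product of the two slash groups (3 letters × 3 colors). A minimal sketch using the standard library's `itertools.product`, assuming every group of alternatives is a single whitespace-delimited token whose options are separated by `/`; the function name `expand_slashes` is hypothetical:

```python
from itertools import product


def expand_slashes(phrase):
    """Return every phrase obtained by picking one option from each slashed token."""
    # "a/b/c" -> ["a", "b", "c"]; a plain word becomes a one-element list.
    options = [word.split("/") for word in phrase.split()]
    # product() yields one tuple per combination, varying the last token fastest.
    return [" ".join(choice) for choice in product(*options)]


phrases = expand_slashes("alphabet a/b/c can have color red/blue/green")
print(len(phrases))  # 9
print(phrases[0])    # alphabet a can have color red
print(phrases[-1])   # alphabet c can have color green
```

Conjunctions written as words (`and`, `or`, `&`) and parenthesized groups would have to be normalized to `/` before this step.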


Other examples:


    ['bag of apples/oranges', 'case of citrus (lemon or limes)',
     'chocolates/candy box', 'bag of shoes & socks',
     'pear red/brown/green', 'match box and/or lighter',
     'milkshake (soy and almond) added ']

These should be expanded to:


    ['bag of apples', 'bag of oranges',
     'case of citrus lemon', 'case of citrus limes',
     'chocolates box', 'candy box', 'bag of socks',
     'bag of shoes', 'pear red', 'pear brown',
     'pear green', 'match box ', 'lighter',
     'milkshake almond added', 'milkshake soy added']


慕侠2389804
Views: 228 · Answers: 1

慕的地6264312

There is always a brute-force way to solve this problem; I am still looking for something cleverer. The brute-force version below first rewrites each spaced conjunction as `/`, then repeatedly splits the first slashed word into one phrase per alternative until no `/` is left, and finally strips the parentheses.

    def expand_by_conjuction(item):
        """Expand slash-separated alternatives in `item` into separate phrases."""
        items = [item]
        # Keep expanding until no phrase contains a "/".
        while any("/" in phrase for phrase in items):
            # Build a new list rather than mutating `items` while iterating over it.
            expanded = []
            for phrase in items:
                words = phrase.split()
                # Index of the first word containing "/", or None if there is none.
                sls_index = next((i for i, w in enumerate(words) if "/" in w), None)
                if sls_index is None:
                    expanded.append(phrase)
                    continue
                # One new phrase per alternative, e.g. "red/blue" -> "red", "blue".
                for part in words[sls_index].split("/"):
                    expanded.append(" ".join(words[:sls_index] + [part] + words[sls_index + 1:]))
            items = expanded
        return items


    def slashize_conjuctions(item):
        """Rewrite spaced conjunctions as "/", e.g. "shoes & socks" -> "shoes/socks"."""
        slashize = [' or ', ' and ', ' and/or ', ' or/and ', ' & ']
        for conj in slashize:
            item = item.replace(conj, "/")
        return item


    items = ['bag of apples/oranges', 'case of citrus (lemon or limes)',
             'chocolates/candy box', 'bag of shoes & socks',
             'pear red/brown/green', 'match box and/or lighter',
             'milkshake (soy and almond) added ']
    new_items = []
    for string in items:
        item = slashize_conjuctions(string)
        lst = expand_by_conjuction(item)
        # Parentheses only grouped the alternatives, so remove them.
        new_items.extend(ele.replace("(", "").replace(")", "") for ele in lst)
    # Note: "match box and/or lighter" yields "match box" and "match lighter".
    print(new_items)
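The loop in the answer rescans the list until no `/` remains. A possibly simpler single-pass sketch, not the answerer's code, normalizes the spaced conjunctions with one regular expression, drops the parentheses, and takes the Cartesian product of the alternatives with `itertools.product`. The pattern `CONJUNCTIONS`, the name `expand_phrase`, and the assumption that each alternative is a single word are illustrative choices:

```python
import re
from itertools import product

# Spaced conjunctions (and/or, or/and, and, or, &) are rewritten to "/".
CONJUNCTIONS = re.compile(r"\s+(?:and/or|or/and|and|or|&)\s+")


def expand_phrase(phrase):
    """Expand single-word alternatives joined by /, and, or, & into separate phrases."""
    normalized = CONJUNCTIONS.sub("/", phrase)
    # Parentheses only group alternatives here, so drop them.
    normalized = normalized.replace("(", "").replace(")", "")
    # One list of options per word; words without "/" have a single option.
    options = [word.split("/") for word in normalized.split()]
    return [" ".join(choice) for choice in product(*options)]


items = ['bag of apples/oranges', 'case of citrus (lemon or limes)',
         'chocolates/candy box', 'bag of shoes & socks',
         'pear red/brown/green', 'match box and/or lighter',
         'milkshake (soy and almond) added ']
new_items = [phrase for item in items for phrase in expand_phrase(item)]
print(new_items)
```

Like the brute-force version, it treats 'match box and/or lighter' as alternatives for the single word next to the conjunction, giving 'match box' and 'match lighter' rather than the 'lighter' listed in the question; multi-word alternatives would need more than whitespace tokenization.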