Reformat the items in a list into different types and sublists

UNCLEANED = [
    ['1 -  32/', 'Highway', '403', '43.167233',
     '-80.275567', '1965', '2014', '2009', '4',
     'Total=64  (1)=12;(2)=19;(3)=21;(4)=12;', '65', '04/13/2012', '72.3', '',
     '72.3', '', '69.5', '', '70', '', '70.3', '', '70.5', '', '70.7', '72.9',
     ''],
    ['1 -  43/', 'WEST', '403', '43.164531', '-80.251582',
     '1963', '2014', '2007', '4',
     'Total=60.4  (1)=12.2;(2)=18;(3)=18;(4)=12.2;', '61', '04/13/2012',
     '71.5', '', '71.5', '', '68.1', '', '69', '', '69.4', '', '69.4', '',
     '70.3', '73.3', ''],
    ['2 -   4/', 'STOKES', '6', '45.036739', '-81.33579', '1958',
     '2013', '', '1', 'Total=16  (1)=16;', '18.4', '08/28/2013', '85.1',
     '85.1', '', '67.8', '', '67.4', '', '69.2', '70', '70.5', '', '75.1', '',
     '90.1', '']
]

Above is the uncleaned version of a list containing three sublists... I need to convert it into a cleaner version that might look something like this:


CLEANED = [[1, 'Highway', '403', 43.167233,
            -80.275567, '1965', '2014', '2009', 4,
            [12.0, 19.0, 21.0, 12.0], 65.0, '04/13/2012',
            [72.3, 69.5, 70.0, 70.3, 70.5, 70.7, 72.9]],
           [2, 'WEST', '403', 43.164531, -80.251582,
            '1963', '2014', '2007', 4, [12.2, 18.0, 18.0, 12.2], 61.0,
            '04/13/2012', [71.5, 68.1, 69.0, 69.4, 69.4, 70.3, 73.3]],
           [3, 'STOKES', '6', 45.036739, -81.33579, '1958',
            '2013', '', 1, [16.0], 18.4, '08/28/2013',
            [85.1, 67.8, 67.4, 69.2, 70.0, 70.5, 75.1, 90.1]]
          ]

I think the pattern is: for index[0] of the uncleaned version I keep only the first character, index[1] and [2] stay the same, and index[3] and [4] get converted to floats...
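One way to express these per-position rules is a small table that maps an index to a converter. This is a minimal sketch, not taken from any answer: the names FIELD_CONVERTERS and convert_fields are hypothetical, and the rules for index[8] (int) and index[10] (float) are inferred from the CLEANED example rather than stated in the question. index[9] is deliberately passed through here because it needs its own parsing step.

```python
# Hypothetical converter table, inferred from the CLEANED example.
# Any index not listed here is copied through unchanged.
FIELD_CONVERTERS = {
    0: lambda s: int(s[0]),  # '1 -  32/' -> 1 (keep only the first character)
    3: float,                # '43.167233' -> 43.167233
    4: float,                # '-80.275567' -> -80.275567
    8: int,                  # '4' -> 4
    10: float,               # '65' -> 65.0
}


def convert_fields(row):
    """Convert the fixed fields at index 0-11; index[9] is left as-is for a separate step."""
    converted = []
    for i, value in enumerate(row[:12]):
        convert = FIELD_CONVERTERS.get(i)
        converted.append(convert(value) if convert else value)
    return converted


row = ['1 -  32/', 'Highway', '403', '43.167233', '-80.275567', '1965', '2014',
       '2009', '4', 'Total=64  (1)=12;(2)=19;(3)=21;(4)=12;', '65', '04/13/2012']
print(convert_fields(row))
```

Keeping the rules in a dict means adding or changing a rule for one column touches one line instead of the loop.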


Then, when I get to index[9], I have to ignore the total and extract only the remaining numbers into a sublist...
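For index[9], a regular expression that only matches the "(n)=value" pairs skips the leading "Total=..." automatically, because the total has no "(n)" in front of it. This is a sketch under the assumption that every value is a non-negative decimal number like 12 or 12.2; the helper name parse_parts is hypothetical.

```python
import re

# Captures the value in each "(n)=value" pair; "Total=64" is not preceded by "(n)",
# so it never matches.
_PART_RE = re.compile(r'\(\d+\)=([\d.]+)')


def parse_parts(field):
    """'Total=64  (1)=12;(2)=19;...' -> [12.0, 19.0, ...] (the total is ignored)."""
    return [float(value) for value in _PART_RE.findall(field)]


print(parse_parts('Total=64  (1)=12;(2)=19;(3)=21;(4)=12;'))  # [12.0, 19.0, 21.0, 12.0]
print(parse_parts('Total=16  (1)=16;'))                        # [16.0]
```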


The last thing is to put the numbers after the date into a sublist, leaving out the first number.
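For the readings after the date, the first number sits at index 12, so starting the slice at index 13 drops it, and filtering on truthiness removes the empty strings. A minimal sketch, assuming the date is always at index 11 as in all three sample rows; parse_readings is a hypothetical name.

```python
def parse_readings(row):
    """Readings after the date (index 11) as floats.

    row[12] is the first number after the date and is skipped; empty strings
    are dropped. Assumes the date always sits at index 11.
    """
    return [float(value) for value in row[13:] if value]


row = ['2 -   4/', 'STOKES', '6', '45.036739', '-81.33579', '1958',
       '2013', '', '1', 'Total=16  (1)=16;', '18.4', '08/28/2013', '85.1',
       '85.1', '', '67.8', '', '67.4', '', '69.2', '70', '70.5', '', '75.1', '',
       '90.1', '']
print(parse_readings(row))  # [85.1, 67.8, 67.4, 69.2, 70.0, 70.5, 75.1, 90.1]
```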


I'm really confused about how to keep looping until everything in UNCLEANED has been "cleaned".


And what if UNCLEANED has more than these three elements? If it's very long, how would I iterate over it?
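A for loop visits every element exactly once and never needs to know the length in advance, so the same code handles 3 rows or 3 million. The sketch below wraps that loop in a generator so rows can also be cleaned one at a time (for example while reading them from a file) without building the whole result first. clean_rows and toy_clean_row are hypothetical names, and the repeated rows are made-up illustration data.

```python
from typing import Callable, Iterable, Iterator


def clean_rows(rows: Iterable[list], clean_row: Callable[[list], list]) -> Iterator[list]:
    """Yield clean_row(row) for each row, one at a time.

    The loop never looks at len(rows), so `rows` can be any iterable:
    a list, a generator, a csv reader, and so on.
    """
    for row in rows:
        yield clean_row(row)


# Illustrative only: a toy per-row function and made-up repeated rows.
def toy_clean_row(row):
    return [int(row[0][0]), row[1]]


rows = [['1 -  32/', 'Highway'], ['2 -   4/', 'STOKES']] * 1000
cleaned = list(clean_rows(rows, toy_clean_row))
print(len(cleaned))  # 2000
print(cleaned[-1])   # [2, 'STOKES']
```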


Thanks a lot for your help!


阿晨1998
3 Answers

慕斯王

Here is a solution that does the conversion described above. It's just a simple loop:

UNCLEANED = [['1 -  32/', 'Highway', '403', '43.167233', '-80.275567', '1965', '2014', '2009', '4', 'Total=64  (1)=12;(2)=19;(3)=21;(4)=12;', '65', '04/13/2012', '72.3', '', '72.3', '', '69.5', '', '70', '', '70.3', '', '70.5', '', '70.7', '72.9', ''],
             ['1 -  43/', 'WEST', '403', '43.164531', '-80.251582', '1963', '2014', '2007', '4', 'Total=60.4  (1)=12.2;(2)=18;(3)=18;(4)=12.2;', '61', '04/13/2012', '71.5', '', '71.5', '', '68.1', '', '69', '', '69.4', '', '69.4', '', '70.3', '73.3', ''],
             ['2 -   4/', 'STOKES', '6', '45.036739', '-81.33579', '1958', '2013', '', '1', 'Total=16  (1)=16;', '18.4', '08/28/2013', '85.1', '85.1', '', '67.8', '', '67.4', '', '69.2', '70', '70.5', '', '75.1', '', '90.1', '']]

# Function that performs the conversion described above.
def cleanElement(elem):
    elem = list(elem)  # work on a copy so UNCLEANED itself is not modified
    elem[0] = elem[0].split(' - ')[0]  # '1 -  32/' -> '1' (kept as a string)
    elem[3] = float(elem[3])
    elem[4] = float(elem[4])
    elem[8] = int(elem[8])
    # Drop 'Total=...' and split the remaining '(n)=value' parts.
    tempList = elem[9].split('  ')[1].split(';')
    tempList = [float(i.split('=')[1]) for i in tempList if not i == '']
    elem[9] = tempList
    elem[10] = float(elem[10])
    # Every non-empty number after the first reading, as floats.
    elem[13] = [float(i) for i in elem[13:] if not i == '']
    elem.pop(12)  # drop the first reading; the new sublist moves to index 12
    return elem[:13]

# Function that loops over the uncleaned list and converts each element.
def cleanList(uncleaned):
    return [cleanElement(elem) for elem in uncleaned]

cleaned = cleanList(UNCLEANED)
for i in cleaned:
    print(i)

Output:

['1', 'Highway', '403', 43.167233, -80.275567, '1965', '2014', '2009', 4, [12.0, 19.0, 21.0, 12.0], 65.0, '04/13/2012', [72.3, 69.5, 70.0, 70.3, 70.5, 70.7, 72.9]]
['1', 'WEST', '403', 43.164531, -80.251582, '1963', '2014', '2007', 4, [12.2, 18.0, 18.0, 12.2], 61.0, '04/13/2012', [71.5, 68.1, 69.0, 69.4, 69.4, 70.3, 73.3]]
['2', 'STOKES', '6', 45.036739, -81.33579, '1958', '2013', '', 1, [16.0], 18.4, '08/28/2013', [85.1, 67.8, 67.4, 69.2, 70.0, 70.5, 75.1, 90.1]]

三国纷争

Here is another way to clean the list of lists, using a collection of functions. The tricky part is slicing the last part of each list, where the alternating strings have to be collected into an array and the empty strings filtered out. I'm assuming that the non-empty string value among the first 3 items of each sub-array's tail is the desired value. arrange handles putting those first 3 items in an order that gives consistent results. IMHO, the advantage of this approach is that if you want to do something different with any particular item, it's easier to change the code.

import itertools as it

def get_first_char_int(item):
    first_char, *_ = item
    return int(first_char)

def identity(item):
    return item

def get_floats(item):
    tokens = ''.join(item.split(' ')[2:]).split('=')[1:]
    return [float(token.split(';')[0]) for token in tokens]

def get_float(item):
    return float(item) if item else item

UNCLEANED = [
    ['1 -  32/', 'Highway', '403', '43.167233',
     '-80.275567', '1965', '2014', '2009', '4',
     'Total=64  (1)=12;(2)=19;(3)=21;(4)=12;', '65', '04/13/2012', '72.3', '',
     '72.3', '', '69.5', '', '70', '', '70.3', '', '70.5', '', '70.7', '72.9',
     ''],
    ['1 -  43/', 'WEST', '403', '43.164531', '-80.251582',
     '1963', '2014', '2007', '4',
     'Total=60.4  (1)=12.2;(2)=18;(3)=18;(4)=12.2;', '61', '04/13/2012',
     '71.5', '', '71.5', '', '68.1', '', '69', '', '69.4', '', '69.4', '',
     '70.3', '73.3', ''],
    ['2 -   4/', 'STOKES', '6', '45.036739', '-81.33579', '1958',
     '2013', '', '1', 'Total=16  (1)=16;', '18.4', '08/28/2013', '85.1',
     '85.1', '', '67.8', '', '67.4', '', '69.2', '70', '70.5', '', '75.1', '',
     '90.1', ''],
]

functions = [ # 1:1 mapping of functions to items in each list in UNCLEANED.
    get_first_char_int,
    identity,
    identity,
    float,
    float,
    identity,
    identity,
    identity,
    int,
    get_floats,
    float,
    identity,
]

end = len(functions)
item_length, = {len(items) for items in UNCLEANED}
# Calculate argument to pass to it.islice
extra_count = item_length - end
# Extend functions by extra_count times with get_float
functions.extend(list(it.repeat(get_float, extra_count)))

## Handle items up to start of alternating strings and empty strings.
head_results = (
    [f(item)
     for f, item
     in zip(functions[0:end], collection[0:end])]
    for collection in UNCLEANED)

def arrange(items):
    """Handle varying order of first 3 items of items."""
    item, *_ = items
    items[0:3] = [item, '', item]
    return items

## Apply arrange to the tail of each sublist
collection_ = it.chain.from_iterable(arrange(collection[end:])
                                     for collection in UNCLEANED)

## Handle items starting with alternating strings and empty strings.
tail_results = (
    [f(item)
     for f, item
     in it.islice(zip(functions[end:], collection_), 2, item_length)]
    for collection in UNCLEANED)

results = [[head, [item for item in tail if item]]
           for head, tail in zip(head_results, tail_results)]

for item in results:
    print(item)

Output:

[[1, 'Highway', '403', 43.167233, -80.275567, '1965', '2014', '2009', 4, [12.0, 19.0, 21.0, 12.0], 65.0, '04/13/2012'], [72.3, 69.5, 70.0, 70.3, 70.5, 70.7, 72.9]]
[[1, 'WEST', '403', 43.164531, -80.251582, '1963', '2014', '2007', 4, [12.2, 18.0, 18.0, 12.2], 61.0, '04/13/2012'], [71.5, 68.1, 69.0, 69.4, 69.4, 70.3, 73.3]]
[[2, 'STOKES', '6', 45.036739, -81.33579, '1958', '2013', '', 1, [16.0], 18.4, '08/28/2013'], [85.1, 67.8, 67.4, 69.2, 70.0, 70.5, 75.1, 90.1]]
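The tail_results step in the second answer works because every row's tail is pulled from one shared iterator (collection_), and zip(functions[end:], collection_) takes exactly one tail's worth of items each time. That relies on zip() evaluating its arguments left to right and stopping as soon as the first one runs out, before it pulls from the next. The small demo below (plain built-ins, illustrative data) shows this, and also shows why the argument order matters: if the shared iterator came first, one item would be pulled and then thrown away.

```python
# One iterator shared by several zip() calls.
shared = iter(range(6))

# 'abc' is exhausted first, so zip() stops before touching `shared` a 4th time.
first = list(zip('abc', shared))
second = list(zip('xyz', shared))
print(first)   # [('a', 0), ('b', 1), ('c', 2)]
print(second)  # [('x', 3), ('y', 4), ('z', 5)]

# Reversed order: zip() pulls 2 from `lossy`, then finds 'ab' exhausted
# and discards the 2.
lossy = iter(range(5))
pairs = list(zip(lossy, 'ab'))
print(pairs)        # [(0, 'a'), (1, 'b')]
print(next(lossy))  # 3
```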

呼唤远方

Create a clean_row(row) function and call all of your "cleaning rules" from inside it. Then you can simply do CLEANED = [clean_row(uncleaned) for uncleaned in UNCLEANED].
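A possible clean_row along those lines, as a sketch: it applies the rules from the question to one row and returns a new list, assuming every row has the same layout as the three samples (date at index 11, readings from index 12 on). Note that the "keep only the first character" rule gives 1, 1, 2 for index[0], the same as the outputs of the first two answers, not the 1, 2, 3 shown in the question's CLEANED example; if running row numbers are what you want, enumerate(UNCLEANED, start=1) would supply them instead.

```python
import re

# Captures the value of each "(n)=value" pair in index[9]; the total is skipped.
_PART_RE = re.compile(r'\(\d+\)=([\d.]+)')


def clean_row(row):
    """Apply the cleaning rules from the question to a single row."""
    return [
        int(row[0][0]),                    # keep only the first character
        row[1], row[2],                    # unchanged
        float(row[3]), float(row[4]),      # coordinates
        row[5], row[6], row[7],            # unchanged
        int(row[8]),
        [float(v) for v in _PART_RE.findall(row[9])],  # parts without the total
        float(row[10]),
        row[11],                           # date, unchanged
        [float(v) for v in row[13:] if v], # readings: skip the first, drop blanks
    ]


UNCLEANED = [
    ['1 -  32/', 'Highway', '403', '43.167233', '-80.275567', '1965', '2014', '2009', '4',
     'Total=64  (1)=12;(2)=19;(3)=21;(4)=12;', '65', '04/13/2012', '72.3', '',
     '72.3', '', '69.5', '', '70', '', '70.3', '', '70.5', '', '70.7', '72.9', ''],
    ['1 -  43/', 'WEST', '403', '43.164531', '-80.251582', '1963', '2014', '2007', '4',
     'Total=60.4  (1)=12.2;(2)=18;(3)=18;(4)=12.2;', '61', '04/13/2012',
     '71.5', '', '71.5', '', '68.1', '', '69', '', '69.4', '', '69.4', '', '70.3', '73.3', ''],
    ['2 -   4/', 'STOKES', '6', '45.036739', '-81.33579', '1958', '2013', '', '1',
     'Total=16  (1)=16;', '18.4', '08/28/2013', '85.1', '85.1', '', '67.8', '', '67.4', '',
     '69.2', '70', '70.5', '', '75.1', '', '90.1', ''],
]

CLEANED = [clean_row(uncleaned) for uncleaned in UNCLEANED]
for row in CLEANED:
    print(row)
```

Because clean_row builds a fresh list, UNCLEANED is left untouched, and each rule lives on its own line, which keeps changes to a single column local.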