How to create a DataFrame from a list based on specific conditions in the list

I have the following list:


    ['1',
     'William Dunn Moseley',
     'June 25, 1845–October 1, 1849(term limited)',
     'Democratic',
     '1845',
     'Office did not exist',
     '2',
     'Thomas Brown',
     'October 1, 1849–October 3, 1853(term limited)',
     'Whig',
     '1849',
     '3',
     'James E. Broome',
     'October 3, 1853–October 5, 1857(term limited)',
     'Democratic',
     '1853',
    ]

Each number in the list corresponds to one row of the dataset I want to build. So from this list, I'd like to generate a dataset like this:


    Number  Name                  Term                                           Party       Election  Office
    1       William Dunn Moseley  June 25, 1845–October 1, 1849(term limited)    Democratic  1845      Office did not exist
    2       Thomas Brown          October 1, 1849–October 3, 1853(term limited)  Whig        1849      NA
    3       James E. Broome       October 3, 1853–October 5, 1857(term limited)  Democratic  1853      NA


Is there a simple way to turn the list into a DataFrame based on certain values in it (for example, the row numbers)?
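Finding the row numbers is the core of the problem. A rough sketch, assuming the list is stored in a variable named data and that rows are numbered consecutively from 1: a row starts at an element equal to the next expected row number, so numeric values such as the election year '1845' are not mistaken for row markers. The helper name row_start_indices is hypothetical.

```python
def row_start_indices(items):
    """Return the positions where a new row starts (hypothetical helper).

    A row starts at an element equal to the next expected row number
    ('1', then '2', then '3', ...), so other numeric values such as
    election years are not treated as row markers.
    """
    starts = []
    expected = 1
    for pos, value in enumerate(items):
        if value == str(expected):
            starts.append(pos)
            expected += 1
    return starts


data = ['1', 'William Dunn Moseley', 'June 25, 1845–October 1, 1849(term limited)',
        'Democratic', '1845', 'Office did not exist',
        '2', 'Thomas Brown', 'October 1, 1849–October 3, 1853(term limited)',
        'Whig', '1849',
        '3', 'James E. Broome', 'October 3, 1853–October 5, 1857(term limited)',
        'Democratic', '1853']

print(row_start_indices(data))  # [0, 6, 11]
```

Slicing data between consecutive start positions then yields one list per row, which pandas can turn into a DataFrame.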


Any help you can provide would be greatly appreciated! Thanks a lot.


森栏
3 Answers

qq_笑_17

Because your data is irregular, it's hard to do this with 100% accuracy, but here is one approach:

    import numpy as np
    import pandas as pd

    number_of_presidents = 3
    presidents = np.array(['1', 'William Dunn Moseley', 'June 25, 1845–October 1, 1849(term limited)', 'Democratic', '1845', 'Office did not exist',
                           '2', 'Thomas Brown', 'October 1, 1849–October 3, 1853(term limited)', 'Whig', '1849',
                           '3', 'James E. Broome', 'October 3, 1853–October 5, 1857(term limited)', 'Democratic', '1853'])

    # Position of the first occurrence of each row number '1', '2', '3'
    indexes = []
    for i in range(1, number_of_presidents + 1):
        indexes.append(np.where(presidents == str(i))[0][0])

    # Split at those positions, drop the empty chunk before '1',
    # then drop the number column
    df = pd.DataFrame(np.split(presidents, indexes)[1:]).iloc[:, 1:]
    print(df)

Output:

                          1  ...                     5
    0  William Dunn Moseley  ...  Office did not exist
    1          Thomas Brown  ...                  None
    2       James E. Broome  ...                  None

    [3 rows x 5 columns]
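A possible variation on this answer, sketched on the assumption that you want to keep the row number and use the column names from the question: skip the .iloc[:, 1:] slice and pass columns= to the DataFrame constructor. The chunks returned by np.split have different lengths (6, 5 and 5 here), and pandas fills the gaps in the shorter rows with missing values.

```python
import numpy as np
import pandas as pd

presidents = np.array([
    '1', 'William Dunn Moseley', 'June 25, 1845–October 1, 1849(term limited)',
    'Democratic', '1845', 'Office did not exist',
    '2', 'Thomas Brown', 'October 1, 1849–October 3, 1853(term limited)',
    'Whig', '1849',
    '3', 'James E. Broome', 'October 3, 1853–October 5, 1857(term limited)',
    'Democratic', '1853',
])
number_of_presidents = 3
columns = ['Number', 'Name', 'Term', 'Party', 'Election', 'Office']

# Position of the first occurrence of '1', '2', '3', ...
indexes = [np.where(presidents == str(i))[0][0]
           for i in range(1, number_of_presidents + 1)]

# np.split also returns the (empty) chunk before the first index; drop it.
# The remaining chunks differ in length, and pandas pads the shorter rows
# with missing values.
rows = np.split(presidents, indexes)[1:]
df = pd.DataFrame(rows, columns=columns)
print(df)
```

The Number column stays as strings; convert it with df['Number'].astype(int) if you need integers.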

泛舟湖上清波郎朗

With your list stored in L, you can do the following. First, fix up the list: check whether every 6th element is a number, and if it is, insert an empty-string element there. If the length of the list is a multiple of 6 after this loop, it is complete; otherwise, append one more empty string:

    i = 5  # index of the Office field in the first row
    while i < len(L):
        # A number in the Office slot means this row has no Office value
        if L[i].isdecimal():
            L.insert(i, '')
        i += 6
    # The last row may also be missing its Office field
    if len(L) % 6 != 0:
        L.append('')

With this regular list, creating the DataFrame is easy: just convert the list to 2D, i.e. a list of sublists, and add the column names:

    import pandas as pd

    values = [L[i:i+6] for i in range(0, len(L), 6)]
    col = ['Number', 'Name', 'Term', 'Party', 'Election', 'Office']
    df = pd.DataFrame(values, columns=col)
    #   Number                  Name  ... Election                Office
    # 0      1  William Dunn Moseley  ...     1845  Office did not exist
    # 1      2          Thomas Brown  ...     1849
    # 2      3       James E. Broome  ...     1853
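The answer above modifies L in place with insert. If you would rather leave the original list untouched, the same idea can be wrapped in a function that works on a copy. This is a sketch: the name pad_rows is hypothetical, and like the answer it assumes that only the last field of a row (Office) can be missing. It also pads the final row up to the full width, in case more than one trailing field is missing there.

```python
import pandas as pd


def pad_rows(items, width=6):
    """Return a copy of items padded so every row has exactly width fields.

    Hypothetical helper. Assumes only the last field of a row can be
    missing, as in the in-place loop.
    """
    padded = list(items)  # copy, so the caller's list is left unchanged
    i = width - 1  # index of the last field of the first row
    while i < len(padded):
        # A number where the last field should be means that field is
        # missing and the next row has already started: fill the gap.
        if padded[i].isdecimal():
            padded.insert(i, '')
        i += width
    # Pad the final row up to the full width (possibly several fields)
    padded.extend([''] * (-len(padded) % width))
    return padded


L = ['1', 'William Dunn Moseley', 'June 25, 1845–October 1, 1849(term limited)',
     'Democratic', '1845', 'Office did not exist',
     '2', 'Thomas Brown', 'October 1, 1849–October 3, 1853(term limited)',
     'Whig', '1849',
     '3', 'James E. Broome', 'October 3, 1853–October 5, 1857(term limited)',
     'Democratic', '1853']

col = ['Number', 'Name', 'Term', 'Party', 'Election', 'Office']
flat = pad_rows(L, width=len(col))
values = [flat[i:i + len(col)] for i in range(0, len(flat), len(col))]
df = pd.DataFrame(values, columns=col)
print(df)
```

Deriving width from len(col) keeps the padding and the column list in sync if you add or drop a column.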

人到中年有点甜

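It doesn't matter that the last two presidents have no "Office did not exist" entry, and you don't need to know how many presidents there are. ;D You can simply loop over the list and start a new row whenever you hit the next index:

    import pandas as pd

    # a is your list. The loop below drops the row numbers,
    # so only five column names are needed.
    column_names = ['Name', 'Term', 'Party', 'Election', 'Office']

    temp = []
    output = []
    idx = 0
    for row in a:
        # A new row starts at the next expected number ('1', '2', '3', ...)
        if row.isnumeric() and int(row) == idx + 1:
            output.append(temp)
            temp = []
            idx += 1
            continue
        temp.append(row)
    output.append(temp)

    # output[0] is the empty chunk collected before '1', so skip it
    df = pd.DataFrame(output[1:], columns=column_names)

This gives you what you want, but you have to supply the column names yourself.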
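A self-contained sketch of the same loop, assuming you also want to keep the row number as its own column (the loop above drops it because of continue). Missing trailing fields come out as missing values because pandas pads the shorter rows. The function name split_on_row_numbers is hypothetical.

```python
import pandas as pd


def split_on_row_numbers(items):
    """Split a flat list into rows, starting a new row at '1', '2', '3', ...

    Hypothetical helper. The row number is kept as the first value of each
    row. Other numeric values (such as election years) stay in the current
    row because they do not match the next expected row number. Anything
    before the first '1' is ignored.
    """
    rows = []
    for value in items:
        if value.isnumeric() and int(value) == len(rows) + 1:
            rows.append([value])  # start a new row with its number
        elif rows:
            rows[-1].append(value)
    return rows


a = ['1', 'William Dunn Moseley', 'June 25, 1845–October 1, 1849(term limited)',
     'Democratic', '1845', 'Office did not exist',
     '2', 'Thomas Brown', 'October 1, 1849–October 3, 1853(term limited)',
     'Whig', '1849',
     '3', 'James E. Broome', 'October 3, 1853–October 5, 1857(term limited)',
     'Democratic', '1853']

columns = ['Number', 'Name', 'Term', 'Party', 'Election', 'Office']
df = pd.DataFrame(split_on_row_numbers(a), columns=columns)
print(df)
```

Using len(rows) as the counter removes the separate idx variable and the extra append after the loop, because each value is written straight into the current row.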