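Parsing a text file of consecutive single-value lines into a pandas DataFrame

I have a text file whose lines follow a repeating pattern, and I want to convert it into a DataFrame.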


10/21/2019

abcdef

100.00

10/22/2019

ghijk

120.00

There is a clear pattern, and I would like the DataFrame to look like this:


Data       | Description | Amount

10/21/2019 | abcdef      | 100.00

10/22/2019 | ghijk       | 120.00

How can this be done?


慕码人2483693
3 answers

幕布斯7119047

Use a regular expression to extract the fields, then forward-fill the first two columns and drop the rows that still contain nulls. Here df1 is a DataFrame with a single column named text that holds one line of the file per row. Note that the Amount group only matches values ending in .00, and the Description group only matches lowercase letters.

    import numpy as np
    import pandas as pd

    pattern = r"(?P<Date>\d{2}/\d{2}/\d{4})|(?P<Description>[a-z]+)|(?P<Amount>\d{1,}\.00)"

    res = (df1.text.str.extract(pattern)
           .assign(Date = lambda x: x.Date.ffill(),
                   Description = lambda x: x.Description.ffill()
                  )
           .dropna(how='any')
          )

    res

             Date  Description  Amount
    2  10/21/2019       abcdef  100.00
    5  10/22/2019        ghijk  120.00

If you would rather avoid regular expressions and the format is fixed, you can reshape the data with numpy and create a new DataFrame.

    #reshape the data
    #thanks to @Chester
    #removes unnecessary computation
    res = np.reshape(df1.to_numpy(),(-1,3))

    #create new dataframe
    pd.DataFrame(res,columns=['Date','Description','Amount'])

             Date  Description  Amount
    0  10/21/2019       abcdef  100.00
    1  10/22/2019        ghijk  120.00
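This answer works on a DataFrame `df1` with a `text` column but does not show how it is built. The following is a minimal sketch of one way to build it and then apply the reshape approach. It assumes the sample data from the question; the `io.StringIO` buffer is only a stand-in for the real file.

```python
import io

import numpy as np
import pandas as pd

# Stand-in for the real file; the contents are the sample from the question.
sample = io.StringIO(
    "10/21/2019\n\nabcdef\n\n100.00\n\n10/22/2019\n\nghijk\n\n120.00\n"
)

# Keep one non-empty line per row, as strings.
lines = [line.strip() for line in sample if line.strip()]
df1 = pd.DataFrame({"text": lines})

# Every three consecutive lines form one record.
res = np.reshape(df1["text"].to_numpy(), (-1, 3))
df = pd.DataFrame(res, columns=["Date", "Description", "Amount"])
print(df)
```

The reshape only works when the number of non-empty lines is a multiple of three; otherwise `np.reshape` raises a `ValueError`, which is a useful early signal that the file does not follow the expected pattern.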

小怪兽爱吃肉

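Read the raw data from the file into a Series and convert it to a PandasArray, which avoids having to deal with the index later:

    # raw string, so that \t, \a and \f in the Windows path are not treated as escapes
    # (squeeze=True was removed from read_csv in pandas 2.0; selecting the column gives the same Series)
    raw_data = pd.read_csv(r"path\to\a\data\file.txt", names=['raw_data'])['raw_data'].array

Create the DataFrame using slicing:

    df = pd.DataFrame(data={'Data': raw_data[::3], 'Description': raw_data[1::3], 'Amount': raw_data[2::3]})

Just two simple steps, with no regular expressions and no unnecessary conversions. Short and efficient.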
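With the slicing approach every column still holds strings. If typed columns are wanted, one possible follow-up is to convert them after the frame is built. This is a sketch only: it assumes the month/day/year format shown in the question, uses an `io.StringIO` buffer in place of the real file, and uses `.to_numpy()` rather than `.array` for the intermediate values.

```python
import io

import pandas as pd

# Stand-in for the real file; the contents are the sample from the question.
sample = io.StringIO(
    "10/21/2019\n\nabcdef\n\n100.00\n\n10/22/2019\n\nghijk\n\n120.00\n"
)

# dtype=str keeps every line as text; blank lines are skipped by default.
raw = pd.read_csv(sample, header=None, names=["raw_data"], dtype=str)
raw = raw["raw_data"].to_numpy()

# Guard against a truncated or malformed file before slicing.
if len(raw) % 3 != 0:
    raise ValueError("expected the number of lines to be a multiple of 3")

df = pd.DataFrame(
    {"Data": raw[::3], "Description": raw[1::3], "Amount": raw[2::3]}
)

# Convert the string columns to proper dtypes.
df["Data"] = pd.to_datetime(df["Data"], format="%m/%d/%Y")
df["Amount"] = pd.to_numeric(df["Amount"])
print(df.dtypes)
```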

12345678_0001

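If your string has exactly the pattern you described, you can use the following code:

    import pandas as pd

    string = '''10/21/2019
    abcdef
    100.00
    10/22/2019
    ghijk
    120.00'''

    # split on whitespace, then take every third token for each column
    token_list = string.split()
    Data = token_list[0::3]
    Description = token_list[1::3]
    Amount = token_list[2::3]

    Aggregate = list(zip(Data, Description, Amount))
    df = pd.DataFrame(Aggregate, columns = ['Data', 'Description', 'Amount'])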
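One caveat with `str.split()` is that it splits on any whitespace, so a description containing a space would become two tokens and shift every later column. Below is a small sketch of a line-based variant, assuming each field sits on its own line. The sample text, including the description with a space, is made up for illustration.

```python
import pandas as pd

# Hypothetical sample: the first description contains a space.
text = """10/21/2019
abc def
100.00
10/22/2019
ghijk
120.00"""

# Split on line boundaries and drop blank lines, so spaces inside a field survive.
lines = [line.strip() for line in text.splitlines() if line.strip()]

# Group every three consecutive lines into one record.
records = list(zip(lines[0::3], lines[1::3], lines[2::3]))

df = pd.DataFrame(records, columns=["Data", "Description", "Amount"])
print(df)
```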