MMTTMM
Use pandas

- The main difference is that pandas has converted all of the data to the correct dtype (e.g. datetime, int and float), and the code is more concise.
- The data is now also in a useful format for time-series analysis and plotting, but I recommend adding column names, e.g. df.columns = ['datetime', ..., 'price'].
- The selection itself is done with 1 line of vectorized operations.
- As shown in the timeit test, for 1M rows of data, using pandas is about 100 ms slower (774 ms vs. 668 ms per loop) than reading the file with a plain with open loop and using str methods to find :00.

Steps:
- Read the file with pandas.read_csv and parse the dates in column 0.
  - Use header=None, because no header is provided in the test data.
- Use Boolean indexing to select the dates where the seconds are 0.
  - Use the .dt accessor to get .second.

import pandas as pd

# read the file which apparently has no header and parse the date column
df = pd.read_csv('test.csv', header=None, parse_dates=[0])

# using Boolean indexing to select data when seconds = 00
top_of_the_minute = df[df[0].dt.second == 0]

# save the data
top_of_the_minute.to_csv('clean.csv', header=False, index=False)

# display(top_of_the_minute)
                    0  1  2     3     4   5      6      7        8
5 2020-08-03 22:17:00  0  0  4803  4800  91  28.05  24.05  58.8917
6 2020-08-03 22:17:00  0  0  4802  4800  91  28.05  24.05  58.8925
7 2020-08-03 22:17:00  0  0  4805  4800  91  28.05  24.05  58.9341
8 2020-08-03 22:17:00  0  0  4802  4800  91  28.05  24.05  58.9683
9 2020-08-03 22:17:00  0  0  4802  4800  91  28.05  23.05  58.9780

# example: rename columns
top_of_the_minute.columns = ['datetime', 'v1', 'v2', 'v3', 'v4', 'v5', 'p1', 'p2', 'p3']

# example: plot the data
p = top_of_the_minute.plot('datetime', 'p3')
p.legend(bbox_to_anchor=(1.05, 1), loc='upper left')
p.set_xlim('2020-08', '2020-09')

test.csv

2020-08-03 22:17:12,0,0,4803,4800,91,28.05,24.05,58.8917
2020-08-03 22:17:13,0,0,4802,4800,91,28.05,24.05,58.8925
2020-08-03 22:17:14,0,0,4805,4800,91,28.05,24.05,58.9341
2020-08-03 22:17:15,0,0,4802,4800,91,28.05,24.05,58.9683
2020-08-03 22:17:18,0,0,4802,4800,91,28.05,23.05,58.978
2020-08-03 22:17:00,0,0,4803,4800,91,28.05,24.05,58.8917
2020-08-03 22:17:00,0,0,4802,4800,91,28.05,24.05,58.8925
2020-08-03 22:17:00,0,0,4805,4800,91,28.05,24.05,58.9341
2020-08-03 22:17:00,0,0,4802,4800,91,28.05,24.05,58.9683
2020-08-03 22:17:00,0,0,4802,4800,91,28.05,23.05,58.978

%%timeit test (IPython/Jupyter cell magic)

Create test data

# read test.csv
df = pd.read_csv('test.csv', header=None, parse_dates=[0])

# create a dataframe with 1M rows (10 rows x 100000)
df = pd.concat([df] * 100000)

# save the new test data (overwrites test.csv)
df.to_csv('test.csv', index=False, header=False)

test_sk

def test_sk(path: str):
    zero_entries = []
    with open(path, "r") as file:
        for line in file:
            # the timestamp is everything before the first comma
            semi_index = line.index(',')
            if line[:semi_index].endswith(':00'):
                zero_entries.append(line)
    return zero_entries

%%timeit
result_sk = test_sk('test.csv')

[out]:
668 ms ± 5.69 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

test_tm

def test_tm(path: str):
    df = pd.read_csv(path, header=None, parse_dates=[0])
    return df[df[0].dt.second == 0]

%%timeit
result_tm = test_tm('test.csv')

[out]:
774 ms ± 7.27 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
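The same filter can be tried without touching the disk by feeding a few rows through io.StringIO. This is a minimal sketch, assuming pandas is installed; the rows are modeled on test.csv with timestamps altered for illustration (one has minute 00 but seconds 18, so it must be excluded), and the column names are the illustrative ones from the rename example. Passing them via names= at read time is an alternative to assigning df.columns afterwards.

```python
import io

import pandas as pd

# Illustrative rows modeled on test.csv (timestamps altered for this example).
SAMPLE = """\
2020-08-03 22:17:12,0,0,4803,4800,91,28.05,24.05,58.8917
2020-08-03 22:17:13,0,0,4802,4800,91,28.05,24.05,58.8925
2020-08-03 22:17:00,0,0,4805,4800,91,28.05,24.05,58.9341
2020-08-03 22:18:00,0,0,4802,4800,91,28.05,24.05,58.9683
2020-08-03 22:00:18,0,0,4802,4800,91,28.05,23.05,58.978
"""

# Hypothetical column names, same as the rename example.
COLUMNS = ['datetime', 'v1', 'v2', 'v3', 'v4', 'v5', 'p1', 'p2', 'p3']


def top_of_the_minute(csv_text: str) -> pd.DataFrame:
    """Return only the rows whose timestamp has seconds == 0."""
    df = pd.read_csv(
        io.StringIO(csv_text),
        header=None,               # the data has no header row
        names=COLUMNS,             # name the columns while reading
        parse_dates=['datetime'],  # convert the timestamp column to datetime64
    )
    # Boolean indexing on the parsed seconds field via the .dt accessor
    return df[df['datetime'].dt.second == 0]


result = top_of_the_minute(SAMPLE)
print(result[['datetime', 'p3']])
```

Because the check is on the parsed seconds field rather than on text, the 22:00:18 row is excluded even though its timestamp contains ":00".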
慕桂英4014372
Try this (assuming altmasterlist is the list of rows from the question, each already split on commas):

finalmasterlist2 = []
for row in altmasterlist:
    # keep rows whose timestamp (first field) ends in ":00", i.e. seconds == 0
    if row[0].endswith(":00"):
        # note: extend flattens every matching row into one list
        finalmasterlist2.extend(row)
print("finalmasterlist2")
print(finalmasterlist2)

Input:

2020-08-03 22:17:12,0,0,4803,4800,91,28.05,24.05,58.8917
2020-08-03 22:17:13,0,0,4802,4800,91,28.05,24.05,58.8925
2020-08-03 22:17:00,0,0,4805,4800,91,28.05,24.05,58.9341
2020-08-03 22:17:15,0,0,4802,4800,91,28.05,24.05,58.9683
2020-08-03 22:17:18,0,0,4802,4800,91,28.05,23.05,58.978

Output:

['2020-08-03 22:17:00', '0', '0', '4805', '4800', '91', '28.05', '24.05', '58.9341']
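String checks on the timestamp are easy to get subtly wrong: a plain ":00" in ts test also matches 22:00:15, where the minutes, not the seconds, are 00. A more robust option is to parse each timestamp and test its seconds field. This is a minimal standard-library sketch, assuming each row is a list of strings whose first field uses the %Y-%m-%d %H:%M:%S format; the function name rows_on_the_minute and the sample data are hypothetical.

```python
import csv
import io
from datetime import datetime


def rows_on_the_minute(rows):
    """Keep rows whose first field is a timestamp with seconds == 0.

    Assumes each row is a list of strings and row[0] looks like
    '2020-08-03 22:17:00'.
    """
    selected = []
    for row in rows:
        ts = datetime.strptime(row[0], "%Y-%m-%d %H:%M:%S")
        if ts.second == 0:
            selected.append(row)  # append keeps one list per matching row
    return selected


# Illustrative input in the same shape as the answer's input.
data = """\
2020-08-03 22:17:12,0,0,4803,4800,91,28.05,24.05,58.8917
2020-08-03 22:17:13,0,0,4802,4800,91,28.05,24.05,58.8925
2020-08-03 22:17:00,0,0,4805,4800,91,28.05,24.05,58.9341
2020-08-03 22:00:15,0,0,4802,4800,91,28.05,24.05,58.9683
2020-08-03 22:17:18,0,0,4802,4800,91,28.05,23.05,58.978
"""
rows = list(csv.reader(io.StringIO(data)))
print(rows_on_the_minute(rows))
# → [['2020-08-03 22:17:00', '0', '0', '4805', '4800', '91', '28.05', '24.05', '58.9341']]
```

Using append rather than extend keeps each matching row as its own list; extend merges all matches into one flat list, which is only unambiguous when a single row matches.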