How to handle incoming real-time data with Python pandas

I would use HDF5 / PyTables as follows:

- Keep the data as a Python list for as long as possible.
- Append your results to that list.
- When it gets big: push it to an HDF5 store using pandas IO (and an appendable table), then clear the list.
- Repeat.

In fact, the function I define uses one list per "key", so that you can store multiple DataFrames in the HDF5 store within the same process.

We define a function that you call with each row d:

    import pandas as pd

    CACHE = {}
    STORE = 'store.h5'   # Note: another option is to keep the actual file open

    def process_row(d, key, max_len=5000, _cache=CACHE):
        """
        Append row d to the store 'key'.

        When the number of items in the key's cache reaches max_len,
        append the list of rows to the HDF5 store and clear the list.
        """
        # keep the rows for each key separate.
        lst = _cache.setdefault(key, [])
        if len(lst) >= max_len:
            store_and_clear(lst, key)
        lst.append(d)

    def store_and_clear(lst, key):
        """
        Convert key's cache list to a DataFrame and append that to HDF5.
        """
        df = pd.DataFrame(lst)
        with pd.HDFStore(STORE) as store:
            store.append(key, df)
        lst.clear()

Note: we use the with statement to close the store automatically after each write. It may be faster to keep it open, but if you do, it is recommended that you flush regularly (closing flushes). Also note that a collections.deque might be more readable than a list, but the performance of a list will be slightly better here.

To use this, call it as:

    process_row({'time': '2013-01-01 00:00:00', 'stock': 'BLAH', 'high': 4.0, 'low': 3.0, 'open': 2.0, 'close': 1.0},
                key="df")

Note: "df" is the storage key used in the PyTables store.

Once the job has finished, make sure you store_and_clear the remaining cache:

    for k, lst in CACHE.items():  # you can instead use .iteritems() in python 2
        store_and_clear(lst, k)

Now your complete DataFrame is available via:

    with pd.HDFStore(STORE) as store:
        df = store["df"]                    # other keys will be store[key]
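The final flush is easy to forget, so here is a minimal sketch of the same batching idea wrapped in a context manager that flushes whatever is left on exit. The names RowBuffer, hdf_sink and sink are hypothetical and not part of pandas or of the answer above. Unlike process_row, this sketch flushes as soon as a batch reaches max_len rather than on the following row. Passing the writer in as a callable is an assumption made so the buffering logic can be tried without an HDF5 file.

```python
from collections import defaultdict


class RowBuffer:
    """Hypothetical helper: batch rows per key, pass full batches to sink(key, rows)."""

    def __init__(self, sink, max_len=5000):
        self.sink = sink
        self.max_len = max_len
        self._cache = defaultdict(list)  # one list of rows per key

    def add(self, row, key):
        rows = self._cache[key]
        rows.append(row)
        # Flush as soon as this key's batch reaches max_len.
        if len(rows) >= self.max_len:
            self._flush_key(key)

    def _flush_key(self, key):
        rows = self._cache[key]
        if rows:
            self.sink(key, list(rows))  # pass a copy, then reuse the list
            rows.clear()

    def flush(self):
        # Write out whatever is still buffered, for every key.
        for key in list(self._cache):
            self._flush_key(key)

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.flush()  # remaining rows are written even if the loop raised
        return False  # do not swallow exceptions


def hdf_sink(path):
    """Hypothetical sink factory: append each batch to the HDF5 store at path."""
    import pandas as pd  # imported lazily; HDFStore also needs PyTables installed

    def sink(key, rows):
        with pd.HDFStore(path) as store:
            store.append(key, pd.DataFrame(rows))

    return sink
```

Under these assumptions, usage would look like `with RowBuffer(hdf_sink('store.h5')) as buf: buf.add(d, key="df")`, and no separate loop over the cache is needed at the end.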

