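I have a time-series DataFrame with a million rows, and some of the values in its Date column have the day and month swapped.
How can I untangle them efficiently without breaking the values that are already correct?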
```python
# this creates a dataframe with muddled dates
import pandas as pd
import numpy as np
from pandas import Timestamp

start = Timestamp(2013, 1, 1)
dates = pd.date_range(start, periods=942)[::-1]
muddler = {}
for d in dates:
    if d.day < 13:
        muddler[d] = Timestamp(d.year, d.day, d.month)
    else:
        muddler[d] = Timestamp(d.year, d.month, d.day)
df = pd.DataFrame()
df['Date'] = dates
df['Date'] = df['Date'].map(muddler)
# now what? (assuming I don't know how the dates are muddled)
```
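One possible approach, sketched under assumptions that the question does not state: the true series is sorted and roughly evenly spaced from row to row (true for the example, which is a reversed daily `date_range`), and any date whose day is above 12, or whose day equals its month, must already be correct because a swap cannot produce it. Those rows serve as anchors. Every other row is compared, both as stored and with day/month swapped, against a value linearly interpolated (by row position) from the anchors, and the closer candidate wins. All steps are vectorised, so it should scale to a million rows. The names `unmuddle_dates` and `to_days` are hypothetical helpers, not pandas API.

```python
import pandas as pd


def to_days(s: pd.Series) -> pd.Series:
    """Convert a datetime Series to float days since the Unix epoch."""
    return (s - pd.Timestamp("1970-01-01")) / pd.Timedelta(days=1)


def unmuddle_dates(dates: pd.Series) -> pd.Series:
    """Heuristically undo day/month swaps in a sorted date Series.

    Assumptions (not from the original question):
    * the correct series is sorted and roughly evenly spaced row to row;
    * dates with day > 12, or with day == month, are trustworthy anchors,
      because swapping day and month could not have produced them.
    """
    d = pd.to_datetime(dates).astype("datetime64[ns]")

    # Rows where swapping day and month gives a different, valid date.
    ambiguous = (d.dt.day <= 12) & (d.dt.day != d.dt.month)

    # Swapped candidate; non-ambiguous rows are left unchanged so that
    # every assembled year/month/day triple is a valid date.
    swapped = pd.to_datetime(pd.DataFrame({
        "year": d.dt.year,
        "month": d.dt.day.where(ambiguous, d.dt.month),
        "day": d.dt.month.where(ambiguous, d.dt.day),
    })).astype("datetime64[ns]")

    orig_days = to_days(d)
    swap_days = to_days(swapped)

    # Expected value of each row, linearly interpolated by row position
    # from the anchor rows only (ambiguous rows are masked to NaN first).
    expected = orig_days.where(~ambiguous).interpolate(limit_direction="both")

    # Use the swapped date only where it lies strictly closer to the expectation.
    use_swap = ambiguous & (
        (swap_days - expected).abs() < (orig_days - expected).abs()
    )
    return d.where(~use_swap, swapped)
```

With the frame from the question this would be applied as `df['Date'] = unmuddle_dates(df['Date'])`. Rows that are already correct end up closer to the interpolated value as stored, so they are left alone. The heuristic relies on row order carrying the time information: if the rows are unsorted or the gaps between them vary a lot, the interpolated estimate becomes unreliable and the wrong candidate may be chosen.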