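I have a pandas DataFrame with a 'start_date' column and an 'end_date' column. I want to count the business days between the start and end dates, break that count down by month, and add the per-month counts to the DataFrame as extra columns.
This is what I have coded so far. Is there any way to make it faster?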
import pandas as pd

data = pd.DataFrame([
    {'start_date': '2020-03-03', 'end_date': '2020-06-18'},
    {'start_date': '2020-06-03', 'end_date': '2020-09-18'},
])

def days_month(row):
    # Business days (Mon-Fri) from start_date to end_date inclusive, counted per month number
    days_month = pd.bdate_range(row['start_date'], row['end_date']).month.value_counts()
    # Fill months with no business days with 0 and put the result in Jan..Dec order
    return days_month.reindex(range(1, 13), fill_value=0)

data[['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']] = data.apply(days_month, axis=1, result_type="expand")
print(data)
     end_date  start_date  Jan  Feb  Mar  Apr  May  Jun  Jul  Aug  Sep  Oct  \
0  2020-06-18  2020-03-03    0    0   21   22   21   14    0    0    0    0
1  2020-09-18  2020-06-03    0    0    0    0    0   20   23   21   14    0

   Nov  Dec
0    0    0
1    0    0
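One possible way to speed this up, sketched under the assumption that a plain Monday to Friday week with no holidays is wanted (which is what pd.bdate_range does by default): instead of building every date per row inside apply, clip all rows' ranges to each calendar month at once and let np.busday_count count the business days as a vectorized operation. The names business_days_per_month and MONTHS are hypothetical, chosen for illustration only.

```python
import numpy as np
import pandas as pd

# Hypothetical helper names; the column labels match the question's Jan..Dec
MONTHS = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
          'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']


def business_days_per_month(df, start_col='start_date', end_col='end_date'):
    """Count Mon-Fri days in [start, end] per month number, for all rows at once.

    Days in the same month of different years land in the same column,
    as with .month.value_counts() in the apply version.
    """
    start_ts = pd.to_datetime(df[start_col])
    end_ts = pd.to_datetime(df[end_col])
    start = start_ts.values.astype('datetime64[D]')
    # np.busday_count excludes the end date, so shift it by one day to include it
    stop = end_ts.values.astype('datetime64[D]') + np.timedelta64(1, 'D')

    out = pd.DataFrame(0, index=df.index, columns=MONTHS)
    for year in range(start_ts.dt.year.min(), end_ts.dt.year.max() + 1):
        for m in range(1, 13):
            month = np.datetime64(f'{year}-{m:02d}', 'M')
            month_start = month.astype('datetime64[D]')
            next_month_start = (month + np.timedelta64(1, 'M')).astype('datetime64[D]')
            # Clip every row's range to this month; rows that miss it get lo == hi
            lo = np.maximum(start, month_start)
            hi = np.maximum(np.minimum(stop, next_month_start), lo)
            out[MONTHS[m - 1]] = out[MONTHS[m - 1]] + np.busday_count(lo, hi)
    return out


data = pd.DataFrame([
    {'start_date': '2020-03-03', 'end_date': '2020-06-18'},
    {'start_date': '2020-06-03', 'end_date': '2020-09-18'},
])
data = data.join(business_days_per_month(data))
print(data)
```

The Python loop runs 12 times per year spanned, not once per row, so the cost grows mainly with the number of rows inside NumPy. np.busday_count also accepts weekmask, holidays and busdaycal arguments, so a holiday calendar could be passed through the same way.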
Update: my attempt at implementing Ethan's suggestion:
import numpy as np

def countWeekDays(df):
    # df is a single row here (apply with axis=1)
    fromDate = df['PO Creation Date']
    toDate = df['PO Expected Delivery Date']
    # Intended: every calendar day from fromDate up to (but excluding) toDate
    d = np.arange(fromDate, toDate, dtype=np.datetime64)
    weekdays = d[np.is_busday(d, busdaycal=calendar())]
    # Number of business days falling in each month 1..12
    workDays = {m: np.array([i for i in weekdays if i.item().month == m]).size for m in range(1, 13)}
    return workDays

def calendar():
    # Set the work-week mask and an optional holidays array
    return np.busdaycalendar(weekmask='1111100',
                             holidays=['2020-01-01', '2020-01-20', '2020-02-17', '2020-05-25',
                                       '2020-07-03', '2020-09-07', '2020-10-12', '2020-11-11',
                                       '2020-11-26', '2020-12-25'])

data[['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']] = data.apply(countWeekDays, axis=1, result_type="expand")
However, this gives me a MemoryError, and I'm not sure why...
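A likely cause, assuming that in the real data 'PO Creation Date' and 'PO Expected Delivery Date' are datetime columns rather than the strings used in the sample: inside apply each value then arrives as a pandas Timestamp, and np.arange(..., dtype=np.datetime64) keeps a sub-day datetime64 unit instead of days. The array then gets one element per tick of that unit, so even a three-month range means trillions of elements. Note also that np.arange excludes toDate, whereas pd.bdate_range in the first version includes the end date. The sketch below truncates both dates to day precision first and counts each month's business days with np.busday_count instead of materializing every date. CAL, to_day and count_weekdays are hypothetical names; the weekmask and holidays are copied from calendar() above, and both endpoints are counted.

```python
import numpy as np
import pandas as pd

# Same work week and holidays as calendar() in the question (hypothetical constant name)
CAL = np.busdaycalendar(
    weekmask='1111100',
    holidays=['2020-01-01', '2020-01-20', '2020-02-17', '2020-05-25',
              '2020-07-03', '2020-09-07', '2020-10-12', '2020-11-11',
              '2020-11-26', '2020-12-25'])


def to_day(value):
    """Truncate a string or Timestamp to a day-precision numpy datetime64."""
    return pd.Timestamp(value).to_datetime64().astype('datetime64[D]')


def count_weekdays(row, start_col='PO Creation Date',
                   end_col='PO Expected Delivery Date', cal=CAL):
    """Business days per month number (1..12) in [start, end], both ends included."""
    start = to_day(row[start_col])
    stop = to_day(row[end_col]) + np.timedelta64(1, 'D')  # exclusive bound
    one_month = np.timedelta64(1, 'M')
    counts = dict.fromkeys(range(1, 13), 0)
    month = start.astype('datetime64[M]')
    while month.astype('datetime64[D]') < stop:
        # Overlap of [start, stop) with the current calendar month
        lo = max(start, month.astype('datetime64[D]'))
        hi = min(stop, (month + one_month).astype('datetime64[D]'))
        if hi > lo:
            counts[month.item().month] += int(np.busday_count(lo, hi, busdaycal=cal))
        month += one_month
    return counts


po = pd.DataFrame({
    'PO Creation Date': pd.to_datetime(['2020-03-03', '2020-06-03']),
    'PO Expected Delivery Date': pd.to_datetime(['2020-06-18', '2020-09-18']),
})
monthly = po.apply(count_weekdays, axis=1, result_type='expand')
print(monthly)
```

Each row now loops over the months it spans rather than over every day, so the work per row stays small regardless of the datetime unit of the inputs.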