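I covered training configuration and detection in the previous post; this one covers some problems you may run into and how to solve them.
Much of this has already been done by others; I'm just summarizing it here.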
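1. Saving and visualizing the training log.
1.1. Saving the training log.
The training log is that big pile of output printed during training. Plotting its key values is an effective way to summarize an experiment, so that is what we'll do.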
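Saving the log needs no changes to the YOLO code: just append a redirect and tee to the training command. The file name at the end is where the log is written, and it captures everything printed to the terminal.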
./darknet detector train cfg/voc.data cfg/yolov3-voc.cfg darknet53.conv.74 2>&1 | tee train_yolov3.log
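The training log is saved to train_yolov3.log. The file name and location are up to you; I suggest creating a log folder for it, and the parsing and visualization scripts we write later go in that folder too.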
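If you would rather launch training from a script than from the shell, the same "print and save" behavior can be sketched in Python. This is only a minimal sketch: `run_and_tee` is a hypothetical helper name, not part of darknet, and it assumes the command writes plain text output.

```python
import subprocess
import sys


def run_and_tee(cmd, log_path):
    """Run cmd, echo its output to the terminal and save it to log_path.

    stderr is merged into stdout, mirroring `2>&1 | tee log_path` in the shell.
    Returns the exit code of the command.
    """
    with open(log_path, "w", encoding="utf-8") as log, subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # same effect as 2>&1
        text=True,
    ) as proc:
        for line in proc.stdout:
            sys.stdout.write(line)  # still visible in the terminal
            log.write(line)         # and saved to the log file
    return proc.returncode
```

With the command from this post, the call would look like `run_and_tee(["./darknet", "detector", "train", "cfg/voc.data", "cfg/yolov3-voc.cfg", "darknet53.conv.74"], "train_yolov3.log")`; adjust the paths to your own setup.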
1.2. Parsing the training log.
First, here is what one complete batch prints:
Loaded: 0.000026 seconds
Region 82 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.461636, .5R: -nan, .75R: -nan, count: 0
Region 94 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.495116, .5R: -nan, .75R: -nan, count: 0
Region 106 Avg IOU: 0.263516, Class: 0.436704, Obj: 0.495635, No Obj: 0.416553, .5R: 0.000000, .75R: 0.000000, count: 2
Region 82 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.462496, .5R: -nan, .75R: -nan, count: 0
Region 94 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.494657, .5R: -nan, .75R: -nan, count: 0
Region 106 Avg IOU: 0.268727, Class: 0.621400, Obj: 0.168319, No Obj: 0.416158, .5R: 0.000000, .75R: 0.000000, count: 2
Region 82 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.462955, .5R: -nan, .75R: -nan, count: 0
Region 94 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.493331, .5R: -nan, .75R: -nan, count: 0
Region 106 Avg IOU: 0.207399, Class: 0.466373, Obj: 0.332663, No Obj: 0.417906, .5R: 0.000000, .75R: 0.000000, count: 2
Region 82 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.462174, .5R: -nan, .75R: -nan, count: 0
Region 94 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.492877, .5R: -nan, .75R: -nan, count: 0
Region 106 Avg IOU: 0.194398, Class: 0.463323, Obj: 0.273619, No Obj: 0.424027, .5R: 0.000000, .75R: 0.000000, count: 2
Region 82 Avg IOU: 0.311682, Class: 0.484914, Obj: 0.280431, No Obj: 0.460396, .5R: 0.000000, .75R: 0.000000, count: 1
Region 94 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.494058, .5R: -nan, .75R: -nan, count: 0
Region 106 Avg IOU: 0.243448, Class: 0.593246, Obj: 0.383726, No Obj: 0.421082, .5R: 0.000000, .75R: 0.000000, count: 1
Region 82 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.462786, .5R: -nan, .75R: -nan, count: 0
Region 94 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.495880, .5R: -nan, .75R: -nan, count: 0
Region 106 Avg IOU: 0.176802, Class: 0.792245, Obj: 0.179281, No Obj: 0.416623, .5R: 0.000000, .75R: 0.000000, count: 2
Region 82 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.461665, .5R: -nan, .75R: -nan, count: 0
Region 94 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.496222, .5R: -nan, .75R: -nan, count: 0
Region 106 Avg IOU: 0.243663, Class: 0.242757, Obj: 0.534085, No Obj: 0.417660, .5R: 0.000000, .75R: 0.000000, count: 2
Region 82 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.460757, .5R: -nan, .75R: -nan, count: 0
Region 94 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.492573, .5R: -nan, .75R: -nan, count: 0
Region 106 Avg IOU: 0.089885, Class: 0.332247, Obj: 0.467275, No Obj: 0.418509, .5R: 0.000000, .75R: 0.000000, count: 2
Region 82 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.458823, .5R: -nan, .75R: -nan, count: 0
Region 94 Avg IOU: 0.262965, Class: 0.556296, Obj: 0.354470, No Obj: 0.494544, .5R: 0.000000, .75R: 0.000000, count: 2
Region 106 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.420531, .5R: -nan, .75R: -nan, count: 0
Region 82 Avg IOU: 0.125970, Class: 0.210711, Obj: 0.619777, No Obj: 0.458084, .5R: 0.000000, .75R: 0.000000, count: 1
Region 94 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.494639, .5R: -nan, .75R: -nan, count: 0
Region 106 Avg IOU: 0.183248, Class: 0.069611, Obj: 0.418807, No Obj: 0.415397, .5R: 0.000000, .75R: 0.000000, count: 1
Region 82 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.460093, .5R: -nan, .75R: -nan, count: 0
Region 94 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.495465, .5R: -nan, .75R: -nan, count: 0
Region 106 Avg IOU: 0.415323, Class: 0.665443, Obj: 0.395756, No Obj: 0.418683, .5R: 0.500000, .75R: 0.000000, count: 2
Region 82 Avg IOU: 0.281197, Class: 0.619936, Obj: 0.135923, No Obj: 0.462208, .5R: 0.000000, .75R: 0.000000, count: 1
Region 94 Avg IOU: 0.142975, Class: 0.232742, Obj: 0.140729, No Obj: 0.493011, .5R: 0.000000, .75R: 0.000000, count: 1
Region 106 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.419366, .5R: -nan, .75R: -nan, count: 0
Region 82 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.460545, .5R: -nan, .75R: -nan, count: 0
Region 94 Avg IOU: 0.056978, Class: 0.266799, Obj: 0.624412, No Obj: 0.494370, .5R: 0.000000, .75R: 0.000000, count: 1
Region 106 Avg IOU: 0.503086, Class: 0.510277, Obj: 0.578751, No Obj: 0.417475, .5R: 1.000000, .75R: 0.000000, count: 1
Region 82 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.462025, .5R: -nan, .75R: -nan, count: 0
Region 94 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.494696, .5R: -nan, .75R: -nan, count: 0
Region 106 Avg IOU: 0.081392, Class: 0.318970, Obj: 0.406135, No Obj: 0.415770, .5R: 0.000000, .75R: 0.000000, count: 2
Region 82 Avg IOU: 0.474609, Class: 0.742155, Obj: 0.686746, No Obj: 0.462689, .5R: 0.000000, .75R: 0.000000, count: 1
Region 94 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.494386, .5R: -nan, .75R: -nan, count: 0
Region 106 Avg IOU: 0.045405, Class: 0.582467, Obj: 0.359547, No Obj: 0.417340, .5R: 0.000000, .75R: 0.000000, count: 1
Region 82 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.459882, .5R: -nan, .75R: -nan, count: 0
Region 94 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.495080, .5R: -nan, .75R: -nan, count: 0
Region 106 Avg IOU: 0.190005, Class: 0.338868, Obj: 0.491059, No Obj: 0.419754, .5R: 0.000000, .75R: 0.000000, count: 2
1: 1066.143311, 1066.143311 avg, 0.000000 rate, 4.240781 seconds, 32 images
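The pattern is easy to see, so we can use keywords to pull different kinds of lines into different files and process them separately. Lines containing the keyword IOU carry the IOU information, and among them we need to drop any line containing nan. Lines containing images carry the loss information; saving them as they are is enough.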
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Created on Thu Nov 29 15:50:53 2018
@author: zhxing
"""
# This script extracts lines from the YOLOv3 training log.


def extract_log(log_file, new_log_file, key_word):
    """Copy every line of log_file that contains key_word to new_log_file."""
    with open(log_file, 'r') as f, open(new_log_file, 'w') as train_log:
        for line in f:
            if 'Syncing' in line:   # multi-GPU sync messages; I only have one GPU, so this check is optional
                continue
            if 'nan' in line:       # drop lines containing nan
                continue
            if key_word in line:    # keep lines containing the keyword
                train_log.write(line)


extract_log('train_yolov3.log', 'DJI_yolov3_train_loss.txt', 'images')
extract_log('train_yolov3.log', 'DJI_yolov3_train_iou.txt', 'IOU')
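This gives us two txt files, from which we can extract the data we need for plotting. The plotting code is also largely based on someone else's post, "YOLOV3可视化" (YOLOv3 visualization); I added some comments to make it easier to read and modify.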
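Before plotting, it can help to check how a single loss line breaks down into fields. The following is a minimal sketch, assuming the summary line always has the shape shown in the sample log (`1: 1066.143311, 1066.143311 avg, 0.000000 rate, 4.240781 seconds, 32 images`); `parse_loss_line` and the field names are names I made up for illustration.

```python
import re

# A plain decimal number, optionally signed and with an exponent.
NUM = r"[-+]?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?"

# Matches the per-batch summary line from the sample log above.
LOSS_LINE = re.compile(
    rf"^\s*(?P<batch>\d+):\s*(?P<loss>{NUM}),\s*(?P<avg>{NUM}) avg,"
    rf"\s*(?P<rate>{NUM}) rate,\s*(?P<seconds>{NUM}) seconds,"
    rf"\s*(?P<images>\d+) images"
)


def parse_loss_line(line):
    """Return the fields of a summary line as a dict, or None if it does not match."""
    m = LOSS_LINE.match(line)
    if m is None:
        return None
    d = m.groupdict()
    return {
        "batch": int(d["batch"]),
        "loss": float(d["loss"]),
        "avg": float(d["avg"]),
        "rate": float(d["rate"]),
        "seconds": float(d["seconds"]),
        "images": int(d["images"]),
    }
```

Lines that do not match, such as the Region lines, simply return None, so the function can be mapped over the whole log and the None results filtered out.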
1.3. Visualizing the loss.
Let's plot the average loss.
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Created on Thu Nov 29 16:17:24 2018
@author: zhxing
"""
import pandas as pd
import matplotlib.pyplot as plt

lines = 16000  # number of rows to thin out; adjust to the row count of the loss file

# Skip the first 1000 rows, then keep only every 10th row.
skip = [x for x in range(lines) if (x % 10 != 9) or (x < 1000)]

# on_bad_lines='skip' needs pandas >= 1.3; older versions used error_bad_lines=False.
columns = ['loss', 'avg', 'rate', 'seconds', 'images']
result = pd.read_csv('DJI_yolov3_train_loss.txt', skiprows=skip,
                     on_bad_lines='skip', names=columns)

# Each field looks like "1: 1066.143311" or " 1066.143311 avg":
# split on spaces, keep the number, and convert it from str to a numeric type.
for col in columns:
    result[col] = pd.to_numeric(result[col].str.split(' ').str.get(1))

# print(result.head())
# print(result.tail())
# print(result.dtypes)

fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
ax.plot(result['avg'].values, label='avg_loss')
# ax.plot(result['loss'].values, label='loss')
ax.legend(loc='best')
ax.set_title('The loss curves')
ax.set_xlabel('batches*10')
fig.savefig('avg_loss', dpi=600)
# fig.savefig('loss')
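If the data is fine, the plot comes out quickly. The data-reading function, pd.read_csv, is fairly fiddly; the official documentation covers it if you're interested, though I didn't read through it all.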
Figure: avg_loss.png
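1.4. Visualizing the IOU.
This works much like the loss plot, but the IOU lines carry no batch information, and because the lines containing nan were dropped, you cannot see exactly how IOU changes from batch to batch. You can still see the rough trend of IOU as the number of batches grows.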
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Created on Thu Nov 29 16:23:11 2018
@author: zhxing
"""
import pandas as pd
import matplotlib.pyplot as plt

lines = 16000  # adjust to the row count of the IOU file

# A YOLOv3 Region line has seven comma-separated fields (.5R and .75R rather than
# a single Avg Recall), so seven names are needed. With only six names, pandas
# treats the first field as the row index and every column shifts by one.
columns = ['Region Avg IOU', 'Class', 'Obj', 'No Obj', '.5R', '.75R', 'count']

# Drop two of every ten rows to thin the data.
skip = [x for x in range(lines) if (x % 10 == 0 or x % 10 == 9)]

# on_bad_lines='skip' needs pandas >= 1.3; older versions used error_bad_lines=False.
result = pd.read_csv('DJI_yolov3_train_iou.txt', skiprows=skip,
                     on_bad_lines='skip', names=columns)

# Each field looks like "Region 82 Avg IOU: 0.263516": keep the part after ": ".
for col in columns:
    result[col] = pd.to_numeric(result[col].str.split(': ').str.get(1))

# print(result.head())
# print(result.tail())
# print(result.dtypes)

fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
ax.plot(result['Region Avg IOU'].values, label='Region Avg IOU')
# ax.plot(result['Class'].values, label='Class')
# ax.plot(result['Obj'].values, label='Obj')
# ax.plot(result['No Obj'].values, label='No Obj')
# ax.plot(result['.5R'].values, label='.5R')
# ax.plot(result['count'].values, label='count')
ax.legend(loc='best')
ax.set_title('The Region Avg IOU curves')
ax.set_xlabel('batches')
fig.savefig('Region Avg IOU')
Figure:
Region Avg IOU.png
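The batch information missing from the filtered IOU file can be recovered from the raw log instead. Here is a minimal sketch, assuming (as in the sample log in 1.2) that the Region lines of a batch are printed before that batch's summary line; `mean_iou_per_batch` is a hypothetical helper, and the value it reports is a plain unweighted mean over the valid Region lines, not weighted by count.

```python
import re

# The value that follows "Avg IOU:" on a Region line (may be "-nan").
IOU_FIELD = re.compile(r"Avg IOU:\s*([-\w.]+)")
# The per-batch summary line, e.g. "1: 1066.143311, ... 32 images".
BATCH_LINE = re.compile(r"^\s*(\d+):\s.*\bimages\s*$")


def mean_iou_per_batch(lines):
    """Return a list of (batch, mean_iou) pairs from raw darknet log lines."""
    results = []
    pending = []  # valid IOU values seen since the previous summary line
    for line in lines:
        m = IOU_FIELD.search(line)
        if m:
            value = m.group(1)
            if "nan" not in value:  # ignore Region lines with no objects
                pending.append(float(value))
            continue
        b = BATCH_LINE.match(line)
        if b:
            if pending:  # batches with only nan lines are left out
                results.append((int(b.group(1)), sum(pending) / len(pending)))
            pending = []
    return results
```

Passing an open file works, for example `with open('train_yolov3.log') as f: pairs = mean_iou_per_batch(f)`, and the resulting pairs can be plotted with the batch number on the x axis.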
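This gives a rough picture. I started training at 2:30 pm yesterday and expected it to take 5 or 6 hours, but by bedtime the loss was still hovering around 0.04, so I left the machine on and let it run overnight. I've had a bad cold these past few days and have stayed in my dorm; by the time I got up it was already past 9:30, and the training log showed the loss had settled at about 0.02 and stopped dropping, so I stopped the training.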
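Detection works fairly well, with noticeably higher accuracy than the previous run trained on 150 images. Accuracy against sky and forest backgrounds is already high, but a white drone in front of a white building is genuinely hard to detect. I'll see later whether there is a better approach.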
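The IOU plot does show a trend: IOU eventually settles at roughly 0.8 to 1. Judging from the video, though, the bounding boxes are only moderately accurate. I don't know whether there is a good way to improve this further; that is a question for later. Right now I just hope this cold clears up soon.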
Author: 和蔼的zhxing
Link: https://www.jianshu.com/p/7b7420890639