Updated: Sep 13, 2019
YOLO is a state-of-the-art object detection system. It is used to detect objects in an image and also draw a bounding box around the object. In other object detection systems like Fast RCNN & Faster RCNN, separate networks are used to detect the objects and predict the bounding boxes whereas in YOLO, a single convolutional network predicts the bounding boxes and the class probabilities for these boxes, hence the name You Only Look Once.The original yolo research paper is available here https://pjreddie.com/media/files/papers/yolo.pdf
YOLO is originally implemented in darknet which is an open source neural network framework written in C and CUDA. Later, it is implemented in other libraries like keras, pytorch, tensorflow. In this article, we will see how to train yolo darknet on a custom dataset. The following process is implemented in google colab which is useful as it provides free GPU and also to avoid the hassle of several installations and setups.
Anchor boxes are used to help the model in drawing the bounding boxes around each object. YOLO divides an entire image into grids. Sometimes a single image may contain more than one object to detect. By default, 5 anchor boxes are used which means the model detects a maximum of 5 objects in each grid.
The dataset can be downloaded from here:
Original Annotations: https://github.com/Shenggan/BCCD_Dataset/tree/master/BCCD/Annotations
YOLO Format Annotations: https://github.com/DatascienceAuthority/Bloodcells-Detection-YOLOV3/tree/master/annotations-yolo
This is a dataset for blood cells detection(RBC, WBC, Platelets). It contains 2 folders - Images folder and annotations folder which consists of bounding box values for objects in each image in xml format. Our goal here is to train the model to detect WBC in an image.
Consider the first image ‘BloodImage_00000.jpg’, its corresponding annotation file is in annotations folder ‘BloodImage_00000.xml’. The xml file has the locations of all RBC, WBC, Platelets present in that image in the following format,
Annotations of Above Image :
Each object’s label is in the <name> tag and its corresponding location is represented by the tags <xmin>, <ymin>, <xmax>, <ymax>. This is called Pascal VOC. But to train images with Darknet and YOLOV3, the annotations should be in YOLO format. There is a project on github which can be used to convert annotations from Pascal VOC to YOLO format - https://github.com/ssaru/convert2Yolo .
As we want our model to detect only WBCs; RBCs and Platelets should be omitted from the annotations. So, some custom changes should be made to that project to achieve this task. Instead, you can download the YOLO format annoatations directly from here
During conversion of annotations, another .txt file is generated which has locations of all the images. I have excluded few images (408, 409, 410) so that they can be used to test the trained model. So, these are train.txt and test.txt files.
Move all the train images and annotations to same folder in your Google Drive and correct the locations in train.txt using a text editor.
Create *.names file which consists of class name to detect. In our case, only one class (WBC)
Create *.data file which consist of locations of other files or folders
train, valid, names are the files we created above and backup is the location where the trained model will be saved. The model is saved for every 100 steps (Processing of 1 batch of images i.e batch size is 1 step. Note that the ‘My Drive’ is written as ‘My\ Drive’ in the above file but it is written as ‘My Drive’ in annotation files.
Configuration file (.cfg)
The original yolo is trained on ImageNet data which has many number of classes. Here, we will be training our model to detect only WBCs, so we need to create a custom configuration file.
We should change the batch and subdivisions based on the memory available. Change the classes in all the [yolo] layers to 1 as we are training the model for 1 class (WBC). Above all the [yolo] layers, change the number of filters in the [convolution layer] to 3*(classes+5). The number 3 is the number of masks in the [yolo] layer, classes is the number of classes to detect and the number 5 is due to the parameters in prediction output (center_x, center_y, width, height, confidence). To be specific, create a copy of the configuration file, rename it to yolo_custom.cfg and make the following changes.
Line 6: batch=64
Line 7: subdivisions=64
Line 603, 689, 776: filters=18
Line 610, 696, 783: classes=1
Optionally, you can change the width and height (line 8 and 9) to increase resolution.
The new configuration file after making the above changes
Create a new python notebook in Google Colab and clone the darknet repository from github and setup darknet. The original darknet repository is at https://github.com/pjreddie/darknet but there is another darknet implementation which is suitable for Google Colab https://github.com/kriyeng/darknet/. The Google Drive folder structure includes a space “My Drive” which will raise an error if we use original implementation and it is handled by the git branch ‘feature/google-colab’ in this version of darknet.
Download Pre-Trained Darknet Weights
Upload the files created above to your Google Drive and save their locations in variables
Authenticate so that Colab can access the files uploaded to drive
Train the model
Training will take few hours, it took about 6 hours for me and I stopped training after 1000 steps and the loss is about 0.3. The training weights will be saved for every 100 steps at the location specified (backup).
Start training from last saved weights
Predicting with trained model
Let us see how the model detects on an image that is not used during training
On the left is the original image and on the right is the predicted image. The model is predicting WBC with 97% confidence. The difference in the original bounding box and predicted bounding box can be seen below
Original Data: https://github.com/Shenggan/BCCD_Dataset