How to train a two-class DetectNet Neural Network on DIGITS
DetectNet is an object detection neural network architecture based on Caffe (NVCaffe, maintained by NVIDIA). DetectNet can be trained easily with the NVIDIA Deep Learning GPU Training System (DIGITS), and it integrates natively with NVIDIA’s DeepStream SDK, which delivers a complete streaming analytics toolkit for AI-based video and images.
First, create your own dataset of images containing objects of both classes. When you choose the image resolution, remember to use a size divisible by 16, which is the stride defined in the network (e.g. 1248 x 384, 1280 x 720); otherwise you have to change the stride in the network definition.
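As a quick sanity check before building the dataset, the divisibility rule can be sketched in Python. The function names below are hypothetical helpers for illustration, not part of DIGITS or DetectNet.

```python
def is_stride_aligned(width, height, stride=16):
    """Return True when both dimensions are exact multiples of the stride."""
    return width % stride == 0 and height % stride == 0


def round_down_to_stride(value, stride=16):
    """Largest multiple of `stride` that does not exceed `value`."""
    return (value // stride) * stride


if __name__ == "__main__":
    for width, height in [(1248, 384), (1280, 720), (1920, 1080)]:
        print(width, height, is_stride_aligned(width, height))
```

If a size is not aligned, cropping or resizing each dimension to `round_down_to_stride(value)` is one option that avoids editing the stride in the network definition.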
You can find the two-class DetectNet network definition here.
A useful tool for labeling your images to create a dataset is labelImg (https://github.com/tzutalin/labelImg): you can tag your images and export the annotations in Pascal VOC format.
NOTE: DIGITS expects annotations in KITTI format, so you can use a Python conversion script to do the job. You can download one from my GitHub or write your own to fit your needs.
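For readers who want to write their own converter, here is a minimal sketch of the idea. It is not the script mentioned above. It assumes the usual Pascal VOC layout written by labelImg (`object`, `name`, `bndbox` with `xmin`/`ymin`/`xmax`/`ymax`) and the 15-field KITTI label line, with every field that has no 2D equivalent set to zero. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def voc_to_kitti_lines(voc_xml):
    """Convert one Pascal VOC annotation (as an XML string) to KITTI label lines."""
    root = ET.fromstring(voc_xml)
    lines = []
    for obj in root.iter("object"):
        name = obj.findtext("name").strip()
        box = obj.find("bndbox")
        left, top, right, bottom = (
            float(box.findtext(tag)) for tag in ("xmin", "ymin", "xmax", "ymax")
        )
        # KITTI fields: type, truncated, occluded, alpha, bbox (4 values),
        # 3D dimensions (3), 3D location (3), rotation_y.
        # Only type and bbox carry information here; the rest are zeroed.
        lines.append(
            f"{name} 0.0 0 0.0 {left:.2f} {top:.2f} {right:.2f} {bottom:.2f} "
            "0.0 0.0 0.0 0.0 0.0 0.0 0.0"
        )
    return lines
```

Each image then gets one `.txt` label file containing these lines. Class names must not contain spaces, because KITTI fields are space-separated.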
Once your images are ready and the objects are tagged with annotations, it’s time to split them into two directories, one for training and the other for validation. A common split is 70% of the images for training and 30% for validation.
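The 70/30 split can be sketched as follows. This is an illustrative helper with a hypothetical name. It works on file name stems so that each image and its label file end up in the same subset, and it uses a fixed seed so the split is reproducible.

```python
import random


def split_dataset(names, train_fraction=0.7, seed=0):
    """Shuffle names reproducibly and return (train, validation) lists."""
    shuffled = sorted(names)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


if __name__ == "__main__":
    stems = [f"img_{i:03d}" for i in range(400)]
    train, val = split_dataset(stems)
    print(len(train), len(val))
```

Copying or moving the files into the training and validation directories is left out on purpose, since the directory layout depends on your setup.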
In DIGITS, import your dataset by creating a new Dataset of type Object Detection, then enter the image and annotation paths for both training and validation, along with the image size.

In Custom classes, specify the names of your tagged classes. The first is usually the dontcare class, followed by the names of your own classes. Each name’s position in the list corresponds to the class index used in the DetectNet network definition.
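To make the position-to-index relationship concrete, here is a small sketch. It assumes the class list is entered as comma-separated names. The `helmet` and `jacket` names are the two classes used later in this post, and the function name is hypothetical.

```python
def class_indices(custom_classes):
    """Map each class name to its position in a comma-separated class list."""
    names = [name.strip() for name in custom_classes.split(",")]
    return {name: index for index, name in enumerate(names)}


if __name__ == "__main__":
    print(class_indices("dontcare,helmet,jacket"))
```

With this ordering, `dontcare` has index 0 and the two real classes have indices 1 and 2, which are the values the network definition has to refer to.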
Create a new Model of type Object Detection and select the number of training epochs. This depends on how many images your dataset contains and how complex the objects you want to detect are; in this example I trained DetectNet for 600 epochs on 4 NVIDIA RTX 1080 GPUs with a dataset of 400 images.

In my tests training a two-class DetectNet, I got a better mAP score using the Adam solver with a learning rate of 0.0001 and Exponential Decay with gamma 0.95 as the learning rate policy.
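To get a feel for what an exponential decay policy does to the learning rate, here is a minimal sketch of the general formula `lr = base_lr * gamma ** step`. What one "step" corresponds to in DIGITS (an iteration, an epoch, or a fraction of the run) is not stated here, so the step unit is left as an assumption.

```python
def exponential_decay(base_lr, gamma, step):
    """Learning rate after `step` decay steps: base_lr * gamma ** step."""
    return base_lr * gamma ** step


if __name__ == "__main__":
    # Values from this post: base learning rate 0.0001, gamma 0.95.
    for step in (0, 10, 50, 100):
        print(step, exponential_decay(0.0001, 0.95, step))
```

The rate falls smoothly instead of in discrete drops, which is the main difference from a step-down policy.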
Select the Custom Network tab and paste the two-class DetectNet network definition, which you can find here.
Adjust the network definition to fit your dataset’s image size on lines 57, 58, 79, 80, 118, 119, 2504, 2519, and 2545.
Change all object_class entries so the index mapping matches the class indices you defined when importing the dataset into DIGITS.

Import the pretrained model for DetectNet, which can be downloaded here.
Select the GPUs you want to use for training and click Create to start the training job.
After 600 epochs of training, this is the mAP score I got for both classes:

Testing the trained DetectNet on sample images, objects of both classes (helmets and jackets in my case) are successfully detected:
