Guanghan Ning’s Blog


Preface

I am sharing thoughts with you in this blog.

If something echoes back, I feel most lucky;

if it is to some extent understood... more than satisfied;

if it drains down into the ground, I have nothing to regret.


Start Training YOLO with Our Own Data


YOLO, short for You Only Look Once, is a real-time object detection algorithm proposed in the paper You Only Look Once: Unified, Real-Time Object Detection, by Joseph Redmon, Santosh Divvala, Ross Girshick, and Ali Farhadi.

The open-source code, called darknet, is a neural network framework written in C and CUDA. The original GitHub repository is here.

As discussed in my previous post (in Chinese), NVIDIA’s Jetson TX1 is a boost to the application of deep learning on mobile devices and embedded systems. Many potentially inspiring products are on the way, one of which is real-time computer vision on mobile devices. Imagine real-time abnormal action recognition on surveillance cameras, real-time scene text recognition on smart glasses, or real-time object recognition on smart vehicles or robots. Not excited? How about real-time computer vision on egocentric videos, or on your AR and even VR devices? Imagine watching a video clip shot by Kespry (What is this?) and experiencing how Messi beat fewer than a dozen players and scored a goal. This could be used for educational purposes: you stand in a player’s shoes and study how he/she observes the circumstances in real time and handles the ball. (If you are considering a patent, please put my name at the end of the inventors list.)

That being said, I assume you have at least some interest in this post. The author has already illustrated how to quickly run the code; this article is about how to start training YOLO right away with our own data and object classes, in order to apply object recognition to specific real-world problems.

Here are two demos of YOLO trained with customized classes:

Yield Sign:

Stop Sign:

The cfg that I used is here: darknet/cfg/yolo_2class_box11.cfg

The weights that I trained can be downloaded here: yolo_2class_box11_3000.weights

The precompiled software, together with the source code package, for the demo: darknet-video-2class.zip

You can use this as an example; the package above is ready to run the demo.

In order to run the demo on a video file, just type:

./darknet yolo demo_vid cfg/yolo_2class_box11.cfg model/yolo_2class_box11_3000.weights /video/test.mp4

If you would like to repeat the training process or get a feel for YOLO, you can download the data I collected and the annotations I labeled:

images: images.tar.gz

labels: labels.tar.gz


 

I have forked the original GitHub repository and modified the code so that it is easier to start with. Well, it was already easy to start with, but I have added a few extras that might be helpful, so you do not have to do the same things again (unless you want to do them better):

(1). Read a video file, process it, and output a video with bounding boxes.

(2). Some utility functions, such as image_to_Ipl, which converts a darknet image back to OpenCV’s (C API) IplImage format.

(3). Some Python scripts to label our own data and to convert the annotations to the format required by darknet.
(…More may be added)

This forked repository also illustrates how to train a customized neural network with our own data and our own classes.

1. Collect Data and Annotation

(1). For videos, we can use video summarization, shot boundary detection, or camera take detection to create static images.

(2). For images, we can use BBox-Label-Tool to label objects.

2. Create Annotation in Darknet Format

(1). If we choose to use VOC data to train, use scripts/voc_label.py to convert existing VOC annotations to darknet format.

(2). If we choose to use our own collected data, use scripts/convert.py to convert the annotations.
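For reference, here is a minimal Python sketch of a VOC-style conversion. It is not scripts/voc_label.py itself: it assumes a Pascal VOC XML annotation (the format VOC uses and that labelImg saves by default) and a class list of our own, and it writes one darknet line per object: the class index, then the box center x, center y, width, and height, each divided by the image width or height. The function name is mine.

```python
import xml.etree.ElementTree as ET


def voc_xml_to_darknet(xml_text, class_names):
    """Convert one Pascal VOC-style XML annotation into darknet label lines."""
    root = ET.fromstring(xml_text)
    size = root.find("size")
    img_w = float(size.find("width").text)
    img_h = float(size.find("height").text)
    lines = []
    for obj in root.iter("object"):
        name = obj.find("name").text
        if name not in class_names:
            continue  # skip classes we are not training on
        cls_id = class_names.index(name)
        bb = obj.find("bndbox")
        xmin, xmax = float(bb.find("xmin").text), float(bb.find("xmax").text)
        ymin, ymax = float(bb.find("ymin").text), float(bb.find("ymax").text)
        # darknet wants the box center and size, normalized by the image size
        x = (xmin + xmax) / 2.0 / img_w
        y = (ymin + ymax) / 2.0 / img_h
        w = (xmax - xmin) / img_w
        h = (ymax - ymin) / img_h
        lines.append("%d %.6f %.6f %.6f %.6f" % (cls_id, x, y, w, h))
    return lines
```

Each returned line goes into the .txt label file that belongs to that image.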

At this step, we should have darknet annotations (.txt) and a training list (.txt).
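The training list is simply one image path per line. A minimal sketch for generating it, assuming all images sit somewhere under one folder; the function name and the default extension list are my assumptions, so adjust them to your data. It deliberately writes Unix line endings, since Windows CR LF endings have caused “cannot load image” errors for several readers (see the FAQ and comments).

```python
import os


def write_training_list(image_dir, list_path, exts=(".jpg", ".jpeg")):
    """Write the absolute path of every image under image_dir, one per line."""
    paths = []
    for root, _dirs, files in os.walk(image_dir):
        for name in files:
            if name.lower().endswith(exts):  # case-insensitive extension match
                paths.append(os.path.abspath(os.path.join(root, name)))
    paths.sort()
    # newline="\n" writes plain "\n" endings even on Windows (no CR LF)
    with open(list_path, "w", encoding="utf-8", newline="\n") as f:
        f.writelines(p + "\n" for p in paths)
    return paths
```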

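Upon labeling, the format of the annotations generated by BBox-Label-Tool is as follows (the first line is the number of boxes; each box is given by its corner positions in pixels):

number_of_boxes

box1_x1 box1_y1 box1_x2 box1_y2

box2_x1 box2_y1 box2_x2 box2_y2

….

After conversion, the format of the annotations converted by scripts/convert.py is as follows (the box center and size, each divided by the image width or height):

class_number box1_x_center_ratio box1_y_center_ratio box1_width_ratio box1_height_ratio

class_number box2_x_center_ratio box2_y_center_ratio box2_width_ratio box2_height_ratio

….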
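To make the conversion concrete, here is a minimal Python sketch of the same arithmetic. It is not scripts/convert.py itself: it assumes a BBox-Label-Tool file whose first line is the box count, followed by one `x1 y1 x2 y2` pixel-corner row per box, and it takes the class index and the image size as arguments (in practice you would read the size from the image file). The function name is mine.

```python
def bbox_label_tool_to_darknet(text, cls_id, img_w, img_h):
    """Convert one BBox-Label-Tool annotation (pixel corners) into darknet lines."""
    rows = text.strip().splitlines()
    lines = []
    for row in rows[1:]:  # the first line only holds the number of boxes
        parts = row.split()
        if len(parts) != 4:
            continue  # skip blank or malformed rows
        x1, y1, x2, y2 = (float(v) for v in parts)
        # center and size of the box, normalized to [0, 1]
        x = (x1 + x2) / 2.0 / img_w
        y = (y1 + y2) / 2.0 / img_h
        w = abs(x2 - x1) / img_w
        h = abs(y2 - y1) / img_h
        lines.append("%d %.6f %.6f %.6f %.6f" % (cls_id, x, y, w, h))
    return lines
```

For a 160 x 120 image with one box from (40, 30) to (120, 90), the result is a single line `1 0.500000 0.500000 0.500000 0.500000` when the class index is 1.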

Note that each image corresponds to one annotation file, but we only need a single training list of images. Remember to put the folder “images” and the folder “labels” (which holds the annotation files) in the same parent directory, because by default the darknet code finds each annotation file by replacing “images” with “labels” in the image path.
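Why must the two folders be siblings? As I read darknet’s data-loading code, it never reads a separate annotation list: it derives each label path from the image path with its find_replace helper, which replaces only the first match, turning “images” into “labels”, “JPEGImages” into “labels”, and the “.jpg”/“.JPEG” extension into “.txt”. Below is a Python sketch of that mapping; the function name is mine, and the exact list of substitutions may differ in your darknet version, so check data.c.

```python
def label_path_for(image_path):
    """Sketch of how darknet (as I understand it) turns an image path into a label path.

    Each replacement touches only the first match, like darknet's find_replace().
    """
    p = image_path.replace("images", "labels", 1)
    p = p.replace("JPEGImages", "labels", 1)
    for ext in (".jpg", ".JPEG"):
        p = p.replace(ext, ".txt", 1)
    return p
```

One consequence: because only the first match is replaced, a parent directory whose name happens to contain “images” (say /data/myimages/images/…) sends darknet looking in the wrong place.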

You can download some examples to understand the format:

before_conversion.txt

after_conversion.txt

training_list.txt

3. Modify Some Code

(1) In src/yolo.c, change the number of classes and the class names (and also the paths to the training data and the annotations, i.e., the list we obtained in step 2).

If we want to train new classes, we also need to modify and run data/labels/make_labels so that the correct PNG label images are displayed.

(2) In src/yolo_kernels.cu, change class numbers.

(3) Now we are able to train with new classes, but there is one more thing to deal with. In YOLO, the size of the second-to-last layer is not arbitrary; it is determined by other parameters, including the number of classes and the side (the number of grid cells along each side of the image). Please read the paper for details.

For example, assuming no other parameters are modified: output = (5 x 2 + number_of_classes) x 7 x 7, since each of the 7 x 7 cells predicts 2 boxes with 5 values each (x, y, w, h, confidence) plus one probability per class.

Therefore, in cfg/yolo.cfg, change the “output” in line 218, and “classes” in line 222.
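The “output” value is easy to get wrong, so here is a tiny Python helper for the formula above. It is a sketch assuming the YOLO v1 layout, in which each grid cell predicts `num_boxes` boxes (x, y, w, h, confidence) plus one probability per class; the function name is mine.

```python
def yolo_output_size(num_classes, side=7, num_boxes=2):
    """Number of outputs of YOLO v1's second-to-last (connected) layer."""
    per_cell = 5 * num_boxes + num_classes  # 5 numbers per box + class probabilities
    return per_cell * side * side
```

It gives 588 for 2 classes on the default 7 x 7 grid, and 1452 for the 2 classes and 11 x 11 grid used in yolo_2class_box11.cfg.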

(4) Now we are good to go. If we need to change the number of layers and experiment with various parameters, just edit the cfg file. For the original YOLO configuration, we have pre-trained weights to start from. For an arbitrary configuration, I’m afraid we have to pre-train the model ourselves.

4. Start Training

Try something like:

./darknet yolo train cfg/yolo.cfg extraction.conv.weights

If you find any problems regarding the procedure, contact me at gnxr9@mail.missouri.edu.

Or you can join the darknet Google Group; there are many brilliant people answering questions there.


PS:

I also have a Windows version of darknet available:
http://guanghan.info/projects/YOLO/darknet-windows.zip
But you need Visual Studio 2015 to open the project. Also note that this Windows version is only ready for testing; its purpose is fast testing of cpuNet.

Here is a quick hands-on guide:

1. Open VS2015. If you don't have it, you can install it for free from the official Microsoft website.

2. Open: 
darknet-windows\darknet\darknet.sln

3. Compile

4. Copy the exe file from:
darknet-windows\darknet\x64\Debug\darknet.exe
to the root folder:
darknet-windows\darknet.exe

5. Open cmd 
Run: darknet yolo test [cfg_file] [weight_file] [img_name]

6. The image will be output to:
darknet-windows\prediction.png
darknet-windows\resized.png

Recently I have received a lot of e-mails asking about YOLO training and testing, and several of the questions concern the same issues. Therefore, I picked some representative questions for this FAQ section. If you find a similar question here, you may have your answer right away. Since I am also a student of darknet, if you find any of my answers erroneous, please comment below. Thanks!

FAQ:

Q: 
Sorry about my last email; I just re-read your GitHub page and found that you used (5 x 2 + 2) x 11 x 11 = 1452.
I have another question about subdivisions: the author set subdivisions = 64. Can you give me some clues about what this variable means?

A:
In my understanding, if you have 128 images in a batch, for example, you update the weights once, after processing all 128 images.

If subdivisions is set to 64, you have 128 / 64 = 2 images in each subdivision. The images in one subdivision are processed together, as a whole, in a single pass, and the weights are still updated once per batch.

If you set subdivisions to 2, training is the fastest (each pass then handles 64 images, which also needs more GPU memory), but you see fewer results printed out.
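A small Python sketch of the subdivision arithmetic described above, assuming darknet splits each batch into `subdivisions` equal chunks using integer division. The helper names are mine. The second function lists the power-of-two choices, which is handy when hunting for the smallest subdivisions value that still fits in GPU memory.

```python
def images_per_pass(batch, subdivisions):
    """Images processed together in one forward/backward pass.

    The weights are still updated only once per full batch.
    """
    return batch // subdivisions  # integer division, as assumed above


def power_of_two_subdivisions(batch):
    """(subdivisions, images_per_pass) pairs for powers of two that divide batch."""
    options = []
    s = 1
    while s <= batch:
        if batch % s == 0:
            options.append((s, images_per_pass(batch, s)))
        s *= 2
    return options
```

With batch = 64, the options run from (1, 64) down to (64, 1); readers in the comments report starting at subdivisions = 2 and doubling it until the out-of-memory error disappears.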

 

Q: 

– How do I get the predicted bounding box coordinates? I am planning to write the complete results (detected class labels and the associated predicted bounding box coordinates) to a flat text file. Which source file should I look into or modify to do this?
– When should I stop training? While I am training YOLO, it does not show any information about the accuracy of the network.
– How do I test it on videos? The test command takes one image at a time, so how do I pass multiple images to YOLO for testing?

A:

As for the bounding boxes, you can find them in [yolo_kernels.cu], in the function “void *detect_in_thread(void *ptr)”.

If you want to control the training, just modify the configuration file. For example, in [yolo_2class_box11.cfg], change max_batches on line 14 and steps on line 12.
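If you want to dump detections to a flat text file, the main step is converting darknet’s normalized, center-format box back to pixel corners. Below is a minimal Python sketch, assuming predictions are (x, y, w, h) normalized to [0, 1], which is how I understand darknet stores them; the clamping mirrors what darknet’s drawing code does, as far as I know. Both function names and the record layout of `detection_line` are hypothetical, so pick whatever columns you need.

```python
def box_to_pixels(x, y, w, h, img_w, img_h):
    """Normalized center box -> pixel corners (left, top, right, bottom), clamped to the image."""
    left = int((x - w / 2.0) * img_w)
    right = int((x + w / 2.0) * img_w)
    top = int((y - h / 2.0) * img_h)
    bot = int((y + h / 2.0) * img_h)
    # keep the corners inside the image
    left, top = max(left, 0), max(top, 0)
    right, bot = min(right, img_w - 1), min(bot, img_h - 1)
    return left, top, right, bot


def detection_line(image_name, class_name, prob, box, img_w, img_h):
    """Format one detection as a whitespace-separated record (hypothetical layout)."""
    left, top, right, bot = box_to_pixels(*box, img_w, img_h)
    return "%s %s %.3f %d %d %d %d" % (image_name, class_name, prob, left, top, right, bot)
```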

In order to test a video file, just type this in the terminal: ./darknet yolo demo_vid cfg/yolo_2class_box11.cfg model/yolo_2class_box11_3000.weights /video/test.mp4

 

Q:

I downloaded all your images and lables, modified my training_list.txt, and started training, but it always ends with a ‘Couldn’t open file: /root/cnn/guanghan/darknet/scripts/labels/stopsign/rural_027.txt’ error.

Each time, it reports a different file that cannot be opened.

A:

Please check if the folder [labels] and the folder [images] are in the same directory. I think this is the problem.

Q:

Yes, they are all in the darknet/scripts directory:

labels

A:

Oh, you made a typo… It is “labels”, not “lables”. That’s the issue. >_<!

 

Q:

Thanks for your reply, Ning. I tried to reproduce your training results for stopsign and yieldsign, but it looks like I am not able to get the right results.

  1. I downloaded your images and annotations.
  2. I changed line 218 to output=588 and classes=2 in cfg/yolo-tiny.cfg. On your GitHub page, yolo_2class_box11.cfg line 218 is output=1452. This is not correct for 2 classes, right?
  3. After executing the training, over several tries, I got AOU=nan….
  4. Anyway, I waited for a couple of hours and got the weights for 2000 iterations.
  5. When testing on this video https://www.youtube.com/watch?v=OCaRH_C_USg with threshold=0, no bounding box shows up.

Any comments? I am afraid something is wrong at step 3 because everything is nan….

A:

  1. The second-to-last layer is correct. For 11 by 11 splits of the image (to be more accurate in recognizing small objects), the number of neurons in this layer should be (5*2 + 2) * 11*11 = 1452.
  2. During training, “AOU= nan” sometimes occurs. This is a little complicated to explain, but I will do my best to simplify the answer:
    1. First, try reducing the learning rate to a smaller value; this usually helps.
    2. It can happen when the annotation data is not correct, by which I mean there exists a training image whose annotation is empty.
      1. If no object is labeled, the code will at some point try to update the weights with no actual data fed in. This batch update will be nonsense and will ruin the current weights.
      2. But since you downloaded my data, this should not be the case.
    3. Even when the annotation data is correct, this sometimes happens because of a hidden bug in the darknet code.
      1. During training, the code “randomly” samples image patches from the training image as a form of data augmentation. If the bounding box is near the edge of the image, the sampled patch sometimes crosses the border. This invalid data will ruin the weight update as well.
      2. One way to avoid this is to update the weights with a batch of many images instead of a single image, which alleviates the effect.
      3. This is why in the cfg file I set subdivisions = 2 instead of the default 64.
  3. If you use the tiny model, you should pre-train the model yourself or use darknet.conv.weights, rather than fine-tuning from the provided pre-trained model extraction.conv.weights, because extraction.conv.weights has more layers than the tiny model.
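Bad annotations can at least be screened for before training. Here is a minimal Python sketch of a label checker, assuming darknet-format label files (`class x_center y_center width height`, normalized). It flags empty annotations, malformed lines, boxes that extend past the image border, and Windows line endings. It cannot detect the random-crop problem itself, which happens inside darknet during augmentation; the function names are hypothetical.

```python
import os


def check_label_file(path):
    """Return a list of human-readable problems found in one darknet label file."""
    with open(path, "rb") as f:
        raw = f.read()
    problems = []
    if b"\r" in raw:
        problems.append("Windows (CR LF) line endings")
    rows = [r for r in raw.decode("utf-8", "replace").splitlines() if r.strip()]
    if not rows:
        problems.append("empty annotation")  # an empty label can wreck a weight update
    for i, row in enumerate(rows, 1):
        parts = row.split()
        if len(parts) != 5:
            problems.append("line %d: expected 5 fields, got %d" % (i, len(parts)))
            continue
        try:
            x, y, w, h = (float(v) for v in parts[1:])
        except ValueError:
            problems.append("line %d: non-numeric value" % i)
            continue
        if not (0 < w <= 1 and 0 < h <= 1):
            problems.append("line %d: width/height outside (0, 1]" % i)
        elif x - w / 2 < 0 or x + w / 2 > 1 or y - h / 2 < 0 or y + h / 2 > 1:
            problems.append("line %d: box extends past the image border" % i)
    return problems


def check_label_dir(label_dir):
    """Map each problematic .txt file under label_dir to its list of problems."""
    report = {}
    for root, _dirs, files in os.walk(label_dir):
        for name in sorted(files):
            if name.endswith(".txt"):
                path = os.path.join(root, name)
                problems = check_label_file(path)
                if problems:
                    report[path] = problems
    return report
```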

 

Q:

I tried to add a 3rd class to the training, but somehow the test always gives me only the first two. The 3rd class never shows up, even if I set the threshold to 0 or even -1.

Does your training code only work for 2 classes?

I changed the number of classes to 3 both in yolo.c and cfg.

A:

Have you checked out yolo_kernels.cu? You should set CLS_NUM to be 3 in there as well.

 

Q:
I see that the bounding boxes for the yieldsign and stopsign are not tight, although the top-left corner seems to be where it should be. Is there any specific reason why the bounding box is bigger than it should be, while the top-left corner is located exactly?
A:

By default, YOLO splits the image into 7*7 partitions. To predict more precisely, you could modify the second-to-last layer so that it uses 11*11 partitions, for example. This only alleviates the problem without completely solving it, because the partitions are still not arbitrary.

Fast R-CNN can generate more precise detections because its bounding boxes can occur at arbitrary positions, which is also why it is much slower than YOLO.

One thing we can do is add a post-processing step to align the bounding box.
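To see why the grid limits precision, recall the YOLO v1 rule: the grid cell that contains an object’s center is responsible for detecting it. A small Python sketch, assuming the 448 x 448 network input used in the paper; the helper names are mine.

```python
def responsible_cell(x, y, side=7):
    """(row, col) of the grid cell containing the normalized box center (x, y)."""
    col = min(int(x * side), side - 1)  # x == 1.0 would otherwise index past the grid
    row = min(int(y * side), side - 1)
    return row, col


def cell_size_pixels(side=7, input_size=448):
    """Width of one grid cell in pixels of the network input."""
    return input_size / side
```

With a 7 x 7 grid each cell spans 64 pixels of the input; with 11 x 11, roughly 41 pixels, which is why the finer grid helps but does not make box placement arbitrary.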

 

Q:

I wanted to ask what data you used for this. Is it public data, or did you annotate your own data?

How many frames did you need to train for each of these classes to get decent performance?

A:

The data I used was downloaded from [Google Images] and hand-labeled by my intern employees. Just kidding, I had to label it myself. The README covers how to label data.

Since I am training with only two classes, and the signs have fewer distortions and less variance (compared to persons or cars, for example), I trained with only around 300 images per class to get decent performance. But if you are training with more classes or harder classes, I suggest having at least 1000 images per class.

 

Q:
Is it possible to share the data and the annotations you collected so that I can get a feel for YOLO? I am limited by resources and hence not able to test it on large datasets for now. Also, it will be easier for me to follow your instructions with this data.
A:

I have uploaded the data that I trained on for the demo, with the corresponding annotations. If you download them, you should be able to repeat the training process easily.

Here are the links: images.tar.gz, labels.tar.gz

Q:

I have successfully run your example, but when I switched to my own data with 10 classes (1000 images), I got the error “cannot load image”.

I have checked the image files; the paths are correct and the images open normally. Do you have any suggestions?

A:

I have encountered similar problems before. The solution to my problem was to check the txt file. It seems that Python 2 and Python 3 (and Windows and Ubuntu) treat whitespace in text files, such as line endings, differently. You cannot see the difference in a text editor, though, unless you look at the individual characters. I used Windows (Python 3) to generate the text file and Ubuntu (Python 2) to load the images, and that’s where I met this problem.

In short, the program failed to interpret and find the correct path.

Maybe you are having the same issue? Please check it out.
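If you suspect the same issue, here is a minimal Python 3 sketch that rewrites a training list with plain '\n' line endings, dropping a UTF-8 byte-order mark, surrounding whitespace, and blank lines. (Readers in the comments found Windows CR LF line endings to be the culprit.) The function name is mine, and it assumes your paths never legitimately begin or end with spaces.

```python
def normalize_list_file(path):
    """Rewrite a training list in place with plain '\n' endings; return the cleaned paths."""
    with open(path, "rb") as f:
        text = f.read().decode("utf-8-sig")  # "-sig" drops a BOM if present
    # splitlines() handles both '\r\n' and '\n'; strip() removes stray spaces
    paths = [line.strip() for line in text.splitlines() if line.strip()]
    with open(path, "w", encoding="utf-8", newline="\n") as f:
        f.write("".join(p + "\n" for p in paths))
    return paths
```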


233 Comments on "Start Training YOLO with Our Own Data"


Shengxi Li
8 years 1 month ago

During training, “AOU= nan” sometimes occurs and ruins the training process. When a bounding box is near the edge of the image, how can I prevent the sampled patch from crossing the border, or remove the invalid data?

Francisco Erivaldo Fernandes Junior
8 years 16 days ago

How did you use BBox-Label-Tool? I tried it here and the images simply won’t load in the program. I don’t know what the problem is.

Weisheng
8 years 12 days ago

Did you try YOLO on the TX1 using camera input?
I get a compilation error for YOLO + OpenCV 3.0 + TX1, as explained at the link below:
https://groups.google.com/d/msg/darknet/AfZcD-C6yXY/Knr8iKmeAwAJ

Abhinav
7 years 10 months ago

How long did it take you to train on 300 images for 1 or 2 classes? I am trying to train for a single class from the VOC dataset, and I am using a GRID K520 on an AWS EC2 g2x.large instance. It is taking forever to train on 50 images. I have followed the steps from your blog and made the modifications to the YOLO files. I am just wondering if this is normal or if I have missed something.

Should I compute my own pre-train weights if I am trying to classify for just “car” class?

I have only 50 images in train.txt files.

Guanghan Ning
7 years 10 months ago

Maybe you have too many epochs? You can just stop training and use the intermediate weights. In my case, I used [yolo_2class_box11_3000.weights] in the example, which took less than an hour to train on a TITAN X.

Abhinav
7 years 10 months ago

I did that and used the weights from 1000 and 2000 iterations, but they are not giving good results. Is there some other metric to look at, such as mAP or some other error measure? I can see some numbers printed on the console, and they are going down, like 0.8 .. then 0.77 .. something like that.

From my knowledge of and experience with neural networks, an error measure is generally tracked during training, and sometimes a minimum-error criterion is used to stop training rather than a number of training iterations.

Abhinav
7 years 10 months ago

I noticed a mistake I made while preparing data using voc_label.py: I was using the wrong class labels when preparing the label files. Then I did some more digging and found some lines in yolo.c that bug me.
I found the same lines in your fork of darknet too. Some lines contain references to VOC files. For example, on line 155 of yolo.c I found this:

list *plist = get_paths("data/voc.2007.test");

Will this not affect the outcome of the program for custom data?

Abhijay
7 years 2 months ago

Hi Abhinav,
How long did you wait?

derklishuai
7 years 10 months ago

Hi Guanghan! I successfully downloaded your images and labels and built the source code from the darknet website with the Makefile set to GPU=1 OPENCV=0, but I encountered a problem when I started training: it always shows “Segmentation fault (core dumped)”. Is it because the number of images is too small? I only used 500+ images for training, and changing the batch and subdivisions parameters in yolo.cfg to smaller values doesn't work either.
Help me…Thank you very much!

Guanghan Ning
7 years 10 months ago

Do you have enough memory? I remember one of the people who emailed me had the same problem; in his case, his laptop did not have enough memory.
It is not a problem with the number of images used for training. Too few training samples would certainly impair the performance, but they should not cause this problem.

Abhinav
7 years 10 months ago
You probably have to increase the batch size. I encountered a similar problem when I set subdivisions = 2. I increased it to 8, and then I did not see the same problem. It gave me a message saying there was not enough GPU memory. I was using only 50 images, so maybe it's different for you because you have 500+ images. I noticed that with the highest subdivisions, i.e., 64, it saves weights after every 100 iterations, and with lower values like 8 it saved at 600, then 1000, then 2000, and so on. I don't understand how these are being…
Abhinav
7 years 10 months ago

Sorry I meant increase the subdivisions in the first statement.

Luis
7 years 10 months ago

Can I train YOLO on a CPU to detect only one class?

Jie Lian
7 years 10 months ago

Hey Ning, how can I test YOLO from a Python script, just like demo.py in Fast R-CNN?

Guanghan Ning
7 years 9 months ago

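Somebody may have wrapped it up. But here is one way around it:

import subprocess
cmd = "./darknet yolo test cfg/cpuNet6.cfg cpuNet6.weights %s%s" % (app.config['UPLOAD_FOLDER'], filename)  # app, filename: from the calling (e.g. Flask) web app
try:
    stdout_result = subprocess.check_output(cmd, shell=True)
except subprocess.CalledProcessError as err:
    stdout_result = err.output  # darknet exited with an error; keep whatever it printed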

Abhinav
7 years 10 months ago

Is your code saving intermediate results like the original YOLO code does?

I am training for one class using your code and I could not find backup weight files after 100 iterations.

I am wondering if I have made another mistake while setting up training files.

Where in the source files can I check after how many iterations it writes a weight file?

Abhinav
7 years 10 months ago

I found that it saved a weight file at 600 iterations, but there should be some way to change this. It may be quick on machines with powerful GPUs, but on the AWS g2x large it's fairly slow. It is hard to have to wait until the 600th iteration before finding out whether it's working fine.

Guanghan Ning
7 years 9 months ago

It is hard-coded by the author in “yolo.c”:

/* save a backup at iteration 600, then every 1000 iterations */
if(i%1000==0 || i == 600){
    char buff[256];
    sprintf(buff, "%s/%s_%d.weights", backup_directory, base, i);
    save_weights(net, buff);
}

You can change this as you like.

learn
7 years 3 months ago

Hi Abhinav. Can you please tell me how you got it working? I also have 600, 1000, 2000, etc. weights saved in the backup folder, and the training does not stop. It’s been 8 hours and it still continues.
Also, the model did not start training until I set the learning rate to 0.000000000001, batch size = 64, and subdivisions = 8 for two classes.
Please let me know if I have to change something

hyh
7 years 9 months ago

I copied your code and model file and ran the test “./darknet yolo test cfg/yolo_2class_box11.cfg yolo_2class_box11_3000.weights scripts/sign/images/stopsign/001.JPEG”, but I get no box. I want to know why.

Itt
7 years 9 months ago

How much time did it take for any of you? Can you please share, including the GPU name?

chenxiaodong
7 years 9 months ago

Thank you for your article. I have succeeded in running YOLO on Linux. I want to run YOLO on Windows; have you ever succeeded in running it on Windows? Waiting for your reply!

Itt
7 years 9 months ago

How can I fine tune only one layer, e.g. the last layer only ?

Itt
7 years 9 months ago

Any info on this, Guanghan? Thanks.

Ed
7 years 9 months ago

Hey guys, is it possible to do text recognition using YOLO?

Abhinav
7 years 8 months ago

Thanks for sharing the windows version. I was able to use your version of yolo-windows for vs2015 but I am having a hard time building opencv binaries for vs 2015. Can you please share a version of yolo-windows with pre-built opencv binaries for vs2015 ?

Abhinav
7 years 8 months ago

The training and demo functions are missing in the windows version you have shared. Kindly make a note of that in the description. It can only be run in “test” mode.

Abhinav
7 years 8 months ago

Is it possible to build windows version of darknet using visual studio 2012? If not then what is lacking in visual studio 2012?

I have seen another version of darknet which works with Visual Studio 2012 and is built using OpenCV binaries as well. But this version has some issues, and I am not able to run darknet in demo mode.

I need to have a windows version built with opencv which is capable of running the demo version.

Nora E.
7 years 8 months ago
Hi! I am trying to detect two classes. For that, I made the modifications you described in this blog entry: in the .cfg file I set the number of outputs to 588 (12 x 7 x 7) and the number of classes to 2, I changed yolo.c, and I changed the number of classes in Yolo.cu. Although it works perfectly with 20 classes, when I make all the changes mentioned above, no bounding boxes appear. If I run the detection with the detection probability threshold set to zero, bounding boxes are drawn, but they are very, very small. Do you know what might be going on? I am using…
Abhinav
7 years 8 months ago

The yolo.weights file is for 20 classes. If you want to use it to detect two of those 20 classes, you should not make any changes to the code; use it as it is. It can detect any objects belonging to those 20 classes and draw boxes.

The changes in the code are for the case when you have your own training data for any number n of classes, and you want to train on it and obtain your own weights.

Nora E.
7 years 8 months ago
Ok, thank you. My problem is that I just want to detect two classes out of the 20 classes. If I do not change the code at all, despite setting a high threshold, YOLO detects other classes that are not correct. For instance, I want to detect just two classes (e.g., aeroplane and person), and YOLO detects the class sofa with a quite high class probability. So my question is: how can I make YOLO detect just the two target classes, namely aeroplane and person? What I do is the following: in yolo.c I set #define CLASSNUM 2 char…
Nora E.
7 years 8 months ago

[SOLVED] Hi! I solved it modifying the code and just taking the values that are relevant to my target classes.

guo
7 years 7 months ago

Hi, I encountered a similar problem. Can you tell me how you solved it? Thank you very much.

Nicholas
7 years 8 months ago

I can’t run vid_demo using the source code that you have provided (darknet_video_2class.zip), there is no vid_demo file under the obj directory. Do you have another link to the vid_demo file?

Nora E.
7 years 8 months ago

Hi! YOLO is working and detects and draws the corresponding bounding boxes correctly. But also, it displays a big Bounding Box that does not correspond to any of the classes to be detected. Does anyone know why this big Bounding Box is drawn and where it comes from?

Alfonso
7 years 8 months ago

Hi! Do you have any alternative to BBox-Label-Tool??

Abhinav
7 years 8 months ago

I find this labelling tool easier to work with

https://github.com/tzutalin/labelImg

Alfonso
7 years 8 months ago

Thanks very much!!

Trinh
6 years 10 months ago

Using labelImg generates an XML file, but I don’t know how to convert this XML to YOLO format. Could you please tell me how?

Han
6 years 9 months ago

Yes, please provide the solution, how to use xml file generated by labelImg to train yolo. Waiting for answer. Thank you very much.

cartucho
6 years 2 months ago
Oscar
7 years 8 months ago

Hello? I have a question about a new class that has never been in the yolo.c file. If a class is added to the old class list (let’s say the ‘computer mouse’ class), does it detect objects of the new class as well as the old ones?

Ed
7 years 8 months ago

Can i do more than 20 classes?

hukg
7 years 8 months ago

Hi, nice work and thanks for sharing.
When I use darknet to recognize cars, with CUDA installed, I find it doesn’t work with OpenCV. I guess the reason is the OpenCV version. Which version did you use?
Thanks in advance.

Nora E.
7 years 8 months ago

Hi!

I have two questions:
What do you use to stabilize the Bounding Boxes and prevent them from flashing?
Do you do any sort of fine-tuning?

Thank you very much for your help.

Alfonso
7 years 8 months ago

Hi!

I tried training for two classes (stopsign and yieldsign), and everything is OK; the training completes without errors. But when I use the weights, the bounding boxes of both signs (stop and yield) have the same label, “stopsign” (with a good score).

Do you know why?

Furthermore, I followed the steps on this page and found two issues (I think):

1) When I go to create the labels, if I don’t change “stop sign” to “stopsign”, it throws an error.

2) In yolo.c, line 16 has “yeildsing” instead of “yieldsign”.

Anas KH
7 years 8 months ago

Are you planning to add the training stage to the Windows version that you built?

saikrishna gv
7 years 7 months ago

list *plist = get_paths("/home/pjreddie/data/voc/2007_test.txt");
What should this be replaced with in “src/yolo.c”? (mentioned in your 3rd point)

Thank you!

Niharika Maheshwari
7 years 7 months ago
Hi, I am trying to detect smaller objects in the resized 448*448 input image. Due to resizing, the target objects become smaller than 64*64, i.e., “side = 7” is too small. As a quick first step, I modified ONLY side = 14 and the final FC layer = 5880 [14*14(5*2+20)]. I retrained it on PASCAL VOC 2012/2007 (around 30,000 images) with a batch size of 64 and max batches = 40,0000. However, my training run is terrible: 66: 11209.21 117.15 avg, 0.00500 rate, 1.826401 seconds, 4160 images, and after 80 batches all the printed parameters are “nan”. 1. Any suggestions as to…
Oscar
7 years 7 months ago

I guess a .weights file is not required to generate a new .weights file for new classes. When replacing all the classes darknet has, the command ‘darknet.exe yolo train myCFG.cfg’ will be fine. If this is wrong, please let me know.

Hammer
7 years 7 months ago

Hi, how can I get the mAP? I know about “void validate_yolo_recall(char *cfgfile, char *weightfile)”, but how do I calculate the mAP from its output?
My output from the validation process:

812 964 1094 RPs/Img: 30.79 IOU: 72.07 Recall: 88.12
813 965 1095 RPs/Img: 30.78 IOU: 72.07 Recall: 88.13
814 968 1098 RPs/Img: 30.80 IOU: 72.07 Recall: 88.16

Bharat
7 years 7 months ago

Hi,

I am trying to train the machine with PASCAL VOC 2007 and 2012 dataset. However, the process is running forever. Can someone tell me how much time does it require to train?

GPU: Quadro k2200 GPU
batchsize in yolo.train.cfg is 128
subdivision is set to 128.

If I use any other size combination like 64 and 2 or 128 and 2, I am getting a cuda out of memory error.

Any help will be appreciated.

Thanks,

Chanhee Jean
7 years 6 months ago

Hi,
By now, you may have figured it out. but just in case you haven’t, read this (https://groups.google.com/d/msg/darknet/fdkf4tbr-e4/FZq_U2mhBQAJ)

so if you run out of memory, reduce batch size to 64

and then try to increase your subdivisions (starting from 2) by the power of 2.

train with the smallest subdivision number that does not cause any error

Jürgen Schmidt
7 years 7 months ago
Hey, what was your training command? I am trying to reproduce your results with your data, as obtained with the weights file you produced, but I never get them. When I start training with ‘./darknet yolo train cfg/yolo_2class_box11.cfg extraction.conv.weights’, I start getting average IOU = ‘nan’ errors all the time after the second picture. And if I try it with ‘./darknet yolo train cfg/yolo_2class_box11.cfg darknet.conv.weights’, it starts and I don’t get any AOU=’nan’ error, but after training it for 6 hours on my GTX 960 I also get a ‘yolo_2class_box11_3000.weights’ file, but it can not detect…
oscar
7 years 7 months ago

Read my comment below. The source code from Darknet has been altered by the original author for some reason. It won’t read the weights correctly without corrections in parser.c, and it won’t write any good weights file without corrections in both utils.c and parser.c. That’s why it shows NaN.

oscar
7 years 7 months ago

I have a question: why did you use FILE *fp = fopen(filename, “w”) on line 612 and FILE *fp = fopen(filename, “r”) on line 723 in parser.c? Your pre-trained weights (yolo_2class_box11_3000.weights) can only be read if “rb” is used on line 723. And did you save the weights file with the “w” option or the “wb” option? Thanks.

oscar
7 years 7 months ago
To answer my own question: the sources parser.c and utils.c are wrong. All the original weights files from Darknet must have been written with the wb option (weights files are only accessible with the rb option in fopen()), but the author changed it for some reason after writing the correct weights files. Maybe he wanted some level of insurance that nobody would write correct training weights files? lol. There are also some code changes in utils.c. Without corrections, the source from Darknet won’t write any weights file correctly (at least on Windows). Also, you don’t need to change CLASSNUM to any…
mg5919
7 years 3 months ago

what changes are necessary in the file “utils.c”?

moh dem
7 years 6 months ago

I’m trying to train YOLO on my own data on a CPU,
but I got a message that the training was killed. Any ideas about it?
https://groups.google.com/forum/#!topic/darknet/M63bCX4Gygg

JCHEN
7 years 5 months ago

I have the same problem. Have you solved it?

yang
7 years 6 months ago

Hi, Ning. I have some questions about the YOLO paper.
Why does it divide the image into a 7 × 7 grid, and why does each grid cell predict B bounding boxes?
Why can’t we just use a 1 × 1 grid to predict B bounding boxes?

Chanhee Jean
7 years 4 months ago

With a 1 × 1 grid, you can’t detect smaller objects.
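
A rough way to see the point: in YOLO v1, the grid cell that contains an object’s center is responsible for predicting it, and each cell predicts only B boxes and a single set of class probabilities. The sketch below uses a hypothetical helper, responsible_cell, which assumes box centers in normalized [0, 1] coordinates. It shows that objects landing in different cells of a 7 × 7 grid all collapse into the one cell of a 1 × 1 grid.

```c
/* Hypothetical helper: index of the grid cell responsible for an object
   whose box center (x, y) is given in normalized [0, 1] coordinates. */
int responsible_cell(float x, float y, int side)
{
    int col = (int)(x * side);
    int row = (int)(y * side);
    /* clamp: a center exactly at 1.0 would otherwise fall off the grid */
    if (col >= side) col = side - 1;
    if (row >= side) row = side - 1;
    return row * side + col;
}
```

For example, centers at (0.2, 0.2) and (0.8, 0.8) map to cells 8 and 40 on a 7 × 7 grid, but both map to cell 0 when side is 1.
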

Eleuterio
7 years 5 months ago

Is it possible to train a detector with images that have more than 3 channels, such as RGBA images?

Chanhee Jean
7 years 4 months ago

I don’t see why not, but to do so you have to change the network a little bit,
probably the first layer’s filters, since they are designed to convolve with 3-channel images.
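
The first layer has to change because a convolutional filter spans every input channel, so that layer’s weight count depends on the channel count. The sketch below uses a hypothetical helper, conv_weight_count, and assumes a first layer of 64 filters of size 7 × 7. Under that assumption, going from 3 to 4 channels changes the first layer from 9408 to 12544 weights, so 3-channel pretrained weights for that layer would no longer fit. (In the cfg, the input channel count is presumably the channels= entry of the [net] section.)

```c
/* Hypothetical helper: number of weights (biases excluded) in a
   convolutional layer with n filters of size size x size over
   c input channels. */
long conv_weight_count(int n, int c, int size)
{
    return (long)n * c * size * size;
}
```
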

tanvir
7 years 5 months ago

After starting training on the data, the following error appears after a week. Note: I am running darknet on the CPU with the VOC 2007 and 2012 data. Can anyone help?

/home/tanvir/darknet/VOCdevkit/VOC2007/JPEGImages/004297.jpg
Cannot load image “/home/tanvir/darknet/VOCdevkit/VOC2007/JPEGImages/004297.jpg”
STB Reason: can’t fopen

Zoro
7 years 5 months ago

I have the same problem. I’m training YOLO for 196 classes.

Chanhee Jean
7 years 5 months ago

Check that you have an image at the given path.
If everything is OK and you can open the image fine, then it may be an encoding problem.
I once copied the text file from Windows and encountered a similar problem.
I solved mine by rewriting all the text files in Ubuntu with UTF-8 encoding.

Shai
6 years 7 months ago

UTF-8 didn’t help, but your post gave me the idea to check for similar things. I was editing the text file in Notepad++ on Windows, so it used Windows CR LF line endings. I changed them to Unix LF and it worked! Thanks!!
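
The CR LF fix works because, once only the ‘\n’ is stripped from a CR LF line, an invisible ‘\r’ stays at the end of every path, so fopen looks for a file whose name ends in ‘\r’. Below is a minimal sketch of a tolerant line-cleaning helper. chomp_line is hypothetical, not Darknet’s code. Converting the list with a tool such as dos2unix achieves the same thing on the data side.

```c
#include <string.h>

/* Hypothetical helper: remove a trailing "\n" or "\r\n" from a line
   read with fgets(), so a train list saved with Windows CR LF endings
   yields ".../004297.jpg" instead of ".../004297.jpg\r". */
void chomp_line(char *line)
{
    size_t len = strlen(line);
    while (len > 0 && (line[len - 1] == '\n' || line[len - 1] == '\r')) {
        line[--len] = '\0';
    }
}
```
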

haoran liu
7 years 5 months ago

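Hi, your article is really great! My environment is Ubuntu 16.04 + gcc 4.9 and 5.4 + CUDA 7.5 + GTX1080, but I ran into a problem: it works with OPENCV=1, but with GPU=1 it fails with a compilation error: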
……
/usr/local/cuda/include/surface_functions.h(484): error: expected a “;”
/usr/local/cuda/include/surface_functions.h(485): error: expected a “;”

Error limit reached.
100 errors detected in the compilation of “/tmp/tmpxft_000001b4_00000000-7_softmax_layer_kernels.cpp1.ii”.
Compilation terminated.
Makefile:55: recipe for target ‘obj/softmax_layer_kernels.o’ failed
make: *** [obj/softmax_layer_kernels.o] Error 4

How can I fix this error?

Taras Filatov
7 years 5 months ago

Haoran Liu, I had similar errors when trying to build the original DarkNet with OpenCV3. Check whether you can build with GPU, CUDNN and OPENCV set to 0 in the Makefile. If that works, you may try the patch from Prabindh Sundareson, and make sure you set the path variables etc. correctly. I’ve posted my steps here: https://covijn.com/2016/11/fixing-darknet-opencv3-make-error-convolutional_kernels/

Anthony Budd
7 years 5 months ago

Why would you provide all of the data to retrain the model but not the video file to test it?

Also, why would anyone store their videos here? “/video/test.mp4” should be “video/test.mp4”.

Chanhee Jean
7 years 5 months ago

As for your first question (or grumble), I believe Google can provide you with a test video very easily.
(It might have taken less time to google a test video than to write your complaint here.)

Second, I believe he (the author) made a typo.

Soh
7 years 5 months ago

Is there any way to supply variable-size input images to YOLO for training and testing? Obviously resizing to a fixed size is one option. Is there any other way around it?
Thanks in advance for any helpful comments.

Chanhee Jean
7 years 5 months ago

As far as I know, you can’t train a conv net on images of different sizes. Maybe there is a way to do so, but I see no point in doing it. Why do you want to? I’d like to hear your idea.

For testing, you can provide images of various sizes.

Yasha
7 years 4 months ago

Is it possible to train on images of different sizes?
The images provided for stopsign and yieldsign are of variable size.
If it’s not possible to train on images of different sizes, can I convert the annotation data to match correspondingly resized images?
I don’t want to label the images again.

Chanhee Jean
7 years 4 months ago

When you start training, it reads the images and automatically resizes them to 448×448 (or according to your setting in the cfg).

So the example images (stopsign/yieldsign) will be resized before they are fed to the network.

I’m not sure what annotation data you’re talking about, but once you convert the labeling data using convert.py, it only contains ratio information, so it doesn’t really matter if you resize images to a different size at training time.
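
To illustrate the “ratio information” point: assuming convert.py follows the usual Darknet label format (center x, center y, width, height, each divided by the image width or height), scaling an image and its box by the same factor leaves the label unchanged. Below is a minimal C sketch of that arithmetic. The struct and function names are hypothetical, and the real script is Python.

```c
/* Hypothetical label box: center x, center y, width, height,
   each normalized by the image width or height. */
typedef struct { float x, y, w, h; } yolo_box;

/* Convert an absolute box (pixel corners) to normalized YOLO form. */
yolo_box to_yolo_box(int img_w, int img_h,
                     float xmin, float xmax, float ymin, float ymax)
{
    yolo_box b;
    b.x = (xmin + xmax) / 2.0f / img_w;
    b.y = (ymin + ymax) / 2.0f / img_h;
    b.w = (xmax - xmin) / img_w;
    b.h = (ymax - ymin) / img_h;
    return b;
}
```

For instance, a box spanning x 50 to 150 and y 25 to 75 in a 200×100 image gives (0.5, 0.5, 0.5, 0.5), and so does the same box doubled in a 400×200 image.
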

Taras Filatov
7 years 5 months ago

Ning, many thanks for providing your solution here. I’m running your pre-compiled version on Ubuntu 16.04, trying to feed it my own .mp4 video file, and it returns the following error:

./darknet yolo demo_vid cfg/yolo_2class_box11.cfg [my_weights] [my_video.mp4]
/skip/
32: Detection Layer
forced: Using default ‘0’
Loading weights from myyolo.weights…Done!
Video File name is: big_buck_bunny_720p_2mb.mp4
./darknet: symbol lookup error: ./darknet: undefined symbol: _ZN2cv12VideoCaptureC1ERKSs

Do you have any recommendations on how to resolve this? The only modification I had to make was to copy the libraries libcudart, libcublas and libcurand from their 7.5 versions to 7.0, since I have a later version of CUDA installed and your pre-compiled version requires 7.0.

Taras Filatov
7 years 5 months ago

Please disregard the above question; I was able to compile from source without problems.

YAsha
7 years 5 months ago

30: Connected Layer: 12544 inputs, 539 outputs
31: Assertion failed: (side*side*((1 + l.coords)*l.n + l.classes) == inputs), function make_detection_layer, file ./src/detection_layer.c, line 27.
Abort trap: 6

I have only one class to train, so my output according to your formula is 539. Why am I getting the assertion error?

Chanhee Jean
7 years 5 months ago

You have to change the value of ‘side’ accordingly.
In your case, it seems that you wanted 7*7*(5*2+1)=539,
so make sure you have set ‘side’ to 7.

YAsha
7 years 5 months ago

31: Assertion failed: (side*side*((1 + l.coords)*l.n + l.classes) == inputs), function make_detection_layer, file ./src/detection_layer.c, line 27.

I keep getting this error. I have changed the output to 1452 and the number of classes to 2.
I’m using your images and the stopsign data.

Chanhee Jean
7 years 5 months ago

Make sure that ‘side’ is 11.
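
Both of YAsha’s errors come from the same assertion, which requires the last connected layer’s output count to equal side*side*((1 + coords)*n + classes). Below is a small sketch of that arithmetic. The helper name is hypothetical, and it assumes coords = 4 (x, y, w, h) and n = 2 boxes per cell. With those values, 1 class with side = 7 gives 539 and 2 classes with side = 11 gives 1452. The output value, classes and side therefore all have to be changed together.

```c
/* Hypothetical helper: the input count the detection layer expects,
   following the assertion quoted from detection_layer.c:
   side*side*((1 + coords)*n + classes). */
int detection_inputs(int side, int coords, int n, int classes)
{
    return side * side * ((1 + coords) * n + classes);
}
```
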

YAsha
7 years 5 months ago

Couldn’t open file: /data/voc/train.txt
When I run the training code, it references this file.
I couldn’t find where in the code to modify this path.

Chanhee Jean
7 years 5 months ago

You can modify the trainlist.txt file path in yolo.c,
or you can just put your train list file in the specified directory
(/data/voc/ in your case).

AndDev
7 years 3 days ago

I did this (I set in yolo.cfg: char *train_images = “/home/osboxes/Desktop/yolo-test/train.txt”; char *backup_directory = “/home/osboxes/Desktop/yolo-test/backup/”;). But it still asks for /data/voc/train.txt because that path is written in cfg/voc.data: “train = /home/pjreddie/data/voc/train.txt”.
But the example on this site doesn’t mention anything about cfg/voc.data?
What am I doing wrong?

Sud
7 years 5 months ago

Hello. I followed the training procedure for road sign detection, using the training images and label files you provided. However, after the first iteration, I get “Avg IOU : nan” for all images. I checked all possible causes but still can’t solve this problem.

dev
7 years 4 months ago

Were you able to fix this problem?

Chanhee Jean
7 years 4 months ago

I don’t know exactly what causes this problem, but I trained mine with a smaller model and a smaller learning step. Try training with the small or tiny model, and keep your learning rate low. When you change the learning rate in the cfg file, change steps and scales as well. If the first scale is too big, try making it smaller.
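
The advice about steps and scales follows from how a step schedule works. Below is a minimal sketch. It assumes Darknet’s “steps” policy starts from the base learning rate and multiplies it by each scale once the iteration count reaches the matching step. The function name is hypothetical. The sketch shows why a large first scale such as 10 multiplies the effective rate tenfold as soon as the first step is reached.

```c
/* Hypothetical sketch of a "steps" learning-rate policy: start from
   base_rate and multiply by scales[i] once iteration reaches steps[i].
   steps and scales must have the same length (num_steps). */
float steps_rate(float base_rate, const int *steps, const float *scales,
                 int num_steps, int iteration)
{
    float rate = base_rate;
    for (int i = 0; i < num_steps; ++i) {
        if (steps[i] > iteration) break;  /* later steps not reached yet */
        rate *= scales[i];
    }
    return rate;
}
```
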

Zoro
7 years 4 months ago

I am having the same issue: for all images, box (x, y, w, h) = (nan, nan, nan, nan), and detection IOU : nan as well. Any idea why? I followed all the steps and used the same data and labels as given.

Weishan
7 years 5 months ago

Hi there, I followed the provided example to reproduce the training of a net for stop & yield sign detection. The training started from scratch. The cfg file used is yolo_2class_box11.cfg. However, the loss remained around 1.75 after 13000 iterations, like this:

13875: 1.657367, 1.751951 avg, 0.005000 rate, 7.913005 seconds, 888000 images

This model cannot detect the signs correctly at all… It looks like the loss cannot decrease efficiently. With the provided trained model, the loss is around 0.4 and the model produces correct detections.

Can anyone help? Anything I missed here? Many thanks in advance.
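
On reading the log line above: after the iteration number come the current batch loss and a smoothed average loss, then the learning rate, the time taken and the images seen so far (13875 × 64 = 888000, consistent with a batch size of 64). Below is a sketch of that running average. It assumes the exponential moving average that Darknet’s training loop appears to use (weight 0.1 on the newest loss, with a negative value meaning “not set yet”), and the function name is hypothetical. Because this average reacts slowly, a plateau at 1.75 reflects many iterations rather than one bad batch.

```c
/* Hypothetical sketch of the smoothed loss printed as "avg":
   the first loss seeds the average (negative avg_loss means unset),
   after which each new loss gets a weight of 0.1. */
float update_avg_loss(float avg_loss, float loss)
{
    if (avg_loss < 0) avg_loss = loss;
    return avg_loss * 0.9f + loss * 0.1f;
}
```
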

Chanhee Jean
7 years 4 months ago

My loss got stuck around 1.43, like yours. I took a weights file from before the loss reached 1.43 and continued training with a smaller model: I started with yolo_2class_box11.cfg, then changed the model to tiny and continued training it, and I got a pretty good result.
Oh, and playing with learning rates also helps; I tried many different learning rates.
I think being stuck at a certain loss has something to do with dead neurons, but we need more study and experience.

Duy Nguyen Nhat
7 years 5 months ago

Sorry, could you explain the relation between subdivision and batch size? I am reading the YOLO code to make a comprehensive evaluation for my thesis, and I don’t understand how these 2 parameters relate. Thanks!

Chanhee Jean
7 years 4 months ago

batch size: how many images you load into memory for one batch.
subdivision: how many parts the images in a batch are divided into for uploading.
So you are actually loading batch/subdivision images at once.

Chanhee Jean
7 years 4 months ago

So if you increase the batch size, you analyze more images to make each update, so update accuracy goes up at the cost of more training time.
I think subdivision is for the case of memory problems. If you encounter a memory issue, try increasing subdivision by powers of 2 and use a number that does not cause a memory problem.
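
The batch/subdivision relation boils down to a division. Below is a minimal sketch. It assumes Darknet processes batch/subdivisions images per pass and updates the weights once a full batch has been seen. The helper name and the divisibility check are additions for illustration. With Markus’s later setting in this thread (batch 64, subdivision 4), each pass holds 16 images. That is why raising subdivision lowers memory use without changing how many images go into each update.

```c
/* Hypothetical helper: images held in memory per forward/backward pass
   when a batch is split into subdivisions. Returns -1 for a combination
   that does not divide evenly (a sanity check added for illustration). */
int images_per_pass(int batch, int subdivisions)
{
    if (subdivisions <= 0 || batch % subdivisions != 0) return -1;
    return batch / subdivisions;
}
```
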

Yasha
7 years 4 months ago

What is the “count” in the training output?
Detection Avg IOU: 0.000000, Pos Cat: 413.734467, All Cat: 310.563416, Pos Obj: -27.928844, Any Obj: -22.873415, count: 32

It takes over 15-20 minutes to train on each image. I feel like that is too much time. How long should training on the stopsign data take?

Chanhee Jean
7 years 4 months ago
count means how many detections were made within one batch. By the way, I think something is definitely wrong… all the numbers are very weird. When I trained the tiny model on an i7, I think it took a few minutes per batch, so I quit doing it on the CPU. With a GTX680m it only took 1-2 sec, and with a GTX1080, less than a sec. You should be aware of the meanings of all the numbers above; it helps you understand what’s actually going on inside.
Detection Avg IOU (Intersection Over Union): how much the predicted bounding box overlaps with the ground-truth bounding box. (higher, the…
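
Since Detection Avg IOU comes up repeatedly in this thread, here is a minimal sketch of intersection over union for boxes stored as center, width and height, the same layout YOLO labels use. It is a standalone illustration with hypothetical names, not Darknet’s own function. In this sketch, once any coordinate is already nan, the comparisons with it are false, so nan flows through the arithmetic into the returned IOU.

```c
/* Hypothetical box type: center (x, y), width w, height h. */
typedef struct { float x, y, w, h; } box;

/* Length of the overlap of two 1-D segments given by center and length
   (negative when the segments do not overlap). */
static float overlap(float c1, float l1, float c2, float l2)
{
    float left  = (c1 - l1 / 2 > c2 - l2 / 2) ? c1 - l1 / 2 : c2 - l2 / 2;
    float right = (c1 + l1 / 2 < c2 + l2 / 2) ? c1 + l1 / 2 : c2 + l2 / 2;
    return right - left;
}

/* Intersection over union: overlap area divided by combined area. */
float iou(box a, box b)
{
    float w = overlap(a.x, a.w, b.x, b.w);
    float h = overlap(a.y, a.h, b.y, b.h);
    if (w <= 0 || h <= 0) return 0;  /* no overlap */
    float inter = w * h;
    float uni = a.w * a.h + b.w * b.h - inter;
    return inter / uni;
}
```

Two identical boxes give 1, a box shifted by half its width gives 1/3, and disjoint boxes give 0.
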
Markus
7 years 4 months ago

Hi, is there any chance I can reproduce your results using yolo-small.cfg? I have an old GTX670 with only 2 GB of VRAM, so I cannot use your .cfg file.

Chanhee Jean
7 years 4 months ago

You can reduce the batch size or increase subdivision to make it work with your GPU.
Go ahead and edit those parts of your cfg file, then give it a try.
Of course, a smaller model will do the job as well.

Markus
7 years 4 months ago

Thanks for your reply. It seems I cannot reduce the size enough by tweaking the parameters, but it’s very good to know that I can use the small model too. Given your reply to Yasha, will I also get reasonable results with the tiny model (and save a lot of time)?

Chanhee Jean
7 years 4 months ago

I don’t know how much accuracy you need from YOLO, but I was able to get pretty satisfying results with the tiny model.
From what I remember, the tiny model required around 2.xx GB of memory, so you may need to increase the subdivision parameter by powers of two, starting from 2, and choose the one that doesn’t cause a memory problem.

Good luck.

Markus
7 years 4 months ago

Actually, that worked out quite well: 2000 iterations with a batch size of 64 (subdivision 4)
already yielded good results. I trained it up to 10500 iterations, but the gain was rather minor, I would say. I will now work with stop + yield sign, and then start with my own stuff. Thanks for your help!

Chanhee Jean
7 years 4 months ago

My pleasure.
Enjoy learning machine learning!

SH Lee
7 years 4 months ago
Hello! Thank you for explaining and sharing how to train YOLO. When I try to train or validate, I get a ‘segmentation fault (core dumped)’ error. I used the same cfg, src and weights files from this website to try YOLO with stop and yield sign detection. I am using Ubuntu 14.04, a GTX1080 and CUDA 8.0, with 16 GB of RAM, so I guess this is not a memory problem. I changed the subdivision number, but it doesn’t help. Please help me out…
31 connected 4096 -> 588
32 Detection Layer
forced: Using default ‘0’
Loading weights from weights/extraction.conv.weights…Done!
Learning Rate: 0.0005, Momentum: 0.9, Decay:…

Chanhee Jean
7 years 4 months ago

Hi.
I’m not 100% sure, but you could check the following:

1. Make sure you have compiled with the GPU enabled (GPU=1 in your Makefile).

2. Set the proper numbers for gpu-architecture and gpu-code (they are also in the Makefile).
If you are using a GTX1080 (Pascal), both should be 61.

Let me know how it goes.

SH Lee
7 years 4 months ago

Thank you for the solution. However, in my case it doesn’t work… I still get the same segmentation fault error.

Andrey
7 years 4 months ago

Segmentation fault also tends to happen when you don’t have enough memory. How much RAM do you have?

Chanhee Jean
7 years 3 months ago

By now you may have figured it out, but just in case you’re still struggling, read the following:
I happened to encounter the same problem yesterday.
Mine was due to a mistake in the cfg file:
I had accidentally added a comma (,), or the number of steps and scales didn’t match.
Please check your cfg file.
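
Both cfg mistakes Chanhee describes can be caught with a simple check before training. Below is a minimal sketch using hypothetical helpers that are not part of Darknet. It flags an empty entry caused by a stray comma, and a mismatch between the number of steps and scales entries. It only inspects the value strings and does not reproduce how Darknet’s own parser reacts to them.

```c
#include <string.h>

/* Hypothetical helper: number of comma-separated fields in a cfg value
   such as "200,400,600" (an empty string has no fields). */
int count_fields(const char *value)
{
    if (*value == '\0') return 0;
    int count = 1;
    for (const char *p = value; *p; ++p)
        if (*p == ',') ++count;
    return count;
}

/* Hypothetical helper: 1 if a field is empty because of a leading,
   trailing or doubled comma. */
int has_empty_field(const char *value)
{
    size_t len = strlen(value);
    if (len == 0) return 0;
    if (value[0] == ',' || value[len - 1] == ',') return 1;
    return strstr(value, ",,") != NULL;
}

/* A steps/scales pair is usable only if both lists are clean and
   have the same number of entries. */
int steps_scales_ok(const char *steps, const char *scales)
{
    return !has_empty_field(steps) && !has_empty_field(scales)
        && count_fields(steps) == count_fields(scales);
}
```
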

Andrey
7 years 4 months ago

First of all, thanks a lot for your blog; it is extremely helpful.

The original YOLO tiny model has 20 classes. Is it possible to use the weights of the original model to train for a different number of classes? I guess the difference should only be in the last layer. If not, do you have an idea of how to transfer the weights of the first N-1 layers?
Thanks
