This is my first attempt at writing a blog post in English. If you find grammar mistakes or confusing details, please forgive me and point them out.

# Dataset Preprocessing

Our task is a three-class problem. Due to the limitations of a confidentiality agreement, I cannot put any original images in this blog. For convenience of presentation, I use another dataset named CamVid (download here) to demonstrate that the training process is correct.

The main steps of dataset preprocessing are as follows:

• Data annotation
• Index Creation
• TFRecord data Generation

## Data annotation

First, I want to explain a basic concept, ignore_label, because I wasted a lot of time on it.

### ignore_label

Be careful not to confuse ignore_label with background. ignore_label marks pixels in an image that we do not care about. You would typically ignore labels for areas that mark delineations between classes, or areas where the class is undefined. Note that background should generally not be ignored; pixels marked with ignore_label are not involved in the calculation of the loss.

### Grey value of the annotation mask

The ground-truth label should contain only one channel (a grayscale image), and the .png format is recommended.

If your dataset has n classes including background, you should label all pixels from 0 to n-1. Do not set the grey values of pixels to 10, 20, 100, etc.: the TensorFlow code matches the grey value directly to the object class, so arbitrary values will interfere with the calculation of the loss.

• set all background pixels to 0
• set the pixels of objects 1 to n-1 to the values 1 to n-1
• if your dataset includes ignore_label, set those pixels to 255
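To make this concrete, here is a minimal sketch (using numpy, with a hypothetical grey-value mapping) of how wrongly-labelled masks can be remapped to consecutive class indices:

```python
import numpy as np

# Hypothetical mapping from raw grey values to class indices; replace it
# with whatever values your annotation tool actually produced.
GREY_TO_CLASS = {0: 0, 100: 1, 150: 2, 255: 255}

def remap_mask(mask):
    """Remap raw grey values to 0..n-1 (and 255 for ignore_label)."""
    out = np.full_like(mask, 255)  # unmapped pixels fall back to ignore_label
    for grey, cls in GREY_TO_CLASS.items():
        out[mask == grey] = cls
    return out

raw = np.array([[0, 100], [150, 255]], dtype=np.uint8)
remapped = remap_mask(raw)  # background=0, object1=1, object2=2, ignore=255
```

Save the remapped array back as a single-channel .png (e.g. with PIL's `Image.fromarray(...)`) so the stored grey values are exactly the class indices.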

The CamVid dataset has 11 classes (no ignore_label), and my own dataset has 3 classes (including background, no ignore_label).

Here is an example:

original image:

Mask image (since CamVid has no ignore_label, there are no white pixels):

## Index Creation

Index creation means splitting the dataset into three parts (train/validation/test). We need to create three .txt files that specify the split.

First, put the original images and masks into dedicated folders; here is my layout:

• /root/dataset/CamVid/image: all original images (701 images)
• /root/dataset/CamVid/mask: all masks (701 masks, corresponding to the original images)

Then, we create three index files in the /root/dataset/CamVid/index folder:

• train.txt: index of the training set
• trainval.txt: index of the validation set
• val.txt: index of the test set

All .txt files should contain only the names of the original images. For the CamVid dataset, a blueprint for the .txt files can be downloaded from here; you can then use Sublime Text or another text editor to modify the .txt files.
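The split itself is easy to script. Below is a minimal sketch, assuming .png images and that each index file lists one file name (without extension) per line; check the blueprint files to confirm the exact format:

```python
import os
import random

def write_index_files(image_dir, index_dir, val_ratio=0.1, test_ratio=0.1, seed=0):
    """Split image names into train/trainval/val index files."""
    names = sorted(os.path.splitext(f)[0]
                   for f in os.listdir(image_dir) if f.endswith('.png'))
    random.Random(seed).shuffle(names)
    n_val = int(len(names) * val_ratio)
    n_test = int(len(names) * test_ratio)
    splits = {
        'train.txt': names[n_val + n_test:],      # training set
        'trainval.txt': names[:n_val],            # used here as validation set
        'val.txt': names[n_val:n_val + n_test],   # used here as test set
    }
    for fname, lines in splits.items():
        with open(os.path.join(index_dir, fname), 'w') as f:
            f.write('\n'.join(lines) + '\n')
```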

Here is a screenshot of train.txt/val.txt:

## TFRecord data Generation

We use build_voc2012_data.py and the files generated above to make the TFRecord files; refer to the commands in download_and_convert_voc2012.sh:

• ${IMAGE_FOLDER}: the path containing the original images
• ${SEMANTIC_SEG_FOLDER}: the path containing the masks
• ${LIST_FOLDER}: the path containing the three index files
• image_format: the format of the original images (png for CamVid)
• output_dir: the path for the generated TFRecord files (create the directory yourself)

For the CamVid dataset, the command looks like this:
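The original command is not reproduced as an image here, but based on the flags above and my folder layout it should look roughly like this (verify the flag names against your copy of build_voc2012_data.py):

```shell
python build_voc2012_data.py \
  --image_folder="/root/dataset/CamVid/image" \
  --semantic_segmentation_folder="/root/dataset/CamVid/mask" \
  --list_folder="/root/dataset/CamVid/index" \
  --image_format="png" \
  --output_dir="/root/dataset/CamVid/tfrecord"
```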

Here is a screenshot of running the build_voc2012_data.py script:

All generated TFRecord files can then be found in /root/dataset/CamVid/tfrecord:

By the way, you can also download all the complete files in index.zip.

# Modifying the training scripts

Based on the DeepLab repo, we mainly need to modify the following files:

• segmentation_dataset.py file
• train_utils.py file

## segmentation_dataset.py

In segmentation_dataset.py, around L110, we need to add a corresponding description of the dataset.

For example, here is the description of the _CAMVID dataset:

For my own dataset, we have three classes (object1, object2, and background); adding ignore_label gives num_classes=4:

Register dataset:

Furthermore, around L112 of segmentation_dataset.py, we should also register the name of the dataset description:

## train_utils.py

Since num_classes may differ from the pretrained model, we need to modify how the weights of the logits layer are restored, around L109 of train_utils.py:
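One common way to do this (a sketch; the exact lines depend on your repo version) is to add the logits variables to the exclusion list used when building the restore function in train_utils.py:

```python
def build_exclude_list(initialize_last_layer, last_layers):
    """Variable name prefixes that should NOT be restored from the checkpoint."""
    # 'logits' is excluded because its shape depends on num_classes and will
    # not match the pretrained checkpoint.
    exclude_list = ['global_step', 'logits']
    if not initialize_last_layer:
        exclude_list.extend(last_layers)
    return exclude_list
```

In the repo this list is then passed as the `exclude` argument when collecting the variables to restore.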

### Sample imbalance

Refer to the explanation by DeepLab v3+'s first author, aquariusjay: if the data samples are strongly biased toward one of the classes, we call this imbalance.

Since this problem can be ignored on the CamVid dataset, here we take my own dataset as the example: my task is a three-class task (background, object1, object2) and has a serious imbalance problem.

To handle this, it is suggested to use loss_weight for the undersampled classes, around L70 in train_utils.py. In my task, background pixels account for a large proportion, and object1 appears more often than object2, so the weight ratio is 1:10:15:

In this step, I used to confuse ignore_label with background. In the end I labeled background=0 with label0_weight=1, object1=1 with label1_weight=10, and so on.
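A numpy sketch of the weight map (in train_utils.py this is built with TensorFlow ops on scaled_labels and passed as the weights of the softmax cross-entropy loss, but the logic is the same):

```python
import numpy as np

IGNORE_LABEL = 255
# background : object1 : object2 = 1 : 10 : 15; ignore_label gets weight 0.
CLASS_WEIGHTS = {0: 1.0, 1: 10.0, 2: 15.0}

def loss_weight_map(labels):
    """Per-pixel weights used to scale the cross-entropy loss."""
    weights = np.zeros(labels.shape, dtype=np.float32)  # ignore_label stays 0
    for cls, w in CLASS_WEIGHTS.items():
        weights[labels == cls] = w
    return weights

labels = np.array([[0, 1], [2, IGNORE_LABEL]])
weights = loss_weight_map(labels)  # background=1, object1=10, object2=15, ignore=0
```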

# Training and Visualization

Refer to aquariusjay's explanation on GitHub.

If you want to fine-tune DeepLab on your own dataset, you can modify some parameters in train.py; here are the options:

• if you want to re-use all the trained weights, set initialize_last_layer=True
• if you want to re-use only the network backbone, set initialize_last_layer=False and last_layers_contain_logits_only=False
• if you want to re-use all the trained weights except the logits (since num_classes may be different), set initialize_last_layer=False and last_layers_contain_logits_only=True

Finally, my setting is as follows:

• initialize_last_layer=False
• last_layers_contain_logits_only=True

## Preliminary training

When training on CamVid we do not consider the imbalance problem; if your task suffers from imbalance, please refer to the Troubleshooting chapter.

Following the demo in the DeepLab repo, there are some parameters we need to modify:

• tf_initial_checkpoint: the path of the pretrained weights. Because CamVid is similar to Cityscapes, we use the weights pretrained on Cityscapes

• train_logdir: the path of training checkpoint files

• dataset_dir: the path of dataset TFRecord files
• dataset: the name of dataset description in segmentation_dataset.py

The training command on CamVid is as follows:
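The command is not reproduced as an image here; a sketch of what it might look like, with the checkpoint and log paths as placeholders and the flag names taken from train.py (the dataset name must match whatever you registered in segmentation_dataset.py):

```shell
python train.py \
  --logtostderr \
  --training_number_of_steps=300 \
  --train_split="train" \
  --model_variant="xception_65" \
  --atrous_rates=6 --atrous_rates=12 --atrous_rates=18 \
  --output_stride=16 \
  --decoder_output_stride=4 \
  --train_crop_size=513 --train_crop_size=513 \
  --train_batch_size=2 \
  --dataset="camvid" \
  --tf_initial_checkpoint="/path/to/deeplabv3_cityscapes_train/model.ckpt" \
  --train_logdir="/path/to/train_logdir" \
  --dataset_dir="/root/dataset/CamVid/tfrecord"
```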

Here training_number_of_steps is set to only 300, crop_size=513, and batch_size=2, just to test whether the training command executes correctly.

Here is a screenshot of the output:

### Visualization

The DeepLab repo also provides evaluation and visualization tools; here we test them with the CamVid settings. Because the image size of CamVid differs from Cityscapes, some parameters need to be set as follows:

• vis_split: the tfrecord split to visualize
• vis_crop_size: the size of the input images (360,480)
• dataset: the name of the dataset description in segmentation_dataset.py
• dataset_dir: the path of the dataset TFRecord files
• colormap_type: the colormap for the annotations

Finally, the vis command on CamVid is as follows:
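A sketch of the vis command, with placeholder paths (verify the flag names against vis.py; colormap_type is an assumption here):

```shell
python vis.py \
  --logtostderr \
  --vis_split="val" \
  --model_variant="xception_65" \
  --atrous_rates=6 --atrous_rates=12 --atrous_rates=18 \
  --output_stride=16 \
  --decoder_output_stride=4 \
  --vis_crop_size=360 --vis_crop_size=480 \
  --dataset="camvid" \
  --colormap_type="pascal" \
  --checkpoint_dir="/path/to/train_logdir" \
  --vis_logdir="/path/to/vis_logdir" \
  --dataset_dir="/root/dataset/CamVid/tfrecord"
```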

Here is a screenshot of the output:

I selected some of the predictions:

We can see that the model runs correctly, although the results are not good.

### Evaluation

Some parameters in the eval command should be modified:

• eval_split: the tfrecord split to evaluate
• eval_crop_size: the size of the input images (360,480)
• dataset: the name of dataset description in segmentation_dataset.py
• dataset_dir: the path of dataset TFRecord files

Finally, the eval command on CamVid is as follows:
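A sketch of the eval command, again with placeholder paths (verify the flag names against eval.py):

```shell
python eval.py \
  --logtostderr \
  --eval_split="val" \
  --model_variant="xception_65" \
  --atrous_rates=6 --atrous_rates=12 --atrous_rates=18 \
  --output_stride=16 \
  --decoder_output_stride=4 \
  --eval_crop_size=360 --eval_crop_size=480 \
  --dataset="camvid" \
  --checkpoint_dir="/path/to/train_logdir" \
  --eval_logdir="/path/to/eval_logdir" \
  --dataset_dir="/root/dataset/CamVid/tfrecord"
```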

Here is a screenshot of the output:

The result is not great (mIoU=0.149), but it proves there is no big problem in our training command.

After the preliminary training, we modified the training command further. I deleted the trained weights saved in train_logdir, and changed some parameters as follows:

• training_number_of_steps: set to 3000
• crop_size: set to 321
• batch_size: increased to 4

The new training command on CamVid is as follows:
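A sketch of the updated command; only the three values above change relative to the preliminary run (paths are still placeholders):

```shell
python train.py \
  --logtostderr \
  --training_number_of_steps=3000 \
  --train_split="train" \
  --model_variant="xception_65" \
  --atrous_rates=6 --atrous_rates=12 --atrous_rates=18 \
  --output_stride=16 \
  --decoder_output_stride=4 \
  --train_crop_size=321 --train_crop_size=321 \
  --train_batch_size=4 \
  --dataset="camvid" \
  --tf_initial_checkpoint="/path/to/deeplabv3_cityscapes_train/model.ckpt" \
  --train_logdir="/path/to/train_logdir" \
  --dataset_dir="/root/dataset/CamVid/tfrecord"
```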

Here is a screenshot of the output:

### Visualization and Evaluation

Reusing eval.py for testing:

As we can see, the new result (mIoU=0.401) is significantly improved.

And reusing vis.py for visualization:

The new predictions look much better.

# Troubleshooting

The main mistakes on my own dataset were as follows:

• wrong grey values in the masks
• confusion between ignore_label and background
• problems in setting the class weights

## Wrong grey values in the masks

This goes back to the very important mask-generation step in the earlier chapter. I made the mistake of setting the pixels of different objects (including background) to 0, 100, 150, etc., which led to false predictions and made the imbalance problem almost impossible to solve.

## Confusion between ignore_label and background

For ignore_label, I used to mistakenly set it to 0 in segmentation_dataset.py:

I also mistakenly set num_classes to 3.

Here is a screenshot of train_utils.py:

Because ignore_label was set to 0, background was not involved in the calculation of the loss.

As we can see in the following image (only 200 training steps), the model has learned some information, but there are problems:

## Problem in setting weight

### The prediction is always the same color

Because of mistakes in the loss calculation, there was a problem with the weights of the corresponding classes:

The prediction comes out all blue/black/green when the weight of the corresponding class (object2/background/object1) is set too large, or when background is not considered at all: the model then pays almost nothing for errors on the other classes, and can reach a low loss simply by predicting that one color everywhere.

## Successful training

We set the weight ratio to 1:10:15; a more accurate ratio could be derived from per-class pixel statistics:

Here is the test result comparing 4000 training steps with 200 steps:

# References

Refer to aquariusjay's explanation about training parameters:

Refer to aquariusjay's explanation about the imbalance problem: