boclog[2] - Image annotations and Dataset versioning using Weights & Biases


Annotating a dataset is one of the most challenging parts of a project, but it's insanely satisfying when those annotations help your object detection model perform better.

Initially I was going to make this an image classifier, but a classifier seemed sort of boring, plus everyone's doing it, so why not spice things up a bit.

Image annotations

makesense (an image labeling tool) supports exporting image annotations in YOLO format, which I intend to use to train a YOLO object detection model. If, like me, you have terrible upload speeds, you can set it up locally using the following steps from their GitHub repo.

# clone repository
git clone https://github.com/SkalskiP/make-sense.git

# navigate to the main dir
cd make-sense

# install dependencies
npm install

# serve with hot reload at localhost:3000
npm start

Visiting localhost:3000 opens the editor's landing page in the browser.


Simply click on Get Started, select the images you want to annotate, create the labels, and start annotating.

(Screenshots: image upload, label creation)

When done annotating, click on Export Labels and choose a zip package with YOLO format. This should download the zipped labels to a directory of your choosing.
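For reference, each line in a YOLO-format label file describes one bounding box as `class_id x_center y_center width height`, with the coordinates normalized to the image size. A tiny parser makes the format concrete (the sample line and values below are made up for illustration):

```python
def parse_yolo_line(line):
    """Parse one line of a YOLO label file into (class_id, x, y, w, h).

    Coordinates are floats normalized to [0, 1] relative to the
    image's width and height; class_id is a zero-based integer.
    """
    parts = line.split()
    class_id = int(parts[0])
    x, y, w, h = (float(v) for v in parts[1:5])
    return class_id, x, y, w, h

# made-up label line: class 2, box centered at (0.5, 0.4),
# a quarter of the image wide and 30% tall
box = parse_yolo_line("2 0.5 0.4 0.25 0.3")
print(box)  # (2, 0.5, 0.4, 0.25, 0.3)
```

This is why the labels folder pairs one `.txt` file with each image: the file name matches the image name, and images with nothing to detect simply have no label file.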

Splitting the data into training and validation

Once the labels are downloaded, unzip them into the same directory where the images are stored. Also, make sure to add in extra images with no labels; e.g., in my case, I added in a few healthy plant images.

NB: there are two folders, one with the images and one with the labels; both live in the same parent folder, let's call it dataset.

Splitting the data

We can use the script below to split the data into two sets: a training set and a validation set.

import os
import random
import shutil

imgList = os.listdir('images')

# shuffle the images so the split is random
random.shuffle(imgList)

split = 0.2

train_path = 'custom_dataset/train'
val_path = 'custom_dataset/val'

# create the output folders if they don't already exist
os.makedirs(train_path, exist_ok=True)
os.makedirs(val_path, exist_ok=True)

imgLen = len(imgList)
print("Images in total:", imgLen)

train_images = imgList[: int(imgLen - (imgLen * split))]
val_images = imgList[int(imgLen - (imgLen * split)):]
print("Training images:", len(train_images))
print("Validation images:", len(val_images))

def copy_set(images, target_dir):
    for imgName in images:
        # copy the image
        shutil.copyfile(os.path.join('images', imgName),
                        os.path.join(target_dir, imgName))

        # copy the matching YOLO label file if one exists
        # (images without objects, e.g. healthy plants, have no label file)
        txtName = os.path.splitext(imgName)[0] + '.txt'
        try:
            shutil.copyfile(os.path.join('labels', txtName),
                            os.path.join(target_dir, txtName))
        except FileNotFoundError:
            pass

copy_set(train_images, train_path)
copy_set(val_images, val_path)

print("Done!")

The above code sets aside 20 percent of the images for validation (feel free to use a different percentage) and copies the images and their labels to the respective folders.
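To sanity-check the split arithmetic: the slice boundary is `int(imgLen - (imgLen * split))`, so with a hypothetical 100 images and `split = 0.2` the boundary lands at index 80, giving an 80/20 split:

```python
imgLen = 100  # hypothetical image count, just for illustration
split = 0.2

# same boundary expression as the script above
boundary = int(imgLen - (imgLen * split))

print("training:", boundary)            # 80
print("validation:", imgLen - boundary) # 20
```

Note that `int()` truncates, so with counts that don't divide evenly (say 103 images) the training set gets the leftover image rather than the validation set.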

Data versioning using Weights & Biases

I like to think of Weights & Biases data versioning as kind of like Git, but for my data; you can read more about it on their website.

  • YOLOv5 requires a YAML file with data paths, class names, and the number of classes. This is my YAML file:
train: ~/data/boc/train/
val: ~/data/boc/val/
# Classes
nc: 8 # number of classes
names: [
    'Apple Scab',
    'Apple Cedar Rust',
    'Apple Frogeye Spot',
    'Maize Gray Leaf Spot',
    'Maize Leaf Blight',
    'Potato Blight',
    'Tomato Bacteria Spot',
    'Tomato Blight',
] # class names
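Since nc has to match the length of the names list, and it's easy to let them drift apart when adding or dropping classes, a quick check like this (with the values copied from the YAML above) can save a failed training run:

```python
# values copied from the YAML file above
nc = 8
names = [
    'Apple Scab', 'Apple Cedar Rust', 'Apple Frogeye Spot',
    'Maize Gray Leaf Spot', 'Maize Leaf Blight', 'Potato Blight',
    'Tomato Bacteria Spot', 'Tomato Blight',
]

# nc must equal the number of class names, or YOLOv5 will complain
assert nc == len(names), f"nc={nc} but {len(names)} class names listed"
print("class count OK:", nc)
```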

I initially had more classes, but after testing with a couple of training runs I reduced them; I intend to increase the number again in the next iterations.

  • Install Weights & Biases and login
    pip install wandb
    # login
    wandb login
  • Upload the dataset to W&B as an artifact
    python yolov5/utils/loggers/wandb/log_dataset.py --project <project_name> --data labels.yaml
    The above command uploads the dataset into a W&B project with the specified name; the data paths come from the YAML file created earlier.

Once the upload is done, a new YAML file with _wandb appended to its name is created in the working folder; this is the file to use when training the model. Passing it as the data path on a cloud VM or Colab will automatically pull the data from W&B and start training.
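I haven't reproduced my generated file here, but the _wandb YAML essentially swaps the local train/val paths for W&B artifact references while keeping the class information, roughly along these lines (the exact artifact path format and names below are assumptions; check your generated file for the real values):

```yaml
# sketch of a generated labels_wandb.yaml (assumed layout, not copied verbatim)
train: wandb-artifact://<entity>/<project_name>/train_dataset:latest
val: wandb-artifact://<entity>/<project_name>/val_dataset:latest
nc: 8
names: ['Apple Scab', 'Apple Cedar Rust', 'Apple Frogeye Spot',
        'Maize Gray Leaf Spot', 'Maize Leaf Blight', 'Potato Blight',
        'Tomato Bacteria Spot', 'Tomato Blight']
```

The artifact references are what let a fresh VM or Colab runtime pull the dataset down without any manual copying.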

Following the link to the dataset artifact opens the W&B project where the dataset is stored; it can be browsed as a table showing the images, classes, and image names.


These tables can be used to visualize and query the data interactively from the browser.

Now, all that's left is to train the model and get a performance baseline. That's going to be in the next log.