Uncovering Helmet Detection with YOLOv5 and FiftyOne: A Deep Dive
"It's not just building a model - it's about understanding what it sees."
Introduction
Last month, I attended the AI, ML & Computer Vision Meetup and Visual AI Hackathon in New York, where I had the opportunity to talk with some incredibly sharp minds in the field. One conversation that stuck with me was with Daniel Gural, Machine Learning Engineer at Voxel51, the creators of FiftyOne. Daniel's demo of how FiftyOne could help us see what our models were doing in real-time reframed how I thought about debugging ML models.
That moment led to this project - a custom helmet detection model built from scratch using YOLOv5 and analyzed with FiftyOne's web-based UI.
Dataset and Objective
The dataset I used in this project is the Safety Helmet Detection dataset available on Kaggle. It consists of 5,000 images with three classes:
Helmet
Head
Person
Annotations for this dataset came in PASCAL VOC format, so I converted them to YOLO format for compatibility. Images were split into training (4,400 images) and validation (600 images) sets.
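For reference, here is a minimal sketch of that kind of conversion; the class mapping and folder names are assumptions for illustration, not the exact script from this project:

```python
import glob
import xml.etree.ElementTree as ET

# Assumed class-to-index mapping and folder layout; adjust to your dataset
CLASSES = {"helmet": 0, "head": 1, "person": 2}

def voc_to_yolo(xml_path, out_path):
    root = ET.parse(xml_path).getroot()
    w = float(root.find("size/width").text)
    h = float(root.find("size/height").text)

    lines = []
    for obj in root.findall("object"):
        cls = CLASSES[obj.find("name").text]
        box = obj.find("bndbox")
        xmin, ymin = float(box.find("xmin").text), float(box.find("ymin").text)
        xmax, ymax = float(box.find("xmax").text), float(box.find("ymax").text)
        # YOLO format: class x_center y_center width height, all normalized to [0, 1]
        xc, yc = (xmin + xmax) / 2 / w, (ymin + ymax) / 2 / h
        bw, bh = (xmax - xmin) / w, (ymax - ymin) / h
        lines.append(f"{cls} {xc:.6f} {yc:.6f} {bw:.6f} {bh:.6f}")

    with open(out_path, "w") as f:
        f.write("\n".join(lines))

for xml_file in glob.glob("annotations/*.xml"):
    voc_to_yolo(xml_file, xml_file.replace("annotations", "labels").replace(".xml", ".txt"))
```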
The primary goal was to detect helmets for safety compliance, while also distinguishing them from heads and persons - a subtle but crucial difference in real-world applications in construction safety.
Training YOLOv5
While I could have used a pretrained model for this project, I chose to train the model from scratch. Going this route let me see more clearly what the model was doing and where its predictions went wrong, with the help of FiftyOne's web UI.
My training configuration looked like the following:
Model: `yolov5m.yaml`
Epochs: 200
Image size: 416x416
Batch size: 32
Training strategy: full scratch (`--weights ''`)
Augmentations: YOLOv5 defaults
Rectangular training enabled
200 Epochs later…
Here's what the model achieved:
| Class | Precision | Recall | mAP@0.5 | mAP@0.5:0.95 |
|---|---|---|---|---|
| Helmet | 0.933 | 0.833 | 0.945 | 0.603 |
| Head | 0.887 | 0.792 | 0.854 | 0.542 |
| Person | 0.000 | 0.000 | 0.000 | 0.000 |
| All | 0.607 | 0.558 | 0.600 | 0.382 |
The model performed well on the helmet and head classes but struggled with the person class, which only had ~80 instances. That imbalance led to low recall and confidence for the class, something FiftyOne made very easy to see once I started inspecting the results.
Debugging with FiftyOne
This is where FiftyOne's web UI truly shone. Using FiftyOne, I was able to:
Visually inspect the predictions of the model with the ground truth labels
Filter by class and confidence to identify underperforming scenarios
Compare predictions vs. ground truth for every image in a seamless UI
To load your dataset, simply run the following code in your Jupyter notebook:
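A minimal version looks like this, assuming the data is laid out in YOLOv5 format with a dataset YAML; the directory and dataset names are placeholders:

```python
import fiftyone as fo

# Load the validation split of a YOLOv5-formatted dataset
# (dataset_dir and split are assumptions about your folder layout)
dataset = fo.Dataset.from_dir(
    dataset_dir="path/to/helmet-dataset",
    dataset_type=fo.types.YOLOv5Dataset,
    split="val",
    name="helmet-detection",
)

# Launch the FiftyOne web UI
session = fo.launch_app(dataset)
```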
Once your dataset is loaded into FiftyOne, you can add your predictions from YOLOv5. Alternatively, you can train and evaluate a model directly from FiftyOne; its Model Zoo offers a plethora of models to choose from.
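Here is one way to attach YOLOv5 predictions to each sample, loading the trained checkpoint through torch.hub; the weights path and the predictions field name are assumptions for illustration:

```python
import torch
import fiftyone as fo

# Load the trained checkpoint via torch.hub (path is a placeholder)
model = torch.hub.load("ultralytics/yolov5", "custom", path="runs/train/exp/weights/best.pt")

for sample in dataset:
    results = model(sample.filepath)
    detections = []
    # xyxyn holds normalized [x1, y1, x2, y2, conf, class] per detection
    for x1, y1, x2, y2, conf, cls in results.xyxyn[0].tolist():
        detections.append(
            fo.Detection(
                label=results.names[int(cls)],
                # FiftyOne expects [top-left-x, top-left-y, width, height] in [0, 1]
                bounding_box=[x1, y1, x2 - x1, y2 - y1],
                confidence=conf,
            )
        )
    sample["predictions"] = fo.Detections(detections=detections)
    sample.save()
```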
The final step is to run the evaluation and check the results.
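A minimal sketch of that step uses FiftyOne's built-in detection evaluation; `val_view` (used below) is assumed here to be a simple confidence-filtered view, which may differ from the exact view used in the project:

```python
from fiftyone import ViewField as F

# Evaluate predictions against the ground truth labels
results = dataset.evaluate_detections(
    "predictions",
    gt_field="ground_truth",
    eval_key="eval",
    compute_mAP=True,
)
print("mAP:", results.mAP())

# Convenience view: keep only reasonably confident predictions
val_view = dataset.filter_labels("predictions", F("confidence") > 0.3)
```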
Now, just refresh your browser (or run `session.view = val_view` in your notebook if you're viewing the app there) and you should see something like this:

On the left, you can see the various filter options offered by FiftyOne. You can also view images that have high- or low-confidence labels.
Expanding the `ground_truth` tab, selecting `person` from it, and picking a random image, I can see the images my model failed to classify. FiftyOne also provides macro-level control over the selected image: you can tweak the confidence threshold and see how well your model performed, all in real time and without any data being logged to Voxel51's end.
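For reference, the same filter can also be built programmatically (the class name is taken from the dataset as described above):

```python
from fiftyone import ViewField as F

# Show only samples whose ground truth contains a "person" box
person_view = dataset.filter_labels("ground_truth", F("label") == "person")
session.view = person_view
```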

You can choose exactly what you want to see in order to draw insights from your run.
Insights
After spending a healthy amount of time scrolling around and comparing my model's predicted bounding boxes with the ground truth labels, I found that:
Most of the helmet detections were spot on. There were some low-confidence detections of the label in ambiguous places.
The `person` class failed silently, with no confident predictions. FiftyOne made this immediately obvious with its filterable tags and confidence sliders.
What did I learn?
Training from scratch gave me deep control over model behavior, but required longer training.
FiftyOne was crucial in helping me not just see the numbers but understand why my model behaved the way it did.
Debugging in computer vision is no longer a pain - tools like FiftyOne make it clear, interactive, and fun.
My next steps for this project will be to tinker with the model further to improve person detection. I would also like to dive deeper into FiftyOne - seeing your data and drawing inferences from it is just the tip of the iceberg. FiftyOne is quite versatile, and honing this skill will be crucial in computer vision applications. Once I'm comfortable, I might even contribute to FiftyOne!
Final Thoughts
This project was meaningful not only technically, but personally — sparked by a real conversation at a local meetup with someone passionate about computer vision. That moment turned into a real-world project that I hope shows the value of building tools you can trust and models you can see working.
If you would like to view the complete project, it is available on my GitHub.