Advantages of synthetic over real image datasets
Solve data collection issues
Even when it is possible, collecting real images is in most cases a daunting task, and privacy issues may complicate it further. Procedural generation of synthetic datasets changes this: you create your images in a few clicks and avoid privacy issues altogether.
Perfect labeling
Manual labeling of real images is slow and costly, and QA processes are required to filter out abnormal labels. With synthetic datasets, labels are instant, pixel-perfect and bias-free.
Fast, low-cost and flexible
Our image generation platform is fast and self-service. You can build a synthetic dataset for a fraction of the cost of a real-image dataset. A 3D scene and a fully labeled image matching your use case are produced in seconds. The platform is also highly flexible: you can easily extend your dataset to cover each new edge case throughout your development cycle.
Optimizable
Since you control all the parameters involved in the creation of your synthetic images, you have all the tools to optimize their content. You can easily adjust the variance and distribution of every parameter that matters for your use case. This is key to training an efficient, high-performance model that generalizes well.
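As a rough illustration of what controlling variance and distribution can mean in practice, here is a minimal sketch that draws scene parameters from distributions you choose. The parameter names, ranges and the `sample_scene_params` helper are assumptions made for this example; they are not the platform's actual API.

```python
import random


def sample_scene_params(rng, style_weights):
    """Draw one hypothetical scene configuration from user-chosen distributions."""
    styles = list(style_weights)
    weights = [style_weights[s] for s in styles]
    return {
        # Uniform: every camera height in the range is equally likely.
        "camera_height_m": rng.uniform(0.5, 2.5),
        # Gaussian centered on 1.0, clipped so intensity is never negative.
        "light_intensity": max(0.0, rng.gauss(1.0, 0.3)),
        # Categorical: weights set how often each style appears.
        "room_style": rng.choices(styles, weights=weights, k=1)[0],
    }


# A fixed seed makes the dataset recipe reproducible.
rng = random.Random(42)
# Hypothetical style mix; changing these weights reshapes the dataset.
style_weights = {"modern": 0.5, "rustic": 0.3, "industrial": 0.2}
scenes = [sample_scene_params(rng, style_weights) for _ in range(1000)]
```

Widening a range or reweighting a category changes the distribution of the whole dataset in one place, which is much harder to do with a fixed collection of real photographs.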
MAKE YOUR DEVICES HAPPY! FEED THEM DATA!
Are synthetic datasets efficient for training your models?
To evaluate how efficiently synthetic datasets train a model, we ran a series of benchmarks comparing training on synthetic images against training on real images (COCO dataset https://cocodataset.org/#explore). As of today, results are available for 2 models (YOLOv5 and Mask R-CNN) and 3 detection tasks of increasing difficulty (sofas, beds and potted plants). We ran these first tests with a limited number of assets (1000 assets in our database at the time). Our conclusions are the following:
The domain gap between training sets and validation sets or live images is not exclusive to synthetic datasets. It is a general issue that also arises between one set of real images and another.
In fact, synthetic images are generally more efficient than real images for training models. This might seem counterintuitive because synthetic images are less realistic than real images.
However, because of the domain gap, image realism is not the key to training a model. The variance and distribution of the parameters are the crucial factors in obtaining a model that generalizes well.
Variance and distribution of parameters are not easily controllable with real images.
Models may be successfully pre-trained on synthetic images and fine-tuned on real images or the other way round. It depends on the task and on the model.
Benchmarking results
You can now implement your various use cases in a few clicks. Building your dataset is now easy, low-cost and fast. No more headaches.
01.
Real-image training datasets were extracted from MS COCO ( https://cocodataset.org ) for each class of interest. We obtained 3682 images containing the label “bed”, 4618 images containing the label “couch” and 4624 images containing the label “potted plant”.
02.
For each test, we used our procedural engine to generate a synthetic dataset: 63k synthetic images for “bed” detection, 72k for “couch” and 99k for “potted plant”.
03.
We also used ImageNet ( https://www.image-net.org/download.php ) to pre-train models in several experiments.
04.
Validation datasets were constructed for each class of interest from OpenImage ( https://storage.googleapis.com/openimages/web/index.html ). We extracted 199 images containing the label “bed”, 799 images containing the label “couch” and 1533 images containing the label “plant”.
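Step 01 above, selecting the images that contain a given class, can be sketched as follows. This is a minimal example that assumes annotations in the standard COCO JSON layout (`images`, `annotations`, `categories`); the `images_with_category` helper and the tiny in-memory sample are illustrative only and are not the script used for the benchmark.

```python
def images_with_category(coco, category_name):
    """Return the image records that have at least one annotation of the class."""
    # Resolve the class name to its numeric category id(s).
    cat_ids = {c["id"] for c in coco["categories"] if c["name"] == category_name}
    # Collect each image id once, even if it holds several instances.
    image_ids = {
        a["image_id"] for a in coco["annotations"] if a["category_id"] in cat_ids
    }
    return [img for img in coco["images"] if img["id"] in image_ids]


# Tiny hand-made sample in COCO layout. With a real annotation file you would
# instead load the JSON with json.load() from a path of your choosing.
coco = {
    "images": [
        {"id": 1, "file_name": "000001.jpg"},
        {"id": 2, "file_name": "000002.jpg"},
        {"id": 3, "file_name": "000003.jpg"},
    ],
    "categories": [
        {"id": 63, "name": "couch"},
        {"id": 64, "name": "potted plant"},
        {"id": 65, "name": "bed"},
    ],
    "annotations": [
        {"id": 10, "image_id": 1, "category_id": 65},
        {"id": 11, "image_id": 2, "category_id": 63},
        {"id": 12, "image_id": 3, "category_id": 65},
        {"id": 13, "image_id": 3, "category_id": 64},
        {"id": 14, "image_id": 3, "category_id": 65},
    ],
}

beds = images_with_category(coco, "bed")
print([img["file_name"] for img in beds])
```

The same function, called with “couch” or “potted plant”, yields the other two per-class subsets.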
Bed AI Verse dataset sample

Experiment | Pretraining | Train set | Validation set | AP | AP50 | AP75 | Learning Rate | # Iterations |
---|---|---|---|---|---|---|---|---|
Coco Training | Imagenet | Coco | OpenImage | 44.84 | 82.83 | 44.85 | 0.002/steps | 70k |
AIVerse Training | Imagenet | AIVerse | OpenImage | 52.06 | 82.91 | 51.11 | 0.002/steps | 70k |
AIVerse pre-trained Fine-tuned on Coco | AIVerse | Coco | OpenImage | 51.70 | 84.54 | 52.64 | 0.002/steps | 70k |
Coco pre-trained Fine-tuned on AIVerse | Coco | AIVerse | OpenImage | 57.41 | 87.83 | 59.19 | 0.002/steps | 70k |

Experiment | Pretraining | Train set | Validation set | AP | AP50 | Best among # epochs |
---|---|---|---|---|---|---|
Coco Training | Random | Coco | OpenImage | 53.53 | 81.72 | 300 |
AIVerse Training | AIVerse | AIVerse | OpenImage | 64.49 | 85.98 | 100 |
AIVerse pre-trained Fine-tuned on Coco | AIVerse | Coco | OpenImage | 68.31 | 90.92 | 300 |
Coco pre-trained Fine-tuned on AIVerse | Coco | AIVerse | OpenImage | 62.52 | 85.14 | 100 |
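To make the two bed tables easier to compare, the short sketch below computes each experiment's AP gain over the COCO-only baseline. The AP values are copied from the tables above; the `gains_over_baseline` helper and the table labels are names introduced only for this example.

```python
# AP values copied from the two bed tables above.
bed_table_1 = {
    "Coco Training": 44.84,
    "AIVerse Training": 52.06,
    "AIVerse pre-trained, fine-tuned on Coco": 51.70,
    "Coco pre-trained, fine-tuned on AIVerse": 57.41,
}
bed_table_2 = {
    "Coco Training": 53.53,
    "AIVerse Training": 64.49,
    "AIVerse pre-trained, fine-tuned on Coco": 68.31,
    "Coco pre-trained, fine-tuned on AIVerse": 62.52,
}


def gains_over_baseline(ap_by_experiment, baseline="Coco Training"):
    """AP difference, in points, between each experiment and the baseline."""
    base = ap_by_experiment[baseline]
    return {
        name: round(ap - base, 2)
        for name, ap in ap_by_experiment.items()
        if name != baseline
    }


for label, table in (("bed table 1", bed_table_1), ("bed table 2", bed_table_2)):
    for name, gain in gains_over_baseline(table).items():
        print(f"{label}: {name}: {gain:+.2f} AP")
```

In the first table the largest gain comes from pre-training on COCO and fine-tuning on AIVerse (+12.57 AP). In the second it comes from the reverse order (+14.78 AP). This matches the conclusion that the best ordering depends on the task and on the model.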
“Potted plants” and “Couch” AI Verse dataset sample

Experiment | Pretraining | Train set | Validation set | AP | AP50 | AP75 | Learning Rate | # Iterations |
---|---|---|---|---|---|---|---|---|
Coco Training | Imagenet | Coco | OpenImage | 44.73 | 80.95 | 44.30 | 0.002/steps | 70k |
AIVerse Training | Imagenet | AIVerse | OpenImage | 45.79 | 81.36 | 44.69 | 0.002/steps | 70k |
AIVerse pre-trained Fine-tuned on Coco | AIVerse | Coco | OpenImage | 46.64 | 81.75 | 46.16 | 0.002/steps | 70k |
Coco pre-trained Fine-tuned on AIVerse | Coco | AIVerse | OpenImage | 49.79 | 84.60 | 52.16 | 0.002/steps | 70k |

Experiment | Pretraining | Train set | Validation set | AP | AP50 | Best among # epochs |
---|---|---|---|---|---|---|
Coco Training | Random | Coco | OpenImage | 47.81 | 79.40 | 300 |
AIVerse Training | Random | AIVerse | OpenImage | 47.50 | 78.44 | 100 |
AIVerse pre-trained Fine-tuned on Coco | AIVerse | Coco | OpenImage | 50.86 | 83.83 | 300 |
Coco pre-trained Fine-tuned on AIVerse | Coco | AIVerse | OpenImage | 49.30 | 84.10 | 100 |

Experiment | Pretraining | Train set | Validation set | AP | AP50 | AP75 | Learning Rate | # Iterations |
---|---|---|---|---|---|---|---|---|
Coco Training | Imagenet | Coco | OpenImage | 38.87 | 75.00 | 37.44 | 0.002/steps | 70k |
AIVerse Training | Imagenet | AIVerse | OpenImage | 38.82 | 68.25 | 34.03 | 0.002/steps | 70k |
AIVerse pre-trained Fine-tuned on Coco | AIVerse | Coco | OpenImage | 43.66 | 77.38 | 43.31 | 0.002/steps | 70k |
Coco pre-trained Fine-tuned on AIVerse | Coco | AIVerse | OpenImage | 42.66 | 75.65 | 42.28 | 0.002/steps | 70k |

Experiment | Pretraining | Train set | Validation set | AP | AP50 | Best among # epochs |
---|---|---|---|---|---|---|
Coco Training | Random | Coco | OpenImage | 41.30 | 73.19 | 300 |
AIVerse Training | Random | AIVerse | OpenImage | 42.31 | 69.95 | 100 |
AIVerse pre-trained Fine-tuned on Coco | AIVerse | Coco | OpenImage | 52.19 | 79.83 | 300 |
Coco pre-trained Fine-tuned on AIVerse | Coco | AIVerse | OpenImage | 42.95 | 69.72 | 100 |
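For readers less familiar with the AP50 and AP75 columns in the tables above: assuming they follow the usual COCO evaluation convention, a predicted box counts as a correct detection only if its intersection over union (IoU) with a ground-truth box reaches 0.50 or 0.75 respectively. The sketch below shows that overlap test on two made-up boxes; it is an illustration of the metric, not the evaluation code used for the benchmark.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x_min, y_min, x_max, y_max)."""
    # Corners of the overlapping rectangle, if any.
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])
    # Width and height are clamped at zero when the boxes do not overlap.
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Made-up boxes: the prediction is shifted 2 pixels to the left of the truth.
predicted = (0, 0, 10, 10)
ground_truth = (2, 0, 12, 10)

score = iou(predicted, ground_truth)
matches_at_50 = score >= 0.50  # threshold behind AP50
matches_at_75 = score >= 0.75  # stricter threshold behind AP75
print(f"IoU={score:.3f} AP50 match={matches_at_50} AP75 match={matches_at_75}")
```

The stricter threshold is why AP75 is always at or below AP50 in the tables: a loosely localized detection can pass at 0.50 and still fail at 0.75.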