Vision AI: Our journey teaching tech how to see like humans

Over the past few years, my team and I have been running experiments on vision AI, testing its ability to “see” and recognize objects the way human beings do.

You’re teaching AI too

It’s not just us doing this. Big tech companies have spent years training their own vision AI. If you’re a Google Chrome user, you’ve probably been asked more than once to click on images of traffic lights or bicycles. Facebook has probably asked you to confirm photo tags of you or your friends. Every time you respond to these prompts, you’re helping their AI learn.

By drawing on their users, big tech companies can train on images quickly. However, this method can be costly, and we have used that constraint as a reason to explore an alternative.

AI training is expensive

If I were to train a computer’s neural network to be truly visually “intelligent” - for instance, if I wanted it to recognise a physical product - I believe using the actual objects instead of just pictures would produce much more accurate results.

But I’d need thousands and thousands of visual data points - every shape, angle, and variation of what, say, a box could look like and how it is used. It’s overwhelming to think about how much it costs to acquire, store, and feed data for the AI to consume. I’d need lots of people and machines to handle all the objects I would have to bring in.

Combining familiar tools with unfamiliar methods

To save time and money, and to increase prediction accuracy, we wanted to test the idea of using computer-rendered objects. In other words, we would train our vision AI with 3D objects and environments.

A single render of a physical object can be viewed through a full 360°, generating synthetic data with hundreds of variations for our AI model to consume. With a handful of samples and adjustable variables, we can multiply our datasets and introduce new variables as quickly as required.
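To make the multiplication concrete, here is a minimal sketch of how a few adjustable variables turn one 3D model into hundreds of render settings. It is an illustration only, not the pipeline described in this article: the `RenderConfig` fields, the preset names, and the step sizes are all assumptions chosen for the example.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class RenderConfig:
    """One synthetic view of a 3D object (all fields are hypothetical)."""
    azimuth_deg: int    # camera rotation around the object
    elevation_deg: int  # camera height angle
    lighting: str       # named lighting preset
    background: str     # named background scene


def enumerate_render_configs(azimuth_step=30,
                             elevations=(0, 30, 60),
                             lightings=("daylight", "indoor", "dim"),
                             backgrounds=("counter", "shelf")):
    """Return every combination of the adjustable variables."""
    azimuths = range(0, 360, azimuth_step)  # sweep the full 360 degrees
    return [
        RenderConfig(azimuth, elevation, lighting, background)
        for azimuth, elevation, lighting, background
        in product(azimuths, elevations, lightings, backgrounds)
    ]


configs = enumerate_render_configs()
# 12 azimuths * 3 elevations * 3 lighting presets * 2 backgrounds
print(len(configs))  # 216
```

Adding one more variable, such as a third background, grows the set multiplicatively rather than by one image at a time, which is what makes a handful of samples go so far.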

Having a virtual training environment and automating annotation can speed up validation testing - and with our AI running on the edge, it is cutting our costs dramatically.
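One reason annotation can be automated in a virtual environment is that the position of every object is already known, so labels can be computed rather than drawn by hand. The sketch below shows the general idea with a simple pinhole camera: it projects the corners of a 3D object into the image and takes their extent as the bounding box. The focal length, image size, and function names are assumptions made for illustration, and a real renderer would supply its own camera model.

```python
def project_point(point, focal_px, image_size):
    """Project a 3D point in camera coordinates to pixel coordinates.

    Assumes x points right, y points down, and z points forward.
    """
    x, y, z = point
    if z <= 0:
        raise ValueError("point must be in front of the camera")
    width, height = image_size
    u = focal_px * x / z + width / 2
    v = focal_px * y / z + height / 2
    return u, v


def bounding_box(corners, focal_px=800.0, image_size=(640, 480)):
    """Return (u_min, v_min, u_max, v_max) in pixels, clipped to the image."""
    width, height = image_size
    points = [project_point(c, focal_px, image_size) for c in corners]
    us = [min(max(u, 0.0), float(width)) for u, _ in points]
    vs = [min(max(v, 0.0), float(height)) for _, v in points]
    return min(us), min(vs), max(us), max(vs)


# A hypothetical 0.2 m cube centred 2 m in front of the camera.
cube = [(x, y, z)
        for x in (-0.1, 0.1)
        for y in (-0.1, 0.1)
        for z in (1.9, 2.1)]
print(bounding_box(cube))
```

Because the box comes straight from the scene geometry, every rendered image arrives with its label attached, with no manual tagging step in between.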

Does it work?

So far, it is working for us and we are seeing positive results. We estimate that training a model with a virtual 3D environment and objects is 47% faster than the standard approach. The current model can recognise physical products with an accuracy of more than 83%, and our training continues. Beyond simply producing more data, our challenge is to find a strategy that allows us to run as many experiment cycles as possible.

Is this the best approach? It is hard to say, but others have tried it before. Tesla runs simulation training using a detailed 3D map of the world and data culled from its fleet. One study recommends 3D renders over natural images for teaching view-based OCR. Other research has also used virtual data and shown promising results for large-scale training.

Our own testing has shown promise. We have successfully used 3D synthetic environments to train the ML model for Concept SALi, a smart self-service package shipping kiosk for Australia Post.

From there, we have built Viana, a product by meldCX. It is Vision AI software that allows businesses to gather in-store demographics and content effectiveness data without collecting any identifiable information on individuals. How, you may ask? By training our AI models with synthetic data.

To learn how your business can benefit from using AI, schedule a live demo with one of our product experts.

Originally published as Training Vision AI the unconventional way