Tagging.tech presents an audio interview with Georgi Kadrev about image recognition
Georgi Kadrev: Hi, Henrik. All good. I am quite enthusiastic to participate in the podcast.
Henrik: Georgi, who are you and what do you do?
Georgi: I’m Co‑Founder and CEO of Imagga, which is one of the pretty good platforms for image recognition as a service. We have auto‑tagging and auto‑categorization services that you can use for practical use cases.
Henrik: Georgi, what are the biggest challenges and successes you’ve seen with image recognition?
Georgi: In terms of challenges, I think one of the biggest ones is that we, as human beings, are used to perceiving a lot of our world through our eyes. Basically, when people think about image recognition in general, they have a very diverse and very complete picture of what it should do.
Let’s say from optical character recognition, or recognizing text, to facial recognition of a particular person, to conceptual tagging, to categorization, all these different aspects of visual perception.
People typically expect that it’s the same technology or the same solution, but actually quite a lot of different approaches need to be engaged in the actual process of recognizing and understanding the semantics of the image.
In terms of successes in addressing this, I can say that, not surprisingly, deep learning, which has been quite a big hype in the last few years, has been a huge success in the more conceptual or class‑level object recognition, that is, recognizing what type of object something is.
Is it a bottle? Is it a dog? Is it a cat? Is it a computer? Is it a mobile phone? And so on. This has become pretty practical, and right now we can say that we are close to human level in the recognition of a lot of different classes of objects.
At the same time, in some other spaces, like logo recognition and facial recognition, we also see quite high precision rates that allow for practical applications.
I can say one of the good things is that we are getting closer and closer to automating at least part of the tasks by having them performed by a computer, replacing the need for manual annotation of photos for different use cases.
In terms of challenges, maybe I would also add that you still need a lot of data, properly annotated data. Machine learning, and deep learning in general, is very data‑greedy, so we need an enormous number of samples to really make something robust enough and practical enough.
We still see that gathering a high‑quality dataset is one of the challenges. This is something that we also try to address internally. It helps us be more competitive in terms of quality and the technology.
Henrik: As of March 2016, how do you see image recognition changing?
Georgi: What we definitely see is that there are more and more services, some of them of pretty good quality, that try to automate the different aspects of image recognition that I briefly touched on.
We see even big players like Google starting to offer services for some of those things, like what they call label recognition and we call tagging, or optical character recognition, which is what most of the vendors call it.
We also have seen logo and facial recognition being quite popular and being utilized more and more in different kinds of brand monitoring services.
At the same time, as a bit of a downside of visual recognition, something that we see when we talk about highly artistic images, or some more specific art or other types of specific content, is that the technology still needs to be custom‑trained for that,
if it is possible at all to train a classification‑based image recognition to recognize different kinds of artistic images or different kinds of very specialized image content.
It’s related to what I mentioned in the beginning, that if you have a specific task, sometimes you need a specific approach. Deep learning has addressed this to a certain extent, but it’s still not a one-size-fits-all solution. We see that in a lot of cases the customers need to define a specific problem so that they can have a very good and precise specific solution.
Henrik: As of March 2016, how much of image recognition is completed by humans versus machines?
Georgi: I would say, [laughs] it honestly depends on the task. We’ve seen some cases where machines can be better than humans, and not just in theory but in practice.
For example, if we train a custom classifier with a human‑curated data set, and then we do some kind of testing or validation, we see that the things that are reported as errors in the learning process can actually be errors made by the people.
The person was mistaken when annotating the photo, so the machine’s answer is falsely reported as an error although it’s correct. In a way, this is promising, because it shows that the automation and consistency that machines can achieve is pretty good in terms of precision.
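The effect described here, where an apparent classifier error is really a mistake in the human label, can be checked with a simple audit. The sketch below is only an illustration and not Imagga's actual tooling: the record fields (`image_id`, `human_label`, `predicted`, `confidence`) and the 0.9 threshold are assumptions made for the example.

```python
def flag_for_review(samples, min_confidence=0.9):
    """Return ids of images where a confident prediction disagrees with the human label.

    Each sample is a dict with the hypothetical keys
    'image_id', 'human_label', 'predicted' and 'confidence'.
    A disagreement is not proof that the human was wrong; it only marks
    the image as worth a second look by an annotator.
    """
    flagged = []
    for sample in samples:
        disagrees = sample["predicted"] != sample["human_label"]
        confident = sample["confidence"] >= min_confidence
        if disagrees and confident:
            flagged.append(sample["image_id"])
    return flagged
```

Images flagged this way would go back to a person for re-checking, so that genuine annotation mistakes are not counted against the classifier.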
At the same time, there are tasks where you need a lot of explicit or implicit knowledge in order to automate them. A lot of the background knowledge that people have is not available to the machine, and then you need to figure out a way to either automate this or use a combination of a computer and a human, or you can decide to keep it as a fully human task.
Such a task is still not approachable by a technical approach. I cannot give an exact number. Something interesting that I can share is a statistic. We did a pretty interesting experiment called Clash of Tags. We have a data set of stock photography. This stock photography has an existing set of tags provided by various people, like the stock photographers themselves.
Then we also have the same set of stock photos annotated using our current technology, completely independently of the original tags that people have put on the images. Then, we do this thing, we ask people, “Type a keyword and then you get search results.”
Which set of results appears on the left‑hand side and which on the right‑hand side is not known in advance, but one set of results is based on the tags that people have put, and the other set is based on the tags that our API has generated and assigned to the images.
The user needs to pick which is the winning set. In a lot of cases, I can say in roughly 45 percent of the cases, people have chosen the result set based on automatically generated tags as better than the set of results based on human‑provided tags. It’s not more than 50, but it still means that in a lot of cases the machine has been superior to human performance.
I believe this number will grow in the future. I can say there’s still a way to go to something like complete automation, but we are getting closer and closer and we’re enthusiastic about it.
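The blind comparison behind the Clash of Tags figure can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the experiment's real code: the function names, the "human"/"machine" source labels and the vote tally are all hypothetical.

```python
import random
from collections import Counter

def present_pair(human_results, machine_results, rng):
    """Place the two result sets on random sides so the voter cannot tell their source."""
    sides = [("human", human_results), ("machine", machine_results)]
    rng.shuffle(sides)
    return {"left": sides[0], "right": sides[1]}

def record_vote(pair, chosen_side, tally):
    """Credit the source (human or machine tags) behind the side the voter picked."""
    source, _results = pair[chosen_side]
    tally[source] += 1

def machine_win_rate(tally):
    """Fraction of votes won by the machine-generated tags (0.0 if there are no votes)."""
    total = tally["human"] + tally["machine"]
    return tally["machine"] / total if total else 0.0

# Example round with made-up search results for one keyword.
rng = random.Random(0)
tally = Counter()
pair = present_pair(["photo_1", "photo_2"], ["photo_2", "photo_3"], rng)
record_vote(pair, "left", tally)
```

With a tally of 45 machine wins out of 100 votes, `machine_win_rate` would return 0.45, which corresponds to the rough share quoted in the interview.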
Henrik: Georgi, what advice would you like to share with people looking into image recognition?
Georgi: I would say, have a very clear idea of what kind of value you want to derive out of that and try to optimize for that, either working on it yourself or with a vendor. Make it really clear what your objectives are and what your objections about image recognition are. Just think from the practical aspect.
This is something that I personally, and our whole team, have always been stressing. Let’s see what it does, what it can do and what it can’t, and address whether there’s really a pain that can be solved right now or not. Also, from the vendor side, I would suggest: don’t over‑promise, because it’s quite easy to get people a bit confused.
They have an expectation like, “It’s AI so it can do anything?”, but you need to be realistic, so you save your time and you save your potential customer’s time. If the use case is very clear and if you, as a professional, can commit that this is going to work out, then go for it. Other than that, don’t waste time, yours and your potential customers’.
This is something that we saw a lot, because a lot of people ask about features that currently are not technically practical enough, or they ask about features that we don’t have. We learned the hard way, to a certain extent, to say, “This is possible, this is not possible currently from our perspective.”
Henrik: Where can we find more information about image recognition?
Georgi: Depending on what you need. Do you need more data for training, or do you need more basic knowledge, or do you need different kinds of inspiration about business applications? There are different sources.
Obviously, ImageNet and all the accompanying information and the algorithms that we have around this pretty nice dataset are quite useful for researchers. For beginners in image recognition, we also have this whole set of Coursera courses.
One of the most notable ones is from Stanford University, and there are a few more pretty good ones from most of the top European or American universities. We have different kinds of newsletters and digests. AI Weekly is pretty good inspiration‑wise. There is some mixture of research topics, business cases, cool hacks and ideas about what you can do with image recognition.
Henrik: Well, thanks, Georgi.
Georgi: Thanks a lot, Henrik. I hope your audience will enjoy the podcast, including our participation in it.
Henrik: For more on this, visit Tagging.tech.
For a book about this, visit keywordingnow.com