Henrik de Gyor: This is Tagging.tech. I am Henrik de Gyor. Today, I’m speaking with Avinash Jain. Avinash, how are you?
Avinash Jain: Good, how are you doing?
Henrik: Good. Avinash, who are you and what do you do?
Avinash: As you said, my name is Avinash Jain. I am the Founder and CEO of KeyIndia Graphics. We provide keywording, retouching, and video-editing services for clients all over the world.
Henrik: Avinash, what are the biggest challenges and successes you’ve seen with keywording services?
Avinash: We’ve been doing this for 14 years. For most of that time, the keywording services domain stuck to the old way of doing everything manually, but in the last four or five years we’ve seen a lot of good services come on board with technologies that really help us expedite keywording.
The challenge in the initial stages, when I started this business, was to train people to high-quality standards, and that challenge has stayed the same for the last 10 or 12 years. As for successes: if you have the right people doing the keywording for you, then you are in good shape. If not, the keywords can be all over the place. A good keyword is like your best salesperson.
If you don’t have good keywords, or you have too many, searches can go all over the place. Now, with tagging services coming on board that we can use in our internal workflow, our people are really able to expedite the entire process.
Henrik: Avinash, as of March 2016, how much of the keywording work is completed by people versus machines?
Avinash: It is still a hybrid solution. We haven’t found a solution where machines alone can do the job, because most clients require high-quality work. The machines we’ve tried in the past do probably 50 percent of the job.
Somebody still has to go in, look at the results, delete the bad keywords, and add proper ones. I would say 60–65 percent of the time, keywords that come out of a machine need to be rechecked by experts.
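A hybrid workflow like the one Avinash describes is often implemented as a simple confidence triage: high-confidence machine tags are accepted automatically, and the rest go to a human expert. The sketch below is illustrative only; the filenames, tags, scores, and the 0.85 threshold are all invented, not KeyIndia’s actual pipeline.

```python
# Illustrative sketch of a hybrid keywording workflow. All filenames, tags,
# confidence scores, and the threshold are invented for this example.
CONFIDENCE_KEEP = 0.85  # assumed cut-off; a real service would tune this per client

machine_output = {
    "photo_001.jpg": [("beach", 0.97), ("sand", 0.91), ("camel", 0.40)],
    "photo_002.jpg": [("office", 0.88), ("meeting", 0.62)],
}

def triage(tag_scores, threshold=CONFIDENCE_KEEP):
    """Split machine-suggested tags into auto-accepted and human-review lists."""
    keep = [tag for tag, score in tag_scores if score >= threshold]
    review = [tag for tag, score in tag_scores if score < threshold]
    return keep, review

review_queue = {}
for filename, tag_scores in machine_output.items():
    keep, review = triage(tag_scores)
    if review:
        review_queue[filename] = review  # routed to an expert keyworder
```

The split mirrors the 50/50 effort Avinash describes: the machine handles the easy half, and everything below the threshold lands in the expert’s queue.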
Henrik: Avinash, what advice would you like to share with people looking into keywording services?
Avinash: The first important thing is to identify your internal workflow and who you are going to submit these images to. If it is, for example, Getty Images, then you want to keyword in their style. Every stock agency has its own guidelines that it wants used in keywording.
First, identify what kind of keywords you are looking for, then make a road map or guideline for what keywords are required in your system. Second, decide how you want the keywords delivered: in an Excel sheet? Embedded as IPTC metadata? Entered directly on your own site?
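Those delivery formats are concrete choices. As a rough sketch, assuming made-up filenames and keywords, a spreadsheet deliverable can be produced with Python’s standard `csv` module, while embedding into the IPTC Keywords field is usually done with a dedicated tool such as ExifTool:

```python
import csv

# Hypothetical example: the filenames and keywords are invented for illustration.
keyworded = [
    {"filename": "beach_sunset_001.jpg",
     "keywords": ["beach", "sunset", "ocean", "travel", "nobody"]},
    {"filename": "office_meeting_014.jpg",
     "keywords": ["business", "meeting", "teamwork", "office"]},
]

# Delivery format 1: a CSV the client can open as a spreadsheet in Excel.
with open("keywords.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["filename", "keywords"])
    for row in keyworded:
        # Many agencies expect a single comma-separated keyword field per image.
        writer.writerow([row["filename"], ", ".join(row["keywords"])])

# Delivery format 2: embedding into the file's IPTC Keywords field is
# typically done with a tool like ExifTool, e.g.:
#   exiftool -IPTC:Keywords+=beach -IPTC:Keywords+=sunset beach_sunset_001.jpg
```

Which format to use is exactly the decision Avinash says to make up front, since converting 20,000 images from one delivery format to another later is avoidable work.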
Once all these initial challenges are taken care of, it becomes much easier for any keywording company or service to go in and start keywording. Standardization has to happen well before the actual keywording starts.
You don’t want to keyword to one standard and then, 10,000 or 20,000 images down the road, realize you’ve been using the wrong one. The initial guideline has to be set correctly.
Henrik: Avinash, where can we find more information about keywording services?
Avinash: You could go to our site, keyindiagraphics.com, but if you Google keywording, keywording services, keywording tags, or keywording IPTC metadata, you’ll find quite a bit of information online. It’s very easy to find. Anybody starting out keywording images should do that research online first.
Henrik: Well, thanks Avinash.
Avinash: Well, thank you. Thanks for the interview.
Henrik de Gyor: This is Tagging.tech. I’m Henrik de Gyor. Today, I’m speaking with Georgi Kadrev. Georgi, how are you?
Georgi Kadrev: Hi, Henrik. All good. I am quite enthusiastic to participate in the podcast.
Henrik: Georgi, who are you and what do you do?
Georgi: I’m Co‑Founder and CEO of Imagga, which is one of the pretty good platforms for image recognition as a service. We have auto‑tagging and auto‑categorization services that you can use for practical use cases.
Henrik: Georgi, what are the biggest challenges and successes you’ve seen with image recognition?
Georgi: In terms of challenges, I think one of the biggest is that we, as human beings, are used to perceiving a lot of our world through our eyes. So when people think about image recognition in general, they have a very diverse and very complete picture of what it should do.
Let’s say from optical character recognition, or recognizing text, to facial recognition of a particular person, to conceptual tagging, to categorization: all these different aspects of visual perception.
People typically expect that it’s all the same technology or the same solution, but actually quite a lot of different approaches need to be brought into the process of recognizing and understanding the semantics of an image.
In terms of successes in addressing this, I can say that, not surprisingly, deep learning, which has been quite a big hype in the last few years, has been a huge success for the more conceptual or class-level object recognition: identifying what type of object something is.
Is it a bottle? Is it a dog? Is it a cat? Is it a computer? Is it a mobile phone? And so on. This has become pretty practical, and right now we can say that we are close to human level in recognizing a lot of different classes of objects.
At the same time, in some other areas, like logo recognition and facial recognition, we also see quite high recognition rates that allow for practical applications.
I can say one of the good things is that we are closer and closer to automating at least part of these tasks with a computer, replacing the need for manual annotation of photos for different use cases.
In terms of challenges, I would also add that you still need a lot of data, properly annotated data. Machine learning, and deep learning in particular, is very data-greedy, so you need an enormous number of samples to make something robust enough and practical enough.
Gathering a high-quality dataset is still one of the challenges. It is something we try to address internally, and it helps us be more competitive in terms of quality and technology.
Henrik: As of March 2016, how do you see image recognition changing?
Georgi: What we definitely see is that there are more and more services, some of them of pretty good quality, that try to automate the different aspects of image recognition I briefly touched on.
We see even big players like Google starting to offer services for some of these things: what they call label recognition and we call tagging, or optical character recognition, as most vendors call it.
We also have seen logo and facial recognition being quite popular and being utilized more and more in different kinds of brand monitoring services.
At the same time, on the downside of visual recognition, when we talk about highly artistic images, more specific art, or other types of specialized content, the technology still needs to be custom-trained for that, if it is possible at all to train classification-based image recognition on different kinds of artistic or very specialized image content.
It relates to what I mentioned at the beginning: if you have a specific task, sometimes you need a specific approach. Deep learning has addressed this to a certain extent, but it’s still not a one-size-fits-all solution. In a lot of cases, we see that customers need to define a specific problem so that they can get a very good and precise specific solution.
Henrik: As of March 2016, how much of image recognition is completed by humans versus machines?
Georgi: I would say, [laughs] honestly, it depends on the task. We’ve seen cases where machines can be better than humans, and not just in theory, in practice.
For example, if we train a custom classifier on a human-curated dataset and then do some testing or validation, we see that some of the things reported as errors in the learning process are actually errors made by the people.
A person mislabeled the photo, so a correct machine prediction gets falsely reported as an error. In a way, this is promising, because it shows that the automation and consistency machines provide can be pretty good in terms of precision.
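The effect Georgi describes, validation “errors” that are really annotation mistakes, can be surfaced with a simple disagreement check. Everything below is an invented toy example, not Imagga’s actual data or pipeline:

```python
# Toy example: find validation samples where the model and the human label
# disagree. Labels and predictions here are invented for illustration.
labels      = ["dog", "cat", "dog", "bird", "cat"]
predictions = ["dog", "cat", "cat", "dog",  "cat"]

def disagreements(labels, predictions):
    """Indices where model output differs from the human annotation.
    Each one is either a model error OR a mislabeled sample, so these
    are exactly the cases worth sending to an expert for re-review
    rather than counting blindly as model failures."""
    return [i for i, (y, p) in enumerate(zip(labels, predictions)) if y != p]

to_review = disagreements(labels, predictions)  # indices 2 and 3 disagree
```

Re-reviewing only the disagreements is far cheaper than re-checking every label, which is why this pattern shows up in dataset-cleaning workflows.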
At the same time, there are tasks that require a lot of explicit or implicit knowledge to resolve. A lot of background knowledge that people have is not available to the machine, and then you need to figure out how to either automate this, use a combination of computer and human, or treat it as a fully human task.
Some tasks are still not approachable by a technical approach, so I cannot give an exact number. Something interesting I can share is a statistic from a pretty interesting experiment we did called Clash of Tags. We have a dataset of stock photography with an existing set of tags provided by various people, such as the stock photographers themselves.
Then we have the same set of stock photos annotated using our current technology, completely blind to the original tags people had put on the images. We ask people to type a keyword, and they get two sets of search results.
Which side each set appears on, left or right, is not known in advance, but one set of results is based on the tags people provided, and the other is based on the tags our API generated and assigned to the images.
The user needs to pick the winning set. In roughly 45 percent of cases, people chose the result set based on the automatically generated tags as better than the one based on the human-provided tags. It’s not more than 50, but it still means that in a lot of cases the machine was superior to human performance.
I believe this number will grow in the future. There is still a way to go to something like complete automation, but we are getting closer and closer, and we’re enthusiastic about it.
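The Clash of Tags setup Georgi describes can be sketched as a blind side-by-side preference test. This is a minimal illustration under invented data and a toy judging rule, not the actual experiment code:

```python
import random

# Minimal sketch of a "Clash of Tags"-style blind comparison. The images,
# queries, and both tag sets are invented for illustration.
human_tags = {
    "img1": {"beach", "sunset", "vacation"},
    "img2": {"dog", "pet", "grass"},
    "img3": {"city", "night", "lights"},
}
machine_tags = {
    "img1": {"beach", "sea", "sky", "sunset"},
    "img2": {"dog", "animal", "outdoor"},
    "img3": {"architecture", "building"},
}

def search(query, tag_index):
    """Return the images whose tag set contains the query keyword."""
    return {img for img, tags in tag_index.items() if query in tags}

def run_trial(query, judge, rng):
    """Show both result sets in a random left/right order and return
    which source ("human" or "machine") the judge picked."""
    sides = [("human", search(query, human_tags)),
             ("machine", search(query, machine_tags))]
    rng.shuffle(sides)  # blind: the judge doesn't know which side is which
    winner_index = judge(sides[0][1], sides[1][1])  # 0 = left, 1 = right
    return sides[winner_index][0]

# Toy judge standing in for a real person: prefers the larger result set.
prefer_bigger = lambda left, right: 0 if len(left) >= len(right) else 1

rng = random.Random(42)
picks = [run_trial(q, prefer_bigger, rng) for q in ["beach", "dog", "night"]]
machine_share = picks.count("machine") / len(picks)
```

With real human judges instead of the toy rule, `machine_share` is the ~45 percent figure quoted in the interview.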
Henrik: Georgi, what advice would you like to share with people looking into image recognition?
Georgi: I would say, have a very clear idea of what kind of value you want to derive from it and try to optimize for that, whether you are working on it yourself or with a vendor. Make it really clear what your objectives are for image recognition. Think from the practical aspect.
This is something that I personally, and our whole team, have always stressed: look at what the technology does, what it can and can’t do, and whether there is really a pain that can be solved right now or not. From the vendor side, I would suggest: don’t over-promise, because it’s quite easy to leave people confused.
They have an expectation like, “It’s AI, so it can do anything?” But you need to be realistic, so you save your time and your potential customer’s time. If the use case is very clear and you can commit that it’s going to work out, then go for it. Otherwise, don’t waste time, yours or your potential customer’s.
This is something we saw a lot, because many people ask about features that are currently not practical enough technically, or about features we don’t have. We learned, to a certain extent the hard way, to say, “This is possible; this is not currently possible from our perspective.”
Henrik: Where can we find more information about image recognition?
Georgi: It depends on what you need. Do you need more data for training, more basic knowledge, or different kinds of inspiration about business applications? There are different sources for each.
Obviously, ImageNet, with all the accompanying information and the algorithms built around this pretty nice dataset, is quite useful for researchers. For beginners in image recognition, there is a whole set of Coursera courses.
One of the most notable is from Stanford University, and there are a few more pretty good ones from top European and American universities. There are also newsletters and digests; AI Weekly is pretty good for inspiration, with a mixture of research topics, business cases, cool hacks, and ideas about what you can do with image recognition.
Henrik: Well, thanks, Georgi.
Georgi: Thanks a lot, Henrik. I hope your audience will enjoy the podcast, including our participation in it.
Henrik de Gyor: This is Tagging.tech. I’m Henrik de Gyor. Today, I’m speaking with Kevin Townsend. Kevin, how are you?
Kevin Townsend: Good, thank you.
Henrik: Kevin, who are you and what do you do?
Kevin: I’m the CEO and Managing Director for a company called KeedUp. What we do is keywording, but also adding other metadata, fixing images, image flow services; a whole heap of things, but keywording and metadata is really the core of what we do.
What makes us a little bit different to maybe some other keywording companies is that we started out from a basis of being involved in the industry as a syndicator/image seller. We were like a photo agency, photo representative, like many of our customers ‑‑ in fact almost all of our customers.
As a result, we’ve developed services in a somewhat different way. For instance, we operate 24 hours a day, seven days a week. We do celebrity as well as stock. Everybody that works for us pretty much is working in an office. There’s no piecework. Almost all of our staff are university graduates.
Henrik: Kevin, what are the biggest challenges and successes you’ve seen with keywording services?
Kevin: I think the biggest challenge, certainly for us, has been dealing with the multitude of requirements and the different systems that our customers work with. It’s never really a thing where you are just sent some images and are allowed to do whatever you like to them and provide the best keywording or the best metadata you can.
Everybody has their own things that they want done. There are all these different standards, like you might be keywording for a Getty Images standard, or back when it used to be a thing, the Corbis standard, and so on and so forth.
Dealing with all of those different things I think is the real big challenge in keywording and delivering exactly what people want. That’s the real key.
I think the successes are related: we’ve built systems that have enabled us to cope with all of those different things, such as our own workflow system called Piksee, which really cut out an awful lot of handling time and wastage in dealing with sets of images.
Or we have our own client database which records and enables all our staff to know exactly, down to the contributor level, all of the things that you maybe want to do differently for one photographer over another when it comes to metadata or fixing your images.
Just a whole series of things that, when I first started, I didn’t realize all of these nuances would come into play, but they really are crucial to delivering a good service.
The result of that has been that our reputation is such that we tend to work for the big names ‑‑ certainly in the news, celebrity, and increasingly in the stock area as well ‑‑ like Associated Press, like Splash News, and like Magnum. It’s being successful in that we’ve managed to defeat the problem, I suppose.
Henrik: As of early March 2016, how much of the keywording work is completed by people versus machines?
Kevin: I guess it depends on how you work that figure out. In terms of, if the question is how many of the images that we work on are touched by human beings deciding on what keywords go into the images, that figure is really 100 percent.
But, and this is important, the technology you have to assist them in doing that, and doing a good job, is quite considerable. I don’t think it’s often appreciated, by photographers and particularly amateurs out there, exactly what goes into what I’d call professional keywording as opposed to “seat of your pants” keywording.
We don’t sit there very often and keyword one image after another, searching into our memory banks, trying to come up with the best keywords. There are systems, vocabularies. There are ways for handling the images, organizing the images.
So much technology is involved there to really make the humans that we have the best that they can be.
I have to say, in that regard, what we are always doing ‑‑ and as I said earlier, we employ almost exclusively university graduates, people with degrees in communication studies, English, or art history ‑‑ is trying to have the best supercomputer do the keywording, which is the human brain, the most educated and best-programmed supercomputer there is.
Then we add the technology on top. So, yes, 100 percent of the work in the end is done by people, but certainly with a lot of assistance from technology.
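The “systems, vocabularies” Kevin mentions usually center on a controlled vocabulary: whatever a keyworder types gets mapped to a preferred term plus implied broader terms, so keywords stay consistent across staff and across clients. A minimal sketch, with an invented vocabulary (not Getty’s or any agency’s real one):

```python
# Illustrative controlled vocabulary: every entry maps to one preferred term
# plus implied broader terms. All terms here are invented examples.
VOCAB = {
    "puppy": {"preferred": "dog", "broader": ["animal", "pet"]},
    "dog":   {"preferred": "dog", "broader": ["animal", "pet"]},
    "nyc":   {"preferred": "new york city", "broader": ["usa", "city"]},
}

def normalize(raw_keyword):
    """Map a keyworder's free-text entry onto the controlled vocabulary.
    Unknown terms pass through lowercased, flagged for later review."""
    entry = VOCAB.get(raw_keyword.lower())
    if entry is None:
        return [raw_keyword.lower()]
    return [entry["preferred"], *entry["broader"]]
```

This is the kind of assistance Kevin is pointing at: the human decides what the image is about, and the system guarantees the resulting keywords match the client’s standard.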
If you look into the future, the far future, I feel sure that one day artificial intelligence will probably do a lot of things for all of us, in all sorts of areas we’re not even vaguely aware of now.
We’re starting to see some of that already, in all sorts of apps on your phone that can tell you how to do this, that, and the other, and monitor your heartbeat; all sorts of things happening with artificial intelligence, which is great.
When it comes to keywording, what I see is not very flattering at the moment, which is not to say that it may not get there in the end. But I think what I need to do is try to put things in a little bit of perspective, at least from where I see it.
The level of complication that I was talking about earlier, which is really the key to good keywording, I think is where at the moment AI keywording falls down completely, and even before that it’s falling over some hurdles right now.
On my blog recently, I did a post about one AI provider, and they invite you to put test images in to see what they can do. Well, [laughs] the result was particularly unedifying, in that a lot of the keywords were just completely wrong. The point of the images was completely missed. They weren’t able to name anybody in the images.
It was really a pretty poor effort, and even in the examples on their website, showing what they considered to be successes, there were very few keywords that would be commercially acceptable.
Also, a lot of the keywords were extremely inane and almost pointless; certainly nothing that would fit into a vocab that you would be able to submit to Getty, for instance, or that would be acceptable to Alamy. This is a long, long, way from where it needs to get.
Perhaps the best analogy I can give for how I view AI and keywording at the moment is this: a few years ago, I went to see the Honda robot, which had come to town.
They had spent millions and millions and millions of dollars on this robot, and its big claim to fame was that it could walk upstairs, which it did. Not particularly well, but it did it. It was a great success, and everyone was very happy.
Thing is, any three‑year‑old kid in the audience could have run up and down those stairs and run around the robot many times.
I feel that AI keywording is a bit like that robot at the moment. Yes, it’s doing some rudimentary things, and that looks great, and people who think it’s a good idea and it’s all going to be wonderful, can shout about it, but it’s a long way from the reality of what humans are able to do. A long, long way.
To carry on the robot analogy: to really do the sort of keywording that handles concepts and meets all these challenges of different standards, the technology has to be more like an android than like a robot that can assemble a motor vehicle.
Now, how long it’s going to take us to get to that sort of stage, I don’t know. I would be very doubtful that the amount of money and technology, and what have you, that would be needed to get us to that point is going to be directed towards keywording.
I’m sure there’ll be much more important things that sort of level of technology would be directed at. But certainly one day, maybe in my lifetime, maybe not, we’ll probably wake up and there’ll be androids doing keywording.
Henrik: Kevin, what advice would you like to share with people looking into keywording services?
Kevin: I think that it’s one of those things, it’s the oldest cliche, that you do get what you pay for, generally speaking.
We have had so many people who have come to us who have gone down the route of trying to save as much money as they could, and getting a really poor job done, finding it didn’t work for them, it wasn’t delivering what they wanted, and they’ve ended up coming and getting the job done properly.
For instance, at Magnum we have taken over the keywording there from what used to be crowd‑sourced keywording, which was particularly poor. That’s really made a big difference to them, and I know they’re very happy.
There are other examples that we’ve had over the years with people who’ve gone off and got poor keywording and regretted it. Just to use another old saying, no one ever regrets buying quality, and I think that is very true with keywording.
Henrik: Where can we find more information about keywording services?
Kevin: Right. We have a website, www.keedup.com, and a blog. We are also on Facebook, Twitter, and LinkedIn. We’re in lots of different places. If you go there as a starting point, there are links to other sites that we have. That’s a good place to start.
We have a site called coreceleb.com, an offshoot of what we do, which is focused on editing down and curating the images people are creating so that you have more sales impact.
We also have brandkeywording.com, which is focused on adding information about brands that celebrities are wearing and using; not just fashion, but also what cars they drive, all sorts of things really to add new revenue streams, particularly for celebrity photo agencies, but also there’s no reason why that doesn’t include sports news and even stock.
Those are two which are really pretty important as well.
Henrik: Thanks, Kevin.
Kevin: Good. [laughs] I hope that will give people some food for thought.