tagging.tech

Audio, image and video keywording. By people and machines.



Tagging.tech interview with Kevin Townsend

Tagging.tech presents an audio interview with Kevin Townsend about keywording services

 

Listen and subscribe to Tagging.tech on Apple Podcasts, AudioBoom, CastBox, Google Play, RadioPublic or TuneIn.


Keywording Now: Practical Advice on using Image Recognition and Keywording Services

Now available

keywordingnow.com

 

Transcript:

Henrik de Gyor:  This is Tagging.tech. I’m Henrik de Gyor. Today, I’m speaking with Kevin Townsend. Kevin, how are you?

Kevin Townsend:  Good, thank you.

Henrik:  Kevin, who are you and what do you do?

Kevin:  I’m the CEO and Managing Director of a company called KeedUp. What we do is keywording, but also adding other metadata, fixing images, image flow services ‑‑ a whole heap of things ‑‑ but keywording and metadata are really the core of what we do.

What makes us a little bit different from maybe some other keywording companies is that we started out being involved in the industry as a syndicator and image seller. We were a photo agency and photo representative, like many of our customers ‑‑ in fact, almost all of our customers.

As a result, we’ve developed services in a somewhat different way. For instance, we operate 24 hours a day, seven days a week. We do celebrity as well as stock. Pretty much everybody who works for us works in an office; there’s no piecework. Almost all of our staff are university graduates.
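[Editor’s note: for readers unfamiliar with where keywords physically live, professional keywording is usually written into an image file’s embedded IPTC/XMP fields rather than kept in a separate list. Below is a minimal sketch of that step, driving the widely used exiftool command‑line tool from Python; the file name and keywords are placeholders.]

```python
import subprocess

def embed_keywords(image_path, keywords):
    """Write IPTC keywords into an image file using exiftool."""
    args = ["exiftool", "-overwrite_original"]
    for kw in keywords:
        args.append(f"-Keywords+={kw}")  # append one IPTC keyword per flag
    args.append(image_path)
    subprocess.run(args, check=True)

# Placeholder file and keywords, for illustration only.
embed_keywords("example.jpg", ["red carpet", "premiere", "London"])
```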

Henrik:  Kevin, what are the biggest challenges and successes you’ve seen with keywording services?

Kevin:  I think the biggest challenge, certainly for us, has been dealing with the multitude of requirements and the different systems that our customers work with. It’s never really a thing where you are just sent some images and are allowed to do whatever you like to them and provide the best keywording or the best metadata you can.

Everybody has their own things that they want done. There are all these different standards: you might be keywording to a Getty Images standard or, back when it used to be a thing, the Corbis standard, and so on and so forth.

Dealing with all of those different things is, I think, the really big challenge in keywording: delivering exactly what people want. That’s the real key.

I think the successes are related: we’ve built systems that have enabled us to cope with all of those different things, such as our own workflow system called Piksee, which really did cut out an awful lot of handling time and wastage in dealing with sets of images.

We also have our own client database, which records, down to the contributor level, everything you might want done differently for one photographer versus another when it comes to metadata or fixing images, and lets all our staff know exactly what those things are.

Just a whole series of things that, when I first started, I didn’t realize all of these nuances would come into play, but they really are crucial to delivering a good service.

The result of that has been that our reputation is such that we tend to work for the big names ‑‑ certainly in news and celebrity, and increasingly in the stock area as well ‑‑ such as Associated Press, Splash News, and Magnum. The success is that we’ve managed to defeat the problem, I suppose.
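[Editor’s note: to make the contributor‑level database idea concrete, here is a minimal sketch of the kind of per‑contributor preference record such a system might store. The field names are invented for illustration and are not KeedUp’s actual schema.]

```python
from dataclasses import dataclass, field

@dataclass
class ContributorPrefs:
    """Hypothetical per-contributor record a keyworder consults
    before working on a set of images."""
    agency: str
    contributor: str
    vocabulary: str              # e.g. "getty" or "alamy"
    max_keywords: int = 50
    embed_iptc: bool = True      # write keywords into the file itself
    notes: list = field(default_factory=list)

prefs = ContributorPrefs(
    agency="Example Photo Agency",
    contributor="J. Photographer",
    vocabulary="getty",
    notes=["Captions in US English", "Never crop verticals"],
)
print(prefs.vocabulary)  # -> getty
```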

Henrik:  As of early March 2016, how much of the keywording work is completed by people versus machines?

Kevin:  I guess it depends on how you work that figure out. If the question is how many of the images we work on are touched by human beings deciding what keywords go into them, that figure is really 100 percent.

But, and this is important, the technology you have to assist them in doing that well is quite considerable. I don’t think it’s often appreciated, maybe by photographers and particularly by amateurs out there, exactly what goes into what I’d call professional keywording as opposed to “seat of your pants” keywording.

We don’t sit there very often and keyword one image after another, searching our memory banks, trying to come up with the best keywords. There are systems and vocabularies. There are ways of handling the images and organizing the images.

So much technology is involved there to really make the humans that we have the best that they can be.

I have to say, in that regard, what we are always doing ‑‑ and as I said earlier, we employ almost exclusively university graduates, people who have degrees in communication studies, English, or art history ‑‑ is trying to have the best supercomputer doing the keywording, which is the human brain: the most educated and best‑programmed supercomputer.

Then we add the technology on top. So, yes, 100 percent of the work in the end is done by people, but certainly with a lot of assistance from technology.
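[Editor’s note: the “systems and vocabularies” Kevin mentions usually mean a controlled vocabulary: free‑text terms are mapped to an agency’s approved preferred terms before submission. A minimal sketch of that lookup follows; the entries are invented, and real agency vocabularies run to tens of thousands of terms.]

```python
# Toy controlled vocabulary: synonyms map to the preferred term an
# agency will accept. Entries are invented for illustration.
VOCAB = {
    "car": "automobile",
    "auto": "automobile",
    "automobile": "automobile",
    "kid": "child",
    "child": "child",
    "nyc": "new york city",
}

def normalize_keywords(raw_terms):
    """Map free-text terms onto the controlled vocabulary,
    dropping unrecognized terms and duplicates."""
    seen, result = set(), []
    for term in raw_terms:
        preferred = VOCAB.get(term.strip().lower())
        if preferred and preferred not in seen:
            seen.add(preferred)
            result.append(preferred)
    return result

print(normalize_keywords(["Car", "kid", "NYC", "blurry"]))
# -> ['automobile', 'child', 'new york city']
```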

If you look into the future, the far future, I feel sure that one day artificial intelligence will probably do a lot of things for all of us in all sorts of areas we’re not even vaguely aware of now.

We’re starting to see some of that happen already with apps on your phone that can tell you how to do this, that, and the other, or monitor your heartbeat ‑‑ all sorts of things that are happening with artificial intelligence, which is great.

When it comes to keywording, what I see is not very flattering at the moment, which is not to say that it may not get there in the end. But I think what I need to do is try to put things in a little bit of perspective, at least from where I see it.

The level of complication that I was talking about earlier, which is really the key to good keywording, is where AI keywording falls down completely at the moment ‑‑ and it’s falling over simpler hurdles even before it gets that far.

On my blog recently, I did a post about one AI provider, and they invite you to put test images in to see what they can do. Well, [laughs] the result was particularly unedifying, in that a lot of the keywords were just completely wrong. The point of the images was completely missed. They weren’t able to name anybody in the images.

It was really a pretty poor effort, and even in the examples on their website, showing what they considered to be successes, there were very few keywords that would be acceptable commercially.

Also, a lot of the keywords were extremely inane and almost pointless ‑‑ certainly nothing that would fit into a vocab you would be able to submit to Getty, for instance, or that would be acceptable to Alamy. This is a long, long way from where it needs to get.

Perhaps the best analogy I can give for how I view things at the moment with AI and keywording is from a few years ago, when I went to see the Honda robot, which had come to town.

They had spent millions and millions and millions of dollars on this robot, and its big claim to fame was that it could walk upstairs, which it did. Not particularly well, but it did it. It was a great success, and everyone was very happy.

Thing is, any three‑year‑old kid in the audience could have run up and down those stairs and run around the robot many times.

I feel that AI keywording is a bit like that robot at the moment. Yes, it’s doing some rudimentary things, and that looks great, and people who think it’s a good idea and that it’s all going to be wonderful can shout about it, but it’s a long way from the reality of what humans are able to do. A long, long way.

To carry on the robot analogy, where I think the technology has to go ‑‑ to really be able to do the sort of keywording with concepts, and to meet all these challenges of different standards ‑‑ is to be more like an android than like a robot that can assemble a motor vehicle.

Now, how long it’s going to take us to get to that sort of stage, I don’t know. I would be very doubtful that the amount of money and technology, and what have you, that would be needed to get us to that point is going to be directed towards keywording.

I’m sure there’ll be much more important things that sort of level of technology would be directed at. But certainly one day, maybe in my lifetime, maybe not, we’ll probably wake up and there’ll be androids doing keywording.

Henrik:  Kevin, what advice would you like to share with people looking into keywording services?

Kevin:  I think it’s one of those things ‑‑ it’s the oldest cliché ‑‑ that, generally speaking, you do get what you pay for.

We have had so many people come to us who went down the route of trying to save as much money as they could, got a really poor job done, and found it didn’t work for them and wasn’t delivering what they wanted, and they’ve ended up coming to us to get the job done properly.

For instance, at Magnum we have taken over the keywording from what used to be crowd‑sourced keywording, which was particularly poor. That’s really made a big difference to them, and I know they’re very happy.

There are other examples that we’ve had over the years with people who’ve gone off and got poor keywording and regretted it. Just to use another old saying, no one ever regrets buying quality, and I think that is very true with keywording.

Henrik:  Where can we find more information about keywording services?

Kevin:  Right. We have a website, www.keedup.com. We have a blog. We are also on Facebook, on Twitter, and on LinkedIn ‑‑ we’re in lots of different places. If you go to the website as a starting point, there are links there to the other sites that we have. That’s a good place to start.

We have a site called coreceleb.com, an offshoot of what we do, which is focused really on editing down and curating the images that people are creating, so that you have more sales impact.

We also have brandkeywording.com, which is focused on adding information about the brands that celebrities are wearing and using ‑‑ not just fashion, but also what cars they drive ‑‑ all sorts of things, really, to add new revenue streams, particularly for celebrity photo agencies, though there’s no reason why that can’t include sports, news, and even stock.

Those two are really pretty important as well.

Henrik:  Thanks, Kevin.

Kevin:  Good. [laughs] I hope that will give people some food for thought.

Henrik:  For more on this, visit Tagging.tech.

Thanks again.


 

For a book about this, visit keywordingnow.com



Tagging.tech interview with Matthew Zeiler

Tagging.tech presents an audio interview with Matthew Zeiler about image recognition

 

Listen and subscribe to Tagging.tech on Apple Podcasts, AudioBoom, CastBox, Google Play, RadioPublic or TuneIn.


Keywording Now: Practical Advice on using Image Recognition and Keywording Services

Now available

keywordingnow.com

 

Transcript:

Henrik de Gyor:  [00:02] This is Tagging.tech. I’m Henrik de Gyor. Today I’m speaking with Matthew Zeiler.

Matthew, how are you?

Matthew Zeiler:  [00:06] Good. How are you?

Henrik:  [00:07] Good. Matthew, who are you, and what do you do?

Matthew:  [00:12] I am a founder and the CEO of Clarifai. We are a technology company in New York with technology that lets computers see automatically. You can send us an image or a video, and we’ll tell you exactly what’s in it. That means all the objects, like car, dog, tree, mountain.

[00:32] Even descriptive words like love, romance, and togetherness are understood automatically by our technology. We make this technology available to enterprises and developers through very simple APIs. You can literally send an image with about three lines of code, and we’ll tell you a whole list of objects.

[00:53] As well as how confident we are that those objects appear within the image or video.
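[Editor’s note: a hedged sketch of what “about three lines of code” against a tagging API can look like. Clarifai’s real endpoints and authentication have changed over the years, so the URL, header, and response fields below are placeholders; consult clarifai.com’s documentation for the actual contract.]

```python
import requests

API_URL = "https://api.example-recognition.com/v1/tag"  # placeholder URL
API_KEY = "YOUR_API_KEY"                                # placeholder key

# Send one image URL; get back predicted concepts with confidences.
resp = requests.post(
    API_URL,
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={"image_url": "https://example.com/photo.jpg"},
)
resp.raise_for_status()

# Assumed response shape: {"tags": [{"name": ..., "confidence": ...}]}
for tag in resp.json().get("tags", []):
    print(f"{tag['name']}: {tag['confidence']:.2f}")
```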

Henrik:  [00:58] Matthew, what are the biggest challenges and successes you’ve seen with image and video recognition?

Matthew:  [01:03] It’s really exciting. We started this company about two years ago, in November 2013, and we’ve scaled it up to over 30 people now. We kicked it off by winning a competition called ImageNet, an international competition held every year where researchers and the largest companies submit entries, and we won the top five places.

[01:27] That was key to getting recognition, both in the research community and, even more importantly, in the enterprise community. Since then we’ve had a tremendous amount of inbound interest across a wide variety of verticals. We’ve seen the problems in the wedding domain, travel, real estate, asset management, consumer photos, and social media.

[01:50] In every possible vertical and domain you can think of that has image or video content, we have paying customers. We’re solving problems that range from organizing the photos in your pocket…we actually launched our own consumer app for this in December [2015], called Forevery, which is really exciting. Anyone with an iPhone [could] check it out.

[02:10] All the way to media companies being able to tag their content for internal use. The tagging is very broad, to understand every possible aspect of the world. We can also get really fine‑grained, even down to the terms and conditions that you put up for your users to upload content to your products.

[02:33] We can tailor our recognition system to help you moderate that content, and filter out the unwanted content before it reaches your live site. Lots of really exciting applications, and huge successes for both image and video.
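[Editor’s note: the moderation use Matthew describes typically reduces to thresholding: flag or reject an upload when any unwanted concept is predicted above a chosen confidence. A minimal sketch, reusing the assumed tag shape from the previous example; the concept names and threshold are illustrative.]

```python
BLOCKED_CONCEPTS = {"nudity", "violence", "weapon"}  # illustrative list
THRESHOLD = 0.85                                     # illustrative cutoff

def should_block(tags):
    """tags: list of {"name": str, "confidence": float} predictions."""
    return any(
        t["name"] in BLOCKED_CONCEPTS and t["confidence"] >= THRESHOLD
        for t in tags
    )

uploads = [
    [{"name": "beach", "confidence": 0.97}, {"name": "sunset", "confidence": 0.91}],
    [{"name": "weapon", "confidence": 0.93}],
]
for tags in uploads:
    print("block" if should_block(tags) else "allow")  # allow, then block
```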

[02:48] I think one of the early challenges, when we started two years ago, was really demonstrating the value this technology can provide to an enterprise, and explaining what the technology is. A lot of people have heard about image recognition, or at least heard the phrase, for decades.

[03:06] That’s because it’s been in research for decades. People have been trying to solve this problem of making computers see, and not until very recently has it happened. Now they’re seeing this technology actually work in real applications ‑‑ not just in the demo at clarifai.com, where you can throw in your own image.

[03:26] You see it happen in real time there, but also in actual products that people use every day: customers like Vimeo, improving their video search; Style Me Pretty, improving the management of all of their wedding albums; or Trivago, improving search over hotel listings.

[03:43] When you start seeing these experiences improve, Clarifai is at the forefront, integrating with these leading companies across different verticals. It went from the challenge of educating the community and enterprises about what this technology does to now finding the best ways to integrate it.

Henrik:  [04:03] As of early March 2016, how do you see image and video recognition changing?

Matthew:  [04:09] When I started the company about two years ago, a general model that could recognize 1,000 concepts was pretty much state of the art. That’s what won ImageNet, when we kicked off the company. Now we’ve extended that to over 11,000 different concepts that we can recognize, and evolved it to recognize things beyond just objects, like I mentioned.

[04:33] Now you can see these descriptive words, like idyllic, which will bring up beach photos. Or scenic, which will bring up nice mountain shots. Or nice weather shots, where it’s snowing and there’s snow on the trees ‑‑ just beautiful stuff like that. People would describe images in this way, and we’ve taught machines to do the same thing.

[04:56] I think, going forward, you’ll see a lot more of this expansion in the capability of the machine learning technology that we use, and also a whole personalization of it. What we’ve seen with the expansion of concepts is that it’s never going to be enough. You want to give your users the functionality to customize it to the way they talk about the world.

[05:21] There are a few concrete examples here. In stock media, we sit at the upload process of a lot of stock media sites. A pro photographer might upload an image, and they used to have to manually tag it, which is a very slow process. We do it in real time. We give them the ability to remove some tags and add some tags, and then it’s uploaded to the site.

[05:45] What this does for the stock media company is give buyers a much more consistent experience. If you let different people who don’t know each other, who grew up with different backgrounds in different parts of the world, all tag their own content, they all talk with different vocabularies.

[06:01] When a buyer comes and talks with their own vocabulary, and searches on the site, they get pretty much random results ‑‑ not the ideal and optimal results. Whereas using Clarifai, you’ll get a consistent view of all of your data, tagged in the same way. It’s much better for the buyer experience as well.
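[Editor’s note: the upload flow Matthew describes is, at its core, a merge: start from the machine’s suggested tags, apply the photographer’s removals and additions, and store the result so every image is tagged against the same vocabulary. A minimal sketch; all names are illustrative.]

```python
def review_tags(machine_tags, removed=(), added=()):
    """Merge auto-suggested tags with a photographer's edits,
    preserving the machine's ordering for tags that survive."""
    dropped = set(removed)
    final = [t for t in machine_tags if t not in dropped]
    final += [t for t in added if t not in final]
    return final

suggested = ["wedding", "bouquet", "outdoor", "dog"]
print(review_tags(suggested, removed=["dog"], added=["rustic"]))
# -> ['wedding', 'bouquet', 'outdoor', 'rustic']
```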

[06:19] Another example is our app Forevery, where we’ve baked in some new technology that’s coming to our enterprise customers later this year, which is the ability to really personalize it to you. This shows up in two different parts of the application. One is around people, where you can actually teach the app your friends and family.

[06:42] The other is around things. You can teach it anything in the world ‑‑ whether it’s the name of your specific dog, or the Eiffel Tower, or your favorite sports car, something like that. You can customize it. It’s actually training a model on the phone to be able to predict these things.

[07:01] I think the future of machine learning and image and video recognition is this personalization, because it becomes more emotionally connected to you, and more powerful. It’s the way you speak about the world and see the world. We’re really excited about that evolving.

Henrik:  [07:17] As of March 2016, how much of the image and video recognition is done by people versus machines?

Matthew:  [07:24] That’s a great question. I don’t know the concrete numbers. A huge portion of our customers were doing it manually before. We have a few case studies out there ‑‑ for example, Style Me Pretty. They were doing exactly that. They had users upload a wedding album, which, as you know, might be 1,000 or 2,000 photos from a weekend wedding.

[07:47] They had a moderation team to look through all that content and tag it, because ultimately they want other people to come to their site to search and find inspiration. Now we’re allowing Style Me Pretty to upload over 10 times more content onto their site, which ultimately drives more revenue for them.

[08:06] Because now they advertise next to this content, they need well‑tagged content, so that their users find it interesting and they can match the best ads to it. Now we’re helping them automate that system. We see that over and over again across these verticals. People were doing it manually before.

[08:23] It was very costly and time‑consuming. We’re either making that faster, or scaling it up by orders of magnitude.

Henrik:  [08:30] Matthew, what advice would you like to share with people looking into image and video recognition?

Matthew:  [08:35] That’s a great question. There are a few alternatives, and we literally just released a blog post yesterday about this. You want to consider a lot of different things when deciding between visual recognition providers, or whether to build the technology in‑house. What Clarifai does is take a lot of the pain out of the process.

[08:55] We have experts in‑house with PhDs in this field of object recognition ‑‑ not just myself as CEO, but a whole research team dedicated to pushing this technology forward and applying it to new application areas. That’s the expertise piece. We also have the data piece covered.

[09:15] If you come to us and you want to recognize cars and trees and dogs, you don’t need any labeled data that has those tags already associated with it. We’ve done that process of collecting data, either from the web or from our partners, and we’ve trained a model to recognize these things automatically.

[09:34] This is as broad as possible. We do the job of curating it, so that it’s very high quality and doesn’t include any obscene concepts that you wouldn’t want your users to be exposed to. It’s very nicely packaged for you. Then finally, we take away the need for extensive resources as well.

[09:53] We make it so you don’t need extra machines or specialized machines ‑‑ we actually use some very specialized hardware to do this efficiently. You don’t need to spend the time it takes to train these models, which is many weeks, or sometimes months, to get optimal performance. All of that is taken care of. You literally just need three lines of code in order to use Clarifai.

[10:15] Finally, there’s this component of independence that Clarifai has, that some other providers don’t. As a small company, we’re focused, at our core, on understanding every image and video to improve life. We want to apply this technology to every possible vertical, and solve every possible problem that we can, without competing with our customers.

[10:38] There are some big entrants in this space that are building divisions within their companies that end up competing with you. If you’re a big enterprise looking for image and video recognition, you have to consider that as well. Basically, do you trust the provider of this technology with your data?

[10:56] Because long‑term, you want to make a partnership that you both benefit from, and don’t have to be afraid of. That’s what Clarifai provides, and we make this very affordable for you, and very simple for you to use.

Henrik:  [11:09] Matthew, where can we find out more information about image and video recognition?

Matthew:  [11:13] I would check out Clarifai’s blog. One of the goals of our marketing department is to educate the world about what visual recognition is ‑‑ not only how we do it, but how the technology works, and where you can get more resources for it. That’ll be the one‑stop shop. The first thing to check out is blog.clarifai.com. We regularly update it with information.

[11:37] There are also a lot of great resources online from the research community, if you really want to dive into the details. What this community has evolved to do is not wait for conferences or journal publications, but to publish regularly and openly, so that the latest research is always available.

[12:00] That’s something really unique to this image and video recognition space that we don’t see in other fields of research. Depending on what stage you’re at in understanding this technology, you’ll get high‑level details from Clarifai’s blog, and low‑level ones all the way from the research community.

Henrik:  [12:16] Well, thanks Matthew.

Matthew:  [12:17] Thank you.

Henrik:  [12:18] For more on this, visit Tagging.tech.

Thanks again.


For a book about this, visit keywordingnow.com