tagging.tech

Audio, Image and video keywording. By people and machines.



Tagging.tech interview with Georgi Kadrev

Tagging.tech presents an audio interview with Georgi Kadrev about image recognition

 

Listen and subscribe to Tagging.tech on Apple Podcasts, AudioBoom, CastBox, Google Play, RadioPublic or TuneIn.


Keywording Now: Practical Advice on using Image Recognition and Keywording Services

Now available

keywordingnow.com

 

Transcript:

 

Henrik de Gyor:  This is Tagging.tech. I’m Henrik de Gyor. Today, I’m speaking with Georgi Kadrev. Georgi, how are you?

Georgi Kadrev:  Hi, Henrik. All good. I am quite enthusiastic to participate in the podcast.

Henrik:  Georgi, who are you and what do you do?

Georgi:  I’m Co‑Founder and CEO of Imagga, which is one of the pretty good platforms for image recognition as a service. We have auto‑tagging and auto‑categorization services that you can use for practical use cases.

Henrik:  Georgi, what are the biggest challenges and successes you’ve seen with image recognition?

Georgi:  In terms of challenges, I think one of the biggest ones is that we, as human beings, are used to perceiving a lot of our world through our eyes. Basically, when people think about image recognition in general, they have a very diverse and very complete picture of what it should do.

Let’s say from optical character recognition, or recognizing text, to facial recognition of a particular person, to conceptual tagging, to categorization: all these different aspects of visual perception.

People typically expect that it’s the same technology or the same solution, but actually quite a lot of different approaches need to be engaged in the process of recognizing and understanding the semantics of an image.

In terms of successes in addressing this, I can say that, not surprisingly, deep learning, which has been quite a big hype over the last few years, has been a huge success in the more conceptual or class-level object recognition: identifying what type of object something is.

Is it a bottle? Is it a dog? Is it a cat? Is it a computer? Is it a mobile phone? And so on. This has become pretty practical, and right now we can say that we are close to human level in recognizing a lot of different classes of objects.
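Class-level recognition like this typically reduces to scoring an image against each known class and keeping only the confident labels. A minimal sketch of that last step (the class names and scores below are made up for illustration, not output from any real model):

```python
import math

def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    m = max(scores.values())
    exps = {label: math.exp(s - m) for label, s in scores.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}

def top_tags(scores, threshold=0.10):
    """Keep only the class labels whose probability clears the threshold."""
    probs = softmax(scores)
    return sorted(
        [(label, round(p, 3)) for label, p in probs.items() if p >= threshold],
        key=lambda pair: pair[1],
        reverse=True,
    )

# Hypothetical raw scores from a classifier's final layer:
scores = {"dog": 4.1, "cat": 2.3, "bottle": 0.2, "computer": -1.0}
print(top_tags(scores))
```

The threshold is the practical knob: raise it for fewer, more reliable tags; lower it for broader but noisier tagging.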

At the same time, in some other spaces, like logo recognition and facial recognition, we also see quite high recognition rates that allow for practical applications.

I can say one of the good things is that we are closer and closer to automating at least part of the tasks that used to be performed by hand, replacing the need for manual annotation of photos for different use cases.

In terms of challenges, maybe I would also add that you still need a lot of properly annotated data. Machine learning, and deep learning in general, is very data-greedy, so we need an enormous number of samples to really make something robust enough and practical enough.

We still see that gathering a high-quality dataset is one of the challenges. This is something we also try to address internally; it helps us be more competitive in terms of quality and technology.

Henrik:  As of March 2016, how do you see image recognition changing?

Georgi:  What we definitely see is that there are more and more services, some of them of pretty good quality, that try to automate the different aspects of image recognition I briefly touched on.

We see even big players like Google starting to offer services for some of those things, like what they call label detection, or what we call tagging, and optical character recognition, as most of the vendors call it.

We also have seen logo and facial recognition being quite popular and being utilized more and more in different kinds of brand monitoring services.

At the same time, on the downside of visual recognition, something that we see when we talk about highly artistic images, or some more specific art, or other types of specific content, is that the technology still needs to be custom-trained for that,

if it is possible at all to train classification-based image recognition to recognize different kinds of artistic images or very specialized image content.

It’s related to what I mentioned in the beginning: if you have a specific task, sometimes you need a specific approach. Deep learning has addressed this to a certain extent, but it’s still not a one-size-fits-all solution. We see that in a lot of cases customers need to define a specific problem so that they can get a very good and precise specific solution.

Henrik:  As of March 2016, how much of image recognition is completed by humans versus machines?

Georgi:  I would say, [laughs] honestly, it depends on the task. We’ve seen cases where machines can be better than humans, and not just in theory, in practice.

For example, if we train a custom classifier with a human-curated dataset and then do some kind of testing or validation, we see that the things reported as errors in the learning process can actually be errors made by the people.

The human was mistaken when annotating the photo, so the prediction gets falsely reported as an error even though it’s correct. In a way, this is promising, because it shows that the automation and consistency machines can deliver are pretty good in terms of precision.
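The validation loop Georgi describes can be sketched simply: compare model predictions against the human-curated labels and flag every disagreement for review, since a disagreement may be a model error or an annotation error. All names and labels below are hypothetical:

```python
def flag_label_disagreements(predictions, human_labels):
    """Collect items where the model and the human annotator disagree.

    Each disagreement is a candidate annotation error: it may be a model
    mistake, but it can just as well be a mistake in the human-curated
    ground truth, so it deserves a second look by a reviewer.
    """
    return [
        (item, human_labels.get(item), predicted)
        for item, predicted in predictions.items()
        if human_labels.get(item) != predicted
    ]

# Hypothetical validation split: the model looks "wrong" on photo_2,
# but a reviewer may find the original annotation was the real error.
human_labels = {"photo_1": "dog", "photo_2": "cat", "photo_3": "bottle"}
predictions  = {"photo_1": "dog", "photo_2": "dog", "photo_3": "bottle"}
print(flag_label_disagreements(predictions, human_labels))
```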

At the same time, there are tasks that require a lot of explicit or implicit knowledge to resolve. A lot of the background knowledge people have is not available to the machine, and then you need to figure out how to either automate this, use a combination of computer and human, or treat it as a fully human task.

Some of it is still not approachable by a technical approach, so I cannot give an exact number. Something interesting I can share is a statistic from a pretty interesting experiment we did called Clash of Tags. We have a dataset of stock photography with an existing set of tags provided by various people, such as the stock photographers themselves.

We also have the same set of stock photos annotated using our current technology, completely blind to the original tags that people have put on the images. Then we ask people, “Type a keyword and you get search results.”

Which set of results appears on the left-hand side or the right-hand side is not known in advance, but one set is based on the tags that people have put, and the other is based on the tags that our API has generated and assigned to the images.

The user needs to pick the winning set. In roughly 45 percent of the cases, people chose the result set based on automatically generated tags as better than the set based on human-provided tags. It’s not more than 50, but it still means that in a lot of cases the machine was superior to human performance.
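Scoring an experiment like Clash of Tags comes down to counting how often the machine-tagged result set wins a blind comparison. A small sketch with made-up trial data that roughly matches the 45 percent figure:

```python
def machine_win_rate(choices):
    """Fraction of blind A/B trials where the machine-tagged result set won."""
    wins = sum(1 for c in choices if c == "machine")
    return wins / len(choices)

# Hypothetical tally of 20 blind trials (not the experiment's real data):
choices = ["machine"] * 9 + ["human"] * 11
print(f"{machine_win_rate(choices):.0%}")
```

A real evaluation would also randomize left/right placement per trial, as the interview describes, so position bias cannot leak into the tally.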

I believe this number will grow in the future. I can say it’s still a way to go to something like complete automation, but we are getting closer and closer and we’re enthusiastic about it.

Henrik:  Georgi, what advice would you like to share with people looking into image recognition?

Georgi:  I would say, have a very clear idea of what kind of value you want to derive from it, and try to optimize for that, whether you’re working on it yourself or with a vendor. Make it really clear what your objectives are and what your expectations of image recognition are. Just think from the practical aspect.

This is something that I personally, and our whole team, have always stressed: let’s see what it can do and what it can’t, and whether there’s really a pain that can be solved right now or not. Also, from the vendor side, I would suggest: don’t over-promise, because it’s quite easy to get people confused.

They have an expectation like, “It’s AI, so it can do anything?”, but you need to be realistic, so you save your time and your potential customer’s time. If the use case is very clear and you can professionally commit that it’s going to work out, then go for it. Otherwise, don’t waste time, yours or your potential customer’s.

This is something that we saw a lot, because a lot of people ask about features that are currently not practical enough technically, or about features that we don’t have. We learned, the hard way to a certain extent, to say, “This is possible; this is currently not possible from our perspective.”

Henrik:  Where can we find more information about image recognition?

Georgi:  It depends on what you need. Do you need more data for training, more basic knowledge, or a different kind of inspiration about business applications? There are different sources.

Obviously, ImageNet, with all the accompanying information and algorithms built around this pretty nice dataset, is quite useful for researchers. For beginners in image recognition, there is a whole set of Coursera courses.

One of the most notable is from Stanford University, and there are a few more pretty good ones from top European and American universities. There are also newsletters and digests: AI Weekly is pretty good inspiration-wise, with a mixture of research topics, business cases, cool hacks, and ideas about what you can do with image recognition.

Henrik:  Well, thanks, Georgi.

Georgi:  Thanks a lot, Henrik. I hope your audience will enjoy the podcast, including our participation in it.

Henrik:  For more on this, visit Tagging.tech.

Thanks again.


 

For a book about this, visit keywordingnow.com



Tagging.tech interview with Brad Folkens

Tagging.tech presents an audio interview with Brad Folkens about image recognition

 

Listen and subscribe to Tagging.tech on Apple Podcasts, AudioBoom, CastBox, Google Play, RadioPublic or TuneIn.


 

Transcript:

 

Henrik de Gyor:  This is Tagging.tech. I’m Henrik de Gyor. Today I’m speaking with Brad Folkens. Brad, how are you?

Brad Folkens:  Good. How are you doing today?

Henrik:  Great. Brad, who are you and what do you do?

Brad:  My name’s Brad Folkens. I’m the CTO and co-founder of CamFind Inc. We make an app that allows you to take a picture of anything and find out what it is, and an image recognition platform that powers everything, which you can use as an API.

Henrik:  Brad, what are the biggest challenges and successes you’ve seen with image recognition?

Brad:  I think the biggest challenge with image recognition that we have today is truly understanding images. It’s something that computers have really been struggling with for decades in fact.

We saw that with voice before this. Voice was always kind of the promised frontier of the next computer‑human interface. It took many decades until we could actually reach a level of voice understanding. We saw that for the first time with Siri, with Cortana.

Now we’re kind of seeing the same sort of transition with image recognition as well. Image recognition is this technology that we’ve had promised to us for a long time. But it hasn’t quite crossed that threshold into true usefulness. Now we’re starting to see the emergence of true image understanding. I think that’s really where it changes from image recognition being a big challenge to starting to become a success when computers can finally understand the images that we’re sending them.

Henrik:  Brad, as of March 2016, how much of image recognition is done by humans versus machines?

Brad:  That’s a good question. Even in-house, quite a bit of it is actually done by machine now. When we first started out, we had a lot of human-assisted image recognition, I would say. More and more of it now is done by computers. Essentially 100 percent of our image recognition is done by computers now, but we do have some human assistance as well. It really depends on the case.

Internally, what we’re going for is what we call a six-star answer. A five-star answer is something where you take a picture of, say, a cat or a dog, and we know generally what kind of breed it is. A six-star answer is the kind of answer where you take a picture of the same cat and we know exactly what kind of breed it is. If you take a picture of a spider, we know exactly what species that spider is, every time. That’s what we’re going for.
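A common way to mix machine and human recognition as described here is confidence-based routing: confident machine answers go straight through, and everything else falls back to a human reviewer. A minimal sketch (the threshold and labels are illustrative, not CamFind's actual pipeline):

```python
def route_prediction(label, confidence, threshold=0.85):
    """Accept confident machine answers; send the rest to a human reviewer."""
    if confidence >= threshold:
        return ("machine", label)
    return ("human_review", label)

# Hypothetical predictions with model confidence scores:
queue = [("maine_coon_cat", 0.97), ("unknown_spider", 0.41)]
for label, conf in queue:
    print(route_prediction(label, conf))
```

Tuning the threshold trades automation rate against how often a person has to step in, which is why the human share "depends on the case."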

Unsupervised learning is definitely exciting, but I think we’re about 20 to 30 years away from actually seeing unsupervised computer vision and unsupervised deep neural networks finally achieve the promise we expect from them. Until then, supervised deep learning neural networks are going to be around for a long time.

What we’re really excited about is that we’ve found a way to make that work at a scale that customers are actually happy with. The users of CamFind are happy with the kind of results that they’re getting out of it.

Henrik:  As of March 2016, how do you see image recognition changing?

Brad:  We talk a little bit about image understanding. I think where this is really going is to video next. Now that we’ve got some technology out there that understands images, really the next phase of this is moving into video. How can we truly automate and machine the understanding of video? I think that’s really the next big wave of what we’re going to see evolve in terms of image recognition.

Henrik:  What advice would you like to share with people looking into image recognition?

Brad:  I think what we need to focus on specifically is this state-of-the-art technology, not quite new, of deep learning neural networks. As computer scientists, we’ve played around a lot, for decades, with a lot of different machine learning types.

What is really fascinating about deep learning is that it mimics the human brain. It really mimics how we as humans learn about the world around us. I think we need to inspire different ways of playing around with and modeling these neural networks, and training them on larger and larger amounts of real-world data. This is where we’ve really experimented: training these neural networks on real-world data.

What we’ve found is that this is what truly brought about the paradigm shift that we were looking to achieve with deep learning neural networks. It’s really all about how we train them. For a long time, we’ve been experimenting with image recognition, computer vision, these sorts of things. If you look at an apples-to-apples analogy, we’ve been trying to train computers very similarly to how we would learn if we were to shut off all of our senses.

We have all these different senses. We have sight. We have sound. We have smell. We have our emotions. We learn about the world around us through all of these senses combined. That’s what forms these very strong relationships in our memory that really teach us about things.

When you hold a ball in your hand, you see it in three dimensions because you’ve got stereoscopic vision, but you also feel the texture of it. You feel the weight of it. You feel the size. Maybe you smell the rubber or you have an emotional connection to playing with a ball as a child. All of these senses combined create your experience of what you know as a ball plus language and everything else.

Computers, on the other hand, we feed lots of two-dimensional images. It’s like closing one of your eyes and looking at the ball without any other senses at all: no sense of touch, no sense of smell, no sense of sound, no emotional connection, none of those extra senses. It’s almost like flashing your eye for 30 milliseconds at tons of different pictures of the ball and expecting to learn about it.

Of course, this isn’t how we learn about the world around us. We learn about the world through all these different senses and experiences and everything else. This is what we would like to inspire other computer scientists, and those working with image recognition, to really take into account, because this is where we, as a company, have seen the biggest paradigm shift in image understanding and recognition. We really want to push the envelope of the state of the art as a whole. This is where we see it going.

Henrik:  Where can we find more information about image recognition?

Brad:  It’s actually a great question. This is such a buzzword these days, especially in the past couple of years. It sounds almost cheesy, but just typing an image recognition search into Google brings up so much now.

If you’re a programmer, there are a lot of different frameworks you can use to get started with image recognition. One of them is called OpenCV. It’s a bit more of a toolbox for image recognition: it requires a little understanding of programming and of the math and science behind it, but it gives you a lot of tools for basic image recognition.

Then, to play around with some of these other things I was talking about, deep learning neural networks, there are a couple of different frameworks out there. There’s actually a really cool JavaScript website where you can play around with a neural network in real time and see how it learns. It’s a fantastic resource that I like to send people to, to give them an introduction to how neural networks work.

It’s pretty cool. You play with its parameters, and it paints a picture of a cat. It’s all in JavaScript, too, so it’s pretty simple and everything.
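The in-browser playgrounds Brad mentions animate the same training loop at a much larger scale. As a toy flavor of it, here is a single logistic neuron learning the OR function by gradient descent (learning rate and epoch count are arbitrary illustrative choices):

```python
import math
import random

# One logistic neuron: output = sigmoid(w1*x1 + w2*x2 + b).
random.seed(0)
w1, w2, b = random.random(), random.random(), 0.0
data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]  # OR truth table

for _ in range(5000):
    for (x1, x2), target in data:
        out = 1 / (1 + math.exp(-(w1 * x1 + w2 * x2 + b)))
        grad = out - target          # derivative of log loss w.r.t. pre-activation
        w1 -= 0.5 * grad * x1        # gradient-descent weight updates
        w2 -= 0.5 * grad * x2
        b  -= 0.5 * grad

predict = lambda x1, x2: round(1 / (1 + math.exp(-(w1 * x1 + w2 * x2 + b))))
print([predict(x1, x2) for (x1, x2), _ in data])
```

The playground sites do exactly this, just with many neurons and live plots, which is why they are such a good first look at how networks learn.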

There are two frameworks we particularly like to play around with. One is called Caffe, and the other is called Torch. Both are publicly available, open-source projects and frameworks for deep learning neural networks. They’re a great place to play around, learn, and see how these things work.

When people ask about image recognition and deep learning neural networks, those are the sorts of things I like to point them to, because they’re a great introduction and playground to get your feet wet with this type of technology.

Henrik:  Thanks, Brad.

Brad:  Absolutely. Thanks again.

Henrik:  For more on this, visit Tagging.tech.

Thanks again.


 




Tagging.tech interview with Matthew Zeiler

Tagging.tech presents an audio interview with Matthew Zeiler about image recognition

 

Listen and subscribe to Tagging.tech on Apple Podcasts, AudioBoom, CastBox, Google Play, RadioPublic or TuneIn.


 

Transcript:

Henrik de Gyor:  [00:02] This is Tagging.tech. I’m Henrik de Gyor. Today I’m speaking with Matthew Zeiler.

Matthew, how are you?

Matthew Zeiler:  [00:06] Good. How are you?

Henrik:  [00:07] Good. Matthew, who are you, and what do you do?

Matthew:  [00:12] I am the founder and CEO of Clarifai. We are a technology company in New York with technology that lets computers see automatically. You can send us an image or a video, and we’ll tell you exactly what’s in it. That means all the objects, like car, dog, tree, mountain.

[00:32] Even descriptive words like love, romance, and togetherness are understood automatically by our technology. We make this technology available to enterprises and developers through very simple APIs. You can literally send an image with about three lines of code, and we’ll tell you a whole list of objects,

[00:53] as well as how confident we are that those objects appear within the image or video.
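An API call like the one described usually amounts to posting an image reference with an authorization key and reading back tags with confidences. The sketch below only builds such a request without sending it; the endpoint URL, header format, and JSON shape are placeholders, not Clarifai's actual API contract:

```python
import json
import urllib.request

def build_tag_request(image_url, api_key="YOUR_API_KEY"):
    """Build (but do not send) a tagging request for one image URL.

    Everything here is illustrative: a real service defines its own
    endpoint, auth scheme, and payload schema, and requires a valid key.
    """
    payload = json.dumps({"inputs": [{"image": {"url": image_url}}]}).encode()
    return urllib.request.Request(
        "https://api.example.com/v1/predict",   # placeholder endpoint
        data=payload,
        headers={"Authorization": f"Key {api_key}",
                 "Content-Type": "application/json"},
    )

req = build_tag_request("https://example.com/dog.jpg")
print(req.get_method(), req.full_url)   # built, never sent
```

Sending it with `urllib.request.urlopen(req)` and parsing the JSON response would be the remaining "three lines."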

Henrik:  [00:58] Matthew, what are the biggest challenges and successes you’ve seen with image and video recognition?

Matthew:  [01:03] It’s really exciting. We started this company about two years ago, in November 2013, and have scaled it up to now over 30 people. We kicked it off by winning a competition called ImageNet, an international competition held every year, where researchers and the largest companies submit entries, and we won the top five places.

[01:27] That was key to getting recognition, both in the research community and, even more importantly, in the enterprise community. Since then we’ve had a tremendous amount of inbound interest across a wide variety of verticals. We’ve seen the problems in the wedding domain, travel, real estate, asset management, consumer photos, social media.

[01:50] In every possible vertical and domain you can think of that has image or video content, we have paying customers. We’re solving problems that range from organizing the photos in your pocket…we actually launched our own consumer app for this in December [2015], called Forevery, which is really exciting. Anyone with an iPhone can check it out.

[02:10] All the way to media companies being able to tag their content for internal use. The tagging is very broad, to understand every possible aspect of the world, but we can also get really fine-grained, even down to the terms and conditions that you put up for your users who upload content to your products.

[02:33] We can tailor our recognition system to help you moderate that content, and filter out the unwanted content before it reaches your live site. Lots of really exciting applications, and huge successes for both image and video.

[02:48] I think one of the early challenges, when we started two years ago, was really demonstrating the value this technology can provide to an enterprise, and explaining what the technology is. A lot of people have heard about image recognition, or at least heard the phrase, for decades.

[03:06] That’s because it’s been in research for decades. People have been trying to solve this problem of making computers see, and not until very recently has it happened. Now they’re seeing this technology actually work in real applications, not just in the demo at clarifai.com, where you can throw in your own image.

[03:26] You see it happen in real time there, but also in actual products that people use every day, from customers like Vimeo, to improve their video search, or Style Me Pretty, to improve the management of all of their wedding albums, or Trivago, to improve search over hotel listings.

[03:43] When you start seeing these experiences improved, Clarifai is at the forefront of integrating with these leading companies across different verticals. It went from the challenge of educating the community and enterprises about what this technology does to now finding the best ways to integrate it.

Henrik:  [04:03] As of early March 2016, how do you see image and video recognition changing?

Matthew:  [04:09] When I started the company about two years ago, a general model that could recognize 1,000 concepts was pretty much state of the art. That’s what won ImageNet when we kicked off the company. Now we’ve extended that to over 11,000 different concepts that we can recognize, and evolved it to recognize things beyond just objects, like I mentioned.

[04:33] Now you can see these descriptive words, like idyllic, which will bring up beach photos, or scenic, which will bring up nice mountain shots, or nice weather shots, where it’s snowing and there’s snow on the trees. Just beautiful stuff like that. People would describe images in this way, and we’ve taught machines to do the same thing.

[04:56] I think, going forward, you’ll see a lot more of this expansion in the capability of the machine learning technology that we use, and also a whole personalization of it. What we’ve seen with the expansion of concepts is that it’s never going to be enough. You want to give your users the functionality to customize it to the way they talk about the world.

[05:21] There are a few concrete examples here. In stock media, we sit at the upload process of a lot of stock media sites. A pro photographer might upload an image, and they used to have to tag it manually, which is a very slow process. We do it in real time. We give them the ability to remove some tags and add some tags, and then it’s uploaded to the site.

[05:45] What this does for the stock media company is give a much more consistent experience for buyers. If you let different people who don’t know each other, and who grew up with different backgrounds in different parts of the world, all tag their own content, they all talk with different vocabularies.

[06:01] When a buyer comes and talks with their vocabulary, and searches on the site, they get pretty much random results, not the ideal and optimal results. Whereas using Clarifai, you get a consistent view of all of your data, tagged in the same way. It’s much better for the buyer experience as well.
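The consistency benefit Matthew describes can be pictured as normalizing everyone's free-form tags onto one controlled vocabulary, so buyers searching with their own words hit the same terms. A toy sketch (the synonym map is invented for illustration):

```python
# Hypothetical synonym map: free-form contributor tags on the left,
# the controlled vocabulary buyers search with on the right.
SYNONYMS = {"pup": "dog", "doggy": "dog", "automobile": "car", "auto": "car"}

def normalize_tags(tags):
    """Map free-form tags onto one consistent vocabulary, dropping duplicates."""
    seen, out = set(), []
    for tag in tags:
        canon = SYNONYMS.get(tag.lower(), tag.lower())
        if canon not in seen:
            seen.add(canon)
            out.append(canon)
    return out

print(normalize_tags(["Pup", "doggy", "Auto"]))
```

With one vocabulary applied at upload time, two photographers who say "pup" and "doggy" both end up discoverable under the same buyer query.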

[06:19] Another example is in our app Forevery. We’ve baked in some new technology, coming later this year to our enterprise customers, which is the ability to really personalize it to you. This shows up in two different parts of the application. One is around people, where you can actually teach the app your friends and family.

[06:42] The other is around things. You can teach it anything in the world, whether it’s the name of your specific dog, or the Eiffel Tower, or your favorite sports car. You can customize it. It actually trains a model on the phone to be able to predict these things.

[07:01] I think the future of machine learning and image and video recognition is this personalization, because it becomes more emotionally connected to you, and more powerful. It’s the way you speak about the world and see the world. We’re really excited about seeing that evolve.

Henrik:  [07:17] As of March, 2016, how much of the image and video recognition is done by people versus machines?

Matthew:  [07:24] That’s a great question. I don’t know the concrete numbers, but a huge portion of our customers were doing it manually before. We have a few case studies out there, for example, Style Me Pretty. They were doing exactly that. They had users upload a wedding album, which, as you know, might be 1,000 or 2,000 photos from a weekend wedding.

[07:47] They had a moderation team look through all that content and tag it, because ultimately they want other people to come to their site to search and find inspiration. Now we’re allowing Style Me Pretty to upload over 10 times more content onto their site, which ultimately drives more revenue for them,

[08:06] because now they advertise next to this content. They need well-tagged content, so both their users find it interesting and they can match the best ads to it. Now we’re helping them automate that system. We see that over and over again across these verticals. People were doing it manually before.

[08:23] It was very costly and time-consuming. We’re either making that faster or scaling it up by orders of magnitude.

Henrik:  [08:30] Matthew, what advice would you like to share with people looking into image and video recognition?

Matthew:  [08:35] That’s a great question. There are a few alternatives, and we literally just released a blog post yesterday about this. You want to consider a lot of different things when deciding between visual recognition providers, or building the technology in-house. What Clarifai does is take a lot of the pain out of the process.

[08:55] We have experts in-house with PhDs in the field of object recognition: not just myself as CEO, but a whole research team dedicated to pushing this technology forward and applying it to new application areas. That’s the expertise piece. We also have the data piece covered.

[09:15] If you come to us and you want to recognize cars and trees and dogs, you don’t need any labeled data that has those tags already associated with it. We’ve done that process of collecting data, either from the web or from our partners, and we’ve trained a model to recognize these things automatically.

[09:34] This is as broad as possible. We do the job of curating it, so that it’s very high quality and doesn’t have any obscene concepts that you wouldn’t want your users to be exposed to. It’s very nicely packaged for you. Then finally, we take away the need for extensive resources as well.

[09:53] We make it so you don’t need extra or specialized machines; we actually use some very specialized hardware to do this efficiently. You don’t need the time it takes to train these models, which is many weeks, or sometimes months, to get optimal performance. All of that is taken care of. You literally just need three lines of code in order to use Clarifai.

[10:15] Finally, there’s this component of independence that Clarifai has that some other providers don’t. As a small company, we are squarely focused on understanding every image and video to improve life. We want to apply this technology to every possible vertical, and solve every possible problem that we can, without competing with our customers.

[10:38] There are some big entrants in this space that are building divisions within their companies that end up competing with you. If you’re a big enterprise looking for image and video recognition, you have to consider that as well. Basically, do you trust the provider of this technology with your data?

[10:56] Because long‑term, you want to make a partnership that you both benefit from, and don’t have to be afraid of. That’s what Clarifai provides, and we make this very affordable for you, and very simple for you to use.

Henrik:  [11:09] Matthew, where can we find out more information about image and video recognition?

Matthew:  [11:13] I would check out Clarifai’s blog. One of the goals of our marketing department is to educate the world about what visual recognition is: not only how we do it, but how the technology works, and where you can get more resources for it. That’ll be the one-stop shop. The first thing to check out is blog.clarifai.com. We regularly update it with information.

[11:37] There are also a lot of great resources online from the research community, if you really want to dive into the details. What this community has evolved to do is to not wait for conferences or journal publications, but to publish regularly to an open community of publications, so that the latest research is always available.

[12:00] That’s something really unique to the image and video recognition space that we don’t see in other fields of research. Depending on what stage you’re at in understanding this technology, you’ll get high-level details from Clarifai’s blog, and then low-level details all the way from the research community.

Henrik:  [12:16] Well, thanks Matthew.

Matthew:  [12:17] Thank you.

Henrik:  [12:18] For more on this, visit Tagging.tech.

Thanks again.

