Audio, Image and video keywording. By people and machines.


Tagging.tech interview with Martin Wilson

Tagging.tech presents an audio interview with Martin Wilson about image recognition.


Listen and subscribe to Tagging.tech on Apple Podcasts, AudioBoom, CastBox, Google Play, RadioPublic or TuneIn.


Keywording Now: Practical Advice on using Image Recognition and Keywording Services

Now available





Henrik de Gyor:  This is Tagging.tech. I’m Henrik de Gyor. Today I’m speaking with Martin Wilson. Martin, how are you?

Martin Wilson:  I’m very well, thank you. How are you?

Henrik:  Good. Martin, who are you and what do you do?

Martin:  I am a director at Asset Bank. Being a director, I’ve done an awful lot of different things over the years. I have done some development on our product, Asset Bank. I’ve done sales and I’ve done consultancy while rolling out the product.

Just to explain a little bit about what Asset Bank is as a product: it is a digital asset management solution. Digital asset management is often shortened to DAM. A DAM solution helps clients and their users organize the digital assets that almost every organization owns and makes use of nowadays.

By digital asset, we primarily mean files: things like images, videos, documents and so on. A digital asset has an awful lot of value to an organization, and it’s very important that the organization can find its assets easily, doesn’t waste money recreating digital assets it already has, and uses the assets properly, in a way that’s consistent with its brand.

Henrik:  Martin, what are the biggest challenges and successes you’ve seen with image and video recognition?

Martin:  Let me first start by saying that I think image recognition has the potential to have a really big impact on my industry, digital asset management. Digital asset management is all about being able to find images and then use them properly. That’s the purpose of the DAM system. There’s an old adage people use which says that a DAM system is only as good as the metadata associated with the assets. The reason is that if you have a million images in any system, it’s almost impossible to find the image you want without some sort of search and/or browse function. At the moment, those search and browse functions rely on what we call metadata that is associated with the assets. That metadata is things like the title or caption of an image, a description, perhaps some keywords that have been put in, and maybe some information about how the image can be used.

The result of this is that people, humans, spend an awful lot of time entering the metadata associated with digital assets. Usually, within an organization, the processes and workflows associated with using a DAM application involve uploading one or many digital assets, typically images or videos, and then manually entering the data: for example, looking at the image, seeing what it’s about, what the subject is, maybe who’s in it if it’s of people, and then actually typing in that data.

As you can imagine, that takes a lot of time. It’s also considered quite boring by most people. For that reason, it’s often skipped or not done really well. If it’s not done really well, the data associated with the assets is incomplete and therefore it’s very hard for it to turn up in the right searches.

The idea that this process could be automated, with a computer working out what’s in the image and tagging the digital assets appropriately, is enormous. It’s almost like the Holy Grail of the upload process for DAM systems.

There was an awful lot of excitement when, for example, Google came out with its Cloud Vision service. It’s what’s called an API, which enables other applications to make use of the image recognition functionality. A lot of other services have come out in the last couple of years as well; Clarifai is another one.

When they came out, lots of DAM vendors got very excited and rushed to add the functionality to their own applications. We did the same. About a year ago we started a project with the objective of developing a component that would add auto-tagging capabilities to Asset Bank.

Let me describe some of the challenges we found in doing that, and the challenges some of our clients found when we rolled it out to them. One of the challenges, which I suppose is always an umbrella challenge over all of it, is people’s expectations.

Humans are very good at looking at images and working out what’s in them. They’ve also got a lot of domain knowledge. Usually they understand their products, for example. They can look at a product shot and say, “Yeah, that’s product F-567”, or whatever the code is. It’s actually very hard for computers to do that well. That problem hasn’t been solved that well yet.

What we found is that, compared with how humans tag images, the results coming from the auto-tagging software or APIs were, to be frank, not of good enough quality for most cases. That’s really the second specific challenge, then: the quality of the raw results coming back from the software. The visual recognition software was not quite good enough for use in most organizations, especially in a commercial sense.

That’s not to say that it’s not useful; I’ll come on to that in a bit. Moving on to the successes, what we found was that for certain clients who had more generic or general images, the results were much better. We’ve got some clients who are tourist boards. They’ve got images of landscapes and scenery, and most of the image recognition software is quite good at finding the subjects and suggesting keywords for those types of images.

One of the reasons for that is that most of them have been trained on image data sets made up of, for example, images found on the internet. Of course they’re going to be generic. At the other end of the spectrum, where we found it didn’t work that well, were clients that have quite bespoke business or subject domains, such as images of their own product range. It’s very hard for these fairly generic image recognition APIs to come up with the right keywords for those sorts of images.

That’s possibly where there are still gaps, and it might be something we’ll talk about in a minute when we get to the future: the inability of a lot of this tagging software to learn from bespoke data sets.

Henrik:  Martin, as of December 2016, how do you see image and video recognition changing?

Martin:  I think it’s fair to say that it’s in its infancy at the moment. It’s only since it became available through online cloud services or web services that people have found it very easy to start using this technology in their own applications. It’s only in the last couple of years that it has really taken off as something that can be openly or easily used.

Now I think the vendors of this sort of software are learning very quickly from real use cases. I think it’s quite an exciting time for where the commercial or non-commercial application of this software can go. If we first focus a little more on the current problems, that gives some insight into what direction the software might go in.

I was just talking about one of the problems being that it’s very generic at the moment: the tags you get back from the online services are going to be fairly generic. That’s obviously the case if you understand how they work and how they learn. I think very quickly we’re going to see these services offering the ability for you to train them with your own data sets, and I know some already do. That then opens up the application a lot more widely.

One of the things that matters for image recognition, and artificial intelligence in general, is the context in which they’re operating. It’s much easier for image recognition software to work well if it is working within quite a narrow context. For example, if you want the software to recognize your product range, and it’s trained on images of your product range, then the context is only the products within that range. It’s then a lot easier for it to recognize the right products, rather than having to consider every product it has ever seen an image of in the entire world.

Just to reiterate, I think the ability to train the software on bespoke data sets, and for it to concentrate, in effect, on domain-specific subjects, is a must, and that will start to happen.
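The point about narrowing the context can be illustrated with a toy classifier. The Python sketch below is not how QuickTagger or any vendor API works; it is a minimal nearest-centroid classifier, assuming each image has already been turned into a numeric feature vector by some embedding model (not shown). Because it only ever chooses among labels taken from the client’s own examples, a product shot can only be matched to a product in that range. The class name, the feature vectors and the product codes are all hypothetical.

```python
import math


class NearestCentroidTagger:
    """Toy classifier that only chooses among labels it was trained on.

    Hypothetical sketch: feature vectors are assumed to come from some
    image embedding model, which is not shown here.
    """

    def __init__(self):
        self._sums = {}    # label -> per-dimension sum of feature vectors
        self._counts = {}  # label -> number of examples added for that label

    def add_example(self, features, label):
        """Add one labelled example, e.g. a tag a curator confirmed or corrected."""
        if label not in self._sums:
            self._sums[label] = [0.0] * len(features)
            self._counts[label] = 0
        sums = self._sums[label]
        for i, value in enumerate(features):
            sums[i] += value
        self._counts[label] += 1

    def predict(self, features):
        """Return the label whose centroid is closest, or None if untrained."""
        best_label, best_distance = None, math.inf
        for label, sums in self._sums.items():
            count = self._counts[label]
            centroid = [s / count for s in sums]
            distance = math.dist(features, centroid)  # Euclidean distance
            if distance < best_distance:
                best_label, best_distance = label, distance
        return best_label
```

Because `add_example` only updates running sums, a curator’s confirmed or corrected tag can be fed straight back in, which is one simple way a system could keep learning from a client’s own data.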

I think we will see quite a few hybrid solutions. After our investigation into the software, what we ended up doing within Asset Bank, or rather within the component we call QuickTagger that works with Asset Bank, was coming up with a hybrid human and computer interaction model. The tags suggested by the visual recognition software were not just accepted as job done. Instead, they were used to group the images so that a user could very quickly change the tags that weren’t right.

They could, for example, accept some of the tags, because they’re the right tags and the human agreed with the computer in effect, but then they could quite easily change the tags that were wrong. The key thing here is that the grouping was still being done pretty successfully. Although the tags that the image recognition software was suggesting might not be right, it was recognizing that certain images were of the same subject. That therefore meant that a human could go in and say, “Okay, I’ve got 50 images here that are all of a particular, I don’t know, model of car. They’ve all been grouped together, so that makes it really easy for me as a human to now type in the right name of the car or the model of the car.”

I think that’s where we are right now with this technology: it can help facilitate and speed up the human interfaces. That’s quite a powerful idea, I think, and it will continue, so we’ll see an evolution of that. I think we’re quite a long way off just being able to say, “Okay, you get on with it, computer. Tag these up.” I think we’re going to see improvements and an evolution of the idea of humans and computers working together in this auto-tagging sphere.
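The hybrid model described above, where suggestions are used to group images rather than accepted as final, could be sketched roughly as follows. This is an illustrative Python sketch, not QuickTagger’s implementation: it assumes each image comes back from a tagging service with (tag, confidence) pairs, groups images by their highest-confidence suggestion, and then lets a human accept or replace that tag for a whole group in one step. All function names and data are hypothetical.

```python
from collections import defaultdict


def group_by_top_tag(suggestions):
    """Group image IDs by their highest-confidence suggested tag.

    `suggestions` maps image_id -> list of (tag, confidence) pairs.
    Images with no suggestions are grouped under None for manual tagging.
    """
    groups = defaultdict(list)
    for image_id, tags in suggestions.items():
        if not tags:
            groups[None].append(image_id)
            continue
        top_tag, _confidence = max(tags, key=lambda pair: pair[1])
        groups[top_tag].append(image_id)
    return dict(groups)


def apply_group_decision(groups, suggested_tag, final_tag, assigned):
    """Apply one human decision to every image in a group.

    If final_tag == suggested_tag the human accepted the suggestion;
    otherwise it is a correction. Either way, every image in the group
    receives final_tag in `assigned` (image_id -> set of tags).
    """
    for image_id in groups.get(suggested_tag, []):
        assigned.setdefault(image_id, set()).add(final_tag)
    return assigned
```

Even if the suggested tag is only “car”, correcting that group to the right model name is then a single action for every image in it, rather than one edit per image.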

Henrik:  Martin, what advice would you like to share with people looking at image and video recognition?

Martin:  The first thing I would say is about expectations management. If you are used to having tags generated by humans who know what they’re doing and understand the subject domain of the images they’re tagging, I would say you are likely to be fairly disappointed with the results in most cases.

That’s one thing. See beyond the raw results you’re getting back from the tagging software, and look at how you might still use the tags to your advantage, for example in hybrid solutions.

Consider what subject matter you’ve got, what your images are actually of, and tailor your expectations accordingly. If you’ve got a lot of images of fairly generic subjects, you might find a lot of value in the auto tags. If you’ve got quite specific subjects, be prepared to potentially be a bit disappointed and/or to put in quite a lot of work, either to start training some of the software you’re using or to look at how you can augment the results with human interactions.

Another bit of advice is to shop around. Have a look at the different services that are available; they’re fairly different. We built QuickTagger in such a way that we can plug in the different services, so we could simply change it to work with Google Cloud Vision or with Clarifai, and there are ten other potential candidates I could list off the top of my head, and probably more out there. They give different results, and some of them are better for different applications and different subjects. It’s usually very simple to get a free trial and try out the software. That would be my last bit of advice: shop around the auto-tagging technologies that are available.
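One way to build in the pluggability mentioned above, and to shop around using your own images, is to put each vendor behind a common interface and score every candidate against a small sample your own staff have tagged. The Python sketch below assumes that design; `TaggingService`, `StubService`, `precision_recall` and the canned tags are hypothetical, and a real adapter would wrap a vendor’s client library (not shown here). Precision is the share of suggested tags a human agreed with; recall is the share of human tags the service found.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Set, Tuple


class TaggingService(ABC):
    """Hypothetical plug-in interface: one adapter per tagging vendor."""

    name = "base"

    @abstractmethod
    def suggest_tags(self, image_id: str) -> Set[str]:
        """Return the tags the service suggests for one image."""


class StubService(TaggingService):
    """Offline stand-in for a real adapter; returns canned tags."""

    def __init__(self, name: str, canned: Dict[str, List[str]]):
        self.name = name
        self._canned = canned

    def suggest_tags(self, image_id: str) -> Set[str]:
        return set(self._canned.get(image_id, []))


def precision_recall(service: TaggingService,
                     human_tags: Dict[str, Set[str]]) -> Tuple[float, float]:
    """Score a service's suggestions against a human-tagged sample."""
    true_pos = false_pos = false_neg = 0
    for image_id, truth in human_tags.items():
        predicted = service.suggest_tags(image_id)
        true_pos += len(predicted & truth)    # suggested and agreed by a human
        false_pos += len(predicted - truth)   # suggested but wrong
        false_neg += len(truth - predicted)   # human tag the service missed
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    return precision, recall
```

Swapping vendors then means writing one new adapter class; the scoring loop and the rest of the workflow stay the same.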

Henrik:  Martin, where can we find out more information?

Martin:  More information on our product Asset Bank is available on our website, www.assetbank.co.uk. If you’re particularly interested in the experiments we’ve done and the component we’ve got for Asset Bank, QuickTagger, then just fill in our contact form and express that interest. I would personally be very happy to talk to people about what we found.

There’s some information on our website about QuickTagger as well. If you’re interested in finding out about the different technologies available for you to use within your own application, there’s a lot. Personally, I would now recommend the cloud-based ones, because it’s much easier to get up and running with those. There’s quite a lot of information: if you just type in ‘image recognition software’ or ‘image recognition APIs’, you’ll see there are quite a few good articles that people have put together, on Quora and so on, that have done the research for you. Use that as a starting point because, as I say, things change all the time and new APIs come out. Do your research, but there is a lot of information available on the internet about this.

Henrik:  Thanks, Martin.

Martin:  You’re welcome.

Henrik:  For more on this, visit Tagging.tech.

Thanks again.


For a book about this, visit keywordingnow.com



Tagging.tech interview with Jonas Dahl

Tagging.tech presents an audio interview with Jonas Dahl about image recognition.






Henrik de Gyor:  This is Tagging.tech. I’m Henrik de Gyor. Today I’m speaking with Jonas Dahl. Jonas, how are you?
Jonas Dahl:  Good. How are you?

Henrik:  Good. Jonas, who are you and what do you do?

Jonas:  Yeah, so I’m a product manager with Adobe Experience Manager. And I primarily look after our machine learning and big data features across all AEM products, so basically working with deep learning, graph-based methods, NLP, etc.

Henrik:  Jonas, what are the biggest challenges and successes you’ve seen with image recognition?

Jonas:  Yes. Well, deep learning is basically what happened; it’s what defines before and after. Basically, in 2012 there was a confluence of three things. There’s the data piece, primarily enabled by the Internet: large amounts of well-labeled images that could drive these huge deep learning networks. There’s the deep learning technology. And, obviously, there’s the availability of raw computing power. So that’s basically what happened. With that we saw accuracy increase tremendously, and now it’s basically rivaling human performance, right? So both the accuracy and the breadth of labeling and classification you can do have increased and improved tremendously in the last few years.

In terms of challenges, I really see this as a path, and the first step is generic tagging of images, right? So what’s in an image? Are there people in it? What are the emotions? Stuff like that, which is pretty generic. That’s the era we’re in right now, where we see a lot of success and where we can really automate these tedious tagging tasks at scale pretty convincingly.

I think the challenge right now is to move to the next step, which is to personalize these tags: to provide tags that are relevant not just to anyone but to your particular company. For example, if you’re a car manufacturer, you may want to be able to classify different car models. If you’re a retailer, you may want to be able to do fine-grained classification of different products. So that’s the big challenge I see now, and that’s definitely where we are headed and what we’re focusing on in all apps.

Henrik:  And, as of November 2016, how do you see image recognition changing?

Jonas:  Well, really where I see it changing is, as I said, it’s going to be more specific to the individual customer’s assets. It’s going to be able to learn from your guidance. So, basically, how it works now is that you have a large repository of already-tagged images, then you train networks to do classification. What’s going to happen is that we’re going to add a piece that makes this much more personalized, much more relevant to you, and where the system learns from your existing metadata and your guidance, basically, as you curate the proposed tags.

Another thing I see is that video is going to be more important. Video has a temporal component, which makes segmentation important, and that’s how it differs from images. So there’s that, and also the much larger scale we’re looking at in terms of processing and storage when we’re talking about video. Basically, video is just a series of images, so when we develop technologies to handle images, those can be transferred to video as well.
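As a rough illustration of why segmentation matters once video is treated as a series of images, the Python sketch below splits frames into shots and tags one representative frame per shot, so an image tagger runs once per shot rather than once per frame. It uses a simple, classic heuristic (comparing grayscale histograms of consecutive frames) and is not Adobe’s method. Frames are assumed to be non-empty flat lists of 0-255 grayscale values, the threshold is arbitrary, and `tag_image` is a hypothetical stand-in for any image tagging service.

```python
def histogram(frame, bins=8):
    """Normalized histogram of a non-empty frame of 0-255 grayscale values."""
    counts = [0] * bins
    for value in frame:
        counts[min(value * bins // 256, bins - 1)] += 1
    return [count / len(frame) for count in counts]


def segment_shots(frames, threshold=0.5):
    """Split frames into shots wherever consecutive histograms differ sharply.

    Returns (start, end) index pairs with `end` exclusive. The threshold is an
    arbitrary illustrative value, not a recommended setting.
    """
    if not frames:
        return []
    boundaries = [0]
    previous = histogram(frames[0])
    for i in range(1, len(frames)):
        current = histogram(frames[i])
        # Total variation distance between the two histograms, in [0, 1].
        distance = 0.5 * sum(abs(a - b) for a, b in zip(previous, current))
        if distance > threshold:
            boundaries.append(i)
        previous = current
    boundaries.append(len(frames))
    return list(zip(boundaries[:-1], boundaries[1:]))


def tag_video(frames, tag_image, threshold=0.5):
    """Tag each shot once, using its middle frame as the representative."""
    return [
        ((start, end), tag_image(frames[(start + end) // 2]))
        for start, end in segment_shots(frames, threshold)
    ]
```

Real shot detection has to cope with fades, camera motion and lighting changes, which a single histogram threshold handles poorly; the sketch only shows the shape of the idea.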

Henrik:  Jonas, what advice would you like to share with people looking at image recognition?

Jonas:  Well, I would say start using it. Start doing small POCs [proofs of concept] to get a sense of how well it works for your use case, define small challenges and small successes you want to achieve, and just get into it. This is something that is evolving really fast these days, so if you get in and see how it performs now, you’ll be able to provide valuable feedback to companies like Adobe, and you can basically impact the direction this is going in. It’s something we value a lot. It’s really valuable to us when, in beta programs for instance, people come to us and say, “You know, this is where this worked really well. These are the concrete examples where it didn’t work that well,” or, “These are specific use cases that we really wish this technology could solve for us.”

So now is a really good time to get in there and see how well it works. And also, I’d say, just stay on top of it. Stay in touch because, as I said, this evolves so fast that you may try it today and then a year from now things can look completely different, and things can have improved tremendously.

So that’s my advice. Now is a good time. I think the technologies have matured enough that you can get real solid value out of them. So this is a good time to see what can these technologies do for you.

Henrik:  Jonas, where can we find more information?

Jonas:  Yeah, so at Adobe we just launched what we call Adobe Sensei, which is the collection of all the AI and machine learning efforts we have at Adobe. Just Google that and go to the website, which will be updated with all the exciting things we are doing in that space. I would recommend that you keep an eye on it, because that’s something that’s really going to evolve over the next few years.

Henrik:  Great. Well, thanks, Jonas.

Jonas:  Yeah, you’re welcome.

Henrik:  For more on this, visit Tagging.tech.

Thanks again.



Tagging.tech interview with Ramzi Rizk

Tagging.tech presents an audio interview with Ramzi Rizk.








Henrik:  This is Tagging.tech. I’m Henrik de Gyor. Today, I’m speaking with Ramzi Rizk. Ramzi, how are you?

Ramzi:  Hey Henrik, how are you? I’m good thanks.

Henrik:  Great. Ramzi, who are you and what do you do?

Ramzi:  I’m one of the founders and the CTO of a company called EyeEm.com. We’re a photography company based out of Berlin, and we’ve been around for five and a half years now. We’re a community and marketplace for authentic imagery: basically, photos taken by average people who have a passion for photography but aren’t necessarily professionals. Over the past few years, we’ve invested a lot and built quite a few technologies around understanding the content, context and aesthetic qualities of images.

Henrik:  Great. What are the biggest challenges and successes with image recognition?

Ramzi:  I think over the past few years there’s been an amazing explosion in the number of tools available, particularly out of deep learning, that can automate a big part of the photographer’s workflow, if you want. That includes, of course, recognizing what is in a photo, as well as what the quality of the photo is, and making photos that much easier to find, to search and to share. Naturally, I think the greatest success has been that we’re now at a point where we can describe the content of a photo with, I would say, better than human accuracy. A lot of the challenges have to do with data. Deep learning is a very data-heavy field: you need a lot of properly labeled, properly tagged content in order to train these machines to recognize what’s in the images.

Over the past few years things have gotten more and more accurate, to the point where in a lot of cases machines are actually more accurate than humans at recognizing the various details in a photo. That being said, we as humans do have this innate ability to understand context and to draw out the more subtle, abstract notions of what an image is trying to convey, and that is definitely significantly more challenging to model in a machine.

Henrik:  As of October 2016, how do you see image recognition changing?

Ramzi:  I think we’re getting to a point where the pure art of recognizing what is in a photo has become, I would say, a commodity. In the next six months to a year, you should be able to just license a variety of APIs. Google has an API out, so do we, and so do a few other companies that specialize in understanding the content of a photo. Think about image recognition in the classical sense, as we understand it: 10 years ago we were talking about how amazing it was that we could recognize cats in videos. I think that challenge is solved, and since it’s now a solved problem, we are seeing, and will keep seeing, a lot of applications built on top of it that previously weren’t really possible.

That also includes the ability to run these so-called models, these algorithms, on your device, on your phone, even in real time, without having to upload content to the cloud. This means we’re at a point where, while you’re taking a photo, you can actually get real-time feedback on the quality of the image and on whether the photo you’re taking is aesthetically appealing, and the minute you shoot it, your phone has already stored all of the content of that photo, making it searchable right away.

Henrik:  Ramzi, what advice would you like to share with people looking into image recognition?

Ramzi:  To people looking into building image recognition solutions, I would recommend not doing that anymore because, as I said, the problem is solved. You don’t reinvent email; you build services on top of it. I think today we’re at a point where you can build a lot of really exciting, interesting services on top of existing image recognition frameworks and existing APIs that offer this out of the box. For people looking at using it, I think this is the perfect time to start building these applications, because the technology is mature enough, it’s more than affordable, and anyone can really build software on the assumption that the machine understands what is in the photo.

Henrik:  Where can we find out more information?

Ramzi:  I would definitely have to pitch eyeem.com/tech if you’re interested in applied image recognition. We offer an API with which you can keyword your entire content, your entire image library, whether you’re a photography professional or an amateur. You can also have it caption images, describing them in a full sentence. Even more interesting are machines that have now learned to understand your personal taste. They can surface content that you know you will like, that your customers will like or that your significant other would like, which simplifies the entire process by taking the monotonous, boring work out of photography, out of the photographer’s workflow.

As a photographer, you can just focus on the art of creation and on capturing that perfect moment. I think there are a bunch of other services, like Google Cloud Vision and so on, that you can also look at to learn more about what you can do with imagery today.

Henrik:  Thanks Ramzi.

Ramzi:  Thank you, Henrik. Pleasure speaking to you.

Henrik:  For more on this, visit Tagging.tech.

Thanks again.

