Henrik: This is Tagging.tech. I’m Henrik de Gyor. Today, I’m speaking with Ramzi Rizk. Ramzi, how are you?
Ramzi: Hey Henrik, how are you? I’m good thanks.
Henrik: Great. Ramzi, who are you and what do you do?
Ramzi: I’m one of the founders and the CTO at a company called EyeEm.com. Based out of Berlin, we’re a photography company that has been around for 5 and a half years now, a community and marketplace for authentic imagery. Basically, photos taken by average people who have a passion for photography, but aren’t necessarily professionals. Over the past few years, we’ve invested a lot and built quite a few technologies around understanding the content, context, and aesthetic qualities of images.
Henrik: Great. What are the biggest challenges and successes with image recognition?
Ramzi: I think over the past few years there’s been an amazing explosion in the number of tools, particularly out of deep learning, that are available to actually automate a big part of the photographer’s workflow, if you want. That includes, of course, recognizing what is in a photo, as well as what the quality of the photo is, and making photos just that much easier to find, to search and to share. I think the greatest successes have been naturally the fact that we’re at a point now where we can, with better than human accuracy, I would say, describe the content of a photo. A lot of the challenges would have to be around data. Deep learning is a very data-heavy field, in that you need a lot of content that is properly labeled, properly tagged, in order to train these machines to recognize what’s in the images.
Over the past few years, things have gotten more and more accurate to the point where, in a lot of cases, machines are actually more accurate than humans at recognizing the various details in a photo. That being said, we as humans do have this innate ability to understand context and to draw out the more subtle, abstract notions of what an image is trying to convey, and that is definitely significantly more challenging to model in a machine.
Henrik: As of October 2016, how do you see image recognition changing?
Ramzi: I think we’re getting to a point where the pure art of recognizing what is in a photo has become a commodity, I would say. In the next 6 months to a year, you should be able to just license a variety of APIs. Google has an API out, so do we, and so do a few other companies that specialize in understanding the content of a photo. That is image recognition in the classical sense, how we understand it. When you think that 10 years ago we were talking about how amazing it is that we can now recognize cats in videos, I think that challenge is one that is solved, and since it’s now a solved problem, we will be seeing, and we are seeing, a lot of applications built on top of this, doing things that were previously not possible.
That also includes having the ability to run these so-called models, these algorithms, on your device, on your phone, even in real time, without having to upload content to the cloud. Which means we’re at a point now where while you’re taking a photo, you can actually be getting real-time feedback on the quality of the image, on whether the photo that you’re taking is actually aesthetically appealing, and the minute you shoot it, your phone has already stored all of the content of that photo, making it searchable right away.
Henrik: Ramzi, what advice would you like to share with people, looking into image recognition?
Ramzi: People looking into building image recognition solutions, I would recommend not to anymore, because as I said, the problem is solved. You don’t reinvent email, you build services on top of it, and I think today you’re at a point where you can build a lot of really exciting, interesting services on top of existing image recognition frameworks and existing APIs that offer this out of the box. For people looking at using it, I think this is the perfect time to actually start building these applications because technology is mature enough, it’s more than affordable, and it’s at a point where anyone can really build software, with the assumption that they understand what is in the photo.
Henrik: Where can we find out more information?
Ramzi: I would definitely have to pitch eyeem.com/tech, if you’re interested in looking at applied image recognition. We offer an API where you can actually keyword your entire content, your entire image library, for photography professionals or for amateurs. You can also have it caption images, or have images described in a full sentence. Even more interesting is machines that have now learned to understand your personal taste. They can actually surface content that you know you will like, or surface content that you know your customers will like or that your significant other would like, and then just simplify that entire process by really taking the monotonous, boring work out of photography, out of the photographer’s workflow.
As a photographer, you can just focus on the art of creation and on capturing that perfect moment. I think there’s a bunch of other services like Google Cloud Vision and so on, that you can also look at and learn more about what you can do with imagery today.
Henrik: Thanks Ramzi.
Ramzi: Thank you, Henrik. Pleasure speaking to you.
Clemency Wright: Hi. I’m good, thanks Henrik. How are you?
Henrik: Good. Clemency, who are you and what do you do?
Clemency: I’m Clemency Wright. I’m the Owner and Director of Clemency Wright Consulting, which is a UK‑based business and we specialize in providing bespoke keywording services and metadata consultancy, primarily for the creative media industries.
We work with stock photo libraries. We also work with specialist image collections. We work with book publishers and a small number of online retailers. We do some collaborative work with software developers and technical consultants on various projects.
The purpose of our work, mainly, is to help our clients organize their digital assets. These could be visual or text‑based. The idea here is to make the assets found more quickly and more easily by their end users.
Initially, my role in this field was working within the stock photo library, in search data and search vocabulary for a major global stock photo library based in London.
From here, I’ve worked with specialist collections, where the nature of keywording is very different, and also in the museum and heritage sector; again, working with data in a very different format on a digitization process. The experience across those different fields is quite different when you look at it from a keywording perspective.
Just to clarify now, I’m a consultant for various businesses. This is really key, as the proliferation of visual media continues to grow. We’re very closely looking at the way we handle digital content, how we make sense of that digital content, how we make the information relevant, and more available to more people.
It has huge potential for our customers and for their end users, in terms of improving the search experience and the access to these assets. I think that pretty much summarizes where we are at the minute, in terms of who we work with, and what we provide for those people.
Henrik: What are the biggest challenges and successes you’ve seen with keywording services?
Clemency: One of the biggest challenges really is the perception that keywording is pretty much the same as tagging. Obviously with the rise of SEO, we’ve got some confusion here about what keywording is. We started keywording many years ago.
Obviously within librarianship and archival work, people were keywording as a way to retrieve information, which is still what we do, but I think the challenge here is breaking down these perceptions that it’s always a very basic way of tagging content.
We’re trying to differentiate between keywording which is, on its basic level, adding words that define an image or the content of an image, and high performance keywording which is very much a user‑focused exercise.
It’s a very 360‑degree look at the life cycle of the image and how that image will be ultimately consumed and licensed for use in the broader digital environment.
One of the challenges is highlighting the value of a high quality, high performance keywording project to the customers, and also their end users and the various stakeholders therein.
I think working with specialist collections can be quite challenging. We have to create bespoke keywording hierarchies and controlled vocabularies for these clients, which obviously makes the content much more accessible. The performance of that is much greater, but it can be challenging. It can be quite time‑consuming.
There’s a level of education that we need to have with our clients, to illustrate to them and demonstrate to them the return on investment that can be had from a good keywording methodology. By the methodology, I just wanted to define that, which links to the challenges that we have to do with technology and the extent to which we use controlled vocabulary systems and software, and the hierarchies that we build for our clients.
They help to define the depth to which we can classify content, and also, the breadth of that content. The content may be video footage, or it may be photography. It may be illustration.
Obviously, a challenge there is creating a vocabulary or a taxonomy that will cater for an ever‑increasing collection, one that is growing and evolving as businesses themselves incorporate new content into their collections.
Technology is a challenge, but it’s also a great facilitator in the work that we do. It allows us to embed a level of accuracy and consistency to the work that we do for our clients.
When you’ve got measures in place, and you’re creating controlled vocabularies and hierarchies, you’ve got systems there that make sure the right vocabulary is being applied, and it’s being applied consistently and accurately. There’s a level of support that the technology can offer, as well as it having its own challenges.
Perhaps on a more general level, keywording has been tarnished somewhat by some multi‑service agencies which are offering keywording as a bit of a sideline.
Perhaps their core business may be software or systems development or post‑production, but then, by offering keywording as an offshoot, some clients are going down that road and then discovering later on that actually, the keywording side of that was a bit of an afterthought. I think the methodologies and strategies in place have failed some of the clients that we work with, at any rate.
There’s a challenge there for us to make sure that we can differentiate between a specialist keywording provider and an agency that offers keywording as an add‑on to their core business.
I think another challenge that is worth mentioning is the idea of offshoring keywording to agencies where perhaps the quality is compromised, and this is what I hear from clients. The feedback on some projects has been that there’s been a lack of understanding, due to language barriers mainly, but also cultural understanding of visual content.
It can be quite difficult, across the continents, for people to read and interpret visuals in the way that your market may perhaps be consuming those visuals. There’s a challenge in, again, educating people into the options and the various consequences of using these various agencies.
Henrik: Clemency, as of March, 2016, how much of the keywording work is completed by people versus machines?
Clemency: We know that there is a lot of work being done in auto‑tagging and systems that will automatically add keywords that are relevant to the content. In my business, we define automated systems and keywording in a much more specific way.
We use it to automate the addition of, say, synonyms, or to automatically translate keywords, or to automatically add hierarchical keywords, but I think, Henrik, what you’re asking really is about the image recognition technology, which is something we’re clearly aware of and we have been for some years now.
Image recognition is not something that we currently engage with or consult on. It’s in its infancy, and it will be very exciting to follow these developments, but for now it’s quite limited to reading data in a very simple form.
For example, color and shape, and to some extent, say, for example, number of people within an image is something image recognition technology can do, but I think there is quite a lot of documentation to support the idea that it’s very difficult for a machine to understand the sentiment behind an image, the concept, or the emotion.
I was thinking of a good example of an image of a person smiling. I’m not sure, I’m not convinced, the extent to which a machine could determine whether that smile is one of happiness or one of sarcasm, for example.
A person looking at an image will make a certain assumption about that smile. Maybe it is subjective, but I think it’s just something that’s perhaps a little bit too advanced for machines at the minute, to be able to read the emotional side of visual content, which is really the field that I’m most interested in, most active in.
I think the technology will improve, but underpinning that, it really depends on who will be responsible for managing the architecture and the taxonomy, and maintaining that, and editing it, and developing it, because of course, we need people to put the intelligence into the structure behind the technology.
Although we can increase efficiency, and that’s great, and we need to increase efficiency and reduce costs and increase productivity, I think there’ll be a lot of management required and people involved in making sure that the technology is delivering consistently relevant results, and testing, and testing, and testing to see that this is how it’s happening.
But, as I say, we question the extent ultimately to which machines can interpret the more conceptual content and the visual content that we work with primarily, because visual media is always open to interpretation.
It’s a subjective form that perhaps machines will go so far, in terms of classifying basic content, which will be very, very helpful, and it certainly will help speed up the processes for people like us, but I think for the user we have to be mindful that relevance is really the most critical element of this whole process.
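The narrower kind of automation Clemency describes earlier in this answer (automatically adding synonyms and hierarchical keywords from a controlled vocabulary) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the tiny `PARENTS` and `SYNONYMS` tables and the `expand_keywords` function are hypothetical and do not represent any system used by Clemency Wright Consulting.

```python
# Hypothetical controlled vocabulary (illustrative only).
# Each term maps to its broader (parent) term; the hierarchy is assumed to be a tree.
PARENTS = {
    "labrador": "dog",
    "dog": "mammal",
    "mammal": "animal",
}

# Hypothetical synonym table for the same vocabulary.
SYNONYMS = {
    "dog": ["canine"],
    "labrador": ["labrador retriever"],
}


def expand_keywords(keywords):
    """Return the keywords plus their synonyms and all broader terms, without duplicates."""
    expanded = []
    seen = set()

    def add(term):
        # Keep first-seen order so the most specific terms stay at the front.
        if term not in seen:
            seen.add(term)
            expanded.append(term)

    for keyword in keywords:
        term = keyword.strip().lower()
        # Walk up the hierarchy until a term has no parent.
        while term is not None:
            add(term)
            for synonym in SYNONYMS.get(term, []):
                add(synonym)
            term = PARENTS.get(term)
    return expanded


print(expand_keywords(["Labrador"]))
```

A keyworder would still choose the most specific term by hand; the vocabulary only guarantees that the broader terms and synonyms are applied consistently.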
Henrik: What advice would you like to share with people looking into keywording services?
Clemency: I’ve been working with keywording for 14 years, and it’s a really varied and rich resource for anybody who’s interested in looking into keywording services.
I have a few ideas here, which are from my experience working with clients and from gathering feedback from clients, but I think the advice would be generally that there is no quick fix. Keywording isn’t something that you can pull out of a box. There’s no standard as such.
Even though we’re told there is a standard, the stock libraries that set standards are having to change those constantly because the distribution networks are changing and the media types are changing.
Be prepared for it to be a fluid project. If you start engaging with a keywording service, it will probably evolve over time. It will change over time, and that’s a good thing.
You need to be prepared to talk quite a lot about your business goals and objectives, perhaps more than you think. A good keywording agency will want to know a lot about your market, about your channels, your network, your distribution.
They won’t want just to see the content, because if they just see the content and they just add keywords, there’s a lack of connection from a marketing and a sales perspective. It’s very important for the keywording agency to understand your business and the context within which your business sits in the bigger picture.
Be prepared to be asked quite a lot of questions before you start engaging with a keywording provider.
The other main thing is to be wary, perhaps, of agencies that seem more focused on volumes and deadlines than on quality. I alluded to that earlier on, with some of the options to offshore your work.
This can be a bit of a false economy. It can be, in the long run, more expensive to focus on volumes and timeframes. Quality’s always a good groundwork to base your keywording projects on.
Also, I’d advise people to work with someone who’s a communicator, someone who’s going to uncover the problem and really spend time and effort in solving that problem. They’ll want to see samples of your assets before they start giving you prices.
I think that’s a really important conversation to have. It’s really important to have good communication with your provider and also a good level of trust, so I’d advise you to find out who they’ve worked with and if possible try to speak to their clients, who they have worked with.
Another great idea would be to speak to picture researchers, because they use keywords day in and day out. They’re on stock photo websites, publishing, advertising, and design agencies.
People that use picture researchers and picture buyers would be a really great source of information, just to ask them what their experience is working with various providers of the content, because then from there you can track who has been investing well in good keywording, and what that means, and where the value is in that.
Most of the software that you look at will not do everything that you need it to do, and I think that’s another important thing to bear in mind from a technological standpoint, is systems are great and you’d do well to consult with someone who knows a lot about different systems.
But ultimately it’s best to configure a system that’s bespoke for your needs, so perhaps maybe investing a little bit more time than you first anticipated in researching systems that will be fit for your purpose and give your clients the best experience as a user.
Henrik: Great. Where can we find more information about keywording services?
Clemency: There’s various resources online. There are some really interesting blogs. We can put links in here for you for your readers, if they’re interested. One great independent resource, which I think is fantastic for all industry news in general, is Photo Archive News, which is a news aggregation. They list services and providers that you might want to contact and speak to.
You’ll also find information about keywording services on stock library websites. For example, Alamy has a list of resources, and there are marketing services such as Bikinilists listing various resources available to the industry, but also mentioning keywording agencies that you might be able to work with across the globe. There are keywording agencies based in the US. There are agencies in New Zealand and across Europe.
I think, just to go back on the conversation previously, there’s a lot of research to be done. It does take a little bit of time, but I think when you find an agency that really understands what you’re looking for then you’ve got that conversation to have with them about what you’re specifically looking to achieve.
Henrik: Thanks, Clemency.
Clemency: Yes, thanks, Henrik. I hope it’s been a useful insight into the world of keywording.
Georgi Kadrev: Hi, Henrik. All good. I am quite enthusiastic to participate in the podcast.
Henrik: Georgi, who are you and what do you do?
Georgi: I’m Co‑Founder and CEO of Imagga, which is one of the pretty good platforms for image recognition as a service. We have auto‑tagging and auto‑categorization services that you can use for practical use cases.
Henrik: Georgi, what are the biggest challenges and successes you’ve seen with image recognition?
Georgi: In terms of challenges, I think, one of the biggest ones is that we, as human beings, as people, are used to perceiving a lot of our world through our eyes. Basically, when people think in general about image recognition, they have a very diverse and a very complete picture of what it should do.
Let’s say from optical character recognition or recognizing texts, to facial recognition of a particular person, to conceptual tagging, to categorization, all these different kinds of aspects of visual perception.
People typically have expectations that it’s the same technology or the same solution, but actually, quite a lot of different approaches need to be engaged in the actual process of recognizing and understanding the semantics of the image.
In terms of successes in addressing this, I can say that, not surprisingly, deep learning, which has been quite a big hype in the last few years, has been a huge success in the more conceptual or class‑level object recognition. That is, recognizing what type of object something is.
Is it a bottle? Is it a dog? Is it a cat? Is it a computer? Is it a mobile phone? And so on. This has become pretty practical, and right now we can say that we are close to human level in recognition of a lot of different classes of objects.
At the same time, in some other spaces, like logo recognition, like facial recognition, we also see quite high precision rates that allow for practical applications.
I can say one of the good things is that we are getting closer and closer to automating at least part of the tasks that need to be performed by a computer, replacing the need for manual annotation of photos for different use cases.
In terms of challenges, maybe I would also add that you still need a lot of data, properly annotated data. Machine learning, and deep learning in general, is very data‑greedy, so we need an enormous amount of samples to really make something robust enough and practical enough.
We still see that gathering a high‑quality dataset is one of the challenges. This is something that we also try to address internally. It helps us be more competitive in terms of quality and the technology.
Henrik: As of March 2016, how do you see image recognition changing?
Georgi: What we definitely see is that there are more and more services, some of them of pretty good quality, that try to automate the different aspects of image recognition that I briefly touched on.
We see even big players like Google starting to offer services for some of those things like what they call label recognition or what we call tagging, what they call optical character recognition or most of the vendors call it that way.
We also have seen logo and facial recognition being quite popular and being utilized more and more in different kinds of brand monitoring services.
At the same time, on the downside of visual recognition, something that we see when we talk about highly artistic images or some more specific art or other types of specific content is that the technology still needs to be custom‑trained for that.
That is, if it is possible at all to train a classification‑based image recognition system to recognize different kinds of artistic images or different kinds of very specialized image content.
It’s related to what I mentioned in the beginning, that if you have a specific task, sometimes you need a specific approach. Deep learning to a certain extent has addressed this, but still it’s not a one-size-fits-all solution. We see that in a lot of cases the customers need to define a specific problem so that they can have a very good and precise specific solution.
Henrik: As of March 2016, how much of image recognition is completed by humans versus machines?
Georgi: I would say, [laughs] it honestly depends on the task. We’ve seen some cases where machines can be better than humans, and not just in theory, in practice.
For example, if we train a custom classifier with a human‑curated data set, and then we do some kind of testing or validation, we see that the things that are reported as errors in the learning process can actually be errors by the people.
The person was mistaken when annotating the photo, so the machine’s answer is then falsely reported as an error, although it’s correct. In a way, this is promising because it shows that the automation and consistency that machines can offer is pretty good in terms of precision.
At the same time, there are tasks where you need a lot of explicit or implicit knowledge in order to resolve an automation task. A lot of the background knowledge that people have is not available to the machine, and then you need to figure out a way to either automate this or use a combination of a computer and a human, or you can decide to treat this as a fully human task.
Such a task is still not approachable by a purely technical approach. I cannot give an exact number. Something interesting that I can share is a statistic. We did a pretty interesting experiment called Clash of Tags. We have a data set of stock photography. This stock photography has an existing set of tags provided by various people, like the stock photographers themselves.
Then we also have the same set of images of stock photos that are annotated using current technology, completely blindly from the original tags that people have put for the image. Then, we do this thing, we ask people, “Type a keyword and then you get search results.”
Which set of results is on the left‑hand side and which is on the right‑hand side is not known in advance, but one set of results is based on the tags that people have put, and the other set of results is based on the tags that our API has generated and assigned to the images.
The user needs to pick which is the winning set. In a lot of cases, I can say in roughly 45 percent of the cases, people have chosen the result set based on automatically generated tags as better than the set of results based on human‑provided tags. It’s not more than 50, but it still means that in a lot of cases the machine has been superior to the human performance.
I believe this number will grow in the future. I can say there’s still a way to go to something like complete automation, but we are getting closer and closer and we’re enthusiastic about it.
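The blind comparison Georgi describes in Clash of Tags can be sketched roughly as follows: the two result sets are shown on randomly assigned sides, the user's pick is mapped back to its hidden source, and the share of picks that went to the machine-tagged set is tallied. The function names and the sample votes below are hypothetical, chosen only to reproduce a 45 percent share; this is not Imagga's actual implementation or data.

```python
import random


def present_pair(human_results, machine_results, rng):
    """Randomly assign the two result sets to the left and right side."""
    sides = [("human", human_results), ("machine", machine_results)]
    rng.shuffle(sides)
    return {"left": sides[0], "right": sides[1]}


def source_of_pick(layout, picked_side):
    """Map the side the user picked ("left" or "right") back to the hidden tag source."""
    return layout[picked_side][0]


def machine_win_rate(picks):
    """Fraction of comparisons in which the machine-tagged result set was chosen."""
    if not picks:
        raise ValueError("no picks recorded")
    return sum(1 for pick in picks if pick == "machine") / len(picks)


# One blind comparison with made-up search results.
rng = random.Random(0)
layout = present_pair(["photo_a", "photo_b"], ["photo_c", "photo_d"], rng)
first_pick = source_of_pick(layout, "left")

# Hypothetical tally: 9 of 20 picks went to the machine-tagged set.
picks = ["machine"] * 9 + ["human"] * 11
print(machine_win_rate(picks))  # 0.45
```

Randomizing the sides matters because a fixed layout would let users learn which side is the machine and vote on that rather than on the results.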
Henrik: Georgi, what advice would you like to share with people looking into image recognition?
Georgi: I would say, have a very clear idea of what kind of value you want to derive from that and try to optimize for that, either working on it yourself or with a vendor. Make it really clear what your objectives are and what your objections about image recognition are. Just think from the practical aspect.
This is something that I personally, and our whole team, have always been stressing. Let’s see what it does, what it can do and what it can’t, and address whether there is really a pain that can be solved right now or not. Also from the vendor side, I would suggest don’t over‑promise because it’s quite easy to get people a bit confused.
They have an expectation like, “It’s AI so it can do anything?”, but you need to be realistic, so you save your time and you save your potential customer’s time. If the use case is very clear and you, as a professional, can commit that this is going to work out, then go for it. Other than that, don’t waste time, yours and your potential customers’.
This is something that we saw a lot, because a lot of people ask about features that currently are not technically practical enough, or they ask about features that we don’t have. We learned the hard way, to a certain extent, to say, “This is possible, this is not possible currently from our perspective.”
Henrik: Where can we find more information about image recognition?
Georgi: Depending on what you need. Do you need more data for training, or do you need more basic knowledge, or do you need different kind of inspirations about business applications? There are different sources.
Obviously, ImageNet and all the accompanying information and the algorithms that we have around this pretty nice dataset are quite useful for researchers. For beginners in image recognition, we also have this whole set of Coursera courses.
One of the most notable ones is from Stanford University. There are a few more pretty good ones from most of the top European or American universities. We have different kinds of newsletters and digests. AI Weekly is pretty good inspiration‑wise. There is some mixture of research topics, business cases, cool hacks and ideas about what you can do with image recognition.
Henrik: Well, thanks, Georgi.
Georgi: Thanks a lot, Henrik. I hope your audience will enjoy the podcast, including our participation in it.