Audio, Image and video keywording. By people and machines.


Tagging.tech interview with Nikolai Buwalda

Tagging.tech presents an audio interview with Nikolai Buwalda about image recognition


Listen and subscribe to Tagging.tech on Apple Podcasts, AudioBoom, CastBox, Google Play, RadioPublic or TuneIn.


Keywording Now: Practical Advice on using Image Recognition and Keywording Services

Now available





Henrik de Gyor:  This is Tagging.tech. I’m Henrik de Gyor. Today, I’m speaking with Nikolai Buwalda. Nikolai, who are you, and what do you do?

Nikolai Buwalda:  I support organizations with product strategy, and I’ve been doing that for the last 15 years. My primary focus is products that have social networking components, and whenever you have social networking and user‑generated content, there is a lot of content moderation that’s part of that workflow.

Recently, I’ve been working with a French company that has launched a large social network in Europe, and as part of that, we’ve spun up a startup that I’m the founder of, called moderatecontent.com, which uses artificial intelligence to handle some of the edge cases when moderating content.

Henrik:  Nikolai, what are the biggest challenges and successes you’ve seen with image recognition?

Nikolai:  2015 was really an amazing year for image recognition. A lot of forces really came to maturity, and so you’ve seen a lot of organizations deploy products and feature sets in the cloud that use or depend heavily on image recognition. It probably started about 20 years ago with experiments using neural networks.

In 2012, a team from the University of Toronto came forward with a radical development in how neural networks are used for image recognition. Based on that, there were quite a few open source projects, a lot of video card makers developed hardware that supported it, and in 2014 you saw another big leap by Google in image recognition.

Those products really matured in 2015, and that’s really allowed for a lot of enterprises to have a very cost effective ability now to integrate image recognition into the work that they do. So 2015 really has seen, in the $1000 range, the ability to buy a video card, use an open source platform, and very quickly have image recognition technology available to your workflow.

In terms of challenges, I continue to see two of the very same challenges existing in the industry. One is the risk to a company’s brand, and that still continues.

Even though image recognition is widely accepted as a technology that can surpass humans in a lot of cases for detecting patterns and understanding content, when you go back to your legal and to your privacy departments, they still want to have an element of humans reviewing content in the process.

It really helps them with their audits and their ability to represent the organization when an incident does occur. Even with companies like Google taking an image‑recognition‑first approach that can pass a Turing‑style test, you still end up with these parts of the organization that want human review.

I think it’s still another five years before these groups are going to be swayed to have an artificial intelligence machine‑learning first approach.

The second major issue is context. Machine learning or image recognition is really great at matching patterns in content and understanding these are all the different elements that make up some content, but they are not great at understanding the context ‑‑ the metadata that goes along with a piece of content ‑‑ and making assumptions about how all the elements work together.

To illustrate this, there’s a very good use case that’s commonly talked about: a person pouring a glass of wine. In different contexts, this content could be recognized as something that you don’t want associated with your brand, or as not being an issue at all.

Think about somebody pouring a glass of wine at a cafe in France versus somebody pouring a glass of wine in Saudi Arabia. There’s very different context between the two, but it’s very difficult for a machine to draw conclusions about the appropriateness of either.

Another very common edge case that people like to use as an example is the bicycle: machines are great at detecting bicycles. They can do amazing things, far surpassing the ability of people to detect this type of object. But if that bicycle were a few seconds away from some sort of accident, machines have great difficulty detecting that.

That’s where human review ‑‑ human escalation ‑‑ comes into play for these types of issues, and it still represents a large portion of the workflow and the cost in moderating content. So one challenge is mitigating risk within your organization by having some form of human review of content.

The second is really understanding the context. Those are the two things that I think will be solved by artificial intelligence in the next five years, which will really put these challenges for image recognition behind us.

Henrik:  As of March 2016, how much of image recognition is completed by people versus machines?

Nikolai:  This is a natural stat to ask about, but with all the advancements in 2015, I’d really like to talk about a different stat. Right now, anybody developing a platform that has user‑generated content has gone with a computer‑vision, machine‑learning‑first approach.

They’ll have 100 percent of their content initially reviewed with this technology, and then, depending on the use case and the risk profile, a certain percentage gets flagged and moved on to a human workflow. I really like to think about it in terms of, “What is the number of people globally working in the industry?”

We know today that about 100,000 to 200,000 people worldwide are working at terminals moderating content. That’s a pretty large cost and a pretty staggering human cost. We know these jobs are quite stressful. We know they have high turnover and have long‑term effects on the people doing these jobs.

The stat I like to think about is, “How do we reduce the number of people who have to do this and move that task over to computers?” We also know that it’s about a hundred times less expensive to use a computer to moderate this: about a tenth of a cent per piece of content for machine review versus about 10 cents per piece of content reviewed with human escalation.
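To make those per-item figures concrete, here is a minimal cost sketch of the machine-first pipeline described above. The escalation rate and function name are illustrative assumptions, not figures from the interview; only the two per-item costs come from the quoted ballpark numbers.

```python
# Rough cost model for a machine-first moderation pipeline, using the
# ballpark figures quoted above: ~$0.001 per item for machine review and
# ~$0.10 per item escalated to a human. The 5% escalation rate below is
# purely hypothetical.

MACHINE_COST = 0.001   # dollars per item reviewed by the model
HUMAN_COST = 0.10      # dollars per item escalated to a person

def moderation_cost(items: int, escalation_rate: float) -> float:
    """Every item gets a machine pass; a fraction is escalated to humans."""
    machine = items * MACHINE_COST
    human = items * escalation_rate * HUMAN_COST
    return machine + human

# One million items with 5% escalated: $1,000 machine + $5,000 human.
print(moderation_cost(1_000_000, 0.05))  # 6000.0
```

Even with escalation, the machine-first pass keeps the total far below the roughly $100,000 that fully human review of a million items would imply at 10 cents each.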

In terms of really understanding how far we’ve advanced, I think the best metric to keep is how we can reduce the number of people who are involved in manual reconciliation.

Henrik:  Nikolai, what advice would you like to share with people looking into image recognition?

Nikolai:  My advice, and it’s something that people have probably heard quite a bit, is that it’s really important to understand your requirements and to gain consensus within your organization about the business function you want image recognition to perform.

It’s great to get excited about the technology and to see where the business function can help, but it’s the edge cases that can really hurt your organization. You have to gather all the requirements around them.

That means meeting with legal, privacy, security and understanding the use case that you want to use image recognition for and then the edge cases that may pose some risks to your organization. You really have to think about all the different feature sets that go into making a project really successful with image recognition.

Things that are important include how it integrates with your existing content management system. A lot of image recognition platforms use third parties, and those can be offshore in countries like the Philippines and India. Understanding your requirements for sending content offshore ‑‑ and involving your infosec department ‑‑ is really important for knowing how that integrates.

Having escalation and approval workflows, this is really going to protect you in these edge cases where there is the need for human review. That needs to be quite seamless as there’s still a significant amount of content that gets moderated and approved this way.

Having language and cultural support: global companies really have to consider the cultural impact of content from one region versus another. Having features and an understanding built into your image recognition so that it can adapt to that is very important.

Crisis management, this is something that all the big social platforms have playbooks ready to go for. It’s very important because, even if it’s, like I said, one image in a million that gets classified poorly, it can have a dramatic impact in media or even legally for you. You want to be able to get ahead of it very quickly.

A lot of third parties provide these types of playbooks, and it’s a feature set that they offer along with their resources. Then there’s the usual feature set you have to think about ‑‑ language filters and image, video, and chat protection. An edge case with a lot of business rules associated with it is the protection of children, along with social‑media filtering.

You might want to have a wider band of guardrails to protect you on response rate and throughput. A lot of services have different types of offerings: some will moderate content within 72 hours, while with others you get response rates within a minute.

Understanding your throughput and response rate that’s required is very important and really impacts the cost of the offering that you are looking to provide. Third‑party list support ‑‑ a lot of companies will provide business rule guidance and support on the different rule sets that apply to different regions around the world.

That’s important for understanding which ones you need and how to support them within your business process. Also important for demonstrating control of your content is having user flags. Giving the people who are consuming your content the ability to flag content into a review workflow demonstrates one of the controls that you often need to have in place for the edge cases.
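As a rough illustration of that flag-into-workflow control, here is a minimal sketch: users flag content, flagged items enter a queue for human review, and repeated flags cross an auto-hide threshold. The class name, threshold, and auto-hide behavior are hypothetical, not taken from any platform mentioned here.

```python
from collections import deque

# Hypothetical sketch of a user-flag workflow: consumers flag content,
# flagged items enter a review queue, and a human moderator works through
# it. Thresholds and names are illustrative only.

class FlagQueue:
    def __init__(self, auto_hide_threshold: int = 3):
        self.flags = {}                       # content_id -> flag count
        self.queue = deque()                  # content awaiting human review
        self.auto_hide_threshold = auto_hide_threshold

    def flag(self, content_id: str) -> bool:
        """Record a user flag; enqueue for review on the first flag.
        Returns True once the item crosses the auto-hide threshold."""
        count = self.flags.get(content_id, 0) + 1
        self.flags[content_id] = count
        if count == 1:
            self.queue.append(content_id)     # one queue entry per item
        return count >= self.auto_hide_threshold

q = FlagQueue()
q.flag("img-1")
q.flag("img-1")
hidden = q.flag("img-1")    # third flag crosses the threshold
print(hidden, list(q.queue))  # True ['img-1']
```

The point of a structure like this is auditability: every flag, queue entry, and auto-hide decision is recorded, which is exactly the kind of control legal and media scrutiny look for.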

The edge cases are where media and legal really have a lot of traction and are looking for companies to provide really good controls for protecting themselves. With things like suicide prevention, bullying, and hate speech, just one case can have a significant impact on your brand.

The last item is that a lot of organizations, for a lot of different reasons, have their content moderation done within their own organization. They have the human review in‑house, so having training for that staff for the stressful portions of the job, and training for HR, is very important. It is something to consider when building out these workflows.

Henrik:  Nikolai, where can we find more information about image recognition?

Nikolai:  The leading research for image recognition really starts at the ImageNet competition that’s hosted at Stanford. If you Google ImageNet and Stanford, you’ll find that the URL isn’t that great; officially, it’s called the ImageNet Large Scale Visual Recognition Challenge. This is where all the top organizations, all the top research teams in image recognition, compete to have the best algorithms, the best tools, and the best techniques.

This is where all the breakthroughs in 2012 and 2014 happened. Right now, Google is the leader, but it’s very close, and image recognition at that competition is certainly at a level where these teams are far exceeding the capability of humans. From there, you get to see all the tools and techniques that the leading organizations are using, and what’s amazing is that the same tools and techniques they use on their platforms are available for integration within your own organization.

On top of that, the competition between video card providers, AMD and NVIDIA, has really advanced the hardware to support this, allowing for real‑time image recognition in a very cost‑effective manner. The tools they talk about at this competition leverage that hardware, so it’s a great starting place to understand what the latest techniques are and how you might implement them within your own organization.

Another great site is opencv.org, or Open Source Computer Vision, which has built up a framework that takes all the latest tools, techniques, and algorithms and packages them in a really easy‑to‑deploy toolset. It has been around for a long time, so they have a lot of examples and a lot of background about how to implement these types of techniques.

If you are hoping to get an experiment going, using some of the open source platforms from the ImageNet competitions together with OpenCV, you can really get something up very quickly.

On top of that, when you’re building out these types of workflows, you need to work closely with a lot of the nonprofits that have great guidance on what are the rule sets, what are the guardrails you need to have in place to protect your users and to protect your organization.

Facebook has really been a leader in this area, and they work with a bunch of different organizations ‑‑ the National Cyber Security Alliance, Childnet International, connectsafely.org ‑‑ and there are a lot of region‑specific organizations that you can work with. I definitely recommend their guardrails as a great starting point for a framework for understanding how image recognition can moderate your content and how it can be used in an ethical and legal manner.

In terms of content moderation, it’s a very crowded space right now. Some of the big partners don’t talk a lot about their statistics, but they are doing a very large volume of moderation. Companies like WebPurify, Crisp Thinking, and crowdsource.com all combine an element of machine learning with human interaction.

The cloud platforms like AWS and Azure have offerings for the machine learning side. Adobe is definitely strong on the content management side; they have a great integrated software package if you use their platform.

Another aspect, which is quite important, is a lot of companies do their content moderation internally, and so having training for that staff and training for your HR department is very important. But all in all, there are a lot of resources, a lot of open source platforms that make it really easy to get started.

TensorFlow, which is an open source project from Google, is used across their platform. The last I checked, about 40 different product offerings use the TensorFlow platform. It’s a neural‑network‑based technology suited to image recognition. It’s very visual, it’s very easy to understand, and it can really help reduce the time to go to production with some of this technology.

Other open source projects, if you don’t want to be attached to Google, include Caffe, Torch, and Theano. NVIDIA also has a great offering tied to their technology.

Henrik:  Well, thanks Nikolai.

Nikolai:  Thank you, Henrik. I’m excited about content moderation. It’s a topic that’s not talked about a lot, but it’s really important, and I think in the next five years we are really going to see the computer side of content moderation and image recognition take over, understand the context of these items, and really reduce the dependency on people to do this type of work.

Henrik: For more on this, visit Tagging.tech. Thanks again.


For a book about this, visit keywordingnow.com


Tagging.tech interview with Kevin Townsend

Tagging.tech presents an audio interview with Kevin Townsend about keywording services








Henrik de Gyor:  This is Tagging.tech. I’m Henrik de Gyor. Today, I’m speaking with Kevin Townsend. Kevin, how are you?

Kevin Townsend:  Good, thank you.

Henrik:  Kevin, who are you and what do you do?

Kevin:  I’m the CEO and Managing Director for a company called KeedUp. What we do is keywording, but also adding other metadata, fixing images, image flow services; a whole heap of things, but keywording and metadata is really the core of what we do.

What makes us a little bit different to maybe some other keywording companies is that we started out from a basis of being involved in the industry as a syndicator/image seller. We were like a photo agency, photo representative, like many of our customers ‑‑ in fact almost all of our customers.

As a result, we’ve developed services in a somewhat different way. For instance, we operate 24 hours a day, seven days a week. We do celebrity as well as stock. Everybody that works for us pretty much is working in an office. There’s no piecework. Almost all of our staff are university graduates.

Henrik:  Kevin, what are the biggest challenges and successes you’ve seen with keywording services?

Kevin:  I think the biggest challenge, certainly for us, has been dealing with the multitude of requirements and the different systems that our customers work with. It’s never really a thing where you are just sent some images and are allowed to do whatever you like to them and provide the best keywording or the best metadata you can.

Everybody has their own things that they want done. There are all these different standards, like you might be keywording for a Getty Images standard, or back when it used to be a thing, the Corbis standard, and so on and so forth.

Dealing with all of those different things I think is the real big challenge in keywording and delivering exactly what people want. That’s the real key.

I think the successes are kind of related: we’ve built systems that have enabled us to cope with all of those different things, such as our own workflow system called Piksee, which really did cut out an awful lot of handling time and wastage in dealing with sets of images.

Or we have our own client database, which records ‑‑ and enables all our staff to know, down to the contributor level ‑‑ exactly what you might want to do differently for one photographer versus another when it comes to metadata or fixing your images.

Just a whole series of things that, when I first started, I didn’t realize all of these nuances would come into play, but they really are crucial to delivering a good service.

The result of that has been that our reputation is such that we tend to work for the big names ‑‑ certainly in the news, celebrity, and increasingly in the stock area as well ‑‑ like Associated Press, like Splash News, and like Magnum. It’s being successful in that we’ve managed to defeat the problem, I suppose.

Henrik:  As of early March 2016, how much of the keywording work is completed by people versus machines?

Kevin:  I guess it depends on how you work that figure out. In terms of, if the question is how many of the images that we work on are touched by human beings deciding on what keywords go into the images, that figure is really 100 percent.

But, and this is important, the technology that you have to assist them in doing that, and doing a good job, is quite considerable. I don’t think it’s often appreciated, maybe by photographers, and particularly by amateurs out there, exactly what goes into what I’d call professional keywording as opposed to “seat of your pants” keywording.

We don’t sit there very often and keyword one image after another, searching into our memory banks, trying to come up with the best keywords. There are systems, vocabularies. There are ways for handling the images, organizing the images.

So much technology is involved there to really make the humans that we have the best that they can be.

I have to say, in that regard, what we are always doing ‑‑ and as I said earlier, we employ almost exclusively university graduates, people who have degrees in communication studies or English or art history ‑‑ is trying to have the best supercomputer do the keywording, which is the human brain, and to have it be the most educated and best‑programmed supercomputer.

Then we add the technology on top. So, yes, 100 percent of the work in the end is done by people, but certainly with a lot of assistance from technology.

If you look into the future, the far future, I feel sure that one day artificial intelligence will probably do a lot of things for all of us in all sorts of areas we’re not even vaguely aware of now.

We’re starting to see some of that happen already with apps on your phones that can tell you how to do this, that, and the other, and monitor your heartbeat; all sorts of things are happening with artificial intelligence, which is great.

When it comes to keywording, what I see is not very flattering at the moment, which is not to say that it may not get there in the end. But I think what I need to do is try to put things in a little bit of perspective, at least from where I see it.

The level of complication that I was talking about earlier, which is really the key to good keywording, I think is where at the moment AI keywording falls down completely, and even before that it’s falling over some hurdles right now.

On my blog recently, I did a post about one AI provider, and they invite you to put test images in to see what they can do. Well, [laughs] the result was particularly unedifying, in that a lot of the keywords were just completely wrong. The point of the images was completely missed. They weren’t able to name anybody in the images.

It was really a pretty poor effort, and even in the examples they had on their website, showing what they considered to be successes, there were very few keywords that would be acceptable commercially.

Also, a lot of the keywords were extremely inane and almost pointless; certainly nothing that would fit into a vocab that you would be able to submit to Getty, for instance, or that would be acceptable to Alamy. This is a long, long, way from where it needs to get.

Perhaps the best analogy I could use to explain how I view things at the moment with AI and keywording is this: a few years ago, I went to see the Honda robot, which had come to town.

They had spent millions and millions and millions of dollars on this robot, and its big claim to fame was that it could walk upstairs, which it did. Not particularly well, but it did it. It was a great success, and everyone was very happy.

Thing is, any three‑year‑old kid in the audience could have run up and down those stairs and run around the robot many times.

I feel that AI keywording is a bit like that robot at the moment. Yes, it’s doing some rudimentary things, and that looks great, and people who think it’s a good idea and it’s all going to be wonderful, can shout about it, but it’s a long way from the reality of what humans are able to do. A long, long way.

To carry on the robot analogy, I think where the technology has to go ‑‑ to really be able to do the sort of keywording with concepts and meet all these challenges of different standards ‑‑ is to be more like an android than like a robot that can assemble a motor vehicle.

Now, how long it’s going to take us to get to that sort of stage, I don’t know. I would be very doubtful that the amount of money and technology, and what have you, that would be needed to get us to that point is going to be directed towards keywording.

I’m sure there’ll be much more important things that that level of technology would be directed at. But certainly one day, maybe in my lifetime, maybe not, we’ll probably wake up and there’ll be androids doing keywording.

Henrik:  Kevin, what advice would you like to share with people looking into keywording services?

Kevin:  I think that it’s one of those things, it’s the oldest cliche, that you do get what you pay for, generally speaking.

We have had so many people who have come to us who have gone down the route of trying to save as much money as they could, and getting a really poor job done, finding it didn’t work for them, it wasn’t delivering what they wanted, and they’ve ended up coming and getting the job done properly.

For instance, at Magnum we have taken over the keywording there from what used to be crowd‑sourced keywording, which was particularly poor. That’s really made a big difference to them, and I know they’re very happy.

There are other examples that we’ve had over the years with people who’ve gone off and got poor keywording and regretted it. Just to use another old saying, no one ever regrets buying quality, and I think that is very true with keywording.

Henrik:  Where can we find more information about keywording services?

Kevin:  Right. We have a website, www.keedup.com. We have a blog. We are also on Facebook, on Twitter, and on LinkedIn. We’re in lots of different places. If you go there as a starting point, there are links to other sites that we have. That’s a good place to start.

We have a site called coreceleb.com, an offshoot of what we do, which is focused on editing down and curating the images that people are creating so that you have more sales impact.

We also have brandkeywording.com, which is focused on adding information about the brands that celebrities are wearing and using ‑‑ not just fashion, but also what cars they drive ‑‑ all sorts of things, really, to add new revenue streams, particularly for celebrity photo agencies, though there’s no reason why that doesn’t include sports, news, and even stock.

Those are two which are really pretty important as well.

Henrik:  Thanks, Kevin.

Kevin:  Good. [laughs] I hope that will give people some food for thought.

Henrik:  For more on this visit Tagging.tech.

Thanks again.

