Tagging.tech presents an audio interview with Nikolai Buwalda about image recognition
Nikolai Buwalda: I support organizations with product strategy, and I’ve been doing that for the last 15 years. My primary focus is products that have social networking components, and whenever you have social networking and user‑generated content, there is a lot of content moderation that’s a part of that workflow.
Recently, I’ve been working with a French company that has launched a large social network in Europe, and as a part of that, we’ve spun up a startup that I’m the founder of called moderatecontent.com, which uses artificial intelligence to handle some of the edge cases when moderating content.
Henrik: Nikolai, what are the biggest challenges and successes you’ve seen with image recognition?
Nikolai: 2015 was really an amazing year with image recognition. A lot of forces really came to maturity and so you’ve seen a lot of organizations deploy products and feature sets in the cloud that use or depend heavily on image recognition. It probably started about 20 years ago with experiments using neural networks.
In 2012, a team from the University of Toronto came forward with a really radical development in how neural networks are used for image recognition. Based on that, there were quite a few open source projects, a lot of video card makers also developed hardware that supported it, and in 2014 you saw another big leap by Google in image recognition.
Those products really matured in 2015, and that’s really allowed for a lot of enterprises to have a very cost effective ability now to integrate image recognition into the work that they do. So 2015 really has seen, in the $1000 range, the ability to buy a video card, use an open source platform, and very quickly have image recognition technology available to your workflow.
In terms of challenges, I continue to see two of the very same challenges existing in the industry. One is the risk to a company’s brand, and that still continues.
Even though image recognition is widely accepted as a technology that can surpass humans in a lot of cases for detecting patterns and understanding content, when you go back to your legal and to your privacy departments, they still want to have an element of humans reviewing content in the process.
It really helps them with their audit, and their ability to represent the organization when an incident does occur. Despite companies like Google going with an image‑recognition‑first approach, passing the Turing test, you still end up with these parts of the organization who want human review.
I think it’s still another five years before these groups are going to be swayed to have an artificial intelligence, machine‑learning‑first approach.
The second major issue is context. Machine learning or image recognition is really great at matching patterns in content and understanding all the different elements that make up some content, but it is not great at understanding the context ‑‑ the metadata that goes along with a piece of content ‑‑ and making assumptions about how all the elements work together.
To illustrate this, there’s a very good use case that’s commonly talked about, which is a person pouring a glass of wine. Now, in all kinds of different contexts, this content could be recognized as something that you don’t want associated with your brand versus not being an issue at all.
Think about somebody pouring a glass of wine at, say, a cafe in France versus somebody pouring a glass of wine in Saudi Arabia. Between the two, there’s a very different context, but it’s very difficult for a machine to draw a conclusion about the appropriateness of that.
Another very common edge case that people like to use as an example is the bicycle example, where machines are great at detecting bicycles. They can do amazing things, far surpassing the ability of people to detect this type of object, but if that bicycle is a few seconds away from being in some sort of accident, machines have great difficulty detecting this.
That’s where human review ‑‑ human escalation ‑‑ comes into play for these types of issues, and it still represents a large portion of the workflow and the cost in moderating content.
So, mitigating risk within your organization by having some sort of person review of content, and really understanding the context, are two things that I think, in the next five years, will be solved by artificial intelligence, and that will really put these challenges for image recognition behind us.
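As an illustration of the context problem described above, here is a minimal sketch in Python. It assumes a recognizer has already produced a set of labels for an image and that the region arrives as metadata. The label names, region codes, policy table, and function names are all hypothetical and are not taken from any product mentioned in the interview.

```python
from dataclasses import dataclass

# Hypothetical policy table: for each region, the labels that should be
# escalated to a person. The entries are illustrative assumptions only.
REGION_POLICY = {
    "FR": frozenset(),
    "SA": frozenset({"alcohol"}),
}


@dataclass(frozen=True)
class Detection:
    labels: frozenset  # what the recognizer saw in the image
    region: str        # context that is not in the pixels


def needs_human_review(detection: Detection) -> bool:
    """Return True when the same labels are sensitive in this region."""
    restricted = REGION_POLICY.get(detection.region)
    if restricted is None:
        # No policy for this region: escalate rather than guess.
        return True
    return bool(detection.labels & restricted)


wine = frozenset({"person", "alcohol"})
print(needs_human_review(Detection(wine, "FR")))  # same pixels, no escalation
print(needs_human_review(Detection(wine, "SA")))  # same pixels, escalated
```

The recognizer output is identical in both calls. Only the metadata changes the decision, which is why the pattern matching alone cannot settle appropriateness.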
Henrik: As of March 2016, how much of image recognition is completed by people versus machines?
Nikolai: This is a natural stat to ask about, but I think, with all the advancements in 2015, I really like to talk about a different stat. Right now, anybody developing a platform that has user‑generated content has gone with a computer vision, machine learning approach first.
They’ll have 100 percent of their content initially reviewed with this technology and then, depending on the use case and the risk profile, a certain percentage gets flagged and moved on to a human workflow. I really like to think about it in terms of, “What is the number of people globally working in the industry?”
We know today that about 100,000 to 200,000 people worldwide are working at terminals moderating content. That’s a pretty large cost and a pretty staggering human cost. We know these jobs are quite stressful. We know they have high turnover and have long‑term effects on the people doing these jobs.
The stat I like to think about is, “How do we reduce the number of people who have to do this and move that task over to computers?” We also know that it’s about a thousand times less expensive to use a computer to moderate this. It’s about a tenth of a cent per piece of content versus about 10 cents per piece of content to have it reviewed with human escalation.
In terms of really understanding how far we’ve advanced, I think the best metric to keep is how we can reduce the number of people who are involved in manual reconciliation.
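The machine‑first workflow and the per‑item costs described above can be sketched as follows. This is a rough illustration, assuming each item comes with a classifier confidence score between 0 and 1 and that anything below a threshold is escalated. The threshold, the scores, and the function names are assumptions. Only the two per‑item costs come from the interview, and taken literally those two figures differ by a factor of about 100.

```python
# Per-item costs in cents, as quoted in the interview.
MACHINE_COST_CENTS = 0.1   # "a tenth of a cent per piece of content"
HUMAN_COST_CENTS = 10.0    # "about 10 cents" with human escalation


def route(score: float, threshold: float = 0.9) -> str:
    """Send confident classifications straight through, escalate the rest.

    The 0.9 threshold is a hypothetical value; in practice it would be
    set from the use case and the risk profile.
    """
    return "auto" if score >= threshold else "human"


def moderation_cost_cents(scores, threshold: float = 0.9) -> float:
    """Every item pays the machine cost; escalated items also pay for a person."""
    escalated = sum(1 for s in scores if route(s, threshold) == "human")
    return len(scores) * MACHINE_COST_CENTS + escalated * HUMAN_COST_CENTS


scores = [0.99, 0.95, 0.50, 0.20]   # made-up classifier confidences
print([route(s) for s in scores])
print(moderation_cost_cents(scores))
```

Lowering the share of items that fall below the threshold is what reduces both the cost and the number of people who have to review content by hand.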
Henrik: Nikolai, what advice would you like to share with people looking into image recognition?
Nikolai: My advice, and it’s something that people have probably heard quite a bit, is that it’s really important to understand your requirements and to gain consensus within your organization about the business function you want image recognition to do.
It’s great to get excited about the technology and to see where the business function can help, but it’s the edge cases that can really hurt your organization. You have to gather all the requirements around them.
That means meeting with legal, privacy, security and understanding the use case that you want to use image recognition for and then the edge cases that may pose some risks to your organization. You really have to think about all the different feature sets that go into making a project really successful with image recognition.
One thing that’s important is how it integrates with your existing content management system. A lot of image recognition platforms use third parties, and they can be offshore in countries like the Philippines and India. Understanding your requirements for sending content over there is really important, and your infosec department needs to know how that integrates.
Having escalation and approval workflows, this is really going to protect you in these edge cases where there is the need for human review. That needs to be quite seamless as there’s still a significant amount of content that gets moderated and approved this way.
Having language and cultural support: global companies really have to consider the impact culturally of content from one region versus another. Having features and an understanding built into your image recognition so that it can adapt to that is very important.
Crisis management, this is something that all the big social platforms have playbooks ready to go for. It’s very important because, even if it’s, like I said, one image in a million that gets classified poorly, it can have a dramatic impact in media or even legally for you. You want to be able to get ahead of it very quickly.
A lot of third parties provide these types of playbooks, and it’s a feature set that they offer along with their resources. The usual feature set you have to think about includes language filters and image, video, and chat protection. An edge case that has a lot of business rules associated with it is the protection of children, social‑media filtering.
You might want to have a wider band of guardrails to protect you. On response rate and throughput, a lot of services have different types of offerings. Some will moderate content over 72 hours, and for others you need response rates within the minute.
Understanding the throughput and response rate that you require is very important and really impacts the cost of the offering that you are looking to provide. Third‑party list support ‑‑ a lot of companies will provide business rule guidance and support on the different rule sets that apply to different regions around the world.
It’s important to understand which ones you need and how to support them within your business process. Having user flags is important to demonstrate control of your content. Giving the people who are consuming your content the ability to flag content into a workflow demonstrates one of the controls that you often need to have in place for the edge cases.
The edge cases are where media and legal really have a lot of traction and are looking for companies to provide really good controls for protecting themselves. Things like suicide prevention, bullying, and hate speech can really dramatically…just one case can have a significant impact on your brand.
The last item is that a lot of organizations, for a lot of different reasons, have their content moderation done within their own organization. They have the human review within their own organization, and so having training of that staff for some of the stressful portions of that job and training for HR is very important. It is something to consider when building out these workflows.
Henrik: Nikolai, where can we find more information about image recognition?
Nikolai: The leading research for image recognition really starts at the ImageNet competition that’s hosted at Stanford. If you Google ImageNet in Stanford, you’ll find that the URL isn’t that great and officially it’s called the ImageNet Large Scale Visual Recognition Challenge. This is where all the top organizations, all the top research teams in image recognition compete to have the best algorithms, the best tools, and the best techniques.
This is where all the breakthroughs in 2012 and 2014 happened. Right now, Google is the leader, but it’s very close, and image recognition at that competition is certainly at a level where these teams are far exceeding the capability of humans. So from there, you get to see all the tools and techniques that the latest organizations are using, and what’s amazing is that the same tools and techniques they use on their platforms exist for integrating within your own organization.
On top of that, the competition between video card providers, between AMD and NVIDIA, has really made the hardware that supports this allow for real‑time image recognition in a very cost-effective manner. The tools that they talk about at this competition leverage that hardware and so it’s a great starting place to understand what the latest techniques are and how you might implement them within your own organization.
Another great site is opencv.org, or open computer vision, and they have built up a framework around taking all the latest tools and techniques and algorithms and packaging them up in a really easy‑to‑deploy toolset. It has been around for a long time and so they really have a lot of examples and a lot of the background about how to implement these types of techniques.
If you are hoping to get an experiment going very quickly, using some of the open source platforms from ImageNet competitions and using OpenCV together you can really get something up very quickly.
On top of that, when you’re building out these types of workflows, you need to work closely with a lot of the nonprofits that have great guidance on what are the rule sets, what are the guardrails you need to have in place to protect your users and to protect your organization.
Facebook has really been a leader in this area and they have spun up a bunch of different organizations they work with ‑‑ the National Cyber Security Alliance, Childnet International, connectsafely.org ‑‑ and there are a lot of region‑specific organizations that you can work with. I definitely recommend using their guardrails; they will really be a great starting point for a framework when understanding how image recognition can moderate your content and how image recognition can be used in an ethical and legal manner.
In terms of content moderation, it’s a very crowded space right now. Some of the big partners, they don’t talk a lot about their statistics, but they are doing a very large volume of moderation. Companies like WebPurify, Crisp Thinking, and crowdsource.com, they all have an element of machine learning and computer and human interaction.
The cloud platforms like AWS and Azure have offerings for the machine learning side. Adobe definitely is a content management platform. They have a great integrated software package if you use that platform.
Another aspect, which is quite important, is a lot of companies do their content moderation internally, and so having training for that staff and training for your HR department is very important. But all in all, there are a lot of resources, a lot of open source platforms that make it really easy to get started.
TensorFlow is an open source project from Google, and they use it across their platform. I think they have…The last I checked, it was about 40 different product offerings that use the TensorFlow platform, and it is a neural network based image recognition type technology. It’s very visual and it’s very easy to understand and can really help reduce the amount of time to go to production with some of this technology.
Henrik: Well, thanks Nikolai.
Nikolai: Thank you, Henrik. I’m excited about content moderation. It’s a topic that’s not really talked about a lot, but it’s really important and I think in the next five years we are really going to see the computer side of content moderation and image recognition take over, understand the context of these items, and really reduce the dependency on people to do this type of work.
Henrik: For more on this, visit Tagging.tech. Thanks again.
For a book about this, visit keywordingnow.com