tagging.tech

Audio, Image and video keywording. By people and machines.



Tagging.tech interview with Nikolai Buwalda

Tagging.tech presents an audio interview with Nikolai Buwalda about image recognition

 

Listen and subscribe to Tagging.tech on Apple Podcasts, AudioBoom, CastBox, Google Play, RadioPublic or TuneIn.


Keywording Now: Practical Advice on using Image Recognition and Keywording Services

Now available

keywordingnow.com

 

Transcript:

 

Henrik de Gyor:  This is Tagging.tech. I’m Henrik de Gyor. Today, I’m speaking with Nikolai Buwalda. Nikolai, who are you, and what do you do?

Nikolai Buwalda:  I support organizations with product strategy, and I’ve been doing that for the last 15 years. My primary focus is products that have social networking components, and whenever you have social networking and user‑generated content, a lot of content moderation is part of that workflow.

Recently, I’ve been working with a French company that launched a large social network in Europe. As part of that, we’ve spun up a startup I founded called moderatecontent.com, which uses artificial intelligence to handle some of the edge cases in moderating content.

Henrik:  Nikolai, what are the biggest challenges and successes you’ve seen with image recognition?

Nikolai:  2015 was really an amazing year for image recognition. A lot of forces came to maturity, and you’ve seen a lot of organizations deploy products and feature sets in the cloud that use or depend heavily on image recognition. It probably started about 20 years ago with experiments using neural networks.

In 2012, a team from the University of Toronto came forward with a radical development in how neural networks are used for image recognition. Based on that, there were quite a few open source projects, a lot of video card makers developed hardware that supported it, and in 2014 you saw another big leap by Google in image recognition.

Those products really matured in 2015, which has given a lot of enterprises a very cost-effective way to integrate image recognition into the work that they do. So in 2015, for around $1,000, you could buy a video card, use an open source platform, and very quickly have image recognition technology available in your workflow.

In terms of challenges, I continue to see the same two challenges in the industry. One is the risk to a company’s brand, and that still continues.

Even though image recognition is widely accepted as a technology that can surpass humans in many cases at detecting patterns and understanding content, when you go to your legal and privacy departments, they still want an element of human review of content in the process.

It really helps them with their audits and their ability to represent the organization when an incident does occur. Even with companies like Google going image-recognition-first, with systems passing Turing-test-level thresholds, you still end up with these parts of the organization that want human review.

I think it’s still another five years before these groups are going to be swayed to an artificial intelligence, machine-learning-first approach.

The second major issue is context. Machine learning and image recognition are really great at matching patterns in content and identifying the different elements that make up a piece of content, but they are not great at understanding the context ‑‑ the metadata that goes along with a piece of content ‑‑ and making assumptions about how all the elements work together.

A very good use case to illustrate this, one that’s commonly talked about, is a person pouring a glass of wine. In different contexts, this content could be recognized as something you don’t want associated with your brand, or as no issue at all.

Think about somebody pouring a glass of wine at a cafe in France versus somebody pouring a glass of wine in Saudi Arabia. The context is very different between the two, but it is very difficult for a machine to draw conclusions about the appropriateness of either.

Another very common edge case that people like to use as an example is the bicycle example. Machines are great at detecting bicycles and can far surpass the ability of people to detect this type of object, but if that bicycle were a few seconds away from some sort of accident, machines would have great difficulty detecting that.

That’s where human review ‑‑ human escalation ‑‑ comes into play for these types of issues, and it still represents a large portion of the workflow and the cost of moderating content.

Mitigating risk within your organization by having some form of human review of content, and really understanding the context, are the two challenges that I think artificial intelligence will solve in the next five years, putting these problems for image recognition behind us.

Henrik:  As of March 2016, how much of image recognition is completed by people versus machines?

Nikolai:  This is a natural stat to ask about, but I think, with all the advancements in 2015, I really like to talk about a different stat. Right now, anybody developing a platform that has user‑generated content has gone with a computer vision and machine learning approach first.

They’ll have 100 percent of their content initially reviewed by this technology, and then, depending on the use case and the risk profile, a certain percentage gets flagged and moved on to a human workflow. I really like to think about it in terms of, “What is the number of people globally working in the industry?”

We know today that about 100,000 to 200,000 people worldwide are working at terminals moderating content. That’s a pretty large cost, and a pretty staggering human cost. We know these jobs are quite stressful, have high turnover, and have long‑term effects on the people doing them.

The stat I like to think about is, “How do we reduce the number of people who have to do this and move that task over to computers?” We also know that it’s about a hundred times less expensive to use a computer to moderate content: about a tenth of a cent per piece of content, versus about 10 cents per piece of content reviewed with human escalation.
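To make that arithmetic concrete, here is a minimal sketch of the blended cost of an ML-first pipeline. The per-item costs are the rough figures above; the escalation rate is a hypothetical input you would measure for your own use case.

```python
# A minimal sketch of the blended cost of an ML-first moderation pipeline.
# Per-item costs are the rough figures from the interview; the escalation
# rate is a hypothetical input.

MACHINE_COST = 0.001  # ~a tenth of a cent per item, machine review
HUMAN_COST = 0.10     # ~10 cents per item escalated to human review

def moderation_cost(items, escalation_rate):
    """Every item gets a machine pass; a fraction escalates to people."""
    return items * (MACHINE_COST + escalation_rate * HUMAN_COST)

# Example: 1 million items with 5% flagged for human review.
print(moderation_cost(1_000_000, 0.05))  # $1,000 machine + $5,000 human = 6000.0
```

Even at a modest escalation rate, the machine pass handles the item volume while humans dominate the spend, which is why reducing escalations is the metric that matters.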

In terms of really understanding how far we’ve advanced, I think the best metric to track is how much we can reduce the number of people involved in manual reconciliation.

Henrik:  Nikolai, what advice would you like to share with people looking into image recognition?

Nikolai:  My advice, and it’s something people have probably heard quite a bit, is that it’s really important to understand your requirements and to gain consensus within your organization about the business function you want image recognition to perform.

It’s great to get excited about the technology and to see where the business function can help, but it’s the edge cases that can really hurt your organization, so you have to gather all the requirements around them.

That means meeting with legal, privacy, and security, understanding the use case you want image recognition for, and then the edge cases that may pose risks to your organization. You really have to think about all the different feature sets that go into making an image recognition project successful.

One important consideration is how it integrates with your existing content management system. A lot of image recognition platforms use third parties, which can be offshore in countries like the Philippines and India. Understanding your requirements for sending content there, and involving your infosec department, is really important for knowing how that integrates.

Having escalation and approval workflows is really going to protect you in the edge cases where human review is needed. That needs to be quite seamless, as there is still a significant amount of content that gets moderated and approved this way.

Language and cultural support matter as well: global companies really have to consider the cultural impact of content from one region versus another. Building features and an understanding into your image recognition so it can adapt to that is very important.

Crisis management is something all the big social platforms have playbooks ready for. It’s very important because, as I said, even one image in a million that gets classified poorly can have a dramatic impact in the media, or even legally. You want to be able to get ahead of it very quickly.

A lot of third parties provide these types of playbooks; it’s a feature set they offer along with their resources. Then there’s the usual feature set you have to think about: language filters; image, video, and chat protection. An edge case with a lot of business rules associated with it is the protection of children and social‑media filtering; you might want a wider band of guardrails to protect you there.

On response rate and throughput, a lot of services have different types of offerings. Some will moderate content within 72 hours, and others offer response rates within a minute.

Understanding the throughput and response rate you require is very important and really impacts the cost of the offering you are looking to provide. There is also third‑party list support: a lot of companies will provide business-rule guidance and support on the different rule sets that apply to different regions around the world.

It’s important to understand which of those you need and how to support them within your business process. Also important for demonstrating control of your content is user flagging. Giving the people who consume your content the ability to flag items into a workflow demonstrates one of the controls you often need to have in place for the edge cases.

The edge cases are where media and legal have a lot of traction and are looking for companies to provide really good controls to protect themselves. With things like suicide prevention, bullying, and hate speech, just one case can have a significant impact on your brand.

The last item is that a lot of organizations, for a lot of different reasons, have their content moderation done in-house. They keep human review within their own organization, so training that staff for the stressful portions of the job, and training HR, is very important. It is something to consider when building out these workflows.

Henrik:  Nikolai, where can we find more information about image recognition?

Nikolai:  The leading research in image recognition really starts at the ImageNet competition hosted at Stanford. If you Google ImageNet and Stanford you’ll find it; the URL isn’t that memorable, and officially it’s called the ImageNet Large Scale Visual Recognition Challenge. This is where all the top organizations and research teams in image recognition compete to have the best algorithms, the best tools, and the best techniques.

This is where all the breakthroughs of 2012 and 2014 happened. Right now, Google is the leader, but it’s very close, and image recognition at that competition is certainly at a level where these teams far exceed the capability of humans. From there, you get to see all the tools and techniques the top organizations are using, and what’s amazing is that the same tools and techniques they use on their platforms exist for integrating within your own organization.

On top of that, the competition between video card providers, between AMD and NVIDIA, has really advanced the hardware that supports this, allowing real‑time image recognition in a very cost-effective manner. The tools discussed at this competition leverage that hardware, so it’s a great starting place to understand the latest techniques and how you might implement them within your own organization.

Another great site is opencv.org, or OpenCV (open computer vision). They have built up a framework that takes all the latest tools, techniques, and algorithms and packages them in a really easy‑to‑deploy toolset. It has been around for a long time, so there are a lot of examples and a lot of background about how to implement these types of techniques.

If you are hoping to get an experiment going, combining some of the open source models from the ImageNet competitions with OpenCV can get something running very quickly.
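For a sense of what that quick experiment could look like, here is a minimal sketch using OpenCV’s dnn module with a pretrained Caffe model from the ImageNet era. The model and image file names are assumptions: you would download the GoogLeNet prototxt/caffemodel (and its class-label list) separately.

```python
# A minimal sketch of running a pretrained ImageNet-competition model with
# OpenCV's dnn module. File names are placeholders you supply yourself.
import cv2

net = cv2.dnn.readNetFromCaffe("bvlc_googlenet.prototxt",
                               "bvlc_googlenet.caffemodel")

image = cv2.imread("example.jpg")
# GoogLeNet expects 224x224 inputs with the ImageNet channel means subtracted.
blob = cv2.dnn.blobFromImage(image, scalefactor=1.0, size=(224, 224),
                             mean=(104, 117, 123))
net.setInput(blob)
scores = net.forward()  # shape (1, 1000): one score per ImageNet class

class_id = int(scores.argmax())
print("top class:", class_id, "score:", float(scores[0, class_id]))
```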

On top of that, when you’re building out these types of workflows, you need to work closely with the nonprofits that have great guidance on the rule sets and guardrails you need to have in place to protect your users and your organization.

Facebook has really been a leader in this area, and they work with a bunch of different organizations ‑‑ the National Cyber Security Alliance, Childnet International, connectsafely.org ‑‑ and there are a lot of region‑specific organizations you can work with. I definitely recommend their guardrails as a great starting point for a framework for understanding how image recognition can moderate your content and how it can be used in an ethical and legal manner.

In terms of content moderation, it’s a very crowded space right now. Some of the big partners don’t talk a lot about their statistics, but they are doing a very large volume of moderation. Companies like WebPurify, Crisp Thinking, and crowdsource.com all have an element of machine learning combined with human interaction.

The cloud platforms like AWS and Azure have offerings on the machine learning side. On the content management side, Adobe has a great integrated software package if you use that platform.

Another aspect, which is quite important: a lot of companies do their content moderation internally, so having training for that staff and for your HR department is very important. All in all, there are a lot of resources and a lot of open source platforms that make it really easy to get started.

TensorFlow is an open source project from Google, and they use it across their own platform; the last I checked, about 40 different Google product offerings used TensorFlow. It supports neural-network-based image recognition, it’s very visual and easy to understand, and it can really reduce the time it takes to get this technology to production.
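For a sense of how little code that takes today, here is a minimal sketch using the Keras API that ships with TensorFlow 2.x. MobileNetV2 and the image path are stand-in assumptions, not a recommendation of a specific model.

```python
# A minimal sketch of image tagging with a pretrained network via the
# Keras API bundled with TensorFlow 2.x. "example.jpg" is a placeholder.
import numpy as np
import tensorflow as tf
from tensorflow.keras.applications import mobilenet_v2

model = mobilenet_v2.MobileNetV2(weights="imagenet")

img = tf.keras.utils.load_img("example.jpg", target_size=(224, 224))
x = mobilenet_v2.preprocess_input(
    np.expand_dims(tf.keras.utils.img_to_array(img), axis=0))

preds = model.predict(x)
# Top-3 human-readable ImageNet labels with confidence scores.
for _, label, score in mobilenet_v2.decode_predictions(preds, top=3)[0]:
    print(label, round(float(score), 3))
```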

Other open source projects, if you don’t want to be attached to Google, include Caffe, Torch, and Theano. NVIDIA also has a great offering tied to their technology.

Henrik:  Well, thanks Nikolai.

Nikolai:  Thank you, Henrik. I’m excited about content moderation. It’s a topic that’s not talked about a lot, but it’s really important, and I think in the next five years we’re really going to see the computer side of content moderation and image recognition take over, understand the context of these items, and really reduce the dependency on people to do this type of work.

Henrik: For more on this, visit Tagging.tech. Thanks again.


 

For a book about this, visit keywordingnow.com



Tagging.tech interview with Clemency Wright

Tagging.tech presents an audio interview with Clemency Wright about keywording services

 

Listen and subscribe to Tagging.tech on Apple Podcasts, AudioBoom, CastBox, Google Play, RadioPublic or TuneIn.


Keywording Now: Practical Advice on using Image Recognition and Keywording Services

Now available

keywordingnow.com

 

Transcript:

 

Henrik de Gyor:  This is Tagging.tech. I’m Henrik de Gyor. Today I’m speaking with Clemency Wright. Clemency, how are you?

Clemency Wright:  Hi. I’m good, thanks Henrik. How are you?

Henrik:  Good. Clemency, who are you and what do you do?

Clemency:  I’m Clemency Wright. I’m the Owner and Director of Clemency Wright Consulting, which is a UK‑based business and we specialize in providing bespoke keywording services and metadata consultancy, primarily for the creative media industries.

We work with stock photo libraries. We also work with specialist image collections. We work with book publishers and a small number of online retailers. We do some collaborative work with software developers and technical consultants on various projects.

The purpose of our work, mainly, is to help our clients organize their digital assets. These could be visual or text‑based. The idea is to make the assets findable more quickly and more easily by their end users.

Initially, my role in this field was working in search data and search vocabulary for a major global stock photo library based in London.

From here, I’ve worked with specialist collections, where the nature of keywording is very different, and also in the museum and heritage sector; again, working with data in a very different format on a digitization process. The experience across those different fields is quite different when you look at it from a keywording perspective.

Just to clarify, now I’m a consultant for various businesses. This is really key as the proliferation of visual media continues to grow. We’re looking very closely at the way we handle digital content, how we make sense of it, how we make the information relevant, and how we make it available to more people.

It has huge potential for our customers and for their end users, in terms of improving the search experience and the access to these assets. I think that pretty much summarizes where we are at the minute, in terms of who we work with, and what we provide for those people.

Henrik:  What are the biggest challenges and successes you’ve seen with keywording services?

Clemency:  One of the biggest challenges really is the perception that keywording is pretty much the same as tagging. Obviously, with the rise of SEO, there’s some confusion about what keywording is, even though keywording started many years ago.

Obviously, within librarianship and archival work, people were keywording as a way to retrieve information, which is still what we do, but I think the challenge here is breaking down the perception that it’s always a very basic way of tagging content.

We’re trying to differentiate between keywording which is, on its basic level, adding words that define an image or the content of an image, and high performance keywording which is very much a user‑focused exercise.

It’s a 360‑degree look at the life cycle of the image and how that image will ultimately be consumed and licensed for use in the broader digital environment.

One of the challenges is highlighting the value of a high quality, high performance keywording project to the customers, and also their end users and the various stakeholders therein.

I think working with specialist collections can be quite challenging. We have to create bespoke keywording hierarchies and controlled vocabularies for these clients, which obviously makes access to the content much better and the performance much greater, but it can be challenging and quite time‑consuming.

There’s a level of education we need to do with our clients, to illustrate and demonstrate the return on investment that can be had from a good keywording methodology. By methodology I mean the technology involved: the controlled vocabulary systems and software, and the hierarchies that we build for our clients.

They help to define the depth to which we can classify content, and also, the breadth of that content. The content may be video footage, or it may be photography. It may be illustration.

Obviously, a challenge there is creating a vocabulary or a taxonomy that will cater for an ever‑increasing collection, one that is growing and evolving as businesses themselves incorporate new content into their collections.

Technology is a challenge, but it’s also a great facilitator in the work that we do. It allows us to embed a level of accuracy and consistency to the work that we do for our clients.

When you’ve got measures in place, and you’re creating controlled vocabularies and hierarchies, you’ve got systems there that make sure the right vocabulary is being applied, and it’s being applied consistently and accurately. There’s a level of support that the technology can offer, as well as it having its own challenges.

Perhaps on a more general level, keywording has been tarnished somewhat by some multi‑service agencies which are offering keywording as a bit of a sideline.

Perhaps their core business is software or systems development or post‑production, with keywording offered as an offshoot. Some clients go down that road and then discover later that the keywording side was a bit of an afterthought. I think the methodologies and strategies in place have failed some of the clients we work with, at any rate.

There’s a challenge there for us to make sure we can differentiate between a specialist keywording provider and an agency that offers keywording as an additional add‑on to their core business.

I think another challenge worth mentioning is the idea of offshoring keywording to agencies where perhaps the quality is compromised, and this is what I hear from clients. The feedback on some projects has been that there’s been a lack of understanding, due mainly to language barriers, but also to gaps in cultural understanding of visual content.

It can be quite difficult, across continents, for people to read and interpret visuals in the way your market may be consuming them. There’s a challenge in, again, educating people about the options and the various consequences of using these agencies.

Henrik:  Clemency, as of March 2016, how much of the keywording work is completed by people versus machines?

Clemency:  We know that there is a lot of work being done in auto‑tagging and systems that will automatically add keywords that are relevant to the content. In my business, we define automated systems and keywording in a much more specific way.

We use it to automate the addition of, say, synonyms, or to automatically translate keywords, or to automatically add hierarchical keywords. But I think, Henrik, what you’re really asking about is image recognition technology, which is something we’re clearly aware of, and have been for some years now.
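To illustrate that kind of rule-based automation before turning to image recognition, here is a minimal sketch with a made-up vocabulary, not a real controlled hierarchy: it expands one supplied keyword with its synonyms and its broader terms.

```python
# A minimal sketch of rule-based keyword automation: expand one keyword
# with synonyms and broader terms from a controlled hierarchy.
# The vocabulary below is hypothetical.
SYNONYMS = {"bicycle": ["bike", "cycle"]}
BROADER = {"bicycle": "vehicle", "vehicle": "transportation"}

def expand(keyword):
    """Return the keyword plus its synonyms and all broader terms."""
    terms = {keyword}
    terms.update(SYNONYMS.get(keyword, []))
    node = keyword
    while node in BROADER:  # walk up the hierarchy to the root
        node = BROADER[node]
        terms.add(node)
    return terms

print(expand("bicycle"))
# {'bicycle', 'bike', 'cycle', 'vehicle', 'transportation'}
```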

Image recognition is not something that we currently engage with or consult on. It’s in its infancy, and it will be very exciting to follow these developments, but for now it’s quite limited to reading data in a very simple form.

For example, color and shape, and to some extent the number of people within an image, are things image recognition technology can do, but there is quite a lot of documentation supporting the idea that it’s very difficult for a machine to understand the sentiment behind an image, the concept, or the emotion.

A good example is an image of a person smiling. I’m not convinced a machine could determine whether that smile is one of happiness or one of sarcasm, for example.

A person looking at an image will make a certain assumption about that smile. Maybe it is subjective, but I think it’s just something that’s perhaps a little bit too advanced for machines at the minute, to be able to read the emotional side of visual content, which is really the field that I’m most interested in, most active in.

I think the technology will improve, but underpinning that, it really depends on who will be responsible for managing the architecture and the taxonomy, and maintaining that, and editing it, and developing it, because of course, we need people to put the intelligence into the structure behind the technology.

Although we can increase efficiency, and that’s great (we need to increase efficiency, reduce costs, and increase productivity), I think there’ll be a lot of management required, and people involved in making sure the technology is delivering consistently relevant results, and testing, and testing again, to see that it is.

But, as I say, we question the extent ultimately to which machines can interpret the more conceptual content and the visual content that we work with primarily, because visual media is always open to interpretation.

It’s a subjective form. Machines will perhaps go so far in classifying basic content, which will be very helpful and will certainly speed up the process for people like us, but for the user we have to be mindful that relevance is really the most critical element of this whole process.

Henrik:  What advice would you like to share with people looking into keywording services?

Clemency:  I’ve been working with keywording for 14 years, and it’s a really varied and rich field for anybody who’s interested in looking into keywording services.

I have a few ideas here, which are from my experience working with clients and from gathering feedback from clients, but I think the advice would be generally that there is no quick fix. Keywording isn’t something that you can pull out of a box. There’s no standard as such.

Even though we’re told there is a standard, the stock libraries that set standards are having to change them constantly, because the distribution networks are changing and the media types are changing.

Be prepared for it to be a fluid project if you start engaging with a keywording service. It will probably evolve over time. It will change over time, and that’s a good thing.

You need to be prepared to talk quite a lot about your business goals and objectives, perhaps more than you think. A good keywording agency will want to know a lot about your market, about your channels, your network, your distribution.

They won’t want just to see the content, because if they just see the content and they just add keywords, there’s a lack of connection from a marketing and a sales perspective. It’s very important for the keywording agency to understand your business and the context within which your business sits in the bigger picture.

Be prepared to be asked quite a lot of questions before you start engaging with a keywording provider.

The other main thing is to be wary, perhaps, of agencies that seem more focused on volumes and deadlines than they do quality. I alluded to that earlier on, with some of the options to offshore your work.

This can be a bit of a false economy. It can be, in the long run, more expensive to focus on volumes and timeframes. Quality’s always a good groundwork to base your keywording projects on.

Also, I’d advise people to work with someone who’s a communicator, someone who’s going to uncover the problem and really spend time and effort in solving that problem. They’ll want to see samples of your assets before they start giving you prices.

I think that’s a really important conversation to have. It’s really important to have good communication with your provider and also a good level of trust, so I’d advise you to find out who they’ve worked with and, if possible, try to speak to those clients.

Another great idea would be to speak to picture researchers, because they use keywords day in and day out. They work on stock photo websites and in publishing, advertising, and design agencies.

Picture researchers and picture buyers would be a really great source of information. Just ask them about their experience working with various providers of content, because from there you can track who has been investing well in good keywording, what that means, and where the value is in that.

Most of the software you look at will not do everything you need it to do. That’s another important thing to bear in mind from a technological standpoint: systems are great, but you’d do well to consult with someone who knows a lot about different systems.

But ultimately it’s best to configure a system that’s bespoke to your needs, so it may be worth investing a little more time than you first anticipated researching systems that will be fit for your purpose and give your clients the best experience as users.

Henrik:  Great. Where can we find more information about keywording services?

Clemency:  There are various resources online. There are some really interesting blogs; we can put links in here for your readers if they’re interested. One great independent resource, which I think is fantastic for industry news in general, is Photo Archive News, a news aggregator. They list services and providers that you might want to contact and speak to.

You’ll also find information about keywording services on stock library websites. For example, Alamy has a list of resources, and there are marketing services such as Bikinilists that list various resources available to the industry, including keywording agencies you might be able to work with across the globe. There are keywording agencies based in the US, in New Zealand, and across Europe.

Just to go back to the earlier conversation, there’s a lot of research to be done. It does take a little bit of time, but when you find an agency that really understands what you’re looking for, then you can have that conversation with them about what you’re specifically looking to achieve.

Henrik:  Thanks, Clemency.

Clemency:  Yes, thanks, Henrik. I hope it’s been a useful insight into the world of keywording.

Henrik:  For more on this, visit Tagging.tech. Thanks again.


 

For a book about this, visit keywordingnow.com



Tagging.tech interview with Joe Dew

Tagging.tech presents an audio interview with Joe Dew about image recognition

Listen and subscribe to Tagging.tech on Apple Podcasts, AudioBoom, CastBox, Google Play, RadioPublic or TuneIn.


Keywording Now: Practical Advice on using Image Recognition and Keywording Services

Now available

keywordingnow.com

 

Transcript:

 

Henrik de Gyor:  This is Tagging.tech. I’m Henrik de Gyor. Today, I’m speaking with Joe Dew. Joe, how are you?

Joe Dew:  I’m well. How are you?

Henrik:  Good. Joe, who are you and what do you do?

Joe:  I am the Head of Product for a company called JustVisual. JustVisual is a deep learning company focused on computer vision and image recognition. We’ve been doing this for almost eight years. As for my role in the company: think of me as the interface between engineering and computer vision scientists and end customers.

We have a very deep technology bench and a technology stack that does very sophisticated things, but translating that technology and those capabilities for end consumers can be a challenge. Likewise, we have customers who are interested in the space but aren’t really clear how to use it. My role is to translate their needs into requirements for engineering.

Henrik:  Joe, what are the biggest challenges and successes you’ve seen with image and video recognition?

Joe:  I think the biggest challenge, for a little perspective, is that the human brain has evolved over millions of years to handle and process visual information very easily. A lot of the things that we as humans can recognize and do ‑‑ things even a two‑ or three‑year‑old child can do ‑‑ are actually quite difficult for computers and take a lot of work.

The implication of this is that users’ expectations for precision and accuracy in visual recognition are very, very high. I like to say there’s no such thing as a visual homonym.

Meaning that, if you did a text search, for example, and typed in the word jaguar, and it came back with a car and it came back with a cat, you could understand why the search results came back that way. If I asked the question with a visual ‑‑ if I queried a search engine with an image ‑‑ and it came back with a car when I meant a cat, it would be a complete fail.

When we’ve done testing with users, on visual similarity for example, the expectations of similarity are very, very high. They expect almost an exact match. It’s largely because we, as humans, expect that. Again, if you think about how we interact with the world digitally, it’s actually a very unnatural thing.

When you search for things, you oftentimes have to translate that into a word or a phrase. You type it into a box and it returns words and phrases, at which point you then need to translate again into the real world.

In the real world, you just look at something and say, “Hey, I want something like that.” It’s a picture in your mind, and you expect to receive something like that. What we’re trying to do is solve that problem, which is a very tricky thing for computers to do at this point. But having said that, there have been tremendous improvements in this capability in the field.
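One common way to approach this kind of “something like that” search, sketched below with placeholder file names and not presented as JustVisual’s method, is to embed images with a pretrained CNN and rank a catalog by cosine similarity to the query.

```python
# A minimal sketch of visual similarity search: embed images with a
# pretrained CNN, then rank a catalog by cosine similarity to the query.
# File names are placeholders; this is one common approach among several.
import numpy as np
import tensorflow as tf
from tensorflow.keras.applications import mobilenet_v2

# Global-average-pooled CNN features serve as the image embedding.
embedder = mobilenet_v2.MobileNetV2(weights="imagenet", include_top=False,
                                    pooling="avg")

def embed(path):
    img = tf.keras.utils.load_img(path, target_size=(224, 224))
    x = mobilenet_v2.preprocess_input(
        np.expand_dims(tf.keras.utils.img_to_array(img), axis=0))
    vec = embedder.predict(x)[0]
    return vec / np.linalg.norm(vec)  # unit length, so dot = cosine similarity

catalog = ["chair1.jpg", "chair2.jpg", "lamp.jpg"]
vectors = np.stack([embed(p) for p in catalog])

query = embed("my_chair_photo.jpg")
for path, score in sorted(zip(catalog, vectors @ query), key=lambda t: -t[1]):
    print(path, round(float(score), 3))
```

At catalog scale, an approximate-nearest-neighbor index would replace this brute-force dot product, but the ranking idea is the same.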

Companies from Google to Facebook to Microsoft, for example, are doing some very interesting work in that field.

Henrik:  Joe, as of March 2016, how do you see image and video recognition changing?

Joe:  I think there are three big factors impacting this field. The first is the rise in processing power of hardware: chip technology, Moore’s law, that type of thing.

The second is a vast improvement in the sophistication of algorithms, specifically deep learning algorithms, which are getting smarter and smarter in training.

The third is the increase in data. There is just so much visual data now ‑‑ which was not true in years past ‑‑ that can be used for training and for increasing precision and recall. Those are the things happening on the technology side.

The upshot of all of this is that the accuracy of image recognition, and for that matter video recognition, will see exponential improvements in the next few months, let alone years. You’re starting to see that already in client‑side applications, robotics, websites, and the ability to extract pieces out of an image and see visually similar results.

Henrik:  Joe, what advice would you like to share with people looking at image and video recognition?

Joe:  I think understanding the use case is probably the most important thing to think about. Oftentimes you hear about the technology and what it can do, but you need to think thoroughly about what exactly you want the technology to do.

As an example, a lot of the existing technology today does what we call image recognition: the idea of taking an image or a video clip and essentially tagging it with English-language words. Think of it as translating an image into text. That’s very useful in a lot of cases, but oftentimes, for a use case ‑‑ for a user ‑‑ it’s not that useful.

If you take a picture of a chair, for example, and it returns “chair,” the user says, “I know it’s a chair. Why do I need this technology to tell me it’s a chair? What I’m really looking for is a chair that looks like this. Where can I find it?” That is a harder question to answer, and it is not an exercise where you’re simply translating into words.

We’ve found that there are companies that use Mechanical Turk techniques and so on to essentially tag images, but users have not really adopted that because, again, it’s not that useful. So that’s one thing: think about the use case, about what exactly you want the technology to do.

A lot of machine learning and deep learning systems involve a lot of training. The other part you need to think about is what you want the algorithm to train for. Is it simply tagging, or is it extracting certain visual attributes? Is it pattern? Is it color? What is it that you actually want the algorithm to see?
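As a sketch of that choice of training target, the hypothetical Keras example below puts two heads on one backbone: a softmax head if the target is a single tag, and a sigmoid head if the target is many co-occurring visual attributes such as pattern and color. The class counts are made up.

```python
# A hypothetical sketch of choosing a training target: the same backbone
# with a softmax head for single-label tagging ("what is it?") versus a
# sigmoid head for multi-label attributes ("what does it look like?").
import tensorflow as tf

backbone = tf.keras.applications.MobileNetV2(
    include_top=False, pooling="avg", input_shape=(224, 224, 3))
features = backbone.output

# One tag per image: softmax with categorical cross-entropy.
tags = tf.keras.layers.Dense(1000, activation="softmax", name="tag")(features)
# Many attributes (striped, red, wooden, ...) can be true at once:
# independent sigmoids with binary cross-entropy per attribute.
attrs = tf.keras.layers.Dense(40, activation="sigmoid",
                              name="attributes")(features)

model = tf.keras.Model(backbone.input, [tags, attrs])
model.compile(optimizer="adam",
              loss={"tag": "categorical_crossentropy",
                    "attributes": "binary_crossentropy"})
model.summary()
```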

Then the third area is that, right now, user adoption of the technology is still pretty low. I think as it becomes more commonplace and you start seeing it in more and more applications, adoption will increase, but the concept of using an image as a query is still very foreign to most people.

When you say visual search, it doesn’t really mean anything to them. There’s a whole user adoption curve that has to happen before they can catch up to the technology.

Henrik:  Where can we find out more information about image and video recognition?

Joe:  You can go to our site, justvisual.com, for some background on what we do. There are a lot of interesting companies and research happening right now in the field. It’s a little bit all over the place, so there isn’t necessarily one place that has all the information, because the field is changing so quickly. These are exciting times for this field.


 

For a book about this, visit keywordingnow.com