tagging.tech

Audio, Image and video keywording. By people and machines.



Tagging.tech interview with Nicolas Loeillot

Tagging.tech presents an audio interview with Nicolas Loeillot about image recognition

 

Listen and subscribe to Tagging.tech on Apple Podcasts, AudioBoom, CastBox, Google Play, RadioPublic or TuneIn.


Keywording Now: Practical Advice on using Image Recognition and Keywording Services

Now available

keywordingnow.com

 

Transcript:

 

Henrik de Gyor:  This is Tagging.Tech. I’m Henrik de Gyor. Today, I’m speaking with Nicolas Loeillot. Nicolas, how are you?

Nicolas Loeillot:  Hi, Henrik. Very well, and you?

Henrik:  Great. Nicolas, who are you, and what do you do?

Nicolas:  I’m the founder of a company which is called LM3Labs. This is a company that is entering into its 14th year of existence. It was created in 2003, and we are based in Tokyo, in Singapore, and in Sophia Antipolis in South France.

We develop computer vision algorithms, software, and sometimes hardware. Instead of focusing on some traditional markets for this kind of technology, like military or security and those kinds of things, we decided to focus on some more fun markets, like education, museums, entertainment, marketing.

What we do is develop unique technologies based on computer vision systems. Initially, we were born out of the CNRS, which is the largest laboratory in France. We had our first patents for triangulation of fingers in 3D space, so we could very accurately find fingers a few meters away from the camera and use those fingers for interacting with large screens.

We thought that it would be a good match with large projections or large screens, so we decided to go to Japan and to meet video projector makers like Epson, Mitsubishi, and others. We presented the patent, just the paper, [laughs] explaining the opportunity for them, but nobody understood what would be the future of gesture interaction.

Everybody was saying, “OK, what is it for? There is no market for this kind of technology, and the customers are not asking for this.” That’s a very Japanese way to approach the market.

The very last week of our stay in Japan, we met with NTT DoCoMo, and they said, “Oh, yeah. That’s very interesting. It looks like Minority Report, and we could use this technology in our new showroom. If you can make a product from your beautiful patent, then we can be your first customer, and you can stay in Japan and everything.”

We went back to France and made the electronics to support the technology. Of course, some pilots were already written, so we went back to NTT DoCoMo, and we installed them in February 2004.

From that, NTT DoCoMo introduced us to many big companies, like NEC, DMP, and some others in Japan, and they all came with different types of requests. “OK. You track the fingers, but can you track the body motion? Can you track the gestures? Can you track the eyes, the face, the motions and everything?”

The portfolio has evolved strongly, with something like 12 products today, all computer vision related and usually pretty unique in their domain, even if we have seen some big competitors like Microsoft [laughs] in our market.

In 2011, we saw the first deployment of 4G networks in Japan, and we said, “OK. What do we do with 4G? That’s very interesting: very large broadband, excellent response times and everything. What can we do?”

It was very interesting. We could do what we couldn’t do before, which is to put the algorithm in the cloud and use it on the smartphone, because smartphones were becoming very smart. It was just the beginning of smartphones at the time, with the iPhone 4S, which was the first one really capable of something.

We started to develop Xloudia, which is today one of our lead products. Xloudia is mass recognition of images, products, colors, faces and everything from the cloud, in 200 milliseconds. It goes super fast, and we search in very large databases. We can have millions of items in the database, and we can find the object or the specific item in 200 milliseconds.

The technology had typically been applied to augmented reality, which was done far before us, but we said, “OK. Image recognition can be applied to something which is maybe less fun than augmented reality, but much more useful, which is the recognition of everything.”

You just point your smartphone at any type of object, or people, or colors, or clothes, or anything, and we recognize it. This can be done with the algorithms, with image recognition and video recognition. That’s a key point, but not only with those kinds of algorithms.

We needed to develop deep learning recognition algorithms for finding proximities and similarities, and to offer users more capabilities than saying, “Yes, this is it,” or, “No, this is not it.” [laughs]

We focused on this angle, which is: OK, computer vision is under control. We know our job, but we need to push the R&D into something which is more about the distribution of the search on the network, in order to go very fast. That’s the key point. The key point was going super fast, because for the user experience, it’s absolutely paramount.

The other side is, “If we don’t find exactly what the user searched for, how can we find something which is similar or close to what they are looking for?” There is an understanding of the search which goes far beyond the database we have in catalog, making links between the search and the environment of the users.

The other thing we focused on was the user experience. For us, it was absolutely critical that people don’t press any button to find something. They just have to use their smartphone, point it at the object, or the page, or the clothes, or anything they want to search, and the search is instantaneous, so there is no other action.

There is no picture to take. There is no capture. There is no sending anything. It’s just capturing in real time from the video flow of the smartphone, directly understanding what is passing in front of the smartphone. That was our focus.

On this end, it implies a lot of processing, I would say, for the synchronization between the smartphone and the cloud. You can’t send all the information to the cloud permanently, so there is a protocol to follow in terms of communication. That was our job.

Of course, we don’t send pictures to the cloud, because that is too heavy, too data‑consuming. What we do is do a big chunk of the extraction, or of the work, on the smartphone, and send only the data necessary for the search to the cloud.

The data can be feature points for the image. They can be a color reference extracted from the image. They could be vectors, or a series of images from a video, for instance, just to make something which is coherent from frame to frame.
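A minimal sketch of this kind of on‑device extraction, assuming Python with OpenCV and the requests library, might look like the following. Xloudia’s actual pipeline and protocol are not public, so the ORB feature points, the coarse color histogram, and the endpoint URL here are illustrative assumptions only.

# Rough sketch of the on-device extraction idea described above.
# The endpoint and payload format are hypothetical, for illustration only.
import cv2
import requests

MATCH_ENDPOINT = "https://api.example.com/visual-search"  # hypothetical URL

def extract_and_query(frame_bgr):
    """Extract compact features from one video frame and query the cloud."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)

    # Feature points: a few hundred ORB keypoints with 32-byte descriptors,
    # a few tens of kilobytes instead of a full-resolution picture.
    orb = cv2.ORB_create(nfeatures=300)
    keypoints, descriptors = orb.detectAndCompute(gray, None)
    if descriptors is None:
        return None  # nothing usable in this frame

    # Color reference: a coarse histogram as a cheap extra signal.
    hist = cv2.calcHist([frame_bgr], [0, 1, 2], None, [4, 4, 4],
                        [0, 256, 0, 256, 0, 256]).flatten().tolist()

    # Only the extracted data leaves the device, never the raw picture.
    payload = {"descriptors": descriptors.tolist(), "color_hist": hist}
    return requests.post(MATCH_ENDPOINT, json=payload, timeout=0.5).json()

In practice the client would also throttle how many frames per second it processes and reuse results across consecutive frames, which is the frame‑to‑frame coherence mentioned above.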

That’s Xloudia, super fast image recognition with the smartphone, but cloud‑based, I would say, and the purpose is really to focus on the user experience, to go super fast, and to always find something back [laughs] as a reference.

The target market may be narrower than what we had before with augmented reality, and what we target is to help e‑commerce, or more specifically mobile commerce, players implement visual search directly in their applications.

The problem we have, even in 2016, is that when you want to buy something on your smartphone, it’s very unpleasant. Even if you go to the bigger e‑commerce companies like Amazon and the others, what you have on your smartphone is just a replication of what you can see on the Web, but it’s not optimized for your device. Nobody is using the camera, or very few are using the camera, for search.

The smartphone is not a limited version of the Web, typically. It comes with much more power. There are cameras. There are sensors, and many things that you’d find on a smartphone which are not on a traditional PC.

The way we do mobile commerce must be completely different from the traditional e‑commerce. It’s not a downgraded version of the e‑commerce. It must be something different.

Today, we see that 50 percent of the Internet traffic to big brand websites is coming from smartphones. Fifty percent, and 30 percent of e‑commerce is done from mobile.

It means that there is a huge gap between that 50 percent and that 30 percent. There are 20 percent of visitors who don’t buy on the smartphone, because of a lack of confidence, or economics, or something.

There is something wrong on the road to [laughs] the final basket. They don’t buy with the smartphone, and this smartphone traffic is definitely increasing with time, as well. It’s 50 percent today for some big brands, but it’s increasing globally for everybody.

There are some countries, very critical countries like Indonesia or India, who have a huge population, more than 300 million in Indonesia, one billion people in India. These guys, they go straight from nothing to the latest Samsung S6 or 7.

They don’t go through the PC stage, so they buy things directly from the smartphone, and there’s a whole generation of people who will just buy everything on their smartphone without ever knowing the PC experience, because there are no ADSL lines, or because there are so many problems with the PC: it’s too expensive, there’s no space, or whatever.

We definitely target these kinds of markets, and we want to serve the e‑commerce, or mobile commerce, pioneers: people who really consider that there is something to be done in the mobile industry to improve the user experience.

Henrik:  What are the biggest challenges and successes you’ve seen with image and video recognition?

Nicolas:  If you want to find something which is precise, everything is fine today. In 2016 there are many technologies, many algorithms, where you can compare: “OK. Yes, this is a Pepsi bottle, and this is not a Coca‑Cola bottle.” That’s pretty much under control today. There is no big issue with this.

The challenge ‑‑ I would prefer to say war ‑‑ is really understanding the context, so bringing more context than just recognizing a product: “What is the history? What is the story of the user, the location of the user? If we can’t find, or if we don’t want to find, a Pepsi bottle, can we suggest something else, and if yes, what do we suggest?”

It’s more than just tagging things which are similar. It’s just bringing together a lot of sources of information and providing the best answer. It’s far beyond pure computer vision, I would say.

The challenge for the computer vision industry today, I would say, is to merge with other technologies, and the other technologies are machine learning, deep learning, sensor aggregations, and just to be able to merge all these technologies together to offer something which is smarter than previous technologies.

On the pure computer vision technologies, of course, the challenge is to create databases, or knowledge, where we can actually identify that some objects are close to what we know but are not completely what we know, and, little by little, to learn or to build some knowledge based on what is seen or recognized by the computer vision.

One of the still‑existing challenges…I have been in this industry for a few decades, but [laughs] there is still a challenge remaining, which I would call background abstraction, or noise abstraction: “How can you extract what is very important in the image from what is less important?”

That’s still a challenge for everyone, I guess: “What is the focus? What do you really want? Within a picture, what is important, and what is not important?” That is a key thing. Algorithms are evolving in this domain, but it’s still challenging for many actors, many players in this domain.

Henrik:  As of March of 2016, how do you see image and video recognition changing?

Nicolas:  The directions are speed. Speed is very important for the user experience. It must be fast. It must be seamless for the users.

This is the only way for service adoption. If the service is not smooth, not swift ‑‑ there are many adjectives for this in English [laughs] ‑‑ if the experience is not pleasant, it will not be adopted, and then it will die by itself.

The smoothness of the service is absolutely necessary, and the smoothness for the computer vision is coming from the speed of the answer, or the speed of the recognition. It’s even more important to be fast and swift than to be accurate, I think. That’s the key thing.

The other challenge, the other direction for our company, is definitely deep learning. Deep learning is something that takes time, because we must run algorithms on samples, on big databases, to build experience and to build something which grows by itself.

We can’t say that the deep learning for LM3Labs, or for another company, is ready and finished. It’s absolutely not. It’s something which is permanently ongoing.

Every minute, every hour, every day, it’s getting there, because the training keeps running, and we learn to recognize more. We improve the recognition, and we use deep learning for two purposes at LM3Labs.

One of them is the speed of recognition, so it’s the distribution of the search on the cloud. We use deep learning technologies for smartly distributing the search and going fast.

The other one is more computer vision focused: if we don’t find exactly what the user is trying to recognize, we find something which is close, and we can make recommendations.

These recommendations are used for the final users, so they have something at the end and it’s not just a blank answer; there is something to propose. Or they can be used across our customers.

We can assess trends in the search, and we can provide our customers, our B2B customers, with recommendations saying, “OK. This month, we understand that, across all our customers, the brand Pepsi‑Cola is going up, for instance, instead of Coca‑Cola.” This is just an example. [laughs] That’s typically the type of application we use deep learning for.
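The interview does not describe how Xloudia implements this internally; as a generic sketch, “find something close” can be expressed as a nearest‑neighbor search over embedding vectors, where a strong match is returned directly and weaker matches become suggestions. The function and threshold below are assumptions for illustration, not the product’s actual code.

# Generic illustration of "exact match or close suggestions" over embeddings.
import numpy as np

def recommend(query_vec, catalog_vecs, catalog_ids, top_k=5, threshold=0.9):
    """Return the exact match if similarity is high enough, else the nearest items."""
    q = query_vec / np.linalg.norm(query_vec)
    c = catalog_vecs / np.linalg.norm(catalog_vecs, axis=1, keepdims=True)
    sims = c @ q                        # cosine similarity to every catalog item
    order = np.argsort(sims)[::-1]      # best matches first

    if sims[order[0]] >= threshold:
        return {"match": catalog_ids[order[0]]}                  # "yes, this is it"
    return {"similar": [catalog_ids[i] for i in order[:top_k]]}  # close suggestions

At the scale of millions of items, this brute‑force comparison would normally be replaced by an approximate nearest‑neighbor index distributed across machines, which is where the “smart distribution of the search” mentioned above comes in.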

Henrik:  What advice would you like to share with people looking at image and video recognition?

Nicolas:  Trust the vision. The vision is very important. There are a lot of players in the computer vision community today.

Some have been acquired recently ‑‑ Metaio by Apple, or Vuforia by PTC, are two recent examples. Some people are focused on augmented reality, so really making the visual aspect of things. Others are more into the cloud for visual search, improving the search for law enforcement and those kinds of things.

The scope, the spectrum of the market is pretty wide, and there is probably someone who has exactly the same vision as you [laughs] on the market.

On our side, LM3Labs, we are less interested in augmented reality clients, I would say. We are less interested in machine‑to‑machine search because this is not exactly our focus, either.

We are very excited by the future of mobile commerce, and this is where we focus, and our vision is really on this specific market segment. I would say the recommendation is find a partner who is going with you in terms of vision. If your vision is that augmented reality will invade the world, go for a pure player in this domain.

If you have a smart vision for the future of mobile commerce, join us. [laughs] We are here.

Henrik:  Thanks, Nicolas. For more on this, visit Tagging.tech.

Thanks again.


 

For a book about this, visit keywordingnow.com



Tagging.tech interview with Kevin Townsend

Tagging.tech presents an audio interview with Kevin Townsend about keywording services

 

Listen and subscribe to Tagging.tech on Apple Podcasts, AudioBoom, CastBox, Google Play, RadioPublic or TuneIn.


Keywording Now: Practical Advice on using Image Recognition and Keywording Services

Now available

keywordingnow.com

 

Transcript:

Henrik de Gyor:  This is Tagging.tech. I’m Henrik de Gyor. Today, I’m speaking with Kevin Townsend. Kevin, how are you?

Kevin Townsend:  Good, thank you.

Henrik:  Kevin, who are you and what do you do?

Kevin:  I’m the CEO and Managing Director for a company called KeedUp. What we do is keywording, but also adding other metadata, fixing images, image flow services; a whole heap of things, but keywording and metadata is really the core of what we do.

What makes us a little bit different to maybe some other keywording companies is that we started out from a basis of being involved in the industry as a syndicator/image seller. We were like a photo agency, photo representative, like many of our customers ‑‑ in fact almost all of our customers.

As a result, we’ve developed services in a somewhat different way. For instance, we operate 24 hours a day, seven days a week. We do celebrity as well as stock. Everybody that works for us pretty much is working in an office. There’s no piecework. Almost all of our staff are university graduates.

Henrik:  Kevin, what are the biggest challenges and successes you’ve seen with keywording services?

Kevin:  I think the biggest challenge, certainly for us, has been dealing with the multitude of requirements and the different systems that our customers work with. It’s never really a thing where you are just sent some images and are allowed to do whatever you like to them and provide the best keywording or the best metadata you can.

Everybody has their own things that they want done. There are all these different standards, like you might be keywording for a Getty Images standard, or back when it used to be a thing, the Corbis standard, and so on and so forth.

Dealing with all of those different things I think is the real big challenge in keywording and delivering exactly what people want. That’s the real key.

I think the successes, kind of related, are that we’ve built systems that have enabled us to cope with all of those different things, things such as our own workflow system called Piksee, which really did cut out an awful lot of handling time and wastage just dealing with sets of images.

Or we have our own client database which records and enables all our staff to know exactly, down to the contributor level, all of the things that you maybe want to do differently for one photographer over another when it comes to metadata or fixing your images.

Just a whole series of things that, when I first started, I didn’t realize all of these nuances would come into play, but they really are crucial to delivering a good service.

The result of that has been that our reputation is such that we tend to work for the big names ‑‑ certainly in the news, celebrity, and increasingly in the stock area as well ‑‑ like Associated Press, like Splash News, and like Magnum. It’s being successful in that we’ve managed to defeat the problem, I suppose.

Henrik:  As of early March 2016, how much of the keywording work is completed by people versus machines?

Kevin:  I guess it depends on how you work that figure out. In terms of, if the question is how many of the images that we work on are touched by human beings deciding on what keywords go into the images, that figure is really 100 percent.

But, and this is important, the technology that you have to assist them in doing that and doing a good job is quite considerable. I don’t think it’s often appreciated, by photographers, or particularly amateurs out there, exactly what goes into what I’d call professional keywording as opposed to “seat of your pants” keywording.

We don’t sit there very often and keyword one image after another, searching into our memory banks, trying to come up with the best keywords. There are systems, vocabularies. There are ways for handling the images, organizing the images.

So much technology is involved there to really make the humans that we have the best that they can be.
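As one illustration of what such a system can do, a controlled vocabulary can expand a specific term into its broader parents, so that an image keyworded with a narrow term is still found by broader searches. The tiny vocabulary below is invented for the example and is not any agency’s actual term list.

# Hedged sketch: expanding a specific keyword into its broader vocabulary terms.
BROADER = {
    "golden retriever": "dog",
    "dog": "domestic animal",
    "domestic animal": "animal",
}

def expand(term):
    """Return the term plus all of its broader parents, most specific first."""
    chain = [term]
    while chain[-1] in BROADER:
        chain.append(BROADER[chain[-1]])
    return chain

print(expand("golden retriever"))
# -> ['golden retriever', 'dog', 'domestic animal', 'animal']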

I have to say, in that regard, what we always are doing ‑‑ and as I said earlier, we employ almost exclusively university graduates, people who have degrees in communication studies or English, or art history ‑‑ is that we’re trying to have the best supercomputer to do the keywording, which is the human brain, and the most educated and best-programmed supercomputer.

Then we add the technology on top. So, yes, 100 percent of the work in the end is done by people, but certainly with a lot of assistance from technology.

If you look into the future, the far future, I feel sure that one day artificial intelligence will probably do a lot of things for all of us in all sorts of areas we’re not even vaguely aware of now.

We’re starting to see some of that happen already in all sorts of things to do with apps on your phones that can tell you how to do this, that, and the other, and account for your heartbeat; all sorts of things that are happening with artificial intelligence, which is great.

When it comes to keywording, what I see is not very flattering at the moment, which is not to say that it may not get there in the end. But I think what I need to do is try to put things in a little bit of perspective, at least from where I see it.

The level of complication that I was talking about earlier, which is really the key to good keywording, I think is where at the moment AI keywording falls down completely, and even before that it’s falling over some hurdles right now.

On my blog recently, I did a post about one AI provider, and they invite you to put test images in to see what they can do. Well, [laughs] the result was particularly unedifying, in that a lot of the keywords were just completely wrong. The point of the images was completely missed. They weren’t able to name anybody in the images.

It was really a pretty poor effort, and even the examples they had on their website, showing what they considered to be successes, there were very few keywords in terms of what would be acceptable commercially.

Also, a lot of the keywords were extremely inane and almost pointless; certainly nothing that would fit into a vocab that you would be able to submit to Getty, for instance, or that would be acceptable to Alamy. This is a long, long way from where it needs to get.

Perhaps the best analogy I could use to explain how I view things at the moment with AI and keywording is that, a few years ago, I went to see the Honda robot, which had come to town.

They had spent millions and millions and millions of dollars on this robot, and its big claim to fame was that it could walk upstairs, which it did. Not particularly well, but it did it. It was a great success, and everyone was very happy.

Thing is, any three‑year‑old kid in the audience could have run up and down those stairs and run around the robot many times.

I feel that AI keywording is a bit like that robot at the moment. Yes, it’s doing some rudimentary things, and that looks great, and people who think it’s a good idea and it’s all going to be wonderful, can shout about it, but it’s a long way from the reality of what humans are able to do. A long, long way.

To carry on the robot analogy, I think where the technology has to go ‑‑ to really be able to do the sort of keywording with concepts and to meet all these challenges of different standards ‑‑ is that it has to be more like an android than a robot that can assemble a motor vehicle.

Now, how long it’s going to take us to get to that sort of stage, I don’t know. I would be very doubtful that the amount of money and technology, and what have you, that would be needed to get us to that point is going to be directed towards keywording.

I’m sure there’ll be much more important things that sort of level of technology would be directed at. But certainly one day, maybe in my lifetime, maybe not, we’ll probably wake up and there’ll be androids doing keywording.

Henrik:  Kevin, what advice would you like to share with people looking into keywording services?

Kevin:  I think that it’s one of those things, it’s the oldest cliche, that you do get what you pay for, generally speaking.

We have had so many people who have come to us who have gone down the route of trying to save as much money as they could, and getting a really poor job done, finding it didn’t work for them, it wasn’t delivering what they wanted, and they’ve ended up coming and getting the job done properly.

For instance, at Magnum we have taken over the keywording there from what used to be crowd‑sourced keywording, which was particularly poor. That’s really made a big difference to them, and I know they’re very happy.

There are other examples that we’ve had over the years with people who’ve gone off and got poor keywording and regretted it. Just to use another old saying, no one ever regrets buying quality, and I think that is very true with keywording.

Henrik:  Where can we find more information about keywording services?

Kevin:  Right. We have a website, www.keedup.com. We have a blog. We are also on Facebook, on Twitter, and on LinkedIn. We’re in lots of different places. If you go there as a starting point, there are links there to other sites that we have. That’s a good place to start.

We have a site called coreceleb.com, which is an offshoot of what we do, focused really on editing down and curating the images that people are creating, so that you have more sales impact.

We also have brandkeywording.com, which is focused on adding information about brands that celebrities are wearing and using; not just fashion, but also what cars they drive, all sorts of things really to add new revenue streams, particularly for celebrity photo agencies, but also there’s no reason why that doesn’t include sports news and even stock.

Those are two which are really pretty important as well.

Henrik:  Thanks, Kevin.

Kevin:  Good. [laughs] I hope that will give people some food for thought.

Henrik:  For more on this visit Tagging.tech.

Thanks again.


 

For a book about this, visit keywordingnow.com