Henrik de Gyor: This is Tagging.tech. I'm Henrik de Gyor. Today I'm speaking with Jonas Dahl. Jonas, how are you?

Jonas Dahl: Good. How are you?

Henrik: Good. Jonas, who are you and what do you do?

Jonas: Yeah, so I'm a product manager with Adobe Experience Manager, and I primarily look after our machine learning and big data features across all AEM products, so basically working with deep learning, graph-based methods, NLP, etc.
Henrik: Jonas, what are the biggest challenges and successes you’ve seen with image recognition?
Jonas: Yes. Well, deep learning is basically what happened; it defines the before and after. Around 2012 there was a confluence of the data piece, primarily enabled by the Internet: large amounts of well-labeled images that could drive these huge deep learning networks. There was the deep learning technology itself and, obviously, the availability of raw computing power. That's basically what happened. And with that we saw accuracy increase tremendously, and now it's basically rivaling human performance, right? So both the accuracy and the breadth of the labeling and classification you can do have increased and improved tremendously in the last few years.
In terms of challenges, I really see this as a path we're going down, where the first step is kind of generic tagging of images, right? So, what's in an image? Are there people in it? What are the emotions? Stuff like that that's pretty generic. And that's the era we're in right now, where we see a lot of success and where we can really automate these tedious tagging tasks at scale pretty convincingly.
I think the challenge right now is to move to the next step, which is to personalize these tags. So, basically, provide tags that are relevant not just to anyone but to your particular company. If you're a car manufacturer, you want to be able to classify different car models. If you're a retailer, you may want to do fine-grained classification of different products. That's the big challenge I see now, and that's definitely where we're headed and what we're focusing on in our apps.
Henrik: And, as of November 2016, how do you see image recognition changing?
Jonas: Well, really where I see it changing is, as I said, it's going to be more specific to the individual customer's assets. It's going to be able to learn from your guidance. Basically, how it works now is that you have a large repository of already-tagged images, and then you train networks to do classification. What's going to happen is that we're going to add a piece that makes this much more personalized, much more relevant to you, where the system learns from your existing metadata and from your guidance as you curate the proposed tags.
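A system that learns from curated tags, as Jonas describes, can be sketched in a few lines. This is a hypothetical illustration, not Adobe's implementation: a generic classifier proposes tags with confidence scores, and a per-tag acceptance rate learned from the user's accept/reject decisions re-weights those scores. All names and the scoring scheme are assumptions for the sketch.

```python
from collections import defaultdict

class PersonalizedTagger:
    """Re-ranks a generic model's tag proposals using curation feedback.

    Hypothetical sketch: a Laplace-smoothed acceptance rate per tag acts
    as a learned prior over the generic model's confidence scores.
    """

    def __init__(self):
        self.accepts = defaultdict(int)  # times a user kept a proposed tag
        self.rejects = defaultdict(int)  # times a user removed a proposed tag

    def record_curation(self, tag, accepted):
        # Each accept/reject decision during curation becomes training signal.
        if accepted:
            self.accepts[tag] += 1
        else:
            self.rejects[tag] += 1

    def adjusted_score(self, tag, model_score):
        # Laplace smoothing: unseen tags get a neutral 0.5 acceptance rate.
        a, r = self.accepts[tag], self.rejects[tag]
        acceptance = (a + 1) / (a + r + 2)
        return model_score * acceptance

    def propose(self, model_scores, threshold=0.25):
        # model_scores: {tag: confidence} from the generic classifier.
        scored = {t: self.adjusted_score(t, s) for t, s in model_scores.items()}
        return sorted((t for t, s in scored.items() if s >= threshold),
                      key=lambda t: -scored[t])
```

For example, a car manufacturer whose users repeatedly reject the generic tag "vehicle" in favor of "sedan" would see the generic tag suppressed and the specific one promoted over time.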
Another thing I see is video; it's going to become more important. Video has that temporal component, which makes segmentation important, and that's how it differs from images. So there's that, and also the much larger scale that we're looking at in terms of processing and storage when we're talking about video. Basically, video is just a series of images, so as we develop technologies to handle images, those can be transferred to the video piece as well.
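The point that video is a series of images plus a temporal component can be illustrated with a toy segmentation pass: re-use any per-frame image tagger, then group runs of identically-tagged frames into segments. Here `tag_frame` and `segment_video` are hypothetical names for the sketch, not a real API.

```python
def segment_video(frames, tag_frame):
    """Group consecutive frames with the same top tag into temporal segments.

    frames: any sequence of frame objects.
    tag_frame: stand-in for a per-frame image classifier returning a tag.
    Returns a list of (tag, start_index, end_index) tuples.
    """
    segments = []
    for i, frame in enumerate(frames):
        tag = tag_frame(frame)
        if segments and segments[-1][0] == tag:
            # Same label as the previous frame: extend the current segment.
            segments[-1] = (tag, segments[-1][1], i)
        else:
            # Label changed: a new temporal segment begins.
            segments.append((tag, i, i))
    return segments
```

This is the sense in which image technology transfers to video: the classifier is unchanged, and only the grouping over time is new.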
Henrik: Jonas, what advice would you like to share with people looking at image recognition?
Jonas: Well, I would say start using it. Start doing small POCs [proofs of concept] to get a sense of how well it works for your use case, define small challenges, small successes you want to achieve, and just get into it. This is something that is evolving really fast these days, so by getting in and seeing how it performs now, you'll be able to provide valuable feedback to companies like Adobe. So you can basically impact the direction this is going in. It's something we value a lot. It's really valuable to us when we run beta programs, for instance, that people come to us and say, "You know, this is where this worked really well. These are concrete examples where it didn't work that well," or, "These are specific use cases that we really wish this technology could solve for us."
So now is a really good time to get in there and see how well it works. And also, I’d say, just stay on top of it. Stay in touch because, as I said, this evolves so fast that you may try it today and then a year from now things can look completely different, and things can have improved tremendously.
So that’s my advice. Now is a good time. I think the technologies have matured enough that you can get real solid value out of them. So this is a good time to see what can these technologies do for you.
Henrik: Jonas, where can we find more information?
Jonas: Yeah, so at Adobe we just launched what we call Adobe Sensei, which is the collection of all the AI and machine learning efforts we have at Adobe. Just Google that and go to the website; it will be updated with all the exciting things we're doing in that space. And I would recommend that you keep an eye on it, because that's something that's really going to evolve over the next few years.
Henrik: This is Tagging.tech. I’m Henrik de Gyor. Today, I’m speaking with Ramzi Rizk. Ramzi, how are you?
Ramzi: Hey Henrik, how are you? I’m good thanks.
Henrik: Great. Ramzi, who are you and what do you do?
Ramzi: I'm one of the founders and the CTO at a company called EyeEm.com. Based out of Berlin, we're a photography company, been around for five and a half years now, where we're a community and marketplace for authentic imagery. Basically, photos taken by average people who have a passion for photography but aren't necessarily professionals. Over the past few years, we've invested a lot and built quite a few technologies around understanding the content, context, and aesthetic qualities of images.
Henrik: Great. What are the biggest challenges and successes with image recognition?
Ramzi: I think over the past few years there's been an amazing explosion in the number of tools that are available, particularly out of deep learning, to actually automate a big part of the photographer's workflow, if you want. That includes, of course, recognizing what is in a photo, as well as what the quality of the photo is, and making photos just that much easier to find, to search, and to share. The greatest success has been that we're at a point now where we can describe the content of a photo with better-than-human accuracy, I would say. A lot of the challenges would have to be around data. Deep learning is a very data-heavy field, and you need a lot of content that is properly labeled, properly tagged, in order to train these machines to recognize what's in the images.
Over the past few years, things have gotten more and more accurate, to the point where, in a lot of cases, machines are actually more accurate than humans at recognizing the various details in a photo. That being said, we as humans do have this innate ability to understand context and to draw out the more subtle, abstract notions of what an image is trying to convey, and that is definitely significantly more challenging to model in a machine.
Henrik: As of October 2016, how do you see image recognition changing?
Ramzi: I think we're getting to a point where the pure act of recognizing what is in a photo has become a commodity, I would say. In the next six months to a year, you should be able to just license a variety of APIs; Google has an API out, so do we, and so do a few other companies that specialize in understanding the content of a photo. Think back 10 years, when we were talking about how amazing it was that we could recognize cats in videos: image recognition in the classical sense, as we understand it, is now a solved problem. And since it's a solved problem, we will be seeing, and are already seeing, a lot of applications built on top of it that were previously not possible.
That includes having the ability to run these so-called models, these algorithms, on your device, on your phone, without having to upload content to the cloud, even in real time. Which means we're at a point now where, while you're taking a photo, you can actually get real-time feedback on the quality of the image, on whether the photo you're taking is actually aesthetically appealing, and the minute you shoot it, your phone has already stored all of the content of that photo, making it searchable right away.
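Capture-time, on-device tagging feeding a searchable index, as Ramzi describes it, might look like this toy sketch. All names here are hypothetical, and `tagger` stands in for whatever on-device model produces tags; the point is only that indexing happens at the moment of capture, so search works immediately with no cloud round trip.

```python
class CaptureIndex:
    """Toy inverted index populated the moment a photo is captured."""

    def __init__(self, tagger):
        self.tagger = tagger  # stand-in for an on-device tagging model
        self.index = {}       # tag -> list of photo ids, in capture order

    def capture(self, photo_id, pixels):
        # Run the on-device model at capture time and index every tag.
        for tag in self.tagger(pixels):
            self.index.setdefault(tag, []).append(photo_id)

    def search(self, tag):
        # Lookup is a dictionary read: instant, local, offline.
        return self.index.get(tag, [])
```

Usage: `CaptureIndex(model).capture("IMG_0001", raw_pixels)` at shutter time, then `search("beach")` returns matching photo ids with no further model inference.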
Henrik: Ramzi, what advice would you like to share with people looking into image recognition?
Ramzi: For people looking into building image recognition solutions, I would recommend not to anymore because, as I said, the problem is solved. You don't reinvent email; you build services on top of it. And I think today you're at a point where you can build a lot of really exciting, interesting services on top of existing image recognition frameworks and existing APIs that offer this out of the box. For people looking at using it, this is the perfect time to actually start building these applications, because the technology is mature enough, it's more than affordable, and it's at a point where anyone can really build software with the assumption that they understand what is in the photo.
Henrik: Where can we find out more information?
Ramzi: I would definitely have to pitch eyeem.com/tech if you're interested in looking at applied image recognition. We offer an API where you can actually keyword your entire content, your entire image library, whether for photography professionals or for amateurs. You can also have images captioned, described in a full sentence. Even more interesting, machines have now learned to understand your personal taste. They can actually surface content that you know you will like, or that your customers will like, or that your significant other would like, and just simplify that entire process, really taking the monotonous, boring work out of photography, out of the photographer's workflow.
As a photographer, you can just focus on the art of creation and on capturing that perfect moment. I think there’s a bunch of other services like Google Cloud Vision and so on, that you can also look at and learn more about what you can do with imagery today.
Henrik: Thanks Ramzi.
Ramzi: Thank you, Henrik. Pleasure speaking to you.
Henrik de Gyor: This is Tagging.tech. I’m Henrik de Gyor. Today, I’m speaking with Mark Sears. Mark, how are you?
Mark Sears: I'm doing great, Henrik. Thank you.
Henrik: Mark, who are you and what do you do?
Mark: My name is Mark Sears. I'm Founder and CEO of CloudFactory. We spend a lot of time leveraging an on-demand workforce to structure data. We take a lot of unstructured data from clients and process it in the cloud using a combination of human and machine intelligence. We do that mostly for tech companies. We work a lot with technology companies that are looking for an API-driven workforce for tons of different use cases. Very relevant to Tagging.tech would be things like tagging images for the purpose of machine learning, or tagging images as part of core business processes for things like intelligence. We do transcription and translation. We do a lot of document processing, things like processing receipts and invoices. We do web research, going out to do human-powered screen scraping for lead generation and CRM enrichment.
It's a lot of very tedious, routine, repetitive work, and we do it in a bit of a different model, what we refer to as cloud labor: the ability for organizations to send their work to the cloud and have it come back done accurately, quickly, and cost-effectively, in hours if not minutes. So that's kind of the world that we play in.
Henrik: Mark, what are the biggest challenges and successes you’ve seen with crowdsourcing?
Mark: When we think of crowdsourcing, we often like to compare it to the more traditional outsourcing model. We actually consider ourselves to be somewhere in between. In my view of the world, the traditional option is having a large number of people working in a delivery center: offshoring, outsourcing. You need to get work done, and this is one option that a lot of companies have used in the last 20 years: send that work to a team, maybe thousands of people, sitting in urban India, the Philippines, or China. That's one way to get a lot of this type of paperwork done.
Another way that's become more popular recently is to send it to a crowd, to do crowdsourcing. Our view is that crowdsourcing, sending work out to anonymous crowds of people who maybe just sign up online, doesn't offer a really high level of engagement or accountability, or the ability to get quality out of an anonymous, faceless crowd. We see that on one side of the spectrum, and traditional outsourcing on the other side. Our view of the world is right in between: the idea of an on-demand workforce that leverages automation and is highly efficient because of technology but, at the same time, is not an anonymous crowd. It's a crowd we actually know, train, professionally manage, and curate. That's a roundabout way of talking about how we view the world, and what I've seen and learned through a lot of different projects is that the biggest challenge is often quality.
Harnessing the power of an anonymous crowd is something that's quite hard to do. So we love playing in the hybrid, finding that radical middle where you get the best of all worlds in terms of quality, scalability, elasticity, cost-effectiveness, speed of turnaround, etc. to accomplish your large data work projects.
Henrik: Mark, as of April 2016, how do you see crowdsourcing changing?
Mark: Moving forward, there's no question that the rise of robots and the flattening of the world are two major trends that are affecting not just crowdsourcing but really the future of work and how enterprises get their work done. The world is becoming more and more flat, mostly because of the internet, as well as the falling cost of devices to access the internet. We've had 1.1 billion people come online in the last five years, and there's another billion expected in the next five years.
So you have this massive, global workforce that is now able to contribute to tagging and, again, the routine, repetitive work that every organization has deep inside that needs to get done. There's this new, untapped potential in being able to do online work and to leverage talent, which is equally distributed around the world, while acknowledging that opportunity is not. And so we can really flatten the world with the internet, with crowdsourcing and other online work approaches.
The other side of it, again, is automation and the rise of robots. Any project or solution that doesn't think first about how to automate is going to be left behind. We absolutely have to leverage technology. Automation takes on different forms. Actually automating the work itself, using AI, ML, etc. to automate pieces of our tagging, labeling, transcription, and video and audio processing workloads, is definitely essential, but a lot of the technology just is not there yet. So look first to see which pieces you can actually automate.
And then, of course, there's the delivery and receipt of the work: being able to use an API to send the work in and have it sent back once it's completed. Having that automation of the workflow, as well, streamlines and speeds things up and makes things more cost-effective.
So there's automating the actual work, and there's automating the process of getting the work done, delivering and receiving it. Really, the huge trend I see is everyone asking: how do we make this more streamlined, more efficient, faster, more cost-effective, with fewer manual touches in these projects? That does include trying to automate as much of the work as we can. One thing we have really seen is the desire and requirement to find the right mix of human and machine intelligence for every project, for every solution. It really is different for every solution.
You try to automate as much as you can, but obviously there are a lot of nuances, and you end up doing split (A/B) testing to understand what the best total cost of ownership of the solution really is, depending on how much automation you include. Those two trends definitely play into the future of getting this type of work done.
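The human/machine mix Mark describes is often implemented as confidence-based routing: the model handles items it is confident about, and the rest go to the human workforce, with the threshold as the knob you would A/B test against total cost of ownership. A minimal sketch follows; the function names and per-item costs are illustrative assumptions, not CloudFactory's actual pricing or pipeline.

```python
def route_items(items, confidence, threshold):
    """Partition work items into (automated, human_review) lists.

    confidence: stand-in for a model scoring function, item -> [0, 1].
    Items scoring at or above the threshold are auto-processed; the
    rest are routed to the human workforce.
    """
    automated, human = [], []
    for item in items:
        (automated if confidence(item) >= threshold else human).append(item)
    return automated, human

def estimated_cost(items, confidence, threshold,
                   machine_cost=0.01, human_cost=0.50):
    # Illustrative per-item costs: machine work is cheap, human work is
    # accurate but expensive. Sweep the threshold to compare total cost.
    automated, human = route_items(items, confidence, threshold)
    return len(automated) * machine_cost + len(human) * human_cost
```

In an A/B test you would also track error rates per arm, since a lower threshold automates more items but risks more mistakes; the best threshold minimizes cost including the cost of errors.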
Henrik: Mark, what else would you like to share with people looking into crowdsourcing?
Mark: I think the key thing is understanding self-serve versus full-serve. There's no question there's power in leveraging a global workforce, accessing it online, and being able to send your repetitive data projects to a crowd. The question is how much experience you have in doing that. A lot of people like a self-serve approach, accessing it themselves. Other people prefer to have experts there to help along the way, making sure you're getting the quality out of the crowd that you're expecting.
As you look at the landscape, one way to think about your project is: am I ready to do this on my own, or is it better to work with a more enterprise-grade approach? We often encourage people to think about that spectrum. If you've got a smaller project that you need done really quickly, and quality is not the highest priority, it's more that you just need it done quick and cheap, then self-serve options to send that work out and get it back are really where you want to be going.
If you have a larger or ongoing project, one that requires really getting good, accurate work done, where maybe there's an opportunity for a portion of it to be automated, then I think you want to be looking for something a little more enterprise-grade, maybe a full-service, professional-services type of approach. That's a key thing we'd recommend people think through as they begin to look at crowdsourcing as a way to get their project done.
Henrik: Mark, where can we find more information about crowdsourcing?
Mark: Crowdsourcing as a term has definitely broadened and changed. Googling crowdsourcing is going to lead you in a lot of different directions, from crowdfunding to Wikipedia. There definitely are some resources out there, but there aren't that many players really in this space. I think it's great to take a look at everyone's approach: exactly what tools they provide, how you'd access the crowd, the services they provide, and how they recruit, train, vet, and manage their workforce, their crowd. Probably the best way is really to get out there and explore some of the different options available from different partners.
Specifically, in terms of finding other places online to learn, crowdsourcing.org is one good resource. They have a cloud labor tab with some good information, and you can follow along and see how people are leveraging these distributed, virtual labor pools to fulfill a large variety of tasks. That's one great place. Obviously, our particular take on the world at cloudfactory.com is another option, with thoughts, resources, and articles that help people think through how to leverage a technology platform with a global workforce to accomplish their large data projects.