Magic Leap, a secretive, well-funded company making Augmented Reality (AR) glasses in suburban Ft. Lauderdale, has popularized its vision of hands-free, contextual, mobile computing even though it has yet to launch a product. This is a totally different kind of AR than what we’re familiar with on iOS devices, Heads-Up Display (HUD) systems, or microdisplays like the soon-to-be-released Kopin Solos, which I wrote about last week. Spatial computing is different because it uses precise location-awareness to anchor data to places or objects in the real world, creating a ubiquitous digital data overlay on the physical world. The technology is maturing quickly and, thanks to tech giants and startups racing to build it, the world might soon be painted with data.
In this nightmarish view of mobile contextual computing, objects, places, and people are all painted with data and all shouting for attention.
A world painted with data could easily be an unruly, noisy mess like Keiichi Matsuda’s nightmarish Hyper-Reality video, where every billboard, storefront, and product is shouting for our attention. Nevertheless, looking beyond the dystopian implications, this is a pretty exciting vision, because it represents the next logical step for computing after the mouse and multitouch screens.
Let’s say you set your UVB filter to “sexy”…
This trillion-dollar opportunity depends on three things working seamlessly together: computer vision, AI, and the AR cloud. Not surprisingly, tech giants Amazon, Google, and Facebook, as well as startups large (Magic Leap) and small (6D.ai, Placenote), are working to address some or all of these problems.
Facebook knows you.
The big dog of computer vision is Amazon Web Services (AWS), which offers Amazon Rekognition (sic): “Image Detection and Recognition Powered by Deep Learning.”
"Maybe you see a pet, a dog, or a Golden Retriever," explained Rekognition evangelist Jeff Barr in a blog post in November of 2016. "The association between the image and these labels is not hard-wired into your brain. Instead, you learned the labels after seeing hundreds or thousands of examples. Operating on a number of different levels, you learned to distinguish an animal from a plant, a dog from a cat, and a Golden Retriever from other dog breeds." To replicate this kind of learning, Rekognition combines deep learning with computer vision and is offered as a service (SaaS) that is accurate enough (greater than 95%) for most applications. Rekognition comprehends scenes, objects, and faces. Given an image with one or more faces, it will return bounding boxes for each face.
In November 2017, AWS introduced Rekognition Video, which extracts even more data, correctly identifying the activity, direction, and intention of people and objects, which makes the system capable of tracking subjects even when they are obscured. Robert Scoble, co-author of The Fourth Transformation with Shel Israel, told me he thinks Amazon is way, way ahead of everyone else.
There are several kinds of computer vision, each with a varying degree of intelligence. Vuforia, for example, can recognize markers, like a barcode reader for the physical world, to place information and 3D objects, while ARKit uses surface and depth detection, but it is not married to deep learning like Rekognition.
Celebrities, fruits, cars, and pets, along with Blippar's specialty, brand recognition, are just some of the things that can be identified using its proprietary knowledge graph.
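To make the idea of face bounding boxes concrete, here is a minimal sketch in Python. It does not call Rekognition. The sample response is hand-written and only assumed to resemble the general shape of a face-detection result, with each box expressed as a fraction of the image size. The function name, the sample data, and the 90 percent confidence cutoff are all illustrative assumptions.

```python
from typing import Dict, List, Tuple

# Hypothetical, hand-written sample: NOT real service output. It only
# mimics a response in which each face box is given as ratios of the
# overall image width and height.
SAMPLE_RESPONSE: Dict = {
    "FaceDetails": [
        {"BoundingBox": {"Left": 0.25, "Top": 0.10, "Width": 0.20, "Height": 0.30},
         "Confidence": 99.1},
        {"BoundingBox": {"Left": 0.60, "Top": 0.40, "Width": 0.10, "Height": 0.15},
         "Confidence": 82.0},
    ]
}


def face_boxes_in_pixels(
    response: Dict,
    image_width: int,
    image_height: int,
    min_confidence: float = 90.0,  # arbitrary cutoff chosen for illustration
) -> List[Tuple[int, int, int, int]]:
    """Convert ratio-based face boxes to (left, top, width, height) in pixels."""
    boxes = []
    for face in response.get("FaceDetails", []):
        # Skip detections the service itself is not very sure about.
        if face.get("Confidence", 0.0) < min_confidence:
            continue
        box = face["BoundingBox"]
        boxes.append((
            round(box["Left"] * image_width),
            round(box["Top"] * image_height),
            round(box["Width"] * image_width),
            round(box["Height"] * image_height),
        ))
    return boxes
```

An AR app would use pixel boxes like these to decide where on the camera image to draw a label or overlay.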
“The holy grail of computer vision is to understand images and videos beyond labeling of objects,” Blippar co-founder Omar Tayeb told me in a phone call last week. Blippar is a computer vision and augmented reality startup which has raised $120 million since it was founded in 2011 by Tayeb (CTO), Ambarish Mitra (CEO), Steve Spencer (CCO) and Jessica Butcher (Executive Director). “Blippar is where AR and AI meet,” says Tayeb. “It’s not enough to identify a woman and a stroller. A universal visual browser will also understand the relationship. The stroller could have a baby inside it and the woman is possibly the mother. It’s not just visual, it’s contextual. That’s what makes Blippar unique.”
Computer vision is at the heart of Blippar, as its free app demonstrates (download from the App Store or Google Play). Point your camera around the room and watch it identify every object, and every synonym of every object. “Blippar’s proprietary knowledge graph makes our technology capable of much more than labeling,” says Tayeb. “There are 4 billion facts about the world compiled from over sixty trusted sources, like Wikipedia, in Blippar’s database.” It’s true. If it’s in Wikipedia, the app will recognize it. Test it by pointing at a popular magazine like “Forbes” or “People.”
Blippar has also focused its business on brand recognition. Any object, poster or ad that has the Blippar “B” logo has a hidden, digital layer of augmented reality. Companies can integrate this technology into their own apps via Blippar’s AR SDK. The app can also identify company logos and use them as triggers.
Anything that has this little “B” is painted with a layer of digital data that can be unlocked by the Blippar app.
In November 2017, Blippar released AR City, which leverages computer vision to provide more accurate location data than GPS (more than double the accuracy) and can cover entire cities (city-scale). It is designed to power the new generation of location-based AR, which requires precise pose estimation to overlay complex virtual content onto physical shops, tourist attractions and any other point of interest. AR City can help navigate a complex intersection or find a hidden restaurant by overlaying virtual roads and directions onto the physical world.
"The technology helps to align and superimpose the physical and virtual worlds more accurately than GPS and will not only empower the industry to create more sophisticated AR experiences but it has potential to significantly impact areas like tourism, city mapping, real-world 3D gaming, and more," explained Tayeb. "What we have achieved with Computer Vision in our Urban Visual Positioning System, that we are able to do this at city-scale and with high accuracy, is an important breakthrough in the industry."
Toronto-based startup Vertical.ai, a 2015 Y Combinator graduate that has raised $2 million to use computer vision and simultaneous localization and mapping (SLAM) to build an AR Cloud, just announced the launch of Placenote, an app that allows mobile developers to build AR apps that permanently place virtual objects in the real world for others to discover. If Blippar works at the city scale, Placenote works best at the room scale. Imagine pointing your camera at an office printer and finding instructions on how to fix a paper jam directly overlaid on the printer. "We're building AR for the real world," said Neil Mathew, founder of Vertical.ai, in an interview last week.
The information you deposit at your location, accurate to within 10 inches, will be accessible to anyone who uses the Placenote app in the same location.
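The idea of leaving information at a spot for the next visitor can be sketched as a tiny data structure. This is not the Placenote SDK. Every name below is hypothetical, and two things are assumed purely for illustration: that everyone in a room shares one coordinate frame measured in meters, and that the 10-inch accuracy figure can double as a lookup radius.

```python
import math
from dataclasses import dataclass, field
from typing import List

# 10 inches expressed in meters, used here as an assumed lookup radius.
TEN_INCHES_IN_METERS = 0.254


@dataclass(frozen=True)
class AnchoredNote:
    """A piece of content pinned to a point in a shared room map (hypothetical)."""
    text: str
    x: float
    y: float
    z: float


@dataclass
class SharedMap:
    """A persistent, shared set of notes for one mapped space (hypothetical)."""
    notes: List[AnchoredNote] = field(default_factory=list)

    def leave_note(self, text: str, x: float, y: float, z: float) -> AnchoredNote:
        # Store the note at a position in the map's own coordinate frame.
        note = AnchoredNote(text, x, y, z)
        self.notes.append(note)
        return note

    def notes_near(self, x: float, y: float, z: float,
                   radius: float = TEN_INCHES_IN_METERS) -> List[AnchoredNote]:
        # Return every note whose straight-line distance is within the radius.
        here = (x, y, z)
        return [n for n in self.notes
                if math.dist((n.x, n.y, n.z), here) <= radius]


office = SharedMap()
office.leave_note("Paper jam: open tray 2", 1.0, 0.9, 2.0)
office.leave_note("Wi-Fi password is on the whiteboard", 4.0, 1.5, 0.5)
near_printer = office.notes_near(1.1, 0.9, 2.0)
```

In this sketch a second visitor standing about 10 centimeters from the printer gets the paper-jam note and nothing else. The hard part in a real system is the step this example skips: using the camera to work out where the phone is inside the shared map.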
“Mapping retail stores, industrial locations, even my Airbnb are hugely enhanced by Placenote,” said Mathew. Finding your way around a new place or a new city is a simple example, but it illustrates the transformative potential of AR for retail, tourism, and education. It suddenly makes ARKit the big deal we were all waiting for when Apple launched it in September of 2017.
Here are some examples of the kinds of location-based AR experiences, built by Vertical.ai, that make the promise of AR self-evident.
Placenote is available as an SDK (software development kit) that lets mobile AR developers build AR experiences. "Basically, we are giving mobile cameras the ability to scan any physical space and turn it into a persistent, shared canvas for AR content," said Mathew. "That means augmented reality on iPhones and Android phones can finally be as good as the HoloLens and enable apps like indoor navigation, AR product manuals, multiplayer games, and industrial data visualization."
“One of the big things holding back engaging AR is for content to feel like it’s actually physically part of the world,” 6D.ai CEO Matt Miesnieks told TechCrunch in an interview. “To really make that effect possible, you need to have a 3D model of at least your room, if not the whole world.” With such a model, AR content can be not only persistent but also shared in a multi-user view and occluded by objects in the real 3D world.
Miesnieks, a veteran of Samsung's AR team, is well known in the world of mobile AR computing. He's partnered with Ori Inbar (co-founder of AWE), Tom Emrich, and Mark Billinghurst (HIT Lab) in Super Ventures, a seed fund for AR startups. Indeed, he contributed a chapter on the AR Cloud to my recent book. Miesnieks was looking into Computer Vision when he met Victor Prisacariu, whose work at Oxford University focused on the AR Cloud. The two teamed up to create 6D.ai.
Miesnieks describes 6D.ai as "Waze for AR". The app would take all those rooms, all those locations, all those cities, and stitch together buildings, objects, landmarks and other camera data to create a world map, both inside and outside the home, school, office, or wherever the app is used. In addition to providing developers with the 3D structure of the world without needing depth cameras, 6D.ai will semantically identify 3D objects to enable intelligent AR applications. Miesnieks told me in an email that 6D.ai will have significant company news within the next four weeks. The company first garnered attention when Tim Cook visited Oxford in 2017 and expressly asked to see Prisacariu's project.
Finally, just as we were wrapping up our research, we got these mind-blowing Twitter videos from Andrew Hart, founder of Dent Reality.
Dent Reality of London is announcing a family of developer tools to make it simple to create AR location experiences, beginning with "Point Of Interest" experiences. The toolkit uses Computer Vision to identify the surrounding landscape, and then precisely overlays digital content.
"Replicating what our human eye and brain does naturally is very complex and this list of startups and organizations just begins to scratch the surface on the technological solutions needed to recognize text, people, buildings, and so on," said Tom Emrich, of Super Ventures, when I consulted him on this story. With more than twenty companies working on these challenges, we're likely to see many AR clouds before a standard Universal Visual Browser (UVB) is adopted. Who is seeding the AR cloud? Everyone.
For this to be more than an iWatch on your face, it needs to be powered by a Universal Visual Browser.
Finally, it's important to note that all these companies have something in common: their success depends on the success of developers who incorporate their APIs into their own technology stacks. For example, Blippar would be amazing for language learning. Tayeb agreed. An entrepreneur or company in that space would still have to build, market and scale such a language learning product using Blippar SaaS.
Just after I started promoting this story on social media, I got the following message from a colleague: "I hear rumors that Google will announce something at their developers' conference that will wipe out all other persistent solutions. It's in testing with some partners already." Looks like I'll be writing about this again soon.