We decided to find out if what Microsoft says is true: remote volumetric telepresence and collaboration can and will be done, sooner than people think and — despite obvious technical hurdles — it will be the killer app of Augmented and Virtual Reality.
Rewind. It took the personal computer roughly 15 years to hit an inflection point and become a consumer product everyone had to have. At first its killer app, email, which most people first got at work, didn’t seem so revolutionary. Hardly anyone outside the company was using it. The network effect, a phenomenon whereby a service becomes more valuable when more people use it, hadn’t kicked in. New technology always penetrates the enterprise before the home. Once people started getting Internet online services with a personal email address, it made the PC something everyone had to have at home. The telephone is another great example. The more people who got one, the more people had to have one.
A still taken from the 2016 video of the demonstration of Telepresence, or Holoportation, with the Microsoft HoloLens.
Similarly, messaging and social media are the killer apps of smartphones. Our need to connect with other people follows us, no matter where technology takes us. New technology succeeds when it makes what we are already doing better, cheaper, and faster. It naturally follows that Telepresence should likewise be one of the killer apps for both AR and VR. A video of Microsoft Research’s 2016 Holoportation experiment suggests Microsoft must have been working on this internally for some time, maybe even before the launch of the HoloLens itself.
Telepresence, meaning to be electronically present elsewhere, is not a new idea. As a result, the term describes a broad range of approaches to virtual presence. It breaks down into six main types:
1. 2D video conference systems. These have gotten incredibly sophisticated and include eye tracking to help create presence for colleagues who are still seen on a monitor. Cisco’s Spark System dominates the billion-dollar teleconferencing industry.
2. Robotic telepresence. Describes any remotely operated vehicle with a driver’s view, such as underwater Remotely Operated Vehicles (ROVs) and Unmanned Aerial Vehicles (UAVs), also called RPAs (Remotely Piloted Aircraft). NASA has long dreamed of true, real-time robotic telepresence, which was, in fact, one of the initial purposes of their VR research in the 80s. However, because signals take minutes to travel from Earth to Mars and back, NASA scientists can’t directly tele-operate a robotic explorer like the Curiosity Mars rover. Astronauts aboard a spacecraft orbiting Mars, on the other hand, may be able to.
CREECH AIR FORCE BASE, NV — AUGUST 08: United States Air Force Senior Airman William Swain operates a sensor control station for an MQ-9 Reaper during a training mission August 8, 2007 at Creech Air Force Base in Indian Springs, Nevada. The Reaper is the Air Force’s first ‘hunter-killer’ unmanned aerial vehicle (UAV) and is designed to engage time-sensitive targets on the battlefield as well as provide intelligence and surveillance. The jet-fighter sized Reapers are 36 feet long with 66-foot wingspans and can fly for as long as 14 hours fully loaded with laser-guided bombs and air-to-ground missiles. They can fly twice as fast and high as the smaller MQ-1 Predators reaching speeds of 300 mph at an altitude of up to 50,000 feet. The aircraft are flown by a pilot and a sensor operator from ground control stations. (Photo by Ethan Miller/Getty Images)
3. Remote experts. They use AR to see what you’re seeing, although they cannot see you. They can even draw on the live feed you are sharing with them, interacting with real objects in your field of view in real time. Remote experts turn low-skilled employees into higher-skilled ones.
4. VR telepresence. This allows us to share a virtual world like Oculus Rooms or AltSpace VR where we are represented by an avatar. Today most avatars are cartoon-like, but they will soon be able to use 3D volumetric captures taken on a cellphone to skin avatars that are eerily accurate. Lip sync (more precisely, real-time lip animation) and eye contact, introduced by Sansar and High Fidelity, already make you feel very, very present.
5. AR telepresence. This allows two or more remote people to have volumetric presence in the same room, which Microsoft calls Holoportation, because it uses their HoloLens. This has been convincingly demonstrated, and now companies are seeking to bring that technology to business conferencing. However, not all the technical and practical issues around this have been solved. Several companies are working on solutions that could disrupt the teleconferencing business Cisco dominates. Cisco itself recently added a VR Collaboration feature to Spark.
6. True holographic telepresence (visible to the unaided eye) is illustrated by the Jedi Council in Star Wars, pictured below. This unaided volumetric holographic presence can be done today with holographic projection, mirrors, and an invisible projection surface. This works well under very specific circumstances. In no way would the participants perceive each other, but to people outside the simulation, it is completely real. They’d see two (or more) people in remote locations in real life, interacting, on stage, without headsets, in a shared 3-D space. However, the players could not see each other; they’d be looking past the reflections at a monitor. From the audience, you’d never know.
The next best thing to being there.
Like Star Wars, the Steven Spielberg movie “Minority Report” also features augmented reality, in the scene where data floats in front of Tom Cruise without a projection surface, visible to the naked eye, and he manipulates it with his hands. This would only be possible if Cruise’s character had either contacts or some sort of neural input that could send images directly to his brain. Otherwise, projected holograms can only be visible to the naked eye if there is a transparent projection surface.
The HoloLens and other AR HMDs are equipped with inside-out cameras. In order to create a telepresence app, however, an outside-in camera that can face you and take videos of you is necessary.
I visited Steve McNelley, co-founder and CEO of DVE Telepresence, in his workshop. DVE has been working for the Department of Energy and some of the largest companies in the world to provide what he calls the “only true telepresence.” This requires three things, he explained: “absolute photorealism, perfect camera alignment for eye contact, and augmented reality images (holograms) appearing in space with no glasses required.” DVE has a podium-based system called “the 4Dp Telepresence Podium” which accomplishes all these things in a portable solution. The speaker behind the podium is captured in a remote location (such as a classroom or a personal office), projected in real time onto an invisible translucent surface, and seen in the middle of the room by an audience. The camera is positioned to maintain eye contact with the audience.
Which one is the hologram?
DVE has demonstrated and patented many different technologies to create this holographic experience, including OLED, LED, direct projection, and a variant of “Pepper’s Ghost,” an illusion enabling natural telepresence that was first demonstrated by stage artist John Henry Pepper in 1862. This method creates a “ghost” by reflecting an object onto a translucent surface, like a pane of glass, so the image seems to float in front of us. Today DVE has advanced this to create bright, solid-looking people who look like they are really in the room, as can be seen in the above image, where I appear to be in the same room with Zach McNelley, DVE’s 3D content creator, who is appearing as a hologram. The two requirements are a perfectly black background and a translucent projection surface.
Pepper’s Ghost was most famously deployed in Disneyland’s “Haunted Mansion” to create the illusions of spectral dinner parties and hitchhiking ghosts. In fact, the Star Wars: Jedi Challenges AR headset from Lenovo uses a similar method of bouncing an image off a mirror onto a transparent projection surface to create the illusion of 3D characters floating in space before us.
How to make a ghost — Pepper’s Ghost
Microsoft has been promoting another vision of telepresence and remote collaboration for the HoloLens that they call Holoportation. It was first demonstrated in this video from Microsoft Research, which allowed participants in remote locations (they were actually down the hall) to be present in each other’s physical reality. Multiple 3D cameras were placed in each room. These inputs were fed into local computers which broadcast the compressed 3D image to the user’s HoloLens. This video was posted in November 2016, which means that MS engineers must have already been working on Holoportation when the HoloLens was released in March 2016.
Microsoft Research’s Room2Room is a life-size telepresence system that uses projected augmented reality to enable co-present interaction between two remote participants without using a HoloLens. This solution recreates the experience of a face-to-face conversation by performing 3D capture of the local user with 3D cameras and then projecting the volumetric copy into the remote space at life-size scale. This creates an illusion of the remote person’s physical presence in the local space, as well as a shared understanding of verbal and non-verbal cues (e.g., gaze, pointing) as if they were there.
This is what happens when there is no mediating projection surface, which for the purposes of research the engineers have eschewed in favor of flexibility and intelligence.
In early 2017, Microsoft spent millions of dollars to create a video which portrays the future (or one potential future) of holographic telepresence, called “Penny Walks in a.k.a. Envisioning the Future with the HoloLens.”
“Penny” is an extraordinarily well-produced science fiction video dramatization of a telepresence use case, starring a retail designer (Penny) and her client in Asia. There’s more than just telepresence going on. The client also has a floating, visible, seemingly sentient digital assistant, one of Cortana’s fantasy offspring. Setting Cortana aside, and the subtle but ambitious scale of the simulated use case in the demo, this isn’t crazy, far off, or impossible. BUT remember the network effect. It needs scale to reach that magic inflection point, where rooms are scanned in real time by 3D cameras, awaiting Penny and the rest of us.
Microsoft’s research teams continue to explore Holoportation, along with several universities, notably Warsaw University of Technology in Poland, where Marek Kowalski and Jacek Naruniec have been developing a Holopresence app, LiveScan3D.
LiveScan3D does real-time 3D reconstruction by using multiple Kinect v2 depth sensors simultaneously to produce a colored point cloud, compressing the 3D video inputs. Each Kinect v2 sensor is connected to a separate computer. Each of those computers is connected to a server which allows the user to perform calibration, filtering, and synchronized frame capture, and to visualize the acquired point cloud live in a remote location. Consistent with their role as academics, Kowalski and Naruniec have shared LiveScan3D as source code on https://github.com/MarekKowalski/LiveScan3D, allowing others to build on their work.
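To make the calibration step concrete, here is a minimal sketch of how point clouds from several depth sensors can be merged once each sensor's pose is known. It is not LiveScan3D's code. The function names, the two-sensor layout, and the poses are all hypothetical, and it assumes calibration has already produced a rotation and translation for each sensor.

```python
import math

def rotation_about_y(angle_deg):
    """Return a 3x3 rotation matrix (nested lists) for a turn about the vertical y axis."""
    a = math.radians(angle_deg)
    c, s = math.cos(a), math.sin(a)
    return [[c, 0.0, s],
            [0.0, 1.0, 0.0],
            [-s, 0.0, c]]

def to_world(point, rotation, translation):
    """Map one (x, y, z) point from a sensor's frame into the shared frame: R * p + t."""
    return tuple(
        sum(rotation[i][j] * point[j] for j in range(3)) + translation[i]
        for i in range(3)
    )

def merge_clouds(sensor_clouds):
    """Merge per-sensor colored point clouds into one list in the shared frame.

    sensor_clouds is a list of (points, colors, rotation, translation) tuples,
    one per sensor. The rotation and translation are assumed to come from a
    prior calibration step.
    """
    merged = []
    for points, colors, rotation, translation in sensor_clouds:
        for point, color in zip(points, colors):
            merged.append((to_world(point, rotation, translation), color))
    return merged

# Two hypothetical sensors facing each other, 2 m apart along the z axis.
# Each one sees the same physical spot 1 m in front of it.
front = ([(0.0, 0.0, 1.0)], [(255, 0, 0)], rotation_about_y(0), (0.0, 0.0, 0.0))
back = ([(0.0, 0.0, 1.0)], [(0, 0, 255)], rotation_about_y(180), (0.0, 0.0, 2.0))
cloud = merge_clouds([front, back])
```

After merging, both sensors' readings land on the same spot in the shared frame, which is what lets a viewer walk around a subject captured from several sides.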
Private companies are also making impressive progress with 3D volumetric conferencing using both VR and AR, notably, Valorem, whose system enables multiple participants in Europe, India, and North America to be volumetrically present in the users’ physical office in real time. Mimesys and Meetingroom.io use VR to create volumetric presence in a shared virtual world.
René Schulte, who is leading the HoloBeam development effort for Valorem, is based in Dresden, Germany. He described how the company’s unique 3D real-time conferencing system works, and how it is transforming collaboration among his cross-continental teams in Germany, Seattle, and India.
“This was captured real-time in HD using a depth camera to collect 3D volumetric video point cloud data consisting of color and depth information. The point cloud data is then streamed or ‘beamed’ across the internet over a customized WebRTC stream. The holographic stream is decoded by an app and rendered in real-time 3D, providing a shockingly good volumetric representation of the senders’ likeness on VR and Mixed Reality devices like the HoloLens, but also other devices are enabled by our cross-platform development approach. It runs over a normal internet connection and requires 3–5 Mbits/sec bitrate that even works below 1 Mbit thanks to our adaptive, depth encoding and streaming [CF note: Adaptive streaming is what Netflix does to adapt to your connection speed]. It’s real-time without delay and even works if the parties are behind firewalls for example in corporate network settings. There’s no special connection or setup needed. The connection is established via a routing mechanism to connect peer-to-peer for the best transfer rates.”
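The capture step Schulte describes, turning color and depth information into a point cloud, can be sketched with the standard pinhole camera back-projection. This is an illustration only, not HoloBeam's implementation. The function name, the tiny 2x2 frame, and the intrinsic parameters (fx, fy, cx, cy) are made up for the example, and it assumes depth arrives in millimeters with 0 meaning "no reading."

```python
def depth_to_point_cloud(depth_mm, color, fx, fy, cx, cy):
    """Back-project a depth image into a colored point cloud.

    depth_mm: rows of depth values in millimeters (0 means no reading).
    color:    rows of (r, g, b) tuples, same shape as depth_mm.
    fx, fy:   focal lengths in pixels; cx, cy: principal point in pixels.
    Returns a list of ((x, y, z), (r, g, b)) with coordinates in meters.
    """
    cloud = []
    for v, row in enumerate(depth_mm):
        for u, d in enumerate(row):
            if d <= 0:
                continue  # the sensor saw nothing here, so emit no point
            z = d / 1000.0
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            cloud.append(((x, y, z), color[v][u]))
    return cloud

# A tiny hypothetical 2x2 frame with one missing depth reading.
depth = [[1000, 0],
         [2000, 1000]]
color = [[(255, 0, 0), (0, 255, 0)],
         [(0, 0, 255), (255, 255, 255)]]
cloud = depth_to_point_cloud(depth, color, fx=1.0, fy=1.0, cx=0.5, cy=0.5)
```

Pixels with no depth reading simply produce no point, which is one source of the gaps a viewer sees in a streamed point cloud.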
HoloBeam brings volumetric conferencing to life using standard internet with normal bandwidth requirements (1–5 Mbits/sec). No delay. Real-time. Full HD volumetric video.
The HoloBeam system does not provide the kind of resolution we saw in the MS Holoportation videos, but we’re now told those 2016 videos were only local proof-of-concepts, not something to set up in real offices. In contrast, Valorem’s system today (12/17) produces 3D volumetric video with a simple setup, using off-the-shelf hardware.
The system can have varying amounts of “dust artifacts” (drop out), depending on how much the adaptive streaming has to ratchet down the bandwidth. As a result, remote participants look like victims of a Star Trek transporter accident: only 80% there. However, everyone I talked to, and everything I experienced myself while researching this story, confirms that 80% is enough to create deep, compelling presence.
The holographic point cloud will gain resolution as the HoloPortation products evolve, not only through higher depth camera resolutions and more bandwidth but also through algorithms that fill in missing pixels in the decompressed video to reduce the broadcast dropout, or dust.
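As a rough illustration of what "filling in missing pixels" can mean, the sketch below replaces each empty depth pixel with the average of its valid neighbors. It is a deliberately simple stand-in, not the algorithm any of these companies use. The function name and the 3x3 sample frame are hypothetical, and 0 is assumed to mark a missing reading.

```python
def fill_depth_holes(depth, max_passes=1):
    """Fill zero-valued depth pixels with the mean of their valid 8-neighbors.

    depth is a list of rows; 0 marks a missing reading. Each pass reads from
    the previous pass's result, so larger holes shrink from the edges inward
    when max_passes is raised. The input is not modified.
    """
    rows, cols = len(depth), len(depth[0])
    current = [row[:] for row in depth]
    for _ in range(max_passes):
        filled = [row[:] for row in current]
        changed = False
        for r in range(rows):
            for c in range(cols):
                if current[r][c] != 0:
                    continue
                neighbors = [
                    current[rr][cc]
                    for rr in range(max(0, r - 1), min(rows, r + 2))
                    for cc in range(max(0, c - 1), min(cols, c + 2))
                    if (rr, cc) != (r, c) and current[rr][cc] != 0
                ]
                if neighbors:
                    filled[r][c] = sum(neighbors) / len(neighbors)
                    changed = True
        current = filled
        if not changed:
            break
    return current

# A hypothetical frame with a single dropped pixel in the middle.
frame = [[1000, 1000, 1000],
         [1000, 0, 1000],
         [1000, 1000, 1000]]
repaired = fill_depth_holes(frame)
```

Production systems would likely use something more careful, such as edge-aware or temporal filtering, so that a filled pixel does not smear a face into the wall behind it.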
Schulte sees bright things on the horizon. In the office of the future, multiple depth-sensing cameras could literally merge it with remote locations around the world, or we could even just use our mobile phones, which are starting to integrate depth-mapping sensors and dual lenses in consumer products. The next guy knocking on your door could literally be in China. Valorem expects to start broader trials with clients in early 2018.
Mimesys, of Paris, and Meetingroom.io, of Dublin, are startups taking a different approach, using VR as the basis for shared collaborative meeting spaces, which can include users on multiple devices like PCs and smartphones. At their core, these systems bring volumetric captures of remote participants into a virtual room much like we see in social VR like AltSpace and Oculus Rooms today. Mimesys allows users to log into their virtual meetings using any device, including HoloLens, tablets and smartphones.
Mimesys Connect is centered on a shared virtual conference room that allows participants to import and share 3D objects, watch videos, and do just about everything you can do in a real business meeting. Unlike the social space for consumers, however, the participants are not avatars, but volumetrically present. This distinction is incredibly important.
Accessing a meeting via Mimesys Connect using ARKit on an iPad running iOS 11.
Here’s a video that accurately reflects the experience I had using Mimesys Connect on the Vive. The founder and CEO Remi Rousseau and I saw one another’s real avatars (wearing our Vive HMDs) and were able to pass and manipulate 3D and 2D objects. The feeling of being with him, of presence, was extraordinary.
Rousseau believes the VR-centric approach is the most flexible and the easiest to use. “HoloLens Teleportation doesn’t allow users to share and collaborate the way they do with Mimesys Connect. We can’t collaborate on a shared whiteboard, for example.” I asked Rousseau about barriers to entry and how his small start-up, in use with perhaps a dozen pilot clients, could defend this kind of VR approach from low or no cost competitors like AltSpace and Oculus Rooms.
“There is a potential risk, especially regarding Facebook Spaces,” he said, “which is also why we focus today on B2B rather than on B2C. That being said, the communication space is huge. Platforms like WhatsApp, Messenger, Hangouts, FaceTime, co-exist today and that would probably be the same for VR and AR communication. There will be different experiences with different audiences.”
“We’re still at the beginning but the portability is the game changer here,” added Jonny Cosgrove, founder and CEO of Meetingroom.io. “C-suite and sales directors can meet and manage salesforces, companies can engage with more customers.”
OJ Winge, currently SVP of Cisco’s Video Technologies Group, has been working with telepresence in one form or another for most of his career. “Cisco’s Spark system already provides a new richness of experience,” he said. “Right now quality isn’t good enough for volumetric telepresence, which we see as something different from Spark. Complementary, different, but not a replacement. For a normal meeting, the technology needs to be transparent and natural.” He is confident of Spark’s position and plans to grow the business.
Telepresence will happen very slowly, and then all at once, dramatically disrupting not only the conferencing business but business management and collaboration itself, to say nothing of the multi-billion dollar business travel category.
All the key premises of my upcoming book about AR and VR are present in this story. We consistently overestimate the present and underestimate the future. Products succeed because they make what we’re already doing better. The killer app is other people.