“The idea of static, normal video is the past. Dynamic, volumetric video, that’s the future.” - Ted Schilowitz, Futurist, Paramount Pictures.
Konstantinos Rematas and a team of researchers at the University of Washington use AI to turn 2D videos into volumetric videos that can be seen from any angle.
This video of a soccer match blew my mind when I saw it on Twitter the other day. I have been using volumetric sports viewing on tabletops as a use case in my presentations for over a year. I never dreamed that months later I’d be watching a demo of it on YouTube. The demo comes out of the University of Washington Reality Lab, where post-doc computer vision researcher Konstantinos Rematas and his team used AI to turn 2D video into what appears to be 3D volumetric video. This is a markedly different approach from the one being taken by Intel, HypeVR, Depthkit, and others seeking to solve the problem of capturing full 3D volumetric scenes for VR and AR platforms.
Here’s what their published paper says, in part: “We evaluate our approach on a collection of soccer videos downloaded from YouTube with 4K resolution. The initial sequences were trimmed to 10 video clips shot from the main game camera. Each sequence is 150-300 frames and contains various game highlights (e.g., passing, shooting, etc.) for different teams and with varying numbers of players per clip... It is easier for the network to learn individual player’s configuration rather than whole-scene arrangements... Finally, to watch a full, live game in a HoloLens, we need both a real-time reconstruction method and a method for efficient data compression and streaming.” This is the least technical part.
“What's particularly unique about the team at University of Washington's approach is that it doesn't require special equipment. This technique is able to extract players from existing 2D sports broadcasts. However, to get great volumetric captures that can be seen from any angle, nothing beats having equipment designed intentionally for volumetric capturing the action on location,” said James George, co-founder of Scatter, maker of the volumetric capture solution Depthkit.
An illustration of the process UW's AI uses to turn 2D video from a single game camera view into 3D volumetric video. To stream in real time, more bandwidth and an enormous amount of storage are needed.
“There are so many ways to skin the volumetric cat,” says Ted Schilowitz, Paramount Pictures’ resident futurist. “Intel is interpolating data from multiple 2D cameras to create volumetric 3D basketball highlights. This is being done today and soon it will capture the entire game.” However, there are a few gnarly issues for anyone undertaking a live sporting event or a volumetric movie: storage and real-time compression.
A quality volumetric capture of actors or players is done on a soundstage with a green screen for a background. The subject is situated alone inside a box of lights and cameras. Tracking shots are not possible with this rig, nor are close-ups or shots of multiple actors together. Depthkit has a simpler, more flexible and inexpensive approach that uses a 3D Kinect camera and a regular SLR camera to take volumetric video adequate for VR, without the need for a capture rig with dozens of lights and cameras. The low cost and portability make Depthkit the solution of choice for indie makers. Schilowitz and Intel, on the other hand, want to make cinematic 360 videos we can walk around in. “Everything we see in the spatial world, in AR or VR, can’t be done with CGI,” Schilowitz told me. “We will want live action video as well, just as we have in traditional video capture.”
Intel has created a mammoth 25,000-square-foot soundstage near LAX, using 3D cameras on an industrial scale to make volumetric movies. These would be much more than the 3D movies we see wearing plastic polarizing glasses today. They would be the movies of tomorrow. “In a few years we’re going to be able to watch big action movies this way,” Schilowitz says.
Schilowitz is also co-founder of HypeVR, which has its own system for volumetric capture using 14 Red Epic cameras. Schilowitz was, not coincidentally, the first employee of Red. HypeVR describes itself on its website as “a computer vision technology company focused on developing ultra-high resolution live action VR capture and playback with six degrees of freedom. HypeVR’s next-generation technology stack includes proprietary volumetric VR capture, reconstruction, compression codec, custom graphics engine for playback and volumetric streaming. HypeVR’s current offering includes a full capture, rendering, compression and streaming suite, along with creative production and post-production services.” Volumetric video differs from 360 video in that you are untethered from the camera and able to navigate through the space, or light field, that has been captured. Using a tablet or an AR or VR HMD, the viewer can move around in the captured scene.
When I spoke with Schilowitz on the phone a few days ago, he emphasized that HypeVR is not a camera company. “We’re a math company. We’re engineering and patent focused.” HypeVR is funded by Paul Allen’s Vulcan Ventures, Shari Redstone’s Advantix early stage fund, and several celebrity investors who shall remain nameless.
Not coincidentally, HypeVR is working with Intel at the Intel soundstages in Manhattan Beach, CA. “We’re down there experimenting and learning,” Schilowitz said. “We’re talking about a lot of computing. A lot of data that has to get moved around.” In the existing sports broadcast world, Intel places 40 5K-resolution cameras in a stadium for each game. The system produces data at a rate of 3 terabytes per minute, which adds up to petabytes of data to store. “Intel is in the business of providing computing power to all types of enterprises, so the concept of advancing video capture and playback in a way that can utilize more and more advanced computing systems is of course highly interesting and appealing to Intel, and to all of us looking to advance the idea of what video can be.”
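To put those numbers in perspective, here is a rough back-of-the-envelope sketch of what 3 terabytes per minute means for a single game. Only the 3 TB per minute and 40-camera figures come from the reporting above; the 90-minute game length, the decimal units, and the even split across cameras are my own assumptions for illustration, not anything Intel has published.

```python
# Back-of-the-envelope storage math for a multi-camera volumetric capture.
TB_PER_MINUTE = 3.0   # data rate quoted in the article
CAMERAS = 40          # camera count quoted in the article
GAME_MINUTES = 90     # assumption: one regulation soccer match, no stoppage time
TB_PER_PB = 1000      # assumption: decimal units (1 PB = 1000 TB)
GB_PER_TB = 1000      # assumption: decimal units (1 TB = 1000 GB)


def storage_tb(minutes, tb_per_minute=TB_PER_MINUTE):
    """Total raw data, in terabytes, captured over the given number of minutes."""
    return minutes * tb_per_minute


# Raw data for one game at the quoted rate.
per_game_tb = storage_tb(GAME_MINUTES)

# Average rate per camera, assuming the load is split evenly (an assumption).
per_camera_gb_per_s = TB_PER_MINUTE * GB_PER_TB / CAMERAS / 60

# How many games it takes to fill one petabyte.
games_per_pb = TB_PER_PB / per_game_tb

print(f"One {GAME_MINUTES}-minute game: {per_game_tb:.0f} TB")
print(f"Per camera: {per_camera_gb_per_s:.2f} GB/s")
print(f"Games per petabyte: {games_per_pb:.1f}")
```

Under those assumptions, a single match comes to 270 terabytes of raw data, so fewer than four games would fill a petabyte. That is the scale of the compression problem any live volumetric broadcast has to solve.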
Tonaci Tran, CEO and co-founder of HypeVR with co-founder Ted Schilowitz, Paramount Pictures' resident futurist.
Joe Lemire, who writes about technology and sports for SportTechie, told me in an email when I asked him to comment on this story: "This tabletop soccer demo offers great potential for bringing a new, accessible experience to the home viewer without the burden of an elaborate setup. That the research team includes professors with Google and Facebook affiliations is noteworthy as those tech giants continue to rethink the way we consume all media, including sports."
For his part, Konstantinos Rematas at UW told me he and his team are looking forward to tackling basketball next while they continue to investigate if AI can replace those acres of data with a volumetric broadcast of manageable size in real time. “It’s not going to be an either-or proposition,” Schilowitz told me. “These are all legitimate tools that will be deployed for the jobs they do best.”