Developing an Augmented Reality Browser: WebXR Viewer for iOS

Tony Morales – January 31, 2020
I develop AR apps for clients big and small. Want to build something awesome? Get in touch.

I recently wrapped up an extremely enjoyable gig working with Mozilla’s Emerging Technologies (R&D) team. I developed an open-source, augmented reality iOS browser that plays nicely with the new AR/VR-focused, web-based standard: WebXR.

That’s...quite a mouthful. Let me take a step back and try to explain the basics of an AR browser and web-based AR. Then I’ll dive into how working on this project affected which elements of the AR future I’m most excited about – plus how it convinced me that awful, soul-crushingly depressing, unethical AR products will abound in the near future (spoiler: they’re already here).

Two Paths to Build AR Experiences (At Least)

If you want to build an AR product today, there are a couple of ways to go about it: native or web-based.

Right now, most augmented reality experiences use a specialized native app that’s built specifically for the AR device at hand. If you have a HoloLens, you strap it on and boot up an app coded specifically for HoloLens to do business-y stuff. If you have an iPhone, you launch an AR game or AR interior decoration app built on ARKit or RealityKit specifically for iOS. The downside here is the laborious process of discovering these single-use native AR apps, downloading them, and having them take up precious room on your home screen.

Web-based augmented reality experiences have existed in some form forever (i.e. over 10 years), but they’ve been gaining steam recently. Apple’s first foray was AR Quick Look in 2018, and Google followed up a year later by tossing some AR objects into search results. But those two implementations of web-based AR are narrowly focused on displaying 3D objects – there’s no room for people to creatively play with the technology, dream up some crazy new AR webpage, and then start building it.

AR can be much more than walking around floating 3D models in your living room. AR encompasses recognizing surfaces to map your world, mixing virtual items into your world, remixing your world, showing heads-up displays that measure or annotate your world, playing games in your world versus on a screen, and countless other things. AR Quick Look and AR results in Google search are mere basic stepping stones on our way to a much fancier, feature-packed, web-based AR future.

While web-based AR may not have access to all the latest native AR capabilities, the benefit is clear: Each web-based augmented reality experience is accessible via a URL instead of via a downloaded app. Tapping on a link to launch a web-based AR experience is easier than downloading a native AR app for a solitary, thirty-second session.

The catch here is that to access something on the web you need a browser. Traditional, older browsers don’t know how to handle links that launch users into a VR or AR experience. With rare exceptions, browsers haven’t had to know how to deal with 3D models, changing perspectives based on device location, and other elements crucial to XR. Since few browsers and websites know how to tap into a device's camera to enable an augmented reality experience, a new standard was needed. This new standard would help people create, and browsers understand, web-based AR. Hence the creation of a new standard called...

WebXR

I had the good fortune of being tossed into the mix to work on the Swift iOS side of Mozilla’s WebXR mobile browser project. If you’re a developer, you should give the project a go by checking out the latest code in this Firefox branch. If you’re not a developer, you can download an older version of the app here. Older versions of the app were a standalone AR browser using SceneKit as the engine, but now that I’ve rebuilt the app on Apple’s Metal framework and spliced it into the Firefox codebase, it runs at a smooth 60 fps. Some of the initial functionality includes image recognition, face tracking, world sensing, and hit-testing. Take a look at these AR experiences running via a WebXR website as opposed to a native app!

Perhaps at some point in the future WebXR will be baked into Safari, the default browser on iOS. However, there’s no need to wait for that since WebXR Viewer is the first iOS browser to support WebXR and you can use it right now. These are very early days for WebXR and its ecosystem. So far I’ve heard from a DARPA-funded lab, an entrepreneur in France, and an agency owner in Africa who are all using WebXR Viewer for vastly different purposes. I’m very excited to see what happens as WebXR adds support for more features, more browsers enable WebXR, and more people build weird & useful things with WebXR.
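
Since browser support is still uneven, a web page usually needs to check whether AR is available before offering an "Enter AR" button. Here is a minimal sketch of that check. It assumes the browser exposes the standard `navigator.xr` entry point from the WebXR Device API. The `XRSystemLike` interface and the `supportsImmersiveAR` helper are hypothetical names made up for illustration, not part of WebXR Viewer.

```typescript
// Minimal shape of the WebXR entry point, declared locally so the sketch
// compiles without DOM or WebXR type definitions (an assumption for this example).
interface XRSystemLike {
  isSessionSupported(
    mode: "inline" | "immersive-vr" | "immersive-ar"
  ): Promise<boolean>;
}

// Hypothetical helper: resolves to true only when the browser exposes WebXR
// and reports that it can start an immersive AR session.
async function supportsImmersiveAR(
  xr: XRSystemLike | undefined
): Promise<boolean> {
  if (!xr) {
    return false; // Traditional browser: no WebXR entry point at all.
  }
  try {
    return await xr.isSessionSupported("immersive-ar");
  } catch {
    return false; // Treat a rejected query as "not supported".
  }
}
```

On a real page you would pass in `navigator.xr` and show the AR entry point only when the promise resolves to `true`, falling back to a plain 3D view otherwise.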

AR I'm Pumped to Work On

By way of working on WebXR Viewer, I spent a lot of time getting AR building blocks working in a browser. While getting web-based AR up to speed with circa ARKit 1.5/ARKit 2 capabilities, I couldn’t help but think about which features WebXR should integrate from ARKit 3 – and from future releases. This inevitably led to the limitless “what if”s I hope to see coming down the pike in AR SDKs, libraries, and products.

I’ve worked on AR browsers, utilities, games, gimmicks, art projects, consumer products, and technical demos so I have a rough baseline (at least) on many facets of AR development & design. To me right now the most interesting, fertile grounds for AR experimentation lie within three areas:

Finger-recognition based controls in AR
Combining AI (vision, speech, or otherwise) and AR
Multiplayer AR

I’ll post any updates or demos on those fronts to Twitter in the coming months.

AR I Hate

Different AR experiences will use different sensors and data gathered from your device to augment your reality. Some experiences require little information about the world around you for the experience to work. Others may surreptitiously hoover up ridiculous amounts of information to be able to store a recreated 3D model of your home in some random database somewhere. The different tiers of access built into WebXR Viewer, where each level progressively shares more and more information from your device with the website, are:

Device Motion: This sends only enough information to allow your device to orient itself in 3D space. This is the safest, most basic level of WebXR that’s ideal for viewing and walking around a 3D model floating in front of you.
Lite Mode: This is a privacy-focused mode unique to WebXR Viewer. Lite Mode allows the user to limit the data sent to the website and choose a single plane (e.g. the floor) to use for hit-testing. Lite Mode is ideal for placing 3D models of furniture at specific places in your home without the website being able to cobble together a floorplan of your home.
World Sensing: This mode scans your surroundings to recreate the planes around you (e.g. floors, tables, walls, ceilings), tracks faces, and looks for any images tagged for image recognition. World Sensing mode is ideal for Snapchat-style face lens effects, more detailed home decor apps, or tying AR experiences to specific products by using a label (or movie poster or photo) as the base image that launches a video/animation/other effect.
Video Camera Access: This mode does everything World Sensing does, but it also exposes the camera feed for processing by the website. That “processing” could mean running the camera’s image through a machine learning algorithm, uploading the image somewhere to try and recognize objects, or many other things. Video Camera Access is ideal for running the camera’s viewpoint through SimpleCV or OpenCV (frameworks & libraries that simplify building computer vision apps).
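
To make Lite Mode's "single plane for hit-testing" idea concrete, here is a minimal sketch of what a hit test against one plane boils down to: intersecting a ray (say, from the camera through the tapped point) with the plane the user picked. This is plain ray-plane geometry under my own assumptions, not WebXR Viewer's actual code. In the app the underlying AR framework does this work. `Vec3` and `hitTestPlane` are hypothetical names.

```typescript
type Vec3 = [number, number, number];

const dot = (a: Vec3, b: Vec3): number =>
  a[0] * b[0] + a[1] * b[1] + a[2] * b[2];

// Intersects a ray with a single plane, described by any point on the plane
// and the plane's normal. Returns the hit position, or null when the ray is
// parallel to the plane or the plane lies behind the ray's origin.
function hitTestPlane(
  origin: Vec3,
  direction: Vec3,
  planePoint: Vec3,
  planeNormal: Vec3
): Vec3 | null {
  const denom = dot(direction, planeNormal);
  if (Math.abs(denom) < 1e-6) {
    return null; // Ray runs parallel to the plane.
  }
  const toPlane: Vec3 = [
    planePoint[0] - origin[0],
    planePoint[1] - origin[1],
    planePoint[2] - origin[2],
  ];
  const t = dot(toPlane, planeNormal) / denom;
  if (t < 0) {
    return null; // Intersection is behind the ray's origin.
  }
  return [
    origin[0] + t * direction[0],
    origin[1] + t * direction[1],
    origin[2] + t * direction[2],
  ];
}
```

The privacy angle is that the website only ever learns where rays land on that one plane, which is enough to place a chair on the floor but not enough to reconstruct the rest of the room.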

WebXR Viewer automatically takes into account which permission level a site is requesting to then prompt the user for the appropriate permissions. The user has the ability to deny all requests or pick and choose what level of data they’re comfortable sharing with the site. The system isn’t perfect, but the core idea driving these decisions is to place the power directly in the hands of the user. The user is made aware of what type of data a site wants to utilize and the user has the option to deny all access before anything happens. This lies in direct contrast to the advertising hellscape we live in where trillion-dollar companies harvest our every movement and create accurate facial recognition databases of all of us...and years later “apologize” and hide an opt-out toggle to limit future data-gathering behind twelve hamburger-bar clicks.
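
The tiered model can be sketched in a few lines. This is an illustration of the idea only, assuming the tiers form a simple ordered scale as described above. `AccessLevel`, `SharedData`, `resolveAccess`, and `sharedDataFor` are hypothetical names, and the real app's Swift implementation may be structured quite differently.

```typescript
// Ordered from least to most data shared with the website.
enum AccessLevel {
  Denied = 0,
  DeviceMotion = 1,
  LiteMode = 2,
  WorldSensing = 3,
  VideoCameraAccess = 4,
}

// Which kinds of data a site would receive (illustrative field names).
interface SharedData {
  devicePose: boolean;
  singleChosenPlane: boolean;
  allDetectedPlanes: boolean;
  facesAndImages: boolean;
  cameraFrames: boolean;
}

// The user can grant what the site asked for or anything less,
// but a site never receives more than it requested.
function resolveAccess(
  requested: AccessLevel,
  userChoice: AccessLevel
): AccessLevel {
  return userChoice < requested ? userChoice : requested;
}

// Maps a granted level to the data that level exposes.
function sharedDataFor(level: AccessLevel): SharedData {
  return {
    devicePose: level >= AccessLevel.DeviceMotion,
    singleChosenPlane: level === AccessLevel.LiteMode,
    allDetectedPlanes: level >= AccessLevel.WorldSensing,
    facesAndImages: level >= AccessLevel.WorldSensing,
    cameraFrames: level >= AccessLevel.VideoCameraAccess,
  };
}
```

For example, if a site requests Video Camera Access and the user picks Lite Mode, `resolveAccess` yields Lite Mode, and `sharedDataFor` reports that only the device pose and the one chosen plane are shared.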

However, the concern for the user built into WebXR Viewer took time, consideration, and experimentation. Not everyone building an AR app is going to take that time, nor recognize the extremely personal nature of the data an AR app may collect, nor care either way. I’ve encountered several AR startups that demand users add a 3D scan of their house to the company’s proprietary world-map (for future monetization) before users can derive any questionable value from the app. Other apps trick users into uploading a facial scan for the sole purpose of amassing a huge facial recognition databank without giving users the slightest hint beforehand of what’s going on.

AR and VR apps are still the Wild West in some regards. It’ll take years, if not longer, for awareness, norms, and safeguards to form around the pitfalls of sharing too much with an XR app. Until then, it’ll be very interesting to try and build valuable AR products whose survival doesn’t rely on underhanded tactics like duping users or farming users for monetizable data.

If you have just such a product in mind, reach out!