At ACM Multimedia 2012, we present a method that enables meter-accurate indoor positioning using visual information recorded by a smartphone’s camera. The position and orientation of the device are estimated by comparing the camera images to a database of previously computed virtual views. This comparison is carried out by an optimized image search engine and takes only milliseconds. The paper covers the view generation process and explains how the virtual views are used for visual localization.
In a nutshell, the approach employs a novel combination of a content-based image retrieval engine and a method to generate virtual viewpoints for the reference database. In a preparation phase, virtual views are computed by transforming the viewpoint of images that were captured during the mapping run. The virtual views are represented by their respective bag-of-features vectors, and image retrieval techniques are applied to determine the most likely pose of a query image. Because the locations and orientations of the virtual views are decoupled from the actual image locations, the system is able to work with sparse reference imagery and copes well with perspective distortion.
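To illustrate the retrieval step, here is a minimal sketch of bag-of-features pose lookup. It assumes that features have already been extracted and quantized into integer visual word IDs, and it uses the common TF-IDF weighting with cosine similarity; these are textbook choices, not necessarily the exact scoring used in the paper. The names `VirtualView` and `localize` are hypothetical and only serve this example.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass
class VirtualView:
    # Hypothetical record: pose of a virtual camera plus its quantized visual words.
    x: float
    y: float
    heading_deg: float
    words: list[int]


def idf_weights(views):
    """Inverse document frequency of each visual word over the reference database."""
    n = len(views)
    df = Counter()
    for v in views:
        df.update(set(v.words))
    return {w: math.log(n / c) for w, c in df.items()}


def bof_vector(words, idf):
    """TF-IDF weighted, L2-normalized bag-of-features vector (sparse dict)."""
    tf = Counter(words)
    vec = {w: c * idf.get(w, 0.0) for w, c in tf.items()}  # unknown words get weight 0
    norm = math.sqrt(sum(x * x for x in vec.values()))
    return {w: x / norm for w, x in vec.items()} if norm > 0 else {}


def cosine(a, b):
    # Both vectors are unit length, so the sparse dot product is the cosine similarity.
    if len(a) > len(b):
        a, b = b, a
    return sum(x * b.get(w, 0.0) for w, x in a.items())


def localize(query_words, views):
    """Return the pose of the most similar virtual view and its similarity score."""
    idf = idf_weights(views)
    db = [bof_vector(v.words, idf) for v in views]
    q = bof_vector(query_words, idf)
    scores = [cosine(q, d) for d in db]
    best = max(range(len(views)), key=scores.__getitem__)
    v = views[best]
    return (v.x, v.y, v.heading_deg), scores[best]
```

For example, a query containing words 4 and 5 is matched to a virtual view that contains the same words, and that view's pose is returned. A real system would use an inverted file instead of scoring every view, which is what makes millisecond-scale search over large databases possible.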
Warping & Feature Extraction
Please check out our ACM Multimedia 2012 paper for all the details.
This is a live demonstration of the visual indoor localization & navigation system developed for the project NAVVIS at the Institute for Media Technology at Technical University Munich:
Download: 720p (WMV), 1080p (MP4)
The video shows our lab demonstrator running on an Android phone. By analyzing the camera images for distinctive visual features, the system recognizes the position and orientation of the smartphone and displays them on the map. For this demonstration, only visual information is used; no other localization sources are combined with the vision-based results. To compare localization accuracy, Android’s network-based position estimate (Wi-Fi and cellular networks) is displayed as well.
At IPIN 2012, we present a visual odometry system for indoor navigation with a focus on long-term robustness and consistency. As our work is targeting mobile phones, we employ monocular SLAM to jointly estimate a local map and the device’s trajectory. We specifically address the problem of estimating the scale factor of both the map and the trajectory.
State-of-the-art solutions approach this problem with an Extended Kalman Filter (EKF), which estimates the scale by fusing inertial and visual data, but strongly relies on good initialization and takes time to converge. Each visual tracking failure introduces a new arbitrary scale factor, forcing the filter to re-converge.
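The following sketch shows the basic idea of fusing visual and inertial data to estimate scale, using a deliberately simplified model rather than the full EKF state of a real system. It assumes that monocular SLAM reports a displacement `d_visual` in arbitrary map units and that inertial integration yields a noisy metric displacement `z = lam * d_visual + noise`. Because this measurement is linear in the scale `lam`, the EKF update reduces to an ordinary scalar Kalman filter here. `ScaleFilter` and its noise parameters are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ScaleFilter:
    """Scalar Kalman filter for the metric scale factor lam (metres per map unit).

    Assumed measurement model (a simplification for illustration):
        z_metric = lam * d_visual + noise,  noise ~ N(0, r)
    """
    lam: float = 1.0   # current scale estimate; arbitrary initial guess
    p: float = 10.0    # estimate variance; large means "uninformed"
    q: float = 1e-6    # process noise: the scale is modelled as a slow random walk
    r: float = 0.05    # variance of the inertially derived metric displacement

    def update(self, d_visual: float, z_metric: float) -> float:
        self.p += self.q                  # predict step of the random-walk model
        h = d_visual                      # derivative of the measurement w.r.t. lam
        s = h * self.p * h + self.r       # innovation variance
        k = self.p * h / s                # Kalman gain
        self.lam += k * (z_metric - h * self.lam)
        self.p *= 1.0 - k * h             # posterior variance
        return self.lam

    def reset(self) -> None:
        """Naive reaction to a tracking failure: discard the converged estimate."""
        self.lam, self.p = 1.0, 10.0
```

After `reset()`, the estimate starts over from an uninformed state and has to re-converge, which is exactly the cost of each tracking failure described above. With real, noisy inertial data this re-convergence can take considerably longer than in a noise-free toy setting.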
We propose a fast and robust method for scale initialization that exploits basic geometric properties of the learned local map. Using random projections, we efficiently compute geometric properties from the feature point cloud produced by the visual SLAM system. From these properties (e.g., corridor width or height) we estimate scale changes caused by tracking failures and update the EKF accordingly. As a result, previously achieved convergence is preserved despite re-initializations of the map.
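One way to picture the random-projection step is the sketch below. It is not the paper's algorithm: instead of explicitly extracting corridor width or height, it projects the point cloud onto many random unit directions, measures a robust extent (5th to 95th percentile) along each, and takes the median extent as a scale-dependent signature. The ratio of signatures between the re-initialized map and the previous one then estimates the scale change `s`. The sketch assumes both maps cover the same structure; with many directions the median is only approximately invariant to the arbitrary rotation of a new map. All function names are hypothetical.

```python
import math
import random


def random_directions(k, rng):
    """k random unit vectors, uniformly distributed on the sphere."""
    dirs = []
    for _ in range(k):
        v = [rng.gauss(0.0, 1.0) for _ in range(3)]
        n = math.sqrt(sum(c * c for c in v))
        dirs.append([c / n for c in v])
    return dirs


def percentile(sorted_vals, q):
    """Percentile of an already sorted list, rounding to the nearest index."""
    idx = round(q / 100 * (len(sorted_vals) - 1))
    return sorted_vals[min(len(sorted_vals) - 1, max(0, idx))]


def extent_signature(points, dirs, lo=5.0, hi=95.0):
    """Median over all directions of the robust extent of the projected cloud."""
    extents = []
    for d in dirs:
        proj = sorted(p[0] * d[0] + p[1] * d[1] + p[2] * d[2] for p in points)
        extents.append(percentile(proj, hi) - percentile(proj, lo))
    extents.sort()
    return extents[len(extents) // 2]


def scale_change(old_points, new_points, k=64, seed=0):
    """Estimate s such that the new map is (approximately) s times the old one."""
    dirs = random_directions(k, random.Random(seed))
    return extent_signature(new_points, dirs) / extent_signature(old_points, dirs)
```

Since metric distances equal `lam_old` times old map units, and new map units are `s` times old ones, the filter's scale estimate would be updated to `lam_old / s` instead of being reset. This preserves previously achieved convergence.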
We evaluate our approach on extensive and diverse indoor datasets. The results demonstrate that errors and convergence times for scale estimation are considerably reduced, ensuring consistent and accurate scale estimation. This enables long-term odometry despite tracking failures, which are inevitable in realistic scenarios.
For more information, take a look at our IPIN 2012 paper.
Thanks to the recent press release by TU Munich, NAVVIS has received considerable attention from the press over the last few days. We would like to thank TUM and all authors for reporting on our indoor navigation system!
Below you will find a short selection of articles about NAVVIS:
We have just uploaded our VidSnaps and VidSnaps-Offtrack query images to the dataset page. We use these images to evaluate the quality of visual localization algorithms. The 768 images in the VidSnaps dataset were recorded close to the mapping trajectory, i.e., the 2011-11-28 dataset contains reference images with a similar perspective. The 252 images in the VidSnaps-Offtrack dataset, in contrast, were recorded a few meters away from the mapping trajectory, so their perspective differs from that of the most similar reference images. At each location, we took six images and manually annotated the location as ground truth.
The NAVVIS Indoor Viewer has just been updated. It now visualizes the point cloud data as well as the high-resolution imagery recorded by the two DSLRs. Use the Indoor Viewer menu to display the point cloud and adjust its properties.
There is also an animated tour through TU Munich. Feel free to modify the track of the tour by adding new keyframes!
The NAVVIS Indoor Viewer is now public; head over to the “Indoor Viewer” tab to try it out. The Indoor Viewer is a browser-based research tool for exploring the available TUMindoor datasets and evaluating feature extraction and localization results. Please contact us if you are interested in working with this viewer.