Our extensive indoor dataset, recorded at the main site of Technische Universität München and comprising several thousand high-resolution images with camera pose information, as well as 2D grid maps and 3D point clouds, is available free of charge for download as a contribution to the scientific community. The dataset is licensed under the Creative Commons Attribution 3.0 Unported License, which allows you to use the dataset both commercially and non-commercially (attribution is required).

For details about the dataset, please refer to our publication cited below. Feel free to contact us with requests for additional recordings, e.g., new query sequences.

Reference images

Mapper run                Track length   Ladybug3 images          DSLR images             Point cloud
11-11-28  (1st floor)     1169 m         9438 images (1.3 GiB)    3146 images (20 GiB)    59.7 M points (139 MiB)
11-12-13  (1st floor N1)   838 m         5634 images (1.2 GiB)    1878 images (13 GiB)    34.0 M points (77 MiB)
11-12-17a (4th floor)      246 m         1734 images (195 MiB)     578 images (3.4 GiB)   10.1 M points (23 MiB)
11-12-17b (3rd floor)      396 m         2532 images (296 MiB)     844 images (5.4 GiB)   14.7 M points (34 MiB)
11-12-17c (Ground I)       569 m         3858 images (641 MiB)    1286 images (8.3 GiB)   23.3 M points (54 MiB)
11-12-18a (Ground II)      523 m         3756 images (444 MiB)    1252 images (7.6 GiB)   16.9 M points (39 MiB)
11-12-18b (2nd floor)      781 m         4464 images (558 MiB)    1488 images (9.3 GiB)   25.7 M points (59 MiB)

A map & metadata archive is available for download for each mapper run.

The map & metadata archive contains the occupancy grid maps at various resolutions (2 cm, 10 cm, 50 cm and 1 m per pixel) as well as a .YAML file that specifies the origin of the coordinate system in pixel coordinates. See the ROS map_server wiki for more information about the file format. Further, the archive contains two files with information about the position and orientation of images: a .CSV (semi-colon separated values) and a .MAT (MATLAB/Octave) file. When loading the .MAT file, you will find two variables in your workspace: image_files lists all the image filenames, and image_T holds 4 x 4 matrices that specify the location and rotation of the cameras in the (metric) map coordinate system. Please note that the six Ladybug3 images that were captured at a location have the same transformation matrix since the panorama camera is treated as a single device.
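The 4 × 4 matrices in image_T follow the usual homogeneous convention, with the rotation in the upper-left 3 × 3 block and the camera position in the last column. A minimal sketch of how such a matrix can be used (the example matrix and the assumption that it maps camera coordinates to map coordinates are for illustration only; in practice, load the .MAT file with MATLAB/Octave or with scipy.io.loadmat and take the matrices from image_T):

```python
import math

def apply_pose(T, p):
    """Transform a 3D point p with a 4x4 pose matrix T (nested lists)."""
    x, y, z = p
    return tuple(T[i][0] * x + T[i][1] * y + T[i][2] * z + T[i][3]
                 for i in range(3))

# Hypothetical example pose: 90 degree rotation about Z, translation (10, 5, 0).
c, s = math.cos(math.pi / 2), math.sin(math.pi / 2)
T = [[c, -s, 0, 10.0],
     [s,  c, 0,  5.0],
     [0,  0, 1,  0.0],
     [0,  0, 0,  1.0]]

# The camera position in map coordinates is the translation column.
camera_position = (T[0][3], T[1][3], T[2][3])  # (10.0, 5.0, 0.0)

# Map a point from camera coordinates into map coordinates.
p_map = apply_pose(T, (1.0, 0.0, 0.0))
```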

The Ladybug3 archive contains six 1616 x 1232 JPEG images per location. See PointGrey’s website for more information about this panorama camera system. The DSLR archive contains two 5184 x 3456 JPEG images per location (dslr_left and dslr_right).

The point cloud archive contains a .PCD file with point cloud data, sampled using a regular voxel grid at a resolution of 2 cm. This file format is supported by the Point Cloud Library (PCL), which can either be used directly or from within the Robot Operating System (ROS). Typical installations of PCL come with a simple point cloud viewing application called pcd_viewer, which is able to load and display .PCD files.
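If PCL is not available, the ASCII variant of the .PCD format is simple enough to read directly; the header is a sequence of "KEY value" lines terminated by a DATA line, followed by one point per line. A minimal parser sketch (the sample data is made up; the dataset's actual file may use binary DATA, in which case PCL is the better choice):

```python
def read_pcd_ascii(lines):
    """Parse an ASCII .PCD file given as an iterable of lines.

    Returns (header_dict, list_of_point_tuples). Only DATA ascii is handled.
    """
    header, points, in_data = {}, [], False
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if in_data:
            points.append(tuple(float(v) for v in line.split()))
            continue
        key, _, value = line.partition(" ")
        header[key] = value
        if key == "DATA":
            if value != "ascii":
                raise ValueError("only DATA ascii is handled here")
            in_data = True
    return header, points

# Tiny hypothetical PCD file with two x/y/z points.
sample = """\
VERSION .7
FIELDS x y z
SIZE 4 4 4
TYPE F F F
COUNT 1 1 1
WIDTH 2
HEIGHT 1
VIEWPOINT 0 0 0 1 0 0 0
POINTS 2
DATA ascii
1.0 2.0 3.0
4.0 5.0 6.0
""".splitlines()

header, pts = read_pcd_ascii(sample)
```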

Query sequence

Query                  Track length   Video duration   Video file
11-12-23 (1st floor)   434 m          23:36            MAH04212.MP4 (2 GiB)

The query video was compressed by the camera using H.264. To reproduce the image files used in our experiments, run FFmpeg with a command line like this:

ffmpeg -i MAH04212.MP4 -f image2 -r 5 fps5-frame%05d.png

This extracts five frames per second and writes them to PNG files in the current directory. If you must use JPEG for some reason, make sure to specify the compression quality (e.g., -q:v 2); by default, FFmpeg produces JPEG files with very low quality.

The first CSV file contains the locations (X, Y, Z) of the query frames; the second file contains the full (approximate) 4×4 transformation matrices (the location is in the last column). Please note that the rotation is not perfectly accurate for the query sequence; the location, however, should be accurate to within a few centimeters. The ground truth was recorded by mounting the query camera on the trolley and putting the trolley into a localizer mode, in which it used the laser scanner to localize itself on the map of the 11-11-28 dataset. The coordinates of the query sequence are therefore in the same coordinate system as the images of that dataset.
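Reading the semicolon-separated ground-truth files is straightforward with the standard csv module. A sketch (the column layout "filename;X;Y;Z" and the sample rows are assumptions for illustration; check the actual header line of the file in the archive):

```python
import csv
import io

# Hypothetical sample rows in an assumed filename;X;Y;Z layout.
sample = ("frame00001.png;12.34;5.67;1.20\n"
          "frame00002.png;12.40;5.70;1.20\n")

poses = {}
for row in csv.reader(io.StringIO(sample), delimiter=";"):
    # First column: frame filename; remaining columns: map coordinates.
    name, x, y, z = row[0], *(float(v) for v in row[1:4])
    poses[name] = (x, y, z)
```

With a real file, replace io.StringIO(sample) with an open file handle.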

Query images

Query               Images                   Acquisition date   File
VidSnaps            768 (at 128 locations)   February 2012      VidSnaps.tar.bz2 (2.1 GiB)
VidSnaps Offtrack   252 (at 42 locations)    March 2012         VidSnaps-Offtrack.tar.bz2 (748 MiB)

These query images were recorded on the 1st floor of the main building (the corresponding reference mapper run is 11-11-28). The images of the first dataset (VidSnaps) are close to the mapper trajectory, while the images of the second dataset (VidSnaps-Offtrack) keep a distance from it. A .mat file included in each archive contains the approximate ground-truth positions (X, Y and Z). At each location, a short video sequence was recorded and the six least blurry frames were selected.

Calibration data

These are the calibration matrices and distortion coefficients for the cameras:
\[\begin{array}{rcl}
\mathbf{K}_\mathit{DSLR,left} & = & \left[\begin{array}{ccc}
3579.03 & 0 & 2612.08\\
0 & 3576.44 & 1719.79\\
0 & 0 & 1
\end{array}\right]\\[1ex]
\mathbf{k}_\mathit{DSLR,left} & = & \left[\begin{array}{ccccc} -0.118829 & 0.082360 & -0.001720 & -0.000532 & 0 \end{array}\right]\\[1ex]
\mathbf{K}_\mathit{DSLR,right} & = & \left[\begin{array}{ccc}
3556.29 & 0 & 2568.05\\
0 & 3553.11 & 1770.36\\
0 & 0 & 1
\end{array}\right]\\[1ex]
\mathbf{k}_\mathit{DSLR,right} & = & \left[\begin{array}{ccccc} -0.126627 & 0.087393 & -0.000971 & -0.000463 & 0 \end{array}\right]\\[1ex]
\mathbf{K}_\mathit{query} & = & \left[\begin{array}{ccc}
1180.90 & 0 & 713.46\\
0 & 1576.60 & 542.43\\
0 & 0 & 1
\end{array}\right]\\[1ex]
\mathbf{k}_\mathit{query} & = & \left[\begin{array}{ccccc} -0.263945 & 0.138761 & 0.001057 & -0.000190 & 0 \end{array}\right]
\end{array}\]

Please refer to the OpenCV documentation for an explanation of these parameters. Please contact us if you need calibration data of the Ladybug3 camera system.

This work is licensed under a Creative Commons Attribution 3.0 Unported License. We would like to ask you to cite the following paper when referring to this dataset:

Robert Huitl, Georg Schroth, Sebastian Hilsenbeck, Florian Schweiger, Eckehard Steinbach,
"TUMindoor: An Extensive Image and Point Cloud Dataset for Visual Indoor Localization and Mapping", in IEEE International Conference on Image Processing (ICIP), Orlando, September 2012

@inproceedings{huitl2012tumindoor,
  author = "R. Huitl and G. Schroth and S. Hilsenbeck and F. Schweiger and E. Steinbach",
  title = "{TUM}indoor: An Extensive Image and Point Cloud Dataset for Visual Indoor Localization and Mapping",
  booktitle = "Proc. of the International Conference on Image Processing",
  month = sep,
  year = 2012,
  address = "Orlando, FL, USA",
  url = {},
  note = {Dataset available at \url{}}
}

ICIP 2014 update

An updated version of the dataset has been used in our publication “Camera-Based Indoor Positioning Using Scalable Streaming Of Compressed Binary Image Signatures” in ICIP 2014. It includes the reference images, query images, and ground truth and is available for download here.

We would like to ask you to cite the following paper when referring to this dataset:

Dominik van Opdenbosch, Georg Schroth, Robert Huitl, Sebastian Hilsenbeck, Adrian Garcea, Eckehard Steinbach,
"Camera-Based Indoor Positioning Using Scalable Streaming of Compressed Binary Image Signatures", in IEEE International Conference on Image Processing (ICIP), Paris, October 2014

@inproceedings{vanopdenbosch2014indoor,
  author = "Dominik van Opdenbosch AND Georg Schroth AND Robert Huitl AND Sebastian Hilsenbeck AND Adrian Garcea AND Eckehard Steinbach",
  title = "Camera-based Indoor Positioning using Scalable Streaming of Compressed Binary Image Signatures",
  booktitle = "{IEEE} International Conference on Image Processing ({ICIP} 2014)",
  month = "Oct",
  year = "2014",
  address = "Paris, France"
}
