At Parkopedia we apply cutting-edge machine learning and computer vision methods to solve difficult parking problems. Several of our projects involve street-level imagery and video data from sources such as dash cams, which we use to extract parking insights.
In recent years, significant progress has been made in computer vision. Today, great effort is put into making these developments generally applicable; however, most are still applied and tested only in laboratories, on the same limited standard datasets. When working with real-world images, you often find yourself confronted with unexpected problems.
At Parkopedia, we are constantly dealing with these issues while working with ‘real world’ dash cam footage. In our efforts to use this data to extract useful parking information, we started from our standard street segmentation model, which classifies each pixel of an image into one of 16 classes. The simple version of that model has five classes: road, vehicle, sidewalk, curb and other. The model was originally trained on A2D2, a public dataset provided by Audi that contains a set of videos recorded in Germany. Performance was strong, with the model able to discern the different classes with an accuracy of 97.5%.
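For segmentation, an accuracy figure like the one above is typically the fraction of pixels whose predicted class matches the ground truth. A minimal sketch of that metric (the function name is ours, not from the original model code):

```python
import numpy as np

def pixel_accuracy(pred, target):
    """Fraction of pixels whose predicted class label matches the ground truth.

    pred, target: integer class maps of identical shape (H x W).
    """
    assert pred.shape == target.shape, "prediction and ground truth must align"
    return float((pred == target).mean())
```

Averaged over a validation set, this gives the single-number score quoted above; per-class metrics such as mean IoU are also common for segmentation, since plain pixel accuracy can be dominated by large classes like road.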
Top left: Image from the A2D2 dataset. Top right: ground truth segmentation (38 classes). Bottom left: predictions with 5 classes.
When applied to our dash cam videos recorded in London, the performance of the model significantly deteriorated.
Segmentation model tested on the dash cam dataset
The model seemed to malfunction in areas of the image where strong reflections appear: reflections of objects left on the vehicle dashboard, or of the dashboard itself. While humans have become accustomed to ignoring these, a model is limited to what it has seen before. Our training data did not contain reflections, so it was no surprise we started to see the model fail here.
One could also argue that the model fails because it was trained on videos of German streets rather than London ones. However, the model provides consistent results on CamVid, another publicly available dataset recorded in the UK.
Parkopedia segmentation model tested on the CamVid dataset
The next error source we checked was the reflection itself. This kind of reflection can be reduced at recording time with a ‘dash mat’, a non-reflective cloth laid over the dashboard. However, we are not always able to control how the videos are recorded, as many are received from third parties. To use this data, we needed our segmentation model to be robust to reflection interference.
Augmenting images with artificial reflections
Data augmentation is a technique used in machine learning that consists of randomly applying slight modifications to data so that the model sees beyond the original dataset. For instance, if your training data only contains bright images, your model might not perform as expected on dark images. Rather than collecting a new dataset of dark images, a simple solution is to artificially make your training images darker. You can apply the same logic to contrast, colours, etc.
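The brightness example can be sketched in a few lines. This is a minimal illustration of the idea, not Parkopedia's training pipeline; the function name and factor range are our assumptions:

```python
import numpy as np

def random_brightness(image, low=0.5, high=1.5, rng=None):
    """Simulate lighting changes by scaling pixel intensities
    by a factor drawn uniformly from [low, high].

    image: H x W x 3 uint8 array.
    """
    rng = rng or np.random.default_rng()
    factor = rng.uniform(low, high)
    # Work in float to avoid uint8 overflow, then clip back to [0, 255].
    out = np.clip(image.astype(np.float32) * factor, 0, 255)
    return out.astype(np.uint8)
```

Applied with a fresh random factor at every training step, the model never sees exactly the same exposure twice, which is what pushes it to become invariant to lighting.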
Similarly, our dataset did not contain images with reflections, so we started adding artificial reflections to our training images. In our case, the inside of the vehicle is reflected onto the windscreen, which means that anything lying on the dashboard can end up appearing in the image. The most visible and damaging reflections are those of the dashboard itself, but also of notebooks, wrappers or anything else left there by the driver. We reproduced these reflections by blending such objects into our training images so that they appear to be reflections of items lying on the dashboard.
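One simple way to fake such a reflection is to alpha-blend a semi-transparent patch (e.g. a photo of a notebook) into the lower part of the frame, where dashboard reflections typically land. This is a hedged sketch of that idea, assuming the patch fits within the lower half of the image; the function name, placement rule and blend strength are our assumptions, not the exact augmentation used in production:

```python
import numpy as np

def add_reflection(image, reflection, alpha=0.35, rng=None):
    """Alpha-blend a semi-transparent 'reflection' patch into the
    lower half of a frame, at a random position.

    image:      H x W x 3 uint8 frame.
    reflection: h x w x 3 uint8 patch; assumed to fit in the lower half.
    alpha:      opacity of the blended reflection.
    """
    rng = rng or np.random.default_rng()
    H, W = image.shape[:2]
    h, w = reflection.shape[:2]
    # Random top-left corner, constrained to the lower half of the frame.
    y = int(rng.integers(H // 2, H - h))
    x = int(rng.integers(0, W - w))
    out = image.astype(np.float32)
    region = out[y:y + h, x:x + w]
    out[y:y + h, x:x + w] = (1 - alpha) * region + alpha * reflection.astype(np.float32)
    return np.clip(out, 0, 255).astype(np.uint8)
```

Crucially, the segmentation labels are left untouched: the model is shown the corrupted image with the clean ground truth, so it learns to see through the reflection rather than segment it.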
Same image with two different artificial reflections. On the left, a reflection of the inside of the vehicle. On the right, reflections of notebooks.
We trained our model on the augmented dataset, and it now reaches an accuracy of 97.2%, very close to the original version, indicating that it has learned to handle the artificial reflections well. Analysing the model’s performance on the ‘real world’ data, we can see that it is significantly less disorientated by reflections from the dashboard, as illustrated in the following images.
When working with real-world data, our researchers are constantly confronted with new problems arising from imperfect data. This can include reflections, occlusions, vandalised information signs or poorly maintained parking infrastructure with completely washed-out demarcations. Sometimes the data is unfit for purpose and there is no option but to discard it. More often, though, this just presents one more interesting problem for our team to solve, and as shown in this example, sometimes some creative manipulation can do the trick!
Co-Author: Yannick Terme, Computer Vision Research Engineer