Saliency maps - A Visual explanation for popularity of pet images in Deep neural networks
Project Paw-pularity: This paper attempts to address this Corgi's question, rephrased as follows:
Background:
Despite the numerous applications of AI aiding all forms of human intelligence, one domain that has not yet been much explored is Naturalistic Intelligence.
In this project, we challenged ourselves to explore applications of Deep Learning that emulate human perception of, or emotional response to, subjects captured in images, specifically pets. We do this by predicting a popularity score: a metric derived to measure a pet’s cuteness or attractiveness, termed "paw-pularity". By understanding which features make an image popular, we can leverage those features to take photos that maximize the cuteness of a pet. This can be useful for pet adoption services, where most people shortlist the pets they want to consider adopting before visiting the shelter, based on how the pets look in their profile images.
Motivation - Global animal welfare:
PetFinder.my is Malaysia’s leading animal welfare platform, featuring over 180,000 animals with 54,000 happily adopted. Currently, PetFinder.my uses a basic Cuteness Meter to rank pet photos. It analyzes picture composition and other factors compared to the performance of thousands of pet profiles. We set out to improve this.
Our solution can be integrated into composed AI tools and libraries that will guide shelters and rescuers around the world to improve the appeal of their pet profiles, automatically enhance photo quality and recommend composition improvements. As a result, stray dogs and cats can find their "furever" homes much faster. Many precious lives could be saved and more happy families created.
Approach:
In this paper, we explored the use of existing pre-trained deep neural network architectures (namely ResNet, VGG, DenseNet, and EfficientNet) for transfer learning and built custom layers on top to include the metadata features in determining the popularity of pets (cats & dogs). We further used three different saliency map techniques, Vanilla Gradient, Grad-CAM and RISE, to visually explain the important features of the pet’s image. We came up with several hypotheses to interpret popularity among different images and, using saliency maps, presented evidence to confirm or reject each hypothesis, as below:
Custom architecture - Deep convolutional layers
We leveraged several well-known CNN architectures pretrained on the ImageNet database and used transfer learning to save training time, tuning which layers or blocks to freeze and which to train further on our data. However, these models were trained for image classification rather than regression, so we modified their final layer and concatenated its output with feature embeddings generated from our metadata by feed-forward linear layers. We passed this concatenated set of feature embeddings through a few more feed-forward layers and then to a final layer with a single neuron that predicts the popularity score. Because this is a regression model, we used MSE loss to assess the quality of its predictions. A generalized diagram of our neural network architecture is shown in the figure.
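The fusion step described above can be illustrated with a small, framework-free sketch. This is not the project's code: the pretrained CNN backbone is stood in for by a fixed-length image embedding, and the layer sizes, the function names (`build_head`, `predict`, `mse_loss`) and the random initialisation are all assumptions chosen for illustration. In practice the same structure would be written in a deep learning framework and trained by gradient descent.

```python
import random


def linear(inputs, weights, bias):
    """Dense layer: one output value per weight row."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, bias)]


def relu(values):
    return [max(0.0, v) for v in values]


def init_layer(n_in, n_out, rng):
    """Small random weights and zero biases for an n_in -> n_out layer."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def build_head(image_dim, meta_dim, meta_hidden=8, fused_hidden=16, seed=0):
    """Hypothetical regression head; all sizes are illustrative assumptions."""
    rng = random.Random(seed)
    return {
        "meta": init_layer(meta_dim, meta_hidden, rng),
        "fused": init_layer(image_dim + meta_hidden, fused_hidden, rng),
        "out": init_layer(fused_hidden, 1, rng),
    }


def predict(head, image_embedding, metadata):
    """Fuse image and metadata embeddings and return one popularity score."""
    # Metadata goes through its own feed-forward layer first.
    meta_embedding = relu(linear(metadata, *head["meta"]))
    # Concatenate the CNN embedding with the metadata embedding.
    fused = list(image_embedding) + meta_embedding
    hidden = relu(linear(fused, *head["fused"]))
    # Single output neuron: the predicted score.
    return linear(hidden, *head["out"])[0]


def mse_loss(predictions, targets):
    """Mean squared error between predicted and true scores."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)
```

Here `image_embedding` plays the role of the output of the modified final CNN layer, so swapping ResNet for VGG, DenseNet or EfficientNet would only change `image_dim`.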
Peeking into the black box - Neural Network Visualisation
Since we are using a neural network to imitate perceived popularity, it is important to understand which aspects of the image it looks at when deciding the popularity score. Additionally, our best-performing model performs comparably to a human, so explaining its predictions helps build trust in the system, whether by surfacing insights missed by humans or by validating initial perceptions. Visualisation is a strong tool to aid us in understanding the system as well as the data. We specifically focus on saliency maps in this project. Saliency maps are of two types:
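To make one of the techniques named in the Approach concrete: RISE treats the model as a black box, scores many randomly masked copies of the image, and weights each mask by the score it produced. The sketch below is a simplified illustration, not the project's implementation. It assumes a grayscale image stored as a list of rows, uses nearest-neighbour upsampling of a coarse grid (the original method uses smooth, randomly shifted masks), and `toy_score` is a hypothetical stand-in for the trained popularity model.

```python
import random


def rise_saliency(image, score_fn, n_masks=2000, grid=4, keep_prob=0.5, seed=0):
    """Estimate per-pixel importance from scores of randomly masked images."""
    rng = random.Random(seed)
    height, width = len(image), len(image[0])
    saliency = [[0.0] * width for _ in range(height)]
    for _ in range(n_masks):
        # Coarse binary grid: 1 keeps a cell, 0 blacks it out.
        cells = [[1.0 if rng.random() < keep_prob else 0.0
                  for _ in range(grid)] for _ in range(grid)]
        # Nearest-neighbour upsampling of the grid to the image size.
        mask = [[cells[r * grid // height][c * grid // width]
                 for c in range(width)] for r in range(height)]
        masked = [[image[r][c] * mask[r][c] for c in range(width)]
                  for r in range(height)]
        score = score_fn(masked)
        # Pixels that were visible share the credit for this score.
        for r in range(height):
            for c in range(width):
                saliency[r][c] += score * mask[r][c]
    norm = n_masks * keep_prob
    return [[value / norm for value in row] for row in saliency]


def toy_score(img):
    """Hypothetical model: responds only to the top-left 4x4 region."""
    return sum(img[r][c] for r in range(4) for c in range(4)) / 16.0
```

Running `rise_saliency` on an 8x8 image with `toy_score` should give higher average saliency in the top-left quadrant than elsewhere, because masks that leave that region visible produce higher scores.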
Conclusion:
Given an understanding of some of the elements that contribute to a pet's popularity, highlighting these features in new pet images could lead to higher predicted scores on the platform, which may ultimately result in faster and more successful adoptions. Below are some guidelines for capturing a pet image that better captivates human attention.
Reference: GitHub Link
Examples:
Popular pet images visualised
Unpopular pets visualised
Multiple subjects, blending backgrounds and unusual poses make it hard for the deep neural network to identify the pixels critical to judging the cuteness and popularity of a pet's image.