Saliency Maps - A Visual Explanation for the Popularity of Pet Images in Deep Neural Networks
"How to make a my social pictures more paw-pular ?" - An educated Corgi wonders


Project Paw-pularity: This paper attempts to address the Corgi's question, rephrased as follows:

  1. What are the most appealing or repelling aspects of a pet's image?
  2. Why do some pets' pictures get more likes than others, or even than other kinds of images?

Background:

Despite the numerous applications of AI in aiding all forms of human intelligence, one domain that remains largely unexplored is Naturalistic Intelligence.

In this project, we challenged ourselves to explore how deep learning can emulate human perception of, and emotional response to, subjects captured in images, specifically pets, by predicting a popularity score: a metric derived to measure a pet's cuteness or attractiveness, termed "paw-pularity". By understanding what features make an image popular, we can leverage those features to take photos that maximize a pet's cuteness. This can be useful for pet adoption services, since most people shortlist the pets they want to consider adopting, based on how those pets look in profile images, before ever visiting the shelter.

Motivation - Global animal welfare:

PetFinder.my is Malaysia’s leading animal welfare platform, featuring over 180,000 animals with 54,000 happily adopted. Currently, PetFinder.my uses a basic Cuteness Meter to rank pet photos. It analyzes picture composition and other factors compared to the performance of thousands of pet profiles. We set out to improve this.

Our solution can be integrated into AI tools and libraries that guide shelters and rescuers around the world to improve the appeal of their pet profiles, automatically enhance photo quality, and recommend composition improvements. As a result, stray dogs and cats can find their "furever" homes much faster. Many precious lives could be saved and more happy families created.

Approach:

In this paper, we explored the use of existing pre-trained deep neural network architectures (namely ResNet, VGG, DenseNet, and EfficientNet) for transfer learning and built custom layers on top to include metadata features in determining the popularity of pets (cats and dogs). We further used three saliency map techniques: Vanilla Gradient, GradCAM, and RISE, to visually explain the important features of a pet's image. We came up with several hypotheses to interpret popularity across different images and, using saliency maps, presented evidence to confirm or reject each hypothesis, as shown below.

Custom architecture - Deep convolutional layers

We leveraged a selection of well-known CNN architectures pretrained on the ImageNet database and used transfer learning to save training time, tuning which layers/blocks to freeze and which to further train on our data. However, these models were trained for image classification tasks, not regression, so we replaced the final layer of each model and concatenated the resulting image embedding with feature embeddings generated from our metadata by feed-forward linear layers. We passed this concatenated set of feature embeddings through a few more feed-forward layers and then to a final layer with a single neuron that predicts the popularity score. Being a regression model, we used MSE loss as our metric for assessing performance. A generalized diagram of our neural network architecture is shown in the figure.
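
To make this concrete, here is a minimal PyTorch sketch of such an architecture. The ResNet-50 backbone, layer widths, dropout rate, and num_meta_features=12 are illustrative assumptions rather than the project's exact configuration, which lives in the GitHub repo referenced below.

```python
import torch
import torch.nn as nn
import torchvision.models as models

class PawpularityNet(nn.Module):
    """CNN backbone + metadata embedding, fused into a single-score regressor."""

    def __init__(self, num_meta_features=12):
        super().__init__()
        # Pretrained backbone; any of ResNet/VGG/DenseNet/EfficientNet works.
        backbone = models.resnet50(pretrained=True)
        backbone.fc = nn.Identity()          # drop the 1000-class ImageNet head
        self.backbone = backbone
        # Feed-forward embedding of the tabular metadata features.
        self.meta_net = nn.Sequential(
            nn.Linear(num_meta_features, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
        )
        # Fused embeddings -> a few dense layers -> one popularity neuron.
        self.head = nn.Sequential(
            nn.Linear(2048 + 64, 256), nn.ReLU(),
            nn.Dropout(0.2),
            nn.Linear(256, 1),
        )

    def forward(self, image, meta):
        img_emb = self.backbone(image)                 # (B, 2048) image embedding
        meta_emb = self.meta_net(meta)                 # (B, 64) metadata embedding
        fused = torch.cat([img_emb, meta_emb], dim=1)  # concatenate embeddings
        return self.head(fused).squeeze(1)             # (B,) predicted score

# Regression objective used for training: criterion = nn.MSELoss()
```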

[Figure: generalized diagram of the neural network architecture]

Peeking into the black box - Neural Network Visualisation

Since we are using a neural network to model perceived popularity, it is important to understand which aspects of the image it looks at when deciding the popularity score. Additionally, our best-performing model does comparably well to a human, and visual explanations help build trust in the system, whether by surfacing insights missed by humans or by validating initial perceptions. Visualisation is a strong tool for understanding both the system and the data. We specifically focus on saliency maps in this project. Saliency maps come in two types:

  • Gradient based: Gradients computed with respect to the image's pixel values, by backpropagating the predicted score while holding the model weights fixed, tell us how the score would change if a pixel were varied slightly. This measures each pixel's sensitivity to the final decision: the higher the sensitivity, the more crucial the pixel has been for the prediction. Note that this works only for white-box models, where we know the architecture and therefore have access to gradients. We use simple (vanilla) gradients and GradCAM; see the first sketch after this list.
  • Perturbation based: In this method, we occlude a few pixels or patches of pixels and observe how much the final prediction changes. Based on the change in score, an importance value is computed for each pixel or block of pixels. This works even for black-box models, where we have no access to the weights or architecture, because all we need is the output for a given input. In this category we use RISE (Randomized Input Sampling for Explanation of Black-box Models); see the second sketch after this list.
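
As a minimal sketch of the gradient-based family, the snippet below computes a vanilla-gradient saliency map in PyTorch. It assumes a trained regressor with the model(image, meta) interface sketched earlier; the helper name and the normalisation step are our own illustrative choices, not the project's exact code.

```python
import torch

def vanilla_gradient_saliency(model, image, meta):
    """Sensitivity of the predicted score to each input pixel.

    image: (1, 3, H, W) preprocessed tensor; meta: (1, num_meta_features).
    """
    model.eval()
    image = image.clone().requires_grad_(True)   # track gradients w.r.t. pixels
    score = model(image, meta)                   # predicted paw-pularity score
    score.backward()                             # backprop; weights stay fixed
    # One sensitivity value per pixel: max |gradient| over colour channels.
    saliency = image.grad.detach().abs().max(dim=1).values.squeeze(0)
    return saliency / (saliency.max() + 1e-8)    # normalise to [0, 1] for display
```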
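For the perturbation-based family, here is a simplified sketch of the RISE idea against the same assumed interface. Full RISE also randomly shifts each upsampled mask; the mask count, grid size, and keep probability below are illustrative defaults rather than the settings used in the project.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def rise_saliency(model, image, meta, n_masks=500, grid=7, p_keep=0.5):
    """Importance map from random occlusions, weighted by the masked score."""
    _, _, H, W = image.shape
    saliency = torch.zeros(H, W)
    for _ in range(n_masks):
        # Coarse random binary mask, upsampled to image size for smoothness.
        coarse = (torch.rand(1, 1, grid, grid) < p_keep).float()
        mask = F.interpolate(coarse, size=(H, W), mode="bilinear",
                             align_corners=False)[0, 0]
        score = model(image * mask, meta).item()   # black-box forward pass only
        saliency += score * mask                   # weight each mask by its score
    return saliency / (n_masks * p_keep)           # normalise by E[mask] = p_keep
```

In both cases, resizing the resulting map and overlaying it on the original photo as a heatmap produces visualisations like the examples shown below.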


Conclusion:

Given this understanding of some of the elements that contribute to a pet's popularity, highlighting these features in new pet images could lead to higher prediction scores on the platform, which may ultimately result in faster and more successful adoptions. Below are some guidelines for capturing a pet image that better captivates human attention.

  1. Maintain just one subject in the image
  2. Ensure a contrasting background against the pet
  3. Make the face clearly visible along with both eyes
  4. Avoid unusual poses of the pet
  5. Bonus points if the dog sticks out its tongue, indicating health and playful energy.

Reference: GitHub Link

Examples:

Popular pet images visualised


Unpopular pet images visualised

Multiple subjects, blending backgrounds, and unusual poses make it hard for the deep neural network to identify the pixels critical to judging the cuteness and popularity of a pet's image.


