Finding Profitable App Genre

Recently, I was chatting with a close friend who is an avid app developer. After the usual gossip, our conversation turned to work: my friend told me he had recently started freelancing and asked if I could help him figure out what kind (genre) of apps he could develop to be profitable. I immediately agreed, since I love playing with data and deriving meaningful insights from it. Below are the findings I shared with my friend, who happily agreed to let me publish them so that they can be useful to other freelancers out there.

Finding Profitable App Genre - Google Play & App Store Market

My goal in this project is to find the genres of apps (e.g., Games, Books, News) that are likely to be profitable on both Google Play and the iOS App Store. This should help freelance app developers make better data-driven decisions about which kinds of apps to build.

We will analyze data about Google Play and App Store apps and try to find:

  • The most common apps by genre in both markets
  • The most popular apps by genre in both markets

Summary of Results

After analyzing the data, I found that taking a recent popular book and turning it into an app could be profitable in both markets, especially if the app adds extra features such as quizzes, quotes, an audio version of the book, and a built-in dictionary. For more details, please refer to the full analysis below.

Data Set

Google Play and the App Store each have more than 2 million apps. Collecting data for all 4+ million apps would be resource-intensive, so I decided to analyze samples of the data that I found on Kaggle:

  • This data set contains data about 10,000+ Android apps on Google Play.
  • This data set contains data about 7,000+ iOS apps on the App Store.

Exploring the Data Set

We will start by opening the two data set files that we are going to analyze.

In [1]:

from csv import reader

# Opening the Google Play data set
# (utf8 is set explicitly because app names contain non-ASCII characters)
with open("googleplaystore.csv", encoding="utf8") as open_android:
    android = list(reader(open_android))

# Opening the iOS App Store data set
with open("AppleStore.csv", encoding="utf8") as open_ios:
    ios = list(reader(open_ios))

To make our analysis easier, let's write a small function named app_info that prints the rows in a given range of any data set. Optionally, it also prints the total number of apps in that data set (excluding the header row).

In [2]:

# Print the rows dataset[start] .. dataset[finish] (inclusive) and,
# if length is True, the number of data rows (excluding the header row)
def app_info(dataset, start, finish, length=False):
    for each_app in dataset[start:finish + 1]:
        print(each_app)
        print("\n")

    if length:
        print("Total number of apps: ", len(dataset[1:]))

We will use the function above to print a few apps from both data sets and see what the information looks like.

In [3]:

##Using app_info function to print Android apps
app_info(android, 0, 3, True)
['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


Total number of apps:  10841

As we see above, there are 10,841 Android apps in total. Each app has information such as its name, size, number of reviews, number of installs and average rating, to name a few.

Now let's see how many iOS apps we have for analysis and what they look like.

In [4]:

#Using app_info function to print iOS apps
app_info(ios, 0, 3, True)
['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


Total number of apps:  7197

We have 7,197 iOS apps in our App Store data set, with information such as name, size, price, total number of ratings, average rating and genre.

One thing to note is that, unlike the Google Play data set, the App Store data set does not contain the number of installs.

Data Cleaning

The most underrated step in a data analysis task is data cleaning. Before we start analyzing our data, we need to make sure it is free of errors, incorrect information and duplicates; otherwise our analysis would be inaccurate and could lead to false conclusions. So it is very important to take the time to clean the data set and prepare it for further analysis.

Removing Inaccurate Data

The first step in the data cleaning process is to find out whether any apps in the data set have missing information. We can detect these by looking for rows whose number of columns differs from that of the header.

In [5]:

##Find inaccurate app in Google Play
## enumerate gives each row's position directly (index 0 is the header)
for index, each in enumerate(android[1:], start=1):
    if len(each) != len(android[0]):
        print(each)
        print("length of this row: ", len(each))
        print("Index of inaccurate app is: ", index)
        print("\n")
        print(android[0])
        print("length of the header: ", len(android[0]))
['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']
length of this row:  12
Index of inaccurate app is:  10473


['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']
length of the header:  13

As we see, the row for the app "Life Made WI-Fi Touchscreen Photo Frame" in the Google Play data set is inaccurate because one value is missing: the row has only 12 columns, while the header of the Google Play data set has 13. Since we cannot be certain which value is missing, we are better off removing this app.

In [6]:

## Removing "Life Made WI-Fi Touchscreen Photo Frame" app
## Run this cell only once: every run deletes whatever row is at index 10473
print("Number of apps before deleting: ", len(android[1:]))
del android[10473]
print("Number of apps after deleting: ", len(android[1:]))
Number of apps before deleting:  10841
Number of apps after deleting:  10840

In [7]:

##Find inaccurate app information in App Store
for index, each in enumerate(ios[1:], start=1):
    if len(each) != len(ios[0]):
        print(each)
        print("length of this row: ", len(each))
        print("Index of inaccurate app is: ", index)
        print("\n")
        print(ios[0])
        print("length of the header: ", len(ios[0]))

None of the App Store rows are missing information (the cell above prints nothing).


Deleting Duplicate Apps

The next step in data cleaning is to locate apps that occur more than once and remove the duplicates. For example, the app "Google Ads" occurs three times in the Google Play data set, as we see below.

In [8]:

for each_app in android[1:]:
    name = each_app[0]
    if name == "Google Ads":
        print(each_app)
['Google Ads', 'BUSINESS', '4.3', '29313', '20M', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'July 30, 2018', '1.12.0', '4.0.3 and up']
['Google Ads', 'BUSINESS', '4.3', '29313', '20M', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'July 30, 2018', '1.12.0', '4.0.3 and up']
['Google Ads', 'BUSINESS', '4.3', '29331', '20M', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'July 30, 2018', '1.12.0', '4.0.3 and up']

Now let's find how many duplicate entries there are in each data set.

In [9]:

## Finding number of duplicate apps in the Google Play
## (names seen so far go in a set, which makes the "seen before?" check fast)
duplicate_app_google = []
unique_app_google = set()
for each_row in android[1:]:
    name = each_row[0]
    if name in unique_app_google:
        duplicate_app_google.append(name)
    else:
        unique_app_google.add(name)

print("Number of duplicate apps in Google Play: ", len(duplicate_app_google))
print("\n")
print("Number of unique apps in Google Play: ", len(unique_app_google))

## Finding number of duplicate apps in the App Store
duplicate_app_apple = []
unique_app_apple = set()
for each_row in ios[1:]:
    name = each_row[1]
    if name in unique_app_apple:
        duplicate_app_apple.append(name)
    else:
        unique_app_apple.add(name)

print("\n")
print("Number of duplicate apps in App Store: ", len(duplicate_app_apple))
print("\n")
print("Number of unique apps in App Store: ", len(unique_app_apple))
Number of duplicate apps in Google Play:  1181


Number of unique apps in Google Play:  9659


Number of duplicate apps in App Store:  2


Number of unique apps in App Store:  7195

We can see that the Google Play data set has 1,181 duplicate entries (rows whose app name already appeared earlier). The App Store data set is much cleaner and contains only 2 duplicate entries.


We need to keep only one entry per app and delete the duplicates. If you look at the three "Google Ads" rows printed above, they differ only in the 4th column (index 3), which is the total number of reviews. So we can keep the row with the highest number of reviews and delete the rest: the higher the review count, the more recent the data for that app is likely to be.

We will also follow the same procedure for App Store data set to keep the unique apps.

Let's start by creating a dictionary whose keys are the unique app names and whose values are the highest review count for each app.

In [10]:

## Map each Google Play app name to its highest review count (Reviews, index 3)
google_unique_app = {}
for each_row in android[1:]:
    name = each_row[0]
    reviews = float(each_row[3])
    # Store the name the first time we see it, or update it if this row has more reviews
    if name not in google_unique_app or reviews > google_unique_app[name]:
        google_unique_app[name] = reviews
        
print("Number of unique Google apps extracted: ", len(google_unique_app))
Number of unique Google apps extracted:  9659

As we can see, the dictionary contains 9,659 unique apps, the same number we found in the previous code cell.


Now let's use this dictionary to build a list containing, for each unique app, the row with the highest number of reviews.

In [11]:

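## Keep one row per app: the first row that has the highest review count
unique_google = []
already_added = set()  # names already kept; a set gives fast membership checks
for each_row in android[1:]:
    name = each_row[0]
    review = float(each_row[3])
    if google_unique_app[name] == review and name not in already_added:
        unique_google.append(each_row)
        already_added.add(name)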

print("Number of Google Play apps :", len(unique_google))
Number of Google Play apps : 9659

Now let's do the same for the App Store data set and keep, for each app, the row with the highest count. The App Store data set has no review column, so we use the total number of user ratings (rating_count_tot), found at index 5 (the 6th column), and modify our code accordingly.

In [12]:

## Map each App Store app name to its highest rating count (rating_count_tot, index 5)
apple_unique_app = {}
for each_row in ios[1:]:
    name = each_row[1]
    reviews = float(each_row[5])
    if name not in apple_unique_app or reviews > apple_unique_app[name]:
        apple_unique_app[name] = reviews

print("Number of unique iOS apps extracted: ", len(apple_unique_app))

## Keep one row per app: the first row that has the highest rating count
unique_apple = []
already_added = set()
for each_row in ios[1:]:
    name = each_row[1]
    review = float(each_row[5])
    if apple_unique_app[name] == review and name not in already_added:
        unique_apple.append(each_row)
        already_added.add(name)

print("Number of App Store apps :", len(unique_apple))
Number of unique iOS apps extracted:  7195
Number of App Store apps : 7195
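The two passes above (build a name-to-highest-count dictionary, then collect the matching rows) could also be folded into a single pass that remembers the best row seen so far for each name. This is a minimal sketch, not the method used in this analysis; `dedupe_keep_max` is a hypothetical helper, and it assumes rows shaped like the ones in this notebook. Like the cells above, it keeps the earliest row when two copies tie on the maximum count.

```python
def dedupe_keep_max(rows, name_idx, count_idx):
    """Return one row per app name, keeping the row with the highest count.

    rows      -- data rows without the header
    name_idx  -- column holding the app name (0 for Google Play, 1 for App Store)
    count_idx -- column holding the count (3 for Google Play, 5 for App Store)
    """
    best = {}  # app name -> (count, row) of the best row seen so far
    for row in rows:
        name = row[name_idx]
        count = float(row[count_idx])
        # Strict '>' keeps the earliest row when counts tie, like the cells above
        if name not in best or count > best[name][0]:
            best[name] = (count, row)
    # dicts keep insertion order, so rows come out in order of each name's first appearance
    return [row for _, row in best.values()]


# Usage with the three "Google Ads" rows printed earlier (columns trimmed for brevity)
rows = [
    ["Google Ads", "BUSINESS", "4.3", "29313"],
    ["Google Ads", "BUSINESS", "4.3", "29313"],
    ["Google Ads", "BUSINESS", "4.3", "29331"],
]
print(dedupe_keep_max(rows, name_idx=0, count_idx=3))
# [['Google Ads', 'BUSINESS', '4.3', '29331']]
```

Called as `dedupe_keep_max(android[1:], 0, 3)`, it should select the same set of rows as the two cells above, though not necessarily in the same order.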

Removing Non-English Apps

There are several non-English apps in both data sets. For example:

['Cъновник BG', 'BOOKS_AND_REFERENCE', 'NaN', '13', '4.1M', '1,000+', 'Free', '0', 'Everyone', 'Books & Reference', 'January 21, 2017', '250', '4.0 and up']

['뽕티비 - 개인방송, 인터넷방송, BJ방송', 'VIDEO_PLAYERS', 'NaN', '414', '59M', '100,000+', 'Free', '0', 'Mature 17+', 'Video Players & Editors', 'July 18, 2018', '4.0.7', '4.0.3 and up']

['BL 女性向け恋愛ゲーム◆俺プリクロス', 'FAMILY', '4.2', '3379', '62M', '100,000+', 'Free', '0', 'Mature 17+', 'Simulation', 'March 23, 2017', '1.6.3', '2.3.3 and up']

My friend's project mainly caters to an English-speaking audience, so it makes no sense to analyze apps that are not in English. As part of data cleaning, we will therefore remove all non-English apps.

Every character we type is stored as a number; in Python, ord() returns this number (the character's Unicode code point). The characters we normally use in English text (letters, digits, punctuation and common symbols) fall within the ASCII range, 0 to 127.

The logic is to go through each row in both data sets, take the app's name, and check whether any of its characters has a code point greater than 127. If so, we remove that app.

However, some English apps still have characters in their names whose code points are greater than 127. For example:

['FlirtChat - ♥Free Dating/Flirting App♥', 'DATING', '4.3', '2433', '13M', '500,000+', 'Free', '0', 'Mature 17+', 'Dating', 'July 26, 2018', '12.0.4', '4.1 and up']

The app above is clearly English, but the ♥ character that appears twice in its name has a code point of 9829. Going by our rule, the code would also remove apps like this one, since at least one character in the name is above 127.

So we will modify our code to remove an app only if its name contains more than 3 characters with code points greater than 127. This way we keep most English apps that have special characters in their names, like the one above.
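The rule can be expressed as a small reusable predicate. This is a sketch; `is_english` and its `max_non_ascii` parameter are hypothetical names, and the threshold of 3 is the one chosen in this notebook, not a standard value.

```python
def is_english(name, max_non_ascii=3):
    """Heuristic from the text: treat a name as English unless it has more than
    `max_non_ascii` characters outside the ASCII range (code point > 127)."""
    non_ascii = sum(1 for ch in name if ord(ch) > 127)
    return non_ascii <= max_non_ascii


print(ord("♥"))  # 9829, far outside the ASCII range 0-127
print(is_english("FlirtChat - ♥Free Dating/Flirting App♥"))  # True: only 2 non-ASCII characters
print(is_english("뽕티비 - 개인방송, 인터넷방송, BJ방송"))  # False: many Hangul characters
print(is_english("교えて" [0:0] + "教えて!goo"))  # True: exactly 3 non-ASCII characters
```

The threshold is a trade-off: a short non-English name such as 教えて!goo has exactly three non-ASCII characters, so it passes, and it does appear in the Reference listing later in this analysis.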

In [13]:

##Removing non-English app from the Google Play
google_english = []
for each_app in unique_google:
    name = each_app[0]
    count = 0
    for each in name:
        if ord(each) > 127:
            count += 1
    if count <= 3:
        google_english.append(each_app)
        
print("Number of English apps in Google Play: ", len(google_english))
Number of English apps in Google Play:  9614

As we see, after removing the non-English apps we are left with 9,614 of the original 9,659 apps. Now let's do the same for the App Store apps.

In [14]:

##Removing non-English app from the App Store
ios_english = []
for each_app in unique_apple:
    name = each_app[1]
    count = 0
    for each in name:
        if ord(each) > 127:
            count += 1
    if count <= 3:
        ios_english.append(each_app)
        
print("Number of English apps in the App Store: ", len(ios_english))
Number of English apps in the App Store:  6181

This removes quite a few apps from the App Store (1,014 of 7,195), leaving us with 6,181 English apps.


Our filter may still let a few non-English apps through, and it may also have removed a few English apps, but this should not have a significant effect on the outcome of our analysis.

Removing Paid Apps

My friend wants to build only free apps, since a free app attracts more users than a paid one; with a paid app, users' expectations are higher and the app also requires significant maintenance. In addition, many freelancers are students who prefer developing free apps.

The success of a free app therefore depends on the number of users who download it, since its revenue comes from in-app ads: the more users download and use the app, the more revenue it earns.

For these reasons, we will remove the paid apps from both data sets.

In [15]:

## Removing paid apps from the google data set
android_final = []
for each_app in google_english:
    if each_app[7] == "0":
        android_final.append(each_app)

print("Number of Android apps after cleaning: ", len(android_final))

## Removing paid apps from the apple data set
ios_final = []
for each_app in ios_english:
    if each_app[4] == "0.0":
        ios_final.append(each_app)
print("Number of iOS apps after cleaning: ", len(ios_final))
Number of Android apps after cleaning:  8864
Number of iOS apps after cleaning:  3220

After removing the paid apps, we are left with 8864 Android apps and 3220 iOS apps.


With this, the data cleaning process is complete.

Data Analysis

As stated in the introduction, our goal is to find the types of apps that are likely to be profitable in both markets. Since we are dealing only with free apps, revenue depends on the number of users using the app.

Most Common Apps by Genre

Let's start our data analysis by finding which genres of apps are most common in the two markets. The genre information is at index 9 (Genres) for Android and at index 11 (prime_genre) for iOS.

We will create two functions: one that builds a frequency table (as percentages) for any column we choose, and another that sorts and prints that frequency table in descending order so that it is readable.

In [16]:

## Function to create frequency table
def freq_table(dataset, index):
    frequency_table = {}
    for each_app in dataset:
        column = each_app[index]
        if column in frequency_table:
            frequency_table[column] += 1
        else:
            frequency_table[column] = 1
            
    for each_item in frequency_table:
        frequency_table[each_item] /= len(dataset)
        frequency_table[each_item] *= 100
        
    return frequency_table

## Function to sort the above frequency table in descending order
def sort(dataset, index):
    frequency_table = freq_table(dataset, index)
    list_freq = []
    for each in frequency_table:
        freq_temp = [frequency_table[each], each]
        list_freq.append(freq_temp)
        
    sort_freq = sorted(list_freq, reverse = True)
    for each in sort_freq:
        print(each[1], " : ", each[0])
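For comparison, the same percentage table can be built with `collections.Counter` from the standard library. This is a sketch with hypothetical helper names (`freq_table_pct`, `print_sorted`), not a replacement used in this analysis. One difference from the `sort` function above is tie-breaking: `sorted` with a key is stable, so equal percentages keep their first-seen order instead of reverse alphabetical order.

```python
from collections import Counter


def freq_table_pct(rows, index):
    """Percentage of rows for each distinct value in column `index`."""
    counts = Counter(row[index] for row in rows)
    total = len(rows)
    return {value: count / total * 100 for value, count in counts.items()}


def print_sorted(table):
    """Print a {value: percentage} table, highest percentage first."""
    for value, pct in sorted(table.items(), key=lambda item: item[1], reverse=True):
        print(value, " : ", pct)


# Tiny made-up sample: [app name, genre]
rows = [["a", "Games"], ["b", "Games"], ["c", "News"], ["d", "Book"]]
print_sorted(freq_table_pct(rows, 1))
# Games  :  50.0
# News  :  25.0
# Book  :  25.0
```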

In [17]:

sort(ios_final, 11)
Games  :  58.13664596273293
Entertainment  :  7.888198757763975
Photo & Video  :  4.968944099378882
Education  :  3.6645962732919255
Social Networking  :  3.291925465838509
Shopping  :  2.608695652173913
Utilities  :  2.515527950310559
Sports  :  2.142857142857143
Music  :  2.049689440993789
Health & Fitness  :  2.018633540372671
Productivity  :  1.7391304347826086
Lifestyle  :  1.5838509316770186
News  :  1.3354037267080745
Travel  :  1.2422360248447204
Finance  :  1.1180124223602486
Weather  :  0.8695652173913043
Food & Drink  :  0.8074534161490683
Reference  :  0.5590062111801243
Business  :  0.5279503105590062
Book  :  0.43478260869565216
Navigation  :  0.18633540372670807
Medical  :  0.18633540372670807
Catalogs  :  0.12422360248447205

As we see, more than half (58%) of the free English apps belong to the Games category, followed by Entertainment at a distant second (~8%). Photo & Video takes third place with ~5%.

So about 71% of the apps belong to fun categories (Games, Entertainment, Photo & Video).
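The combined share of these three genres is simply the sum of their percentages in the App Store frequency table above:

```python
# Percentages copied from the App Store frequency table output above
top_three = {
    "Games": 58.13664596273293,
    "Entertainment": 7.888198757763975,
    "Photo & Video": 4.968944099378882,
}
print(round(sum(top_three.values()), 1))  # 71.0
```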

Now let's take a look at the Google Play market.

In [18]:

sort(android_final, 9)
Tools  :  8.449909747292418
Entertainment  :  6.069494584837545
Education  :  5.347472924187725
Business  :  4.591606498194946
Productivity  :  3.892148014440433
Lifestyle  :  3.892148014440433
Finance  :  3.7003610108303246
Medical  :  3.531137184115524
Sports  :  3.463447653429603
Personalization  :  3.3167870036101084
Communication  :  3.2378158844765346
Action  :  3.1024368231046933
Health & Fitness  :  3.0798736462093865
Photography  :  2.944494584837545
News & Magazines  :  2.7978339350180503
Social  :  2.6624548736462095
Travel & Local  :  2.3240072202166067
Shopping  :  2.2450361010830324
Books & Reference  :  2.1435018050541514
Simulation  :  2.0419675090252705
Dating  :  1.861462093862816
Arcade  :  1.8501805054151623
Video Players & Editors  :  1.7712093862815883
Casual  :  1.7599277978339352
Maps & Navigation  :  1.3989169675090252
Food & Drink  :  1.2409747292418771
Puzzle  :  1.128158844765343
Racing  :  0.9927797833935018
Role Playing  :  0.9363718411552346
Libraries & Demo  :  0.9363718411552346
Auto & Vehicles  :  0.9250902527075812
Strategy  :  0.9138086642599278
House & Home  :  0.8235559566787004
Weather  :  0.8009927797833934
Events  :  0.7107400722021661
Adventure  :  0.6768953068592057
Comics  :  0.6092057761732852
Beauty  :  0.5979241877256317
Art & Design  :  0.5979241877256317
Parenting  :  0.4963898916967509
Card  :  0.45126353790613716
Casino  :  0.42870036101083037
Trivia  :  0.41741877256317694
Educational;Education  :  0.39485559566787
Board  :  0.3835740072202166
Educational  :  0.3722924187725632
Education;Education  :  0.33844765342960287
Word  :  0.2594765342960289
Casual;Pretend Play  :  0.236913357400722
Music  :  0.2030685920577617
Racing;Action & Adventure  :  0.16922382671480143
Puzzle;Brain Games  :  0.16922382671480143
Entertainment;Music & Video  :  0.16922382671480143
Casual;Brain Games  :  0.13537906137184114
Casual;Action & Adventure  :  0.13537906137184114
Arcade;Action & Adventure  :  0.12409747292418773
Action;Action & Adventure  :  0.10153429602888085
Educational;Pretend Play  :  0.09025270758122744
Simulation;Action & Adventure  :  0.078971119133574
Parenting;Education  :  0.078971119133574
Entertainment;Brain Games  :  0.078971119133574
Board;Brain Games  :  0.078971119133574
Parenting;Music & Video  :  0.06768953068592057
Educational;Brain Games  :  0.06768953068592057
Casual;Creativity  :  0.06768953068592057
Art & Design;Creativity  :  0.06768953068592057
Education;Pretend Play  :  0.056407942238267145
Role Playing;Pretend Play  :  0.04512635379061372
Education;Creativity  :  0.04512635379061372
Role Playing;Action & Adventure  :  0.033844765342960284
Puzzle;Action & Adventure  :  0.033844765342960284
Entertainment;Creativity  :  0.033844765342960284
Entertainment;Action & Adventure  :  0.033844765342960284
Educational;Creativity  :  0.033844765342960284
Educational;Action & Adventure  :  0.033844765342960284
Education;Music & Video  :  0.033844765342960284
Education;Brain Games  :  0.033844765342960284
Education;Action & Adventure  :  0.033844765342960284
Adventure;Action & Adventure  :  0.033844765342960284
Video Players & Editors;Music & Video  :  0.02256317689530686
Sports;Action & Adventure  :  0.02256317689530686
Simulation;Pretend Play  :  0.02256317689530686
Puzzle;Creativity  :  0.02256317689530686
Music;Music & Video  :  0.02256317689530686
Entertainment;Pretend Play  :  0.02256317689530686
Casual;Education  :  0.02256317689530686
Board;Action & Adventure  :  0.02256317689530686
Video Players & Editors;Creativity  :  0.01128158844765343
Trivia;Education  :  0.01128158844765343
Travel & Local;Action & Adventure  :  0.01128158844765343
Tools;Education  :  0.01128158844765343
Strategy;Education  :  0.01128158844765343
Strategy;Creativity  :  0.01128158844765343
Strategy;Action & Adventure  :  0.01128158844765343
Simulation;Education  :  0.01128158844765343
Role Playing;Brain Games  :  0.01128158844765343
Racing;Pretend Play  :  0.01128158844765343
Puzzle;Education  :  0.01128158844765343
Parenting;Brain Games  :  0.01128158844765343
Music & Audio;Music & Video  :  0.01128158844765343
Lifestyle;Pretend Play  :  0.01128158844765343
Lifestyle;Education  :  0.01128158844765343
Health & Fitness;Education  :  0.01128158844765343
Health & Fitness;Action & Adventure  :  0.01128158844765343
Entertainment;Education  :  0.01128158844765343
Communication;Creativity  :  0.01128158844765343
Comics;Creativity  :  0.01128158844765343
Casual;Music & Video  :  0.01128158844765343
Card;Action & Adventure  :  0.01128158844765343
Books & Reference;Education  :  0.01128158844765343
Art & Design;Pretend Play  :  0.01128158844765343
Art & Design;Action & Adventure  :  0.01128158844765343
Arcade;Pretend Play  :  0.01128158844765343
Adventure;Education  :  0.01128158844765343

Unlike the App Store, which is dominated by fun apps, Google Play has a more balanced landscape, with both fun apps (Entertainment, Lifestyle) and apps designed for practical purposes (Tools, Education, Business, Productivity) near the top.

However, on both Google Play and the App Store, the genres with the largest share of apps are not necessarily the genres with the most users (downloads): demand may differ from supply.

Apps with the Most Users

Now let's figure out which genres of apps have the most users.

The App Store data set has no column for the number of users or downloads. Instead, we will use the column at index 5 (rating_count_tot), the total number of user ratings, as a rough proxy for popularity, and compute its average for each genre.

In [19]:

## Finding Apps of Genre with most users on App Store
## most_apps: genre -> running average of rating_count_tot (index 5)
## temp_add:  genre -> number of apps counted so far
most_apps = {}
temp_add = {}
for each_app in ios_final:
    rating_count = float(each_app[5])
    genre = each_app[11]
    if genre in most_apps:
        # Update the running average: undo the division, add the new value,
        # then divide by the new count
        most_apps[genre] *= temp_add[genre]
        most_apps[genre] += rating_count
        temp_add[genre] += 1
        most_apps[genre] /= temp_add[genre]
    else:
        most_apps[genre] = rating_count
        temp_add[genre] = 1

## Sort genres by average rating count, highest first
as_list = []
for each in most_apps:
    temp_list = [most_apps[each], each]
    as_list.append(temp_list)

for each_app in sorted(as_list, reverse = True):
    print(each_app[1], " : ", each_app[0])
Navigation  :  86090.33333333333
Reference  :  74942.11111111111
Social Networking  :  71548.34905660385
Music  :  57326.5303030303
Weather  :  52279.892857142855
Book  :  39758.5
Food & Drink  :  33333.92307692309
Finance  :  31467.944444444445
Photo & Video  :  28441.543749999993
Travel  :  28243.8
Shopping  :  26919.690476190477
Health & Fitness  :  23298.015384615377
Sports  :  23008.898550724636
Games  :  22812.92467948712
News  :  21248.023255813947
Productivity  :  21028.410714285714
Utilities  :  18684.456790123462
Lifestyle  :  16485.764705882346
Entertainment  :  14029.830708661419
Business  :  7491.117647058824
Education  :  7003.983050847459
Catalogs  :  4004.0
Medical  :  612.0
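The cell above keeps a running average by multiplying back, adding and dividing on every row. That works, but summing first and dividing once per genre is easier to read. A minimal sketch using the standard library's `defaultdict`; `mean_by_genre` is a hypothetical helper, and the sample rows are trimmed to three columns (name, genre, rating count) for brevity.

```python
from collections import defaultdict


def mean_by_genre(rows, genre_idx, count_idx):
    """Average of the numeric column `count_idx` for each value of `genre_idx`."""
    totals = defaultdict(float)  # genre -> sum of counts
    counts = defaultdict(int)    # genre -> number of apps
    for row in rows:
        genre = row[genre_idx]
        totals[genre] += float(row[count_idx])
        counts[genre] += 1
    # Divide once per genre at the end instead of after every row
    return {genre: totals[genre] / counts[genre] for genre in totals}


# Trimmed sample rows: [name, genre, rating count]
rows = [
    ["Waze", "Navigation", "345046"],
    ["Google Maps", "Navigation", "154911"],
    ["Bible", "Reference", "985920"],
]
print(mean_by_genre(rows, genre_idx=1, count_idx=2))
# {'Navigation': 249978.5, 'Reference': 985920.0}
```

On the real data, `mean_by_genre(ios_final, 11, 5)` should give the same averages as the cell above, possibly differing in the last few decimal places because of floating-point rounding.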

Navigation apps have the highest average number of ratings. However, on looking further, we can see that this average is driven by Waze and Google Maps, which together account for more than 95% of the genre's total ratings.

In [20]:

for each in ios_final:
    if each[11] == "Navigation":
        print(each[1], " :", each[5])
        print("\n")
Waze - GPS Navigation, Maps & Real-time Traffic  : 345046


Google Maps - Navigation & Transit  : 154911


Geocaching®  : 12811


CoPilot GPS – Car Navigation & Offline Maps  : 3582


ImmobilienScout24: Real Estate Search in Germany  : 187


Railway Route Search  : 5
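One way to quantify how much two apps dominate the genre is to compare the mean with the median and compute the top two apps' share of all ratings. The sketch below uses the six counts printed above; the median is a swapped-in robust statistic that the original analysis does not use.

```python
from statistics import mean, median

# Rating counts of the six free English Navigation apps, copied from the output above
# (Waze, Google Maps, Geocaching, CoPilot GPS, ImmobilienScout24, Railway Route Search)
counts = [345046, 154911, 12811, 3582, 187, 5]

top_two = sorted(counts, reverse=True)[:2]
top_two_share = sum(top_two) / sum(counts) * 100

print(f"mean:        {mean(counts):.2f}")    # 86090.33, the Navigation average above
print(f"median:      {median(counts):.1f}")  # 8196.5
print(f"top-2 share: {top_two_share:.1f}%")  # 96.8%
```

The median (about 8,200 ratings) is roughly a tenth of the mean (about 86,000), which is the skew described for the Navigation genre.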


We notice the same pattern in the Social Networking and Music genres: Social Networking is dominated by Facebook and Pinterest, and Music by Pandora and Spotify. It would be very hard for a freelancer to compete with these giants and succeed, since they are highly successful products with millions of active users. The same goes for the Food & Drink and Finance categories, which are also dominated by a few big players.

The Reference category has about 74,942 ratings per app on average. Looking further, the Bible and Dictionary.com apps top the list with the most ratings.

In [21]:

for each in ios_final:
    if each[11] == "Reference":
        print(each[1], " :", each[5])
        print("\n")
Bible  : 985920


Dictionary.com Dictionary & Thesaurus  : 200047


Dictionary.com Dictionary & Thesaurus for iPad  : 54175


Google Translate  : 26786


Muslim Pro: Ramadan 2017 Prayer Times, Azan, Quran  : 18418


New Furniture Mods - Pocket Wiki & Game Tools for Minecraft PC Edition  : 17588


Merriam-Webster Dictionary  : 16849


Night Sky  : 12122


City Maps for Minecraft PE - The Best Maps for Minecraft Pocket Edition (MCPE)  : 8535


LUCKY BLOCK MOD ™ for Minecraft PC Edition - The Best Pocket Wiki & Mods Installer Tools  : 4693


GUNS MODS for Minecraft PC Edition - Mods Tools  : 1497


Guides for Pokémon GO - Pokemon GO News and Cheats  : 826


WWDC  : 762


Horror Maps for Minecraft PE - Download The Scariest Maps for Minecraft Pocket Edition (MCPE) Free  : 718


VPN Express  : 14


Real Bike Traffic Rider Virtual Reality Glasses  : 8


教えて!goo  : 0


Jishokun-Japanese English Dictionary & Translator  : 0


This category looks promising. One idea is to take one or more popular books and turn them into an app, adding features such as quizzes, quotes, links to news about the book on the web, or an audio version of the book.

Another interesting idea is to build a dictionary into the app itself, so users don't have to leave the app to look up the meaning of a word.

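Now let's find the Google Play genres with the most users. Unlike the App Store data set, Google Play has an Installs column (index 5), so we can use the number of installs directly.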
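The Google Play Installs values are strings such as "10,000+" or "5,000,000+", so they must be converted before averaging. A minimal sketch of that conversion as a standalone helper; `parse_installs` is a hypothetical name. Because each value is an open-ended bin ("at least this many"), the result is a lower bound, so averages built on it should be read as rough indicators rather than exact install counts.

```python
def parse_installs(value):
    """Convert a Google Play Installs string such as "10,000+" to a number.

    The values are open-ended bins, so the result is a lower bound
    (at least this many installs), not an exact count.
    """
    return float(value.replace(",", "").replace("+", ""))


print(parse_installs("10,000+"))     # 10000.0
print(parse_installs("5,000,000+"))  # 5000000.0
```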

In [22]:

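## Finding Apps of Genre with most users on Google Play
## most_apps: genre -> running average of installs (Installs, index 5)
## temp_add:  genre -> number of apps counted so far
most_apps = {}
temp_add = {}
for each_app in android_final:
    # Installs values look like "10,000+": strip "+" and "," before converting.
    # Each value is the lower bound of an install bin, not an exact count.
    installs = each_app[5]
    installs = installs.replace("+", "")
    installs = installs.replace(",", "")
    installs = float(installs)
    genre = each_app[9]
    if genre in most_apps:
        # Update the running average: undo the division, add the new value,
        # then divide by the new count
        most_apps[genre] *= temp_add[genre]
        most_apps[genre] += installs
        temp_add[genre] += 1
        most_apps[genre] /= temp_add[genre]
    else:
        most_apps[genre] = installs
        temp_add[genre] = 1

## Sort genres by average installs, highest first
as_list = []
for each in most_apps:
    temp_list = [most_apps[each], each]
    as_list.append(temp_list)

for each_app in sorted(as_list, reverse = True):
    print(each_app[1], " : ", each_app[0])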
Communication  :  38456119.16724743
Adventure;Action & Adventure  :  35333333.333333336
Video Players & Editors  :  24947335.796178345
Social  :  23253652.12711863
Arcade  :  22888365.48780488
Casual  :  19569221.602564108
Puzzle;Action & Adventure  :  18366666.666666668
Photography  :  17840110.40229885
Educational;Action & Adventure  :  17016666.666666668
Productivity  :  16787331.344927523
Racing  :  15910645.681818193
Travel & Local  :  14051476.145631064
Casual;Action & Adventure  :  12916666.666666666
Action  :  12603588.872727277
Strategy  :  11199902.5308642
Tools  :  10802461.246996008
Tools;Education  :  10000000.0
Role Playing;Brain Games  :  10000000.0
Lifestyle;Pretend Play  :  10000000.0
Casual;Music & Video  :  10000000.0
Card;Action & Adventure  :  10000000.0
Adventure;Education  :  10000000.0
News & Magazines  :  9549178.467741935
Music  :  9445583.333333334
Educational;Pretend Play  :  9375000.0
Puzzle;Brain Games  :  9280666.666666666
Word  :  9094458.695652174
Racing;Action & Adventure  :  8816666.666666666
Books & Reference  :  8767811.894736838
Puzzle  :  8302861.910000001
Video Players & Editors;Music & Video  :  7500000.0
Shopping  :  7036877.311557789
Role Playing;Action & Adventure  :  7000000.0
Casual;Pretend Play  :  6957142.857142857
Entertainment;Music & Video  :  6413333.333333333
Action;Action & Adventure  :  5888888.888888889
Entertainment  :  5602792.775092941
Education;Brain Games  :  5333333.333333333
Casual;Creativity  :  5333333.333333333
Role Playing;Pretend Play  :  5275000.0
Personalization  :  5201482.612244898
Weather  :  5074486.197183099
Sports;Action & Adventure  :  5050000.0
Music;Music & Video  :  5050000.0
Video Players & Editors;Creativity  :  5000000.0
Adventure  :  4922785.333333333
Simulation;Action & Adventure  :  4857142.857142857
Education;Education  :  4759517.0
Board  :  4759209.117647059
Sports  :  4596842.615635181
Educational;Brain Games  :  4433333.333333333
Health & Fitness  :  4188821.9853479844
Maps & Navigation  :  4056941.7741935495
Entertainment;Creativity  :  4000000.0
Role Playing  :  3965645.421686747
Card  :  3815462.5
Trivia  :  3475712.7027027025
Simulation  :  3475484.0883977907
Casino  :  3427910.5263157897
Entertainment;Brain Games  :  3314285.714285714
Arcade;Action & Adventure  :  3190909.1818181816
Entertainment;Pretend Play  :  3000000.0
Board;Action & Adventure  :  3000000.0
Education;Creativity  :  2875000.0
Entertainment;Action & Adventure  :  2333333.3333333335
Educational;Creativity  :  2333333.3333333335
Art & Design  :  2122850.9433962265
Education;Music & Video  :  2033333.3333333333
Food & Drink  :  1924897.736363638
Education;Pretend Play  :  1800000.0
Educational;Education  :  1737143.142857143
Business  :  1712290.1474201486
Casual;Brain Games  :  1425916.6666666667
Lifestyle  :  1412998.3449275375
Finance  :  1387692.475609757
House & Home  :  1331540.5616438356
Parenting;Music & Video  :  1118333.3333333333
Strategy;Creativity  :  1000000.0
Strategy;Action & Adventure  :  1000000.0
Racing;Pretend Play  :  1000000.0
Parenting;Brain Games  :  1000000.0
Health & Fitness;Action & Adventure  :  1000000.0
Entertainment;Education  :  1000000.0
Education;Action & Adventure  :  1000000.0
Casual;Education  :  1000000.0
Arcade;Pretend Play  :  1000000.0
Dating  :  854028.8303030301
Comics  :  831873.1481481482
Puzzle;Creativity  :  750000.0
Auto & Vehicles  :  647317.8170731709
Libraries & Demo  :  638503.7349397589
Education  :  550185.4430379759
Simulation;Pretend Play  :  550000.0
Beauty  :  513151.8867924528
Strategy;Education  :  500000.0
Music & Audio;Music & Video  :  500000.0
Communication;Creativity  :  500000.0
Art & Design;Pretend Play  :  500000.0
Parenting  :  467977.5
Parenting;Education  :  452857.14285714284
Educational  :  411184.8484848485
Board;Brain Games  :  407142.85714285716
Art & Design;Creativity  :  285000.0
Events  :  253542.22222222234
Medical  :  120550.61980830679
Travel & Local;Action & Adventure  :  100000.0
Puzzle;Education  :  100000.0
Lifestyle;Education  :  100000.0
Health & Fitness;Education  :  100000.0
Art & Design;Action & Adventure  :  100000.0
Comics;Creativity  :  50000.0
Books & Reference;Education  :  1000.0
Simulation;Education  :  500.0
Trivia;Education  :  100.0
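The cell that produced the sorted averages above is not part of this excerpt. The sketch below shows one way such a table could be built. It assumes each row of `android_final` is a list of strings with the Installs value at index 5 and the Genres value at index 9, which are the indexes the cells below use. The helper names and the tiny `sample` rows are hypothetical.

```python
from collections import defaultdict

# Column positions used by the cells in this notebook (Installs, Genres)
INSTALLS_COL = 5
GENRES_COL = 9


def installs_to_number(value):
    """Convert an Installs string such as '1,000,000+' to a float."""
    return float(value.replace("+", "").replace(",", ""))


def average_installs_by_genre(rows):
    """Return (genre, average installs) pairs, highest average first."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        genre = row[GENRES_COL]
        totals[genre] += installs_to_number(row[INSTALLS_COL])
        counts[genre] += 1
    averages = {genre: totals[genre] / counts[genre] for genre in totals}
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)


def make_row(name, installs, genre):
    """Build a hypothetical 10-column row with only name, Installs and Genres set."""
    row = [""] * 10
    row[0], row[INSTALLS_COL], row[GENRES_COL] = name, installs, genre
    return row


sample = [
    make_row("App A", "1,000,000+", "Arcade"),
    make_row("App B", "3,000,000+", "Arcade"),
    make_row("App C", "500,000+", "Puzzle"),
]
for genre, average in average_installs_by_genre(sample):
    print(genre, " : ", average)
```

Using two dictionaries, one for running totals and one for counts, keeps this to a single pass over the rows.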

Communication tops the list with an average of 38 million installs. However, as in the App Store market, most of these installs belong to a few giants such as WhatsApp Messenger, Skype and Messenger, each of which has more than 1 billion installs.

In [23]:

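for each in android_final:
    if each[9] == "Communication":
        # Convert the Installs string (e.g. "1,000,000,000+") to a number
        installs = each[5]
        installs = installs.replace("+", "")
        installs = installs.replace(",", "")
        installs = float(installs)
        # List only the apps with more than 100 million installs
        if installs > 100000000:
            print(each[0], " : ", each[5])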
WhatsApp Messenger  :  1,000,000,000+
Google Duo - High Quality Video Calls  :  500,000,000+
Messenger – Text and Video Chat for Free  :  1,000,000,000+
imo free video calls and chat  :  500,000,000+
Skype - free IM & video calls  :  1,000,000,000+
LINE: Free Calls & Messages  :  500,000,000+
Google Chrome: Fast & Secure  :  1,000,000,000+
UC Browser - Fast Download Private & Secure  :  500,000,000+
Gmail  :  1,000,000,000+
Hangouts  :  1,000,000,000+
Viber Messenger  :  500,000,000+
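This notebook repeats the same parse-then-filter loop for several genres. A small reusable helper could look like the sketch below. `apps_above` and `make_row` are hypothetical names. The helper keeps the cell's strict `>` comparison, so an app listed as exactly "100,000,000+" is not included.

```python
def installs_to_number(value):
    """Convert an Installs string such as '1,000,000,000+' to a float."""
    return float(value.replace("+", "").replace(",", ""))


def apps_above(rows, genre, min_installs):
    """Return (name, Installs) pairs for apps in `genre` with strictly more
    than `min_installs` installs, largest first."""
    matches = [
        (row[0], row[5])
        for row in rows
        if row[9] == genre and installs_to_number(row[5]) > min_installs
    ]
    return sorted(matches, key=lambda match: installs_to_number(match[1]), reverse=True)


def make_row(name, installs, genre):
    """Build a hypothetical 10-column row with only name, Installs and Genres set."""
    row = [""] * 10
    row[0], row[5], row[9] = name, installs, genre
    return row


sample = [
    make_row("Chat A", "500,000,000+", "Communication"),
    make_row("Chat B", "1,000,000,000+", "Communication"),
    make_row("Chat C", "100,000,000+", "Communication"),
    make_row("Player D", "1,000,000,000+", "Video Players & Editors"),
]
for name, installs in apps_above(sample, "Communication", 100000000):
    print(name, " : ", installs)
```

With the real data, a call such as `apps_above(android_final, "Video Players & Editors", 100000000)` would cover the later Video Players & Editors cell as well.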

So Communication apps seem more popular than they really are. If we exclude the Communication apps with 10 million or more installs, the average drops to roughly 0.75 million.

In [24]:

#average installs of Communication apps, excluding apps with 10 million or more installs
installs_list = []
for each in android_final:
    if each[9] == "Communication":
        installs = each[5]
        installs = installs.replace("+", "")
        installs = installs.replace(",", "")
        installs = float(installs)
        if installs < 10000000:
            installs_list.append(installs)

print("Average installs of communication apps after excluding apps with 10 million or more installs: ", sum(installs_list) / len(installs_list))
Average installs of communication apps after excluding apps with 10 million or more installs:  747172.3857142857
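Dropping apps above a cutoff works, but the result depends on where the cutoff is drawn. One common alternative, which this analysis does not use, is to report the median. A handful of billion-install apps barely moves the median. The numbers in this sketch are made up for illustration.

```python
from statistics import mean, median


def install_summary(install_counts, cap=None):
    """Return (mean, median) of install counts, optionally keeping only
    values below `cap` (mirroring the < 10 million filter above)."""
    values = [v for v in install_counts if cap is None or v < cap]
    return mean(values), median(values)


# Hypothetical install counts: four small apps and one giant
installs = [10_000, 50_000, 100_000, 500_000, 1_000_000_000]

print("all apps:      mean=%s median=%s" % install_summary(installs))
print("below 10M cap: mean=%s median=%s" % install_summary(installs, cap=10_000_000))
```

Here the mean falls from about 200 million to 165,000 once the giant is dropped, while the median only moves from 100,000 to 75,000.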

The genre after Communication is Adventure;Action & Adventure, which has only 3 apps. That is too few to tell us much, so we skip it.

In [25]:

for each in android_final:
    if each[9] == "Adventure;Action & Adventure":
        print(each[0], " : ", each[5])
Leo and Tig  :  1,000,000+
Transformers Rescue Bots: Hero Adventures  :  5,000,000+
ROBLOX  :  100,000,000+
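Skipping Adventure;Action & Adventure is a judgement about sample size. One way to make that judgement explicit is to count apps per genre and flag any genre below a minimum count before looking at its average. The threshold of 5 used here is an arbitrary assumption, not a value from the original analysis. `small_genres` and `make_row` are hypothetical names.

```python
from collections import Counter


def small_genres(rows, min_apps):
    """Return the genres (index 9) that have fewer than `min_apps` apps."""
    counts = Counter(row[9] for row in rows)
    return sorted(genre for genre, n in counts.items() if n < min_apps)


def make_row(name, genre):
    """Build a hypothetical 10-column row with only the name and Genres set."""
    row = [""] * 10
    row[0], row[9] = name, genre
    return row


sample = [make_row("Game %d" % i, "Arcade") for i in range(6)]
sample += [
    make_row("Leo and Tig", "Adventure;Action & Adventure"),
    make_row("Transformers Rescue Bots: Hero Adventures", "Adventure;Action & Adventure"),
    make_row("ROBLOX", "Adventure;Action & Adventure"),
]
print(small_genres(sample, min_apps=5))
```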

The other top genres in the list follow a similar pattern to Communication: a few giants in the field account for almost all of the installs. For example, YouTube and Google Play Movies & TV each have more than 1 billion installs in the Video Players & Editors genre.

In [26]:

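for each in android_final:
    if each[9] == "Video Players & Editors":
        # Convert the Installs string (e.g. "1,000,000,000+") to a number
        installs = each[5]
        installs = installs.replace("+", "")
        installs = installs.replace(",", "")
        installs = float(installs)
        # List only the apps with more than 100 million installs
        if installs > 100000000:
            print(each[0], " : ", each[5])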
YouTube  :  1,000,000,000+
Google Play Movies & TV  :  1,000,000,000+
MX Player  :  500,000,000+
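The "few giants" pattern can be quantified as the share of a genre's installs held by its top k apps. Below is a minimal sketch that assumes the same row layout as the cells above. The Installs column only records lower bounds such as "1,000,000,000+", so the share is an approximation. `top_share` and the sample rows are hypothetical.

```python
def installs_to_number(value):
    """Convert an Installs string such as '1,000,000+' to a float."""
    return float(value.replace("+", "").replace(",", ""))


def top_share(rows, genre, k):
    """Fraction of a genre's total installs held by its k most-installed apps."""
    values = sorted(
        (installs_to_number(row[5]) for row in rows if row[9] == genre),
        reverse=True,
    )
    total = sum(values)
    if total == 0:
        return 0.0
    return sum(values[:k]) / total


def make_row(name, installs, genre):
    """Build a hypothetical 10-column row with only name, Installs and Genres set."""
    row = [""] * 10
    row[0], row[5], row[9] = name, installs, genre
    return row


genre = "Video Players & Editors"
sample = [
    make_row("Player 1", "1,000,000,000+", genre),
    make_row("Player 2", "1,000,000,000+", genre),
    make_row("Player 3", "1,000,000+", genre),
    make_row("Player 4", "100,000+", genre),
]
print("Top 2 share: %.4f" % top_share(sample, genre, 2))
```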

The Books & Reference genre averages nearly 9 million installs. We would like to explore it in detail because we found that this genre has some potential in the App Store, and we want to see how it fares in the Google Play market.

In [27]:

for each in android_final:
    if each[9] == "Books & Reference":
        print(each[0], " : ", each[5])
E-Book Read - Read Book for free  :  50,000+
Download free book with green book  :  100,000+
Wikipedia  :  10,000,000+
Cool Reader  :  10,000,000+
Free Panda Radio Music  :  100,000+
Book store  :  1,000,000+
FBReader: Favorite Book Reader  :  10,000,000+
English Grammar Complete Handbook  :  500,000+
Free Books - Spirit Fanfiction and Stories  :  1,000,000+
Google Play Books  :  1,000,000,000+
AlReader -any text book reader  :  5,000,000+
Offline English Dictionary  :  100,000+
Offline: English to Tagalog Dictionary  :  500,000+
FamilySearch Tree  :  1,000,000+
Cloud of Books  :  1,000,000+
Recipes of Prophetic Medicine for free  :  500,000+
ReadEra – free ebook reader  :  1,000,000+
Anonymous caller detection  :  10,000+
Ebook Reader  :  5,000,000+
Litnet - E-books  :  100,000+
Read books online  :  5,000,000+
English to Urdu Dictionary  :  500,000+
eBoox: book reader fb2 epub zip  :  1,000,000+
English Persian Dictionary  :  500,000+
Flybook  :  500,000+
All Maths Formulas  :  1,000,000+
Ancestry  :  5,000,000+
HTC Help  :  10,000,000+
English translation from Bengali  :  100,000+
Pdf Book Download - Read Pdf Book  :  100,000+
Free Book Reader  :  100,000+
eBoox new: Reader for fb2 epub zip books  :  50,000+
Only 30 days in English, the guideline is guaranteed  :  500,000+
Moon+ Reader  :  10,000,000+
SH-02J Owner's Manual (Android 8.0)  :  50,000+
English-Myanmar Dictionary  :  1,000,000+
Golden Dictionary (EN-AR)  :  1,000,000+
All Language Translator Free  :  1,000,000+
Azpen eReader  :  500,000+
URBANO V 02 instruction manual  :  100,000+
Bible  :  100,000,000+
C Programs and Reference  :  50,000+
C Offline Tutorial  :  1,000+
C Programs Handbook  :  50,000+
Amazon Kindle  :  100,000,000+
Aab e Hayat Full Novel  :  100,000+
Aldiko Book Reader  :  10,000,000+
Google I/O 2018  :  500,000+
R Language Reference Guide  :  10,000+
Learn R Programming Full  :  5,000+
R Programing Offline Tutorial  :  1,000+
Guide for R Programming  :  5+
Learn R Programming  :  10+
R Quick Reference Big Data  :  1,000+
V Made  :  100,000+
Wattpad 📖 Free Books  :  100,000,000+
Dictionary - WordWeb  :  5,000,000+
Guide (for X-MEN)  :  100,000+
AC Air condition Troubleshoot,Repair,Maintenance  :  5,000+
AE Bulletins  :  1,000+
Ae Allah na Dai (Rasa)  :  10,000+
50000 Free eBooks & Free AudioBooks  :  5,000,000+
Ag PhD Field Guide  :  10,000+
Ag PhD Deficiencies  :  10,000+
Ag PhD Planting Population Calculator  :  1,000+
Ag PhD Soybean Diseases  :  1,000+
Fertilizer Removal By Crop  :  50,000+
A-J Media Vault  :  50+
Al-Quran (Free)  :  10,000,000+
Al Quran (Tafsir & by Word)  :  500,000+
Al Quran Indonesia  :  10,000,000+
Al'Quran Bahasa Indonesia  :  10,000,000+
Al Quran Al karim  :  1,000,000+
Al-Muhaffiz  :  50,000+
Al Quran : EAlim - Translations & MP3 Offline  :  5,000,000+
Al-Quran 30 Juz free copies  :  500,000+
Koran Read &MP3 30 Juz Offline  :  1,000,000+
Hafizi Quran 15 lines per page  :  1,000,000+
Quran for Android  :  10,000,000+
Surah Al-Waqiah  :  100,000+
Hisnul Al Muslim - Hisn Invocations & Adhkaar  :  100,000+
Satellite AR  :  1,000,000+
Audiobooks from Audible  :  100,000,000+
Kinot & Eichah for Tisha B'Av  :  10,000+
AW Tozer Devotionals - Daily  :  5,000+
Tozer Devotional -Series 1  :  1,000+
The Pursuit of God  :  1,000+
AY Sing  :  5,000+
Ay Hasnain k Nana Milad Naat  :  10,000+
Ay Mohabbat Teri Khatir Novel  :  10,000+
Arizona Statutes, ARS (AZ Law)  :  1,000+
Oxford A-Z of English Usage  :  1,000,000+
BD Fishpedia  :  1,000+
BD All Sim Offer  :  10,000+
Youboox - Livres, BD et magazines  :  500,000+
B&H Kids AR  :  10,000+
B y H Niños ES  :  5,000+
Dictionary.com: Find Definitions for English Words  :  10,000,000+
English Dictionary - Offline  :  10,000,000+
Bible KJV  :  5,000,000+
Borneo Bible, BM Bible  :  10,000+
MOD Black for BM  :  100+
BM Box  :  1,000+
Anime Mod for BM  :  100+
NOOK: Read eBooks & Magazines  :  10,000,000+
NOOK Audiobooks  :  500,000+
NOOK App for NOOK Devices  :  500,000+
Browsery by Barnes & Noble  :  5,000+
bp e-store  :  1,000+
Brilliant Quotes: Life, Love, Family & Motivation  :  1,000,000+
BR Ambedkar Biography & Quotes  :  10,000+
BU Alsace  :  100+
Catholic La Bu Zo Kam  :  500+
Khrifa Hla Bu (Solfa)  :  10+
Kristian Hla Bu  :  10,000+
SA HLA BU  :  1,000+
Learn SAP BW  :  500+
Learn SAP BW on HANA  :  500+
CA Laws 2018 (California Laws and Codes)  :  5,000+
Bootable Methods(USB-CD-DVD)  :  10,000+
cloudLibrary  :  100,000+
SDA Collegiate Quarterly  :  500+
Sabbath School  :  100,000+
Cypress College Library  :  100+
Stats Royale for Clash Royale  :  1,000,000+
GATE 21 years CS Papers(2011-2018 Solved)  :  50+
Learn CT Scan Of Head  :  5,000+
Easy Cv maker 2018  :  10,000+
How to Write CV  :  100,000+
CW Nuclear  :  1,000+
CY Spray nozzle  :  10+
BibleRead En Cy Zh Yue  :  5+
CZ-Help  :  5+
Modlitební knížka CZ  :  500+
Guide for DB Xenoverse  :  10,000+
Guide for DB Xenoverse 2  :  10,000+
Guide for IMS DB  :  10+
DC HSEMA  :  5,000+
DC Public Library  :  1,000+
Painting Lulu DC Super Friends  :  1,000+
Dictionary  :  10,000,000+
Fix Error Google Playstore  :  1,000+
D. H. Lawrence Poems FREE  :  1,000+
Bilingual Dictionary Audio App  :  5,000+
DM Screen  :  10,000+
wikiHow: how to do anything  :  1,000,000+
Dr. Doug's Tips  :  1,000+
Bible du Semeur-BDS (French)  :  50,000+
La citadelle du musulman  :  50,000+
DV 2019 Entry Guide  :  10,000+
DV 2019 - EDV Photo & Form  :  50,000+
DV 2018 Winners Guide  :  1,000+
EB Annual Meetings  :  1,000+
EC - AP & Telangana  :  5,000+
TN Patta Citta & EC  :  10,000+
AP Stamps and Registration  :  10,000+
CompactiMa EC pH Calibration  :  100+
EGW Writings 2  :  100,000+
EGW Writings  :  1,000,000+
Bible with EGW Comments  :  100,000+
My Little Pony AR Guide  :  1,000,000+
SDA Sabbath School Quarterly  :  500,000+
Duaa Ek Ibaadat  :  5,000+
Spanish English Translator  :  10,000,000+
Dictionary - Merriam-Webster  :  10,000,000+
JW Library  :  10,000,000+
Oxford Dictionary of English : Free  :  10,000,000+
English Hindi Dictionary  :  10,000,000+
English to Hindi Dictionary  :  5,000,000+
EP Research Service  :  1,000+
Hymnes et Louanges  :  100,000+
EU Charter  :  1,000+
EU Data Protection  :  1,000+
EU IP Codes  :  100+
EW PDF  :  5+
BakaReader EX  :  100,000+
EZ Quran  :  50,000+
FA Part 1 & 2 Past Papers Solved Free – Offline  :  5,000+
La Fe de Jesus  :  1,000+
La Fe de Jesús  :  500+
Le Fe de Jesus  :  500+
Florida - Pocket Brainbook  :  1,000+
Florida Statutes (FL Code)  :  1,000+
English To Shona Dictionary  :  10,000+
Greek Bible FP (Audio)  :  1,000+
Golden Dictionary (FR-AR)  :  500,000+
Fanfic-FR  :  5,000+
Bulgarian French Dictionary Fr  :  10,000+
Chemin (fr)  :  1,000+
The SCP Foundation DB fr nn5n  :  1,000+

The Books & Reference genre covers a wide variety of apps, such as e-book readers, dictionaries and programming-language references.


We also see successful apps built around the Bible and the Quran, which suggests that turning a recent popular book into an app could be profitable.
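To get a rough picture of that variety, including the Bible and Quran apps, app names can be bucketed by keyword and their installs summed per bucket. The keyword lists below are hypothetical and were picked by eye from the listing. Simple substring matching will misfile some names. The sample (name, installs) pairs are taken from the output above.

```python
from collections import defaultdict

# Hypothetical keyword lists; the first matching category wins
CATEGORY_KEYWORDS = {
    "religious text": ["bible", "quran", "koran"],
    "dictionary": ["dictionary", "translator"],
    "e-book reader": ["reader", "ebook", "e-book"],
    "programming": ["programming", "programs"],
}


def installs_to_number(value):
    """Convert an Installs string such as '1,000,000+' to a float."""
    return float(value.replace("+", "").replace(",", ""))


def categorize(app_name):
    """Return the first category whose keywords appear in the app name."""
    name = app_name.lower()
    for category, keywords in CATEGORY_KEYWORDS.items():
        if any(keyword in name for keyword in keywords):
            return category
    return "other"


def installs_by_category(apps):
    """Sum installs per category for (name, Installs string) pairs."""
    totals = defaultdict(float)
    for name, installs in apps:
        totals[categorize(name)] += installs_to_number(installs)
    return dict(totals)


apps = [
    ("Bible", "100,000,000+"),
    ("Quran for Android", "10,000,000+"),
    ("Dictionary - Merriam-Webster", "10,000,000+"),
    ("Moon+ Reader", "10,000,000+"),
    ("Learn R Programming", "10+"),
    ("Wikipedia", "10,000,000+"),
]
totals = installs_by_category(apps)
for category, total in sorted(totals.items(), key=lambda kv: kv[1], reverse=True):
    print(category, " : ", total)
```

Checking categories in a fixed order means a name matching several keyword lists is counted only once.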

Instead of offering a bare copy of the book, we should add features that make it more engaging. Examples include quizzes, a quote of the day, news and interviews about the book, an audio version and a built-in dictionary.

Conclusion

In this project, we analyzed data about Google Play and App Store to recommend the genre of apps which will be successful on both markets.

Based on my analysis, I recommended to my friend that turning a recent popular book into an app would be profitable. To make the app more appealing to users, we should add features such as an audio version of the book, daily quizzes from the book, links to news items or interviews about the book, a built-in dictionary, and a forum to encourage discussion about the book.



