Finding Profitable App Genre
Recently, I was chatting with a close friend who is an avid app developer. After the usual gossip, the conversation turned to work: he had recently started freelancing and asked whether I could help him figure out which kind (genre) of apps he should develop in order to be profitable. I agreed immediately, since I love playing with data and deriving meaningful insights from it. Below are the findings I shared with my friend, who happily agreed to let me post them on a public forum so that they can be useful to other freelancers.
Finding Profitable App Genre - Google Play & App Store Market
My goal in this project is to find the genre (e.g. Games, Books, News) of apps that will be profitable on both the Google Play and iOS App Store markets. This will help freelance app developers make better data-driven decisions about the kind of apps they develop.
We will analyze data about Google Play and App Store apps and try to find:
- Most common apps by genre on both the markets
- Most popular apps by genre on both the markets
Summary of Results
After analyzing the data, I found that taking a recent popular book and turning it into an app will be profitable on both the markets. I also recommend adding several features to the app. For more details, please refer to the full analysis below.
Data Set
Google Play and the App Store each have more than 2 million apps. Gathering data for these 4+ million apps would be resource-intensive, so I decided to analyze samples of the data that I found on Kaggle.
- This data set contains data about 10,000+ Android apps on the Google Play.
- This data set contains data about 7,000+ iOS apps on the App Store.
Exploring the Data set
We will start by opening the two data set files that we are going to analyze.
In [1]:
from csv import reader

# Opening the Google Play data set
open_android = open("googleplaystore.csv")
android_reader = reader(open_android)
android = list(android_reader)

# Opening the iOS App Store data set
open_ios = open("AppleStore.csv")
ios_reader = reader(open_ios)
ios = list(ios_reader)
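The loading step can also be wrapped in a small helper that closes the file when it is done. This is only a sketch: the name `load_csv` is made up for illustration, and the UTF-8 encoding is an assumption about how the Kaggle files are stored.

```python
from csv import reader


def load_csv(path):
    """Read a CSV file and return (header, rows) as lists of strings."""
    # encoding="utf-8" is an assumption about the files; newline="" is what
    # the csv module documentation recommends when opening files for reading.
    with open(path, encoding="utf-8", newline="") as f:
        rows = list(reader(f))
    return rows[0], rows[1:]


# Hypothetical usage with the file names from the cell above:
# android_header, android_rows = load_csv("googleplaystore.csv")
# ios_header, ios_rows = load_csv("AppleStore.csv")
```

Note that this helper returns the header separately, whereas the rest of the notebook keeps the header as the first element of `android` and `ios`.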
To make our analysis easier, let's write a small function named app_info that prints the rows we ask for from any data set. Optionally, it also prints the total number of apps in that data set.
In [2]:
# Function to print app information
def app_info(dataset, start, finish, length=False):
    for each_app in dataset[start:finish+1]:
        print(each_app)
        print("\n")
    if length == True:
        print("Total number of apps: ", len(dataset[1:]))
We will use this function to print a few apps from both data sets to see what the information looks like.
In [3]:
## Using app_info function to print Android apps
app_info(android, 0, 3, True)

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']
['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']
['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']
['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']
Total number of apps: 10841
As we see above, there are a total of 10,841 Android apps. Each row contains information such as the app's name, size, number of reviews, installs and average rating, to name a few.
Now let's see how many iOS apps we have for analysis and what they look like.
In [4]:
# Using app_info function to print iOS apps
app_info(ios, 0, 3, True)

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']
['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']
['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']
['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']
Total number of apps: 7197
We have 7197 iOS apps in our App Store data set, with information such as name, size, price, total number of ratings, average rating and genre.
One thing to note is that the App Store data set does not contain the number of installs, unlike the Google Play data set.
Data Cleaning
The most underrated step in a data analysis task is cleaning the data. Before we start analyzing, we need to be sure the data is free of errors, misinformation and duplicates; otherwise our analysis will be inaccurate and can lead to false conclusions. So it is important to take the time to clean the data sets and get them ready for analysis.
Remove Inaccurate Data
The first step in the data cleaning process is to find out whether any apps in the data sets have missing information.
In [5]:
## Find inaccurate app in Google Play
for each in android[1:]:
    if len(each) != len(android[0]):
        print(each)
        print("length of this row: ", len(each))
        print("Index of inaccurate app is: ", android.index(each))
        print("\n")
print(android[0])
print("length of the header: ", len(android[0]))

['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']
length of this row: 12
Index of inaccurate app is: 10473
['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']
length of the header: 13
As we see, the row for the app named "Life Made WI-Fi Touchscreen Photo Frame" in the Google Play data set is inaccurate: it has only 12 columns against the 13 in the header. Comparing the row with the header, it appears that the Category value is missing, which shifts every later value one column to the left. Since the row cannot be trusted as it stands, we are better off removing it.
In [6]:
## Removing "Life Made WI-Fi Touchscreen Photo Frame" app
print("Number of apps before deleting: ", len(android[1:]))
del android[10473]
print("Number of apps after deleting: ", len(android[1:]))

Number of apps before deleting: 10841
Number of apps after deleting: 10840
In [7]:
## Find inaccurate app information in App Store
for each in ios[1:]:
    if len(each) != len(ios[0]):
        print(each)
        print("length of this row: ", len(each))
        print("Index of inaccurate app is: ", ios.index(each))
        print("\n")
print(ios[0])
print("length of the header: ", len(ios[0]))
No app in the App Store data set is missing information.
Deleting Duplicate App(s)
The next step in data cleaning is to locate apps that occur more than once and find a way to remove the duplicates. For example, the app "Google Ads" occurs three times in the Google Play data set, as we see below.
In [8]:
for each_app in android[1:]:
    name = each_app[0]
    if name == "Google Ads":
        print(each_app)

['Google Ads', 'BUSINESS', '4.3', '29313', '20M', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'July 30, 2018', '1.12.0', '4.0.3 and up']
['Google Ads', 'BUSINESS', '4.3', '29313', '20M', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'July 30, 2018', '1.12.0', '4.0.3 and up']
['Google Ads', 'BUSINESS', '4.3', '29331', '20M', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'July 30, 2018', '1.12.0', '4.0.3 and up']
Now let's find how many duplicate entries there are in each data set.
In [9]:
## Finding number of duplicate apps in the Google Play
duplicate_app_google = []
unique_app_google = []
for each_row in android[1:]:
    name = each_row[0]
    if name in unique_app_google:
        duplicate_app_google.append(name)
    else:
        unique_app_google.append(name)
print("Number of duplicate apps in Google Play: ", len(duplicate_app_google))
print("\n")
print("Number of unique apps in Google Play: ", len(unique_app_google))

## Finding number of duplicate apps in the App Store
duplicate_app_apple = []
unique_app_apple = []
for each_row in ios[1:]:
    name = each_row[1]
    if name in unique_app_apple:
        duplicate_app_apple.append(name)
    else:
        unique_app_apple.append(name)
print("\n")
print("Number of duplicate apps in App Store: ", len(duplicate_app_apple))
print("\n")
print("Number of unique apps in App Store: ", len(unique_app_apple))

Number of duplicate apps in Google Play: 1181
Number of unique apps in Google Play: 9659
Number of duplicate apps in App Store: 2
Number of unique apps in App Store: 7195
We can see that there are 1181 duplicate entries in the Google Play data set. The App Store data set is much cleaner and contains only 2 duplicate entries.
We need to keep only one entry per app and delete the duplicates. In the "Google Ads" rows printed above, the entries differ only in the 4th column, the total number of reviews. So we can keep the entry with the highest number of reviews and delete the rest: the higher the review count, the more recent the information is likely to be.
We will follow the same procedure for the App Store data set.
Let's start by creating a dictionary where the keys are the unique app names and the values are the highest review count for each app.
In [10]:
# Finding number of unique apps in Google Play
google_unique_app = {}
for each_row in android[1:]:
    name = each_row[0]
    reviews = float(each_row[3])
    if name in google_unique_app and reviews > google_unique_app[name]:
        google_unique_app[name] = reviews
    elif name not in google_unique_app:
        google_unique_app[name] = reviews
print("Number of unique Google apps extracted: ", len(google_unique_app))

Number of unique Google apps extracted: 9659
As we can see, the number of unique apps in the dictionary is 9659, the same as we found in the previous code cell.
Now let us use this dictionary to build a list containing, for each unique app, the row with the highest number of reviews.
In [11]:
unique_google = []
already_added = []
for each_row in android[1:]:
    name = each_row[0]
    review = float(each_row[3])
    if google_unique_app[name] == review and name not in already_added:
        unique_google.append(each_row)
        already_added.append(name)
print("Number of Google Play apps :", len(unique_google))

Number of Google Play apps : 9659
Now let's do the same for the App Store data set and extract, for each unique app, the row with the highest rating count. The count is in the 6th column (index 5) of the App Store data set, so we adjust the code accordingly.
In [12]:
apple_unique_app = {}
for each_row in ios[1:]:
    name = each_row[1]
    reviews = float(each_row[5])
    if name in apple_unique_app and reviews > apple_unique_app[name]:
        apple_unique_app[name] = reviews
    elif name not in apple_unique_app:
        apple_unique_app[name] = reviews
print("Number of unique iOS apps extracted: ", len(apple_unique_app))

unique_apple = []
already_added = []
for each_row in ios[1:]:
    name = each_row[1]
    review = float(each_row[5])
    if apple_unique_app[name] == review and name not in already_added:
        unique_apple.append(each_row)
        already_added.append(name)
print("Number of App Store apps :", len(unique_apple))

Number of unique iOS apps extracted: 7195
Number of App Store apps : 7195
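The two-pass approach above (first a dictionary of highest counts, then a list of matching rows) can also be done in a single pass. The following is a minimal sketch rather than the method used in this notebook; the function name `dedupe_by_max` and its parameters are made up for illustration.

```python
def dedupe_by_max(rows, name_idx, count_idx):
    """Keep one row per app name: the first row seen with the highest count."""
    best = {}  # app name -> best row seen so far
    for row in rows:
        name = row[name_idx]
        count = float(row[count_idx])
        # Replace the stored row only when this one has a strictly higher count,
        # so ties keep the earliest row.
        if name not in best or count > float(best[name][count_idx]):
            best[name] = row
    return list(best.values())


# Hypothetical usage with the data sets in this notebook (header row skipped):
# unique_google = dedupe_by_max(android[1:], 0, 3)
# unique_apple = dedupe_by_max(ios[1:], 1, 5)
```

One difference to be aware of: the result is ordered by the first appearance of each app name, not by the position of the row that was kept, so the row order can differ from the lists built above even though the same rows are selected.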
Removing Non-English Apps
There are several non-English apps in both data sets. For example:
['Cъновник BG', 'BOOKS_AND_REFERENCE', 'NaN', '13', '4.1M', '1,000+', 'Free', '0', 'Everyone', 'Books & Reference', 'January 21, 2017', '250', '4.0 and up']
['뽕티비 - 개인방송, 인터넷방송, BJ방송', 'VIDEO_PLAYERS', 'NaN', '414', '59M', '100,000+', 'Free', '0', 'Mature 17+', 'Video Players & Editors', 'July 18, 2018', '4.0.7', '4.0.3 and up']
['BL 女性向け恋愛ゲーム◆俺プリクロス', 'FAMILY', '4.2', '3379', '62M', '100,000+', 'Free', '0', 'Mature 17+', 'Simulation', 'March 23, 2017', '1.6.3', '2.3.3 and up']
My friend's project mainly caters to an English-speaking audience, so it makes no sense to analyze apps that are not in English. As part of data cleaning, we will remove all the non-English apps.
Each character we type on a computer corresponds to a number, its code point, which Python returns from the built-in ord() function. The characters normally used in English text (letters, digits, punctuation and other common symbols) fall in the ASCII range of 0 - 127.
The logic is to go through each row in both data sets, extract the app's name and check whether any of its characters has a code point greater than 127. If so, we remove the app.
However, some English apps have characters in their names whose code points are greater than 127. For example:
['FlirtChat - ♥Free Dating/Flirting App♥', 'DATING', '4.3', '2433', '13M', '500,000+', 'Free', '0', 'Mature 17+', 'Dating', 'July 26, 2018', '12.0.4', '4.1 and up']
The app above is clearly in English, but the code point of the two ♥ characters in its name is 9829. Going by our rule, the code would also remove apps like this one, since a character in the name has a code point greater than 127.
So we will relax the rule: the program removes an app only if its name has more than 3 characters whose code points are greater than 127. This way we keep most of the English apps that have a few special characters in their names, like the one above.
In [13]:
## Removing non-English app from the Google Play
google_english = []
for each_app in unique_google:
    name = each_app[0]
    count = 0
    for each in name:
        if ord(each) > 127:
            count += 1
    if count <= 3:
        google_english.append(each_app)
print("Number of English apps in the Google Play: ", len(google_english))

Number of English apps in the Google Play: 9614
As we see, of the original 9659 apps we are left with 9614 after removing the non-English apps. Now let's do the same for apps in the App Store.
In [14]:
## Removing non-English app from the App Store
ios_english = []
for each_app in unique_apple:
    name = each_app[1]
    count = 0
    for each in name:
        if ord(each) > 127:
            count += 1
    if count <= 3:
        ios_english.append(each_app)
print("Number of English apps in the App Store: ", len(ios_english))

Number of English apps in the App Store: 6181
We have removed quite a few apps from the App Store data set and are left with 6181 English apps.
Our data may still contain a few non-English apps, and we may have removed a few English ones, but that should not have any significant bearing on the outcome of our analysis.
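Since the same check is applied to both data sets, it can be pulled out into one small helper. This is a sketch of the rule described above; the name `is_english` and the `max_non_ascii` parameter are made up for illustration.

```python
def is_english(name, max_non_ascii=3):
    """Return True if `name` has at most `max_non_ascii` characters outside ASCII."""
    # ord() gives the code point of a character; ASCII covers 0-127.
    non_ascii = sum(1 for ch in name if ord(ch) > 127)
    return non_ascii <= max_non_ascii


print(is_english("Instagram"))                               # True
print(is_english("FlirtChat - ♥Free Dating/Flirting App♥"))  # True
print(is_english("뽕티비 - 개인방송, 인터넷방송, BJ방송"))         # False
```

With such a helper, each filter becomes a one-line list comprehension, for example `[app for app in unique_google if is_english(app[0])]`.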
Removing Paid Apps
My friend wants to build only free apps, since a free app attracts more users than a paid one, for which users' expectations are higher and which also requires significant maintenance costs. Also, many freelancers are students and prefer developing free apps.
The success of a free app depends on the number of users who download it, because the revenue comes from in-app ads: the more users download and use the app, the higher the revenue.
For these reasons, we will remove the paid apps from both data sets.
In [15]:
## Removing paid apps from the google data set
android_final = []
for each_app in google_english:
    if each_app[7] == "0":
        android_final.append(each_app)
print("Number of Android apps after cleaning: ", len(android_final))

## Removing paid apps from the apple data set
ios_final = []
for each_app in ios_english:
    if each_app[4] == "0.0":
        ios_final.append(each_app)
print("Number of iOS apps after cleaning: ", len(ios_final))

Number of Android apps after cleaning: 8864
Number of iOS apps after cleaning: 3220
After removing the paid apps, we are left with 8864 Android apps and 3220 iOS apps.
With this, the data cleaning process is complete.
Data Analysis
As said in the introduction, our goal is to find the type of apps that will be profitable on both markets. Since we are dealing only with free apps, revenue depends on the number of users using the app.
Most Common App by Genre
Let's start our data analysis by finding which genres of apps are most common in the two markets. The genre information is at index 9 for Android and at index 11 for iOS.
We will create two functions: one to build a frequency table (as percentages) for any column we want, and another to sort and print that frequency table in descending order so that it is readable.
In [16]:
## Function to create frequency table
def freq_table(dataset, index):
    frequency_table = {}
    for each_app in dataset:
        column = each_app[index]
        if column in frequency_table:
            frequency_table[column] += 1
        else:
            frequency_table[column] = 1
    for each_item in frequency_table:
        frequency_table[each_item] /= len(dataset)
        frequency_table[each_item] *= 100
    return frequency_table

## Function to sort the above frequency table in descending order
def sort(dataset, index):
    frequency_table = freq_table(dataset, index)
    list_freq = []
    for each in frequency_table:
        freq_temp = [frequency_table[each], each]
        list_freq.append(freq_temp)
    sort_freq = sorted(list_freq, reverse = True)
    for each in sort_freq:
        print(each[1], " : ", each[0])
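For comparison, the same percentage table can be built with `collections.Counter` from the standard library. This is an alternative sketch, not the code used for the results below; the name `genre_percentages` is made up for illustration.

```python
from collections import Counter


def genre_percentages(rows, index):
    """Return (value, percentage) pairs for one column, most common first."""
    counts = Counter(row[index] for row in rows)
    total = len(rows)
    # most_common() orders by count, highest first.
    return [(value, 100 * n / total) for value, n in counts.most_common()]


# Tiny made-up sample: 3 of 4 rows are Games.
sample = [["a", "Games"], ["b", "Games"], ["c", "Books"], ["d", "Games"]]
for genre, pct in genre_percentages(sample, 1):
    print(genre, " : ", pct)
```

Genres with equal counts may come out in a different order than with the `sort` function above, which breaks ties by genre name.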
In [17]:
sort(ios_final, 11)

Games : 58.13664596273293
Entertainment : 7.888198757763975
Photo & Video : 4.968944099378882
Education : 3.6645962732919255
Social Networking : 3.291925465838509
Shopping : 2.608695652173913
Utilities : 2.515527950310559
Sports : 2.142857142857143
Music : 2.049689440993789
Health & Fitness : 2.018633540372671
Productivity : 1.7391304347826086
Lifestyle : 1.5838509316770186
News : 1.3354037267080745
Travel : 1.2422360248447204
Finance : 1.1180124223602486
Weather : 0.8695652173913043
Food & Drink : 0.8074534161490683
Reference : 0.5590062111801243
Business : 0.5279503105590062
Book : 0.43478260869565216
Navigation : 0.18633540372670807
Medical : 0.18633540372670807
Catalogs : 0.12422360248447205
As we see, more than half (58%) of the free English apps belong to the Games category, followed by Entertainment at a distant second (~8%). Photo & Video takes 3rd place with ~5%.
So about 71% of the apps belong to fun categories (Games, Entertainment, Photo & Video).
Now let's take a look at the Android market.
In [18]:
sort(android_final, 9)

Tools : 8.449909747292418
Entertainment : 6.069494584837545
Education : 5.347472924187725
Business : 4.591606498194946
Productivity : 3.892148014440433
Lifestyle : 3.892148014440433
Finance : 3.7003610108303246
Medical : 3.531137184115524
Sports : 3.463447653429603
Personalization : 3.3167870036101084
Communication : 3.2378158844765346
Action : 3.1024368231046933
Health & Fitness : 3.0798736462093865
Photography : 2.944494584837545
News & Magazines : 2.7978339350180503
Social : 2.6624548736462095
Travel & Local : 2.3240072202166067
Shopping : 2.2450361010830324
Books & Reference : 2.1435018050541514
Simulation : 2.0419675090252705
Dating : 1.861462093862816
Arcade : 1.8501805054151623
Video Players & Editors : 1.7712093862815883
Casual : 1.7599277978339352
Maps & Navigation : 1.3989169675090252
Food & Drink : 1.2409747292418771
Puzzle : 1.128158844765343
Racing : 0.9927797833935018
Role Playing : 0.9363718411552346
Libraries & Demo : 0.9363718411552346
Auto & Vehicles : 0.9250902527075812
Strategy : 0.9138086642599278
House & Home : 0.8235559566787004
Weather : 0.8009927797833934
Events : 0.7107400722021661
Adventure : 0.6768953068592057
Comics : 0.6092057761732852
Beauty : 0.5979241877256317
Art & Design : 0.5979241877256317
Parenting : 0.4963898916967509
Card : 0.45126353790613716
Casino : 0.42870036101083037
Trivia : 0.41741877256317694
Educational;Education : 0.39485559566787
Board : 0.3835740072202166
Educational : 0.3722924187725632
Education;Education : 0.33844765342960287
Word : 0.2594765342960289
Casual;Pretend Play : 0.236913357400722
Music : 0.2030685920577617
Racing;Action & Adventure : 0.16922382671480143
Puzzle;Brain Games : 0.16922382671480143
Entertainment;Music & Video : 0.16922382671480143
Casual;Brain Games : 0.13537906137184114
Casual;Action & Adventure : 0.13537906137184114
Arcade;Action & Adventure : 0.12409747292418773
Action;Action & Adventure : 0.10153429602888085
Educational;Pretend Play : 0.09025270758122744
Simulation;Action & Adventure : 0.078971119133574
Parenting;Education : 0.078971119133574
Entertainment;Brain Games : 0.078971119133574
Board;Brain Games : 0.078971119133574
Parenting;Music & Video : 0.06768953068592057
Educational;Brain Games : 0.06768953068592057
Casual;Creativity : 0.06768953068592057
Art & Design;Creativity : 0.06768953068592057
Education;Pretend Play : 0.056407942238267145
Role Playing;Pretend Play : 0.04512635379061372
Education;Creativity : 0.04512635379061372
Role Playing;Action & Adventure : 0.033844765342960284
Puzzle;Action & Adventure : 0.033844765342960284
Entertainment;Creativity : 0.033844765342960284
Entertainment;Action & Adventure : 0.033844765342960284
Educational;Creativity : 0.033844765342960284
Educational;Action & Adventure : 0.033844765342960284
Education;Music & Video : 0.033844765342960284
Education;Brain Games : 0.033844765342960284
Education;Action & Adventure : 0.033844765342960284
Adventure;Action & Adventure : 0.033844765342960284
Video Players & Editors;Music & Video : 0.02256317689530686
Sports;Action & Adventure : 0.02256317689530686
Simulation;Pretend Play : 0.02256317689530686
Puzzle;Creativity : 0.02256317689530686
Music;Music & Video : 0.02256317689530686
Entertainment;Pretend Play : 0.02256317689530686
Casual;Education : 0.02256317689530686
Board;Action & Adventure : 0.02256317689530686
Video Players & Editors;Creativity : 0.01128158844765343
Trivia;Education : 0.01128158844765343
Travel & Local;Action & Adventure : 0.01128158844765343
Tools;Education : 0.01128158844765343
Strategy;Education : 0.01128158844765343
Strategy;Creativity : 0.01128158844765343
Strategy;Action & Adventure : 0.01128158844765343
Simulation;Education : 0.01128158844765343
Role Playing;Brain Games : 0.01128158844765343
Racing;Pretend Play : 0.01128158844765343
Puzzle;Education : 0.01128158844765343
Parenting;Brain Games : 0.01128158844765343
Music & Audio;Music & Video : 0.01128158844765343
Lifestyle;Pretend Play : 0.01128158844765343
Lifestyle;Education : 0.01128158844765343
Health & Fitness;Education : 0.01128158844765343
Health & Fitness;Action & Adventure : 0.01128158844765343
Entertainment;Education : 0.01128158844765343
Communication;Creativity : 0.01128158844765343
Comics;Creativity : 0.01128158844765343
Casual;Music & Video : 0.01128158844765343
Card;Action & Adventure : 0.01128158844765343
Books & Reference;Education : 0.01128158844765343
Art & Design;Pretend Play : 0.01128158844765343
Art & Design;Action & Adventure : 0.01128158844765343
Arcade;Pretend Play : 0.01128158844765343
Adventure;Education : 0.01128158844765343
Unlike the App Store, which is dominated by fun apps, the Google Play store has a more balanced landscape, with both fun apps (Entertainment, Lifestyle) and apps designed for practical purposes (Tools, Education) near the top.
However, we also need to understand that on both markets the genres with the largest share of apps may or may not have the highest number of users (downloads). Demand might not match supply.
Apps with Most Users
Now let's figure out the genres of apps with the most users.
The App Store data set has no column with the number of users or downloads. Instead, we will use the column at index 5, the total number of ratings (rating_count_tot), as a rough proxy for popularity.
In [19]:
## Finding Apps of Genre with most users on App Store
most_apps = {}
temp_add = {}
for each_app in ios_final:
    rating_count = float(each_app[5])
    genre = each_app[11]
    if genre in most_apps:
        most_apps[genre] *= temp_add[genre]
        most_apps[genre] += rating_count
        temp_add[genre] += 1
        most_apps[genre] /= temp_add[genre]
    else:
        most_apps[genre] = rating_count
        temp_add[genre] = 1
as_list = []
for each in most_apps:
    temp_list = [most_apps[each], each]
    as_list.append(temp_list)
for each_app in sorted(as_list, reverse = True):
    print(each_app[1], " : ", each_app[0])

Navigation : 86090.33333333333
Reference : 74942.11111111111
Social Networking : 71548.34905660385
Music : 57326.5303030303
Weather : 52279.892857142855
Book : 39758.5
Food & Drink : 33333.92307692309
Finance : 31467.944444444445
Photo & Video : 28441.543749999993
Travel : 28243.8
Shopping : 26919.690476190477
Health & Fitness : 23298.015384615377
Sports : 23008.898550724636
Games : 22812.92467948712
News : 21248.023255813947
Productivity : 21028.410714285714
Utilities : 18684.456790123462
Lifestyle : 16485.764705882346
Entertainment : 14029.830708661419
Business : 7491.117647058824
Education : 7003.983050847459
Catalogs : 4004.0
Medical : 612.0
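The cell above keeps a running average, multiplying and dividing on every row. An equivalent and arguably simpler way is to accumulate a total and a count per genre and divide once at the end. The sketch below shows that variant; the name `average_by_genre` is made up for illustration, and the last few decimals may differ slightly from the output above because the floating-point operations happen in a different order.

```python
def average_by_genre(rows, genre_idx, value_idx):
    """Return (genre, average value) pairs, highest average first."""
    totals = {}  # genre -> sum of values
    counts = {}  # genre -> number of apps
    for row in rows:
        genre = row[genre_idx]
        totals[genre] = totals.get(genre, 0.0) + float(row[value_idx])
        counts[genre] = counts.get(genre, 0) + 1
    averages = {genre: totals[genre] / counts[genre] for genre in totals}
    return sorted(averages.items(), key=lambda pair: pair[1], reverse=True)


# Hypothetical usage with the cleaned App Store data from this notebook:
# for genre, avg in average_by_genre(ios_final, 11, 5):
#     print(genre, " : ", avg)
```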
Navigation apps have the highest number of ratings on average. However, looking in more detail, we can see that the average is driven by Waze and Google Maps, which together account for more than 95% of the total.
In [20]:
for each in ios_final:
    if each[11] == "Navigation":
        print(each[1], " :", each[5])
        print("\n")

Waze - GPS Navigation, Maps & Real-time Traffic : 345046
Google Maps - Navigation & Transit : 154911
Geocaching® : 12811
CoPilot GPS – Car Navigation & Offline Maps : 3582
ImmobilienScout24: Real Estate Search in Germany : 187
Railway Route Search : 5
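To check how concentrated the Navigation genre is, we can compute the share of ratings held by the two largest apps directly from the numbers printed above. The helper name `top_share` is made up for illustration.

```python
# Rating counts copied from the Navigation listing above.
navigation = [
    ("Waze - GPS Navigation, Maps & Real-time Traffic", 345046),
    ("Google Maps - Navigation & Transit", 154911),
    ("Geocaching®", 12811),
    ("CoPilot GPS – Car Navigation & Offline Maps", 3582),
    ("ImmobilienScout24: Real Estate Search in Germany", 187),
    ("Railway Route Search", 5),
]


def top_share(apps, n):
    """Fraction of all ratings that belongs to the n most-rated apps."""
    counts = sorted((count for _, count in apps), reverse=True)
    return sum(counts[:n]) / sum(counts)


share = top_share(navigation, 2)
print(f"{share:.1%}")  # 96.8%
```

The two apps hold 499,957 of the 516,542 ratings in the genre, so the genre average says little about what a typical Navigation app can expect.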
We notice the same trend in the Social Networking and Music genres. Social Networking is dominated by Facebook and Pinterest, and Music by Pandora and Spotify. It is very hard for a freelancer to compete against these giants and succeed, because they are all very successful products with millions of users. The same goes for the Food & Drink and Finance categories, which are dominated by giants in those fields.
The Reference category has 74942 ratings on average. Looking further, Bible and Dictionary top the list with the most ratings.
In [21]:
for each in ios_final:
    if each[11] == "Reference":
        print(each[1], " :", each[5])
        print("\n")

Bible : 985920
Dictionary.com Dictionary & Thesaurus : 200047
Dictionary.com Dictionary & Thesaurus for iPad : 54175
Google Translate : 26786
Muslim Pro: Ramadan 2017 Prayer Times, Azan, Quran : 18418
New Furniture Mods - Pocket Wiki & Game Tools for Minecraft PC Edition : 17588
Merriam-Webster Dictionary : 16849
Night Sky : 12122
City Maps for Minecraft PE - The Best Maps for Minecraft Pocket Edition (MCPE) : 8535
LUCKY BLOCK MOD ™ for Minecraft PC Edition - The Best Pocket Wiki & Mods Installer Tools : 4693
GUNS MODS for Minecraft PC Edition - Mods Tools : 1497
Guides for Pokémon GO - Pokemon GO News and Cheats : 826
WWDC : 762
Horror Maps for Minecraft PE - Download The Scariest Maps for Minecraft Pocket Edition (MCPE) Free : 718
VPN Express : 14
Real Bike Traffic Rider Virtual Reality Glasses : 8
教えて!goo : 0
Jishokun-Japanese English Dictionary & Translator : 0
This category seems interesting. What we can do is take one or more popular books and turn them into an app. We can add features such as quizzes, quotes and links to news about the book on the web. We can also include an audio version of the book.
Another interesting thing we can do is build a dictionary into the app itself, so the user doesn't have to leave the app to look up a word's meaning.
Now let's find the genres with the most users on Google Play.
In [22]:
## Finding Apps of Genre with most users on Google Play
most_apps = {}
temp_add = {}
for each_app in android_final:
    rating_count = each_app[5]
    rating_count = rating_count.replace("+", "")
    rating_count = rating_count.replace(",", "")
    rating_count = float(rating_count)
    genre = each_app[9]
    if genre in most_apps:
        most_apps[genre] *= temp_add[genre]
        most_apps[genre] += rating_count
        temp_add[genre] += 1
        most_apps[genre] /= temp_add[genre]
    else:
        most_apps[genre] = rating_count
        temp_add[genre] = 1
as_list = []
for each in most_apps:
    temp_list = [most_apps[each], each]
    as_list.append(temp_list)
for each_app in sorted(as_list, reverse = True):
    print(each_app[1], " : ", each_app[0])

Communication : 38456119.16724743
Adventure;Action & Adventure : 35333333.333333336
Video Players & Editors : 24947335.796178345
Social : 23253652.12711863
Arcade : 22888365.48780488
Casual : 19569221.602564108
Puzzle;Action & Adventure : 18366666.666666668
Photography : 17840110.40229885
Educational;Action & Adventure : 17016666.666666668
Productivity : 16787331.344927523
Racing : 15910645.681818193
Travel & Local : 14051476.145631064
Casual;Action & Adventure : 12916666.666666666
Action : 12603588.872727277
Strategy : 11199902.5308642
Tools : 10802461.246996008
Tools;Education : 10000000.0
Role Playing;Brain Games : 10000000.0
Lifestyle;Pretend Play : 10000000.0
Casual;Music & Video : 10000000.0
Card;Action & Adventure : 10000000.0
Adventure;Education : 10000000.0
News & Magazines : 9549178.467741935
Music : 9445583.333333334
Educational;Pretend Play : 9375000.0
Puzzle;Brain Games : 9280666.666666666
Word : 9094458.695652174
Racing;Action & Adventure : 8816666.666666666
Books & Reference : 8767811.894736838
Puzzle : 8302861.910000001
Video Players & Editors;Music & Video : 7500000.0
Shopping : 7036877.311557789
Role Playing;Action & Adventure : 7000000.0
Casual;Pretend Play : 6957142.857142857
Entertainment;Music & Video : 6413333.333333333
Action;Action & Adventure : 5888888.888888889
Entertainment : 5602792.775092941
Education;Brain Games : 5333333.333333333
Casual;Creativity : 5333333.333333333
Role Playing;Pretend Play : 5275000.0
Personalization : 5201482.612244898
Weather : 5074486.197183099
Sports;Action & Adventure : 5050000.0
Music;Music & Video : 5050000.0
Video Players & Editors;Creativity : 5000000.0
Adventure : 4922785.333333333
Simulation;Action & Adventure : 4857142.857142857
Education;Education : 4759517.0
Board : 4759209.117647059
Sports : 4596842.615635181
Educational;Brain Games : 4433333.333333333
Health & Fitness : 4188821.9853479844
Maps & Navigation : 4056941.7741935495
Entertainment;Creativity : 4000000.0
Role Playing : 3965645.421686747
Card : 3815462.5
Trivia : 3475712.7027027025
Simulation : 3475484.0883977907
Casino : 3427910.5263157897
Entertainment;Brain Games : 3314285.714285714
Arcade;Action & Adventure : 3190909.1818181816
Entertainment;Pretend Play : 3000000.0
Board;Action & Adventure : 3000000.0
Education;Creativity : 2875000.0
Entertainment;Action & Adventure : 2333333.3333333335
Educational;Creativity : 2333333.3333333335
Art & Design : 2122850.9433962265
Education;Music & Video : 2033333.3333333333
Food & Drink : 1924897.736363638
Education;Pretend Play : 1800000.0
Educational;Education : 1737143.142857143
Business : 1712290.1474201486
Casual;Brain Games : 1425916.6666666667
Lifestyle : 1412998.3449275375
Finance : 1387692.475609757
House & Home : 1331540.5616438356
Parenting;Music & Video : 1118333.3333333333
Strategy;Creativity : 1000000.0
Strategy;Action & Adventure : 1000000.0
Racing;Pretend Play : 1000000.0
Parenting;Brain Games : 1000000.0
Health & Fitness;Action & Adventure : 1000000.0
Entertainment;Education : 1000000.0
Education;Action & Adventure : 1000000.0
Casual;Education : 1000000.0
Arcade;Pretend Play : 1000000.0
Dating : 854028.8303030301
Comics : 831873.1481481482
Puzzle;Creativity : 750000.0
Auto & Vehicles : 647317.8170731709
Libraries & Demo : 638503.7349397589
Education : 550185.4430379759
Simulation;Pretend Play : 550000.0
Beauty : 513151.8867924528
Strategy;Education : 500000.0
Music & Audio;Music & Video : 500000.0
Communication;Creativity : 500000.0
Art & Design;Pretend Play : 500000.0
Parenting : 467977.5
Parenting;Education : 452857.14285714284
Educational : 411184.8484848485
Board;Brain Games : 407142.85714285716
Art & Design;Creativity : 285000.0
Events : 253542.22222222234
Medical : 120550.61980830679
Travel & Local;Action & Adventure : 100000.0
Puzzle;Education : 100000.0
Lifestyle;Education : 100000.0
Health & Fitness;Education : 100000.0
Art & Design;Action & Adventure : 100000.0
Comics;Creativity : 50000.0
Books & Reference;Education : 1000.0
Simulation;Education : 500.0
Trivia;Education : 100.0
Communication tops the list with an average of about 38 million installs. However, as in the App Store market, most of these installs belong to a handful of apps such as WhatsApp, Skype, and Messenger, each of which has more than 1 billion installs.
In [23]:
for each in android_final:
    if each[9] == "Communication":
        # Strip "+" and "," so the install count can be compared numerically
        installs = each[5]
        installs = installs.replace("+", "")
        installs = installs.replace(",", "")
        installs = float(installs)
        if installs > 100000000:
            print(each[0], " : ", each[5])

WhatsApp Messenger : 1,000,000,000+
Google Duo - High Quality Video Calls : 500,000,000+
Messenger – Text and Video Chat for Free : 1,000,000,000+
imo free video calls and chat : 500,000,000+
Skype - free IM & video calls : 1,000,000,000+
LINE: Free Calls & Messages : 500,000,000+
Google Chrome: Fast & Secure : 1,000,000,000+
UC Browser - Fast Download Private & Secure : 500,000,000+
Gmail : 1,000,000,000+
Hangouts : 1,000,000,000+
Viber Messenger : 500,000,000+
So Communication apps seem more popular than they really are. If we exclude the Communication apps with 10 million or more installs, the average drops to roughly 0.75 million installs.
In [24]:
# Average installs after excluding apps with 10 million or more installs
installs_under_cap = []
for each in android_final:
    if each[9] == "Communication":
        installs = each[5]
        installs = installs.replace("+", "")
        installs = installs.replace(",", "")
        installs = float(installs)
        if installs < 10000000:
            installs_under_cap.append(installs)
print("Average installs of communication apps after excluding apps with 10 million or more installs: ", sum(installs_under_cap) / len(installs_under_cap))

Average installs of communication apps after excluding apps with 10 million or more installs: 747172.3857142857
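The cleaning steps repeated in the cells above can be wrapped in a small helper, and the median gives a second view of how much a few giants pull the average up. This is only a sketch: `parse_installs`, `genre_installs`, `make_row`, and the sample rows are hypothetical names and values made up for illustration. They only mimic the column layout used above (name at index 0, installs at index 5, genre at index 9).

```python
from statistics import mean, median


def parse_installs(text):
    """Convert an install string such as '1,000,000+' to a float."""
    return float(text.replace("+", "").replace(",", ""))


def genre_installs(rows, genre, installs_col=5, genre_col=9):
    """Return the numeric install counts of every row in the given genre."""
    return [parse_installs(row[installs_col]) for row in rows if row[genre_col] == genre]


def make_row(name, installs, genre):
    """Build a made-up row with the same column positions as android_final."""
    row = [""] * 10
    row[0], row[5], row[9] = name, installs, genre
    return row


# Hypothetical sample data, not taken from the real data set
sample = [
    make_row("Chat Giant", "1,000,000,000+", "Communication"),
    make_row("Small Chat A", "100,000+", "Communication"),
    make_row("Small Chat B", "500,000+", "Communication"),
    make_row("Small Chat C", "1,000,000+", "Communication"),
    make_row("Reader", "5,000,000+", "Books & Reference"),
]

installs = genre_installs(sample, "Communication")
capped = [n for n in installs if n < 10_000_000]

print("Mean of all apps:", mean(installs))
print("Median of all apps:", median(installs))
print("Mean below the 10 million cap:", mean(capped))
```

In this toy sample a single giant lifts the mean far above the median, which is the same effect seen in the Communication genre.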
The next genre after Communication is Adventure;Action & Adventure, which has only 3 apps. That is too small a sample to tell us much, so we skip it.
In [25]:
for each in android_final:
    if each[9] == "Adventure;Action & Adventure":
        print(each[0], " : ", each[5])

Leo and Tig : 1,000,000+
Transformers Rescue Bots: Hero Adventures : 5,000,000+
ROBLOX : 100,000,000+
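Since a genre with only three apps tells us little, one option is to drop small genres before ranking the averages. The sketch below assumes the installs have already been converted to numbers. The function name `average_installs_by_genre` and the `min_apps` threshold are hypothetical, and the Books & Reference values in the sample are made up.

```python
from collections import defaultdict


def average_installs_by_genre(apps, min_apps=10):
    """Average installs per genre, keeping only genres with at least min_apps apps.

    apps is an iterable of (genre, installs) pairs with numeric installs.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for genre, installs in apps:
        totals[genre] += installs
        counts[genre] += 1
    return {
        genre: totals[genre] / counts[genre]
        for genre in totals
        if counts[genre] >= min_apps
    }


# The three Adventure;Action & Adventure install counts come from the output above;
# the Books & Reference rows are made-up values for illustration only.
sample = [
    ("Adventure;Action & Adventure", 1_000_000),
    ("Adventure;Action & Adventure", 5_000_000),
    ("Adventure;Action & Adventure", 100_000_000),
] + [("Books & Reference", 100_000)] * 4

result = average_installs_by_genre(sample, min_apps=4)
print(result)
```

With `min_apps=4`, the three-app genre is left out of the result and only the larger genre is averaged.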
The other top genres in the list follow a similar pattern to Communication, where a few giants account for almost all the installs. For example, YouTube and Google Play Movies & TV each have more than 1 billion installs in the Video Players & Editors genre.
In [26]:
for each in android_final:
    if each[9] == "Video Players & Editors":
        # Strip "+" and "," so the install count can be compared numerically
        installs = each[5]
        installs = installs.replace("+", "")
        installs = installs.replace(",", "")
        installs = float(installs)
        if installs > 100000000:
            print(each[0], " : ", each[5])

YouTube : 1,000,000,000+
Google Play Movies & TV : 1,000,000,000+
MX Player : 500,000,000+
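One way to put a number on "a few giants hold almost all the installs" is to compute the share of a genre's total installs that comes from its largest apps. This is a rough sketch: `top_share` is a hypothetical helper, and only the first three values in the sample echo the output above (the rest are made up).

```python
def top_share(installs, top_n=3):
    """Fraction of total installs contributed by the top_n largest apps."""
    total = sum(installs)
    if total == 0:
        return 0.0
    largest = sorted(installs, reverse=True)[:top_n]
    return sum(largest) / total


# First three values match the three apps printed above; the rest are made up
sample_installs = [
    1_000_000_000,
    1_000_000_000,
    500_000_000,
    10_000_000,
    5_000_000,
    1_000_000,
    100_000,
]

share = top_share(sample_installs, top_n=3)
print(f"Top 3 apps hold {share:.1%} of installs in this sample")
```

A share close to 1 means the genre's average says very little about what a typical new app can expect.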
The Books & Reference genre averages nearly 9 million installs. Since this genre showed some potential in the App Store, we would like to explore how it fares in the Google Play market.
In [27]:
for each in android_final:
    if each[9] == "Books & Reference":
        print(each[0], " : ", each[5])

E-Book Read - Read Book for free : 50,000+
Download free book with green book : 100,000+
Wikipedia : 10,000,000+
Cool Reader : 10,000,000+
Free Panda Radio Music : 100,000+
Book store : 1,000,000+
FBReader: Favorite Book Reader : 10,000,000+
English Grammar Complete Handbook : 500,000+
Free Books - Spirit Fanfiction and Stories : 1,000,000+
Google Play Books : 1,000,000,000+
AlReader -any text book reader : 5,000,000+
Offline English Dictionary : 100,000+
Offline: English to Tagalog Dictionary : 500,000+
FamilySearch Tree : 1,000,000+
Cloud of Books : 1,000,000+
Recipes of Prophetic Medicine for free : 500,000+
ReadEra – free ebook reader : 1,000,000+
Anonymous caller detection : 10,000+
Ebook Reader : 5,000,000+
Litnet - E-books : 100,000+
Read books online : 5,000,000+
English to Urdu Dictionary : 500,000+
eBoox: book reader fb2 epub zip : 1,000,000+
English Persian Dictionary : 500,000+
Flybook : 500,000+
All Maths Formulas : 1,000,000+
Ancestry : 5,000,000+
HTC Help : 10,000,000+
English translation from Bengali : 100,000+
Pdf Book Download - Read Pdf Book : 100,000+
Free Book Reader : 100,000+
eBoox new: Reader for fb2 epub zip books : 50,000+
Only 30 days in English, the guideline is guaranteed : 500,000+
Moon+ Reader : 10,000,000+
SH-02J Owner's Manual (Android 8.0) : 50,000+
English-Myanmar Dictionary : 1,000,000+
Golden Dictionary (EN-AR) : 1,000,000+
All Language Translator Free : 1,000,000+
Azpen eReader : 500,000+
URBANO V 02 instruction manual : 100,000+
Bible : 100,000,000+
C Programs and Reference : 50,000+
C Offline Tutorial : 1,000+
C Programs Handbook : 50,000+
Amazon Kindle : 100,000,000+
Aab e Hayat Full Novel : 100,000+
Aldiko Book Reader : 10,000,000+
Google I/O 2018 : 500,000+
R Language Reference Guide : 10,000+
Learn R Programming Full : 5,000+
R Programing Offline Tutorial : 1,000+
Guide for R Programming : 5+
Learn R Programming : 10+
R Quick Reference Big Data : 1,000+
V Made : 100,000+
Wattpad 📖 Free Books : 100,000,000+
Dictionary - WordWeb : 5,000,000+
Guide (for X-MEN) : 100,000+
AC Air condition Troubleshoot,Repair,Maintenance : 5,000+
AE Bulletins : 1,000+
Ae Allah na Dai (Rasa) : 10,000+
50000 Free eBooks & Free AudioBooks : 5,000,000+
Ag PhD Field Guide : 10,000+
Ag PhD Deficiencies : 10,000+
Ag PhD Planting Population Calculator : 1,000+
Ag PhD Soybean Diseases : 1,000+
Fertilizer Removal By Crop : 50,000+
A-J Media Vault : 50+
Al-Quran (Free) : 10,000,000+
Al Quran (Tafsir & by Word) : 500,000+
Al Quran Indonesia : 10,000,000+
Al'Quran Bahasa Indonesia : 10,000,000+
Al Quran Al karim : 1,000,000+
Al-Muhaffiz : 50,000+
Al Quran : EAlim - Translations & MP3 Offline : 5,000,000+
Al-Quran 30 Juz free copies : 500,000+
Koran Read &MP3 30 Juz Offline : 1,000,000+
Hafizi Quran 15 lines per page : 1,000,000+
Quran for Android : 10,000,000+
Surah Al-Waqiah : 100,000+
Hisnul Al Muslim - Hisn Invocations & Adhkaar : 100,000+
Satellite AR : 1,000,000+
Audiobooks from Audible : 100,000,000+
Kinot & Eichah for Tisha B'Av : 10,000+
AW Tozer Devotionals - Daily : 5,000+
Tozer Devotional -Series 1 : 1,000+
The Pursuit of God : 1,000+
AY Sing : 5,000+
Ay Hasnain k Nana Milad Naat : 10,000+
Ay Mohabbat Teri Khatir Novel : 10,000+
Arizona Statutes, ARS (AZ Law) : 1,000+
Oxford A-Z of English Usage : 1,000,000+
BD Fishpedia : 1,000+
BD All Sim Offer : 10,000+
Youboox - Livres, BD et magazines : 500,000+
B&H Kids AR : 10,000+
B y H Niños ES : 5,000+
Dictionary.com: Find Definitions for English Words : 10,000,000+
English Dictionary - Offline : 10,000,000+
Bible KJV : 5,000,000+
Borneo Bible, BM Bible : 10,000+
MOD Black for BM : 100+
BM Box : 1,000+
Anime Mod for BM : 100+
NOOK: Read eBooks & Magazines : 10,000,000+
NOOK Audiobooks : 500,000+
NOOK App for NOOK Devices : 500,000+
Browsery by Barnes & Noble : 5,000+
bp e-store : 1,000+
Brilliant Quotes: Life, Love, Family & Motivation : 1,000,000+
BR Ambedkar Biography & Quotes : 10,000+
BU Alsace : 100+
Catholic La Bu Zo Kam : 500+
Khrifa Hla Bu (Solfa) : 10+
Kristian Hla Bu : 10,000+
SA HLA BU : 1,000+
Learn SAP BW : 500+
Learn SAP BW on HANA : 500+
CA Laws 2018 (California Laws and Codes) : 5,000+
Bootable Methods(USB-CD-DVD) : 10,000+
cloudLibrary : 100,000+
SDA Collegiate Quarterly : 500+
Sabbath School : 100,000+
Cypress College Library : 100+
Stats Royale for Clash Royale : 1,000,000+
GATE 21 years CS Papers(2011-2018 Solved) : 50+
Learn CT Scan Of Head : 5,000+
Easy Cv maker 2018 : 10,000+
How to Write CV : 100,000+
CW Nuclear : 1,000+
CY Spray nozzle : 10+
BibleRead En Cy Zh Yue : 5+
CZ-Help : 5+
Modlitební knížka CZ : 500+
Guide for DB Xenoverse : 10,000+
Guide for DB Xenoverse 2 : 10,000+
Guide for IMS DB : 10+
DC HSEMA : 5,000+
DC Public Library : 1,000+
Painting Lulu DC Super Friends : 1,000+
Dictionary : 10,000,000+
Fix Error Google Playstore : 1,000+
D. H. Lawrence Poems FREE : 1,000+
Bilingual Dictionary Audio App : 5,000+
DM Screen : 10,000+
wikiHow: how to do anything : 1,000,000+
Dr. Doug's Tips : 1,000+
Bible du Semeur-BDS (French) : 50,000+
La citadelle du musulman : 50,000+
DV 2019 Entry Guide : 10,000+
DV 2019 - EDV Photo & Form : 50,000+
DV 2018 Winners Guide : 1,000+
EB Annual Meetings : 1,000+
EC - AP & Telangana : 5,000+
TN Patta Citta & EC : 10,000+
AP Stamps and Registration : 10,000+
CompactiMa EC pH Calibration : 100+
EGW Writings 2 : 100,000+
EGW Writings : 1,000,000+
Bible with EGW Comments : 100,000+
My Little Pony AR Guide : 1,000,000+
SDA Sabbath School Quarterly : 500,000+
Duaa Ek Ibaadat : 5,000+
Spanish English Translator : 10,000,000+
Dictionary - Merriam-Webster : 10,000,000+
JW Library : 10,000,000+
Oxford Dictionary of English : Free : 10,000,000+
English Hindi Dictionary : 10,000,000+
English to Hindi Dictionary : 5,000,000+
EP Research Service : 1,000+
Hymnes et Louanges : 100,000+
EU Charter : 1,000+
EU Data Protection : 1,000+
EU IP Codes : 100+
EW PDF : 5+
BakaReader EX : 100,000+
EZ Quran : 50,000+
FA Part 1 & 2 Past Papers Solved Free – Offline : 5,000+
La Fe de Jesus : 1,000+
La Fe de Jesús : 500+
Le Fe de Jesus : 500+
Florida - Pocket Brainbook : 1,000+
Florida Statutes (FL Code) : 1,000+
English To Shona Dictionary : 10,000+
Greek Bible FP (Audio) : 1,000+
Golden Dictionary (FR-AR) : 500,000+
Fanfic-FR : 5,000+
Bulgarian French Dictionary Fr : 10,000+
Chemin (fr) : 1,000+
The SCP Foundation DB fr nn5n : 1,000+
The Books & Reference genre seems to contain a variety of apps, such as eBook readers, dictionaries, and programming language references.
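With a list this long, it can help to count how many apps fall into each install tier rather than reading every line. The sketch below is one possible way to do that. The tier boundaries and the names `install_tier` and `parse_installs` are assumptions chosen for illustration, and the sample is a handful of install strings in the same format as the output above.

```python
from collections import Counter


def parse_installs(text):
    """Convert an install string such as '1,000,000+' to a float."""
    return float(text.replace("+", "").replace(",", ""))


def install_tier(installs):
    """Assign an install count to a coarse popularity tier (boundaries are arbitrary)."""
    if installs >= 100_000_000:
        return "100M+"
    if installs >= 1_000_000:
        return "1M to 100M"
    return "under 1M"


# A few install strings in the data set's format, used only as an example
sample = [
    "1,000,000,000+",
    "100,000,000+",
    "10,000,000+",
    "5,000,000+",
    "1,000,000+",
    "500,000+",
    "1,000+",
]

tiers = Counter(install_tier(parse_installs(text)) for text in sample)
for tier, count in tiers.items():
    print(tier, ":", count)
```

Apps in the middle tier are the more realistic comparison for a new app, since the very largest ones are hard to compete with.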
We also see successful apps built around the Bible and the Quran, which suggests that taking a recent popular book and turning it into an app could be profitable.
Instead of shipping only the raw text of the book, we should add features that make the app more engaging, for example quizzes, a quote of the day, news and interviews about the book, an audio version, and a built-in dictionary.
Conclusion
In this project, we analyzed data about Google Play and App Store apps to recommend a genre of app that is likely to succeed on both markets.
Based on my analysis, I recommended to my friend that taking a recent popular book and turning it into an app will be profitable. To make the app more appealing to users, we should add features such as an audio version of the book, daily quizzes on the book, links to news items or interviews about the book, a built-in dictionary, and a forum to encourage discussion about the book.