MACHINE LEARNING BASED RAINFALL PREDICTION
Abstract:
Machine learning and feature selection play a vital role in the internet and health sectors as well.
Rainfall prediction is important because heavy rainfall can lead to many disasters; the prediction
helps people to take preventive measures, and moreover the prediction should be accurate. There
are two types of prediction: short-term rainfall prediction and long-term rainfall prediction.
Short-term prediction usually gives us accurate results, so the main challenge is to build a model
for long-term rainfall prediction. Heavy precipitation prediction is a major problem for
meteorological departments because it is closely associated with the economy and with human
life. It is a cause of natural disasters like floods and droughts that are encountered by people
across the world each year. The accuracy of rainfall forecasting is of great importance for
countries like India, whose economy is largely dependent on agriculture.
Rainfall prediction is one of the important techniques for predicting the climatic conditions in any
country. This paper proposes a rainfall prediction model that applies Logistic Regression (LR)
and Random Forest (RF) to the dataset. The input data contain multiple meteorological
parameters, and the aim is to predict the rainfall more precisely. From the results, the proposed
machine learning model provides better results than the other algorithms in the literature. The
goal of this project is to develop an appropriate machine learning tool which can predict whether
it will rain or not. The algorithms used here are Logistic Regression and Random Forest.
TABLE OF CONTENTS
CHAPTER NO.    TITLE
1.
CHAPTER 1 : INTRODUCTION
1.1 GENERAL
1.1.1 THE MACHINE LEARNING SYSTEM
1.1.2 FUNDAMENTAL
1.2 JUPYTER
1.3 MACHINE LEARNING
1.4 CLASSIFICATION TECHNIQUES
1.4.1 NEURAL NETWORK AND DEEP LEARNING
1.4.2 METHODOLOGIES - GIVEN INPUT AND EXPECTED
OUTPUT
1.5 OBJECTIVE AND SCOPE OF THE PROJECT
1.6 EXISTING SYSTEM
1.6.1 DISADVANTAGES OF EXISTING SYSTEM
1.6.2 LITERATURE SURVEY
1.7 PROPOSED SYSTEM
1.7.1 PROPOSED SYSTEM ADVANTAGES
2.
CHAPTER 2 : PROJECT DESCRIPTION
2.1 INTRODUCTION
2.2 DETAILED DIAGRAM
2.2.1 FRONT END DESIGN
2.2.2 BACK END FLOW
2.3 SYSTEM SPECIFICATION
2.3.1 HARDWARE REQUIREMENTS
2.3.2 SOFTWARE REQUIREMENTS
2.4 MODULE DESCRIPTION
2.4.1 DATA COLLECTION
2.4.2 DATA AUGMENTATION
2.4.3 DATA SPLITTING
2.4.4 CLASSIFICATION
2.4.5 PERFORMANCE METRICS
2.4.6 CONFUSION MATRIX
2.5 MODULE DIAGRAM
2.5.1 SYSTEM ARCHITECTURE
2.5.2 USE CASE DIAGRAM
2.5.3 CLASS DIAGRAM
2.5.4 ACTIVITY DIAGRAM
2.5.5 SEQUENCE DIAGRAM
2.5.6 STATE FLOW DIAGRAM
2.5.7 FLOW DIAGRAM
3.
CHAPTER 3 : SOFTWARE SPECIFICATION
3.1 GENERAL
3.2 ANACONDA
3.3 PYTHON
3.3.1 SCIENTIFIC AND NUMERIC COMPUTING
3.3.2 CREATING SOFTWARE PROTOTYPES
3.3.3 GOOD LANGUAGE TO TEACH PROGRAMMING
4.
CHAPTER 4 : IMPLEMENTATION
4.1 GENERAL
4.2 IMPLEMENTATION CODING
4.3 SNAPSHOTS
5.
CHAPTER 5 : CONCLUSION & REFERENCES
5.1 CONCLUSION
5.2 REFERENCES
CHAPTER I
INTRODUCTION
1.1 GENERAL
Glossary and Key Terms
This section provides a quick reference for several algorithms that are not explicitly mentioned
in this chapter, but may be of interest to the reader. This should provide the reader with some
keywords or useful points of reference for other similar libraries to those discussed in this
chapter.
BIDMach is a GPU-accelerated machine learning library for algorithms that are not necessarily
neural network based.
Caret provides a standardised API for many of the most useful machine learning packages for
R. For readers who are more comfortable with R, Caret provides a good substitute for Python’s
SciKit-Learn.
Mathematica is a commercial symbolic mathematical computation system, developed since
1988 by Wolfram, Inc. It provides powerful machine learning techniques “out of the box” such
as image classification [4].
MATLAB is short for MATrix LABoratory, which is a commercial numerical computing
environment, and is a proprietary programming language by MathWorks. It is very popular at
universities where it is often licensed. It was originally built on the idea that most computing
applications in some way rely on storage and manipulations of one fundamental object—the
matrix, and this is still a popular approach.
R is used extensively by the statistics community. The software package Caret provides a
standardised API for many of R’s machine learning libraries.
WEKA is short for the Waikato Environment for Knowledge Analysis [6] and has been a very
popular open source tool since its inception in 1993. In 2005, Weka received the SIGKDD Data
Mining and Knowledge Discovery Service
Award: it is easy to learn and simple to use, and provides a GUI to many machine learning
algorithms.
Vowpal Wabbit is Microsoft's machine learning library. It is mature and actively developed, with
an emphasis on performance.
Requirements and Installation
The most convenient way of installing the Python requirements for this tutorial is by using the
Anaconda scientific Python distribution. Anaconda is a collection of the most commonly used
Python packages preconfigured and ready to use.
Approximately 150 scientific packages are included in the Anaconda installation.
Install the version of Anaconda for your operating system.
All Python software described here is available for Windows, Linux, and Macintosh. All code
samples presented in this tutorial were tested under Ubuntu Linux 14.04 using Python 2.7.
Some code examples may not work on Windows without slight modification (e.g. file paths in
Windows use \ and not / as in UNIX-type systems).
The main software used in a typical Python machine learning pipeline can consist of almost any
combination of the following tools:
1. NumPy, for matrix and vector manipulation
2. Pandas for time series and R-like DataFrame data structures
3. The 2D plotting library matplotlib
4. SciKit-Learn as a source for many machine learning algorithms and utilities
5. Keras for neural networks and deep learning
Managing Packages
Anaconda comes with its own built in package manager, known as Conda. Using the conda
command from the terminal, you can download, update, and delete Python packages. Conda
takes care of all dependencies and ensures that packages are preconfigured to work with all other
packages you may have installed.
Keeping your Python distribution up to date and well maintained is essential in this fast moving
field. However, Anaconda makes it particularly easy to manage and keep your scientific stack up
to date. Once Anaconda is installed you can manage your Python distribution, and all the
scientific packages installed by Anaconda using the conda application from the command line.
To list all packages currently installed, use conda list. This will output all packages and their
version numbers. Updating all Anaconda packages in your system is performed using the conda
update --all command. Conda itself can be updated using the conda update conda command,
while Python can be updated using the conda update python command. To search for packages,
use the search parameter, e.g. conda search stats where stats is the name or partial name of the
package you are searching for.
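For quick reference, the Conda commands described above can be run from a terminal as follows
(the package name stats is just an example):

conda list            # list all installed packages and their versions
conda update --all    # update every Anaconda package
conda update conda    # update Conda itself
conda update python   # update Python
conda search stats    # search for packages matching "stats"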
OBJECTIVE AND SCOPE OF THE PROJECT
 The objective of this project is to show how sentiment analysis can help improve the
user experience over a social network or system interface.
 The learning algorithm will learn what our emotions are from statistical data and then
perform sentiment analysis.
 Our main objective is also to maintain accuracy in the final result.
 The main goal of such a sentiment analysis is to discover how the audience perceives the
television show. The Twitter data that is collected will be classified into two categories:
positive or negative. An analysis will then be performed on the classified data to investigate
what percentage of the audience sample falls into each category.
 Particular emphasis is placed on evaluating different machine learning algorithms for the
task of Twitter sentiment analysis.
Jupyter
Jupyter, previously known as IPython Notebook, is a web-based, interactive development
environment. Originally developed for Python, it has since expanded to support over 40 other
programming languages including Julia and R.
Jupyter allows notebooks to be written that contain text, live code, images, and equations.
These notebooks can be shared, and can even be hosted on GitHub for free.
For each section of this tutorial, you can download a Jupyter notebook that allows you to edit and
experiment with the code and examples for each topic. Jupyter is part of the Anaconda
distribution; it can be started from the command line using the jupyter command:
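For example, to launch the Notebook interface from a terminal (assuming Anaconda's
installation directory is on your PATH):

jupyter notebook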
Machine Learning
We will now move on to the task of machine learning itself. In the following sections we will
describe how to use some basic algorithms, and perform regression, classification, and clustering
on some freely available medical datasets concerning breast cancer and diabetes, and we will
also take a look at a DNA microarray dataset.
SciKit-Learn
SciKit-Learn provides a standardised interface to many of the most commonly used machine
learning algorithms, and is the most popular and frequently used library for machine learning for
Python. As well as providing many learning algorithms, SciKit-Learn has a large number of
convenience functions for common preprocessing tasks (for example, normalisation or k-fold
cross validation).
SciKit-Learn is a very large software library.
Clustering
Clustering algorithms focus on ordering data together into groups. In general clustering
algorithms are unsupervised—they require no y response variable as input. That is to say, they
attempt to find groups or clusters within data where you do not know the label for each sample.
SciKit-Learn has many clustering algorithms, but in this section we will demonstrate
hierarchical clustering on a DNA expression microarray dataset using an algorithm from the
SciPy library.
We will plot a visualisation of the clustering using what is known as a dendrogram, also using
the SciPy library.
The goal is to cluster the data properly in logical groups, in this case into the cancer types
represented by each sample’s expression data. We do this using agglomerative hierarchical
clustering, using Ward’s linkage method:
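A minimal sketch of this step is shown below; since the microarray data itself is not bundled
with SciPy, the matrix X here is a random stand-in with one row per sample:

import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram

X = np.random.rand(20, 100)     # stand-in for the expression matrix

Z = linkage(X, method="ward")   # agglomerative clustering, Ward's linkage
dendrogram(Z)                   # visualise the cluster tree
plt.show()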
Classification
Previously we analysed data that was unlabelled—we did not know to what class a sample
belonged (known as unsupervised learning). In contrast to this, a supervised problem deals with
labelled data, where we are aware of the discrete classes to which each sample belongs. When
we wish to predict which class a sample belongs to, we call this a classification problem.
SciKit-Learn has a number of algorithms for classification; in this section we will look at the
Support Vector Machine.
We will work on the Wisconsin breast cancer dataset, split it into a training set and a test set,
train a Support Vector Machine with a linear kernel, and test the trained model on an unseen
dataset. The Support Vector Machine model should be able to predict if a new sample is
malignant or benign based on the features of a new, unseen sample:
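A minimal sketch of this workflow, using the copy of the Wisconsin dataset bundled with
SciKit-Learn; the 70/30 split ratio and the random seed are illustrative choices:

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import classification_report

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

clf = SVC(kernel="linear")    # Support Vector Machine with a linear kernel
clf.fit(X_train, y_train)     # train on the training set
y_pred = clf.predict(X_test)  # predict the unseen test samples

print(classification_report(y_test, y_pred))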
You will notice that the SVM model performed very well at predicting the malignancy of new,
unseen samples from the test set—this can be quantified nicely by printing a number of metrics
using the classification report function. Here, the precision, recall, and F1 score
(F1 = 2 · precision · recall / (precision + recall)) for each class is shown. The support column is a count of the
number of samples for each class.
Support Vector Machines are a very powerful tool for classification. They work well in high
dimensional spaces, even when the number of features is higher than the number of samples.
However, their running time is quadratic in the number of samples, so large datasets can become
difficult to train. Quadratic means that if you increase a dataset in size by 10 times, it will take
100 times longer to train.
Last, you will notice that the breast cancer dataset consisted of 30 features. This makes it
difficult to visualize or plot the data. To aid in visualization of highly dimensional data, we can
apply a technique called dimensionality reduction.
Dimensionality Reduction
Another important method in machine learning, and data science in general, is dimensionality
reduction. For this example, we will look at the Wisconsin breast cancer dataset once again. The
dataset consists of over 500 samples, where each sample has 30 features. The features relate to
images of a fine needle aspirate of breast tissue, and the features describe the characteristics of
the cells present in the images. All features are real values. The target variable is a discrete value
(either malignant or benign) and is therefore a classification dataset.
You will recall from the Iris example in Sect. 7.3 that we plotted a scatter matrix of the data,
where each feature was plotted against every other feature in the dataset to look for potential
correlations (Fig. 3). By examining this plot you could probably find features which would
separate the dataset into groups. Because the dataset only had 4 features we were able to plot
each feature against each other relatively easily. However, as the numbers of features grow, this
becomes less and less feasible, especially if you consider the gene expression example in Sect.
9.4 which had over 6000 features.
One method that is used to handle data that is highly dimensional is Principal Component
Analysis, or PCA. PCA is an unsupervised algorithm for reducing the number of dimensions of a
dataset. For example, for plotting purposes you might want to reduce your data down to 2 or 3
dimensions, and PCA allows
you to do this by generating components, which are combinations of the original features, that
you can then use to plot your data.
PCA is an unsupervised algorithm. You supply it with your data, X, and you specify the number
of components you wish to reduce its dimensionality to. This is known as transforming the data:
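A minimal sketch, again using the bundled Wisconsin dataset; the choice of 2 components is
illustrative:

from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA

X, y = load_breast_cancer(return_X_y=True)

pca = PCA(n_components=2)         # reduce 30 features to 2 components
X_reduced = pca.fit_transform(X)  # transform the data

print(X_reduced.shape)            # (569, 2): one 2-D point per sample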
Again, you would not use this model for new data—in a real world scenario, you would, for
example, perform a 10-fold cross validation on the dataset, choosing the model parameters that
perform best on the cross validation. This model would be much more likely to perform well on
new data. At the very least, you would randomly select a subset, say 30% of the data, as a test set
and train the model on the remaining 70% of the dataset. You would evaluate the model based on
the score on the test set and not on the training set.
NEURAL NETWORKS AND DEEP LEARNING
While a proper description of neural networks and deep learning is far beyond the scope of this
chapter, we will however discuss an example use case of one of the most popular frameworks for
deep learning: Keras.
In this section we will use Keras to build a simple neural network to classify the Wisconsin breast
cancer dataset that was described earlier. Often, deep learning algorithms and neural networks
are used to classify images—convolutional neural networks are especially used for image related
classification. However,
they can of course be used for text or tabular-based data as well. In this section we will build a
standard feed-forward, densely connected neural network and classify a text-based cancer dataset
in order to demonstrate the framework's usage.
In this example we are once again using the Wisconsin breast cancer dataset, which consists of
30 features and 569 individual samples. To make it more challenging for the neural network, we
will use a training set consisting of only 50% of the entire dataset, and test our neural network on
the remaining 50% of the data.
Note: Keras is not installed as part of the Anaconda distribution; to install it, use pip:
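pip install keras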
Keras additionally requires either Theano or TensorFlow to be installed. In the examples in this
chapter we are using Theano as a backend, however the code will work identically for either
backend. You can install Theano using pip, but it has a number of dependencies that must be
installed first. Refer to the Theano and TensorFlow documentation for more information [12].
Keras is a modular API. It allows you to create neural networks by building a stack of modules,
from the input of the neural network, to the output of the neural network, piece by piece until you
have a complete network. Also, Keras can be configured to use your Graphics Processing Unit,
or GPU. This makes training neural networks far faster than if we were to use a CPU. We begin
by importing Keras:
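A minimal sketch of such a network, assuming the 30-feature input described above; the layer
sizes and activation choices here are illustrative, not the report's exact configuration:

from keras.models import Sequential
from keras.layers import Dense

model = Sequential()
# one hidden layer taking the 30 input features
model.add(Dense(16, activation="relu", input_shape=(30,)))
# a single sigmoid output for the binary malignant/benign decision
model.add(Dense(1, activation="sigmoid"))

model.compile(optimizer="adam",
              loss="binary_crossentropy",
              metrics=["accuracy"])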
We may want to view the network's accuracy on the test set (or its loss on the training set) over time
(measured at each epoch), to get a better idea how well it is learning. An epoch is one complete
cycle through the training data.
Fortunately, this is quite easy to plot as Keras’ fit function returns a history object which we can
use to do exactly this:
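A minimal sketch, assuming the model above and a 50/50 train/test split held in X_train,
y_train, X_test and y_test; note that very old Keras versions record the metric under the key
"acc" rather than "accuracy":

history = model.fit(X_train, y_train, epochs=50,
                    validation_data=(X_test, y_test))

import matplotlib.pyplot as plt
plt.plot(history.history["accuracy"], label="training accuracy")
plt.plot(history.history["val_accuracy"], label="test accuracy")
plt.xlabel("epoch")
plt.ylabel("accuracy")
plt.legend()
plt.show()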
This will result in a plot similar to that shown. Often you will also want to plot the loss on the
test set and training set, and the accuracy on the test set and training set.
Plotting the loss and accuracy can be used to see if you are overfitting (you experience tiny loss
on the training set, but large loss on the test set) and to see when your training has plateaued.
Problem Statement:
Rainfall prediction is beneficial, but it is a challenging task. Machine learning techniques
can use computational methods and predict rainfall by retrieving and integrating the hidden
knowledge from the linear and non-linear patterns of past weather data. Various tools and
methods for predicting rain are currently available, but there is still a shortage of accurate results.
Existing methods are failing whenever massive datasets are used for rainfall prediction.
OBJECTIVE:
Predicting rainfall is an application of science and technology for predicting the amount of rain
over an area. The most important thing is to accurately determine the rainfall for the active use of
water resources, for crops, for pre-planning of water resources and for agricultural purposes.
Early rainfall information benefits farmers in better managing their crops and property against
heavy rainfall, and with efficient rainfall information farmers can better contribute to the
economic growth of the country. Prediction of precipitation is necessary to save people's lives
and property from flooding, and it helps people in coastal areas prepare for floods.
SCOPE OF THE PROJECT:
Accurate and precise rainfall prediction is still lacking, although it could assist diverse fields
like agriculture, water conservation and flood prediction. The issue is to formulate the
calculations for rainfall prediction that are based on previous findings and similarities and that
give output predictions which are reliable and appropriate. Imprecise and inaccurate predictions
are not only a waste of time but also a loss of resources, and they lead to inefficient management
of crises such as poor agriculture, poor water reserves and poor flood management. Therefore,
the need is not only to formulate a rainfall predicting system but to build one that is more
accurate and precise than the existing rainfall predictors.
EXISTING SYSTEM
Supervised learning is built to make predictions for unseen input instances. A supervised
learning algorithm takes a known set of input dataset and its known responses to the data
(output) to learn the regression/classification model. An algorithm is used to learn the dataset and
train it to generate the model for prediction of rainfall for the response to new data or test data.
Supervised learning uses classification algorithms and regression techniques to develop
predictive models.
1. NAIVE BAYES:
Naive Bayes classifiers calculate the probability that a sample belongs to a certain category,
based on prior knowledge. They use Bayes' theorem together with the naive assumption that the
effect of a certain feature of a sample is independent of the other features. That means that each
characteristic of a sample contributes independently to the probability of the classification of that
sample, and the classifier outputs the category with the highest probability for the sample. In
Bernoulli Naive Bayes the predictors are boolean variables: the parameters that we use to predict
the class variable take only the values yes or no. The basic idea of the Naive Bayes technique is
to find the probabilities of classes assigned to texts by using the joint probabilities of words and
classes.
2. LOGISTIC REGRESSION:
Logistic regression is basically a supervised classification algorithm. In a classification problem,
the target variable (or output), y, can take only discrete values for a given set of features (or
inputs), X. The logistic regression model describes the relationship between predictors that can
be continuous, binary, and categorical. Logistic regression becomes a classification technique only
when a decision threshold is brought into the picture. The setting of the threshold value is a very
important aspect of logistic regression and is dependent on the classification problem itself. It
predicts the probability that a given data entry belongs to the category numbered as “1”. Just like
Linear regression assumes that the data follows a linear function, Logistic regression models the
data using the sigmoid function.
1.6.1 DISADVANTAGES OF EXISTING SYSTEM
 Methods have performance limitations because of the wide range of variations in the data,
and the amount of data is limited.
 An issue involved in rainfall classification is choosing the required sampling interval for
observation-forecasting of rainfall, which is dependent upon the sampling interval of the input
data.
 Less accuracy.
LITERATURE SURVEY:
1. TITLE: PREDICTION OF RAINFALL USING MACHINE LEARNING
TECHNIQUES
Author: Moulana Mohammed, Roshitha Kolapalli, Niharika Golla, Siva Sai Maturi
YEAR: - 2020
Abstract:
Rainfall prediction is important as heavy rainfall can lead to many disasters. The prediction helps
people to take preventive measures and moreover the prediction should be accurate. There are
two types of prediction: short-term rainfall prediction and long-term rainfall prediction. Short-term
prediction usually gives us accurate results. The main challenge is to build a model for long-term
rainfall prediction. Heavy precipitation prediction is a major problem for meteorological
departments because it is closely associated with the economy and with human life. It is a cause
of natural disasters like floods and droughts that are encountered by people across the world each
year. Accuracy of rainfall forecasting has great importance for countries like India whose
economy is largely dependent on agriculture. Owing to the dynamic nature of the atmosphere,
statistical techniques fail to provide good accuracy for precipitation forecasting. The prediction
of precipitation using machine learning techniques may use regression. The intention of this
project is to offer non-experts easy access to the techniques and approaches utilized in the field
of precipitation prediction and to provide a comparative study among the various machine
learning techniques.
2. TITLE: RAINFALL PREDICTION USING MACHINE LEARNING
ALGORITHMS
Author: Kumar Arun, Garg Ishan, Kaur Sanmeet
YEAR: - 2019
Abstract:
This paper introduces current supervised learning models which are based on machine learning
algorithms for rainfall prediction in India. Rainfall is always a major issue across the world as it
affects all the major factors on which human beings depend. At present, accurate rainfall
prediction is a challenging task. We apply rainfall data of India to different machine learning
algorithms and compare the accuracy of classifiers such as SVM, Naive Bayes, Logistic
Regression, Random Forest and Multilayer Perceptron (MLP). Our motive is to get the optimized
result and a better rainfall prediction.
3. TITLE: A NEURAL NETWORK BASED LOCAL RAINFALL
PREDICTION
Author: Tomoaki Kashiwao, Koichi Nakayama, Shin Ando
YEAR: - 2017
Abstract:
In this study, we develop and test a local rainfall (precipitation) prediction system based on
artificial neural networks (ANNs). Our system can automatically obtain meteorological data used
for rainfall prediction from the Internet. Meteorological data from equipment installed at a local
point is also shared among users in our system. The final goal of the study was the practical use
of “big data” on the Internet as well as the sharing of data among users for accurate rainfall
prediction. We predicted local rainfall in regions of Japan using data from the Japan
Meteorological Agency (JMA). As neural network (NN) models for the system, we used a multi-
layer perceptron (MLP) with a hybrid algorithm composed of back-propagation (BP) and random
optimization (RO) methods, and radial basis function network (RBFN) with a least squares
method (LSM), and compared the prediction performance of the two models. Precipitation (total
amount of rainfall above 0.5 mm between 12:00 and 24:00 JST (Japan standard time)) at
Matsuyama, Sapporo, and Naha in 2012 was predicted by NNs using meteorological data for
each city from 2011. The volume of precipitation was also predicted (total amount above 1.0 mm
between 17:00 and 24:00 JST) at 16 points in Japan and compared with predictions by the JMA
in order to verify the universality of the proposed system. The experimental results showed that
precipitation in Japan can be predicted by the proposed method, and that the prediction
performance of the MLP model was superior to that of the RBFN model for the rainfall
prediction problem. However, the results were not better than those generated by the JMA.
Finally, heavy rainfall (above 10 mm/h) in summer (Jun.–Sep.) afternoons (12:00–24:00 JST) in
Tokyo in 2011 and 2012 was predicted using data for Tokyo between 2000 and 2010. The results
showed that the volume of precipitation could be accurately predicted and the caching rate of
heavy rainfall was high. This suggests that the proposed system can predict unexpected local
heavy rainfalls as “guerrilla rainstorms.”
4. TITLE: APPLICATION OF THE DEEP LEARNING FOR THE
PREDICTION OF RAINFALL IN SOUTHERN TAIWAN
Author: Meng-Hua Yen, Ding-Wei Liu, Yi-Chia Hsin, Chu-En Lin
YEAR: - 2018
Abstract:
Precipitation is useful information for assessing vital water resources, agriculture, ecosystems
and hydrology. Data-driven model predictions using deep learning algorithms are promising for
these purposes. Echo state network (ESN) and Deep Echo state network (DeepESN), referred to
as Reservoir Computing (RC), are effective and speedy algorithms to process a large amount of
data. In this study, we used the ESN and the DeepESN algorithms to analyze the meteorological
hourly data from 2002 to 2014 at the Tainan Observatory in southern Taiwan. The results
show that the correlation coefficient obtained by using the DeepESN was better than that
obtained by using the ESN and commercial neural network algorithms (back-propagation
network (BPN) and support vector regression (SVR); MATLAB, The MathWorks Co.), and the accuracy of predicted
rainfall by using the DeepESN can be significantly improved compared with those by using
ESN, the BPN and the SVR. In sum, the DeepESN is a trustworthy and good method to predict
rainfall; it could be applied to global climate forecasts which need high-volume data processing.
5. TITLE: RAINFALL PREDICTION USING MACHINE LEARNING AND
NEURAL NETWORK
Author: Kaushik Dutta, Gouthaman. P
YEAR: - 2020
Abstract:
Rainfall prediction models based mainly on artificial neural networks have been proposed in India
until now. This research work does a comparative study of two rainfall prediction approaches
and finds the more accurate one. The present technique to predict rainfall doesn't work well with
the complex data present. The approaches which are being used nowadays are statistical
methods and numerical methods, which don't work accurately when there is any non-linear
pattern. Existing systems fail whenever the complexity of the datasets containing past
rainfall increases. Henceforth, to find the best way to predict rainfall, a study of both machine
learning and neural networks is performed and the algorithm which gives more accuracy is
further used in prediction. Recently, rainfall has been considered the primary source of most of
the economy of our country, agriculture being the main economy-driving sector. To make a
proper investment in agriculture, a proper estimation of rainfall is needed. Along with
agriculture, rainfall prediction is needed for the people in coastal areas. People in coastal areas
are at high risk of heavy rainfall and floods, so they should be aware of the rainfall much earlier
so that they can plan their stay accordingly. Areas which have less rainfall and face water
scarcity should have rainwater harvesters, which can collect the rainwater. To establish a proper
rainwater harvester, rainfall estimation is required. Weather forecasting is the easiest and fastest
way to get a greater outreach. This research work can be used by all the weather forecasting
channels, so that the prediction news can be more accurate and can spread to all parts of the
country.
6. TITLE: STUDY OF SHORT TERM RAIN FORECASTING USING
MACHINE LEARNING BASED APPROACH
Author: M. S. Balamurugan & R. Manojkumar
YEAR: - 2019
Abstract:
Weather forecasting is still dependent on statistical and numerical analysis in most parts of
the world. Though statistical and numerical analysis provides better results, it highly depends on
stable historical relationships with the predictand and the predicted value of the predictand at a
future time. On the other hand, machine learning explores new algorithmic approaches to
prediction which are based on data-driven prediction. Climatic changes for a location are
dependent on variable factors like temperature, precipitation, atmospheric pressure, humidity,
wind speed and combinations of other such factors which are variable in nature. Since climatic
changes are location-based, statistical and numerical approaches fail at times and an alternative
method, such as a machine learning based study, is needed for understanding the weather forecast.
In this study it has been observed that the percentage departure of rainfall ranged from 46 to 91%
for the month of June 2019 as per the Indian Meteorological Department (IMD) using the
traditional forecasting methods, whereas the following study, implemented using machine
learning, was able to achieve much better rainfall prediction compared to statistical methods.
1.7 PROPOSED SYSTEM
The proposed work uses regression analysis. Regression analysis deals with the dependence of
one variable (called the dependent variable) on one or more other variables (called the
independent variables), which is useful for estimating and/or predicting the mean or average
value of the former in terms of known or fixed values of the latter. For example, the salary of a
person is based on his/her experience: here, the experience attribute is the independent variable
and salary is the dependent variable. Simple linear regression defines the relationship between a
single dependent variable and a single independent variable. The equation below is the general
form of regression:

y = β0 + β1x + ε

where β0 and β1 are parameters, and ε is a probabilistic error term. Regression
analysis is a vital tool for modeling and analyzing information. It is used for predictive analysis
that is forecasting of rainfall or weather, predicting trends in business, finance, and marketing. It
can also be used for correcting errors and also provide quantitative support. The advantages of
regression analysis are:
1. It is a powerful technique for testing the relationship between one dependent variable and
many independent variables.
2. It allows researchers to control extraneous factors.
3. Regression assesses the cumulative effect of multiple factors.
4. It also helps to attain the measure of error using the regression line as a base for estimations.
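As a minimal sketch, the equation above can be fitted with SciKit-Learn; the data here is a
random stand-in for a single meteorological predictor and the observed rainfall:

import numpy as np
from sklearn.linear_model import LinearRegression

x = np.arange(20).reshape(-1, 1)                 # single predictor column
y = 3.0 + 2.0 * x.ravel() + np.random.randn(20)  # noisy linear response

reg = LinearRegression().fit(x, y)
print(reg.intercept_, reg.coef_)                 # estimates of β0 and β1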
ARCHITECTURE FOR PROPOSED SYSTEM:
Proposed approach:
The back-propagation technique works well with less complex systems, but as the complexity of
the system increases the back-propagation method's accuracy decreases. This process deals with
four inputs and three outputs. The following four inputs are used:
1. Air temperature
2. Air humidity
3. Wind speed
4. Sunshine duration
The following three outputs are used:
1. Rainfall
2. Medium rainfall
3. High rainfall
The steps associated with the proposed system are input of data, preprocessing of data, splitting
of data, training of the algorithms, testing on the dataset, comparing both algorithms, selecting
the best algorithm, prediction with the more accurate algorithm, and the result at the end. The
main reason for not doing prediction with both algorithms is to reduce the complexity of the
whole system, so the system first finds the more accurate algorithm between machine learning
and the neural network and accordingly does prediction with the better one. The results will be
received in the form of graphs and Excel sheets. For preprocessing, all results will be received in
the form of different graphs; for machine learning and the neural network, the accuracy will be
received in the form of metrics as well as an Excel sheet, and accordingly the predicted values
will be received in the form of an Excel sheet which will contain two columns, ID and predicted
value. The IDs will be the same as those in the datasheet. To find out for which region a
prediction is being done, the IDs should be matched with the IDs present in the dataset.
PROPOSED SYSTEM ADVANTAGES
 Speed and very low complexity, which makes it very well suited to operating in real
scenarios.
 The computation load needed for image processing purposes is much reduced, combined with
very simple classifiers.
 Ability to learn and extract complex image features.
 With its simplicity and fast processing time, the proposed algorithm is suitable to be
implemented in an embedded system or mobile application that has limited processing resources.
CHAPTER 2
PROJECT DESCRIPTION
2.1 INTRODUCTION
In today's situation, rainfall is considered to be one of the factors chiefly responsible for
most of the significant things across the world. In India, agriculture is considered to be one of the
important factors in deciding the economy of the country, and agriculture is solely dependent on
rainfall. Apart from that, in coastal areas across the world, getting to know the amount of
rainfall is very necessary. In some areas which have water scarcity, prior prediction of rainfall
should be done in order to establish rainwater harvesters.
This project deals with the prediction of rainfall using machine learning & neural networks. The
project performs a comparative study of machine learning approaches and neural network
approaches and then accordingly portrays the more efficient approach for rainfall prediction.
First of all, preprocessing is performed. For machine learning, LASSO regression is used, and
for the neural network, an ANN (artificial neural network) approach is used. After calculation,
the types of errors and the accuracy of both LASSO and ANN are compared and a conclusion is
drawn accordingly. To reduce the system's complexity, the prediction has
been done with the approach that has better accuracy. The prediction has been done using the
dataset which contains rainfall data from 1901 to 2015 for different regions across the
country.
It contains month-wise data as well as annual rainfall data. Currently, rainfall prediction has
become one of the key factors for most of the water conservation systems in and across the
country. One of the biggest challenges is the complexity present in rainfall data. Most rainfall
prediction systems nowadays are unable to find the hidden layers or any non-linear patterns
present in the data. This project will assist in finding all the hidden layers as well as the non-
linear patterns, which is useful for performing precise prediction of rainfall [1].
Rainfall prediction is the task of predicting the rainfall in a given region. It can be done in
two ways. The first is to analyze the physical laws that affect rainfall, and the second is to
build a system which discovers the hidden patterns or features that affect the physical
factors and the process involved in achieving it. The second one is better because it doesn't
involve complicated mathematical calculations and can be useful for complex and non-linear data
[2]. With systems that don't find the hidden layers and non-linear patterns accurately, the
predictions turn out to be wrong most of the time, and that may lead to huge losses. So, the main
objective of this research work is to find a system that can resolve both of these issues, i.e. one
able to handle the complexity as well as find the hidden layers present, which will give proper
and accurate predictions, thereby assisting the country to develop when it comes to agriculture
and the economy.
2.2 DETAILED DIAGRAM
2.2.1 Back End Module Diagrams:
FRONT END:
2.3 SYSTEM SPECIFICATION:
2.3.1 HARDWARE REQUIREMENTS:
The hardware requirements may serve as the basis for a contract for the implementation of the
system and should therefore be a complete and consistent specification of the whole system.
They are used by software engineers as the starting point for the system design. It shows what
the system does and not how it should be implemented.
PROCESSOR : Intel I5
RAM : 4GB
HARD DISK : 40 GB
2.3.2 SOFTWARE REQUIREMENTS:
The software requirements document is the specification of the system. It should include
both a definition and a specification of requirements. It is a set of what the system should
do rather than how it should do it. The software requirements provide a basis for creating
the software requirements specification. It is useful in estimating cost, planning team
activities, performing tasks and tracking the team's progress
throughout the development activity.
PYTHON IDE : Anaconda Jupyter Notebook
PROGRAMMING LANGUAGE : Python
MODULES:
DATASET
The dataset used in this system contains the rainfall of several regions in and across the country.
It contains rainfall from 1901 to 2015 for these regions, along with annual rainfall and the
rainfall during the transition between consecutive months. There are in total 4116 rows in the
dataset. The dataset was collected from data.gov.in. Category: Rainfall in India. Released under:
NDSAP. Contributor: Ministry of Earth Sciences, IMD. Group: Rainfall. Sectors: Atmospheric
science, earth sciences, science & technology.
DATA CLEANING:
In this module the data is cleaned. After cleaning, the data is grouped as per requirement; this
grouping of data is known as data clustering. Then we check whether there are any missing
values in the dataset. If there is a missing value, it is replaced by a default value. After that, if
any data needs its format changed, this is done. This whole process before prediction is known
as data pre-processing. After that, the data is used for the prediction and forecasting step.
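A minimal sketch of these cleaning steps with pandas, assuming the data.gov.in file has been
saved locally as rainfall_india.csv (an illustrative filename):

import pandas as pd

df = pd.read_csv("rainfall_india.csv")

print(df.isnull().sum())                    # count missing values per column
df = df.fillna(df.mean(numeric_only=True))  # replace gaps with a default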
Data Prediction and forecasting:
In this step, the pre-processed data is taken for the prediction. This prediction can be done by
any of the processes mentioned above, but the Linear Regression algorithm scores higher
prediction accuracy than the other algorithms. So, in this project the linear regression method is
used for the prediction. For that, the pre-processed data is split for training and testing. Then a
predictive object is created to predict the test values, having been trained on the training values.
The object is then used to forecast data for the next few years.
DATA SPLITTING:
For each experiment, we split the entire dataset into a 70% training set and a 30% test set. We
used the training set for resampling, hyperparameter tuning, and training the model, and we used
the test set to test the performance of the trained model. While splitting the data, we specified a
random seed (any fixed number), which ensured the same data split every time the program
executed.
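A minimal sketch of this split; the feature matrix and labels here are random stand-ins, and the
seed value 42 is arbitrary:

import numpy as np
from sklearn.model_selection import train_test_split

X = np.random.rand(100, 4)        # stand-in meteorological features
y = np.random.randint(0, 2, 100)  # stand-in rain / no-rain labels

# 70/30 split with a fixed random seed for a reproducible partition
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)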
TRAINING AND TESTING:
Algorithms learn from data. They find relationships, develop understanding, make decisions, and
evaluate their confidence from the training data they’re given. And the better the training data is,
the better the model performs.
In fact, the quality and quantity of your training data has as much to do with the success of your
data project as the algorithms themselves.
Now, even if you’ve stored a vast amount of well-structured data, it might not be labeled in a
way that actually works for training your model. For example, autonomous vehicles don’t just
need pictures of the road, they need labeled images where each car, pedestrian, street sign and
more are annotated; sentiment analysis projects require labels that help an algorithm understand
when someone’s using slang or sarcasm; chatbots need entity extraction and careful syntactic
analysis, not just raw language.
In other words, the data you want to use for training usually needs to be enriched or labeled. Or
you might just need to collect more of it to power your algorithms. But chances are, the data
you’ve stored isn’t quite ready to be used to train your classifiers.
Because if you’re trying to make a great model, you need great training data. And we know a
thing or two about that. After all, we’ve labeled over 5 billion rows of data for some of the most
innovative companies in the world. Whether it’s images, text, audio, or, really, any other kind of
data, we can help create the training set that makes your models successful.
REGRESSION:
Random Forest:
Random forest is a supervised learning algorithm. The "forest" it builds is an ensemble of
decision trees, usually trained with the "bagging" method. The general idea of the bagging
method is that a combination of learning models increases the overall result.
Put simply: random forest builds multiple decision trees and merges them together to get a
more accurate and stable prediction.
One big advantage of random forest is that it can be used for both classification and regression
problems, which form the majority of current machine learning systems. Let's look at random
forest in classification, since classification is sometimes considered the building block of
machine learning. Picture, for example, a forest built from two decision trees whose outputs are
combined into a single prediction.
Random forest has nearly the same hyperparameters as a decision tree or a bagging classifier.
Fortunately, there's no need to combine a decision tree with a bagging classifier because you
can easily use random forest's classifier class. With random forest, you can also deal with
regression tasks by using the algorithm's regressor.
Random forest adds additional randomness to the model, while growing the trees. Instead of
searching for the most important feature while splitting a node, it searches for the best feature
among a random subset of features. This results in a wide diversity that generally results in a
better model. Therefore, in random forest, only a random subset of the features is taken into
consideration by the algorithm for splitting a node. You can even make trees more random by
additionally using random thresholds for each feature rather than searching for the best possible
thresholds (like a normal decision tree does).
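A minimal sketch, reusing the X_train/X_test split from the data splitting step; the
hyperparameter values are illustrative defaults, not tuned settings:

from sklearn.ensemble import RandomForestClassifier

# 100 trees, each grown on a bootstrap sample with random feature subsets
rf = RandomForestClassifier(n_estimators=100, random_state=42)
rf.fit(X_train, y_train)

print(rf.score(X_test, y_test))  # mean accuracy on the held-out set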
Logistic Regression:
It is a classification, not a regression, algorithm. It is used to estimate discrete values (binary
values like 0/1, yes/no, true/false) based on a given set of independent variables. In simple
words, it predicts the probability of occurrence of an event by fitting data to a logit function.
Hence, it is also known as logit regression. Since it predicts a probability, its output values lie
between 0 and 1 (as expected). Mathematically, the log odds of the outcome are modelled as a
linear combination of the predictor variables:
odds = p / (1 - p) = probability of the event occurring / probability of the event not occurring

logit(p) = ln(p / (1 - p)) = b0 + b1X1 + b2X2 + b3X3 + ... + bkXk
As we are classifying text on the basis of a wide feature set, with a binary output (true/false or
true article/fake article), a logistic regression (LR) model is used, since it provides an intuitive
equation to classify problems into binary or multiple classes. We performed hyperparameter
tuning to get the best result for all individual datasets, and multiple parameters were tested
before acquiring the maximum accuracies from the LR model.
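A minimal sketch of logistic regression as a rain / no-rain classifier, reusing the X_train/X_test
split from the data splitting step; 0.5 is SciKit-Learn's default decision threshold:

from sklearn.linear_model import LogisticRegression

clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)                # fit the sigmoid model

probs = clf.predict_proba(X_test)[:, 1]  # probability of class "1" (rain)
preds = (probs >= 0.5).astype(int)       # apply the decision threshold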
CONFUSION MATRIX:
It is the most commonly used evaluation metric in predictive analysis, mainly because it is very
easy to understand and it can be used to compute other essential metrics such as accuracy, recall,
precision, etc. It is an NxN matrix that describes the overall performance of a model when used
on some dataset, where N is the number of class labels in the classification problem.
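A minimal sketch, assuming y_test and the preds produced by the logistic regression sketch
above:

from sklearn.metrics import confusion_matrix, accuracy_score

cm = confusion_matrix(y_test, preds)  # N x N counts of true vs. predicted
print(cm)
print(accuracy_score(y_test, preds))  # accuracy derived from the matrix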
PERFORMANCE EVALUATION:
ACCURACY:
Though the training accuracy proved to be good and peaked at around 90%, the validation results
seem unsatisfying. The model has shown better results on the training data than on the test
sample; our model yields better results with the training set than with the test set. This particular
result occurs due to overfitting. A model built on data with no preprocessing can cause such
overfitting events to occur. Hence, in certain cases the classifier can be subject to overfitting and
perform poorly on the test data.
Model Loss
The loss function that we considered was binary cross-entropy. When we use this function, the
training set can be seen to be improving in its overall loss, but in reality the test data suggests
otherwise: the loss on the test data actually increases compared to the training sample. The
increase in loss can be attributed to overfitting. In Figure A.3, as more epochs are considered,
the loss on the training set decreases, while the test set starts with a lower loss value but, as more
epochs are considered, its loss actually increases. This illustrates the drawback of the
architecture and methodology being used. At 10 epochs, the losses of both sets are almost equal.
After 10 epochs, the loss of the training set decreases roughly linearly while the loss of the test
set gradually increases. The visual representation and the statistics both confirm the overfitting
of the datasets. However, this can all be addressed by performing normalization and
preprocessing and by adding dropout layers to the network.
2.5 SYSTEM DESIGN:
System design is the process of defining the interfaces, modules and data of a system so that it
satisfies the specified requirements. System design can be seen as the application of systems theory.
The main aim of designing a system is to develop the system architecture by providing the data and
information that are necessary for the implementation of the system.
SYSTEM ARCHITECTURE:
USE CASE DIAGRAM:
Use case diagrams are a way to capture the system's functionality and requirements in
UML diagrams. It captures the dynamic behavior of a live system. A use case diagram
consists of use cases and actors.
DETAILED ARCHITECTURE FLOW
CLASS DIAGRAM:
Class diagrams are the main building block in object-oriented modeling. They are
used to show the different objects in a system, their attributes, their operations and the
relationships among them. The different objects here are the Data owner, Cloud user and Cloud
admin; in this UML diagram, their relationships and properties include uploading the
documents, generating keys for securing the data, maintaining the cloud data, then
downloading using the key and accessing the cloud data.
STATE DIAGRAM:
A state diagram, also known as a state machine diagram or state chart diagram, is
an illustration of the states an object can attain as well as the transitions between
those states in the Unified Modeling Language. Then, all of the possible
existing states are placed in relation to the beginning and the end.
ACTIVITY DIAGRAM:
Activity Diagrams describe how activities are coordinated to provide a service which can
be at different levels of abstraction. Typically, an event needs to be achieved by some
operations, particularly where the operation is intended to achieve a number of different
things that require coordination.
SEQUENCE DIAGRAM:
A sequence diagram is a type of interaction diagram because it describes how and
in what order a group of objects works together. These diagrams are used by software
developers and business professionals to understand requirements for a new system or to
document an existing process.
DATA FLOW DIAGRAM:
Data flow diagrams are used to graphically represent the flow of data in a business
information system. DFD describes the processes that are involved in a system to transfer
data from the input to the file storage and reports generation. Data flow diagrams can be
divided into logical and physical. The logical data flow diagram describes flow of data
through a system to perform certain functionality of a business. The physical data flow
diagram describes the implementation of the logical data flow.
CHAPTER 3
SOFTWARE SPECIFICATION
3.1 GENERAL
3.2 ANACONDA
Anaconda is a free and open-source distribution of the Python and R programming languages for
scientific computing (data science, machine learning applications, large-scale data processing,
predictive analytics, etc.) that aims to simplify package management and deployment.
Anaconda distribution comes with more than 1,500 packages as well as
the Conda package and virtual environment manager. It also includes a GUI, Anaconda
Navigator, as a graphical alternative to the Command Line Interface (CLI).
The big difference between Conda and the pip package manager is in how package dependencies
are managed, which is a significant challenge for Python data science and the reason Conda
exists. Pip installs all Python package dependencies required, whether or not those conflict with
other packages you installed previously.
So your working installation of, for example, Google TensorFlow can suddenly stop working
when you pip install a different package that needs a different version of the NumPy library.
More insidiously, everything might still appear to work, but you now get different results from
your data science, or you are unable to reproduce the same results elsewhere because you didn't
pip install packages in the same order.
Conda analyzes your current environment, everything you have installed, and any version
limitations you specify (e.g. you only want tensorflow>=2.0), and figures out how to install
compatible dependencies. Or it will tell you that what you want can't be done. Pip, by contrast,
will just install the thing you wanted along with its dependencies, even if that breaks other
things. Open-source packages can be individually installed from the Anaconda repository,
Anaconda Cloud (anaconda.org), or your own private repository or mirror, using the conda
install command.
Anaconda, Inc. compiles and builds all the packages in the Anaconda repository itself, and
provides binaries for Windows (32/64-bit), Linux (64-bit) and macOS (64-bit). You can also
install anything on PyPI into a Conda environment using pip, and Conda keeps track of what it
has installed and what pip has installed. Custom packages can be made using the conda build
command, and can be shared with others by uploading them to Anaconda Cloud, PyPI or other
repositories. The default installation of Anaconda2 includes Python 2.7 and Anaconda3 includes
Python 3.7. However, you can create new environments that include any version of Python
packaged with conda.
Anaconda Navigator is a desktop Graphical User Interface (GUI) included in Anaconda
distribution that allows users to launch applications and manage conda packages, environments
and channels without using command-line commands. Navigator can search for packages on
Anaconda Cloud or in a local Anaconda Repository, install them in an environment, run the
packages and update them. It is available for Windows, macOS and Linux.
The following applications are available by default in Navigator:
 JupyterLab
 Jupyter Notebook
 QtConsole
 Spyder
 Glueviz
 Orange
 RStudio
 Visual Studio Code
Microsoft .NET is a set of Microsoft software technologies for rapidly building and integrating
XML Web services, Microsoft Windows-based applications, and Web solutions. The .NET
Framework is a language-neutral platform for writing programs that can easily and securely
interoperate. There is no language barrier with .NET: numerous languages are available to the
developer, including Managed C++, C#, Visual Basic and JScript. The .NET Framework
provides the foundation for components to interact seamlessly, whether locally or remotely on
different platforms. It standardizes common data types and communications protocols so that
components created in different languages can easily interoperate.
".NET" is also the collective name given to various software components built upon the .NET
platform, both products (Visual Studio .NET and Windows .NET Server, for instance) and
services (such as Passport and .NET My Services).
Microsoft Visual Studio is an Integrated Development Environment (IDE) used to develop
computer programs, as well as websites, web apps, web services and mobile apps.
3.3 PYTHON
Python is a powerful multi-purpose programming language created by Guido van Rossum. It has
a simple, easy-to-use syntax, making it a good first language for someone learning computer
programming. Python's main features are:
 Easy to code
 Free and Open Source
 Object-Oriented Language
 GUI Programming Support
 High-Level Language
 Extensible feature
 Python is Portable language
 Python is Integrated language
 Interpreted
 Large Standard Library
 Dynamically Typed Language
Features of Python:
1. Easy to code:
Python is a high-level programming language that is very easy to learn compared to languages
like C, C#, JavaScript or Java. It is very easy to write code in Python, and anybody can learn the
basics in a few hours or days. It is also a developer-friendly language.
2. Free and Open Source:
Python is freely available for download from its official website, python.org. Since it is open
source, the source code is also available to the public, so you can download it, use it, and share it.
3. Object-Oriented Language:
One of the key features of Python is object-oriented programming. Python supports
object-oriented concepts such as classes, objects and encapsulation.
4. GUI Programming Support:
Graphical user interfaces can be built in Python using modules such as PyQt5, PyQt4, wxPython
or Tk. PyQt5 is a popular option for creating graphical apps with Python, as sketched below.
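For illustration, a minimal PyQt5 program is shown below (assuming PyQt5 is installed, e.g. via pip install PyQt5); the label text is an arbitrary placeholder.

# Smallest possible PyQt5 application: a window containing a single label.
import sys
from PyQt5.QtWidgets import QApplication, QLabel

app = QApplication(sys.argv)
label = QLabel("Rainfall prediction GUI placeholder")
label.show()
sys.exit(app.exec_())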
5. High-Level Language:
Python is a high-level language. When we write programs in Python, we do not need to
remember the system architecture, nor do we need to manage memory ourselves.
6. Extensible:
Python is an extensible language: performance-critical parts of a program can be written in C or
C++, compiled, and then called from Python code.
7. Portable:
Python is also a portable language. For example, if we have Python code written on Windows
and we want to run it on another platform such as Linux, Unix or Mac, we do not need to change
it; the same code runs on any platform.
8. Integrated:
Python is also an integrated language, because it can easily be integrated with other languages
such as C and C++; a small example follows.
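As a small, hedged example of this integration, the standard-library ctypes module can call into a compiled C library directly; the sketch below loads the C math library on a Unix-like system (the library name differs on Windows).

# Calling a C function from Python via ctypes (Unix-like systems).
import ctypes
import ctypes.util

libm = ctypes.CDLL(ctypes.util.find_library("m"))   # the C math library
libm.sqrt.restype = ctypes.c_double
libm.sqrt.argtypes = [ctypes.c_double]
print(libm.sqrt(2.0))   # 1.4142...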
9. Interpreted Language:
Python is an interpreted language: code is executed line by line, and unlike languages such as C,
C++ or Java there is no separate compilation step, which makes code easier to debug. The source
code is first translated into an intermediate form called bytecode.
10. Large Standard Library:
Python has a large standard library that provides a rich set of modules and functions, so you do
not have to write your own code for every single task. There are modules for regular
expressions, unit testing, web browsers and much more; a small example follows.
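For example, the standard-library re module handles regular expressions with no extra installation; the sample string here is an arbitrary illustration.

# Extract all numbers from a string using the standard-library re module.
import re
print(re.findall(r"\d+", "Rainfall was 12 mm on day 3"))   # ['12', '3']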
11. Dynamically Typed Language:
Python is a dynamically typed language: the type of a variable (for example int, float or str) is
decided at run time, not in advance, so we do not need to declare variable types, as the short
example below shows.
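A two-line demonstration:

x = 10
print(type(x))        # <class 'int'>
x = "rainfall"        # the same name can later hold a string
print(type(x))        # <class 'str'>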
APPLICATIONS OF PYTHON:
WEB APPLICATIONS
 You can create scalable web apps using frameworks and CMSs (Content Management
Systems) built on Python. Some popular platforms for creating web apps are Django,
Flask, Pyramid, Plone and Django CMS.
 Sites like Mozilla, Reddit, Instagram and PBS are written in Python.
SCIENTIFIC AND NUMERIC COMPUTING
 There are numerous libraries available in Python for scientific and numeric computing.
Libraries such as SciPy and NumPy are used for general-purpose computing, and there
are domain-specific libraries such as EarthPy for earth science, AstroPy for astronomy,
and so on.
 The language is also heavily used in machine learning, data mining and deep learning.
CREATING SOFTWARE PROTOTYPES
 Python is slow compared to compiled languages like C++ and Java, so it might not be a
good choice when resources are limited and efficiency is a must.
 However, Python is a great language for creating prototypes. For example, you can use
Pygame (a library for creating games) to build a game prototype first; if you like the
prototype, you can then use a language like C++ to create the actual game.
GOOD LANGUAGE TO TEACH PROGRAMMING
 Python is used by many companies to teach programming to kids and newcomers.
 It is a good language with a lot of features and capabilities. Yet, it is one of the easiest
languages to learn because of its simple, easy-to-use syntax.
CHAPTER 4
IMPLEMENTATION
4.1 GENERAL
Python itself was not designed as a numerical computing environment, but together with
libraries such as NumPy it has grown into a platform for implementing numerical algorithms,
including linear algebra routines, for a wide range of applications. NumPy's array syntax is
close to standard linear algebra notation, although a few differences (for example, elementwise
rather than matrix semantics for some operators) may cause newcomers some problems at first.
4.2 CODE IMPLEMENTATION
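The project's full implementation coding is not reproduced here; the sketch below is only an outline of the kind of LR & RF pipeline the report describes. The file name weather.csv, the target column RainTomorrow, and the assumption that all remaining columns are numeric are illustrative placeholders, not the project's actual dataset.

# Outline of a rain/no-rain pipeline with Logistic Regression and Random Forest.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

df = pd.read_csv("weather.csv").dropna()          # assumed file; drop incomplete rows
X = df.drop(columns=["RainTomorrow"])             # assumed numeric meteorological features
y = df["RainTomorrow"]                            # assumed rain/no-rain target column

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
scaler = StandardScaler()                         # normalise features before fitting
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

for name, model in [("Logistic Regression", LogisticRegression(max_iter=1000)),
                    ("Random Forest", RandomForestClassifier(n_estimators=100,
                                                             random_state=42))]:
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    print(name, "accuracy:", accuracy_score(y_test, pred))
    print(confusion_matrix(y_test, pred))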
4.3 SNAPSHOTS
RESULT:
CHAPTER 5
CONCLUSION AND REFERENCES
5.1 CONCLUSION
Rainfall forecasting is a daunting task for any algorithm to handle. The algorithms we focused
on were Random Forest (RF) and Logistic Regression (LR). We chose RF & LR because of their
ability to handle larger data, such as the large batch sizes used as input, and because they accept
various types of data; this was a major factor in our decision. The other reason was that they
performed better than other algorithms when handling inconsistencies in the data, such as noise
or incomplete records. Inconsistencies can throw off the accuracy of an algorithm by an
exceptional margin, but RF & LR were capable of handling such data. The final results support
our choice: RF & LR yielded an accuracy of 87%, while the other algorithms reached a
maximum accuracy of 86%. On extremely large datasets, that 1% can make quite a difference in
forecasting. Through our model, we were able to show that RF & LR are viable models for the
field of weather forecasting: they can handle large data, tolerate inconsistencies, and yield higher
accuracies. RF & LR are among the true spearheads in the domain of weather forecasting.
FUTURE WORK:
In future research, we intend to incorporate different ensemble techniques to combine the
diversity of the models and increase forecasting ability. We plan to take data from different
regions to increase the diversity of the dataset and check which model performs well on such
noisy data. The architecture of the network model will be examined further to enhance the
accuracy of predictions. We also intend to deepen our understanding of neural networks by
experimenting with models such as recurrent neural networks (LSTM) and time-delay neural
networks (TDNN). The accuracy of probabilistic models such as Naive Bayes will be examined
as well; to do so, we first need to discretize the continuous features, as sketched below.
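A minimal sketch of that discretization step, assuming scikit-learn: KBinsDiscretizer bins the continuous features before a categorical Naive Bayes model is fit. The random placeholder data and the bin count are illustrative only.

# Discretize continuous features, then fit a categorical Naive Bayes model.
import numpy as np
from sklearn.naive_bayes import CategoricalNB
from sklearn.preprocessing import KBinsDiscretizer

X = np.random.rand(200, 4)               # placeholder continuous features
y = np.random.randint(0, 2, size=200)    # placeholder rain/no-rain labels

disc = KBinsDiscretizer(n_bins=5, encode="ordinal", strategy="uniform")
X_binned = disc.fit_transform(X).astype(int)   # each feature mapped to bins 0..4

nb = CategoricalNB()
nb.fit(X_binned, y)
print("training accuracy:", nb.score(X_binned, y))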
5.2 REFERENCES
1. Manojit Chattopadhyay and Surajit Chattopadhyay, "Elucidating the role of topological
pattern discovery and support vector machine in generating predictive models for Indian
summer monsoon rainfall", Theoretical and Applied Climatology, pp. 1-12, July 2015.
2. Kumar Abhishek, Abhay Kumar, Rajeev Ranjan and Sarthak Kumar, "A Rainfall
Prediction Model using Artificial Neural Network", 2012 IEEE Control and System
Graduate Research Colloquium (ICSGRC 2012), pp. 82-87, 2012.
3. Minghui Qiu, Peilin Zhao, Ke Zhang, Jun Huang, Xing Shi, Xiaoguang Wang, et al., "A
Short-Term Rainfall Prediction Model using Multi-Task Convolutional Neural
Networks", IEEE International Conference on Data Mining, pp. 395-400, 2017.
4. S Aswin, P Geetha and R Vinayakumar, "Deep Learning Models for the Prediction of
Rainfall", International Conference on Communication and Signal Processing, pp. 0657-
0661, April 3–5, 2018.
5. Xianggen Gan, Lihong Chen, Dongbao Yang and Guang Liu, "The Research Of Rainfall
Prediction Models Based On Matlab Neural Network", Proceedings of IEEE CCIS 2011,
pp. 45-48.
6. Cramer Sam, Michael Kampouridis, Alex A. Freitas and Antonis Alexandridis,
"Predicting Rainfall in the Context of Rainfall Derivatives Using Genetic
Programming", 2015 IEEE Symposium Series on Computational Intelligence, pp. 711-
718.
7. Mohini P. Darji, Vipul K. Dabhi and Harshadkumar B. Prajapati, "Rainfall Forecasting
Using Neural Network: A Survey", 2015 International Conference on Advances in
Computer Engineering and Applications (ICACEA), pp. 706-713.
8. Sandeep Kumar Mohapatra, Anamika Upadhyay and Channabasava Gola, "Rainfall
Prediction based on 100 years of Meteorological Data", 2017 International Conference on
Computing and Communication Technologies for Smart Nation, pp. 162-166.
9. Sankhadeep Chatterjee, Bimal Datta, Soumya Sen and Nilanjan Dey, "Rainfall Prediction
using Hybrid Neural Network Approach", 2018 2nd International Conference on Recent
Advances in Signal Processing, Telecommunications & Computing (SigTelCom), pp. 67-
72.
10. Sunil Navadia, Pintukumar Yadav, Jobin Thomas and Shakila Shaikh, "Weather
Prediction: A Novel Approach for Measuring and Analyzing Weather Data", International
Conference on I-SMAC (IoT in Social, Mobile, Analytics and Cloud) (I-SMAC 2017), pp.
414-417.
Ad

More Related Content

Similar to Predicting rainfall with data science in python (20)

employee turnover prediction document.docx
employee turnover prediction document.docxemployee turnover prediction document.docx
employee turnover prediction document.docx
rohithprabhas1
 
Internship Report
Internship ReportInternship Report
Internship Report
Ritoban Gupta
 
Malware analysis
Malware analysisMalware analysis
Malware analysis
Roberto Falconi
 
GEETHAhshansbbsbsbhshnsnsn_INTERNSHIP.pptx
GEETHAhshansbbsbsbhshnsnsn_INTERNSHIP.pptxGEETHAhshansbbsbsbhshnsnsn_INTERNSHIP.pptx
GEETHAhshansbbsbsbhshnsnsn_INTERNSHIP.pptx
Geetha982072
 
IRJET- Factoid Question and Answering System
IRJET-  	  Factoid Question and Answering SystemIRJET-  	  Factoid Question and Answering System
IRJET- Factoid Question and Answering System
IRJET Journal
 
Obj report
Obj reportObj report
Obj report
Manish Raghav
 
FACE COUNTING USING OPEN CV & PYTHON FOR ANALYZING UNUSUAL EVENTS IN CROWDS
FACE COUNTING USING OPEN CV & PYTHON FOR ANALYZING UNUSUAL EVENTS IN CROWDSFACE COUNTING USING OPEN CV & PYTHON FOR ANALYZING UNUSUAL EVENTS IN CROWDS
FACE COUNTING USING OPEN CV & PYTHON FOR ANALYZING UNUSUAL EVENTS IN CROWDS
IRJET Journal
 
Study of R Programming
Study of R ProgrammingStudy of R Programming
Study of R Programming
IRJET Journal
 
HiPEAC 2019 Tutorial - Maestro RTOS
HiPEAC 2019 Tutorial - Maestro RTOSHiPEAC 2019 Tutorial - Maestro RTOS
HiPEAC 2019 Tutorial - Maestro RTOS
Tulipp. Eu
 
Datasciencetools
DatasciencetoolsDatasciencetools
Datasciencetools
jyostnanareshit
 
Distributed Database practicals
Distributed Database practicals Distributed Database practicals
Distributed Database practicals
Vrushali Lanjewar
 
Performance Comparison between Pytorch and Mindspore
Performance Comparison between Pytorch and MindsporePerformance Comparison between Pytorch and Mindspore
Performance Comparison between Pytorch and Mindspore
IJDMS
 
Deepcoder to Self-Code with Machine Learning
Deepcoder to Self-Code with Machine LearningDeepcoder to Self-Code with Machine Learning
Deepcoder to Self-Code with Machine Learning
IRJET Journal
 
report_barc
report_barcreport_barc
report_barc
siontani
 
final_ppt on industrial 6 weeks training in Igap
final_ppt on industrial 6 weeks training in Igapfinal_ppt on industrial 6 weeks training in Igap
final_ppt on industrial 6 weeks training in Igap
namdevmisal
 
Learning Ray, 5th Early Release Max Pumperla
Learning Ray, 5th Early Release Max PumperlaLearning Ray, 5th Early Release Max Pumperla
Learning Ray, 5th Early Release Max Pumperla
gjslndtloto
 
Deep Learning Applications and Image Processing
Deep Learning Applications and Image ProcessingDeep Learning Applications and Image Processing
Deep Learning Applications and Image Processing
ijtsrd
 
Github-Source code management system SRS
Github-Source code management system SRSGithub-Source code management system SRS
Github-Source code management system SRS
Aditya Narayan Swami
 
PARKING ALLOTMENT SYSTEM PROJECT REPORT REPORT.
PARKING ALLOTMENT SYSTEM PROJECT REPORT REPORT.PARKING ALLOTMENT SYSTEM PROJECT REPORT REPORT.
PARKING ALLOTMENT SYSTEM PROJECT REPORT REPORT.
Kamal Acharya
 
IRJET - Automation in Python using Speech Recognition
IRJET -  	  Automation in Python using Speech RecognitionIRJET -  	  Automation in Python using Speech Recognition
IRJET - Automation in Python using Speech Recognition
IRJET Journal
 
employee turnover prediction document.docx
employee turnover prediction document.docxemployee turnover prediction document.docx
employee turnover prediction document.docx
rohithprabhas1
 
GEETHAhshansbbsbsbhshnsnsn_INTERNSHIP.pptx
GEETHAhshansbbsbsbhshnsnsn_INTERNSHIP.pptxGEETHAhshansbbsbsbhshnsnsn_INTERNSHIP.pptx
GEETHAhshansbbsbsbhshnsnsn_INTERNSHIP.pptx
Geetha982072
 
IRJET- Factoid Question and Answering System
IRJET-  	  Factoid Question and Answering SystemIRJET-  	  Factoid Question and Answering System
IRJET- Factoid Question and Answering System
IRJET Journal
 
FACE COUNTING USING OPEN CV & PYTHON FOR ANALYZING UNUSUAL EVENTS IN CROWDS
FACE COUNTING USING OPEN CV & PYTHON FOR ANALYZING UNUSUAL EVENTS IN CROWDSFACE COUNTING USING OPEN CV & PYTHON FOR ANALYZING UNUSUAL EVENTS IN CROWDS
FACE COUNTING USING OPEN CV & PYTHON FOR ANALYZING UNUSUAL EVENTS IN CROWDS
IRJET Journal
 
Study of R Programming
Study of R ProgrammingStudy of R Programming
Study of R Programming
IRJET Journal
 
HiPEAC 2019 Tutorial - Maestro RTOS
HiPEAC 2019 Tutorial - Maestro RTOSHiPEAC 2019 Tutorial - Maestro RTOS
HiPEAC 2019 Tutorial - Maestro RTOS
Tulipp. Eu
 
Distributed Database practicals
Distributed Database practicals Distributed Database practicals
Distributed Database practicals
Vrushali Lanjewar
 
Performance Comparison between Pytorch and Mindspore
Performance Comparison between Pytorch and MindsporePerformance Comparison between Pytorch and Mindspore
Performance Comparison between Pytorch and Mindspore
IJDMS
 
Deepcoder to Self-Code with Machine Learning
Deepcoder to Self-Code with Machine LearningDeepcoder to Self-Code with Machine Learning
Deepcoder to Self-Code with Machine Learning
IRJET Journal
 
report_barc
report_barcreport_barc
report_barc
siontani
 
final_ppt on industrial 6 weeks training in Igap
final_ppt on industrial 6 weeks training in Igapfinal_ppt on industrial 6 weeks training in Igap
final_ppt on industrial 6 weeks training in Igap
namdevmisal
 
Learning Ray, 5th Early Release Max Pumperla
Learning Ray, 5th Early Release Max PumperlaLearning Ray, 5th Early Release Max Pumperla
Learning Ray, 5th Early Release Max Pumperla
gjslndtloto
 
Deep Learning Applications and Image Processing
Deep Learning Applications and Image ProcessingDeep Learning Applications and Image Processing
Deep Learning Applications and Image Processing
ijtsrd
 
Github-Source code management system SRS
Github-Source code management system SRSGithub-Source code management system SRS
Github-Source code management system SRS
Aditya Narayan Swami
 
PARKING ALLOTMENT SYSTEM PROJECT REPORT REPORT.
PARKING ALLOTMENT SYSTEM PROJECT REPORT REPORT.PARKING ALLOTMENT SYSTEM PROJECT REPORT REPORT.
PARKING ALLOTMENT SYSTEM PROJECT REPORT REPORT.
Kamal Acharya
 
IRJET - Automation in Python using Speech Recognition
IRJET -  	  Automation in Python using Speech RecognitionIRJET -  	  Automation in Python using Speech Recognition
IRJET - Automation in Python using Speech Recognition
IRJET Journal
 

Recently uploaded (20)

What is the Philosophy of Statistics? (and how I was drawn to it)
What is the Philosophy of Statistics? (and how I was drawn to it)What is the Philosophy of Statistics? (and how I was drawn to it)
What is the Philosophy of Statistics? (and how I was drawn to it)
jemille6
 
Myasthenia gravis (Neuromuscular disorder)
Myasthenia gravis (Neuromuscular disorder)Myasthenia gravis (Neuromuscular disorder)
Myasthenia gravis (Neuromuscular disorder)
Mohamed Rizk Khodair
 
Redesigning Education as a Cognitive Ecosystem: Practical Insights into Emerg...
Redesigning Education as a Cognitive Ecosystem: Practical Insights into Emerg...Redesigning Education as a Cognitive Ecosystem: Practical Insights into Emerg...
Redesigning Education as a Cognitive Ecosystem: Practical Insights into Emerg...
Leonel Morgado
 
Kumushini_Thennakoon_CAPWIC_slides_.pptx
Kumushini_Thennakoon_CAPWIC_slides_.pptxKumushini_Thennakoon_CAPWIC_slides_.pptx
Kumushini_Thennakoon_CAPWIC_slides_.pptx
kumushiniodu
 
Drive Supporter Growth from Awareness to Advocacy with TechSoup Marketing Ser...
Drive Supporter Growth from Awareness to Advocacy with TechSoup Marketing Ser...Drive Supporter Growth from Awareness to Advocacy with TechSoup Marketing Ser...
Drive Supporter Growth from Awareness to Advocacy with TechSoup Marketing Ser...
TechSoup
 
APGAR SCORE BY sweety Tamanna Mahapatra MSc Pediatric
APGAR SCORE  BY sweety Tamanna Mahapatra MSc PediatricAPGAR SCORE  BY sweety Tamanna Mahapatra MSc Pediatric
APGAR SCORE BY sweety Tamanna Mahapatra MSc Pediatric
SweetytamannaMohapat
 
Tax evasion, Tax planning & Tax avoidance.pptx
Tax evasion, Tax  planning &  Tax avoidance.pptxTax evasion, Tax  planning &  Tax avoidance.pptx
Tax evasion, Tax planning & Tax avoidance.pptx
manishbaidya2017
 
U3 ANTITUBERCULAR DRUGS Pharmacology 3.pptx
U3 ANTITUBERCULAR DRUGS Pharmacology 3.pptxU3 ANTITUBERCULAR DRUGS Pharmacology 3.pptx
U3 ANTITUBERCULAR DRUGS Pharmacology 3.pptx
Mayuri Chavan
 
How to Configure Public Holidays & Mandatory Days in Odoo 18
How to Configure Public Holidays & Mandatory Days in Odoo 18How to Configure Public Holidays & Mandatory Days in Odoo 18
How to Configure Public Holidays & Mandatory Days in Odoo 18
Celine George
 
How to Manage Upselling in Odoo 18 Sales
How to Manage Upselling in Odoo 18 SalesHow to Manage Upselling in Odoo 18 Sales
How to Manage Upselling in Odoo 18 Sales
Celine George
 
How to Create Kanban View in Odoo 18 - Odoo Slides
How to Create Kanban View in Odoo 18 - Odoo SlidesHow to Create Kanban View in Odoo 18 - Odoo Slides
How to Create Kanban View in Odoo 18 - Odoo Slides
Celine George
 
ANTI-VIRAL DRUGS unit 3 Pharmacology 3.pptx
ANTI-VIRAL DRUGS unit 3 Pharmacology 3.pptxANTI-VIRAL DRUGS unit 3 Pharmacology 3.pptx
ANTI-VIRAL DRUGS unit 3 Pharmacology 3.pptx
Mayuri Chavan
 
How to Add Customer Note in Odoo 18 POS - Odoo Slides
How to Add Customer Note in Odoo 18 POS - Odoo SlidesHow to Add Customer Note in Odoo 18 POS - Odoo Slides
How to Add Customer Note in Odoo 18 POS - Odoo Slides
Celine George
 
CNS infections (encephalitis, meningitis & Brain abscess
CNS infections (encephalitis, meningitis & Brain abscessCNS infections (encephalitis, meningitis & Brain abscess
CNS infections (encephalitis, meningitis & Brain abscess
Mohamed Rizk Khodair
 
Ranking_Felicidade_2024_com_Educacao_Marketing Educacional_V2.pdf
Ranking_Felicidade_2024_com_Educacao_Marketing Educacional_V2.pdfRanking_Felicidade_2024_com_Educacao_Marketing Educacional_V2.pdf
Ranking_Felicidade_2024_com_Educacao_Marketing Educacional_V2.pdf
Rafael Villas B
 
PHYSIOLOGY MCQS By DR. NASIR MUSTAFA (PHYSIOLOGY)
PHYSIOLOGY MCQS By DR. NASIR MUSTAFA (PHYSIOLOGY)PHYSIOLOGY MCQS By DR. NASIR MUSTAFA (PHYSIOLOGY)
PHYSIOLOGY MCQS By DR. NASIR MUSTAFA (PHYSIOLOGY)
Dr. Nasir Mustafa
 
Chemotherapy of Malignancy -Anticancer.pptx
Chemotherapy of Malignancy -Anticancer.pptxChemotherapy of Malignancy -Anticancer.pptx
Chemotherapy of Malignancy -Anticancer.pptx
Mayuri Chavan
 
Lecture 2 CLASSIFICATION OF PHYLUM ARTHROPODA UPTO CLASSES & POSITION OF_1.pptx
Lecture 2 CLASSIFICATION OF PHYLUM ARTHROPODA UPTO CLASSES & POSITION OF_1.pptxLecture 2 CLASSIFICATION OF PHYLUM ARTHROPODA UPTO CLASSES & POSITION OF_1.pptx
Lecture 2 CLASSIFICATION OF PHYLUM ARTHROPODA UPTO CLASSES & POSITION OF_1.pptx
Arshad Shaikh
 
Ancient Stone Sculptures of India: As a Source of Indian History
Ancient Stone Sculptures of India: As a Source of Indian HistoryAncient Stone Sculptures of India: As a Source of Indian History
Ancient Stone Sculptures of India: As a Source of Indian History
Virag Sontakke
 
The History of Kashmir Karkota Dynasty NEP.pptx
The History of Kashmir Karkota Dynasty NEP.pptxThe History of Kashmir Karkota Dynasty NEP.pptx
The History of Kashmir Karkota Dynasty NEP.pptx
Arya Mahila P. G. College, Banaras Hindu University, Varanasi, India.
 
What is the Philosophy of Statistics? (and how I was drawn to it)
What is the Philosophy of Statistics? (and how I was drawn to it)What is the Philosophy of Statistics? (and how I was drawn to it)
What is the Philosophy of Statistics? (and how I was drawn to it)
jemille6
 
Myasthenia gravis (Neuromuscular disorder)
Myasthenia gravis (Neuromuscular disorder)Myasthenia gravis (Neuromuscular disorder)
Myasthenia gravis (Neuromuscular disorder)
Mohamed Rizk Khodair
 
Redesigning Education as a Cognitive Ecosystem: Practical Insights into Emerg...
Redesigning Education as a Cognitive Ecosystem: Practical Insights into Emerg...Redesigning Education as a Cognitive Ecosystem: Practical Insights into Emerg...
Redesigning Education as a Cognitive Ecosystem: Practical Insights into Emerg...
Leonel Morgado
 
Kumushini_Thennakoon_CAPWIC_slides_.pptx
Kumushini_Thennakoon_CAPWIC_slides_.pptxKumushini_Thennakoon_CAPWIC_slides_.pptx
Kumushini_Thennakoon_CAPWIC_slides_.pptx
kumushiniodu
 
Drive Supporter Growth from Awareness to Advocacy with TechSoup Marketing Ser...
Drive Supporter Growth from Awareness to Advocacy with TechSoup Marketing Ser...Drive Supporter Growth from Awareness to Advocacy with TechSoup Marketing Ser...
Drive Supporter Growth from Awareness to Advocacy with TechSoup Marketing Ser...
TechSoup
 
APGAR SCORE BY sweety Tamanna Mahapatra MSc Pediatric
APGAR SCORE  BY sweety Tamanna Mahapatra MSc PediatricAPGAR SCORE  BY sweety Tamanna Mahapatra MSc Pediatric
APGAR SCORE BY sweety Tamanna Mahapatra MSc Pediatric
SweetytamannaMohapat
 
Tax evasion, Tax planning & Tax avoidance.pptx
Tax evasion, Tax  planning &  Tax avoidance.pptxTax evasion, Tax  planning &  Tax avoidance.pptx
Tax evasion, Tax planning & Tax avoidance.pptx
manishbaidya2017
 
U3 ANTITUBERCULAR DRUGS Pharmacology 3.pptx
U3 ANTITUBERCULAR DRUGS Pharmacology 3.pptxU3 ANTITUBERCULAR DRUGS Pharmacology 3.pptx
U3 ANTITUBERCULAR DRUGS Pharmacology 3.pptx
Mayuri Chavan
 
How to Configure Public Holidays & Mandatory Days in Odoo 18
How to Configure Public Holidays & Mandatory Days in Odoo 18How to Configure Public Holidays & Mandatory Days in Odoo 18
How to Configure Public Holidays & Mandatory Days in Odoo 18
Celine George
 
How to Manage Upselling in Odoo 18 Sales
How to Manage Upselling in Odoo 18 SalesHow to Manage Upselling in Odoo 18 Sales
How to Manage Upselling in Odoo 18 Sales
Celine George
 
How to Create Kanban View in Odoo 18 - Odoo Slides
How to Create Kanban View in Odoo 18 - Odoo SlidesHow to Create Kanban View in Odoo 18 - Odoo Slides
How to Create Kanban View in Odoo 18 - Odoo Slides
Celine George
 
ANTI-VIRAL DRUGS unit 3 Pharmacology 3.pptx
ANTI-VIRAL DRUGS unit 3 Pharmacology 3.pptxANTI-VIRAL DRUGS unit 3 Pharmacology 3.pptx
ANTI-VIRAL DRUGS unit 3 Pharmacology 3.pptx
Mayuri Chavan
 
How to Add Customer Note in Odoo 18 POS - Odoo Slides
How to Add Customer Note in Odoo 18 POS - Odoo SlidesHow to Add Customer Note in Odoo 18 POS - Odoo Slides
How to Add Customer Note in Odoo 18 POS - Odoo Slides
Celine George
 
CNS infections (encephalitis, meningitis & Brain abscess
CNS infections (encephalitis, meningitis & Brain abscessCNS infections (encephalitis, meningitis & Brain abscess
CNS infections (encephalitis, meningitis & Brain abscess
Mohamed Rizk Khodair
 
Ranking_Felicidade_2024_com_Educacao_Marketing Educacional_V2.pdf
Ranking_Felicidade_2024_com_Educacao_Marketing Educacional_V2.pdfRanking_Felicidade_2024_com_Educacao_Marketing Educacional_V2.pdf
Ranking_Felicidade_2024_com_Educacao_Marketing Educacional_V2.pdf
Rafael Villas B
 
PHYSIOLOGY MCQS By DR. NASIR MUSTAFA (PHYSIOLOGY)
PHYSIOLOGY MCQS By DR. NASIR MUSTAFA (PHYSIOLOGY)PHYSIOLOGY MCQS By DR. NASIR MUSTAFA (PHYSIOLOGY)
PHYSIOLOGY MCQS By DR. NASIR MUSTAFA (PHYSIOLOGY)
Dr. Nasir Mustafa
 
Chemotherapy of Malignancy -Anticancer.pptx
Chemotherapy of Malignancy -Anticancer.pptxChemotherapy of Malignancy -Anticancer.pptx
Chemotherapy of Malignancy -Anticancer.pptx
Mayuri Chavan
 
Lecture 2 CLASSIFICATION OF PHYLUM ARTHROPODA UPTO CLASSES & POSITION OF_1.pptx
Lecture 2 CLASSIFICATION OF PHYLUM ARTHROPODA UPTO CLASSES & POSITION OF_1.pptxLecture 2 CLASSIFICATION OF PHYLUM ARTHROPODA UPTO CLASSES & POSITION OF_1.pptx
Lecture 2 CLASSIFICATION OF PHYLUM ARTHROPODA UPTO CLASSES & POSITION OF_1.pptx
Arshad Shaikh
 
Ancient Stone Sculptures of India: As a Source of Indian History
Ancient Stone Sculptures of India: As a Source of Indian HistoryAncient Stone Sculptures of India: As a Source of Indian History
Ancient Stone Sculptures of India: As a Source of Indian History
Virag Sontakke
 
Ad

Predicting rainfall with data science in python

  • 1. MACHINE LEARNING BASED RAINFALL PREDICTION Abstract: Machine learning and Feature Selection are playing a vital role in internet and health sector also. Rainfall prediction is important as heavy rainfall can lead to many disasters. The prediction helps people to take preventive measures and moreover the prediction should be accurate. There are two types of prediction short term rainfall prediction and long term rainfall. Prediction mostly short term prediction can gives us the accurate result. The main challenge is to build a model for long term rainfall prediction. Heavy precipitation prediction could be a major drawback for earth science department because it is closely associated with the economy and lifetime of human. It’s a cause for natural disasters like flood and drought that square measure encountered by individuals across the world each year. Accuracy of rainfall statement has nice importance for countries like India whose economy is basically dependent on agriculture. Rainfall prediction is the one of the important technique to predict the climatic conditions in any country. This paper proposes a rainfall prediction model using LR & RF for dataset. The input data is having multiple meteorological parameters and to predict the rainfall in more precise. From the results, the proposed machine learning model provides better results than the other algorithms in the literature. The goal of this project is to develop an appropriate machine learning tool which can predict will be rain or not. The algorithm that can be used here are Logistic Regression and Random Forest.
  • 2. TABLE OF CONTENTS CHAPTE R NO. TITLE PAGE NO. 1. CHAPTER 1 : INTRODUCTION 1.1 GENERAL 1.1.1 THE MACHINE LEARNING SYSTEM 1.1.2 FUNDAMENTAL 1.2 JUPYTER 1.3 MACHINE LEARNING 1.4 CLASSIFICATION TECHNIQUES 1.4.1 NEURAL NETWORK AND DEEP LEARNING 1.4.2 METHODOLOGIES - GIVEN INPUT AND EXPECTED OUTPUT 1.5 OBJECTIVE AND SCOPE OF THE PROJECT 1.6 EXISTING SYSTEM 1.6.1 DISADVANTAGES OF EXISTING SYSTEM 1.6.2 LITERATURE SURVEY 1.7 PROPOSED SYSTEM 1.7.1 PROPOSED SYSTEM ADVANTAGES 4 6 9 12 12 13 17 17 2. CHAPTER 2 :PROJECT DESCRIPTION 2.1 INTRODUCTION 2.2 DETAILED DIAGRAM 2.2.1 FRONT END DESIGN 2.2.2 BACK END FLOW 2.3 SOFTWARE SPECIFICATION 28 29 29 30
  • 3. 2.3.1 HARDWARE SPECIFICATION 2.3.2 SOFTWARE SPECIFICATION 2.4 MODULE DESCRIPTION 2.4.1 DATA COLLECTION 2.4.2 DATA AUGUMENTATION 2.4.3 DATA SPLITTING 2.4.4 CLASSIFICATION 2.4.5 PERFORMANCES MATRICES 2.4.6 CONFUSION MATRIX 2.5 MODULE DIAGRAM 2.5.1 SYSTEM ARCHITECTURE 2.5.2 USECASE DIAGRAM 2.5.3 CLASS DIAGRAM 2.5.4 ACTIVITY DIAGRAM 2.5.5 SEQUENCE DIAGRAM 2.5.6 STATE FLOW DIAGRAM 2.5.7 FLOW DIAGRAM 30 31 32 33 34 35 35 36 37 38 40 3. CHAPTER 3 : SOFTWARE SPECIFICATION 3.1 GENERAL 3.2 ANACONDA 3.3 PYTHON 3.2.1 SCIENTIFIC AND NUMERIC COMPUTING 3.2.2 CREATING SOFTWARE PROTOTYPES 3.2.3 GOOD LANGUAGE TO TEACH PROGRAMMING 41 42 43 44 44 44 4. CHAPTER 4 : IMPLEMENTATION 4.1 GENERAL 4.2 IMPLEMENTATION CODING 4.3 SNAPSHOTS 48 48 51 5. CHAPTER 5 : CONCLUSION & REFERENCES 5.1 CONCLUSION 5.2 REFERENCES 55 56
  • 4. CHAPTER I INTRODUCTION 1.1 GENERAL Glossary and Key Terms This section provides a quick reference for several algorithms that are not explicity mentioned in this chapter, but may be of interest to the reader. This should provide the reader with some keywords or useful points of reference for other similar libraries to those discussed in this chapter. BIDMachGPU accelerated machine learning library for algorithms that are not necessarily neural network based. Caret provides a standardised API for many of the most useful machine learning packages for R. For readers who are more comfortable with R, Caret provides a good substitute for Python’s SciKit-Learn. Mathematicais a commercial symbolic mathematical computation system, developed since 1988 by Wolfram, Inc. It provides powerful machine learning techniques “out of the box” such as image classification [4]. MATLAB is short for MATrixLABoratory, which is a commercial numerical computing environment, and is a proprietary programming language by MathWorks. It is very popular at universities where it is often licensed. It was originally built on the idea that most computing applications in some wayrely on storage and manipulations of one fundamental object—the matrix, and this is still a popular approach. -R is used extensively by the statistics community. The software package Caret provides a standardised API for many of R’s machine learning libraries.
  • 5. WEKA is short for the Waikato Environment for Knowledge Analysis [6] and has been a very popular open source tool since its inception in 1993. In 2005Weka received the SIGKDD Data Mining and Knowledge Discovery Service Award: it is easy to learn and simple to use, and provides a GUI to many machine learning algorithms. VowpalWabbitMicrosoft’s machine learning library. Mature and actively developed, with an emphasis on performance. Requirements and Installation The most convenient way of installing the Python requirements for this tutorial is by using the Anaconda scientific Python distribution. Anaconda is a collection of the most commonly used Python packages preconfigured and ready to use. Approximately 150 scientific packages are included in the Anaconda installation. Install the version of Anaconda for your operating system. All Python software described here is available for Windows, Linux, and Macintosh. All code samples presented in this tutorial were tested under Ubuntu Linux 14.04 using Python 2.7. Some code examples may not work on Windows without slight modification (e.g. file paths in Windows use and not / as in UNIX type systems). The main software used in a typical Python machine learning pipeline can consist of almost any combination of the following tools: 1. NumPy, for matrix and vector manipulation 2. Pandas for time series and R-like DataFrame data structures 3. The 2D plotting library matplotlib 4. SciKit-Learn as a source for many machine learning algorithms and utilities 5. Keras for neural networks and deep learning Managing Packages Anaconda comes with its own built in package manager, known as Conda. Using the conda command from the terminal, you can download, update, and delete Python packages. Conda
  • 6. takes care of all dependencies and ensures that packages are preconfigured to work with all other packages you may have installed. Keeping your Python distribution up to date and well maintained is essential in this fast moving field. However, Anaconda makes it particularly easy to manage and keep your scientific stack up to date. Once Anaconda is installed you can manage your Python distribution, and all the scientific packages installed by Anaconda using the conda application from the command line. To list all packages currently installed, use conda list. This will output all packages and their version numbers. Updating all Anaconda packages in your system is performed using the conda update -all command. Conda itself can be updated using the conda update conda command, while Python can be updated using the conda update python command. To search for packages, use the search parameter, e.g. conda search stats where stats is the name or partial name of the package you are searching for. OBJECTIVE AND SCOPE OF THE PROJECT  The objective of this project is to show how sentimental analysis can help improve the user experience over a social network or system interface.  The learning algorithm will learn what our emotions are from statistical data then perform sentiment analysis.  Our main objective is also maintain accuracy in the final result.  The main goal of such a sentiment analysis is to discover how the audience perceives the television show. The Twitter data that is collected will be classified into two categories; positive or negative. An analysis will then be performed on the classified data to investigate what percentage of the audience sample falls into each category.  Particular emphasis is placed on evaluating different machine learning algorithms for the task of twitter sentiment analysis.
  • 7. Jupiter Jupyter, previously known as IPython Notebook, is a web-based, interactive development environment. Originally developed for Python, it has since expanded to support over 40 other programming languages including Julia and R. Jupyter allows for notebooksto be written that contain text, live code, images, and equations. These notebooks can be shared, and can even be hosted on GitHubfor free. For each section of this tutorial, you can download a Juypter notebook that allows you to edit and experiment with the code and examples for each topic. Jupyter is part of the Anaconda distribution; it can be started from the command line using the jupyter command: Machine Learning We will now move on to the task of machine learning itself. In the following sections we will describe how to use some basic algorithms, and perform regression, classification, and clustering on some freely available medical datasets concerning breast cancer and diabetes, and we will also take a look at a DNA microarray dataset.
  • 8. SciKit-Learn SciKit-Learn provides a standardised interface to many of the most commonly used machine learning algorithms, and is the most popular and frequently used library for machine learning for Python. As well as providing many learning algorithms, SciKit-Learn has a large number of convenience functions for common preprocessing tasks (for example, normalisation or k-fold cross validation). SciKit-Learn is a very large software library. Clustering Clustering algorithms focus on ordering data together into groups. In general clustering algorithms are unsupervised—they require no y response variable as input. That is to say, they attempt to find groups or clusters within data where you do not know the label for each sample. SciKit-Learn have many clusteringalgorithms, but in this section we will demonstrate hierarchical clustering on a DNA expression microarray dataset using an algorithm from the SciPy library.
  • 9. We will plot a visualisation of the clustering using what is known as a dendrogram, also using the SciPy library. The goal is to cluster the data properly in logical groups, in this case into the cancer types represented by each sample’s expression data. We do this using agglomerative hierarchical clustering, using Ward’s linkage method: Classification weanalysed data that was unlabelled—we did not know to what class a sample belonged (known as unsupervised learning). In contrast to this, a supervised problem deals with labelled data where are aware of the discrete classes to which each sample belongs. When we wish to predict which class a sample belongs to, we call this a classification problem. SciKit-Learn has a number of algorithms for classification, in this section we will look at the Support Vector Machine. We will work on the Wisconsin breast cancer dataset, split it into a training set and a test set, train a Support Vector Machine with a linear kernel, and test the trained model on an unseen dataset. The Support Vector Machine model should be able to predict if a new sample is malignant or benign based on the features of a new, unseen sample:
  • 10. You will notice that the SVM model performed very well at predicting the malignancy of new, unseen samples from the test set—this can be quantified nicely by printing a number of metrics using the classification report function. Here, the precision, recall, and F1 score (F1 = 2· precision·recall/precision+recall) for each class is shown. The support column is a count of the number of samples for each class. Support Vector Machines are a very powerful tool for classification. They work well in high dimensional spaces, even when the number of features is higher than the number of samples. However, their running time is quadratic to the number of samples so large datasets can become difficult to train. Quadratic means that if you increase a dataset in size by 10 times, it will take 100 times longer to train. Last, you will notice that the breast cancer dataset consisted of 30 features. This makes it difficult to visualize or plot the data. To aid in visualization of highly dimensional data, we can apply a technique called dimensionality reduction. Dimensionality Reduction Another important method in machine learning, and data science in general, is dimensionality reduction. For this example, we will look at the Wisconsin breast cancer dataset once again. The dataset consists of over 500 samples, where each sample has 30 features. The features relate to
  • 11. images of a fine needle aspirate of breast tissue, and the features describe the characteristics of the cells present in the images. All features are real values. The target variable is a discrete value (either malignant or benign) and is therefore a classification dataset. You will recall from the Iris example in Sect. 7.3 that we plotted a scatter matrix of the data, where each feature was plotted against every other feature in the dataset to look for potential correlations (Fig. 3). By examining this plot you could probably find features which would separate the dataset into groups. Because the dataset only had 4 features we were able to plot each feature against each other relatively easily. However, as the numbers of features grow, this becomes less and less feasible, especially if you consider the gene expression example in Sect. 9.4 which had over 6000 features. One method that is used to handle data that is highly dimensional is Principle Component Analysis, or PCA. PCA is an unsupervised algorithm for reducing the number of dimensions of a dataset. For example, for plotting purposes you might want to reduce your data down to 2 or 3 dimensions, and PCA allows you to do this by generating components, which are combinations of the original features, that you can then use to plot your data. PCA is an unsupervised algorithm. You supply it with your data, X, and you specify the number of components you wish to reduce its dimensionality to. This is known as transforming the data:
  • 12. Again, you would not use this model for new data—in a real world scenario, you would, for example, perform a 10-fold cross validation on the dataset, choosing the model parameters that perform best on the cross validation. This model would be much more likely to perform well on new data. At the very least, you would randomly select a subset, say 30% of the data, as a test set and train the model on the remaining 70% of the dataset. You would evaluate the model based on the score on the test set and not on the training set . NEURAL NETWORKS AND DEEP LEARNING While a proper description of neural networks and deep learning is far beyond the scope of this chapter, we will however discuss an example use case of one of the most popular frameworks for deep learning: Keras4. In this section we will use Keras to build a simple neural network to classify theWisconsin breast cancer dataset that was described earlier. Often, deep learning algorithms and neural networks are used to classify images—convolutional neural networks are especially used for image related classification. However, they can of course be used for text or tabular-based data as well. In this we will build a standard feed-forward, densely connected neural network and classify a text-based cancer dataset in order to demonstrate the framework’susage. In this example we are once again using the Wisconsin breast cancer dataset, which consists of 30 features and 569 individual samples. To make it more challenging for the neural network, we
  • 13. will use a training set consisting of only 50% of the entire dataset, and test our neural network on the remaining 50% of the data. Note,Keras is not installed as part of the Anaconda distribution, to install it use pip: Keras additionally requires either Theano or TensorFlow to be installed. In the examples in this chapter we are using Theano as a backend, however the code will work identically for either backend. You can install Theano using pip, but it has a number of dependencies that must be installed first. Refer to the Theano and TensorFlow documentation for more information [12]. Keras is a modular API. It allows you to create neural networks by building a stack of modules, from the input of the neural network, to the output of the neural network, piece by piece until you have a complete network. Also, Keras can be configured to use your Graphics Processing Unit, or GPU. This makes training neural networks far faster than if we were to use a CPU. We begin by importing Keras: We may want to view the network’s accuracy on the test (or its loss on the training set) over time (measured at each epoch), to get a better idea how well it is learning. An epoch is one complete cycle through the training data. Fortunately, this is quite easy to plot as Keras’ fit function returns a history object which we can use to do exactly this: This will result in a plot similar to that shown. Often you will also want to plot the loss on the test set and training set, and the accuracy on the test set and training set. Plotting the loss and accuracy can be used to see if you are over fitting (you experience tiny loss on the training set, but large loss on the test set) and to see when your training has plateaued.
  • 14. Problem Statement: Rainfall prediction is a beneficiary one, but it is a challenging task. Machine learning techniques can use computational methods and predict rainfall by retrieving and integrating the hidden knowledge from the linear and non-linear patterns of past weather data. Various tools and methods for predicting rain are currently available, but there is still a shortage of accurate results. Existing methods are failing whenever massive datasets are used for rainfall prediction. OBJECTIVE: Predicting rainfall is an application of science and technology for predicting the amount of rain over an area. The most important thing is to accurately determine the rainfall for active use of rainfall for water resources, crops, pre-planning of water resources and for agricultural purposes. In earlier rainfall information benefits the farmers for better managing their crops and properties from heavy rainfall. The farmers better manage to increase the economic growth of the country by efficient rainfall information. Prediction of precipitation is necessary to save the life of people’s and properties from flooding. Prediction of rainfall helps people in coastal areas by preventing the floods.
  • 15. SCOPE OF THE PROJECT: The accurate and precise rainfall prediction is still lacking which could assist in diverse fields like agriculture, water reservation and flood prediction. The issue is to formulate the calculations for the rainfall prediction that would be based on the previous findings and similarities and will give the output predictions that are reliable and appropriate. The imprecise and inaccurate predictions are not only the waste of time but also the loss of resources and lead to inefficient management of crisis like poor agriculture, poor water reserves and poor management of floods. Therefore, the need is not to formulate only the rainfall predicting system but also a system that is more accurate and precise as compared to the existing rainfall predictors. EXISTING SYSTEM Supervised learning is built to make prediction, given an unforeseen input instance. A supervised learning algorithm takes a known set of input dataset and its known responses to the data (output) to learn the regression/classification model. An algorithm is used to learn the dataset and train it to generate the model for prediction of rainfall for the response to new data or test data. Supervised learning uses classification algorithms and regression techniques to develop predictive models. 1.NAIVE BAYES: Naive Bayes classifiers calculate the probability of a sample to be of a certain category, based on prior knowledge. They use the Naïve Bayes Theorem, that assumes that the effect of a certain feature of a sample is independent of the other features. That means that each character of a sample contributes independently to determine the probability of the classification of that sample, outputting the category of the highest probability of the sample. In Bernoulli Naïve Bayes the predictors are boolean variables. The parameters that we use to predict the class
  • 16. variable take up only values yes or no.The basic idea of Naive Bayes technique is to find the probabilities of classes assigned to texts by using the joint probabilities of words and classes. 2.LOGISTICREGRESSION: Logistic regression is basically a supervised classification algorithm. In a classification problem, the target variable(or output), y, can take only discrete values for given set of features(or inputs), X. The logistic regression model described relationship between predictors that can be continuous, binary, and categorical. Logistic regression becomes a classification technique only when a decision threshold is brought into the picture. The setting of the threshold value is a very important aspect of logistic regression and is dependent on the classification problem itself. It predicts the probability that a given data entry belongs to the category numbered as “1”. Just like Linear regression assumes that the data follows a linear function, Logistic regression models the data using the sigmoid function. 1.1.1 DISADVANTAGES OF EXISTING SYSTEM Methods have performance limitations because of wide range of variations in data and amount  of data is limited. Issue involved in rainfall classification is choosing the required sampling recess of  Observation-Forecasting of rainfall, which is dependent upon the sampling interval of input data. Less accuracy  LITERATURE SURVEY: 1. TITLE: PRDICTION OF RAINFALL USING MACHINE LEARNING TECHNIQUES Author: Moulana Mohammed, Roshitha Kolapalli, Niharika Golla, Siva Sai Maturi YEAR: - 2020 Abstract:
  • 17. Rainfall prediction is important as heavy rainfall can lead to many disasters. The prediction helps people to take preventive measures and moreover the prediction should be accurate. There are two types of prediction short term rainfall prediction and long term rainfall. Prediction mostly short term prediction can gives us the accurate result. The main challenge is to build a model for long term rainfall prediction. Heavy precipitation prediction could be a major drawback for earth science department because it is closely associated with the economy and lifetime of human. It’s a cause for natural disasters like flood and drought that square measure encountered by individuals across the world each year. Accuracy of rainfall statement has nice importance for countries like India whose economy is basically dependent on agriculture. The dynamic nature of atmosphere, applied mathematics techniques fail to provide sensible accuracy for precipitation statement. The prediction of precipitation using machine learning techniques may use regression. Intention of this project is to offer non-experts easy access to the techniques, approaches utilized in the sector of precipitation prediction and provide a comparative study among the various machine learning techniques. 2. TITLE: RAINFALL PRDICTION USING ACHINE LEARNING ALGORITHM Author: Kumar Arun, Garg Ishan, Kaur Sanmeet YEAR: - 2019 Abstract: This paper introduces current supervised learning models which are based on machine learning algorithm for Rainfall prediction in India. Rainfall is always a major issue across the world as it affects all the major factor on which the human being is depended. In current, Unpredictable and accurate rainfall prediction is a challenging task. We apply rainfall data of India to different machine learning algorithms and compare the accuracy of classifiers such as SVM, Navie Bayes, Logistic Regression, Random Forest and Multilayer Perceptron (MLP). Our motive if to get the optimized result and a better rainfall prediction.
  • 18. 3. TITLE: A NEURAL NETWORK BASED LOCAL RAINFALL PREDICTION Author: Tomoa kiKashiwaoa, Koichi Nakayama, ShinAndo YEAR: - 2017 Abstract: In this study, we develop and test a local rainfall (precipitation) prediction system based on artificial neural networks (ANNs). Our system can automatically obtain meteorological data used for rainfall prediction from the Internet. Meteorological data from equipment installed at a local point is also shared among users in our system. The final goal of the study was the practical use of “big data” on the Internet as well as the sharing of data among users for accurate rainfall prediction. We predicted local rainfall in regions of Japan using data from the Japan Meteorological Agency (JMA). As neural network (NN) models for the system, we used a multi- layer perceptron (MLP) with a hybrid algorithm composed of back-propagation (BP) and random optimization (RO) methods, and radial basis function network (RBFN) with a least squares method (LSM), and compared the prediction performance of the two models. Precipitation (total amount of rainfall above 0.5 mm between 12:00 and 24:00 JST (Japan standard time)) at Matsuyama, Sapporo, and Naha in 2012 was predicted by NNs using meteorological data for each city from 2011. The volume of precipitation was also predicted (total amount above 1.0 mm between 17:00 and 24:00 JST) at 16 points in Japan and compared with predictions by the JMA in order to verify the universality of the proposed system. The experimental results showed that precipitation in Japan can be predicted by the proposed method, and that the prediction performance of the MLP model was superior to that of the RBFN model for the rainfall prediction problem. However, the results were not better than those generated by the JMA. Finally, heavy rainfall (above 10 mm/h) in summer (Jun.–Sep.) afternoons (12:00–24:00 JST) in Tokyo in 2011 and 2012 was predicted using data for Tokyo between 2000 and 2010. The results showed that the volume of precipitation could be accurately predicted and the caching rate of heavy rainfall was high. This suggests that the proposed system can predict unexpected local heavy rainfalls as “guerrilla rainstorms.”
  • 19. 4. TITLE: APPLICATION OF THE DEEP LEARNING FOR THE PREDICTION OF RAINFALL IN SOUTHERN TAIWAN Author: Meng-Hua Yen, Ding-Wei Liu, Yi-Chia Hsin, Chu-En Lin YEAR: - 2018 Abstract: Precipitation is useful information for assessing vital water resources, agriculture, ecosystems and hydrology. Data-driven model predictions using deep learning algorithms are promising for these purposes. Echo state network (ESN) and Deep Echo state network (DeepESN), referred to as Reservoir Computing (RC), are effective and speedy algorithms to process a large amount of data. In this study, we used the ESN and the DeepESN algorithms to analyze the meteorological hourly data from 2002 to 2014 at the Tainan Observatory in the southern Taiwan. The results show that the correlation coefficient by using the DeepESN was better than that by using the ESN and commercial neuronal network algorithms (Back-propagation network (BPN) and support vector regression (SVR), MATLAB, The MathWorks co.), and the accuracy of predicted rainfall by using the DeepESN can be significantly improved compared with those by using ESN, the BPN and the SVR. In sum, the DeepESN is a trustworthy and good method to predict rainfall; it could be applied to global climate forecasts which need high-volume data processing. 5. TITLE: RAINFALL PREDICTION USING MACHINE LEARNING AND NEURAL NETWORK Author: Kaushik Dutta, Gouthaman. P YEAR: - 2020 Abstract:
Rainfall prediction models based mainly on artificial neural networks have been proposed in India until now. This research work makes a comparative study of two rainfall prediction approaches and finds the more accurate one. The present techniques for predicting rainfall do not work well with the complex data involved: the approaches used nowadays are statistical and numerical methods, which do not work accurately when there is a non-linear pattern, and existing systems fail whenever the complexity of the datasets containing past rainfall increases. Hence, to find the best way to predict rainfall, both machine learning and neural networks are studied, and the algorithm that gives more accuracy is then used for prediction. Rainfall is a primary driver of most of our country's economy, and agriculture is the main economic sector; to invest properly in agriculture, a proper estimate of rainfall is needed. Along with agriculture, rainfall prediction is needed for people in coastal areas, who are at high risk from heavy rainfall and floods and should be made aware of the rainfall much earlier so that they can plan their stay accordingly. Areas that have less rainfall and face water scarcity should have rainwater harvesters to collect rainwater, and establishing a proper rainwater harvester also requires a rainfall estimate. Weather forecasting is the easiest and fastest way to reach a wide audience, so this research work can be used by weather forecasting channels to make prediction news more accurate and spread it to all parts of the country.

6. TITLE: STUDY OF SHORT TERM RAIN FORECASTING USING MACHINE LEARNING BASED APPROACH
Author: M. S. Balamurugan & R. Manojkumar
YEAR: 2019
Abstract: Weather forecasting is still dependent on statistical and numerical analysis in most parts of the world. Though statistical and numerical analysis provide better results, they depend heavily on stable historical relationships with the predictand and the value of the predictand at a future time.
On the other hand, machine learning explores new algorithmic approaches to prediction based on data-driven learning. Climatic changes at a location depend on variable factors such as temperature, precipitation, atmospheric pressure, humidity, and wind speed, and on combinations of other such factors, all variable in nature. Since climatic changes are location-specific, statistical and numerical approaches fail at times, and an alternate method is needed, such as a machine-learning-based study of the weather forecast. In this study it was observed that the percentage departure of rainfall ranged from 46% to 91% for the month of June 2019, as reported by the India Meteorological Department (IMD) using traditional forecasting methods, whereas the approach implemented here using machine learning achieved much better rainfall prediction compared to the statistical methods.

1.7 PROPOSED SYSTEM
In the proposed work, regression analysis is used. Regression analysis deals with the dependence of one variable (the dependent variable) on one or more other variables (the independent variables); it is useful for estimating and/or predicting the mean or average value of the former in terms of known or fixed values of the latter. For example, a person's salary may be modeled on his/her experience: here, experience is the independent variable and salary is the dependent variable. Simple linear regression defines the relationship between a single dependent variable and a single independent variable. The equation below is the general form of regression:

y = β0 + β1x + ε

where β0 and β1 are parameters and ε is a probabilistic error term. Regression analysis is a vital tool for modeling and analyzing information. It is used for predictive analysis such as forecasting rainfall or weather and predicting trends in business, finance, and marketing. It can also be used for correcting errors and providing quantitative support. The advantages of regression analysis are:
1. It is a powerful technique for testing the relationship between one dependent variable and many independent variables.
2. It allows researchers to control extraneous factors.
3. Regression assesses the cumulative effect of multiple factors.
4. It helps to attain a measure of error, using the regression line as a base for estimation.
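As an illustration of the regression equation above, the following is a minimal sketch (not the project's actual code) of fitting y = β0 + β1x with scikit-learn; the experience/salary numbers are hypothetical.

```python
# Minimal sketch of simple linear regression (y = b0 + b1*x), illustrating
# the equation above with the hypothetical experience/salary example; this
# is not the project's actual implementation.
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data: years of experience (independent) vs. salary (dependent).
x = np.array([[1], [2], [3], [5], [7], [10]])              # years of experience
y = np.array([30000, 35000, 41000, 52000, 63000, 80000])   # salary

model = LinearRegression().fit(x, y)
print("b0 (intercept):", model.intercept_)
print("b1 (slope):", model.coef_[0])
print("Predicted salary for 6 years:", model.predict([[6]])[0])
```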
ARCHITECTURE FOR PROPOSED SYSTEM:

Proposed approach: The back-propagation technique works well with less complex systems, but as the complexity of the system increases, the back-propagation method's accuracy decreases. The proposed network deals with four inputs and three output classes. The four input parameters used are:
1. Air temperature
2. Air humidity
3. Wind speed
4. Sunshine duration
The three output classes used are:
1. Rainfall
2. Medium rainfall
3. High rainfall
The steps associated with the proposed system are: input of data, preprocessing of data, splitting of data, training of the algorithms, testing on the dataset, comparing the algorithms, selecting the best one, prediction with the more accurate algorithm, and the final result (a minimal sketch of such a network is given after the advantages list below). The main reason for not predicting with both algorithms is to reduce the complexity of the whole system: the system first finds the more accurate algorithm between machine learning and the neural network, then predicts with the better one. The results are produced as graphs and Excel sheets. For preprocessing, all results are produced as different graphs; for machine learning and the neural network, the accuracy is reported as metrics as well as in an Excel sheet, and the predicted values are written to an Excel sheet containing two columns, ID and predicted value. The IDs are the same as those in the datasheet; to determine which region a prediction is for, the IDs should be matched with the IDs present in the dataset.
PROPOSED SYSTEM ADVANTAGES
 Speed and very low complexity, which makes it well suited to operate in real scenarios.
 The computation load needed for processing is much reduced, combined with very simple classifiers.
 Ability to learn and extract complex features.
 With its simplicity and fast processing time, the proposed algorithm is suitable for implementation in an embedded system or mobile application with limited processing resources.
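As referenced above, here is a minimal sketch, under stated assumptions, of a network with the four inputs and three rainfall classes listed; it uses scikit-learn's MLPClassifier (which trains by back-propagation), and the training rows are hypothetical, not the report's real dataset.

```python
# Minimal sketch of the 4-input / 3-class network described above, using
# scikit-learn's MLPClassifier (trained by back-propagation). The data below
# is hypothetical; the report's real pipeline uses a meteorological dataset.
import numpy as np
from sklearn.neural_network import MLPClassifier

# Features: [air temperature (C), air humidity (%), wind speed (m/s), sunshine (h)]
X = np.array([
    [31.0, 85.0, 3.2, 2.0],
    [29.5, 90.0, 4.1, 1.0],
    [33.0, 60.0, 2.0, 8.5],
    [30.0, 75.0, 2.8, 5.0],
])
# Labels: 0 = rainfall, 1 = medium rainfall, 2 = high rainfall
y = np.array([1, 2, 0, 0])

clf = MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000, random_state=42)
clf.fit(X, y)
print(clf.predict([[30.5, 88.0, 3.5, 1.5]]))  # predicted rainfall class
```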
CHAPTER 2
PROJECT DESCRIPTION
2.1 INTRODUCTION
In today's situation, rainfall is one of the factors responsible for most of the significant activities across the world. In India, agriculture is one of the important factors deciding the economy of the country, and agriculture is heavily dependent on rainfall. Apart from that, in coastal areas across the world, knowing the amount of rainfall is very necessary, and in areas that face water scarcity, prior prediction of rainfall is needed to establish rainwater harvesters. This project deals with the prediction of rainfall using machine learning and neural networks. The project performs a comparative study of machine learning approaches and neural network approaches and accordingly identifies the more efficient approach for rainfall prediction. First of all, preprocessing is performed. For machine learning, LASSO regression is used, and for the neural network, an ANN (artificial neural network) approach is used. After computation, the types of errors and the accuracy of both LASSO and ANN are compared, and a conclusion is drawn accordingly. To reduce the system's complexity, the prediction is done with the approach that has better accuracy. The prediction uses a dataset that contains rainfall data from 1901 to 2015 for different regions across the country; it contains month-wise data as well as annual rainfall data. Currently, rainfall prediction has become one of the key factors for most water conservation systems in and across the country. One of the biggest challenges is the complexity present in rainfall data: most rainfall prediction systems nowadays are unable to find the hidden layers or any non-linear patterns present in the data. This project assists in finding all the hidden layers as well as the non-linear patterns, which is useful for performing precise prediction of rainfall [1]. Rainfall prediction is the task of predicting the rainfall in a given region. It can be done in two ways. The first is to analyze the physical laws that affect rainfall; the second is to build a system that discovers the hidden patterns or features that affect the physical factors and the process involved. The second is better because it does not require modeling the underlying physics explicitly and can be useful for complex and non-linear data [2]. Because existing systems do not find the hidden layers and non-linear patterns accurately, predictions turn out to be wrong most of the time, which may lead to huge losses. So, the main objective of this research work is to find a system that can resolve both issues, i.e., handle the complexity as well as the hidden layers present, giving proper and accurate predictions and thereby assisting the country to develop its agriculture and economy.
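The introduction names LASSO regression as the machine-learning approach. The following is a minimal, illustrative sketch of LASSO with scikit-learn; the synthetic features stand in for the real monthly rainfall data, and this is not the project's actual code.

```python
# Minimal sketch of LASSO regression, the machine-learning approach named
# above; purely illustrative, with synthetic monthly-rainfall-like features.
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
X = rng.normal(size=(100, 12))                # e.g. 12 monthly rainfall features
y = X @ rng.normal(size=12) + rng.normal(scale=0.1, size=100)  # synthetic target

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)
lasso = Lasso(alpha=0.1).fit(X_train, y_train)
print("R^2 on test set:", lasso.score(X_test, y_test))
print("Non-zero coefficients:", np.count_nonzero(lasso.coef_))
```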
2.2 DETAILED DIAGRAM
2.2.1 BACK END MODULE DIAGRAMS:
FRONT END:
2.3 SYSTEM SPECIFICATION
2.3.1 HARDWARE REQUIREMENTS:
The hardware requirements may serve as the basis for a contract for the implementation of the system and should therefore be a complete and consistent specification of the whole system. They are used by software engineers as the starting point for the system design. They state what the system does, not how it should be implemented.

PROCESSOR : Intel i5
RAM : 4 GB
HARD DISK : 40 GB

2.3.2 SOFTWARE REQUIREMENTS:
The software requirements document is the specification of the system. It should include both a definition and a specification of requirements. It is a statement of what the system should do rather than how it should do it. The software requirements provide a basis for creating the software requirements specification. They are useful in estimating cost, planning team activities, performing tasks, and tracking the team's progress throughout the development activity.

PYTHON IDE : Anaconda, Jupyter Notebook
PROGRAMMING LANGUAGE : Python

MODULES:
DATASET
The dataset used in this system contains the rainfall of several regions in and across the country. It contains month-wise rainfall from 1901 to 2015, along with annual rainfall and the rainfall across the transitions between consecutive months. There are in total 4116 rows in the dataset. The dataset was collected from data.gov.in.
Category – Rainfall in India
Released under – NDSAP
Contributor – Ministry of Earth Sciences, IMD
Group – Rainfall
Sectors – Atmosphere science, earth sciences, science & technology

DATA CLEANING:
In this module the data is cleaned. After cleaning, the data is grouped as per requirement; this grouping of data is known as data clustering. The dataset is then checked for missing values; if there is a missing value, it is replaced with a default value. After that, any data that needs a format change is converted. This whole process before prediction is known as data pre-processing. The data is then used for the prediction and forecasting step.

DATA PREDICTION AND FORECASTING:
In this step, the pre-processed data is taken for prediction. The prediction can be done by any of the processes mentioned above, but the linear regression algorithm scores higher prediction accuracy than the other algorithms, so in this project the linear regression method is used for prediction. For that, the pre-processed data is split for training and testing purposes. A predictive object is created and trained on the training values, then used to predict the test values. The object is then used to forecast data for the next few years.

DATA SPLITTING:
For each experiment, we split the entire dataset into a 70% training set and a 30% test set. We used the training set for resampling, hyperparameter tuning, and training the model, and we used the test set to test the performance of the trained model. While splitting the data, we specified a random seed (any fixed number), which ensured the same data split every time the program executed. A minimal sketch of this cleaning-and-splitting step follows.
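As referenced above, here is a minimal sketch of the cleaning and 70/30 splitting steps; the file name rainfall.csv and the ANNUAL column are hypothetical stand-ins for the data.gov.in dataset, and this is not the report's exact code.

```python
# Minimal sketch of the data cleaning and 70/30 split described above.
# 'rainfall.csv' and the 'ANNUAL' column are hypothetical stand-ins for the
# data.gov.in rainfall dataset.
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv("rainfall.csv")

# Data cleaning: fill missing values with a default (here, the column mean).
df = df.fillna(df.mean(numeric_only=True))

numeric = df.select_dtypes("number")
X = numeric.drop(columns=["ANNUAL"])   # monthly features (hypothetical name)
y = numeric["ANNUAL"]                  # target: annual rainfall

# 70% train / 30% test with a fixed random seed for a reproducible split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)
print(len(X_train), "training rows,", len(X_test), "test rows")
```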
TRAINING AND TESTING:
Algorithms learn from data. They find relationships, develop understanding, make decisions, and evaluate their confidence from the training data they are given, and the better the training data is, the better the model performs. In fact, the quality and quantity of the training data have as much to do with the success of a data project as the algorithms themselves. Even a vast amount of well-structured data might not be labeled in a way that actually works for training a model. For example, autonomous vehicles do not just need pictures of the road, they need labeled images where each car, pedestrian, and street sign is annotated; sentiment analysis projects require labels that help an algorithm understand when someone is using slang or sarcasm; chatbots need entity extraction and careful syntactic analysis, not just raw language. In other words, the data used for training usually needs to be enriched or labeled, or more of it may need to be collected. Data that has simply been stored is rarely ready, as collected, to train classifiers: a great model needs great training data, whether it is images, text, audio, or any other kind of data.
REGRESSION:
Random Forest:
Random forest is a supervised learning algorithm. The "forest" it builds is an ensemble of decision trees, usually trained with the "bagging" method. The general idea of the bagging method is that a combination of learning models improves the overall result. Put simply: random forest builds multiple decision trees and merges them together to get a more accurate and stable prediction. One big advantage of random forest is that it can be used for both classification and regression problems, which form the majority of current machine learning systems. Consider random forest in classification, since classification is sometimes considered the building block of machine learning; the figure below illustrates a random forest with two trees. Random forest has nearly the same hyperparameters as a decision tree or a bagging classifier. Fortunately, there is no need to combine a decision tree with a bagging classifier, because you can simply use the classifier class of random forest. With random forest, you can also deal with regression tasks by using the algorithm's regressor.
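A minimal sketch of random forest used both as a classifier and as a regressor, as discussed above; the synthetic data is illustrative only and stands in for real weather features.

```python
# Minimal sketch of random forest for classification and regression, as
# described above; the synthetic data stands in for real rainfall features.
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))                  # four weather-like features
y_class = (X[:, 0] + X[:, 1] > 0).astype(int)  # binary label: rain / no rain
y_reg = X @ np.array([2.0, 1.0, 0.5, 0.0])     # continuous rainfall amount

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y_class)
reg = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y_reg)

print("classifier accuracy (train):", clf.score(X, y_class))
print("regressor R^2 (train):", reg.score(X, y_reg))
```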
Random forest adds additional randomness to the model while growing the trees. Instead of searching for the most important feature while splitting a node, it searches for the best feature among a random subset of features. This results in a wide diversity that generally yields a better model. Therefore, in random forest, only a random subset of the features is taken into consideration by the algorithm for splitting a node. You can make trees even more random by additionally using random thresholds for each feature rather than searching for the best possible thresholds (as a normal decision tree does).

Logistic Regression:
Logistic regression is a classification algorithm, not a regression algorithm. It is used to estimate discrete values (binary values like 0/1, yes/no, true/false) based on a given set of independent variables. In simple words, it predicts the probability of occurrence of an event by fitting data to a logit function; hence, it is also known as logit regression. Since it predicts a probability, its output values lie between 0 and 1 (as expected). Mathematically, the log odds of the outcome are modelled as a linear combination of the predictor variables:

odds = p/(1-p) = probability of event occurrence / probability of no event occurrence
logit(p) = ln(p/(1-p)) = b0 + b1X1 + b2X2 + b3X3 + ... + bkXk

As we are classifying on the basis of a wide feature set with a binary output (rain/no rain), a logistic regression (LR) model is used, since it provides an intuitive equation to classify problems into binary or multiple classes. We performed hyperparameter tuning to get the best result for each dataset; multiple parameter settings were tested before obtaining the maximum accuracies from the LR model.
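A minimal sketch of logistic regression producing probabilities in (0, 1) for a binary rain / no-rain label, matching the equations above; the data is synthetic and illustrative, not the project's code.

```python
# Minimal sketch of logistic regression for a binary rain / no-rain label,
# as described above; synthetic data, illustrative only.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 4))             # weather-like features
y = (X[:, 1] - X[:, 2] > 0).astype(int)   # 1 = rain, 0 = no rain

lr = LogisticRegression(max_iter=1000).fit(X, y)
print("P(rain) for one sample:", lr.predict_proba(X[:1])[0, 1])  # value in (0, 1)
print("coefficients b1..bk:", lr.coef_[0], "intercept b0:", lr.intercept_[0])
```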
CONFUSION MATRIX:
The confusion matrix is the most commonly used evaluation tool in predictive analysis, mainly because it is very easy to understand and because it can be used to compute other essential metrics such as accuracy, recall, precision, etc. It is an NxN matrix that describes the overall performance of a model on some dataset, where N is the number of class labels in the classification problem. (A short sketch of computing these metrics is given after this section.)

PERFORMANCE EVALUATION:
ACCURACY:
Though the training accuracy proved to be good and peaked at about 90%, the validation results were not satisfying: the model showed better results for the training data than for the test sample. This happens because the model overfits the training data and generalizes poorly to the test data. A model built on data with no preprocessing can cause such overfitting to occur; hence, in certain settings the classifier is subject to overfitting.

MODEL LOSS:
The loss function we considered was binary cross-entropy. With this function, the training set appears to improve in overall loss, but the test data suggests otherwise: the test loss actually increases compared to the training loss. The increase in loss can be attributed to the overfitting of the data. In Figure A.3, as more epochs are considered, the loss decreases on the training set, while the test set starts with a lower loss value but, as more epochs are considered, its loss actually increases. This illustrates the drawback of the architecture and methodology being used. At 10 epochs, the losses of the two sets are almost equal; after 10 epochs, the loss of the training set decreases roughly linearly while the loss of the test set gradually increases. The visual representation and the statistics both confirm the overfitting. However, this can be fixed by performing normalization and preprocessing and by adding dropout layers; after adding a dropout layer and normalizing the feature set, the overfitting is reduced.
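As referenced in the confusion matrix section above, here is a minimal sketch of computing the confusion matrix, accuracy, precision, and recall with scikit-learn; the label vectors are hypothetical.

```python
# Minimal sketch of the confusion matrix and metrics described above,
# using scikit-learn on hypothetical true/predicted labels.
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             precision_score, recall_score)

y_true = [1, 0, 1, 1, 0, 0, 1, 0]   # 1 = rain, 0 = no rain (hypothetical)
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

print(confusion_matrix(y_true, y_pred))          # 2x2 matrix for N = 2 classes
print("accuracy :", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred))
print("recall   :", recall_score(y_true, y_pred))
```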
2.5 SYSTEM DESIGN:
Designing a system is the process used to define the interfaces, modules, and data of a system so that it satisfies the specified requirements. System design can be seen as the application of systems theory. The main purpose of designing a system is to develop the system architecture by giving the data and information that are necessary for the implementation of the system.

SYSTEM ARCHITECTURE:

USECASE DIAGRAM:
Use case diagrams are a way to capture the system's functionality and requirements in UML. They capture the dynamic behavior of a live system. A use case diagram consists of use cases and actors.

DETAILED ARCHITECTURE FLOW
CLASS DIAGRAM:
Class diagrams are the main building block in object-oriented modeling. They are used to show the different objects in a system, their attributes, their operations, and the relationships among them.

STATE DIAGRAM:
A state diagram, also known as a state machine diagram or statechart diagram, is an illustration of the states an object can attain, as well as the transitions between those states, in the Unified Modeling Language. All of the possible states are placed in relation to the beginning and the end.

ACTIVITY DIAGRAM:
Activity diagrams describe how activities are coordinated to provide a service, which can be at different levels of abstraction. Typically, an event needs to be achieved by some operations, particularly where the operation is intended to achieve a number of different things that require coordination.
SEQUENCE DIAGRAM:
A sequence diagram is a type of interaction diagram because it describes how, and in what order, a group of objects works together. These diagrams are used by software developers and business professionals to understand the requirements for a new system or to document an existing process.
DATA FLOW DIAGRAM:
Data flow diagrams are used to graphically represent the flow of data in a business information system. A DFD describes the processes that are involved in a system to transfer data from the input to file storage and report generation. Data flow diagrams can be divided into logical and physical. The logical data flow diagram describes the flow of data through a system to perform certain functionality of the business. The physical data flow diagram describes the implementation of the logical data flow.

STATE FLOW DIAGRAM:
CHAPTER 3
SOFTWARE SPECIFICATION
3.1 GENERAL
3.2 ANACONDA
Anaconda is a free and open-source distribution of the Python and R programming languages for scientific computing (data science, machine learning applications, large-scale data processing, predictive analytics, etc.) that aims to simplify package management and deployment. The Anaconda distribution comes with more than 1,500 packages as well as the Conda package and virtual environment manager. It also includes a GUI, Anaconda Navigator, as a graphical alternative to the command line interface (CLI).
The big difference between Conda and the pip package manager is in how package dependencies are managed, which is a significant challenge for Python data science and the reason Conda exists. Pip installs all required Python package dependencies, whether or not they conflict with other packages installed previously. So a working installation of, for example, Google TensorFlow can suddenly stop working when you pip install a different package that needs a different version of the NumPy library. More insidiously, everything might still appear to work, but now you get different results from your data science, or you are unable to reproduce the same results elsewhere because you didn't pip install packages in the same order. Conda analyzes your current environment, everything you have installed, and any version limitations you specify (e.g. you only want tensorflow>=2.0), and figures out how to install compatible dependencies, or it will tell you that what you want can't be done. Pip, by contrast, will just install the thing you wanted along with its dependencies, even if that breaks other things. Open-source packages can be individually installed from the Anaconda repository, Anaconda Cloud (anaconda.org), or your own private repository or mirror, using the conda install command. Anaconda Inc. compiles and builds all the packages in the Anaconda repository itself, and provides binaries for Windows 32/64-bit, Linux 64-bit, and macOS 64-bit.
You can also install anything on PyPI into a Conda environment using pip, and Conda knows what it has installed and what pip has installed. Custom packages can be made using the conda build command and can be shared with others by uploading them to Anaconda Cloud, PyPI, or other repositories. The default installation of Anaconda2 includes Python 2.7, and Anaconda3 includes Python 3.7; however, you can create new environments that include any version of Python packaged with conda.
Anaconda Navigator is a desktop graphical user interface (GUI) included in the Anaconda distribution that allows users to launch applications and manage conda packages, environments, and channels without using command-line commands. Navigator can search for packages on Anaconda Cloud or in a local Anaconda repository, install them in an environment, run the packages, and update them. It is available for Windows, macOS, and Linux. The following applications are available by default in Navigator:
 JupyterLab
 Jupyter Notebook
 QtConsole
 Spyder
 Glueviz
 Orange
 RStudio
 Visual Studio Code
3.3 PYTHON
Python is a powerful multi-purpose programming language created by Guido van Rossum. It has a simple, easy-to-use syntax, making it a good language for someone learning computer programming for the first time.

FEATURES OF PYTHON:
1. Easy to code: Python is a high-level programming language and is very easy to learn compared to languages like C, C#, JavaScript, or Java. It is easy to write code in Python, anybody can learn the basics in a few hours or days, and it is a developer-friendly language.
2. Free and open source: Python is freely available on its official website. Since it is open source, the source code is also available to the public, so you can download it, use it, and share it.
3. Object-oriented language: One of the key features of Python is object-oriented programming. Python supports object-oriented concepts such as classes, objects, and encapsulation.
4. GUI programming support: Graphical user interfaces can be made using modules such as PyQt5, PyQt4, wxPython, or Tk. PyQt5 is one of the most popular options for creating graphical apps with Python.
5. High-level language: Python is a high-level language. When we write programs in Python, we do not need to remember the system architecture or manage memory ourselves.
6. Extensible: Python is an extensible language. We can write some of our code in C or C++ and compile it as a C/C++ extension.
7. Portable: Python is also a portable language. For example, if we have Python code for Windows and we want to run it on another platform such as Linux, Unix, or Mac, we do not need to change it; the same code runs on any platform.
8. Integrated: Python is an integrated language because it can easily be integrated with other languages like C and C++.
9. Interpreted: Python is an interpreted language: code is executed line by line. Unlike languages such as C, C++, and Java, there is no separate compile step, which makes it easier to debug code. The source code of Python is converted into an intermediate form called bytecode.
10. Large standard library: Python has a large standard library that provides a rich set of modules and functions, so you do not have to write your own code for every single thing. There are libraries for regular expressions, unit testing, web browsers, and more.
11. Dynamically typed: Python is a dynamically typed language. The type of a variable (for example int, double, or long) is decided at run time, not in advance, so we do not need to declare variable types; a small example follows.
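A small, illustrative example of the dynamic typing and standard-library features just listed:

```python
# Small illustration of features 9-11 above: dynamic typing and the
# standard library ('re' is available with no third-party install).
import re

x = 42          # x is an int here...
x = "rainfall"  # ...and a str now; the type is decided at run time
print(type(x))  # <class 'str'>

# Standard-library regular expressions:
print(re.findall(r"\d+", "rain 12.5 mm on day 3"))  # ['12', '5', '3']
```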
APPLICATIONS OF PYTHON:
WEB APPLICATIONS
 You can create scalable web apps using frameworks and CMSs (content management systems) built on Python. Popular platforms for creating web apps include Django, Flask, Pyramid, Plone, and Django CMS.
 Sites like Mozilla, Reddit, Instagram, and PBS are written in Python.
SCIENTIFIC AND NUMERIC COMPUTING
 There are numerous libraries available in Python for scientific and numeric computing. Libraries like SciPy and NumPy are used in general-purpose computing, and there are domain-specific libraries like EarthPy for earth science and AstroPy for astronomy.
 The language is also heavily used in machine learning, data mining, and deep learning.
CREATING SOFTWARE PROTOTYPES
 Python is slow compared to compiled languages like C++ and Java, so it might not be a good choice if resources are limited and efficiency is a must.
 However, Python is a great language for creating prototypes. For example, you can use Pygame (a library for creating games) to create a game's prototype first; if you like the prototype, you can use a language like C++ to build the actual game.
GOOD LANGUAGE TO TEACH PROGRAMMING
 Python is used by many companies to teach programming to kids.
 It is a good language with a lot of features and capabilities, yet it is one of the easiest languages to learn because of its simple, easy-to-use syntax.
CHAPTER 4
IMPLEMENTATION
4.1 GENERAL
The implementation uses Python, whose ecosystem of numerical libraries (such as NumPy) makes it straightforward to implement numerical linear algebra routines and numerical algorithms for a wide range of applications. The notation used by these libraries is very similar to standard linear algebra notation, though a few extensions may cause some initial confusion.
4.2 CODE IMPLEMENTATION
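The implementation code in the original document appears as screenshots that are not reproduced here. As a stand-in, the following is a minimal sketch, under stated assumptions, of the pipeline the report describes: load the rainfall data, preprocess it, split 70/30, train Logistic Regression (LR) and Random Forest (RF), and compare their accuracies. The file name rainfall.csv and the RAIN_TOMORROW label column are hypothetical; this is not the report's original code.

```python
# Minimal sketch of the pipeline described in this report: load rainfall data,
# clean it, split 70/30, train LR and RF, and compare their accuracies.
# 'rainfall.csv' and the numeric 0/1 'RAIN_TOMORROW' label column are
# hypothetical stand-ins.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

df = pd.read_csv("rainfall.csv")
df = df.fillna(df.mean(numeric_only=True))        # replace missing values

numeric = df.select_dtypes("number")
X = numeric.drop(columns=["RAIN_TOMORROW"])       # meteorological features
y = numeric["RAIN_TOMORROW"]                      # 1 = rain, 0 = no rain

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42          # fixed seed: reproducible split
)

models = {
    "LR": LogisticRegression(max_iter=1000),
    "RF": RandomForestClassifier(n_estimators=100, random_state=42),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    print(name, "accuracy:", accuracy_score(y_test, pred))
    print(confusion_matrix(y_test, pred))
```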
CHAPTER 5
CONCLUSION AND REFERENCES
5.1 CONCLUSION
Rainfall forecasting is a daunting task for any algorithm to handle. The algorithms we focused on were Random Forest (RF) and Logistic Regression (LR). The reason we chose RF & LR was their ability to handle larger data, such as the large batch sizes that were input, and to accept various types of data; this was a major factor in our decision. The other reason was that they performed better than other algorithms when handling inconsistencies in the data, such as noise or incomplete records. Inconsistencies can throw off the accuracy of an algorithm by an exceptional margin, but RF & LR were capable of handling such data. The final results agree with our choice, as RF & LR yielded an accuracy of 87%, whereas the other algorithms reached a maximum accuracy of 86%; for extremely large datasets, that 1% can make quite a difference in forecasting. Through our model, we were able to show that RF & LR are viable models for the field of weather forecasting: they can handle large data, handle inconsistencies, and yield higher accuracies. RF & LR are among the leading approaches in the domain of weather forecasting.

FUTURE WORK:
In future research, we intend to incorporate different ensemble techniques to combine the diversity of the models and increase the forecasting ability. We plan to take data from different regions to increase the diversity of the dataset and check which model performs well with such noisy data. The architecture of the network model will be examined further to enhance the accuracy of predictions. We intend to extend our understanding of neural networks by using different neural network models such as the recurrent neural network (LSTM) and the time-delay neural network (TDNN).
The accuracy of a probabilistic model like Naive Bayes will also be examined; in order to do so, we first need to perform discretization.

5.2 REFERENCES
1. Manojit Chattopadhyay and Surajit Chattopadhyay, "Elucidating the role of topological pattern discovery and support vector machine in generating predictive models for Indian summer monsoon rainfall", Theoretical and Applied Climatology, pp. 1-12, July 2015.
2. Kumar Abhishek, Abhay Kumar, Rajeev Ranjan and Sarthak Kumar, "A Rainfall Prediction Model using Artificial Neural Network", 2012 IEEE Control and System Graduate Research Colloquium (ICSGRC 2012), pp. 82-87, 2012.
3. Minghui Qiu, Peilin Zhao, Ke Zhang, Jun Huang, Xing Shi, Xiaoguang Wang, et al., "A Short-Term Rainfall Prediction Model using Multi-Task Convolutional Neural Networks", IEEE International Conference on Data Mining, pp. 395-400, 2017.
4. S. Aswin, P. Geetha and R. Vinayakumar, "Deep Learning Models for the Prediction of Rainfall", International Conference on Communication and Signal Processing, pp. 0657-0661, April 3-5, 2018.
5. Xianggen Gan, Lihong Chen, Dongbao Yang and Guang Liu, "The Research of Rainfall Prediction Models Based on Matlab Neural Network", Proceedings of IEEE CCIS 2011, pp. 45-48, 2011.
6. Sam Cramer, Michael Kampouridis, Alex A. Freitas and Antonis Alexandridis, "Predicting Rainfall in the Context of Rainfall Derivatives Using Genetic Programming", 2015 IEEE Symposium Series on Computational Intelligence, pp. 711-718, 2015.
7. Mohini P. Darji, Vipul K. Dabhi and Harshadkumar B. Prajapati, "Rainfall Forecasting Using Neural Network: A Survey", 2015 International Conference on Advances in Computer Engineering and Applications (ICACEA), pp. 706-713, 2015.
8. Sandeep Kumar Mohapatra, Anamika Upadhyay and Channabasava Gola, "Rainfall Prediction based on 100 Years of Meteorological Data", 2017 International Conference on Computing and Communication Technologies for Smart Nation, pp. 162-166, 2017.
9. Sankhadeep Chatterjee, Bimal Datta, Soumya Sen and Nilanjan Dey, "Rainfall Prediction using Hybrid Neural Network Approach", 2018 2nd International Conference on Recent Advances in Signal Processing, Telecommunications & Computing (SigTelCom), pp. 67-72, 2018.
10. Sunil Navadia, Pintukumar Yadav, Jobin Thomas and Shakila Shaikh, "Weather Prediction: A Novel Approach for Measuring and Analyzing Weather Data", International Conference on I-SMAC (IoT in Social, Mobile, Analytics and Cloud) (I-SMAC 2017), pp. 414-417, 2017.