Confusion Matrix And Cyber Crime
What is a Confusion Matrix?
Once the data has been cleaned, pre-processed, and wrangled, the next step is to feed it to a model and get its output, often as probabilities. But hold on! How do we measure the effectiveness of that model? The better the effectiveness, the better the performance, and that is exactly what we want. This is where the confusion matrix comes into the limelight.
A confusion matrix is a performance measurement for a machine learning classification problem where the output can be two or more classes. For a binary problem, it is a table with the 4 different combinations of predicted and actual values.
It is extremely useful for computing Recall, Precision, Specificity, and Accuracy, and it is the basis for AUC-ROC curves.
Four Outcomes of the Confusion Matrix
The confusion matrix shows how well a classifier performs by comparing the actual and predicted classes. The binary confusion matrix is composed of four squares:
- TP: True Positive: An actual positive correctly predicted as positive. You predicted positive, and it turned out to be true. For example, you predicted that France would win the World Cup, and it won.
- FP: False Positive: An actual negative incorrectly predicted as positive. It is also known as a Type 1 error. Your prediction was positive, and it was false. You predicted that England would win, but it lost.
- FN: False Negative: An actual positive incorrectly predicted as negative. It is also known as a Type 2 error. Your prediction was negative, and it was false. You predicted that France would not win, but it won.
- TN: True Negative: An actual negative correctly predicted as negative. You predicted negative, and it was true. You predicted that England would not win, and it lost.
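The four outcomes can be counted directly from a list of actual labels and a list of predicted labels. The following is a minimal sketch in plain Python, not a reference implementation: the function names `confusion_counts` and `metrics` and the toy labels are made up for illustration (1 stands for "the team wins", 0 for "the team does not win").

```python
def confusion_counts(actual, predicted, positive=1):
    """Count TP, FP, FN, TN for a binary problem."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1  # predicted positive, actually positive
        elif p == positive and a != positive:
            fp += 1  # predicted positive, actually negative (Type 1 error)
        elif p != positive and a == positive:
            fn += 1  # predicted negative, actually positive (Type 2 error)
        else:
            tn += 1  # predicted negative, actually negative
    return tp, fp, fn, tn


def metrics(tp, fp, fn, tn):
    """Derive the usual summary metrics from the four counts."""
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
    }


# Toy labels, invented for this example only.
actual = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 1, 0, 1, 0, 0, 1, 0]

tp, fp, fn, tn = confusion_counts(actual, predicted)
print("TP, FP, FN, TN =", tp, fp, fn, tn)
print(metrics(tp, fp, fn, tn))
```

Each metric guards against a zero denominator, which can happen when a class never appears in a small sample.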
What is Cyber Crime?
Cybercrime is a criminal activity that either targets or uses a computer, a computer network, or a networked device. Most, but not all, cybercrime is committed by cybercriminals or hackers who want to make money. Cybercrime is carried out by individuals or organizations. Some cybercriminals are organized, use advanced techniques, and are highly technically skilled. Others are novice hackers.
Types of cybercrime
Here are some specific examples of the different types of cybercrime:
- Email and internet fraud.
- Identity fraud (where personal information is stolen and used).
- Theft of financial or card payment data.
- Theft and sale of corporate data.
- Cyberextortion (demanding money to prevent a threatened attack).
- Ransomware attacks (a type of cyberextortion).
- Cryptojacking (where hackers mine cryptocurrency using resources they do not own).
- Cyberespionage (where hackers access government or company data).
Cyber Attack Detection and Classification using Parallel Support Vector Machine
Support Vector Machines (SVMs) are classifiers that were originally designed for binary classification. Classification applications built on them can also solve multi-class problems. The results show that pSVM gives higher detection accuracy for the classes, with a comparable false alarm rate.
Cyberattack detection is a classification problem, in which we classify the normal pattern from the abnormal pattern (attack) of the system.
The SDF is a very powerful and popular data mining algorithm for decision-making and classification problems. It has been used in many real-life applications, such as medical diagnosis, radar signal classification, weather prediction, credit approval, and fraud detection.
A parallel Support Vector Machine (pSVM) algorithm was proposed for the detection and classification of cyber attack datasets.
The performance of a support vector machine depends greatly on the kernel function it uses. Therefore, the Gaussian kernel function was modified in a data-dependent way to improve the efficiency of the classifiers. Comparative results for both classifiers were also obtained to check the theoretical aspects. The analysis also shows that pSVM performs better than SDF.
The classification accuracy of pSVM improves remarkably (accuracy for the Normal class as well as the DOS class is almost 100%), with a comparable false alarm rate and comparable training and testing times.
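For reference, the standard (unmodified) Gaussian kernel can be sketched as below. The text does not say how the kernel was modified, so the data-dependent version is not reproduced here. The function names, the `gamma` parameter value, and the sample points are assumptions chosen for illustration.

```python
import math


def gaussian_kernel(x, y, gamma=0.5):
    """Standard Gaussian (RBF) kernel: exp(-gamma * ||x - y||^2)."""
    squared_distance = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return math.exp(-gamma * squared_distance)


def kernel_matrix(points, gamma=0.5):
    """Pairwise kernel values between all points (symmetric, ones on the diagonal)."""
    return [[gaussian_kernel(p, q, gamma) for q in points] for p in points]


# Hypothetical 2-D feature vectors, invented for this example only.
points = [(0.0, 0.0), (1.0, 0.0), (3.0, 4.0)]
K = kernel_matrix(points)
for row in K:
    print(["%.4f" % value for value in row])
```

Nearby points get a kernel value close to 1 and distant points a value close to 0. The `gamma` parameter controls how quickly that similarity decays with distance, which is one reason the choice of kernel has such an effect on an SVM.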
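Detection accuracy and false alarm rate both come from the confusion matrix. The sketch below treats each class one-vs-rest, which is an assumption about how the per-class figures would be computed and not something the text states. The function name `per_class_rates`, the "Other" class, and the labels are invented for illustration and are not results from the pSVM experiments.

```python
def per_class_rates(actual, predicted, classes):
    """One-vs-rest detection rate and false alarm rate for each class."""
    pairs = list(zip(actual, predicted))
    rates = {}
    for cls in classes:
        tp = sum(1 for a, p in pairs if a == cls and p == cls)
        fn = sum(1 for a, p in pairs if a == cls and p != cls)
        fp = sum(1 for a, p in pairs if a != cls and p == cls)
        tn = sum(1 for a, p in pairs if a != cls and p != cls)
        rates[cls] = {
            # Share of this class's samples that were recognised as this class.
            "detection_rate": tp / (tp + fn) if (tp + fn) else 0.0,
            # Share of other samples wrongly flagged as this class.
            "false_alarm_rate": fp / (fp + tn) if (fp + tn) else 0.0,
        }
    return rates


# Toy labels, invented for this example only.
actual = ["Normal", "DOS", "DOS", "Normal", "Other", "DOS", "Normal", "Other"]
predicted = ["Normal", "DOS", "DOS", "Normal", "DOS", "DOS", "Normal", "Other"]

rates = per_class_rates(actual, predicted, ["Normal", "DOS", "Other"])
for cls, values in rates.items():
    print(cls, values)
```

In this toy data one "Other" sample is misread as DOS, so DOS picks up a non-zero false alarm rate while its detection rate stays at 1.0. That is why the two figures are usually reported together.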