A Malware Detection Method for Health Sensor Data
Based on Machine Learning
Abstract:
Traditional signature-based malware detection approaches are sensitive to small changes in
the malware code. Currently, most malware programs are adapted from existing programs; hence,
they share common patterns but have different signatures. For health sensor data, it is therefore
necessary to identify the malware pattern rather than only detect small changes. To detect
malware programs concealed in health sensor data in a timely manner, we propose a fast detection
strategy that detects patterns in the code with machine learning-based approaches. In particular,
XGBoost, LightGBM and Random Forests are used to analyze the code from health sensor data.
Terabytes of labeled programs, including benign and malware programs, have been collected. The
challenges of this task are to select and extract the features, adapt the three models to train
and test on the dataset, which consists of health sensor data, and evaluate the features and
models. When a malware program is detected by one model, its pattern is broadcast to the other
models, which helps prevent intrusion by that malware program.
Introduction:
With the advent of the Internet of Things era, all kinds of sensors are used to collect health
data. Inevitably, some malware or malicious code is concealed in health sensor data; it is
treated as an intrusion into the target host computer and is executed according to the logic
prescribed by a hacker. The categories of malicious code in health sensor data include computer
viruses, worms, trojans, botnets, ransomware and so on [1]. Malware attacks can steal core data
and sensitive information and damage computer systems and networks. Malware is one of the
greatest threats to today's computer security [2, 3]. Malware analysis is usually performed in
one of two ways [4-7].
(1) Static analysis is usually accomplished by examining the different resources of a binary
file without executing it and studying each component. Binary files can also be disassembled (or
reverse engineered) using a disassembler (such as IDA). Machine code can sometimes be translated
into assembly code, which humans can read and understand. Malware analysts can read the assembly
instructions and form a picture of what the program is supposed to do. Some modern malware is
created using obfuscation techniques to defeat this type of analysis, such as embedding
syntactic code errors. These errors can confuse the disassembler, but the code still works in
actual execution.
(2) Dynamic analysis is performed by observing how the malware actually behaves when it runs on
the host system. Modern malware can employ a variety of evasion techniques that are designed to
defeat dynamic analysis, including testing for virtual environments or active debuggers,
delaying the execution of malicious payloads, or requiring some form of interactive user input.
1 This work was supported by the Qatar National Research Fund (a member of the Qatar Foundation)
under Grant NPRP10-1205-160012. The statements made herein are solely the responsibility of the
authors.
EXISTING SYSTEM:
Based on this situation, a natural idea is to apply machine learning-based methods that use
existing experience and knowledge to perform static code analysis on unknown binary code and
automatically classify malware. Following this idea, this paper uses machine learning-based
methods and explores their application to the classification of malware.
DISADVANTAGE:
Basic knowledge is required to perform static code analysis on unknown binary code and
automatically classify malware.
PROPOSED SYSTEM:
In this paper, we mainly focus on static code analysis. Early static code analysis methods
mainly include feature matching and broad-spectrum signature scanning. Feature matching simply
uses feature string matching to complete the detection, while broad-spectrum scanning scans the
feature code and uses masked bytes to separate the sections that need to be compared from those
that do not. Since both methods need malware samples from which features are extracted before
detection is possible, the lag (hysteresis) problem is serious. Furthermore, with the
development of malware technology, malware has begun to mutate during transmission in order to
avoid being found and removed, and the number of malware variants has increased sharply. The
variants change shape so much that it is difficult to extract a piece of code as a malware
signature.
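The difference between the two early methods described above can be illustrated with a small sketch. It is not taken from the paper: the byte values, the function names `exact_match` and `masked_match`, and the use of `None` to mark a masked byte are all assumptions made for illustration.

```python
from typing import Optional, Sequence

# Hypothetical signature type: None marks a masked byte that is not compared.
Signature = Sequence[Optional[int]]


def exact_match(data: bytes, signature: bytes) -> bool:
    """Feature matching: the signature must appear byte for byte."""
    return signature in data


def masked_match(data: bytes, signature: Signature) -> bool:
    """Broad-spectrum scanning: compare only the unmasked positions."""
    n = len(signature)
    for start in range(len(data) - n + 1):
        window = data[start:start + n]
        if all(s is None or s == b for s, b in zip(signature, window)):
            return True
    return False


# Made-up byte sequences: the variant differs from the original in one byte.
original = bytes([0x90, 0xE8, 0x10, 0x00, 0xC3])
variant = bytes([0x90, 0xE8, 0x42, 0x00, 0xC3])

exact_sig = bytes([0xE8, 0x10, 0x00, 0xC3])
masked_sig = [0xE8, None, 0x00, 0xC3]  # second signature byte is masked out

print(exact_match(original, exact_sig), exact_match(variant, exact_sig))
print(masked_match(original, masked_sig), masked_match(variant, masked_sig))
```

The exact signature misses the one-byte variant, while the masked signature still matches it. Both still depend on a sample having been collected first, which is the lag problem noted above.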
ADVANTAGE:
Feature matching simply uses feature string matching to complete the detection, while
broad-spectrum scanning handles both the sections that need to be compared and those that do not.
Datasets:
The datasets are manually downloaded from Kaggle.
We use the PE header to get the first 4096 number strings of each exe file.
Figure: Number string of exe files.
This is a series of values from 0-255, with a label 0/1 at the beginning. We count the
occurrences of each value 0-255 in all strings and make libsvm files from the counts.
Libsvm files: A libsvm file is a common data format in machine learning. Each line in it starts
with a label followed by data entries of the form x:y (feature index x and feature value y).
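The counting step and the libsvm line format described above might look roughly like the following sketch. It assumes that "the first 4096 number strings" means the first 4096 bytes of the file, and it uses 1-based feature indices with zero counts omitted, which is the common libsvm convention. The helper names `byte_histogram` and `to_libsvm_line` are hypothetical.

```python
from collections import Counter


def byte_histogram(data: bytes, limit: int = 4096) -> list:
    """Count how often each value 0-255 occurs in the first `limit` bytes."""
    counts = Counter(data[:limit])
    return [counts.get(value, 0) for value in range(256)]


def to_libsvm_line(label: int, features: list) -> str:
    """Format one sample as `label index:value ...`.

    Assumption: indices start at 1 and zero counts are left out (sparse form).
    """
    pairs = [f"{i}:{v}" for i, v in enumerate(features, start=1) if v != 0]
    return " ".join([str(label)] + pairs)


# Made-up stand-in for the first bytes of an exe file, labeled 1 (malware).
sample = bytes([0, 0, 77, 90, 255])
line = to_libsvm_line(1, byte_histogram(sample))
print(line)  # 1 1:2 78:1 91:1 256:1
```

Each file then contributes one such line, so the resulting libsvm file has at most 256 features per sample regardless of the original file size.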
Technology:
• Machine learning
• Deep learning
• Python packages
Programming language and packages:
• Python
• NumPy, pandas, Keras, scikit-learn (sklearn), tkintertable, Matplotlib, Pillow, imutils
• TensorFlow, OpenCV, nlp, NLTK, etc.
Software
• Python IDLE 3.7 (or)
• Anaconda 3.7 (or)
• Jupyter (or)
• Google Colab
Hardware
• Operating system: Windows, Linux
• Processor: minimum Intel i3
• RAM: minimum 4 GB
• Hard disk: minimum 250 GB
Algorithm models
XGBoost, LightGBM and Random Forests
Working modules
• Dataset upload
• Pre-processing data
• Extracting dataset
• Splitting dataset into training and testing sets
• Applying models
Project implementation
• Gathering the dataset from the database
• Pre-processing and analyzing the dataset
• Splitting the dataset into training and testing sets in the ratio 80% to 20%
• Applying the XGBoost, LightGBM and Random Forests models to analyze the data
• Obtaining the prediction accuracy
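The split, train and score steps listed above can be sketched with the standard library alone. The classifier here, `NearestCentroid`, is only a stand-in and is not one of the three models named in the text. In practice `xgboost.XGBClassifier`, `lightgbm.LGBMClassifier` or `sklearn.ensemble.RandomForestClassifier` could be passed in its place, since each exposes the same `fit` and `predict` methods. The toy data, the helper names and the choice of 80% for training are assumptions.

```python
import random


def split_dataset(rows, labels, train_fraction=0.8, seed=0):
    """Shuffle the samples and split them into training and testing parts."""
    order = list(range(len(rows)))
    random.Random(seed).shuffle(order)
    cut = int(len(order) * train_fraction)
    train, test = order[:cut], order[cut:]
    return ([rows[i] for i in train], [labels[i] for i in train],
            [rows[i] for i in test], [labels[i] for i in test])


def accuracy(predicted, actual):
    """Fraction of predictions that equal the true label."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


class NearestCentroid:
    """Stand-in classifier with fit/predict (not a model from the text)."""

    def fit(self, rows, labels):
        # Store the mean feature vector of every class.
        self.centroids = {}
        for label in set(labels):
            members = [r for r, l in zip(rows, labels) if l == label]
            self.centroids[label] = [sum(col) / len(members) for col in zip(*members)]
        return self

    def predict(self, rows):
        # Assign each sample to the class with the closest centroid.
        def distance(a, b):
            return sum((x - y) ** 2 for x, y in zip(a, b))
        return [min(self.centroids, key=lambda c: distance(r, self.centroids[c]))
                for r in rows]


# Toy data: two features per sample, label 0 = benign, 1 = malware.
rows = [[i, i + 1] for i in range(10)] + [[i + 100, i + 101] for i in range(10)]
labels = [0] * 10 + [1] * 10

x_train, y_train, x_test, y_test = split_dataset(rows, labels)
model = NearestCentroid().fit(x_train, y_train)
print(len(x_train), len(x_test), accuracy(model.predict(x_test), y_test))
```

Keeping the evaluation code independent of the model makes it straightforward to run the same split against all three algorithms and compare their accuracy.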
Input
• The input is CSV data
Output
• The output is the prediction accuracy
Conclusion
• With the increasing complexity of malware code concealed in health sensor data, the
application of machine learning algorithms to the detection of malicious code has been
increasingly valued by the academic community and numerous security vendors. Based on the
theory of machine learning, this paper combines the advantages of different models [31-33,
36-37] and discusses static code analysis based on different machine learning algorithms and
different code features. This work can provide reference value for the future design and
implementation of machine learning-based malware detection technology [34]. However, this area
is still at a developmental stage. Many future tasks and challenges remain, and they are
summarized below.
• 1. Lack of valuable data: A machine learning algorithm often requires tens of thousands of
data samples [35] to be trained in order to obtain an effective model. The acquisition of these
basic data often requires manual operations, and the speed cannot be guaranteed.
• 2. Lack of interpretable results: The underlying reason is that, for many features, we only
know that they are effective and do not know why. Interpreting these results will be the most
important challenge for the future.