SlideShare a Scribd company logo
KENTA YAMAMOTO | TECHNICAL SUPPORT ENGINEER | 2018-03-27
Vulnerability Detection
Based on Git History
Agenda
Introduction
Introduction
Research
Background
The Trend of Security Incidents
Key facts. Why this research is important:
In Quantity
# of CVE reports: 1,020 (2000) → 14,643 (2017) [NVD]
In Quality
• Equifax exposed 143M consumers’ data due to website application
vulnerability (2017)
• Yahoo breached 3B users’ account information (2013)
The Century of Vulnerability
# OF VULNERABILITIES
As information technology is broadly
adopted, the impact of security
incidents is getting extensive and
critical.
Introduction To Help Code Reviewer
We know how to deliver software in proper quality. Code review!
Best Practice is Well-Known
Review patches before release and fix bugs before deployment. Still, however,
even the famous OSS projects struggle with the lack of code reviewers.
A Trade-Off of Automation Techniques
Software projects widely adopt a variety of automation approaches. Vulnerability
detection techniques faces a contradictory:
• (a) High precision. Useless if the tool outputs a billions of false positives.
• (b) Adaptability. No one wants to make efforts only for ensuring security such
as annotating unsafe user inputs.
Research
Background
# Example of taint annotation
int printf(/*@untainted@*/ char *fmt,
...);
Git is somewhat difficult.
No worries, it’s not only you!
WHAT’S GIT?
“(Git is) expressly designed
to make you feel
less intelligent than you
thought you were”
– Andrew Morton
The Greatness of Git -
www.linuxfoundation.org
Introduction
What’s Git?
But Git is Always Stay With You
Trust me, or try this command on your terminal:
# List up how much you rely on Git
history | awk '{ print $2 }' | sort | 
uniq -c | sort -r | head
Introduction
What’s Git?
Git for Machine Learning
Git provides what machine learning requires; good data:
• Adopted by 69.2% of 30K developers [StackOverfow]
• Trusted by most prominent OSS projects such as Linux
Kernel, OpenSSL, FFmpeg, PostgreSQL, Chrome V8, and
Apache HTTPD.
Introduction
What’s Git?
CVE-ID and Security Fix on Git
A sufficient number of reliable security fixes:
• Refers CVE-IDs in their commit message
• Or, fixed commits are referred by CVE database
Introduction
What’s Git?
A Brief Introduction of Git Features
Agenda
Methodology
A static analysis to detect
suspicious vulnerabilities based
on Git history.
METHODOLOGY - HVD
Methodology Proposal Approach
Concept
• This research proposes the approach which aims to
reduce the false positive rate compared to VCCFinder
[Perl et al] without sacrificing adaptability.
• The data source is the same to VCCFinder but this
approach takes account of added-lines and removed-
lines in patch feature while VCCFinder doesn’t.
Methodology VCCFinder: a Novel Approach
Concept
Generally, it’s hard to apply machine learning to source code
because most high-level programming languages such as
C/C++ are less redundant compared to natural languages
and assembly languages. To address this difficulty, Perl et
al.:
• Narrowed down the problem to the quantifiable lemma.
The quality of source code can be hardly quantified but
vulnerability can be expressed as 0 or 1.
• Leveraged the legacies. CVE database and the prominent
OSS projects.
“I really never wanted to do
source control
management at all and felt
that it was just about the
least interesting thing in
the computing world”
– Linus Torvalds
10 Years of Git -
www.linuxfoundation.org
Methodology Overall Architecture
Concept
Methodology Abbreviations
Terms
• HVD: History-based Vulnerability Detector
• VCC: Vulnerability-Contributing Commit(s). Changes
containing vulnerability
• UC: Unclassified Changes
• LT-S: Line type sensitive. The HVD approach
• LT-I: Line type insensitive. The replication of VCCFinder
Methodology Exploit vs Vulnerability
Terms
Potential
vulnerability
Vulnerability
Exploit
(malicious input)
Agenda
Evaluation
351,452
commits in total
Evaluation Dataset Provided by Perl et al.
Experiment
• This dataset contains commits labelled by VCC and UC and associated with
their CVE-IDs.
• It comprises 714 VCCs out of 350k commits in total from 66 OSS repositories
implemented in C/C++.
• The number of unique tokens counts 170k.
• Compressed size is 525mb (npz).
Evaluation Implementation in Python
Experiment
To make the experiment reliable, I adopted a variety of libraries including:
• Numpy
• SciPy
• Scikit-learn
• Unidiff
LT-I: note that the reproducibility is limited since the source of VCCFinder is not
publicly available.
Evaluation Environment Specs
Experiment
The computation was performed at the one of CX250 Cluster (MPC):
• CPU: Intel Xeon E5-2680v2 2.80GHz (10-core) x2
• Memory: 64GB (4GB DDR3-1866 ECC x16)
Evaluation Precision Improvement
• LT-S improved the AUC (area under curve) of its precision-recall curve by
18.8% from LT-I.
Precision
Evaluation Trade-off
• Execution time x3: (LT-I, LT-S) = (17m06s, 45m36s)
• Note: the vast majority of the processing time is occupied by learning phase.
In the practical use case, the learnt model is dumped and shared with future
predictions for a while once calculated. Then, it takes a few seconds to parse a
given unknown commit and perform prediction by using the shared model.
Hence, the execution time of learning phase should not influence the
development process.
Precision
Evaluation The most contributing features
Effective Features
To gain more profound insights from the
experiment, this study also reveals that
valuables consisting of words related to
computer resource most significantly
contributed to the classification model.
For instance:
• (RAM) structors: memory allocation with
complex structures
• (RAM) vmalloc: virtual memory allocation
• (CPU) skbuf_head: a spin-lock of threads
• (network) tso: TCP Segmentation Offload
• (network) if_ether: a flag of Ethernet
availability
Evaluation Findings & insights
Effective Features
Findings:
• The valuable tokens which are relevant to computer resources such as CPU,
memory, and network
• The figure also shows most contributing valuables are added-tokens.
Insights:
• These findings do not surprise us because it’s obvious that vulnerability occurs
correlating closely with side effects with computer resource management and
adding code.
• However, it’s worth verifying that automatic detection approach makes no
difference with the experiential intuition of human.
Agenda
Conclusion
Despite the difficulty that the features acquirable via Git are limited, this study shows LT-
S improved AUC of the precision-recall curve by 18.8% compared to LT-I without losing
the original advantages:
• (a) Scalability
• (b) Generality
• (c) Explainability
CONCLUSION
KENTA YAMAMOTO | TECHNICAL SUPPORT ENGINEER | @I05
Thank you!
Questions & discussion
Ad

More Related Content

What's hot (20)

OpenTelemetry For Developers
OpenTelemetry For DevelopersOpenTelemetry For Developers
OpenTelemetry For Developers
Kevin Brockhoff
 
From Thousands of Hours to a Couple of Minutes: Automating Exploit Generation...
From Thousands of Hours to a Couple of Minutes: Automating Exploit Generation...From Thousands of Hours to a Couple of Minutes: Automating Exploit Generation...
From Thousands of Hours to a Couple of Minutes: Automating Exploit Generation...
Priyanka Aash
 
Making Runtime Data Useful for Incident Diagnosis: An Experience Report
Making Runtime Data Useful for Incident Diagnosis: An Experience ReportMaking Runtime Data Useful for Incident Diagnosis: An Experience Report
Making Runtime Data Useful for Incident Diagnosis: An Experience Report
QAware GmbH
 
MITRE ATT&CKcon 2018: Detection Philosophy, Evolution & ATT&CK, Fred Stankows...
MITRE ATT&CKcon 2018: Detection Philosophy, Evolution & ATT&CK, Fred Stankows...MITRE ATT&CKcon 2018: Detection Philosophy, Evolution & ATT&CK, Fred Stankows...
MITRE ATT&CKcon 2018: Detection Philosophy, Evolution & ATT&CK, Fred Stankows...
MITRE - ATT&CKcon
 
Model-driven trace diagnostics for pattern-based temporal specifications
Model-driven trace diagnostics for pattern-based temporal specificationsModel-driven trace diagnostics for pattern-based temporal specifications
Model-driven trace diagnostics for pattern-based temporal specifications
Lionel Briand
 
SDN Analytics & Security
SDN Analytics & Security  SDN Analytics & Security
SDN Analytics & Security
Scott Raynovich
 
WJAX 2019 - Taking Distributed Tracing to the next level
WJAX 2019 - Taking Distributed Tracing to the next levelWJAX 2019 - Taking Distributed Tracing to the next level
WJAX 2019 - Taking Distributed Tracing to the next level
Frank Pfleger
 
Enabling Model Testing of Cyber Physical Systems
Enabling Model Testing of Cyber Physical SystemsEnabling Model Testing of Cyber Physical Systems
Enabling Model Testing of Cyber Physical Systems
Lionel Briand
 
Everything You wanted to Know About Distributed Tracing
Everything You wanted to Know About Distributed TracingEverything You wanted to Know About Distributed Tracing
Everything You wanted to Know About Distributed Tracing
Amuhinda Hungai
 
Singapore International Cyberweek 2020
Singapore International Cyberweek 2020Singapore International Cyberweek 2020
Singapore International Cyberweek 2020
Abhik Roychoudhury
 
Improving Automated Tests with Fluent Assertions
Improving Automated Tests with Fluent Assertions Improving Automated Tests with Fluent Assertions
Improving Automated Tests with Fluent Assertions
TestingCR
 
Test Case Prioritization for Acceptance Testing of Cyber Physical Systems
Test Case Prioritization for Acceptance Testing of Cyber Physical SystemsTest Case Prioritization for Acceptance Testing of Cyber Physical Systems
Test Case Prioritization for Acceptance Testing of Cyber Physical Systems
Lionel Briand
 
Bridging the Security Testing Gap in Your CI/CD Pipeline
Bridging the Security Testing Gap in Your CI/CD PipelineBridging the Security Testing Gap in Your CI/CD Pipeline
Bridging the Security Testing Gap in Your CI/CD Pipeline
DevOps.com
 
Under-reported Security Defects in Kubernetes Manifests
Under-reported Security Defects in Kubernetes ManifestsUnder-reported Security Defects in Kubernetes Manifests
Under-reported Security Defects in Kubernetes Manifests
Akond Rahman
 
Analysing Defect Inflow Distribution of Automotive & Large Software Projects
Analysing Defect Inflow Distribution of Automotive & Large Software ProjectsAnalysing Defect Inflow Distribution of Automotive & Large Software Projects
Analysing Defect Inflow Distribution of Automotive & Large Software Projects
RAKESH RANA
 
Container intrusions Do You Even IDS
Container intrusions Do You Even IDSContainer intrusions Do You Even IDS
Container intrusions Do You Even IDS
Alfredo Hickman
 
What Questions Do Programmers Ask About Configuration as Code?
What Questions Do Programmers Ask About Configuration as Code?What Questions Do Programmers Ask About Configuration as Code?
What Questions Do Programmers Ask About Configuration as Code?
Akond Rahman
 
44CON & Ruxcon: SDN security
44CON & Ruxcon: SDN security44CON & Ruxcon: SDN security
44CON & Ruxcon: SDN security
David Jorm
 
CrashLocator: Locating Crashing Faults Based on Crash Stacks (ISSTA 2014)
CrashLocator: Locating Crashing Faults Based on Crash Stacks (ISSTA 2014)CrashLocator: Locating Crashing Faults Based on Crash Stacks (ISSTA 2014)
CrashLocator: Locating Crashing Faults Based on Crash Stacks (ISSTA 2014)
Sung Kim
 
IntelFlow: Toward adding Cyber Threat Intelligence to Software Defined Networ...
IntelFlow: Toward adding Cyber Threat Intelligence to Software Defined Networ...IntelFlow: Toward adding Cyber Threat Intelligence to Software Defined Networ...
IntelFlow: Toward adding Cyber Threat Intelligence to Software Defined Networ...
Open Networking Perú (Opennetsoft)
 
OpenTelemetry For Developers
OpenTelemetry For DevelopersOpenTelemetry For Developers
OpenTelemetry For Developers
Kevin Brockhoff
 
From Thousands of Hours to a Couple of Minutes: Automating Exploit Generation...
From Thousands of Hours to a Couple of Minutes: Automating Exploit Generation...From Thousands of Hours to a Couple of Minutes: Automating Exploit Generation...
From Thousands of Hours to a Couple of Minutes: Automating Exploit Generation...
Priyanka Aash
 
Making Runtime Data Useful for Incident Diagnosis: An Experience Report
Making Runtime Data Useful for Incident Diagnosis: An Experience ReportMaking Runtime Data Useful for Incident Diagnosis: An Experience Report
Making Runtime Data Useful for Incident Diagnosis: An Experience Report
QAware GmbH
 
MITRE ATT&CKcon 2018: Detection Philosophy, Evolution & ATT&CK, Fred Stankows...
MITRE ATT&CKcon 2018: Detection Philosophy, Evolution & ATT&CK, Fred Stankows...MITRE ATT&CKcon 2018: Detection Philosophy, Evolution & ATT&CK, Fred Stankows...
MITRE ATT&CKcon 2018: Detection Philosophy, Evolution & ATT&CK, Fred Stankows...
MITRE - ATT&CKcon
 
Model-driven trace diagnostics for pattern-based temporal specifications
Model-driven trace diagnostics for pattern-based temporal specificationsModel-driven trace diagnostics for pattern-based temporal specifications
Model-driven trace diagnostics for pattern-based temporal specifications
Lionel Briand
 
SDN Analytics & Security
SDN Analytics & Security  SDN Analytics & Security
SDN Analytics & Security
Scott Raynovich
 
WJAX 2019 - Taking Distributed Tracing to the next level
WJAX 2019 - Taking Distributed Tracing to the next levelWJAX 2019 - Taking Distributed Tracing to the next level
WJAX 2019 - Taking Distributed Tracing to the next level
Frank Pfleger
 
Enabling Model Testing of Cyber Physical Systems
Enabling Model Testing of Cyber Physical SystemsEnabling Model Testing of Cyber Physical Systems
Enabling Model Testing of Cyber Physical Systems
Lionel Briand
 
Everything You wanted to Know About Distributed Tracing
Everything You wanted to Know About Distributed TracingEverything You wanted to Know About Distributed Tracing
Everything You wanted to Know About Distributed Tracing
Amuhinda Hungai
 
Singapore International Cyberweek 2020
Singapore International Cyberweek 2020Singapore International Cyberweek 2020
Singapore International Cyberweek 2020
Abhik Roychoudhury
 
Improving Automated Tests with Fluent Assertions
Improving Automated Tests with Fluent Assertions Improving Automated Tests with Fluent Assertions
Improving Automated Tests with Fluent Assertions
TestingCR
 
Test Case Prioritization for Acceptance Testing of Cyber Physical Systems
Test Case Prioritization for Acceptance Testing of Cyber Physical SystemsTest Case Prioritization for Acceptance Testing of Cyber Physical Systems
Test Case Prioritization for Acceptance Testing of Cyber Physical Systems
Lionel Briand
 
Bridging the Security Testing Gap in Your CI/CD Pipeline
Bridging the Security Testing Gap in Your CI/CD PipelineBridging the Security Testing Gap in Your CI/CD Pipeline
Bridging the Security Testing Gap in Your CI/CD Pipeline
DevOps.com
 
Under-reported Security Defects in Kubernetes Manifests
Under-reported Security Defects in Kubernetes ManifestsUnder-reported Security Defects in Kubernetes Manifests
Under-reported Security Defects in Kubernetes Manifests
Akond Rahman
 
Analysing Defect Inflow Distribution of Automotive & Large Software Projects
Analysing Defect Inflow Distribution of Automotive & Large Software ProjectsAnalysing Defect Inflow Distribution of Automotive & Large Software Projects
Analysing Defect Inflow Distribution of Automotive & Large Software Projects
RAKESH RANA
 
Container intrusions Do You Even IDS
Container intrusions Do You Even IDSContainer intrusions Do You Even IDS
Container intrusions Do You Even IDS
Alfredo Hickman
 
What Questions Do Programmers Ask About Configuration as Code?
What Questions Do Programmers Ask About Configuration as Code?What Questions Do Programmers Ask About Configuration as Code?
What Questions Do Programmers Ask About Configuration as Code?
Akond Rahman
 
44CON & Ruxcon: SDN security
44CON & Ruxcon: SDN security44CON & Ruxcon: SDN security
44CON & Ruxcon: SDN security
David Jorm
 
CrashLocator: Locating Crashing Faults Based on Crash Stacks (ISSTA 2014)
CrashLocator: Locating Crashing Faults Based on Crash Stacks (ISSTA 2014)CrashLocator: Locating Crashing Faults Based on Crash Stacks (ISSTA 2014)
CrashLocator: Locating Crashing Faults Based on Crash Stacks (ISSTA 2014)
Sung Kim
 
IntelFlow: Toward adding Cyber Threat Intelligence to Software Defined Networ...
IntelFlow: Toward adding Cyber Threat Intelligence to Software Defined Networ...IntelFlow: Toward adding Cyber Threat Intelligence to Software Defined Networ...
IntelFlow: Toward adding Cyber Threat Intelligence to Software Defined Networ...
Open Networking Perú (Opennetsoft)
 

Similar to Vulnerability Detection Based on Git History (20)

Code Quality - Security
Code Quality - SecurityCode Quality - Security
Code Quality - Security
sedukull
 
DevSecOps - Background, Status and Future Challenges
DevSecOps - Background, Status and Future ChallengesDevSecOps - Background, Status and Future Challenges
DevSecOps - Background, Status and Future Challenges
dsc71656
 
IBM Blockchain Platform - Architectural Good Practices v1.0
IBM Blockchain Platform - Architectural Good Practices v1.0IBM Blockchain Platform - Architectural Good Practices v1.0
IBM Blockchain Platform - Architectural Good Practices v1.0
Matt Lucas
 
Just-in-time Detection of Protection-Impacting Changes on WordPress and Media...
Just-in-time Detection of Protection-Impacting Changes on WordPress and Media...Just-in-time Detection of Protection-Impacting Changes on WordPress and Media...
Just-in-time Detection of Protection-Impacting Changes on WordPress and Media...
Amine Barrak
 
Just-in-time Detection of Protection-Impacting Changes on WordPress and Media...
Just-in-time Detection of Protection-Impacting Changes on WordPress and Media...Just-in-time Detection of Protection-Impacting Changes on WordPress and Media...
Just-in-time Detection of Protection-Impacting Changes on WordPress and Media...
Amine Barrak
 
1506.08725v1
1506.08725v11506.08725v1
1506.08725v1
Sandeep Sivanandan
 
Regulated Reactive - Security Considerations for Building Reactive Systems in...
Regulated Reactive - Security Considerations for Building Reactive Systems in...Regulated Reactive - Security Considerations for Building Reactive Systems in...
Regulated Reactive - Security Considerations for Building Reactive Systems in...
Ryan Hodgin
 
Zero-bug Software, Mathematically Guaranteed
Zero-bug Software, Mathematically GuaranteedZero-bug Software, Mathematically Guaranteed
Zero-bug Software, Mathematically Guaranteed
Ashley Zupkus
 
Observability, Distributed Tracing, and Open Source: The Missing Primer
Observability, Distributed Tracing, and Open Source: The Missing PrimerObservability, Distributed Tracing, and Open Source: The Missing Primer
Observability, Distributed Tracing, and Open Source: The Missing Primer
VMware Tanzu
 
Pragmatic Code Coverage
Pragmatic Code CoveragePragmatic Code Coverage
Pragmatic Code Coverage
Alexandre (Shura) Iline
 
DevOps & DevSecOps in Swiss Banking
DevOps & DevSecOps in Swiss BankingDevOps & DevSecOps in Swiss Banking
DevOps & DevSecOps in Swiss Banking
Aarno Aukia
 
Finding Zero-Days Before The Attackers: A Fortune 500 Red Team Case Study
Finding Zero-Days Before The Attackers: A Fortune 500 Red Team Case StudyFinding Zero-Days Before The Attackers: A Fortune 500 Red Team Case Study
Finding Zero-Days Before The Attackers: A Fortune 500 Red Team Case Study
DevOps.com
 
Network Security ffffffffffffffffffffffffff
Network Security ffffffffffffffffffffffffffNetwork Security ffffffffffffffffffffffffff
Network Security ffffffffffffffffffffffffff
simonlaurette1
 
Cyber Resiliency 20120420
Cyber Resiliency 20120420Cyber Resiliency 20120420
Cyber Resiliency 20120420
Steve Goeringer
 
Scaling security in a cloud environment v0.5 (Sep 2017)
Scaling security in a cloud environment  v0.5 (Sep 2017)Scaling security in a cloud environment  v0.5 (Sep 2017)
Scaling security in a cloud environment v0.5 (Sep 2017)
Dinis Cruz
 
Monitoring Containerized Micro-Services In Azure
Monitoring Containerized Micro-Services In AzureMonitoring Containerized Micro-Services In Azure
Monitoring Containerized Micro-Services In Azure
Alex Bulankou
 
Migrating from OpenTracing to OpenTelemetry - Kubernetes Community Days Munic...
Migrating from OpenTracing to OpenTelemetry - Kubernetes Community Days Munic...Migrating from OpenTracing to OpenTelemetry - Kubernetes Community Days Munic...
Migrating from OpenTracing to OpenTelemetry - Kubernetes Community Days Munic...
SonjaChevre
 
Are your DevOps and Security teams friends or foes?
Are your DevOps and Security teams friends or foes?Are your DevOps and Security teams friends or foes?
Are your DevOps and Security teams friends or foes?
Reuven Harrison
 
Getting started with RISC-V verification what's next after compliance testing
Getting started with RISC-V verification what's next after compliance testingGetting started with RISC-V verification what's next after compliance testing
Getting started with RISC-V verification what's next after compliance testing
RISC-V International
 
Do You Need A Service Mesh?
Do You Need A Service Mesh?Do You Need A Service Mesh?
Do You Need A Service Mesh?
NGINX, Inc.
 
Code Quality - Security
Code Quality - SecurityCode Quality - Security
Code Quality - Security
sedukull
 
DevSecOps - Background, Status and Future Challenges
DevSecOps - Background, Status and Future ChallengesDevSecOps - Background, Status and Future Challenges
DevSecOps - Background, Status and Future Challenges
dsc71656
 
IBM Blockchain Platform - Architectural Good Practices v1.0
IBM Blockchain Platform - Architectural Good Practices v1.0IBM Blockchain Platform - Architectural Good Practices v1.0
IBM Blockchain Platform - Architectural Good Practices v1.0
Matt Lucas
 
Just-in-time Detection of Protection-Impacting Changes on WordPress and Media...
Just-in-time Detection of Protection-Impacting Changes on WordPress and Media...Just-in-time Detection of Protection-Impacting Changes on WordPress and Media...
Just-in-time Detection of Protection-Impacting Changes on WordPress and Media...
Amine Barrak
 
Just-in-time Detection of Protection-Impacting Changes on WordPress and Media...
Just-in-time Detection of Protection-Impacting Changes on WordPress and Media...Just-in-time Detection of Protection-Impacting Changes on WordPress and Media...
Just-in-time Detection of Protection-Impacting Changes on WordPress and Media...
Amine Barrak
 
Regulated Reactive - Security Considerations for Building Reactive Systems in...
Regulated Reactive - Security Considerations for Building Reactive Systems in...Regulated Reactive - Security Considerations for Building Reactive Systems in...
Regulated Reactive - Security Considerations for Building Reactive Systems in...
Ryan Hodgin
 
Zero-bug Software, Mathematically Guaranteed
Zero-bug Software, Mathematically GuaranteedZero-bug Software, Mathematically Guaranteed
Zero-bug Software, Mathematically Guaranteed
Ashley Zupkus
 
Observability, Distributed Tracing, and Open Source: The Missing Primer
Observability, Distributed Tracing, and Open Source: The Missing PrimerObservability, Distributed Tracing, and Open Source: The Missing Primer
Observability, Distributed Tracing, and Open Source: The Missing Primer
VMware Tanzu
 
DevOps & DevSecOps in Swiss Banking
DevOps & DevSecOps in Swiss BankingDevOps & DevSecOps in Swiss Banking
DevOps & DevSecOps in Swiss Banking
Aarno Aukia
 
Finding Zero-Days Before The Attackers: A Fortune 500 Red Team Case Study
Finding Zero-Days Before The Attackers: A Fortune 500 Red Team Case StudyFinding Zero-Days Before The Attackers: A Fortune 500 Red Team Case Study
Finding Zero-Days Before The Attackers: A Fortune 500 Red Team Case Study
DevOps.com
 
Network Security ffffffffffffffffffffffffff
Network Security ffffffffffffffffffffffffffNetwork Security ffffffffffffffffffffffffff
Network Security ffffffffffffffffffffffffff
simonlaurette1
 
Cyber Resiliency 20120420
Cyber Resiliency 20120420Cyber Resiliency 20120420
Cyber Resiliency 20120420
Steve Goeringer
 
Scaling security in a cloud environment v0.5 (Sep 2017)
Scaling security in a cloud environment  v0.5 (Sep 2017)Scaling security in a cloud environment  v0.5 (Sep 2017)
Scaling security in a cloud environment v0.5 (Sep 2017)
Dinis Cruz
 
Monitoring Containerized Micro-Services In Azure
Monitoring Containerized Micro-Services In AzureMonitoring Containerized Micro-Services In Azure
Monitoring Containerized Micro-Services In Azure
Alex Bulankou
 
Migrating from OpenTracing to OpenTelemetry - Kubernetes Community Days Munic...
Migrating from OpenTracing to OpenTelemetry - Kubernetes Community Days Munic...Migrating from OpenTracing to OpenTelemetry - Kubernetes Community Days Munic...
Migrating from OpenTracing to OpenTelemetry - Kubernetes Community Days Munic...
SonjaChevre
 
Are your DevOps and Security teams friends or foes?
Are your DevOps and Security teams friends or foes?Are your DevOps and Security teams friends or foes?
Are your DevOps and Security teams friends or foes?
Reuven Harrison
 
Getting started with RISC-V verification what's next after compliance testing
Getting started with RISC-V verification what's next after compliance testingGetting started with RISC-V verification what's next after compliance testing
Getting started with RISC-V verification what's next after compliance testing
RISC-V International
 
Do You Need A Service Mesh?
Do You Need A Service Mesh?Do You Need A Service Mesh?
Do You Need A Service Mesh?
NGINX, Inc.
 
Ad

More from Kenta Yamamoto (10)

The Art of Command Line (2021)
The Art of Command Line (2021)The Art of Command Line (2021)
The Art of Command Line (2021)
Kenta Yamamoto
 
[論文紹介] VCC-Finder: Finding Potential Vulnerabilities in Open-Source Projects ...
[論文紹介] VCC-Finder: Finding Potential Vulnerabilities in Open-Source Projects ...[論文紹介] VCC-Finder: Finding Potential Vulnerabilities in Open-Source Projects ...
[論文紹介] VCC-Finder: Finding Potential Vulnerabilities in Open-Source Projects ...
Kenta Yamamoto
 
文字コードとセキュリティ
文字コードとセキュリティ文字コードとセキュリティ
文字コードとセキュリティ
Kenta Yamamoto
 
良いUrlを設計する
良いUrlを設計する良いUrlを設計する
良いUrlを設計する
Kenta Yamamoto
 
私たちは何を Web っぽいと感じているのか
私たちは何を Web っぽいと感じているのか 私たちは何を Web っぽいと感じているのか
私たちは何を Web っぽいと感じているのか
Kenta Yamamoto
 
Tips for bash script
Tips for bash scriptTips for bash script
Tips for bash script
Kenta Yamamoto
 
優れたビデオゲームに共通する不変の法則
優れたビデオゲームに共通する不変の法則優れたビデオゲームに共通する不変の法則
優れたビデオゲームに共通する不変の法則
Kenta Yamamoto
 
20110805 ui14課題2
20110805 ui14課題220110805 ui14課題2
20110805 ui14課題2
Kenta Yamamoto
 
20110804 ui14課題
20110804 ui14課題20110804 ui14課題
20110804 ui14課題
Kenta Yamamoto
 
東日本大震災後の訪日外国人数の変移_2011.3
東日本大震災後の訪日外国人数の変移_2011.3東日本大震災後の訪日外国人数の変移_2011.3
東日本大震災後の訪日外国人数の変移_2011.3
Kenta Yamamoto
 
The Art of Command Line (2021)
The Art of Command Line (2021)The Art of Command Line (2021)
The Art of Command Line (2021)
Kenta Yamamoto
 
[論文紹介] VCC-Finder: Finding Potential Vulnerabilities in Open-Source Projects ...
[論文紹介] VCC-Finder: Finding Potential Vulnerabilities in Open-Source Projects ...[論文紹介] VCC-Finder: Finding Potential Vulnerabilities in Open-Source Projects ...
[論文紹介] VCC-Finder: Finding Potential Vulnerabilities in Open-Source Projects ...
Kenta Yamamoto
 
文字コードとセキュリティ
文字コードとセキュリティ文字コードとセキュリティ
文字コードとセキュリティ
Kenta Yamamoto
 
良いUrlを設計する
良いUrlを設計する良いUrlを設計する
良いUrlを設計する
Kenta Yamamoto
 
私たちは何を Web っぽいと感じているのか
私たちは何を Web っぽいと感じているのか 私たちは何を Web っぽいと感じているのか
私たちは何を Web っぽいと感じているのか
Kenta Yamamoto
 
優れたビデオゲームに共通する不変の法則
優れたビデオゲームに共通する不変の法則優れたビデオゲームに共通する不変の法則
優れたビデオゲームに共通する不変の法則
Kenta Yamamoto
 
東日本大震災後の訪日外国人数の変移_2011.3
東日本大震災後の訪日外国人数の変移_2011.3東日本大震災後の訪日外国人数の変移_2011.3
東日本大震災後の訪日外国人数の変移_2011.3
Kenta Yamamoto
 
Ad

Recently uploaded (20)

How to Troubleshoot 9 Types of OutOfMemoryError
How to Troubleshoot 9 Types of OutOfMemoryErrorHow to Troubleshoot 9 Types of OutOfMemoryError
How to Troubleshoot 9 Types of OutOfMemoryError
Tier1 app
 
Digital Twins Software Service in Belfast
Digital Twins Software Service in BelfastDigital Twins Software Service in Belfast
Digital Twins Software Service in Belfast
julia smits
 
Surviving a Downturn Making Smarter Portfolio Decisions with OnePlan - Webina...
Surviving a Downturn Making Smarter Portfolio Decisions with OnePlan - Webina...Surviving a Downturn Making Smarter Portfolio Decisions with OnePlan - Webina...
Surviving a Downturn Making Smarter Portfolio Decisions with OnePlan - Webina...
OnePlan Solutions
 
A Comprehensive Guide to CRM Software Benefits for Every Business Stage
A Comprehensive Guide to CRM Software Benefits for Every Business StageA Comprehensive Guide to CRM Software Benefits for Every Business Stage
A Comprehensive Guide to CRM Software Benefits for Every Business Stage
SynapseIndia
 
Wilcom Embroidery Studio Crack 2025 For Windows
Wilcom Embroidery Studio Crack 2025 For WindowsWilcom Embroidery Studio Crack 2025 For Windows
Wilcom Embroidery Studio Crack 2025 For Windows
Google
 
GC Tuning: A Masterpiece in Performance Engineering
GC Tuning: A Masterpiece in Performance EngineeringGC Tuning: A Masterpiece in Performance Engineering
GC Tuning: A Masterpiece in Performance Engineering
Tier1 app
 
The-Future-is-Hybrid-Exploring-Azure’s-Role-in-Multi-Cloud-Strategies.pptx
The-Future-is-Hybrid-Exploring-Azure’s-Role-in-Multi-Cloud-Strategies.pptxThe-Future-is-Hybrid-Exploring-Azure’s-Role-in-Multi-Cloud-Strategies.pptx
The-Future-is-Hybrid-Exploring-Azure’s-Role-in-Multi-Cloud-Strategies.pptx
james brownuae
 
[gbgcpp] Let's get comfortable with concepts
[gbgcpp] Let's get comfortable with concepts[gbgcpp] Let's get comfortable with concepts
[gbgcpp] Let's get comfortable with concepts
Dimitrios Platis
 
How to Install and Activate ListGrabber Plugin
How to Install and Activate ListGrabber PluginHow to Install and Activate ListGrabber Plugin
How to Install and Activate ListGrabber Plugin
eGrabber
 
sequencediagrams.pptx software Engineering
sequencediagrams.pptx software Engineeringsequencediagrams.pptx software Engineering
sequencediagrams.pptx software Engineering
aashrithakondapalli8
 
Mobile Application Developer Dubai | Custom App Solutions by Ajath
Mobile Application Developer Dubai | Custom App Solutions by AjathMobile Application Developer Dubai | Custom App Solutions by Ajath
Mobile Application Developer Dubai | Custom App Solutions by Ajath
Ajath Infotech Technologies LLC
 
Buy vs. Build: Unlocking the right path for your training tech
Buy vs. Build: Unlocking the right path for your training techBuy vs. Build: Unlocking the right path for your training tech
Buy vs. Build: Unlocking the right path for your training tech
Rustici Software
 
Serato DJ Pro Crack Latest Version 2025??
Serato DJ Pro Crack Latest Version 2025??Serato DJ Pro Crack Latest Version 2025??
Serato DJ Pro Crack Latest Version 2025??
Web Designer
 
Memory Management and Leaks in Postgres from pgext.day 2025
Memory Management and Leaks in Postgres from pgext.day 2025Memory Management and Leaks in Postgres from pgext.day 2025
Memory Management and Leaks in Postgres from pgext.day 2025
Phil Eaton
 
Adobe Audition Crack FRESH Version 2025 FREE
Adobe Audition Crack FRESH Version 2025 FREEAdobe Audition Crack FRESH Version 2025 FREE
Adobe Audition Crack FRESH Version 2025 FREE
zafranwaqar90
 
Adobe InDesign Crack FREE Download 2025 link
Adobe InDesign Crack FREE Download 2025 linkAdobe InDesign Crack FREE Download 2025 link
Adobe InDesign Crack FREE Download 2025 link
mahmadzubair09
 
Time Estimation: Expert Tips & Proven Project Techniques
Time Estimation: Expert Tips & Proven Project TechniquesTime Estimation: Expert Tips & Proven Project Techniques
Time Estimation: Expert Tips & Proven Project Techniques
Livetecs LLC
 
Reinventing Microservices Efficiency and Innovation with Single-Runtime
Reinventing Microservices Efficiency and Innovation with Single-RuntimeReinventing Microservices Efficiency and Innovation with Single-Runtime
Reinventing Microservices Efficiency and Innovation with Single-Runtime
Natan Silnitsky
 
Autodesk Inventor Crack (2025) Latest
Autodesk Inventor    Crack (2025) LatestAutodesk Inventor    Crack (2025) Latest
Autodesk Inventor Crack (2025) Latest
Google
 
Top 12 Most Useful AngularJS Development Tools to Use in 2025
Top 12 Most Useful AngularJS Development Tools to Use in 2025Top 12 Most Useful AngularJS Development Tools to Use in 2025
Top 12 Most Useful AngularJS Development Tools to Use in 2025
GrapesTech Solutions
 
How to Troubleshoot 9 Types of OutOfMemoryError
How to Troubleshoot 9 Types of OutOfMemoryErrorHow to Troubleshoot 9 Types of OutOfMemoryError
How to Troubleshoot 9 Types of OutOfMemoryError
Tier1 app
 
Digital Twins Software Service in Belfast
Digital Twins Software Service in BelfastDigital Twins Software Service in Belfast
Digital Twins Software Service in Belfast
julia smits
 
Surviving a Downturn Making Smarter Portfolio Decisions with OnePlan - Webina...
Surviving a Downturn Making Smarter Portfolio Decisions with OnePlan - Webina...Surviving a Downturn Making Smarter Portfolio Decisions with OnePlan - Webina...
Surviving a Downturn Making Smarter Portfolio Decisions with OnePlan - Webina...
OnePlan Solutions
 
A Comprehensive Guide to CRM Software Benefits for Every Business Stage
A Comprehensive Guide to CRM Software Benefits for Every Business StageA Comprehensive Guide to CRM Software Benefits for Every Business Stage
A Comprehensive Guide to CRM Software Benefits for Every Business Stage
SynapseIndia
 
Wilcom Embroidery Studio Crack 2025 For Windows
Wilcom Embroidery Studio Crack 2025 For WindowsWilcom Embroidery Studio Crack 2025 For Windows
Wilcom Embroidery Studio Crack 2025 For Windows
Google
 
GC Tuning: A Masterpiece in Performance Engineering
GC Tuning: A Masterpiece in Performance EngineeringGC Tuning: A Masterpiece in Performance Engineering
GC Tuning: A Masterpiece in Performance Engineering
Tier1 app
 
The-Future-is-Hybrid-Exploring-Azure’s-Role-in-Multi-Cloud-Strategies.pptx
The-Future-is-Hybrid-Exploring-Azure’s-Role-in-Multi-Cloud-Strategies.pptxThe-Future-is-Hybrid-Exploring-Azure’s-Role-in-Multi-Cloud-Strategies.pptx
The-Future-is-Hybrid-Exploring-Azure’s-Role-in-Multi-Cloud-Strategies.pptx
james brownuae
 
[gbgcpp] Let's get comfortable with concepts
[gbgcpp] Let's get comfortable with concepts[gbgcpp] Let's get comfortable with concepts
[gbgcpp] Let's get comfortable with concepts
Dimitrios Platis
 
How to Install and Activate ListGrabber Plugin
How to Install and Activate ListGrabber PluginHow to Install and Activate ListGrabber Plugin
How to Install and Activate ListGrabber Plugin
eGrabber
 
sequencediagrams.pptx software Engineering
sequencediagrams.pptx software Engineeringsequencediagrams.pptx software Engineering
sequencediagrams.pptx software Engineering
aashrithakondapalli8
 
Mobile Application Developer Dubai | Custom App Solutions by Ajath
Mobile Application Developer Dubai | Custom App Solutions by AjathMobile Application Developer Dubai | Custom App Solutions by Ajath
Mobile Application Developer Dubai | Custom App Solutions by Ajath
Ajath Infotech Technologies LLC
 
Buy vs. Build: Unlocking the right path for your training tech
Buy vs. Build: Unlocking the right path for your training techBuy vs. Build: Unlocking the right path for your training tech
Buy vs. Build: Unlocking the right path for your training tech
Rustici Software
 
Serato DJ Pro Crack Latest Version 2025??
Serato DJ Pro Crack Latest Version 2025??Serato DJ Pro Crack Latest Version 2025??
Serato DJ Pro Crack Latest Version 2025??
Web Designer
 
Memory Management and Leaks in Postgres from pgext.day 2025
Memory Management and Leaks in Postgres from pgext.day 2025Memory Management and Leaks in Postgres from pgext.day 2025
Memory Management and Leaks in Postgres from pgext.day 2025
Phil Eaton
 
Adobe Audition Crack FRESH Version 2025 FREE
Adobe Audition Crack FRESH Version 2025 FREEAdobe Audition Crack FRESH Version 2025 FREE
Adobe Audition Crack FRESH Version 2025 FREE
zafranwaqar90
 
Adobe InDesign Crack FREE Download 2025 link
Adobe InDesign Crack FREE Download 2025 linkAdobe InDesign Crack FREE Download 2025 link
Adobe InDesign Crack FREE Download 2025 link
mahmadzubair09
 
Time Estimation: Expert Tips & Proven Project Techniques
Time Estimation: Expert Tips & Proven Project TechniquesTime Estimation: Expert Tips & Proven Project Techniques
Time Estimation: Expert Tips & Proven Project Techniques
Livetecs LLC
 
Reinventing Microservices Efficiency and Innovation with Single-Runtime
Reinventing Microservices Efficiency and Innovation with Single-RuntimeReinventing Microservices Efficiency and Innovation with Single-Runtime
Reinventing Microservices Efficiency and Innovation with Single-Runtime
Natan Silnitsky
 
Autodesk Inventor Crack (2025) Latest
Autodesk Inventor    Crack (2025) LatestAutodesk Inventor    Crack (2025) Latest
Autodesk Inventor Crack (2025) Latest
Google
 
Top 12 Most Useful AngularJS Development Tools to Use in 2025
Top 12 Most Useful AngularJS Development Tools to Use in 2025Top 12 Most Useful AngularJS Development Tools to Use in 2025
Top 12 Most Useful AngularJS Development Tools to Use in 2025
GrapesTech Solutions
 

Vulnerability Detection Based on Git History

  • 1. KENTA YAMAMOTO | TECHNICAL SUPPORT ENGINEER | 2018-03-27 Vulnerability Detection Based on Git History
  • 3. Introduction Research Background The Trend of Security Incidents Key facts. Why this research is important: In Quantity # of CVE reports: 1,020 (2000) → 14,643 (2017) [NVD] In Quality • Equifax exposed 143M consumers’ data due to website application vulnerability (2017) • Yahoo breached 3B users’ account information (2013)
  • 4. The Century of Vulnerability # OF VULNERABILITIES As information technology is broadly adopted, the impact of security incidents is getting extensive and critical.
  • 5. Introduction To Help Code Reviewer We know how to deliver software in proper quality. Code review! Best Practice is Well-Known Review patches before release and fix bugs before deployment. Still, however, even the famous OSS projects struggle with the lack of code reviewers. A Trade-Off of Automation Techniques Software projects widely adopt a variety of automation approaches. Vulnerability detection techniques faces a contradictory: • (a) High precision. Useless if the tool outputs a billions of false positives. • (b) Adaptability. No one wants to make efforts only for ensuring security such as annotating unsafe user inputs. Research Background # Example of taint annotation int printf(/*@untainted@*/ char *fmt, ...);
  • 6. Git is somewhat difficult. No worries, it’s not only you! WHAT’S GIT?
  • 7. “(Git is) expressly designed to make you feel less intelligent than you thought you were” – Andrew Morton The Greatness of Git - www.linuxfoundation.org
  • 8. Introduction What’s Git? But Git is Always Stay With You Trust me, or try this command on your terminal: # List up how much you rely on Git history | awk '{ print $2 }' | sort | uniq -c | sort -r | head
  • 9. Introduction What’s Git? Git for Machine Learning Git provides what machine learning requires; good data: • Adopted by 69.2% of 30K developers [StackOverfow] • Trusted by most prominent OSS projects such as Linux Kernel, OpenSSL, FFmpeg, PostgreSQL, Chrome V8, and Apache HTTPD.
  • 10. Introduction What’s Git? CVE-ID and Security Fix on Git A sufficient number of reliable security fixes: • Refers CVE-IDs in their commit message • Or, fixed commits are referred by CVE database
  • 11. Introduction What’s Git? A Brief Introduction of Git Features
  • 13. A static analysis to detect suspicious vulnerabilities based on Git history. METHODOLOGY - HVD
  • 14. Methodology Proposal Approach Concept • This research proposes the approach which aims to reduce the false positive rate compared to VCCFinder [Perl et al] without sacrificing adaptability. • The data source is the same to VCCFinder but this approach takes account of added-lines and removed- lines in patch feature while VCCFinder doesn’t.
  • 15. Methodology VCCFinder: a Novel Approach Concept Generally, it’s hard to apply machine learning to source code because most high-level programming languages such as C/C++ are less redundant compared to natural languages and assembly languages. To address this difficulty, Perl et al.: • Narrowed down the problem to the quantifiable lemma. The quality of source code can be hardly quantified but vulnerability can be expressed as 0 or 1. • Leveraged the legacies. CVE database and the prominent OSS projects.
  • 16. “I really never wanted to do source control management at all and felt that it was just about the least interesting thing in the computing world” – Linus Torvalds 10 Years of Git - www.linuxfoundation.org
  • 18. Methodology Abbreviations Terms • HVD: History-based Vulnerability Detector • VCC: Vulnerability-Contributing Commit(s). Changes containing vulnerability • UC: Unclassified Changes • LT-S: Line type sensitive. The HVD approach • LT-I: Line type insensitive. The replication of VCCFinder
  • 19. Methodology Exploit vs Vulnerability Terms Potential vulnerability Vulnerability Exploit (malicious input)
  • 22. Evaluation Dataset Provided by Perl et al. Experiment • This dataset contains commits labelled by VCC and UC and associated with their CVE-IDs. • It comprises 714 VCCs out of 350k commits in total from 66 OSS repositories implemented in C/C++. • The number of unique tokens counts 170k. • Compressed size is 525mb (npz).
  • 23. Evaluation Implementation in Python Experiment To make the experiment reliable, I adopted a variety of libraries including: • Numpy • SciPy • Scikit-learn • Unidiff LT-I: note that the reproducibility is limited since the source of VCCFinder is not publicly available.
  • 24. Evaluation Environment Specs Experiment The computation was performed at the one of CX250 Cluster (MPC): • CPU: Intel Xeon E5-2680v2 2.80GHz (10-core) x2 • Memory: 64GB (4GB DDR3-1866 ECC x16)
  • 25. Evaluation Precision Improvement • LT-S improved the AUC (area under curve) of its precision-recall curve by 18.8% from LT-I. Precision
  • 26. Evaluation Trade-off • Execution time x3: (LT-I, LT-S) = (17m06s, 45m36s) • Note: the vast majority of the processing time is occupied by learning phase. In the practical use case, the learnt model is dumped and shared with future predictions for a while once calculated. Then, it takes a few seconds to parse a given unknown commit and perform prediction by using the shared model. Hence, the execution time of learning phase should not influence the development process. Precision
  • 27. Evaluation The most contributing features Effective Features To gain more profound insights from the experiment, this study also reveals that valuables consisting of words related to computer resource most significantly contributed to the classification model. For instance: • (RAM) structors: memory allocation with complex structures • (RAM) vmalloc: virtual memory allocation • (CPU) skbuf_head: a spin-lock of threads • (network) tso: TCP Segmentation Offload • (network) if_ether: a flag of Ethernet availability
  • 28. Evaluation Findings & insights Effective Features Findings: • The valuable tokens which are relevant to computer resources such as CPU, memory, and network • The figure also shows most contributing valuables are added-tokens. Insights: • These findings do not surprise us because it’s obvious that vulnerability occurs correlating closely with side effects with computer resource management and adding code. • However, it’s worth verifying that automatic detection approach makes no difference with the experiential intuition of human.
  • 30. Despite the difficulty that the features acquirable via Git are limited, this study shows LT- S improved AUC of the precision-recall curve by 18.8% compared to LT-I without losing the original advantages: • (a) Scalability • (b) Generality • (c) Explainability CONCLUSION
  • 31. KENTA YAMAMOTO | TECHNICAL SUPPORT ENGINEER | @I05 Thank you! Questions & discussion
  翻译: