TOPOLOGY FOR DATA
SCIENCE: MORSE THEORY
AND APPLICATION
Colleen M. Farrelly
Level Sets in Everyday Life
• Front maps partition weather patterns by areas
of the same pressure (isobars).
• Elevation maps partition land areas by height
above/below sea level.
Level Sets of Functions
• Continuous functions have defined
local and global peaks, valleys, and
passes.
• Define height “slices” to partition
function.
• Akin to a cheese grater scraping off
layers of a cheese block.
• In the example, the blue lines slice a
sine wave into pieces of similar height.
• Functions on discrete data (points) can
be partitioned into level sets, too.
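As a rough illustration of slicing by height, the sketch below samples a sine wave at discrete points and assigns each sample to one of four equal-height bands. The function name `level_set_labels` and the choice of four slices are assumptions made for this example, not part of the original slides.

```python
import math

def level_set_labels(values, n_slices):
    """Assign each value to one of n_slices equal-height bands (0 = lowest).

    Assumes the values are not all identical, so the band width is nonzero.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_slices
    labels = []
    for v in values:
        # Clamp so that the maximum value lands in the top band.
        labels.append(min(int((v - lo) / width), n_slices - 1))
    return labels

# Sample one period of a sine wave at 101 discrete points.
xs = [2 * math.pi * i / 100 for i in range(101)]
ys = [math.sin(x) for x in xs]
labels = level_set_labels(ys, n_slices=4)

# Points sharing a label lie in the same height slice.
slices = {k: [x for x, lab in zip(xs, labels) if lab == k] for k in range(4)}
```

Each entry of `slices` collects the sample locations whose heights fall in the same band, which is the discrete analogue of the blue lines cutting the sine wave.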
Level Sets to Critical Points
• Continuous functions:
• Can be decomposed with level sets.
• Contain critical points (where the slope vanishes).
• Maxima (peaks)
• Minima (valleys)
• Saddle points (passes between peaks and valleys)
• Continuous functions can live in
higher-dimensional spaces with more
complicated critical points.
Degenerate and Non-Degenerate Optima
• Morse functions have stable and isolated local
optima (non-degenerate critical points).
• Related to 1st and 2nd derivatives of function.
• Don’t change with small shifts to the function.
• Technically, a critical point is non-degenerate when
the Hessian there is non-singular (nonzero determinant).
• Reflects neighborhood behavior around the
critical point.
1. Non-degenerate critical points have behavior
determined by the second derivatives nearby.
2. At degenerate points, the second derivatives do not
determine the behavior near the critical point.
[Figure: critical points where f’ = 0, labeled f’’(x) < 0 (maximum), f’’(x) > 0 (minimum), and f’’(x) = 0 (degenerate).]
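For a function of one variable, the Hessian reduces to the second derivative, so the test can be sketched numerically. The helper names, the step size `h`, and the tolerance `tol` below are illustrative assumptions; the finite-difference estimate is only approximate.

```python
def second_derivative(f, x, h=1e-4):
    """Central finite-difference estimate of f''(x)."""
    return (f(x + h) - 2 * f(x) + f(x - h)) / (h * h)

def classify_critical_point(f, x, tol=1e-3):
    """Classify a point already known to satisfy f'(x) = 0."""
    d2 = second_derivative(f, x)
    if d2 > tol:
        return "minimum"      # non-degenerate: curves upward
    if d2 < -tol:
        return "maximum"      # non-degenerate: curves downward
    return "degenerate"       # second derivative is (numerically) zero

# All three functions have a critical point at x = 0.
kinds = [
    classify_critical_point(lambda x: x ** 2, 0.0),
    classify_critical_point(lambda x: -x ** 2, 0.0),
    classify_critical_point(lambda x: x ** 3, 0.0),
]
```

Under these assumptions, x² and -x² give non-degenerate critical points, while x³ gives a degenerate one, so x³ is not a Morse function.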
Morse Function Definition
1. None of the function’s critical points
are degenerate.
2. None of the critical points share the
same value.
• These properties allow a map from a
function’s critical point values to a space
of level sets (left).
• All critical values map to values in the level
set collection.
• Function can be plotted nicely to
summarize its peaks, valleys, and in-
between spaces.
[Figure: level sets at heights 1, 0, and -1, with the corresponding critical point map.]
Discrete Extensions to Data Analysis
• Morse functions can be extended to
discrete spaces.
• Data lives in a discrete point cloud.
• Topological spaces, called simplicial
complexes, can be built from these.
• Several algorithms exist to connect
points to each other via shared
neighborhoods.
• Vietoris-Rips complexes are built by
connecting points within distance d of
each other.
• Any metric distance can be used.
• Process turns data into a topological space
upon which a Morse function can be
defined.
2-d neighborhoods are
defined by Euclidean
distance.
Points within a given
circle are mutually
connected, forming a
simplex.
Example
simplicial
complex
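A minimal sketch of the Vietoris-Rips construction, assuming Euclidean distance and the convention that a set of points forms a simplex when every pair is within distance d (some references use 2d or a strict inequality instead). The function name `vietoris_rips` and the sample points are hypothetical, and the brute-force enumeration is only practical for small point clouds.

```python
import math
from itertools import combinations

def vietoris_rips(points, d, max_dim=2):
    """Return simplices (as index tuples) up to dimension max_dim.

    A group of points forms a simplex when all pairwise distances are <= d.
    """
    n = len(points)

    def close(i, j):
        return math.dist(points[i], points[j]) <= d

    simplices = [(i,) for i in range(n)]            # vertices
    for k in range(2, max_dim + 2):                 # k points -> (k-1)-simplex
        for combo in combinations(range(n), k):
            if all(close(i, j) for i, j in combinations(combo, 2)):
                simplices.append(combo)
    return simplices

# Three nearby points and one far-away point.
cloud = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (3.0, 3.0)]
complex_ = vietoris_rips(cloud, d=1.5)
```

Here the three nearby points are mutually connected and form a filled triangle, while the distant point stays an isolated vertex. Swapping `math.dist` for another metric changes the complex but not the procedure.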
Morse-Smale Clustering
• Partition space between minima and
maxima of function by flow.
• Example:
• The truncated sine wave shown has 2
minima and 2 maxima (dots).
• Pieces between local minima and maxima
define regions of the function.
1. Yellow
2. Blue
3. Red
• Higher-dimensional spaces can be
simplified by this partitioning.
• Can be used to cluster data.
• Subgroups can then be compared across
characteristics using statistical tests
(t-test, chi-square…).
[Figure labels: Cluster 1, Cluster 2, Cluster 3]
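The partition by flow can be sketched on a sampled one-dimensional function: each sample is labeled by the local minimum it reaches going downhill and the local maximum it reaches going uphill. The names `flow` and `morse_smale_labels`, and the sampling of a sine wave on [-π/2, 5π/2], are assumptions chosen to mimic a truncated sine wave with 2 minima and 2 maxima; real Morse-Smale software works on higher-dimensional neighborhood graphs.

```python
import math

def flow(values, i, sign):
    """Follow the steepest neighbor from index i.

    sign=+1 climbs to a local maximum; sign=-1 descends to a local minimum.
    """
    n = len(values)
    while True:
        nbrs = [j for j in (i - 1, i + 1) if 0 <= j < n]
        best = max(nbrs, key=lambda j: sign * values[j])
        if sign * values[best] <= sign * values[i]:
            return i            # no neighbor improves: i is a critical point
        i = best

def morse_smale_labels(values):
    """Label each sample by its (minimum index, maximum index) pair."""
    return [(flow(values, i, -1), flow(values, i, +1))
            for i in range(len(values))]

# Truncated sine wave: minima at indices 0 and 40, maxima at 20 and 60.
xs = [-math.pi / 2 + 3 * math.pi * i / 60 for i in range(61)]
ys = [math.sin(x) for x in xs]
labels = morse_smale_labels(ys)
clusters = sorted(set(labels))
```

Samples sharing a (minimum, maximum) pair form one cluster, giving three regions for this curve. Samples sitting exactly on a critical point lie on a boundary and may be assigned to either adjacent region.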
Intuitive 2-Dimensional Example
• Imagine a soccer player kicking a ball on the ground of a hilly field.
• The high and low points determine where the ball will come to rest.
• These paths of the ball define which parts of the field share common hills and
valleys.
• These paths are actually gradient paths defined by height on the field’s topological
space.
• The spaces they define are the Morse-Smale complex of the field, partitioning it
into different regions (clusters).
Algorithms that compute
Morse-Smale complexes
typically follow this intuition.
Morse-Smale Regression
• Type of piece-wise regression.
• Fit regression model to partitions
found by Morse-Smale
decompositions of a space given a
Morse function.
• Regression models include:
• Linear and generalized linear models
• Machine learning models
• Random forest
• Elastic net
• Boosted regression
• Neural/deep networks
• Can examine group-wise differences
in regression models.
Example: 2 groups,
3 predictors
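A minimal sketch of the piece-wise idea, assuming the partition labels have already been produced by a Morse-Smale decomposition (here they are simply hard-coded). It fits an ordinary least-squares line with one predictor inside each partition; the function names and toy data are hypothetical, and any of the regression models listed above could replace the line fit.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def piecewise_fit(xs, ys, labels):
    """Fit one line per partition label; returns {label: (slope, intercept)}."""
    models = {}
    for lab in set(labels):
        px = [x for x, l in zip(xs, labels) if l == lab]
        py = [y for y, l in zip(ys, labels) if l == lab]
        models[lab] = fit_line(px, py)
    return models

# Toy data: rising on the first partition, falling on the second.
xs = [float(x) for x in range(10)]
labels = [0 if x < 5 else 1 for x in xs]          # assumed partition labels
ys = [2 * x if x < 5 else 25 - 3 * x for x in xs]
models = piecewise_fit(xs, ys, labels)
```

Comparing the fitted coefficients across partitions is one simple way to examine group-wise differences in the regression models.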
Reeb Graphs
• Track evolution of level sets
through critical points of a
Morse function.
• Partition space according to a
function (at left, by height).
• Plot critical points entering
model.
• Track until they are subsumed
into another partition.
• Useful in image analytics and
shape comparison.
Persistent Homology
• Filtration of simplicial complexes built from
data
• Iterative changing of lens with which to examine
data (neighborhood size…)
• Topological features (critical points) appear and
disappear as the lens changes.
• Creates a nested sequence of features with
underlying algebraic properties, called a homology
sequence:
Hom1⊂Hom2⊂Hom3⊂Hom4
• Persistence gives length of feature existence in
homology sequence.
• Many plots (left) exist to summarize this
information, and special statistical tools can
compare datasets/topological spaces.
• Filtration defines an MRI-type examination of
data’s topological characteristics and evolution
of critical points.
[Figure: persistence plots with axes Birth, Death, and time, each ranging from 0 to 10.]
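The birth and death of features can be sketched for the simplest case, connected components (dimension 0), by growing the neighborhood size and merging components with a union-find structure. This is a simplified illustration under the assumption of Euclidean distance; it does not track loops or higher-dimensional features, and the function name `persistence_h0` is hypothetical.

```python
import math
from itertools import combinations

def persistence_h0(points):
    """Return (birth, death) pairs for connected components.

    Every point is born at scale 0; a component dies at the distance
    at which it merges into another one. One component never dies.
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path compression
            i = parent[i]
        return i

    # Process edges in order of increasing length (the filtration).
    edges = sorted((math.dist(points[i], points[j]), i, j)
                   for i, j in combinations(range(n), 2))
    pairs = []
    for dist, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:                        # two components merge here
            parent[ri] = rj
            pairs.append((0.0, dist))
    pairs.append((0.0, math.inf))           # the surviving component
    return pairs

diagram = persistence_h0([(0.0, 0.0), (1.0, 0.0), (5.0, 0.0)])
```

The persistence of each feature is its death minus its birth, so the component that merges at distance 4 persists longer than the one that merges at distance 1.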
Mapper Algorithm
• Generalizes Reeb graphs to track
connected components through
covers/nerves of a space with a defined
Morse function.
• Basic steps:
• Define distance metric on data
• Define filtration function (Morse function)
• Linear, density-based, curvature-based…
• Slice multidimensional dataset with that
function
• Examine function behavior across slice (level
set)
• Cluster by connected components of cover
• Plot clusters by overlap of points across
covers
Response
gradations
Outliers
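The basic steps above can be sketched in a few dozen lines. This is a toy version under stated assumptions: Euclidean distance, a user-supplied filter function, equal-width overlapping intervals as the cover, and single-linkage clustering at a fixed threshold `eps`. The names `components` and `mapper` and all parameter values are hypothetical and do not come from any particular Mapper library.

```python
import math
from itertools import combinations

def components(indices, points, eps):
    """Connected components of the eps-neighbor graph on the given indices."""
    unvisited = set(indices)
    clusters = []
    while unvisited:
        stack = [unvisited.pop()]
        cluster = set(stack)
        while stack:
            i = stack.pop()
            near = {j for j in unvisited
                    if math.dist(points[i], points[j]) <= eps}
            unvisited -= near
            cluster |= near
            stack.extend(near)
        clusters.append(frozenset(cluster))
    return clusters

def mapper(points, filt, n_intervals, overlap, eps):
    """Return (nodes, edges): clusters per slice, linked when they share points."""
    values = [filt(p) for p in points]
    lo, hi = min(values), max(values)
    length = (hi - lo) / n_intervals
    nodes = []
    for k in range(n_intervals):
        a = lo + k * length - overlap * length        # widen each slice
        b = lo + (k + 1) * length + overlap * length
        members = [i for i, v in enumerate(values) if a <= v <= b]
        nodes.extend(components(members, points, eps))
    edges = [(u, v) for u, v in combinations(range(len(nodes)), 2)
             if nodes[u] & nodes[v]]
    return nodes, edges

# 24 points on a circle, sliced by height (the y-coordinate).
circle = [(math.cos(2 * math.pi * i / 24), math.sin(2 * math.pi * i / 24))
          for i in range(24)]
nodes, edges = mapper(circle, filt=lambda p: p[1],
                      n_intervals=3, overlap=0.3, eps=0.3)
```

With these settings the output graph is a single loop of four clusters, reflecting the shape of the circle. Changing `n_intervals`, `overlap`, or `eps` can change the graph, which is the instability that multiscale approaches try to address.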
Multiscale Mapper Methods
• Mapper clusters change with
parameter scale change
(unstable solutions).
• Run filtrations at multiple
resolution settings to create
stability (see above example).
• Creates hierarchy of Reeb
graphs (mapper clusters) from
each slice.
• Analyze across slices to gain
deeper insight into underlying data
structures.
[Figure: Mapper output at the 1st scale and the 2nd scale, showing the scale change. Psychometric test example: verbal vs. math ability.]
Conclusion
• Morse functions underlie several methods used in modern data analysis.
• Understanding the theory and application can facilitate use on new data
problems, as well as development of new tools based on these methods.
• Combined with statistics and machine learning, these methods can create powerful
analytics pipelines yielding more insight than individual methods.
Good References
• Carlsson, G. (2009). Topology and data. Bulletin of the American Mathematical Society,
46(2), 255-308.
• Gerber, S., Rübel, O., Bremer, P.T., Pascucci, V., & Whitaker, R.T. (2013). Morse–Smale
regression. Journal of Computational and Graphical Statistics, 22(1), 193-214.
• Edelsbrunner, H., & Harer, J. (2008). Persistent homology-a survey. Contemporary
Mathematics, 453, 257-282.
• Forman, R. (2002). A user’s guide to discrete Morse theory. Sém. Lothar. Combin., 48, 35pp.
• Carr, H., Garth, C., & Weinkauf, T. (Eds.). (2017). Topological Methods in Data Analysis and
Visualization IV: Theory, Algorithms, and Applications. Springer.
• Di Fabio, B., & Landi, C. (2016). The edit distance for Reeb graphs of surfaces. Discrete &
Computational Geometry, 55(2), 423-461.
Process Mining Machine Recoveries to Reduce Downtime
Process mining Evangelist
 
Dynamics 365 Business Rules Dynamics Dynamics
Dynamics 365 Business Rules Dynamics DynamicsDynamics 365 Business Rules Dynamics Dynamics
Dynamics 365 Business Rules Dynamics Dynamics
heyoubro69
 
AWS-Certified-ML-Engineer-Associate-Slides.pdf
AWS-Certified-ML-Engineer-Associate-Slides.pdfAWS-Certified-ML-Engineer-Associate-Slides.pdf
AWS-Certified-ML-Engineer-Associate-Slides.pdf
philsparkshome
 
Language Learning App Data Research by Globibo [2025]
Language Learning App Data Research by Globibo [2025]Language Learning App Data Research by Globibo [2025]
Language Learning App Data Research by Globibo [2025]
globibo
 
Process Mining as Enabler for Digital Transformations
Process Mining as Enabler for Digital TransformationsProcess Mining as Enabler for Digital Transformations
Process Mining as Enabler for Digital Transformations
Process mining Evangelist
 
HershAggregator (2).pdf musicretaildistribution
HershAggregator (2).pdf musicretaildistributionHershAggregator (2).pdf musicretaildistribution
HershAggregator (2).pdf musicretaildistribution
hershtara1
 

Topology for data science

  • 1. TOPOLOGY FOR DATA SCIENCE: MORSE THEORY AND APPLICATION Colleen M. Farrelly
  • 2. Level Sets in Everyday Life • Front maps partition weather patterns by areas of the same pressure (isobars). • Elevation maps partition land areas by height above/below sea level.
  • 3. Level Sets of Functions • Continuous functions have defined local and global peaks, valleys, and passes. • Define height “slices” to partition the function. • Akin to a cheese grater scraping off layers of a cheese block. • In the example, the blue lines slice a sine wave into pieces of similar height. • Functions on discrete data (points) can be partitioned into level sets, too.
  • 4. Level Sets to Critical Points • Continuous functions: • Can be decomposed with level sets. • Contain local optima (critical points). • Maxima (peaks) • Minima (valleys) • Saddle points (inflections/height change) • Continuous functions can live in higher-dimensional spaces with more complicated critical points.
  • 5. Degenerate and Non-Degenerate Optima • Morse functions have stable and isolated local optima (non-degenerate critical points). • Related to 1st and 2nd derivatives of function. • Don’t change with small shifts to the function. • Technically, related to the Hessian being nonsingular/singular at the critical point. • Reflects neighborhood behavior around the critical point. 1. Non-degenerate critical points have defined behavior in the critical point’s neighborhood. 2. Degenerate points have undefined behavior near the critical point. [Figure labels: f’ = 0 at each critical point; f’’(x) < 0 (maximum), f’’(x) > 0 (minimum), f’’(x) = 0 (degenerate).]
  • 6. Morse Function Definition 1. None of the function’s critical points are degenerate. 2. None of the critical points share the same value. • These properties allow a map from a function’s critical point values to a space of level sets (left). • All critical values map to values in the level set collection. • Function can be plotted nicely to summarize its peaks, valleys, and in-between spaces. [Figure: map between level sets and critical points, with levels at 1, 0, and -1.]
  • 7. Discrete Extensions to Data Analysis • Morse functions can be extended to discrete spaces. • Data lives in a discrete point cloud. • Topological spaces, called simplicial complexes, can be built from these. • Several algorithms exist to connect points to each other via shared neighborhoods. • Vietoris-Rips complexes are built by connecting points within distance d of each other. • Any metric distance can be used. • Process turns data into a topological space upon which a Morse function can be defined. [Figure: example simplicial complex. 2-d neighborhoods are defined by Euclidean distance; points within a given circle are mutually connected, forming a simplex.]
  • 8. Morse-Smale Clustering • Partition space between minima and maxima of function by flow. • Example: • The truncated sine wave shown has 2 minima and 2 maxima (dots). • Pieces between local minima and maxima define regions of the function: 1. Yellow 2. Blue 3. Red • Higher-dimensional spaces can be simplified by this partitioning. • Can be used to cluster data. • Subgroups can then be compared across characteristics using statistical tests (t-test, chi-square, …). [Figure labels: Cluster 1, Cluster 2, Cluster 3.]
  • 9. Intuitive 2-Dimensional Example • Imagine a soccer player kicking a ball on the ground of a hilly field. • The high and low points determine where the ball will come to rest. • These paths of the ball define which parts of the field share common hills and valleys. • These paths are actually gradient paths defined by height on the field’s topological space. • The spaces they define are the Morse-Smale complex of the field, partitioning it into different regions (clusters). Algorithms that compute Morse-Smale complexes typically follow this intuition.
  • 10. Morse-Smale Regression • Type of piecewise regression. • Fit regression model to partitions found by Morse-Smale decompositions of a space given a Morse function. • Regression models include: • Linear and generalized linear models • Machine learning models • Random forest • Elastic net • Boosted regression • Neural/deep networks • Can examine group-wise differences in regression models. [Figure: example with 2 groups, 3 predictors.]
  • 11. Reeb Graphs • Track evolution of level sets through critical points of a Morse function. • Partition space according to a function (left: by height). • Plot critical points as they enter the model. • Track until they are subsumed into another partition. • Useful in image analytics and shape comparison.
  • 12. Persistent Homology • Filtration of simplicial complexes built from data • Iterative changing of lens with which to examine data (neighborhood size…) • Topological features (critical points) appear and disappear as the lens changes. • Creates a nested sequence of features with underlying algebraic properties, called a homology sequence: Hom1 ⊂ Hom2 ⊂ Hom3 ⊂ Hom4 • Persistence gives length of feature existence in homology sequence. • Many plots (left) exist to summarize this information, and special statistical tools can compare datasets/topological spaces. • Filtration defines an MRI-type examination of data’s topological characteristics and evolution of critical points. [Figure: persistence plots; axes: Birth vs. Death (0 to 10), and time (0 to 10).]
  • 13. Mapper Algorithm • Generalizes Reeb graphs to track connected components through covers/nerves of a space with a defined Morse function. • Basic steps: • Define distance metric on data • Define filtration function (Morse function) • Linear, density-based, curvature-based… • Slice multidimensional dataset with that function • Examine function behavior across slice (level set) • Cluster by connected components of cover • Plot clusters by overlap of points across covers [Figure labels: Response gradations, Outliers.]
  • 14. Multiscale Mapper Methods • Mapper clusters change with parameter scale change (unstable solutions). • Filtrations at multiple resolution settings to create stability (see above example). • Creates hierarchy of Reeb graphs (mapper clusters) from each slice. • Analyze across slices to gain deeper insight into underlying data structures. [Figure: psychometric test example, verbal vs. math ability; panels: 1st Scale, 2nd Scale, Scale change.]
  • 15. Conclusion • Morse functions underlie several methods used in modern data analysis. • Understanding the theory and application can facilitate use on new data problems, as well as development of new tools based on these methods. • Combined with statistics and machine learning, these methods can create powerful analytics pipelines yielding more insight than individual
  • 16. Good References • Carlsson, G. (2009). Topology and data. Bulletin of the American Mathematical Society, 46(2), 255-308. • Gerber, S., Rübel, O., Bremer, P.T., Pascucci, V., & Whitaker, R.T. (2013). Morse–Smale regression. Journal of Computational and Graphical Statistics, 22(1), 193-214. • Edelsbrunner, H., & Harer, J. (2008). Persistent homology: a survey. Contemporary Mathematics, 453, 257-282. • Forman, R. (2002). A user’s guide to discrete Morse theory. Sém. Lothar. Combin., 48, 35 pp. • Carr, H., Garth, C., & Weinkauf, T. (Eds.). (2017). Topological Methods in Data Analysis and Visualization IV: Theory, Algorithms, and Applications. Springer. • Di Fabio, B., & Landi, C. (2016). The edit distance for Reeb graphs of surfaces. Discrete & Computational Geometry, 55(2), 423-461.
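
Slides 7 and 12 describe connecting points that lie within a distance d of each other and then watching features appear and disappear as that scale grows. The following is a minimal Python sketch of that idea for connected components only, not the deck's own code. The function names (rips_edges, h0_persistence), the toy point cloud, and the "distance at most d" convention are all assumptions made for illustration; dedicated libraries handle higher-dimensional simplices and use their own scale conventions.

```python
from itertools import combinations
from math import dist


def rips_edges(points, d):
    """Return index pairs (i, j) whose Euclidean distance is at most d.

    These pairs form the edges (1-simplices) of a Vietoris-Rips complex
    at scale d, under the assumed "at most d" convention.
    """
    return [(i, j) for i, j in combinations(range(len(points)), 2)
            if dist(points[i], points[j]) <= d]


def h0_persistence(points):
    """Scales at which connected components die as d grows from 0.

    Every point starts as its own component at scale 0. When an edge
    first joins two components, one of them dies at that edge's length.
    The last surviving component never dies and is not listed.
    """
    parent = list(range(len(points)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Process all candidate edges from shortest to longest.
    edges = sorted((dist(points[i], points[j]), i, j)
                   for i, j in combinations(range(len(points)), 2))
    deaths = []
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            deaths.append(length)
    return deaths


# Hypothetical toy point cloud: two tight pairs that are far apart.
points = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0)]
print(rips_edges(points, 1.5))   # edges present at scale d = 1.5
print(h0_persistence(points))    # component death scales
```

In this toy cloud, two components merge early and one merge happens much later; the long-lived gap is the kind of persistent feature slide 12 refers to, here suggesting two clusters.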