SlideShare a Scribd company logo
Sedna Query Parser & Optimizing Rewriter  Dmitry Lizorkin [email_address] Ph.D., software developer Sedna team
Goals Wide range of queries/statements support: XQuery queries, XML update statements, data definition language statements High performance for both query evaluation and updates execution Query optimization strategies designed in correspondence with Sedna internal data representation design
Query processing steps query execution plan (QEP): to Sedna executor operation tree Physical Plan Generator Optimizing Rewriter Static Analyzer Parser XML update statement Data Definition Language statement XQuery query
Query parser Input: 3 types of queries/statements XQuery queries XML update statements Data Definition Language statements (e.g. create document statement) Output: uniform operation tree for all above 3 query/statement types
Static analyzer Query static analysis phase Static context is initialized with XQuery Functions and Operators and augmented with query prolog declarations Query operation tree is expanded with imported XQuery modules All namespace prefixes, function names and variable names are resolved XQuery static errors are reported, if any
Optimizing rewriter step Physical Plan Generator Optimizing Rewriter query execution plan Static Analyzer Parser operation tree XML update statement Data Definition Language statement XQuery query
Optimizing Rewriter Optimization based on query rewriting Removing unnecessary ordering operations Combining abbreviated descendant-or-self path step with a next path step Removing unnecessary node copies for constructed content Analyzing nested for-clauses
Ordering operations: challenges Many XQuery expressions have the semantics for their resulting sequences to be ordered in document order with duplicate nodes removed “ Distinct-Document Order” semantics is expressed via explicit DDO operations in Sedna query operation tree DDO operations decrease query execution performance Require the whole argument sequence to be evaluated before a first result item could be produced Break execution pipeline
Ordering operations: optimization Idea: removing unnecessary ordering operations Analysis: for each operation in the query operation tree, the following properties for the resulting sequence are recursively found out whether in DDO whether consists of no more than a single item whether consists of nodes on a common level of an XML tree Result: a DDO operation is removed if either its argument is known to be in DDO, or DDO is not required for the resulting sequence
Descendant-or-self: challenges The  “//”  abbreviated path step (expanded into  descendant-or-self::node() ) is frequently used in practical XQuery queries Expensive to evaluate Bad selectivity: generally, selects almost all nodes in an XML document Does not allow to use benefits of Sedna descriptive schema-driven storage strategy
Descendant-or-self: optimization idea Idea: combining the  //  step with a next path step E.g.,  //para  transformed to  /descendant::para Better intermediate selectivity Benefits of Sedna schema-driven storage Technical issue: context position/size “ The path expression  //para[1]  does  not  mean the same as the path expression  /descendant::para[1] ” (XQuery Spec., Subsect. 3.2.4)
Descendant-or-self: solution For the  //  path step, its next step predicate expressions are analyzed (if any) If predicate expressions results do not depend on context position and size (neither explicitly, nor implicitly), than the  //  step can be combined with its next step while keeping the original query semantics
Removing unnecessary node copies An XQuery constructor semantics implies constructor content being new nodes, “ even if they are copies of existing nodes ” Problem: making a deep copy of an XML subtree is expensive to evaluate Idea: avoiding node copies that do not affect query result semantics Algorithm: a constructed node needs not be copied if it is a part of the query result sequence, or it becomes a direct child of another constructed node
Analyzing nested for-clauses FLWOR-expressions generally contain multiple iteration variables in for-clauses for $u in doc('users')//user_tuple, $i in doc('items')//item_tuple where ... return ... Binding sequences with nested loop semantics An expression associated with an inner iteration variable is analyzed If the associated expression does not depend on outer iteration variables, it is marked as “lazy” Lazy associated expression value can be evaluated just once, with the query semantics preserved
Physical plan generation step Physical Plan Generator Optimizing Rewriter query execution plan Static Analyzer Parser operation tree XML update statement Data Definition Language statement XQuery query
Query physical plan generation Query execution plan (QEP) for the query operation tree is constructed Structural location path fragments are extracted A path that starts from an XML document node and contains only descending axes and no predicates Mapped to Sedna descriptive schema access operations For & order-by clauses are mapped to tuple stream generation & reordering operations respectively
Implementation details Query parser is implemented with ANTLR parser generator (www.antlr.org) ANTLR native representation for an operation tree is an S-expression S-expression is a native data representation for Scheme programming language Scheme provides extensive native features for S-expressions rewriting purposes Optimizing rewriter is implemented in Scheme The Scheme-to-C compiler produces high-performance code
Summary Complete query static analysis phase and rewriting-based optimization processing (i.e. from query textual representation to physical plan)
Thank you for your attention!
Future work: Cost-based optimization Problem analysis Join operations are primary candidates to benefit from cost-based optimization implemented In XML, the problem of join evaluation is not as vital as for relational data One-to-many entity relationships can often be modeled via XML elements nesting mechanism For many-to-many relationships, Sedna explicit indexes can be used
Cost-based optimization: plans Join operations extraction in a query operation tree Selectivity estimation mechanism for XQuery expressions A storage for selectivity statistics is required (XML?) Cost-based physical plan selection is a complicated task in general; however, a lot of related work exists can be relatively easily implemented for simple cases Hints can be used for complex cases
Ad

More Related Content

What's hot (20)

Stored procedure in sql server
Stored procedure in sql serverStored procedure in sql server
Stored procedure in sql server
baabtra.com - No. 1 supplier of quality freshers
 
1 list datastructures
1 list datastructures1 list datastructures
1 list datastructures
Nguync91368
 
2 a stacks
2 a stacks2 a stacks
2 a stacks
Nguync91368
 
Chapter 3 stored procedures
Chapter 3 stored proceduresChapter 3 stored procedures
Chapter 3 stored procedures
baabtra.com - No. 1 supplier of quality freshers
 
X FILES
X FILESX FILES
X FILES
SaraswathiRamalingam
 
Xpath presentation
Xpath presentationXpath presentation
Xpath presentation
Alfonso Gabriel López Ceballos
 
SQL
SQLSQL
SQL
kaushal123
 
Oracle: PLSQL Introduction
Oracle: PLSQL IntroductionOracle: PLSQL Introduction
Oracle: PLSQL Introduction
DataminingTools Inc
 
The life of a query (oracle edition)
The life of a query (oracle edition)The life of a query (oracle edition)
The life of a query (oracle edition)
maclean liu
 
XML and XPath details
XML and XPath detailsXML and XPath details
XML and XPath details
DSK Chakravarthy
 
Unit 3
Unit 3Unit 3
Unit 3
Abha Damani
 
4. plsql
4. plsql4. plsql
4. plsql
Amrit Kaur
 
Introduction to PL/SQL
Introduction to PL/SQLIntroduction to PL/SQL
Introduction to PL/SQL
Kailash N
 
ORACLE PL SQL
ORACLE PL SQLORACLE PL SQL
ORACLE PL SQL
Srinath Maharana
 
Packages in PL/SQL
Packages in PL/SQLPackages in PL/SQL
Packages in PL/SQL
Pooja Dixit
 
Oracle db subprograms
Oracle db subprogramsOracle db subprograms
Oracle db subprograms
Simon Huang
 
Pl sql content
Pl sql contentPl sql content
Pl sql content
MargaretMaryT
 
Structures and Pointers
Structures and PointersStructures and Pointers
Structures and Pointers
Prabu U
 
Overview of query evaluation
Overview of query evaluationOverview of query evaluation
Overview of query evaluation
avniS
 
Xslt mule
Xslt   muleXslt   mule
Xslt mule
Sindhu VL
 

Similar to Sedna XML Database: Query Parser & Optimizing Rewriter (20)

Query optimization to improve performance of the code execution
Query optimization to improve performance of the code executionQuery optimization to improve performance of the code execution
Query optimization to improve performance of the code execution
Alexander Decker
 
11.query optimization to improve performance of the code execution
11.query optimization to improve performance of the code execution11.query optimization to improve performance of the code execution
11.query optimization to improve performance of the code execution
Alexander Decker
 
ch02-240507064009-ac337bf1 .ppt
ch02-240507064009-ac337bf1             .pptch02-240507064009-ac337bf1             .ppt
ch02-240507064009-ac337bf1 .ppt
iamayesha2526
 
Chapter15
Chapter15Chapter15
Chapter15
gourab87
 
Query optimization and processing for advanced database systems
Query optimization and processing for advanced database systemsQuery optimization and processing for advanced database systems
Query optimization and processing for advanced database systems
meharikiros2
 
Ch-2-Query-Process.pptx advanced database
Ch-2-Query-Process.pptx advanced databaseCh-2-Query-Process.pptx advanced database
Ch-2-Query-Process.pptx advanced database
tasheebedane
 
700442110-advanced database Ch-2-Query-Process.pptx
700442110-advanced database Ch-2-Query-Process.pptx700442110-advanced database Ch-2-Query-Process.pptx
700442110-advanced database Ch-2-Query-Process.pptx
tasheebedane
 
Patterns in Python
Patterns in PythonPatterns in Python
Patterns in Python
dn
 
ASP.NET 3.5 SP1
ASP.NET 3.5 SP1ASP.NET 3.5 SP1
ASP.NET 3.5 SP1
Dave Allen
 
Database programming
Database programmingDatabase programming
Database programming
Shree M.L.Kakadiya MCA mahila college, Amreli
 
Day5
Day5Day5
Day5
madamewoolf
 
Sql Summit Clr, Service Broker And Xml
Sql Summit   Clr, Service Broker And XmlSql Summit   Clr, Service Broker And Xml
Sql Summit Clr, Service Broker And Xml
David Truxall
 
IEEE 2014 JAVA DATA MINING PROJECTS Xs path navigation on xml schemas made easy
IEEE 2014 JAVA DATA MINING PROJECTS Xs path navigation on xml schemas made easyIEEE 2014 JAVA DATA MINING PROJECTS Xs path navigation on xml schemas made easy
IEEE 2014 JAVA DATA MINING PROJECTS Xs path navigation on xml schemas made easy
IEEEFINALYEARSTUDENTPROJECTS
 
2014 IEEE JAVA DATA MINING PROJECT Xs path navigation on xml schemas made easy
2014 IEEE JAVA DATA MINING PROJECT Xs path navigation on xml schemas made easy2014 IEEE JAVA DATA MINING PROJECT Xs path navigation on xml schemas made easy
2014 IEEE JAVA DATA MINING PROJECT Xs path navigation on xml schemas made easy
IEEEMEMTECHSTUDENTSPROJECTS
 
Flux - Open Machine Learning Stack / Pipeline
Flux - Open Machine Learning Stack / PipelineFlux - Open Machine Learning Stack / Pipeline
Flux - Open Machine Learning Stack / Pipeline
Jan Wiegelmann
 
Analysing Performance of Algorithmic SQL and PLSQL.pptx
Analysing Performance of Algorithmic SQL and PLSQL.pptxAnalysing Performance of Algorithmic SQL and PLSQL.pptx
Analysing Performance of Algorithmic SQL and PLSQL.pptx
Brendan Furey
 
chapter 5 Objectdesign.ppt
chapter 5 Objectdesign.pptchapter 5 Objectdesign.ppt
chapter 5 Objectdesign.ppt
TemesgenAzezew
 
RAMSES: Robust Analytic Models for Science at Extreme Scales
RAMSES: Robust Analytic Models for Science at Extreme ScalesRAMSES: Robust Analytic Models for Science at Extreme Scales
RAMSES: Robust Analytic Models for Science at Extreme Scales
Ian Foster
 
Odtug2011 adf developers make the database work for you
Odtug2011 adf developers make the database work for youOdtug2011 adf developers make the database work for you
Odtug2011 adf developers make the database work for you
Luc Bors
 
Mapping Data Flows Training April 2021
Mapping Data Flows Training April 2021Mapping Data Flows Training April 2021
Mapping Data Flows Training April 2021
Mark Kromer
 
Query optimization to improve performance of the code execution
Query optimization to improve performance of the code executionQuery optimization to improve performance of the code execution
Query optimization to improve performance of the code execution
Alexander Decker
 
11.query optimization to improve performance of the code execution
11.query optimization to improve performance of the code execution11.query optimization to improve performance of the code execution
11.query optimization to improve performance of the code execution
Alexander Decker
 
ch02-240507064009-ac337bf1 .ppt
ch02-240507064009-ac337bf1             .pptch02-240507064009-ac337bf1             .ppt
ch02-240507064009-ac337bf1 .ppt
iamayesha2526
 
Query optimization and processing for advanced database systems
Query optimization and processing for advanced database systemsQuery optimization and processing for advanced database systems
Query optimization and processing for advanced database systems
meharikiros2
 
Ch-2-Query-Process.pptx advanced database
Ch-2-Query-Process.pptx advanced databaseCh-2-Query-Process.pptx advanced database
Ch-2-Query-Process.pptx advanced database
tasheebedane
 
700442110-advanced database Ch-2-Query-Process.pptx
700442110-advanced database Ch-2-Query-Process.pptx700442110-advanced database Ch-2-Query-Process.pptx
700442110-advanced database Ch-2-Query-Process.pptx
tasheebedane
 
Patterns in Python
Patterns in PythonPatterns in Python
Patterns in Python
dn
 
ASP.NET 3.5 SP1
ASP.NET 3.5 SP1ASP.NET 3.5 SP1
ASP.NET 3.5 SP1
Dave Allen
 
Sql Summit Clr, Service Broker And Xml
Sql Summit   Clr, Service Broker And XmlSql Summit   Clr, Service Broker And Xml
Sql Summit Clr, Service Broker And Xml
David Truxall
 
IEEE 2014 JAVA DATA MINING PROJECTS Xs path navigation on xml schemas made easy
IEEE 2014 JAVA DATA MINING PROJECTS Xs path navigation on xml schemas made easyIEEE 2014 JAVA DATA MINING PROJECTS Xs path navigation on xml schemas made easy
IEEE 2014 JAVA DATA MINING PROJECTS Xs path navigation on xml schemas made easy
IEEEFINALYEARSTUDENTPROJECTS
 
2014 IEEE JAVA DATA MINING PROJECT Xs path navigation on xml schemas made easy
2014 IEEE JAVA DATA MINING PROJECT Xs path navigation on xml schemas made easy2014 IEEE JAVA DATA MINING PROJECT Xs path navigation on xml schemas made easy
2014 IEEE JAVA DATA MINING PROJECT Xs path navigation on xml schemas made easy
IEEEMEMTECHSTUDENTSPROJECTS
 
Flux - Open Machine Learning Stack / Pipeline
Flux - Open Machine Learning Stack / PipelineFlux - Open Machine Learning Stack / Pipeline
Flux - Open Machine Learning Stack / Pipeline
Jan Wiegelmann
 
Analysing Performance of Algorithmic SQL and PLSQL.pptx
Analysing Performance of Algorithmic SQL and PLSQL.pptxAnalysing Performance of Algorithmic SQL and PLSQL.pptx
Analysing Performance of Algorithmic SQL and PLSQL.pptx
Brendan Furey
 
chapter 5 Objectdesign.ppt
chapter 5 Objectdesign.pptchapter 5 Objectdesign.ppt
chapter 5 Objectdesign.ppt
TemesgenAzezew
 
RAMSES: Robust Analytic Models for Science at Extreme Scales
RAMSES: Robust Analytic Models for Science at Extreme ScalesRAMSES: Robust Analytic Models for Science at Extreme Scales
RAMSES: Robust Analytic Models for Science at Extreme Scales
Ian Foster
 
Odtug2011 adf developers make the database work for you
Odtug2011 adf developers make the database work for youOdtug2011 adf developers make the database work for you
Odtug2011 adf developers make the database work for you
Luc Bors
 
Mapping Data Flows Training April 2021
Mapping Data Flows Training April 2021Mapping Data Flows Training April 2021
Mapping Data Flows Training April 2021
Mark Kromer
 
Ad

Recently uploaded (20)

Top-AI-Based-Tools-for-Game-Developers (1).pptx
Top-AI-Based-Tools-for-Game-Developers (1).pptxTop-AI-Based-Tools-for-Game-Developers (1).pptx
Top-AI-Based-Tools-for-Game-Developers (1).pptx
BR Softech
 
Understanding SEO in the Age of AI.pdf
Understanding SEO in the Age of AI.pdfUnderstanding SEO in the Age of AI.pdf
Understanding SEO in the Age of AI.pdf
Fulcrum Concepts, LLC
 
Crazy Incentives and How They Kill Security. How Do You Turn the Wheel?
Crazy Incentives and How They Kill Security. How Do You Turn the Wheel?Crazy Incentives and How They Kill Security. How Do You Turn the Wheel?
Crazy Incentives and How They Kill Security. How Do You Turn the Wheel?
Christian Folini
 
Build With AI - In Person Session Slides.pdf
Build With AI - In Person Session Slides.pdfBuild With AI - In Person Session Slides.pdf
Build With AI - In Person Session Slides.pdf
Google Developer Group - Harare
 
An Overview of Salesforce Health Cloud & How is it Transforming Patient Care
An Overview of Salesforce Health Cloud & How is it Transforming Patient CareAn Overview of Salesforce Health Cloud & How is it Transforming Patient Care
An Overview of Salesforce Health Cloud & How is it Transforming Patient Care
Cyntexa
 
RTP Over QUIC: An Interesting Opportunity Or Wasted Time?
RTP Over QUIC: An Interesting Opportunity Or Wasted Time?RTP Over QUIC: An Interesting Opportunity Or Wasted Time?
RTP Over QUIC: An Interesting Opportunity Or Wasted Time?
Lorenzo Miniero
 
Config 2025 presentation recap covering both days
Config 2025 presentation recap covering both daysConfig 2025 presentation recap covering both days
Config 2025 presentation recap covering both days
TrishAntoni1
 
Agentic Automation - Delhi UiPath Community Meetup
Agentic Automation - Delhi UiPath Community MeetupAgentic Automation - Delhi UiPath Community Meetup
Agentic Automation - Delhi UiPath Community Meetup
Manoj Batra (1600 + Connections)
 
DevOpsDays SLC - Platform Engineers are Product Managers.pptx
DevOpsDays SLC - Platform Engineers are Product Managers.pptxDevOpsDays SLC - Platform Engineers are Product Managers.pptx
DevOpsDays SLC - Platform Engineers are Product Managers.pptx
Justin Reock
 
Could Virtual Threads cast away the usage of Kotlin Coroutines - DevoxxUK2025
Could Virtual Threads cast away the usage of Kotlin Coroutines - DevoxxUK2025Could Virtual Threads cast away the usage of Kotlin Coroutines - DevoxxUK2025
Could Virtual Threads cast away the usage of Kotlin Coroutines - DevoxxUK2025
João Esperancinha
 
Developing System Infrastructure Design Plan.pptx
Developing System Infrastructure Design Plan.pptxDeveloping System Infrastructure Design Plan.pptx
Developing System Infrastructure Design Plan.pptx
wondimagegndesta
 
Zilliz Cloud Monthly Technical Review: May 2025
Zilliz Cloud Monthly Technical Review: May 2025Zilliz Cloud Monthly Technical Review: May 2025
Zilliz Cloud Monthly Technical Review: May 2025
Zilliz
 
Cybersecurity Threat Vectors and Mitigation
Cybersecurity Threat Vectors and MitigationCybersecurity Threat Vectors and Mitigation
Cybersecurity Threat Vectors and Mitigation
VICTOR MAESTRE RAMIREZ
 
fennec fox optimization algorithm for optimal solution
fennec fox optimization algorithm for optimal solutionfennec fox optimization algorithm for optimal solution
fennec fox optimization algorithm for optimal solution
shallal2
 
MEMS IC Substrate Technologies Guide 2025.pptx
MEMS IC Substrate Technologies Guide 2025.pptxMEMS IC Substrate Technologies Guide 2025.pptx
MEMS IC Substrate Technologies Guide 2025.pptx
IC substrate Shawn Wang
 
Limecraft Webinar - 2025.3 release, featuring Content Delivery, Graphic Conte...
Limecraft Webinar - 2025.3 release, featuring Content Delivery, Graphic Conte...Limecraft Webinar - 2025.3 release, featuring Content Delivery, Graphic Conte...
Limecraft Webinar - 2025.3 release, featuring Content Delivery, Graphic Conte...
Maarten Verwaest
 
Digital Technologies for Culture, Arts and Heritage: Insights from Interdisci...
Digital Technologies for Culture, Arts and Heritage: Insights from Interdisci...Digital Technologies for Culture, Arts and Heritage: Insights from Interdisci...
Digital Technologies for Culture, Arts and Heritage: Insights from Interdisci...
Vasileios Komianos
 
React Native for Business Solutions: Building Scalable Apps for Success
React Native for Business Solutions: Building Scalable Apps for SuccessReact Native for Business Solutions: Building Scalable Apps for Success
React Native for Business Solutions: Building Scalable Apps for Success
Amelia Swank
 
machines-for-woodworking-shops-en-compressed.pdf
machines-for-woodworking-shops-en-compressed.pdfmachines-for-woodworking-shops-en-compressed.pdf
machines-for-woodworking-shops-en-compressed.pdf
AmirStern2
 
Artificial_Intelligence_in_Everyday_Life.pptx
Artificial_Intelligence_in_Everyday_Life.pptxArtificial_Intelligence_in_Everyday_Life.pptx
Artificial_Intelligence_in_Everyday_Life.pptx
03ANMOLCHAURASIYA
 
Top-AI-Based-Tools-for-Game-Developers (1).pptx
Top-AI-Based-Tools-for-Game-Developers (1).pptxTop-AI-Based-Tools-for-Game-Developers (1).pptx
Top-AI-Based-Tools-for-Game-Developers (1).pptx
BR Softech
 
Understanding SEO in the Age of AI.pdf
Understanding SEO in the Age of AI.pdfUnderstanding SEO in the Age of AI.pdf
Understanding SEO in the Age of AI.pdf
Fulcrum Concepts, LLC
 
Crazy Incentives and How They Kill Security. How Do You Turn the Wheel?
Crazy Incentives and How They Kill Security. How Do You Turn the Wheel?Crazy Incentives and How They Kill Security. How Do You Turn the Wheel?
Crazy Incentives and How They Kill Security. How Do You Turn the Wheel?
Christian Folini
 
An Overview of Salesforce Health Cloud & How is it Transforming Patient Care
An Overview of Salesforce Health Cloud & How is it Transforming Patient CareAn Overview of Salesforce Health Cloud & How is it Transforming Patient Care
An Overview of Salesforce Health Cloud & How is it Transforming Patient Care
Cyntexa
 
RTP Over QUIC: An Interesting Opportunity Or Wasted Time?
RTP Over QUIC: An Interesting Opportunity Or Wasted Time?RTP Over QUIC: An Interesting Opportunity Or Wasted Time?
RTP Over QUIC: An Interesting Opportunity Or Wasted Time?
Lorenzo Miniero
 
Config 2025 presentation recap covering both days
Config 2025 presentation recap covering both daysConfig 2025 presentation recap covering both days
Config 2025 presentation recap covering both days
TrishAntoni1
 
DevOpsDays SLC - Platform Engineers are Product Managers.pptx
DevOpsDays SLC - Platform Engineers are Product Managers.pptxDevOpsDays SLC - Platform Engineers are Product Managers.pptx
DevOpsDays SLC - Platform Engineers are Product Managers.pptx
Justin Reock
 
Could Virtual Threads cast away the usage of Kotlin Coroutines - DevoxxUK2025
Could Virtual Threads cast away the usage of Kotlin Coroutines - DevoxxUK2025Could Virtual Threads cast away the usage of Kotlin Coroutines - DevoxxUK2025
Could Virtual Threads cast away the usage of Kotlin Coroutines - DevoxxUK2025
João Esperancinha
 
Developing System Infrastructure Design Plan.pptx
Developing System Infrastructure Design Plan.pptxDeveloping System Infrastructure Design Plan.pptx
Developing System Infrastructure Design Plan.pptx
wondimagegndesta
 
Zilliz Cloud Monthly Technical Review: May 2025
Zilliz Cloud Monthly Technical Review: May 2025Zilliz Cloud Monthly Technical Review: May 2025
Zilliz Cloud Monthly Technical Review: May 2025
Zilliz
 
Cybersecurity Threat Vectors and Mitigation
Cybersecurity Threat Vectors and MitigationCybersecurity Threat Vectors and Mitigation
Cybersecurity Threat Vectors and Mitigation
VICTOR MAESTRE RAMIREZ
 
fennec fox optimization algorithm for optimal solution
fennec fox optimization algorithm for optimal solutionfennec fox optimization algorithm for optimal solution
fennec fox optimization algorithm for optimal solution
shallal2
 
MEMS IC Substrate Technologies Guide 2025.pptx
MEMS IC Substrate Technologies Guide 2025.pptxMEMS IC Substrate Technologies Guide 2025.pptx
MEMS IC Substrate Technologies Guide 2025.pptx
IC substrate Shawn Wang
 
Limecraft Webinar - 2025.3 release, featuring Content Delivery, Graphic Conte...
Limecraft Webinar - 2025.3 release, featuring Content Delivery, Graphic Conte...Limecraft Webinar - 2025.3 release, featuring Content Delivery, Graphic Conte...
Limecraft Webinar - 2025.3 release, featuring Content Delivery, Graphic Conte...
Maarten Verwaest
 
Digital Technologies for Culture, Arts and Heritage: Insights from Interdisci...
Digital Technologies for Culture, Arts and Heritage: Insights from Interdisci...Digital Technologies for Culture, Arts and Heritage: Insights from Interdisci...
Digital Technologies for Culture, Arts and Heritage: Insights from Interdisci...
Vasileios Komianos
 
React Native for Business Solutions: Building Scalable Apps for Success
React Native for Business Solutions: Building Scalable Apps for SuccessReact Native for Business Solutions: Building Scalable Apps for Success
React Native for Business Solutions: Building Scalable Apps for Success
Amelia Swank
 
machines-for-woodworking-shops-en-compressed.pdf
machines-for-woodworking-shops-en-compressed.pdfmachines-for-woodworking-shops-en-compressed.pdf
machines-for-woodworking-shops-en-compressed.pdf
AmirStern2
 
Artificial_Intelligence_in_Everyday_Life.pptx
Artificial_Intelligence_in_Everyday_Life.pptxArtificial_Intelligence_in_Everyday_Life.pptx
Artificial_Intelligence_in_Everyday_Life.pptx
03ANMOLCHAURASIYA
 
Ad

Sedna XML Database: Query Parser & Optimizing Rewriter

  • 1. Sedna Query Parser & Optimizing Rewriter Dmitry Lizorkin [email_address] Ph.D., software developer Sedna team
  • 2. Goals Wide range of queries/statements support: XQuery queries, XML update statements, data definition language statements High performance for both query evaluation and updates execution Query optimization strategies designed in correspondence with Sedna internal data representation design
  • 3. Query processing steps query execution plan (QEP): to Sedna executor operation tree Physical Plan Generator Optimizing Rewriter Static Analyzer Parser XML update statement Data Definition Language statement XQuery query
  • 4. Query parser Input: 3 types of queries/statements XQuery queries XML update statements Data Definition Language statements (e.g. create document statement) Output: uniform operation tree for all above 3 query/statement types
  • 5. Static analyzer Query static analysis phase Static context is initialized with XQuery Functions and Operators and augmented with query prolog declarations Query operation tree is expanded with imported XQuery modules All namespace prefixes, function names and variable names are resolved XQuery static errors are reported, if any
  • 6. Optimizing rewriter step Physical Plan Generator Optimizing Rewriter query execution plan Static Analyzer Parser operation tree XML update statement Data Definition Language statement XQuery query
  • 7. Optimizing Rewriter Optimization based on query rewriting Removing unnecessary ordering operations Combining abbreviated descendant-or-self path step with a next path step Removing unnecessary node copies for constructed content Analyzing nested for-clauses
  • 8. Ordering operations: challenges Many XQuery expressions have the semantics for their resulting sequences to be ordered in document order with duplicate nodes removed “ Distinct-Document Order” semantics is expressed via explicit DDO operations in Sedna query operation tree DDO operations decrease query execution performance Require the whole argument sequence to be evaluated before a first result item could be produced Break execution pipeline
  • 9. Ordering operations: optimization Idea: removing unnecessary ordering operations Analysis: for each operation in the query operation tree, the following properties for the resulting sequence are recursively found out whether in DDO whether consists of no more than a single item whether consists of nodes on a common level of an XML tree Result: a DDO operation is removed if either its argument is known to be in DDO, or DDO is not required for the resulting sequence
  • 10. Descendant-or-self: challenges The “//” abbreviated path step (expanded into descendant-or-self::node() ) is frequently used in practical XQuery queries Expensive to evaluate Bad selectivity: generally, selects almost all nodes in an XML document Does not allow to use benefits of Sedna descriptive schema-driven storage strategy
  • 11. Descendant-or-self: optimization idea Idea: combining the // step with a next path step E.g., //para transformed to /descendant::para Better intermediate selectivity Benefits of Sedna schema-driven storage Technical issue: context position/size “ The path expression //para[1] does not mean the same as the path expression /descendant::para[1] ” (XQuery Spec., Subsect. 3.2.4)
  • 12. Descendant-or-self: solution For the // path step, its next step predicate expressions are analyzed (if any) If predicate expressions results do not depend on context position and size (neither explicitly, nor implicitly), than the // step can be combined with its next step while keeping the original query semantics
  • 13. Removing unnecessary node copies An XQuery constructor semantics implies constructor content being new nodes, “ even if they are copies of existing nodes ” Problem: making a deep copy of an XML subtree is expensive to evaluate Idea: avoiding node copies that do not affect query result semantics Algorithm: a constructed node needs not be copied if it is a part of the query result sequence, or it becomes a direct child of another constructed node
  • 14. Analyzing nested for-clauses FLWOR-expressions generally contain multiple iteration variables in for-clauses for $u in doc('users')//user_tuple, $i in doc('items')//item_tuple where ... return ... Binding sequences with nested loop semantics An expression associated with an inner iteration variable is analyzed If the associated expression does not depend on outer iteration variables, it is marked as “lazy” Lazy associated expression value can be evaluated just once, with the query semantics preserved
  • 15. Physical plan generation step Physical Plan Generator Optimizing Rewriter query execution plan Static Analyzer Parser operation tree XML update statement Data Definition Language statement XQuery query
  • 16. Query physical plan generation Query execution plan (QEP) for the query operation tree is constructed Structural location path fragments are extracted A path that starts from an XML document node and contains only descending axes and no predicates Mapped to Sedna descriptive schema access operations For & order-by clauses are mapped to tuple stream generation & reordering operations respectively
  • 17. Implementation details Query parser is implemented with ANTLR parser generator (www.antlr.org) ANTLR native representation for an operation tree is an S-expression S-expression is a native data representation for Scheme programming language Scheme provides extensive native features for S-expressions rewriting purposes Optimizing rewriter is implemented in Scheme The Scheme-to-C compiler produces high-performance code
  • 18. Summary Complete query static analysis phase and rewriting-based optimization processing (i.e. from query textual representation to physical plan)
  • 19. Thank you for your attention!
  • 20. Future work: Cost-based optimization Problem analysis Join operations are primary candidates to benefit from cost-based optimization implemented In XML, the problem of join evaluation is not as vital as for relational data One-to-many entity relationships can often be modeled via XML elements nesting mechanism For many-to-many relationships, Sedna explicit indexes can be used
  • 21. Cost-based optimization: plans Join operations extraction in a query operation tree Selectivity estimation mechanism for XQuery expressions A storage for selectivity statistics is required (XML?) Cost-based physical plan selection is a complicated task in general; however, a lot of related work exists can be relatively easily implemented for simple cases Hints can be used for complex cases
  翻译: