Graph Query Language standards: change and continuity

Alastair Green

Graph data specialist. Vice-chair of Linked Data Benchmark Council. LDBC representative on WG3 (SQL + GQL committee). Lead of LDBC Extended GQL Schema (LEX) Working Group.

Published Jan 16, 2020

If you’ve followed some of my previous posts, you’ll know that I’ve been working for a while at Neo4j (Cypher: An Evolving Query Language for Property Graphs), and with many other industry and academic partners, to pull together a project for a new international standard property Graph Query Language (The GQL Manifesto, Critical milestone for ISO graph query standard GQL).

In September the ISO/IEC GQL project started (SQL ... and now GQL), so that central goal has now been accomplished. (A draft Wikipedia article on GQL is in review for publication: it gathers a wide range of references — including pointers to the formal project goals, which have recently been made public.)

As we move into a new year, I’d like to give an update on what I’m up to, and on my view of the prospects for the GQL initiative, and graph data management more broadly.

Moving on from Neo4j

At the turn of the year I finished working for Neo4j. This may give me more time to spend with my family, but it will definitely give me more time to work on a Computer Science PhD at Birkbeck College, University of London, with the help of my supervisors Peter Wood and Alex Poulovassilis. It's a great opportunity to work with noted researchers in this field.

After forty years in industrial practice, I’m interested in looking deeper into the field of graph data management than is possible within the constraints of a product company and of tightly-focussed standards collaborations. I’m not quite sure what this is going to narrow down to in terms of a thesis topic, but overall this is an area that is attracting increasing theoretical attention as the use of graph databases expands faster and faster — as evidenced by the recent Dagstuhl seminar on graphs and big data. It’s a good time to be taking a step back, and looking farther forward.

And, as I joked to Victor Lee from TigerGraph when talking recently to their standards group about this move: there’s no better time in life to do a PhD than when it can have no impact on your career!

Looking back

In the last three and more years, I’ve been privileged to work with many very talented and intellectually honest people at Neo4j, from whom I’ve learnt a very great deal. It’s an invidious business, naming names, but I think there are eight people I do need to mention in this context.

One is Andrés Taylor, who created the original Cypher language. His shoulders have a fair bit to answer for.

The Query Languages Standards and Research team have been my day-to-day colleagues, teachers and endlessly engaging intellectual collaborators: Petra Selmer, Stefan Plantikow, Tobias Lindaaker and Hannes Voigt. Mats Rydberg used to be a member of that team's predecessor, the Neo4j Cypher Language Group. (He has gone on to lead the engineering team that produced Cypher for Apache Spark, which contains composable graph queries, named graphs and views, and graph types; that team is now productizing graph algorithms in the Neo4j Data Science offering.)

Finally two people whose reasoned and emotional conviction about the need for a property graph language standard has never wavered, and whose support has been critical (and far from automatic): Philip Rathle, the head of product at Neo4j, and Emil Eifrem, the CEO. They made openCypher happen back in 2015 (enabling around a dozen separate implementations of the Cypher language since then), and they've backed the evolution towards GQL in the last couple of years. Those are significant (and far-sighted) bets for a young, fast-growing company with plenty of other, day-to-day problems and opportunities at hand.

I’d also like to thank the technical staff and management of three other companies, each of which has played a critical role in this first phase of a big, long journey: Oracle, TigerGraph and Redis Labs. There have been very many twists and turns (and arguments), but the energy and determination to achieve new standard platforms for property graph data processing that these companies have exhibited, each in its own way, have been instrumental. The courtesy and appetite for collaboration of these good colleagues have been inspiring. There are too many to name, but they know who they are!

I hope that in ten years or so, a lot of people in the database industry, and a lot of users of graph data management services, are going to have a lot to thank all these people for. They haven’t just "helped": they’re making big new things happen.

That takes courage — a quality that is rarer than vision or technical understanding. There are always seemingly good reasons not to conduct hard and risky projects, which challenge received wisdom and incumbent expertise, and inevitably expose ignorance. There is no certainty about the success of the GQL project, but there are good objective and subjective grounds for optimism.

Looking forward

When Philip Rathle and I were talking at one point about my (long-planned and very amicable) departure from Neo4j, we asked ourselves who might be able to help the company (and transitively, the wider industry) organize the next stage. GQL has to become an actual specification, with a broad technical consensus behind the words on paper, and with multiple implementations beginning to emerge from several vendors and in open-source and research projects.

Keith Hare, of JCC Consulting, has been the convenor of the SQL standards committee, ISO/IEC JTC 1/SC 32 WG3, for the past dozen years. He is also vice-chair of the U.S. INCITS DM32 technical working group for data management, which is where the bulk of technical work on Database Languages has taken place historically.

WG3 is now responsible for the graph extensions to SQL (SQL/PGQ) and for the GQL project.

Keith has been instrumental, and increasingly active, in all the efforts to address property graph data in the formal standards arena. He co-authored the WG3 presentation on graph data standards (along with Jim Melton and Jan Michels) that was put forward at a Linked Data Benchmark Council (LDBC) meeting at SAP’s HQ in early 2017. That initiative led to SQL/PGQ. Keith was also prominent as a leading voice for GQL at the Berlin W3C workshop on graph data management standards in March 2019, and has spoken on the topic at other venues, most recently at Graphorum in Chicago last autumn.

I’m delighted to hear that Keith will now be playing a big role in Neo4j’s work in this area in the coming year.

Oracle staff like Don Deutsch, Jim Melton and Jan Michels, or Karl Schendel from Actian, are able to simultaneously represent their company and work very effectively for the wider database industry in the international standards process. Similarly, Keith is going to continue his (unpaid) efforts as WG3 convenor, but take on direct (paid) responsibility for Neo4j’s public standards work. This will be a powerful complement to the company’s existing efforts on technical design, specification writing/editing and alignment of the emerging GQL standard with the continually evolving Neo4j database product.

Standards, products and users

Two examples illustrate the point about standards and product convergence for companies like Neo4j. The emerging thinking about GQL schema or “graph types” aligns with the need to define resources for role-based access control in Neo4j 4.0 and beyond. Similarly, the Fabric facility in Neo4j 4.0 shows the power of multiple named graphs and graph views (both sub-graphs and “super graphs”), which are key concepts in the emerging GQL design. Similar observations about current and future features aligning with the emerging standard can be made about other companies’ products.

This standards work is practical, needed and important for customers and users.

People want to do more and do better with graph data services and workloads. A standard that is informed by the experience and shape of the core of SQL gives proven features and invaluable familiarity; one that is made by forward-looking vendors and researchers working with a more potent model gives users new opportunities.

So, it’s very good to see a constant and reinforced commitment from Neo4j to the cause of the GQL standard. I’m looking forward to continuing interactions and discussions, especially in the context of LDBC community efforts (see below) … but I am actually also looking forward to spending more time with my family!

Supporting the work of LDBC

Linked Data Benchmark Council (LDBC) provides the organisational frame for GQL community efforts. It also has a formal liaison with WG3, which allows for mutual information exchanges, and gives a route for the wider graph data community to influence and enrich the work of the standards authors, editors and reviewers.

A good example of LDBC’s role is the work from 2015-2018 of its Query Language Task Force, which produced G-CORE: A Core for Future Graph Query Languages.

This approach is continued in working groups set up to support the genesis and design of GQL, which LDBC agreed to sponsor at its last board meeting half a year ago. An instance is the Property Graph Schema working group led by Juan Sequeda at data.world and Jan Hidders from Birkbeck (which is holding its third face-to-face meeting at the end of this month in Brussels).

LDBC brings industry and academia together, internationally and pretty informally. It is able to look ahead, and to chart an indicative intellectual roadmap that the official standards community can then exploit, over time. The Existing Languages group led by Petra Selmer is providing a comparative study to frame and verify the content of GQL as the design of the language shapes up. There are also interesting plans for another LDBC working group or task force to work on formal (denotational) semantics for GQL, to be led by Leonid Libkin from the University of Edinburgh/ENS Paris.

LDBC is also an ideal venue for practical open-source software collaborations to back up the paper specification. LDBC benchmarks are supported by abundant open-source tools. The experimental implementation of G-CORE is another example. I hope that in time openCypher's language tooling will spawn similar GQL community projects under the LDBC umbrella. Current work to make feature categories explicit in the Cypher TCK shows the value that many implementations have found in this kind of "language engineering".

As the Vice-chair of LDBC, I have been putting a lot of work into professionalizing its administration and finances, and reviewing its IP processes and policies. The aim is that LDBC can reinforce and expand its benchmarking work, as well as widen its membership, and act as a vibrant liaison partner for ISO/IEC, and potentially for W3C. I personally am hoping to find a way of continuing to support LDBC's work, which intersects with interesting research collaborations.

I view LDBC as a crossroads and meeting point for all who are interested in the future of graph data management. I hope to see you there at some point. Please feel free to message me on LinkedIn if you need any information or introductions.

Ian Chotakoo

IT, Data & Operations

Best of luck with this Alastair, enjoy the adventure

Michal Bachman

CEO at GraphAware | Serving intelligence analysts mission-critical insights from connected data | Trusted by democratic government agencies

Congratulations Alastair and thank you for all your mega-valuable contributions to the graph community. All the best!

Arjen P. de Vries 🕊️

Professor Information Retrieval and Research Director of ICIS @ Radboud University

Congrats on the brave dive into PhD research life! You won't regret it! (Well, regrets won't ever last too long I'd say)

Neal Hill

Congratulations! Wonderful news... I feel like you're heeding a digital calling and following the great academic path of ANOTHER talented Green. May you find success and joy in this new future.

Christopher Blake

Product Director at SAS

A great read. Enjoy this next chapter!
