Graph Query Language standards: change and continuity

If you’ve followed some of my previous posts, you’ll know that I’ve been working for a while at Neo4j (Cypher: An Evolving Query Language for Property Graphs), and with many other industry and academic partners, to pull together a project for a new international standard property Graph Query Language (The GQL Manifesto, Critical milestone for ISO graph query standard GQL).

In September the ISO/IEC GQL project started (SQL ... and now GQL), so that central goal has now been accomplished. (A draft Wikipedia article on GQL is in review for publication: it gathers a wide range of references — including pointers to the formal project goals, which have recently been made public.)

As we move into a new year, I’d like to give an update on what I’m up to, and on my view of the prospects for the GQL initiative and for graph data management more broadly.

Moving on from Neo4j

At the turn of the year I finished working for Neo4j. This may give me more time to spend with my family, but it will definitely give me more time to work on a Computer Science PhD at Birkbeck College, University of London, with the help of my supervisors Peter Wood and Alex Poulovassilis. It's a great opportunity to work with noted researchers in this field.

After forty years in industrial practice, I’m interested in looking deeper into the field of graph data management than is possible within the constraints of a product company and of tightly focussed standards collaborations. I’m not quite sure what this is going to narrow down to in terms of a thesis topic, but overall this is an area that is attracting increasing theoretical attention as the use of graph databases expands ever faster; the recent Dagstuhl seminar on graphs and big data is one sign of this. It’s a good time to be taking a step back, and looking farther forward.

And, as I joked to Victor Lee from TigerGraph, when talking recently to their standards group about this move: there’s no better time in life to do a PhD than when it can have no impact on your career!

Looking back 

Over the past three years and more, I’ve been privileged to work with many very talented and intellectually honest people at Neo4j, from whom I’ve learnt a very great deal. It’s an invidious business, naming names, but I think there are eight people I do need to mention in this context.

One is Andrés Taylor, who created the original Cypher language. His shoulders have a fair bit to answer for.

The Query Languages Standards and Research team have been my day-to-day colleagues, teachers and endlessly engaging intellectual collaborators: Petra Selmer, Stefan Plantikow, Tobias Lindaaker and Hannes Voigt. Mats Rydberg used to be a member of that team's predecessor, the Neo4j Cypher Language Group. (He has gone on to lead the engineering team that produced Cypher for Apache Spark, which contains composable graph queries, named graphs and views, and graph types; that team is now productizing graph algorithms in the Neo4j Data Science offering.)

Finally, two people whose reasoned and emotional conviction about the need for a property graph language standard has never wavered, and whose support has been critical (and far from automatic): Philip Rathle, the head of product at Neo4j, and Emil Eifrem, the CEO. They made openCypher happen back in 2015 (enabling around a dozen separate implementations of the Cypher language since then), and they've backed the evolution towards GQL in the last couple of years. Those are significant (and far-sighted) bets for a young, fast-growing company with plenty of other, day-to-day problems and opportunities at hand.

I’d also like to thank the technical staff and management of three other companies, each of which has played a critical role in this first phase of a big, long journey: Oracle, TigerGraph and Redis Labs. There have been very many twists and turns (and arguments), but the energy and determination that each of these companies has shown, in its own way, in pursuing new standard platforms for property graph data processing have been instrumental. The courtesy and appetite for collaboration of these good colleagues has been inspiring. There are too many to name, but they know who they are!

I hope that in ten years or so, a lot of people in the database industry, and a lot of users of graph data management services, are going to have a lot to thank all these people for. They haven’t just "helped": they’re making big new things happen.

That takes courage — a quality that is rarer than vision or technical understanding. There are always seemingly good reasons not to conduct hard and risky projects, which challenge received wisdom and incumbent expertise, and inevitably expose ignorance. There is no certainty about the success of the GQL project, but there are good objective and subjective grounds for optimism.

Looking forward

When Philip Rathle and I were talking at one point about my (long-planned and very amicable) departure from Neo4j, we asked ourselves who might be able to help the company (and transitively, the wider industry) organize the next stage. GQL has to become an actual specification, with a broad technical consensus behind the words on paper, and with multiple implementations beginning to emerge from several vendors and in open-source and research projects. 

Keith Hare, of JCC Consulting, has been the convenor of the SQL standards committee, ISO/IEC JTC 1/SC 32 WG3, for the past dozen years. He is also vice-chair of the U.S. INCITS DM32 technical working group for data management, which is where the bulk of technical work on Database Languages has taken place historically. 

WG3 is now responsible for the graph extensions to SQL (SQL/PGQ) and for the GQL project. 

Keith has been instrumental, and increasingly active, in all the efforts to address property graph data in the formal standards arena. He co-authored the WG3 presentation on graph data standards (along with Jim Melton and Jan Michels) that was put forward at a Linked Data Benchmark Council (LDBC) meeting at SAP’s HQ in early 2017. That initiative led to SQL/PGQ. Keith was also prominent as a leading voice for GQL at the Berlin W3C workshop on graph data management standards in March 2019, and has spoken on the topic most recently at Graphorum in Chicago last autumn, among other venues.

I’m delighted to hear that Keith will now be playing a big role in Neo4j’s work in this area in the coming year.

Oracle staff like Don Deutsch, Jim Melton and Jan Michels, or Karl Schendel from Actian, are able to simultaneously represent their companies and work very effectively for the wider database industry in the international standards process. Similarly, Keith is going to continue his (unpaid) efforts as WG3 convenor, but take on direct (paid) responsibility for Neo4j’s public standards work. This will be a powerful complement to the company’s existing efforts on technical design, specification writing/editing and alignment of the emerging GQL standard with the continually evolving Neo4j database product.

Standards, products and users

Two examples illustrate the point about standards and product convergence for companies like Neo4j. The emerging thinking about GQL schema or “graph types” aligns with the need to define resources for role-based access control in Neo4j 4.0 and beyond. Similarly, the Fabric facility in Neo4j 4.0 shows the power of multiple named graphs and graph views (both sub-graphs and “super graphs”), which are key concepts in the emerging GQL design. Similar observations about current and future features aligning with the emerging standard can be made about other companies’ products.

This standards work is practical, needed and important for customers and users. 

People want to do more and do better with graph data services and workloads. A standard that is informed by the experience and shape of the core of SQL gives proven features and invaluable familiarity; one that is made by forward-looking vendors and researchers working with a more potent model gives users new opportunities.

So, it’s very good to see a constant and reinforced commitment from Neo4j to the cause of the GQL standard. I’m looking forward to continuing interactions and discussions, especially in the context of LDBC community efforts (see below) … but I am actually also looking forward to spending more time with my family! 

Supporting the work of LDBC

The Linked Data Benchmark Council (LDBC) provides the organisational frame for GQL community efforts. It also has a formal liaison with WG3, which allows for mutual information exchanges, and gives a route for the wider graph data community to influence and enrich the work of the standards authors, editors and reviewers.

A good example of LDBC’s role is the work from 2015-2018 of its Query Language Task Force, which produced G-CORE: A Core for Future Graph Query Languages.

This approach is continued in working groups set up to support the genesis and design of GQL, which LDBC agreed to sponsor at its last board meeting half a year ago. One example is the Property Graph Schema working group led by Juan Sequeda at data.world and Jan Hidders from Birkbeck (which is holding its third face-to-face meeting at the end of this month in Brussels).

LDBC brings industry and academia together, internationally and pretty informally. It is able to look ahead, and to chart an indicative intellectual roadmap that the official standards community can then exploit, over time. The Existing Languages group led by Petra Selmer is providing a comparative study to frame and verify the content of GQL as the design of the language shapes up. There are also interesting plans for another LDBC working group or task force to work on formal (denotational) semantics for GQL, to be led by Leonid Libkin from the University of Edinburgh/ENS Paris. 

LDBC is also an ideal venue for practical open-source software collaborations to back up the paper specification. LDBC benchmarks are supported by abundant open-source tools. The experimental implementation of G-CORE is another example. I hope that in time openCypher's language tooling will spawn similar GQL community projects under the LDBC umbrella. Current work to make feature categories explicit in the Cypher TCK shows the value that many implementations have found in this kind of "language engineering".

As the Vice-chair of LDBC, I have been putting a lot of work into professionalizing its administration and finances, and reviewing its IP processes and policies. The aim is that LDBC can reinforce and expand its benchmarking work, as well as widen its membership, and act as a vibrant liaison partner for ISO/IEC, and potentially for W3C. I personally am hoping to find a way of continuing to support LDBC's work, which intersects with interesting research collaborations.

I view LDBC as a crossroads and meeting point for all who are interested in the future of graph data management. I hope to see you there at some point. Please feel free to message me on LinkedIn if you need any information or introductions.

Alastair Green

  • The "Ontology Gap"

    I was looking forward to speaking at next week's Knowledge Graph Conference, but I had a stroke in early March, so I've…

    18 Comments
  • Fight or Align? RDF vocabularies and LPG schemas

    I am the author of the GQL Manifesto, the Vice-chair of LDBC, a member of the WG3 SQL and GQL standards committee, and…

    56 Comments
  • Graph patterns ➤ Projecting subgraphs

    LDBC TUC: a focus on graph data in China Shanghai -- We’ve recently come out of two long, interesting days at LDBC’s…

  • LDBC Technical Users Committee, 30-31 August

    Shipeng Qi of Ant Group, who is a board member of LDBC (Linked Data Benchmark Council), has done a great job of pulling…

    1 Comment
  • Open-source language tools for GQL

    By Alastair Green, Vice-chair LDBC, and author of the GQL Manifesto. 9 May 2024.

    4 Comments
  • GQL in code

    Lots of gratifying announcements about the GQL standard: Neo4j, TigerGraph, JTC 1, AWS/Neo4j, Memgraph, Stefan the…

    1 Comment
  • Are graph and relational enemies?

    First, news of some welcome progress in the field of database management: Major progress for SQL/PGQ and GQL standards…

    4 Comments
  • First GQL research implementation from Olof Morra at TU Eindhoven!

    The official GQL project started back in September 2019: almost exactly two years later we can now see the first…

    2 Comments
  • PostgreSQL, Oracle ... graph query language standards adoption begins

    The momentum of the twin graph database language standards, SQL/PGQ and GQL, is building. Since I posted last in…

    1 Comment
  • SQL ... and now GQL

    A standard query language for property graphs It's official. In June national standards bodies around the world…

    17 Comments

Insights from the community

Others also viewed

Explore topics