Spark SQL – Learning about basic string functions

Problem

A big data engineer can transform data stored in files using Spark dataframe methods or Spark SQL functions. I chose to use the Spark SQL syntax since it is more widely used. Every language has at least three core data types: strings, numbers, and date/time. How do we manipulate strings using Spark SQL?

Solution

In the last tip, I reviewed the syntax for numeric Spark SQL functions. This tip focuses on the available string functions. Spark SQL has so many string functions that I have split them into two categories: basic and encoding. Today, we will discuss what I consider the basic functions, those found in most databases and languages.

Business Problem

Our manager has asked us to explore the syntax of the string functions available in Azure Databricks. I will execute Spark SQL using the %sql magic command in a Python notebook. That way, we can see the output for a given input. After testing, I usually store the Spark SQL statement in a string variable that can be executed by the spark.sql method.
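
The workflow of moving a tested statement into a string variable might look like the sketch below. It is only an illustration: the names STRING_DEMO_SQL and run_sql are hypothetical, the three functions (upper, length, substring) are my own picks from the Spark SQL built-ins rather than the article's examples, and spark is assumed to be the SparkSession that a Databricks notebook predefines.

```python
# Hypothetical query text. The literals reuse the titles mentioned in the
# article; the choice of functions is an assumption for illustration only.
STRING_DEMO_SQL = """
SELECT
  upper('Three Blind Mice')               AS upper_title,
  length('Star Wars')                     AS title_length,
  substring('The Three Musketeers', 5, 5) AS middle_word
"""


def run_sql(spark, query):
    """Run a Spark SQL string and return the rows as plain dictionaries.

    `spark` is assumed to be an active SparkSession, such as the one a
    Databricks notebook provides automatically.
    """
    # spark.sql returns a DataFrame; collect() brings the rows to the driver.
    return [row.asDict() for row in spark.sql(query).collect()]
```

In a notebook cell, calling run_sql(spark, STRING_DEMO_SQL) should return a single row. With these literals the values should be THREE BLIND MICE, 9, and Three, since substring positions in Spark SQL start at 1.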

During our exploration, we will discuss some written and digital content: Three Blind Mice, The Three Musketeers, and Star Wars. The first two appeared in books a long time ago. I have seen a couple of variations of the “Musketeers” at the movies during my lifetime. I fondly remember seeing Star Wars at my local theater in 1977. At the end of this article, the big data engineer will have a good overview of string functions.

Please see the MSSQLTips article for details.
