PySpark usage of like, ilike, rlike and not like

The examples below use a sample PySpark DataFrame, df1, with a firstname column.
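
The sample DataFrame was shown only as an image in the original, so here is a minimal stand-in. It assumes an existing SparkSession is passed in. The rows, the lastname column and the helper make_sample_df are all hypothetical, chosen so that the patterns used below match some rows and not others.

```python
# Hypothetical sample rows; only the `firstname` column is used in the examples.
sample_rows = [
    ("Maria", "Lopez"),
    ("MARIA", "Silva"),
    ("Ria", "Nair"),
    ("Riana", "Shah"),
    ("John", "Smith"),
]
columns = ["firstname", "lastname"]


def make_sample_df(spark):
    """Build the sample DataFrame from an existing SparkSession."""
    return spark.createDataFrame(sample_rows, schema=columns)
```

With a session available as spark, df1 = make_sample_df(spark) would give a DataFrame that the filters in this article can run against.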


ILIKE

(available from Spark 3.3.0)

SQL ILIKE expression (case-insensitive LIKE). Returns a boolean Column based on a case-insensitive match.

df1.filter(df1.firstname.ilike('%Ria')).collect()        

RLIKE

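RLIKE matches against a regular expression. The same case-insensitive match can be written with the (?i) flag, using $ to anchor the pattern to the end of the value: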

df1.filter(df1.firstname.rlike('(?i)Ria$')).collect()        
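
To see what the pattern does, the same regular expression can be tried in plain Python. This is only an illustration: Spark evaluates rlike with Java regular expressions, not Python's re module, although for a pattern this simple the two behave alike. The list of names is hypothetical.

```python
import re

# (?i) turns on case-insensitive matching; $ anchors the match at the end.
pattern = re.compile(r"(?i)Ria$")

names = ["Maria", "MARIA", "Ria", "Riana", "John"]

# rlike succeeds when the pattern is found anywhere in the value,
# which corresponds to re.search rather than re.fullmatch.
matches = [name for name in names if pattern.search(name)]
print(matches)  # ['Maria', 'MARIA', 'Ria']
```

Because rlike looks for the pattern anywhere in the value, the $ anchor is what makes this equivalent to the '%Ria' pattern used with ilike.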

LIKE

LIKE performs a case-sensitive match:

df1.filter(df1.firstname.like('%Ria')).collect()        


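LIKE can also be written as a SQL expression string. Note that the pattern here, '%Ria%', matches Ria anywhere in the value, not only at the end: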

df = df1.filter("firstname like '%Ria%'")
df.collect()        
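
The wildcard rules behind LIKE and ILIKE can be sketched in plain Python. This is an illustration of the semantics only, not how Spark implements them: like_to_regex and like are hypothetical helpers, and escape characters are not handled.

```python
import re


def like_to_regex(pattern, case_insensitive=False):
    """Translate a SQL LIKE pattern into a compiled regex (no escape handling)."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")  # % matches any sequence of characters
        elif ch == "_":
            parts.append(".")   # _ matches exactly one character
        else:
            parts.append(re.escape(ch))
    flags = re.DOTALL | (re.IGNORECASE if case_insensitive else 0)
    return re.compile("".join(parts), flags)


def like(value, pattern, case_insensitive=False):
    # LIKE has to match the whole value, hence fullmatch.
    return like_to_regex(pattern, case_insensitive).fullmatch(value) is not None


print(like("Ria", "%Ria"))                            # True
print(like("Maria", "%Ria"))                          # False
print(like("Maria", "%Ria", case_insensitive=True))   # True
print(like("Riana", "%Ria%"))                         # True
```

The second and third calls show the difference between like and ilike, and the last one shows why '%Ria%' matches more values than '%Ria'.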

Not Like

There is no notlike function on a Column. Instead, negate the result of like with the ~ operator:

df1.filter(~ df1.firstname.like('%Ria')).collect()        

In a SQL expression string, NOT LIKE can be used directly:

df = df1.filter("firstname not like '%Ria%'")
df.collect()
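
One thing to keep in mind with either form of negation is that a null firstname is not returned: in SQL, LIKE on NULL gives NULL, NOT NULL is still NULL, and a filter keeps only rows where the condition is true. The plain-Python sketch below mimics that logic; the helper names and the sample values are hypothetical.

```python
def sql_like_suffix(value, suffix):
    """Mimic `value LIKE '%suffix'`: NULL in, NULL out."""
    if value is None:
        return None
    return value.endswith(suffix)


def sql_not(flag):
    """Mimic SQL NOT: NOT NULL is still NULL."""
    return None if flag is None else not flag


names = ["Ria", "John", None]

# A filter keeps a row only when the condition is exactly True.
kept = [n for n in names if sql_not(sql_like_suffix(n, "Ria")) is True]
print(kept)  # ['John']
```

If rows with a null firstname should be kept as well, the condition would need an explicit null check alongside the negated like.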
