Introduction to Spark 3.0 - Part 3 : Data Loading From Nested Folders
API’s and libraries of the platform. This release sets the tone for next year’s direction of the framework, so understanding these features is critical for anyone who wants to make use of all the advances in this new release. In this series of blog posts, I will discuss the different improvements landing in Spark 3.0.
This is the third post in the series where I am going to talk about data loading from nested folders. You can access all posts in this series here.
TL;DR All code examples are available on github.
Data in Nested Folders
Many times we need to load data from a nested data directory. These nested directories are typically created by an ETL job that keeps writing data for different dates into different folders.
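To make that layout concrete, here is a small sketch in plain Python (the base directory, the `2020/01/…` date folders, and the file names are invented for illustration). It builds a date-partitioned tree like an ETL job would and then discovers every file recursively; in Spark 3.0 itself, the `recursiveFileLookup` read option performs the analogous recursive discovery.

```python
import csv
import tempfile
from pathlib import Path

# Build a nested, date-partitioned layout like an ETL job might produce:
#   base/2020/01/01/data.csv, base/2020/01/02/data.csv, ...
# (these folder and file names are hypothetical, for illustration only)
base = Path(tempfile.mkdtemp())
for day in ["01", "02", "03"]:
    folder = base / "2020" / "01" / day
    folder.mkdir(parents=True)
    with open(folder / "data.csv", "w", newline="") as f:
        csv.writer(f).writerows([["id", "day"], ["1", day]])

# Recursively discover every CSV under the root, regardless of depth.
# This mirrors what Spark 3.0 does with recursive file lookup enabled:
#   spark.read.option("recursiveFileLookup", "true").csv(str(base))
files = sorted(base.rglob("*.csv"))
print(len(files))  # 3 files found across the nested folders
```

Without an option like this, a reader pointed at `base` would only see the top-level directory; recursive lookup is what lets a single load pick up all the per-date files at once.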