6 Amazing features of Snowflake

Snowflake comes with many outstanding features. Here, I present the ones that caught my attention by solving most of our Big Data challenges, and that set Snowflake apart from the traditional Big Data ecosystem.

Many more amazing features are being added every day.

1. Architecture of Snowflake

When cloud computing emerged more than a decade ago, one of the key ideas it introduced was the separation of storage from compute. At the time, I did not understand how that would help anyone. But after looking at Snowflake's architecture, which follows exactly this approach, I am amazed by how much power comes from separating compute and storage.

When I was architecting Big Data applications, I faced many challenges: the system was so complex that it needed expertise across many technologies and the skill to make them work in synergy. A few of the significant and difficult decisions required to make a Big Data platform work are:

1.  Should the platform be set up on-premises or on the cloud, and if on the cloud, which one (AWS/Azure/GCP)?

2.  Which Big Data distribution to use?

3.  Which processing engine to go with: MapReduce, Pig, Spark, Flink, etc.?

4.  Which SQL engine to go with: Hive, Tez, Impala, etc.?

5.  Whether to go with an MPP SQL database or a NoSQL database (Cassandra, HBase, etc.)?

6.  How to handle security and governance across so many different technologies, which was a big challenge in itself?

7.  How to absorb rapidly changing technologies that could introduce breaking changes to the architecture?

All of the above Big Data challenges got solved with a single silver bullet, and that's none other than Snowflake. It runs on the cloud, has an easy learning curve (you only need to know SQL), and requires near-zero maintenance, not only on the infrastructure side but also for performance optimization. Add unified governance, data sharing, and a lot more features worth talking about. I wonder why I could not come up with such an architecture even when the problem was on my plate to solve.
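To make the compute-storage separation concrete, here is a minimal sketch (the warehouse, database, and table names are my own illustrative assumptions, not from this article): two independently sized virtual warehouses run against the same stored data, and each suspends itself when idle.

    -- Two independent compute clusters; names and sizes are illustrative
    CREATE WAREHOUSE IF NOT EXISTS etl_wh
      WAREHOUSE_SIZE = 'LARGE'   -- sized for heavy batch loads
      AUTO_SUSPEND   = 60        -- suspend when idle, so only storage is billed
      AUTO_RESUME    = TRUE;

    CREATE WAREHOUSE IF NOT EXISTS bi_wh
      WAREHOUSE_SIZE = 'XSMALL'  -- sized for lightweight dashboard queries
      AUTO_SUSPEND   = 60
      AUTO_RESUME    = TRUE;

    -- Both warehouses query the same table; compute scales without moving data
    USE WAREHOUSE bi_wh;
    SELECT COUNT(*) FROM sales_db.public.orders;

Each team can scale or pause its own warehouse without touching the data or anyone else's compute, which is exactly what the separation of compute and storage buys you.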

2. Data Sharing

For more than a couple of decades, every company wanted to keep its data to itself, and this is true even now to some extent. Then some folks came up with the concept of monetizing data: organizations could buy data such as demographic or weather data from third-party providers. To do so, I had to agree on terms with a vendor, get the paperwork done, start receiving the data, and then develop ETL to load it into our systems.

The more complex the process, the higher the chances of failure. Every company had to invest time in paperwork, invest time in building and maintaining ETL, and still end up with data that was 6, 12, or 24 hours old. How about a single copy of data that all companies can use with a simple SQL statement, always giving access to up-to-date data without the hassle of building ETL pipelines? Yes, that's possible now with Snowflake's Data Sharing feature.

Many use cases require the same data to be accessible to many teams, and each team wants to run different kinds of processing on it to derive insights or use it in their applications. To cater to these needs, the data team had to set up a set of ETL pipelines, maintain them, and take on the mammoth task of keeping all the copies of the data in sync, and it was next to impossible to do all of that at low cost.

Some tried to solve this issue with replication, which had its own challenges. How about maintaining a single copy of data, so there is nothing to sync and no extra storage cost? I couldn't imagine this could be an option until I heard about the Data Sharing feature in Snowflake. Yes, with Snowflake it is possible to maintain a single copy of data that is accessed by many other teams, without any ETL pipelines, while always having the latest data.
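Here is a rough sketch of what sharing looks like in SQL (the account, share, database, and table names are hypothetical): the provider publishes a share, and the consumer mounts it as a read-only database, with no data ever copied or moved.

    -- Provider side: create a share and grant access to specific objects
    CREATE SHARE sales_share;
    GRANT USAGE  ON DATABASE sales_db               TO SHARE sales_share;
    GRANT USAGE  ON SCHEMA   sales_db.public        TO SHARE sales_share;
    GRANT SELECT ON TABLE    sales_db.public.orders TO SHARE sales_share;
    ALTER SHARE sales_share ADD ACCOUNTS = partner_account;

    -- Consumer side: mount the share as a database and query it directly
    CREATE DATABASE partner_sales FROM SHARE provider_account.sales_share;
    SELECT * FROM partner_sales.public.orders LIMIT 10;

Because the consumer queries the provider's single copy, there is no pipeline to build and the shared data is always current.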

3. Data Security

When dealing with Big Data tools, many tools and technologies had to be combined just to replace a single data warehouse. After so much effort, it was finally accepted that a data warehouse cannot be replaced by a data lake (Big Data tools); rather, the two should co-exist. We could not deliver capabilities like row-level and column-level security, or meet the SLAs. Access management was next to impossible when it had to be handled across multiple technologies like Hive, HDFS, HBase/Cassandra, Spark, etc. But, again, with Snowflake it is so much easier: everything comes out of the box, and developers only need to follow the defined best practices. So we can concentrate more on bringing business value rather than on technical/operational activities that do not directly add it.
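To give a feel for what "out of the box" means here, below is a minimal sketch of column-level and row-level security using Snowflake's masking and row access policies (the policy, table, column, and role names are my own assumptions):

    -- Column-level security: mask email addresses for all but a privileged role
    CREATE MASKING POLICY email_mask AS (val STRING) RETURNS STRING ->
      CASE WHEN CURRENT_ROLE() = 'PII_ADMIN' THEN val ELSE '***MASKED***' END;

    ALTER TABLE customers MODIFY COLUMN email SET MASKING POLICY email_mask;

    -- Row-level security: each regional role sees only its own region's rows
    CREATE ROW ACCESS POLICY region_policy AS (region STRING) RETURNS BOOLEAN ->
      CURRENT_ROLE() = 'SALES_' || region;

    ALTER TABLE customers ADD ROW ACCESS POLICY region_policy ON (region);

One set of policies, defined once in SQL, applies to every query path, instead of being re-implemented in Hive, Spark, HBase, and so on.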

4. Data Cloning

In any application, one of the biggest challenges is recreating production issues in a lower environment. Unfortunately, we often don't have data that matches the scenario, or the volume of data present in production cannot be reproduced in lower environments, if only for cost reasons.

When it comes to Big Data, the humongous volumes made it nearly impossible to recreate a production-like environment. This meant certain scenarios could not be tested in lower environments, and no one would notice an issue until it hit the clients/end users. But with Snowflake's zero-copy cloning feature, we can create production-like data within seconds, and it does not cost extra storage. Wow, that's amazing. You can solve such a massive problem with little effort and at virtually no cost.
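A quick sketch of how that looks (the database and table names are illustrative): a clone is a metadata-only operation, so it completes in seconds and consumes extra storage only as the clone diverges from its source.

    -- Clone an entire production database into a dev environment in seconds
    CREATE DATABASE dev_db CLONE prod_db;

    -- Or clone a single table to reproduce an issue against production-like data
    CREATE TABLE orders_debug CLONE prod_db.public.orders;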

5. Time Travel Feature

With Big Data or any other technology, whenever we had to execute scripts in a production environment, we used to be scared that something might go wrong. So we had to back up entire databases before making any changes, make sure the change was tested in all environments, and so on. What if there were an option to travel back in time and get your data warehouse back to the state it was in on a particular day and time, all of it available out of the box? This is exactly what Snowflake's Time Travel feature helps achieve.
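A small sketch of the feature (the table name, offset, and query ID are placeholders I made up): you can query a table as of an earlier point in time, rebuild it from just before a bad statement, or undrop it within the retention period.

    -- Query the table as it looked one hour ago (offset is in seconds)
    SELECT * FROM orders AT (OFFSET => -3600);

    -- Rebuild the table as it was just before a bad statement, given its query ID
    CREATE OR REPLACE TABLE orders_restored
      CLONE orders BEFORE (STATEMENT => '01a2b3c4-0000-1111-2222-333344445555');

    -- Bring back a table dropped within the Time Travel retention period
    UNDROP TABLE orders;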

6. Change Data Capture

There was a time when we had to write a lot of logic and consume extra resources to identify the delta between the previous run and the current run, especially when the source data had no last-modified timestamp. That is a slow and costly operation; what if I said this is all possible out of the box with Snowflake? Yes, using streams and tasks in Snowflake, we can identify the changed data and deal with it very efficiently.
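As a sketch of that pattern (the stream, task, warehouse, and table names are hypothetical), a stream records what changed on a source table, and a task periodically merges only that delta into a target table:

    -- Track inserts, updates, and deletes on a source table
    CREATE OR REPLACE STREAM orders_stream ON TABLE raw_orders;

    -- A task that runs every 5 minutes, but only when the stream has new changes
    CREATE OR REPLACE TASK merge_orders
      WAREHOUSE = etl_wh
      SCHEDULE  = '5 MINUTE'
      WHEN SYSTEM$STREAM_HAS_DATA('orders_stream')
    AS
      MERGE INTO orders_mart AS tgt
      USING (SELECT * FROM orders_stream WHERE METADATA$ACTION = 'INSERT') AS src
        ON tgt.order_id = src.order_id
      WHEN MATCHED THEN UPDATE SET tgt.amount = src.amount
      WHEN NOT MATCHED THEN INSERT (order_id, amount) VALUES (src.order_id, src.amount);

    -- Tasks are created suspended; resume to start the schedule
    ALTER TASK merge_orders RESUME;

With a standard stream, an update shows up as a delete plus an insert, so filtering on the inserted rows covers both new and updated records; no hand-written delta logic is needed.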
