Real Story on Salesforce’s Zero-copy Architecture
Salesforce has been talking about its “zero-copy architecture” in its Data Cloud offering for some time, and recently announced a zero-copy “partner network” which includes most major data lakehouse vendors.
Zero-copy seems sexy on the surface because it obviates the need to copy enterprise customer data to a separate CDP store. With Salesforce, though, you always have to ask: what’s real and what’s not?
The Claim
Salesforce says that its Data Cloud offers zero-copy data integration through the Bring Your Own Lake (BYOL) data federation capability. They claim the approach provides direct access to customer data in partner systems, allowing near real-time data access without having to physically copy the data into the Data Cloud. Instead, an external Data Lake Object (DLO) is created. This serves as a metadata reference pointing to the data stored in the partner’s data warehouse or data lake.
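To make the claim concrete, here is a minimal toy sketch of what a "metadata reference" means in principle. It is not Salesforce's API: `PARTNER_WAREHOUSE`, `ExternalDLO`, and `source_table` are hypothetical names invented for illustration, and a plain dictionary stands in for the partner's warehouse.

```python
from dataclasses import dataclass

# Hypothetical stand-in for a partner warehouse: table name -> rows.
PARTNER_WAREHOUSE = {
    "crm.customers": [
        {"id": 1, "email": "a@example.com"},
        {"id": 2, "email": "b@example.com"},
    ]
}


@dataclass
class ExternalDLO:
    """Illustrative model of an external DLO: it holds metadata only, no rows."""
    name: str
    source_table: str  # pointer to where the data physically lives

    def query(self):
        # Every read goes back to the partner system; nothing is kept locally.
        return list(PARTNER_WAREHOUSE[self.source_table])


dlo = ExternalDLO(name="Customers_DLO", source_table="crm.customers")
rows_before = dlo.query()

# A change in the partner system is visible on the very next read,
# because the DLO never held its own copy of the rows.
PARTNER_WAREHOUSE["crm.customers"].append({"id": 3, "email": "c@example.com"})
rows_after = dlo.query()

print(len(rows_before), len(rows_after))  # 2 3
```

If the product behaves like this sketch, the object in Data Cloud stores nothing but a pointer, and that is the property worth verifying against the vendor's own documentation.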
A zero-copy architecture seems like a great idea. It’s elegant. It potentially saves money. What’s not to like about it?
But when it’s Salesforce, you can never be sure. I wrote about an earlier whistleblower complaint from a former executive suggesting that one of Salesforce’s much-vaunted innovations was actually a Potemkin village.
So we dug deeper, and found some interesting aspects of Salesforce’s implementation of zero-copy.
What Salesforce Says
Salesforce’s schematic of “The Data Cloud Bring Your Own Lake (BYOL) data federation” describes a setup that essentially allows Salesforce and your lakehouse (like Snowflake) to share data without copying.
The documentation on Salesforce’s site has this to say:
“When you deploy a data stream, an external data lake object (DLO) is created. The external DLO is a storage container with metadata for the federated data. The DLO acts as a reference and points to the data physically stored in the partner’s data warehouse or data lake. You can also opt for acceleration to improve performance. For more information, see Acceleration in Data Federation.”
The documentation further explains this acceleration as follows:
“When acceleration is enabled while creating a data stream, data is retrieved periodically from the partner. The data is persisted in the Accelerated data lake objects in Data Cloud.”
The documentation further states:
“You can use the partner data with many Data Cloud features. After you process a job, the resulting data persists in Data Cloud.”
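The two quoted passages on acceleration and job output can be illustrated with a small toy model. This is a sketch under stated assumptions, not Salesforce's implementation: `PARTNER_TABLE`, `AcceleratedDLO`, `refresh`, and `run_job` are hypothetical names, and the refresh is triggered by hand rather than on a schedule.

```python
from dataclasses import dataclass, field

# Hypothetical partner data; a list stands in for a warehouse table.
PARTNER_TABLE = [{"id": 1, "spend": 120}, {"id": 2, "spend": 80}]


@dataclass
class AcceleratedDLO:
    """Illustrative only: with acceleration, rows are persisted locally."""
    cached_rows: list = field(default_factory=list)

    def refresh(self):
        # Periodic retrieval from the partner: this is a physical copy.
        self.cached_rows = [dict(row) for row in PARTNER_TABLE]

    def query(self):
        # Reads come from the persisted copy, which can be stale
        # between refreshes.
        return self.cached_rows


def run_job(rows):
    # Stand-in for any processing job whose output is stored afterwards.
    return {"total_spend": sum(row["spend"] for row in rows)}


accelerated = AcceleratedDLO()
accelerated.refresh()

# The partner table changes after the last refresh.
PARTNER_TABLE.append({"id": 3, "spend": 50})

stale_count = len(accelerated.query())     # still reflects the old copy
job_output = run_job(accelerated.query())  # derived data, persisted separately

accelerated.refresh()
fresh_count = len(accelerated.query())

print(stale_count, job_output, fresh_count)  # 2 {'total_spend': 200} 3
```

In this model there are two places where data ends up stored outside the partner system: the accelerated copy and the job output. Whether either matters for your use case is a question for testing, not something the sketch can answer.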
Analysis: What’s Really Happening?
At RSG we always sit on your side of the table, and that means telling the real story on vendor claims. Let’s break down the various Salesforce claims here.
Claims Breakdown:
Zero-Copy Integration: Data is directly queried without being copied into Data Cloud.
Critical Analysis:
Data Federation and Usage: Partner data can be used with many Data Cloud features after mapping.
Critical Analysis:
Near Real-Time Access: Provides near real-time access to federated data.
Critical Analysis:
Conclusion
Salesforce's zero-copy data access might offer some potential future advantages in terms of reducing initial data duplication and potentially lowering storage costs. However, there are several caveats and potential challenges:
In summary, a savvy MarTech leader like you will critically assess these claims and conduct extensive testing to ensure your specific needs and performance expectations get met.
But there is a wider business conclusion here: you should never default to Salesforce Data Cloud just because of an existing investment in a broader Salesforce estate. This is what Salesforce (desperately) wants, but it may not prove your best decision. Before defaulting to Data Cloud, be sure to test it vigorously, head-to-head with other offerings, using an agile methodology. At RSG we have templates and experience here; ping me for details.
6mo: Every data movement is essentially an ETL process. The zero-copy claim here is essentially ETL on demand, or just-in-time ETL. True zero-copy requires technologies like RDMA.
Seasoned Business & Technology Digital Transformation Leader | Sales | Strategy Consulting | Product Innovation (Digital CX/CRM - HLS) Healthcare & Life Sciences
7mo: Excellent insights on the zero-copy claims of Salesforce. Thanks for sharing!
SVP - Cloud Engineering Operations (Dev/Prod/Sec/Corp/Fin/Data/AI-ML Ops)
8mo: "And that is that you should never default to Salesforce Data Cloud just because of an existing investment in a broader Salesforce estate. This is what Salesforce (desperately) wants, but may not prove your best decision. Before defaulting to Data Cloud, be sure to test it vigorously, head-to-head with other offerings,..."
Great summary Apoorv Durga, Ph.D. The data modelling point is a key one - the set up seems to be very brittle at present - if you change source data location or structure the link will break. I suspect this could be improved in future but not yet. Would be interested if you see differences by source data location. My gut instinct is this will work better with AWS data sets, utilising native data federation capabilities. I understand SF built with AWS first and then rapidly expanded out, I suspect with a slightly weaker version, for connection with other environments (same principle, but more constrained set up and less optimal performance) Overall, if: 1) your use cases don’t require RT data 2) you are connecting to solid production data tables with good governance on change control 3) ideally your data is also in AWS 4) you have very high internal barriers to setting up and maintaining external sharing of your data … then this can be a helpful solution. Not revolutionary, but positive! No comment on how SF promote it to their clients and prospects!
Lead MarTech/MarOps & CRM Integrations Manager
9mo: Again, new jargon thrown at the martech experts. So even if they create a DLO and we can query it, isn't it the case that we can export the queried results or take snapshots? Reading the whole critical analysis section, I think we can always trace back to the source data. How are CCPA, GDPR, or other laws taken care of in this whole ecosystem?