Snowflake: Data Loading
This piece is an introduction to Snowflake data loading. Snowflake has four options for data loading.
1. Web Interface: limited amounts of data
2. Bulk Data: SQL and SnowSQL using the command-line interface
3. Automated: bulk load data using Snowpipe
4. Third-party: bulk load data from external sources
Here, the focus is bulk loading of structured data from delimited plain text files such as CSV files, although Snowflake supports loading data from semi-structured data.
Bulk loading is performed in two phases. First, upload data files to a location where Snowflake can access the files. This is referred to as staging. Then load the data from staged files into a designated table.
Recommended by LinkedIn
Snowflake hold files in a 'temporary' location called stages. Each table and user has a stage. Snowflake also supports creating named stages. Internal stages enable the secure storage of data files without requiring external resources. However, if data files are already staged in a supported cloud storage location, skip step one and load directly from external locations. Just provide the URL for the location, as well as access credentials if the location is protected. Named stages can also be created to point at external locations.
Phase two requires running a virtual warehouse. The warehouse extracts data from each file and inserts it as rows in a table. Warehouse size impacts load performance times. A bigger warehouse will process these tasks quicker (see post on Virtual Warehouses for more information on sizing requirements).
As alluded to, Snowpipe can be used to bulk load data from file stages and external locations. Snowpipe uses the copy command but with additional features that let you automate the process. Snowpipe also eliminates the need for a virtual warehouse. Instead, it uses internal compute resources to load data as files are staged continuously.
Third-party tools to bulk load data is often called extract, transform, load (ETL). An expanding ecosystem of applications and drivers for loading data from a range of external sources can be found under the Partner Connect tab (Fivetran makes light work of connecting to multiple data sources with a quick and straightforward setup).
The fourth option for loading data is the data loading wizard in the Snowflake Web UI. Select the table and click the load button. The wizard simplifies data loading by combining the two phases into a single operation and automatically deletes the stage files once the data has made its way to the designated table. The wizard, however, is only intended for loading a limited number of files with small amounts of data. For larger quantities of data, it is best to use the other options.
This concludes the introduction to data loading.