How to make data available for analytics ASAP
Jens Röwekamp
Software Engineer
MariaDB Corporation
Markus Mäkelä
Senior Software Engineer
MariaDB Corporation
Quick Overview
1. MariaDB ColumnStore
○ What is ColumnStore
2. Loading data into ColumnStore
○ Command Line Tools & SQL
○ Bulk Write API & Native Wrapper APIs
○ Bulk Write API and Apache Spark
○ Application integration via Data Adapters
3. Future Improvements
MariaDB ColumnStore
What is ColumnStore?
● Columnar storage
○ Efficient use of large and wide tables
■ SELECT column_1 FROM 2000_column_table
○ Columns split into 8M row blocks (extents)
● Massively parallel
○ Columns and extents read in parallel
○ Push work to worker nodes (PMs)
■ Predicate filtering
■ Join processing
■ Initial aggregation of data
● Designed for analytical workloads
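As a rough illustration of why extents make parallel reads possible, the following sketch computes how many extents one column occupies and how they could be spread over the PMs. It assumes that "8M" means 8 × 1024 × 1024 rows per extent and that extents are assigned round-robin; both are assumptions made for this example, not statements about ColumnStore internals.

```python
import math

# Assumption: "8M rows" is read as 8 * 1024 * 1024 rows per extent.
ROWS_PER_EXTENT = 8 * 1024 * 1024


def extents_per_column(row_count):
    """Return how many extents one column needs for row_count rows."""
    return math.ceil(row_count / ROWS_PER_EXTENT)


def extents_per_pm(row_count, pm_count):
    """Spread a column's extents round-robin over pm_count PMs (illustrative)."""
    counts = [0] * pm_count
    for extent in range(extents_per_column(row_count)):
        counts[extent % pm_count] += 1
    return counts


if __name__ == "__main__":
    rows = 100_000_000
    print(extents_per_column(rows))  # extents for a single column
    print(extents_per_pm(rows, 3))   # extents each of 3 PMs would scan
```

A query that selects one column out of 2000 only has to touch the extents of that column, and each PM can scan its share independently.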
What is ColumnStore?
● PM (Performance Module) nodes
○ Handle primitive jobs
○ Preliminary aggregation
○ Predicate filtering
● UM (User Module) nodes
○ Handle SQL statements
○ Combine primitive results
● Storage Layer
○ Local
○ SAN
○ GlusterFS
MariaDB ColumnStore Architecture
[Diagram: layers from top to bottom. Application (BI tool, SQL client, custom big data app), MariaDB SQL front end, distributed query engine, data storage (columnar distributed data storage on local storage | SAN | NAS | EBS | GlusterFS).]
Loading data into ColumnStore
Command Line Tools & SQL
cpimport
● Command line client
● CSV input
● In all versions
● Local data loading
[root@cs ~]# cpimport test tmp1 << EOF
1|a
2|b
3|c
EOF
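The same pipe-delimited input can be produced from application data. The sketch below is a minimal illustration: the helper names to_cpimport_text and cpimport_args are made up for this example, and it assumes the fields contain neither the delimiter nor newlines.

```python
def to_cpimport_text(rows, sep="|"):
    """Format rows as delimiter-separated lines, one row per line."""
    lines = []
    for row in rows:
        fields = [str(value) for value in row]
        for field in fields:
            if sep in field or "\n" in field:
                raise ValueError("field contains the delimiter or a newline: %r" % field)
        lines.append(sep.join(fields))
    return "\n".join(lines) + "\n" if lines else ""


def cpimport_args(database, table, sep="|"):
    """Build the argument list for a cpimport call that reads from stdin."""
    return ["cpimport", "-s", sep, database, table]


if __name__ == "__main__":
    text = to_cpimport_text([(1, "a"), (2, "b"), (3, "c")])
    print(text, end="")
    print(cpimport_args("test", "tmp1"))
    # On a ColumnStore node the text could then be piped into cpimport, e.g.:
    # subprocess.run(cpimport_args("test", "tmp1"), input=text, text=True, check=True)
```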
cpimport: Mode 1
● Data and cpimport on one UM/PM
[Diagram: a single source file is fed to cpimport on the UM node, which distributes the data to the PM nodes.]
cpimport: Mode 2
● cpimport on one UM/PM
● Partitioned data on PM
[Diagram: cpimport runs on the UM node; each PM node has its own partitioned source file.]
cpimport: Mode 3
● Partitioned data and cpimport on PM
[Diagram: a UM node and PM nodes; each PM node has its own partitioned source file, and cpimport runs on the PM nodes.]
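Modes 2 and 3 expect the source data to be partitioned per PM already. The slides do not say how the data is split, so the following sketch simply assumes a round-robin split; partition_rows is a name invented for this example.

```python
def partition_rows(rows, pm_count):
    """Split rows round-robin into pm_count partitions, one per PM."""
    if pm_count < 1:
        raise ValueError("pm_count must be at least 1")
    partitions = [[] for _ in range(pm_count)]
    for index, row in enumerate(rows):
        partitions[index % pm_count].append(row)
    return partitions


if __name__ == "__main__":
    source = ["1|a", "2|b", "3|c", "4|d"]
    for pm, part in enumerate(partition_rows(source, 3), start=1):
        # Each partition would be written to a file on the matching PM.
        print("PM%d: %s" % (pm, part))
```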
LOAD DATA INFILE
● Familiar SQL interface
● With autocommit on, it uses cpimport
● With autocommit off, it uses native DML
● LOAD DATA LOCAL INFILE uses native DML
MariaDB [test]> LOAD DATA INFILE 'import.tbl' INTO TABLE tmp1 FIELDS TERMINATED BY '|';
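The rules above for which load path a LOAD DATA statement takes can be summarised in a few lines. This is only a restatement of the bullets as code; load_data_path is a hypothetical helper, not something the server exposes.

```python
def load_data_path(autocommit, local=False):
    """Return the load path implied by the autocommit setting and LOCAL keyword."""
    if local:
        # LOAD DATA LOCAL INFILE uses native DML
        return "native DML"
    # Plain LOAD DATA INFILE: cpimport with autocommit on, native DML otherwise
    return "cpimport" if autocommit else "native DML"


if __name__ == "__main__":
    for autocommit in (True, False):
        for local in (False, True):
            print(autocommit, local, load_data_path(autocommit, local))
```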
mcsimport
● Command line client
● CSV input
● Since ColumnStore 1.2
● Remote data loading
PS C:\> mcsimport.exe test tmp1 C:\data-to-import\table.csv -c C:\cs-configs\Columnstore.xml
Loading data into ColumnStore
Bulk Write API & Native Wrapper APIs
Language Bindings
● Simple C++11 API to bulk load data directly into the PMs
● Currently available on modern Linux distributions and Windows
● Other language bindings are implemented using SWIG, which generates efficient, almost identical native implementations on top of the C++ library:
○ Java 8 (also providing Scala support)
○ Python 2 & 3
○ Other language bindings can be implemented in the future
System Configuration
● The API assumes the existence of a Columnstore.xml file in the system in order
to determine the system topology, hosts, and ports for the PM nodes.
● If you are running on a ColumnStore node, the adapter will work immediately
● For a remote host, you will need to copy the Columnstore.xml from a server
node.
● The API will need to be able to connect with the ProcMon (8800), WriteEngine
(8630), and DBRMController (8616) ports.
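Before using the API from a remote host it can help to verify that the three ports are reachable. The sketch below uses only the Python standard library; the port numbers are the ones listed above, while the function names and the timeout are choices made for this example.

```python
import socket

# Port numbers as listed above; the dictionary keys are only labels.
COLUMNSTORE_PORTS = {"ProcMon": 8800, "WriteEngine": 8630, "DBRMController": 8616}


def port_reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_columnstore_ports(host, timeout=2.0):
    """Map each service label to whether its port is reachable on host."""
    return {name: port_reachable(host, port, timeout)
            for name, port in COLUMNSTORE_PORTS.items()}
```

Calling check_columnstore_ports with the address of a PM returns a dictionary of booleans; any False entry points to a firewall or topology problem to fix before bulk loading.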
Core Classes
The following classes provide the core interface:
● ColumnStoreDriver: Entry point / connection
● ColumnStoreBulkInsert: Per table interface for writing a transaction
● ColumnStoreSystemCatalog: Table metadata retrieval
Language namespaces:
● C++ - mcsapi::
● Java - com.mariadb.columnstore.api
● Python - pymcsapi
Core Classes - ColumnStoreDriver
● Entry point and factory class for creating:
○ ColumnStoreBulkInsert objects to allow bulk write of a single transaction for a single
table
○ ColumnStoreSystemCatalog object to allow retrieval of table and column data
● Default constructor will look for Columnstore.xml in:
○ /usr/local/mariadb/columnstore/etc/Columnstore.xml
○ $COLUMNSTORE_INSTALL_DIR/etc/Columnstore.xml
● Alternatively, pass the path to Columnstore.xml as a constructor argument to specify a non-standard location
● Can also list and clear ColumnStore table locks since version 1.2.2.
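The lookup of Columnstore.xml can be pictured as follows. This is a sketch, not the driver's actual code: the slide lists the two default locations but not the order in which they are tried, so the ordering here (explicit path, then $COLUMNSTORE_INSTALL_DIR, then /usr/local) is an assumption, as are the function names.

```python
import os

DEFAULT_XML = "/usr/local/mariadb/columnstore/etc/Columnstore.xml"


def candidate_config_paths(explicit_path=None, env=None):
    """List Columnstore.xml locations to try, most specific first.

    The ordering is an assumption made for this sketch.
    """
    env = os.environ if env is None else env
    if explicit_path:
        return [explicit_path]
    candidates = []
    install_dir = env.get("COLUMNSTORE_INSTALL_DIR")
    if install_dir:
        candidates.append(os.path.join(install_dir, "etc", "Columnstore.xml"))
    candidates.append(DEFAULT_XML)
    return candidates


def find_config(explicit_path=None, env=None):
    """Return the first candidate that exists on disk, or None."""
    for path in candidate_config_paths(explicit_path, env):
        if os.path.isfile(path):
            return path
    return None
```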
Core Classes - ColumnStoreBulkInsert
● Encapsulates bulk insert operations. Constructed for a single table and
transaction.
● Multiple instances can be created for multiple drivers, but only one can be active per table per ColumnStore instance.
● Error handling is important. If you fail to commit or roll back, a ColumnStore table lock will be left behind. It can be released manually with the cleartablelock command or through ColumnStoreDriver.
● After completion, getSummary returns details about the injection.
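One way to make sure that every bulk insert ends in either a commit or a rollback is to wrap it in a context manager. The sketch below shows the pattern with FakeBulkInsert, a hypothetical stand-in that only records calls so the example runs without a ColumnStore installation; with the real API the object would come from ColumnStoreDriver.

```python
from contextlib import contextmanager


class FakeBulkInsert:
    """Hypothetical stand-in that records calls; not part of mcsapi."""

    def __init__(self):
        self.rows = []
        self.current = {}
        self.state = "open"

    def setColumn(self, index, value):
        self.current[index] = value

    def writeRow(self):
        self.rows.append(self.current)
        self.current = {}

    def commit(self):
        self.state = "committed"

    def rollback(self):
        self.rows = []
        self.state = "rolled back"


@contextmanager
def bulk_transaction(bulk):
    """Commit on success, roll back on any error, then re-raise."""
    try:
        yield bulk
        bulk.commit()
    except Exception:
        bulk.rollback()
        raise


if __name__ == "__main__":
    bulk = FakeBulkInsert()
    with bulk_transaction(bulk) as b:
        b.setColumn(0, 1)
        b.setColumn(1, 999)
        b.writeRow()
    print(bulk.state)
```

Because the rollback happens in one place, no code path can leave the table lock behind by forgetting it.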
Core Classes - ColumnStoreSystemCatalog
● Allows retrieval of ColumnStore table and column metadata to allow for generic
manipulations.
ColumnStoreSystemCatalog.getTable(db, table)
ColumnStoreSystemCatalogTable.getColumnCount()
…
ColumnStoreSystemCatalogTable.getColumn(id | name)
ColumnStoreSystemCatalogTableColumn.getType()
…
ColumnStoreSystemCatalogTableColumn.getDefaultValue()
pymcsapi example
import pymcsapi

bulk = None
try:
    driver = pymcsapi.ColumnStoreDriver()
    # Open a bulk insert for table test.t1
    bulk = driver.createBulkInsert("test", "t1", 0, 0)
    for i in range(0, 1000):
        bulk.setColumn(0, i)
        bulk.setColumn(1, 1000 - i)
        bulk.writeRow()
    bulk.commit()
except RuntimeError as err:
    # Roll back only if the bulk insert was created, so no table lock is left
    if bulk is not None:
        bulk.rollback()
    print("Error caught: %s" % (err,))
Loading data into ColumnStore
Bulk Write API and Apache Spark
Apache Spark & MariaDB ColumnStore
● Enables a best-of-breed approach:
○ In-memory machine learning algorithms of Spark
○ Publish results to ColumnStore for ease of consumption with SQL tools such as Tableau
● To read data from ColumnStore into Spark
○ JDBC connector
○ Spark SQL
● To write data from Spark into ColumnStore
○ Bulk Write API
○ ColumnStoreExporter
Export via ColumnStoreExporter
● Object on top of the Bulk Write API to export data with minimal lines of code.
● Requires the same structure of DataFrame/RDD and ColumnStore table.
● Methods:
○ generateTableStatement(DataFrame, [database], [table], [determineTypeLength])
○ export(database, table, DataFrame, [path to Columnstore.xml])
○ exportFromWorkers(database, table, RDD, [partitions], [path to Columnstore.xml])*
*only available in Scala
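To give an idea of what a generated table statement has to contain, the following sketch builds a CREATE TABLE statement from a list of column names and SQL types. It is not the implementation of generateTableStatement; the function name, the input format and the quoting style are assumptions made for this example.

```python
def create_table_statement(database, table, columns):
    """Build a CREATE TABLE statement for a ColumnStore table.

    columns is a list of (name, sql_type) pairs, e.g. ("number", "BIGINT").
    """
    if not columns:
        raise ValueError("at least one column is required")
    column_sql = ", ".join("`%s` %s" % (name, sql_type) for name, sql_type in columns)
    return "CREATE TABLE `%s`.`%s` (%s) ENGINE=columnstore" % (database, table, column_sql)


if __name__ == "__main__":
    print(create_table_statement(
        "test", "pyspark_export",
        [("number", "BIGINT"), ("ASCII", "VARCHAR(1)")],
    ))
```

The column order matters: the export requires the DataFrame and the table to have the same structure.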
ColumnStoreExporter example
import columnStoreExporter
from pyspark.sql import SparkSession, Row
import mysql.connector as mariadb

spark = SparkSession.builder.appName("DataFrame export into ColumnStore").getOrCreate()

# Sample DataFrame: the numbers 0..127 and their ASCII characters
df = spark.createDataFrame(
    spark.sparkContext.parallelize(range(0, 128)).map(lambda i: Row(number=i, ASCII=chr(i)))
)

# Create the target table from the structure of the DataFrame
conn = None
cursor = None
try:
    conn = mariadb.connect(user='root', database='', host='127.0.0.1', password='')
    cursor = conn.cursor()
    cursor.execute(columnStoreExporter.generateTableStatement(df, "test", "pyspark_export"))
except mariadb.Error as err:
    print("Error during table creation: ", err)
finally:
    if cursor: cursor.close()
    if conn: conn.close()

# Export the DataFrame into ColumnStore
columnStoreExporter.export("test", "pyspark_export", df)
spark.stop()
Loading data into ColumnStore
Application integration via Data Adapters
Pentaho Data Integration Adapter
● Plugin for PDI and Pentaho Server.
● Allows the export of data from PDI transformations into ColumnStore.
● Dedicated session tomorrow at 10:10am.
MaxScale CDC Adapter
● Connects to MaxScale CDC service
○ MariaDB binary logs to ColumnStore data
● Replicate mode
○ Convenient
■ INSERTs via Bulk Insert API
■ UPDATE and DELETE via SQL interface
○ Intended for INSERT heavy workloads
● Transformation mode
○ Fastest
■ INSERT → {"type": "insert", ...}
■ DELETE → {"type": "delete", ...}
■ UPDATE → {"type": "update_before", ...} + {"type": "update_after", ...}
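The transformation mode mapping can be sketched as a function that turns every change event into one or two append-only records. The record layout (row fields next to a "type" field) and the name transform_event are assumptions for illustration; the adapter's real output contains more fields.

```python
def transform_event(kind, before=None, after=None):
    """Turn one change event into append-only records tagged with a type."""
    if kind == "insert":
        return [dict(after, type="insert")]
    if kind == "delete":
        return [dict(before, type="delete")]
    if kind == "update":
        # An update is stored as two records: the old row and the new row.
        return [dict(before, type="update_before"),
                dict(after, type="update_after")]
    raise ValueError("unknown event kind: %r" % kind)


if __name__ == "__main__":
    print(transform_event("insert", after={"id": 1, "v": "a"}))
    print(transform_event("update", before={"id": 1, "v": "a"}, after={"id": 1, "v": "b"}))
```

Since every event becomes a plain insert, all records can go through the Bulk Insert API, which is why this mode is the fastest.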
Replicate Mode
[Diagram: MaxScale, a UM node and three PM nodes.]
Transform Mode
[Diagram: MaxScale, a transformation step f(x), a UM node and three PM nodes.]
Kafka to ColumnStore Adapter
● Java connectivity via Kafka
○ Reads Confluent Avro serialized data from Kafka
○ Inserts into ColumnStore
○ Java → Kafka → ColumnStore
● One topic per table
○ Topics applied in parallel
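The "one topic per table, applied in parallel" idea can be sketched with a thread pool in which every topic gets its own worker. This example contains no Kafka or ColumnStore calls: messages are plain dictionaries, the sink is an in-memory dictionary, and the assumption that the topic name equals the table name is made only for this illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def load_topic(topic, messages, sink):
    """Append every message of one topic to the list for its table."""
    table = topic  # assumption: topic name equals table name
    for message in messages:
        sink[table].append(message)
    return table, len(messages)


def load_all(topics):
    """Process each topic in its own worker; returns (sink, row counts)."""
    sink = {topic: [] for topic in topics}
    with ThreadPoolExecutor(max_workers=max(1, len(topics))) as pool:
        futures = [pool.submit(load_topic, t, msgs, sink) for t, msgs in topics.items()]
        counts = dict(f.result() for f in futures)
    return sink, counts


if __name__ == "__main__":
    sink, counts = load_all({"orders": [{"id": 1}, {"id": 2}], "users": [{"id": 7}]})
    print(counts)
```

Each worker writes only to the list of its own table, so the workers do not need to coordinate with each other.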
Future Improvements
Integrated MaxScale CDC
● Live transformation
○ No storage required
○ Faster
● Transform & Replicate
● Planned for MaxScale 2.4
[Diagram: MaxScale with a transformation step f(x), a UM node and three PM nodes.]
Performance enhancements
● Bulk Write API
○ Async loading
● mcsimport
○ Code and compiler optimizations
○ Pipelining
Summary
Ways to inject data into ColumnStore
● Command line tools & SQL
○ cpimport
○ mcsimport
○ LOAD DATA INFILE
● Bulk Write API
○ C++, Java and Python
○ ColumnStoreExporter for Apache Spark
● Data Adapters
○ Pentaho Data Integration
○ MaxScale CDC Adapter
○ Kafka to ColumnStore Adapter
THANK YOU!
Backup Slides
Further cpimport examples
Import an InnoDB table into ColumnStore
mcsmysql -q -e 'select * from <source-table>;' -N <source-db> |
cpimport -s '\t' <target-db> <target-table>
Import data from an AWS S3 bucket
aws s3 cp --quiet s3://dthompson-test/trades_bulk.csv - |
cpimport test trades -s ","
Motivation
● Organizations need to make data available for analysis as soon as it arrives.
● Enable Machine learning results to be published and accessible by business
users through SQL based tools.
● Ease of integration, whether with custom applications or ETL tools.
Bulk Write API
● Applications can use the bulk write API to collect and write data - on-demand data loading
● No need to copy CSV to ColumnStore node - simpler
● Bypass SQL interface, parser and optimizer - faster writes
[Diagram: an application writes through the Write API directly to three ColumnStore PMs, bypassing MariaDB Server and the ColumnStore UM.]
Bulk Data Adapter
1. For each row
a. For each column
bulkInsert->setColumn
b. bulkInsert->writeRow
2. bulkInsert->commit
* Buffer 100,000 rows by default
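The loop above combined with a row buffer can be sketched as follows. BufferedWriter is a hypothetical stand-in, not part of the Bulk Write API; it assumes that a full buffer is flushed as one batch and that commit flushes whatever is left, which the slide does not spell out.

```python
class BufferedWriter:
    """Hypothetical stand-in that flushes buffered rows in batches."""

    def __init__(self, buffer_size=100_000):
        self.buffer_size = buffer_size
        self.buffer = []
        self.flushed_batches = []

    def write_row(self, row):
        """Buffer one row and flush once the buffer is full."""
        self.buffer.append(row)
        if len(self.buffer) >= self.buffer_size:
            self.flush()

    def flush(self):
        """Hand the buffered rows over as one batch (here: just record them)."""
        if self.buffer:
            self.flushed_batches.append(self.buffer)
            self.buffer = []

    def commit(self):
        """Flush the remaining rows and return the total number written."""
        self.flush()
        return sum(len(batch) for batch in self.flushed_batches)


if __name__ == "__main__":
    writer = BufferedWriter(buffer_size=3)
    for i in range(7):
        writer.write_row((i, 1000 - i))
    print(writer.commit())
```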
57
THIS IS A TITLE
Lorem ipsum dolor sit amet, consectetur adipiscing
elit. Nam pretium augue nunc, quis bibendum ligula
molestie sit amet. Nam sed luctus tellus.
Praesent nec cursus ex, vel commodo tellus. Duis
tempus pharetra ante a ullamcorper. Curabitur
commodo purus eget tempus faucibus.
58
MariaDB plc
 
MariaDB Paris Workshop 2023 - MaxScale
MariaDB Paris Workshop 2023 - MaxScale MariaDB Paris Workshop 2023 - MaxScale
MariaDB Paris Workshop 2023 - MaxScale
MariaDB plc
 
MariaDB Paris Workshop 2023 - novadys presentation
MariaDB Paris Workshop 2023 - novadys presentationMariaDB Paris Workshop 2023 - novadys presentation
MariaDB Paris Workshop 2023 - novadys presentation
MariaDB plc
 
MariaDB Paris Workshop 2023 - DARVA presentation
MariaDB Paris Workshop 2023 - DARVA presentationMariaDB Paris Workshop 2023 - DARVA presentation
MariaDB Paris Workshop 2023 - DARVA presentation
MariaDB plc
 
MariaDB Tech und Business Update Hamburg 2023 - MariaDB Enterprise Server
MariaDB Tech und Business Update Hamburg 2023 - MariaDB Enterprise Server MariaDB Tech und Business Update Hamburg 2023 - MariaDB Enterprise Server
MariaDB Tech und Business Update Hamburg 2023 - MariaDB Enterprise Server
MariaDB plc
 
MariaDB SkySQL Autonome Skalierung, Observability, Cloud-Backup
MariaDB SkySQL Autonome Skalierung, Observability, Cloud-BackupMariaDB SkySQL Autonome Skalierung, Observability, Cloud-Backup
MariaDB SkySQL Autonome Skalierung, Observability, Cloud-Backup
MariaDB plc
 
Einführung : MariaDB Tech und Business Update Hamburg 2023
Einführung : MariaDB Tech und Business Update Hamburg 2023Einführung : MariaDB Tech und Business Update Hamburg 2023
Einführung : MariaDB Tech und Business Update Hamburg 2023
MariaDB plc
 
Hochverfügbarkeitslösungen mit MariaDB
Hochverfügbarkeitslösungen mit MariaDBHochverfügbarkeitslösungen mit MariaDB
Hochverfügbarkeitslösungen mit MariaDB
MariaDB plc
 
Die Neuheiten in MariaDB Enterprise Server
Die Neuheiten in MariaDB Enterprise ServerDie Neuheiten in MariaDB Enterprise Server
Die Neuheiten in MariaDB Enterprise Server
MariaDB plc
 
Introducing workload analysis
Introducing workload analysisIntroducing workload analysis
Introducing workload analysis
MariaDB plc
 
Under the hood: SkySQL monitoring
Under the hood: SkySQL monitoringUnder the hood: SkySQL monitoring
Under the hood: SkySQL monitoring
MariaDB plc
 
Introducing the R2DBC async Java connector
Introducing the R2DBC async Java connectorIntroducing the R2DBC async Java connector
Introducing the R2DBC async Java connector
MariaDB plc
 
MariaDB Enterprise Tools introduction
MariaDB Enterprise Tools introductionMariaDB Enterprise Tools introduction
MariaDB Enterprise Tools introduction
MariaDB plc
 
Ad

Recently uploaded (20)

AI in Business Software: Smarter Systems or Hidden Risks?
AI in Business Software: Smarter Systems or Hidden Risks?AI in Business Software: Smarter Systems or Hidden Risks?
AI in Business Software: Smarter Systems or Hidden Risks?
Amara Nielson
 
Memory Management and Leaks in Postgres from pgext.day 2025
Memory Management and Leaks in Postgres from pgext.day 2025Memory Management and Leaks in Postgres from pgext.day 2025
Memory Management and Leaks in Postgres from pgext.day 2025
Phil Eaton
 
Troubleshooting JVM Outages – 3 Fortune 500 case studies
Troubleshooting JVM Outages – 3 Fortune 500 case studiesTroubleshooting JVM Outages – 3 Fortune 500 case studies
Troubleshooting JVM Outages – 3 Fortune 500 case studies
Tier1 app
 
Beyond the code. Complexity - 2025.05 - SwiftCraft
Beyond the code. Complexity - 2025.05 - SwiftCraftBeyond the code. Complexity - 2025.05 - SwiftCraft
Beyond the code. Complexity - 2025.05 - SwiftCraft
Dmitrii Ivanov
 
Time Estimation: Expert Tips & Proven Project Techniques
Time Estimation: Expert Tips & Proven Project TechniquesTime Estimation: Expert Tips & Proven Project Techniques
Time Estimation: Expert Tips & Proven Project Techniques
Livetecs LLC
 
Meet the New Kid in the Sandbox - Integrating Visualization with Prometheus
Meet the New Kid in the Sandbox - Integrating Visualization with PrometheusMeet the New Kid in the Sandbox - Integrating Visualization with Prometheus
Meet the New Kid in the Sandbox - Integrating Visualization with Prometheus
Eric D. Schabell
 
Mastering Selenium WebDriver: A Comprehensive Tutorial with Real-World Examples
Mastering Selenium WebDriver: A Comprehensive Tutorial with Real-World ExamplesMastering Selenium WebDriver: A Comprehensive Tutorial with Real-World Examples
Mastering Selenium WebDriver: A Comprehensive Tutorial with Real-World Examples
jamescantor38
 
From Vibe Coding to Vibe Testing - Complete PowerPoint Presentation
From Vibe Coding to Vibe Testing - Complete PowerPoint PresentationFrom Vibe Coding to Vibe Testing - Complete PowerPoint Presentation
From Vibe Coding to Vibe Testing - Complete PowerPoint Presentation
Shay Ginsbourg
 
Robotic Process Automation (RPA) Software Development Services.pptx
Robotic Process Automation (RPA) Software Development Services.pptxRobotic Process Automation (RPA) Software Development Services.pptx
Robotic Process Automation (RPA) Software Development Services.pptx
julia smits
 
Tools of the Trade: Linux and SQL - Google Certificate
Tools of the Trade: Linux and SQL - Google CertificateTools of the Trade: Linux and SQL - Google Certificate
Tools of the Trade: Linux and SQL - Google Certificate
VICTOR MAESTRE RAMIREZ
 
Best HR and Payroll Software in Bangladesh - accordHRM
Best HR and Payroll Software in Bangladesh - accordHRMBest HR and Payroll Software in Bangladesh - accordHRM
Best HR and Payroll Software in Bangladesh - accordHRM
accordHRM
 
wAIred_LearnWithOutAI_JCON_14052025.pptx
wAIred_LearnWithOutAI_JCON_14052025.pptxwAIred_LearnWithOutAI_JCON_14052025.pptx
wAIred_LearnWithOutAI_JCON_14052025.pptx
SimonedeGijt
 
GDS SYSTEM | GLOBAL DISTRIBUTION SYSTEM
GDS SYSTEM | GLOBAL  DISTRIBUTION SYSTEMGDS SYSTEM | GLOBAL  DISTRIBUTION SYSTEM
GDS SYSTEM | GLOBAL DISTRIBUTION SYSTEM
philipnathen82
 
Sequence Diagrams With Pictures (1).pptx
Sequence Diagrams With Pictures (1).pptxSequence Diagrams With Pictures (1).pptx
Sequence Diagrams With Pictures (1).pptx
aashrithakondapalli8
 
How I solved production issues with OpenTelemetry
How I solved production issues with OpenTelemetryHow I solved production issues with OpenTelemetry
How I solved production issues with OpenTelemetry
Cees Bos
 
Why Tapitag Ranks Among the Best Digital Business Card Providers
Why Tapitag Ranks Among the Best Digital Business Card ProvidersWhy Tapitag Ranks Among the Best Digital Business Card Providers
Why Tapitag Ranks Among the Best Digital Business Card Providers
Tapitag
 
Medical Device Cybersecurity Threat & Risk Scoring
Medical Device Cybersecurity Threat & Risk ScoringMedical Device Cybersecurity Threat & Risk Scoring
Medical Device Cybersecurity Threat & Risk Scoring
ICS
 
Adobe InDesign Crack FREE Download 2025 link
Adobe InDesign Crack FREE Download 2025 linkAdobe InDesign Crack FREE Download 2025 link
Adobe InDesign Crack FREE Download 2025 link
mahmadzubair09
 
Programs as Values - Write code and don't get lost
Programs as Values - Write code and don't get lostPrograms as Values - Write code and don't get lost
Programs as Values - Write code and don't get lost
Pierangelo Cecchetto
 
Protect HPE VM Essentials using Veeam Agents-a50012338enw.pdf
Protect HPE VM Essentials using Veeam Agents-a50012338enw.pdfProtect HPE VM Essentials using Veeam Agents-a50012338enw.pdf
Protect HPE VM Essentials using Veeam Agents-a50012338enw.pdf
株式会社クライム
 
AI in Business Software: Smarter Systems or Hidden Risks?
AI in Business Software: Smarter Systems or Hidden Risks?AI in Business Software: Smarter Systems or Hidden Risks?
AI in Business Software: Smarter Systems or Hidden Risks?
Amara Nielson
 
Memory Management and Leaks in Postgres from pgext.day 2025
Memory Management and Leaks in Postgres from pgext.day 2025Memory Management and Leaks in Postgres from pgext.day 2025
Memory Management and Leaks in Postgres from pgext.day 2025
Phil Eaton
 
Troubleshooting JVM Outages – 3 Fortune 500 case studies
Troubleshooting JVM Outages – 3 Fortune 500 case studiesTroubleshooting JVM Outages – 3 Fortune 500 case studies
Troubleshooting JVM Outages – 3 Fortune 500 case studies
Tier1 app
 
Beyond the code. Complexity - 2025.05 - SwiftCraft
Beyond the code. Complexity - 2025.05 - SwiftCraftBeyond the code. Complexity - 2025.05 - SwiftCraft
Beyond the code. Complexity - 2025.05 - SwiftCraft
Dmitrii Ivanov
 
Time Estimation: Expert Tips & Proven Project Techniques
Time Estimation: Expert Tips & Proven Project TechniquesTime Estimation: Expert Tips & Proven Project Techniques
Time Estimation: Expert Tips & Proven Project Techniques
Livetecs LLC
 
Meet the New Kid in the Sandbox - Integrating Visualization with Prometheus
Meet the New Kid in the Sandbox - Integrating Visualization with PrometheusMeet the New Kid in the Sandbox - Integrating Visualization with Prometheus
Meet the New Kid in the Sandbox - Integrating Visualization with Prometheus
Eric D. Schabell
 
Mastering Selenium WebDriver: A Comprehensive Tutorial with Real-World Examples
Mastering Selenium WebDriver: A Comprehensive Tutorial with Real-World ExamplesMastering Selenium WebDriver: A Comprehensive Tutorial with Real-World Examples
Mastering Selenium WebDriver: A Comprehensive Tutorial with Real-World Examples
jamescantor38
 
From Vibe Coding to Vibe Testing - Complete PowerPoint Presentation
From Vibe Coding to Vibe Testing - Complete PowerPoint PresentationFrom Vibe Coding to Vibe Testing - Complete PowerPoint Presentation
From Vibe Coding to Vibe Testing - Complete PowerPoint Presentation
Shay Ginsbourg
 
Robotic Process Automation (RPA) Software Development Services.pptx
Robotic Process Automation (RPA) Software Development Services.pptxRobotic Process Automation (RPA) Software Development Services.pptx
Robotic Process Automation (RPA) Software Development Services.pptx
julia smits
 
Tools of the Trade: Linux and SQL - Google Certificate
Tools of the Trade: Linux and SQL - Google CertificateTools of the Trade: Linux and SQL - Google Certificate
Tools of the Trade: Linux and SQL - Google Certificate
VICTOR MAESTRE RAMIREZ
 
Best HR and Payroll Software in Bangladesh - accordHRM
Best HR and Payroll Software in Bangladesh - accordHRMBest HR and Payroll Software in Bangladesh - accordHRM
Best HR and Payroll Software in Bangladesh - accordHRM
accordHRM
 
wAIred_LearnWithOutAI_JCON_14052025.pptx
wAIred_LearnWithOutAI_JCON_14052025.pptxwAIred_LearnWithOutAI_JCON_14052025.pptx
wAIred_LearnWithOutAI_JCON_14052025.pptx
SimonedeGijt
 
GDS SYSTEM | GLOBAL DISTRIBUTION SYSTEM
GDS SYSTEM | GLOBAL  DISTRIBUTION SYSTEMGDS SYSTEM | GLOBAL  DISTRIBUTION SYSTEM
GDS SYSTEM | GLOBAL DISTRIBUTION SYSTEM
philipnathen82
 
Sequence Diagrams With Pictures (1).pptx
Sequence Diagrams With Pictures (1).pptxSequence Diagrams With Pictures (1).pptx
Sequence Diagrams With Pictures (1).pptx
aashrithakondapalli8
 
How I solved production issues with OpenTelemetry
How I solved production issues with OpenTelemetryHow I solved production issues with OpenTelemetry
How I solved production issues with OpenTelemetry
Cees Bos
 
Why Tapitag Ranks Among the Best Digital Business Card Providers
Why Tapitag Ranks Among the Best Digital Business Card ProvidersWhy Tapitag Ranks Among the Best Digital Business Card Providers
Why Tapitag Ranks Among the Best Digital Business Card Providers
Tapitag
 
Medical Device Cybersecurity Threat & Risk Scoring
Medical Device Cybersecurity Threat & Risk ScoringMedical Device Cybersecurity Threat & Risk Scoring
Medical Device Cybersecurity Threat & Risk Scoring
ICS
 
Adobe InDesign Crack FREE Download 2025 link
Adobe InDesign Crack FREE Download 2025 linkAdobe InDesign Crack FREE Download 2025 link
Adobe InDesign Crack FREE Download 2025 link
mahmadzubair09
 
Programs as Values - Write code and don't get lost
Programs as Values - Write code and don't get lostPrograms as Values - Write code and don't get lost
Programs as Values - Write code and don't get lost
Pierangelo Cecchetto
 
Protect HPE VM Essentials using Veeam Agents-a50012338enw.pdf
Protect HPE VM Essentials using Veeam Agents-a50012338enw.pdfProtect HPE VM Essentials using Veeam Agents-a50012338enw.pdf
Protect HPE VM Essentials using Veeam Agents-a50012338enw.pdf
株式会社クライム
 

How to make data available for analytics ASAP

  • 1. How to make data available for analytics ASAP
    Jens Röwekamp, Software Engineer, MariaDB Corporation
    Markus Mäkelä, Senior Software Engineer, MariaDB Corporation
  • 2. Quick Overview
    1. MariaDB ColumnStore
      ○ What is ColumnStore
    2. Loading data into ColumnStore
      ○ Command Line Tools & SQL
      ○ Bulk Write API & Native Wrapper APIs
      ○ Bulk Write API and Apache Spark
      ○ Application integration via Data Adapters
    3. Future Improvements
  • 4. What is ColumnStore?
    ● Columnar storage
      ○ Efficient use of large and wide tables
        ■ SELECT column_1 FROM 2000_column_table
      ○ Columns split into 8M-row blocks (extents)
    ● Massively parallel
      ○ Columns and extents read in parallel
      ○ Push work to worker nodes (PMs)
        ■ Predicate filtering
        ■ Join processing
        ■ Initial aggregation of data
    ● Designed for analytical workloads
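To make the extent idea concrete, here is a minimal Python sketch of the arithmetic. It assumes "8M" means 8 × 1024 × 1024 = 8,388,608 rows per extent; the function names are hypothetical and are not part of ColumnStore.

```python
# Assumption: "8M rows" is taken as 8 * 1024 * 1024 rows per extent.
ROWS_PER_EXTENT = 8 * 1024 * 1024


def extent_count(row_count: int) -> int:
    """Number of extents one column needs to hold row_count rows."""
    return -(-row_count // ROWS_PER_EXTENT)  # ceiling division


def extent_of_row(row_index: int) -> int:
    """Zero-based extent that holds the row at zero-based row_index."""
    return row_index // ROWS_PER_EXTENT


if __name__ == "__main__":
    rows = 100_000_000
    print(extent_count(rows), "extents per column for", rows, "rows")
```

Because every column is split the same way, a query that touches one column of a wide table only has to read that column's extents, and those extents can be scanned in parallel.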
  • 5. What is ColumnStore?
    ● PM (Performance Module) nodes
      ○ Handle primitive jobs
      ○ Preliminary aggregation
      ○ Predicate filtering
    ● UM (User Module) nodes
      ○ Handle SQL statements
      ○ Combine primitive results
    ● Storage layer
      ○ Local
      ○ SAN
      ○ GlusterFS
  • 6. MariaDB ColumnStore Architecture
    [Diagram labels: clients (BI Tool, SQL Client, Custom Application, Big Data App); layers (MariaDB SQL Front End, Distributed Query Engine, Data Storage); Columnar Distributed Data Storage on Local Storage | SAN | NAS | EBS | GlusterFS]
  • 8. cpimport
    ● Command line client
    ● CSV input
    ● In all versions
    ● Local data loading

    [root@cs ~]# cpimport test tmp1 << EOF
    1|a
    2|b
    3|c
    EOF
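The same stdin-driven load can be scripted. The following Python sketch only assumes what the slide shows: `cpimport <database> <table>` reading pipe-delimited rows from standard input. The helper names are hypothetical, and values are not escaped, so a field that contains `|` would break the row.

```python
import subprocess


def to_pipe_delimited(rows):
    """Render rows as pipe-delimited text, one row per line (no escaping)."""
    return "".join("|".join(str(value) for value in row) + "\n" for row in rows)


def cpimport_command(database, table):
    """Argument list matching the slide: cpimport <database> <table>."""
    return ["cpimport", database, table]


def load(database, table, rows):
    """Feed the rows to cpimport on stdin. Requires cpimport on this host."""
    return subprocess.run(
        cpimport_command(database, table),
        input=to_pipe_delimited(rows),
        text=True,
        check=True,  # raise CalledProcessError if cpimport exits non-zero
    )


if __name__ == "__main__":
    # Prints the same three rows as the heredoc on the slide.
    print(to_pipe_delimited([(1, "a"), (2, "b"), (3, "c")]), end="")
```

Calling `load("test", "tmp1", rows)` on a ColumnStore node would be the scripted equivalent of the heredoc above.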
  • 9. cpimport: Mode 1
    ● Data and cpimport on one UM/PM
    [Diagram labels: Source File and cpimport at the UM Node; PM Nodes]
  • 10. cpimport: Mode 2
    ● cpimport on one UM/PM
    ● Partitioned data on PM
    [Diagram labels: cpimport at the UM Node; one Partitioned Source File per PM Node]
  • 11. cpimport: Mode 3
    ● Partitioned data and cpimport on PM
    [Diagram labels: UM Node; a Partitioned Source File and cpimport at the PM Nodes]
  • 12. LOAD DATA INFILE
    ● Familiar SQL interface
    ● With autocommit on, it uses cpimport
    ● With autocommit off, it uses native DML
    ● LOAD DATA LOCAL INFILE uses native DML

    MariaDB [test]> LOAD DATA INFILE 'import.tbl' INTO TABLE tmp1 FIELDS TERMINATED BY '|';
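The three rules above amount to a small decision table. A minimal Python sketch, with a hypothetical function name, that encodes exactly what the slide states and nothing more:

```python
def load_data_path(autocommit: bool, local: bool = False) -> str:
    """Return the write path the slide describes for a LOAD DATA statement."""
    if local:
        # LOAD DATA LOCAL INFILE uses native DML.
        return "native DML"
    # Plain LOAD DATA INFILE: cpimport only when autocommit is on.
    return "cpimport" if autocommit else "native DML"


if __name__ == "__main__":
    for autocommit in (True, False):
        for local in (False, True):
            print(f"autocommit={autocommit} local={local} ->",
                  load_data_path(autocommit, local))
```

In practice this means the bulk path is only taken for a server-side file with autocommit enabled.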
  • 13. mcsimport
    ● Command line client
    ● CSV input
    ● Since ColumnStore 1.2
    ● Remote data loading

    PS C:\> mcsimport.exe test tmp1 C:\data-to-import\table.csv -c C:\cs-configs\Columnstore.xml
  • 14. Loading data into ColumnStore: Bulk Write API & Native Wrapper APIs
  • 15. Language Bindings
    ● Simple C++11 API to bulk load data directly into the PMs
    ● Currently available on modern Linux distributions and Windows
    ● Other language bindings are implemented using SWIG, which generates efficient, almost identical native implementations on top of the C++ library:
      ○ Java 8 (also providing Scala support)
      ○ Python 2 & 3
      ○ Other language bindings can be implemented in the future
  • 16. System Configuration
    ● The API assumes the existence of a Columnstore.xml file in the system in order to determine the system topology, hosts, and ports for the PM nodes.
    ● If you are running on a ColumnStore node, the adapter will work immediately.
    ● For a remote host, you will need to copy the Columnstore.xml from a server node.
    ● The API will need to be able to connect with the ProcMon (8800), WriteEngine (8630), and DBRMController (8616) ports.
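Before running a remote load, it can help to confirm that these ports are reachable. The sketch below uses only the Python standard library; the port numbers come from the slide, while the helper names and the placeholder host are assumptions. Which host serves which port depends on your topology, so an unreachable port on one node is not necessarily an error.

```python
import socket

# Ports named on the slide.
REQUIRED_PORTS = {"ProcMon": 8800, "WriteEngine": 8630, "DBRMController": 8616}


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_host(host, ports=REQUIRED_PORTS, timeout=2.0):
    """Map each service name to whether its port is reachable on host."""
    return {name: port_open(host, port, timeout) for name, port in ports.items()}


if __name__ == "__main__":
    # "127.0.0.1" is a placeholder; use a host listed in your Columnstore.xml.
    for name, ok in check_host("127.0.0.1").items():
        print(f"{name}: {'reachable' if ok else 'unreachable'}")
```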
  • 17. Core Classes
    The following classes provide the core interface:
    ● ColumnStoreDriver: Entry point / connection
    ● ColumnStoreBulkInsert: Per-table interface for writing a transaction
    ● ColumnStoreSystemCatalog: Table metadata retrieval
    Language namespaces:
    ● C++ - mcsapi::
    ● Java - com.mariadb.columnstore.api
    ● Python - pymcsapi
  • 18. Core Classes - ColumnStoreDriver
    ● Entry point and factory class for creating:
      ○ ColumnStoreBulkInsert objects to allow bulk write of a single transaction for a single table
      ○ ColumnStoreSystemCatalog object to allow retrieval of table and column data
    ● Default constructor will look for Columnstore.xml in:
      ○ /usr/local/mariadb/columnstore/etc/Columnstore.xml
      ○ $COLUMNSTORE_INSTALL_DIR/etc/Columnstore.xml
    ● Alternatively, pass the path to Columnstore.xml as a constructor argument to specify a non-standard location
    ● Also able to list and clear ColumnStore table locks since version 1.2.2
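The lookup rules for Columnstore.xml can be mirrored in a few lines of Python, for example to log which file a loader would pick up. This is a sketch, not the driver's code: the helper names are hypothetical, and trying the two default locations in the order the slide lists them is an assumption.

```python
import os

DEFAULT_PATH = "/usr/local/mariadb/columnstore/etc/Columnstore.xml"


def candidate_paths(explicit_path=None, env=os.environ):
    """Paths to try, in order. An explicit path replaces the defaults."""
    if explicit_path:
        return [explicit_path]
    paths = [DEFAULT_PATH]
    install_dir = env.get("COLUMNSTORE_INSTALL_DIR")
    if install_dir:
        paths.append(os.path.join(install_dir, "etc", "Columnstore.xml"))
    return paths


def find_config(explicit_path=None, env=os.environ):
    """Return the first candidate that exists as a file, or None."""
    for path in candidate_paths(explicit_path, env):
        if os.path.isfile(path):
            return path
    return None


if __name__ == "__main__":
    print(find_config() or "no Columnstore.xml found in the default locations")
```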
  • 19. Core Classes - ColumnStoreBulkInsert
    ● Encapsulates bulk insert operations. Constructed for a single table and transaction.
    ● Multiple instances can be created for multiple drivers, but only one can be active per table per ColumnStore instance.
    ● Error handling is important. If you fail to commit or roll back, a ColumnStore table lock will be left behind. This can be released manually with the cleartablelock command or through ColumnStoreDriver.
    ● After completion, getSummary returns details about the injection.
  • 20. Core Classes - ColumnStoreSystemCatalog
    ● Allows retrieval of ColumnStore table and column metadata to allow for generic manipulations.

    ColumnStoreSystemCatalog.getTable(db, table)
    ColumnStoreSystemCatalogTable.getColumnCount()
    …
    ColumnStoreSystemCatalogTable.getColumn(id | name)
    ColumnStoreSystemCatalogTableColumn.getType()
    …
    ColumnStoreSystemCatalogTableColumn.getDefaultValue()
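  • 21. pymcsapi example

    import pymcsapi

    bulk = None  # stays None if the driver or bulk insert cannot be created
    try:
        # Uses the default Columnstore.xml location
        driver = pymcsapi.ColumnStoreDriver()
        bulk = driver.createBulkInsert("test", "t1", 0, 0)
        for i in range(0, 1000):
            bulk.setColumn(0, i)
            bulk.setColumn(1, 1000 - i)
            bulk.writeRow()
        bulk.commit()
    except RuntimeError as err:
        # Roll back so that no table lock is left behind
        if bulk is not None:
            bulk.rollback()
        print("Error caught: %s" % (err,))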
  • 22. Loading data into ColumnStore: Bulk Write API and Apache Spark
  • 23. Apache Spark & MariaDB ColumnStore
    ● Enables a best-of-breed approach:
      ○ In-memory machine learning algorithms of Spark
      ○ Publish results to ColumnStore for ease of consumption with SQL tools such as Tableau
    ● To read data from ColumnStore into Spark
      ○ JDBC connector
      ○ Spark SQL
    ● To write data from Spark into ColumnStore
      ○ Bulk Write API
      ○ ColumnStoreExporter
  • 24. Export via ColumnStoreExporter
    ● Object on top of the Bulk Write API to export data with minimal lines of code.
    ● Requires the DataFrame/RDD and the ColumnStore table to have the same structure.
    ● Methods:
      ○ generateTableStatement(DataFrame, [database], [table], [determineTypeLength])
      ○ export(database, table, DataFrame, [path to Columnstore.xml])
      ○ exportFromWorkers(database, table, RDD, [partitions], [path to Columnstore.xml])*
    * only available in Scala
  • 25. ColumnStoreExporter example

    import columnStoreExporter
    from pyspark.sql import SparkSession, Row
    import mysql.connector as mariadb

    spark = SparkSession.builder.appName("DataFrame export into ColumnStore").getOrCreate()

    # Sample DataFrame: the numbers 0..127 and their ASCII characters
    df = spark.createDataFrame(
        spark.sparkContext.parallelize(range(0, 128)).map(lambda i: Row(number=i, ASCII=chr(i)))
    )
  • 26. ColumnStoreExporter example

    conn = None    # initialised so the finally block is safe if connect() fails
    cursor = None
    try:
        conn = mariadb.connect(user='root', database='', host='127.0.0.1', password='')
        cursor = conn.cursor()
        # Create the target table from the DataFrame's structure
        cursor.execute(columnStoreExporter.generateTableStatement(df, "test", "pyspark_export"))
    except mariadb.Error as err:
        print("Error during table creation: ", err)
    finally:
        if cursor is not None:
            cursor.close()
        if conn is not None:
            conn.close()

    # Export the DataFrame through the Bulk Write API
    columnStoreExporter.export("test", "pyspark_export", df)
    spark.stop()
  • 27. Loading data into ColumnStore: Application integration via Data Adapters
  • 28. Pentaho Data Integration Adapter
    ● Plugin for PDI and Pentaho Server.
    ● Allows the export of data from PDI transformations into ColumnStore.
    ● Dedicated session tomorrow at 10:10am.
  • 32. MaxScale CDC Adapter
    ● Connects to MaxScale CDC service
      ○ MariaDB binary logs to ColumnStore data
    ● Replicate mode
      ○ Convenient
        ■ INSERTs via Bulk Insert API
        ■ UPDATE and DELETE via SQL interface
      ○ Intended for INSERT-heavy workloads
    ● Transformation mode
      ○ Fastest
        ■ INSERT → {"type": "insert", ...}
        ■ DELETE → {"type": "delete", ...}
        ■ UPDATE → {"type": "update_before", ...} + {"type": "update_after", ...}
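The transformation-mode mapping can be illustrated with a short Python sketch. The function name is hypothetical, and the sketch assumes that the elided "..." fields are the column values of the changed row; the adapter's actual record layout may contain more.

```python
def to_events(operation, before=None, after=None):
    """Map one row change to the event records listed on the slide.

    before/after are dicts of column values (an assumption for the
    fields the slide elides with "...").
    """
    if operation == "INSERT":
        return [{**after, "type": "insert"}]
    if operation == "DELETE":
        return [{**before, "type": "delete"}]
    if operation == "UPDATE":
        # An update becomes two records: the old row and the new row.
        return [
            {**before, "type": "update_before"},
            {**after, "type": "update_after"},
        ]
    raise ValueError(f"unsupported operation: {operation}")


if __name__ == "__main__":
    for event in to_events("UPDATE", {"id": 1, "v": "a"}, {"id": 1, "v": "b"}):
        print(event)
```

Since every change is turned into one or more appended records, the target table only ever receives inserts, which is the workload the Bulk Insert API handles best.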
  • 35. Kafka to ColumnStore Adapter
    ● Java connectivity via Kafka
      ○ Reads Confluent Avro serialized data from Kafka
      ○ Inserts into ColumnStore
      ○ Java → Kafka → ColumnStore
    ● One topic per table
      ○ Topics applied in parallel
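The "one topic per table, applied in parallel" idea can be sketched with in-memory stand-ins. The Python below uses no Kafka client and no Avro decoding: plain lists play the role of topics, and all names are hypothetical. It only shows the concurrency shape, where each topic is applied in order by its own worker.

```python
from concurrent.futures import ThreadPoolExecutor


def apply_topic(table, records):
    """Apply one topic's records, in order, to its table (stand-in: a list)."""
    applied = []
    for record in records:
        applied.append(dict(record))  # a real adapter would bulk-insert here
    return table, applied


def apply_all(topics):
    """Apply every topic with its own worker; topics maps table -> records."""
    with ThreadPoolExecutor(max_workers=max(1, len(topics))) as pool:
        futures = [pool.submit(apply_topic, table, records)
                   for table, records in topics.items()]
        return dict(future.result() for future in futures)


if __name__ == "__main__":
    tables = apply_all({
        "orders": [{"id": 1}, {"id": 2}],
        "users": [{"id": 7}],
    })
    for table, rows in tables.items():
        print(table, len(rows), "rows")
```

Keeping one table per topic means the workers never write to the same table, which fits the one-active-bulk-insert-per-table rule of the Bulk Write API.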
  • 36. Kafka to ColumnStore Adapter
  • 38. Integrated MaxScale CDC
    ● Live transformation
      ○ No storage required
      ○ Faster
    ● Transform & Replicate
    ● Planned for MaxScale 2.4
  • 40. Performance enhancements
    ● Bulk Write API
      ○ Async loading
    ● mcsimport
      ○ Code and compiler optimizations
      ○ Pipelining
  • 42. Ways to inject data into ColumnStore
    ● Command line tools & SQL
      ○ cpimport
      ○ mcsimport
      ○ LOAD DATA INFILE
    ● Bulk Write API
      ○ C++, Java and Python
      ○ ColumnStoreExporter for Apache Spark
    ● Data Adapters
      ○ Pentaho Data Integration
      ○ MaxScale CDC Adapter
      ○ Kafka to ColumnStore Adapter
  • 46. Further cpimport examples
    Import an InnoDB table into ColumnStore:
    mcsmysql -q -e 'select * from <source-table>;' -N <source-db> | cpimport -s '\t' <target-db> <target-table>
    Import data from an AWS S3 bucket:
    aws s3 cp --quiet s3://dthompson-test/trades_bulk.csv - | cpimport test trades -s ","
  • 47. Motivation
    ● Organizations need to make data available for analysis as soon as it arrives.
    ● Enable machine learning results to be published and made accessible to business users through SQL-based tools.
    ● Ease of integration, whether with custom applications or ETL tools.
  • 48. Bulk Write API
    ● Applications can use the Bulk Write API to collect and write data - on-demand data loading
    ● No need to copy CSV to a ColumnStore node - simpler
    ● Bypasses the SQL interface, parser and optimizer - faster writes
    [Diagram labels: Application, Bulk Data Adapter, Write API, MariaDB Server, ColumnStore UM, ColumnStore PMs]
    1. For each row
      a. For each column: bulkInsert->setColumn
      b. bulkInsert->writeRow
    2. bulkInsert->commit
    * Buffer 100,000 rows by default
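The numbered flow above can be sketched without a ColumnStore installation. The Python class below is a hypothetical stand-in, not pymcsapi: it borrows the method names setColumn, writeRow and commit from the slide and counts flushes instead of sending data. That a full buffer is flushed as soon as it reaches the 100,000-row default is an assumption about how the buffering behaves.

```python
class BufferedBulkInsert:
    """Hypothetical stand-in that mimics the setColumn/writeRow/commit flow."""

    def __init__(self, column_count, buffer_rows=100_000):
        self.column_count = column_count
        self.buffer_rows = buffer_rows
        self._row = [None] * column_count
        self._buffer = []
        self.flushes = 0
        self.rows_sent = 0

    def setColumn(self, index, value):
        self._row[index] = value

    def writeRow(self):
        self._buffer.append(tuple(self._row))
        self._row = [None] * self.column_count
        if len(self._buffer) >= self.buffer_rows:
            self._flush()

    def commit(self):
        if self._buffer:
            self._flush()

    def _flush(self):
        # A real client would send the buffered rows to the PMs here.
        self.rows_sent += len(self._buffer)
        self.flushes += 1
        self._buffer.clear()


def load(rows, column_count, buffer_rows=100_000):
    """Step 1: set each column and write each row. Step 2: commit."""
    bulk = BufferedBulkInsert(column_count, buffer_rows)
    for row in rows:
        for index, value in enumerate(row):
            bulk.setColumn(index, value)
        bulk.writeRow()
    bulk.commit()
    return bulk


if __name__ == "__main__":
    result = load(((i, 1000 - i) for i in range(250_000)), column_count=2)
    print(result.rows_sent, "rows in", result.flushes, "flushes")
```

Buffering is what lets the application call writeRow once per row while the data still travels in large batches.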
  • 54. PURPOSE-BUILT STORAGE: MYROCKS
    ● SSD optimized: space, writes and lifetime
    ● Writes: trades random IO on writes for random IO on reads
    ● Storage: does not use a fixed page size (InnoDB is sector aligned: 4KB)
    ● Storage: has smaller metadata for primary key indexes
      ○ InnoDB: 13 bytes, and not compressed
      ○ MyRocks: 8 bytes + zero filling + prefix key encoding, then compressed
  • 56. SCHEMA EVOLUTION: INVISIBLE COLUMNS

    CREATE TABLE users(
      id INT PRIMARY KEY,
      name VARCHAR(50),
      bio TEXT(2000) COMPRESSED,
      secret VARCHAR(10) INVISIBLE);

    SQL Server = HIDDEN (period columns only), DB2 = IMPLICITLY HIDDEN, ORACLE = INVISIBLE
  • 59. PURPOSE-BUILT STORAGE: WHY?
    Purpose-built database
    ● Relational Database (mixed)
    ● Wide-column Database (write-intensive)
    ● Document Database (scalable)
    ● Columnar Database (analytical)
  • 60. "We are running more than 25 billion queries an hour on MariaDB... the query patterns change every hour."
    - Tim Yim, Director of Operations, ServiceNow