SlideShare a Scribd company logo
Big Data Analytics
with MariaDB AX
Maria Luisa Raviol
Senior Sales Engineer
EMEA
Agenda
• The Task - Analytics – Why and what
• The Requirements – What do we need for analytics
• The Solution – Column Based Storage
• The Product – MariaDB AX and MariaDB ColumnStore
• The Uses – MariaDB ColumnStore in action
Why Analytics ?
• Get the most value of your data asset
• Faster Better decision making process
• Cost reduction
• New products and services
What is likely
to happen?
Why is it
happening?
Types of analytics
What is
happening?
What should I
do about it?
Descriptive: What happened ?
● Reports
○ Sales Report
○ Expense summary
● Ad-hoc requests to analyst
Diagnostics: Why did it happen
• Aggregates: aggregate measure over one or
more dimension
– Find total sales
– Top five product ranked by sales
• Roll-ups: Aggregate at different levels of
dimension hierarchy
– given total sales by city, roll-up to get sales by
state
• Drill-down: Inverse of roll-ups
– given total sales by state, drill-down to get total
by city
• Slicing and Dicing:
– Equality and range selections on one or more
dimensions
Predictive: What is likely to happen
• Sales Prediction
– Analyze data to identify trends, spot
weakness or determine conditions
among broader data sets for making
decisions about the future
• Targeted marketing
– what is likelihood of a customer buying
a particular product based on past
buying behavior
Big Data Analytics Use Cases
By industry
Finance
Identify trade patterns
Detect fraud and anomolies
Predict trading outcomes
Manufacturing
Simulations to improve design/yield
Detect production anomolies
Predict machine failures (sensor data)
Telecom
Behavioral analysis of customer calls
Network analysis (perf and reliability)
Healthcare
Find genetic profiles/matches
Analyze health vs spending
Predict viral oubreaks
The analytics workload
• Deals with data from a high level perspective
• Handles data in large groups of rows
– SELECTs data by date, customer location, product id etc.
– Data is loaded in batch or streamed in
– Data is mostly just INSERTed
• Dealing with individual data items is usually ineffective
• Data structures are optimized for analytics use and performance
• Data is sometimes purged, but just as often not
• Contains structured, semi-structured and sometimes unstructured data
• Data often comes from many different sources, internal and external
• Queries are ad-hoc, largely
• Transactions and ACID requirements are relaxed
Analytics database requirements
• Fast access to large amounts of data
• Scalable as data grows over time
– Analytics requirements increasing
– Regulatory requirements
– New data sources are added
• Load performance must be fast, scalable and predictable
• Data loading should be very flexible due to the different sources of data
– Some data loaded in batch, other is streamed
• Query performance also need to be scalable
• Data compression is a requirement
– Data size constraints, as well as read performance from disk
In summary, what do we need
• Something that can compress data A LOT
• Something that can be written to with fast and predictable performance
• Something that doesn't necessarily support transactions
– It doesn't hurt, but performance is so much more important
• Something that can support analytics queries
– Ad-hoc queries
– Aggregate queries
• Something that can scale as data grows
• Something that can still have a level of high availability
• Something that works with analytics tools, like Tableau, R etc.
Existing Approaches
Limited real time analytics
Slow releases of product innovation
Expensive hardware and software
Data Warehouses
Hadoop / NoSQL
LIMITED SQL SUPPORT
DIFFICULT TO
INSTALL/MANAGE
LIMITED TALENT POOL
DATA LAKE W/ NO DATA
MANAGEMENT
Hard to use
The Solution
Distributed Column based storage
Row-oriented vs. Column-oriented format
• Row oriented
– Rows stored sequentially in
a file
– Scans through every record
row by row
• Column oriented:
– Each column is stored in a
separate file
– Scans only the relevant
columns
ID Fname Lname State Zip Phone Age Sex
1 Bugs Bunny NY 11217 (718) 938-3235 34 M
2 Yosemite Sam CA 95389 (209) 375-6572 52 M
3 Daffy Duck NY 10013 (212) 227-1810 35 M
4 Elmer Fudd ME 04578 (207) 882-7323 43 M
5 Witch Hazel MA 01970 (978) 744-0991 57 F
ID
1
2
3
4
5
Fname
Bugs
Yosemite
Daffy
Elmer
Witch
Lname
Bunny
Sam
Duck
Fudd
Hazel
State
NY
CA
NY
ME
MA
Zip
11217
95389
10013
04578
01970
Phone
(718) 938-3235
(209) 375-6572
(212) 227-1810
(207) 882-7323
(978) 744-0991
Age
34
52
35
43
57
Sex
M
M
M
M
F
SELECT Fname FROM People WHERE State = 'NY'
Single-Row Operations - Insert
Row oriented:
new rows appended to
the end.
Column oriented:
new value added to
each file.
Key Fname Lname State Zip Phone Age Sex
1 Bugs Bunny NY 11217 (718) 938-3235 34 M
2 Yosemite Sam CA 95389 (209) 375-6572 52 M
3 Daffy Duck NY 10013 (212) 227-1810 35 M
4 Elmer Fudd ME 04578 (207) 882-7323 43 M
5 Witch Hazel MA 01970 (978) 744-0991 57 F
6 Marvin Martian CA 91602 (818) 761-9964 26 M
Key
1
2
3
4
5
Fname
Bugs
Yosemite
Daffy
Elmer
Witch
Lname
Bunny
Sam
Duck
Fudd
Hazel
State
NY
CA
NY
ME
MA
Zip
11217
95389
10013
04578
01970
Phone
(718) 938-3235
(209) 375-6572
(212) 227-1810
(207) 882-7323
(978) 744-0991
Age
34
52
35
43
57
Sex
M
M
M
M
F
6 Marvin Martian CA 91602 (818) 761-9964 26 M
Columnar insert not efficient for singleton insertions (OLTP). Batch loads touches row vs.
column. Batch load on column-oriented is faster (compression, no indexes).
Single-Row Operations - Update
Row oriented:
Update 100% of rows
means change 100%
of blocks on disk.
Column oriented:
Just update the blocks
needed to be updated
Key Fname Lname State Zip Phone Age Sex
1 Bugs Bunny NY 11217 (718) 938-3235 34 M
2 Yosemite Sam CA 95389 (209) 375-6572 52 M
3 Daffy Duck NY 10013 (212) 227-1810 35 M
4 Elmer Fudd ME 04578 (207) 882-7323 43 M
5 Witch Hazel MA 01970 (978) 744-0991 57 F
Key
1
2
3
4
5
Fname
Bugs
Yosemite
Daffy
Elmer
Witch
Lname
Bunny
Sam
Duck
Fudd
Hazel
State
NY
CA
NY
ME
MA
Zip
11217
95389
10013
04578
01970
Phone
(718) 938-3235
(209) 375-6572
(212) 227-1810
(207) 882-7323
(978) 744-0991
Age
34
52
35
43
57
Sex
M
M
M
M
F
Single-Row Operations - Delete
Row oriented:
new rows deleted
Column oriented:
value deleted from
each file
Key Fname Lname State Zip Phone Age Sex
1 Bugs Bunny NY 11217 (718) 938-3235 34 M
2 Yosemite Sam CA 95389 (209) 375-6572 52 M
3 Daffy Duck NY 10013 (212) 227-1810 35 M
4 Elmer Fudd ME 04578 (207) 882-7323 43 M
5 Witch Hazel MA 01970 (978) 744-0991 57 F
6 Marvin Martian CA 91602 (818) 761-9964 26 M
Key
1
2
3
4
5
Fname
Bugs
Yosemite
Daffy
Elmer
Witch
Lname
Bunny
Sam
Duck
Fudd
Hazel
State
NY
CA
NY
ME
MA
Zip
11217
95389
10013
04578
01970
Phone
(718) 938-3235
(209) 375-6572
(212) 227-1810
(207) 882-7323
(978) 744-0991
Age
34
52
35
43
57
Sex
M
M
M
M
F
6 Marvin Martian CA 91602 (818) 761-9964 26 M
Changing the table structure
Row oriented:
requires rebuilding of
the whole table
Column oriented:
Create new file for the
new column
Column-oriented is very flexible for adding columns, no need for a full rebuild
required with it.
Key Fname Lname State Zip Phone Age Sex Active
1 Bugs Bunny NY 11217 (718) 938-3235 34 M Y
2 Yosemite Sam CA 95389 (209) 375-6572 52 M N
3 Daffy Duck NY 10013 (212) 227-1810 35 M N
4 Elmer Fudd ME 04578 (207) 882-7323 43 M Y
5 Witch Hazel MA 01970 (978) 744-0991 57 F N
Key
1
2
3
4
5
Fname
Bugs
Yosemite
Daffy
Elmer
Witch
Lname
Bunny
Sam
Duck
Fudd
Hazel
State
NY
CA
NY
ME
MA
Zip
11217
95389
10013
04578
01970
Phone
(718) 938-3235
(209) 375-6572
(212) 227-1810
(207) 882-7323
(978) 744-0991
Age
34
52
35
43
57
Sex
M
M
M
M
F
Active
Y
N
N
Y
N
MariaDB ColumnStore Architecture
MariaDB ColumnStore
High performance columnar storage engine that supports a wide variety
of analytical use cases in highly scalable distributed environments
Parallel query
processing for distributed
environments
Faster, More
Efficient Queries
Single Interface for
OLTP and analytics
Easy to Manage and
Scale
Easier Enterprise
Analytics
Power of SQL and
Freedom of Open
Source to Big Data
Analytics
Better Price
Performance
Easier Enterprise
Analytics
ANSI SQL
Single SQL Front-end
• Use a single SQL interface for analytics and OLTP
• Leverage MariaDB Security features - Encryption for
data in motion , role based access and auditability
Full ANSI SQL
• No more SQL “like” query
• Support complex join, aggregation and window
function
Easy to manage and scale
• Eliminate needs for indexes and views
• Automated horizontal/vertical partitioning
• Linear scalable by adding new nodes as data grows
• Out of box connection with BI tools
Faster, More
Efficient Queries
Optimized for Columnar storage
• Columnar storage reduces disk I/O
• Blazing fast read-intensive workload
• Ultra fast data import
Parallel distributed query execution
• Distributed queries into series of parallel operations
• Fully parallel high speed data ingestion
Highly available analytic environment
• Built-in Redundancy
• Automatic fail-over
Parallel
Query Processing
MariaDB ColumnStore Architecture
Columnar Distributed Data Storage
User Connections
User Module nUser Module 1
Performance
Module n
Performance
Module 2
Performance
Module 1
MariaDB
Front End
Query Engine
User Module
Processes SQL Requests
Performance Module
Distributed Processing Engine
Storage Architecture
• Columnar storage
– Each column stored as separate file
– No index management for query
performance tuning
– Online Schema changes: Add new column
without impacting running queries
• Automatic horizontal partitioning
– Logical partition every 8 Million rows
– In memory metadata of partition min and max
– Query engine performs partition elimination.
– No partition management for query
performance tuning
• Compression
– Accelerate decompression rate
– Reduce I/O for compressed blocks
Column 1
Extent 1 (8 million rows, 8MB 64MB)
Extent 2 (8 million rows)
Extent M (8 million rows)
Column 2 Column 3 ... Column N
Data automatically arranged by
• Column – Acts as Vertical Partitioning
• Extents – Acts as horizontal partition
Vertical
Partition
Horizontal
Partition
...
Vertical
Partition
Vertical
Partition
Vertical
Partition
Horizontal
Partition
Horizontal
Partition
High Performance Query Processing
Horizontal
Partition:
8 Million Rows
Extent 2
Horizontal
Partition:
8 Million Rows
Extent 3
Horizontal
Partition:
8 Million Rows
Extent 1
Storage Architecture reduces I/O
• Only touch column files
that are in filter, projection,
group by, and join conditions
• Eliminate disk block touches
to partitions outside filter
and join conditions
Extent 1:
ShipDate: 2016-01-12 - 2016-03-05
Extent 2:
ShipDate: 2016-03-05 - 2016-09-23
Extent 3:
ShipDate: 2016-09-24 - 2017-01-06
SELECT Item, sum(Quantity) FROM Orders
WHERE ShipDate between ‘2016-01-01’ and ‘2016-01-31’
GROUP BY Item
Id OrderId Line Item Quantity Price Supplier ShipDate ShipMode
1 1 1 Laptop 5 1000 Dell 2016-01-12 G
2 1 2 Monitor 5 200 LG 2016-01-13 G
3 2 1 Mouse 1 20 Logitech 2016-02-05 M
4 3 1 Laptop 3 1600 Apple 2016-01-31 P
... ... ... ... ... ... ... ... ...
8M 2016-03-05
8M+1 2016-03-05
... ... ... ... ... ... ... ... ...
16M 2016-09-23
16M+1 2016-09-24
... ... ... ... ... ... ... ... ...
24M 2017-01-06
ELIMINATED PARTITION
ELIMINATED PARTITION
MariaDB ColumnStore
Analytics Use Cases
Healthcare / Life Science Industry
Genome analysis
• In-depth genome research for the dairy industry to improve production of milk and protein.
• Fast data load for large amount of genome dataset (DNA data for 7billion cows in US - 20GB per load)
Healthcare spending analysis
• Analyze 3TB of US health care spending for 155 conditions with 7 years of historical data
• Used sankey diagram, treemap, and pyramid chart to analyze trends by age, sex, type of care, and condition
Why MariaDB ColumnStore
• Strong security features including role based data access and audit plug in
• MPP architecture handles analytics on big data with high speed
• Easy to analyze archived data with SQL based analytics
• Does not require DBA to index or partition data
Telecommunication Industry
Customer behavior analysis
• Analyze call data record to segment customers based on their behavior
• Data-driven analysis for customer satisfaction
• Create behavioral based upsell or cross-sell opportunity
Call data analysis
• Data size: 6TB
• Ingest 1.5 million rows of logs per day with 30million texts and 3million calls
• Call and network quality analysis
• Provide higher quality customer services based on data
Why MariaDB ColumnStore
• ColumnStore support time based partitioning and time-series analysis
• Fast data load for real-time analytics
• MPP architecture handles analytics on big data with high speed
• Easy to analyze the archived data with SQL based analytics
Writing Data
ETL
32
Bulk Data Load
cpimport, LOAD DATA INFILE
Bulk Data Export
mysql client, odbc, jdbc
Integration with MariaDB
ColumnStore cpimport and sql
interface
Bulk Data Load: cpimport
• Fastest way to load data into MariaDB ColumnStore
• Load data from CSV file
cpimport dbName tblName [loadFile]
• Load data from Standard Input
mysql -e 'select * from source_table;' -N db2 | cpimport destination_db
destination_tbl -s 't‘
• Load data from Binary Source file
cpimport -I1 mydb mytable sourcefile.bin
• Multiple tables in can be loaded in parallel by launching multiple jobs
• Read queries continue without being blocked
• Successful cpimport is auto-committed
• In case of errors, entire load is rolled back
Traditional way of
importing data into
any MariaDB storage
engine table
Bulk Data Load:
LOAD DATA INFILE
Up to 2 times slower
than cpimport for
large size imports
mysql> load data infile '/tmp/
outfile1.txt' into table destinationTable;
Query OK, 9765625 rows affected
(2 min 20.01 sec)
Records: 9765625 Deleted:
0 Skipped: 0 Warnings: 0
Either success or
error operation can
be rolled back
Sizing
Minimum Spec
UM
4 core,
32 G RAM PM
4 core,
16 G RAM
Typical Server spec
PM
8 core 64G RAM
UM
8 core, 264G RAM
Data Storage
External Data Volumes
• Maximum 2 data volume per IO
channel per PM node server
• up to 2TB on the disk per data
volume ≈ Max 4 TB per PM node
Local disk
Up to 2TB on the disk per
PM node server
DETAILED SIZING GUIDE
based on data size
and workload
Thank you
Ad

More Related Content

More from Kangaroot (20)

EDB Postgres in Public Sector
EDB Postgres in Public SectorEDB Postgres in Public Sector
EDB Postgres in Public Sector
Kangaroot
 
Deploying NGINX in Cloud Native Kubernetes
Deploying NGINX in Cloud Native KubernetesDeploying NGINX in Cloud Native Kubernetes
Deploying NGINX in Cloud Native Kubernetes
Kangaroot
 
Cloud demystified, what remains after the fog has lifted.
Cloud demystified, what remains after the fog has lifted.  Cloud demystified, what remains after the fog has lifted.
Cloud demystified, what remains after the fog has lifted.
Kangaroot
 
Zimbra at Kangaroot / OPEN{virtual}
Zimbra at Kangaroot / OPEN{virtual}Zimbra at Kangaroot / OPEN{virtual}
Zimbra at Kangaroot / OPEN{virtual}
Kangaroot
 
NGINX Controller: faster deployments, fewer headaches
NGINX Controller: faster deployments, fewer headachesNGINX Controller: faster deployments, fewer headaches
NGINX Controller: faster deployments, fewer headaches
Kangaroot
 
Kangaroot EDB Webinar Best Practices in Security with PostgreSQL
Kangaroot EDB Webinar Best Practices in Security with PostgreSQLKangaroot EDB Webinar Best Practices in Security with PostgreSQL
Kangaroot EDB Webinar Best Practices in Security with PostgreSQL
Kangaroot
 
Do you want to start with OpenShift but don’t have the manpower, knowledge, e...
Do you want to start with OpenShift but don’t have the manpower, knowledge, e...Do you want to start with OpenShift but don’t have the manpower, knowledge, e...
Do you want to start with OpenShift but don’t have the manpower, knowledge, e...
Kangaroot
 
Red Hat multi-cluster management & what's new in OpenShift
Red Hat multi-cluster management & what's new in OpenShiftRed Hat multi-cluster management & what's new in OpenShift
Red Hat multi-cluster management & what's new in OpenShift
Kangaroot
 
There is no such thing as “Vanilla Kubernetes”
There is no such thing as “Vanilla Kubernetes”There is no such thing as “Vanilla Kubernetes”
There is no such thing as “Vanilla Kubernetes”
Kangaroot
 
Elastic SIEM (Endpoint Security)
Elastic SIEM (Endpoint Security)Elastic SIEM (Endpoint Security)
Elastic SIEM (Endpoint Security)
Kangaroot
 
Hashicorp Vault - OPEN Public Sector
Hashicorp Vault - OPEN Public SectorHashicorp Vault - OPEN Public Sector
Hashicorp Vault - OPEN Public Sector
Kangaroot
 
Kangaroot - Bechtle kadercontracten
Kangaroot - Bechtle kadercontractenKangaroot - Bechtle kadercontracten
Kangaroot - Bechtle kadercontracten
Kangaroot
 
Red Hat Enterprise Linux 8
Red Hat Enterprise Linux 8Red Hat Enterprise Linux 8
Red Hat Enterprise Linux 8
Kangaroot
 
Kangaroot open shift best practices - straight from the battlefield
Kangaroot open shift best practices - straight from the battlefieldKangaroot open shift best practices - straight from the battlefield
Kangaroot open shift best practices - straight from the battlefield
Kangaroot
 
Kubecontrol - managed Kubernetes by Kangaroot
Kubecontrol - managed Kubernetes by KangarootKubecontrol - managed Kubernetes by Kangaroot
Kubecontrol - managed Kubernetes by Kangaroot
Kangaroot
 
OpenShift 4, the smarter Kubernetes platform
OpenShift 4, the smarter Kubernetes platformOpenShift 4, the smarter Kubernetes platform
OpenShift 4, the smarter Kubernetes platform
Kangaroot
 
10 - MongoDB
10 - MongoDB10 - MongoDB
10 - MongoDB
Kangaroot
 
9 - Making Sense of Containers in the Microsoft Cloud
9 - Making Sense of Containers in the Microsoft Cloud9 - Making Sense of Containers in the Microsoft Cloud
9 - Making Sense of Containers in the Microsoft Cloud
Kangaroot
 
8 - OpenShift - A look at a container platform: what's in the box
8 - OpenShift - A look at a container platform: what's in the box8 - OpenShift - A look at a container platform: what's in the box
8 - OpenShift - A look at a container platform: what's in the box
Kangaroot
 
7 - Monitoring Kubernetes with Elastic
7 - Monitoring Kubernetes with Elastic7 - Monitoring Kubernetes with Elastic
7 - Monitoring Kubernetes with Elastic
Kangaroot
 
EDB Postgres in Public Sector
EDB Postgres in Public SectorEDB Postgres in Public Sector
EDB Postgres in Public Sector
Kangaroot
 
Deploying NGINX in Cloud Native Kubernetes
Deploying NGINX in Cloud Native KubernetesDeploying NGINX in Cloud Native Kubernetes
Deploying NGINX in Cloud Native Kubernetes
Kangaroot
 
Cloud demystified, what remains after the fog has lifted.
Cloud demystified, what remains after the fog has lifted.  Cloud demystified, what remains after the fog has lifted.
Cloud demystified, what remains after the fog has lifted.
Kangaroot
 
Zimbra at Kangaroot / OPEN{virtual}
Zimbra at Kangaroot / OPEN{virtual}Zimbra at Kangaroot / OPEN{virtual}
Zimbra at Kangaroot / OPEN{virtual}
Kangaroot
 
NGINX Controller: faster deployments, fewer headaches
NGINX Controller: faster deployments, fewer headachesNGINX Controller: faster deployments, fewer headaches
NGINX Controller: faster deployments, fewer headaches
Kangaroot
 
Kangaroot EDB Webinar Best Practices in Security with PostgreSQL
Kangaroot EDB Webinar Best Practices in Security with PostgreSQLKangaroot EDB Webinar Best Practices in Security with PostgreSQL
Kangaroot EDB Webinar Best Practices in Security with PostgreSQL
Kangaroot
 
Do you want to start with OpenShift but don’t have the manpower, knowledge, e...
Do you want to start with OpenShift but don’t have the manpower, knowledge, e...Do you want to start with OpenShift but don’t have the manpower, knowledge, e...
Do you want to start with OpenShift but don’t have the manpower, knowledge, e...
Kangaroot
 
Red Hat multi-cluster management & what's new in OpenShift
Red Hat multi-cluster management & what's new in OpenShiftRed Hat multi-cluster management & what's new in OpenShift
Red Hat multi-cluster management & what's new in OpenShift
Kangaroot
 
There is no such thing as “Vanilla Kubernetes”
There is no such thing as “Vanilla Kubernetes”There is no such thing as “Vanilla Kubernetes”
There is no such thing as “Vanilla Kubernetes”
Kangaroot
 
Elastic SIEM (Endpoint Security)
Elastic SIEM (Endpoint Security)Elastic SIEM (Endpoint Security)
Elastic SIEM (Endpoint Security)
Kangaroot
 
Hashicorp Vault - OPEN Public Sector
Hashicorp Vault - OPEN Public SectorHashicorp Vault - OPEN Public Sector
Hashicorp Vault - OPEN Public Sector
Kangaroot
 
Kangaroot - Bechtle kadercontracten
Kangaroot - Bechtle kadercontractenKangaroot - Bechtle kadercontracten
Kangaroot - Bechtle kadercontracten
Kangaroot
 
Red Hat Enterprise Linux 8
Red Hat Enterprise Linux 8Red Hat Enterprise Linux 8
Red Hat Enterprise Linux 8
Kangaroot
 
Kangaroot open shift best practices - straight from the battlefield
Kangaroot open shift best practices - straight from the battlefieldKangaroot open shift best practices - straight from the battlefield
Kangaroot open shift best practices - straight from the battlefield
Kangaroot
 
Kubecontrol - managed Kubernetes by Kangaroot
Kubecontrol - managed Kubernetes by KangarootKubecontrol - managed Kubernetes by Kangaroot
Kubecontrol - managed Kubernetes by Kangaroot
Kangaroot
 
OpenShift 4, the smarter Kubernetes platform
OpenShift 4, the smarter Kubernetes platformOpenShift 4, the smarter Kubernetes platform
OpenShift 4, the smarter Kubernetes platform
Kangaroot
 
10 - MongoDB
10 - MongoDB10 - MongoDB
10 - MongoDB
Kangaroot
 
9 - Making Sense of Containers in the Microsoft Cloud
9 - Making Sense of Containers in the Microsoft Cloud9 - Making Sense of Containers in the Microsoft Cloud
9 - Making Sense of Containers in the Microsoft Cloud
Kangaroot
 
8 - OpenShift - A look at a container platform: what's in the box
8 - OpenShift - A look at a container platform: what's in the box8 - OpenShift - A look at a container platform: what's in the box
8 - OpenShift - A look at a container platform: what's in the box
Kangaroot
 
7 - Monitoring Kubernetes with Elastic
7 - Monitoring Kubernetes with Elastic7 - Monitoring Kubernetes with Elastic
7 - Monitoring Kubernetes with Elastic
Kangaroot
 

Recently uploaded (20)

Computer Systems Quiz Presentation in Purple Bold Style (4).pdf
Computer Systems Quiz Presentation in Purple Bold Style (4).pdfComputer Systems Quiz Presentation in Purple Bold Style (4).pdf
Computer Systems Quiz Presentation in Purple Bold Style (4).pdf
fizarcse
 
Who's choice? Making decisions with and about Artificial Intelligence, Keele ...
Who's choice? Making decisions with and about Artificial Intelligence, Keele ...Who's choice? Making decisions with and about Artificial Intelligence, Keele ...
Who's choice? Making decisions with and about Artificial Intelligence, Keele ...
Alan Dix
 
Integrating FME with Python: Tips, Demos, and Best Practices for Powerful Aut...
Integrating FME with Python: Tips, Demos, and Best Practices for Powerful Aut...Integrating FME with Python: Tips, Demos, and Best Practices for Powerful Aut...
Integrating FME with Python: Tips, Demos, and Best Practices for Powerful Aut...
Safe Software
 
DNF 2.0 Implementations Challenges in Nepal
DNF 2.0 Implementations Challenges in NepalDNF 2.0 Implementations Challenges in Nepal
DNF 2.0 Implementations Challenges in Nepal
ICT Frame Magazine Pvt. Ltd.
 
May Patch Tuesday
May Patch TuesdayMay Patch Tuesday
May Patch Tuesday
Ivanti
 
Understanding SEO in the Age of AI.pdf
Understanding SEO in the Age of AI.pdfUnderstanding SEO in the Age of AI.pdf
Understanding SEO in the Age of AI.pdf
Fulcrum Concepts, LLC
 
Scientific Large Language Models in Multi-Modal Domains
Scientific Large Language Models in Multi-Modal DomainsScientific Large Language Models in Multi-Modal Domains
Scientific Large Language Models in Multi-Modal Domains
syedanidakhader1
 
Design pattern talk by Kaya Weers - 2025 (v2)
Design pattern talk by Kaya Weers - 2025 (v2)Design pattern talk by Kaya Weers - 2025 (v2)
Design pattern talk by Kaya Weers - 2025 (v2)
Kaya Weers
 
AI and Meaningful Work by Pablo Fernández Vallejo
AI and Meaningful Work by Pablo Fernández VallejoAI and Meaningful Work by Pablo Fernández Vallejo
AI and Meaningful Work by Pablo Fernández Vallejo
UXPA Boston
 
RFID in Supply chain management and logistics.pdf
RFID in Supply chain management and logistics.pdfRFID in Supply chain management and logistics.pdf
RFID in Supply chain management and logistics.pdf
EnCStore Private Limited
 
Bridging AI and Human Expertise: Designing for Trust and Adoption in Expert S...
Bridging AI and Human Expertise: Designing for Trust and Adoption in Expert S...Bridging AI and Human Expertise: Designing for Trust and Adoption in Expert S...
Bridging AI and Human Expertise: Designing for Trust and Adoption in Expert S...
UXPA Boston
 
Mastering Testing in the Modern F&B Landscape
Mastering Testing in the Modern F&B LandscapeMastering Testing in the Modern F&B Landscape
Mastering Testing in the Modern F&B Landscape
marketing943205
 
machines-for-woodworking-shops-en-compressed.pdf
machines-for-woodworking-shops-en-compressed.pdfmachines-for-woodworking-shops-en-compressed.pdf
machines-for-woodworking-shops-en-compressed.pdf
AmirStern2
 
Developing Product-Behavior Fit: UX Research in Product Development by Krysta...
Developing Product-Behavior Fit: UX Research in Product Development by Krysta...Developing Product-Behavior Fit: UX Research in Product Development by Krysta...
Developing Product-Behavior Fit: UX Research in Product Development by Krysta...
UXPA Boston
 
Building Connected Agents: An Overview of Google's ADK and A2A Protocol
Building Connected Agents:  An Overview of Google's ADK and A2A ProtocolBuilding Connected Agents:  An Overview of Google's ADK and A2A Protocol
Building Connected Agents: An Overview of Google's ADK and A2A Protocol
Suresh Peiris
 
Best 10 Free AI Character Chat Platforms
Best 10 Free AI Character Chat PlatformsBest 10 Free AI Character Chat Platforms
Best 10 Free AI Character Chat Platforms
Soulmaite
 
Is Your QA Team Still Working in Silos? Here's What to Do.
Is Your QA Team Still Working in Silos? Here's What to Do.Is Your QA Team Still Working in Silos? Here's What to Do.
Is Your QA Team Still Working in Silos? Here's What to Do.
marketing943205
 
Multi-Agent AI Systems: Architectures & Communication (MCP and A2A)
Multi-Agent AI Systems: Architectures & Communication (MCP and A2A)Multi-Agent AI Systems: Architectures & Communication (MCP and A2A)
Multi-Agent AI Systems: Architectures & Communication (MCP and A2A)
HusseinMalikMammadli
 
Cybersecurity Tools and Technologies - Microsoft Certificate
Cybersecurity Tools and Technologies - Microsoft CertificateCybersecurity Tools and Technologies - Microsoft Certificate
Cybersecurity Tools and Technologies - Microsoft Certificate
VICTOR MAESTRE RAMIREZ
 
SQL Database Design For Developers at PhpTek 2025.pptx
SQL Database Design For Developers at PhpTek 2025.pptxSQL Database Design For Developers at PhpTek 2025.pptx
SQL Database Design For Developers at PhpTek 2025.pptx
Scott Keck-Warren
 
Computer Systems Quiz Presentation in Purple Bold Style (4).pdf
Computer Systems Quiz Presentation in Purple Bold Style (4).pdfComputer Systems Quiz Presentation in Purple Bold Style (4).pdf
Computer Systems Quiz Presentation in Purple Bold Style (4).pdf
fizarcse
 
Who's choice? Making decisions with and about Artificial Intelligence, Keele ...
Who's choice? Making decisions with and about Artificial Intelligence, Keele ...Who's choice? Making decisions with and about Artificial Intelligence, Keele ...
Who's choice? Making decisions with and about Artificial Intelligence, Keele ...
Alan Dix
 
Integrating FME with Python: Tips, Demos, and Best Practices for Powerful Aut...
Integrating FME with Python: Tips, Demos, and Best Practices for Powerful Aut...Integrating FME with Python: Tips, Demos, and Best Practices for Powerful Aut...
Integrating FME with Python: Tips, Demos, and Best Practices for Powerful Aut...
Safe Software
 
May Patch Tuesday
May Patch TuesdayMay Patch Tuesday
May Patch Tuesday
Ivanti
 
Understanding SEO in the Age of AI.pdf
Understanding SEO in the Age of AI.pdfUnderstanding SEO in the Age of AI.pdf
Understanding SEO in the Age of AI.pdf
Fulcrum Concepts, LLC
 
Scientific Large Language Models in Multi-Modal Domains
Scientific Large Language Models in Multi-Modal DomainsScientific Large Language Models in Multi-Modal Domains
Scientific Large Language Models in Multi-Modal Domains
syedanidakhader1
 
Design pattern talk by Kaya Weers - 2025 (v2)
Design pattern talk by Kaya Weers - 2025 (v2)Design pattern talk by Kaya Weers - 2025 (v2)
Design pattern talk by Kaya Weers - 2025 (v2)
Kaya Weers
 
AI and Meaningful Work by Pablo Fernández Vallejo
AI and Meaningful Work by Pablo Fernández VallejoAI and Meaningful Work by Pablo Fernández Vallejo
AI and Meaningful Work by Pablo Fernández Vallejo
UXPA Boston
 
RFID in Supply chain management and logistics.pdf
RFID in Supply chain management and logistics.pdfRFID in Supply chain management and logistics.pdf
RFID in Supply chain management and logistics.pdf
EnCStore Private Limited
 
Bridging AI and Human Expertise: Designing for Trust and Adoption in Expert S...
Bridging AI and Human Expertise: Designing for Trust and Adoption in Expert S...Bridging AI and Human Expertise: Designing for Trust and Adoption in Expert S...
Bridging AI and Human Expertise: Designing for Trust and Adoption in Expert S...
UXPA Boston
 
Mastering Testing in the Modern F&B Landscape
Mastering Testing in the Modern F&B LandscapeMastering Testing in the Modern F&B Landscape
Mastering Testing in the Modern F&B Landscape
marketing943205
 
machines-for-woodworking-shops-en-compressed.pdf
machines-for-woodworking-shops-en-compressed.pdfmachines-for-woodworking-shops-en-compressed.pdf
machines-for-woodworking-shops-en-compressed.pdf
AmirStern2
 
Developing Product-Behavior Fit: UX Research in Product Development by Krysta...
Developing Product-Behavior Fit: UX Research in Product Development by Krysta...Developing Product-Behavior Fit: UX Research in Product Development by Krysta...
Developing Product-Behavior Fit: UX Research in Product Development by Krysta...
UXPA Boston
 
Building Connected Agents: An Overview of Google's ADK and A2A Protocol
Building Connected Agents:  An Overview of Google's ADK and A2A ProtocolBuilding Connected Agents:  An Overview of Google's ADK and A2A Protocol
Building Connected Agents: An Overview of Google's ADK and A2A Protocol
Suresh Peiris
 
Best 10 Free AI Character Chat Platforms
Best 10 Free AI Character Chat PlatformsBest 10 Free AI Character Chat Platforms
Best 10 Free AI Character Chat Platforms
Soulmaite
 
Is Your QA Team Still Working in Silos? Here's What to Do.
Is Your QA Team Still Working in Silos? Here's What to Do.Is Your QA Team Still Working in Silos? Here's What to Do.
Is Your QA Team Still Working in Silos? Here's What to Do.
marketing943205
 
Multi-Agent AI Systems: Architectures & Communication (MCP and A2A)
Multi-Agent AI Systems: Architectures & Communication (MCP and A2A)Multi-Agent AI Systems: Architectures & Communication (MCP and A2A)
Multi-Agent AI Systems: Architectures & Communication (MCP and A2A)
HusseinMalikMammadli
 
Cybersecurity Tools and Technologies - Microsoft Certificate
Cybersecurity Tools and Technologies - Microsoft CertificateCybersecurity Tools and Technologies - Microsoft Certificate
Cybersecurity Tools and Technologies - Microsoft Certificate
VICTOR MAESTRE RAMIREZ
 
SQL Database Design For Developers at PhpTek 2025.pptx
SQL Database Design For Developers at PhpTek 2025.pptxSQL Database Design For Developers at PhpTek 2025.pptx
SQL Database Design For Developers at PhpTek 2025.pptx
Scott Keck-Warren
 
Ad

Delivering fast, powerful and scalable analytics #OPEN18

  • 1. Big Data Analytics with MariaDB AX Maria Luisa Raviol Senior Sales Engineer EMEA
  • 2. Agenda • The Task - Analytics – Why and what • The Requirements – What do we need for analytics • The Solution – Column Based Storage • The Product – MariaDB AX and MariaDB ColumnStore • The Uses – MariaDB ColumnStore in action
  • 3. Why Analytics ? • Get the most value of your data asset • Faster Better decision making process • Cost reduction • New products and services
  • 4. What is likely to happen? Why is it happening? Types of analytics What is happening? What should I do about it?
  • 5. Descriptive: What happened ? ● Reports ○ Sales Report ○ Expense summary ● Ad-hoc requests to analyst
  • 6. Diagnostics: Why did it happen • Aggregates: aggregate measure over one or more dimension – Find total sales – Top five product ranked by sales • Roll-ups: Aggregate at different levels of dimension hierarchy – given total sales by city, roll-up to get sales by state • Drill-down: Inverse of roll-ups – given total sales by state, drill-down to get total by city • Slicing and Dicing: – Equality and range selections on one or more dimensions
  • 7. Predictive: What is likely to happen • Sales Prediction – Analyze data to identify trends, spot weakness or determine conditions among broader data sets for making decisions about the future • Targeted marketing – what is likelihood of a customer buying a particular product based on past buying behavior
  • 8. Big Data Analytics Use Cases By industry Finance Identify trade patterns Detect fraud and anomolies Predict trading outcomes Manufacturing Simulations to improve design/yield Detect production anomolies Predict machine failures (sensor data) Telecom Behavioral analysis of customer calls Network analysis (perf and reliability) Healthcare Find genetic profiles/matches Analyze health vs spending Predict viral oubreaks
  • 9. The analytics workload • Deals with data from a high level perspective • Handles data in large groups of rows – SELECTs data by date, customer location, product id etc. – Data is loaded in batch or streamed in – Data is mostly just INSERTed • Dealing with individual data items is usually ineffective • Data structures are optimized for analytics use and performance • Data is sometimes purged, but just as often not • Contains structured, semi-structured and sometimes unstructured data • Data often comes from many different sources, internal and external • Queries are ad-hoc, largely • Transactions and ACID requirements are relaxed
  • 10. Analytics database requirements • Fast access to large amounts of data • Scalable as data grows over time – Analytics requirements increasing – Regulatory requirements – New data sources are added • Load performance must be fast, scalable and predictable • Data loading should be very flexible due to the different sources of data – Some data loaded in batch, other is streamed • Query performance also need to be scalable • Data compression is a requirement – Data size constraints, as well as read performance from disk
  • 11. In summary, what do we need • Something that can compress data A LOT • Something that can be written to with fast and predictable performance • Something that doesn't necessarily support transactions – It doesn't hurt, but performance is so much more important • Something that can support analytics queries – Ad-hoc queries – Aggregate queries • Something that can scale as data grows • Something that can still have a level of high availability • Something that works with analytics tools, like Tableau, R etc.
  • 12. Existing Approaches Limited real time analytics Slow releases of product innovation Expensive hardware and software Data Warehouses Hadoop / NoSQL LIMITED SQL SUPPORT DIFFICULT TO INSTALL/MANAGE LIMITED TALENT POOL DATA LAKE W/ NO DATA MANAGEMENT Hard to use
  • 14. Row-oriented vs. Column-oriented format • Row oriented – Rows stored sequentially in a file – Scans through every record row by row • Column oriented: – Each column is stored in a separate file – Scans only the relevant columns ID Fname Lname State Zip Phone Age Sex 1 Bugs Bunny NY 11217 (718) 938-3235 34 M 2 Yosemite Sam CA 95389 (209) 375-6572 52 M 3 Daffy Duck NY 10013 (212) 227-1810 35 M 4 Elmer Fudd ME 04578 (207) 882-7323 43 M 5 Witch Hazel MA 01970 (978) 744-0991 57 F ID 1 2 3 4 5 Fname Bugs Yosemite Daffy Elmer Witch Lname Bunny Sam Duck Fudd Hazel State NY CA NY ME MA Zip 11217 95389 10013 04578 01970 Phone (718) 938-3235 (209) 375-6572 (212) 227-1810 (207) 882-7323 (978) 744-0991 Age 34 52 35 43 57 Sex M M M M F SELECT Fname FROM People WHERE State = 'NY'
  • 15. Single-Row Operations - Insert Row oriented: new rows appended to the end. Column oriented: new value added to each file. Key Fname Lname State Zip Phone Age Sex 1 Bugs Bunny NY 11217 (718) 938-3235 34 M 2 Yosemite Sam CA 95389 (209) 375-6572 52 M 3 Daffy Duck NY 10013 (212) 227-1810 35 M 4 Elmer Fudd ME 04578 (207) 882-7323 43 M 5 Witch Hazel MA 01970 (978) 744-0991 57 F 6 Marvin Martian CA 91602 (818) 761-9964 26 M Key 1 2 3 4 5 Fname Bugs Yosemite Daffy Elmer Witch Lname Bunny Sam Duck Fudd Hazel State NY CA NY ME MA Zip 11217 95389 10013 04578 01970 Phone (718) 938-3235 (209) 375-6572 (212) 227-1810 (207) 882-7323 (978) 744-0991 Age 34 52 35 43 57 Sex M M M M F 6 Marvin Martian CA 91602 (818) 761-9964 26 M Columnar insert not efficient for singleton insertions (OLTP). Batch loads touches row vs. column. Batch load on column-oriented is faster (compression, no indexes).
  • 16. Single-Row Operations - Update Row oriented: Update 100% of rows means change 100% of blocks on disk. Column oriented: Just update the blocks needed to be updated Key Fname Lname State Zip Phone Age Sex 1 Bugs Bunny NY 11217 (718) 938-3235 34 M 2 Yosemite Sam CA 95389 (209) 375-6572 52 M 3 Daffy Duck NY 10013 (212) 227-1810 35 M 4 Elmer Fudd ME 04578 (207) 882-7323 43 M 5 Witch Hazel MA 01970 (978) 744-0991 57 F Key 1 2 3 4 5 Fname Bugs Yosemite Daffy Elmer Witch Lname Bunny Sam Duck Fudd Hazel State NY CA NY ME MA Zip 11217 95389 10013 04578 01970 Phone (718) 938-3235 (209) 375-6572 (212) 227-1810 (207) 882-7323 (978) 744-0991 Age 34 52 35 43 57 Sex M M M M F
  • 17. Single-Row Operations - Delete Row oriented: new rows deleted Column oriented: value deleted from each file Key Fname Lname State Zip Phone Age Sex 1 Bugs Bunny NY 11217 (718) 938-3235 34 M 2 Yosemite Sam CA 95389 (209) 375-6572 52 M 3 Daffy Duck NY 10013 (212) 227-1810 35 M 4 Elmer Fudd ME 04578 (207) 882-7323 43 M 5 Witch Hazel MA 01970 (978) 744-0991 57 F 6 Marvin Martian CA 91602 (818) 761-9964 26 M Key 1 2 3 4 5 Fname Bugs Yosemite Daffy Elmer Witch Lname Bunny Sam Duck Fudd Hazel State NY CA NY ME MA Zip 11217 95389 10013 04578 01970 Phone (718) 938-3235 (209) 375-6572 (212) 227-1810 (207) 882-7323 (978) 744-0991 Age 34 52 35 43 57 Sex M M M M F 6 Marvin Martian CA 91602 (818) 761-9964 26 M
  • 18. Changing the table structure Row oriented: requires rebuilding of the whole table Column oriented: Create new file for the new column Column-oriented is very flexible for adding columns, no need for a full rebuild required with it. Key Fname Lname State Zip Phone Age Sex Active 1 Bugs Bunny NY 11217 (718) 938-3235 34 M Y 2 Yosemite Sam CA 95389 (209) 375-6572 52 M N 3 Daffy Duck NY 10013 (212) 227-1810 35 M N 4 Elmer Fudd ME 04578 (207) 882-7323 43 M Y 5 Witch Hazel MA 01970 (978) 744-0991 57 F N Key 1 2 3 4 5 Fname Bugs Yosemite Daffy Elmer Witch Lname Bunny Sam Duck Fudd Hazel State NY CA NY ME MA Zip 11217 95389 10013 04578 01970 Phone (718) 938-3235 (209) 375-6572 (212) 227-1810 (207) 882-7323 (978) 744-0991 Age 34 52 35 43 57 Sex M M M M F Active Y N N Y N
  • 20. MariaDB ColumnStore High performance columnar storage engine that supports a wide variety of analytical use cases in highly scalable distributed environments Parallel query processing for distributed environments Faster, More Efficient Queries Single Interface for OLTP and analytics Easy to Manage and Scale Easier Enterprise Analytics Power of SQL and Freedom of Open Source to Big Data Analytics Better Price Performance
  • 21. Easier Enterprise Analytics ANSI SQL Single SQL Front-end • Use a single SQL interface for analytics and OLTP • Leverage MariaDB Security features - Encryption for data in motion , role based access and auditability Full ANSI SQL • No more SQL “like” query • Support complex join, aggregation and window function Easy to manage and scale • Eliminate needs for indexes and views • Automated horizontal/vertical partitioning • Linear scalable by adding new nodes as data grows • Out of box connection with BI tools
  • 22. Faster, More Efficient Queries Optimized for Columnar storage • Columnar storage reduces disk I/O • Blazing fast read-intensive workload • Ultra fast data import Parallel distributed query execution • Distributed queries into series of parallel operations • Fully parallel high speed data ingestion Highly available analytic environment • Built-in Redundancy • Automatic fail-over Parallel Query Processing
  • 23. MariaDB ColumnStore Architecture Columnar Distributed Data Storage User Connections User Module nUser Module 1 Performance Module n Performance Module 2 Performance Module 1 MariaDB Front End Query Engine User Module Processes SQL Requests Performance Module Distributed Processing Engine
  • 24. Storage Architecture • Columnar storage – Each column stored as separate file – No index management for query performance tuning – Online Schema changes: Add new column without impacting running queries • Automatic horizontal partitioning – Logical partition every 8 Million rows – In memory metadata of partition min and max – Query engine performs partition elimination. – No partition management for query performance tuning • Compression – Accelerate decompression rate – Reduce I/O for compressed blocks Column 1 Extent 1 (8 million rows, 8MB 64MB) Extent 2 (8 million rows) Extent M (8 million rows) Column 2 Column 3 ... Column N Data automatically arranged by • Column – Acts as Vertical Partitioning • Extents – Acts as horizontal partition Vertical Partition Horizontal Partition ... Vertical Partition Vertical Partition Vertical Partition Horizontal Partition Horizontal Partition
  • 25. High Performance Query Processing Horizontal Partition: 8 Million Rows Extent 2 Horizontal Partition: 8 Million Rows Extent 3 Horizontal Partition: 8 Million Rows Extent 1 Storage Architecture reduces I/O • Only touch column files that are in filter, projection, group by, and join conditions • Eliminate disk block touches to partitions outside filter and join conditions Extent 1: ShipDate: 2016-01-12 - 2016-03-05 Extent 2: ShipDate: 2016-03-05 - 2016-09-23 Extent 3: ShipDate: 2016-09-24 - 2017-01-06 SELECT Item, sum(Quantity) FROM Orders WHERE ShipDate between ‘2016-01-01’ and ‘2016-01-31’ GROUP BY Item Id OrderId Line Item Quantity Price Supplier ShipDate ShipMode 1 1 1 Laptop 5 1000 Dell 2016-01-12 G 2 1 2 Monitor 5 200 LG 2016-01-13 G 3 2 1 Mouse 1 20 Logitech 2016-02-05 M 4 3 1 Laptop 3 1600 Apple 2016-01-31 P ... ... ... ... ... ... ... ... ... 8M 2016-03-05 8M+1 2016-03-05 ... ... ... ... ... ... ... ... ... 16M 2016-09-23 16M+1 2016-09-24 ... ... ... ... ... ... ... ... ... 24M 2017-01-06 ELIMINATED PARTITION ELIMINATED PARTITION
  • 27. Healthcare / Life Science Industry Genome analysis • In-depth genome research for the dairy industry to improve production of milk and protein. • Fast data load for large amount of genome dataset (DNA data for 7billion cows in US - 20GB per load) Healthcare spending analysis • Analyze 3TB of US health care spending for 155 conditions with 7 years of historical data • Used sankey diagram, treemap, and pyramid chart to analyze trends by age, sex, type of care, and condition Why MariaDB ColumnStore • Strong security features including role based data access and audit plug in • MPP architecture handles analytics on big data with high speed • Easy to analyze archived data with SQL based analytics • Does not require DBA to index or partition data
  • 28. Telecommunication Industry Customer behavior analysis • Analyze call data record to segment customers based on their behavior • Data-driven analysis for customer satisfaction • Create behavioral based upsell or cross-sell opportunity Call data analysis • Data size: 6TB • Ingest 1.5 million rows of logs per day with 30million texts and 3million calls • Call and network quality analysis • Provide higher quality customer services based on data Why MariaDB ColumnStore • ColumnStore support time based partitioning and time-series analysis • Fast data load for real-time analytics • MPP architecture handles analytics on big data with high speed • Easy to analyze the archived data with SQL based analytics
  • 31. Bulk Data Load cpimport, LOAD DATA INFILE Bulk Data Export mysql client, odbc, jdbc Integration with MariaDB ColumnStore cpimport and sql interface
  • 32. Bulk Data Load: cpimport • Fastest way to load data into MariaDB ColumnStore • Load data from CSV file cpimport dbName tblName [loadFile] • Load data from Standard Input mysql -e 'select * from source_table;' -N db2 | cpimport destination_db destination_tbl -s 't‘ • Load data from Binary Source file cpimport -I1 mydb mytable sourcefile.bin • Multiple tables in can be loaded in parallel by launching multiple jobs • Read queries continue without being blocked • Successful cpimport is auto-committed • In case of errors, entire load is rolled back
  • 33. Traditional way of importing data into any MariaDB storage engine table Bulk Data Load: LOAD DATA INFILE Up to 2 times slower than cpimport for large size imports mysql> load data infile '/tmp/ outfile1.txt' into table destinationTable; Query OK, 9765625 rows affected (2 min 20.01 sec) Records: 9765625 Deleted: 0 Skipped: 0 Warnings: 0 Either success or error operation can be rolled back
  • 34. Sizing Minimum Spec UM 4 core, 32 G RAM PM 4 core, 16 G RAM Typical Server spec PM 8 core 64G RAM UM 8 core, 264G RAM Data Storage External Data Volumes • Maximum 2 data volume per IO channel per PM node server • up to 2TB on the disk per data volume ≈ Max 4 TB per PM node Local disk Up to 2TB on the disk per PM node server DETAILED SIZING GUIDE based on data size and workload
  翻译: