Data Modeling In Cassandra
" Read Speed = Minimize Seeks To The Disk "
- Data model: the logical structure of a database; it fundamentally determines how data can be stored, organized and manipulated.
- Compare RDBMS vs Cassandra
RDBMS:
- Normalize the data: remove redundancy and duplicates, then fine-tune the schema
- References are made by keys (primary key, foreign key)
- Performance tuning by partitioning and indexes
- Data is stored in DB files and seek time is high because reads on disk are random: data is not stored sequentially, so each read costs about one seek
Cassandra:
- Denormalize data (because there is no concept of joins)
- Performance tuning is top-down: first find the query patterns, then build the data model around them
- Terminology: keyspace = database (logical grouping of tables); column family = table (the structure given to the data; defining that structure up front is optional, so a column family can be static or dynamic)
- Row size: about 64 KB for a row in an RDBMS, versus 2-4 GB for a row in a column family
- Good read/write performance because records are simply appended sequentially
3. Keyspace
- logical grouping of column families
- Has the following attributes:
a. Replication factor
b. Replication placement Strategy
b.1- Simple strategy (SimpleStrategy)
b.2- Network topology strategy (NetworkTopologyStrategy)
How are my data and replicas stored in the Cassandra ring?
- The first copy is placed by the partitioner, which is set at the cluster level (e.g. the random partitioner). Random partitioner: it divides data evenly across the nodes by hashing each key with MD5 (a 128-bit hash) and giving each node a range of hash values.
- The remaining replicas are placed by the replication placement strategy
- Simple strategy: assumes the cluster is a single rack and places all replicas in that rack
- Network topology strategy: assumes the network may have several data centers and each data center may have several racks
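The placement rules above can be sketched with a small in-memory model. This is a simplified illustration, not Cassandra's actual token code: the node names, the evenly spaced tokens and the helper functions are all assumptions made for the example, and only simple-strategy-style placement (next nodes around the ring) is modelled.

```python
import hashlib
from bisect import bisect_left

RING_SIZE = 2 ** 128  # MD5 produces a 128-bit hash


def token_for(key: str) -> int:
    """Hash a partition key to a position on the ring using MD5."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest(), "big")


def build_ring(nodes):
    """Assign evenly spaced tokens (an assumption); returns (token, node) pairs sorted by token."""
    step = RING_SIZE // len(nodes)
    return [(i * step, node) for i, node in enumerate(nodes)]


def replicas_for(key, ring, replication_factor):
    """First copy: the node owning the key's hash range.
    Other copies: the next nodes clockwise on the ring (simple-strategy style)."""
    tokens = [token for token, _ in ring]
    # First node whose token is >= the key's token, wrapping around the ring.
    start = bisect_left(tokens, token_for(key)) % len(ring)
    count = min(replication_factor, len(ring))
    return [ring[(start + i) % len(ring)][1] for i in range(count)]


# Hypothetical four-node cluster with replication factor 3.
ring = build_ring(["node-a", "node-b", "node-c", "node-d"])
print(replicas_for("station-42", ring, replication_factor=3))
```

The same key always hashes to the same token, so every client finds the same first node and the same replica set without any central lookup.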
Cassandra writes:
Examples Of Data Modeling:
1. Weather Station
create table temperature
(
weather_id text,
event_time timestamp,
temperature text,
PRIMARY KEY (weather_id, event_time)
);
Primary key:
1/ weather_id: partition key
2/ event_time: clustering column (sort order)
So Cassandra partitions by weather_id and orders the rows of each partition by event_time
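A rough way to picture "partition by weather_id, order by event_time" is a dictionary of sorted lists. The sketch below is only an in-memory analogy, not how Cassandra is implemented; the function names, station ids and readings are made up for illustration, and timestamps are ISO strings so that string order equals time order.

```python
from bisect import insort
from collections import defaultdict

# weather_id (partition key) -> rows kept sorted by event_time (clustering column)
temperature_table = defaultdict(list)


def insert_reading(weather_id, event_time, temperature):
    """Put the row in its partition and keep that partition sorted by event_time."""
    insort(temperature_table[weather_id], (event_time, temperature))


def readings_between(weather_id, start, end):
    """Look up one partition, then filter its already-sorted rows by time range."""
    rows = temperature_table.get(weather_id, [])
    return [row for row in rows if start <= row[0] <= end]


# Hypothetical readings, inserted out of time order on purpose.
insert_reading("station-1", "2018-02-01T09:00", "72F")
insert_reading("station-1", "2018-02-01T07:00", "70F")
insert_reading("station-2", "2018-02-01T08:00", "65F")
insert_reading("station-1", "2018-02-01T08:00", "71F")

print(temperature_table["station-1"])
print(readings_between("station-1", "2018-02-01T07:30", "2018-02-01T09:00"))
```

All readings for one station sit together and already in time order, which is why a time-range query for a single weather_id touches one partition and needs no sorting at read time.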
2. Stock Data
create table stock_ticks
(
symbol text,
date int,
trade timeuuid,
trade_details text,
PRIMARY KEY ((symbol, date), trade)
) with clustering order by (trade desc);
ex: composite partition key: (symbol, date)
Partitioned by the composite partition key and ordered by trade
select * from stock_ticks where symbol='x' and date=20180201 limit 5;
Result: the five latest trades, because Cassandra stores the rows of each partition in descending trade order
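The effect of the composite partition key plus descending clustering order can be mimicked in a few lines. This is an analogy under stated assumptions, not Cassandra code: plain integers stand in for the timeuuid trade column, and the helper names and sample ticks are hypothetical.

```python
from collections import defaultdict

# (symbol, date) composite partition key -> rows of (trade, trade_details), newest first
stock_ticks = defaultdict(list)


def insert_tick(symbol, date, trade, trade_details):
    """Add a row to its partition and keep the partition in descending trade order."""
    rows = stock_ticks[(symbol, date)]
    rows.append((trade, trade_details))
    rows.sort(key=lambda row: row[0], reverse=True)  # clustering order by (trade desc)


def latest_ticks(symbol, date, limit):
    """Like 'where symbol=? and date=? limit ?': read the head of one partition."""
    return stock_ticks.get((symbol, date), [])[:limit]


# Hypothetical ticks: eight trades on one day, one trade on the next day.
for n in range(1, 9):
    insert_tick("x", 20180201, n, f"trade #{n}")
insert_tick("x", 20180202, 99, "next day")

print([trade for trade, _ in latest_ticks("x", 20180201, limit=5)])  # [8, 7, 6, 5, 4]
```

Because the newest rows are stored first, "limit 5" just reads the first five rows of the partition; the next day's trades live in a different partition and are never touched.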