Data Modeling In Cassandra

" Read Speed = Minimize Seeks To The Disk "

  1. Data model: the logical structure of a database; it fundamentally determines how data can be stored, organized and manipulated.
  2. Comparing RDBMS vs. Cassandra

RDBMS:

  • Data is normalized: redundancy and duplicates are removed and the schema is fine-tuned.
  • Relationships are expressed through keys (primary key, foreign key).
  • Performance is tuned with partitioning and indexes.
  • Data is stored in database files that are not laid out sequentially, so reads are random and seek time is high (roughly one disk seek per read).

Cassandra:

  • Data is denormalized (there are no joins).
  • Performance tuning is top-down: first identify the query patterns, then model the data to serve them.
  • Keyspace corresponds to a database (a logical grouping of tables); column family corresponds to a table (the structure given to the data, although defining that structure up front is optional: column families can be static or dynamic).
  • Row size: about 64 KB for a row in an RDBMS versus 2-4 GB for a row in a column family.
  • Reads and writes perform well because records are simply appended sequentially.
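To make the seek argument concrete, here is a small Python sketch with toy data (all names hypothetical) that counts row reads as a rough stand-in for disk seeks. The normalized version answers "temperatures for a city" by joining readings to stations; the denormalized version copies the city into the partition key at write time, so a single partition read answers the query. It only illustrates the idea; real disk behavior also depends on caching and storage layout.

```python
from collections import defaultdict

# Hypothetical sample data: weather stations and their readings.
stations = {"ws1": "Tokyo", "ws2": "Osaka"}  # weather_id -> city
readings = [("ws1", 1, "21"), ("ws2", 1, "18"), ("ws1", 2, "22")]  # (weather_id, event_time, temperature)


def normalized_lookup(city):
    """RDBMS style: join readings to stations, counting one read per row touched."""
    reads, temps = 0, []
    for weather_id, _event_time, temp in readings:
        reads += 2  # read the reading row, then the joined station row
        if stations[weather_id] == city:
            temps.append(temp)
    return temps, reads


# Cassandra style: a table designed for the query "temperatures by city",
# with the city copied (denormalized) into the partition key at write time.
readings_by_city = defaultdict(list)
for weather_id, event_time, temp in readings:
    readings_by_city[stations[weather_id]].append(temp)


def denormalized_lookup(city):
    """One partition read answers the whole query."""
    return list(readings_by_city.get(city, [])), 1


print(normalized_lookup("Tokyo"))    # (['21', '22'], 6)
print(denormalized_lookup("Tokyo"))  # (['21', '22'], 1)
```

The price of the single read is duplication: every reading carries its city, and a second query pattern would need a second table.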

3. Keyspace 

  • A logical grouping of column families
  • It has the following attributes:

  a. Replication factor

  b. Replica placement strategy

    b.1 Simple strategy

    b.2 Network topology strategy
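In CQL, both keyspace attributes are given together in the `replication` map of `CREATE KEYSPACE`. The Python sketch below only renders such statements so the two strategies can be compared side by side; the helper `create_keyspace_cql`, the keyspace names and the data center names `dc1`/`dc2` are made up for illustration.

```python
def create_keyspace_cql(name, replication):
    """Hypothetical helper: render a CQL CREATE KEYSPACE statement.

    `replication` holds the keyspace attributes: the placement strategy ('class')
    and the replication factor(s).
    """
    # repr() quotes strings ('SimpleStrategy') and leaves integers bare (3),
    # which matches how CQL map literals are written.
    options = ", ".join(f"'{key}': {value!r}" for key, value in replication.items())
    return f"CREATE KEYSPACE {name} WITH replication = {{{options}}};"


# Simple strategy: one replication factor for the whole cluster.
print(create_keyspace_cql("weather", {"class": "SimpleStrategy", "replication_factor": 3}))
# Network topology strategy: one replication factor per data center
# ('dc1' and 'dc2' are hypothetical data center names).
print(create_keyspace_cql("stocks", {"class": "NetworkTopologyStrategy", "dc1": 3, "dc2": 2}))
```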

How are data and replicas stored in the Cassandra ring?

  1. The first copy is placed by the partitioner, which is configured at the cluster level, e.g. RandomPartitioner. RandomPartitioner makes sure data is divided evenly: it hashes each partition key with MD5 into a 128-bit token, and each node owns a range of tokens.
  2. The remaining replicas are placed by the replica placement strategy:
  • SimpleStrategy: assumes the cluster is a single rack and places the replicas on the next nodes clockwise around the ring.
  • NetworkTopologyStrategy: assumes the cluster may span several data centers, each with its own racks, and places replicas per data center, preferring different racks.
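The two placement steps can be sketched in Python. This is a simplified model, not Cassandra's code: the token function follows RandomPartitioner's scheme (MD5 digest read as a signed 128-bit integer, absolute value), and replica selection follows SimpleStrategy's walk clockwise around the ring. The four-node ring, its tokens and the function names are hypothetical; virtual nodes and NetworkTopologyStrategy's rack and data-center awareness are left out.

```python
import bisect
import hashlib


def random_partitioner_token(key: str) -> int:
    """Token in the style of RandomPartitioner: MD5 digest of the key, read as a
    signed 128-bit big-endian integer, absolute value (so 0 <= token <= 2**127)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return abs(int.from_bytes(digest, "big", signed=True))


def simple_strategy_replicas(token: int, ring: list, rf: int) -> list:
    """SimpleStrategy-style placement on a ring of (node_token, node_name) pairs sorted by token.

    The first replica goes to the first node whose token is >= the key's token
    (wrapping around to the start of the ring); the other rf - 1 replicas go to the
    next nodes clockwise, ignoring racks and data centers.
    """
    tokens = [node_token for node_token, _ in ring]
    start = bisect.bisect_left(tokens, token) % len(ring)
    return [ring[(start + i) % len(ring)][1] for i in range(min(rf, len(ring)))]


# Hypothetical 4-node ring with evenly spaced tokens over the 0..2**127 range.
ring = [(2**125, "n1"), (2**126, "n2"), (3 * 2**125, "n3"), (2**127, "n4")]

token = random_partitioner_token("ws1")
print(token, simple_strategy_replicas(token, ring, rf=3))
```

Because MD5 output is spread uniformly, evenly spaced node tokens give each node roughly the same share of keys, which is the "equally divided" property described above.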

Cassandra writes:

Examples of Data Modeling:

1. Weather Station

-- weather_id is the partition key; event_time is the clustering column
create table temperature
(
weather_id text,
event_time timestamp,
temperature text,
PRIMARY KEY (weather_id, event_time)
);

Primary key:
1. weather_id (partition key)
2. event_time (clustering column, which sets the sort order within a partition)

So Cassandra partitions the data by weather_id and orders the rows within each partition by event_time.
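A minimal in-memory sketch of what this primary key means for storage and queries (Python, not a Cassandra API): each weather_id maps to its own partition, and the rows inside it stay sorted by event_time, so a time-range query is one partition lookup plus a contiguous slice. The class name is hypothetical, and integers stand in for timestamps.

```python
import bisect
from collections import defaultdict


class TemperatureTable:
    """Hypothetical toy model of: PRIMARY KEY (weather_id, event_time).

    weather_id (partition key) picks the partition; event_time (clustering column)
    keeps the rows of each partition sorted. Integers stand in for CQL timestamps.
    """

    def __init__(self):
        # weather_id -> list of (event_time, temperature), sorted by event_time
        self.partitions = defaultdict(list)

    def insert(self, weather_id, event_time, temperature):
        rows = self.partitions[weather_id]
        i = bisect.bisect_left(rows, (event_time,))
        if i < len(rows) and rows[i][0] == event_time:
            rows[i] = (event_time, temperature)  # same primary key: the write overwrites (upsert)
        else:
            rows.insert(i, (event_time, temperature))  # insert in clustering order

    def select_range(self, weather_id, start, end):
        """WHERE weather_id = ? AND event_time >= start AND event_time < end:
        one partition lookup, then a contiguous slice of already-sorted rows."""
        rows = self.partitions.get(weather_id, [])
        lo = bisect.bisect_left(rows, (start,))
        hi = bisect.bisect_left(rows, (end,))
        return rows[lo:hi]


table = TemperatureTable()
table.insert("ws1", 3, "22")
table.insert("ws1", 1, "20")
table.insert("ws2", 2, "15")
table.insert("ws1", 2, "21")
print(table.select_range("ws1", 1, 3))  # [(1, '20'), (2, '21')]
```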

2. Stock Data

-- (symbol, date) is the composite partition key; trade is the clustering column
create table stock_ticks
(
 symbol text,
 date int,
 trade timeuuid,
 trade_details text,
 PRIMARY KEY ((symbol, date), trade)
) with clustering order by (trade desc);

Here (symbol, date) is a composite partition key: Cassandra partitions the data by (symbol, date) and orders the rows within each partition by trade.

select * from stock_ticks where symbol='x' and date=20180201 limit 5;

Result: the 5 most recent trades, because Cassandra stores the rows of each partition in descending trade order.
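The same idea for the stock table, again as a hypothetical in-memory Python sketch: the partition is found by the whole composite key (symbol, date), and because rows are kept in descending trade order, `LIMIT 5` simply takes the first five. An increasing integer stands in for the timeuuid `trade` column.

```python
import bisect
from collections import defaultdict


class StockTicks:
    """Hypothetical toy model of:
    PRIMARY KEY ((symbol, date), trade) WITH CLUSTERING ORDER BY (trade DESC).

    Assumption: an increasing integer stands in for the timeuuid `trade` column,
    so a larger value means a later trade.
    """

    def __init__(self):
        # (symbol, date) -> list of (-trade, trade_details); negating trade
        # keeps the list in descending trade order using plain bisect.
        self.partitions = defaultdict(list)

    def insert(self, symbol, date, trade, trade_details):
        bisect.insort(self.partitions[(symbol, date)], (-trade, trade_details))

    def select(self, symbol, date, limit):
        """WHERE symbol = ? AND date = ? LIMIT n: the full composite partition key is
        needed to find the partition, whose rows are already newest-first."""
        rows = self.partitions.get((symbol, date), [])
        return [(-neg_trade, details) for neg_trade, details in rows[:limit]]


ticks = StockTicks()
for trade in range(1, 8):
    ticks.insert("x", 20180201, trade, f"trade {trade}")
ticks.insert("x", 20180202, 8, "next day")
print([trade for trade, _ in ticks.select("x", 20180201, 5)])  # [7, 6, 5, 4, 3]
```

In the sketch, as in Cassandra, a query that leaves out date cannot locate a partition directly.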

More articles by Vijay Kadel (ヴィジェイ カデル)
