Scaling PostgreSQL Databases with Spring Boot: A Journey into Application-Level Sharding
Summary:
In our quest to handle millions of inserts per hour into our PostgreSQL database, we embarked on a journey to scale it horizontally. The solution? Application-level sharding using Spring Boot. We delved into two sharding techniques: table-level partitioning and host-level sharding. With custom routing and a bit of Hibernate magic, we achieved high scalability, ensuring our database could handle the ever-growing load and complexity.
Imagine dealing with the challenge of handling millions of inserts per hour into a PostgreSQL database. As the data kept pouring in, we needed a scalable solution that could gracefully manage this ever-increasing load and growing complexities. Our answer? Application-level sharding, powered by Spring Boot. In this article, we'll dive into our journey of implementing table-level partitioning and host-level sharding, ensuring that our database remains robust and responsive.
Table-Level Partitioning: Customizing SQL Queries
To implement table-level partitioning, we used the StatementInspector interface from org.hibernate.resource.jdbc.spi. This interface lets us inspect and rewrite SQL statements on the fly before they are executed. Here's a snippet of the Java code:
import java.util.Optional;
import org.hibernate.resource.jdbc.spi.StatementInspector;
import org.slf4j.MDC;
import lombok.extern.slf4j.Slf4j;

@Slf4j
public class DynamicTableNamePartitioning implements StatementInspector {

    @Override
    public String inspect(String sql) {
        // SHARD_KEY is the MDC key (a String constant) holding the current shard id
        final Optional<String> optionalShardKey = Optional.ofNullable(MDC.get(SHARD_KEY));
        if (optionalShardKey.isPresent()) {
            // Rewrite the base table name to its shard-specific table, e.g. table_name -> table_name_3
            sql = sql.replace("table_name", "table_name_" + optionalShardKey.get());
        }
        return sql;
    }
}
In this code, we dynamically modify SQL queries by appending the SHARD_KEY to table names, allowing us to distribute data across different tables. To integrate this inspector into Hibernate, we registered it in Hibernate properties as follows:
properties.put(STATEMENT_INSPECTOR, DynamicTableNamePartitioning.class);
Additionally, we can enable SQL query logging with:
properties.put(SHOW_SQL, true);
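Both the statement inspector and the routing data source described in the next section read the shard key from the MDC, so it has to be set before a request touches the database. Here's a minimal sketch of how that could be done, assuming Spring Boot 3 (jakarta.servlet) and a hypothetical X-Shard-Key request header; the filter, the header name, and the SHARD_KEY value are illustrative rather than taken from our codebase:

import java.io.IOException;
import java.util.Optional;
import jakarta.servlet.Filter;
import jakarta.servlet.FilterChain;
import jakarta.servlet.ServletException;
import jakarta.servlet.ServletRequest;
import jakarta.servlet.ServletResponse;
import jakarta.servlet.http.HttpServletRequest;
import org.slf4j.MDC;
import org.springframework.stereotype.Component;

@Component
public class ShardKeyFilter implements Filter {

    // Assumed MDC key name; must match the SHARD_KEY constant used by the inspector
    public static final String SHARD_KEY = "shardKey";

    @Override
    public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain)
            throws IOException, ServletException {
        // Hypothetical source of the shard key; it could just as well come from a
        // tenant id, user id, or message attribute.
        String shardKey = Optional
                .ofNullable(((HttpServletRequest) request).getHeader("X-Shard-Key"))
                .orElse("0");
        MDC.put(SHARD_KEY, shardKey);
        try {
            chain.doFilter(request, response);
        } finally {
            MDC.remove(SHARD_KEY); // don't leak the key to the next request on this thread
        }
    }
}

The only hard requirement is that the key is set before the first statement of the request runs and cleared afterwards.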
Database Sharding: Distributing Tables Across Nodes
Taking it a step further, we aimed to distribute these sharded tables over multiple independent database nodes. For this purpose, we employed Spring's AbstractRoutingDataSource, a DataSource implementation that routes getConnection() calls to one of several target DataSources based on a lookup key. Here's how we achieved it:
import java.util.Optional;
import org.slf4j.MDC;
import org.springframework.jdbc.datasource.lookup.AbstractRoutingDataSource;

public class CustomRoutingDataSource extends AbstractRoutingDataSource {

    @Override
    protected Object determineCurrentLookupKey() {
        // SHARD_KEY is the same MDC key constant used by the statement inspector
        final String shardKey = Optional.ofNullable(MDC.get(SHARD_KEY)).orElse("0");
        // Modulo maps each shard key onto one of the two database nodes
        return Integer.parseInt(shardKey) % 2;
    }
}
In this code, we determine the current lookup key, which identifies the database node where the data will be stored. By taking the shard key modulo the number of nodes, we distribute data evenly across the two databases. In the actual implementation, however, we go with a fixed shard-key-to-node mapping so that existing tables don't need to be redistributed when new nodes are added; a sketch of that approach follows below.
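Here's a minimal sketch of such a fixed mapping; the class name and the shard-to-node assignments are illustrative, not our production values. The node indexes simply have to match the keys registered with the routing data source in the configuration shown later:

import java.util.Map;
import java.util.Optional;
import org.slf4j.MDC;
import org.springframework.jdbc.datasource.lookup.AbstractRoutingDataSource;

public class FixedMappingRoutingDataSource extends AbstractRoutingDataSource {

    // Hypothetical static assignment of shard keys to node indexes; new shard keys can be
    // placed on a new node without moving any existing shard.
    private static final Map<Integer, Integer> SHARD_TO_NODE = Map.of(
            0, 0, 1, 0,   // shards 0 and 1 live on node 0
            2, 1, 3, 1    // shards 2 and 3 live on node 1
    );

    @Override
    protected Object determineCurrentLookupKey() {
        // "shardKey" is the same MDC key referenced as SHARD_KEY elsewhere in the article
        final String shardKey = Optional.ofNullable(MDC.get("shardKey")).orElse("0");
        return SHARD_TO_NODE.getOrDefault(Integer.parseInt(shardKey), 0);
    }
}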
To complete the setup, we registered the routing data source in the entity manager and transaction manager configuration. Here's a simplified version of how it's done:
import static org.hibernate.cfg.AvailableSettings.*;

import java.time.Duration;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

import javax.sql.DataSource;

import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.data.jpa.repository.config.EnableJpaRepositories;
import org.springframework.orm.jpa.JpaTransactionManager;
import org.springframework.orm.jpa.LocalContainerEntityManagerFactoryBean;
import org.springframework.orm.jpa.vendor.HibernateJpaVendorAdapter;
import org.springframework.transaction.PlatformTransactionManager;
import org.springframework.transaction.annotation.EnableTransactionManagement;

import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;

@Configuration
@EnableTransactionManagement
@EnableJpaRepositories(
        basePackages = "...",
        entityManagerFactoryRef = "multiEntityManager",
        transactionManagerRef = "multiTransactionManager"
)
public class DataSourceConfig {

    private final String PACKAGE_SCAN = "...";

    @Value("${app.datasource-1.url}")
    private String dataSource1Url;

    @Value("${app.datasource-2.url}")
    private String dataSource2Url;

    @Value("${app.datasource.username}")
    private String username;

    @Value("${app.datasource.password}")
    private String password;

    private DataSource dataSourceDb1() {
        return buildDataSource(dataSource1Url, username, password);
    }

    private DataSource dataSourceDb2() {
        return buildDataSource(dataSource2Url, username, password);
    }

    private static HikariDataSource buildDataSource(String url, String username, String password) {
        HikariConfig hikariConfig = new HikariConfig();
        hikariConfig.setUsername(username);
        hikariConfig.setPassword(password);
        hikariConfig.setJdbcUrl(url);
        hikariConfig.setKeepaliveTime(Duration.ofMinutes(2).toMillis());
        hikariConfig.setMaximumPoolSize(16);
        return new HikariDataSource(hikariConfig);
    }

    // Routing data source: picks the target node based on the shard key in the MDC
    @Bean(name = "multiRoutingDataSource")
    public DataSource multiRoutingDataSource() {
        Map<Object, Object> targetDataSources = new HashMap<>();
        targetDataSources.put(0, dataSourceDb1());
        targetDataSources.put(1, dataSourceDb2());
        CustomRoutingDataSource multiRoutingDataSource = new CustomRoutingDataSource();
        multiRoutingDataSource.setTargetDataSources(targetDataSources);
        return multiRoutingDataSource;
    }

    // Entity manager factory backed by the routing data source
    @Bean(name = "multiEntityManager")
    public LocalContainerEntityManagerFactoryBean multiEntityManager() {
        LocalContainerEntityManagerFactoryBean em = new LocalContainerEntityManagerFactoryBean();
        em.setDataSource(multiRoutingDataSource());
        em.setPackagesToScan(PACKAGE_SCAN);
        HibernateJpaVendorAdapter vendorAdapter = new HibernateJpaVendorAdapter();
        em.setJpaVendorAdapter(vendorAdapter);
        em.setJpaProperties(hibernateProperties());
        em.setPersistenceUnitName("multiEntityManager");
        return em;
    }

    @Bean(name = "multiTransactionManager")
    public PlatformTransactionManager multiTransactionManager() {
        final LocalContainerEntityManagerFactoryBean localContainerEntityManagerFactoryBean = multiEntityManager();
        JpaTransactionManager transactionManager =
                new JpaTransactionManager(localContainerEntityManagerFactoryBean.getObject());
        transactionManager.setPersistenceUnitName(localContainerEntityManagerFactoryBean.getPersistenceUnitName());
        return transactionManager;
    }

    private Properties hibernateProperties() {
        Properties properties = new Properties();
        properties.put(DIALECT, "org.hibernate.dialect.PostgreSQLDialect");
        properties.put(STATEMENT_BATCH_SIZE, 200);
        properties.put(ORDER_INSERTS, true);
        properties.put(ORDER_UPDATES, true);
        properties.put(BATCH_VERSIONED_DATA, true);
        properties.put(HBM2DDL_AUTO, "none");
        // Register the statement inspector so table names are rewritten per shard
        properties.put(STATEMENT_INSPECTOR, DynamicTableNamePartitioning.class);
        properties.put(SHOW_SQL, true);
        return properties;
    }
}
With this setup, we successfully implemented both table-level partitioning and host-level sharding using Spring Boot, ensuring high scalability and performance for our PostgreSQL database.
This journey into application-level sharding allowed us to handle immense data loads effectively and efficiently. By customizing SQL queries and distributing data across multiple database nodes, we've set the stage for our database to handle even greater complexities in the future.
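To see how the pieces fit together at runtime, here is a hypothetical usage sketch; Event, EventRepository, and the service name are illustrative placeholders rather than our actual domain classes:

import org.slf4j.MDC;
import org.springframework.stereotype.Service;

@Service
public class EventWriteService {

    // Same MDC key read by DynamicTableNamePartitioning and CustomRoutingDataSource
    private static final String SHARD_KEY = "shardKey";

    private final EventRepository eventRepository; // hypothetical Spring Data JPA repository

    public EventWriteService(EventRepository eventRepository) {
        this.eventRepository = eventRepository;
    }

    public void save(Event event, int shardKey) {
        // Set the shard key before the repository call so the routing data source
        // picks the correct node and the inspector rewrites the table name.
        MDC.put(SHARD_KEY, String.valueOf(shardKey));
        try {
            // The repository transaction runs on multiTransactionManager (per @EnableJpaRepositories),
            // and the INSERT lands in table_name_<shardKey> on the mapped node.
            eventRepository.save(event);
        } finally {
            MDC.remove(SHARD_KEY);
        }
    }
}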
Why Opt for Fixed Shard Key Mapping Over Dynamic Mapping?
Because each shard key is pinned to a specific database node, adding a new node only means assigning new shard keys to it; existing shards, and therefore existing tables, never have to be redistributed or migrated.