Typically, data replication in PostgreSQL is done between an active (primary) database and one or more standby databases. While this is usually enough for many applications and to enable high availability, sometimes you need to replicate your data across more than one active database. With a multiactive database cluster, you can distribute not only your read queries but also your inserts and updates to multiple databases in a cluster. This enables parallel workloads and the possibility of bringing the data closer to the end users, leading to lower latency and modernized, evenly distributed architectures.
[ Learn best practices for implementing automation across your organization. Download The automation architect's handbook. ]
PostgreSQL version 9.6, released in 2016, included a community extension called BDR that had some initial bidirectional replication support. The BDR extension was not updated or maintained in subsequent versions of PostgreSQL. Other databases provide support for multiactive clusters, and some products provide support for PostgreSQL, but there has not been a community-licensed, Postgres-native solution for multiactive replication. That changed following the recent launch of pgEdge Distributed PostgreSQL, a fully distributed database optimized for the network edge based on the standard and popular open source PostgreSQL database.
Technical background
Physical replication uses exact block addresses and byte-by-byte replication. This has been commonly used in PostgreSQL for creating a read replica that can be used as a hot standby or an additional read-only database for the application.
By contrast, logical replication involves replicating data objects and their changes by using their primary key. Rather than shipping the write-ahead log (WAL) files for all current states of all objects in the database to an exact matching database in recovery mode, logical replication uses publishers and subscribers to replicate inserts, updates, and deletes on specified objects. As a result, logical replication can be configured to be more finely grained, making it a powerful tool for modern databases.
Why logical replication enables multiactive replication
Logical replication allows you to limit replication to a specific database and provides options for row-level filtering. Logical replication therefore can be configured to replicate from database a to database b, and back from database b to database a. This multidirectional logical replication means that neither database has to be in recovery mode, and writes can happen to each with bidirectional replication between them to keep them in sync.
This means having multiple write endpoints for the application. In addition to providing a multiactive cluster, the version of the database becomes less important, meaning you could have a version 14 database replication and a version 15 database while being able to write to both, reducing downtime.
[ Learn about upcoming webinars, in-person events, and more opportunities to increase your knowledge at Red Hat events. ]
What Spock brings to the table
pgEdge's Spock extension introduces asynchronous multiactive replication with enhanced conflict resolution and conflict avoidance. It also provides better management, monitoring statistics, and integration.
You need conflict resolution when updates are happening on multiple databases at the same time. Updating a row in database a and performing a different update to the same row on database b creates conflict. With Spock, the last update wins, and the row will contain the value of the update from the latest commit without any failures. Spock also provides a resolutions table where conflict resolutions are recorded and can be monitored and analyzed.
Another conflict can arise from updates to an incrementing or sum field. For example, if 5 is added to a field on database a and 10 is added to that same field on database b, using the last-update-wins option would leave a total of plus 5 or 10, rather than the expected plus 15. Spock accounts for this with conflict-free delta-apply columns, altering this column with the delta of the update. The logical replication will ship the delta to the other database, so that the final value of the field in the above example will be the correct plus 15.
Spock also provides support for partitioned tables. Spock allows you to add either the parent table or specific partition tables to replication. This allows for geosharding, where certain partitions can be replicated between countries while other partitions remain only on the original country.
What's next?
Spock is open and pgEdge Community Licensed, which is similar to the Confluent Community License. This license allows unlimited end-user usage, including in production, but prevents third parties from packaging and selling a competitive cloud product.
Spock has many more features on the way. Right now, Spock can recover from intermittent outages: The streaming replication will persist, and the database will catch up and synchronize again. Planned improvements will make it easy to spin up full replacement nodes after a catastrophic node failure with near zero downtime.
Spock is a part of pgEdge Distributed PostgreSQL, available as either a managed database as a service called pgEdge Cloud or the self-hosted pgEdge Platform software.
The code and documentation for Spock can be found in the pgEdge Git repository.
[ Try OpenShift Data Science in our Developer sandbox or in your own cluster. ]
About the authors
Cady is a Software Engineer with pgEdge who has spent the past ten years working with PostgreSQL and listening to podcasts.
Denis is a serial postgres entrepreneur and the co-founder and CTO of pgEdge.
Browse by channel
Automation
The latest on IT automation for tech, teams, and environments
Artificial intelligence
Updates on the platforms that free customers to run AI workloads anywhere
Open hybrid cloud
Explore how we build a more flexible future with hybrid cloud
Security
The latest on how we reduce risks across environments and technologies
Edge computing
Updates on the platforms that simplify operations at the edge
Infrastructure
The latest on the world’s leading enterprise Linux platform
Applications
Inside our solutions to the toughest application challenges
Original shows
Entertaining stories from the makers and leaders in enterprise tech
Products
- Red Hat Enterprise Linux
- Red Hat OpenShift
- Red Hat Ansible Automation Platform
- Cloud services
- See all products
Tools
- Training and certification
- My account
- Customer support
- Developer resources
- Find a partner
- Red Hat Ecosystem Catalog
- Red Hat value calculator
- Documentation
Try, buy, & sell
Communicate
About Red Hat
We’re the world’s leading provider of enterprise open source solutions—including Linux, cloud, container, and Kubernetes. We deliver hardened solutions that make it easier for enterprises to work across platforms and environments, from the core datacenter to the network edge.
Select a language
Red Hat legal and privacy links
- About Red Hat
- Jobs
- Events
- Locations
- Contact Red Hat
- Red Hat Blog
- Diversity, equity, and inclusion
- Cool Stuff Store
- Red Hat Summit