Syncing with Postgres: Logical Replication vs. ETL

By Philippe Noël on August 8, 2025
If your organization uses Postgres as its primary data store, you’re in good company. Many teams gravitate towards Postgres for its performance, stability, and familiarity. But even the strongest Postgres champions recognize its limits when it comes to powering specialized workloads like search, analytics, or warehousing.
To support these kinds of workloads, most organizations introduce a dedicated system isolated from the primary. Typical examples include Elasticsearch for search, ClickHouse for analytics, and Snowflake for data warehousing. These systems rely on ETL (Extract, Transform, Load) pipelines to receive data from Postgres.
Interestingly, there are also use cases where these systems can be Postgres themselves. A common example is reporting, where a separate Postgres instance tuned for OLAP (online analytical processing) is used. In that setup, you aren’t confined to ETL: an alternative is logical replication, a feature native to Postgres, which mirrors changes directly from the primary.
When choosing a specialized system, the method of data ingestion is an often-overlooked factor that has long-term implications for maintainability, reliability, and performance. Over the past two years, we’ve interviewed and worked with dozens of companies who have dealt with ETL and logical replication. In this piece, we’ll explore the advantages and some non-obvious but common pitfalls of each approach.
What is ETL?
ETL stands for Extract, Transform, and Load. When used as the interface between Postgres and non-Postgres systems:
- Extract means pulling data out of the source system (Postgres).
- Transform refers to reshaping that data to fit the schema or model of the destination (like Elasticsearch documents).
- Load is the final step: writing the transformed data into the sink system.
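Concretely, the three steps can be as small as a pair of COPY statements. The sketch below is a minimal, hypothetical Postgres-to-Postgres illustration; the `orders` source table and `orders_by_day` reporting table are assumptions, and real pipelines typically wrap these steps in an orchestrator or a dedicated ETL tool.

```sql
-- Extract + Transform: pull a pre-aggregated snapshot out of the source.
-- (Server-side COPY shown for brevity; client-side tools would use \copy.)
COPY (
    SELECT created_at::date AS order_date,
           count(*)          AS order_count,
           sum(total_cents)  AS revenue_cents
    FROM orders
    GROUP BY created_at::date
) TO '/tmp/orders_by_day.csv' WITH (FORMAT csv, HEADER true);

-- Load: write the transformed rows into the destination's reporting table
-- (assumed to already exist on the sink).
COPY orders_by_day (order_date, order_count, revenue_cents)
FROM '/tmp/orders_by_day.csv' WITH (FORMAT csv, HEADER true);
```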
Where Does ETL Shine?
TLDR: Use ETL when the sink isn’t Postgres, or when you need heavy reshaping or multi-destination delivery.
ETL is the right tool when flexibility and interoperability between Postgres and non-Postgres systems are priorities. It offers several compelling advantages:
- Opportunity to enrich source data: ETL presents a natural entry point for you to tailor incoming data exactly as needed downstream by applying custom transformations. This is valuable in use cases that involve remodeling or augmenting data. Tools like dbt are often used to manage these transformations.
- Flexibility to optimize data model: ETL lets you reshape relational data into a structure optimized for the target system. Consider the advantages of denormalizing for document search or pre-aggregating for analytics: in these cases, ETL may be the most practical way to hit performance targets without contorting the source schema (see the sketch after this list).
- Batch and stream support: ETL pipelines are most commonly run in batch mode, which can be the simplest and most cost-effective approach when real-time data isn’t required. That said, many tools also support real-time streaming when freshness is a priority.
- Ability to manage schema drift explicitly: With a dedicated transform layer, teams can design logic to handle schema changes (e.g., ignore, rename, or reshape fields) before they hit the sink.
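As an illustration of the denormalization point above, here is a hedged sketch of a transform step that flattens a hypothetical `products`/`reviews` pair into one search-friendly row per product. The table names and columns are assumptions, not part of any real schema.

```sql
-- Denormalize: collapse a one-to-many relationship into a single row per
-- product, shaped like the document a search engine would index.
CREATE MATERIALIZED VIEW product_docs AS
SELECT
    p.id,
    p.name,
    p.description,
    -- Fold child rows into an array so the sink receives one document per product.
    array_agg(r.body) FILTER (WHERE r.body IS NOT NULL) AS review_bodies,
    avg(r.rating)::numeric(3, 2)                        AS avg_rating
FROM products p
LEFT JOIN reviews r ON r.product_id = p.id
GROUP BY p.id, p.name, p.description;

-- Run as part of the pipeline's transform step; the load step then ships
-- the view's contents to the search engine.
REFRESH MATERIALIZED VIEW product_docs;
```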
Where Does ETL Falter?
- Maintenance can be daunting: There’s a common misconception that ETL is “set it and forget it.” In reality, ETL pipelines are fragile. Small schema changes or version mismatches between Postgres and downstream systems like Elasticsearch can silently break data flows. As a result, ETL pipelines require ongoing vigilance, so much so that the prospect of expanding the ETL surface area can have a chilling effect.
- Lack of built-in transaction handling: Most event-based ETL systems do not have an out-of-the-box method for handling deduplication, resolving update races, or replaying failed batches. With ETL, you’re handed a firehose of changes — and it's on you to ensure the downstream system doesn’t flood. Some teams embrace this control, building custom idempotency keys, versioning schemes, or last-write-wins logic. But these solutions are easy to get subtly wrong, especially in real-time systems where race conditions and partial failures are common (a sketch of one such pattern follows this list).
- Backfill complexity is high: With logical replication, everything is inherently replayable. With ETL, this is not the case. Replaying often means re-extracting from the source and manually reapplying transformations. These backfills are time-consuming and error-prone — avoiding duplication or stale data requires careful coordination across every stage of the pipeline.
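For example, the last-write-wins logic mentioned above often boils down to a guarded upsert on the sink. This is a minimal sketch, assuming a hypothetical `documents` sink table and a monotonically increasing `source_version` (such as a source-side timestamp or LSN) carried on each event.

```sql
-- Hypothetical sink table: one row per source record, plus the version
-- used to resolve races and replays.
CREATE TABLE IF NOT EXISTS documents (
    id             bigint PRIMARY KEY,
    body           jsonb  NOT NULL,
    source_version bigint NOT NULL
);

-- Idempotent, last-write-wins apply: $1, $2, $3 are bound by the loader for
-- each incoming change event. Replaying the same event, or applying events
-- out of order, can never move a row backwards.
INSERT INTO documents (id, body, source_version)
VALUES ($1, $2, $3)
ON CONFLICT (id) DO UPDATE
SET body           = EXCLUDED.body,
    source_version = EXCLUDED.source_version
WHERE documents.source_version < EXCLUDED.source_version;
```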
What is Logical Replication?
Logical replication is a built-in Postgres feature that streams row-level transactional changes from one Postgres instance to another. It works by decoding changes recorded in the Postgres write-ahead log (WAL) and sending them over a replication slot to replicas, where the changes are replayed in the order received.
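In its simplest form, wiring this up takes one statement on each side: a publication on the source and a subscription on the sink. A minimal sketch, with hypothetical names and placeholder connection details:

```sql
-- On the source (publisher): choose which tables to stream.
CREATE PUBLICATION orders_pub FOR TABLE orders;

-- On the sink (subscriber): the table must already exist with a matching schema.
-- Creating the subscription makes a replication slot on the publisher and starts
-- an apply worker that replays decoded WAL changes in commit order.
CREATE SUBSCRIPTION orders_sub
    CONNECTION 'host=primary.example.com dbname=app user=replicator password=...'
    PUBLICATION orders_pub;
```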
Where Does Logical Replication Shine?
TLDR: Use logical replication when you want a faithful replica and minimal pipeline complexity.
If your sink is Postgres-based and you can leave the source data model untouched, logical replication promises a far simpler stack. Under logical replication, the sink is a literal replica of the source, which means:
- No third-party service required: Logical replication is built into Postgres, so you don’t need to manage external connectors like Kafka, Debezium, or an ETL tool. This reduces your dependency footprint and simplifies your system architecture.
- Lower operational costs: With fewer moving parts to deploy and monitor, teams spend less time firefighting and more time building. There's also no need to provision separate infrastructure for ETL pipelines or intermediate messaging layers.
- Near real-time data sync by default: Changes are streamed from the source Postgres instance to the replica as they happen. This allows downstream systems to access fresh data almost immediately, without waiting for a batch job or a transformation window. If real-time is an absolute requirement, logical replication can also be configured to be fully synchronous (see the sketch after this list).
- Transactional and ordered: Logical replication will automatically pause if it encounters an unexpected change — like an insert that would create a primary key conflict — making it easier to detect and debug divergence. Additionally, changes are applied in commit order and within the boundaries of each transaction, which eliminates many classes of bugs that arise in event-based pipelines.
- Minimal backfill complexity: Backfilling tables in a Postgres replica can be configured to happen automatically when a subscription is created. Postgres bulk-copies each table from a consistent snapshot, then streams any changes that occurred during the copy until the replica is fully caught up.
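Both of these behaviors come down to a couple of settings. The sketch below extends the earlier `orders_sub` example; it is a hedged illustration, and the names and connection details remain placeholders.

```sql
-- Backfill: copy_data = true is the default on CREATE SUBSCRIPTION, so the
-- earlier orders_sub already bulk-copied existing rows from a snapshot and
-- then streamed the changes that accumulated during the copy. The same option
-- applies when new tables are added to the publication later:
ALTER SUBSCRIPTION orders_sub REFRESH PUBLICATION WITH (copy_data = true);

-- Fully synchronous replication (optional): have the apply worker confirm
-- commits synchronously...
ALTER SUBSCRIPTION orders_sub SET (synchronous_commit = 'on');

-- ...and have the publisher wait for that confirmation before acknowledging
-- writes. The apply worker's application_name defaults to the subscription name.
ALTER SYSTEM SET synchronous_standby_names = 'orders_sub';
SELECT pg_reload_conf();
```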
Where Does Logical Replication Falter?
Logical replication isn’t for every use case, and has some limitations of its own.
- Schema coupling limits flexibility: Aside from the ability to specify row filters and column lists, the replica must mirror the source schema exactly. There’s no sophisticated transformation layer to reshape data for any downstream need.
- Single-thread apply process limits throughput [1]: Each subscription applies changes in a single thread, regardless of how many tables are replicated inside a subscription. A common pitfall is placing too many tables into one subscription instead of splitting them across multiple subscriptions (see the sketch after this list).
- It doesn’t replicate DDL [2]: Only row-level changes (INSERT, UPDATE, DELETE) are replicated; schema changes like CREATE INDEX or ALTER TABLE are not. The workaround is to run the same migration scripts on both your primary Postgres and the ParadeDB replica.
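The sketch below illustrates the first two points with hypothetical tables: a column list and a row filter on the publication side, and multiple subscriptions so that busy tables get their own apply workers. All names and connection details are placeholders.

```sql
-- Column list: publish only some columns of a table; the replica identity
-- (here, the primary key) must be included in the list.
CREATE PUBLICATION orders_slim_pub
    FOR TABLE orders (id, customer_id, total_cents, created_at);

-- Row filter: publish only matching rows. Filters on columns outside the
-- replica identity require insert-only publish (or a wider replica identity).
CREATE PUBLICATION eu_orders_pub
    FOR TABLE orders WHERE (region = 'eu')
    WITH (publish = 'insert');

-- Throughput: changes for a subscription are applied by a single worker, so
-- spreading busy tables across publications lets them be applied in parallel.
CREATE PUBLICATION events_pub FOR TABLE events;

-- On the subscriber: two subscriptions means two independent apply workers.
CREATE SUBSCRIPTION orders_slim_sub
    CONNECTION 'host=primary.example.com dbname=app user=replicator'
    PUBLICATION orders_slim_pub;
CREATE SUBSCRIPTION events_sub
    CONNECTION 'host=primary.example.com dbname=app user=replicator'
    PUBLICATION events_pub;
```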
What Databases Support Postgres Logical Replication?
Some databases advertise “Postgres wire compatibility,” meaning they can accept SQL over the Postgres frontend/backend protocol. However, these systems typically still require ETL pipelines for ingestion. Logical replication relies on the replication sub-protocol and Postgres’s write-ahead log (WAL) format, which most wire-compatible systems do not implement.
A much smaller set of systems actually speak the replication protocol and can consume Postgres logical decoding output. In one camp are ETL/CDC tools themselves, whose job is to tap the logical replication stream and hand the changes off to another system. In another camp are databases built on actual Postgres internals that inherit full logical replication support:
| Use Case | Postgres Variant |
| --- | --- |
| Data Warehouse | Redshift, Greenplum |
| Search and Analytics | ParadeDB |
| Vector Search | pgvector |
| Time Series | Timescale |
For instance, ParadeDB is an Elasticsearch alternative built on Postgres. It’s designed for Elastic-style search and analytical queries and is fully compatible with Postgres logical replication.
If you’re curious how it works, check out our architecture — and don’t hesitate to give us a star.
[1] Postgres mailing lists hint at improvements to parallel logical apply workers coming in Postgres 19.
[2] DDL replication is one of Postgres’ most highly requested features, so it may land in future releases.