Alibaba brings full text search to its data warehouse with ParadeDB. Read their story.

Full Text Search over Postgres: Elasticsearch vs. Alternatives

By Ming Ying on July 31, 2024

Over the past six months, we’ve spoken to 50+ companies who are looking for a full text search (FTS) solution over data stored in Postgres. For many of these companies, the decision boils down to Elasticsearch vs. native Postgres full text search.

By working with these companies, we've identified the areas where each solution shines. Here’s what we’ve learned.

What is Full Text Search (FTS)?

Full text search is a technique that finds entries in a collection of text based on the presence of specific keywords and phrases. Most search engines like Elasticsearch use the BM25 algorithm to rank search results. BM25 considers how often a term appears and how unique that term is across all documents.

Full text search is different from similarity search, also known as vector search, which searches for and ranks results by semantic meaning. Many modern applications use a combination of full text and similarity search. This practice is called hybrid search and can yield more accurate results.

Postgres Full Text Search

Postgres FTS is a native functionality available to all Postgres databases. It leverages the tsvector data type, which stores text as searchable tokens, and the GIN index, which improves search speeds.

The Good

  1. Simplicity. Postgres FTS requires no additional infrastructure and is available on all managed Postgres services like AWS RDS. In the long run, not having to orchestrate and manage an external search engine saves significant time and headache.
  2. Real-Time Search. With Postgres FTS, data is searchable immediately upon commit. This can be incredibly useful to companies building user-facing or latency-sensitive search experiences — for instance, e-commerce sites or fintechs.
  3. Postgres Transactions and MVCC. Postgres’ ACID transactions and multi-version concurrency control (MVCC) ensure that FTS results are reliable under concurrent access and frequent updates.

The Bad

  1. Feature Incomplete. The limited feature set of Postgres FTS may be a deal breaker for some companies. Missing features include BM25 scoring, relevance tuning, custom tokenizers, and faceting.
  2. Poor Performance Over Large Datasets. While Postgres FTS performs well over tables with a few million rows, performance degrades considerably over tables with tens of millions of rows.
  3. Transactional Overhead. Creating a GIN index over a column adds a small amount (usually milliseconds) of latency to transactions that affect that column.

Key Takeaway

Postgres FTS is ideal for search over small to medium-sized tables that do not require sophisticated FTS queries. We’re being intentionally vague about what “medium-sized” and “sophisticated” entail — the specifics vary depending on your performance requirements. Luckily, testing and migrating to/from Postgres FTS is fairly straightforward.

Elasticsearch

While Elastic today offers a wide variety of products, its core product, Elasticsearch, is a data store and full text search engine.

The Good

  1. Comprehensive Feature Set. Elasticsearch is capable of handling almost any FTS query. The Elastic Query DSL (domain-specific language) is the gold standard for full text search capabilities.
  2. Performant. Our benchmarks show that Elasticsearch can query billions of rows in milliseconds thanks to its underlying battle-tested Lucene search engine and distributed architecture.
  3. More than Search. Beyond FTS, Elasticsearch is also an analytical query engine, vector database, security and observability platform. Many organizations enjoy the simplicity of consolidating several services within Elasticsearch.

The Bad

  1. Not a Reliable Data Store. We’ve talked to many companies who have tried and regretted their decision to use Elasticsearch as their primary data store. This is not a practice we recommend — Elasticsearch’s lack of ACID transactions and MVCC can lead to data inconsistencies and loss, while its lack of relational properties and real-time consistency makes many database queries challenging.
  2. Requires ETL Pipelines. Because Elasticsearch is not a reliable data store, organizations that use Postgres typically extract, transform, and load (ETL) data from Postgres to Elasticsearch. Because failures in ETL pipelines can lead to all sorts of production outages, these pipelines need to be carefully maintained to avoid breaking changes in the underlying Postgres schema.
  3. Loss of Data Freshness. ETL jobs are time-consuming and run at periodic intervals. Data that reaches Elasticsearch often lags several hours behind Postgres, which may be prohibitive for applications that perform real-time search over Postgres tables.
  4. Expensive. We were surprised to hear from several enterprises that Elasticsearch had grown to become their largest software expense. As the cost of their Elasticsearch clusters ballooned, many of these enterprises switched from Elasticsearch Cloud to self-managed. While this reduced their cloud spend, it introduced a new problem: Elasticsearch is notoriously difficult to run, tune, and manage. These organizations then hired (expensive) engineers to manage their Elasticsearch clusters.

Key Takeaway

Elasticsearch delivers excellent search performance at the price of operational overhead and data freshness. We recommend Elasticsearch if more lightweight alternatives cannot get the job done or if you intend on using other Elasticsearch services.

Alternative Search Engines

Over the years, a modern breed of search engines like Algolia, Meilisearch, and Typesense has emerged. These engines are commonly used to build user-facing search experiences. Hacker News search, for instance, is built on Algolia.

While each service differentiates around the edges, there’s an important caveat for developers looking for search over Postgres: none of these solutions are built specifically for Postgres. Users of Postgres will likely experience many of the same problems with these services as they would with Elasticsearch.

Is the Best of Both Worlds Possible?

ParadeDB is a full text search engine built for Postgres. Powered by an extension called pg_search, ParadeDB embeds Tantivy, a Rust-based Lucene alternative, inside Postgres. Like native Postgres FTS, ParadeDB plugs into any existing, self-managed Postgres database with no additional infrastructure. Like Elasticsearch, ParadeDB provides the capabilities of an advanced full text search engine.

Compatibility with managed Postgres services like Amazon RDS is coming soon. To get notified when it’s ready, we invite you to fill out our interest form. In the meantime, you can follow our GitHub repository and give us a star!