Why You Might Not Need A Vector Database When PostgreSQL Can Power Your RAG

By Yuriy Zhar · 8 min read
Want to build RAG without dragging a new vector database into your stack? This guide walks through turning plain old PostgreSQL into a serious RAG backend, step by step: pgvector, Elixir, and a single dataset_chunks table power BM25-style text search, dense vector similarity, and even binary Hamming and Jaccard searches. The goal is to see what actually works on your data before you add more infrastructure.

You have probably seen this recipe for RAG a hundred times already:

  1. Pick an embedding model
  2. Pick a vector database
  3. Build a retrieval pipeline on top

Step 2 is usually treated like a law of nature. Of course you need a vector database.

Except you often do not.

If you are already on PostgreSQL, you can store your chunks, manage embeddings, and run several kinds of similarity search in the same place where the rest of your data lives. No extra service, no extra backup story, no extra operational headache.

In this article I will walk through how to set up PostgreSQL + pgvector in an Elixir project, and how to use different search strategies for RAG:

  • BM25 full text search
  • Dense vector search in several flavors: cosine, L2, L1, dot product
  • Binary similarity search: Hamming and Jaccard over bitstrings

I will use the postgres_rag_bench repo as a reference, but the focus here is on the setup and the concepts, not the repo structure.

Prepare PostgreSQL for vector search

I will assume you are on PostgreSQL 16 or newer. You need the vector extension installed.

If you use a managed Postgres where vector is available, you usually only need:

CREATE EXTENSION IF NOT EXISTS vector;

You run that once per database. In Ecto migrations you normally wrap it with execute so it is idempotent:

def change do
  execute(
    "CREATE EXTENSION IF NOT EXISTS vector",
    "DROP EXTENSION IF EXISTS vector"
  )
end

If you are self hosting Postgres and CREATE EXTENSION vector fails, then you simply do not have pgvector installed at the server level. At that point you either:

  • install pgvector from your distro packages, or
  • rebuild Postgres with the extension, or
  • switch to an image that already includes it

There is no way around that. If the extension is not installed on the server, the rest of this article will not work. No amount of Elixir magic fixes a missing C extension.

Once CREATE EXTENSION vector works in psql, you are ready to move on.
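
You can verify with a quick catalog query:

-- should return one row with the installed pgvector version
SELECT extname, extversion FROM pg_extension WHERE extname = 'vector';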

Wire pgvector into Elixir and Ecto

On the Elixir side, you need three things:

  1. The dependency in mix.exs
  2. A custom Postgrex type definition
  3. Repo configuration telling Ecto to use those types

In mix.exs:

defp deps do
  [
    {:ecto_sql, "~> 3.11"},
    {:postgrex, ">= 0.0.0"},
    {:pgvector, "~> 0.3"}
  ]
end

Then you define a module that tells Postgrex about the vector type:

# lib/my_app/postgres_types.ex
Postgrex.Types.define(
  MyApp.PostgresTypes,
  [Pgvector.Extensions.Vector],
  []
)

Finally, you point your Repo at this type module:

# config/config.exs or config/runtime.exs
config :my_app, MyApp.Repo,
  database: "my_app_dev",
  username: "postgres",
  password: "postgres",
  hostname: "localhost",
  types: MyApp.PostgresTypes

If you forget the types: setting, Ecto has no idea what to do with your vector column and you will hit cryptic decode errors when you try to query it.

At this point, Pgvector.Ecto.Vector is available as a field type in your schemas and you can start treating embeddings as first class citizens.

Create a proper RAG table in a migration

You want a table that holds:

  • where the chunk came from
  • the original document title
  • an index per chunk inside each document
  • the chunk text
  • a dense vector embedding
  • an optional binary representation of that embedding

Here is a minimal but practical migration:

defmodule MyApp.Repo.Migrations.CreateDatasetChunks do
  use Ecto.Migration

  def change do
    execute(
      "CREATE EXTENSION IF NOT EXISTS vector",
      "DROP EXTENSION IF EXISTS vector"
    )

    create table(:dataset_chunks) do
      add :source_path, :string, null: false
      add :document_title, :string, null: false
      add :chunk_index, :integer, null: false
      add :content, :text, null: false

      add :embedding, :vector, size: 384
      add :embedding_binary, :text

      timestamps(type: :naive_datetime_usec)
    end

    create unique_index(
      :dataset_chunks,
      [:source_path, :chunk_index],
      name: :dataset_chunks_source_chunk_index
    )

    create index(:dataset_chunks, [:document_title])

    create index(
      :dataset_chunks,
      ["to_tsvector('english', content)"],
      using: :gin,
      name: :dataset_chunks_content_tsv_idx
    )

    # cosine distance ivfflat index
    execute(
      """
      CREATE INDEX dataset_chunks_embedding_cosine_idx
      ON dataset_chunks
      USING ivfflat (embedding vector_cosine_ops)
      WITH (lists = 100)
      """,
      "DROP INDEX IF EXISTS dataset_chunks_embedding_cosine_idx"
    )

    # L2 distance ivfflat index
    execute(
      """
      CREATE INDEX dataset_chunks_embedding_l2_idx
      ON dataset_chunks
      USING ivfflat (embedding vector_l2_ops)
      WITH (lists = 100)
      """,
      "DROP INDEX IF EXISTS dataset_chunks_embedding_l2_idx"
    )

    # inner product ivfflat index
    execute(
      """
      CREATE INDEX dataset_chunks_embedding_ip_idx
      ON dataset_chunks
      USING ivfflat (embedding vector_ip_ops)
      WITH (lists = 100)
      """,
      "DROP INDEX IF EXISTS dataset_chunks_embedding_ip_idx"
    )
  end
end

If your pgvector build also includes vector_l1_ops and the bitstring operator classes, you can add indexes for those, as the postgres_rag_bench migrations do. If it does not, trying to create those indexes will fail, so you either wrap them in DO blocks with exception handlers, or skip them entirely.

Run this with:

mix ecto.migrate

Now Postgres is ready for text search, cosine, L2, and inner product vector search, all on the same table.
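
Two practical notes on the ivfflat indexes. First, ivfflat picks its list centroids when the index is built, so recall is better if you build or REINDEX these indexes after the table has data in it. Second, query time recall is tunable: ivfflat scans ivfflat.probes lists per query (default 1), and you can raise that per session:

-- more probes = better recall, slower queries; the default is 1
SET ivfflat.probes = 10;

From Elixir you can issue the same statement with Repo.query!("SET ivfflat.probes = 10"), keeping in mind it applies per connection, so with a pool you would run it in the same checkout or transaction as the search query.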

Define the schema in Elixir

The schema is straightforward. You are mapping that migration into Ecto:

defmodule MyApp.DatasetChunk do
  use Ecto.Schema

  schema "dataset_chunks" do
    field :source_path, :string
    field :document_title, :string
    field :chunk_index, :integer
    field :content, :string

    field :embedding, Pgvector.Ecto.Vector
    field :embedding_binary, :string

    timestamps(type: :naive_datetime_usec)
  end
end

By using Pgvector.Ecto.Vector you tell Ecto how to encode and decode the Postgres vector column.

You can assign a plain list of floats like [0.034, -0.27, ...] when inserting; on read, the field comes back as a Pgvector struct, which Pgvector.to_list/1 turns back into a list. On the database side it is a proper vector(384).
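
A quick round trip makes that concrete (illustrative values):

chunk = MyApp.Repo.get!(MyApp.DatasetChunk, 1)

chunk.embedding
# => %Pgvector{} struct wrapping the raw vector

Pgvector.to_list(chunk.embedding)
# => [0.034, -0.27, ...]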

Get embeddings into the table

You cannot magically ask Postgres to embed your text. The embedding step happens in your application code.

You have two realistic options:

  • Use a local model via Bumblebee, as in the repo
  • Call a remote API such as OpenAI and store the returned vector

The exact choice does not matter for the database part. What matters is that for each chunk of text you get a list of floats of fixed length.
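
For the local option, here is a minimal Bumblebee sketch. The model choice (sentence-transformers/all-MiniLM-L6-v2, which outputs 384 dimensional vectors matching the column above) and the module name are illustrative assumptions, not taken from the repo:

# Hedged sketch: model and module name are illustrative choices
defmodule MyApp.Embeddings do
  def serving do
    {:ok, model_info} =
      Bumblebee.load_model({:hf, "sentence-transformers/all-MiniLM-L6-v2"})

    {:ok, tokenizer} =
      Bumblebee.load_tokenizer({:hf, "sentence-transformers/all-MiniLM-L6-v2"})

    Bumblebee.Text.text_embedding(model_info, tokenizer,
      output_pool: :mean_pooling,
      embedding_processor: :l2_norm
    )
  end

  def embed(text) do
    %{embedding: tensor} = Nx.Serving.run(serving(), text)
    {:ok, Nx.to_flat_list(tensor)}
  end
end

In a real application you would start the serving once under your supervision tree instead of rebuilding it on every call, but the shape of the result is the same: {:ok, [float, ...]} of fixed length.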

A simplified ingestion flow looks like this:

defmodule MyApp.Rag.Loader do
  alias MyApp.{Repo, DatasetChunk}

  @embedding_dim 384

  def ingest_markdown(path) do
    content = File.read!(path)
    chunks  = split_into_paragraphs(content)

    Enum.with_index(chunks)
    |> Enum.each(fn {chunk_text, idx} ->
      embedding = embed(chunk_text)

      Repo.insert!(
        %DatasetChunk{
          source_path: path,
          document_title: guess_title(content, path),
          chunk_index: idx,
          content: chunk_text,
          embedding: embedding,
          embedding_binary: to_binary_bits(embedding)
        },
        on_conflict: {:replace_all_except, [:id, :inserted_at]},
        conflict_target: [:source_path, :chunk_index]
      )
    end)
  end

  defp split_into_paragraphs(content) do
    content
    |> String.split(~r/\n{2,}/, trim: true)
    |> Enum.map(&String.trim/1)
  end

  defp guess_title(content, path) do
    case Regex.run(~r/^#\s+(.+)$/m, content) do
      [_, title] -> title
      _ -> Path.basename(path)
    end
  end

  defp embed(text) do
    # Call your embedding model here; it must return a list of 384 floats.
    # This assumes an embedding module like the one sketched above.
    {:ok, embedding} = MyApp.Embeddings.embed(text)
    embedding
  end

  defp to_binary_bits(embedding) when is_list(embedding) do
    embedding
    |> Enum.map(fn x -> if x > 0.0, do: "1", else: "0" end)
    |> Enum.join()
  end
end

This is conceptually what the postgres_rag_bench repo already does, just stripped down and explained.
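
With that in place, ingesting a directory of markdown files is a one liner (the glob pattern is an assumption about where your documents live):

"priv/docs/**/*.md"
|> Path.wildcard()
|> Enum.each(&MyApp.Rag.Loader.ingest_markdown/1)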

Once you have a few documents ingested, you can start playing with different search strategies.

BM25 full text search in PostgreSQL and Elixir

Start with the old workhorse: full text search.

Postgres does not ship true BM25 ranking, but ts_rank_cd (a cover density ranking) is a solid BM25-style stand-in for the same lexical use cases. Your migration already created a GIN index on to_tsvector('english', content).

The basic SQL looks like this:

SELECT
  id,
  document_title,
  chunk_index,
  ts_rank_cd(
    to_tsvector('english', content),
    plainto_tsquery('english', $1)
  ) AS score,
  ts_headline(
    'english',
    content,
    plainto_tsquery('english', $1),
    'MaxFragments=2, MaxWords=15, MinWords=5'
  ) AS snippet
FROM dataset_chunks
WHERE to_tsvector('english', content) @@ plainto_tsquery('english', $1)
ORDER BY score DESC
LIMIT 5;

In Ecto you usually wrap this in a function:

defmodule MyApp.Rag.Search do
  import Ecto.Query
  alias MyApp.{Repo, DatasetChunk}

  def search_bm25(query, opts \\ []) do
    limit = Keyword.get(opts, :limit, 5)

    # Note: fragment/1 only works inside query expressions, so the
    # tsquery is built inline in each fragment rather than bound once.
    base =
      from c in DatasetChunk,
        where:
          fragment(
            "to_tsvector('english', ?) @@ plainto_tsquery('english', ?)",
            c.content,
            ^query
          ),
        select: %{
          chunk: c,
          score:
            fragment(
              "ts_rank_cd(to_tsvector('english', ?), plainto_tsquery('english', ?))",
              c.content,
              ^query
            ),
          snippet:
            fragment(
              "ts_headline('english', ?, plainto_tsquery('english', ?), 'MaxFragments=2,MaxWords=15,MinWords=5')",
              c.content,
              ^query
            )
        },
        order_by: [
          desc:
            fragment(
              "ts_rank_cd(to_tsvector('english', ?), plainto_tsquery('english', ?))",
              c.content,
              ^query
            )
        ],
        limit: ^limit

    Repo.all(base)
  end
end
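
A quick call from iex shows the shape of the results (the query string and values are illustrative):

iex> MyApp.Rag.Search.search_bm25("supervision trees", limit: 3)
[
  %{chunk: %MyApp.DatasetChunk{}, score: 0.1, snippet: "...<b>supervision</b> <b>trees</b>..."},
  ...
]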

Use BM25 when your users type actual words from the documents, when you want human readable snippets, and when lexical precision matters.

For RAG this is already enough for some use cases. Many people overcomplicate retrieval when a tuned BM25 setup would be fine.

Dense vector search, several metrics, same table

Now to the semantic side. You already stored embeddings in the embedding column. pgvector gives you several distance operators.

In pgvector:

  • <-> is L2 (Euclidean) distance
  • <=> is cosine distance
  • <#> is negative inner product (so ascending order still means most similar first)

You choose the metric in your ORDER BY clause and you align your index operator class with that metric.

Cosine distance search

Cosine distance is the default in many RAG tutorials.

Conceptually:

  1. Embed the query text into a vector q
  2. Run ORDER BY embedding <=> q and take the smallest distances

In Elixir:

def search_embeddings_cosine(query, opts \\ []) do
  with {:ok, vector} <- embed_query(query) do
    do_search_cosine(vector, opts)
  end
end

defp embed_query(text) do
  case MyApp.Embeddings.embed(text) do
    {:ok, list} when is_list(list) ->
      {:ok, Pgvector.new(list)}

    error ->
      error
  end
end

defp do_search_cosine(vector, opts) do
  import Ecto.Query
  alias MyApp.{Repo, DatasetChunk}

  limit = Keyword.get(opts, :limit, 5)

  q =
    from c in DatasetChunk,
      where: not is_nil(c.embedding),
      select: %{
        chunk: c,
        distance: fragment("? <=> ?", c.embedding, ^vector)
      },
      order_by: fragment("? <=> ?", c.embedding, ^vector),
      limit: ^limit

  Repo.all(q)
end

This uses the vector_cosine_ops ivfflat index from the migration when Postgres decides it is worth it.

Use cosine when you want classic semantic similarity, when your embeddings are normalized, and you care about directions rather than magnitudes.
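
One consumption detail: <=> returns cosine distance, so if downstream code expects a similarity score, convert it after the query. A small sketch, assuming the functions above live in MyApp.Rag.Search and the embedding step succeeds:

"how do supervisors restart children?"
|> MyApp.Rag.Search.search_embeddings_cosine()
|> Enum.map(fn %{distance: d} = row ->
  # cosine similarity = 1 - cosine distance
  Map.put(row, :similarity, 1.0 - d)
end)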

L2 distance search

Switching to L2 is just a different operator and a matching index.

The SQL idea:

SELECT ...
FROM dataset_chunks
ORDER BY embedding <-> $1
LIMIT 5;

In Elixir:

def search_embeddings_l2(query, opts \\ []) do
  with {:ok, vector} <- embed_query(query) do
    import Ecto.Query
    alias MyApp.{Repo, DatasetChunk}

    limit = Keyword.get(opts, :limit, 5)

    q =
      from c in DatasetChunk,
        where: not is_nil(c.embedding),
        select: %{
          chunk: c,
          distance: fragment("? <-> ?", c.embedding, ^vector)
        },
        order_by: fragment("? <-> ?", c.embedding, ^vector),
        limit: ^limit

    Repo.all(q)
  end
end

You already created a vector_l2_ops ivfflat index, so Postgres can accelerate this.

Choose L2 if your model or use case expects Euclidean geometry, or if you empirically see better results in your benchmarks.

L1 distance search

L1 (Manhattan) distance is less commonly used in tutorials, but in the postgres_rag_bench benchmarks it turned out to be very fast.

pgvector only gained a dedicated L1 operator (<+>) and the vector_l1_ops operator class in version 0.7.0, so the L1 story depends on the pgvector you have installed. The repo uses vector_l1_ops and a separate search module, which is fine if your build supports it.

The pattern on the Elixir side is the same as L2 and cosine. Embed the query, then ORDER BY a distance expression that matches the operator class you indexed on. If your pgvector version does not have a clean L1 operator, you either:

  • upgrade pgvector, or
  • skip L1 entirely and stick to cosine / L2 / inner product

There is no point forcing L1 if your extension does not support it. You will just fight the database.
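
If your build does support it, the query has the same shape as the others. A sketch, assuming pgvector 0.7+ where <+> is the L1 distance operator (the function name is mine, not from the repo):

def search_embeddings_l1(query, opts \\ []) do
  with {:ok, vector} <- embed_query(query) do
    import Ecto.Query
    alias MyApp.{Repo, DatasetChunk}

    limit = Keyword.get(opts, :limit, 5)

    q =
      from c in DatasetChunk,
        where: not is_nil(c.embedding),
        select: %{
          chunk: c,
          # <+> is L1 (taxicab) distance in pgvector 0.7+
          distance: fragment("? <+> ?", c.embedding, ^vector)
        },
        order_by: fragment("? <+> ?", c.embedding, ^vector),
        limit: ^limit

    Repo.all(q)
  end
end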

Inner product search

For inner product you use <#>. This is handy when you want a similarity score rather than a distance.

Inner product grows when vectors point in a similar direction and have larger magnitudes. With normalized embeddings it acts like cosine similarity.

In SQL you might write:

SELECT
  id,
  -1 * (embedding <#> $1) AS score
FROM dataset_chunks
ORDER BY score DESC
LIMIT 5;

Because <#> returns the negative inner product, multiplying by -1 recovers the actual inner product, so you can sort descending on higher scores.

In Elixir:

def search_embeddings_dot(query, opts \\ []) do
  with {:ok, vector} <- embed_query(query) do
    import Ecto.Query
    alias MyApp.{Repo, DatasetChunk}

    limit = Keyword.get(opts, :limit, 5)

    q =
      from c in DatasetChunk,
        where: not is_nil(c.embedding),
        select: %{
          chunk: c,
          score: fragment("-1 * (? <#> ?)", c.embedding, ^vector)
        },
        order_by: [desc: fragment("-1 * (? <#> ?)", c.embedding, ^vector)],
        limit: ^limit

    Repo.all(q)
  end
end

You also set up a vector_ip_ops ivfflat index in the migration so this can scale.

Use dot product when some downstream logic expects a similarity score and when you know your embeddings are normalized or comparably scaled.

Binary Hamming and Jaccard search

Dense vectors are not the only way to search.

You can quantize each embedding into a bitstring and compare those.

For a 384 dimensional vector you can create a bitstring of length 384. A simple rule is: value greater than zero becomes 1, otherwise 0. That is what to_binary_bits/1 earlier did.

To make that useful for search, you need two elements on the database side:

  1. A column to hold the bitstrings (embedding_binary)
  2. Functions and operators that compute distances between two bitstrings

The repo's migrations already contain those. A simplified version looks like this:

def change do
  execute(
    """
    CREATE OR REPLACE FUNCTION binary_hamming_distance(text, text)
    RETURNS integer
    LANGUAGE SQL
    IMMUTABLE
    AS $$
      SELECT bit_count(CAST($1 AS bit(384)) # CAST($2 AS bit(384)));
    $$;
    """,
    "DROP FUNCTION IF EXISTS binary_hamming_distance(text, text)"
  )

  execute(
    """
    CREATE OR REPLACE FUNCTION binary_jaccard_distance(text, text)
    RETURNS double precision
    LANGUAGE SQL
    IMMUTABLE
    AS $$
      WITH a AS (SELECT CAST($1 AS bit(384)) AS bits_a),
           b AS (SELECT CAST($2 AS bit(384)) AS bits_b)
      SELECT
        CASE
          WHEN bit_count((a.bits_a | b.bits_b)) = 0 THEN 0.0
          ELSE 1.0 - (
            bit_count((a.bits_a & b.bits_b))::double precision /
            NULLIF(bit_count((a.bits_a | b.bits_b)), 0)
          )
        END
      FROM a, b;
    $$;
    """,
    "DROP FUNCTION IF EXISTS binary_jaccard_distance(text, text)"
  )

  execute(
    """
    CREATE OPERATOR <~> (
      PROCEDURE = binary_hamming_distance,
      LEFTARG = text,
      RIGHTARG = text
    )
    """,
    "DROP OPERATOR IF EXISTS <~> (text, text)"
  )

  execute(
    """
    CREATE OPERATOR <%> (
      PROCEDURE = binary_jaccard_distance,
      LEFTARG = text,
      RIGHTARG = text
    )
    """,
    "DROP OPERATOR IF EXISTS <%> (text, text)"
  )
end

If your pgvector build (0.7 or newer) also includes the bit_hamming_ops and bit_jaccard_ops operator classes, you can index quantized embeddings with them, keeping in mind they operate on the bit type, so embedding_binary would need to be a bit(384) column, or cast to one in an expression index.

Searching is then conceptually the same as with dense vectors.

For Hamming:

def search_embeddings_hamming(query, opts \\ []) do
  with {:ok, bits} <- embed_and_quantize(query) do
    import Ecto.Query
    alias MyApp.{Repo, DatasetChunk}

    limit = Keyword.get(opts, :limit, 5)

    q =
      from c in DatasetChunk,
        where: not is_nil(c.embedding_binary),
        select: %{
          chunk: c,
          distance: fragment("? <~> ?", c.embedding_binary, ^bits)
        },
        order_by: fragment("? <~> ?", c.embedding_binary, ^bits),
        limit: ^limit

    Repo.all(q)
  end
end

defp embed_and_quantize(text) do
  with {:ok, vec} <- MyApp.Embeddings.embed(text) do
    # Same sign-based quantization helper as in the loader module
    {:ok, to_binary_bits(vec)}
  end
end

For Jaccard you only swap the operator to <%>.
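
Spelled out, the Jaccard variant mirrors the Hamming function above (same assumptions, different operator):

def search_embeddings_jaccard(query, opts \\ []) do
  with {:ok, bits} <- embed_and_quantize(query) do
    import Ecto.Query
    alias MyApp.{Repo, DatasetChunk}

    limit = Keyword.get(opts, :limit, 5)

    q =
      from c in DatasetChunk,
        where: not is_nil(c.embedding_binary),
        select: %{
          chunk: c,
          # <%> is the custom Jaccard operator from the migration
          distance: fragment("? <%> ?", c.embedding_binary, ^bits)
        },
        order_by: fragment("? <%> ?", c.embedding_binary, ^bits),
        limit: ^limit

    Repo.all(q)
  end
end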

Why bother with this at all? Mainly to experiment. Binary search can be faster or lighter in some setups, and it gives you a cheap way to play with quantization and approximate matching without bringing in another system.

It will not magically outperform a well tuned dense vector search on every dataset, so treat it as an option, not a silver bullet.

Choosing a search strategy in practice

You now have several tools in the same database:

  • BM25 full text search
  • Cosine, L2, L1, inner product over dense vectors
  • Hamming and Jaccard over binary embeddings

You do not need to pick a single winner for all time. The sane thing to do is:

  • start with BM25 plus one vector metric (cosine or L2)
  • benchmark on your real documents and your real queries
  • look at both speed and relevance, not just one

In the postgres_rag_bench runs, L1 came out faster than the others for that dataset. That is a useful data point, but it does not mean L1 is magically best everywhere. Each dataset, index, and hardware combination behaves slightly differently.

If you want to keep things simple:

  • use BM25 when the user is clearly doing keyword search
  • use cosine or L2 when you want semantic search
  • optionally, blend them by fetching some results from each and re-ranking (a sketch follows below)

You can always layer something more clever on top later.
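
For the blending option, here is a minimal reciprocal rank fusion (RRF) sketch. It assumes the BM25 and cosine functions above live in the same module and that embedding succeeds; the constant 60 is the conventional RRF k, and the function name is illustrative:

def search_hybrid(query, opts \\ []) do
  limit = Keyword.get(opts, :limit, 5)
  k = 60

  bm25 = search_bm25(query, limit: 20)
  dense = search_embeddings_cosine(query, limit: 20)

  [bm25, dense]
  |> Enum.flat_map(fn results ->
    # RRF: each result contributes 1 / (k + rank) from its list
    results
    |> Enum.with_index(1)
    |> Enum.map(fn {%{chunk: chunk}, rank} -> {chunk.id, chunk, 1.0 / (k + rank)} end)
  end)
  |> Enum.group_by(fn {id, _chunk, _score} -> id end)
  |> Enum.map(fn {_id, entries} ->
    {_, chunk, _} = hd(entries)
    score = entries |> Enum.map(fn {_, _, s} -> s end) |> Enum.sum()
    {chunk, score}
  end)
  |> Enum.sort_by(fn {_chunk, score} -> score end, :desc)
  |> Enum.take(limit)
end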

Do you still need a separate vector database?

Time for the blunt answer.

If you have:

  • millions of chunks, not billions
  • typical SaaS traffic, not TikTok scale
  • a team already comfortable with Postgres operations

then PostgreSQL with pgvector is usually enough to build and run a solid RAG system.

You get:

  • one backup story
  • one access control story
  • transactions between your business data and your RAG data
  • less operational noise

You pay for it with:

  • less exotic index structures than some specialized engines
  • some manual tuning if you want top tier performance

On the other hand, if you are planning:

  • hundreds of millions or billions of embeddings
  • crazy low latency under heavy concurrent load
  • multi tenant isolation that lives inside the vector engine itself

then yes, a dedicated vector database can still be a reasonable choice. You trade complexity for features and scale.

The point of this whole Postgres setup is not to pretend that vector databases never make sense. It is to make you run your own experiments before you add another box to your architecture diagram.

You now have everything you need to do that:

  • a migration that turns Postgres into a RAG store
  • an Elixir setup with pgvector wired in
  • clear SQL and Ecto patterns for BM25, dense vector search, and binary search

Load your real documents, run a few searches, measure latency, look at the quality of results, and only then decide if a separate vector database is worth the extra moving parts.
