At startup the app did a heavy ingest from MongoDB, transformed the documents, and inserted a read model into ETS; then the node switched over to normal Phoenix traffic. CPU dropped and the logs looked healthy, but RSS stayed high. Triggering GC by hand changed nothing. It looked like the VM was refusing to free memory.
That is almost never what is actually happening. On the BEAM, the question that matters is: what is still referenced, and where did it become long lived?
What BEAM is actually doing with your memory
BEAM does not have a single global heap that gets cleaned up in one sweep. Each process has its own heap and is garbage collected independently, using a generational copying collector. This is why a single long lived process can hold on to a lot of memory even if everything else looks idle.
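You can look at those per process heaps directly. For example, for the current process (sizes are reported in words, not bytes):
Process.info(self(), [:heap_size, :total_heap_size, :garbage_collection])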
So I started by refusing to guess. I checked the VM’s own memory buckets:
:erlang.memory()
If :processes is big, you are looking for big heaps. If :ets is big, you stored a lot. If :binary is big, you are usually dealing with binary lifetimes, and that is where my real problem was.
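To eyeball those buckets without squinting at raw byte counts, a throwaway shell snippet (not project code) converts them to megabytes and sorts them:
:erlang.memory()
|> Enum.map(fn {bucket, bytes} -> {bucket, Float.round(bytes / 1_048_576, 1)} end)
|> Enum.sort_by(fn {_bucket, mb} -> mb end, :desc)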
The binary trap that keeps big buffers alive
Once :binary is dominant, you need one concept tattooed into your brain: sub binaries.
A sub binary is a slice that points into another binary. It is fast because it avoids copying, but it can keep the original parent binary alive. That parent might be a very large buffer that came from IO, decoding, or driver internals.
My pipeline made this easy to trigger. A driver decodes a large payload, I extract smaller binary fields, and then I store those values somewhere long lived. If those fields are sub binaries, they can pin the larger buffer. At that point GC is doing its job correctly: the data is not garbage, because I am still holding references to it.
That is why “I forced GC and nothing happened” is a common report in binary retention cases.
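You can reproduce the effect in a shell with nothing from the real pipeline: build a large binary, slice it, and compare byte_size/1 with :binary.referenced_byte_size/1.
big = :binary.copy("x", 10_000_000)
<<slice::binary-size(100), _rest::binary>> = big

byte_size(slice)                      # 100
:binary.referenced_byte_size(slice)   # 10_000_000, the slice still pins the parent

copied = :binary.copy(slice)
:binary.referenced_byte_size(copied)  # 100, the parent is now free to be collected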
Proving who is holding the binaries
Before changing code, I wanted a concrete owner.
The quickest proof is to ask every process what binaries it references, sum them, and sort. This does not require any external tooling and is often enough to identify the guilty processes:
Process.list()
|> Enum.map(fn pid ->
  info = Process.info(pid, [:registered_name, :memory, :binary, :message_queue_len]) || []
  # :binary is a list of {binary_id, size, refc_count} tuples for the refc binaries this process holds
  bins = Keyword.get(info, :binary, [])
  bin_bytes = Enum.reduce(bins, 0, fn {_id, size, _refc}, acc -> acc + size end)

  %{
    pid: pid,
    name: Keyword.get(info, :registered_name),
    memory: Keyword.get(info, :memory, 0),
    binary_bytes: bin_bytes,
    queue: Keyword.get(info, :message_queue_len, 0)
  }
end)
|> Enum.sort_by(& &1.binary_bytes, :desc)
|> Enum.take(20)
In my case, the top offenders were exactly the processes involved in loading and the processes that received full documents and kept them around.
When you need a stronger lens, recon is designed for this kind of diagnosis and includes bin_leak/1, which forces GC and observes how many binary references are released per process. It is not magic, it is a measurement tool that makes binary retention visible.
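With recon in the release, the call itself is a one liner; it garbage collects every process and reports the ones whose binary reference count dropped the most (top 10 here):
:recon.bin_leak(10)
# => a list of {pid, delta_in_binary_refs, process_info} tuples, biggest drops first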
The real fix: copy binaries at the boundary where data becomes long lived
My hard boundary was ETS. Once a term goes into ETS, it is meant to live. If I accidentally insert sub binaries into ETS, I might be keeping huge parent buffers alive for the lifetime of the cache.
So I forced binaries to become standalone at the moment they crossed that boundary. The heart of the fix was simple: :binary.copy/1.
Here is the compacting function I used (simplified). The goal is to walk nested terms and copy binaries, while leaving structs alone:
def compact_binaries(term) when is_binary(term), do: :binary.copy(term)

def compact_binaries(list) when is_list(list),
  do: Enum.map(list, &compact_binaries/1)

def compact_binaries(map) when is_map(map) and not is_struct(map) do
  Map.new(map, fn {k, v} -> {compact_binaries(k), compact_binaries(v)} end)
end

def compact_binaries(tuple) when is_tuple(tuple) do
  tuple
  |> Tuple.to_list()
  |> Enum.map(&compact_binaries/1)
  |> List.to_tuple()
end

def compact_binaries(other), do: other
I applied it at two points that matter:
- right before inserting transformed values into ETS (sketched just below)
- right before sending full document payloads to long lived processes via message passing
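As a sketch of the first boundary (the :boats table shows up again further down; cache_boat/2 itself is an illustrative helper, not the real code), the copy happens right where the term crosses into ETS:
def cache_boat(id, doc) do
  :ets.insert(:boats, {id, compact_binaries(doc)})
end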
After that change, :binary no longer plateaued after load. The memory was released because the large transient buffers from decoding were no longer pinned by the tiny slices I had saved for later.
Why GC tweaks did not help until after the structural fix
This is the part that saves you days.
BEAM GC is per process and it only frees unreachable data. If you store a binary in ETS or in a GenServer state, it is reachable. If that binary is a sub binary, the parent is also reachable. Manually calling GC does not change the fact that you still hold references.
Only after breaking references did GC knobs become useful, and only for a few specific long lived processes that had a predictable “big burst then mostly idle” lifecycle.
Two knobs were worth keeping in my toolbox (both appear in the sketch after this list):
- :erlang.process_flag(:fullsweep_after, N) for processes that allocate heavily and then live forever, so old garbage is swept more often (at some CPU cost)
- returning :hibernate from a GenServer after a known heavy phase, so the process can shed a bloated heap once it is truly done with that temporary data
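A minimal sketch of both knobs in one long lived GenServer; the module name and load_all/0 are placeholders for the real ingest code:
defmodule ReadModel.Loader do
  use GenServer

  def start_link(opts), do: GenServer.start_link(__MODULE__, opts, name: __MODULE__)

  @impl true
  def init(opts) do
    # this process allocates heavily and then lives forever: full sweep more often
    :erlang.process_flag(:fullsweep_after, 10)
    {:ok, %{opts: opts, docs: nil}}
  end

  @impl true
  def handle_cast(:ingest, state) do
    {:noreply, %{state | docs: load_all()}}
  end

  def handle_cast(:ingest_done, state) do
    # heavy phase over: drop the temporary data and shed the bloated heap
    {:noreply, %{state | docs: nil}, :hibernate}
  end

  defp load_all, do: []  # stand-in so the sketch compiles
end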
The part that stayed high on purpose: ETS
After fixing binary pinning, memory did not drop to “small”, because ETS was now the honest owner of the read model.
ETS reports memory in words, so I convert it using the runtime word size:
word_size = :erlang.system_info(:wordsize)
ets_bytes = (:ets.info(:boats, :memory) || 0) * word_size
ets_mb = ets_bytes / (1024 * 1024)
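The same conversion, looped over every table, shows which ones dominate. A quick shell snippet (nothing project specific; the is_integer check covers tables that disappear between the two calls):
word_size = :erlang.system_info(:wordsize)

:ets.all()
|> Enum.map(fn table ->
  words = :ets.info(table, :memory)
  {table, if(is_integer(words), do: words * word_size, else: 0)}
end)
|> Enum.sort_by(fn {_table, bytes} -> bytes end, :desc)
|> Enum.take(10)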
At that point optimization is not about GC. It is about what you store. I reshaped the cached payload to match read paths, and stopped storing document shaped blobs when only a small subset was used.
Monitoring: what I use in development and what I trust in production
In development, I want maximum visibility, even if the tooling is heavy. In production, I want low overhead, remote friendly tools, and metrics that let me detect the same class of regression early.
Development monitoring
Observer is still the fastest way to build intuition. It lets you see process memory, ETS tables, binary usage, message queues, and schedulers in one place. I use it when I want to confirm a hypothesis quickly: is memory dominated by ETS, by binaries, or by a few bloated processes.
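Starting it from IEx is one call, as long as the :observer, :wx, and :runtime_tools applications are available in the build:
:observer.start()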
When I need to understand GC behavior instead of guessing, I trace garbage collection events for a specific PID and watch it behave under load:
pid = self()
:erlang.trace(pid, true, [:garbage_collection, :monotonic_timestamp])

# run the code path that allocates heavily
# then turn tracing off
:erlang.trace(pid, false, [:garbage_collection])
That gives me timestamps for GC start and end events, and it prevents the classic mistake of assuming “GC is not running” just because RSS is flat.
Production monitoring
In production I avoid GUI based tools unless I am on a controlled debugging session. My baseline is metrics plus targeted snapshots.
First, I lean on Phoenix Telemetry and the Telemetry supervisor. Phoenix ships with a Telemetry setup and can poll VM metrics on an interval via :telemetry_poller. That gives you continuous signals instead of one off shell pokes.
A simple pattern is to attach a poller measurement that emits VM memory buckets periodically, then export them to your metrics backend:
# in your telemetry supervisor
children = [
  {:telemetry_poller,
   measurements: [
     {__MODULE__, :vm_memory, []}
   ],
   period: :timer.seconds(10)}
]

def vm_memory do
  # :erlang.memory/0 returns a keyword list; :telemetry.execute/3 expects a map of measurements
  mem = Map.new(:erlang.memory())
  :telemetry.execute([:vm, :memory], mem, %{})
end
Once you have [:vm, :memory] events, you can turn them into metrics (Prometheus, StatsD, OpenTelemetry) and alert on trends like binary growth, ETS growth, or processes growth.
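Assuming a Telemetry.Metrics based reporter (the Prometheus, StatsD, and OpenTelemetry exporters all consume the same definitions), the buckets from that event map onto metric declarations like these:
import Telemetry.Metrics

# merged into the list returned by the telemetry module's metrics/0
def metrics do
  [
    last_value("vm.memory.total", unit: {:byte, :megabyte}),
    last_value("vm.memory.binary", unit: {:byte, :megabyte}),
    last_value("vm.memory.ets", unit: {:byte, :megabyte}),
    last_value("vm.memory.processes", unit: {:byte, :megabyte})
  ]
end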
Second, when I need process level truth in production, I reach for recon and observer_cli. recon’s bin_leak/1 is especially useful when binary memory is high and you want the top processes that release binary references after GC. observer_cli packages this kind of diagnosis into a production friendly interface.
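On a release node that usually means attaching a remote shell first and starting the TUI from there; the release name below is a placeholder:
# $ bin/my_app remote
:observer_cli.start()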
The practical workflow looks like this:
- dashboards tell me which bucket is growing (binary, ETS, processes)
- if it is binary, I run a quick process snapshot or bin_leak/1 to identify owners
- then I look for the boundary where data becomes long lived (ETS insert, GenServer state, message queue backlog)
- I fix references first, and only then consider tuning GC flags for a small number of processes
That workflow is what kept this problem from returning later as the codebase evolved.