This is the second article in the Spectre series. The first one introduced Spectre Kinetic, the small planning organ that lets a model express intent while Elixir decides what reality accepts. Now we move from action to perception, because an agent that can use tools but cannot see the world is just a very expensive monk in a dark room.
The Browser Is Not the Interface Agents Need
Humans need browsers. We need buttons, scrollbars, tabs, tiny cookie banners, and that little moment of shame when we click “accept all” because lunch is in six minutes. Agents do not need a browser in that human sense. They need perception.
That sounds poetic, so let’s drag it back into engineering before someone starts a keynote. Perception means the agent needs to know what is on the page, what changed, what can be clicked, what can be submitted, what is probably important, and what is decorative nonsense wearing a CSS gradient.
The usual approach is to throw raw HTML, a DOM dump, or a screenshot at the model and whisper, “Good luck, little transformer.” This is not architecture. This is hazing. The DOM was built for browsers, not for language models. It contains everything, which means it explains almost nothing. Navigation, forms, sidebars, tracking pixels, hidden inputs, hydration leftovers, marketing fluff, ten layers of divs named nothing. Beautiful. Useless. Haunted.
Spectre Lens exists because agents should not have to lick the raw DOM to understand a page. They need a lens. Something that turns browser state into agent-readable context without pretending the model is secretly Chrome with opinions.
After Kinetic, Lens Is the Next Organ
Spectre is not one giant agent framework with a cape and a personality disorder. It is a series of focused libraries. Kinetic handles tool planning. Lens handles browser perception. Later pieces can handle memory, orchestration, and whatever other production demon starts chewing through the floorboards.
That separation matters. Tool planning is not browsing. Browsing is not memory. Memory is not execution. When agent frameworks blend all of that into one glowing abstraction, the demo looks smooth and the production system starts making little clicking sounds at night.
Spectre Lens is the part that says: the agent does not need a “browser automation API” first. It needs a readable view of the page. It needs actions it can refer to. It needs forms, links, semantic structure, screenshots, page maps, and exported artifacts. It needs to ask, “What does this page mean?” before it asks, “What can I click?”
Browser automation clicks buttons. Spectre Lens asks what the button means.
That is the split. Automation is motion. Perception is understanding enough to move without immediately walking into a glass door.
Lightpanda Drives, Spectre Lens Translates
Spectre Lens currently controls Lightpanda through CDP, which is the Chrome DevTools Protocol. CDP is the wire-level machinery browsers use for inspection and automation. Useful? Absolutely. Something you want your agent thinking about directly? Please no. That is like teaching a child to make toast by handing them the electrical diagram of the kitchen.
The public contract is not “here is some CDP, enjoy the wires.” The public contract is SpectreLens.Protocol: page views, actions, exports, page maps, watchers, and agent context. Lightpanda is the current driver. The lens is the idea. That distinction keeps the system clean. Drivers can change. Perception should stay stable.
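One way to picture "drivers can change, perception should stay stable" is an Elixir behaviour. What follows is a minimal sketch under loud assumptions, not how SpectreLens is built internally: LensSketch, LensSketch.Driver, LensSketch.FakeDriver, and the fetch_html callback are all hypothetical names invented for illustration.

```elixir
defmodule LensSketch.Driver do
  @moduledoc "Hypothetical driver contract; not the real SpectreLens internals."

  @callback fetch_html(url :: String.t()) :: {:ok, String.t()} | {:error, term()}
end

defmodule LensSketch.FakeDriver do
  @moduledoc "Stand-in driver that returns canned HTML instead of talking to a browser."
  @behaviour LensSketch.Driver

  @impl true
  def fetch_html("https://example.com"), do: {:ok, "<h1>Example Domain</h1>"}
  def fetch_html(_other), do: {:error, :not_found}
end

defmodule LensSketch do
  @moduledoc "The stable, agent-facing side: callers never see which driver ran."

  # Ask whichever driver was passed in for HTML, then return a shaped view.
  def look(driver, url) do
    with {:ok, html} <- driver.fetch_html(url) do
      {:ok, %{url: url, text: strip_tags(html)}}
    end
  end

  # Crude tag stripping, just enough for the illustration.
  defp strip_tags(html) do
    html
    |> String.replace(~r/<[^>]+>/, "")
    |> String.trim()
  end
end
```

Swapping the fake driver for one that speaks CDP would not change what LensSketch.look/2 returns to the caller, which is the whole point of keeping the wires behind the contract.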
Here is the basic ritual, and notice how boring it feels. Boring is good. Boring means the agent gets a clean view instead of a haunted pile of browser internals.
{:ok, lens} = SpectreLens.open(instances: 2)
{:ok, tab} = SpectreLens.new_tab(lens, url: "https://example.com")
{:ok, view} =
  SpectreLens.look(tab,
    include: [:markdown, :semantic_tree, :interactive, :forms, :links, :structured_data]
  )
view.markdown
view.actions
view.llms_context
This is the important part: the agent does not receive “the browser.” It receives a shaped view. Markdown for readable content. Actions for possible interaction. LLM context for the parts of the page that want to speak machine. The model does not need to parse your frontend’s emotional damage. Lens already filtered the room.
Page Maps Are Where It Gets Delicious
Raw page content is useful, but pages are spatial. A pricing section is not just text. It is a region. A contact form is not just inputs. It is an intention trap politely asking for your email. A navigation bar is not just links. It is the page saying, “Here are the doors.”
Spectre Lens gives you page maps through zoom_out and zoom_in. That sounds fancy, but the practical value is simple. The agent can ask for the big picture, focus on one region, then return to the full page. Like a human scanning a website, except without the caffeine and questionable posture.
{:ok, map} = SpectreLens.zoom_out(tab)
map.description
{:ok, focused} = SpectreLens.zoom_in(tab, "#contact")
A zoomed-out map can describe the page in words: navigation at the top, hero section first, content in the middle, forms near the bottom, footer after that. A zoom-in can focus on something like "#contact" or "#pricing". This matters because agents do better when context is shaped around the task instead of dumped into the prompt like a storage unit after a breakup.
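To make the zoom idea concrete, here is a toy model of a page map. It is a sketch, not the library's data structure: PageMapSketch and the region fields (selector, top, summary) are assumptions made up for this example, and the real zoom_out and zoom_in work against a live tab rather than a list.

```elixir
defmodule PageMapSketch do
  @moduledoc "Hypothetical region model; the real page map structure may differ."

  # Describe regions from top to bottom, one short line each.
  def zoom_out(regions) do
    regions
    |> Enum.sort_by(fn region -> region.top end)
    |> Enum.map_join("\n", fn region -> "#{region.selector}: #{region.summary}" end)
  end

  # Return only the region matching the selector, or an error the agent can read.
  def zoom_in(regions, selector) do
    case Enum.find(regions, fn region -> region.selector == selector end) do
      nil -> {:error, {:region_not_found, selector}}
      region -> {:ok, region}
    end
  end
end

regions = [
  %{selector: "#contact", top: 900, summary: "contact form with email field"},
  %{selector: "nav", top: 0, summary: "navigation links"},
  %{selector: "#pricing", top: 400, summary: "three pricing tiers"}
]

IO.puts(PageMapSketch.zoom_out(regions))
```

The zoomed-out string is small enough to sit in a prompt, and the zoomed-in region is the only thing the agent needs when the task is "fill in the contact form."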
Then there is goal-scoped discovery. This is the grown-up version of crawling. The agent has a goal, like “find API reference,” and Lens explores a small same-origin frontier, scores links, and returns compact candidates.
{:ok, discovery} = SpectreLens.discover(tab, goal: "api reference")
discovery.text
discovery.candidates
Not the whole internet. Not every page since the beginning of time. Just enough movement to find useful context without turning your runtime into a caffeinated spider.
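The article does not show how Lens scores links, so treat the following as a guess at the general shape rather than the library's algorithm. DiscoverySketch, the link maps, and the word-overlap score are all hypothetical; the only parts taken from the text are "same origin," "score," and "compact."

```elixir
defmodule DiscoverySketch do
  @moduledoc "Hypothetical link scorer; the real Lens scoring is not shown in this article."

  # Keep links on the page's own origin, score them by goal-word overlap,
  # drop zero-score links, and return at most `limit` candidates, best first.
  def candidates(page_url, links, goal, limit \\ 3) do
    page_origin = origin(page_url)
    goal_words = words(goal)

    links
    |> Enum.filter(fn link -> origin(link.href) == page_origin end)
    |> Enum.map(fn link -> Map.put(link, :score, score(link, goal_words)) end)
    |> Enum.filter(fn link -> link.score > 0 end)
    |> Enum.sort_by(fn link -> link.score end, :desc)
    |> Enum.take(limit)
  end

  # An origin is the scheme, host, and port of a URL.
  defp origin(url) do
    uri = URI.parse(url)
    {uri.scheme, uri.host, uri.port}
  end

  # One point for every goal word that appears in the link text or URL path.
  defp score(link, goal_words) do
    path = URI.parse(link.href).path || ""
    haystack = words(link.text <> " " <> path)
    Enum.count(goal_words, fn word -> word in haystack end)
  end

  defp words(text) do
    text
    |> String.downcase()
    |> String.split(~r/[^a-z0-9]+/, trim: true)
  end
end
```

The limit and the zero-score filter are what keep the frontier small: a link that shares nothing with the goal never becomes a candidate, and an off-origin link never gets scored at all.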
The Agent Can Act, But It Still Does Not Own Reality
Eventually perception wants movement. The agent sees a search field. It sees a submit button. It wants to fill and click. Fine. Let it ask. Let Lens translate that into browser actions. But do not confuse clicking with wisdom. A button press is still a consequence wearing a friendly border radius.
:ok = SpectreLens.act(tab, {:fill, ref: "#q", value: "spectre"})
:ok = SpectreLens.act(tab, {:click, ref: "button[type=submit]"})
The nice thing here is that actions are explicit. Fill this reference. Click this reference. The model does not need to invent a browser session in its head. The runtime has a tab. The tab has a page. The page has references. The agent proposes movement, and the library performs the boring mechanics with actual state.
Spectre Lens also supports exports, because production systems need evidence, not just confidence. You want the screenshot. You want the markdown. You want artifacts you can save, debug, inspect, attach, or compare later when the agent says, “I definitely saw the button,” and you need to know whether the button was real or just transformer fan fiction.
{:ok, "screenshots/example.png"} =
  SpectreLens.export(tab, :screenshot, path: "screenshots/example.png")
:ok = SpectreLens.close(lens)
And yes, close the lens. We are not animals. Processes deserve endings. Browsers deserve supervision. Your laptop deserves mercy.
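If closing matters, it should happen even when the work in the middle blows up. A small helper built on try/after does that. WithLens is a hypothetical name, and the open and close steps are passed in as functions so the sketch runs without a browser; in real code they would presumably wrap the SpectreLens.open and SpectreLens.close calls shown above.

```elixir
defmodule WithLens do
  @moduledoc "Hypothetical helper: run some work and always close what was opened."

  # `open` must return {:ok, lens}. `close` receives that same lens.
  # The `after` block runs whether `work` returns normally or raises.
  def run(open, close, work) do
    {:ok, lens} = open.()

    try do
      work.(lens)
    after
      close.(lens)
    end
  end
end

# Stand-in functions so the example runs on its own.
open = fn -> {:ok, :fake_lens} end
close = fn lens -> IO.puts("closed #{inspect(lens)}") end

WithLens.run(open, close, fn _lens -> :looked_at_the_page end)
```

In a long-running application a supervisor is the more natural owner of that lifecycle; the helper is the small-script version of the same mercy.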
The Runtime Should Be Calm Even When the Page Is Cursed
Lens gives the agent eyes, but not the illusion that eyes are the same thing as judgment. This distinction matters. A model can observe, describe, choose, and suggest. Your application still decides what is allowed, what is logged, what is retried, and what gets blocked before it becomes a support ticket with screenshots.
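"Your application still decides" can be as plain as a function that sits between the model's proposal and the call that performs it. This is a sketch of that gate, assuming the tuple action shape from the earlier examples; ActionGate and its allowlist are hypothetical and not part of Spectre Lens.

```elixir
defmodule ActionGate do
  @moduledoc "Hypothetical policy check that runs before any proposed action is executed."

  # References the application is willing to let the agent touch.
  @allowed_refs ["#q", "button[type=submit]"]

  # Known action shapes are allowed only when their reference is on the list.
  def review({kind, opts} = action) when kind in [:fill, :click] and is_list(opts) do
    ref = Keyword.get(opts, :ref)

    if ref in @allowed_refs do
      {:allow, action}
    else
      {:block, {:ref_not_allowed, ref}}
    end
  end

  # Anything that is not a known action shape is rejected outright.
  def review(other), do: {:block, {:unknown_action, other}}
end

IO.inspect(ActionGate.review({:fill, ref: "#q", value: "spectre"}))
IO.inspect(ActionGate.review({:click, ref: "#delete-account"}))
```

Only an {:allow, action} result would be handed on to the code that actually performs the action; a {:block, reason} result is something to log and feed back to the model as a refusal it can reason about.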
There is also support for llms.txt and llms-full.txt, which is wonderfully practical. If a site exposes agent-oriented documentation, Lens can discover it and include that context during a look call. That means an agent can read the page and also receive the site’s own “please understand me this way” documentation. Finally, a website saying something useful to machines instead of just yelling at Lighthouse scores.
And when something goes wrong, Lens should return agent-friendly errors at public API edges instead of turning the whole thing into a crash ritual. Element not found? Say that. Unsupported export? Say that. Retryable? Hint available? Operation target known? Good. That is the difference between an agent that can recover and an agent that falls over because a button moved three pixels to the left and spiritually betrayed it.
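Here is roughly what an agent-friendly error could carry, based on the questions in the paragraph above: what failed, on which target, whether a retry makes sense, and a hint. The struct, its field names, and the hint strings are assumptions for illustration; the library's real error shape is not shown in this article.

```elixir
defmodule LensErrorSketch do
  @moduledoc "Hypothetical error shape; field names are illustrative, not the library's."

  defstruct [:reason, :operation, :target, :hint, retryable?: false]

  # A missing element is often transient: the page may have re-rendered.
  def new(:element_not_found, operation, target) do
    %__MODULE__{
      reason: :element_not_found,
      operation: operation,
      target: target,
      hint: "Take a fresh look at the page and pick a current reference.",
      retryable?: true
    }
  end

  # An unsupported export will fail the same way every time.
  def new(:unsupported_export, operation, target) do
    %__MODULE__{
      reason: :unsupported_export,
      operation: operation,
      target: target,
      hint: "Pick a different export type.",
      retryable?: false
    }
  end

  # What the agent loop does next depends only on the retry flag.
  def next_step(%__MODULE__{retryable?: true}), do: :look_again
  def next_step(%__MODULE__{retryable?: false}), do: :give_up
end
```

With a shape like this, the agent loop can branch on data instead of parsing a stack trace, and the hint gives the model something better to say than "it broke."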
This is the Spectre pattern again. Let the model be expressive. Let Elixir be boring. Kinetic makes tool intent inspectable. Lens makes browser state legible. One handles action planning. The other handles perception. Together they make an agent less like a raccoon with API keys and more like a small, supervised creature that can read the room before touching anything expensive.