
Embeddings in Advertising.

Vector representations are becoming the semantic infrastructure layer for agentic advertising, contextual targeting, lookalikes, clean rooms, and cross-media decisioning.

Embeddings turn words, pages, shows, ads, users, households, products, and behaviors into numerical vectors that can be compared by similarity. In advertising, that means audiences, contexts, creatives, and signals can be matched by meaning — not only by IDs, keywords, or fixed taxonomies.


Objects become vectors; vectors drive advertising decisions — under a governance rail.

Embeddings are not a privacy shortcut. They are a semantic matching layer that still needs governance, provenance, consent, evaluation, and standards.

Fast read

What it is: A topic hub for embeddings, vectors, semantic similarity, and their role in advertising infrastructure.
Best for: AdTech, MarTech, data, clean-room, AI, DSP, SSP, publisher, measurement, and product leaders trying to understand how embeddings change targeting, activation, and agentic workflows.
Core idea: Embeddings make similarity computable. Advertising can use that for audiences, contexts, creative, signals, and measurement.
Main risk: Treating embeddings as anonymous, portable, or explainable by default.
Where it connects: Agentic Audiences / UCP, AdCP, Signal Containerization, Enterprise Data Collaboration, Semantic Infrastructure, DSP / Agentic Buying, and BI / MMM.
Best next read: Embeddings: The Next Frontier in Advertising?

In one view

Embeddings in one view.

The simplest way to understand embeddings: they turn messy real-world objects into vectors so machines can compare similarity at scale.

Near does not mean identical. Far does not mean irrelevant. Similarity is useful only when the training data, model, and objective are understood.

Definition

What embeddings are.

An embedding is a vector representation of an object — text, an image, an audio signal, a user profile, a content page, a product, a show, a household behavior pattern, or a campaign. Similar objects tend to sit closer together in the embedding space, which makes similarity computable.

Plain English

Embedding = meaning compressed into numbers.

Technical

A learned vector representation where distance or angle can approximate relatedness, similarity, or context — depending on the model and training objective.

Embeddings are not always interpretable dimension by dimension. They are useful because the overall geometry can preserve relationships — not because each coordinate has an obvious human label.

Concept	Plain meaning	Advertising example
Vector	list of numbers	household viewing profile encoded as numbers
Embedding space	where vectors are compared	sports fans cluster near sports content
Cosine similarity	angle-based similarity	creative and page context are close
Centroid	average point for a group	seed-audience summary
Nearest neighbor	closest items	most similar audiences / contexts
Translation layer	maps one vector space to another	publisher vectors to buyer vectors
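The concepts in the table can be illustrated with a minimal sketch. It uses toy 3-dimensional vectors and hypothetical names (`contexts`, `seed_audience`); real embeddings come from a trained model and usually have hundreds of dimensions.

```python
import math

def cosine_similarity(a, b):
    """Angle-based similarity in [-1, 1]; 1.0 means the same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def centroid(vectors):
    """Component-wise average of a group of equal-length vectors."""
    n = len(vectors)
    return [sum(component) / n for component in zip(*vectors)]

def nearest_neighbors(query, items, k=2):
    """Return the k (name, score) pairs most similar to the query."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in items.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]

# Toy vectors, invented for illustration only.
contexts = {
    "football_recap": [0.9, 0.1, 0.0],
    "tennis_preview": [0.8, 0.2, 0.1],
    "recipe_page": [0.0, 0.2, 0.9],
}
seed_audience = [[1.0, 0.0, 0.0], [0.8, 0.2, 0.0]]

seed_centroid = centroid(seed_audience)      # seed-audience summary
top = nearest_neighbors(seed_centroid, contexts, k=2)
print(top)
```

In this toy data the two sports contexts rank above the recipe page, which is the "sports fans cluster near sports content" row of the table in miniature.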

Why it matters

Why embeddings matter in advertising.

They make meaning computable
A buyer can move beyond exact keywords and fixed categories into semantic relationships.

They reduce dependence on brittle IDs
Where identity is limited, embeddings can support contextual, cohort, or signal-based similarity. They do not eliminate privacy obligations.

They connect different media surfaces
Text, video, audio, CTV viewing, app behavior, commerce signals, and creative assets can be represented in comparable forms, if the system is designed properly.

They support agentic workflows
Agents need objects they can discover, compare, activate, and evaluate. Embeddings help agents reason over meaning and similarity.

They improve retrieval and recommendation
Search, ranking, content matching, creative selection, and signal discovery all benefit from vector-based retrieval.

They create new governance questions
Vectors can encode sensitive patterns, inferred traits, and behavioral signals. They need policy, retention, access control, and evaluation.

Use cases

Core advertising use cases.

Lookalike audiences without direct ID matching

Seed a group of converters, watchers, buyers, or high-value users; represent their behavior as vectors; find similar users, contexts, or households where permitted.

Watch-out: Similarity is not consent. Validate data rights and avoid sensitive-inference risk.
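A minimal sketch of the lookalike flow described above, assuming toy 2-dimensional vectors and a hypothetical `permitted` flag that stands in for whatever data-rights check applies. The permission gate runs before any scoring, so similarity never substitutes for consent.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def centroid(vectors):
    return [sum(component) / len(vectors) for component in zip(*vectors)]

def lookalikes(seed_vectors, candidates, min_similarity=0.9):
    """Rank permitted candidates by similarity to the seed centroid."""
    center = centroid(seed_vectors)
    results = []
    for cand in candidates:
        if not cand["permitted"]:          # data-rights gate comes first
            continue
        score = cosine(center, cand["vector"])
        if score >= min_similarity:
            results.append((cand["id"], round(score, 3)))
    return sorted(results, key=lambda r: r[1], reverse=True)

# Hypothetical seed converters and candidate households.
seed = [[0.9, 0.1], [0.7, 0.3]]
candidates = [
    {"id": "hh_1", "vector": [0.8, 0.2], "permitted": True},
    {"id": "hh_2", "vector": [0.85, 0.15], "permitted": False},  # similar, but not usable
    {"id": "hh_3", "vector": [0.1, 0.9], "permitted": True},     # usable, but not similar
]
matches = lookalikes(seed, candidates)
print(matches)
```

Only `hh_1` is returned here: `hh_2` is close to the seed but excluded by the permission flag, and `hh_3` is permitted but falls below the similarity threshold.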

Contextual targeting 3.0

Match creative, page, video, app, show, or bid context by semantic similarity rather than only keyword match.

Watch-out: Contextual does not automatically mean low-risk if the context reveals sensitive categories.

Cross-media similarity

Represent web, app, CTV, audio, commerce, and content signals in ways that support comparison across media surfaces.

Watch-out: Vector spaces may not align without common models or translation layers.

Creative-to-context matching

Embed ad creative, landing pages, product metadata, and publisher content to choose better-fit creative or environments.

Watch-out: Creative fit should be evaluated against outcome and brand safety, not just proximity.

Signal discovery

Use embeddings to search signal catalogs by meaning — a natural-language brief can retrieve related audience, content, and contextual signals.

Watch-out: Returned signals need provenance, freshness, and policy metadata.
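The sketch below shows one way a brief could retrieve signals together with their provenance, freshness, and policy metadata. Everything in it is hypothetical: `toy_embed` is a word-count stand-in for a real embedding model (it matches shared vocabulary only, not meaning), and the `Signal` fields and catalog entries are invented for illustration.

```python
import math
from dataclasses import dataclass
from datetime import date

# Hypothetical toy vocabulary; a real system would call an embedding model instead.
VOCAB = ["sports", "football", "cooking", "recipes", "travel", "fans"]

def toy_embed(text):
    """Stand-in embedding: counts vocabulary words in the text."""
    words = text.lower().split()
    return [float(words.count(term)) for term in VOCAB]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

@dataclass
class Signal:
    name: str
    description: str
    provenance: str     # where the signal came from
    refreshed: date     # freshness
    policy: str         # what the signal may be used for

def discover(brief, catalog, k=2):
    """Return up to k (score, signal) pairs, best match first."""
    query = toy_embed(brief)
    scored = [(cosine(query, toy_embed(s.description)), s) for s in catalog]
    scored = [pair for pair in scored if pair[0] > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]

catalog = [
    Signal("sig_football", "football fans sports", "publisher first-party", date(2026, 5, 1), "contextual-only"),
    Signal("sig_cooking", "cooking recipes", "retail partner", date(2026, 1, 15), "aggregate-only"),
    Signal("sig_travel", "travel", "survey panel", date(2025, 11, 3), "contextual-only"),
]
results = discover("reach sports fans", catalog)
for score, signal in results:
    print(round(score, 3), signal.name, signal.provenance, signal.refreshed, signal.policy)
```

The design point is that each hit carries its metadata with it, so a downstream agent or human can check provenance and policy before anything is activated.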

Suppression and exclusion

Identify users, contexts, or inventory that are semantically misaligned with a campaign or likely to waste budget.

Watch-out: Wrong thresholds can suppress valuable reach or introduce bias.
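To make the threshold risk concrete, here is a small sketch with invented campaign-fit scores (for example, cosine similarity between each context and the brief). The names and numbers are hypothetical.

```python
def suppressed(scores, threshold):
    """IDs whose campaign-fit score falls below the threshold."""
    return sorted(item for item, score in scores.items() if score < threshold)

# Hypothetical fit scores per context.
fit_scores = {"ctx_a": 0.82, "ctx_b": 0.55, "ctx_c": 0.41, "ctx_d": 0.12}

for threshold in (0.2, 0.5, 0.7):
    dropped = suppressed(fit_scores, threshold)
    share = len(dropped) / len(fit_scores)
    print(f"threshold={threshold}: suppress {dropped} ({share:.0%} of inventory)")
```

Moving the threshold from 0.2 to 0.7 takes suppression from one context to three of four, which is why thresholds should be set against measured outcomes rather than picked by feel.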

Clean-room output interpretation

Compare aggregated outputs, audience descriptions, product categories, or campaign cohorts without exposing raw data.

Watch-out: Clean-room governance still applies; embeddings should not become an uncontrolled export channel.

Agentic activation

Agents can use embeddings to compare brief intent with available signals, contexts, inventory, and measurement outputs.

Watch-out: Agents still need permissions, output policy, human approval, and audit.

One representation layer; many advertising decisions around it.

Privacy

The privacy reality: embeddings are not anonymization by default.

Embeddings can reduce raw-data movement, but they do not automatically remove privacy risk. A vector can still encode patterns about people, households, content, behavior, or sensitive interests. Whether it is personal data depends on context, identifiability, linkage risk, and how the vector is used. Under EU/UK GDPR concepts, pseudonymized data is still personal data, and anonymization requires that re-identification is no longer reasonably likely — a high, evidence-based bar.

Claim	Better framing	Why it matters
Embeddings are anonymous	Embeddings may be lower-risk than raw data, but they are not anonymous by default.	They can still support inference, linkage, or profiling.
No IDs means no privacy issue	Privacy risk can exist without direct identifiers.	Behavioral vectors can still describe or single out people or households.
Vectors are safe to share	Vector sharing needs purpose limits, access control, retention rules, and re-identification-risk review.	Vectors can leak meaning or enable similarity matching across datasets.
Embedding equals pseudonymization	Embedding and pseudonymization are different concepts — and pseudonymized data is still personal data.	Both can still require GDPR / privacy compliance depending on context.
Contextual is always safe	Contextual targeting can still be sensitive when context reveals protected or sensitive categories.	Policy and brand safety still apply.

These are EU/UK GDPR concepts applied to embeddings by analogy; regulators have not ruled on embeddings specifically. US frameworks (CCPA/CPRA) and Apple's ATT are separate regimes — embeddings do not bypass any of them.
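The "no IDs means no privacy issue" claim can be tested with a short sketch. Two parties hold slightly noisy behavioral vectors for the same three households under unrelated row labels; the data and names are invented. A plain nearest-neighbor match is enough to link the rows.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def link_records(dataset_a, dataset_b):
    """For each vector in A, find the label of the most similar vector in B."""
    links = {}
    for key_a, vec_a in dataset_a.items():
        links[key_a] = max(dataset_b, key=lambda key_b: cosine(vec_a, dataset_b[key_b]))
    return links

# Hypothetical vectors: no shared identifier exists between the two datasets.
party_a = {
    "row_1": [0.90, 0.10, 0.30],
    "row_2": [0.10, 0.80, 0.20],
    "row_3": [0.20, 0.20, 0.90],
}
party_b = {
    "x": [0.12, 0.79, 0.22],
    "y": [0.21, 0.18, 0.91],
    "z": [0.88, 0.12, 0.31],
}
links = link_records(party_a, party_b)
print(links)
```

Every row is matched to its counterpart without any identifier being exchanged. This toy case is deliberately easy; real linkage risk depends on dimensionality, noise, and population size, which is why it needs an actual review rather than an assumption.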

Governance checklist

Source data rights
Consent / lawful basis
Purpose limitation
Sensitive-category review
Re-identification risk
Linkage risk
Retention policy
Vector deletion / update
Access control
Model provenance
Query audit
Output policy
Explainability + appeal path where needed
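One possible way to operationalize the checklist is a release gate that reports which items are still open for a given vector asset. The class and field names below are hypothetical; the check names mirror the list above.

```python
from dataclasses import dataclass, field

# Check names taken from the governance checklist above.
REQUIRED_CHECKS = (
    "source data rights",
    "consent / lawful basis",
    "purpose limitation",
    "sensitive-category review",
    "re-identification risk",
    "linkage risk",
    "retention policy",
    "vector deletion / update",
    "access control",
    "model provenance",
    "query audit",
    "output policy",
    "explainability + appeal path",
)

@dataclass
class VectorAsset:
    """Hypothetical record for an embedding dataset awaiting release."""
    name: str
    completed_checks: set = field(default_factory=set)

def missing_checks(asset):
    """Checklist items not yet signed off, in checklist order."""
    return [check for check in REQUIRED_CHECKS if check not in asset.completed_checks]

def release_allowed(asset):
    return not missing_checks(asset)

asset = VectorAsset("ctv_household_vectors", {"source data rights", "access control"})
print(release_allowed(asset), len(missing_checks(asset)))

asset.completed_checks.update(REQUIRED_CHECKS)
print(release_allowed(asset))
```

A boolean gate like this only records that a review happened. The substance of each review (for example, how re-identification risk was assessed) still has to live in the organization's own process.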

Privacy is a property of the whole pipeline, not of the vector alone.

Interoperability

The portability problem.

Embeddings are powerful inside one system. They get harder when multiple companies need to exchange or compare them: a vector from one model may not mean the same thing as a vector from another. Model choice, dimensionality, training data, normalization, and distance metric all matter — embedding spaces are usually not directly fungible, so translation layers, benchmark sets, and common standards may be needed.
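A small sketch of why normalization and distance metric matter: with the same query and the same two toy vectors (invented for illustration), dot product and cosine similarity pick different winners.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

query = [1.0, 1.0]
items = {
    "long_but_off_angle": [5.0, 0.0],   # large magnitude, 45 degrees from the query
    "short_but_aligned": [0.5, 0.5],    # small magnitude, same direction as the query
}

best_by_dot = max(items, key=lambda name: dot(query, items[name]))
best_by_cosine = max(items, key=lambda name: cosine(query, items[name]))
print(best_by_dot, best_by_cosine)
```

Dot product rewards magnitude, cosine ignores it. Two parties that exchange vectors without agreeing on the metric and on whether vectors are normalized can rank the same inventory differently.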

Problem	What breaks	Possible solution
Different model	vectors do not align	shared model, adapter, projection
Different dimensions	cannot compare directly	transformation / projection layer
Different training data	semantic drift	benchmark and calibration
Different objectives	similarity means different things	document objective and use case
Different privacy rules	unsafe exchange	output policy and governance
Different freshness	stale vectors	timestamp and refresh policy
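The first two rows of the table can be sketched as follows. The setup is an assumption made for illustration: the buyer's 2-dimensional space is the publisher's space rotated by 90 degrees, and `PUBLISHER_TO_BUYER` is a hand-written projection matrix. In practice such a mapping would have to be learned and validated against a shared benchmark set.

```python
import math

def cosine(a, b):
    if len(a) != len(b):
        raise ValueError("vectors have different dimensions and cannot be compared directly")
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def apply_map(matrix, vec):
    """Project a vector into another space with a linear map (one output per matrix row)."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]

# Hypothetical translation layer: a 90-degree rotation between the two spaces.
PUBLISHER_TO_BUYER = [
    [0.0, -1.0],
    [1.0, 0.0],
]

publisher_sports = [1.0, 0.0]   # "sports" in the publisher's space
buyer_sports = [0.0, 1.0]       # the same concept in the buyer's space

raw = cosine(publisher_sports, buyer_sports)
translated = cosine(apply_map(PUBLISHER_TO_BUYER, publisher_sports), buyer_sports)
print(raw, translated)
```

Compared directly, the two "sports" vectors look unrelated. After projection they align. A non-square matrix would handle differing dimensions the same way, but the mapping is only as good as the paired examples used to fit and calibrate it.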

Bridging two vector spaces: common model, translation layer, benchmark set, signal container, Agentic Audiences / UCP, or governance policy.

Agentic

Embeddings in agentic advertising.

Agents need more than natural language. They need a way to compare meaning across briefs, signals, contexts, creative, inventory, and outcomes. Embeddings can become the similarity layer that lets agents reason over advertising objects.

Buyer brief
Parse intent
Search signal catalog
Retrieve similar signals
Inspect provenance and policy
Activate signal
Monitor status
Evaluate outcome
Improve the next brief

Signal containerization packages embeddings with the missing layers: provenance, policy, activation path, allowed outputs, and evaluation logic.

Agent prompt → embedding query → signal discovery → governance check → activation → measurement → feedback.
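The flow above can be sketched as a single function with a governance check between retrieval and activation. This is not the AdCP or Agentic Audiences API; `SignalContainer`, `run_brief`, and the catalog are hypothetical, and the brief is assumed to be already embedded.

```python
import math
from dataclasses import dataclass

@dataclass
class SignalContainer:
    """Hypothetical package: a vector plus the metadata an agent must inspect."""
    name: str
    vector: list
    provenance: str
    allowed_uses: tuple

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def run_brief(brief_vector, intended_use, catalog, min_score=0.5):
    """Retrieve similar signals, apply policy, activate the first that passes, keep an audit log."""
    audit_log = []
    ranked = sorted(catalog, key=lambda s: cosine(brief_vector, s.vector), reverse=True)
    for signal in ranked:
        if cosine(brief_vector, signal.vector) < min_score:
            audit_log.append(("skipped_low_similarity", signal.name))
            continue
        if intended_use not in signal.allowed_uses:   # governance check before activation
            audit_log.append(("blocked_by_policy", signal.name))
            continue
        audit_log.append(("activated", signal.name))
        return signal.name, audit_log
    return None, audit_log

catalog = [
    SignalContainer("sig_a", [0.9, 0.1], "publisher first-party", ("measurement",)),
    SignalContainer("sig_b", [0.8, 0.3], "retail partner", ("targeting", "measurement")),
    SignalContainer("sig_c", [0.0, 1.0], "survey panel", ("targeting",)),
]
activated, audit_log = run_brief([1.0, 0.0], "targeting", catalog)
print(activated, audit_log)
```

The most similar signal (`sig_a`) is blocked because its policy does not allow targeting, so the agent activates the next best permitted one. Measurement, feedback, and human approval are left out of this sketch and would sit after activation.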

Standards

Standards and protocols.

Embeddings only become market infrastructure when systems agree on how they are represented, exchanged, governed, and evaluated.

Agentic Audiences / UCP

IAB Tech Lab's Agentic Audiences, formerly the User Context Protocol, defines how agents exchange identity, contextual, and reinforcement signals. It uses embeddings — officially described as dense vectors of 256–1024 dimensions. As of June 2026 it is an initial proposal / draft; validate the current version against official IAB Tech Lab documentation.

AdCP

A separate, non-IAB agentic workflow layer built on MCP (Model Context Protocol), covering discovery, activation, status, governance, creative, and media buying. Embeddings can support signal discovery and semantic matching inside AdCP-style workflows; AdCP does not mandate embeddings.

AAMP

IAB Tech Lab's broader agentic-advertising management initiative. It spans foundations (ARTF, in public comment), protocols (including Agentic Audiences), and trust (the Agent Registry), and builds on existing standards (OpenRTB, AdCOM, OpenDirect, Deals API).

Signal containerization

A practical way to package embeddings with semantic meaning, provenance, policy, activation logic, and evaluation.

OpenRTB / bidstream

Embeddings may support bidstream scoring, contextual alignment, or signal enrichment, but real-time use requires attention to latency budgets, governance, and protocol design.

Clean rooms

A governed collaboration environment. Analysis over vectors, and any outputs derived from it, still sit under clean-room policy.

Layer	Role	Embeddings connection
Agentic Audiences / UCP	signal exchange	embeddings as a compact signal representation
AdCP	workflow / tasks	discovery, activation, status, governance
AAMP	standards umbrella	agentic protocols, trust, runtime
Signal Containerization	product / operating model	packaging vectors with policy and activation
OpenRTB	real-time transaction	potential bidstream scoring / signal extension
Clean rooms	collaboration environment	governed output and vector-safe analysis

Operating model

Embedding infrastructure operating model.

A serious embedding program is not just a model call and a vector database. It needs data rights, model choice, storage, retrieval, governance, evaluation, and business ownership.

Source — content, behavior, CRM, campaign, CTV, app, commerce, creative, product metadata
Embedding — model, dimensionality, normalization, distance metric, version
Storage / retrieval — vector index, metadata, filters, freshness, deletion
Governance — consent, access, retention, sensitive category, output policy, audit
Activation — DSP, SSP, clean room, CDP, BI, agent, recommendation system
Evaluation — precision, recall, lift, relevance, bias, waste reduction, revenue outcome
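The storage and retrieval layer can be sketched as a tiny in-memory index with metadata filters, a freshness cutoff, and deletion. `TinyVectorIndex` is a hypothetical stand-in for a vector database, and the brute-force search is only suitable for illustration.

```python
import math
from datetime import date

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

class TinyVectorIndex:
    """In-memory stand-in for a vector database (illustrative only)."""

    def __init__(self, model_version):
        self.model_version = model_version   # vectors from different models should not share an index
        self._rows = {}

    def upsert(self, key, vector, metadata, refreshed):
        self._rows[key] = {"vector": vector, "metadata": metadata, "refreshed": refreshed}

    def delete(self, key):
        """Honor a deletion request; returns True if a row was removed."""
        return self._rows.pop(key, None) is not None

    def search(self, query, k=3, where=None, fresh_since=None):
        """Brute-force top-k search; metadata and freshness filters run before scoring."""
        hits = []
        for key, row in self._rows.items():
            if where and any(row["metadata"].get(f) != v for f, v in where.items()):
                continue
            if fresh_since and row["refreshed"] < fresh_since:
                continue
            hits.append((key, cosine(query, row["vector"])))
        return sorted(hits, key=lambda h: h[1], reverse=True)[:k]

index = TinyVectorIndex(model_version="toy-model-v1")   # hypothetical version label
index.upsert("show_1", [0.9, 0.1], {"surface": "ctv"}, date(2026, 5, 20))
index.upsert("page_1", [0.8, 0.2], {"surface": "web"}, date(2026, 5, 25))
index.upsert("show_2", [0.7, 0.3], {"surface": "ctv"}, date(2025, 12, 1))

fresh_ctv = index.search([1.0, 0.0], where={"surface": "ctv"}, fresh_since=date(2026, 1, 1))
index.delete("show_1")
after_delete = index.search([1.0, 0.0], where={"surface": "ctv"})
print(fresh_ctv, after_delete)
```

Keeping the model version on the index and the refresh date on each row is what makes later steps (deletion, re-embedding, staleness checks) possible at all.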

Source → embedding → retrieval → governance → activation → evaluation → feedback.
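For the evaluation step, a minimal sketch of three of the listed metrics, using their common textbook definitions. The retrieved list, relevance labels, and conversion rates are invented for illustration.

```python
def precision_at_k(retrieved, relevant, k):
    """Share of the top-k retrieved items that are relevant."""
    hits = sum(1 for item in retrieved[:k] if item in relevant)
    return hits / k

def recall_at_k(retrieved, relevant, k):
    """Share of all relevant items that appear in the top k."""
    hits = sum(1 for item in retrieved[:k] if item in relevant)
    return hits / len(relevant)

def lift(test_rate, control_rate):
    """Relative improvement of the test group over control (0.5 means +50%)."""
    return test_rate / control_rate - 1

# Hypothetical retrieval result and ground-truth labels.
retrieved = ["a", "b", "c", "d"]
relevant = {"a", "c", "e"}

print(precision_at_k(retrieved, relevant, 2))   # 1 of the top 2 is relevant
print(recall_at_k(retrieved, relevant, 4))      # 2 of 3 relevant items were found
print(lift(0.03, 0.02))                         # conversion rate: test vs control
```

Retrieval metrics say whether the vector layer finds the right things. Lift against a control group says whether finding them changed an outcome. A program usually needs both.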

Sources

Sources and validation.

Embeddings, privacy, and agentic standards evolve quickly. Validate official documentation, standards versions, and legal guidance before implementation.

Sources checked: 18 (primary and supporting)

Embeddings: The Next Frontier in Advertising? ↗ No Fluff writing

No Fluff Advisory · Evgeny Popov · checked 2026-06-07 · Primary

The originating essay — why vector representations matter for advertising and where they create value and risk. Supports: POV, Use cases.
AI Agents in Ads Need a "Common Language" ↗ No Fluff writing

No Fluff Advisory · Evgeny Popov · checked 2026-06-07 · Primary

Why agents need shared representations and protocols to interoperate across the advertising stack. Supports: Agentic framing.
Agentic Advertising Protocols: A Unified Map of What’s Next ↗ No Fluff writing

No Fluff Advisory · Evgeny Popov · checked 2026-06-07 · Primary

How AdCP, UCP / Agentic Audiences, and related efforts fit together as layers. Supports: Standards map.
Signal Containerization (essay) ↗ No Fluff writing

No Fluff Advisory · Evgeny Popov · checked 2026-06-07 · Primary

Packaging signals (including embeddings) with provenance, policy, activation path, and evaluation. Supports: Operating model, Governance.
AdCP — Advertising Context Protocol (reference) ↗ No Fluff reference

No Fluff Advisory · checked 2026-06-07 · Supporting

The No Fluff reference page for the AdCP agentic workflow layer. Supports: Standards.
IAB Agentic Standards (reference) ↗ No Fluff reference

No Fluff Advisory · checked 2026-06-07 · Supporting

The No Fluff reference page for AAMP, ARTF, Agentic Audiences, and the Agent Registry. Supports: Standards.
Vector embeddings (API guide) ↗ Official docs

OpenAI · checked 2026-06-07 · Primary

An embedding is a vector of floating-point numbers; distance measures relatedness; cosine similarity for retrieval; model dimensions (e.g. 1536 / 3072) are vendor-specific and adjustable. Supports: Definition, Cosine similarity, Dimensionality.
Embeddings — Machine Learning Crash Course ↗ Official docs

Google for Developers · checked 2026-06-07 · Primary

Embedding = vector representation in a lower-dimensional space; distance interpreted as relative similarity; word embeddings often 256 / 512 / 1024 dimensions. Supports: Definition, Embedding space, Dimensionality.
Understand embeddings (Azure OpenAI / Foundry) ↗ Official docs

Microsoft Learn · checked 2026-06-07 · Primary

A vector of floating-point numbers whose distance correlates with semantic similarity; cosine similarity often used; powers vector similarity search. Supports: Definition, Cosine similarity, Vector search.
What is Embedding? (Embeddings in ML explained) ↗ Official docs

Amazon Web Services · checked 2026-06-07 · Primary

Numerical representations of real-world objects learned via neural networks; text, image, and graph embeddings; cross-modal matching (text ↔ image). Supports: Definition, Modalities.
What is Embedding? / What is Vector Embedding? ↗ Context only

IBM · checked 2026-06-07 · Supporting

Embeddings learned from data; cosine / Euclidean / dot-product metrics; nearest-neighbor vector search; per-dimension features usually implicit, not human-labeled; text/image/audio + multimodal. Supports: Metrics, Vector search, Interpretability.
Guidelines 01/2025 on Pseudonymisation ↗ Privacy regulator

European Data Protection Board (EDPB) · checked 2026-06-07 · Primary

Pseudonymised data remains personal data and stays in scope; identifiability assessed on means reasonably likely to be used; singling-out and linkage are re-identification vectors. Supports: Pseudonymisation != anonymisation, Re-identification.
10 Misunderstandings related to Anonymisation ↗ Privacy regulator

AEPD + EDPS · checked 2026-06-07 · Primary

Anonymisation is not automatic and rarely zero-risk; pseudonymisation is not anonymisation; removing direct identifiers is masking only; inference of sensitive traits is possible. Supports: Anonymity bar, Overclaims to avoid.
Pseudonymous data: processing personal data while mitigating risks ↗ Privacy regulator

European Data Protection Supervisor (EDPS) · checked 2026-06-07 · Supporting

Pseudonymised data qualifies as personal data under the GDPR; pseudonymisation mitigates risk but does not remove obligations. Supports: Pseudonymisation status.
Agentic Audiences (formerly UCP) ↗ Official standards page

IAB Tech Lab · checked 2026-06-07 · Primary

Formerly UCP; donated by LiveRamp; encodes identity, contextual, and reinforcement signals as dense vectors officially described as 256–1024 dimensions; status is an initial proposal / draft. Supports: Agentic Audiences, Embeddings in standards.
agentic-audiences (GitHub) ↗ Official standards page

IAB Tech Lab · checked 2026-06-07 · Primary

README: "formerly the User Context Protocol"; "initial proposal"; embeddings encode identity/contextual/reinforcement signals; 256–1024 dims vs thousands of raw features. Supports: Status caution, Signal types.
Agentic Advertising and AI / AAMP ↗ Official standards page

IAB Tech Lab · checked 2026-06-07 · Primary

AAMP umbrella across foundations (ARTF — public comment), protocols (incl. Agentic Audiences), and trust (Agent Registry), built on OpenRTB / AdCOM / OpenDirect / Deals API + taxonomies. Supports: AAMP framing, Status.
Ad Context Protocol (AdCP) ↗ Official standards page

adcontextprotocol (project) · checked 2026-06-07 · Supporting

Separate, non-IAB agentic workflow layer over MCP (discovery, media buy, creative, signals activation); can use embeddings for signal discovery but does not mandate them. Supports: AdCP separation.

Platform capabilities and naming change quickly. Last validated: June 7, 2026. Check current documentation before implementation.

Next step

Building semantic infrastructure for advertising?

Embeddings become useful when they are connected to data rights, signal design, activation paths, governance, and outcome measurement. That is where the operating model matters.
