Day 2 — AI Text Generation & Customer Acquisition

AI & Marketing

Simon Blanchard

This Lecture

  • Part 1: Case — HubSpot & Motion AI
  • Part 2: How Large Language Models Work
  • Part 3: AI & Advertising

Case Prep Quiz

HubSpot & Motion AI: Chatbot-Enabled CRM (HBS #518-067)

Paper and pencil. Closed notes. 5 minutes.

Quizzes will be collected before discussion begins.

No makeups given.

Lowest quiz grade across all three cases is dropped.

This Lecture

  • Part 1: Case — HubSpot & Motion AI
  • Part 2: How Large Language Models Work
  • Part 3: AI & Advertising

Should HubSpot Replace Its Chat Reps with Chatbots?

The situation — September 2017

HubSpot acquires Motion AI.

Motion AI has built 80,000 bots for brands including T-Mobile, Kia, and Sony.

HubSpot’s own chat reps currently handle lead qualification and funnel conversion for its B2B sales process.

The question on the table:

Replace them — or not?

Pick a side.

Not “it depends.” Not “both.”

Yes — replace human chat reps with chatbots.

No — keep humans in the process.

We will add nuance in a moment. First, give me a position.

HubSpot’s Sales Funnel

The numbers

  • 100 million website visitors per year
  • 4% self-identify → 4 million leads
  • 40% qualify (score 7–10) → 1.6 million qualified leads
  • Chat reps convert 12.5% to demos → 200,000 opportunities
  • Salespeople close 20% → 40,000 customers

It takes 2,500 visitors to yield 1 customer.

Where would you put a bot?

Where would you never put one?

Walk me through your logic at each stage.

HubSpot’s Sales Funnel — Bot or Human at Each Stage?

The numbers

  • 100 million website visitors per year
  • 4% self-identify → 4 million leads
  • 40% qualify (score 7–10) → 1.6 million qualified leads
  • Chat reps convert 12.5% to demos → 200,000 opportunities
  • Salespeople close 20% → 40,000 customers

It takes 2,500 visitors to yield 1 customer.

Stage by stage: bot vs. human

🔝 ToFu — Attract & Identify: Bot advantage. Volume too high for humans. Customers not ready for salespeople. 24/7 availability can raise the 4% self-identification rate.

〰️ MoFu — Educate & Nurture: Mixed. Bots handle simple FAQs; handoff required as questions grow in complexity and specificity.

🔻 BoFu — Qualify & Close: Human advantage. 45-day consultative cycle. $2,400/month product. Bots do triage only — scheduling, pre-qualification.

Post-sale — Onboard & Retain: Contested. Bots reduce service cost. Humans may reduce churn. SaaS model makes getting this wrong expensive.

The Economics of Replacing Humans with Bots

Cost to acquire 1 customer — with human chat reps

Line item         Calculation                              Cost
Lead generation   $50/lead × 100 leads                     $5,000
Salesperson       $120K ÷ 12 months ÷ 5 customers/month    $2,000
Chat rep          $60K ÷ 12 ÷ 5 × 3 reps per salesperson   $3,000
Total                                                      $10,000

100 leads → 40 qualified → 5 demos → 1 customer

Cost to acquire 1 customer — with AI chatbot

Line item         Calculation            Cost
Lead generation   $50/lead × 100 leads   $5,000
Salesperson       Same as above          $2,000
Chatbot           One-time fixed cost    $0
Total                                    $7,000

Savings: $3,000 per customer acquired

Reinvested at $50/lead → 60 more leads → 0.48 incremental customers

Net result: 1.28 customers for the same spend

Despite a 20% drop in conversion rate (10% vs. 12.5%), bots produce a 28% gain in customers per dollar.
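The funnel arithmetic above can be checked with a short script (a sketch using the slide's numbers; the 10% bot conversion rate is the comparison assumption):

```python
# Funnel economics: human chat reps vs. chatbot (illustrative, per the slide).
QUALIFY_RATE = 0.40       # leads that score 7-10
DEMO_CLOSE_RATE = 0.20    # salespeople close 20% of demos
LEAD_COST = 50            # dollars per lead

def customers(leads, chat_conversion):
    """Customers acquired from a batch of leads at a given chat conversion rate."""
    return leads * QUALIFY_RATE * chat_conversion * DEMO_CLOSE_RATE

human_customers = customers(100, 0.125)            # 1.0 customer per 100 leads
bot_customers = customers(100, 0.10)               # 0.8 customers per 100 leads

savings = 3000                                     # chat-rep cost eliminated per customer
extra_leads = savings / LEAD_COST                  # 60 additional leads
bot_total = bot_customers + customers(extra_leads, 0.10)

print(human_customers, bot_total)                  # 1.0 vs 1.28
```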

In-Class Exercise — Customer Lifetime Value

The formula

\[CLV = m \times \frac{r}{1-r} - AC\]

(simplified — zero discount rate)

  • \(m\) = annual profit per customer
  • \(r\) = retention rate (1 − annual churn)
  • \(AC\) = acquisition cost per customer
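For reference, the simplified formula is the zero-discount-rate limit of the standard infinite-horizon CLV sum (a derivation sketch, consistent with the slide's convention that profit accrues only in retained years):

```latex
\[
CLV = m \sum_{t=1}^{\infty} \frac{r^{t}}{(1+d)^{t}} - AC
\quad\stackrel{d=0}{=}\quad
m \sum_{t=1}^{\infty} r^{t} - AC
= m \times \frac{r}{1-r} - AC
\]
```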

Acquisition costs from the previous slide:

Humans: \(AC = \$10{,}000\)

Bots: \(AC = \$7{,}000\)

Assumptions

Parameter                        Value     Source
Avg. annual revenue / customer   $11,660   $271M ÷ 23,226 customers (Exhibit 1)
Gross margin                     74%       B2B SaaS median benchmark
Annual profit / customer (m)     $8,628    $11,660 × 0.74
Annual churn rate                10%       B2B SaaS SMB benchmark
Retention rate (r)               0.90      1 − 0.10
Discount rate (d)                0         Simplified

Calculate:

  1. What is CLV with human chat reps?
  2. What is CLV with AI chatbots?

In-Class Exercise — Breakeven Questions

CLV answers from the previous slide

                                    Humans    Bots
\(m\) (annual profit/customer)      $8,628    $8,628
\(r/(1-r)\) (lifetime multiplier)   9.0       9.0
\(m \times r/(1-r)\)                $77,652   $77,652
Acquisition cost (AC)               $10,000   $7,000
CLV                                 $67,652   $70,652

The $3,000 difference is entirely the acquisition cost saving. Under identical assumptions, bots always win by exactly \(AC_\text{humans} - AC_\text{bots}\).
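The table's figures can be reproduced in a few lines (using the slide's rounded m = $8,628):

```python
def clv(m, r, ac):
    """Simplified CLV with zero discount rate: m * r/(1-r) - AC."""
    return m * r / (1 - r) - ac

m, r = 8_628, 0.90                        # annual profit per customer, retention rate

clv_humans = clv(m, r, 10_000)
clv_bots = clv(m, r, 7_000)
print(round(clv_humans), round(clv_bots)) # 67652 70652
```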

Now assume bot-acquired customers have weaker relationships.

Q1 — Retention breakeven

How much would annual churn need to rise — for bot-acquired customers only — before bot CLV equals human CLV?

Hint: set bot CLV = $67,652 (human CLV) and solve for \(r^*\)

Q2 — Margin breakeven

How much would annual profit per customer need to fall — for bot-acquired customers only — before bot CLV equals human CLV?

Hint: set bot CLV = $67,652 and solve for \(m - \Delta m\)

In-Class Exercise — Breakeven Answers

Q1 — Retention breakeven

Set bot CLV = human CLV = $67,652 and solve for \(r^*\):

\[m \times \frac{r^*}{1-r^*} - AC_\text{bots} = \$67{,}652\]

\[8{,}628 \times \frac{r^*}{1-r^*} - 7{,}000 = 67{,}652\]

\[8{,}628 \times \frac{r^*}{1-r^*} = 74{,}652\]

\[\frac{r^*}{1-r^*} = 8.652 \quad \Rightarrow \quad r^* = 0.896\]

\[\text{Churn} = 1 - 0.896 = \mathbf{10.4\%}\]

Churn only needs to rise 0.4 percentage points — from 10% to 10.4% — before the bot advantage disappears entirely.

Q2 — Margin breakeven

Set bot CLV = human CLV = $67,652 and solve for \(\Delta m\):

\[(m - \Delta m) \times \frac{r}{1-r} - AC_\text{bots} = \$67{,}652\]

\[(8{,}628 - \Delta m) \times 9.0 - 7{,}000 = 67{,}652\]

\[(8{,}628 - \Delta m) \times 9.0 = 74{,}652\]

\[8{,}628 - \Delta m = 8{,}295\]

\[\Delta m = \mathbf{\$333/\text{year}} \quad (3.9\%)\]

Annual profit only needs to fall $333 per customer per year — just 3.9% — before bots stop being worth it.
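Both breakevens can also be solved in closed form rather than by trial (same numbers as the slides):

```python
m, ac_bots, target = 8_628, 7_000, 67_652   # target = human CLV

# Q1: retention breakeven. m * x/(1-x) - ac_bots = target
# => x = k/(1+k), where k = (target + ac_bots)/m
k = (target + ac_bots) / m
r_star = k / (1 + k)
churn_star = 1 - r_star

# Q2: margin breakeven. (m - dm) * 9.0 - ac_bots = target
# => dm = m - (target + ac_bots)/9
dm = m - (target + ac_bots) / 9.0

print(round(churn_star * 100, 1), round(dm))  # 10.4 (% churn), 333 ($/year)
```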

The insight

Both breakevens are very small. The $3,000 acquisition saving is fragile on both dimensions.

The risk is not that bot customers leave sooner. It is that they spend less and cost more to serve while they stay.

The Bot Decision in One Slide

How Human Should the Bot Be? — Disclose or Conceal?

Three design decisions every company faces

1. Disclose or conceal?

Should customers know they are talking to a bot?

2. Brand voice or customer mirroring?

Should the bot speak in a consistent brand voice — or dynamically adjust its tone to match the customer?

3. Functional UI or conversational UI?

Get things done efficiently — or build a relationship through natural dialogue?

On disclosure — the uncanny valley

People prefer human-like bots — up to a point. When a bot that seems human suddenly fails, the reaction shifts from engagement to revulsion.

“The more human-like a system acts, the broader the expectations — and the broader the disappointments.”

In 2017, bots failed 70% of the time and could handle less than 20% of an interaction before handoff.

High human-likeness + high failure rate = the worst possible combination for trust.

Open question: Does disclosure reduce satisfaction — or does it protect it by calibrating expectations?

How Human Should the Bot Be? — Voice

Three design decisions every company faces

1. Disclose or conceal?

Should customers know they are talking to a bot?

2. Brand voice or customer mirroring?

Should the bot speak in a consistent brand voice — or dynamically adjust its tone to match the customer?

3. Functional UI or conversational UI?

Get things done efficiently — or build a relationship through natural dialogue?

On voice — a genuine tradeoff

Humans naturally mirror their conversational partners — it is a foundation of relationship building.

But mirroring a frustrated customer’s frustration back at them may amplify the problem.

LLMs can detect sentiment in real time and adjust tone dynamically. The capability now exists.

Open question: Does a bot that adapts its tone feel more relational — or more manipulative?

How Human Should the Bot Be? — UI

Three design decisions every company faces

1. Disclose or conceal?

Should customers know they are talking to a bot?

2. Brand voice or customer mirroring?

Should the bot speak in a consistent brand voice — or dynamically adjust its tone to match the customer?

3. Functional UI or conversational UI?

Get things done efficiently — or build a relationship through natural dialogue?

On UI — speed vs. warmth

B2B buyers are busy. They often just want the answer. But a purely functional UI is essentially a phone tree with a chat interface.

The key finding: customers want outcome speed, not conversation quality.

A bot that resolves in 47 seconds outperforms a bot that has a warm conversation for 3 minutes and fails.

Open question: Does conversational warmth improve outcomes — or does it just slow resolution down?

These are empirical questions. Your projects test them.

This Lecture

  • Part 1: Case — HubSpot & Motion AI
  • Part 2: How Large Language Models Work
  • Part 3: AI & Advertising

The Bot HubSpot Acquired in 2017

What Motion AI’s bots actually were

Rule-based, with limited ML.

A human coder scripted the decision tree. The bot followed it. When customer input did not match an anticipated path — it failed.

This is exactly the rule-based AI from Day 1.

Customer: "Any way to get a discount?"
Bot:       I didn't understand that.
           Press 1 for pricing
           Press 2 for a demo
           Press 3 for a rep

The problem was not the rules.

The problem was that language does not follow rules.

Infinite variation. Ambiguity. Context. Sarcasm. Slang.

No rule set can enumerate all the ways a customer can ask about a discount.
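The fragility is easy to demonstrate. A minimal sketch of a keyword-rule bot (the rules and replies are invented for illustration):

```python
# Hypothetical keyword-rule bot: every path must be scripted in advance.
RULES = {
    "pricing": "Our plans start at $800/month.",
    "demo": "I can schedule a demo for you.",
    "discount": "Ask about our annual-plan promotion.",
}

def rule_bot(message: str) -> str:
    for keyword, reply in RULES.items():
        if keyword in message.lower():
            return reply
    return "I didn't understand that. Press 1 for pricing, 2 for a demo, 3 for a rep."

print(rule_bot("Any way to get a discount?"))  # keyword match, scripted reply
print(rule_bot("Could you knock 20% off?"))    # same intent, no keyword, failure
```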

What changed

The chatbot in HubSpot’s 2024 product is not a decision tree.

It is a large language model — trained on billions of documents, capable of responding fluently to inputs no human ever explicitly scripted.

Why does it work where the 2017 bot failed?

The answer is not that someone wrote better rules.

The answer is that someone built a completely different kind of system — one that does not start with rules at all.

We are going to open that system and look inside.

Same framework as Day 1. Same six concepts. One new running example throughout.

The Framework We Will Fill In

The same six concepts from Day 1

On Day 1 we built a framework for understanding any AI system. We used two examples — the cat feeder and the Home Depot return chatbot — to trace how each concept maps onto a real deployed system.

Today we do the same thing for a large language model.

Same framework. New system. One running example throughout.

By the end of this section you will be able to trace exactly what happens to this sentence — from the moment it arrives to the moment the bot responds.

Our running example

A prospect messages HubSpot’s chatbot:

“Is there any way to get a discount before I commit to the annual plan?”

Day 1 concept This system
Perception ?
Representation ?
Model ?
Constraints ?
Algorithm ?
Action ?

Perception — Tokenization

The model does not read words.

It reads tokens — integer IDs representing chunks of text.

“Is there any way to get a discount before I commit to the annual plan?”

16 tokens · 70 characters

Each word or word-fragment maps to one integer in the model’s vocabulary. The model never sees letters.

Token ID 316 appears twice — once in “to get” and once in “to the.” Same ID. Completely different meaning. The model resolves that from surrounding tokens.

Framework — Perception row filled:

Day 1 concept This system
Perception Token IDs from the input message
Representation ?
Model ?
Constraints ?
Algorithm ?
Action ?

Perception — Token IDs

The model does not read words.

It reads tokens — integer IDs representing chunks of text.

“Is there any way to get a discount before I commit to the annual plan?”

16 tokens · 70 characters

[3031, 1354, 1062, 2006, 316, 717, 261, 11522, 2254, 357, 8737, 316, 290, 12355, 3496, 30]

The model never sees the word “discount.” It sees 11522.

Every operation from here is linear algebra: each ID is looked up in a learned embedding table and converted to a vector.
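The path from text to vectors can be sketched with a toy vocabulary (the IDs echo the slide's example; the 4-dimensional vectors are invented, and real models use vocabularies of roughly 100K tokens and vectors with thousands of dimensions):

```python
# Toy illustration: token IDs are just indices into a learned embedding table.
VOCAB = {"is": 3031, "there": 1354, "any": 1062, "discount": 11522}

# Hypothetical 4-dim embeddings (real models learn these during training).
EMBEDDINGS = {
    3031:  [0.12, -0.40, 0.05, 0.33],
    1354:  [-0.21, 0.08, 0.47, -0.02],
    1062:  [0.30, 0.11, -0.15, 0.09],
    11522: [-0.05, 0.52, 0.38, -0.27],
}

def encode(words):
    ids = [VOCAB[w] for w in words]           # perception: text -> integer IDs
    return ids, [EMBEDDINGS[i] for i in ids]  # representation: IDs -> vectors

ids, vectors = encode(["is", "there", "any", "discount"])
print(ids)  # [3031, 1354, 1062, 11522] -- the model never sees the letters
```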

Framework — Perception row filled:

Day 1 concept This system
Perception Token IDs from the input message
Representation ?
Model ?
Constraints ?
Algorithm ?
Action ?

Two Kinds of Representation

This distinction is almost always skipped. It causes more confusion than anything else.

The word “representation” appears twice in how LLMs work — in two completely different roles.

Role 1 — Training representation

During training, billions of sentences from the internet are processed. Text like:

“Customers who ask about discounts before committing to an annual plan are often in the final evaluation stage.”

This is what the model learns from. It builds the model’s knowledge of language, context, and meaning.

During training, the embeddings are being updated. The weights are changing with every correction.

Role 2 — Input representation (Perception)

During inference, the specific message arrives:

“Is there any way to get a discount before I commit to the annual plan?”

This is what the model is perceiving right now. It uses the knowledge built during training to interpret this input.

During inference, the embeddings are fixed. The weights are frozen.

The same mechanism, two different roles

             Training                Inference
What         Billions of sentences   Your specific message
Role         Build the model         Use the model
Weights      Updating                Frozen
Day 1 term   Training phase          Prediction phase

Why this matters for managers

The model does not know HubSpot’s specific discount policy. But it has seen millions of sentences about discounts and annual plans. It learned the statistical patterns of how those conversations unfold.

That is very different from knowing the actual policy.

Representation — Embeddings as Numbers

Each token ID is converted to a vector.

A vector is a list of numbers — a very long one.

For text-embedding-3-small, each piece of text is embedded as 1,536 numbers.

Here is what the embedding for “Limited-time offer on unbelievably good deals!” actually looks like:

"embedding": [
  -0.02082209,
  -0.0050799586,
  -0.058835678,
   0.017880306,
  -0.02006153,
   0.027121812,
   ...
  -0.0008462113,
   0.029016035,
  -0.0007385851,
   0.06859379,
   0.0150533235,
  -0.009506985,
   0.00739751,
   0.018597813
]

1,536 values for that one snippet. Not assigned by a human. Learned from billions of prediction-correction cycles.

What do these numbers mean?

Each number is a coordinate in a 1,536-dimensional space. The position encodes meaning — learned by predicting text billions of times, not defined by a human.

For our discount question:

  • discount lands near promo, coupon, offer, deal
  • commit lands near buy, subscribe, purchase
  • annual plan lands near subscription, contract

“Any discount available?”, “any promos?”, and “is there a deal?” all land in the same neighborhood.

The 2017 bot required a human to script each phrasing. The LLM learned the equivalence from data.
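The "neighborhood" idea is measurable with cosine similarity. A sketch with toy 3-dimensional vectors chosen by hand so that discount-like words cluster (real embeddings are learned and far longer):

```python
import math

# Hand-picked toy vectors: "discount" and "promo" point the same way; "plan" does not.
vecs = {
    "discount": [0.90, 0.10, 0.00],
    "promo":    [0.80, 0.20, 0.10],
    "deal":     [0.85, 0.15, 0.05],
    "plan":     [0.10, 0.20, 0.90],
}

def cosine(a, b):
    """Cosine similarity: dot product divided by the product of vector lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

print(cosine(vecs["discount"], vecs["promo"]))  # high: same neighborhood
print(cosine(vecs["discount"], vecs["plan"]))   # low: different meaning
```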

Framework — Representation row filled:

Day 1 concept This system
Perception Token IDs
Representation Learned embedding vectors
Model / Constraints / Action next slides

Representation — Self-Attention

A bag of vectors is not a sentence.

Each token has an embedding. But “not good” means something completely different from “good” — even though the individual embeddings for “not” and “good” are the same in both cases.

Self-attention is the mechanism that makes context matter.

For each token, the model computes: how much should every other token in this sentence influence my meaning?

For “discount” in our sentence:

Attends to Weight
“annual plan” 0.88
“commit” 0.72
“any way” 0.52
“before” 0.36
“Is” 0.10

“Discount” in the context of “annual plan commitment” means something specific — pre-purchase pricing inquiry.

Without attention: “discount” is just a vector. With attention: “discount in an annual plan commitment context” is a richer, context-aware vector.
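A single attention step can be sketched in a few lines (toy 2-dimensional vectors and a softmax over dot products; real transformers use separate learned query/key/value projections and many heads):

```python
import math

def softmax(xs):
    exps = [math.exp(x) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, context_vectors):
    """Blend context vectors, weighted by query-context dot products
    (simplified: keys and values are the same vectors)."""
    scores = [sum(q * k for q, k in zip(query, key)) for key in context_vectors]
    weights = softmax(scores)
    dim = len(query)
    blended = [sum(w * v[d] for w, v in zip(weights, context_vectors)) for d in range(dim)]
    return blended, weights

# Toy embeddings: "discount" attends over its context tokens.
context = [[1.0, 0.2], [0.9, 0.4], [0.1, 0.9]]   # e.g. "annual", "commit", "Is"
discount = [1.0, 0.3]

new_vec, weights = attend(discount, context)
print(weights)  # largest weight lands on the most related context token
```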

Why this matters for marketing language

“The plan is not discounted.” vs. “Is the plan discounted?”

Nearly the same tokens. Opposite meanings. Self-attention captures the difference because “not” attends strongly to “discounted,” inverting the semantic direction.

Tone, urgency, frustration, sarcasm — all live in the attention relationships between tokens.

“I guess I’ll just go with the monthly plan then.”

“Guess” and “just” signal reluctance and implicit downgrade intent. A model with strong self-attention reads this as a retention risk.

The 2017 bot had no equivalent mechanism.

How a Large Language Model Is Trained

Training — step by step

How an LLM Generates a Response

Inference — step by step

Framework Summary — The Raw LLM

The same six concepts. A new system.

The steppers just showed you all six rows — in order.

The key insight from training:

The model was never told what “discount” means or that “commit to annual plan” signals buying intent. All of it fell out of predicting text billions of times.

The algorithm forced the knowledge. The representation encoded it geometrically. The model locked it in.
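Next-token prediction can be caricatured with the simplest possible model: bigram counts over a tiny corpus. Real LLMs learn deep networks by gradient descent over billions of documents, but the principle is the same: predict the next token, update from errors.

```python
from collections import Counter, defaultdict

corpus = ("discounts on annual plans . customers ask about "
          "discounts before annual plans").split()

# "Training": count which token follows which (a stand-in for learned weights).
follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def predict(token):
    """Greedy next-token prediction from the counts."""
    return follows[token].most_common(1)[0][0]

print(predict("annual"))  # "plans" -- the pattern fell out of the data
```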

The raw LLM has no guardrails.

The context window limits how much the model can see at once. That is the only structural constraint the raw LLM has.

Nothing prevents it from inventing a discount code, quoting outdated pricing, or making promises that contradict policy.

Guardrails do not exist at the LLM level. They are a system-level addition. Which is exactly what we need to build next.

The completed framework — raw LLM

Day 1 concept Raw LLM
Perception Token IDs from the input message
Representation Learned embedding vectors
Model The system logic: perceive → embed → predict → generate
Constraints Context window only — no guardrails
Algorithm Next-token prediction + backpropagation
Action Generated text — one token at a time

Five things the raw LLM cannot do:

Limitation         Business consequence
Knowledge cutoff   Quotes expired promotions or wrong pricing
No memory          Prospect repeats themselves every session
Cannot act         “I’ve flagged this” — but nothing was logged
Hallucinates       Prospect expects a discount that does not exist
No private data    Cannot personalize to this prospect

The solution is not a smarter LLM. It is a better system around it.

Worked Example — The DC Tax Calculation

The prospect’s question

“What’s the final price after the 20% discount, with sales tax? I’m in DC.”

What a raw LLM generates:

“With a 20% discount the Professional plan would be $640/month. DC sales tax is 6%, bringing it to $678.40.”

Why this is a problem

  • Did the LLM calculate $800 × 0.80? Or did it predict the token “640” because that looks plausible after those inputs?
  • Is DC’s SaaS tax rate actually 6%? Training data may be outdated or wrong.
  • The LLM generates the same confident answer whether it is right or wrong.

The solution: give the system a calculator.

The Simple System in Action

Steps 1–3

1. Perception: Message + tool definition arrive. calculator(expression: string) → float

2. Representation: Context window assembled.

3. Model — LLM pass 1: Predicts a tool call:

{"tool": "calculator",
 "input": "800 * 0.80 * 1.06"}

Steps 4–6

4. Algorithm: Calculator runs: 800 × 0.80 × 1.06 = 678.4

5. Representation updated: [TOOL RESULT] calculator → 678.4

6. Model — LLM pass 2:

“With the 20% discount and DC’s 6% SaaS tax, your monthly cost would be $678.40.”

Correct because the calculator verified it.
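The two-pass loop can be sketched as follows (the JSON tool-call format and the `llm()` stub are hypothetical stand-ins; real systems use provider-specific function-calling APIs):

```python
import json

def calculator(expression: str) -> float:
    # Restricted eval: digits and arithmetic only (a real system would parse properly).
    assert set(expression) <= set("0123456789.+-*/() "), "unsafe expression"
    return eval(expression)

def llm(context: str) -> str:
    """Stub standing in for the model: pass 1 emits a tool call, pass 2 uses the result."""
    if "[TOOL RESULT]" not in context:
        return json.dumps({"tool": "calculator", "input": "800 * 0.80 * 1.06"})
    result = context.split("[TOOL RESULT] calculator -> ")[-1]
    return (f"With the 20% discount and DC's 6% SaaS tax, "
            f"your monthly cost would be ${float(result):.2f}.")

context = "What's the final price after the 20% discount, with sales tax? I'm in DC."
reply = llm(context)                  # pass 1: tool call predicted
call = json.loads(reply)
result = calculator(call["input"])    # algorithm: the calculator actually computes
context += f"\n[TOOL RESULT] calculator -> {result}"
print(llm(context))                   # pass 2: grounded answer, $678.40
```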

From Simple to Full — What Is Still Missing

The simple system solved arithmetic.

It did not solve the actual business problem.

The prospect asked:

“Is there any way to get a discount before I commit to the annual plan?”

What the simple system still cannot answer:

  • Is there currently a promotion running for the annual plan?
  • Is this prospect eligible for a discount?
  • What is their account status and lead score?
  • Has a sales rep already been in contact with them?
  • What did they discuss last week?

None of these can be answered by a calculator.

They require retrieval, memory, and access to private data.

What needs to be added

Retrieval (RAG): Before the LLM generates anything, the system queries:

  • Promotions database → is ANNUAL20 active?
  • Pricing documentation → what are the current tiers?
  • Policy document → what discounts can be self-served?

Memory: The prospect’s conversation history, previous interactions, and account status persist across sessions.

CRM integration: This prospect’s lead score, the plan they are evaluating, and whether a sales rep has already engaged.

Guardrails: Policy rules enforced before any response is sent — no discount above X% without approval, always escalate if the request exceeds bot authority.

Tools (beyond calculator): Apply a promotion code, log a CRM note, schedule a sales call, send a follow-up email.

The simple system had one tool. The full system has a toolkit.
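The full pipeline (retrieve, inject, generate, guardrail, act) can be sketched end to end. All data, function names, and the 20% policy threshold below are illustrative, echoing the slide's example:

```python
# Hypothetical data sources standing in for the promotions DB and policy doc.
PROMOTIONS = {"ANNUAL20": {"pct": 20, "audience": "new customers", "expires": "Apr 30"}}
POLICY_MAX_SELF_SERVE_PCT = 20

def retrieve(message: str) -> dict:
    """Query structured sources before the LLM sees anything (a stand-in for RAG)."""
    facts = {}
    if "discount" in message and "annual" in message:
        facts["promotion"] = ("ANNUAL20", PROMOTIONS["ANNUAL20"])
        facts["price_per_month"] = 800
    return facts

def guardrail_ok(discount_pct: int) -> bool:
    return discount_pct <= POLICY_MAX_SELF_SERVE_PCT

def respond(message: str) -> str:
    facts = retrieve(message)
    if "promotion" not in facts:
        return "Let me connect you with a sales rep."      # escalation path
    code, promo = facts["promotion"]
    if not guardrail_ok(promo["pct"]):
        return "That discount needs approval. Escalating to a rep."
    price = facts["price_per_month"] * (1 - promo["pct"] / 100)
    return f"Yes — {code} gives {promo['pct']}% off annual billing: ${price:.0f}/month."

print(respond("Is there any way to get a discount before I commit to the annual plan?"))
```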

Worked Example — The Full System (Steps 1–2)

Same question. Full system.

“Is there any way to get a discount before I commit to the annual plan?”

Step 1 — Retrieval (before LLM)

System detects discount + annual plan intent. Queries three sources:

Promotions DB  → ANNUAL20: 20% off,
                  new customers, expires Apr 30
Pricing docs   → Professional: $800/month
Policy doc     → Up to 20% self-serve;
                  above 20% needs approval
CRM record     → Lead score: 82
                  Plan: Professional
                  Status: new customer

Step 2 — Context injection

All retrieved data injected into context window.

LLM now perceives: question + actual policy + active promotion + prospect’s account status.

Worked Example — The Full System (Steps 3–4)

Step 3 — LLM generates (grounded)

“Yes — we have a current promotion for new customers on annual billing: 20% off your first year. For the Professional plan, that’s $640/month — or $7,680/year. Would you like me to apply that to your account?”

Not because the LLM knew any of this. Because the system retrieved it and put it in context.

Step 4 — Tool calls (if prospect says yes)

apply_promotion(account_id, "ANNUAL20")
log_crm_note(lead_id, "Discount applied — 20%")

Guardrail check: 20% within self-serve authority. ✓

What changed from the raw LLM:

  • ANNUAL20 retrieved, not hallucinated
  • $640 verified by calculator, not predicted
  • Policy check confirmed self-serve authority
  • CRM note actually logged

The LLM generated the words. The system provided the facts, enforced the policy, and took the action.

Framework Summary — The Full System

Raw LLM — what it could do alone

Day 1 concept Raw LLM
Perception Token IDs from input only
Representation Learned embeddings
Model Perceive → embed → predict → generate
Constraints Context window only
Algorithm Next-token prediction
Action Generated text only

Generates plausible text. Cannot verify it. Cannot act. Cannot remember.

Full system — what the architecture adds

Day 1 concept Full GenAI system
Perception Input + retrieved context + memory
Representation Embeddings + structured retrieved facts
Model Extended flowchart: retrieve → LLM → tools → guardrail → output
Constraints Context window + policy rules + tool interception
Algorithm Next-token prediction + semantic search + tool execution
Action Text + CRM updates + emails + escalations

Generates grounded, policy-consistent responses. Remembers across sessions. Takes real-world actions.

This Lecture

  • Part 1: Case — HubSpot & Motion AI
  • Part 2: How Large Language Models Work
  • Part 3: AI & Advertising

HubSpot Was One Slice of a Larger Problem

The HubSpot case covered one narrow moment in customer acquisition: what happens after someone becomes a lead.

The chatbot qualified interest. The LLM drafted responses. The system applied a discount code.

But before any of that happened, something had to bring that prospect to HubSpot’s website in the first place.

That is customer acquisition.

And AI enters it at every stage — not just at the lead qualification step.


Where we were in the funnel

Awareness
    │
Intent
    │
Conversion  ← HubSpot chatbot lives here
    │
Retention

The question is not what HubSpot should do with its chatbot. The question is how AI enters the full acquisition system.

Where are CMOs actually investing in AI?

The Acquisition Funnel

Four stages. Different objectives at each.

Awareness The prospect does not know you exist yet. Goal: get seen by the right people.

Intent The prospect is actively looking for a solution. Goal: be findable when they search.

Conversion The prospect is evaluating options. Goal: remove friction and close.

Retention (covered in a later session) The customer has purchased. Goal: reduce churn, expand revenue.

Today: awareness, intent, and conversion.

What marketers actually do at each stage

Stage Activities
Awareness Display · paid social · video · sponsorships · PR
Intent Paid search · SEO · comparison content · email capture
Conversion Landing pages · retargeting · lead scoring · chat · nurture
Retention (Week 5 — PittaRosso case)

The funnel is not abstract. It is a set of tasks, each with its own data, budget, and performance metric.

AI does not enter “marketing.” It enters specific tasks at specific stages — with different data requirements, different risks, and different economics at each.

Awareness — Decision 1: Who Should See the Ad?

The Container Store — cookieless lookalike

Type: Supervised learning

Problem: iOS14 and cookie deprecation eliminated third-party signals. Standard lookalike audiences stopped working.

Approach: LiveRamp’s identity graph matched first-party customer data to behavioral signals across the open web, without third-party cookies.

Result:

  • 37% more purchases
  • 1.4× ROI maintained in a cookieless environment

Source: Total Retail / LiveRamp, 2024

What the model does: Learns the profile of past purchasers. Finds new users who match that profile. Shows them the ad.

Awareness — Decision 1: The Structural Problems

Problem 1 — The seed encodes the past

The model’s training data is existing customers. That is who it learns to find more of.

A 25-year-old who just moved into their first apartment has never visited The Container Store. They need home organization products. The lookalike model cannot find them — they look nothing like anyone in the seed.

The model reproduces the existing customer base. It does not find the customers you haven’t reached yet.

If your customer base skews toward a particular demographic or geography, the model concentrates spend there, not by design, but because the seed never included anyone else.

Problem 2 — Incrementality

The model finds likely buyers — not incremental buyers.

Without a holdout test (a control group who did not see the ad) there is no way to know how many of those 37% would have purchased anyway.

Supervised learning and the Day 1 conditions:

Condition                       Container Store
Volume high?                    ✓ Millions of impressions
Signal measurable?              ✓ Purchases tracked
Task well-defined?              ✓ Find likely converters
Training data representative?   ✗ Seed = past customers only

The incrementality gap is what the Artea case forces you to measure next week.

Awareness — Decision 2: Which Creative Should They See?

Haleon / Panadol — DCO for pain relief (2023)

Type: Reinforcement learning (multi-armed bandit)

Context: Panadol (owned by Haleon) wanted to reposition from “headache tablet” to solution for all pain types. Hong Kong market.

Stack: Zenith (strategy) · Innovid (DCO) · The Trade Desk (targeting)

Approach: 5 base creative templates. Innovid’s engine generated 600+ ad versions matched to pain type, daily moment, and audience.

Example: Office worker + back pain → image of man at desk + Panadol Joint Extend.

Result:

  • 30% above benchmark performance
  • 20× lower creative production cost vs. manual

Source: Innovid / Haleon case study, 2023

Awareness — Decision 2: How DCO Works

The bandit learning loop

Step 1 — Segment
Audience split by pain type and context:
  back pain · headache · fever · joint pain

Step 2 — Select
Choose the creative combination with the
highest predicted completion rate for
this segment. Initially: explore variants.
Over time: exploit what works.
  audience:  office worker, back pain
  image:     man at desk, hands on back
  product:   Panadol Joint Extend
  message:   "Back to work, not to pain"

Step 3 — Serve
Ad rendered in real time.
600+ variants from 5 templates.

Step 4 — Update
Completion rate observed.
Winning combinations earn more impressions.
Losing combinations fade out.

This is a multi-armed bandit: a simplified form of RL where the agent learns which arm (creative variant) to pull more often, based on observed reward. No labeled training data required.
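An epsilon-greedy bandit over creative variants can be sketched as follows (the variant names and completion probabilities are invented; real DCO engines use richer models per segment):

```python
import random

random.seed(0)

# Hypothetical completion probabilities per creative variant (unknown to the agent).
TRUE_RATES = {"desk_joint": 0.30, "sofa_headache": 0.20, "generic": 0.10}

counts = {v: 0 for v in TRUE_RATES}
values = {v: 0.0 for v in TRUE_RATES}   # running mean of observed completion rate

def pick(epsilon=0.1):
    if random.random() < epsilon:                 # explore a random variant
        return random.choice(list(TRUE_RATES))
    return max(values, key=values.get)            # exploit the current best

for _ in range(5000):
    variant = pick()
    reward = 1 if random.random() < TRUE_RATES[variant] else 0
    counts[variant] += 1
    values[variant] += (reward - values[variant]) / counts[variant]

print(counts)  # winning combinations accumulate impressions; losers fade out
```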

What makes DCO different from generative AI

DCO selects and assembles from human-designed components. Generative AI creates new content.

                DCO               Generative AI
Source          Human templates   Model output
Brand control   High              Requires guardrails
Scale           High              Very high
Accuracy risk   Low               Higher

The constraint that makes DCO work:

Human creative teams built the 5 base templates. Brand safety, product accuracy, and regulatory compliance live in the template design. The model cannot change them.

Where DCO goes wrong:

The model optimizes completion rate — not brand perception, purchase intent, or long-term equity.

A version that drives clicks may not drive the right association. The metric is a proxy. Treat it as one.

Awareness — Decision 3: What Creative Should We Make?

JPMorgan Chase — AI-generated copy (Persado)

Type: Generative AI with explicit constraint layer

Problem: Testing ad copy variants at scale requires writing hundreds of versions manually.

Approach: Persado’s platform generates and tests language variants, predicting emotional response and engagement per phrase × audience combination. Compliance filters and brand voice rules applied before any output is shown.

Result: Up to 450% higher CTR vs. human-written copy in controlled tests.

Source: Persado / JPMorgan Chase

Why it worked:

  • Task narrow: headline optimization
  • Signal measurable: CTR
  • Constraint layer explicit: compliance rules, brand voice, prohibited financial claims
  • Feedback fast: A/B results in days

Awareness — Decision 3: When Generative Creative Fails

McDonald’s Netherlands — AI holiday ad (2024)

Depicted festive chaos: cyclists in snow, Santa in traffic, family disaster.

Pulled after launch due to public backlash. Viewers called it “AI slop.”

Coca-Cola — AI holiday ads (2024 and 2025)

AI recreation of the classic 1995 “Holidays Are Coming” ad. Criticized as “soulless” and “creepy.”

Coca-Cola ran a second AI campaign in 2025 despite the 2024 backlash.

Source: Nielsen Norman Group, Dec 2025

NNG’s diagnosis:

“Audiences can perceive when the narrative is shaped around what the technology can do rather than what the story should be.”

The Day 1 conditions applied

Condition              JPMorgan             McDonald’s
Task well-defined?     ✓ Headline CTR       ✗ Brand narrative
Signal measurable?     ✓ Click rate         ✗ Emotional resonance
Feedback fast?         ✓ A/B results        ✗ Brand equity is slow
Constraints defined?   ✓ Compliance layer   ✗ Model decides story

The technology did not fail. The application did.

Conversion — Decision 4: Who Do We Retarget, and With What Offer?

Criteo — La Redoute dynamic retargeting

Type: Supervised learning + reinforcement learning

La Redoute is a French fashion and home retailer with millions of SKUs. Users browse, do not buy, leave.

Approach: Criteo’s engine tracks product views, predicts purchase probability per user-product combination, and serves dynamic ads across 19,000+ publisher sites, showing the right product at the right moment.

The model also surfaces related products the user had not viewed, predicting adjacent demand.

Result:

  • 2× CTR vs. standard display
  • 28% of incremental sales from products the user had never viewed

Source: Criteo / La Redoute

The model found demand the user did not yet know they had. That is the retargeting opportunity lookalike targeting cannot reach.
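
A minimal sketch of that per user-product scoring, with hand-set weights standing in for a trained model. All names and numbers below are invented for illustration; they are not Criteo's system, where the weights would be learned from labeled conversion data.

```python
import math

# Score each user-product pair with a toy logistic model over browse signals.
def purchase_probability(views, seconds_on_page, cart_adds):
    # Weights are made up for illustration; a real system learns them.
    z = -4.0 + 0.6 * views + 0.01 * seconds_on_page + 1.5 * cart_adds
    return 1 / (1 + math.exp(-z))

candidates = {
    ("user_1", "wool_coat"):  purchase_probability(3, 120, 1),
    ("user_1", "table_lamp"): purchase_probability(1, 15, 0),
    ("user_2", "wool_coat"):  purchase_probability(0, 0, 0),  # never viewed
}

# Serve the ad for the pair with the highest predicted probability.
best_pair = max(candidates, key=candidates.get)
print(best_pair, round(candidates[best_pair], 3))
```

Surfacing never-viewed products then amounts to scoring pairs the user has no history with, using signals borrowed from similar users and adjacent products.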

Conversion — Decision 4: The Retargeting Traps

Retargeting vs. lookalike: different signal, different risk

              | Lookalike (D1)     | Retargeting (D4)
Stage         | Awareness          | Conversion
Signal        | Past purchasers    | Current browse behavior
Intent window | Weeks              | Hours
Model knows   | Profile similarity | Product interest

The RL component: timing and offer

Not every retargeted user needs a discount. The model learns which users convert only with an incentive, what discount size works, and after what delay.

The coupon trap

If the model learns that discounts convert abandoned carts, it serves discounts consistently. Users learn to browse, abandon, and wait.

The model optimized conversion rate and trained customers to expect a discount.

The model optimizes what it measures. The goal was profitable conversion. The metric was conversion rate. Those are not the same thing.
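
The gap between those two metrics shows up in toy numbers. The figures below are assumed for illustration (they are not from the case): a coupon arm can win on conversion rate while losing on profit per user.

```python
# The coupon trap in numbers: conversion rate up, profit down.
AVG_ORDER = 65.0   # assumed average transaction value
MARGIN = 0.30      # assumed gross margin
DISCOUNT = 0.20    # 20% off
SEND_COST = 0.50   # assumed cost to deliver the offer

def profit_per_user(conv_rate, discounted):
    revenue = AVG_ORDER * ((1 - DISCOUNT) if discounted else 1.0)
    unit_profit = revenue - AVG_ORDER * (1 - MARGIN)  # revenue minus COGS
    return conv_rate * unit_profit - (SEND_COST if discounted else 0.0)

no_coupon = profit_per_user(0.10, discounted=False)   # 10% convert organically
with_coupon = profit_per_user(0.14, discounted=True)  # 14% convert with coupon

print(round(no_coupon, 2), round(with_coupon, 2))  # 1.95 vs 0.41 per user
```

A 40% lift in conversion rate, and profit per user falls by roughly 80%: the metric the model maximized moved in the opposite direction from the goal.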

Conversion — Decision 5: How Do We Qualify and Respond to the Lead?

HubSpot: 2017 vs. 2024

Type: all three (supervised, retrieval, generative) + RL escalation policy

                   | 2017 Motion AI             | 2024 Breeze AI
Architecture       | Rule-based                 | LLM + retrieval + tools
Language handling  | Scripted menus             | Arbitrary inputs
Failure mode       | Loud: “I don’t understand” | Quiet: confident wrongness
Content generation | Human-authored             | Near-zero marginal cost
Failure rate       | ~70% (Facebook data)       | Much lower in narrow domains

The same decision from Part 1 — should HubSpot replace its chat reps? — now has a different technical answer.

Source: HubSpot Breeze AI

Conversion — Decision 5: What Changed, and What Didn’t

What GenAI fixed

The relational intelligence gap narrowed. The 70% failure rate dropped for narrow tasks. Content generation became nearly free.

What GenAI did not fix

  • Only 20% of Gen Z and 4% of Boomers prefer chatbots over humans for service interactions
  • Hallucination rates of 3–27%. A wrong pricing claim in a B2B sales conversation carries real legal consequences.
  • Reward hacking: optimize for conversion and the bot learns to say whatever closes the deal.

The task-fit condition has not changed.

A high-volume, narrow task: GenAI helps. Complex relational judgment, emotional sensitivity, an ambiguous goal: the human advantage persists.

Three new failure modes the 2024 system introduced that the 2017 system did not have

Stale retrieval

The bot quotes a promotion that has expired; no one updated the database.

Constraint gaps

Guardrails only cover cases someone anticipated. Policy gaps are invisible until a customer finds one.

Confident wrongness

The LLM generates fluently whether retrieval succeeded or not. A rule-based system fails loudly. An LLM-based system fails quietly.
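
One design response is to make the failure loud again: detect when retrieval came back empty and escalate instead of generating. The sketch below is an assumption, not HubSpot's implementation, and a keyword lookup stands in for real vector search.

```python
# Toy grounded-answer gate: no retrieved source, no generated answer.
KNOWLEDGE_BASE = {
    "pricing": "Starter plan is $20/seat/month.",
    "trial": "Free 14-day trial, no credit card required.",
}

def answer(query: str) -> str:
    # Keyword lookup stands in for retrieval over a real document store.
    hits = [doc for key, doc in KNOWLEDGE_BASE.items() if key in query.lower()]
    if not hits:
        # Fail loudly: escalate rather than improvise an unsupported claim.
        return "ESCALATE: no grounded source found for this question."
    # In a real system, hits would be passed to the LLM as context.
    return hits[0]

print(answer("What is your pricing?"))
print(answer("Do you integrate with Salesforce?"))
```

The gate trades some coverage for safety: a question the knowledge base cannot ground goes to a human instead of becoming a confident wrong answer.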

Your project tests one design response to these risks. You will answer it with data.

Was the Motion AI Acquisition a Smart Move?

The case for yes

  • Gave HubSpot early access to the conversational interface layer before competitors moved
  • Built internal capability and data infrastructure that proved valuable when LLMs arrived
  • The 2024 Breeze architecture builds directly on the CRM integration work that began with Motion AI
  • First-mover positioning in chatbot-enabled CRM created customer lock-in and platform stickiness

The case for no

  • Motion AI’s rule-based technology had a 70% failure rate and was effectively obsolete within 5 years
  • HubSpot could have bought or built an LLM-native product in 2022–23 without the 2017 infrastructure
  • The actual technology acquired was not what made the 2024 product work

Was the Motion AI Acquisition a Smart Move?

The harder question: was HubSpot buying a product — or buying a bet?

The bet: that conversational interfaces would become the primary channel for B2B sales interactions, and that owning that layer early would be worth riding through one technology generation.

  • What did HubSpot learn from 2017–2022 that it could not have learned by entering in 2022?
  • Did early adoption create data advantages that compounded?
  • Would a 2022 cold start have been faster and cheaper than six years of low-quality bot data?

There is no clean answer. The case asks you to make the argument.

Group Assignment 1 — Artea Targeting Strategy

Due before Week 3 (April 1) — submit via Canvas

Artea ran an A/B test on 5,000 customers. Half received a 20% off coupon. You will analyze the results and recommend a targeting policy for the next campaign of 6,000 customers.

The data: two Excel tabs

AB_test — 5,000 customers, acquisition channel, cart status, past behavior, and outcomes (transactions, revenue) one month later.

Next_Campaign — 6,000 new customers, same variables, no outcomes. These are the customers Artea needs to decide whether to target.

Five questions

Q1 — The Experiment (10 pts) Why does this require a randomized control group? What is one important limitation?

Q2 — Overall Effect (10 pts) What did the coupon do? Are you confident enough to act on it?

Q3 — Heterogeneity (25 pts) Does the effect differ by acquisition channel and cart status? Would you build a targeting policy around it?

Q4 — Targeting Policy (50 pts) State the rule · Predict the effect · Break-even ($0.50 cost, 20% off, $65 avg transaction) · Justify vs. send-all or send-none

Q5 — What Your Policy Cannot Tell You (15 pts) What assumption might not hold, and what happens to your predictions if it is wrong?
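
The Q4 break-even numbers can be checked with a quick calculation. This is one simplified framing, not the required answer: it ignores margin and the discount handed to customers who would have bought anyway, both of which Q4 asks you to reason about.

```python
# Simplified break-even for the coupon send decision.
SEND_COST = 0.50   # cost to send the coupon (from the prompt)
AVG_ORDER = 65.0   # average transaction value (from the prompt)
DISCOUNT = 0.20    # 20% off (from the prompt)

# Treating the full discounted order value as the gain per extra conversion,
# a customer is worth targeting only if the coupon's lift in purchase
# probability covers the cost of sending it:
net_order_value = AVG_ORDER * (1 - DISCOUNT)   # $52 after discount
break_even_lift = SEND_COST / net_order_value

print(round(break_even_lift, 4))  # ≈ 0.0096, i.e. about a 1 pp lift
```

Under these assumptions the hurdle is low, which is exactly why the heterogeneity analysis in Q3 matters: the real question is which segments clear it profitably once margin and cannibalized full-price sales are counted.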

What We Covered Today

Part 1 — HubSpot case

Should HubSpot replace its chat reps with bots? You worked through the CLV math, the break-even, and the three design questions: disclose or conceal, voice, and interface. The economics can work. The design choices determine whether they do.

Part 2 — How LLMs work

Six steps from raw message to generated response: tokenize, embed, attend, distribute, sample, output. The raw LLM has no guardrails, no memory, no tools. Adding retrieval, tool calls, and a guardrail layer is what turns an LLM into a system.
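
The last two of those steps, distribute and sample, fit in a few lines. The tokens and scores below are invented for illustration; a real model produces logits over tens of thousands of tokens.

```python
import math
import random

# Toy next-token scores (logits) for three candidate tokens.
logits = {"demo": 2.1, "pricing": 1.3, "refund": -0.5}

def sample_next_token(logits, temperature=1.0):
    # Distribute: temperature-scaled softmax turns scores into probabilities.
    scaled = {t: score / temperature for t, score in logits.items()}
    total = sum(math.exp(s) for s in scaled.values())
    probs = {t: math.exp(s) / total for t, s in scaled.items()}
    # Sample: draw one token from that distribution.
    r, cum = random.random(), 0.0
    for token, p in probs.items():
        cum += p
        if r < cum:
            return token
    return token  # guard against floating-point rounding

random.seed(0)
print(sample_next_token(logits, temperature=0.7))
```

Lower temperature sharpens the distribution toward the top-scoring token; higher temperature flattens it, which is the knob behind "more creative" versus "more deterministic" output.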

Part 3 — AI across the acquisition funnel

Five decisions. Three ML types. Five examples. The model optimizes what it measures. Defining the right metric and the right constraints is the marketer’s job, not the model’s.

HubSpot Breeze closes the loop: the same system from Part 2, deployed at the conversion stage — with three failure modes the rule-based bot never had.

Part 3 — AI across the acquisition funnel

Five decisions. Three ML types. Five examples.

Decision                 | ML type
Who sees the ad?         | Supervised
Which creative to show?  | RL (bandit)
What creative to make?   | Generative
Who to retarget?         | Supervised + RL
How to qualify the lead? | All three

The model optimizes what it measures. Defining the right metric and the right constraints is the marketer’s job, not the model’s.

Before next class

Read the Artea case (HBS #521-021).

Group Assignment 1 due before class. Submit on Canvas before you walk in.

There is a quiz at the start of class. Paper and pencil, closed notes.

⚠️ Do not be late.