In this course, AI is defined not by whether a machine "thinks" — but by whether it produces intelligent behavior through a chain of engineering decisions.
Perception: the system acquires raw input from its environment — pixels from a camera, text from a customer message, sensor readings, clicks.
Cat feeder: a camera captures a face at the bowl — raw pixel values, nothing more.
Return chatbot: a customer types "my drill is busted" — a string of characters the system cannot yet act on.
Representation: raw input is transformed into a structured form that computation can act on. The unifying intuition: a representation is a compression that preserves what matters for the task and discards what does not. The task defines what "matters."
Key insight: the goal defines the representation. A "cat detector" (cat: yes/no) cannot distinguish Maurice from Garfield. A different goal — whose cat is this? — requires a completely different representation, even with identical input pixels.
Turning raw input (pixels, words, audio) into something a model can use is called feature engineering. There are two broad approaches, and the tradeoff between them comes up constantly in practice.
The first approach is hand-crafted features: a human decides what to measure. For a photo of a cat at the feeder, you might extract:
```
brightness: 0.72
contrast: 0.88
face_width_px: 142
ear_pointiness: 0.91
fur_color_rgb: [180,140,90]
```
Each number has a meaning you can explain and debug. The limit: you can only capture what you thought to measure. Variation you did not anticipate — a new angle, different lighting — may break the system.
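To make the first approach concrete, here is a minimal sketch of hand-crafted extraction. The function name, the feature choices, and the random stand-in image are illustrative, not taken from the course system:

```python
import numpy as np

def extract_features(image: np.ndarray) -> dict:
    """Hand-crafted features from a grayscale image with values in 0..1.

    Every number here is a human decision about what to measure; the
    model never sees raw pixels, only these summaries.
    """
    return {
        "brightness": round(float(image.mean()), 2),  # overall light level
        "contrast": round(float(image.std()), 2),     # spread of pixel values
        "face_width_px": image.shape[1],              # placeholder: a real system
                                                      # would run a face detector
    }

photo = np.random.default_rng(0).random((64, 64))    # stand-in for a camera frame
print(extract_features(photo))
```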
The second approach is learned representations: instead of a human choosing what to measure, the features are learned from data. For text: words are mapped to vectors where similar meanings land nearby. "Broken," "damaged," and "defective" all cluster together — so the system recognizes that "my drill is busted" and "I received a faulty drill" express the same intent, even though they share no words.
For video: each frame is encoded as an image representation; the audio track is converted to a spectrogram (a visual map of sound frequencies over time) and treated similarly. Representations across frames are combined to capture motion and sequence.
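A toy sketch of the text case shows why near-synonyms land nearby. The 3-dimensional vectors below are invented purely to illustrate the geometry; real systems learn vectors with hundreds of dimensions:

```python
import numpy as np

# Invented toy "embeddings" -- not from any real model.
emb = {
    "busted":  np.array([0.9, 0.2, 0.1]),
    "faulty":  np.array([0.8, 0.3, 0.1]),
    "shipped": np.array([0.1, 0.9, 0.5]),
}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine(emb["busted"], emb["faulty"]))   # high: near-synonyms sit close
print(cosine(emb["busted"], emb["shipped"]))  # low: unrelated meanings sit apart
```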
Model: a simplified, explicit structure that supports prediction or decision. Every AI system contains a model, and every model is a selective simplification: what is left out matters as much as what is included.
Cat feeder: "Is this a cat?" — two decision points, two outcomes. Simple flowchart.
Return chatbot: a sequence — is this a return request? is the order valid? does the photo match? — leading to an action.
Algorithm: a sequence of steps that operates over the representation, respects the constraints, and produces an output. For the cat feeder: compare input embedding to registered embeddings → find nearest match → check confidence → check feeding schedule → output a decision. Each step is computable because the representation made it computable.
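A minimal sketch of that pipeline, assuming face embeddings arrive as vectors. The thresholds, names, and data structures are invented for illustration:

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def decide_feeding(input_emb, registered, hours_since_fed,
                   min_confidence=0.8, min_gap_hours=4):
    """Steps mirror the pipeline: match -> confidence -> schedule -> decision.
    Thresholds and field names are illustrative, not tuned values."""
    # 1-2. Compare the input embedding to each registered embedding; keep the nearest.
    name, score = max(((n, cosine(input_emb, e)) for n, e in registered.items()),
                      key=lambda pair: pair[1])
    # 3. Check confidence: an unknown cat should not be fed.
    if score < min_confidence:
        return "no_feed: unknown cat"
    # 4. Check the feeding schedule (the no-overfeeding constraint).
    if hours_since_fed.get(name, float("inf")) < min_gap_hours:
        return f"no_feed: {name} was fed recently"
    # 5. Output a decision.
    return f"feed: {name}"

registered = {"Maurice": np.array([0.9, 0.1]), "Garfield": np.array([0.1, 0.9])}
print(decide_feeding(np.array([0.85, 0.15]), registered, {"Maurice": 6.0}))  # feed: Maurice
```

Note that every step leans on the representation: the match is only computable because faces were already compressed into comparable vectors.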
In learning-based systems, the weights inside the algorithm are learned from data — not written by a human. That is the key difference from a hand-coded rule system.
Constraints: rules that limit which outputs are allowable. A return chatbot cannot issue a return label for an order outside the 90-day window — but only if that constraint is encoded and the representation carries the required fields.
Key insight: constraints drive representation design. If you need to enforce a rule, your data must carry the fields that make the rule checkable.
Cat feeder: Maurice is recognized — but last_fed: 1h ago means eligible: no. Identity alone is not enough. The no-overfeeding constraint required adding new fields to the representation.
Return chatbot: return window = 90 days; product must match the order. Both constraints require specific fields in the representation to be enforceable at all.
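A sketch of how those two constraints might be encoded. The field names are invented; the point is that the checks are only possible because the representation carries the order date and product fields:

```python
from datetime import date

RETURN_WINDOW_DAYS = 90  # hard policy constraint

def return_allowed(order: dict, today: date) -> bool:
    """Checkable only because the representation carries order_date and
    the product fields. All field names here are illustrative."""
    if (today - order["order_date"]).days > RETURN_WINDOW_DAYS:
        return False                                          # outside the window
    return order["product_id"] == order["claimed_product_id"]  # must match the order

order = {"order_date": date(2024, 1, 10),
         "product_id": "DRILL-200", "claimed_product_id": "DRILL-200"}
print(return_allowed(order, date(2024, 2, 1)))   # True: 22 days old
print(return_allowed(order, date(2024, 6, 1)))   # False: 143 days old
```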
Behavior: what the system does over time — the observable output of the entire pipeline. Maurice gets fed. Garfield is blocked. A return label is issued or denied. Behavior is what users experience; everything else is invisible infrastructure.
- Video · 20 min · 3Blue1Brown — But What Is a Neural Network? — best visual intuition for how representations and models connect
- Article · HBR — AI for the Real World — frames AI as a tool for specific tasks, not general intelligence
It helps to distinguish between two things that often get conflated: a large language model (LLM) and a deployed AI system like ChatGPT.
An LLM is a model that predicts the next token — the next word, or piece of a word — based on patterns in billions of text examples. It does not reason, retrieve, or act. It generates plausible continuations of text. The output feels like understanding because human language is the training data. But ask it to do precise arithmetic or recall events from after its training cutoff, and it fails — because those tasks require something other than pattern completion.
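A deliberately crude sketch of next-token prediction helps here. This is a bigram counter over a toy corpus, nothing like a real LLM's internals, but the task has the same shape: score candidate next tokens, emit a likely one:

```python
from collections import Counter, defaultdict

# Toy corpus, invented for illustration. Real LLMs train neural networks
# on billions of examples rather than counting word pairs.
corpus = "my drill is busted . my drill is broken . my order is late .".split()

follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1          # count which token follows which

def next_token(word: str) -> str:
    return follows[word].most_common(1)[0][0]

print(next_token("drill"))  # 'is': a plausible continuation, not understanding
```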
ChatGPT is a full AI system built on top of an LLM. It adds tool use (web search, code execution, calculators), memory, safety filters, and other components. In the course framework: the LLM is the model/algorithm layer, and ChatGPT wraps it in perception (your message), constraints (safety and policy layers), and behavior (response plus actions taken by tools). Saying "ChatGPT thinks" conflates the underlying model with the full system — and misses where the real engineering decisions live.
Yes, deliberately. Whether machines are "really" intelligent is a philosophical question with no operational answer. The engineering definition is useful precisely because it is narrow: it tells you what you need to build, what can go wrong, and how to evaluate whether a system is working. For business purposes, behavior is what matters — if the system produces the right outputs under the right conditions, it is doing its job, regardless of inner experience.
The line is the perception-action loop. A regression model applied to a spreadsheet produces an output that a human then acts on — the model is not in the loop. A recommendation engine perceives a user's behavior in real time, represents it, models preferences, and directly changes what that user sees — closing the loop between perception and action. The distinction is not about the algorithm; it is about whether the system itself is embedded in the environment it affects.
- Short read · IEEE Spectrum — The Turbulent Past and Uncertain Future of AI — survey of definitions from practitioners
- Classic · Turing, A. (1950). Computing Machinery and Intelligence. Mind, 59(236), 433–460. — the original "can machines think?" paper; surprisingly readable
| Dimension | Rule-Based | Learning-Based |
|---|---|---|
| How it works | Human encodes knowledge as explicit IF-THEN rules | System induces patterns from labeled examples |
| Works well when | Domain is small, stable, and fully enumerable | Domain is large, variable, or hard to articulate |
| Fails when | World changes faster than rules can be updated; input varies infinitely | Training data is biased, scarce, or mislabeled |
| Transparency | Fully auditable — every rule is readable | Often opaque — learned weights are not interpretable |
| Marketing example | Phone tree: Press 1 for returns, Press 2 for status | Intent classifier: maps "my drill is busted" → start_return |
This is a genuinely good question, and the honest answer is: it depends on how you define AI — and the definition keeps moving. By the engineering framework from class (perception → representation → model → behavior), a phone tree qualifies: it perceives input (button presses), applies a model (the menu script), and produces behavior (routing you to the right department). It is a very simple AI system.
But most people instinctively feel it is not AI. This intuition has a name: the AI effect. Once a technology becomes routine and understood, we stop calling it AI and start calling it "just software." Expert systems that could diagnose infections better than junior doctors were called AI in 1985. Today we would call them decision trees. The same technology, reclassified. Wikipedia — The AI Effect →
What makes the phone tree feel like "not AI" is that it has no learning, no generalization, and no ability to handle inputs outside its explicit menu. These are real limits — and they are exactly why learning-based systems replaced rule-based ones for open-ended tasks. But the framework applies to both. Recognizing this helps you evaluate any system someone calls "AI," not just the impressive ones.
This is one of those cases where the popular diagram — a set of nested circles with AI on the outside and ML inside — is convenient but not quite right. Two points worth keeping separate:
AI does not require machine learning. Expert systems — rule-based programs that encode human knowledge as IF-THEN logic — were the dominant form of AI from the 1970s through the late 1980s. They had no learning component whatsoever. A phone tree, a chess engine with hand-coded evaluation functions, or a medical diagnosis system built from clinical rules can all produce intelligent behavior without learning anything from data. Saying ML is necessary for AI would make all of those not AI — which is historically and practically wrong.
Not all machine learning is AI. Using a neural network to predict next quarter's sales from a spreadsheet — with a human looking at the output and deciding what to do — is machine learning. But by the framework from this course, it is not necessarily AI: there is no perception loop, no environment the system acts on, no behavior it produces autonomously. It is a sophisticated statistical model. Useful, but not the same thing. The moment that model is embedded in a system that perceives customer behavior and automatically adjusts pricing — now it is part of an AI system.
The cleaner framing: AI and ML overlap, but neither contains the other. Some AI uses ML. Some ML is part of an AI system. Both can exist without the other. What makes something AI is not the algorithm — it is whether the system closes the loop between perception and action in the world.
Both approaches coexist in practice. Many deployed systems layer them: a rule-based component enforces hard policy constraints while a learning-based component handles open-ended input.
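A sketch of that layering, with all names invented and a keyword stub standing in for the learned classifier:

```python
def classify_intent(message: str) -> str:
    # Keyword stand-in for a trained classifier; real systems use embeddings.
    keywords = ("busted", "broken", "faulty")
    return "start_return" if any(w in message for w in keywords) else "other"

def handle_message(message: str, order: dict) -> str:
    """Layered design: a learned component interprets open-ended input,
    a rule-based component enforces hard policy. Names are illustrative."""
    intent = classify_intent(message)          # learning-based: flexible
    if intent == "start_return":
        if order["age_days"] > 90:             # rule-based: non-negotiable
            return "deny: outside 90-day return window"
        return "issue_return_label"
    return "route_to_agent"

print(handle_message("my drill is busted", {"age_days": 30}))   # issue_return_label
print(handle_message("my drill is busted", {"age_days": 120}))  # denied by the rule
```

The design choice is the point: the learned layer can be retrained freely, while the policy layer stays auditable and cannot be overridden by a model's mistake.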
- Article · IBM — What Are Expert Systems?
- Short read · HBR — The Simple Economics of Machine Intelligence (Agrawal, Gans, Goldfarb)
Machine learning estimates a function f from data: given inputs, produce outputs. The goal is not to memorize training examples but to generalize — to produce correct outputs on inputs the model has never seen.
Garfield wearing a cardboard mask has never appeared in training data. The raw pixels look nothing like any training photo. But the face embedding is still close enough to "Garfield" that the model correctly identifies and blocks him. That is generalization.
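A minimal sketch of that idea at the embedding level, with invented vectors:

```python
import numpy as np

# Embeddings registered at training time (numbers invented for illustration).
registered = {"Maurice": np.array([0.9, 0.1]), "Garfield": np.array([0.1, 0.9])}

# A never-seen input: Garfield in a cardboard mask. The raw pixels are new,
# but a good encoder still places him near his usual region of embedding space.
masked_garfield = np.array([0.2, 0.8])

nearest = min(registered, key=lambda n: np.linalg.norm(registered[n] - masked_garfield))
print(nearest)  # 'Garfield': correct output on an input absent from training
```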
Every supervised learning system requires labeled training data. Someone looked at each example and assigned the correct output. At scale, this is done by paid annotation workers via platforms like Amazon Mechanical Turk or Scale AI. Labels are not free, not perfect, and not neutral.
Google Photos, 2015. The image classifier labeled photos of Black users as "gorillas." The root cause was not a broken algorithm — it was training data that did not include sufficient diversity of faces, combined with annotators who lacked the guidance and context to label fairly. Google removed the gorilla category entirely rather than fix it. The Guardian, 2015 →
Hiring algorithms. Multiple companies trained resume-screening systems on historical hiring data. Because most historical hires were male, the model learned to penalize resumes with signals associated with women — including attending women's colleges. The annotators did not intend this; they labeled accurately. The bias was in the data itself. Reuters — Amazon's scrapped AI recruiting tool →
Medical imaging. A dataset intended to classify chest X-rays was labeled by radiologists in a single country. The model performed well there and poorly in other regions — not because the algorithm failed, but because the annotators' definitions of "abnormal" reflected one clinical context. Nature Medicine — Underdiagnosis bias in AI systems →
Cost as a constraint. For a catalog of 10,000 products with 50 photos each, that is 500,000 individual labeling judgments. At a few cents per label, this costs tens of thousands of dollars — before quality checks. Companies routinely cut annotation budgets, which means more rushed decisions, more edge cases guessed rather than escalated, and more noise in the training data.
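The arithmetic behind that estimate, with illustrative per-label prices:

```python
products, photos_per_product = 10_000, 50
labels = products * photos_per_product          # 500,000 judgments
for cents in (2, 4, 8):                         # illustrative per-label prices
    print(f"{labels:,} labels at {cents}¢ each = ${labels * cents / 100:,.0f}")
# 500,000 labels at 2¢ each = $10,000 ... at 8¢ each = $40,000, before quality checks
```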
Training is expensive — requires labeled data, compute, and engineering infrastructure. It happens offline, periodically. Prediction is cheap — the trained model is just a function evaluated on new input in milliseconds. Most deployed systems are trained periodically and then frozen.
- Representative: covers the range of inputs the model will actually see in deployment — not just the clean, easy cases.
- Accurately labeled: edge cases are where label quality degrades most.
- Large enough: more variation in the world requires more examples.
- Recent enough: product lines change; old training data may not reflect current inputs.
- Video · 20 min · 3Blue1Brown — What Is a Neural Network?
- Article · MIT Tech Review — The Humans Behind AI's Data
- Interactive · Google ML Crash Course — Overfitting
| Type | Feedback during training | What it learns | Marketing use |
|---|---|---|---|
| Supervised | Labeled examples (input + correct output) | A function mapping inputs to labels or values | Churn prediction, click probability, intent classification, product image matching |
| Unsupervised | None — just the data itself | Structure and groupings in the data | Customer segmentation, topic discovery in support tickets, anomaly detection |
| Reinforcement | Rewards/penalties from the environment | A policy: what action to take in each state | Real-time ad bidding, recommendation engines, chatbot escalation policy |
The most common type of ML in marketing applications. You provide labeled examples — (input, correct output) pairs — and the system learns a function that maps new inputs to outputs it has never seen. The word "supervised" refers to the fact that human judgment is baked into every label.
There are two main tasks: classification (predicting a category — churn/no churn, match/no match, which intent) and regression (predicting a continuous value — predicted lifetime value, optimal bid price, expected revenue). The output type differs; the underlying logic is the same: learn a function from labeled pairs.
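A minimal sketch of both tasks using scikit-learn. The feature names and numbers are invented toy data:

```python
from sklearn.linear_model import LinearRegression, LogisticRegression

# Toy features per customer: [support_tickets, months_subscribed]. Invented data.
X = [[1, 24], [9, 2], [2, 30], [8, 1], [3, 18], [7, 3]]

churned = [0, 1, 0, 1, 0, 1]                      # classification: a category
clf = LogisticRegression().fit(X, churned)
print(clf.predict([[8, 2]]))                      # -> [1]: predicted to churn

ltv = [900.0, 60.0, 1200.0, 40.0, 700.0, 90.0]    # regression: a continuous value
reg = LinearRegression().fit(X, ltv)
print(reg.predict([[8, 2]]))                      # -> a dollar estimate
```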
Spam filters. Gmail's spam classifier is trained on billions of labeled emails — spam or not spam — marked by users over years. When you mark something as spam, you are contributing a training label. The model learns which combinations of sender, subject, content, and metadata predict spam. Google's approach to email classification →
Netflix recommendations. Netflix's recommendation system uses supervised classification to rank content for each user. Labeled training data — which titles a user watched, completed, or skipped — feeds classifiers including logistic regression, support vector machines, neural networks, and gradient boosted decision trees. Each model learns to predict which content a user will engage with. The label is implicit behavior, not a human rating. Netflix Tech Blog — Beyond the 5 Stars (Part 2) →
From class — Maurice and the return chatbot. Both are classification problems. The feeder maps face embeddings to {Maurice, not Maurice}. The chatbot maps message embeddings to {start_return, check_status, damaged_item, …}. Same structure, different representations.
→ Go deeper: 3Blue1Brown — Neural Networks (Video · 20 min) — shows visually how a network learns to classify handwritten digits, which is structurally identical to learning to classify Maurice vs. not Maurice. Start here if the idea of "learning from examples" still feels abstract.
No labels. No correct answers. The system finds structure in data on its own — groupings, patterns, anomalies — without being told what to look for. The most common task is clustering: partitioning examples into groups that are similar within and different across. Other tasks include dimensionality reduction (compressing a high-dimensional representation into something visualizable) and anomaly detection (finding examples that do not fit any pattern).
The key shift: in supervised learning, a human defines the categories in advance. In unsupervised learning, the algorithm proposes the categories and a human decides whether they are meaningful.
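A minimal clustering sketch with scikit-learn's KMeans, on invented 2-D "embeddings" standing in for real ticket vectors:

```python
from sklearn.cluster import KMeans

# Toy 2-D ticket embeddings (invented). Note: no labels anywhere.
tickets = [[0.10, 0.90], [0.15, 0.85], [0.20, 0.80],   # charger/battery texts
           [0.90, 0.10], [0.85, 0.15], [0.80, 0.20]]   # shipping-delay texts

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(tickets)
print(km.labels_)  # e.g. [1 1 1 0 0 0]: the algorithm proposes the groups;
                   # a human still decides whether they are meaningful
```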
Clustering at scale. Spotify uses clustering to group songs, artists, and users by similarity — without predefined categories. Each user's listening history is compressed into an embedding vector representing their position in "taste space." Users close together in that space receive similar recommendations. The clusters emerge from the data; no one decided in advance what the groups should be. Spotify Engineering — Data Science →
Topic discovery in customer feedback. Rather than tagging support tickets by hand into predefined categories, companies run topic modeling (a form of unsupervised learning) on tens of thousands of tickets to surface recurring themes. The algorithm finds that "charger," "battery," and "won't turn on" cluster together before any human labels them as "power issues."
From class — the chatbot intents. The return chatbot was built with six defined intent categories. Clustering 20,000 real conversations revealed five unanticipated categories: contractor bulk returns, gift returns, partial returns, return modifications, and compensation requests. The supervised classifier was silently mishandling all five — because the label set reflected what the design team anticipated, not what customers actually did.
No labels, no fixed dataset. An agent takes actions in an environment, receives a reward signal (positive or negative), and learns a policy — a mapping from situations to actions that maximizes cumulative reward over time. The agent is not told the right action in advance; it discovers it through trial and error.
The central tradeoff is explore vs exploit: should the agent take the action that has worked best so far (exploit), or try something new that might work better (explore)? Too much exploitation means the agent gets stuck in a local optimum. Too much exploration means it never capitalizes on what it has learned.
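The tradeoff is easiest to see in an epsilon-greedy bandit, a deliberately simplified form of RL. The offers and acceptance rates below are invented:

```python
import random

random.seed(0)
true_rates = {"offer_a": 0.05, "offer_b": 0.12, "offer_c": 0.08}  # invented; hidden from the agent
counts = {k: 0 for k in true_rates}
wins = {k: 0 for k in true_rates}
EPSILON = 0.1  # fraction of interactions spent exploring

def best_so_far():
    # Untried arms score 1.0, which forces each to be tried at least once.
    return max(counts, key=lambda k: wins[k] / counts[k] if counts[k] else 1.0)

for _ in range(10_000):
    offer = random.choice(list(true_rates)) if random.random() < EPSILON else best_so_far()
    counts[offer] += 1
    wins[offer] += random.random() < true_rates[offer]   # reward: accepted or not

print(best_so_far())  # almost always 'offer_b', discovered by trial and error
```

Set EPSILON to 0 and the agent can lock onto whichever offer happened to win early; set it near 1 and it never cashes in on what it has learned.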
Starbucks Deep Brew. Starbucks uses reinforcement learning inside its mobile app to personalize drink and food recommendations for 16 million Rewards members. The agent learns which suggestions each customer is most likely to accept — based on order history, time of day, weather, and local store inventory — and updates its policy continuously from real purchase feedback. If a customer consistently orders dairy-free, the system learns to stop recommending anything with dairy without being explicitly programmed to do so. Microsoft Source — Starbucks Deep Brew →
Spotify personalization. Spotify uses reinforcement learning to optimize what it surfaces to each listener. The system learns a policy — which tracks, playlists, or podcasts to recommend in which order — from reward signals like whether a user plays, skips, or saves content. Rather than being told in advance what "good" looks like, the agent discovers it through continuous interaction with millions of listeners. Spotify Engineering — ML for personalization →
Real-time ad bidding. In programmatic advertising, the system must decide in milliseconds how much to bid for each impression. RL learns a bidding policy from the reward signal of conversions — bid too high and you overspend; bid too low and you lose the impression. The policy balances cost and outcome continuously across millions of auctions per day.
Reward hacking — promotional discounts. A retailer trains an RL agent to maximize short-term conversion rate. The agent learns that offering steep discounts on every interaction drives purchases — which is true. But customers learn to wait for discounts and stop buying at full price. The system optimized exactly what it was told to optimize. Long-term margin and brand equity were not in the reward function. This is reward hacking: technically correct behavior that violates the actual business intent.
From class — the cat feeder. Rather than fixed feeding times, an RL feeder learns Maurice's actual hunger rhythm. It tries dispensing at a new time (explore), observes whether Maurice eats (reward), and updates its policy. Over time it learns that Maurice is hungry at 6:47am, not 7:00am — something no rule could have specified in advance.
→ Go deeper: Sutton & Barto — Reinforcement Learning: An Introduction, Chapter 1 (free online) — written for a general audience; the explore/exploit framing is explained clearly with no math required in the first chapter.
| Topic | Resource | Format |
|---|---|---|
| AI definition & framework | HBR — AI for the Real World | Article · 15 min |
| Philosophy of AI | IEEE Spectrum — The Turbulent Past and Uncertain Future of AI | Article · 10 min |
| Philosophy of AI | Turing (1950), Computing Machinery and Intelligence | Classic paper |
| Rule-based vs learning | HBR — Simple Economics of Machine Intelligence | Article · 10 min |
| Neural networks | 3Blue1Brown — Neural Networks | Video · 20 min |
| Data annotation & bias | MIT Tech Review — Humans Behind AI's Data | Article · 12 min |
| Overfitting / generalization | Google ML Crash Course — Overfitting | Interactive · 15 min |
| Supervised learning | 3Blue1Brown — Neural Networks | Video · 20 min |
| Supervised learning | Netflix Tech Blog — Beyond the 5 Stars (Part 2) | Article |
| Unsupervised learning | Spotify Engineering — Data Science | Article |
| Reinforcement learning | Spotify Engineering — ML for personalization | Article |
| Reinforcement learning | Microsoft Source — Starbucks Deep Brew | Article |
| Reinforcement learning | Sutton & Barto — RL: An Introduction, Ch. 1 | Free online |