The Perception-Representation Gap
In Day 1 you learned that AI systems do not see the world — they see a structured encoding of the world. Perception is what exists in the environment. Representation is what the system actually operates on. The gap between the two is not a technical limitation. It is a design choice, made by a person, usually early in a project.
PittaRosso's environment contained 39,000 individual SKUs across 200 stores with six million loyalty members. The representation that was actually built compressed all of that into 8 product groups, one aggregated store, and a single visit-volume figure. Each compression was a deliberate decision about what to keep and what to discard. What got discarded could not be recovered by any algorithm, however sophisticated.
Deciding what the model gets to see
The process of designing a representation — choosing which variables to include, how to encode them, and at what level of granularity — is called feature engineering. It happens before any model is trained. It is an analyst's judgment call, not an output of the algorithm.
There are two distinct decisions inside feature engineering. The first is what to include: which aspects of the environment make it into the data at all. The second is how to encode it: at what level of detail, and in what form. Both decisions have consequences that compound downstream.
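To make the second decision concrete, here is a minimal sketch, assuming a hypothetical transactions table with sku, store_id, product_group, week, units, and discount_pct columns. The same raw data can be kept at SKU-and-store granularity or compressed to product-group level; the compression is a few lines of analyst code, written long before any model is trained.

```python
# Minimal sketch: the same transactions, encoded at two levels of granularity.
# Column names and the file path are hypothetical.
import pandas as pd

transactions = pd.read_csv("transactions.csv")  # one row per sale

# Representation A: keep SKU- and store-level detail.
fine = (transactions
        .groupby(["sku", "store_id", "week"], as_index=False)
        .agg(units=("units", "sum"), discount_pct=("discount_pct", "mean")))

# Representation B: compress to a handful of product groups, one aggregated "store".
coarse = (transactions
          .groupby(["product_group", "week"], as_index=False)
          .agg(units=("units", "sum"), discount_pct=("discount_pct", "mean")))

# A model trained on `coarse` cannot see store-level differences, SKU-level
# sell-through, or size fragmentation, no matter how sophisticated it is.
```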
Representation decisions are made early, by analysts under time pressure, and they are rarely revisited once the model is in production. By the time outputs are being used to run a business, the choices that constrained those outputs are invisible. A buyer looking at a markdown recommendation has no way to know it was produced by a model that treated 200 stores as one. The number looks like a fact. It is the product of a compression.
The practical implication: before deploying any AI pricing or optimization system, the right question to ask is not "is the algorithm correct?" but "what did we leave out of the representation, and does what we left out matter for the decisions this system will make?"
Zillow Offers, 2021 — when the representation excludes what actually drives value
Between 2018 and 2021, Zillow ran an iBuying business: its algorithm would offer to purchase homes directly from sellers, then resell them for a profit. The model that produced purchase offers was trained on observable features: square footage, number of bedrooms, location, and historical sales comps in the neighborhood. These are the variables that fit neatly into a structured representation.
What the representation did not encode: local bidding dynamics (whether a market was heating or cooling in real time), the behavioral component of buyer psychology in competitive auctions, the difficulty and cost variance of renovation at scale, and the liquidity risk of holding large inventories of an illiquid asset. None of these have clean fields in a property database. They are harder to measure, so they were left out.
The algorithm produced confident offers based on its representation. In a stable market, the representation was close enough to reality. When the housing market began shifting in 2021, the gap between what the representation encoded and what actually determined resale value widened. Zillow found itself holding inventory purchased at prices its model considered correct but that the market would no longer pay. The company lost roughly $881 million in 2021 alone and shut the program down entirely.
The failure was not that the algorithm was wrong given its inputs. It was that the inputs excluded the dynamics that turned out to matter most.
Zillow's model excluded behavioral dynamics: momentum, bidding psychology, and liquidity. PittaRosso's model excluded behavioral dynamics too — whether customers had learned to wait for discounts, whether brand perception was eroding, whether size fragmentation was making inventory functionally unavailable. In both cases, what was left out of the representation was not random. It was the hardest thing to measure. And it was the thing that mattered most.
Resources for This Section
Best analytical account of the representation and estimation failures. Start here before the news pieces.
Strong secondary narrative account of the failure, with useful business context. Backup if Stanford link is slow.
Primary news reporting on the shutdown and financial losses.
Practitioner framework for representation design decisions before model training. Read the first two modules for the conceptual grounding.
Model Opacity
The numbers in the Impact tab looked like facts. They were not. They were estimates produced by a statistical model running on historical transaction data from a specific period of time, under a specific set of market conditions, using a specific set of representation choices. The model that produced them was invisible at the point of use. Its assumptions were invisible. Its uncertainty was invisible. What was visible was the number, and numbers in spreadsheets look like ground truth.
Numbers that look like facts but are estimates in disguise
Every coefficient in the Impact tab was the output of a regression model. A regression model answers a specific question: given the variation in my training data, what is the estimated relationship between this input and this output? That estimated relationship has three important properties that disappear when the coefficient is frozen into a tab.
It has uncertainty. The coefficient is a point estimate from a distribution. It has a standard error. For product categories with thin historical data (old casual shoes that rarely sold), the standard error is wide. The model might print -0.20, but the true value could plausibly be anywhere from -0.05 to -0.40. That uncertainty is gone once the number is frozen.
It is regime-dependent. The coefficient was estimated from data generated under a specific set of conditions (in PittaRosso's case, a world of persistent storewide discounting where customers had learned to wait for sales). If that regime changes, the coefficient changes. The frozen number does not know the regime has changed.
It reflects what the model observed, not what is true. If sports shoes were always discounted in the training data, the model estimated a price elasticity for sports shoes in a world where they were always discounted. That is not the same as the price elasticity of sports shoes in a world where they are sometimes sold at full price. The model cannot estimate a relationship it never observed.
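A minimal sketch of the first property, using simulated data (every number here is invented for illustration): fitting a simple demand regression on a thin history returns a point estimate together with a standard error and a confidence interval, and only the point estimate survives when the coefficient is frozen into a tab. The statsmodels call is standard OLS.

```python
# Minimal sketch with simulated data: a coefficient is a point estimate
# from a distribution, not a fact. Thin history -> wide interval.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 30                                          # a thin sales history
discount = rng.uniform(0.10, 0.40, n)           # discount depth (fraction off)
log_units = 2.0 - 0.20 * discount + rng.normal(0, 0.3, n)  # assumed true slope -0.20

X = sm.add_constant(discount)
fit = sm.OLS(log_units, X).fit()

coef = fit.params[1]
se = fit.bse[1]
lo, hi = fit.conf_int()[1]
print(f"point estimate: {coef:.2f}  std err: {se:.2f}  95% CI: [{lo:.2f}, {hi:.2f}]")
# Freezing only `coef` into a spreadsheet discards the standard error and the
# interval, exactly the information a buyer would need to judge how much to
# trust the number for a thinly traded category.
```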
The sports shoe elasticity in the Impact tab was estimated as highly negative, meaning discounts drove significant volume. That estimate was accurate in the training data. But the training data was a world where sports shoes were always discounted. The model had never seen sports shoes sold without a discount. It had no basis for estimating whether sports shoes would sell at full price, because that condition never appeared in its training set. When the system recommended discounting already fast-selling sports shoes, it was applying an estimate from a regime that the new strategy was designed to leave behind. The coefficient looked authoritative. It was obsolete.
Opacity is not just a problem for the people using the model's outputs. It is a problem for the organization deploying the system. When buyers or finance teams push back on a recommendation, they often cannot articulate the specific mechanism of their skepticism because the model's internals are not visible to them. Their intuition that something is wrong is correct, but they lack the vocabulary to diagnose it. The result is that model outputs get either trusted uncritically or rejected on instinct. Neither is the right response.
The right response is to ask: under what conditions was this coefficient estimated, and do those conditions still hold? That question requires knowing the model exists and having access to its training context. Both require organizational practices that most deployments do not build in.
Explainability tools (methods that help analysts see which variables drove a model's outputs) are genuinely useful for understanding what a model learned. But they do not solve the regime dependence problem. You can have complete transparency about how a model was built and still not know that it was trained on a world that no longer exists. Explainability tells you what the model learned from its training data. It does not tell you whether its training data is still a good description of the world the system is operating in. That requires monitoring outcomes, comparing predictions to actuals over time, and having a human in place who can notice when the gap is widening. Explainability is necessary but not sufficient.
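What "comparing predictions to actuals over time" can look like in code, as a minimal sketch with hypothetical thresholds: keep a trailing window of forecast error and flag the periods where it grows well beyond the level that was considered acceptable when the model was validated.

```python
# Minimal sketch of outcome monitoring: explainability tells you what the model
# learned; this tells you whether its predictions still match reality.
import numpy as np

def rolling_mape(actual: np.ndarray, predicted: np.ndarray, window: int = 4) -> np.ndarray:
    """Mean absolute percentage error over a trailing window of periods."""
    err = np.abs(actual - predicted) / np.maximum(np.abs(actual), 1e-9)
    return np.array([err[max(0, i - window + 1): i + 1].mean() for i in range(len(err))])

def drift_alert(actual, predicted, baseline_mape: float = 0.10, factor: float = 2.0):
    """Flag periods where recent error is `factor` times the accepted baseline."""
    mape = rolling_mape(np.asarray(actual, float), np.asarray(predicted, float))
    return mape > factor * baseline_mape

# Hypothetical weekly units sold vs. the frozen model's forecast.
actual    = [120, 130, 125, 160, 210, 260, 310]
predicted = [118, 128, 127, 130, 135, 138, 140]
print(drift_alert(actual, predicted))   # later weeks flip to True as the gap widens
```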
Dutch childcare benefits (Toeslagenaffaire) — when opacity becomes institutional
Between roughly 2013 and 2019, the Dutch tax authority (Belastingdienst) used a risk-scoring system to flag families for fraud in childcare benefit claims. Caseworkers acted on the scores (initiating audits, withholding benefits, and demanding repayments) without being able to see how the scores were produced. The model's logic was opaque not just to the families it affected but to the officials administering it.
A parliamentary inquiry later found that the model had systematically flagged families with dual nationality at higher rates, and that its outputs had been treated as authoritative evidence rather than probabilistic estimates. Tens of thousands of families had benefits wrongly clawed back, in some cases losing their homes. The Dutch government eventually fell over the scandal.
The mechanism is the same as the Impact tab problem, at much higher stakes. A number produced by a model became an institutional fact. The uncertainty behind the number, the assumptions it encoded, and the conditions under which it was reliable were invisible at the point of use. The people acting on the scores could not interrogate them. They trusted them because they were numbers, and numbers look like ground truth.
The scale and consequences are very different, but the structure is identical. In both cases, a model produced outputs that were treated as authoritative at the point of use. In both cases, the people acting on those outputs had no visibility into the model's assumptions or the conditions under which it was reliable. In both cases, the system's failure was not detected until real harm had accumulated: in PittaRosso's case, a margin crisis; in the Netherlands, the financial destruction of tens of thousands of families. Opacity has a cost. The cost scales with the stakes of the decisions being made.
Resources for This Section
Covers the mechanism of failure, the political consequences, and the human toll. The clearest single account of the case for students.
Accessible overview of explainability methods including SHAP and LIME. Read the introduction for conceptual grounding on what explainability can and cannot do.
Maximization, Tradeoffs, and Organizational Conflict
Every optimization system maximizes something. Choosing what to maximize is not a technical decision. It is a statement about what the organization values, and different parts of every organization value different things.
In a pricing and promotion context, there are three natural objectives, each corresponding to a different function's priorities: maximize revenue (marketing's goal), maximize margin (finance's target), or minimize unsold end-of-season inventory (operations' clearance target).
Human decision-making in organizations can hold these tensions in suspension. A buyer, a finance director, and a head of operations sitting around a table can negotiate, defer, and compromise. An algorithm cannot. It maximizes what it is given. Before you deploy, you have to choose, and the choice exposes conflicts that organizational politics normally keeps latent.
Maximizing is always maximizing at the expense of something else
The PittaRosso simulation made the tradeoffs visible in a way that spreadsheet negotiation usually does not. Each pure objective produced a coherent result, but each result looked catastrophic from the perspective of the other two functions.
Maximize revenue. The algorithm recommends aggressive discounting on fast-moving categories to drive volume. Sell-through is high. Revenue grows. But margin is thin because the discounts are deep, and a significant volume of slow-moving inventory remains unsold at end of season, a write-off liability on the balance sheet. Finance looks at this and sees a brand being trained to expect discounts and a margin problem that will compound over time.
Maximize margin. The algorithm recommends restraint on discounts: protect price on categories where customers will pay. Margin per unit is preserved. But sell-through is lower, and the stale inventory problem that motivated the whole project is barely addressed. Operations looks at this and sees a warehouse problem that next season will only make worse. The problem that justified the system's deployment is not being solved.
Minimize unsold inventory. The algorithm treats clearing stock as the overriding priority and recommends discounts deep enough to move everything. Sell-through approaches 100%. But the margin outcome is negative: the system recommends selling below cost to clear the most stubborn inventory. The company is not running a business; it is running a liquidation sale. Marketing and finance look at this outcome with alarm, but it is the logical result of the objective the organization set.
A negative margin outcome seems obviously wrong. No manager would accept it. But that reaction is itself the lesson: the result is not wrong given the objective function. It is the mathematically correct answer to the question that was asked. If you tell an algorithm to minimize unsold inventory at any cost, it will recommend selling below cost. The algorithm did not fail. The objective was underspecified. It was missing a constraint: a margin floor below which clearance is not acceptable. The question is who should have specified that constraint, and when. The answer is before deployment, as part of the objective function design process, not after the system produces an alarming output during a live season.
The difficulty is not that organizations do not know their priorities. It is that their priorities are genuinely in conflict, and an AI system forces them to quantify and rank those priorities before they are ready to do so. In a traditional buying process, the margin target is set by finance, the clearance target is set by operations, the revenue goal is set by marketing, and a human buyer exercises judgment about how to balance them in each specific decision. No one has to write down the formula. The algorithm has no judgment. It needs the formula before it can run. Getting cross-functional agreement on that formula, in specific numerical terms, is organizationally difficult in a way that "we all want the business to succeed" is not.
Adding terms does not resolve the conflict. It translates it into a different negotiation. Instead of arguing about whether to prioritize revenue or margin, stakeholders now argue about the weights. Is revenue worth 60% and margin 40%, or the reverse? Is a unit of unsold inventory penalized at 50% of cost or 100%? These are the same underlying disagreements, reformulated as numbers. The advantage of making them explicit is that the tradeoffs become visible and testable: you can run the simulation under different weight combinations and see what each implies. The disadvantage is that organizations often cannot agree on weights any more easily than they can agree on priorities, and now they have to do it on a deadline.
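A minimal sketch of what writing down the formula involves; every weight, target, and outcome figure below is hypothetical. The weighted terms encode the negotiated priorities, normalized against targets so the scales are comparable, and the margin floor encodes the constraint that was missing in the liquidation scenario. With these invented numbers, a 60/40 revenue/margin weighting selects the aggressive-discount policy and a 40/60 weighting selects the protect-price policy, while the floor excludes the below-cost liquidation regardless of weights.

```python
# Minimal sketch: a weighted objective with an explicit margin floor.
# All numbers are illustrative; the structure, not the values, is the point.
from dataclasses import dataclass

@dataclass
class Outcome:
    revenue: float       # season revenue, EUR
    margin: float        # gross margin, EUR (can be negative)
    unsold_cost: float   # cost value of inventory left at season end, EUR

# Hypothetical targets used to put the three terms on a comparable scale.
REV_TARGET, MARGIN_TARGET, UNSOLD_BUDGET = 10.0e6, 2.0e6, 1.0e6

def objective(o: Outcome, w_rev: float, w_margin: float,
              w_unsold: float, margin_floor: float) -> float:
    """Score one simulated season outcome under one set of weights."""
    if o.margin < margin_floor:
        return float("-inf")     # hard constraint, not another weighted term
    return (w_rev * o.revenue / REV_TARGET
            + w_margin * o.margin / MARGIN_TARGET
            - w_unsold * o.unsold_cost / UNSOLD_BUDGET)

# Three simulated discount policies (illustrative figures).
policies = {
    "aggressive discounting": Outcome(revenue=12.0e6, margin=1.0e6,  unsold_cost=0.6e6),
    "protect price":          Outcome(revenue=9.5e6,  margin=2.4e6,  unsold_cost=1.2e6),
    "liquidate everything":   Outcome(revenue=11.0e6, margin=-0.4e6, unsold_cost=0.1e6),
}

# Two weightings the stakeholders might argue over.
for w_rev, w_margin in [(0.6, 0.4), (0.4, 0.6)]:
    best = max(policies, key=lambda name: objective(policies[name], w_rev, w_margin,
                                                    w_unsold=0.3, margin_floor=0.0))
    print(f"rev weight {w_rev}, margin weight {w_margin}: recommended -> {best}")
```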
Staples dynamic pricing, 2012 — when one function's objective becomes the algorithm
In 2012, The Wall Street Journal investigated Staples' online pricing algorithm and found that it offered different prices to customers based on their estimated proximity to a competitor store. Customers near a competitor saw lower prices; customers in areas with no nearby Staples competitor saw higher prices.
The algorithm was optimizing for competitive margin (finance's objective). It was doing this correctly and efficiently. But the output of that optimization was that customers in areas with fewer nearby competitors (which often correlated with lower-income zip codes) were systematically charged more for the same products. Marketing's objective (customer equity and brand trust) had no field in the objective function. Operations had signed off on the algorithm because it managed margin. No one had encoded the brand and fairness implications into the system before it ran.
The algorithm made that strategy visible at scale, applying it consistently across millions of transactions in a way that human pricing decisions never would have. The result was public scrutiny, regulatory attention, and reputational damage that the margin gains did not offset.
Staples' pricing team, finance team, and marketing team almost certainly had different views on whether location-based price discrimination was acceptable. Those views were never formally reconciled because the decision had previously been made incrementally by individual pricing managers, each exercising judgment in specific contexts. The algorithm removed that incremental judgment and applied a single consistent policy at scale, making the conflict between functions impossible to ignore. AI systems do not create organizational conflicts. They surface and operationalize them at a scale and visibility that makes them impossible to paper over.
Resources for This Section
Primary reporting on the mechanism and discovery of the Staples pricing disparity. The clearest account of how the objective function encoded one function's view at the expense of others.
The foundational principle: when a measure becomes a target, it ceases to be a good measure. Directly relevant to why optimizing E12 can diverge from what the business actually needs.
Practitioner framework for evaluating AI system reliability before and during deployment, including questions about objective function validity.
The Behavioral Loop — Is This Really AI?
The Day 1 definition of AI requires a closed perception-action loop: the system perceives the environment, acts on it, observes the consequences of its action, and updates. Each cycle feeds the next. The system learns from the world it is operating in, not just the world it was trained on.
By that definition, the PittaRosso system is not an AI system. It is a decision-support tool. The distinction is not semantic. It determines what the system needs from the organization to function, and what happens when those organizational supports are not available.
What would need to change for PittaRosso's system to be AI
Three things would need to be added to close the loop.
1. Automated action. Currently, the system produces a recommendation. A human reads it, decides whether to trust it, and implements prices and promotions manually. Closing the loop requires the system to act directly (updating prices in the e-commerce platform, triggering promotional campaigns) within approved constraints, without a human approving each change. This is not just a technical addition. It requires the organization to agree that the system's judgment is reliable enough to act without human review.
2. Closed feedback loop. Currently, when a recommendation is implemented and outcomes are observed, nothing feeds back to the demand estimation model automatically. A human (specifically Labate after Crisis 1) has to notice that predictions and outcomes have diverged, diagnose why, and manually update the Impact tab. In a closed-loop system, actual sales outcomes would automatically re-estimate the demand coefficients, so the model updates continuously from the consequences of its own recommendations.
3. Online learning. Currently, the demand estimation model is a snapshot: it ran once on historical data, produced coefficients, and was frozen. Online learning means the model re-runs as new data arrives, continuously, so that a change in customer behavior is detected and incorporated within days rather than discovered as a crisis weeks later.
After Crisis 1 (when sports shoes were selling far faster than the model predicted), Valentina Labate manually edited the demand estimation coefficients in the Impact tab to reflect sub-category differences in sales speed. She was doing by hand exactly what a closed-loop system would do automatically: observing that the model's predictions diverged from reality and updating the parameters accordingly. The fact that this required a specific person, with specific knowledge of both the business and the model's internals, noticing a specific problem and taking a specific manual action is the definition of a broken loop. In a full AI system, that loop closes itself.
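A minimal sketch of what the automated version of that manual fix could look like, assuming a one-feature demand model (log units sold regressed on discount depth) that is refit on a rolling window of weekly actuals. The class name and the window length are hypothetical; the point is that the coefficient is re-estimated from observed outcomes on every update instead of sitting frozen in a tab until someone notices it is wrong.

```python
# Minimal sketch of a closed feedback loop: each observed outcome feeds the
# next estimate. Names, the window length, and the model form are illustrative.
import numpy as np

def estimate_discount_coef(discounts: np.ndarray, log_units: np.ndarray) -> float:
    """OLS slope of log units sold on discount depth (one-feature demand model)."""
    X = np.column_stack([np.ones_like(discounts), discounts])
    beta, *_ = np.linalg.lstsq(X, log_units, rcond=None)
    return float(beta[1])

class RollingDemandModel:
    """Keeps only the most recent `window` weeks and refits on every update."""

    def __init__(self, window: int = 26):
        self.window = window
        self.discounts: list[float] = []
        self.log_units: list[float] = []
        self.coef = None            # stays None until there is enough history

    def update(self, discount: float, units_sold: float):
        # Perceive the consequence of the last action...
        self.discounts.append(discount)
        self.log_units.append(float(np.log(max(units_sold, 1.0))))
        self.discounts = self.discounts[-self.window:]
        self.log_units = self.log_units[-self.window:]
        # ...and re-estimate, instead of keeping a frozen coefficient.
        if len(self.discounts) >= 8:
            self.coef = estimate_discount_coef(np.array(self.discounts),
                                               np.array(self.log_units))
        return self.coef

# Each week: implement the recommendation, observe actual sales, feed them back.
model = RollingDemandModel(window=26)
# coef = model.update(discount=0.20, units_sold=340)
```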
What happens when the humans in the loop disappear
Decision-support tools require human judgment to function. The humans who provide that judgment develop skills and maintain expertise through use. When a system takes over more and more of the decision-making, the human expertise that would be needed to catch the system's errors, override its recommendations, and operate without it begins to atrophy.
PittaRosso's buying team was progressively sidelined as the system took over pricing and promotional decisions. Their manual expertise in reading inventory, interpreting sell-through patterns, and making seasonal pricing calls atrophied from disuse. On February 20, 2020, all stores closed due to COVID-19. The system was halted. As of late 2022 it had not been restarted.
COVID is an extreme case of distribution shift: the model was trained on a world where stores were open, customers browsed seasonally, and demand followed recognizable patterns. None of those conditions held after February 2020. But the organizational consequence (the atrophy of the human capability the system replaced) is not extreme. It is a predictable outcome of any successful deployment. The more an AI system works, the more the organization comes to depend on it and the less it maintains the fallback capabilities the system displaced.
A decision-support tool and a true AI system have different organizational requirements. A decision-support tool needs humans who understand the model well enough to catch its errors, people like Labate who can notice that predictions have diverged from reality and know how to fix it. When those humans are not available or their expertise has faded, the tool produces stale outputs that nobody questions because the numbers still look authoritative. A true AI system, by contrast, degrades more gracefully: its feedback loop continues to update from whatever data is available, and its errors are self-correcting within the scope of its design.
The implication for deployment decisions: if you build a decision-support system and call it AI, you will underinvest in the human expertise needed to maintain it, and you will be unprepared when it fails. Knowing precisely what kind of system you have built is not pedantry. It is risk management.
The outputs may look the same in normal operating conditions. The difference shows up in three specific situations. First, when the model drifts: when the world changes and the model's predictions become unreliable. A closed-loop system detects this automatically through the feedback loop. A decision-support tool requires a human to notice, which may happen weeks late or not at all. Second, when the human experts are unavailable: through turnover, organizational change, or a global pandemic. A closed-loop system continues to function. A decision-support tool produces outputs without the human oversight that makes those outputs trustworthy. Third, when the system needs to act at a speed or scale that human review cannot support. Millions of pricing decisions per day cannot go through human approval. A decision-support tool is not the right architecture for that problem. Knowing which you have determines how you build the surrounding organization, what expertise you maintain, and how you plan for failure.
Zara and Starbucks — what closed loops look like in retail
Zara's replenishment system receives daily point-of-sale data from every store globally. That data automatically triggers production and distribution signals: which items to restock, in what quantities, and to which stores. Most replenishment decisions are system-driven, with human oversight at higher levels. When a style sells faster than expected in Barcelona, the system perceives that signal and acts on it within days, without waiting for a buyer to notice and escalate. The perception-action loop closes continuously. The system is not making recommendations about replenishment; it is doing replenishment. This is structurally what PittaRosso's system would need to become to cross from decision-support into AI, and the contrast between Zara's operational capability and PittaRosso's manual implementation of recommendations is partly a story about the investment required to close the loop.
Starbucks Deep Brew (discussed in Day 1 in the context of reinforcement learning–style feedback and continuous personalization) provides the marketing-specific version of the same idea. The system updates drink and food recommendations for Rewards members continuously from purchase feedback. When a customer's behavior changes (they stop ordering dairy, they shift toward cold brew in summer), the system perceives that change through the reward signal and updates its policy. No one manually edits a customer preference file. The loop closes itself. The expertise that makes this possible is not in a person who monitors the model; it is in the architecture.
Both made specific architectural decisions: invest in closing the feedback loop, automate action within approved constraints, and build systems that update from outcomes rather than requiring humans to notice when something is wrong. Those decisions required organizational commitment and technical investment. But the alternative (a decision-support system that requires a Labate to function) is not free either. Its costs just show up later, when the system drifts and the expert who knew how to fix it is no longer available.
Resources for This Section
Primary source on the Deep Brew personalization system and its feedback loop architecture. More stable than the customer case page.
Operational detail on how Zara's data infrastructure and replenishment loop work in practice. More concrete than the Inditex corporate page.