In 2015, Pedro Domingos published The Master Algorithm, which remains one of the best guides to machine learning fundamentals. The book explores distinct "tribes" of machine learning, each with its own philosophy about how machines should learn. Domingos speculated about whether these tribes could ever be unified into a single, perfect learner—painting a picture of this quest as almost mythical, a Holy Grail that computer scientists might chase forever.
And then it basically happened.
Modern AI, particularly large language models, has delivered something close to that master algorithm. Not completely, but in a way that would have seemed impossible when Domingos was writing. The tools we use every day now combine elements from multiple tribes in ways that were purely theoretical a decade ago.
Understanding these different classes of learners is a key part of developing real AI expertise. Not all AI works the same way, and if you want to properly use a tool or understand how it works, you need to know something about the mechanics behind it.
Connectionists learn like a brain. Neural networks strengthen connections based on feedback, adjusting millions of internal settings to better recognize patterns in data. This is what powers Google's search results, Meta's news feed, and most image recognition. The key insight is that the machine doesn't need explicit rules; instead, it discovers patterns from data.
Connectionist architecture is the foundation of large language models like ChatGPT. These systems are built on neural networks with billions of parameters, organized in layers. During training, the network is exposed to massive amounts of text. When it makes a prediction and gets feedback on whether that prediction was right or wrong, it adjusts its internal weights through a process called backpropagation. Over time, these adjustments allow the network to recognize increasingly complex patterns in language.
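As a rough sketch of that loop, here's a toy Python example with one layer of adjustable weights and made-up data: predict, measure the error, nudge the weights, repeat. (In a real multi-layer network, that error signal is propagated backward through every layer, which is where backpropagation gets its name.)

```python
import numpy as np

# Toy training loop: predict, compare against the target, adjust the weights.
# Real LLMs do this across billions of weights and many layers.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))          # 4 examples, 3 input features (invented data)
y = rng.normal(size=(4, 1))          # target values
W = rng.normal(size=(3, 1))          # the adjustable weights

learning_rate = 0.1
for step in range(200):
    prediction = x @ W               # forward pass: make a prediction
    error = prediction - y           # feedback: how wrong was it?
    gradient = x.T @ error / len(x)  # which direction should each weight move?
    W -= learning_rate * gradient    # adjust the weights slightly

print("remaining error:", float(np.mean((x @ W - y) ** 2)))
```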
Think of the connectionist layer as the memory and pattern-recognition engine. It's the structure that allows an LLM to "know" that certain words tend to follow other words, that certain phrasings are more common than others, and that certain concepts are related. The network doesn't store explicit rules. It stores patterns, encoded in the strength of connections between nodes.
The downside of connectionist systems is that it's often hard to explain why they learned what they learned. Ask a neural network why it made a particular decision and you'll get the computational equivalent of a blank stare. This lack of interpretability is one of the core tensions in modern AI.
Bayesians learn like statisticians. They start with a belief about how likely something is, then update that belief as new evidence arrives.
A classic example is using a model to predict whether the sun will rise tomorrow. If you have no historical data whatsoever, and there are only two possible outcomes (it rises or it doesn't), a statistician has to assume both outcomes have equal likelihood. That's your starting point: 50/50. But after one sunrise, you update. After ten, you update again. After thousands of consecutive sunrises, your confidence that the sun will rise tomorrow approaches certainty, though it never quite reaches 100%. Each new data point refines the probability.
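Here's what that updating looks like in a minimal Python sketch, using the classic rule-of-succession math: a 50/50 prior, then fold in each observed sunrise.

```python
# Bayesian updating on the sunrise example. Start from a 50/50 prior
# (one imaginary "rose" and one imaginary "didn't rise"), then fold in
# the observed sunrises. Confidence climbs toward 100% but never reaches it.
prior_rose, prior_not = 1, 1   # Beta(1, 1) prior: no data yet, so 50/50

for observed in [0, 1, 10, 1000]:
    p_rise = (prior_rose + observed) / (prior_rose + prior_not + observed)
    print(f"after {observed:>4} sunrises: P(sunrise tomorrow) = {p_rise:.4f}")
```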
This is the foundation of much ad platform AI. Google and Meta's automated bidding systems use Bayesian principles to get smarter over time. Every conversion you track feeds back into the system, reinforcing or adjusting its hypotheses about which users are likely to convert. More conversion data means stronger predictions. This is why these platforms push advertisers toward conversion-based bidding strategies, and why campaigns often improve as they accumulate data.
Bayesian approaches are also why marketing measurement is becoming more accessible. Traditional statistical methods often required massive sample sizes to draw confident conclusions. Bayesian methods reduce the number of trials and data points you need to reach "confident enough" conclusions. This is making tools like incrementality testing and marketing mix modeling viable for smaller advertisers who couldn't previously afford them.
Symbolists learn like philosophers. They deduce rules from examples using logic. If Socrates is human, and all humans are mortal, then Socrates is mortal. Once the machine learns that rule, it can apply it elsewhere. Decision trees fall into this category.
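A decision tree makes this concrete: it deduces explicit rules from examples and can print those rules back, which is exactly the explainability symbolists prize. Here's a toy sketch using scikit-learn on an invented, Socrates-flavored dataset:

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Invented toy data: [is_human, drinks_hemlock] -> mortal?
# In this toy world, humans are mortal and non-humans are not.
X = [[1, 0], [1, 1], [0, 0], [0, 1]]
y = [1, 1, 0, 0]

tree = DecisionTreeClassifier().fit(X, y)
print(export_text(tree, feature_names=["is_human", "drinks_hemlock"]))
# Prints something like:
# |--- is_human <= 0.50
# |   |--- class: 0
# |--- is_human >  0.50
# |   |--- class: 1
```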
Eve, the robot scientist developed at the University of Manchester, used symbolist methods. In 2015, Eve discovered a link between triclosan, an antifungal ingredient commonly found in toothpaste, and its potential to fight malaria. The system worked by deducing logical connections between compounds and biological effects, then testing those hypotheses systematically. Unlike connectionist neural networks, symbolist systems are good at explaining their reasoning, but they need clean, structured inputs to work well.
Analogizers learn like lazy students. They find the closest match to a new problem and assume the answer is the same. This is often sufficient and helpful, but obviously prone to bias.
Imagine a doctor diagnosing a new patient. An analogy-based system would search its database for patients with the most similar symptoms and guess that the diagnosis will match. This sounds naive, but with enough data, it works remarkably well.
Meta’s Lookalike Audiences are a classic example of analogizer logic in action. You give the platform a list of your best customers. The system finds other users who "look like" those customers across hundreds of behavioral and demographic signals. Then it assumes those similar users will behave similarly. It's not deducing rules or updating probabilities. It's just finding the nearest match and betting the pattern holds.
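Here's a toy sketch of that nearest-match logic, using invented behavioral signals and cosine similarity. It isn't Meta's actual system, just the shape of the idea: score prospects by how closely they resemble your seed customers, then target the closest matches.

```python
import numpy as np

# Invented feature vectors: rows are people, columns are behavioral signals.
seed_customers = np.array([
    [0.9, 0.1, 0.8],
    [0.8, 0.2, 0.7],
])
prospects = np.array([
    [0.85, 0.15, 0.75],   # looks a lot like the seed customers
    [0.10, 0.90, 0.20],   # looks nothing like them
])

centroid = seed_customers.mean(axis=0)   # the "typical" best customer
similarity = prospects @ centroid / (
    np.linalg.norm(prospects, axis=1) * np.linalg.norm(centroid)
)
print(similarity)   # higher score = closer match = more likely to be targeted
```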
This approach is susceptible to survivorship bias. The learner will only be as good as the data you feed it, and it won't understand what might be possible with a different dataset. If your ad campaigns were targeting the wrong audience and not attracting the highest-quality customers aligned with your long-term strategy, then the lookalikes will reflect those same limitations.
Savvy marketers pay close attention to the conversion data they feed into any learning model, analogizer or otherwise. Imagine a retailer of women's dresses uploading a dataset of last year's customers into Meta to build a lookalike audience, where 70% of the dresses sold during that time frame were returned, which is surprisingly common in that category. Meta's analogizer algorithms would overindex on individuals likely to buy many dresses, showing high conversion rates and average order values. But the data would be misleading, and the system would unintentionally generate target audiences who are even more likely to buy and return dresses, compounding the original problem and reducing profitability. The marketing team would likely benefit from scrubbing all orders that resulted in returns before building the audience, which would teach the system to “point at” higher-quality (and lower-headache) customer cohorts.
In 2017, a research team at Google published a paper called "Attention Is All You Need." It laid the foundation for the Transformer architecture, which would eventually become the backbone of modern large language models.
Before the Transformer, language models typically relied on Recurrent Neural Networks (RNNs), which processed words sequentially, one at a time. This was slow, and it made it difficult for models to understand relationships between words that were far apart in a sentence. The Transformer changed this in three important ways:
First, it could process entire sequences of data at once, which allowed for dramatically faster training. Second, it introduced a mechanism called self-attention, which allows the model to weigh the importance of every word in a sentence simultaneously. And third, it enabled the creation of models with hundreds of billions of parameters, a scale that was previously impractical.
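To make self-attention less abstract, here's a stripped-down sketch in Python. A real Transformer adds learned query/key/value projections, multiple attention heads, and many stacked layers; this just shows the core operation: every token scores its relevance to every other token, then blends them accordingly.

```python
import numpy as np

def self_attention(X):
    """Scaled dot-product self-attention over a sequence of token vectors.

    In a real Transformer, X would first be multiplied by learned
    query/key/value matrices; here the raw vectors stand in for all three.
    """
    scores = X @ X.T / np.sqrt(X.shape[1])          # relevance of every token to every other token
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)   # softmax: attention weights per token
    return weights @ X                              # each output blends the tokens it attends to

tokens = np.random.default_rng(0).normal(size=(5, 8))  # 5 tokens, 8-dimensional embeddings
print(self_attention(tokens).shape)                    # (5, 8): the whole sequence processed at once
```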
What's interesting is how the Transformer architecture combines multiple classes of learners.
The underlying structure is connectionist. The Transformer is built on neural networks that learn through backpropagation, adjusting billions of weights based on feedback during training. This connectionist foundation is what allows the model to ingest massive amounts of information and develop something like long-term memory. The patterns it learns are encoded in the connections between nodes.
It uses analogizer principles to understand context. When you write a prompt, the model isn't parsing your words through rigid grammatical rules. It's finding patterns similar to what it's seen before and using that context to interpret what you actually mean. This is what allows non-technical people to interact with it using natural language. The self-attention mechanism is essentially asking: "What other parts of this input are most relevant to understanding this particular word?" That's analogizer thinking at scale.
And it relies on Bayesian probability to generate responses. When an LLM produces text, it's calculating the probability of the next token in a sequence, then the next, then the next. Given everything that came before, what word is most likely to come next? This is why early versions of ChatGPT were bad at math. When you asked "what is 2+2?" the model wasn't computing the answer. It was using Bayesian probability to predict that when a string of characters looks like "2+2=", the next most likely character is "4." The connectionist architecture stored the patterns; the Bayesian mechanism selected the output.
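In sketch form, a single next-token step looks like this: the network assigns a score to each candidate token, the scores become probabilities, and the most probable token wins. The candidates and scores below are invented for illustration.

```python
import numpy as np

candidates = ["4", "5", "fish", "22"]
logits = np.array([6.0, 2.0, -1.0, 1.5])   # invented scores after seeing "2+2="

probs = np.exp(logits - logits.max())
probs /= probs.sum()                        # softmax: scores -> probabilities

for token, p in zip(candidates, probs):
    print(f"{token!r}: {p:.3f}")
print("chosen:", candidates[int(np.argmax(probs))])   # "4" -- predicted, not computed
```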
This is an oversimplified explanation of how LLMs work. The actual systems are far more complex. But the point is that the underlying technologies have existed for decades. It took until 2017 for engineers to creatively determine how to combine these approaches in a way that achieved something Domingos had speculated was nearly impossible just two years earlier.
There's another distinction that matters as much as understanding the different tribes of machine learning: the difference between deterministic and probabilistic computing.
Deterministic systems follow strict rules and deliver precise outcomes every time. A calculator is deterministic. When you enter 2+2, you get 4. Always. When you click "Add to Cart" on an e-commerce site, the JavaScript pulls the product information, updates the data layer, and calculates your total. There's no prediction involved. It's exact. If something is off, it's because something broke.
Probabilistic systems work differently. Instead of following fixed rules, they use patterns to generate likely responses without knowing whether those responses are correct. Most modern AI falls into this category. This is by design, and occasional hallucinations are a feature, not a bug.
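The contrast is easy to see in code. The first function below is deterministic; the second is a toy probabilistic generator whose candidate words and weights are invented here, standing in for the patterns a real model learns during training.

```python
import random

def add(a, b):
    """Deterministic: same inputs, same output, every time."""
    return a + b

def likely_next_word(prompt):
    """Probabilistic (toy): pick a plausible continuation by weight.

    A real model would condition on the prompt; this toy ignores it and
    uses invented candidates and weights.
    """
    candidates = {"the": 0.5, "a": 0.3, "our": 0.2}
    return random.choices(list(candidates), weights=list(candidates.values()))[0]

print(add(2, 2))                      # always 4
print(likely_next_word("Check out"))  # usually "the", sometimes not
```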
Sam Tomlinson describes this well: "LLMs generate responses using probabilistic models, not deterministic models. These output a 'correct' or 'right' answer only in a probabilistic sense, not a binary sense. An LLM will give you a response that matches the pattern of a correct response. Whether or not that response is actually correct, or helpful, is a different matter entirely."
This distinction explains a lot of the frustration people have with AI tools. They approach these systems with deterministic expectations, hoping for perfect, binary outcomes, and then feel disappointed when the results seem random or incorrect. But that's not a flaw in the system. It's the nature of probabilistic computing.
Think back to the GitHub Copilot example from the previous chapter. Python code is either correct or incorrect. There's no middle ground. If the code is missing a colon, or if the AI writes the entire block in gibberish, it's wrong. The degree of wrongness shouldn't matter. But it's human nature to feel more frustrated when the AI produces a paragraph of nonsense than when it drops a colon. That frustration causes many people to conclude the tool doesn't work. They stop using it and fall behind because they've missed the point.
There's an entire genre of social media content dedicated to pointing out wrong answers from AI tools. These posts are meant to be comical, or validating, or comforting to those who feel insecure about this technology and what it might mean for their personal or professional lives. Everyone posting and engaging with this content is missing the point.
Instead of focusing on how wrong an AI seems, consider how much the correct answers add to your productivity, even if they don't arrive 100% of the time. The goal isn't to delegate entirely to the system. The goal is to work alongside it.
The same dynamic plays out in advertising. Many marketers have abandoned AI-driven campaign settings after seeing what look like "kooky" results. Broad Match targeting might show your ads for search terms that seem completely irrelevant. Your first instinct might be to conclude the system is broken.
But those odd results don't necessarily mean the system is failing. They could be part of the learning process. The success of AI-driven advertising doesn't hinge on every individual search term being relevant. It hinges on whether the overall system is delivering conversions at the efficiency you need. A few strange queries here and there shouldn't cause panic. The system is designed to learn over time, identifying which auctions drive quality traffic and which don't.
And even a well-optimized Google Ads account will still bid into many unrelated and seemingly kooky auctions from time to time. Again: this is a feature, not a bug.
This might explain why platforms like Google have significantly reduced advertiser access to certain data, like detailed search term reports. Part of this shift is likely cost-driven; it's expensive to process, store, and present that data to millions of advertisers. But another likely reason, though Google would never admit this publicly, is that advertisers can't always be trusted to interpret the data properly. They see odd results, assume the system is flawed, and make reactive decisions that actually hinder performance.
For marketers who've spent years working with PPC systems, this is a difficult adjustment. We want rich, insightful data at our fingertips. But we also need to accept that AI is probabilistic by nature. It will give us incorrect results sometimes, and that's not a bug. It's how these systems work. No AI will ever be 100% correct. Understanding this means shifting expectations from perfection to partnership.
Meta's evolution over the past few years is a clear example of this shift.
In the early days, Meta's advertising platform was largely deterministic. You could target users based on explicit data points like clicks, interests, and demographics with high accuracy. The system tracked user behavior across websites and apps, feeding that data back into algorithms that delivered precise targeting. Lookalike audiences worked on a simple deterministic principle: find users who look like this customer who converted.
Then came GDPR and Apple's App Tracking Transparency. Meta lost access to much of the granular user data it had relied on. Deterministic models depend on consistent, reliable data, and that data was suddenly unavailable.
Meta was forced to pivot toward probabilistic models. Instead of relying on exact data, the algorithms now make educated guesses about user behavior based on patterns and inferences. Rather than saying definitively "this user clicked an ad and then made a purchase," Meta estimates the likelihood that a user who saw an ad converted based on aggregate trends.
Ben Thompson, writing in Stratechery, explained the shift:
"Start with the News Feed: when Facebook was only ever pulling content from your social graph and brands you followed, the amount of content available was finite. That meant that the algorithm could be deterministic... Pulling content from anywhere on the Facebook or Instagram networks, though, is a fundamentally different problem, that requires fundamentally different approaches; these approaches are probabilistic in nature and built on machine learning."
Thompson also noted that this shift was deepening the moat around Google and Meta. As explicit data becomes harder to access and deterministic advertising becomes less viable, the ability to build better probabilistic models becomes the competitive advantage. That capability depends on the engineers you can hire and the infrastructure you can afford to build, which means the biggest players have the advantage.
For advertisers, this transition has tradeoffs. Probabilistic models allow Meta to maintain ad performance even with reduced access to individual-level data. But the shift also means less visibility and control over how ads are served and measured. If you're accustomed to deterministic systems where you could track every interaction, this can feel frustrating.
But frustrating or not, this is the reality. The platforms have moved to probabilistic models, and marketers who understand how to work within that system will outperform those who keep expecting deterministic precision from tools that aren't designed to provide it.
Reasoning is the fundamental skill behind problem-solving. It's the ability to break a problem into parts, work through each part logically, and build toward a conclusion where each step depends on the one before it. When you solve a math problem by showing your work, or build an argument by establishing premises before reaching a conclusion, you're reasoning. Each piece builds on the last in a sequence that can be followed and verified.
Standard LLMs like GPT-4 don't reason in this sense. They're fundamentally pattern-matching systems. They predict the next token in a sequence based on what they've seen in their training data. They're fast and fluent, capable of generating text that sounds authoritative and coherent. But they're not actually thinking through problems. They're producing outputs that match the pattern of what a thoughtful response looks like.
This works remarkably well for many tasks. If you ask a standard LLM to write an email, summarize a document, or brainstorm ideas, the pattern-matching approach delivers. The model has seen millions of emails, summaries, and brainstorms. It knows what those outputs are supposed to look like.
But pattern matching has limits. When you ask a standard LLM to work through a multi-step logic problem, it often stumbles. It might produce an answer that sounds confident but skips steps or makes errors that a human working carefully through the problem would catch. The model isn't reasoning; it's guessing what a reasoned answer would look like based on patterns.
Reasoning models attempt to address this. They're designed to work through problems step by step before producing a final answer, more like how a human might approach a logic puzzle or math problem. The model essentially "thinks out loud" internally, breaking the problem into pieces and working through each one before committing to a response.
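As a loose illustration (the shape of the idea, not how reasoning models are implemented internally), here's what "show your work before answering" looks like for a simple multi-step calculation:

```python
# Toy "think first, answer second" sketch: record the intermediate steps
# (a reasoning trace), then commit to a final answer. The numbers are invented.
def cost_per_conversion(daily_spend, days, conversions):
    trace = []
    total_spend = daily_spend * days
    trace.append(f"Step 1: total spend = {daily_spend} x {days} = {total_spend}")
    cpa = total_spend / conversions
    trace.append(f"Step 2: cost per conversion = {total_spend} / {conversions} = {cpa}")
    return trace, cpa

trace, answer = cost_per_conversion(daily_spend=50, days=14, conversions=70)
print("\n".join(trace))
print("Final answer:", answer)
```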
A quote often attributed to Albert Einstein captures the idea: “If I had an hour to solve a problem, I'd spend 55 minutes thinking about the problem and five minutes thinking about solutions.” Reasoning models attempt to think about the problem before they get to work.
This makes reasoning models slower and more expensive to run. But for complex, multi-step problems where precision matters, they tend to perform better than standard LLMs.
Neither type is "smarter" in an absolute sense. They're different tools for different jobs. A standard LLM is often the better choice for tasks that require speed, fluency, and creative generation. A reasoning model is often better for tasks that require careful, sequential logic.
