
Why Linear Algebra Is the Secret Language of AI

How machine learning actually works: IBM's Fangfang Lee breaks down the math that turns cat photos into numbers computers can understand.

Written by AI. Tyler Nakamura

March 20, 2026


Photo: IBM Technology / YouTube

Here's something wild: when your phone's camera recognizes your face, it's not actually seeing you. It's doing math. Really specific, really fast math that involves concepts you might've slept through in college.

Fangfang Lee from IBM Technology just dropped an explainer that actually makes this make sense, and honestly? It's kind of beautiful how this all works. The gap between "computer processes image" and "computer understands what's in the image" isn't magic—it's linear algebra, and understanding even the basics changes how you think about AI.

The Translation Problem

Computers don't experience the world like we do. They can't look at a photo of your dog and think "good boy." They need everything converted into numbers first. As Lee explains it: "Computers cannot process images, text, audios, or videos directly like humans. Instead, we need to translate these inputs into a language they can understand, mathematics."

That's where linear algebra comes in. It's the Rosetta Stone between human-readable data and machine-processable information. Every image, every word, every sound—it all gets transformed into mathematical objects that computers can actually work with.

The process is called vectorization, which sounds intimidating but is basically just organized number storage. Your vacation photo? It becomes a matrix where each pixel gets a number representing its color intensity. A sentence? It turns into a vector that somehow captures what the words mean, not just what letters they contain.
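To make that concrete, here's a minimal sketch of what vectorization looks like in NumPy. The pixel values and the "sentence vector" numbers are made up purely for illustration; in practice an embedding model produces the sentence vector.

```python
import numpy as np

# A tiny 3x3 grayscale "photo": each cell is one pixel's intensity (0-255).
image = np.array([
    [  0, 128, 255],
    [ 64, 192,  32],
    [255,   0, 128],
])

# A toy "sentence vector": real embeddings have hundreds of dimensions
# and are learned from data; these three numbers are placeholders.
sentence = np.array([0.2, -0.7, 0.5])

print(image.shape)     # (3, 3) -- rows x columns of pixels
print(sentence.shape)  # (3,)   -- a one-dimensional vector
```

Everything downstream (comparison, compression, prediction) operates on arrays like these rather than on the raw photo or text.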

The Building Blocks

Lee breaks down four fundamental types of mathematical objects that make this all possible:

Scalars are just single numbers. Think of them as the atoms of this whole system—5, 2.6, π. Single values, with no direction or dimension attached.

Vectors are one-dimensional lists of numbers, like [2, 3, 4]. They're how you represent simple sequences or directions.

Matrices level up to two dimensions—rows and columns. This is where things get interesting because you can represent entire images as matrices. Each cell in the grid corresponds to a pixel's brightness or color.

Tensors go full sci-fi, handling three or more dimensions. They're the heavy lifters in frameworks like TensorFlow (the name isn't subtle). When you're processing video or working with massive language models, you're dealing with tensors.

What matters isn't memorizing these definitions—it's understanding that this hierarchy exists to handle increasingly complex data. A single temperature reading? Scalar. A sentence? Vector. An image? Matrix. A video? Tensor.
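That hierarchy maps directly onto array dimensions in NumPy, where it's usually called `ndim`. A quick sketch (the tensor shape here is an arbitrary stand-in for a short video clip):

```python
import numpy as np

scalar = np.array(21.5)               # a single temperature reading
vector = np.array([2, 3, 4])          # a one-dimensional list of numbers
matrix = np.array([[1, 2], [3, 4]])   # rows and columns, like a tiny image
tensor = np.zeros((10, 64, 64))       # e.g. 10 frames of 64x64-pixel video

for name, obj in [("scalar", scalar), ("vector", vector),
                  ("matrix", matrix), ("tensor", tensor)]:
    print(f"{name}: {obj.ndim} dimension(s), shape {obj.shape}")
```

Frameworks like TensorFlow and PyTorch use the same idea: a "tensor" there is just an n-dimensional array, with `ndim` telling you where it sits in this hierarchy.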

Measuring Similarity (The Interesting Part)

Once everything's converted to numbers, the real question becomes: how do you compare things? How does a recommendation algorithm know two movies are similar, or a search engine know which results match your query?

Two methods dominate: Euclidean distance and cosine similarity.

Euclidean distance is straightforward—it measures the literal distance between two vectors in space. You calculate the difference between each corresponding dimension, square them, sum them up, and take the square root. It's the Pythagorean theorem on steroids. The output is unbounded, meaning it can be any non-negative number—zero when the vectors are identical, with no upper limit.
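The recipe in that sentence translates to a few lines of NumPy. The two example vectors are arbitrary, chosen so the distance comes out to a clean number:

```python
import numpy as np

def euclidean_distance(a, b):
    """Per-dimension differences, squared, summed, then square-rooted."""
    return np.sqrt(np.sum((a - b) ** 2))

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 6.0, 3.0])
print(euclidean_distance(a, b))  # 5.0 (a 3-4-5 right triangle in the first two dims)
```

NumPy also ships this as `np.linalg.norm(a - b)`, which does the same computation.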

Cosine similarity takes a different approach. Instead of measuring distance, it measures the angle between two vectors. Lee explains: "The closer the two vectors are in their semantic meaning the smaller the angles will be." The output ranges from -1 to 1, which makes it standardized and easier to interpret.

When cosine similarity equals 1, the vectors point in the exact same direction—they're basically identical in meaning. Zero means they're perpendicular, representing completely independent features. Negative 1 means they point in opposite directions.
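All three cases are easy to check numerically. Here's a minimal sketch—cosine similarity is just the dot product divided by the product of the vectors' magnitudes, and the example vectors are picked to hit exactly 1, 0, and -1:

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: dot product over magnitudes."""
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

same     = cosine_similarity(np.array([1.0, 2.0]), np.array([2.0, 4.0]))    # 1: same direction
perp     = cosine_similarity(np.array([1.0, 0.0]), np.array([0.0, 1.0]))    # 0: perpendicular
opposite = cosine_similarity(np.array([1.0, 2.0]), np.array([-1.0, -2.0]))  # -1: opposite
print(same, perp, opposite)
```

Notice the first pair isn't identical—[2, 4] is just [1, 2] scaled up—yet the similarity is still 1, because cosine similarity cares about direction, not magnitude.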

That perpendicular thing is actually significant. "In machine learning, when two vectors are perpendicular, it means that the features that they represent are completely independent of one another," Lee notes. That's not just math trivia—it tells you whether features in your data actually relate to each other or not.

The Efficiency Hack

Here's where it gets practical: modern AI models work with insane amounts of data. Training large language models involves billions of tokens, and doing full-dimensional calculations on all of that would be computationally ridiculous.

Enter Singular Value Decomposition (SVD), which Lee describes as "not only elegant but extremely versatile." SVD factors one massive matrix into three simpler matrices whose product reconstructs the original—and, crucially, lets you throw away the least important parts of that factorization.

The example Lee uses is perfect: imagine a matrix representing user ratings of movies. Rows are users, columns are movies, and each cell contains a rating. SVD splits this into three matrices: U (capturing user behavior patterns), Sigma (a diagonal matrix indicating which features matter most), and V-transposed (capturing movie characteristics).

The genius move? "Using SVD algorithm, we can select and only retain the most informative features based on the singular values and disregard unhelpful information." You're essentially compressing the data while keeping what matters. It's like converting a RAW photo to JPEG—you lose some information, but you keep the important stuff and save massive amounts of storage and processing power.
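Here's that compression move as a runnable sketch, using a made-up 4-user x 3-movie ratings matrix (the numbers are invented for illustration, not from Lee's video). NumPy's `np.linalg.svd` returns U, the singular values, and V-transposed; truncating to the top k singular values gives the low-rank approximation:

```python
import numpy as np

# Toy ratings matrix: 4 users (rows) x 3 movies (columns).
ratings = np.array([
    [5.0, 4.0, 1.0],
    [4.0, 5.0, 1.0],
    [1.0, 1.0, 5.0],
    [2.0, 1.0, 4.0],
])

# Full decomposition: ratings == U @ diag(s) @ Vt
U, s, Vt = np.linalg.svd(ratings, full_matrices=False)

# Keep only the k largest singular values -- the most informative features --
# and rebuild a compressed approximation of the original matrix.
k = 2
approx = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

print(np.round(s, 2))       # singular values, largest first
print(np.round(approx, 1))  # close to the original, stored in fewer numbers
```

The singular values come back sorted largest-first, so "keep the top k" is literally slicing off the front of each factor—the rest is the "unhelpful information" Lee says you can disregard.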

Why This Actually Matters

If you're shopping for AI-powered devices or trying to understand what's actually happening when ChatGPT responds to your prompts, this isn't academic. These operations—matrix multiplication, dot products, dimensionality reduction—are happening thousands of times per second.

When your phone processes your voice command locally instead of sending it to the cloud, it's running optimized matrix operations on a chip designed specifically for this math. When a recommendation algorithm suggests your next binge-watch, it's calculating cosine similarity between your viewing history vector and millions of other options.

The frameworks Lee mentions—PyTorch, TensorFlow, Keras—are all built to make these linear algebra operations fast and efficient. They've abstracted away a lot of the complexity, but understanding what's happening under the hood helps you make better decisions about which AI tools to trust and which claims to be skeptical of.

Lee puts it plainly at the end: linear algebra "converts data into mathematical form, computations into organized structure, and structure into actionable intelligence." That's the pipeline. That's how the magic trick works.

And honestly? Once you see it, you can't unsee it. Every AI feature, every smart recommendation, every image recognition system—it's all just extremely organized math, doing what math does best: finding patterns humans would never spot on their own.

— Tyler Nakamura, Consumer Tech & Gadgets Correspondent

Watch the Original Video

How Linear Algebra Powers Machine Learning (ML)


IBM Technology

11m 19s
Watch on YouTube

About This Source

IBM Technology


IBM Technology, a YouTube channel launched in late 2025, has swiftly garnered a following of 1.5 million subscribers. The channel serves as an educational platform designed to demystify cutting-edge technological topics such as AI, quantum computing, and cybersecurity. Drawing on IBM's rich history of technological innovation, it aims to provide viewers with the knowledge and skills necessary to succeed in today's tech-driven world.

