Guozhen AI: Global AI field notes and model intelligence

English translation

The Role of Linear Algebra in Deep Learning

Published:

Category: Linear Algebra for Beginners

Read time: 4 min

Lesson #25

Concept Map: The Role of Linear Algebra in Deep Learning

At its core, neural network computation consists largely of matrix multiplications. Understanding tensor shapes, weights, and gradients turns deep learning from calling library functions into genuine understanding.

Checklist Diagram: The Role of Linear Algebra in Deep Learning

I record tensor shapes layer by layer. As the number of layers increases, systematically documenting shapes is far more reliable than ad-hoc guessing.
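One way to put this habit into code is a small helper that checks and logs the shape after every layer. This is a minimal sketch, not the author's tooling: the name `trace_shapes`, the ReLU after every layer, and the row-vector convention `x @ W + b` (the same convention the case study code later in this article uses) are assumptions for illustration.

```python
import numpy as np

def trace_shapes(x, layers):
    """Apply (W, b) layers with ReLU, printing and returning every activation shape.

    Hypothetical helper. Uses the row-vector convention z = x @ W + b,
    so each W has shape (in_features, out_features).
    """
    shapes = [x.shape]
    for i, (W, b) in enumerate(layers, start=1):
        # Fail early with a readable message instead of a cryptic matmul error.
        if x.shape[-1] != W.shape[0]:
            raise ValueError(
                f"layer {i}: input has {x.shape[-1]} features, W expects {W.shape[0]}"
            )
        x = np.maximum(0.0, x @ W + b)   # affine map followed by ReLU
        shapes.append(x.shape)
        print(f"layer {i}: W {W.shape}, b {b.shape} -> activations {x.shape}")
    return shapes

rng = np.random.default_rng(0)
X = rng.random((5, 3))                          # 5 samples, 3 features
layers = [(rng.random((3, 4)), rng.random(4)),  # 3 -> 4
          (rng.random((4, 1)), rng.random(1))]  # 4 -> 1
shapes = trace_shapes(X, layers)
print(shapes)   # [(5, 3), (5, 4), (5, 1)]
```

Checking compatibility before each multiplication means a wrong layer size is reported by layer number, which is exactly the information a written shape log is meant to provide.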

In the previous article, we explored applications of linear algebra in machine learning, with particular emphasis on data preprocessing and model construction. Today we look more closely at its role in deep learning, especially how it helps us understand and optimize neural networks.

Linear Algebra and Neural Networks

The fundamental building block of deep learning is the neural network, and most of a neural network's computation is expressed as operations on matrices and vectors. A simple feedforward network learns complex functional relationships by alternating linear transformations (matrix multiplications) with nonlinear activation functions such as ReLU or Sigmoid; the activations are the one ingredient that lies outside linear algebra.

Key-Point Judgment Card: The Role of Linear Algebra in Deep Learning

While reading this article, treat “Linear Algebra & Neural Networks → Linear Transformation → Nonlinear Activation → Backpropagation” as a checklist: first pin down the objects involved, the steps, and the supporting evidence; then check them against concrete examples, code, or metrics.

Linear Transformations

In a typical deep neural network, input data, usually represented as a feature vector, passes through multiple hidden layers. Each layer first applies a linear transformation (a matrix multiplication) and adds a bias term:

$$\mathbf{z} = \mathbf{W} \cdot \mathbf{x} + \mathbf{b}$$

where $\mathbf{z}$ is the layer's pre-activation output (after the activation function is applied, it becomes the input to the next layer), $\mathbf{W}$ is the weight matrix, $\mathbf{x}$ is the input to the current layer, and $\mathbf{b}$ is the bias vector.

For example, consider a network with 3 neurons in the input layer and 2 neurons in the hidden layer. This can be written as:

$$\begin{bmatrix} z_1 \\ z_2 \end{bmatrix} = \begin{bmatrix} w_{11} & w_{12} & w_{13} \\ w_{21} & w_{22} & w_{23} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} + \begin{bmatrix} b_1 \\ b_2 \end{bmatrix}$$

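Written out row by row, each output is a weighted sum of all inputs plus a bias, $z_i = \sum_{j=1}^{3} w_{ij} x_j + b_i$. The weight matrix therefore has one row per output neuron and one column per input neuron, so here $\mathbf{W}$ is $2 \times 3$.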
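A quick numerical check of the 3-input, 2-neuron layer above, with made-up values for $\mathbf{W}$, $\mathbf{x}$, and $\mathbf{b}$ (illustrative only, not from the article), shows that the matrix product and the explicit weighted sums agree:

```python
import numpy as np

# Illustrative values: 2 output neurons, 3 inputs.
W = np.array([[1.0, 0.0, -1.0],
              [0.5, 2.0,  1.0]])   # shape (2, 3): one row per output neuron
x = np.array([1.0, 2.0, 3.0])      # shape (3,)
b = np.array([0.1, -0.2])          # shape (2,)

z = W @ x + b                      # matrix-vector product plus bias
print(z)                           # approximately [-1.9, 7.3]

# Same result as explicit weighted sums: z_i = sum_j w_ij * x_j + b_i
z_manual = np.array([sum(W[i, j] * x[j] for j in range(3)) + b[i] for i in range(2)])
print(np.allclose(z, z_manual))    # True
```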

Nonlinear Activation

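After the linear transformation, a nonlinear activation function is applied. This is what gives the model its expressive power: without it, a stack of layers would collapse into a single affine map, because the composition of affine maps is itself affine. This step is written as: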

$$\mathbf{a} = f(\mathbf{z})$$

Here, $f$ denotes an activation function applied elementwise, such as ReLU, $f(z) = \max(0, z)$, or Sigmoid, $f(z) = 1/(1 + e^{-z})$.
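Without an activation, two stacked layers are equivalent to one affine layer with weights $\mathbf{W}_2\mathbf{W}_1$ and bias $\mathbf{W}_2\mathbf{b}_1 + \mathbf{b}_2$. The sketch below defines ReLU and Sigmoid, checks that equivalence numerically, and shows that inserting ReLU between the layers breaks it. The small weight and input values are hand-picked assumptions for illustration.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)          # elementwise max(0, z)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))    # elementwise 1 / (1 + e^(-z))

print(relu(np.array([-2.0, 0.0, 3.0])))   # [0. 0. 3.]
print(sigmoid(0.0))                        # 0.5

# Hand-picked example values: 2 inputs -> 2 hidden units -> 1 output
W1 = np.array([[1.0, -1.0],
               [2.0,  0.5]])
b1 = np.array([0.0, 0.5])
W2 = np.array([[1.0, 1.0]])
b2 = np.array([0.25])
x = np.array([1.0, 2.0])

two_layers = W2 @ (W1 @ x + b1) + b2         # no activation in between
one_layer = (W2 @ W1) @ x + (W2 @ b1 + b2)   # single equivalent affine layer
with_relu = W2 @ relu(W1 @ x + b1) + b2      # ReLU between the layers

print(two_layers, one_layer)   # [2.75] [2.75]
print(with_relu)               # [3.75]
```

The difference comes from the first hidden unit: its pre-activation is $-1$, which ReLU clips to $0$, so the network is no longer a single affine function of $\mathbf{x}$.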

Backpropagation

Neural networks are usually trained with gradient-based optimization, and the gradients are computed by the backpropagation algorithm. Backpropagation computes the gradient of the loss function with respect to every weight and bias by applying the chain rule of calculus layer by layer; in practice, each step is a matrix or vector operation.

For instance, given a loss function $L$, the gradient with respect to the weight matrix $\mathbf{W}$ is computed via the chain rule:

$$\frac{\partial L}{\partial \mathbf{W}} = \frac{\partial L}{\partial \mathbf{a}} \cdot \frac{\partial \mathbf{a}}{\partial \mathbf{z}} \cdot \frac{\partial \mathbf{z}}{\partial \mathbf{W}}$$

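The product above is schematic ($\partial \mathbf{z}/\partial \mathbf{W}$ is really a third-order tensor), but each factor reduces to ordinary matrix and vector operations. For a single layer with an elementwise activation, $\partial \mathbf{a}/\partial \mathbf{z}$ is a diagonal matrix, so the first two factors combine into $\boldsymbol{\delta} = \frac{\partial L}{\partial \mathbf{a}} \odot f'(\mathbf{z})$, where $\odot$ denotes elementwise multiplication. Then $\frac{\partial L}{\partial \mathbf{W}} = \boldsymbol{\delta}\,\mathbf{x}^{\top}$ and $\frac{\partial L}{\partial \mathbf{b}} = \boldsymbol{\delta}$.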
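To see the chain rule at work, the sketch below takes a single sigmoid layer, computes $\partial L/\partial \mathbf{W}$ analytically (an elementwise product for the diagonal $\partial \mathbf{a}/\partial \mathbf{z}$, then an outer product with $\mathbf{x}$), and compares it with central finite differences. The squared-error loss, the target `y`, and the random values are assumptions chosen only for this check.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss(W, b, x, y):
    a = sigmoid(W @ x + b)
    return 0.5 * np.sum((a - y) ** 2)   # squared-error loss (assumed for this sketch)

def grad_W(W, b, x, y):
    z = W @ x + b
    a = sigmoid(z)
    dL_da = a - y                 # dL/da for the squared-error loss
    da_dz = a * (1.0 - a)         # sigmoid'(z), elementwise (diagonal Jacobian)
    delta = dL_da * da_dz         # dL/dz, shape (2,)
    return np.outer(delta, x)     # dL/dW = delta x^T, shape (2, 3) like W

rng = np.random.default_rng(0)
W, b = rng.normal(size=(2, 3)), rng.normal(size=2)
x, y = rng.normal(size=3), np.array([0.0, 1.0])

analytic = grad_W(W, b, x, y)

# Central finite differences as an independent check, one weight at a time.
eps = 1e-6
numeric = np.zeros_like(W)
for i in range(W.shape[0]):
    for j in range(W.shape[1]):
        Wp, Wm = W.copy(), W.copy()
        Wp[i, j] += eps
        Wm[i, j] -= eps
        numeric[i, j] = (loss(Wp, b, x, y) - loss(Wm, b, x, y)) / (2 * eps)

print(np.max(np.abs(analytic - numeric)))   # should be very close to zero
```

A finite-difference check like this is slow (two loss evaluations per weight) but is a common way to validate a hand-written backward pass before trusting it.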

Case Study

Consider a simple deep learning example: classifying handwritten digits (e.g., the MNIST dataset) with a three-layer neural network. The NumPy code below sketches the forward pass of such a network on small random data standing in for real images.

```python
import numpy as np

# Activation function: elementwise sigmoid
def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Forward pass. Samples are rows of X, so each layer computes X @ W + b
# (the row-vector form of z = Wx + b; here W has shape (inputs, outputs)).
def forward(X, W1, b1, W2, b2):
    z1 = np.dot(X, W1) + b1   # e.g. (5, 3) @ (3, 4) -> (5, 4)
    a1 = sigmoid(z1)
    z2 = np.dot(a1, W2) + b2  # e.g. (5, 4) @ (4, 1) -> (5, 1)
    output = sigmoid(z2)
    return output

# Example input
np.random.seed(0)
X = np.random.rand(5, 3)   # 5 samples, 3 features each
W1 = np.random.rand(3, 4)  # Layer 1 weights: 3 → 4
b1 = np.random.rand(4)     # Layer 1 bias
W2 = np.random.rand(4, 1)  # Layer 2 weights: 4 → 1
b2 = np.random.rand(1)     # Layer 2 bias

output = forward(X, W1, b1, W2, b2)
print("Network output:\n", output)  # shape (5, 1), values in (0, 1)
```

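In this example, we first generate random input data X, then run forward propagation with randomly initialized weights and biases to obtain the network’s output. Training means iteratively adjusting W1, b1, W2, and b2 using gradients from backpropagation so that the outputs move closer to the targets. This toy network (3 input features, one sigmoid output) only illustrates the mechanics; an actual MNIST classifier would take 784 inputs (28 × 28 pixels) and produce 10 outputs, one per digit class.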
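Continuing the case study, here is a minimal sketch of what training could look like for this exact network. The binary labels `y`, the mean binary cross-entropy loss, the learning rate, and the number of steps are all assumptions, since the article does not specify them. The backward pass applies the chain rule layer by layer, so every gradient is a matrix product or an elementwise product.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def bce(p, y):
    # Mean binary cross-entropy; clipping avoids log(0).
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

np.random.seed(0)
X = np.random.rand(5, 3)
W1, b1 = np.random.rand(3, 4), np.random.rand(4)
W2, b2 = np.random.rand(4, 1), np.random.rand(1)
y = np.array([[0.0], [1.0], [0.0], [1.0], [1.0]])  # hypothetical binary labels

lr, n = 0.5, X.shape[0]
losses = []
for step in range(2000):
    # Forward pass, keeping intermediates for the backward pass
    a1 = sigmoid(X @ W1 + b1)           # (5, 4)
    out = sigmoid(a1 @ W2 + b2)         # (5, 1)
    losses.append(bce(out, y))

    # Backward pass: chain rule, one layer at a time
    dz2 = (out - y) / n                 # dL/dz2 for sigmoid + mean BCE, (5, 1)
    dW2 = a1.T @ dz2                    # (4, 1), same shape as W2
    db2 = dz2.sum(axis=0)               # (1,)
    dz1 = (dz2 @ W2.T) * a1 * (1 - a1)  # (5, 4)
    dW1 = X.T @ dz1                     # (3, 4), same shape as W1
    db1 = dz1.sum(axis=0)               # (4,)

    # Plain gradient-descent update
    W1 -= lr * dW1
    b1 -= lr * db1
    W2 -= lr * dW2
    b2 -= lr * db2

print(f"loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

Pairing a sigmoid output with cross-entropy is convenient because the gradient with respect to the output pre-activation simplifies to `(out - y) / n`, with no extra sigmoid-derivative factor.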

Application Review Card: The Role of Linear Algebra in Deep Learning

When reviewing “The Role of Linear Algebra in Deep Learning,” place key concepts, procedural steps, and observable outcomes side-by-side on a single page for efficient revision.

Application Checklist Card: The Role of Linear Algebra in Deep Learning

When practicing “The Role of Linear Algebra in Deep Learning,” write input conditions, processing actions, and observable outcomes together so that later review is straightforward.

Summary

Linear Algebra Reading Map Card

Before reading “The Role of Linear Algebra in Deep Learning,” use the accompanying diagram to confirm the central narrative; after reading, check which steps you can carry out directly and where further study is needed.

Linear algebra plays a pivotal role in deep learning, primarily in three ways:

  1. Data Representation: Inputs, weights, and outputs are naturally represented as vectors and matrices.
  2. Computational Efficiency: Expressing each layer as a matrix multiplication lets a whole batch of samples be processed by a single call to highly optimized linear algebra routines (and GPU hardware), which makes large networks practical to train.
  3. Backpropagation: Gradients for every weight and bias are computed with matrix operations, which is what makes gradient-based training efficient.
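To make the efficiency point concrete, the sketch below (random values, assumed for illustration) computes one layer for 1,000 samples twice: once as a Python loop of matrix-vector products and once as a single matrix-matrix product. The results match; the batched form is preferred because it hands all of the arithmetic to one optimized routine. It also shows how the column-vector form $\mathbf{W}\mathbf{x} + \mathbf{b}$ used in the equations relates to the row-per-sample form in the case study code: with samples stored as rows, the same layer is `X @ W.T + b`.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(2, 3))     # column-vector convention: z = W x + b
b = rng.normal(size=2)
X = rng.normal(size=(1000, 3))  # 1000 samples stored as rows

# One matrix-vector product per sample...
Z_loop = np.stack([W @ x + b for x in X])   # (1000, 2)
# ...versus one matrix-matrix product for the whole batch (row convention)
Z_batch = X @ W.T + b                        # (1000, 2), b broadcast over rows

print(np.allclose(Z_loop, Z_batch))  # True
```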

Linear algebra provides not only essential mathematical tools but also insight into how complex deep learning models work internally. In the next article, we will explore the application of linear algebra in state-space models, highlighting its critical role in dynamic systems.