Guozhen AI: Global AI field notes and model intelligence

RNN Transformation Mechanism

Category: Neural Networks

Lesson #19

RNN Transformation Mechanism Architecture Diagram

RNNs process a sequence one time step at a time and carry context forward in a hidden state. To understand them, first map how data flows at each time step. This article focuses on structure: sketch the data flow, key modules, and output layer first, then revisit the formulas and code.

RNN Transformation Mechanism Hands-on Verification Diagram

I’ll verify the ordering of three dimensions: batch, time step, and feature. Incorrect dimension ordering is common in sequence modeling.
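The dimension check described above can be sketched in a few lines of NumPy. This is a minimal illustration with made-up sizes (2 samples, 4 time steps, 3 features); the variable names are assumptions and do not come from the original article. It shows that switching between batch-major and time-major layouts needs a transpose, and that a reshape produces the right shape with the wrong contents.

```python
import numpy as np

batch, time_steps, features = 2, 4, 3  # hypothetical sizes

# Batch-major layout: (batch, time, feature), the default for Keras recurrent layers
x_batch_major = np.arange(batch * time_steps * features).reshape(batch, time_steps, features)

# Time-major layout: (time, batch, feature), obtained by swapping the first two axes
x_time_major = np.transpose(x_batch_major, (1, 0, 2))

# The same feature vector addressed in each layout: sample 1, time step 2
assert np.array_equal(x_batch_major[1, 2], x_time_major[2, 1])

# A reshape is not a transpose: the shape matches but the samples are scrambled
x_wrong = x_batch_major.reshape(time_steps, batch, features)
shapes_match = x_wrong.shape == x_time_major.shape
contents_match = np.array_equal(x_wrong, x_time_major)
print(x_batch_major.shape, x_time_major.shape, shapes_match, contents_match)
```

The reshape case is the one that tends to go unnoticed, because no shape error is raised and the model simply trains on misaligned data.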

In the previous article, we thoroughly explored practical applications of Convolutional Neural Networks (CNNs), covering implementation workflows for tasks such as image classification and object detection. In this section, we shift focus to the transformation mechanism of Recurrent Neural Networks (RNNs) and examine how they process sequential data.

Fundamental Principles of RNNs

A Recurrent Neural Network (RNN) is a neural network architecture specifically designed for sequential data. Unlike traditional feedforward networks, RNNs possess hidden states that retain and leverage information from prior time steps, enabling dynamic state updates. This property makes RNNs particularly effective for time-series data—including text, speech, and video.

At any given time step $t$, the hidden state $h_t$ depends not only on the current input $x_t$, but also on the previous hidden state $h_{t-1}$. The core recurrence relation is expressed as:

$$h_t = f(W_h h_{t-1} + W_x x_t)$$

where $W_h$ and $W_x$ are weight matrices for the hidden state and input respectively, and $f$ denotes an activation function, commonly tanh or ReLU.
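As a rough sketch of this recurrence, the loop below applies the update once per time step in NumPy, using tanh as the activation and omitting bias terms as the formula does. The function name `rnn_forward`, the dimensions, and the random weights are illustrative assumptions, not part of the original article.

```python
import numpy as np

def rnn_forward(x_seq, W_h, W_x, h0):
    """Apply h_t = tanh(W_h h_{t-1} + W_x x_t) across a sequence.

    x_seq: (time, input_dim), W_h: (hidden, hidden), W_x: (hidden, input_dim).
    Returns the hidden state at every time step, shape (time, hidden).
    """
    h = h0
    states = []
    for x_t in x_seq:
        # The same W_h and W_x are reused at every step (weight sharing)
        h = np.tanh(W_h @ h + W_x @ x_t)
        states.append(h)
    return np.stack(states)

rng = np.random.default_rng(0)
input_dim, hidden_dim, time_steps = 3, 4, 5  # hypothetical sizes
W_h = rng.normal(scale=0.5, size=(hidden_dim, hidden_dim))
W_x = rng.normal(scale=0.5, size=(hidden_dim, input_dim))
x_seq = rng.normal(size=(time_steps, input_dim))

states = rnn_forward(x_seq, W_h, W_x, h0=np.zeros(hidden_dim))
print(states.shape)  # (5, 4)
```

Note that the parameter count does not depend on the sequence length: only the number of loop iterations changes.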

The Transformation Mechanism of RNNs

In the RNN transformation mechanism, an input sequence is fed into the network incrementally. At each step, the hidden state update incorporates both the current input and accumulated historical information. This enables RNNs to “remember” and “forget” information across the temporal dimension. However, standard RNNs suffer from vanishing or exploding gradients when learning long sequences.
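The gradient problem can be illustrated with a deliberately simplified sketch. During backpropagation through time, the gradient is multiplied by a factor involving the recurrent weight matrix once per step. The code below assumes a linear recurrence (the activation derivative is ignored) and uses a scaled orthogonal matrix so that the per-step scaling is exact. The function `backprop_norms` and all values are hypothetical.

```python
import numpy as np

def backprop_norms(W_h, steps):
    """Norm of a gradient vector after each multiplication by W_h^T.

    Simplification: the activation derivative is ignored (linear recurrence).
    """
    grad = np.ones(W_h.shape[0]) / np.sqrt(W_h.shape[0])  # unit-norm start
    norms = []
    for _ in range(steps):
        grad = W_h.T @ grad
        norms.append(np.linalg.norm(grad))
    return norms

rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.normal(size=(4, 4)))  # orthogonal matrix, preserves norms

shrinking = backprop_norms(0.5 * Q, steps=20)  # norm scales by 0.5 per step
growing = backprop_norms(1.5 * Q, steps=20)    # norm scales by 1.5 per step
print(f"{shrinking[-1]:.2e} {growing[-1]:.2e}")
```

After only 20 steps the two norms differ by roughly nine orders of magnitude (about 0.5^20 versus 1.5^20), which is the basic intuition behind vanishing and exploding gradients. Real networks with tanh or ReLU behave less cleanly, but the same repeated-multiplication effect applies.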

To address this limitation, Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) architectures were introduced. Both use gating mechanisms to control which information is stored and which is forgotten, which substantially mitigates (though does not entirely eliminate) the difficulty of learning long-range dependencies.

The Gating Mechanism in LSTMs

LSTMs refine memory flow using three distinct gates: the forget gate, input gate, and output gate. Their core state-update equations are as follows:

  • Forget gate: determines which information to discard from the cell state
$$f_t = \sigma(W_f \cdot [h_{t-1}, x_t])$$
  • Input gate: determines which new information to store in the cell state
$$i_t = \sigma(W_i \cdot [h_{t-1}, x_t])$$
  • Output gate: determines which part of the cell state to output
$$o_t = \sigma(W_o \cdot [h_{t-1}, x_t])$$
  • Candidate cell state update: computes a candidate value for the cell state
$$\tilde{C}_t = \tanh(W_c \cdot [h_{t-1}, x_t])$$
  • Final cell state and hidden state update:
$$C_t = f_t \ast C_{t-1} + i_t \ast \tilde{C}_t$$
$$h_t = o_t \ast \tanh(C_t)$$

Through these equations, LSTMs effectively handle long-range dependencies while selectively preserving relevant information at each time step.
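The gate equations above can be written almost line for line in NumPy. The following is a minimal sketch of a single LSTM step, with biases omitted to match the equations. The function name `lstm_step`, the dimensions, and the random weights are assumptions made for illustration only.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W_f, W_i, W_o, W_c):
    """One LSTM step following the gate equations (biases omitted).

    Each weight matrix has shape (hidden, hidden + input_dim).
    """
    z = np.concatenate([h_prev, x_t])    # [h_{t-1}, x_t]
    f_t = sigmoid(W_f @ z)               # forget gate
    i_t = sigmoid(W_i @ z)               # input gate
    o_t = sigmoid(W_o @ z)               # output gate
    c_tilde = np.tanh(W_c @ z)           # candidate cell state
    c_t = f_t * c_prev + i_t * c_tilde   # element-wise cell state update
    h_t = o_t * np.tanh(c_t)             # new hidden state
    return h_t, c_t

rng = np.random.default_rng(0)
input_dim, hidden_dim = 3, 4  # hypothetical sizes
W_f, W_i, W_o, W_c = (
    rng.normal(scale=0.5, size=(hidden_dim, hidden_dim + input_dim)) for _ in range(4)
)

h, c = np.zeros(hidden_dim), np.zeros(hidden_dim)
for x_t in rng.normal(size=(5, input_dim)):
    h, c = lstm_step(x_t, h, c, W_f, W_i, W_o, W_c)
print(h.shape, c.shape)  # (4,) (4,)
```

The cell state update is additive, so when the forget gate stays close to 1 the previous cell state passes through nearly unchanged. That additive path is what helps gradients survive across many time steps.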

Application of the Transformation Mechanism in Practice

The RNN transformation mechanism finds broad application across domains. Below is a concrete example: text generation using LSTM.

RNN Transformation Mechanism Conceptual Check Card

When grasping the RNN transformation mechanism, first consider: input sequence, hidden state, weight sharing, time-step updates, gradient propagation, and the long-term dependency problem.

Text Generation Example

Suppose we have a short text string and wish to train an LSTM model to generate new text.

import numpy as np
import tensorflow as tf

# Assume we have a pre-built character vocabulary and training data
char_to_idx = {'a': 0, 'b': 1, 'c': 2}  # Example vocabulary mapping
idx_to_char = {i: char for char, i in char_to_idx.items()}
text = "abcabcabc"

# Hyperparameters
seq_length = 3
vocab_size = len(char_to_idx)
embedding_dim = 256
hidden_units = 128

# Data preprocessing
inputs = []
targets = []

for i in range(len(text) - seq_length):
    inputs.append([char_to_idx[char] for char in text[i:i + seq_length]])
    targets.append(char_to_idx[text[i + seq_length]])

# Convert to TensorFlow tensors
inputs = tf.convert_to_tensor(inputs)
targets = tf.convert_to_tensor(targets)

# Define LSTM model
model = tf.keras.models.Sequential([
    tf.keras.layers.Embedding(vocab_size, embedding_dim),
    tf.keras.layers.LSTM(hidden_units,
                          return_sequences=False,
                          recurrent_initializer='glorot_uniform'),
    tf.keras.layers.Dense(vocab_size)
])

model.compile(loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True), optimizer='adam')

# Train the model
model.fit(inputs, targets, epochs=100)

# Text generation function
def generate_text(model, start_string, num_generate=10, temperature=1.0):
    # Map the seed characters to vocabulary indices
    context = [char_to_idx[s] for s in start_string]
    text_generated = []

    for _ in range(num_generate):
        # The model is stateless, so feed the most recent seq_length characters
        input_eval = tf.expand_dims(context[-seq_length:], 0)  # shape: (1, time)

        # Logits for the next character, shape: (1, vocab_size).
        # temperature < 1.0 sharpens the distribution; > 1.0 flattens it.
        predictions = model(input_eval) / temperature
        predicted_id = int(tf.random.categorical(predictions, num_samples=1)[0, 0].numpy())

        context.append(predicted_id)
        text_generated.append(idx_to_char[predicted_id])

    return start_string + ''.join(text_generated)

# Generate text
print(generate_text(model, start_string="ab", num_generate=10))

In this simplified example, the trained LSTM extends the prefix "ab" one character at a time. Because the training text simply repeats "abc", a well-fitted model should tend to continue that pattern, although sampling keeps the output stochastic.
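The generation function divides the logits by a temperature before sampling. The sketch below shows, in plain NumPy, how that division reshapes the sampling distribution. The logit values are made up for illustration, and `softmax_with_temperature` is a hypothetical helper rather than part of the article's code.

```python
import numpy as np

def softmax_with_temperature(logits, temperature=1.0):
    """Convert logits to probabilities after scaling by 1 / temperature."""
    scaled = np.asarray(logits, dtype=float) / temperature
    scaled -= scaled.max()  # subtract the max for numerical stability
    exp = np.exp(scaled)
    return exp / exp.sum()

logits = [2.0, 1.0, 0.0]  # hypothetical scores for 'a', 'b', 'c'
for t in (0.5, 1.0, 2.0):
    print(t, np.round(softmax_with_temperature(logits, t), 3))
```

A temperature below 1 concentrates probability on the highest-scoring character, making output more deterministic, while a temperature above 1 spreads probability more evenly and makes output more varied.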

RNN Transformation Mechanism Application Recap Card

If you haven’t fully internalized “RNN Transformation Mechanism”, revisit the four actions outlined on this card to reinforce understanding.

RNN Transformation Mechanism Application Verification Card

When reviewing “RNN Transformation Mechanism”, avoid jumping straight into large-scale projects. Instead, first validate your conceptual grasp using a minimal, working example.

Summary

This article walked through how RNNs transform sequential input step by step, with particular emphasis on the LSTM architecture and its use. The ability to carry information across time steps is what makes RNNs well suited to tasks such as sequence generation and sentiment analysis. In the next article, we will explore concrete real-world applications of RNNs.

Neural Network Reading Roadmap Card

Before reading “RNN Transformation Mechanism”, use the accompanying diagram to confirm your understanding of the central narrative. After reading, reflect: which steps can you implement directly? Which concepts still require supplemental study?
