RNNs unroll sequences step by step over time and carry contextual information forward in a hidden state. To understand them, first map out how data flows at each time step. This article focuses on structure: start by sketching the data flow, key modules, and output layer, then revisit the formulas and code.
Throughout, check the ordering of the three tensor dimensions: batch, time step, and feature. Getting this order wrong is one of the most common mistakes in sequence modeling.
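As a quick sanity check on dimension ordering, the sketch below assumes NumPy is available and uses illustrative names (`batch_major`, `time_major`) that do not come from the article. It builds a small tensor in (batch, time step, feature) order, which is the layout Keras recurrent layers take as input, converts it to (time step, batch, feature) order, and shows why `reshape` is not a safe way to swap the two leading axes.

```python
import numpy as np

batch_size, time_steps, features = 4, 3, 2

# Batch-major layout: (batch, time, feature)
batch_major = np.arange(batch_size * time_steps * features).reshape(
    batch_size, time_steps, features
)

# Time-major layout: (time, batch, feature), obtained by swapping the first two axes
time_major = np.transpose(batch_major, (1, 0, 2))

# reshape keeps the flat memory order, so it mixes up which values
# belong to which sequence instead of swapping the axes
wrong = batch_major.reshape(time_steps, batch_size, features)

print(batch_major.shape, time_major.shape)
print("same element:", batch_major[2, 1, 0] == time_major[1, 2, 0])
print("reshape equals transpose:", np.array_equal(wrong, time_major))
```

The shapes of `wrong` and `time_major` are identical, which is exactly why this kind of bug tends to go unnoticed until the model fails to learn.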
In the previous article, we thoroughly explored practical applications of Convolutional Neural Networks (CNNs), covering implementation workflows for tasks such as image classification and object detection. In this section, we shift focus to the transformation mechanism of Recurrent Neural Networks (RNNs) and examine how they process sequential data.
Fundamental Principles of RNNs
A Recurrent Neural Network (RNN) is a neural network architecture specifically designed for sequential data. Unlike traditional feedforward networks, RNNs possess hidden states that retain and leverage information from prior time steps, enabling dynamic state updates. This property makes RNNs particularly effective for time-series data—including text, speech, and video.
At any given time step $t$, the hidden state $h_t$ depends not only on the current input $x_t$, but also on the previous hidden state $h_{t-1}$. The core recurrence relation is expressed as:
$$h_t = f(W_h h_{t-1} + W_x x_t)$$
where $W_h$ and $W_x$ are weight matrices for the hidden state and input respectively, and $f$ denotes an activation function, commonly tanh or ReLU.
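The recurrence can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: it assumes tanh as the activation, no bias term, a single sequence rather than a batch, and randomly initialized weights. The function name `rnn_forward` and the sizes are hypothetical.

```python
import numpy as np

def rnn_forward(x_seq, h0, W_h, W_x):
    """Unroll h_t = tanh(W_h @ h_{t-1} + W_x @ x_t) over one sequence.

    x_seq: array of shape (time_steps, input_dim)
    h0:    initial hidden state of shape (hidden_dim,)
    Returns all hidden states, shape (time_steps, hidden_dim).
    """
    h = h0
    states = []
    for x_t in x_seq:
        # The same W_h and W_x are reused at every time step (weight sharing)
        h = np.tanh(W_h @ h + W_x @ x_t)
        states.append(h)
    return np.stack(states)

rng = np.random.default_rng(0)
input_dim, hidden_dim, time_steps = 3, 4, 5
W_h = rng.normal(scale=0.5, size=(hidden_dim, hidden_dim))
W_x = rng.normal(scale=0.5, size=(hidden_dim, input_dim))
x_seq = rng.normal(size=(time_steps, input_dim))

states = rnn_forward(x_seq, np.zeros(hidden_dim), W_h, W_x)
print(states.shape)
```

Each row of `states` is the hidden state after one more input has been consumed, so the last row summarizes the whole sequence.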
The Transformation Mechanism of RNNs
In the RNN transformation mechanism, an input sequence is fed into the network incrementally. At each step, the hidden state update incorporates both the current input and accumulated historical information. This enables RNNs to “remember” and “forget” information across the temporal dimension. However, standard RNNs suffer from vanishing or exploding gradients when learning long sequences.
To address this limitation, Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) architectures were introduced. Both use gating mechanisms to control which information is stored and which is forgotten. This substantially mitigates the vanishing-gradient problem and makes long-range dependencies much easier to learn.
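The vanishing and exploding behavior of standard RNNs can be seen in a one-dimensional toy case. The sketch below is an illustration under simplifying assumptions: a scalar hidden state, no input, and a hypothetical helper `backprop_factor` that multiplies the per-step derivatives to obtain the sensitivity of the final state to the initial state.

```python
import math

def backprop_factor(w, steps, h0=0.1, use_tanh=True):
    """Return d h_T / d h_0 for the scalar recurrence h_t = f(w * h_{t-1})."""
    h = h0
    grad = 1.0
    for _ in range(steps):
        pre = w * h
        if use_tanh:
            h = math.tanh(pre)
            local = w * (1.0 - h * h)  # derivative of tanh(w * h) with respect to h
        else:
            h = pre                    # identity activation
            local = w
        grad *= local                  # chain rule: one factor per time step
    return grad

vanishing = backprop_factor(0.5, steps=50)                   # |w| < 1 shrinks the gradient
exploding = backprop_factor(1.5, steps=50, use_tanh=False)   # |w| > 1 blows it up

print(f"w=0.5, tanh:     {vanishing:.3e}")
print(f"w=1.5, identity: {exploding:.3e}")
```

Because the gradient is a product with one factor per time step, it shrinks or grows roughly geometrically with sequence length. Gated architectures are designed to give the gradient a more stable path through time.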
The Gating Mechanism in LSTMs
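LSTMs refine memory flow using three distinct gates: the forget gate, input gate, and output gate. Their core state-update equations are as follows, where $\sigma$ is the logistic sigmoid, $\odot$ is element-wise multiplication, $C_t$ is the cell state, and each gate has its own weight matrices $W_{h\cdot}$, $W_{x\cdot}$ and bias vector $b_\cdot$:
- Forget gate: determines which information to discard from the cell state
  $$f_t = \sigma(W_{hf} h_{t-1} + W_{xf} x_t + b_f)$$
- Input gate: determines which new information to store in the cell state
  $$i_t = \sigma(W_{hi} h_{t-1} + W_{xi} x_t + b_i)$$
- Output gate: determines which part of the cell state to output
  $$o_t = \sigma(W_{ho} h_{t-1} + W_{xo} x_t + b_o)$$
- Candidate cell state update: computes a candidate value for the cell state
  $$\tilde{C}_t = \tanh(W_{hc} h_{t-1} + W_{xc} x_t + b_c)$$
- Final cell state and hidden state update:
  $$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t, \qquad h_t = o_t \odot \tanh(C_t)$$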
Through these equations, LSTMs effectively handle long-range dependencies while selectively preserving relevant information at each time step.
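A single LSTM step can be sketched in NumPy as follows. This is a minimal illustration of the standard LSTM update, not the code of any particular library. As an assumption made for brevity, the per-gate weights are stacked into one matrix `W` applied to the concatenation of the previous hidden state and the current input, which is equivalent to using separate hidden and input matrices per gate. The names `lstm_step`, `W`, and `b` are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM time step for a single example.

    W: shape (4 * hidden, hidden + input), rows stacked in the order
       forget gate, input gate, output gate, candidate.
    b: shape (4 * hidden,)
    """
    hidden = h_prev.shape[0]
    z = W @ np.concatenate([h_prev, x_t]) + b
    f_t = sigmoid(z[:hidden])                  # forget gate
    i_t = sigmoid(z[hidden:2 * hidden])        # input gate
    o_t = sigmoid(z[2 * hidden:3 * hidden])    # output gate
    c_tilde = np.tanh(z[3 * hidden:])          # candidate cell state
    c_t = f_t * c_prev + i_t * c_tilde         # keep part of the old state, add new
    h_t = o_t * np.tanh(c_t)                   # expose part of the cell state
    return h_t, c_t

rng = np.random.default_rng(0)
input_dim, hidden_dim = 3, 4
W = rng.normal(scale=0.1, size=(4 * hidden_dim, hidden_dim + input_dim))
b = np.zeros(4 * hidden_dim)

h = np.zeros(hidden_dim)
c = np.zeros(hidden_dim)
for x_t in rng.normal(size=(5, input_dim)):
    h, c = lstm_step(x_t, h, c, W, b)
print(h.shape, c.shape)
```

Note that the cell state is updated additively (`f_t * c_prev + i_t * c_tilde`) rather than being passed through a squashing function at every step, which is what gives gradients a more direct route across many time steps.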
Application of the Transformation Mechanism in Practice
The RNN transformation mechanism finds broad application across domains. Below is a concrete example: text generation using LSTM.
To understand the RNN transformation mechanism, keep six ideas in view: the input sequence, the hidden state, weight sharing across time steps, the per-step update, gradient propagation through time, and the long-term dependency problem.
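Weight sharing has a concrete consequence that is easy to check: the number of recurrent parameters depends only on the input and hidden sizes, never on the sequence length. The sketch below assumes the usual layout of input weights, recurrent weights, and one bias vector per block. The helper names are hypothetical and the sizes 256 and 128 are illustrative.

```python
def simple_rnn_param_count(input_dim, hidden_units):
    """Input weights + recurrent weights + bias for a plain RNN layer."""
    return hidden_units * (input_dim + hidden_units) + hidden_units

def lstm_param_count(input_dim, hidden_units):
    """An LSTM has four such blocks: three gates plus the candidate state."""
    return 4 * simple_rnn_param_count(input_dim, hidden_units)

# Neither function takes a sequence length: the same weights are reused
# at every time step, whether the sequence has 3 steps or 3000.
print(simple_rnn_param_count(256, 128))
print(lstm_param_count(256, 128))
```

With these assumptions the LSTM count should match what Keras reports in `model.summary()` for an LSTM layer with default settings, which is a convenient way to confirm that a layer is wired to the input size you expect.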
Text Generation Example
Suppose we have a short text string and wish to train an LSTM model to generate new text.
import numpy as np
import tensorflow as tf
# Assume we have a pre-built character vocabulary and training data
char_to_idx = {'a': 0, 'b': 1, 'c': 2} # Example vocabulary mapping
idx_to_char = {i: char for char, i in char_to_idx.items()}
text = "abcabcabc"
# Hyperparameters
seq_length = 3
vocab_size = len(char_to_idx)
embedding_dim = 256
hidden_units = 128
# Data preprocessing
inputs = []
targets = []
for i in range(len(text) - seq_length):
    inputs.append([char_to_idx[char] for char in text[i:i + seq_length]])
    targets.append(char_to_idx[text[i + seq_length]])
# Convert to TensorFlow tensors
inputs = tf.convert_to_tensor(inputs)
targets = tf.convert_to_tensor(targets)
# Define LSTM model
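model = tf.keras.models.Sequential([
    # Map character ids of shape (batch, time) to dense vectors
    tf.keras.layers.Embedding(vocab_size, embedding_dim),
    # return_sequences=False keeps only the final hidden state: (batch, hidden_units)
    tf.keras.layers.LSTM(hidden_units,
                         return_sequences=False,
                         recurrent_initializer='glorot_uniform'),
    # One logit per vocabulary entry
    tf.keras.layers.Dense(vocab_size)
])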
model.compile(loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True), optimizer='adam')
# Train the model
model.fit(inputs, targets, epochs=100)
# Text generation function
def generate_text(model, start_string, num_generate=10):
    # Running context of character ids, seeded with the prompt
    context = [char_to_idx[s] for s in start_string]
    text_generated = []
    # temperature = 1.0 samples from the model's unmodified distribution;
    # values below 1.0 make the output more deterministic
    temperature = 1.0
    for i in range(num_generate):
        # The model is stateless, so feed the last seq_length ids on every step
        input_eval = tf.expand_dims(context[-seq_length:], 0)  # shape (1, time)
        predictions = model(input_eval) / temperature  # shape (1, vocab_size)
        # tf.random.categorical expects 2-D logits of shape (batch, num_classes)
        predicted_id = int(tf.random.categorical(predictions, num_samples=1)[0, 0].numpy())
        context.append(predicted_id)
        text_generated.append(idx_to_char[predicted_id])
    return start_string + ''.join(text_generated)
# Generate text
print(generate_text(model, start_string="ab", num_generate=10))
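In this simplified example, the trained LSTM should learn the repeating "abc" pattern, so sampling from the prefix "ab" will usually continue the cycle (for example "abcabcabcabc"). Because the next character is sampled rather than chosen greedily, individual runs can differ.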
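The effect of the temperature parameter used during sampling can be illustrated without TensorFlow. The sketch below is a plain-Python illustration with made-up logits and a hypothetical helper `softmax_with_temperature`. It shows how dividing the logits by a temperature reshapes the distribution that the next character is drawn from.

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Convert logits to probabilities after dividing by the temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)                      # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.0]                 # illustrative scores for 'a', 'b', 'c'
for t in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

Temperatures below 1.0 sharpen the distribution toward the highest-scoring character, while temperatures above 1.0 flatten it and produce more varied output.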
When reviewing “RNN Transformation Mechanism”, avoid jumping straight into large-scale projects. Instead, first validate your conceptual grasp using a minimal, working example.
Summary
This article walked through RNNs and their transformation mechanism, with particular emphasis on LSTM architecture and usage. Because they carry a hidden state across time steps, RNNs are well suited to tasks such as sequence generation and sentiment analysis. In the next article, we will explore concrete real-world applications of RNNs.
Before reading “RNN Transformation Mechanism”, use the accompanying diagram to confirm your understanding of the central narrative. After reading, reflect: which steps can you implement directly? Which concepts still require supplemental study?