RNN with PyTorch: Recurrent Neural Network from Scratch
ML · 10 min read · Jan 23, 2025

Learn RNN with PyTorch step by step. Guide to implementing a recurrent neural network from scratch: architecture, code and deep learning best practices.
Diego Velez
Technical leadership

RNN vs Feedforward Architecture

RNNs are easiest to understand by comparison with feedforward networks. A feedforward dense layer has one weight matrix, one input tensor and one output tensor; a single RNN layer has 3 weight matrices, 2 input tensors and 2 output tensors. The extra input is the hidden state, which depends on the outputs of previous timesteps. For the first timestep, initialize it with zeros.

Architecture of an RNN Layer

We feed the network one element of the sequence at a time (e.g. one day's price, one character). Each step computes the hidden state, which carries information forward.

Inputs:

  • Input tensor: One timestep of the sequence
  • Hidden state tensor: Initialized to zeros at the start of each sequence

Weight matrices:

  • Input dense: For the current input
  • Hidden dense: For the previous hidden state
  • Output dense: For activation(input_dense + hidden_dense)

Outputs:

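  • New hidden state: activation(input_dense + hidden_dense), reused as the hidden-state input at the next timestep
  • Output: output_dense applied to the new hidden state, giving the prediction vector (the code below returns it as raw scores, with no activation)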
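Written as equations, one timestep of the layer described above might look like the following. The symbols are notation assumed here for illustration, not taken from the original: $W_{ih}$, $W_{hh}$ and $W_{ho}$ stand for the input, hidden and output dense weights, $b_h$ and $b_o$ for their biases, and $g$ for an optional output activation (the identity when the layer returns raw scores). Vectors are treated as columns.

```latex
\begin{aligned}
h_0 &= \mathbf{0} \\
h_t &= \tanh\left(W_{ih}\, x_t + W_{hh}\, h_{t-1} + b_h\right) \\
o_t &= g\left(W_{ho}\, h_t + b_o\right)
\end{aligned}
```

Here $x_t$ is the input at timestep $t$, $h_t$ is the new hidden state passed to step $t+1$, and $o_t$ is the prediction vector.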

RNN Layer Code

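import torch
import torch.nn as nn

class RNN(nn.Module):
    """Single RNN layer: input, hidden and output dense transforms."""

    def __init__(self, input_size: int, hidden_size: int, output_size: int) -> None:
        super().__init__()
        # Input dense: no bias here, since h2h already adds one to the same sum
        self.i2h = nn.Linear(input_size, hidden_size, bias=False)
        # Hidden dense: applied to the previous hidden state
        self.h2h = nn.Linear(hidden_size, hidden_size)
        # Output dense: maps the new hidden state to the prediction vector
        self.h2o = nn.Linear(hidden_size, output_size)
        self.hidden_size = hidden_size

    def forward(self, x, hidden_state) -> tuple[torch.Tensor, torch.Tensor]:
        # x: (batch_size, input_size), hidden_state: (batch_size, hidden_size)
        x = self.i2h(x)
        hidden_state = self.h2h(hidden_state)
        # New hidden state, reused as input at the next timestep
        hidden_state = torch.tanh(x + hidden_state)
        # Raw output scores (no activation applied)
        out = self.h2o(hidden_state)
        return out, hidden_state

    def init_zero_hidden(self, batch_size=1) -> torch.Tensor:
        # Zero hidden state for the first timestep of each sequence
        return torch.zeros(batch_size, self.hidden_size, requires_grad=False)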
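To see how the layer is driven over a sequence, here is a minimal usage sketch. It repeats the class so the snippet runs on its own. The tensor sizes and the time-major layout `(seq_len, batch_size, input_size)` are assumptions chosen for illustration, not something the original specifies.

```python
import torch
import torch.nn as nn


class RNN(nn.Module):
    """Same layer as above, repeated so this snippet is self-contained."""

    def __init__(self, input_size: int, hidden_size: int, output_size: int) -> None:
        super().__init__()
        self.i2h = nn.Linear(input_size, hidden_size, bias=False)
        self.h2h = nn.Linear(hidden_size, hidden_size)
        self.h2o = nn.Linear(hidden_size, output_size)
        self.hidden_size = hidden_size

    def forward(self, x, hidden_state):
        hidden_state = torch.tanh(self.i2h(x) + self.h2h(hidden_state))
        return self.h2o(hidden_state), hidden_state

    def init_zero_hidden(self, batch_size=1) -> torch.Tensor:
        return torch.zeros(batch_size, self.hidden_size, requires_grad=False)


torch.manual_seed(0)

# Hypothetical sizes, picked only for the example
seq_len, batch_size = 5, 3
input_size, hidden_size, output_size = 8, 16, 8

rnn = RNN(input_size, hidden_size, output_size)
sequence = torch.randn(seq_len, batch_size, input_size)

# Start every sequence from a zero hidden state
hidden = rnn.init_zero_hidden(batch_size)

outputs = []
for x_t in sequence:  # iterates over the time dimension
    out, hidden = rnn(x_t, hidden)  # hidden is carried to the next step
    outputs.append(out)

outputs = torch.stack(outputs)
print(outputs.shape)  # torch.Size([5, 3, 8])
print(hidden.shape)   # torch.Size([3, 16])
```

The same `hidden` tensor is threaded through every call, which is what lets later timesteps see information from earlier ones.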

Batch Training

Training with batches is much faster (often 10x) because each step processes multiple sequences at once. At every timestep the training loop feeds one character per sequence, accumulates the loss across the sequence, then backpropagates and clips gradients before the optimizer update. Implementing this by hand helps you understand the operations and data flow, and it's satisfying to watch the RNN learn and generate text.
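The loop described above could be sketched roughly as follows. This is not the author's training code. The `train_step` helper, the random toy data, the Adam optimizer, the cross-entropy loss and the clipping threshold of 1.0 are all assumptions made for illustration.

```python
import torch
import torch.nn as nn


class RNN(nn.Module):
    """Same layer as above, repeated so this snippet is self-contained."""

    def __init__(self, input_size: int, hidden_size: int, output_size: int) -> None:
        super().__init__()
        self.i2h = nn.Linear(input_size, hidden_size, bias=False)
        self.h2h = nn.Linear(hidden_size, hidden_size)
        self.h2o = nn.Linear(hidden_size, output_size)
        self.hidden_size = hidden_size

    def forward(self, x, hidden_state):
        hidden_state = torch.tanh(self.i2h(x) + self.h2h(hidden_state))
        return self.h2o(hidden_state), hidden_state

    def init_zero_hidden(self, batch_size=1) -> torch.Tensor:
        return torch.zeros(batch_size, self.hidden_size, requires_grad=False)


def train_step(model, optimizer, criterion, inputs, targets, clip=1.0) -> float:
    """Hypothetical helper: one update on a batch of sequences.

    inputs:  (seq_len, batch_size, vocab_size) one-hot floats
    targets: (seq_len, batch_size) indices of the next character
    """
    seq_len, batch_size, _ = inputs.shape
    hidden = model.init_zero_hidden(batch_size)
    optimizer.zero_grad()

    loss = torch.zeros(())
    for t in range(seq_len):
        # One character per sequence at each timestep
        out, hidden = model(inputs[t], hidden)
        # Accumulate the loss across the sequence
        loss = loss + criterion(out, targets[t])

    loss.backward()
    # Clip gradients to limit exploding gradients through time
    nn.utils.clip_grad_norm_(model.parameters(), clip)
    optimizer.step()
    return loss.item() / seq_len


torch.manual_seed(0)

# Toy data: random character indices, each target is the next character
vocab_size, seq_len, batch_size = 10, 6, 4
chars = torch.randint(0, vocab_size, (seq_len + 1, batch_size))
inputs = nn.functional.one_hot(chars[:-1], vocab_size).float()
targets = chars[1:]

model = RNN(vocab_size, 32, vocab_size)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
criterion = nn.CrossEntropyLoss()

losses = [train_step(model, optimizer, criterion, inputs, targets) for _ in range(50)]
print(f"first loss {losses[0]:.3f}, last loss {losses[-1]:.3f}")
```

Because the toy batch is fixed, the model simply memorizes it and the average loss per timestep falls. Real training would draw a new batch of sequences at each step.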
