
3 posts tagged with "pytorch"


LLM basics from scratch

· 14 min read
VisualDust
Ordinary Magician | Half stack developer

Abstract

The main purpose of this article is to build a simple large language model from basic self-attention blocks, for learning purposes. Because of limits on model scale and on the embedding method, the resulting model will not be very effective, but the code is still enough to learn the basic concepts behind Transformer-style language models.

What happens on this page:

  • get the full code of a basic Large(?) Language Model (data preparation, model architecture, training, and prediction)
  • understand the general architecture of the Transformer with a few illustrations
  • understand how autoregressive training works
  • understand how to load a very large text dataset (OpenWebTextCorpus) into limited memory
  • train the model and observe the training procedure
  • load the trained model into a simple ask-and-answer interactive script
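Two of the ideas in this list, reading a small window out of a large file and forming autoregressive training pairs, can be sketched in a few lines. This is only an illustration under stated assumptions, not the code from the repository: the helper names `random_window` and `next_token_pairs` are hypothetical, tokens are assumed to be raw bytes, and a small temporary file stands in for the real corpus.

```python
import mmap
import os
import random
import tempfile


def random_window(path, window_bytes, rng):
    """Read one random window of raw bytes without loading the whole file."""
    size = os.path.getsize(path)
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        start = rng.randint(0, max(0, size - window_bytes))
        return mm[start:start + window_bytes]


def next_token_pairs(tokens, block_size):
    """Build (input, target) pairs where the target is the input shifted by one."""
    pairs = []
    for i in range(len(tokens) - block_size):
        x = tokens[i:i + block_size]          # what the model sees
        y = tokens[i + 1:i + block_size + 1]  # what it should predict next
        pairs.append((x, y))
    return pairs


# A tiny stand-in corpus (assumption: the real dataset is far larger).
rng = random.Random(0)
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello world, this is a tiny corpus for the demo. " * 100)
    corpus_path = tmp.name

window = random_window(corpus_path, window_bytes=33, rng=rng)
tokens = list(window)  # assumption: byte-level tokens, one integer id per byte
pairs = next_token_pairs(tokens, block_size=8)
x, y = pairs[0]
print(x)
print(y)
os.remove(corpus_path)
```

Each target sequence is the input sequence moved one position to the right, so at every position the model is trained to predict the next token from the tokens before it.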

Get full code

Code available at github.com/visualDust/naive-llm-from-scratch

warning

Download the code via git clone before continuing.

Variational AutoEncoders from scratch

· 18 min read
VisualDust
Ordinary Magician | Half stack developer


AutoEncoders

AutoEncoders are a type of artificial neural network used primarily for unsupervised learning, particularly for dimensionality reduction and feature learning. They learn a compact representation (encoding) of a set of data, which can then be used for tasks such as dimensionality reduction or noise reduction.

The main components of an autoencoder are:

  1. Encoder: This part of the network compresses the input into a latent space representation of reduced dimension. The encoder is typically composed of a series of layers that gradually decrease in size.

  2. Bottleneck: This is the layer that contains the encoded representation of the input data. It is the heart of the network, which holds the compressed knowledge of the input data. The bottleneck is where the dimensionality reduction takes place.

  3. Decoder: The decoder network performs the reverse operation of the encoder. It takes the encoded data from the bottleneck and reconstructs the input data as closely as possible. This part of the network is typically symmetrical to the encoder, with layers increasing in size.

The objective of an autoencoder is to minimize the difference between the original input and its reconstruction, typically measured by a loss function like mean squared error. By learning to reconstruct the input data, the network learns valuable properties about the data and its structure. AutoEncoders are used in various applications like anomaly detection, image denoising, and as a pre-training step for deep learning models.
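As a minimal sketch of the three components and the reconstruction objective, assuming flattened 28x28 inputs and arbitrarily chosen layer sizes: the class name `LinearAutoEncoder`, the dimensions, and the random stand-in batch are all illustrative and are not taken from the gist.

```python
import torch
from torch import nn


class LinearAutoEncoder(nn.Module):
    """Encoder -> bottleneck -> decoder, built from fully connected layers."""

    def __init__(self, input_dim=784, hidden_dim=128, latent_dim=16):
        super().__init__()
        # Encoder: layers shrink toward the bottleneck.
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, latent_dim),
        )
        # Decoder: mirrors the encoder, layers grow back to the input size.
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, input_dim), nn.Sigmoid(),
        )

    def forward(self, x):
        z = self.encoder(x)          # bottleneck representation
        return self.decoder(z), z    # reconstruction and latent code


torch.manual_seed(0)
model = LinearAutoEncoder()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
batch = torch.rand(32, 784)  # random stand-in for a batch of flattened images

losses = []
for _ in range(20):
    recon, z = model(batch)
    loss = nn.functional.mse_loss(recon, batch)  # reconstruction error
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    losses.append(loss.item())

print(recon.shape, z.shape)
```

The input itself is the training target, so no labels are needed; the only pressure on the network is to squeeze the input through the 16-dimensional bottleneck and recover it.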

[Figure: basic autoencoder]

Get the full code before continuing

Code available as a gist: single-file-linear-ave.py

tip

This is a single-file approach that contains the model architecture, training code, testing code, and inference code. We will talk about each part of the code separately later. If you're in a hurry, run this single-file Python script and open localhost:20202 to see the result.

warning

Please create a new directory for this file. When you run it, it automatically downloads the MNIST dataset to the relative path ./data, and you probably do not want that data to end up in an unintended location.

Cityscapes class level boundary labeling with improved one-hot

· 10 min read
VisualDust
Ordinary Magician | Half stack developer

Why

I'm working on some semantic segmentation code where I need to improve segmentation accuracy at boundaries. To do that, I tried using a boundary loss to assist model training. This article documents my attempt and the code.


Our purpose is clear. In Cityscapes, each image comes with an indexed label image named gtFine_labelIds, in which every pixel value is a class id. What we want is to generate class-level boundaries from gtFine_labelIds so that we can use them to optimize the boundary regions of specific classes.
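One generic way to get per-class boundaries from a label-id map is to one-hot encode it and mark every pixel of a class that has a 4-neighbour outside that class. The sketch below shows that idea with NumPy on a toy 4x4 label map; it is an assumption about how such boundaries can be computed, not necessarily the improved one-hot method this article goes on to describe, and the function names and the `num_classes` value are illustrative.

```python
import numpy as np


def one_hot(label_ids, num_classes):
    """Turn an (H, W) integer label map into a (num_classes, H, W) boolean stack."""
    classes = np.arange(num_classes).reshape(-1, 1, 1)
    return label_ids[None, :, :] == classes


def class_boundaries(label_ids, num_classes):
    """Mark pixels of each class that touch a 4-neighbour outside that class."""
    masks = one_hot(label_ids, num_classes)
    # Replicate edge values so the image border alone never counts as a boundary.
    padded = np.pad(masks, ((0, 0), (1, 1), (1, 1)), mode="edge")
    up = padded[:, :-2, 1:-1]
    down = padded[:, 2:, 1:-1]
    left = padded[:, 1:-1, :-2]
    right = padded[:, 1:-1, 2:]
    interior = up & down & left & right  # all four neighbours share the class
    return masks & ~interior


# Toy stand-in for a gtFine_labelIds image with three classes.
labels = np.array([
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [2, 2, 2, 2],
])
boundaries = class_boundaries(labels, num_classes=3)
print(boundaries.astype(int))
```

The result has one boolean channel per class, so a boundary loss can be applied to selected classes only by indexing the corresponding channels.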