How to Create Your Local LLM Model (Revised and Updated) — Extra Curated LLMOps Resources

Thomas Cherickal
8 min readApr 1, 2023

Updated to September 2023

Advanced Scientific Particle Physics Research Model. All other images created by Bing Image Creator.

Introduction

Creating a local large language model (LLM) is a significant undertaking, typically requiring substantial computational resources and expertise in machine learning. It was not feasible to run local LLMs on your own local system because of the computational costs involved. However, with the advent of new software, GPT4All and LM-Studio can be used to create complete software packages that work locally. But let’s start with a HuggingFace Transformers source code example that shows you how to use the HuggingFace Libraries and PyTorch for LLMs (cloud-based, not local in this case):

HuggingFace Transformers

A complete program that uses the GPT-2 model, GPT-2 tokenizer, and is fine-tuned on the AG NEWS dataset (a small dataset used for utility purposes) is given below and explained in code snippets. We can leverage the power of pre-trained models and fine-tune them on specific tasks.

  • Importing necessary libraries and modules: The script starts by importing the necessary libraries and modules. AG_NEWS is a news classification dataset from the torchtext.datasets package. AutoModelWithLMHead and AdamW are imported from the transformers library. AutoModelWithLMHead is a class that provides automatic access to pre-trained models with a language modeling head, and AdamW is a class that implements the AdamW optimizer, a variant of the Adam optimizer with weight decay.
from torchtext.datasets import AG_NEWS
from transformers import AutoModelWithLMHead, AdamW
from transformers import AutoTokenizer
  • Setting up the tokenizer: The script uses the AutoTokenizer class from the transformers library to load the tokenizer associated with the “gpt2” model. The tokenizer is responsible for converting input text into a format that the model can understand. This includes splitting the text into tokens (words, subwords, or characters), mapping the tokens to their corresponding IDs in the model’s vocabulary, and creating the necessary inputs for the model (like attention masks).
tokenizer = AutoTokenizer.from_pretrained("gpt2")
  • Setting the number of epochs: The script sets the number of epochs for training to 50. An epoch is one complete pass through the entire training dataset. The number of epochs is a hyperparameter that you can tune. Training for more epochs can lead to better results, but it also increases the risk of overfitting and requires more computational resources.
EPOCHS = 50
  • Preprocessing the data: The preprocess_data function is defined to preprocess the data. It takes an iterator over the data and encodes the text in each item using the tokenizer. The AG_NEWS dataset is then loaded and preprocessed. The dataset is split into ‘train’ and the text from each item is encoded. Encoding the text involves splitting it into tokens, mapping the tokens to their IDs in the model’s vocabulary, and creating the necessary inputs for the model.
def preprocess_data(data_iter):
data = [tokenizer.encode(text) for _, text in data_iter]
return data


train_iter = AG_NEWS(split='train')
train_data = preprocess_data(train_iter)
  • Setting up the model and optimizer: The script loads the pre-trained “gpt2” model using the AutoModelWithLMHead class and sets up the AdamW optimizer with the model’s parameters. The model is a transformer-based model with a language modeling head, which means it’s designed to generate text. The AdamW optimizer is a variant of the Adam optimizer with weight decay, which can help prevent overfitting.
model = AutoModelWithLMHead.from_pretrained("gpt2")
optimizer = AdamW(model.parameters())

model.train()
for epoch in range(EPOCHS):
for batch in train_data:
outputs = model(batch)
loss = outputs.loss
loss.backward()
optimizer.step()
optimizer.zero_grad()
  • Training the model: The script trains the model for the specified number of epochs. In each epoch, it iterates over the batches of training data, feeds each batch to the model, computes the loss, performs backpropagation with loss.backward(), and updates the model’s parameters with optimizer.step(). It also resets the gradients with optimizer.zero_grad(). This is a standard training loop for PyTorch models.
  • Generating text: After training, the script uses the model to generate text. It starts by encoding a prompt using the tokenizer, then feeds this encoded prompt to the model’s generate method. The output of the generate method is a sequence of token IDs, which is then decoded back into text using the tokenizer.
prompt = tokenizer.encode("Write a summary of the new features in the latest release of the Julia Programming Language", return_tensors="pt")
generated = model.generate(prompt)
generated_text = tokenizer.decode(generated[0])
  • Saving the generated text: Finally, the script saves the generated text to a file named “generated.txt”. This is done using Python’s built-in file handling functions.
with open("generated.txt", "w") as f:
f.write(generated_text)

This script is a good example of how to fine-tune a pre-trained language model on a specific task. However, it’s worth noting that fine-tuning a large model like GPT-2 can be computationally intensive and may require a powerful machine or cloud-based resources. Also, this script doesn’t include some important steps like splitting the data into training and validation sets, shuffling the data, and batching the data. These steps are crucial for training a robust model. For convenience, the entire program is given below (please do report errors or corrections in the comments below):

from torchtext.datasets import AG_NEWS
from transformers import AutoModelWithLMHead, AdamW
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")

EPOCHS = 50


def preprocess_data(data_iter):
data = [tokenizer.encode(text) for _, text in data_iter]
return data


train_iter = AG_NEWS(split='train')
train_data = preprocess_data(train_iter)


model = AutoModelWithLMHead.from_pretrained("gpt2")
optimizer = AdamW(model.parameters())

model.train()
for epoch in range(EPOCHS):
for batch in train_data:
outputs = model(batch)
loss = outputs.loss
loss.backward()
optimizer.step()
optimizer.zero_grad()


prompt = tokenizer.encode("Write a summary of the new features in the latest release of the Julia Programming Language", return_tensors="pt")
generated = model.generate(prompt)

generated_text = tokenizer.decode(generated[0])
with open("generated.txt", "w") as f:
f.write(generated_text)

There are two packaged solutions for Local LLMs (and many more popping up, everyday). Two of the best of them are given below. I especially have a preference for LM-Studio.

GPT4All

You don’t need any of this code anymore because the GPT4All open-source application has been released that runs an LLM on your local computer without the Internet and without a GPU. I’m linking tothe site below:

This is the best solution for those of you who want a completely open-source on-premises system. Have fun! But make sure you have at least 32 GB of local RAM, 16 GB GPU RAM, a 3+ Ghz multicore(the more, the better) processor, and a local SSD. LLMs are almost as computationally expensive as Bitcoin mining!

The relevant GitHub repository is:

Now, of course there’s a lot more to LLM models than just chat. But considering the expensive;y daunting computational requirements for fine-tuning musical and pictures and audio for LLMs, I am just going to mention some popular, already built and ready-to-go solutions as well as some interesting source material:

Audio LLMs

Image LLMs

Multimodal LLMs

General LLM Resources

#And An Absolute Must-See List Of Resources on LLMs and LLMOps:

https://medium.com/@abonia/best-llm-and-llmops-resources-for-2023-75e96ac37feb

This article is an absolute gem. Please do visit, there are some incredible both standard and extraordinary resources here.

Of course, we cannot leave out:

https://learn.deeplearning.ai/ — The courses that are curated and run by Andrew Ng himself. Need we say more?

LM-Studio

LM-Studio is a powerful tool for training and deploying language models. It provides a user-friendly interface and a wide range of features to help you fine-tune your models, visualize their performance, and deploy them in production.

LM-Studio supports various transformer-based models like GPT-2, GPT-3, BERT, Falcon, Llama2, Llama-Python, and many, many others. It also provides various options for data preprocessing, model training, and hyperparameter tuning. This makes it a versatile tool for both beginners and experienced machine learning practitioners.

One of the key features of LM-Studio is its support for fine-tuning. Fine-tuning is a process where you take a pre-trained model and train it further on a specific task. This can significantly improve the model’s performance on that task. LM-Studio makes this process easy by providing a simple interface for loading pre-trained models and training them on your data.

Another important feature of LM-Studio is its visualization tools. These tools allow you to monitor your model’s performance during training and evaluate its performance on a test set. This can help you identify issues early and make necessary adjustments to your training process.

LM-Studio also provides a robust deployment pipeline. Once your model is trained and tested, you can easily deploy it to a production environment. This makes LM-Studio a great tool for end-to-end machine learning projects.

Usability: LM-Studio is designed to be user-friendly. It provides a clean and intuitive interface that makes it easy to navigate through different features. It also provides detailed documentation and tutorials to help you get started.

Scalability: LM-Studio can handle large datasets and complex models. It leverages the power of modern hardware and software to train models efficiently. This makes it a suitable tool for both small and large-scale projects.

Community Support: LM-Studio has a vibrant community of users and contributors. They provide valuable feedback and contribute to the development of the tool. This ensures that LM-Studio is always up-to-date with the latest trends and technologies in the field of machine learning.

LM-Studio is a comprehensive tool for training and deploying language models. It provides a wide range of features, a user-friendly interface, and robust performance.

Conclusion

The field of LLMs and advanced AI is what I’ve decided will easily be the most versatile technology of the future. Junior Programmers, Artists, ML Engineers, Data Processing Analysts, Beginner Data Scientists, and practically every other digital job should be learning this technology since advanced versions in the future will have no errors and be production-ready — from a single line of text. Learn Generative Learning. It really is the future of the Digital World. And artists are suffering already! I can see a similar situation for even junior-level software engineers very soon. People, skill up! Get into Generative AI! 10x your productivity! The future belongs to those who will use these tools the best. Get Going! ASAP.

--

--