How to Run Your Own Local LLM: Updated for 2024 — Version 1

Originally published at https://hackernoon.com on March 21st, 2024

Thomas Cherickal
3 min read · Oct 11, 2024

This is the breakout year for Generative AI!

Well, to say the least, this year I’ve been spoiled for choice when it comes to running an LLM locally.

Let’s start!

1) HuggingFace Transformers:

All Images Created by Bing Image Creator

Running Hugging Face Transformers Offline in Python on Windows

To run Hugging Face Transformers offline, without internet access, follow these steps:

Requirements:

  • Python 3.6+
  • PyTorch (a version compatible with Transformers)
  • Transformers library
  • Tokenizers library
  • Sentence Transformers library (optional, for sentence-level tasks)
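
Before going further, here is a quick sanity check that the core libraries are installed; a minimal sketch (the pip command in the comment is the usual install route):

# Install first if needed:  pip install torch transformers tokenizers
import torch
import transformers

# Print the installed versions so you can confirm they are compatible
print("PyTorch:", torch.__version__)
print("Transformers:", transformers.__version__)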

Steps:

  1. Download the model:
     • Choose a model from the Hugging Face Hub.
     • Download the model weights and tokenizer files (see the download sketch after this list).
     • Place the downloaded files in a local directory.
  2. Set environment variables:
     • Create a .env file in your project directory.
     • In the .env file, define the following variables:
       • transformers_home: path to the directory where you stored the downloaded model and tokenizer files.
       • MODEL_NAME: name (or local path) of the model you want to use.
       • MODEL_CONFIG: path to the model configuration file (optional).
       • TOKENIZER_NAME: name of the tokenizer you want to use.
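
For step 1, one convenient way to grab all the files while you still have internet access is the huggingface_hub library. This is a minimal sketch rather than the only way to do it; the repo id and target directory are placeholders to replace with your own:

# One-time download while online. Requires: pip install huggingface_hub
from huggingface_hub import snapshot_download

# Copies every file in the model repo (weights, tokenizer, config)
# into a local directory that you can later point MODEL_NAME at.
snapshot_download(
    repo_id="distilbert-base-uncased-finetuned-sst-2-english",
    local_dir=r"C:\transformers\distilbert-sst2",
)

Cloning the model repository with git (with git-lfs installed) achieves the same result.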

Import libraries:

import os
from dotenv import load_dotenv  # requires: pip install python-dotenv
from transformers import AutoModel, AutoTokenizer

# os.getenv() does not read .env files by itself; load_dotenv()
# copies the variables from .env into the process environment.
load_dotenv()

# MODEL_NAME should be the name (or local path) of your model
model_name = os.getenv("MODEL_NAME")
model_config_path = os.getenv("MODEL_CONFIG")  # optional; may be None

# Load the model and tokenizer from local files only,
# so that no network connection is attempted.
model = AutoModel.from_pretrained(model_name, config=model_config_path, local_files_only=True)
tokenizer = AutoTokenizer.from_pretrained(model_name, local_files_only=True)

Use the Model:

# Example usage:
input_text = "Hello, world!"
# return_tensors="pt" yields PyTorch tensors; the model expects keyword
# arguments (input_ids, attention_mask), hence the ** unpacking.
inputs = tokenizer(input_text, return_tensors="pt")
outputs = model(**inputs)
# Print the raw outputs (for AutoModel, these are hidden states)
print(outputs)

Additional Notes:

You may need to change the transformers_home variable if you want to store the downloaded models in a different location.

To run offline, you must download the model and tokenizer files in advance, either with a downloader such as huggingface_hub (see the sketch above) or by cloning the model repository manually.

You can find more information on running Transformers offline in the Hugging Face documentation:

https://huggingface.co/docs/transformers/installation#offline-mode
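
The transformers library also has a documented hard offline switch; a minimal sketch (the local path is a hypothetical placeholder):

import os

# TRANSFORMERS_OFFLINE=1 tells the library never to attempt network
# calls; set it before importing transformers so it takes effect.
os.environ["TRANSFORMERS_OFFLINE"] = "1"

from transformers import AutoModel

# With the switch on, only local paths or already-cached files load.
model = AutoModel.from_pretrained(r"C:\transformers\my-local-model")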

Example:

# Assuming you have downloaded the model and tokenizer files for
# distilbert-base-uncased-finetuned-sst-2-english (the actual Hub id
# of this SST-2 sentiment model) into C:\transformers
import os
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Raw strings keep Windows backslashes from being read as escapes
os.environ["transformers_home"] = r"C:\transformers"
# Point MODEL_NAME at the local folder holding the downloaded files
os.environ["MODEL_NAME"] = r"C:\transformers\distilbert-base-uncased-finetuned-sst-2-english"

model_name = os.getenv("MODEL_NAME")

# A sentiment classifier needs the classification head, so use
# AutoModelForSequenceClassification rather than plain AutoModel
model = AutoModelForSequenceClassification.from_pretrained(model_name, local_files_only=True)
tokenizer = AutoTokenizer.from_pretrained(model_name, local_files_only=True)

input_text = "The quick brown fox jumps over the lazy dog."
inputs = tokenizer(input_text, return_tensors="pt")
outputs = model(**inputs)
print(outputs.logits)

This code prints the model’s classification logits, its raw predictions for the input text.
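
To turn those raw logits into a readable prediction, here is a small post-processing sketch (it assumes the classification example above, where outputs.logits and model.config.id2label are in scope):

import torch

# Softmax converts logits into probabilities; argmax picks the winner
probs = torch.softmax(outputs.logits, dim=-1)
label_id = int(probs.argmax(dim=-1))

# id2label maps class indices to names such as NEGATIVE / POSITIVE
print(model.config.id2label[label_id], float(probs[0, label_id]))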

Remember, you must either download the model with internet access and save it locally or clone the model repository.

You can visit the website https://huggingface.co/models for more details.

At the time of writing, there are a stunning ~558,000 models available on the Hub.

Hugging Face has become the de facto democratizer of LLMs, making nearly every open source model accessible and runnable without the usual mountain of expenses and bills. Basically: available, open source, and free. This is the mother lode!

2) GPT4All


This is a paid story. You can read the rest at the following link:

How to Run Your Own Local LLM (Updated for 2024) | by Thomas Cherickal | Technology Hits | Medium
