How to Run Private LLMs on Your Own Hardware

Learn how to run powerful uncensored language models completely offline on affordable hardware for enhanced privacy and unrestricted access to information.

Introduction

Welcome to the Global Science Network! I’m going to show you how to download and run a large language model that was trained on what would be equivalent to:

Reading 127 million novels
Reading through all of Wikipedia 2,500 times

The best part? This model can be downloaded and run on an external flash drive that costs around $12. The model only requires about 10GB of storage space.

Why Run Local LLMs

Running uncensored, offline LLMs offers two major advantages:

1. Unrestricted Access to Information

You gain access to information that might otherwise be hard to reach. Some countries and tech companies limit and censor internet content, but an offline, uncensored model can still answer questions about material that would otherwise be restricted.

Important note: The quality of output depends on what data the model was trained on. There could still be inherent biases depending on which organization (Meta, OpenAI, Google, Anthropic, xAI, or DeepSeek) trained the model and which data it chose to train on.

While LLMs don’t provide absolute truth and can produce incorrect results, they are excellent tools for:

Finding information quickly
Summarizing information
Converting thoughts into usable code

2. Enhanced Privacy

Running models offline means your prompts and the model’s responses never leave your machine, so tech companies and governments cannot monitor what you’re asking about. This provides genuine privacy while still allowing access to powerful AI capabilities.

Offline models are particularly valuable when working with:

Proprietary information
Classified data
Personal information

These models can also be fine-tuned on your own data, making them better suited to your requirements over time.

Understanding the Model Architecture

The Dolphin Llama 3 model comes in two versions:

8 billion parameter model (~5GB storage)
70 billion parameter model (~40GB storage)

Both are fine-tuned versions of Meta’s Llama 3 base models, which Meta pretrained on 15 trillion tokens (about 60 terabytes of raw text data).
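
The storage and training-data figures above can be sanity-checked with some quick arithmetic. This is only a rough sketch: the 4 bytes per token and the 4.5 bits per weight (typical of a 4-bit quantized model file) are my assumptions, not published specifications.

```python
# Back-of-the-envelope sizing for the figures quoted above.
# ASSUMPTIONS (not from the article): about 4 bytes of raw text per token
# and about 4.5 bits per weight for a 4-bit quantized model file.

BYTES_PER_TOKEN = 4      # assumed average for English text
BITS_PER_WEIGHT = 4.5    # assumed: 4-bit weights plus per-block scale factors


def corpus_terabytes(tokens: float) -> float:
    """Approximate raw-text size in terabytes (1 TB = 1e12 bytes)."""
    return tokens * BYTES_PER_TOKEN / 1e12


def model_gigabytes(parameters: float) -> float:
    """Approximate quantized model file size in gigabytes (1 GB = 1e9 bytes)."""
    return parameters * BITS_PER_WEIGHT / 8 / 1e9


print(f"15T tokens ~ {corpus_terabytes(15e12):.0f} TB of text")
print(f"8B params  ~ {model_gigabytes(8e9):.1f} GB on disk")
print(f"70B params ~ {model_gigabytes(70e9):.1f} GB on disk")
```

Under those assumptions the estimates land close to the ~5GB and ~40GB figures listed for the two versions.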

The 8 billion parameter model consists of:

32 transformer layers with self-attention and feed-forward networks
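Self-attention components using 4096×4096 weight matrices
Each layer containing about 67.1 million attention parameters, counting four full 4096×4096 projections (about 2.15 billion across all 32 layers); because Llama 3 uses grouped-query attention, the key and value projections are actually smaller, so the true count is somewhat lower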
Feed-forward networks with large weight matrices to expand and refine token representations
Normalization layers (RMSNorm in Llama 3, which has no bias terms) to stabilize training
Token embeddings and rotary positional encodings to help the model understand meaning and word order
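
Here is a small sketch of where the attention parameter count comes from. It assumes a hidden size of 4096 and four full-size projections per layer (query, key, value, output), which is what the per-layer figure implies. The second function is my own addition: it assumes grouped-query attention with key and value projections of width 1024, which I believe matches the published Llama 3 8B configuration, so treat it as an estimate.

```python
# Attention-parameter arithmetic for the 8B model.
# ASSUMPTIONS: hidden size 4096, 32 layers; KV_WIDTH is the assumed
# key/value projection width under grouped-query attention.

HIDDEN_SIZE = 4096
NUM_LAYERS = 32
KV_WIDTH = 1024


def full_attention_params(hidden: int = HIDDEN_SIZE) -> int:
    """Four full hidden x hidden projections: query, key, value, output."""
    return 4 * hidden * hidden


def grouped_query_attention_params(hidden: int = HIDDEN_SIZE,
                                   kv_width: int = KV_WIDTH) -> int:
    """Full-size query and output projections, narrower key and value."""
    return 2 * hidden * hidden + 2 * hidden * kv_width


for label, per_layer in [("full", full_attention_params()),
                         ("grouped-query", grouped_query_attention_params())]:
    print(f"{label}: {per_layer:,} per layer, {per_layer * NUM_LAYERS:,} total")
```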

Step-by-Step Installation Guide

Requirements

Computer with sufficient RAM (8GB+ recommended)
128GB USB 3.0 flash drive (for portable usage)
Internet connection (for initial download only)

Step 1: Download Ollama

Go to Ollama.com
Navigate to the Models tab
Search for “dolphin”
Select the Dolphin Llama 3 Model
Click Download and run the executable

Step 2: Pull the Model

Open two terminals (PowerShell terminals if on Windows)
In the first terminal, type: ollama serve
In the second terminal, copy the run command from Ollama.com and paste it
Wait for the model to download (may take a few minutes)
When finished, press Ctrl+D in the second terminal to exit the chat, then Ctrl+C in the first terminal to stop the server

Step 3: Test the Model

Open two new terminals (run them as a regular user; administrator rights are not needed)
In terminal one, enter: ollama serve
In terminal two, enter: ollama run dolphin-llama3
Test with a query that would typically be censored to ensure it’s working as expected
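
If you would rather test from a script than from the chat prompt, the Ollama server also answers HTTP requests. The sketch below assumes the server is listening on its default local port (11434) and that the model is registered as dolphin-llama3; use the exact name from the run command you copied from Ollama.com.

```python
import json
import urllib.request

# ASSUMPTIONS: default local Ollama port and the model name below.
OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL_NAME = "dolphin-llama3"


def build_request(prompt: str, model: str = MODEL_NAME) -> urllib.request.Request:
    """Build a non-streaming generate request for the local server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt: str, model: str = MODEL_NAME) -> str:
    """Send the prompt to the server and return the generated text."""
    with urllib.request.urlopen(build_request(prompt, model), timeout=300) as reply:
        return json.loads(reply.read())["response"]
```

With the server from terminal one still running, calling print(ask("Why is the sky blue?")) should print the model’s reply. Nothing in this request leaves your machine.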

Step 4: Transfer to External Drive

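Format a 128GB USB flash drive using the NTFS file system (unlike FAT32, it allows files larger than 4GB)
Locate the Ollama model files on your system (on Windows, the default location is the .ollama folder under your user profile, for example C:\Users\<username>\.ollama)
Verify that one of the model files is around 4.5GB
Copy that folder to your external drive
Find and copy the base Ollama server program files as well (on Windows, these are installed under %LOCALAPPDATA%\Programs\Ollama by default)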
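
To see why the file system matters, here is a minimal sketch that finds the largest file in a folder and checks it against the FAT32 single-file limit (just under 4 GiB). The folder you pass in is up to you; nothing here is specific to Ollama.

```python
from pathlib import Path

# FAT32 cannot store a single file of 4 GiB or more.
FAT32_MAX_FILE_BYTES = 2**32 - 1


def largest_file(folder: Path) -> tuple[Path, int]:
    """Return the largest file under `folder` and its size in bytes."""
    files = [p for p in folder.rglob("*") if p.is_file()]
    if not files:
        raise FileNotFoundError(f"no files found under {folder}")
    biggest = max(files, key=lambda p: p.stat().st_size)
    return biggest, biggest.stat().st_size


def fits_on_fat32(size_bytes: int) -> bool:
    """True if a file of this size can be stored on a FAT32 volume."""
    return size_bytes <= FAT32_MAX_FILE_BYTES
```

A 4.5GB model file fails the fits_on_fat32 check, which is why the drive needs NTFS. Running largest_file on both the original folder and the copy is also a quick way to confirm the big model file made it across intact in size.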

Step 5: Run from External Drive

Open two PowerShell terminals
In the first terminal:
- Change directory to the external drive
- Set the OLLAMA_MODELS environment variable to the models folder on the drive
- Start the server with: ollama.exe serve
In the second terminal:
- Change directory to the external drive
- Run the program with: ollama.exe run dolphin-llama3
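
The first-terminal steps can also be scripted. This is a sketch under assumptions: the drive letter and folder layout (E:\ollama with a models subfolder and ollama.exe beside it) are hypothetical, so adjust them to match wherever you copied the files. OLLAMA_MODELS is the environment variable Ollama reads to locate its model folder.

```python
import os
import subprocess
from pathlib import PureWindowsPath

# ASSUMPTION: hypothetical drive letter and folder layout; change to match
# the location you used when copying the files to the external drive.
DRIVE_ROOT = PureWindowsPath("E:/ollama")


def portable_env(drive_root: PureWindowsPath = DRIVE_ROOT) -> dict[str, str]:
    """Copy the current environment and point OLLAMA_MODELS at the drive."""
    env = dict(os.environ)
    env["OLLAMA_MODELS"] = str(drive_root / "models")
    return env


def start_server(drive_root: PureWindowsPath = DRIVE_ROOT) -> subprocess.Popen:
    """Launch `ollama.exe serve` from the drive with the portable environment."""
    exe = str(drive_root / "ollama.exe")
    return subprocess.Popen([exe, "serve"], env=portable_env(drive_root))
```

Calling start_server() on a Windows machine with the drive plugged in should bring the server up; the variable is set only for that process, so the host computer’s own settings are left alone.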

Using a Better Interface: AnythingLLM

The terminal interface works, but AnythingLLM provides a much better experience:

Run the Ollama server from PowerShell as before
Download AnythingLLM from anythingllm.com
During installation, set the path to your external drive
Create a .env file in the AnythingLLM folder that points to the correct model path
Start the program and select:
- Ollama
- “Run LLMs on your machine”
- Model: dolphin-llama3:latest
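
If the model does not show up in the AnythingLLM dropdown, it helps to ask the Ollama server directly which models it can see. The sketch below assumes the server is on its default local port (11434) and reads the list it returns from its tags endpoint.

```python
import json
import urllib.request

TAGS_URL = "http://localhost:11434/api/tags"  # assumes the default Ollama port


def model_names(tags_json: str) -> list[str]:
    """Extract model names from the JSON body returned by the tags endpoint."""
    return [entry["name"] for entry in json.loads(tags_json).get("models", [])]


def installed_models(url: str = TAGS_URL) -> list[str]:
    """Ask the running server which models it has available."""
    with urllib.request.urlopen(url, timeout=10) as reply:
        return model_names(reply.read().decode("utf-8"))
```

With the server running, print(installed_models()) should list the same name and tag you select in AnythingLLM. An empty list usually means the server is not looking at the models folder on the external drive.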

The AnythingLLM interface allows you to:

Upload documents
Get more user-friendly responses
Switch between different AI models
Customize workspace settings

Additional Resources

You can also use other interfaces to run offline models:

GPT4All
LM Studio
Open WebUI

Conclusion

You now have access to a powerful AI model trained on the equivalent of 127 million novels, running completely offline on affordable hardware. This provides both unrestricted access to information and enhanced privacy.

In future videos, I’ll explore building a low-cost companion robot that combines LLMs with hardware-based neural networks. Stay tuned!

If you have questions or suggestions about other AI models to run, please let me know in the comments.