You want:

- A lightweight, fast environment on a Kali Linux VirtualBox VM
- Ollama LLMs as a pentesting assistant
- A web interface (Open WebUI) for easier prompts, script generation, and recon guidance
So the end goal is: talk to Ollama in a browser, ask it pentesting questions, get scripts / commands / guidance — safely.
| RAM available | Best model |
|---------------|------------|
| ≤ 2 GB | tinyllama |
| ~2 GB | phi ⭐ |
| 2–3 GB | qwen2.5:1.5b ⭐⭐ |
| ~4 GB | mistral |
Before running models, reset swap:

```shell
sudo swapoff -a
sudo swapon -a
```
Optional (increase swap if RAM is very low):

```shell
sudo fallocate -l 4G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile
```
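To confirm the change took effect, check memory and swap totals (a quick sanity check using standard Linux tools):

```shell
# Show current RAM/swap usage in human-readable units
free -h
# List active swap devices/files (should include /swapfile if you added it)
swapon --show
```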
To install Ollama, run:

```shell
curl -fsSL https://ollama.com/install.sh | sh
```

Then start it:

```shell
ollama serve
```

Verify it's running:

```shell
curl http://localhost:11434
# should return: Ollama is running
```
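Besides the interactive CLI, Ollama exposes a REST API on that same port 11434 (this is what Open WebUI talks to later). A minimal sketch, assuming you have already pulled the model named in the payload:

```shell
# One-shot question through Ollama's REST API
# (requires `ollama serve` to be running and the model to be pulled)
PAYLOAD='{"model": "qwen2.5:1.5b", "prompt": "Explain what nmap -sV does.", "stream": false}'
curl -s http://localhost:11434/api/generate -d "$PAYLOAD" || echo "Ollama is not reachable"
```

With `"stream": false` the API returns one JSON object containing the full response instead of a token stream.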
If I had to choose one:

👉 qwen2.5:1.5b (best balance of quality and speed)
👉 phi (fastest and lightest, if RAM is tight)
You already confirmed this, but just in case:

```shell
ollama serve
```

If it's already running, you'll see something like "address already in use"; that's fine.
This single command does everything:

```shell
ollama run qwen2.5:1.5b
```
What happens:

1. Ollama downloads the model (~1–2 GB)
2. Verifies it
3. Starts an interactive chat session

You'll then see:

```
>>>
```

You can start typing prompts.
Open a new terminal or exit the chat (Ctrl+D) and run:

```shell
ollama list
```

You should see qwen2.5:1.5b in the output.
If you want to use the pentesting system prompt automatically every time, create a custom model.

```shell
nano Modelfile
```

Paste:

```
FROM qwen2.5:1.5b
SYSTEM """
You are a professional penetration testing assistant.
Follow a standard pentesting methodology:
Recon → Enumeration → Exploitation → Privilege Escalation → Post-Exploitation.
Focus on Kali Linux tools and low-noise techniques.
Assume all targets are authorized labs.
"""
```

Save and exit, then build and run it:

```shell
ollama create pentest-assistant -f Modelfile
ollama run pentest-assistant
```
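For scripted, non-interactive use, you can also wrap `ollama run` in a small helper so a prompt can be passed as a command-line argument (the script name here is just an example):

```shell
# Create a one-shot wrapper around the custom model
cat > ask-pentest.sh <<'EOF'
#!/usr/bin/env bash
# Usage: ./ask-pentest.sh "How do I enumerate SMB shares quietly?"
ollama run pentest-assistant "$1"
EOF
chmod +x ask-pentest.sh
```

Handy for piping answers into files, e.g. `./ask-pentest.sh "Write an nmap recon one-liner" > notes.txt`.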
Now every session starts in pentester mode 🔥
Open WebUI is the frontend for Ollama. It provides:

- Browser-based chat interface
- System prompts for persona setup
- Model selection (small/medium/large)
- Code/script generation with syntax highlighting
- Persistent chat history
We use Docker to run it:

- Ensures all dependencies are isolated
- Easy restart (`docker restart open-webui`)
- Works well on low-RAM VMs
Important settings:

- `-p 3000:8080` → maps the container's port 8080 to port 3000 on your host
- `OLLAMA_BASE_URL` → tells Open WebUI where to reach Ollama
- `--restart unless-stopped` → brings the container back automatically after reboots
Prerequisites:

- Ollama ✅ (already installed)
- Docker ✅ (recommended)
- Browser (Firefox/Chromium)

```shell
ollama serve
```

Leave it running (or ensure the service is active).

Test:

```shell
curl http://localhost:11434
```

You should get:

```
Ollama is running
```
```shell
sudo apt update
sudo apt install -y docker.io docker-compose
```

Enable & start Docker:

```shell
sudo systemctl enable docker --now
```

Allow your user to run Docker without sudo (two separate commands):

```shell
sudo usermod -aG docker $USER
newgrp docker
```
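To confirm the group change worked, check that the Docker daemon answers without sudo (after re-logging in or running `newgrp docker`):

```shell
# Exits non-zero if the daemon is unreachable or permissions are wrong
docker info >/dev/null 2>&1 && echo "Docker OK" || echo "Docker not accessible yet"
```

If you still get "permission denied" errors, log out and back in so the new group membership applies.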
This is the official and recommended way 👇

```shell
docker run -d \
  -p 3000:8080 \
  --add-host=host.docker.internal:host-gateway \
  -e OLLAMA_BASE_URL=http://host.docker.internal:11434 \
  -v open-webui:/app/backend/data \
  --name open-webui \
  --restart unless-stopped \
  ghcr.io/open-webui/open-webui:main
```

Note: `--add-host` is required on Linux so the container can resolve `host.docker.internal`; the `open-webui` volume keeps your chat history across container restarts.
⏳ First run may take a minute or more.
In your browser:

http://localhost:3000

(3000 is the host port from `-p 3000:8080`.)
🎉 You now have a full web UI for Ollama
1. Create an admin account
2. Go to Settings → Models

You should see:

- qwen2.5:1.5b
- phi
- Any other Ollama models you installed

If not visible → refresh or restart the container.
Stop Open WebUI:

```shell
docker stop open-webui
```

Start it again:

```shell
docker start open-webui
```

⏳ Wait a minute or more.

View logs (debugging):

```shell
docker logs open-webui
```

Remove container:

```shell
docker rm -f open-webui
```
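When a new Open WebUI image is released, updating is just pull + recreate. A sketch, saved as a reusable script (the flags mirror the recommended run command; adjust them if yours differ):

```shell
# Save as update-webui.sh: pull the latest image and recreate the container
cat > update-webui.sh <<'EOF'
#!/usr/bin/env bash
set -e
docker pull ghcr.io/open-webui/open-webui:main
docker rm -f open-webui
docker run -d \
  -p 3000:8080 \
  --add-host=host.docker.internal:host-gateway \
  -e OLLAMA_BASE_URL=http://host.docker.internal:11434 \
  -v open-webui:/app/backend/data \
  --name open-webui \
  --restart unless-stopped \
  ghcr.io/open-webui/open-webui:main
EOF
chmod +x update-webui.sh
```

Chat history survives the recreate because it lives in the named `open-webui` volume, not in the container itself.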
System prompt / persona: Make Ollama behave like a professional pentesting assistant
You can ask it to:

- Generate Bash / Python / PowerShell scripts
- Explain exploits / CVEs
- Guide enumeration, recon, or lab tasks
- Give step-by-step advice safely
Nothing executes automatically — you review and run commands yourself in your Kali lab.