RAG + Embedding with AnythingLLM and Ollama
This blog post demonstrates how easy and accessible Retrieval-Augmented Generation (RAG) has become when we leverage the strengths of AnythingLLM and Ollama to chat with various document types.
AnythingLLM - an all-in-one AI application that simplifies interaction with Large Language Models (LLMs) for business intelligence purposes. It allows users to chat with any document, such as PDFs or Word files, using various LLMs, including enterprise models like GPT-4 or open-source models like Llama and Mistral.
Ollama - introduced in the last blog post. We will use its ability to serve multiple open-source LLMs via its API interface.
Here is a quick outline:
- Details about the docker containers for Ollama (platform/server) and AnythingLLM (front end/chat/document uploads).
- Explore:
- Embedding a news article about recent US political events.
- Vector database (LanceDB) - you can spin up Chroma if you like (a quick sketch follows this outline), but LanceDB comes bundled with AnythingLLM.
- Query the LLM about the news article and assess how well it did.
A lot of the above is built into AnythingLLM.
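If you do want to try Chroma instead of the bundled LanceDB, a minimal sketch would look something like the command below. This is purely illustrative and assumes the official chromadb/chroma image and its default port of 8000; everything in this post sticks with LanceDB.

```bash
# Hypothetical stand-alone Chroma container - not required for this walkthrough,
# since LanceDB ships inside AnythingLLM.
docker run -d --name chromadb -p 8000:8000 chromadb/chroma
```

You would then select Chroma as the vector database in AnythingLLM's settings and point it at that container.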
Components used
- Ollama Server - a platform that makes it easier to run LLMs locally on your own compute.
- Open WebUI - a self-hosted front end that talks to the APIs presented by Ollama or other OpenAI-compatible platforms. I am using it to download new LLMs, which is much easier than connecting to the Ollama docker container and issuing ‘ollama pull’ (see the example after this list).
- AnythingLLM - an all-in-one AI application that simplifies the interaction with Large Language Models (LLMs).
- Linux Server or equivalent device - spin up three docker containers with the Docker-compose YAML file specified below.
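For reference, pulling models without the Web UI is just a docker exec into the Ollama container. The commands below assume the ollama-server container name from the compose file and the chat models used later in this post:

```bash
# Pull chat models straight through the Ollama container instead of the Web UI.
docker exec -it ollama-server ollama pull phi3
docker exec -it ollama-server ollama pull gemma2
```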
Code breakdown
Analysis of the docker-compose.yml file
```yaml
services:
  ollama-server:
    image: ollama/ollama:latest
    container_name: ollama-server
    ports:
      - "11434:11434"
    volumes:
      - ./ollama_data:/root/.ollama
    restart: unless-stopped

  ollama-webui:
    image: ghcr.io/ollama-webui/ollama-webui:main
    container_name: ollama-webui
    restart: unless-stopped
    environment:
      - 'OLLAMA_BASE_URL=http://ollama-server:11434'
    volumes:
      - ./webui:/app/backend/data
    ports:
      - "3010:8080"
    extra_hosts:
      - host.docker.internal:host-gateway

  anything-LLM:
    image: mintplexlabs/anythingllm:latest
    container_name: anything-llm
    cap_add:
      - SYS_ADMIN
    restart: unless-stopped
    environment:
      - SERVER_PORT=3001
      - UID='1000'
      - GID='1000'
      - STORAGE_DIR=/app/server/storage
      - LLM_PROVIDER=ollama
      - OLLAMA_BASE_PATH=http://ollama-server:11434
      - OLLAMA_MODEL_PREF='phi3'
      - OLLAMA_MODEL_TOKEN_LIMIT=4096
      - EMBEDDING_ENGINE=ollama
      - EMBEDDING_BASE_PATH=http://ollama-server:11434
      - EMBEDDING_MODEL_PREF=nomic-embed-text:latest
      - EMBEDDING_MODEL_MAX_CHUNK_LENGTH=8192
      - VECTOR_DB=lancedb
      - WHISPER_PROVIDER=local
      - TTS_PROVIDER=native
      - PASSWORDMINCHAR=8
    volumes:
      - ./anythingllm_data/storage:/app/server/storage
      - ./anythingllm_data/collector/hotdir/:/app/collector/hotdir
      - ./anythingllm_data/collector/outputs/:/app/collector/outputs
    ports:
      - "3001:3001"
    extra_hosts:
      - host.docker.internal:host-gateway
```
Line 6 - Ollama Server exposes port 11434 for its API.
Line 8 - maps a folder on the host, ollama_data, to the directory /root/.ollama inside the container - this is where all LLMs are downloaded to.
Line 16 - environment variable that tells the Web UI where to reach the Ollama server. Since both containers sit on the same Docker Compose network, we can refer to the Ollama container by its name ‘ollama-server’ in the URL.
Line 18 - maps a folder on the host, webui, to the directory /app/backend/data inside the container, where the Web UI stores its configuration.
Line 20 - connect to the Web UI on host port 3010 (container port 8080).
Lines 21-22 - avoid the need for this container to use ‘host’ network mode.
Line 30 - environment variables used by AnythingLLM - more can be found at ENV variables. Note that OLLAMA_BASE_PATH refers to the Ollama container defined above in the same docker compose file.
Line 47 - AnythingLLM uses a lot of volume mappings. These may change in future releases - the last two collector mounts were a recent addition - so exactly what is needed will depend on the version of the docker image that gets pulled, since the tag is set to ‘latest’.
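If you want to confirm that AnythingLLM actually picked up these settings, one quick check is to read the environment back out of the running container - this assumes printenv is available in the image:

```bash
# Dump the Ollama/embedding related variables from the running AnythingLLM container.
docker exec anything-llm printenv | grep -E 'OLLAMA|EMBEDDING|VECTOR_DB'
```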
Here is my directory structure in the folder where the docker compose file exists. Create these folders before running the ‘docker compose’ commands (a one-liner for this follows the tree output below).
```
❯ tree -L 2 -d
.
├── anythingllm_data
│   ├── collector
│   └── storage
├── ollama_data
└── webui
```
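One way to create them all in a single command, matching the volume mappings in the compose file above:

```bash
# Create the host folders that the compose file bind-mounts into the containers.
mkdir -p anythingllm_data/storage \
         anythingllm_data/collector/hotdir \
         anythingllm_data/collector/outputs \
         ollama_data \
         webui
```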
Issue ‘docker compose up -d’ from the folder where your docker compose YAML file sits to pull and start the containers. Once the containers are up, you can browse to AnythingLLM on port 3001 - for example http://x.x.x.x:3001.
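A couple of quick sanity checks once the stack is up - these assume you are on the Docker host itself and use Ollama's standard endpoints:

```bash
# Confirm all three containers are running.
docker compose ps

# The Ollama API should answer on port 11434.
curl http://localhost:11434/          # returns "Ollama is running"
curl http://localhost:11434/api/tags  # lists the models pulled so far
```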
Check AnythingLLM integration with Ollama
Open your browser and check that you can get to AnythingLLM, then at the bottom left click the “Open settings” button. In the AI Provider section for LLM, check that Ollama is selected and that the “Ollama Model” drop-down lists the LLMs already pulled down on Ollama.
Then navigate to Embedder and check that ‘nomic-embed-text’ is selected. If it is not there, use the Web UI to download it or pull it with Ollama (see the command below). We will use the nomic-embed-text model to embed our document.
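Pulling the embedding model from the command line, if it is missing, looks like this (again assuming the ollama-server container name from the compose file):

```bash
# Download the embedding model used by AnythingLLM's embedder.
docker exec -it ollama-server ollama pull nomic-embed-text
```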
Next, check that we are using ‘LanceDB’ as the vector database.
Then head back out to the main screen and create a ‘New Workspace’ - I called mine “Trump”. Then click on the upload button to embed a document.
I then uploaded a recent news article about Trump so that we can query it.
Testing the embedding
Here are the workspace settings used:
- Gemma2 LLM
- Chat Setting - set to Query; Query mode will provide answers only if document context is found.
- LLM Temperature set to 0.7
The news article used was from news.com.au.
The last question it got incorrect - Elon did pledge “$US45 million ($52 million) monthly to a Trump super PAC”.
Summary
That was a quick test of AnythingLLM with Ollama. You can see it was close but could not score 3 out of 3. It also had some issues processing a PDF file I created by printing the page from the web browser, perhaps something to do with the formatting. It would accept the file, but let’s just say the knowledge gaps were much bigger.
The open-source community has made AI so accessible now. I can’t wait to see what the future brings.
Please reach out if you have any questions or comments.