ollama

I installed ollama on my computer, and I want to know more about it.

Related


  • Large Language Model
    • A large language model is a type of computational model designed for natural language processing tasks such as language generation. As language models, LLMs acquire these abilities by learning statistical relationships from vast amounts of text during a self-supervised and semi-supervised training process.
    • The largest and most capable LLMs are artificial neural networks built with a decoder-only transformer-based architecture, enabling efficient processing and generation of large-scale text data. Modern models can be fine-tuned for specific tasks or guided by prompt engineering. These models acquire predictive power regarding the syntax, semantics, and ontologies inherent in human language corpora, but they also inherit inaccuracies and biases present in the data on which they are trained.
  • GGUF Format
    • GGUF is a file format for storing models for inference with GGML and executors based on GGML. GGUF is a binary format that is designed for fast loading and saving of models, and for ease of reading. Models are traditionally developed using PyTorch or another framework, and then converted to GGUF for use in GGML.
    • It is a successor file format to GGML, GGMF, and GGJT, and is designed to be unambiguous by containing all the information needed to load a model.
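    • Conversion is typically done with a script that ships with llama.cpp, e.g. python convert_hf_to_gguf.py ./my-hf-model --outfile my-model.gguf (a rough sketch; the script name has varied across llama.cpp versions, and the model path here is hypothetical).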


Notes


Ollama is an open-source tool that allows users to run large language models (LLMs) on their own computers. LLMs are AI programs that can understand and generate human-like text, write code, and perform other analytical tasks. Ollama makes it possible to run these models locally, which keeps your data on your own machine, avoids per-request API costs, and works without an internet connection.

Getting Started:

$ ollama run llama3.2
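
If the model is not already available locally, ollama run downloads it from the Ollama library first, then starts an interactive chat session.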

Customize a Model

Ollama supports importing GGUF models via a Modelfile:

  1. Create a file named Modelfile with a FROM instruction pointing to the local file path of the model you want to import.
FROM ./model_file.Q4_0.gguf
  2. Create the model in Ollama.
$ ollama create example_model -f Modelfile
  3. Run the model.
$ ollama run example_model

You can import models from PyTorch or Safetensors.
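
For a Safetensors import, the FROM instruction points at the model directory rather than a single .gguf file. A minimal sketch (the directory path is hypothetical):

FROM ./my-safetensors-model

The ollama create and ollama run steps are the same as above.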

Customize a Prompt

Models from the Ollama library can be customized with a prompt.

$ ollama pull llama3.2

Create a Modelfile

FROM llama3.2

# set the temperature to 1 [higher is more creative, lower is more coherent]
PARAMETER temperature 1

# set the system message
SYSTEM """
You are Mario from Super Mario Bros. Answer as Mario, the assistant, only.
"""

Create and run the model.

$ ollama create mario -f ./Modelfile
$ ollama run mario
>>> hi
Hello! It's your friend Mario.

More examples of customizing models are available in the Ollama documentation.

CLI Reference

$ ollama create mymodel -f ./Modelfile # Create a model from a Modelfile
$ ollama pull llama3.2 # Pull a model / update a local model
$ ollama rm llama3.2 # Remove a model
$ ollama cp llama3.2 my-model # Copy a model
$ # For multiline input, you can wrap text with """
>>> """Hello,
World!
"""
I'm a basic program that prints the famous "Hello, world!" message to the console.

Multimodal Models

$ ollama run llava "What's in this image? /Users/jmorgan/Desktop/smile.png"
The image features a yellow smiley face, which is likely the central focus of the picture.

Pass a Prompt as an Argument

$ ollama run llama3.2 "Summarize this file: $(cat README.md)"
Ollama is a lightweight, extensible framework for building and running language models on the local machine. It provides a simple API for creating, running, and managing models, as well as a library of pre-built models that can be easily used in a variety of applications.

Model Management

$ ollama show llama3.2 # Show model information
$ ollama list # List models on computer
$ ollama ps # Show which models are currently loaded
$ ollama stop llama3.2 # Stop a model which is currently running

ollama serve is used when you want to start ollama without running the desktop application.
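
For example, start the server in one terminal and then run a model from another:

$ ollama serve
$ # Then, in a separate shell:
$ ollama run llama3.2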

REST API

Ollama has a REST API for running and managing models.

Generate a Response

$ curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Why is the sky blue?"
}'
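
By default, the generate endpoint streams the response back as a series of JSON objects. Setting "stream": false returns a single response object instead:

$ curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Why is the sky blue?",
  "stream": false
}'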

Chat with Model

$ curl http://localhost:11434/api/chat -d '{
  "model": "llama3.2",
  "messages": [
    { "role": "user", "content": "why is the sky blue?" }
  ]
}'
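
The chat endpoint is stateless, so a conversation is continued by sending the full message history back with each request. A sketch of a follow-up turn (the assistant message shown is illustrative):

$ curl http://localhost:11434/api/chat -d '{
  "model": "llama3.2",
  "messages": [
    { "role": "user", "content": "why is the sky blue?" },
    { "role": "assistant", "content": "Because molecules in the air scatter blue light more than red." },
    { "role": "user", "content": "why is the sky red at sunset?" }
  ]
}'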


Models Available

  • Ollama supports the models available in its library at https://ollama.com/library.
    • You should have at least 8 GB of RAM to run the 7B models, 16 GB to run the 13B models, and 32 GB to run the 33B models.

Model Library

Model               Parameters  Size   Download
Llama 3.2           3B          2.0GB  ollama run llama3.2
Llama 3.2           1B          1.3GB  ollama run llama3.2:1b
Llama 3.2 Vision    11B         7.9GB  ollama run llama3.2-vision
Llama 3.2 Vision    90B         55GB   ollama run llama3.2-vision:90b
Llama 3.1           8B          4.7GB  ollama run llama3.1
Llama 3.1           70B         40GB   ollama run llama3.1:70b
Llama 3.1           405B        231GB  ollama run llama3.1:405b
Phi 3 Mini          3.8B        2.3GB  ollama run phi3
Phi 3 Medium        14B         7.9GB  ollama run phi3:medium
Gemma 2             2B          1.6GB  ollama run gemma2:2b
Gemma 2             9B          5.5GB  ollama run gemma2
Gemma 2             27B         16GB   ollama run gemma2:27b
Mistral             7B          4.1GB  ollama run mistral
Moondream 2         1.4B        829MB  ollama run moondream
Neural Chat         7B          4.1GB  ollama run neural-chat
Starling            7B          4.1GB  ollama run starling-lm
Code Llama          7B          3.8GB  ollama run codellama
Llama 2 Uncensored  7B          3.8GB  ollama run llama2-uncensored
LLaVA               7B          4.5GB  ollama run llava
Solar               10.7B       6.1GB  ollama run solar


