ollama

I installed ollama on my computer, and I want to know more about it.

Related


  • Large Language Model
    • A large language model is a type of computational model designed for natural language processing tasks such as language generation. As language models, LLMs acquire these abilities by learning statistical relationships from vast amounts of text during a self-supervised and semi-supervised training process.
    • The largest and most capable LLMs are artificial neural networks built with a decoder-only transformer-based architecture, enabling efficient processing and generation of large-scale text data. Modern models can be fine-tuned for specific tasks or guided by prompt engineering. These models acquire predictive power regarding the syntax, semantics, and ontologies inherent in human language corpora, but they also inherit inaccuracies and biases present in the data on which they are trained.
  • GGUF Format
    • GGUF is a file format for storing models for inference with GGML and executors based on GGML. GGUF is a binary format that is designed for fast loading and saving of models, and for ease of reading. Models are traditionally developed using PyTorch or another framework, and then converted to GGUF for use in GGML.
    • It is a successor file format to GGML, GGMF, and GGJT, and is designed to be unambiguous by containing all the information needed to load a model.
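    • Conversion is typically done with a script that ships with llama.cpp, e.g. python convert_hf_to_gguf.py ./my-hf-model --outfile my-model.gguf (a rough sketch; the script name has varied across llama.cpp versions, and the model path here is hypothetical).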


Notes


Ollama is an open-source tool that allows users to run large language models (LLMs) on their own computers. LLMs are AI programs that can understand and generate human-like text, write code, and perform other analytical tasks. Ollama makes it possible to run these models locally, which keeps your data on your own machine, avoids per-request API costs, and works without an internet connection.

Getting Started:

$ ollama run llama3.2
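
If the model is not already available locally, ollama run downloads it from the Ollama library first, then starts an interactive chat session.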

Customize a Model

Ollama supports importing GGUF models via a Modelfile:

  1. Create a file named Modelfile with a FROM instruction pointing to the local file path of the model you want to import.
FROM ./model_file.Q4_0.gguf
  2. Create the model in Ollama.
$ ollama create example_model -f Modelfile
  3. Run the model.
$ ollama run example_model

You can import models from PyTorch or Safetensors.
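
For a Safetensors import, the FROM instruction points at the model directory rather than a single .gguf file. A minimal sketch (the directory path is hypothetical):

FROM ./my-safetensors-model

The ollama create and ollama run steps are the same as above.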

Customize a Prompt

Models from the Ollama library can be customized with a prompt.

$ ollama pull llama3.2

Create a Modelfile

FROM llama3.2

# set the temperature to 1 [higher is more creative, lower is more coherent]
PARAMETER temperature 1

# set the system message
SYSTEM """
You are Mario from Super Mario Bros. Answer as Mario, the assistant, only.
"""

Create and run the model.

$ ollama create mario -f ./Modelfile
$ ollama run mario
>>> hi
Hello! It's your friend Mario.

More examples of customizing models are available in the Ollama documentation.

CLI Reference

$ ollama create mymodel -f ./Modelfile # Create a model from a Modelfile
$ ollama pull llama3.2 # Pull a model / update a local model
$ ollama rm llama3.2 # Remove a model
$ ollama cp llama3.2 my-model # Copy a model
$ # For multiline input, you can wrap text with """
>>> """Hello,
World!
"""
I'm a basic program that prints the famous "Hello, world!" message to the console.

Multimodal Models

$ ollama run llava "What's in this image? /Users/jmorgan/Desktop/smile.png"
The image features a yellow smiley face, which is likely the central focus of the picture.

Pass a Prompt as an Argument

$ ollama run llama3.2 "Summarize this file: $(cat README.md)"
Ollama is a lightweight, extensible framework for building and running language models on the local machine. It provides a simple API for creating, running, and managing models, as well as a library of pre-built models that can be easily used in a variety of applications.

Model Management

$ ollama show llama3.2 # Show model information
$ ollama list # List models on computer
$ ollama ps # Show which models are currently loaded
$ ollama stop llama3.2 # Stop a model which is currently running

ollama serve is used when you want to start ollama without running the desktop application.
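
For example, start the server in one terminal and then run a model from another:

$ ollama serve
$ # Then, in a separate shell:
$ ollama run llama3.2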

REST API

Ollama has a REST API for running and managing models.

Generate a Response

$ curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Why is the sky blue?"
}'
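
By default, the generate endpoint streams the response back as a series of JSON objects. Setting "stream": false returns a single response object instead:

$ curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Why is the sky blue?",
  "stream": false
}'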

Chat with Model

$ curl http://localhost:11434/api/chat -d '{
  "model": "llama3.2",
  "messages": [
    { "role": "user", "content": "why is the sky blue?" }
  ]
}'
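
The chat endpoint is stateless, so a conversation is continued by sending the full message history back with each request. A sketch of a follow-up turn (the assistant message shown is illustrative):

$ curl http://localhost:11434/api/chat -d '{
  "model": "llama3.2",
  "messages": [
    { "role": "user", "content": "why is the sky blue?" },
    { "role": "assistant", "content": "Because molecules in the air scatter blue light more than red." },
    { "role": "user", "content": "why is the sky red at sunset?" }
  ]
}'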


Models Available

  • Ollama supports the models available in its library at https://ollama.com/library.
    • You should have at least 8 GB of RAM to run the 7B models, 16 GB to run the 13B models, and 32 GB to run the 33B models.

Model Library

Model               Parameters  Size   Download
Llama 3.2           3B          2.0GB  ollama run llama3.2
Llama 3.2           1B          1.3GB  ollama run llama3.2:1b
Llama 3.2 Vision    11B         7.9GB  ollama run llama3.2-vision
Llama 3.2 Vision    90B         55GB   ollama run llama3.2-vision:90b
Llama 3.1           8B          4.7GB  ollama run llama3.1
Llama 3.1           70B         40GB   ollama run llama3.1:70b
Llama 3.1           405B        231GB  ollama run llama3.1:405b
Phi 3 Mini          3.8B        2.3GB  ollama run phi3
Phi 3 Medium        14B         7.9GB  ollama run phi3:medium
Gemma 2             2B          1.6GB  ollama run gemma2:2b
Gemma 2             9B          5.5GB  ollama run gemma2
Gemma 2             27B         16GB   ollama run gemma2:27b
Mistral             7B          4.1GB  ollama run mistral
Moondream 2         1.4B        829MB  ollama run moondream
Neural Chat         7B          4.1GB  ollama run neural-chat
Starling            7B          4.1GB  ollama run starling-lm
Code Llama          7B          3.8GB  ollama run codellama
Llama 2 Uncensored  7B          3.8GB  ollama run llama2-uncensored
LLaVA               7B          4.5GB  ollama run llava
Solar               10.7B       6.1GB  ollama run solar


