ollama
I installed ollama on my computer, and I want to know more about it.
References
Related
- Large Language Model
- A large language model is a type of computational model designed for natural language processing tasks such as language generation. As language models, LLMs acquire these abilities by learning statistical relationships from vast amounts of text during a self-supervised and semi-supervised training process.
- The largest and most capable LLMs are artificial neural networks built with a decoder-only transformer-based architecture, enabling efficient processing and generation of large-scale text data. Modern models can be fine-tuned for specific tasks, or be guided by prompt engineering. These models acquire predictive power regarding syntax, semantics, and ontologies inherent in human language corpora, but they also inherit inaccuracies and biases present in the data on which they are trained.
- GGUF Format
- GGUF is a file format for storing models for inference with GGML and executors based on GGML. GGUF is a binary format that is designed for fast loading and saving of models, and for ease of reading. Models are traditionally developed using PyTorch or another framework, and then converted to GGUF for use in GGML.
- It is a successor file format to GGML, GGMF, and GGJT, and is designed to be unambiguous by containing all the information needed to load a model.
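Because GGUF is a binary format with a fixed-size header, the first few fields can be inspected directly. A minimal Python sketch based on the published GGUF layout (magic bytes, version, tensor count, metadata key/value count); `parse_gguf_header` is a hypothetical helper written for illustration, not part of any library:

```python
import struct

def parse_gguf_header(data: bytes) -> dict:
    """Parse the fixed GGUF header: 4-byte magic, uint32 version,
    uint64 tensor count, uint64 metadata key/value count (little-endian)."""
    magic, version, n_tensors, n_kv = struct.unpack_from("<4sIQQ", data, 0)
    if magic != b"GGUF":
        raise ValueError("not a GGUF file")
    return {
        "version": version,
        "tensor_count": n_tensors,
        "metadata_kv_count": n_kv,
    }
```

Reading the first 24 bytes of a `.gguf` file and passing them to this helper is enough to check that a download is intact before handing it to Ollama.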
Notes
Ollama is an open-source tool that allows users to run large language models (LLMs) on their own computers. LLMs are AI programs that can generate and understand human-like text, code, and perform other analytical tasks. Ollama makes it possible to run these models locally, which can be beneficial for a number of reasons.
Getting Started:
- Download Ollama on macOS, Linux, or Windows, then run a model:
$ ollama run llama3.2
Customize a Model
Ollama supports GGUF models in the Modelfile:
- Create a file named Modelfile, with a FROM instruction pointing to the local file path of the model you want to import. (FROM is a Modelfile instruction, not a shell command.)
FROM ./model_file.Q4_0.gguf
- Create the model in Ollama
$ ollama create example_model -f Modelfile
- Run the model
$ ollama run example_model
You can import models from PyTorch or Safetensors.
Customize a Prompt
Models from the Ollama library can be customized with a prompt.
$ ollama pull llama3.2
Create a Modelfile
FROM llama3.2
# set the temperature to 1 [higher is more creative, lower is more coherent]
PARAMETER temperature 1
# set the system message
SYSTEM """
You are Mario from Super Mario Bros. Answer as Mario, the assistant, only.
"""
Create and run the model.
$ ollama create mario -f ./Modelfile
$ ollama run mario
>>> hi
Hello! It's your friend Mario.
Examples of customizing models are available in the Ollama repository.
CLI Reference
$ ollama create mymodel -f ./Modelfile # Create a model from a Modelfile
$ ollama pull llama3.2 # Pull a model / update a local model
$ ollama rm llama3.2 # Remove a model
$ ollama cp llama3.2 my-model # Copy a model
$ # For multiline input, you can wrap text with """
>>> """Hello,
World!
"""
I'm a basic program that prints the famous "Hello, world!" message to the console.
Multimodal Models
$ # Multimodal models
$ ollama run llava "What's in this image? /Users/jmorgan/Desktop/smile.png"
The image features a yellow smiley face, which is likely the central focus of the picture.
# Pass the prompt as an argument
$ ollama run llama3.2 "Summarize this file: $(cat README.md)"
Ollama is a lightweight, extensible framework for building and running language models on the local machine. It provides a simple API for creating, running, and managing models, as well as a library of pre-built models that can be easily used in a variety of applications.
$ ollama show llama3.2 # Show model information
$ ollama list # List models on computer
$ ollama ps # Show which models are currently loaded
$ ollama stop llama3.2 # Stop a model which is currently running
ollama serve is used when you want to start Ollama without running the desktop application.
REST API
Ollama has a REST API for running and managing models.
Generate a Response
$ curl http://localhost:11434/api/generate -d '{
"model": "llama3.2",
"prompt":"Why is the sky blue?"
}'
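By default, the generate endpoint streams its answer as newline-delimited JSON chunks. A rough Python sketch using only the standard library; `collect_response` is a helper name chosen here for illustration, and the host/port match Ollama's default of localhost:11434:

```python
import json
import urllib.request

def collect_response(lines) -> str:
    """Join the 'response' fields of newline-delimited JSON chunks
    until a chunk reports done=true."""
    parts = []
    for line in lines:
        chunk = json.loads(line)
        parts.append(chunk.get("response", ""))
        if chunk.get("done"):
            break
    return "".join(parts)

def generate(prompt: str, model: str = "llama3.2",
             host: str = "http://localhost:11434") -> str:
    """POST to /api/generate and stitch the streamed chunks into one string."""
    req = urllib.request.Request(
        f"{host}/api/generate",
        data=json.dumps({"model": model, "prompt": prompt}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return collect_response(resp)
```

With a local server running, `generate("Why is the sky blue?")` returns the full answer as a single string.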
Chat with Model
$ curl http://localhost:11434/api/chat -d '{
"model": "llama3.2",
"messages": [
{ "role": "user", "content": "why is the sky blue?" }
]
}'
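The chat endpoint takes the whole conversation on each call, so the client keeps the message list. A small Python sketch (helper names are mine; with `"stream": false` the endpoint returns one JSON object whose `message` field holds the assistant turn):

```python
import json
import urllib.request

def build_chat_payload(history: list, content: str,
                       model: str = "llama3.2") -> dict:
    """Append a new user turn and build the JSON body for /api/chat."""
    return {
        "model": model,
        "messages": history + [{"role": "user", "content": content}],
        "stream": False,  # ask for one complete JSON object, not chunks
    }

def chat(history: list, content: str, model: str = "llama3.2",
         host: str = "http://localhost:11434") -> list:
    """POST the conversation and return it extended with the assistant reply."""
    payload = build_chat_payload(history, content, model)
    req = urllib.request.Request(
        f"{host}/api/chat",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        reply = json.loads(resp.read())["message"]
    return payload["messages"] + [reply]
```

Calling `chat(...)` repeatedly with the returned list threads context through a multi-turn conversation.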
Models Available
- Ollama supports the list of models available at https://ollama.com/library
- You should have at least 8GB of RAM to run the 7B models, 16GB to run the 13B models, and 32GB to run the 33B models.
Model | Parameters | Size
---|---|---
Llama 3.2 | 3B | 2.0GB
Llama 3.2 | 1B | 1.3GB
Llama 3.2 Vision | 11B | 7.9GB
Llama 3.2 Vision | 90B | 55GB
Llama 3.1 | 8B | 4.7GB
Llama 3.1 | 70B | 40GB
Llama 3.1 | 405B | 231GB
Phi 3 Mini | 3.8B | 2.3GB
Phi 3 Medium | 14B | 7.9GB
Gemma 2 | 2B | 1.6GB
Gemma 2 | 9B | 5.5GB
Gemma 2 | 27B | 16GB
Mistral | 7B | 4.1GB
Moondream 2 | 1.4B | 829MB
Neural Chat | 7B | 4.1GB
Starling | 7B | 4.1GB
Code Llama | 7B | 3.8GB
Llama 2 Uncensored | 7B | 3.8GB
LLaVA | 7B | 4.5GB
Solar | 10.7B | 6.1GB