AI Models

Select the vision-language model used to generate image descriptions.

Florence-2-base

The 250 million parameter (base) variant of Florence-2, a popular series of small vision-language models built by Microsoft. The series also includes a 700 million parameter (large) variant.

Learn more

Florence-2-large

The 700 million parameter (large) variant of the Florence-2 series of vision-language models.

Learn more

SmolVLM-256M-Instruct

A 256 million parameter vision-language model built by Hugging Face.

Learn more

SmolVLM-500M-Instruct

A 500 million parameter vision-language model built by Hugging Face.

Learn more

moondream2

A 2 billion parameter vision-language model used for image captioning and for extracting text from images.

Learn more

moondream2-int8

INT8-quantized version of moondream2 (2B parameters) for memory-constrained hardware. Reduces memory use from ~5 GB to ~1.5-2 GB with minimal quality loss. Well suited to CPU-only machines.

Learn more
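
The memory figures quoted for moondream2-int8 can be sanity-checked with a rough weight-only estimate. The sketch below assumes the unquantized model stores weights in 16-bit floats (an assumption, not stated above) and ignores activations and runtime overhead, so it will not reproduce the ~5 GB figure exactly. The function name weight_memory_gb is illustrative only.

```python
# Rough weight-only memory estimate.
# Ignores activations, caches, and framework overhead, so real usage is higher.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}


def weight_memory_gb(num_params: float, dtype: str) -> float:
    """Approximate weight storage in gigabytes (10^9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


if __name__ == "__main__":
    params = 2e9  # "2B parameters", as stated in the model description
    for dtype in ("fp16", "int8"):  # fp16 baseline is an assumption
        print(f"{dtype}: ~{weight_memory_gb(params, dtype):.1f} GB of weights")
```

Halving the bytes per parameter halves the weight storage, which is broadly in line with the reduction described for the INT8 variant.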

Ollama-Gemma3

Active

Gemma 3 served by an external Ollama instance.

Learn more
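
For the external Ollama option, a minimal sketch of requesting an image description over Ollama's HTTP API might look like the following. The host URL, the model tag "gemma3", and the prompt are assumptions for illustration; how this application actually talks to Ollama is not described here, and the helper names build_payload and describe_image are hypothetical.

```python
import base64
import json
import urllib.request


def build_payload(image_bytes: bytes,
                  model: str = "gemma3",  # assumed model tag
                  prompt: str = "Describe this image.") -> dict:
    """Build the JSON body for Ollama's /api/generate endpoint."""
    return {
        "model": model,
        "prompt": prompt,
        # Ollama expects images as base64-encoded strings.
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,  # return one JSON object instead of a stream
    }


def describe_image(image_bytes: bytes,
                   host: str = "http://localhost:11434",  # assumed host
                   **kwargs) -> str:
    """Send the image to an Ollama instance and return the generated text."""
    request = urllib.request.Request(
        f"{host}/api/generate",
        data=json.dumps(build_payload(image_bytes, **kwargs)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=120) as response:
        return json.loads(response.read())["response"]
```

Calling describe_image requires a reachable Ollama instance with the chosen model already pulled; build_payload can be inspected on its own without any network access.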