Indexing Engine

SymbolicAI supports multiple indexing engines for vector search and RAG (Retrieval-Augmented Generation) operations. This document covers both the default naive vector engine and the production-ready Qdrant engine.

Naive Vector Engine (Default)

By default, text indexing and retrieval are performed with the local naive vector engine, accessed through the Interface abstraction:

from symai.interfaces import Interface

db = Interface('naive_vectordb', index_name="my_index")
db("Hello world", operation="add")
result = db("Hello", operation="search", top_k=1)
print(result.value)  # most relevant match

You can also add or search multiple documents at once, and perform save/load/purge operations:

docs = ["Alpha document", "Beta entry", "Gamma text"]
db = Interface('naive_vectordb', index_name="my_index")
db(docs, operation="add")
db("save", operation="config")
# Load or purge as needed
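Under the hood, a naive vector engine amounts to brute-force similarity search over stored embeddings. The following self-contained sketch illustrates the idea (it is not SymbolicAI's implementation; the toy bag-of-words embedding stands in for a real embedding model):

```python
import math
from collections import Counter

def embed(text):
    # Toy embedding: bag-of-words counts (real engines use learned embeddings).
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class NaiveVectorDB:
    def __init__(self):
        self.docs = []      # original texts
        self.vectors = []   # their embeddings

    def add(self, texts):
        for t in texts:
            self.docs.append(t)
            self.vectors.append(embed(t))

    def search(self, query, top_k=1):
        # Brute force: score every stored vector against the query.
        q = embed(query)
        scored = sorted(zip(self.docs, self.vectors),
                        key=lambda dv: cosine(q, dv[1]), reverse=True)
        return [d for d, _ in scored[:top_k]]

db = NaiveVectorDB()
db.add(["Alpha document", "Beta entry", "Hello world"])
print(db.search("Hello", top_k=1))  # ['Hello world']
```

This linear scan is fine for small indices; Qdrant replaces it with approximate nearest-neighbor search that scales to millions of vectors.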

Qdrant RAG Engine

The Qdrant engine provides a production-ready vector database for scalable RAG applications. It supports both local and cloud deployments, advanced document chunking, and comprehensive collection management.

Setup

Option 1: Local Qdrant Server (via symserver)

Start Qdrant using the symserver CLI (Docker by default).
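If you prefer to manage the container yourself, the official Qdrant image can be started directly with Docker, which is what a Docker-based setup amounts to (6333 is Qdrant's default REST port, 6334 its gRPC port):

```shell
docker run -p 6333:6333 -p 6334:6334 qdrant/qdrant
```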

Option 2: Cloud Qdrant

Configure your cloud Qdrant instance:

Basic Usage

The Qdrant engine is used directly via the QdrantIndexEngine class:

Local Search with Citations

If you need citation-formatted results compatible with parallel.search, use the local_search interface. It embeds the query locally, queries Qdrant, and returns a SearchResult (with value and citations) instead of raw ScoredPoint objects:

Local search accepts the same arguments as a direct Qdrant query: collection_name/index_name, limit/top_k/index_top_k, score_threshold, query_filter (a dict or a Qdrant Filter), and any extra Qdrant search kwargs. Citation fields are derived from Qdrant payloads:

  • The excerpt uses payload["text"] (or content).

  • The URL is resolved from payload["source"]/url/file_path/path and is always returned as an absolute file:// URI; relative inputs resolve against the current working directory.

  • The title is the stem of that path (PDF pages append #p{page} when provided).

Each matching chunk yields its own citation, so multiple citations can point to the same file.

If you want a stable source header for each chunk, store a source_id or chunk_id in the payload (otherwise the Qdrant point id is used).
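The payload-to-citation mapping described above can be sketched as follows. This is a simplified illustration of the documented behavior, not the library's own code; the payload key name "page" is an assumption:

```python
from pathlib import Path

def citation_fields(payload, point_id):
    # Excerpt: prefer payload["text"], fall back to payload["content"].
    excerpt = payload.get("text") or payload.get("content", "")
    # Source path: first of source/url/file_path/path present in the payload.
    raw = next((payload[k] for k in ("source", "url", "file_path", "path")
                if payload.get(k)), None)
    url = title = None
    if raw is not None:
        p = Path(raw)
        # Relative inputs resolve against the current working directory;
        # the URL is always an absolute file:// URI.
        url = p.resolve().as_uri()
        title = p.stem
        if payload.get("page") is not None:  # PDF pages append #p{page}
            title += f"#p{payload['page']}"
    # Stable header: source_id/chunk_id if stored, else the Qdrant point id.
    header = payload.get("source_id") or payload.get("chunk_id") or point_id
    return {"excerpt": excerpt, "url": url, "title": title, "header": header}
```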

Example:

Collection Management

Create and manage collections programmatically:

Document Chunking and RAG

The Qdrant engine includes built-in document chunking for RAG workflows:
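Chunking splits long documents into overlapping windows before embedding, so that context spanning a chunk boundary is not lost. A minimal word-based sketch follows; the engine itself delegates to chonkie, which offers token-aware and sentence-aware strategies:

```python
def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into word windows of chunk_size words, with overlap
    words shared between consecutive chunks."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break
    return chunks

doc = " ".join(f"w{i}" for i in range(500))
parts = chunk_text(doc, chunk_size=200, overlap=50)
print(len(parts))  # 3 chunks: words 0-199, 150-349, 300-499
```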

Point Operations

For fine-grained control over individual vectors:

Configuration Options

The Qdrant engine supports extensive configuration:

Environment Variables

Configure Qdrant via environment variables:
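For example (the Qdrant variable names below are illustrative assumptions; consult your SymbolicAI configuration for the exact keys your version recognizes — the EMBEDDING_ENGINE_* keys are described in the next section):

```shell
# Hypothetical Qdrant keys -- verify the exact names for your setup.
export QDRANT_URL="http://localhost:6333"
export QDRANT_API_KEY="..."          # required for Qdrant Cloud

export EMBEDDING_ENGINE_MODEL="all-mpnet-base-v2"
export EMBEDDING_ENGINE_API_KEY=""   # empty => local SentenceTransformers
```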

Embedding Model & API Key Behavior

  • If EMBEDDING_ENGINE_API_KEY is empty ("", the default), SymbolicAI will use a local, lightweight embedding engine based on SentenceTransformers. You can specify any supported model name via EMBEDDING_ENGINE_MODEL (e.g. "all-mpnet-base-v2").

  • If you DO provide an EMBEDDING_ENGINE_API_KEY, then the respective remote embedding engine will be used (e.g. OpenAI). The model is selected according to the EMBEDDING_ENGINE_MODEL key where applicable.

This allows you to easily experiment locally for free, and switch to more powerful cloud backends when ready.
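The selection rule above boils down to a single predicate (illustrative only, not the library's code):

```python
def select_embedding_backend(api_key: str, model: str) -> str:
    """Documented rule: empty API key => local SentenceTransformers;
    non-empty => the corresponding remote engine (e.g. OpenAI)."""
    if api_key == "":
        return f"local SentenceTransformers model '{model}'"
    return f"remote embedding engine, model '{model}'"

print(select_embedding_backend("", "all-mpnet-base-v2"))
```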

Installation

Install Qdrant support using the package extra (recommended):
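Assuming the extra is named qdrant (verify the exact extra name in the project's pyproject.toml):

```shell
# Extra name assumed -- check pyproject.toml for the exact spelling.
pip install "symbolicai[qdrant]"
```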

This installs all required dependencies:

  • qdrant-client - Qdrant Python client

  • chonkie[all] - Document chunking library

  • tokenizers - Tokenization support

Alternatively, install dependencies individually:
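Matching the dependency list above:

```shell
pip install qdrant-client "chonkie[all]" tokenizers
```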

See Also

  • See tests/engines/index/test_qdrant_engine.py for comprehensive usage examples

  • Qdrant documentation: https://qdrant.tech/documentation/
