Local Engine

Local Neuro-Symbolic Engine

You can use a locally hosted instance as the Neuro-Symbolic Engine. We build on top of:

  • llama.cpp, either through its Python bindings or the C++ server directly (both paths are covered in the llama.cpp backend section below).

    ❗️NOTE❗️ The latest llama.cpp commit on the master branch that we tested symai with (as of November 5th, 2025) is a5c07dcd7b49. We used the default build setup.

  • huggingface/transformers through a custom FastAPI server.

llama.cpp backend

For instance, let's suppose you want to set up the Neuro-Symbolic Engine with the gpt-oss-120b model. Download the GGUF shards you need (e.g. the Q4_1 variant).
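
A minimal sketch of the download step, assuming the shards are fetched with the HuggingFace CLI; the repository id and include pattern below are placeholders for whichever GGUF conversion you pick:

# Repository id and include pattern are illustrative only.
huggingface-cli download <org>/gpt-oss-120b-GGUF \
  --include "*Q4_1*.gguf" \
  --local-dir ./models/gpt-oss-120b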

With symai, first set the NEUROSYMBOLIC_ENGINE_MODEL to llamacpp:

{
  "NEUROSYMBOLIC_ENGINE_API_KEY": "",
  "NEUROSYMBOLIC_ENGINE_MODEL": "llamacpp",
  ...
}

You can then run the server in two ways (a sketch of both is given after this list):

  1. Using the Python bindings.

  2. Using the C++ server directly.
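
A minimal sketch of both launch paths, assuming the llama-cpp-python package for the first and a stock llama.cpp build for the second; model paths, ports, and context sizes are placeholders you should adapt:

# Option 1: Python bindings (pip install "llama-cpp-python[server]")
# Point --model at the first shard if the GGUF is split into multiple files.
python -m llama_cpp.server \
  --model ./models/gpt-oss-120b/gpt-oss-120b-Q4_1.gguf \
  --host 127.0.0.1 --port 8080 \
  --n_ctx 8192 --n_gpu_layers -1

# Option 2: C++ server from a local llama.cpp build
llama-server \
  -m ./models/gpt-oss-120b/gpt-oss-120b-Q4_1.gguf \
  --host 127.0.0.1 --port 8080 \
  -c 8192 -ngl 99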

To see all available options, run:
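
Both launch paths expose the standard help flag:

python -m llama_cpp.server --help   # Python bindings
llama-server --help                 # C++ server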

The Neuro-Symbolic Engine now supports tool execution and structured JSON responses out of the box. For concrete examples, review the tests in tests/engines/neurosymbolic/test_nesy_engine.py::test_tool_usage and tests/contract/test_contract.py.

HuggingFace backend

Let's suppose we want to use dolphin-2.9.3-mistral-7B-32k from HuggingFace. First, download the model with the HuggingFace CLI:
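
For example, assuming the model is published under the cognitivecomputations organization on the Hub (adjust the repository id if you use a different mirror):

huggingface-cli download cognitivecomputations/dolphin-2.9.3-mistral-7B-32k \
  --local-dir ./models/dolphin-2.9.3-mistral-7B-32k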

For the HuggingFace server, you have to set the NEUROSYMBOLIC_ENGINE_MODEL to huggingface:
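
By analogy with the llama.cpp configuration above, the relevant entries look like this (the API key can stay empty for a local server):

{
  "NEUROSYMBOLIC_ENGINE_API_KEY": "",
  "NEUROSYMBOLIC_ENGINE_MODEL": "huggingface",
  ...
}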

Then, run symserver with the following options:

To see all the available options we support for HuggingFace, run:
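
Assuming symserver exposes the usual help flag:

symserver --help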

Now you are set to use the local engine.

Local Embedding Engine

You can also use local embedding models through the llama.cpp backend. First, set the EMBEDDING_ENGINE_MODEL to llamacpp:
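
Mirroring the neuro-symbolic configuration, a sketch of the embedding entries (key names other than EMBEDDING_ENGINE_MODEL are assumed to follow the same pattern):

{
  "EMBEDDING_ENGINE_API_KEY": "",
  "EMBEDDING_ENGINE_MODEL": "llamacpp",
  ...
}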

For instance, to use the Nomic embed text model, first download it:
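
For example, assuming the GGUF conversion published under nomic-ai on the Hub; the include pattern is illustrative, so pick whichever quantization you need:

huggingface-cli download nomic-ai/nomic-embed-text-v1.5-GGUF \
  --include "*f16.gguf" \
  --local-dir ./models/nomic-embed-text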

Then start the server with embedding-specific parameters, using either the Python bindings or the C++ server; a sketch of both follows.
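
A minimal sketch of both launch paths, assuming the same tools as for the chat model; note that the embedding switch is spelled --embedding in llama-cpp-python and --embeddings in recent llama.cpp builds (older builds accept --embedding):

# Option 1: Python bindings
python -m llama_cpp.server \
  --model ./models/nomic-embed-text/nomic-embed-text-v1.5.f16.gguf \
  --host 127.0.0.1 --port 8080 \
  --embedding True

# Option 2: C++ server
llama-server \
  -m ./models/nomic-embed-text/nomic-embed-text-v1.5.f16.gguf \
  --host 127.0.0.1 --port 8080 \
  --embeddings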

The server supports batch processing for embeddings. Here's how to use it with symai:
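
A sketch of what client-side usage might look like, assuming Symbol's embed() helper routes through the configured embedding engine and that a list of strings is sent as one batched request; double-check the call signature against the symai embedding tests for your version:

from symai import Symbol

# Assumed usage: a list of strings is embedded in a single batched call.
docs = Symbol(["first document", "second document", "third document"])
embeddings = docs.embed()

# Single query embedding through the same engine.
query = Symbol("a short query").embed()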
