Local Engine

Local Neuro-Symbolic Engine

You can run the Neuro-Symbolic Engine against a locally hosted model. We build on top of:

  • llama.cpp, either through its Python bindings or the C++ server directly.

    ❗️NOTE❗️ The latest llama.cpp master-branch commit we tested symai against (as of November 5th, 2025) is a5c07dcd7b49. We used the standard build setup.

  • huggingface/transformers, through a custom FastAPI server.

llama.cpp backend

Suppose you want to set up the Neuro-Symbolic Engine with the gpt-oss-120b model. Download the GGUF shards you need (e.g., the Q4_1 variant).

With symai, first set the NEUROSYMBOLIC_ENGINE_MODEL to llamacpp:

{
  "NEUROSYMBOLIC_ENGINE_API_KEY": "",
  "NEUROSYMBOLIC_ENGINE_MODEL": "llamacpp",
  ...
}
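If you prefer to script this step, here is a minimal sketch that patches the configuration with Python's standard library. The path is an assumption (symai typically reads ~/.symai/symai.config.json); adjust it to wherever your configuration actually lives.

import json
from pathlib import Path

# Assumed config location (~/.symai/symai.config.json); adjust if yours differs.
config_path = Path.home() / ".symai" / "symai.config.json"

config = json.loads(config_path.read_text())
config["NEUROSYMBOLIC_ENGINE_API_KEY"] = ""        # local servers need no API key
config["NEUROSYMBOLIC_ENGINE_MODEL"] = "llamacpp"  # route requests to llama.cpp
config_path.write_text(json.dumps(config, indent=2))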

You can then run the server in two ways:

  1. Using Python bindings:

symserver --env python --model ./llama-pro-8b-instruct.Q4_K_M.gguf --n_gpu_layers -1 --chat_format llama-3 --port 8000 --host localhost

  2. Using the C++ server directly:

symserver --env cpp --cpp-server-path /path/to/llama.cpp/llama-server -ngl -1 -m gpt-oss-120b/Q4_1/gpt-oss-120b-Q4_1-00001-of-00002.gguf -fa 'on' -b 8092 -ub 1024 --port 8000 --host localhost -c 0 -n 4096 -t 14 --jinja

To see all available options, run:

symserver --env python --help  # for Python bindings
symserver --env cpp --cpp-server-path /path/to/llama.cpp/llama-server --help  # for C++ server
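Once the server is running, it's worth a quick liveness check before pointing symai at it. The sketch below assumes the C++ server's /health route (available in recent llama-server builds); for the Python bindings, the OpenAI-compatible /v1/models route is a comparable probe.

import requests

# Liveness probe against the local server started above (port 8000).
# /health is assumed from recent llama-server builds; the Python bindings
# expose /v1/models instead.
resp = requests.get("http://localhost:8000/health", timeout=5)
resp.raise_for_status()
print(resp.json())  # e.g. {"status": "ok"} once the model has loaded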

The Neuro-Symbolic Engine now supports tool execution and structured JSON responses out of the box. For concrete examples, review the tests in tests/engines/neurosymbolic/test_nesy_engine.py::test_tool_usage and tests/contract/test_contract.py.
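If you want to poke at structured output without going through symai, the sketch below calls the server's OpenAI-compatible chat endpoint directly. The response_format field follows llama.cpp's OpenAI-compatibility layer (an assumption about your build); for the symai-level API, the tests referenced above are the authoritative examples.

import requests

# Ask the local server for JSON-only output via its OpenAI-compatible
# endpoint. response_format json_object is assumed to be supported by the
# llama.cpp build in use.
payload = {
    "messages": [
        {"role": "user", "content": "Return a JSON object with a 'breeds' key listing three cat breeds."}
    ],
    "response_format": {"type": "json_object"},
    "max_tokens": 256,
}
resp = requests.post("http://localhost:8000/v1/chat/completions", json=payload, timeout=120)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])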

HuggingFace backend

Let's suppose we want to use dolphin-2.9.3-mistral-7B-32k from HuggingFace. First, download the model with the HuggingFace CLI:

huggingface-cli download cognitivecomputations/dolphin-2.9.3-mistral-7B-32k --local-dir ./dolphin-2.9.3-mistral-7B-32k

For the HuggingFace backend, set the NEUROSYMBOLIC_ENGINE_MODEL to huggingface:

{
  "NEUROSYMBOLIC_ENGINE_API_KEY": "",
  "NEUROSYMBOLIC_ENGINE_MODEL": "huggingface",
  ...
}

Then, run symserver with the following options:

symserver --model ./dolphin-2.9.3-mistral-7B-32k --attn_implementation flash_attention_2

To see all the available options we support for HuggingFace, run:

symserver --help

Now you are set to use the local engine.

from symai import Symbol

# do some symbolic computation with the local engine
sym = Symbol('Kitties are cute!').compose()
print(sym)

# :Output:
# Kittens are known for their adorable nature and fluffy appearance, making them a favorite addition to many homes across the world. They possess a strong bond with
# their owners, providing companionship and comfort that can ease stress and anxiety. With their playful personalities, they are often seen as a symbol of happiness
# and joy, and their unique characteristics such as purring, kneading, and head butts bring warmth to our hearts. Cats also have a natural instinct to groom, which
# helps them maintain their clean and soft fur. Not only do they bring comfort and love to their owners, but they also have some practical benefits, such as reducing
# allergens, deterring pests, and even reducing stress in their surroundings. Overall, it is no surprise that pets have a long history of providing both emotional
# and physical comfort and happiness to their owners, making them a much-loved member of families around the world.

Local Embedding Engine

You can also use local embedding models through the llama.cpp backend. First, set the EMBEDDING_ENGINE_MODEL to llamacpp:

{
  "EMBEDDING_ENGINE_API_KEY": "",
  "EMBEDDING_ENGINE_MODEL": "llamacpp",
  ...
}

For instance, to use the Nomic embed text model, first download it:

huggingface-cli download nomic-ai/nomic-embed-text-v1.5-GGUF nomic-embed-text-v1.5.Q8_0.gguf --local-dir .

Then start the server with embedding-specific parameters using either:

Python bindings:

symserver --env python --model nomic-embed-text-v1.5.Q8_0.gguf --embedding True --n_ctx 2048 --rope_scaling_type 2 --rope_freq_scale 0.75 --n_batch 32 --port 8000 --host localhost

C++ server:

symserver --env cpp --cpp-server-path /path/to/llama.cpp/llama-server -ngl -1 -m nomic-embed-text-v1.5.Q8_0.gguf --embedding -b 8092 -ub 1024 --port 8000 --host localhost -t 14 --mlock --no-mmap
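Before wiring symai to it, you can verify the endpoint directly. The sketch below uses the OpenAI-compatible /v1/embeddings route that llama-server exposes when launched with --embedding; the model field is included for API compatibility and may be ignored by the server.

import requests

# Sanity check against the OpenAI-compatible embeddings route (the server
# above was started with --embedding on port 8000).
payload = {"input": "Hello, world!", "model": "nomic-embed-text-v1.5"}
resp = requests.post("http://localhost:8000/v1/embeddings", json=payload, timeout=30)
resp.raise_for_status()
vector = resp.json()["data"][0]["embedding"]
print(len(vector))  # 768 for nomic-embed-text-v1.5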

The server supports batch processing for embeddings. Here's how to use it with symai:

from symai import Symbol

# Single text embedding
some_text = "Hello, world!"
embedding = Symbol(some_text).embed()  # returns a list (1 x dim)

# Batch processing
some_batch_of_texts = ["Hello, world!"] * 32
embeddings = Symbol(some_batch_of_texts).embed()  # returns a list (32 x 1 x dim)
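Because .embed() returns plain Python lists with the shapes noted above, downstream similarity math is straightforward. The cosine-similarity helper below is our own sketch using numpy, not a symai API.

import numpy as np
from symai import Symbol

# Cosine similarity between two locally computed embeddings; squeeze()
# drops the leading (1 x dim) batch axis noted above.
a = np.asarray(Symbol("Kitties are cute!").embed()).squeeze()
b = np.asarray(Symbol("Dogs are loyal!").embed()).squeeze()
cos_sim = float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
print(f"cosine similarity: {cos_sim:.3f}")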
