Expressions
⚠️ Outdated or Deprecated Documentation ⚠️
This documentation is outdated and may not reflect the current state of the SymbolicAI library. This page might be revived or deleted entirely as we continue our development. We recommend using more modern tools that infer the documentation from the code itself, such as DeepWiki. This will ensure you have the most accurate and up-to-date information and give you a better picture of the current state of the library.
Overview
An Expression is a non-terminal symbol that can be further evaluated. It inherits all the properties of the Symbol class and overrides the __call__ method to evaluate its expressions or values. All other expressions are derived from the Expression class, which also adds capabilities such as fetching data from URLs (fetch), searching the internet (search), and opening files (open). These operations are deliberately kept out of the Symbol class because they do not use the value attribute of the Symbol class.
Expression Design
SymbolicAI's API closely follows best practices and ideas from PyTorch, allowing the creation of complex expressions by combining multiple expressions as a computational graph. Each Expression has its own forward method, which must be overridden to define the behavior of the expression. The forward method is called by the __call__ method, which is inherited from the Expression base class. The __call__ method evaluates an expression and returns the result of the implemented forward method. This design pattern evaluates expressions lazily, meaning an expression is only evaluated when its result is needed. This is an essential feature that allows complex expressions to be chained together. Numerous helpful expressions can be imported from the symai.components module.
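The relationship between __call__ and forward described above can be illustrated with a minimal, library-independent sketch. The classes ToyExpression, Upper, and Exclaim are hypothetical stand-ins invented for this illustration. They are not part of symai and do not reflect how its Expression class is actually implemented.

```python
class ToyExpression:
    """Hypothetical stand-in for an expression base class (not symai's)."""

    def __call__(self, *args, **kwargs):
        # Calling the expression delegates to the subclass's forward method.
        return self.forward(*args, **kwargs)

    def forward(self, *args, **kwargs):
        raise NotImplementedError("Subclasses must override forward().")


class Upper(ToyExpression):
    def forward(self, text: str) -> str:
        return text.upper()


class Exclaim(ToyExpression):
    def forward(self, text: str) -> str:
        return text + "!"


# Constructing the expressions does no work; evaluation happens on call.
shout = Upper()
exclaim = Exclaim()
result = exclaim(shout("hello"))
print(result)  # HELLO!
```

Because each object only does work when it is called, expressions can be constructed up front and composed freely before any input is available.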
Core Properties
Other important properties inherited from the Symbol class include sym_return_type and static_context. These two properties define the context in which the current Expression operates, as described in the Prompt Design section. The static_context influences all operations of the current Expression sub-class. The sym_return_type ensures that after evaluating an Expression, we obtain the desired return object type. It is usually implemented to return the current type but can be set to return a different type.
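As a rough illustration of how these two properties might interact, the following sketch uses toy classes only. ToySymbol, Summary, and ToySummarizer are hypothetical names, and the way the context is combined with the input here is an assumption made for the example, not symai's actual mechanism.

```python
class ToySymbol:
    """Hypothetical value wrapper (not symai's Symbol)."""

    def __init__(self, value):
        self.value = value

    @property
    def static_context(self) -> str:
        # Context text that would accompany every operation of this class.
        return ""

    @property
    def sym_return_type(self):
        # By default, results are wrapped in the current class.
        return type(self)


class Summary(ToySymbol):
    pass


class ToySummarizer(ToySymbol):
    @property
    def static_context(self) -> str:
        return "Summarize the input in one sentence."

    @property
    def sym_return_type(self):
        # Overridden to return a different type than the current one.
        return Summary

    def __call__(self, text: str):
        # A real engine call would go here; this toy only builds the prompt.
        prompt = f"{self.static_context}\n{text}"
        return self.sym_return_type(prompt)


out = ToySummarizer(None)("Some long text.")
print(type(out).__name__)  # Summary
```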
Expression Structure
Expressions may have more complex structures and can be further sub-classed, as shown in the Sequence expression example in the following figure:
A Sequence expression can hold multiple expressions evaluated at runtime.
Expression Types
Sequence Expressions
Here is an example of defining a Sequence expression:
# First import all expressions
from symai.components import *

# Define a sequence of expressions
Sequence(
    Clean(),
    Translate(),
    Outline(),
    Compose('Compose news:'),
)
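To make the chaining behavior concrete without requiring a configured engine, here is a minimal sketch of a sequential container. ToySequence, clean, and headline are hypothetical helpers written for this illustration. They only mimic the idea of passing each step's output to the next step and are not symai's Sequence, Clean, or Compose.

```python
from typing import Callable


class ToySequence:
    """Hypothetical sequential container (not symai's Sequence)."""

    def __init__(self, *steps: Callable[[str], str]):
        self.steps = steps

    def __call__(self, value: str) -> str:
        # Feed the output of each step into the next one, in order.
        for step in self.steps:
            value = step(value)
        return value


def clean(text: str) -> str:
    # Collapse runs of whitespace into single spaces.
    return " ".join(text.split())


def headline(text: str) -> str:
    return "Compose news: " + text


pipeline = ToySequence(clean, headline)
composed = pipeline("  markets   rally \n today ")
print(composed)  # Compose news: markets rally today
```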
Stream Expressions
As previously mentioned, we can create contextualized prompts to define the behavior of operations on our neural engine. However, this limits the available context size due to GPT-3 Davinci's context length constraint of 4097 tokens. This issue can be addressed using the Stream processing expression, which opens a data stream and performs chunk-based operations on the input stream.
A Stream expression can be wrapped around other expressions. For example, the chunks can be processed with a Sequence expression that allows multiple chained operations in a sequential manner. Here is an example of defining a Stream expression:
Stream(Sequence(
    Clean(),
    Translate(),
    Outline(),
    Embed()
))
The example above opens a stream and passes it a Sequence object that cleans, translates, outlines, and embeds the input. Internally, the stream operation estimates the available model context size and breaks the long input text into smaller chunks, which are passed to the inner expression. The returned object type is a generator.
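The chunk-and-yield behavior can be sketched in plain Python. The function toy_stream and its max_words parameter are hypothetical. A fixed word budget stands in for the context-size estimate, which is a simplification assumed for this example and not how symai's Stream measures or splits its input.

```python
from typing import Callable, Iterator


def toy_stream(text: str, expr: Callable[[str], str], max_words: int = 4) -> Iterator[str]:
    """Hypothetical chunked stream: split by a word budget, apply expr per chunk."""
    words = text.split()
    for start in range(0, len(words), max_words):
        chunk = " ".join(words[start:start + max_words])
        # Each chunk is processed only when the consumer asks for it.
        yield expr(chunk)


stream = toy_stream("one two three four five six seven eight nine", str.upper)
chunks = list(stream)
print(chunks)  # ['ONE TWO THREE FOUR', 'FIVE SIX SEVEN EIGHT', 'NINE']
```

Since the function is a generator, no chunk is processed until the result is iterated, for example by wrapping it in list(...).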
This approach has the drawback of processing chunks independently, meaning there is no shared context or information among chunks. To address this issue, the Cluster expression can be used, where the independent chunks are merged based on their similarity, as illustrated in the following figure:
In the illustrated example, all individual chunks are merged by clustering the information within each chunk. This consolidates contextually related information and merges it meaningfully. The clustered information can then be labeled by streaming through the content of each cluster and extracting the most relevant labels, providing interpretable node summaries.
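The idea of merging independent chunks by similarity can be sketched with a deliberately simple stand-in. The functions jaccard and toy_cluster are hypothetical. They use word overlap and a greedy threshold rule, which is an assumption chosen for readability and not the algorithm behind symai's Cluster expression.

```python
from typing import List


def jaccard(a: str, b: str) -> float:
    """Word-overlap similarity between two strings, in [0, 1]."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def toy_cluster(chunks: List[str], threshold: float = 0.3) -> List[List[str]]:
    """Greedy clustering: join the first cluster whose seed chunk is similar enough."""
    clusters: List[List[str]] = []
    for chunk in chunks:
        for cluster in clusters:
            if jaccard(chunk, cluster[0]) >= threshold:
                cluster.append(chunk)
                break
        else:
            # No sufficiently similar cluster exists, so start a new one.
            clusters.append([chunk])
    return clusters


texts = [
    "the cat sat on the mat",
    "stock prices rose sharply today",
    "the cat slept on the mat",
    "stock prices fell sharply today",
]
clusters = toy_cluster(texts)
print(len(clusters))  # 2
```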
The full example is shown below:
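from symai import Symbol
from symai.components import *

# Process the input chunk by chunk
stream = Stream(Sequence(
    Clean(),
    Translate(),
    Outline(),
))
sym = Symbol('<some long text>')
# Collect the processed chunks from the generator into a single Symbol
res = Symbol(list(stream(sym)))
# Merge the independent chunks based on their similarity
expr = Cluster()
expr(res)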
Next, we could recursively repeat this process on each summary node, building a hierarchical clustering structure. Since each node represents a summarized subset of the original information, we can use the summary as an index. The resulting tree can then be used to navigate and retrieve the original information, transforming the large data stream problem into a search problem.
Alternatively, vector-based similarity search can be used to find similar nodes. Libraries such as Annoy, Faiss, or Milvus can be employed for searching in a vector space.
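As a small illustration of vector-based similarity search, the sketch below performs a brute-force cosine-similarity lookup in pure Python. The functions cosine and nearest, the node names, and the three-dimensional vectors are all made up for this example. Libraries such as Annoy, Faiss, or Milvus would replace the brute-force scan with an index suited to large collections.

```python
import math
from typing import Dict, List, Tuple


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest(query: List[float], index: Dict[str, List[float]], k: int = 1) -> List[Tuple[str, float]]:
    """Brute-force search: score every node, return the k most similar."""
    scored = [(name, cosine(query, vec)) for name, vec in index.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:k]


# Made-up 3-dimensional embeddings for three summary nodes.
index = {
    "sports": [0.9, 0.1, 0.0],
    "finance": [0.1, 0.9, 0.1],
    "weather": [0.0, 0.2, 0.9],
}
best = nearest([0.2, 0.8, 0.0], index, k=1)
print(best[0][0])  # finance
```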