Memory System - Embedding Service

File: cogs/memory/embedding_service.py

The EmbeddingService converts natural language text into high-dimensional numerical vectors, known as embeddings. These embeddings capture the semantic meaning of the text, which lets the memory system run similarity searches.

`EmbeddingService` Class

`__init__(self, profile: MemoryProfile)`

Initializes the embedding service based on a given memory profile.

Parameters:
- profile (MemoryProfile): A data class containing configuration for the memory system, including the embedding_model name, embedding_dimension, and hardware preferences (cpu_only).
Process:
1. Device Detection: It calls _detect_device() to automatically select the best available hardware (cuda, mps, or cpu).
2. Model Identification: It determines the type of model (e.g., Qwen3-Embedding or a standard SentenceTransformer) to use the correct loading and encoding logic.
3. Fallback Setup: It defines a fallback model to use if the primary model fails to load.
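
The device-detection step can be illustrated with a minimal sketch. It assumes PyTorch is the backend and that the `cpu_only` flag short-circuits the hardware check. The function name `detect_device` is a hypothetical stand-in, and the real `_detect_device()` may differ.

```python
def detect_device(cpu_only: bool = False) -> str:
    """Return "cuda", "mps", or "cpu" (hypothetical stand-in for _detect_device)."""
    if cpu_only:
        # Assumption: the profile's cpu_only preference overrides detection.
        return "cpu"
    try:
        import torch  # Assumption: PyTorch is the underlying framework.
    except ImportError:
        return "cpu"
    if torch.cuda.is_available():
        return "cuda"
    # Older PyTorch builds have no torch.backends.mps, so look it up defensively.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


print(detect_device(cpu_only=True))
```

Checking the accelerators in a fixed order (CUDA, then MPS, then CPU) keeps the choice deterministic on machines that expose more than one backend.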

Key Methods

`get_model(self)`

A lazy-loading method that retrieves the embedding model. If the model hasn't been loaded yet, it calls _load_model to initialize it. This ensures that the model is only loaded into memory when it's first needed.
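
The lazy-loading pattern described here might look roughly like the sketch below. The class `LazyModelHolder`, its `loader` argument, and the `load_count` counter are hypothetical names used for illustration; they are not part of the actual service.

```python
from typing import Any, Callable, Optional


class LazyModelHolder:
    """Illustrative holder that defers model loading until first use."""

    def __init__(self, loader: Callable[[], Any]) -> None:
        self._loader = loader              # stands in for _load_model
        self._model: Optional[Any] = None  # nothing is loaded at construction
        self.load_count = 0                # only here to show how often we load

    def get_model(self) -> Any:
        # Load on the first call; later calls reuse the cached object.
        if self._model is None:
            self._model = self._loader()
            self.load_count += 1
        return self._model


holder = LazyModelHolder(loader=lambda: object())
first = holder.get_model()
second = holder.get_model()
print(first is second, holder.load_count)
```

Constructing the holder is cheap, so components can create the service eagerly and only pay the load cost if they actually encode text.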

`_load_model(self)`

Handles the complex logic of loading the model from the Hugging Face Hub.

- It first attempts to load the primary model specified in the configuration.
- It includes specific logic for different model types, such as _load_qwen3_model for Qwen models.
- If the primary model fails to load, it automatically attempts to load a reliable fallback model (paraphrase-multilingual-MiniLM-L12-v2).
- After loading, it performs a test encoding to verify that the model's actual output dimension matches the configured dimension, adjusting the configuration if necessary.
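
The fallback and dimension check can be sketched without downloading anything. In the example below, `Profile`, `load_with_fallback`, and `fake_load` are hypothetical stand-ins: the real loader talks to the Hugging Face Hub, whereas this one simulates a failure so the control flow is visible.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Encoder = Callable[[str], List[float]]

FALLBACK_MODEL = "paraphrase-multilingual-MiniLM-L12-v2"


@dataclass
class Profile:
    """Hypothetical subset of MemoryProfile used only in this sketch."""
    embedding_model: str
    embedding_dimension: int


def load_with_fallback(profile: Profile,
                       load: Callable[[str], Encoder]) -> Tuple[str, Encoder]:
    """Try the primary model, fall back on failure, then verify the dimension."""
    try:
        name, encoder = profile.embedding_model, load(profile.embedding_model)
    except Exception:
        # Primary model failed to load: switch to the fallback model.
        name, encoder = FALLBACK_MODEL, load(FALLBACK_MODEL)
    # Test encoding: compare the real output size with the configured one.
    actual_dim = len(encoder("test"))
    if actual_dim != profile.embedding_dimension:
        profile.embedding_dimension = actual_dim
    return name, encoder


def fake_load(name: str) -> Encoder:
    """Fake loader: one name always fails, every other returns a 384-d encoder."""
    if name == "broken-model":
        raise RuntimeError("simulated download failure")
    return lambda text: [0.0] * 384


profile = Profile(embedding_model="broken-model", embedding_dimension=1024)
loaded_name, _ = load_with_fallback(profile, fake_load)
print(loaded_name, profile.embedding_dimension)
```

Adjusting the configured dimension after the test encoding matters because downstream components, such as a vector index, would otherwise be sized for the wrong vector length.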

`encode_batch(self, texts: List[str], ...)`

The main method for converting a list of texts into a batch of embeddings.

Parameters:
- texts (List[str]): A list of text strings to be encoded.
Process:
1. It retrieves the model using get_model().
2. It preprocesses the texts (e.g., cleaning whitespace).
3. It calls the appropriate encoding method based on the model type (_encode_qwen3_batch or _encode_sentence_transformers_batch).
4. The model processes the texts in batches to optimize performance and manage memory usage.
5. It returns the embeddings as a NumPy array.
Returns: A NumPy array of shape (num_texts, embedding_dimension).
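
The preprocessing and batching steps can be sketched as follows. This is an illustration under assumptions: `encode_fn`, `batch_size`, `dim`, and `toy_encoder` are hypothetical, and the toy encoder produces two simple features per text instead of real embeddings.

```python
from typing import Callable, List

import numpy as np


def encode_batch(texts: List[str],
                 encode_fn: Callable[[List[str]], np.ndarray],
                 dim: int,
                 batch_size: int = 32) -> np.ndarray:
    """Clean the texts, encode them in chunks, and stack the results."""
    # Preprocessing: collapse runs of whitespace and trim both ends.
    cleaned = [" ".join(t.split()) for t in texts]
    if not cleaned:
        return np.empty((0, dim), dtype=np.float32)
    chunks = []
    for start in range(0, len(cleaned), batch_size):
        # Each chunk holds at most batch_size texts to bound memory use.
        chunks.append(encode_fn(cleaned[start:start + batch_size]))
    return np.vstack(chunks)


def toy_encoder(batch: List[str]) -> np.ndarray:
    """Stand-in model: (character count, space count) for each text."""
    return np.array([[len(t), t.count(" ")] for t in batch], dtype=np.float32)


embeddings = encode_batch(["  hello   world ", "foo", "a b c"],
                          toy_encoder, dim=2, batch_size=2)
print(embeddings.shape)
```

The result keeps one row per input text in the original order, which is what allows callers to pair each embedding with its source string.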

`EmbeddingServiceManager` Class

This is a singleton manager that ensures only one instance of each embedding model is loaded into memory, even if multiple components request it. It maintains a cache of EmbeddingService instances keyed by their model name.
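
A minimal sketch of such a manager is shown below, assuming the cache is a plain dictionary keyed by model name. `ServiceManager`, `StubService`, and `get_service` are hypothetical names, and the sketch is not thread-safe; a real implementation would likely guard the cache with a lock.

```python
from typing import Dict, Optional


class StubService:
    """Stand-in for EmbeddingService; only records the model name."""

    def __init__(self, model_name: str) -> None:
        self.model_name = model_name


class ServiceManager:
    """Singleton that hands out one cached service per model name."""

    _instance: Optional["ServiceManager"] = None
    _services: Dict[str, StubService]

    def __new__(cls) -> "ServiceManager":
        # Create the single shared instance on first use.
        if cls._instance is None:
            cls._instance = super().__new__(cls)
            cls._instance._services = {}
        return cls._instance

    def get_service(self, model_name: str) -> StubService:
        # Reuse an existing service so each model is created only once.
        if model_name not in self._services:
            self._services[model_name] = StubService(model_name)
        return self._services[model_name]


a = ServiceManager().get_service("model-a")
b = ServiceManager().get_service("model-a")
c = ServiceManager().get_service("model-b")
print(a is b, a is c)
```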
