Memory System - Embedding Service
File: cogs/memory/embedding_service.py
The EmbeddingService is a core component that converts natural-language text into high-dimensional numerical vectors called embeddings. Embeddings capture the semantic meaning of the text: texts with similar meanings map to nearby vectors, which is what lets the system perform similarity searches.
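A minimal sketch of how embeddings support similarity search, assuming cosine similarity as the metric (this page does not state which metric the memory system uses). The function name cosine_top_k and the toy 2-D vectors are illustrative only; real embeddings have embedding_dimension components.

```python
import numpy as np


def cosine_top_k(query: np.ndarray, corpus: np.ndarray, k: int = 3):
    """Return indices and cosine scores of the k corpus rows most similar to query."""
    # Normalise to unit length so that a dot product equals cosine similarity.
    q = query / np.linalg.norm(query)
    c = corpus / np.linalg.norm(corpus, axis=1, keepdims=True)
    scores = c @ q                   # one score per corpus row
    order = np.argsort(-scores)[:k]  # highest similarity first
    return order, scores[order]


query = np.array([1.0, 0.0])
corpus = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]])
order, scores = cosine_top_k(query, corpus, k=2)
print(order.tolist())  # [0, 2]
```

Zero-length vectors would cause a division by zero here; a production version would need to guard against them.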
EmbeddingService Class
__init__(self, profile: MemoryProfile)
Initializes the embedding service based on a given memory profile.
- Parameters:
  profile (MemoryProfile): A data class containing configuration for the memory system, including the embedding_model name, embedding_dimension, and hardware preferences (cpu_only).
- Process:
  - Device Detection: It calls _detect_device() to automatically select the best available hardware (cuda, mps, or cpu).
  - Model Identification: It determines the type of model (e.g., Qwen3-Embedding or a standard SentenceTransformer) so that the correct loading and encoding logic is used.
  - Fallback Setup: It defines a fallback model to use if the primary model fails to load.
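The initialization steps above can be sketched as follows, under clearly hypothetical assumptions: MemoryProfile is reduced to the three fields named on this page, Qwen3 models are recognised by the substring "qwen3" in the model name, and detect_device and EmbeddingServiceInitSketch are illustrative names, not the module's actual code. Device detection uses the real PyTorch checks torch.cuda.is_available() and torch.backends.mps.is_available().

```python
from dataclasses import dataclass


@dataclass
class MemoryProfile:
    # Hypothetical reduction: only the fields named on this page.
    embedding_model: str
    embedding_dimension: int
    cpu_only: bool = False


def detect_device(cpu_only: bool = False) -> str:
    """Pick 'cuda', 'mps', or 'cpu', honouring the cpu_only preference."""
    if cpu_only:
        return "cpu"
    try:
        import torch
    except ImportError:
        return "cpu"  # Without PyTorch there is nothing but the CPU to use.
    if torch.cuda.is_available():
        return "cuda"
    mps = getattr(torch.backends, "mps", None)  # absent in older PyTorch builds
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


class EmbeddingServiceInitSketch:
    """Hypothetical stand-in for EmbeddingService.__init__."""

    def __init__(self, profile: MemoryProfile) -> None:
        self.profile = profile
        self.device = detect_device(profile.cpu_only)
        # Assumption: Qwen3 models are identified by a substring of the name.
        self.is_qwen3 = "qwen3" in profile.embedding_model.lower()
        # Fallback model named in the _load_model description on this page.
        self.fallback_model = "paraphrase-multilingual-MiniLM-L12-v2"
        self._model = None  # Loaded lazily on first use.
```

Preferring cuda, then mps, then cpu is an assumption based on the order listed above.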
Key Methods
get_model(self)
A lazy-loading method that retrieves the embedding model. If the model hasn't been loaded yet, it calls _load_model to initialize it. This ensures that the model is only loaded into memory when it's first needed.
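The lazy-loading pattern behind get_model() can be sketched as follows. LazyModelHolder and its loader callable are hypothetical stand-ins; in the service itself the loader role is played by _load_model.

```python
from typing import Any, Callable, Optional


class LazyModelHolder:
    """Hypothetical illustration of the get_model() lazy-loading pattern."""

    def __init__(self, loader: Callable[[], Any]) -> None:
        self._loader = loader              # stands in for _load_model
        self._model: Optional[Any] = None  # nothing is loaded at construction time

    def get_model(self) -> Any:
        # Load on first access only; later calls reuse the cached model.
        if self._model is None:
            self._model = self._loader()
        return self._model
```

Construction stays cheap, and the cost of downloading weights and moving them to the device is paid only when the first text is encoded. This sketch is not thread-safe: two concurrent first calls could both trigger a load.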
_load_model(self)
Handles the complex logic of loading the model from the Hugging Face Hub.
* It first attempts to load the primary model specified in the configuration.
* It includes specific logic for different model types, such as _load_qwen3_model for Qwen models.
* If the primary model fails to load, it automatically attempts to load a reliable fallback model (paraphrase-multilingual-MiniLM-L12-v2).
* After loading, it performs a test encoding to verify that the model's actual output dimension matches the configured dimension, adjusting the configuration if necessary.
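A sketch of the load, fall back, and verify sequence described above. load_with_fallback and the loader callable are hypothetical; with the sentence-transformers library the loader could be something like lambda name: SentenceTransformer(name, device=device), whose encode() returns one vector per input text. The Qwen3-specific path (_load_qwen3_model) is not modelled here.

```python
from typing import Any, Callable, Tuple

FALLBACK_MODEL = "paraphrase-multilingual-MiniLM-L12-v2"


def load_with_fallback(
    primary: str,
    configured_dim: int,
    loader: Callable[[str], Any],
) -> Tuple[str, Any, int]:
    """Load primary (or the fallback) and return (name, model, verified_dim)."""
    try:
        model, name = loader(primary), primary
    except Exception:
        # Any failure on the primary model triggers the fallback.
        model, name = loader(FALLBACK_MODEL), FALLBACK_MODEL

    # Test encoding: measure the real output width instead of trusting the config.
    actual_dim = len(model.encode(["dimension probe"])[0])
    if actual_dim != configured_dim:
        configured_dim = actual_dim  # adjust the configuration to match the model
    return name, model, configured_dim
```

Probing with a real encode call catches mismatches that a configuration value alone cannot, for example after a silent switch to the fallback model.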
encode_batch(self, texts: List[str], ...)
The main method for converting a list of texts into a batch of embeddings.
- Parameters:
  texts (List[str]): A list of text strings to be encoded.
- Process:
  - It retrieves the model using get_model().
  - It preprocesses the texts (e.g., cleaning whitespace).
  - It calls the appropriate encoding method based on the model type (_encode_qwen3_batch or _encode_sentence_transformers_batch).
  - The model processes the texts in batches to optimize performance and manage memory usage.
  - It returns the embeddings as a NumPy array.
- Returns: A NumPy array of shape (num_texts, embedding_dimension).
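A minimal sketch of the encode_batch flow: clean, split into batches, encode, stack. The encode_fn callable stands in for _encode_qwen3_batch or _encode_sentence_transformers_batch, and both the whitespace-collapsing rule and the default batch_size of 32 are assumptions, not values taken from the service.

```python
import re
from typing import Callable, List

import numpy as np


def preprocess(text: str) -> str:
    """Collapse whitespace runs to one space and trim the ends (assumed cleaning rule)."""
    return re.sub(r"\s+", " ", text).strip()


def encode_batch(
    texts: List[str],
    encode_fn: Callable[[List[str]], np.ndarray],
    embedding_dimension: int,
    batch_size: int = 32,
) -> np.ndarray:
    """Encode texts in fixed-size batches; return shape (num_texts, embedding_dimension)."""
    cleaned = [preprocess(t) for t in texts]
    if not cleaned:
        # Keep the documented 2-D shape even for an empty input list.
        return np.empty((0, embedding_dimension), dtype=np.float32)
    chunks = []
    for start in range(0, len(cleaned), batch_size):
        batch = cleaned[start:start + batch_size]
        # Only batch_size texts reach the model per call, bounding its working memory.
        chunks.append(np.asarray(encode_fn(batch), dtype=np.float32))
    return np.vstack(chunks)
```

With sentence-transformers, the explicit loop is often unnecessary: SentenceTransformer.encode already accepts a batch_size argument and returns a NumPy array by default.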
EmbeddingServiceManager Class
This is a singleton manager that ensures only one instance of each embedding model is loaded into memory, even if multiple components request it. It maintains a cache of EmbeddingService instances keyed by their model name.
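A thread-safe sketch of the singleton-plus-cache idea. The class name EmbeddingServiceManagerSketch, the get_service method, and the factory argument are hypothetical; the real manager may expose a different interface.

```python
import threading
from typing import Any, Callable


class EmbeddingServiceManagerSketch:
    """Hypothetical process-wide registry: one service instance per model name."""

    _instance = None
    _instance_lock = threading.Lock()

    def __new__(cls) -> "EmbeddingServiceManagerSketch":
        # Singleton: every construction returns the same object.
        with cls._instance_lock:
            if cls._instance is None:
                instance = super().__new__(cls)
                instance._services = {}  # model name -> service instance
                instance._services_lock = threading.Lock()
                cls._instance = instance
        return cls._instance

    def get_service(self, model_name: str, factory: Callable[[str], Any]) -> Any:
        # Build the service for model_name once; later requests share it.
        with self._services_lock:
            if model_name not in self._services:
                self._services[model_name] = factory(model_name)
            return self._services[model_name]
```

Holding the lock while the factory runs means two components requesting the same model at the same time cannot trigger two loads, at the cost of serialising loads of different models.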