Memory System - Segmentation Service¶
File: cogs/memory/segmentation_service.py
The TextSegmentationService is a crucial component that groups a continuous stream of messages into meaningful "conversation segments." Instead of treating each message as an independent memory, the system operates on these segments, which provides better context for semantic search.
TextSegmentationService Class¶
__init__(self, db_manager, embedding_service, ...)¶
Initializes the service with its dependencies, including the database manager (to get message history) and the embedding service (for semantic comparisons).
async process_new_message(self, ...)¶
This is the main entry point for the service. It is called by the MemoryManager for each new message.
- Process:
- It checks if a new segment should be created by calling
_should_create_new_segment. - If
True, it finalizes the currently active segment for the channel (saving it to the database) and then starts a new one. - If
False, it simply adds the new message to the currently active segment.
- It checks if a new segment should be created by calling
- Returns: The completed
ConversationSegmentobject if a segment was just finalized, otherwiseNone.
Segmentation Strategies¶
The decision to create a new segment is determined by the _should_create_new_segment method, which uses a strategy defined in the system configuration (SegmentationStrategy).
_check_time_threshold(...) (Time-Only Strategy)¶
This strategy creates a new segment if the time gap between the new message and the last message in the current segment exceeds a dynamic threshold. The threshold is calculated by _calculate_dynamic_interval, which creates shorter intervals for highly active channels and longer intervals for inactive ones.
_check_semantic_threshold(...) (Semantic-Only Strategy)¶
This strategy creates a new segment if the semantic meaning of the new message is significantly different from the content of the current segment.
- It gets a representative text for the current segment (e.g., a summary or concatenation of messages).
- It uses the
EmbeddingServiceto calculate the cosine similarity between the new message and the segment's representative text. - If the similarity score is below the
similarity_thresholdfrom the configuration, a new segment is created.
_check_hybrid_threshold(...) (Hybrid Strategy - Default)¶
This is the default and most robust strategy. It creates a new segment if any of the following conditions are met:
* The time threshold is exceeded.
* The semantic threshold is not met (i.e., the topic has changed).
* The current segment has reached its maximum configured message count (max_messages_per_segment).
_finalize_current_segment(...)¶
When a segment is completed, this method is called to perform final processing before it's saved. This includes: * Calculating the segment's overall vector representation (e.g., by averaging the embeddings of its messages). * Calculating a semantic coherence score for the segment. * Optionally, generating an AI summary of the segment. * Saving the completed segment and its vector to the database and vector index, respectively.