Short-Term Memory Provider¶

Overview¶

The ShortTermMemoryProvider is responsible for providing the immediate conversational context. It fetches the most recent messages from a Discord channel and converts them into a format that multimodal LLMs can understand.

Core Logic¶

1. History Retrieval¶

Fetches the last N messages (default 10) from the current channel.
Orders them from oldest to newest to maintain conversational flow.

2. Message Conversion¶

Each Discord message is mapped to a LangChain HumanMessage or AIMessage.

3. Metadata Enrichment¶

To help the LLM understand the context better, the provider injects metadata into each message: - Speaker ID: [AuthorName | UserID:123 | MessageID:456] - Timestamps: Both Unix and human-readable UTC time. - Reactions: Lists any emojis reacted to the message. - Replies: If a message is a reply, it includes a summary of the referenced message (e.g., Replying to @Author: 'Hello...').

4. Multimodal Support¶

The provider identifies and includes various attachment types: - Images: Injected as image_url objects for vision-capable models (Gemini, GPT-4). - Videos/PDFs/Audio: Injected as descriptive text placeholders (e.g., [Video Attachment: filename.mp4]).

Multi-Agent Differentiation¶

The provider uses explicit speaker identification to help the LLM distinguish between different users and the bot itself: - Human Messages: Include a name parameter formatted as AuthorName_UserID. - AI Messages: Identified as AIMessage.

Markers¶

Messages are wrapped in custom markers for easy parsing: - <som>: Start of Message content. - <eom>: End of Message content.

Short-term memory provides the "now" of the conversation, ensuring the bot can follow threads, respond to replies, and "see" uploaded images.