Reflection Pattern¶
Source: https://github.com/neural-maze/agentic_patterns
The first pattern we are going to implement is the reflection pattern.
This pattern allows the LLM to reflect on and critique its own outputs, following these steps:
- The LLM generates a candidate output. If you look at the diagram above, this happens inside the "Generate" box.
- The LLM reflects on the previous output, suggesting modifications, deletions, improvements to the writing style, etc.
- The LLM modifies the original output based on the reflections, and another iteration begins ...
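In pseudocode, the loop looks roughly like this. This is only a minimal, illustrative sketch with placeholder functions (generate and reflect do not call any LLM here); the real steps are what we build below.
# Minimal, illustrative sketch of the reflection loop (placeholders only, no LLM involved).
def generate(history: list) -> str:
    # In the real version, this is an LLM completion conditioned on the chat history.
    return "candidate output"

def reflect(candidate: str) -> str:
    # In the real version, this is an LLM critique of the candidate output.
    return "critique of the candidate"

history = ["user request"]
for _ in range(3):                        # a fixed number of iterations
    candidate = generate(history)         # 1. generate
    critique = reflect(candidate)         # 2. reflect / critique
    history += [candidate, critique]      # 3. feed the critique back and iterate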
Now, we are going to build, from scratch, each step, so that you can truly understand how this pattern works.
Generation Step¶
The first thing we need to consider is:
What do we want to generate? A poem? An essay? Python code?
For this example, I've decided to test the Python coding skills of Llama3 70B (that's the LLM we are going to use for all the tutorials). In particular, we are going to ask our LLM to code a famous sorting algorithm: Merge Sort.
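For reference, here is a sketch of what a textbook Merge Sort looks like (our own illustrative version, not the LLM's output; the LLM's implementation may differ in style and structure):
# Reference sketch of Merge Sort (recursive, returns a new sorted list).
def merge_sort(values: list) -> list:
    if len(values) <= 1:
        return values
    mid = len(values) // 2
    left = merge_sort(values[:mid])
    right = merge_sort(values[mid:])
    # Merge the two sorted halves.
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged

print(merge_sort([38, 27, 43, 3, 9, 82, 10]))  # [3, 9, 10, 27, 38, 43, 82]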
Groq Client and relevant imports¶
import os
from pprint import pprint
from groq import Groq
from dotenv import load_dotenv
from IPython.display import display_markdown
# Remember to load the environment variables. You should have the Groq API Key in there :)
load_dotenv()
client = Groq()
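Note that Groq() reads the API key from the GROQ_API_KEY environment variable, which load_dotenv() loads from your .env file. If you prefer, you can pass the key explicitly; the following is equivalent to the line above:
# Equivalent to client = Groq(): pass the API key explicitly instead of relying on the environment.
client = Groq(api_key=os.environ["GROQ_API_KEY"])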
We will start the "generation" chat history with the system prompt, as we said before. In this case, we'll have the LLM act as a Python programmer eager to receive feedback / critique from the user.
generation_chat_history = [
    {
        "role": "system",
        "content": "You are a Python programmer tasked with generating high quality Python code. "
        "Your task is to generate the best content possible for the user's request. If the user provides critique, "
        "respond with a revised version of your previous attempt."
    }
]
Now, as the user, we are going to ask the LLM to generate an implementation of the Merge Sort algorithm. Just add a new message with the user role to the chat history.
generation_chat_history.append(
{
"role": "user",
"content": "Generate a Python implementation of the Merge Sort algorithm"
}
)
Let's generate the first version of the code.
mergesort_code = client.chat.completions.create(
messages=generation_chat_history,
model="llama3-70b-8192"
).choices[0].message.content
generation_chat_history.append(
{
"role": "assistant",
"content": mergesort_code
}
)
display_markdown(mergesort_code, raw=True)
Reflection Step¶
Now, let's allow the LLM to reflect on its outputs by defining another system prompt. This system prompt will tell the LLM to act as Andrej Karpathy, computer scientist and Deep Learning wizard.
To be honest, I don't think that acting like Andrej Karpathy will influence the LLM's outputs, but it was fun :)
reflection_chat_history = [
{
"role": "system",
"content": "You are Andrej Karpathy, an experienced computer scientist. You are tasked with generating critique and recommendations for the user's code",
}
]
The user message, in this case, is the code generated in the previous step. We simply add the mergesort_code to the reflection_chat_history.
reflection_chat_history.append(
{
"role": "user",
"content": mergesort_code
}
)
Now, let's generate a critique of the Python code.
critique = client.chat.completions.create(
messages=reflection_chat_history,
model="llama3-70b-8192"
).choices[0].message.content
display_markdown(critique, raw=True)
Finally, we just need to add this critique to the generation_chat_history, this time with the user role.
generation_chat_history.append(
{
"role": "user",
"content": critique
}
)
Generation Step (II)¶
revised_code = client.chat.completions.create(
    messages=generation_chat_history,
    model="llama3-70b-8192"
).choices[0].message.content
display_markdown(revised_code, raw=True)
And the iteration starts again ...¶
After Generation Step (II), the corrected Python code will be received, once again, by "Karpathy". The LLM will then reflect on the corrected output, suggesting further improvements, and the loop will continue, over and over, for a total of n iterations.
There's another possibility: suppose the reflection step can't find anything else to improve. In that case, we can tell the LLM to output a stop string, like "OK" or "Good", to signal that the process can be stopped. The class we build below combines both mechanisms: it iterates for a fixed maximum number of steps, but stops early if the reflection step outputs the stop string.
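As a quick illustration of the stop-string idea (the class below uses "<OK>" as its stop sequence), the check is as simple as:
# Illustrative only: a stop string lets the reflection step terminate the loop early.
critique = "<OK>"  # pretend the reflection step found nothing left to improve
if "<OK>" in critique:
    print("Stop sequence found. Stopping the reflection loop ...")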
Implementing a class¶
Now that you understand the underlying loop of the Reflection Agent, let's implement this agent as a class.
utils¶
import re
from dataclasses import dataclass
import time
from colorama import Fore
from colorama import Style
def completions_create(client, messages: list, model: str) -> str:
"""
Sends a request to the client's `completions.create` method to interact with the language model.
Args:
client (Groq): The Groq client object
messages (list[dict]): A list of message objects containing chat history for the model.
        model (str): The model to use for generating the response.
Returns:
str: The content of the model's response.
"""
response = client.chat.completions.create(messages=messages, model=model)
return str(response.choices[0].message.content)
def build_prompt_structure(prompt: str, role: str, tag: str = "") -> dict:
"""
Builds a structured prompt that includes the role and content.
    Args:
        prompt (str): The actual content of the prompt.
        role (str): The role of the speaker (e.g., user, assistant).
        tag (str): An optional tag name; if provided, the prompt content is wrapped in <tag>...</tag>.
Returns:
dict: A dictionary representing the structured prompt.
"""
if tag:
prompt = f"<{tag}>{prompt}</{tag}>"
return {"role": role, "content": prompt}
def update_chat_history(history: list, msg: str, role: str):
"""
Updates the chat history by appending the latest response.
Args:
history (list): The list representing the current chat history.
msg (str): The message to append.
role (str): The role type (e.g. 'user', 'assistant', 'system')
"""
history.append(build_prompt_structure(prompt=msg, role=role))
class ChatHistory(list):
def __init__(self, messages: list | None = None, total_length: int = -1):
"""Initialise the queue with a fixed total length.
Args:
messages (list | None): A list of initial messages
total_length (int): The maximum number of messages the chat history can hold.
"""
if messages is None:
messages = []
super().__init__(messages)
self.total_length = total_length
def append(self, msg: str):
"""Add a message to the queue.
Args:
msg (str): The message to be added to the queue
"""
if len(self) == self.total_length:
self.pop(0)
super().append(msg)
class FixedFirstChatHistory(ChatHistory):
def __init__(self, messages: list | None = None, total_length: int = -1):
"""Initialise the queue with a fixed total length.
Args:
messages (list | None): A list of initial messages
total_length (int): The maximum number of messages the chat history can hold.
"""
super().__init__(messages, total_length)
def append(self, msg: str):
"""Add a message to the queue. The first messaage will always stay fixed.
Args:
msg (str): The message to be added to the queue
"""
if len(self) == self.total_length:
self.pop(1)
super().append(msg)
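# Illustrative usage (not part of the original utils): with total_length=3 the system prompt stays
# fixed in slot 0, and once the history is full the oldest non-system message is dropped instead.
demo_history = FixedFirstChatHistory([{"role": "system", "content": "sys"}], total_length=3)
demo_history.append({"role": "user", "content": "first"})
demo_history.append({"role": "assistant", "content": "second"})
demo_history.append({"role": "user", "content": "third"})  # history is full: "first" is dropped
print([msg["content"] for msg in demo_history])  # ['sys', 'second', 'third']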
@dataclass
class TagContentResult:
"""
A data class to represent the result of extracting tag content.
Attributes:
content (List[str]): A list of strings containing the content found between the specified tags.
found (bool): A flag indicating whether any content was found for the given tag.
"""
content: list[str]
found: bool
def extract_tag_content(text: str, tag: str) -> TagContentResult:
"""
Extracts all content enclosed by specified tags (e.g., <thought>, <response>, etc.).
Parameters:
text (str): The input string containing multiple potential tags.
tag (str): The name of the tag to search for (e.g., 'thought', 'response').
    Returns:
        TagContentResult: A dataclass instance with the following fields:
            - content (list[str]): A list of strings containing the content found between the specified tags.
            - found (bool): A flag indicating whether any content was found for the given tag.
"""
# Build the regex pattern dynamically to find multiple occurrences of the tag
tag_pattern = rf"<{tag}>(.*?)</{tag}>"
# Use findall to capture all content between the specified tag
matched_contents = re.findall(tag_pattern, text, re.DOTALL)
# Return the dataclass instance with the result
return TagContentResult(
content=[content.strip() for content in matched_contents],
found=bool(matched_contents),
)
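# Illustrative usage (not part of the original utils): extract the content between XML-like tags.
demo_result = extract_tag_content("Here is my answer: <response>42</response>", tag="response")
print(demo_result.content, demo_result.found)  # ['42'] True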
def fancy_print(message: str) -> None:
"""
Displays a fancy print message.
Args:
message (str): The message to display.
"""
print(Style.BRIGHT + Fore.CYAN + f"\n{'=' * 50}")
print(Fore.MAGENTA + f"{message}")
print(Style.BRIGHT + Fore.CYAN + f"{'=' * 50}\n")
time.sleep(0.5)
def fancy_step_tracker(step: int, total_steps: int) -> None:
"""
Displays a fancy step tracker for each iteration of the generation-reflection loop.
Args:
step (int): The current step in the loop.
total_steps (int): The total number of steps in the loop.
"""
fancy_print(f"STEP {step + 1}/{total_steps}")
from colorama import Fore
from dotenv import load_dotenv
from groq import Groq
load_dotenv()
BASE_GENERATION_SYSTEM_PROMPT = """
Your task is to generate the best content possible for the user's request.
If the user provides critique, respond with a revised version of your previous attempt.
You must always output the revised content.
"""
BASE_REFLECTION_SYSTEM_PROMPT = """
You are tasked with generating critique and recommendations to the user's generated content.
If the user content has something wrong or something to be improved, output a list of recommendations
and critiques. If the user content is ok and there's nothing to change, output this: <OK>
"""
class ReflectionAgent:
"""
A class that implements a Reflection Agent, which generates responses and reflects
on them using the LLM to iteratively improve the interaction. The agent first generates
responses based on provided prompts and then critiques them in a reflection step.
Attributes:
model (str): The model name used for generating and reflecting on responses.
client (Groq): An instance of the Groq client to interact with the language model.
"""
def __init__(self, model: str = "llama-3.1-70b-versatile"):
self.client = Groq()
self.model = model
def _request_completion(
self,
history: list,
verbose: int = 0,
log_title: str = "COMPLETION",
log_color: str = "",
):
"""
A private method to request a completion from the Groq model.
        Args:
            history (list): A list of messages forming the conversation or reflection history.
            verbose (int, optional): The verbosity level. Defaults to 0 (no output).
            log_title (str, optional): Title printed before the output when verbose > 0.
            log_color (str, optional): Colorama color code used for the printed output.
Returns:
str: The model-generated response.
"""
output = completions_create(self.client, history, self.model)
if verbose > 0:
print(log_color, f"\n\n{log_title}\n\n", output)
return output
def generate(self, generation_history: list, verbose: int = 0) -> str:
"""
Generates a response based on the provided generation history using the model.
Args:
generation_history (list): A list of messages forming the conversation or generation history.
verbose (int, optional): The verbosity level, controlling printed output. Defaults to 0.
Returns:
str: The generated response.
"""
return self._request_completion(
generation_history, verbose, log_title="GENERATION", log_color=Fore.BLUE
)
def reflect(self, reflection_history: list, verbose: int = 0) -> str:
"""
Reflects on the generation history by generating a critique or feedback.
Args:
reflection_history (list): A list of messages forming the reflection history, typically based on
the previous generation or interaction.
verbose (int, optional): The verbosity level, controlling printed output. Defaults to 0.
Returns:
str: The critique or reflection response from the model.
"""
return self._request_completion(
reflection_history, verbose, log_title="REFLECTION", log_color=Fore.GREEN
)
def run(
self,
user_msg: str,
generation_system_prompt: str = "",
reflection_system_prompt: str = "",
n_steps: int = 10,
verbose: int = 0,
) -> str:
"""
Runs the ReflectionAgent over multiple steps, alternating between generating a response
and reflecting on it for the specified number of steps.
Args:
user_msg (str): The user message or query that initiates the interaction.
generation_system_prompt (str, optional): The system prompt for guiding the generation process.
reflection_system_prompt (str, optional): The system prompt for guiding the reflection process.
            n_steps (int, optional): The number of generate-reflect cycles to perform. Defaults to 10.
verbose (int, optional): The verbosity level controlling printed output. Defaults to 0.
Returns:
str: The final generated response after all cycles are completed.
"""
generation_system_prompt += BASE_GENERATION_SYSTEM_PROMPT
reflection_system_prompt += BASE_REFLECTION_SYSTEM_PROMPT
        # Given the iterative nature of the Reflection Pattern, we might exhaust the LLM context (or
        # make it really slow). That's the reason I'm limiting the chat history to three messages.
        # `FixedFirstChatHistory` is a very simple class that creates a queue which always keeps
        # the first message fixed. I thought this would be useful for maintaining the system prompt
        # in the chat history.
generation_history = FixedFirstChatHistory(
[
build_prompt_structure(prompt=generation_system_prompt, role="system"),
build_prompt_structure(prompt=user_msg, role="user"),
],
total_length=3,
)
reflection_history = FixedFirstChatHistory(
[build_prompt_structure(prompt=reflection_system_prompt, role="system")],
total_length=3,
)
for step in range(n_steps):
if verbose > 0:
fancy_step_tracker(step, n_steps)
# Generate the response
generation = self.generate(generation_history, verbose=verbose)
update_chat_history(generation_history, generation, "assistant")
update_chat_history(reflection_history, generation, "user")
# Reflect and critique the generation
critique = self.reflect(reflection_history, verbose=verbose)
if "<OK>" in critique:
# If no additional suggestions are made, stop the loop
print(
Fore.RED,
"\n\nStop Sequence found. Stopping the reflection loop ... \n\n",
)
break
update_chat_history(generation_history, critique, "user")
update_chat_history(reflection_history, critique, "assistant")
return generation
agent = ReflectionAgent()
generation_system_prompt = "You are a Python programmer tasked with generating high quality Python code"
reflection_system_prompt = "You are Andrej Karpathy, an experienced computer scientist"
user_msg = "Generate a Python implementation of the Merge Sort algorithm"
final_response = agent.run(
user_msg=user_msg,
generation_system_prompt=generation_system_prompt,
reflection_system_prompt=reflection_system_prompt,
n_steps=10,
verbose=1,
)
Final result¶
display_markdown(final_response, raw=True)