Reflection Pattern¶
Source: https://github.com/neural-maze/agentic_patterns
The first pattern we are going to implement is the reflection pattern.
This pattern allows the LLM to reflect on and critique its own outputs, following these steps:
- The LLM generates a candidate output. If you look at the diagram above, this happens inside the "Generate" box.
- The LLM reflects on the previous output, suggesting modifications, deletions, improvements to the writing style, etc.
- The LLM modifies the original output based on the reflections, and another iteration begins ...
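In pseudocode, the loop looks roughly like this. This is only a minimal, illustrative sketch with placeholder functions (generate and reflect do not call any LLM here); the real steps are what we build below.
# Minimal, illustrative sketch of the reflection loop (placeholders only, no LLM involved).
def generate(history: list) -> str:
    # In the real version, this is an LLM completion conditioned on the chat history.
    return "candidate output"

def reflect(candidate: str) -> str:
    # In the real version, this is an LLM critique of the candidate output.
    return "critique of the candidate"

history = ["user request"]
for _ in range(3):                        # a fixed number of iterations
    candidate = generate(history)         # 1. generate
    critique = reflect(candidate)         # 2. reflect / critique
    history += [candidate, critique]      # 3. feed the critique back and iterate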
Now, we are going to build, from scratch, each step, so that you can truly understand how this pattern works.
Generation Step¶
The first thing we need to consider is:
What do we want to generate? A poem? An essay? Python code?
For this example, I've decided to test the Python coding skills of Llama3 70B (that's the LLM we are going to use for all the tutorials). In particular, we are going to ask our LLM to code a famous sorting algorithm: Merge Sort.
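For reference, here is a sketch of what a textbook Merge Sort looks like (our own illustrative version, not the LLM's output; the LLM's implementation may differ in style and structure):
# Reference sketch of Merge Sort (recursive, returns a new sorted list).
def merge_sort(values: list) -> list:
    if len(values) <= 1:
        return values
    mid = len(values) // 2
    left = merge_sort(values[:mid])
    right = merge_sort(values[mid:])
    # Merge the two sorted halves.
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged

print(merge_sort([38, 27, 43, 3, 9, 82, 10]))  # [3, 9, 10, 27, 38, 43, 82]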
Groq Client and relevant imports¶
import os
from pprint import pprint
from groq import Groq
from dotenv import load_dotenv
from IPython.display import display_markdown
# Remember to load the environment variables. You should have the Groq API Key in there :)
load_dotenv()
client = Groq()
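Note that Groq() reads the API key from the GROQ_API_KEY environment variable, which load_dotenv() loads from your .env file. If you prefer, you can pass the key explicitly; the following is equivalent to the line above:
# Equivalent to client = Groq(): pass the API key explicitly instead of relying on the environment.
client = Groq(api_key=os.environ["GROQ_API_KEY"])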
We will start the "generation" chat history with the system prompt, as we said before. In this case, we'll have the LLM act as a Python programmer eager to receive feedback / critique from the user.
generation_chat_history = [
    {
        "role": "system",
        "content": "You are a Python programmer tasked with generating high quality Python code. "
        "Your task is to generate the best content possible for the user's request. If the user provides critique, "
        "respond with a revised version of your previous attempt."
    }
]
Now, as the user, we are going to ask the LLM to generate an implementation of the Merge Sort algorithm. Just add a new message with the user role to the chat history.
generation_chat_history.append(
{
"role": "user",
"content": "Generate a Python implementation of the Merge Sort algorithm"
}
)
Let's generate the first version of the code.
mergesort_code = client.chat.completions.create(
messages=generation_chat_history,
model="llama3-70b-8192"
).choices[0].message.content
generation_chat_history.append(
{
"role": "assistant",
"content": mergesort_code
}
)
display_markdown(mergesort_code, raw=True)
Reflection Step¶
Now, let's allow the LLM to reflect on its outputs by defining another system prompt. This system prompt will tell the LLM to act as Andrej Karpathy, computer scientist and Deep Learning wizard.
To be honest, I don't think that acting like Andrej Karpathy will influence the LLM's outputs, but it was fun :)
reflection_chat_history = [
{
"role": "system",
"content": "You are Andrej Karpathy, an experienced computer scientist. You are tasked with generating critique and recommendations for the user's code",
}
]
The user message, in this case, is the code generated in the previous step. We simply add the mergesort_code to the reflection_chat_history.
reflection_chat_history.append(
{
"role": "user",
"content": mergesort_code
}
)
Now, let's generate a critique of the Python code.
critique = client.chat.completions.create(
messages=reflection_chat_history,
model="llama3-70b-8192"
).choices[0].message.content
display_markdown(critique, raw=True)
Finally, we just need to add this critique to the generation_chat_history, this time with the user role.
generation_chat_history.append(
{
"role": "user",
"content": critique
}
)
Generation Step (II)¶
revised_code = client.chat.completions.create(
    messages=generation_chat_history,
    model="llama3-70b-8192"
).choices[0].message.content
display_markdown(revised_code, raw=True)
And the iteration starts again ...¶
After Generation Step (II), the corrected Python code will be received, once again, by "Karpathy". The LLM will then reflect on the corrected output, suggesting further improvements, and the loop will continue, over and over, for a total of n iterations.
There's another possibility: suppose the reflection step can't find anything else to improve. In that case, we can tell the LLM to output a stop string, like "OK" or "Good", to signal that the process can be stopped. The class we build below combines both mechanisms: it iterates for a fixed maximum number of steps, but stops early if the reflection step outputs the stop string.
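As a quick illustration of the stop-string idea (the class below uses "<OK>" as its stop sequence), the check is as simple as:
# Illustrative only: a stop string lets the reflection step terminate the loop early.
critique = "<OK>"  # pretend the reflection step found nothing left to improve
if "<OK>" in critique:
    print("Stop sequence found. Stopping the reflection loop ...")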
Implementing a class¶
Now that you understand the underlying loop of the Reflection Agent, let's implement this agent as a class.
utils¶
import re
from dataclasses import dataclass
import time
from colorama import Fore
from colorama import Style
def completions_create(client, messages: list, model: str) -> str:
"""
Sends a request to the client's `completions.create` method to interact with the language model.
Args:
client (Groq): The Groq client object
messages (list[dict]): A list of message objects containing chat history for the model.
        model (str): The model to use for generating the response.
Returns:
str: The content of the model's response.
"""
response = client.chat.completions.create(messages=messages, model=model)
return str(response.choices[0].message.content)
def build_prompt_structure(prompt: str, role: str, tag: str = "") -> dict:
"""
Builds a structured prompt that includes the role and content.
    Args:
        prompt (str): The actual content of the prompt.
        role (str): The role of the speaker (e.g., user, assistant).
        tag (str): An optional tag name; if provided, the prompt content is wrapped in <tag>...</tag>.
Returns:
dict: A dictionary representing the structured prompt.
"""
if tag:
prompt = f"<{tag}>{prompt}</{tag}>"
return {"role": role, "content": prompt}
def update_chat_history(history: list, msg: str, role: str):
"""
Updates the chat history by appending the latest response.
Args:
history (list): The list representing the current chat history.
msg (str): The message to append.
role (str): The role type (e.g. 'user', 'assistant', 'system')
"""
history.append(build_prompt_structure(prompt=msg, role=role))
class ChatHistory(list):
def __init__(self, messages: list | None = None, total_length: int = -1):
"""Initialise the queue with a fixed total length.
Args:
messages (list | None): A list of initial messages
total_length (int): The maximum number of messages the chat history can hold.
"""
if messages is None:
messages = []
super().__init__(messages)
self.total_length = total_length
def append(self, msg: str):
"""Add a message to the queue.
Args:
msg (str): The message to be added to the queue
"""
if len(self) == self.total_length:
self.pop(0)
super().append(msg)
class FixedFirstChatHistory(ChatHistory):
def __init__(self, messages: list | None = None, total_length: int = -1):
"""Initialise the queue with a fixed total length.
Args:
messages (list | None): A list of initial messages
total_length (int): The maximum number of messages the chat history can hold.
"""
super().__init__(messages, total_length)
def append(self, msg: str):
"""Add a message to the queue. The first messaage will always stay fixed.
Args:
msg (str): The message to be added to the queue
"""
if len(self) == self.total_length:
self.pop(1)
super().append(msg)
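# Illustrative usage (not part of the original utils): with total_length=3 the system prompt stays
# fixed in slot 0, and once the history is full the oldest non-system message is dropped instead.
demo_history = FixedFirstChatHistory([{"role": "system", "content": "sys"}], total_length=3)
demo_history.append({"role": "user", "content": "first"})
demo_history.append({"role": "assistant", "content": "second"})
demo_history.append({"role": "user", "content": "third"})  # history is full: "first" is dropped
print([msg["content"] for msg in demo_history])  # ['sys', 'second', 'third']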
@dataclass
class TagContentResult:
"""
A data class to represent the result of extracting tag content.
Attributes:
content (List[str]): A list of strings containing the content found between the specified tags.
found (bool): A flag indicating whether any content was found for the given tag.
"""
content: list[str]
found: bool
def extract_tag_content(text: str, tag: str) -> TagContentResult:
"""
Extracts all content enclosed by specified tags (e.g., <thought>, <response>, etc.).
Parameters:
text (str): The input string containing multiple potential tags.
tag (str): The name of the tag to search for (e.g., 'thought', 'response').
    Returns:
        TagContentResult: A dataclass instance with the following fields:
            - content (list[str]): A list of strings containing the content found between the specified tags.
            - found (bool): A flag indicating whether any content was found for the given tag.
"""
# Build the regex pattern dynamically to find multiple occurrences of the tag
tag_pattern = rf"<{tag}>(.*?)</{tag}>"
# Use findall to capture all content between the specified tag
matched_contents = re.findall(tag_pattern, text, re.DOTALL)
# Return the dataclass instance with the result
return TagContentResult(
content=[content.strip() for content in matched_contents],
found=bool(matched_contents),
)
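# Illustrative usage (not part of the original utils): extract the content between XML-like tags.
demo_result = extract_tag_content("Here is my answer: <response>42</response>", tag="response")
print(demo_result.content, demo_result.found)  # ['42'] True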
def fancy_print(message: str) -> None:
"""
Displays a fancy print message.
Args:
message (str): The message to display.
"""
print(Style.BRIGHT + Fore.CYAN + f"\n{'=' * 50}")
print(Fore.MAGENTA + f"{message}")
print(Style.BRIGHT + Fore.CYAN + f"{'=' * 50}\n")
time.sleep(0.5)
def fancy_step_tracker(step: int, total_steps: int) -> None:
"""
Displays a fancy step tracker for each iteration of the generation-reflection loop.
Args:
step (int): The current step in the loop.
total_steps (int): The total number of steps in the loop.
"""
fancy_print(f"STEP {step + 1}/{total_steps}")
from colorama import Fore
from dotenv import load_dotenv
from groq import Groq
load_dotenv()
BASE_GENERATION_SYSTEM_PROMPT = """
Your task is to generate the best content possible for the user's request.
If the user provides critique, respond with a revised version of your previous attempt.
You must always output the revised content.
"""
BASE_REFLECTION_SYSTEM_PROMPT = """
You are tasked with generating critique and recommendations to the user's generated content.
If the user content has something wrong or something to be improved, output a list of recommendations
and critiques. If the user content is ok and there's nothing to change, output this: <OK>
"""
class ReflectionAgent:
"""
A class that implements a Reflection Agent, which generates responses and reflects
on them using the LLM to iteratively improve the interaction. The agent first generates
responses based on provided prompts and then critiques them in a reflection step.
Attributes:
model (str): The model name used for generating and reflecting on responses.
client (Groq): An instance of the Groq client to interact with the language model.
"""
def __init__(self, model: str = "llama-3.1-70b-versatile"):
self.client = Groq()
self.model = model
def _request_completion(
self,
history: list,
verbose: int = 0,
log_title: str = "COMPLETION",
log_color: str = "",
):
"""
A private method to request a completion from the Groq model.
        Args:
            history (list): A list of messages forming the conversation or reflection history.
            verbose (int, optional): The verbosity level. Defaults to 0 (no output).
            log_title (str, optional): Title printed before the output when verbose > 0.
            log_color (str, optional): Colorama color code used for the printed output.
Returns:
str: The model-generated response.
"""
output = completions_create(self.client, history, self.model)
if verbose > 0:
print(log_color, f"\n\n{log_title}\n\n", output)
return output
def generate(self, generation_history: list, verbose: int = 0) -> str:
"""
Generates a response based on the provided generation history using the model.
Args:
generation_history (list): A list of messages forming the conversation or generation history.
verbose (int, optional): The verbosity level, controlling printed output. Defaults to 0.
Returns:
str: The generated response.
"""
return self._request_completion(
generation_history, verbose, log_title="GENERATION", log_color=Fore.BLUE
)
def reflect(self, reflection_history: list, verbose: int = 0) -> str:
"""
Reflects on the generation history by generating a critique or feedback.
Args:
reflection_history (list): A list of messages forming the reflection history, typically based on
the previous generation or interaction.
verbose (int, optional): The verbosity level, controlling printed output. Defaults to 0.
Returns:
str: The critique or reflection response from the model.
"""
return self._request_completion(
reflection_history, verbose, log_title="REFLECTION", log_color=Fore.GREEN
)
def run(
self,
user_msg: str,
generation_system_prompt: str = "",
reflection_system_prompt: str = "",
n_steps: int = 10,
verbose: int = 0,
) -> str:
"""
Runs the ReflectionAgent over multiple steps, alternating between generating a response
and reflecting on it for the specified number of steps.
Args:
user_msg (str): The user message or query that initiates the interaction.
generation_system_prompt (str, optional): The system prompt for guiding the generation process.
reflection_system_prompt (str, optional): The system prompt for guiding the reflection process.
            n_steps (int, optional): The number of generate-reflect cycles to perform. Defaults to 10.
verbose (int, optional): The verbosity level controlling printed output. Defaults to 0.
Returns:
str: The final generated response after all cycles are completed.
"""
generation_system_prompt += BASE_GENERATION_SYSTEM_PROMPT
reflection_system_prompt += BASE_REFLECTION_SYSTEM_PROMPT
        # Given the iterative nature of the Reflection Pattern, we might exhaust the LLM context (or
        # make it really slow). That's the reason I'm limiting the chat history to three messages.
        # `FixedFirstChatHistory` is a very simple class that creates a queue which always keeps
        # the first message fixed. I thought this would be useful for maintaining the system prompt
        # in the chat history.
generation_history = FixedFirstChatHistory(
[
build_prompt_structure(prompt=generation_system_prompt, role="system"),
build_prompt_structure(prompt=user_msg, role="user"),
],
total_length=3,
)
reflection_history = FixedFirstChatHistory(
[build_prompt_structure(prompt=reflection_system_prompt, role="system")],
total_length=3,
)
for step in range(n_steps):
if verbose > 0:
fancy_step_tracker(step, n_steps)
# Generate the response
generation = self.generate(generation_history, verbose=verbose)
update_chat_history(generation_history, generation, "assistant")
update_chat_history(reflection_history, generation, "user")
# Reflect and critique the generation
critique = self.reflect(reflection_history, verbose=verbose)
if "<OK>" in critique:
# If no additional suggestions are made, stop the loop
print(
Fore.RED,
"\n\nStop Sequence found. Stopping the reflection loop ... \n\n",
)
break
update_chat_history(generation_history, critique, "user")
update_chat_history(reflection_history, critique, "assistant")
return generation
agent = ReflectionAgent()
generation_system_prompt = "You are a Python programmer tasked with generating high quality Python code"
reflection_system_prompt = "You are Andrej Karpathy, an experienced computer scientist"
user_msg = "Generate a Python implementation of the Merge Sort algorithm"
final_response = agent.run(
user_msg=user_msg,
generation_system_prompt=generation_system_prompt,
reflection_system_prompt=reflection_system_prompt,
n_steps=10,
verbose=1,
)
Final result¶
display_markdown(final_response, raw=True)