Structure Is All You Need

A review of structured data models in Python using Pydantic, and Structured Output from LLMs with Ollama
December 20, 2024

Introduction

In software development generally, but especially when working with APIs or LLMs in a production environment, neglecting to establish structure limits the adaptability of our code and the dependability of its outputs, both within the applications we create and across the third-party systems we integrate with.

Structure provides a common language and format for communication, enabling different software components to interact effectively with each other. Compatibility relies heavily on structure: when code adheres to standards and best practices, it becomes easier for external systems to understand and interpret the data we exchange, reducing the chance of errors. Output reliability also benefits from a well-structured system, as it makes troubleshooting and debugging more efficient.


Pydantic

When it comes to creating structure within Python code, Pydantic is an incredibly useful tool. It allows developers to define custom data structures or schemas for their data models, ensuring that the data adheres to specific format requirements and validation rules. This not only makes the code easier to understand but also improves compatibility with external systems by providing a common understanding of the expected data structure.

By using Pydantic, developers can quickly create powerful and flexible data models with minimal effort. These models include built-in validation features that check for missing or incorrect data before allowing it to be processed further, helping to prevent errors and inconsistencies in the system. This improved reliability makes integration with external systems more straightforward, as the data passed between components can be trusted to be correct and consistent.

Furthermore, Pydantic's support for serialization and deserialization saves developers time by handling these conversions automatically. This reduces the risk of errors during data exchange and ensures that the structure of the data remains consistent regardless of where it originated.
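
As a minimal sketch of both points, consider the illustrative Item model below (it is not part of the scenarios that follow; the field names are arbitrary):


from pydantic import BaseModel, ValidationError


class Item(BaseModel):
    name: str
    price: float


# validation: bad input raises a descriptive ValidationError
try:
    Item.model_validate({"name": "rod", "price": "not a number"})
except ValidationError as e:
    print(e)

# serialization: a validated model converts cleanly back to a dict or a JSON string
item = Item(name = "rod", price = 49.99)
print(item.model_dump())        # {'name': 'rod', 'price': 49.99}
print(item.model_dump_json())   # {"name":"rod","price":49.99}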


Scenario: A Database API

Let's consider a hypothetical use case. Suppose you have a table in a database that contains User information. All fields in a database must adhere to a DB schema, and some fields might also be categorical, containing only a subset of allowed values. An example could be a position rank field that can only take one of the values "entry", "middle", and "senior".

For the purposes of this exercise, let's use a Polars DataFrame to simulate a DB table:


import polars as pl
from uuid import uuid4


db_table = pl.DataFrame({
    "name": ["Jane", "Tony"],
    "age": [51, 34],
    "rank": ["senior", "entry"],
    "uid": [str(uuid4()) for _ in range(2)]
})


print(db_table)
        

shape: (2, 4)
┌──────┬─────┬────────┬─────────────────────────────────┐
│ name ┆ age ┆ rank   ┆ uid                             │
│ ---  ┆ --- ┆ ---    ┆ ---                             │
│ str  ┆ i64 ┆ str    ┆ str                             │
╞══════╪═════╪════════╪═════════════════════════════════╡
│ Jane ┆ 51  ┆ senior ┆ fea1640a-075b-4bfd-9c91-ca4c8c… │
│ Tony ┆ 34  ┆ entry  ┆ f5a4da8a-6d70-47d0-ad61-c4d3d4… │
└──────┴─────┴────────┴─────────────────────────────────┘
        

In this scenario, new users are added to the table via a custom API. In order to populate the table, the structure of the input has to match the DB Schema. Let's build out the API logic in Python using Pydantic.

First, let's create a User model that will set the input format guidelines. Note that the rank field can only contain one of three values ("senior", "middle", and "entry"), which is accomplished by passing the Literal type from the typing package. Likewise, the Optional type marks the user id as not required: a fresh one is generated automatically for each new user by a Field default factory that calls uuid4().


from pydantic import BaseModel, Field, ValidationError
from typing import Literal, Optional


class User(BaseModel):
    name: str
    age: int
    rank: Literal["senior", "middle", "entry"]
    uid: Optional[str] = Field(default_factory = lambda: str(uuid4()))  # fresh uid per user (a bare default would be evaluated only once, at class definition)
        

We can now create a new User (note that the uid field is populated automatically by the model's default factory):


User(name = "Sidney", age = 39, rank = "middle")
        

User(name='Sidney', age=39, rank='middle', uid='4ece530f-e33d-4569-9f94-769dc14a1c53')
        

Next, let's create a user validation function to ensure that new user entries being passed to the DB table adhere to the schema.

The basis for validation is Pydantic's model_validate() method, used alongside its ValidationError exception. The model_validate() method takes as input the predefined User attributes as a Python dict. Importantly, the strict = True argument is also passed: it disables Pydantic's lax-mode type coercion, so any input that does not exactly match the schema raises a ValidationError instead of being silently converted. The whole call is wrapped in a try block so that the exception can be caught and reported.


def validate_user(name: str, age: int, rank: str) -> User | None:
    try:
        return User.model_validate(
            {
                "name": name,
                "age": age,
                "rank": rank
            },
            strict = True
        )
    except ValidationError as e:
        print(e)
        return None
        

We can now validate users at the time of creation.

A valid user entry returns the familiar User class model:


validate_user("Sidney", 39, "middle")
        

User(name='Sidney', age=39, rank='middle', uid='4ece530f-e33d-4569-9f94-769dc14a1c53')
        

However, if an attempt is made to create a new User using incorrect attributes, the ValidationError exception is caught, yielding a highly informative error message:


validate_user("Sidney", "39", "middle")   # 'age' field is a string instead of the required int
        

1 validation error for User
age
  Input should be a valid integer [type=int_type, input_value='39', input_type=str]
    For further information visit https://errors.pydantic.dev/2.10/v/int_type
        

validate_user("Sidney", 39, "standard")  # the value "standard" is not one of the 3 allowed values for the 'rank' field
        

1 validation error for User
rank
  Input should be 'senior', 'middle' or 'entry' [type=literal_error, input_value='standard', input_type=str]
    For further information visit https://errors.pydantic.dev/2.10/v/literal_error
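
For contrast, if strict = True were omitted, Pydantic would fall back to its default lax mode and silently coerce the string "39" into the integer 39 rather than raising an error. A quick illustration (not part of our API logic):


User.model_validate({"name": "Sidney", "age": "39", "rank": "middle"})
# User(name='Sidney', age=39, rank='middle', uid='...')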
        

We can now incorporate the above logic into our API to populate the DB table.

For the purposes of this exercise, our API is a very simple concatenation onto the Polars DataFrame. The important aspect is that it uses Pydantic's model_dump() method to convert the now-validated User instance back into a plain dict, which is then appended to the table as a new row (in a real-world scenario this validated payload would most likely be sent to a REST API, as sketched below).


new_user = validate_user("Sidney", 39, "middle")

print(new_user.model_dump())
        

{'name': 'Sidney', 'age': 39, 'rank': 'middle', 'uid': '4ece530f-e33d-4569-9f94-769dc14a1c53'}
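
In a real deployment, that validated dict would typically be posted to a REST endpoint rather than concatenated locally. A hypothetical sketch using the requests library (the URL below is a placeholder, not a real service):


import requests

resp = requests.post(
    "https://example.com/api/users",   # placeholder endpoint for illustration
    json = new_user.model_dump()
)
resp.raise_for_status()                # surface any HTTP error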
        

Combining everything, we can define our API logic, enabling us to add new users to our existing table:


new_user = validate_user("Sidney", 39, "middle")

if isinstance(new_user, User):
    db_table = pl.concat([db_table, pl.DataFrame(new_user.model_dump())])


print(db_table)
        

shape: (3, 4)
┌────────┬─────┬────────┬─────────────────────────────────┐
│ name   ┆ age ┆ rank   ┆ uid                             │
│ ---    ┆ --- ┆ ---    ┆ ---                             │
│ str    ┆ i64 ┆ str    ┆ str                             │
╞════════╪═════╪════════╪═════════════════════════════════╡
│ Jane   ┆ 51  ┆ senior ┆ fea1640a-075b-4bfd-9c91-ca4c8c… │
│ Tony   ┆ 34  ┆ entry  ┆ f5a4da8a-6d70-47d0-ad61-c4d3d4… │
│ Sidney ┆ 39  ┆ middle ┆ 4ece530f-e33d-4569-9f94-769dc1… │
└────────┴─────┴────────┴─────────────────────────────────┘
        



Scenario: Call Center Transcriptions with Ollama

Ollama recently added official support for Structured Output. In this example, you are tasked with analyzing a transcribed call center conversation between an Agent and a Customer. The goal is to take the transcribed call and output a call summary, along with some attributes such as the overall sentiment and whether or not the call reached a resolution.

NOTE: The sample call center transcript used in this example can be found at the very bottom of this article, below the Summary section.

First, let's import the necessary packages and define the Ollama response function. This function sets the parameters of our LLM and exposes the input_format argument, which will be crucial for the Structured Output.


from pydantic import BaseModel
from ollama import Client, ChatResponse
from typing import Literal


def ollama_response(
    user_content: str,
    input_format: dict,
    ollama_model: str,
    ollama_options: dict = {},
    ollama_host: str = "http://localhost:11434"
) -> ChatResponse:
    response: ChatResponse = Client(host = ollama_host).chat(
        messages = [
            {
                "role": "system",
                "content": "You are a helpful assistant. Always return as JSON."
            },
            {
                "role": "user",
                "content": user_content
            }
        ],
        model = ollama_model,
        format = input_format,
        options = ollama_options
    )
    return response
        

Next, let's create a Conversation model to define the Structured Output's strict parameters:


class Conversation(BaseModel):
    summary: list[str]
    sentiment: Literal["positive", "negative"]
    resolution: Literal[True, False]
        

Importantly, it is this Conversation model that will be passed to the above-mentioned input_format argument, by first being converted to a JSON schema using Pydantic's model_json_schema() method. The converted schema will look like this:


Conversation.model_json_schema()
        

{'properties': {'summary': {'items': {'type': 'string'},
   'title': 'Summary',
   'type': 'array'},
  'sentiment': {'enum': ['positive', 'negative'],
   'title': 'Sentiment',
   'type': 'string'},
  'resolution': {'enum': [True, False],
   'title': 'Resolution',
   'type': 'boolean'}},
 'required': ['summary', 'sentiment', 'resolution'],
 'title': 'Conversation',
 'type': 'object'}
        

Now we can define a make_conversation_prompt() function, which takes as an argument the actual call transcript and applies it to a template, returning the custom prompt that the LLM will use.

Note that the prompt spells out the strict parameters of the desired Structured Output, restating our Conversation model:


def make_conversation_prompt(conversation: str) -> str:
    ret = f"""You are a helpful assistant who analyzes CONVERSATION between an AGENT and a CUSTOMER. You extract the following info from each CONVERSATION:

    1. A summary of the CONVERSATION
    2. The sentiment of the CONVERSATION where the only options are: positive, negative
    3. Whether or not the issue(s) described in the CONVERSATION had a resolution where the only options are: True, False

    Include only the JSON object in your output.

    CONVERSATION:\n{ conversation }
    """
    return ret
        

Finally, we load the call transcript and run the function:


with open("data/agent_customer_conversation.txt", "r") as f:
    conversation_transcript = f.read()


response = ollama_response(
    user_content = make_conversation_prompt(conversation_transcript),
    input_format = Conversation.model_json_schema(),
    ollama_model = "mistral",
    ollama_options = {'temperature': 0}
)
        

As we did in the previous section, we will now use Pydantic to validate the output and make sure we have the desired Structured Output. The only difference is that instead of Pydantic's model_validate() method, we use model_validate_json(), because the response content returned by Ollama is not a Python dict but a JSON-formatted string:


conversation_response = Conversation.model_validate_json(response.message.content)
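
Even with Structured Output, an LLM response can occasionally fail validation (a refusal, truncated JSON, and so on), so in production you might guard this step the same way we guarded validate_user(). A sketch of that defensive variant:


from pydantic import ValidationError

try:
    conversation_response = Conversation.model_validate_json(response.message.content)
except ValidationError as e:
    print(e)
    conversation_response = None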
        

Now that the desired Structured Output is validated, we can extract the response attributes:


print(conversation_response.summary)
        

["Customer expresses dissatisfaction with product received from 'Doug's Fishing Imporium Limited', stating it is not as pictured and does not work as expected.",
 'Agent apologizes for inconvenience, explains the picture on the website is accurate, and offers a detailed description of the product.',
 'Customer complains about the size of the product and its functionality.',
 'Agent offers additional information about the product, including usage instructions and a 50% discount on any future purchase.',
 'Customer expresses dissatisfaction with the shipping representative, who was late and unprofessional.',
 "Agent apologizes for the representative's behavior, promises to take necessary action, and offers a refund or replacement."]
        

for point in conversation_response.summary:
    print(f"- {point}")
        

- Customer expresses dissatisfaction with product received from 'Doug's Fishing Imporium Limited', stating it is not as pictured and does not work as expected.
- Agent apologizes for inconvenience, explains the picture on the website is accurate, and offers a detailed description of the product.
- Customer complains about the size of the product and its functionality.
- Agent offers additional information about the product, including usage instructions and a 50% discount on any future purchase.
- Customer expresses dissatisfaction with the shipping representative, who was late and unprofessional.
- Agent apologizes for the representative's behavior, promises to take necessary action, and offers a refund or replacement.
        

print(conversation_response.sentiment)
        

'negative'
        

conversation_response.resolution
        

True
        



Scenario: Structured Output using OpenAI and Ollama

Ollama also allows for the use of OpenAI's API when working with Structured Output. The implementation is straightforward and only requires a few minor modifications to the code from the previous section.

As above, we first import the necessary packages, create the Conversation model, define the prompt function, and define the Ollama response function.

The difference here is how we integrate the OpenAI API into the response function. Of note, when working with Ollama the OpenAI base_url and api_key values are simply "http://localhost:11434/v1" and "ollama", respectively. You are, of course, free to use any other LLM provider, as long as it exposes an OpenAI-compatible API (see the sketch at the end of this section).


from typing import Literal
from pydantic import BaseModel

from openai import OpenAI



class Conversation(BaseModel):
    summary: list[str]
    sentiment: Literal["positive", "negative"]
    resolution: Literal[True, False]



def make_conversation_prompt(conversation: str) -> str:
    ret = f"""You are a helpful assistant who analyzes CONVERSATION between an AGENT and a CUSTOMER. You extract the following info from each CONVERSATION:

    1. A summary of the CONVERSATION
    2. The sentiment of the CONVERSATION where the only options are: positive, negative
    3. Whether or not the issue(s) described in the CONVERSATION had a resolution where the only options are: True, False

    Include only the JSON object in your output.

    CONVERSATION:\n{ conversation }
    """
    return ret



def ollama_openai_response(
    user_content: str,
    input_format: type[Conversation],
    ollama_model: str,
    openai_base_url: str = "http://localhost:11434/v1",
    openai_api_key: str = "ollama"
) -> Conversation | str | None:
    try:
        completion = OpenAI(base_url = openai_base_url, api_key = openai_api_key).beta.chat.completions.parse(
            temperature = 0,
            model = ollama_model,
            messages = [
                {
                    "role": "system",
                    "content": "You are a helpful assistant. Always return as JSON."
                },
                {
                    "role": "user",
                    "content": user_content
                }
            ],
            response_format = input_format,
        )
        response = completion.choices[0].message
        if response.parsed:               # successful structured output parsing
            return response.parsed
        elif response.refusal:            # unsuccessful structured output parsing
            return response.refusal
    except Exception as e:
        print(f"Exception of type '{type(e)}' has occured:\n\t{e}")
        


Code execution is very similar to the previous section with only a few notable modifications.

The input_format arg is now simply the Conversation model itself, rather than its JSON schema.


with open("data/agent_customer_conversation.txt", "r") as f:
    conversation_transcript = f.read()


response = ollama_openai_response(
    user_content = make_conversation_prompt(conversation_transcript),
    input_format = Conversation,   # NOT Conversation.model_json_schema()
    ollama_model = "mistral"
)
        

Additionally, there is no need to validate the output with Conversation.model_validate_json(response.message.content), since the parse() helper already returns a validated Conversation instance.


for point in response.summary:
    print(f"- {point}")
        

- Customer expressed dissatisfaction with product received from the company due to size discrepancy and poor delivery experience.
- Agent apologized for inconvenience, explained the reason behind the size difference, offered a detailed description of product usage, and promised to investigate the shipping issue.
- Company offered a 50% discount on any future purchase and free shipping as compensation.
- Agent assured follow-up from the department of inquiries regarding product usage.
        

print(response.sentiment)
        

'negative'
        

response.resolution
        

True
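
Since ollama_openai_response() only talks to an OpenAI-compatible endpoint, pointing it at a different provider is just a matter of swapping the connection arguments. The base URL, API key, and model name below are placeholders for illustration only:


response = ollama_openai_response(
    user_content = make_conversation_prompt(conversation_transcript),
    input_format = Conversation,
    ollama_model = "some-hosted-model",                        # placeholder model name
    openai_base_url = "https://api.example-provider.com/v1",   # placeholder endpoint
    openai_api_key = "YOUR_API_KEY"                            # placeholder key
)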
        



Summary

Neglecting the importance of a well-structured approach when building software can lead to lost opportunities for collaboration, compatibility, and dependability with external systems.

By prioritizing structure from the outset, developers can create flexible, scalable, and reliable solutions that are ready to adapt to changing requirements and technological advancements in the future.

Thanks for reading!






Sample Call Center Transcript:

AGENT: Welcome, Mr. Phillips, how are you today?
CUSTOMER: Welcome, I’m fine.
AGENT: With you Jim from the complaints and suggestions department, how can I help you today, Mr. Phillips?
CUSTOMER: I have a big problem with the product I bought from your site this week
AGENT: Moments with me Mr. Phillips, I will review your request now
CUSTOMER: Ok
AGENT: Your request number 1234567 was on Wednesday correct Mr. Phillips?
CUSTOMER: Yup
AGENT: Ok, Mr. Phillips can you explain to me what is the problem exactly?
CUSTOMER: I didn’t like the product and I didn’t get it as the same picture on the company website, you are liars!
AGENT: I apologize very much to you Mr. Phillips but on our website, we put pictures of real products only, and the proof is the positive comments and evaluations from all our customers in Saudi Arabia
CUSTOMER: I don’t believe it; I believe my eyes who see the product
AGENT: Sorry, Mr. Phillips I’m not accusing you of lying at all, and I will solve this problem immediately, can you please explain to me in detail how the product is different from the picture?
CUSTOMER: In the picture, the size of the product is bigger than what I received
AGENT: Ok, Mr. Phillips if you opened our website, you would find the sizes written in detail in the product description, if you can look at it now while you are with me on the line?
CUSTOMER: Ok
AGENT: Take your time, Mr. Phillips
CUSTOMER: it is Written but the product in the picture is very big than the size that is written in the description?
AGENT: This is because of the angle of photography Mr. Phillips because we in the company must show all the details of the product to the customer, so the pictures are a little close to the product to show all the details to the customers, and we provide the product with a specific and clear description of materials and sizes and advantages and everything so that the customer knows the product that he will buy, are you still with me Mr. Phillips?
CUSTOMER: yes
AGENT: Are there any other differences than the size Mr. Phillips?
CUSTOMER: I don’t know how to use this product; I mean the way I used it and it didn’t work
AGENT: There is no problem Mr. Phillips I will send you an accurate description of the method of use and will follow up with you one of the customer service representatives in the department of inquiries to clarify all the details and information about the product
CUSTOMER: Pretty good!
AGENT: Do you have any other complaints, Mr. Phillips?
CUSTOMER: I wanted to complain to the shipping representative who got me this product
AGENT: Moments with me, Mr. Phillips now I will determine who is responsible for your shipping operation
CUSTOMER: OK I am waiting for you
AGENT: I apologize for being late, moments Mr. Phillips
CUSTOMER: Ok
AGENT: are you still with me Mr. Phillips ?
CUSTOMER: Go ahead I am with you
AGENT: Sorry again for being late Mr. Phillips, now we will review with you some data about the shipment, ok?
CUSTOMER: Ok
AGENT: If you can check the delivery information on the bill!
CUSTOMER: One minute
AGENT: Ok
CUSTOMER: ok I am with you now
AGENT: All right, the shipment was 6:00 p.m. on Wednesday, is it right?
CUSTOMER: The product was supposed to be delivered to me at 6 p.m., but I was surprised that the agent was two hours late!!!
AGENT: Ok Mr. Phillips, can you tell me the number of the representative on the bill?
CUSTOMER: Moments… Agent number is (95520)
AGENT: OK Mr. Phillips, can you tell me now about the whole problem, go-ahead
CUSTOMER: The representative was two hours behind schedule, and when I received the shipment was untidy and once humiliated, and unacceptably spoke to me!
AGENT: I apologize to you very much Mr. Phillips, and the necessary work will be done, and strict measures will be taken with the representative immediately, and an apology from our company to you we will offer you a 50% discount on any product you want from our site and shipping the product is free
CUSTOMER: nice thank you Jim
AGENT: I’m always at your service, Mr. Phillips, do you have any complaints or other inquiries?
CUSTOMER: no
AGENT: Ok, Mr. Phillips a customer service representative will contact you from the department of inquiries about the method of using the product as soon as possible, I wish you a happy day
CUSTOMER: thanks a lot
AGENT: I am pleased to serve you, thank you for contacting the customer service of the company Doug's Fishing Imporium Limited