Structured Output Generation using Ollama
Generating Structured Output in Python using only the Ollama package
Introduction
Large Language Models (LLMs) are extremely versatile across a multitude of tasks, including Classification, Named Entity Extraction, and Summarization, just to name a few. In the majority of these tasks, and especially in Production environments, the goal is to create reproducible structured output in a defined format, which can then be passed to subsequent steps in a pipeline.
The idea of Structured Output in LLMs is nothing new. In fact, there are existing libraries such as Outlines that aim to do just that. While these libraries are highly useful and should certainly be considered, they bring their own packages along, which often adds a layer of maintenance to any application stack, especially when those packages pull in heavy dependencies such as the PyTorch and Transformers libraries.
In this article, Structured Output is generated for a variety of LLM tasks using only the Ollama framework. The goal is to minimize development overhead while taking advantage of the power of the established open-source LLM infrastructure.
Specifically, and for a fair comparison, I will be recreating a few examples from the Outlines Cookbook. This is by no means a criticism of their work; on the contrary, their work on Structured Output, and especially on Prompt Engineering, is extraordinary and was the main inspiration for this article. My goal here is simply to introduce a more streamlined approach.
Setup
The required setup is minimal. All you need is a running Ollama instance and the Ollama Python package. You can download the Ollama framework here, and the Ollama Python library can be installed using pip:
$ pip install ollama
Once Ollama is installed, an Ollama LLM needs to be downloaded for use. Here I will be using llama3.2, which can be pulled (downloaded) using this terminal command:
$ ollama pull llama3.2
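You can confirm that the model was downloaded successfully by listing the locally available models:
$ ollama list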
With that out of the way, all we need now is to load the Ollama Python package and define one helper function that we will use throughout this exercise.
from ollama import Client

def chat_response(prompt, USE_MODEL="llama3.2"):
    # Wrap the prompt in a single-turn chat message
    chat_messages = [
        {
            "role": "user",
            "content": prompt
        }
    ]
    # Send the request to the local Ollama server and return only the reply text
    response = Client(host="http://localhost:11434").chat(
        model=USE_MODEL,
        messages=chat_messages
    )
    return response["message"]["content"]
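As a quick sanity check, the helper can be called directly. The snippet below assumes the Ollama server is running locally and llama3.2 has already been pulled:
# Quick smoke test of the helper function
print(chat_response("Reply with a single word: ready"))
Note that the chat endpoint also accepts an options dictionary (e.g., options={"temperature": 0}) if you want more deterministic responses across runs; the helper above sticks to the defaults.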
That's it! The setup is complete. We can now move on to our first LLM task: Classification.
Classification
The following Classification task is a recreation of the existing Outlines Classification cookbook. The prompt I am using is essentially the same, since the goal is to recreate the target use case.
The prompt is wrapped in a function that takes in a single variable, request:
def customer_support_prompt(request):
    ret = f"""You are an experienced customer success manager.
Given a request from a client, you need to determine when the
request is urgent using the label "URGENT" or when it can wait
a little with the label "STANDARD".
Include only the label in your output.

# Examples
Request: "How are you?"
Label: STANDARD

Request: "I need this fixed immediately!"
Label: URGENT

# TASK
Request: { request }
Label: """
    return ret
Once the prompt is defined, it can be passed directly to the helper function (defined in setup), along with the same list of requests used in the original example:
requests = [
    "My hair is on fire! Please help me!!!",
    "Just wanted to say hi"
]

chat_list = [chat_response(customer_support_prompt(r)) for r in requests]
print(chat_list)
['URGENT', 'STANDARD']
As can be seen, the structured list of labels generated here is the same as in the original example.
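That said, LLMs can occasionally wrap a label in extra text, so in a production pipeline it is worth validating the output before passing it downstream. Here is a minimal sketch; the normalize_label helper is my own addition, not part of the original example:
def normalize_label(raw, allowed=("URGENT", "STANDARD")):
    # Strip whitespace and reject anything outside the known label set,
    # returning None so downstream steps can flag the response for review
    label = raw.strip().upper()
    return label if label in allowed else None

labels = [normalize_label(c) for c in chat_list]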
Named Entity Extraction
The next task recreated here is Named Entity Extraction, inspired by this Cookbook. Again, the prompt I am using is essentially the same, but it is the only part carried over from their example.
This task is highly useful: it extracts fields and values from free text and returns a structured JSON object, which can then be passed to downstream steps in a production pipeline.
As before, the prompt is wrapped in a function that takes in a single variable, this time the order:
def take_order_prompt(order):
    ret = f"""You are the owner of a pizza parlor. Customers send you orders from which you need to extract:

1. The pizza that is ordered
2. The number of pizzas

Include only the JSON object in your output.

# EXAMPLE
ORDER: I would like one Margherita pizza
RESULT: {{"pizza": "Margherita", "number": 1}}

# TASK
ORDER: { order }
RESULT: """
    return ret
Once the prompt is defined, it is passed directly to our chat_response helper function:
orders = [
    "Hi! I would like to order two pepperonni pizzas and would like them in 30mins.",
    "Is it possible to get 12 margheritas?"
]

chat_list = [chat_response(take_order_prompt(o)) for o in orders]
print(chat_list)
['{"pizza": "Pepperonni", "number": 2}',
'{"pizza": "Margherita", "number": 12}']
Once again, the same structured list of JSON objects is returned, with the same values extracted as in the original example.
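Since chat_response returns plain strings, the JSON objects still need to be parsed before downstream use. A minimal sketch (the safe_parse helper is my own addition) guards against the occasional malformed response; json.loads raises a ValueError when the model's output is not valid JSON:
import json

def safe_parse(raw):
    # Returns a dictionary on success, or None if the model's
    # output is not valid JSON
    try:
        return json.loads(raw)
    except ValueError:
        return None

parsed_orders = [safe_parse(c) for c in chat_list]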
Chain Of Density Summaries
The last task that I will recreate is Chain Of Density Summaries, found in this Cookbook. I highly recommend going over the cookbook to understand the functional aspects of how chain-of-density summaries work. As with the previous tasks, the prompt I am using is a very slightly modified version of the original, with no other code carried over.
The prompt itself is wrapped in a function that takes in a single variable, this time the article being summarized. For this example, I also use the same sample article for a direct comparison:
def cod_prompt(article):
    ret = f"""You will generate increasingly concise, entity-dense summaries of the given ARTICLE below.

Repeat the following 2 steps 5 times.
Step 1. Identify 1-3 informative Entities ("; " delimited) from the Article which are missing from the previously generated summary.
Step 2. Write a new, denser summary of identical length which covers every entity and detail from the previous summary plus the Missing Entities.

A Missing Entity is:
- Relevant: to the main story.
- Specific: descriptive yet concise (5 words or fewer).
- Novel: not in the previous summary.
- Faithful: present in the Article.
- Anywhere: located anywhere in the Article.

Guidelines:
- The first summary should be long (4-5 sentences, ~80 words) yet highly non-specific, containing little information beyond the entities marked as missing. Use overly verbose language and fillers (e.g., "this article discusses") to reach ~80 words.
- Make every word count: rewrite the previous summary to improve flow and make space for additional entities.
- Make space with fusion, compression, and removal of uninformative phrases like "the article discusses".
- The summaries should become highly dense and concise yet self-contained, e.g., easily understood without the Article.
- Missing entities can appear anywhere in the new summary.
- Never drop entities from the previous summary. If space cannot be made, add fewer new entities.

Remember, use the exact same number of words for each summary.
Answer in JSON. The JSON should be a dictionary with key "summaries" that contains a list (length 5) of dictionaries whose keys are "missing_entities" and "denser_summary".
Include only the JSON dictionary in your output, with the following format:

{{'summaries': [
    {{
        'missing_entities': '',
        'denser_summary': ''
    }},
    {{
        'missing_entities': '',
        'denser_summary': ''
    }},
    {{
        'missing_entities': '',
        'denser_summary': ''
    }},
    {{
        'missing_entities': '',
        'denser_summary': ''
    }},
    {{
        'missing_entities': '',
        'denser_summary': ''
    }}
]}}

# TASK:
ARTICLE: { article }
"""
    return ret
The sample article is defined as a plain string:
sample_article = """
Alan Mathison Turing OBE FRS (/ˈtjʊərɪŋ/; 23 June 1912 – 7 June 1954) was an English mathematician, computer scientist, logician, cryptanalyst, philosopher and theoretical biologist.[5] Turing was highly influential in the development of theoretical computer science, providing a formalisation of the concepts of algorithm and computation with the Turing machine, which can be considered a model of a general-purpose computer.[6][7][8] He is widely considered to be the father of theoretical computer science and artificial intelligence.[9]
Born in Maida Vale, London, Turing was raised in southern England. He graduated at King's College, Cambridge, with a degree in mathematics. Whilst he was a fellow at Cambridge, he published a proof demonstrating that some purely mathematical yes–no questions can never be answered by computation. He defined a Turing machine and proved that the halting problem for Turing machines is undecidable. In 1938, he obtained his PhD from the Department of Mathematics at Princeton University. During the Second World War, Turing worked for the Government Code and Cypher School at Bletchley Park, Britain's codebreaking centre that produced Ultra intelligence. For a time he led Hut 8, the section that was responsible for German naval cryptanalysis. Here, he devised a number of techniques for speeding the breaking of German ciphers, including improvements to the pre-war Polish bomba method, an electromechanical machine that could find settings for the Enigma machine. Turing played a crucial role in cracking intercepted coded messages that enabled the Allies to defeat the Axis powers in many crucial engagements, including the Battle of the Atlantic.[10][11]
After the war, Turing worked at the National Physical Laboratory, where he designed the Automatic Computing Engine, one of the first designs for a stored-program computer. In 1948, Turing joined Max Newman's Computing Machine Laboratory at the Victoria University of Manchester, where he helped develop the Manchester computers[12] and became interested in mathematical biology. He wrote a paper on the chemical basis of morphogenesis[1] and predicted oscillating chemical reactions such as the Belousov–Zhabotinsky reaction, first observed in the 1960s. Despite these accomplishments, Turing was never fully recognised in Britain during his lifetime because much of his work was covered by the Official Secrets Act.[13]
"""
Running the code generates the structured JSON response:
CHAT = chat_response(cod_prompt(sample_article))
print(CHAT)
{
    "summaries": [
        {
            "missing_entities": "Bletchley Park, King's College Cambridge, Princeton University, Government Code and Cypher School",
            "denser_summary": "This article discusses Alan Turing OBE FRS. Born in Maida Vale London; raised southern England; graduated King's College Cambridge mathematics degree. Published proof mathematical yes–no questions undecidable at Cambridge proved halting problem Turing machines undecidable. Led Hut 8 Bletchley Park Britain codebreaking centre produced Ultra intelligence German naval cryptanalysis. Devised techniques speeding breaking ciphers improving pre-war Polish bomba method electromechanical machine Enigma machine intercepted coded messages enabling Allies defeat Axis powers Battle of Atlantic."
        },
        {
            "missing_entities": "Max Newman, Victoria University of Manchester, National Physical Laboratory",
            "denser_summary": "Alan Turing OBE FRS influential theoretical computer science algorithm computation. Developed formalisation concepts algorithm and computation with Turing machine model general-purpose computer father artificial intelligence. Designed Automatic Computing Engine stored-program computer one first designs computer joined Max Newman's Computing Machine Laboratory Victoria University Manchester helped develop Manchester computers mathematical biology predicted oscillating chemical reactions Belousov–Zhabotinsky reaction."
        },
        {
            "missing_entities": "Department of Mathematics, Ultra intelligence, Battle of the Atlantic",
            "denser_summary": "Turing's work at National Physical Laboratory Automatic Computing Engine designs stored-program computer. Joined Max Newman's Computing Machine Laboratory Victoria University Manchester mathematical biology paper chemical basis morphogenesis predicted oscillating reactions Belousov–Zhabotinsky reaction. Worked Government Code and Cypher School Bletchley Park Britain's codebreaking centre."
        },
        {
            "missing_entities": "Maida Vale, King's College Cambridge",
            "denser_summary": "Turing's influence theoretical computer science algorithm computation. Developed formalisation concepts algorithm and computation with Turing machine model general-purpose computer father artificial intelligence. Designed Automatic Computing Engine stored-program computer one first designs."
        },
        {
            "missing_entities": "Polish bomba method, Enigma machine",
            "denser_summary": "Turing's work at Bletchley Park Britain's codebreaking centre Ultra intelligence. Devised techniques speeding breaking ciphers improving Polish bomba method Enigma machine intercepted coded messages enabling Allies defeat Axis powers Battle of Atlantic."
        }
    ]
}
As expected, the same structured JSON object as in the original example is returned.
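As with the previous task, the response arrives as a string, so the same safe_parse helper from the Named Entity Extraction section can turn it into a dictionary. For instance, the final (densest) summary can then be pulled out directly:
parsed = safe_parse(CHAT)
if parsed is not None:
    # The last entry in the list is the densest summary
    print(parsed["summaries"][-1]["denser_summary"])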
Summary
In this article, we used the Ollama framework and prompt engineering to create structured output for a variety of LLM tasks. The ease of implementation and the quality of the generated output show what intelligent prompt engineering can achieve with the highly capable Ollama framework alone, without any additional dependencies.
Thanks for reading!