TIL: Sum Types With `instructor_ex`
The Instructor Elixir library lets you retrieve structured output from LLMs like the OpenAI GPT models.
But I found that having it return structs that are sum types is not that straightforward.
Simple Instructor responses
For this post, let’s look at survey questions as an example.
You can define an Ecto schema like:
    defmodule SurveySoft.Question do
      use Ecto.Schema

      embedded_schema do
        field :question, :string
        field :options, {:array, :string}
      end

      def generate_question(topic) do
        Instructor.chat_completion(
          model: "gpt-4-turbo-preview",
          response_model: __MODULE__,
          max_retries: 3,
          messages: [
            %{
              role: "user",
              content: """
              Generate a survey question to gather opinions on the topic of #{topic}.
              """
            }
          ]
        )
      end
    end
And Instructor will ensure that the response you get is a struct with the `question` and `options` fields.
    iex(1)> SurveySoft.Question.generate_question("climate change")
    {:ok,
     %SurveySoft.Question{
       question: "How strongly do you agree with the statement: 'Immediate action is necessary to address climate change.'?",
       options: ["Strongly agree", "Agree", "Neutral", "Disagree", "Strongly disagree"]
     }}
Okay, the question GPT-4 returned isn’t the most exciting, buuuut, you could take output like this and render it in a form without having to do any additional processing. That’s kind of exciting.
The way it does this is by generating a JSON schema based on the Ecto schema. You can also add validations so that if the LLM response is not shaped the way you expect, Instructor will complain about it and ask the LLM to generate another response.
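To make the validation idea concrete, here is a minimal sketch of a changeset check that rejects a question with fewer than two options. It uses only Ecto, so it runs as a standalone script. The module name `Sketch.Question` and the two-option rule are made up for illustration. As far as I know, Instructor picks up a function like this when the schema module does `use Instructor.Validator` and implements `validate_changeset/1`, but treat that as an assumption and check the docs for your version.

```elixir
# Standalone script: fetches Ecto from Hex (the version pin is an assumption).
Mix.install([{:ecto, "~> 3.11"}])

defmodule Sketch.Question do
  use Ecto.Schema
  import Ecto.Changeset

  embedded_schema do
    field :question, :string
    field :options, {:array, :string}
  end

  # Assumed hook name: Instructor is believed to call validate_changeset/1
  # when the module also uses Instructor.Validator (left out here so the
  # sketch only needs Ecto).
  def validate_changeset(changeset) do
    changeset
    |> validate_required([:question, :options])
    # A multiple choice question with a single option is not much of a choice.
    |> validate_length(:options, min: 2)
  end
end

# Simulate an LLM response that has only one option.
changeset =
  %Sketch.Question{}
  |> Ecto.Changeset.cast(
    %{"question" => "Is climate change real?", "options" => ["Yes"]},
    [:question, :options]
  )
  |> Sketch.Question.validate_changeset()

IO.inspect(changeset.valid?, label: "valid?")
IO.inspect(changeset.errors, label: "errors")
```

An invalid changeset like this one is what would trigger a retry, up to the `max_retries` you passed to `Instructor.chat_completion`.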
Sum types
Now, what if, in addition to multiple choice questions like the one above, you also want to support questions that participants respond to by sliding a scale:
    %SliderScaleQuestion{
      question: "On a scale of 1 to 10, how critical do you view climate change?",
      min: 1,
      max: 10,
      step: 1
    }
Our `Question` schema needs to support sum types, which Ecto doesn’t have first-class support for. However, Ecto does let you define your own types, which must map to types that Ecto does support (e.g. maps, strings, etc.). Our custom Ecto type will be stored as, say, a map but be represented in memory as the type we defined. We do this using the `Ecto.Type` behaviour.
First, we create Ecto schemas for the multiple choice and slider question types:
    defmodule SurveySoft.Question.MultipleChoice do
      use Ecto.Schema

      embedded_schema do
        field :question, :string
        field :options, {:array, :string}
      end
    end

    defmodule SurveySoft.Question.Slider do
      use Ecto.Schema

      embedded_schema do
        field :question, :string
        field :min, :integer
        field :max, :integer
        field :step, :integer
      end
    end
Then, we define the Ecto Type that represents the sum type:
    defmodule SurveySoft.Question.QuestionType do
      use Ecto.Type
      use Instructor.EctoType

      # Aliases so the short struct names below resolve.
      alias SurveySoft.Question.{MultipleChoice, Slider}

      defstruct [:kind, :value]

      @impl true
      def type, do: :map

      @impl true
      def cast(%{"kind" => "multiple_choice", "question" => question, "options" => options}) do
        {:ok,
         %__MODULE__{
           kind: :multiple_choice,
           value: %MultipleChoice{question: question, options: options}
         }}
      end

      def cast(%{
            "kind" => "slider",
            "question" => question,
            "min" => min,
            "max" => max,
            "step" => step
          }) do
        {:ok,
         %__MODULE__{
           kind: :slider,
           value: %Slider{question: question, min: min, max: max, step: step}
         }}
      end

      # Anything that matches neither shape is a cast error.
      def cast(_other), do: :error

      # I've omitted showing the other callbacks necessary for Ecto.Type here. See linked Github link for the full example.
    end
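For a rough idea of what the omitted callbacks can look like, here is a self-contained sketch of a sum type with `dump/1` and `load/1` filled in. This is my own guess at a reasonable shape, not the code from the linked example. The `Sketch.*` modules are hypothetical stand-ins that use plain structs instead of embedded schemas, and `use Instructor.EctoType` is left out so the script only depends on Ecto.

```elixir
# Standalone script: fetches Ecto from Hex (the version pin is an assumption).
Mix.install([{:ecto, "~> 3.11"}])

defmodule Sketch.MultipleChoice do
  defstruct [:question, :options]
end

defmodule Sketch.Slider do
  defstruct [:question, :min, :max, :step]
end

defmodule Sketch.QuestionType do
  use Ecto.Type

  alias Sketch.{MultipleChoice, Slider}

  defstruct [:kind, :value]

  @impl true
  def type, do: :map

  @impl true
  def cast(%{"kind" => "multiple_choice", "question" => q, "options" => opts}) do
    {:ok, %__MODULE__{kind: :multiple_choice, value: %MultipleChoice{question: q, options: opts}}}
  end

  def cast(%{"kind" => "slider", "question" => q, "min" => min, "max" => max, "step" => step}) do
    {:ok, %__MODULE__{kind: :slider, value: %Slider{question: q, min: min, max: max, step: step}}}
  end

  def cast(_other), do: :error

  # dump/1 flattens the in-memory struct back into a string-keyed map,
  # the same shape that cast/1 accepts.
  @impl true
  def dump(%__MODULE__{kind: kind, value: value}) do
    fields =
      value
      |> Map.from_struct()
      |> Map.new(fn {key, val} -> {Atom.to_string(key), val} end)

    {:ok, Map.put(fields, "kind", Atom.to_string(kind))}
  end

  def dump(_other), do: :error

  # load/1 receives the stored map, so it can reuse the cast/1 clauses.
  @impl true
  def load(map) when is_map(map), do: cast(map)
  def load(_other), do: :error
end

# Round trip: cast -> dump -> load.
{:ok, slider} =
  Sketch.QuestionType.cast(%{
    "kind" => "slider",
    "question" => "How concerned are you about climate change?",
    "min" => 1,
    "max" => 10,
    "step" => 1
  })

{:ok, dumped} = Sketch.QuestionType.dump(slider)
{:ok, loaded} = Sketch.QuestionType.load(dumped)
IO.inspect(loaded == slider, label: "round trip preserved")
```

Reusing `cast/1` inside `load/1` keeps the two in sync, at the cost of treating stored data as no more trusted than external input.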
Defining the JSON schema
However, Instructor cannot automatically generate a JSON schema for `Ecto.Type`s. We need to do that ourselves. Fortunately, it’s quite simple:
    def to_json_schema() do
      %{
        type: "object",
        properties: %{
          kind: %{
            type: "string",
            enum: ["multiple_choice", "slider"]
          },
          question: %{type: "string"},
          options: %{type: "array", items: %{type: "string"}},
          min: %{type: "integer"},
          max: %{type: "integer"},
          step: %{type: "integer"}
        }
      }
    end
JSON schema also lets you define dependencies between fields so that you can specify constraints such as: the response shouldn’t have `min`/`max` fields for a `multiple_choice` question.
This starts to become a bit complex, but you can ask for help from–who else?–ChatGPT to add these constraints.
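One way to express such constraints is with `oneOf`, where each branch pins `kind` to a single value and lists exactly the fields that kind allows. The sketch below is an illustration of that approach, not the schema from the full example. `Sketch.QuestionSchema` is a hypothetical module name, and whether a given model or provider honours `oneOf` and `additionalProperties` is something you would need to verify yourself.

```elixir
defmodule Sketch.QuestionSchema do
  # Hypothetical helper; in the real type this would be the body of
  # to_json_schema/0 in the Ecto.Type module.
  def to_json_schema do
    %{
      type: "object",
      oneOf: [
        # Branch 1: multiple choice questions carry options, nothing else.
        %{
          properties: %{
            kind: %{const: "multiple_choice"},
            question: %{type: "string"},
            options: %{type: "array", items: %{type: "string"}}
          },
          required: ["kind", "question", "options"],
          additionalProperties: false
        },
        # Branch 2: slider questions carry min/max/step, nothing else.
        %{
          properties: %{
            kind: %{const: "slider"},
            question: %{type: "string"},
            min: %{type: "integer"},
            max: %{type: "integer"},
            step: %{type: "integer"}
          },
          required: ["kind", "question", "min", "max", "step"],
          additionalProperties: false
        }
      ]
    }
  end
end

schema = Sketch.QuestionSchema.to_json_schema()

# Show which fields each branch requires.
schema[:oneOf]
|> Enum.map(fn branch -> {branch[:properties][:kind][:const], branch[:required]} end)
|> IO.inspect(label: "required fields per kind")
```

Because `additionalProperties` is `false` in each branch, a `multiple_choice` object that also carries `min` or `max` fails both branches and is rejected by a conforming validator.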
Anyway, after you do that, the LLM should be able to respond with both kinds of survey questions:
    {:ok,
     %SurveySoft.Question{
       question: %SurveySoft.Question.QuestionType{
         kind: :multiple_choice,
         value: %SurveySoft.Question.MultipleChoice{
           id: nil,
           question: "How much do you agree with the statement: 'Immediate action is necessary to combat climate change?'",
           options: ["Strongly agree", "Agree", "Neutral", "Disagree", "Strongly disagree"]
         }
       },
       required: true
     }}

    iex(2)> SurveySoft.Question.generate_question("climate change(slider question)")
    {:ok,
     %SurveySoft.Question{
       question: %SurveySoft.Question.QuestionType{
         kind: :slider,
         value: %SurveySoft.Question.Slider{
           id: nil,
           question: "How concerned are you about climate change?",
           min: 1,
           max: 10,
           step: 1
         }
       },
       required: true
     }}