As other people have pointed out here, you can also add "verbosity sinks" as text fields in structured output. Recently I've also been experimenting with tool calls to support guided self-talk in a way that doesn't necessarily all accumulate in the context (e.g. if not all of the tool parameters get echoed back).
Here is a short example that came up for me last week.
I had a set of documents I wanted to classify according to a taxonomy that is well known (so it exists in the training data of all the major LLM models I tested).
If I have a prompt like `You are an expert classification system. Using the Classification Approach Foo, consider the following and output the category in JSON format, such as {"class":"bar"}`
This works ok, but it works much better if I tell it to output `{"class":"bar", "reason": "baz"}`, and it improves further with some other additions like a "related_class" or "parent_category" field, which would otherwise be redundant.
Including some few-shot examples also helped, but the biggest benefit came from the "reason" field. Trying "justification" or other synonyms seems to produce the same output.
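If anyone wants to try this, here's a minimal sketch of the pattern, assuming an OpenAI-style chat completions client; the model name, taxonomy, and field values are placeholders:

```python
# Minimal sketch of the "verbosity sink" pattern: ask for throwaway "reason"
# and "parent_category" fields alongside "class". Assumes the OpenAI Python
# client; model name and taxonomy are placeholders.
import json
from openai import OpenAI

client = OpenAI()

SYSTEM_PROMPT = (
    "You are an expert classification system. Using the Classification "
    "Approach Foo, consider the following document and output the category "
    'in JSON format, such as {"class": "bar", "reason": "baz", '
    '"parent_category": "qux"}'
)

def classify(document: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": document},
        ],
        response_format={"type": "json_object"},
    )
    result = json.loads(response.choices[0].message.content)
    # The extra fields are discarded after parsing; they only exist to give
    # the model somewhere to put its reasoning before committing to a label.
    return result["class"]
```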
Have you tested moving the "reason" field before the "class" field? That may encourage better CoT, instead of having the model justify the class after it has already picked it. Anecdotally, I saw a 5% boost in performance in a NER system from having the model output the entity's class at the end rather than the beginning.
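For illustration, the change is just reordering the example schema in the prompt (field names and values are placeholders):

```python
# Since generation is left-to-right, putting "reason" before "class" lets the
# model write its justification before it commits to a label.
LABEL_FIRST = 'Respond in JSON, e.g. {"class": "bar", "reason": "baz"}'
REASON_FIRST = 'Respond in JSON, e.g. {"reason": "baz", "class": "bar"}'
```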
Speaking only for myself, these ideas are a combination of things I've seen while scanning new papers and from informal discussions with other people working in the area. Feel free to shoot me an e-mail though; maybe I can point you somewhere more specific.
Edit: The "verbosity sink" name is inspired by the idea from the paper below, although they're not actually the same thing at all.