I'm more interested in the technical side of this, but I'm not seeing any link to a GitHub repo with the source code of this project.
Anyway, I have a tangential question, and since this is the first time I'm seeing langchain, it may be a stupid one. The point is that the vendor APIs seem to be far less uniform than what I'd expect from a framework like this. I'm wondering: why can't this be done with Ollama?[0] Isn't it ultimately just a system prompt, user input and a few additional params like temperature that all these APIs require as input? I'm a bit lost in this chain of wrappers around other wrappers, especially when we are talking about services that host many models themselves (like together.xyz), and I don't even fully get the role langchain plays here. I mean, in the end, all any of these models does is repeatedly guess the next token, isn't it? So there may be a difference at the very low level, and there may be some difference at a high level (considering the different ways these models have been trained? I have no idea), but at some "mid-level" isn't all of this ultimately just the same thing? Why are these wrappers so diverse and so complicated then?
Is there some more novice-friendly tutorial explaining these concepts?
[0] https://python.langchain.com/v0.2/docs/integrations/chat/
You really don't have to use langchain. I usually don't, except for a few occasions when I used one of its document-parsing submodules.
The APIs between different providers are actually pretty similar, largely close to the OpenAI API.
The reason to use a paid service is because the models are superior to the open source ones and definitely a lot better than what you can run locally.
It depends on the task, though. For this task, I think a really good small model like phi-3 could handle 90-95% of the entries well through ollama. It's just that the extra 5-10% of screw-ups are usually not worth the privilege of using your own hardware or infrastructure.
For this particular task, I would definitely skip langchain (but I always skip it). You could use any of the top-performing open or closed models: ollama locally, together.ai, or one of the various closed-model APIs.
It should be much less than 50 lines of code. Definitely under 100.
You can just use string interpolation for the prompts and request JSON output with the API calls. You don't need to get a PhD in langchain for that.
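As a rough sketch with the official openai Python client (the model name, field names and example text here are all placeholders I made up; OpenAI-compatible hosts like together.ai take the same request shape with a different base_url):

    import json
    from openai import OpenAI  # pip install openai

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    entry = "Some raw text you want structured data out of."  # placeholder input

    # Plain string interpolation, no prompt-templating framework needed.
    prompt = (
        "Extract the following fields from the text and answer with JSON only, "
        'in the form {"topic": "...", "is_question": true/false}.\n\n'
        f"Text: {entry}"
    )

    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder; any chat model works
        temperature=0,
        response_format={"type": "json_object"},  # ask the API for JSON output
        messages=[{"role": "user", "content": prompt}],
    )

    print(json.loads(resp.choices[0].message.content))

Note that the JSON output mode only guarantees syntactically valid JSON; the field names and types still come from whatever you describe in the prompt.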
Well, I mean, it appears langchain just technically doesn't support structured responses for Ollama (according to the link above). But, as I've said, I have absolutely no idea what all this middle-layer stuff actually does, or why different vendors end up with different integration capabilities in this regard.
I'm totally new (and maybe somewhat late) to the ___domain. Literally just now I tried to automate a fairly simple task (extracting/guessing book author + title in a nice uniform format from a badly abbreviated/transliterated and incomplete filename) using the plain ollama HTTP API (with llama3 as the model), but didn't have much success with that: it tries to chat with me in its responses instead of strictly following my instructions. I think my prompts must be the problem, and I hoped to try langchain, since it somehow seems to abstract the problem away, but then I saw that it isn't supported for the workflow the OP used. But since this is a field where I'm really totally new, I suppose I may also be making some more general mistake, like using a model that cannot be used for this task at all. How would I know, they all look the same to me…
The Ollama project itself is fairly stingy with explanations, and I doubt there are many people out there trying to automate an answer to the "Why is the sky blue?" question.
So I wonder: does anybody know a more digestible tutorial somewhere, explaining this stuff from the ground up?
1. Use temperature 0. Anything over that is asking for randomness, which is not useful unless you actually want it to say something random rather than follow instructions.
2. Use the best/largest model possible. Small models are generally stupid. phi-3 might be an exception as a very well-trained tiny model. Very large models are generally dramatically smarter and better at following directions.
3. Tell it to output JSON and give it examples of acceptable outputs.
4. The APIs for OpenAI and Anthropic are very, very similar to ollama's. The models are vastly better than llama3 8b. You can basically make some minor modifications, and if you have the temp right I bet it will work.
Personally I think that langchain will just make it more complicated and has nothing to do with your problem, which is probably that you used a tiny rather dumb model with a higher than optimal temperature and didn't specify enough in your prompt. The biggest thing is the size and ability of the model. Most models that will run on your computer are MUCH MUCH stupider than ChatGPT (even 3.5).
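To make points 1-3 concrete, here's a rough sketch against the plain ollama HTTP API (untested here; it assumes a local Ollama server with llama3 pulled, and the filename is a made-up example). The same structure carries over to the OpenAI/Anthropic clients almost unchanged:

    import json
    import requests  # pip install requests

    filename = "dstvsky_crime_and_pnshmnt_part1.fb2"  # made-up example input

    system_prompt = (
        "You extract book metadata from file names. "
        "Reply with a single JSON object of the form "
        '{"author": "...", "title": "..."} and nothing else. '
        "No greetings, no explanations. Example of acceptable output: "
        '{"author": "Leo Tolstoy", "title": "War and Peace"}'
    )

    resp = requests.post(
        "http://localhost:11434/api/chat",
        json={
            "model": "llama3",
            "stream": False,
            "format": "json",               # constrain output to valid JSON
            "options": {"temperature": 0},  # point 1: no deliberate randomness
            "messages": [
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": f"Filename: {filename}"},
            ],
        },
        timeout=120,
    )
    resp.raise_for_status()

    print(json.loads(resp.json()["message"]["content"]))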
Temperature 0 will not prevent randomness, only reduce it.
In addition, there may be times when temperature > 0 is essential for reproducing text accurately. Consider a model with a knowledge cutoff 3-6 months out of date that is trying to write, e.g., a model name which did not exist when it was trained. In that case temperature 0 will make it more likely to "fix" your code by replacing the model name it's never heard of with one that is more likely according to the model's training data.
In other words, if the text you want was not in the model training data, a higher than normal temperature may be required, depending on how frequently the term appears in the input data. If you provide a few samples in the input, then you may be able to use 0 again.
Right, temperature only controls the distribution of tokens, not answers - for many use cases, the “same” answer can be represented with many different sequences of tokens. If you consider the entire space of possible input texts, at temperature=0 some model outputs are going to be “wrong” because the single most likely token did not belong to the set of tokens corresponding to the most likely answer (of course it’s also possible that the model didn’t “know” the answer, so temp>0 only helps in some cases). Temperature > 0 increases the likelihood of a correct answer being given in those cases.
The problem with generating structured output like JSON is that temperature > 0 also increases the likelihood of a token belonging to the set of “wrong” answers being chosen. With prose that’s not the end of the world because subsequent tokens can change the meaning. But with JSON or code, the wrong token in the wrong place can make the output invalid: it’s no longer parseable json or compilable code. In the blog they were also generating bools in one spot, and temp > 0 would probably result in the “wrong” answer being chosen sometimes.
For that reason I’d suggest generating JSON fields independently and then creating the full JSON object from those outputs the old-fashioned way. That way different fields can use different temperature settings. You’d probably want temperature=0 for generating bools/enums/very short answers like “New York”, and temperature > 0 for prose text like summaries or descriptions.
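Roughly like this, as a sketch: the complete() helper below is a made-up wrapper around a local Ollama /api/generate call (any provider's completion endpoint would do), and the field names and prompts are just illustrative.

    import json
    import requests

    def complete(prompt: str, temperature: float) -> str:
        """Hypothetical helper: one non-streaming completion from a local Ollama server."""
        r = requests.post(
            "http://localhost:11434/api/generate",
            json={
                "model": "llama3",
                "prompt": prompt,
                "stream": False,
                "options": {"temperature": temperature},
            },
            timeout=120,
        )
        r.raise_for_status()
        return r.json()["response"].strip()

    article = "..."  # source text, elided here

    # Per-field prompts and temperatures: 0 for short factual fields,
    # something higher for free-form prose.
    fields = {
        "city":       ("Name the city the article is about. Answer with the name only.", 0.0),
        "is_opinion": ("Is this an opinion piece? Answer strictly 'true' or 'false'.",    0.0),
        "summary":    ("Summarize the article in two sentences.",                         0.7),
    }

    result = {}
    for name, (question, temp) in fields.items():
        result[name] = complete(f"{question}\n\nArticle:\n{article}", temperature=temp)

    # Assemble the JSON object the old-fashioned way, so the structure is
    # guaranteed valid regardless of what the model returned for each field.
    print(json.dumps(result, indent=2))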
Thanks. Indeed, it doesn't directly answer my question. For one, the author seems to fail the "Chesterton's fence" test here: the post doesn't even try to explain what langchain is supposed to be good at but ends up being bad at. It just plainly says it's bad, and that's all.
And, as stated, I also don't know the answer to that question, so this is kind of one of the primary concerns here. I mean, one possible answer seems pretty obvious to me: it would be better to keep your app vendor-agnostic (to be able to switch from OpenAI to Anthropic with one ENV var) if at all possible. But none of the articles and doc pages I've read in the past few hours try to answer to what extent this is possible and why, or whether it is even supposed to be a real selling point of langchain. TBH, I still have no idea what the main selling point even is.
Honestly, langchain solves no problems besides being an advertisement for langchain itself
It gets picked by people with more of a top-down approach, maybe, who feel like adding abstraction layers (that don't actually abstract much of anything) is better. It isn't.
Yeah, langchain is not necessary for this. The author appears not to have shared his code yet (too bad, the visualizations are nice!), but as a poor replacement I can share mine from over a year ago:
It only uses the plain OpenAI API. This was on GPT-3.5, but it should be easy to move to 4o and make use of the JSON mode. I might try a quick update this weekend.