I built an Excel Add-In that lets my girlfriend quickly filter 7000 paper titles and abstracts for a review paper she is writing [1]. It uses Gemma 2 2b, a wonderful little model that can run on her laptop's CPU. It works surprisingly well for this kind of binary classification task.
The nice thing is that she can copy/paste the titles and abstracts into two columns and write e.g. =PROMPT(A1:B1, "If the paper studies diabetic neuropathy and stroke, return 'Include', otherwise return 'Exclude'"), and then drag the formula down across 7000 rows to bulk process the data on her own, because it's just Excel. There is a gif in the readme of the GitHub repo that shows it.
I don't know. This paper [1] reports accuracies in the 97-98% range on a similar task with more powerful models. With Gemma 2 2b the accuracy will certainly be lower.
Y'all definitely need to cross-validate a small number of samples by hand. When I did this kind of research, I would hand-validate to at least p < .01.
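If you want a concrete recipe: hand-label a random sample, count how often the model agrees, and compute an exact binomial p-value against a null accuracy you'd consider unacceptably low. A minimal sketch in TypeScript; the sample size of 100 and the 90% null are just assumptions for illustration:

    // Exact binomial tail: P(X >= k) for X ~ Binomial(n, p).
    // Terms are built iteratively to avoid huge factorials.
    function binomialTail(n: number, k: number, p: number): number {
      let term = Math.pow(1 - p, n); // P(X = 0)
      let cdf = 0;
      for (let i = 0; i < k; i++) {
        cdf += term;
        term *= ((n - i) / (i + 1)) * (p / (1 - p)); // P(X = i+1) from P(X = i)
      }
      return 1 - cdf;
    }

    // If the model matches the human label on 97 of 100 hand-checked
    // papers, the chance of doing that well with a true accuracy of
    // only 90% is about 0.008 -- under the p < .01 bar above.
    console.log(binomialTail(100, 97, 0.9));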
She and one other researcher have manually classified all 7000 papers, as per standard protocol. Perhaps for the next article they will measure how well this tool's classifications agree with their manual ones, and include it in the protocol if the agreement is good enough.
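If they do, raw percent agreement can look inflated when one class dominates (most papers get excluded), so something like Cohen's kappa, which corrects for chance agreement, would be a natural metric. A minimal sketch using the binary Include/Exclude labels from the formula above; none of this is in the repo:

    // Cohen's kappa for two binary raters, e.g. the model's labels
    // vs. the manual classification.
    function cohensKappa(rater1: string[], rater2: string[]): number {
      const n = rater1.length;
      let agree = 0, inc1 = 0, inc2 = 0;
      for (let i = 0; i < n; i++) {
        if (rater1[i] === rater2[i]) agree++;
        if (rater1[i] === "Include") inc1++;
        if (rater2[i] === "Include") inc2++;
      }
      const po = agree / n; // observed agreement
      // Expected chance agreement, from each rater's marginal rates.
      const pe = (inc1 / n) * (inc2 / n) + (1 - inc1 / n) * (1 - inc2 / n);
      return (po - pe) / (1 - pe);
    }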
Great attitude! I recently built a tool for my wife that uses an LLM to automate a task. Is it production ready? Definitely not. But it saves her time even in its current state.
I am not going to claim or report any kind of accuracy, especially with such a small model and such a specific, context-dependent use case. It is the user's responsibility to verify that it's accurate enough for their use case, and to upgrade the model or use another approach if not.
A user buys a car because it gets them from point A to point B. I get what you’re saying though - we are earlier along the adoption curve for these models and more responsibility sits with the user. Over time the expectations will no doubt increase.
As far as I know, Google Sheets scripts run on Google's servers and are not limited by local computing power, so larger models could be used there.
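For example, an Apps Script custom function could forward the cell values to any hosted model behind an HTTP API. A minimal sketch; the endpoint URL and response shape are made up, and it's written in TypeScript, which clasp can transpile for Apps Script:

    // Hypothetical =PROMPT() custom function for Google Sheets.
    // Sheets passes a range like A1:B1 in as a 2D array of values.
    function PROMPT(cells: string[][], instruction: string): string {
      const text = cells.map(row => row.join("\n")).join("\n");
      const response = UrlFetchApp.fetch("https://example.com/v1/generate", {
        method: "post",
        contentType: "application/json",
        payload: JSON.stringify({ prompt: instruction + "\n\n" + text }),
      });
      // Assumes the endpoint returns JSON like { "text": "Include" }.
      return JSON.parse(response.getContentText()).text;
    }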
Tried it out, very cool! Fun to see it chugging through a bunch of rows. I had a weird issue where it would recompute values endlessly when I used it in a table, but it worked fine in another table, so I'm not sure what that was about.
Glad you tried it out! Excel triggers recalculation when a referenced cell updates, just like with any other formula. This is also why responses are not streamed: every incremental update would trigger another recalculation. But if the async behavior of the responses messes with the recalculation logic, I am very interested in looking into it, and you are most welcome to open an issue in the repo with steps to reproduce.
Looks like I'm out...
Would be great if there were a Google Apps Script alternative. My company gave all devs Linux systems and the business team operates on Windows, so I always use browser-based tech like Apps Script for complex sheet manipulation.
Excel add-ins can be written with the Office JS API so that they run on the web as well as on desktop for Windows and Mac. But I don't think OP's add-in is possible with that API unless the local model can run in JS.
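To illustrate, an Office JS custom function is just an async function, so such an add-in could at least proxy to a model served over HTTP instead of running it in JS. A minimal sketch, assuming something like a llama.cpp server on localhost; the URL and response shape are assumptions, not from OP's repo:

    /**
     * Hypothetical =PROMPT() custom function for Excel on the web.
     * @customfunction
     * @param cells Range of input cells, passed in as a 2D array.
     * @param instruction The classification instruction.
     */
    async function PROMPT(cells: string[][], instruction: string): Promise<string> {
      const prompt = instruction + "\n\n" + cells.map(r => r.join("\n")).join("\n");
      const res = await fetch("http://localhost:8080/completion", {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({ prompt }),
      });
      const json = await res.json();
      return json.content; // llama.cpp's /completion returns { content: ... }
    }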
[1] https://github.com/getcellm/cellm