> Can ChatGPT materially and positively impact the code written by big companies? Can it do meaningful work in excel? Can it do meaningful PowerPoint work? Can it give effective advice on management?
> Right now we don’t know the answer to those questions.
I know the answer to the first three. Yes, yes, and yes. I've done them all, including all of them in the past few weeks.
(Which is how I learned that it's much better to ask ChatGPT to use Python evaluation mode and Pandoc and make you a PPTX, than trying to do anything with "Office 365 Copilot" in PowerPoint...)
As for the fourth question - well, ChatGPT can give you better advice than most management/leadership articles, so I presume the answer here is "Yes" too - but I haven't verified it in practice.
> current systems aren’t creating real enterprise value at this moment in time
Yes, they are. They would be creating even more value if not for the copyright and export-control uncertainty, which significantly slows enterprise adoption.
> We aren't seeing higher overall levels of productivity.
You can't measure productivity for shit, otherwise companies would look entirely different. Starting with me not having to do my own finances, event planning, or a hundred other things that are not in my job description, not my specialty, and which were done by dedicated staff just a few decades ago, before tech "improved office productivity".
> We aren't seeing the developers who start using copilot/gpt rush ahead of their peers.
That's because individual productivity is usually constrained by team productivity. Devs rushing ahead of their teammates makes the team dysfunctional.
> We aren't seeing any ability to cut back on developer spend.
Devs aren't stupid. They're not going to give you an opportunity if they can avoid it.
> We aren't seeing anything positive yet and many developers have been using copilot/gpt for >1 year.
My belief is that's because you aren't measuring the right things. But then, no one is. This is a problem well-known to be unsolved.
Perhaps we have added more meetings because developers have more free time.
Or perhaps developers were never the bottleneck.
We can see large productivity improvements when we make simple changes like having product managers join the developers' daily standup meetings. We can even measure productivity improvements from Slack's/Zoom's auto-summary features. Yet gpt/copilot doesn't even register.
> We can even measure productivity improvements from Slack's/Zoom's auto-summary features.
While not code generation, this auto-summary is powered by the same tech. I think using it to sift through and surface relevant information, as opposed to generating new things, will have the biggest impact.
By far the greatest value I get out of LLMs is asking them to help me understand code written by others. I feel like this is an under-appreciated use. How long has this feature been in Copilot? Since February or so? Are people using it? I do not use Copilot.
I use ChatGPT, Copilot, etc. to reduce my cognitive load and get a lot of things done quicker, so I also have more time to fuck around. You're out of your goddamn mind if you think I'm going to increase my output for the mere chance that maybe I'll get an above-inflation raise in a year. "We gave our devs a magic 10% productivity boost machine, but their output hasn't increased? I guess the machine doesn't work..." It's amusing how out of touch you are.
There is an ethical question in here that I don’t have an answer for. As an employee, I find a way to do my job more efficiently. Do I hand those efficiencies to my employer so I can get a pat on the head, or do I keep them to myself to make my own life less stressful? If I give them to the boss, do they even have the ability to increase my pay? Using the extra time to slack off rather than enriching the employer might be the best choice.
Passing on personal productivity gains to management is always a HUGE L for the individual worker.
As a dev, you can use the saved time to slow down and not be stressed, spend more time chatting with colleagues, learn new skills, maybe improve the quality of the code, etc. Or you can pass it on to management which will result in your workload being increased back to where you are stressed again and your slower colleagues will be let go, so now you get to feel bad about that and they won't be around to chat with.
I have never in my life seen workers actually get rewarded with pay raises for improved productivity, that is just a myth the foolish chase, like the pot of gold at the end of the rainbow.
I have also tried being the top performer on a team before (using automation tools to achieve it), and all I got was praise from management. That's nice, but I can't pay for my holidays with praise, so not worth it.
Writing code is just one part of the process. Other bottlenecks might prevent you from seeing overall productivity improvements.
For example:
- time between PRs being created and being picked up for review and merged
- time spent on releasing at end of sprint cycles
- time spent waiting for QA to review and approve
- extreme scrum practices like "you can only work on things in the sprint, even if all work is done"
How are you measuring developer productivity? Were those that adopted copilot and chatgpt now enabled to finally keep up with their faster peers (as opposed to outstrip them)? Is developer satisfaction improved, and therefore retention?
Yes, other bottlenecks might be preventing us from seeing overall productivity improvements. We might require large organisational changes across the industry in order to take advantage of the improvements.
I guess we will see if smaller startups without many of our bottlenecks are suddenly able to be much more competitive.
> How are you measuring developer productivity?
We use a host of quantitative and qualitative measures. None of them show any positive improvements. These include the basics like roadmap reviews, demo sessions, feature cycle time, etc., as well as fairly comprehensive business metrics.
In some teams every developer is using Copilot, and yet we can't see any correlation between it and improved business metrics.
At the same time we can measure the impact from changing the label on a button on our UI on these business metrics.
> Were those that adopted copilot and chatgpt now enabled to finally keep up with their faster peers
No.
> Is developer satisfaction improved, and therefore retention?
> We use a host of quantitative and qualitative measures. None of them show any positive improvements. These include the basics like roadmap reviews, demo sessions, feature cycle time, etc as well as fairly comprehensive business metrics.
Those are very high-level. If there's no movement on those, I'd guess other things are bottlenecking the teams: developers can code as fast as possible and things still move at the same pace overall. That's worth knowing in itself.
If you want to really test the hypothesis that Copilot and ChatGPT have no impact on coding speed, look at more granular metrics to do with just coding. The average time from the moment a developer picks up a work item to the time it gets merged (assuming code reviews happen in a timely fashion). Hopefully you have historical pre-AI data on that metric to compare to.
Edit: and average number of defects discovered from that work after merge
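For concreteness, here's a minimal sketch of computing that pickup-to-merge metric in Python. The timestamp field names (`picked_up`, `merged`) are made up for illustration; map them to whatever your tracker (Jira, GitHub, etc.) actually exports:

```python
from datetime import datetime
from statistics import mean

def avg_pickup_to_merge_hours(items: list[dict]) -> float:
    """Mean hours from a work item being picked up to its PR merging.

    Each item is assumed to carry ISO-8601 timestamps under 'picked_up'
    and 'merged' -- hypothetical field names, not any real tracker's schema.
    """
    durations = [
        (datetime.fromisoformat(i["merged"])
         - datetime.fromisoformat(i["picked_up"])).total_seconds() / 3600
        for i in items
    ]
    return mean(durations)
```

Compare that number against a pre-Copilot baseline window, ideally per team, since (as noted below) team practices dominate this metric.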
> look at more granular metrics to do with just coding. The average time from the moment a developer picks up a work item to the time it gets merged (assuming code reviews happen in a timely fashion)
We do collect this data.
I personally don't put a lot of stock in these kinds of metrics because they depend far too much on the way specific teams operate.
For example perhaps Copilot helps developers understand the codebase better so they don't need to break up the tasks into such small units. Time to PR merge goes up but total coding time could easily go down.
Or perhaps Copilot works well with very small problem sizes (IMO it does), so developers start breaking the work into tiny chunks it handles well. Time to PR merge goes way down, but total coding time for a feature stays the same.
For what it is worth I do not believe there have been any significant changes with these code level metrics either at the org level.
> We aren't seeing higher overall levels of productivity.
> We aren't seeing the developers who start using copilot/gpt rush ahead of their peers.
You think we are antsy worker bees, hastily rushing forwards to please the decision maker with his fancy car?
You are leadership. It's not hard. Cui bono, follow the money, etc. The incentives are clear.
If my peers and I were to receive a magic "do all my work for me" device, I can assure you exactly zero percent of that knowledge will reach your position. Why would it? The company will give me a pat on the back. I cannot pay with pats on the back. Your Tesla cannot be financed with pats on the back. Surely you understand the nature of this issue.
If you write a spaghetti system where collecting the context for the AI is a big time sink, and there are so many service/language barriers that the AI gets confused, of course AI is going to suck. Of course, if you give your programmers a gamepad and tell them to use it to program with a virtual keyboard, they're gonna suck ass too, so you should consider where the fault really lies.
Is it the superstars or the line holders that have been the first adopters? I could speculate, but I am actually curious what you are seeing in practice.
I think you’re thinking about things very locally. Of course ChatGPT can help with some coding - I use it for regex quite often because I never really learned them well.
The problem is that at the average medium-sized company, code looks like this: you have 1 million lines of code written over a decade by a few hundred people. A big portion of the code is redundant, some of it is incomplete, much of it is undocumented. Different companies have different coding styles, different testing approaches, different development dynamics. ChatGPT does not appreciate this context.
Excel has some similar problems. First of all, Excel is two-dimensional, and LLMs really don’t think well in two dimensions. So you need to flatten the Excel file for the LLM. A common approach is to load it with pandas and then use the column and row names to index into the sheet.
Unfortunately, the Excel files at companies cannot be easily read using pandas. They are illogically structured, have tons of hardcoded values, intersheet references go in weird circular ways, and so on. I spent some time in finance; sell-side equity research models are written by highly trained financial analysts and are substantially better organized than the average Excel model at a company. Even this subset of real-world models is far from suitable for direct pandas interpretation. Parsing sell-side models requires a delicate and complex interpretation step before being fed into an LLM.
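To illustrate the happy path that real workbooks rarely allow, here's a minimal pandas sketch of the flattening idea: serialize a tidy sheet into labeled rows the LLM can index back into. The file and sheet names in the comment are hypothetical:

```python
import pandas as pd

def flatten_for_llm(df: pd.DataFrame) -> str:
    """Serialize a tidy sheet into labeled rows an LLM can refer back to."""
    lines = []
    for idx, row in df.iterrows():
        # Label every cell with its row index and column name, so the model
        # can cite "row 3, revenue" instead of guessing at 2-D positions.
        cells = ", ".join(f"{col}={row[col]!r}" for col in df.columns)
        lines.append(f"row {idx}: {cells}")
    return "\n".join(lines)

# Loading is the easy part -- assuming the sheet is tidy, which is exactly
# what real-world workbooks usually are not:
# df = pd.read_excel("model.xlsx", sheet_name="Revenue")
```

The hard part the parent describes - circular intersheet references, hardcoded cells, merged headers - happens before this step, and no generic loader handles it.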
>Which is how I learned that it's much better to ask ChatGPT to use Python evaluation mode and Pandoc and make you a PPTX, than trying to do anything with "Office 365 Copilot" in PowerPoint...
Can you elaborate on what this saved over just making the ppt the old-fashioned way?
"I have this set of notes attached below; would you kindly group them by X and tabulate, and then use Python with Pandoc to make me a PowerPoint with that table in it, plus an extra slide with commentary from the notes?"
Attach notes, paste, press Enter, wait half a minute, get back a PPTX you can build on, or just restyle[0].
Sure, it's faster to build the presentation yourself than to make ChatGPT make the whole thing for you. But the more time-consuming and boring parts, like making tables and diagrams and summaries from external data or notes, are something ChatGPT can do in a fraction of the time, and can output directly into PPTX via Pandoc.
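For the curious, a minimal sketch of what that pipeline amounts to under the hood: tabulate the notes into Markdown with a pipe table, then hand the file to Pandoc. The note contents here are invented, and the final `pandoc` call (left as a comment) assumes you have Pandoc installed:

```python
# Build a Markdown deck from tabulated notes; Pandoc then emits the PPTX.
rows = [("Budget", "Needs sign-off"), ("Hiring", "Two reqs open")]

slides = ["# Status overview", "", "| Topic | Note |", "|---|---|"]
slides += [f"| {topic} | {note} |" for topic, note in rows]
slides += ["", "# Commentary", "", "Summary pulled from the raw notes goes here."]

deck_md = "\n".join(slides)
with open("deck.md", "w") as f:
    f.write(deck_md)

# Then: pandoc deck.md -o deck.pptx
# (with this structure, Pandoc's PPTX writer turns each level-1
# heading into its own slide)
```

The resulting deck is plain but structurally sound, which is the point: restyling a correct skeleton is much faster than building tables by hand.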
(There's a lot of fun things you can do with official ChatGPT and Python integration. The other day I made it design, write and train a multi-layer perceptron for playing tic-tac-toe, because why waste my own GPU-seconds :).)
--
[0] - In contrast, if you make the same request in PowerPoint's O365 Copilot, it'll barf. Last time I tried, it argued it has no capability to edit the document; the time before that, it made a new slide with text saying literally "data from the previous message".