No evidence is provided. I'm curious whether the author actually has evidence or is just asserting that this is possible and likely happening.
IME, and I have lots of "E" (including training and finetuning my own models), this probably isn't entirely true in most writing-heavy courses. You can use the largest language models, in an interactive fashion, to generate portions of half-decent papers. Maybe even "ace" papers on certain subjects with certain grading criteria (namely, "looks reasonable", not "fine-toothed comb"). But in many courses the essays will be extremely low-quality, and even in the happy cases saying that the essay is "machine-generated" is eliding a lot of manual effort.
I agree that students are probably using LLMs for their homework, but I'm skeptical that they are all getting As on assignments that are designed as big assessments, or that the essays are actually fully machine-generated. I bet a lot of students -- the laziest ones -- are getting "WTF is this essay even about... did you have a stroke while writing this?!" feedback if they are using LLMs to generate essays whole-cloth.
Pedagogically, this matters. Think about calculator usage. There's a huge difference between allowing use of a TI-83 on a Calculus assignment with lots of word-heavy application problems and allowing use of Wolfram Alpha on a Calculus assignment that's "integration recipe practice".
Yeah, I could believe it "aces" homework in a really open-ended writing assignment. An assignment like: write an essay explaining a personal experience and what it meant to you. The people this is the biggest issue for at the moment are probably writing instructors, since the goal of those classes is just to practice writing something/anything. In computer science, though, the writing turned in to my classes that I suspect is LLM-generated usually gets an F. It tends to just ramble about the subject in general and not hit any of the specific points that I'm asking for.
Last year I had a take-home exam in an operating systems class that I suspect one student fed entirely as prompts to an LLM, and it was... odd. The answer to every question was a paragraph or two of text, even in cases where the expected answer was true/false, or a number. And even when I did want text as the answer, it was way off, e.g. in one I asked them to explain one strength and one weakness of a specific scheduling algorithm on a given scenario. The submitted answer was just general rambling about scheduling algorithms. Some of this is probably within the reach of an expert using clever prompting strategies, but students who can do that could probably also answer the original question. :-)
To be fair, I have seen the "ramble generically on the subject of the question" strategy manually implemented by humans too, in the hopes that if you throw enough BS at the question you might get partial credit by luck. Maybe designing assessments to be LLM-resistant will have the nice side benefit of reducing the viability of BSing as a strategy.
I used to have students who would write answers like that on in-class exams.
Every answer was at least one full, complete sentence, even for yes/no or true/false. And the “short answer” responses filled all available space when one sentence would do.
My only conclusion is that some undergraduate institutions around the world must be intentionally drilling it into their students to do this.
I suspect it starts in high school. A lot of AP subjects with written portions, like AP Biology or history, are really hard to grade at scale, so they have a relatively naive scoring system. The answer can be a total rambling mess, but as long as it is self-consistent (it doesn't contradict itself), it gets points for any relevant information it gets right.
For example, if the question is about respiration, a rambling answer that mentions "electron transport chain", "Krebs cycle", and "ATP" might get 3/5 points even if it doesn't make much sense otherwise, as long as the answer doesn't confuse the Calvin and Krebs cycles or otherwise contradict itself, like saying that glucose is a byproduct.
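To make that scoring style concrete, here's a rough sketch of the kind of keyword-style rubric I mean; the terms and point values are made up for illustration, not the actual AP scoring guide:

    # Rough sketch of a naive keyword-style rubric (hypothetical terms and
    # point values): one point per relevant term mentioned, zero credit if
    # the answer contradicts itself on a key fact.
    RUBRIC_TERMS = ["electron transport chain", "krebs cycle", "atp",
                    "glycolysis", "mitochondria"]
    CONTRADICTIONS = ["calvin cycle", "glucose is a byproduct"]
    MAX_POINTS = 5

    def score(answer: str) -> int:
        text = answer.lower()
        if any(c in text for c in CONTRADICTIONS):
            return 0
        hits = sum(term in text for term in RUBRIC_TERMS)
        return min(hits, MAX_POINTS)

    # A rambling answer that name-drops three relevant terms still earns 3/5.
    print(score("...ATP is made somewhere after the Krebs cycle and the "
                "electron transport chain does stuff with oxygen..."))  # 3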
I was told by multiple teachers/professors that it's never acceptable to write anything other than a full sentence on a test (unless it's a Scantron, obviously). Not sure how common this is, but they could have been trained by other instructors.
I think students also believe they can hedge. If they just put down "yes" or "no" then their answer might be completely wrong, but if they drop a bunch of things in the answer then some of those things might be true and you might give partial credit, or, at least, they can argue about it later.
It's possible. I've had professors who always gave true/false questions with instructions to either "justify your answer" or "if false, justify your answer".
Practically speaking, there is fairly little downside to putting extra into your answer, as tests are normally scored by how many points on the grading rubric you hit.
> To be fair, I have seen the "ramble generically on the subject of the question" strategy manually implemented by humans too, in the hopes that if you throw enough BS at the question you might get partial credit by luck.
This is the basic speech strategy of politicians. Don't answer the question asked, just talk about something related that you want to talk about.
I don't think it'd do well even for an open-ended assignment. The best language models I've seen are still easy to detect as bots if you read multiple paragraphs of output.
> To be fair, I have seen the "ramble generically on the subject of the question" strategy manually implemented by humans too, in the hopes that if you throw enough BS at the question you might get partial credit by luck.
I had a college professor who recognized this and actively warned against it on the mid-term and final.
He said that every question would be answerable in 2 or 3 sentences, and that if you wrote 2-3 paragraphs instead, he would mark you down even if the answer was correct, because you were wasting his time and might have dropped in correct statements that answered the question by luck.
So often in school, we'd get quizzes/tests back, I'd peek over at someone else's paper as it was being handed back, and I'd notice they wrote an entire paragraph where I answered in a single sentence and got full credit. I was always left wondering what the hell they wrote about.
When my parents were in school, they hand-wrote essays and used typewriters. Correcting a mistake meant rewriting an entire page! When they needed to research something, it meant spending a day in the library manually searching for quotes/citations. When I was in school I had a rudimentary spellchecker, Microsoft Word, and Wikipedia.
Now a grade school student has access to Grammarly. In a few years they'll probably have automated fact-checking and text generation.
What will happen? My bet is that we’ll expect a lot more from students a lot earlier.
Evidence is provided, of a sort. The first link goes to a report by a journalist who interviews redditors who claim they are doing this and talk about why.
Reddit is filled with shameless habitual liars who claim to be airline pilots in one thread and plumbers in another. The incentive structure of Reddit, an internet-points Skinner box, incentivizes shameless lying and "creative writing".
There are many, many bot posts. Mostly they re-post other highly-upvoted comments from related threads, or re-post previously posted pictures/videos/links, but bots are certainly farming karma.
It's hard to be sure. Just as in the (possibly apocryphal) quote from a Yosemite Park Ranger, "There is considerable overlap between the intelligence of the smartest bears and the dumbest tourists" there's considerable overlap between the best text generation bots and the dumbest Redditors.
Sites with "karma" systems are particularly susceptible, including HN. But I think Reddit is even worse in this regard than HN because it has many times more users (usernames are less likely to be recognized across threads), and Reddit makes it into more of a game with various kinds of flair and other 'rewarding' baubles.
One of those students doesn't mention using GPT to generate essays. They only mention generating lists and answers to other short-response questions. I find that believable.
The other student mentions essay writing, but also says that they "didn't ace the essay" (no mention of the grade).
So, the article linked literally isn't evidence for the claim.
I agree it's not very good evidence that there's a real problem here, the articles and report are more of a good starting point for interesting discussion. On the other hand, the report isn't literally zero evidence either. There are students stating that they're doing this, even if they don't name GPT-3 specifically (does it really matter what model they use?).
> There are students stating that they're doing this
But there literally aren't. There are not students, quoted in that article, stating that they are "acing their homework by turning in machine-generated essays". Literally. There aren't.
I don't doubt that this is possible, in some sense, but the details really matter. Per my original comment:
>>> Pedagogically, this matters. Think about calculator usage. There's a huge difference between allowing use of a TI-83 on a Calculus assignment with lots of word-heavy application problems and allowing use of Wolfram Alpha on a Calculus assignment that's "integration recipe practice".
What was the assignment? What was the purpose of the assignment? What were the grading standards?
E.g., I have assigned homework that could be completed by a combination of Copilot and GPT-2. That homework was graded on a very coarse rubric. Today, a student could get an A on that assignment using GPT-2 and Copilot. If I were still teaching today I would not worry about it because:
1. they're only cheating themselves
2. they will still fail the course if they don't learn the material
3. it would save very little time to use those tools for these assignments. Maybe 5-10 minutes max, for a total of 5-10 assignments over the course of an entire semester that are collectively worth less than 1% of the final grade. So it's an hour saved and a negligible portion of their grade that will almost certainly be completely washed out in the curve/adjustments at the end of the semester (I don't do knife's-edge end-of-course grade assignments -- I identify clear bifurcations in the cohort and assign final letter grades to each bifurcation; a rough sketch of what I mean is below).
I believe Copilot and GPT can do those assignments. I'm also 100% confident that those tools cannot complete -- and can barely even help with -- the assignments that actually counted toward students' grades.
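Concretely, the "bifurcations" I mean are just the big gaps in the final score distribution: sort the scores, find the largest gaps between consecutive scores, and cut the letter grades there. A toy sketch of that bucketing, illustrative only and not my actual process or numbers:

    # Toy sketch of gap-based grade bucketing (illustrative only).
    # Sort the distinct final scores, find the largest gaps between
    # consecutive scores, and cut the letter grades at those gaps.
    def assign_grades(scores, letters=("A", "B", "C")):
        ordered = sorted(set(scores), reverse=True)
        # indices of the largest gaps between consecutive distinct scores
        biggest = sorted(
            range(len(ordered) - 1),
            key=lambda i: ordered[i] - ordered[i + 1],
            reverse=True,
        )[: len(letters) - 1]
        cutoffs = sorted((ordered[i + 1] for i in biggest), reverse=True)
        grades = {}
        for s in scores:
            letter = letters[-1]
            for rank, cut in enumerate(cutoffs):
                if s > cut:
                    letter = letters[rank]
                    break
            grades[s] = letter
        return grades

    print(assign_grades([93, 91, 90, 78, 77, 60, 58]))
    # {93: 'A', 91: 'A', 90: 'A', 78: 'B', 77: 'B', 60: 'C', 58: 'C'}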
So, again, the context matters. Not all assignments are assessments and not all assessments need to be cheat-proof.
Acing a term paper that's 50% of the grade means something.
Acing a paper designed as an opportunity to practice and graded mostly for completion -- but with plenty of detailed feedback in preparation for a term paper -- doesn't really mean anything and really only cheats the student of feedback prior to the summative assessment.
This, btw, is why I'm more interested in what educators are saying than what students are saying. The teacher's intent for the assessment and the grading rubric matter a lot when determining what "getting an A" means. Acing a bulleted list graded for completion is possible with a 1990s Markov chain.
> I bet a lot of students -- the laziest ones -- are getting "WTF is this essay even about... did you have a stroke while writing this?!" feedback if they are using LLMs to generate essays whole-cloth.
This comes across as very ill-informed. I suggest you actually use some of the AI essay-writing services, because their output is pretty indistinguishable from human writing at this point.
I was going to play your game, but this product requires a valid email address and phone number, so generating examples from this product and sharing them here without doxing myself to an unknown company requires way too much effort.
Maybe you can help by copying and pasting the first 50 pages of output from that model for the prompt shared by a user below:
write a 50 page paper describing the impact of Teutoburg Forest on Roman politics.