"Mayo’s LLM split the summaries it generated into individual facts, then matched those back to source documents. A second LLM then scored how well the facts aligned with those sources, specifically if there was a causal relationship between the two."
It doesn't sound novel from the article. I built something similar over a year ago. Here's a related example from Langchain "How to get a RAG application to add citations" https://python.langchain.com/docs/how_to/qa_citations/
I don't think you're getting it, it's not traditional RAG citations.
They are checking the _generated_ text by splitting it into facts, finding documents that contain those facts, and then rating how relevant (causally related) each source is to the fact. That's different from looking up documents to generate an answer for a prompt; it's the reverse. Once the answer has been generated, they essentially fact-check it. Rough sketch of the flow below.
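Something like this, as a minimal sketch of what the article describes. The helper names (`extract_facts`, `retriever`, `llm`, `Document`) are placeholders I'm assuming for illustration, not Mayo's pipeline or LangChain's actual API:

```python
from dataclasses import dataclass

@dataclass
class Document:
    id: str
    text: str

@dataclass
class FactCheck:
    fact: str
    source_id: str | None
    score: float  # 0-1 support score from the second LLM

def extract_facts(answer: str, llm) -> list[str]:
    """First LLM pass: split the already-generated answer into atomic facts."""
    prompt = (
        "List each individual factual claim in the text below, one per line:\n"
        f"{answer}"
    )
    return [line.strip() for line in llm(prompt).splitlines() if line.strip()]

def fact_check(answer: str, retriever, llm) -> list[FactCheck]:
    """Reverse of normal RAG: retrieval is keyed on the generated facts,
    not on the user's original prompt, and happens after generation."""
    results = []
    for fact in extract_facts(answer, llm):
        docs: list[Document] = retriever(fact, k=3)
        if not docs:
            # No source found for this fact at all.
            results.append(FactCheck(fact, None, 0.0))
            continue
        best = docs[0]
        # Second LLM pass: grade how strongly the source supports
        # (is causally related to) the generated fact.
        grade = llm(
            f"Fact: {fact}\nSource: {best.text}\n"
            "On a scale of 0 to 1, how strongly does the source support "
            "the fact? Answer with only a number."
        )
        results.append(FactCheck(fact, best.id, float(grade)))
    return results
```

Point being: the retriever and the grader run over the model's output, so low-scoring facts can be flagged or stripped, rather than the retriever feeding the generation up front.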