A reason to take this with a grain of salt: transcript length is the biggest technical effect in RNA sequencing. Longer transcripts get broken into more fragments and get sequenced more deeply. What this means is that if you perform any experiment you tend to get a "length effect" of some sort. The second feature they mention in the GTEx data is GC-content, which is probably the second biggest technical bias in RNA sequencing and again basically any experiment has a "GC-content effect" of some sort. But I don't interpret those as meaning that there is something directly acting on long transcripts or high-GC transcripts, rather that whatever is happening biologically ends up appearing as a length or GC effect after sequencing. It's a little fishy that the only features they find are features that I would expect to always find.
The most compelling reason to think that's not simply the case here is that they seem to be noticing a consistent downward trend across all long transcripts with age, which is more compelling than merely noting that long transcripts change (some up and some down).
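A toy illustration of the length effect described above, with made-up numbers rather than anything from the paper or GTEx. It assumes the simplest possible model, where the expected read count of a transcript is proportional to (number of molecules x transcript length); the function name and values are hypothetical:

```python
def expected_reads(molecules, lengths_bp, total_reads):
    """Split total_reads across transcripts in proportion to
    molecules * length (assumed toy model of fragment sampling)."""
    weights = [m * l for m, l in zip(molecules, lengths_bp)]
    total_weight = sum(weights)
    return [total_reads * w / total_weight for w in weights]


# Three transcripts with identical molecule counts but different lengths.
molecules = [100, 100, 100]
lengths_bp = [500, 2000, 8000]

reads = expected_reads(molecules, lengths_bp, total_reads=1_050_000)

for length, count in zip(lengths_bp, reads):
    print(f"{length:>5} bp -> {count:,.0f} expected reads")
```

Under this model the 8000 bp transcript collects 16 times the reads of the 500 bp one despite equal abundance, so any statistic that depends on read depth (variance, power to detect a change) will tend to track length.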
It's a good point. Why would the length effect you describe be associated with age, across many organs, cell types, datasets, and species? The technical effect would be a good explanation for this finding in one dataset, but it seems unlikely that many datasets would have a technical length effect that correlates with age by chance.
What I'm saying is that it looks like there's an effect and that effect is visible as a change in expression vs length, but I wouldn't expect it to be related to length in a biologically meaningful way. If you take one population of transcripts and another and you measure the lengths, it's likely that you'll see a shift in the median - regardless of whether length is important, particularly due to the specific ways in which length relates to sequencing depth. And on top of that, comparing across genes requires compensating in some way for the length of the gene and it's not obvious how to do that correctly - could they be finding an artifact of how they normalized for length? (E.g., a "gene" actually doesn't have a single length; it has multiple possible transcript variants of different lengths, and most reads from the sequencer are ambiguous as to which variant they came from. Quantify the different transcripts incorrectly - and it's impossible to do it correctly - and you may be mis-estimating the effective length and mis-normalizing.) It's a starting point of an investigation, not an end point.
(And they do try to take the next step in that investigation, and they report that they see a further decrease in a gene related to transcribing long transcripts. However, it's 27th in their list of related genes, and I'm not sure how unlikely it is that one of the top N genes would have a reported connection to transcription. Hopefully they will follow up with a biological experiment involving knock-down of this gene and seeing an accelerated aging phenotype or something of that sort.)
The most compelling piece of evidence in my mind here is that the effects they report are consistent in direction across conditions. The most worrisome is that they tested a bunch of factors and the only ones they report as consistently informative are the ones most confounded with technical aspects of sequencing, and therefore confounded with any number of underlying biological changes.
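A minimal sketch of the length-normalization concern raised above, using invented isoform counts (not data from the study). It assumes reads are proportional to molecules x isoform length, and compares dividing by one fixed gene length against dividing by the abundance-weighted effective length; all names and numbers are hypothetical:

```python
def read_signal(isoforms):
    """Read signal for a gene: sum of molecules * length over its isoforms
    (assumed toy model)."""
    return sum(n * length for n, length in isoforms)


def effective_length(isoforms):
    """Abundance-weighted mean isoform length."""
    total_molecules = sum(n for n, _ in isoforms)
    return read_signal(isoforms) / total_molecules


# (molecules, length_bp) per isoform. Both samples have 100 molecules of the
# gene in total; only the short/long isoform mix differs.
young = [(50, 1000), (50, 4000)]
old = [(80, 1000), (20, 4000)]

FIXED_LENGTH = 4000  # e.g. always normalizing by the longest annotated isoform

naive_young = read_signal(young) / FIXED_LENGTH
naive_old = read_signal(old) / FIXED_LENGTH

correct_young = read_signal(young) / effective_length(young)
correct_old = read_signal(old) / effective_length(old)

print(f"fixed-length estimate:     young={naive_young}, old={naive_old}")
print(f"effective-length estimate: young={correct_young}, old={correct_old}")
```

With a fixed length, the gene appears to drop from 62.5 to 40.0 even though the molecule count is unchanged; with the true effective length both samples come out at 100.0. In real data the isoform mix has to be estimated from ambiguous reads, so the effective length itself carries error.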
Thanks, your points make sense. It’s definitely a worrisome coincidence given the multiple tests they ran but didn’t correct for. I hope to see that knockout experiment you describe!
My understanding of the text is that there is reduced expression in long genes (or loci) as people age.
Is this correct?
The text mentions an ALS gene (FUS) and contains this sentence, which I have trouble understanding (I am not a native English speaker): Furthermore, we observe an anticorrelation among neurodegenerative disorders such as amyotrophic lateral sclerosis (ALS) and Alzheimer’s disease.
Please, what do those findings mean for amyotrophic lateral sclerosis (ALS) and Alzheimer’s disease?
It means people with ALS don't get Alzheimer's according to a correlation. Now, how is the inherent lifetime and lifestyle difference bias corrected? No idea.
I think it's actually a poorly worded sentence and it's saying that there's an anticorrelation between gene length and the relative expression levels between healthy people and those with ALS (and also between healthy people and those with Alzheimer's). This means that ALS and Alzheimer's both have similar effects as aging does and decrease expression of longer genes, according to their study. The sentence before it seems to be specifying the "anticorrelation" that they're talking about and the sentence in question is saying it holds in ALS and Alzheimer's as well as in aging.
Yes, so longer genes are more likely to be impacted, as expected given genomic damage theories and the random damage model. (Likewise, if the cellular transcription machinery starts to introduce more random errors, longer proteins would be impacted more.)
What to do about it? Edit whole genomes so they're more stable somehow? You cannot reasonably expect to remove all environmental insults.
There's such a thing as gene over-expression, which artificially increases the number of transcripts of a gene that are expressed. It's generally less easy or reliable than gene under-expression, where you interfere with the expression. Doing it to get all of the affected genes back to their "healthy" levels at once would be very challenging. There are about 30,000 genes total, and at one point the study was looking at the top and bottom 5% of those genes by length, so you'd be looking to over-express 1500 genes - I've never heard of a study doing anything like that, but it might be possible.
On top of that, there are also feedback loops so if you put more of a certain transcript in it may induce more/less of another or get the cell to stop production of that transcript and therefore counter-act what you've done. So it would be extremely hard to get to the desired levels in all of them.
Makes sense. I wonder if it would be feasible to reduce the number of genes to be over-expressed by focusing on those correlated with known medical conditions, e.g. cardiovascular disease, kidney disease, or cosmetic signs of aging.
I lightly follow the current understanding of telomerase, which is responsible for adding back the “buffers” (telomeres) at the end of genes that RNA can’t copy. It was clear this was part of what we’d need for anti-aging treatment, but last I heard we didn’t understand the mechanisms or how to activate them. Is the transcriptome part of this process? It wasn’t immediately clear from the article.