But this is technically an invalid answer to the problem of file compression. An easy (!) way to compress data losslessly would be to keep a registry of every file in existence and refer to each file by its index in that registry. (Like the prisoners who had told each other all their jokes so many times that they could just quote them by number and everyone would laugh.) Data compression is supposed to handle novel but likely documents without first adding them to a global registry.
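
To make the objection concrete, here is a rough sketch in Python. Everything in it is illustrative: `REGISTRY`, `registry_compress`, and `registry_decompress` are hypothetical names invented for this example, and the two registered "files" are made-up byte strings. It contrasts a registry lookup, which only works for files both sides already hold, with a general-purpose compressor (the standard library's `zlib`), which handles a document it has never seen.

```python
import zlib

# Hypothetical shared registry: sender and receiver must both already hold
# every file in it. The contents here are made up for illustration.
REGISTRY = [
    b"the quick brown fox jumps over the lazy dog\n" * 50,
    b"lorem ipsum dolor sit amet\n" * 50,
]


def registry_compress(data: bytes) -> bytes:
    """'Compress' a file down to its 4-byte registry index."""
    index = REGISTRY.index(data)  # raises ValueError if the file is not registered
    return index.to_bytes(4, "big")


def registry_decompress(blob: bytes) -> bytes:
    """Recover the file by looking its index up in the shared registry."""
    return REGISTRY[int.from_bytes(blob, "big")]


known = REGISTRY[0]
novel = b"a novel but likely document, never registered anywhere\n" * 50

# A registered file shrinks to 4 bytes and round-trips exactly.
assert len(registry_compress(known)) == 4
assert registry_decompress(registry_compress(known)) == known

# An unregistered file cannot be encoded at all.
try:
    registry_compress(novel)
    registry_handles_novel = True
except ValueError:
    registry_handles_novel = False

# A general-purpose compressor copes with the novel document,
# with no shared registry needed.
packed = zlib.compress(novel)
assert zlib.decompress(packed) == novel
```

The 4-byte "compressed" size is only possible because all the information lives in the registry that both parties had to share in advance, which is exactly the cheat being described.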