> In fact, for lossless compression, it’s proven that we can’t do much better th...

wwwhizz · on Oct 18, 2019

There is a whole scientific field on this, called Information Theory. How far data can be compressed losslessly is limited by its entropy, i.e. by how random it is.

For instance see: https://en.wikipedia.org/wiki/Entropy_%28information_theory%...

and

https://en.wikipedia.org/wiki/Kolmogorov_complexity
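To make the entropy limit concrete, here is a rough sketch in Python. It assumes zlib as a stand-in for "a lossless compressor" and uses the byte-histogram entropy as a crude first-order estimate (it ignores structure across bytes, so it is not the true bound). The helper names `byte_entropy_bits` and `compressed_ratio` are made up for this illustration.

```python
import math
import os
import zlib
from collections import Counter

def byte_entropy_bits(data: bytes) -> float:
    """Shannon entropy of the byte histogram, in bits per byte (at most 8)."""
    counts = Counter(data)
    n = len(data)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def compressed_ratio(data: bytes) -> float:
    """Compressed size divided by original size (below 1.0 means it shrank)."""
    return len(zlib.compress(data, 9)) / len(data)

# 100,000 bytes from the OS random source vs. 100,000 bytes of repeated text.
random_data = os.urandom(100_000)
repetitive = b"the quick brown fox " * 5_000

for name, data in [("random", random_data), ("repetitive", repetitive)]:
    print(f"{name}: {byte_entropy_bits(data):.2f} bits/byte, "
          f"ratio {compressed_ratio(data):.3f}")
```

The random bytes sit near 8 bits per byte and do not shrink at all, while the repeated sentence collapses to a tiny fraction of its size.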

throwaway_bad · on Oct 18, 2019

What if most of the data humans care about is not random?

For example, to recover the state of an entire simulated universe you just need the value of the initial seed and the generator.
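A toy version of the seed idea, assuming Python's `random.Random` as the generator (the `universe` function is hypothetical, just for illustration). The output looks random to a general-purpose compressor like zlib, yet the seed plus the generator reproduces it exactly, which is the gap between statistical compressibility and Kolmogorov complexity.

```python
import random
import zlib

def universe(seed: int, n_bytes: int) -> bytes:
    """Deterministically expand a seed into n_bytes of pseudo-random 'state'."""
    rng = random.Random(seed)
    return bytes(rng.getrandbits(8) for _ in range(n_bytes))

data = universe(42, 50_000)

# The same seed and generator always rebuild the same bytes.
assert universe(42, 50_000) == data

# zlib sees no usable pattern, so the "compressed" form is no smaller.
print(len(data), "->", len(zlib.compress(data, 9)), "bytes via zlib")
print("versus a seed that fits in a single small integer")
```

The catch is that this only works if you already know the data came from that generator; finding a short seed for arbitrary data is the hard part.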

dredmorbius · on Oct 18, 2019

Different classes of data compress differently.

For complex reasons, human language (spoken and written) is about 50% redundant, across a wide range of independent languages.

Tabular data can be vastly more compressible, and I'd routinely see 90% or better compression across a range of datasets (mostly business, financial, and healthcare data). Data of highly random events might be somewhat less so.
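As a rough illustration of why tabular data compresses so well, here is a sketch with purely synthetic rows (the column names, customers, and values are invented, and zlib is just one convenient compressor). Real datasets will differ, so treat the resulting number as a demonstration of the effect, not a benchmark.

```python
import random
import zlib

rng = random.Random(0)  # fixed seed so the run is repeatable
customers = ["ACME Corp", "Globex", "Initech", "Umbrella"]
statuses = ["OPEN", "PAID", "VOID"]

def synthetic_rows(n: int) -> bytes:
    """Build a CSV whose columns draw from small sets of values."""
    lines = ["date,customer,status,amount"]
    for _ in range(n):
        day = rng.randint(1, 28)
        amount = rng.choice([100, 250, 500, 1000])
        lines.append(
            f"2019-10-{day:02d},{rng.choice(customers)},"
            f"{rng.choice(statuses)},{amount}.00"
        )
    return ("\n".join(lines) + "\n").encode("ascii")

table = synthetic_rows(10_000)
packed = zlib.compress(table, 9)

# Lossless: decompressing gives back the exact original bytes.
assert zlib.decompress(packed) == table

saving = 1 - len(packed) / len(table)
print(f"{len(table)} -> {len(packed)} bytes ({saving:.0%} smaller)")
```

Each row carries only a few bits of real choice (which day, which customer, which status), while the text encoding spends dozens of bytes on it, and that slack is what the compressor removes.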

Image, audio, and video data, when stored in codec formats, is already highly compressed. When you're working with raw (WAV, TIFF, BMP, RAW) datatypes, there's a huge opportunity for compression, but mp3, ogg, mp4, png, gif, jpg, etc., are already pretty highly compressed. There's a distinction between lossy (jpg, mp3) and lossless (png, ALAC) formats. You get smaller files with lossy formats, but you're actually losing some of the original data, whilst lossless codecs allow full reconstruction of the original source image, audio, or video.

Your comment about simulated universes gets to a key philosophical question about information, truth, and models. Generally, any representation we have of the universe is at best an abstraction of it, and hence a small, lossy model.

This needn't necessarily be the case:

https://en.wikipedia.org/wiki/On_Exactitude_in_Science

strbean · on Oct 18, 2019

I think statements about theoretical limits on compression are ignoring emergent properties. We know you can "compress" certain things infinitely; for example, the Mandelbrot set.

Compressing arbitrary inputs using emergent properties may never be practical, but it seems reasonable that you could trade computation for compression to an arbitrary extent (searching an emergent series for chunks of data that match your input).

willis936 · on Oct 18, 2019

No information we capture has infinite precision anyway, so fractal-based compression falls under lossy compression. It is a much more complicated task to identify fundamental limits on lossy compression performance, and an even harder task to reach collective agreement on "good enough" for a given purpose.

throwaway_bad · on Oct 18, 2019

I was actually thinking more about natural languages than fractals. Maybe human thought is so utterly derivative you can just generate a random stream of words and it will contain most of the text that humanity will ever produce.

Then it can be compressed down to an index into the libraryofbabel.
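One way to sanity-check the index idea is a toy library that lists every string over a small alphabet in length-then-alphabetical order. This is only a sketch: the 27-symbol `ALPHABET` and the functions `babel_index` and `babel_text` are assumptions for illustration, not how the actual libraryofbabel site works.

```python
import math
import string

ALPHABET = string.ascii_lowercase + " "  # 27 symbols (an assumption)

def babel_index(text: str) -> int:
    """Position of text when all strings are listed shortest-first."""
    k = len(ALPHABET)
    index = 0
    for ch in text:  # raises ValueError for characters outside ALPHABET
        index = index * k + ALPHABET.index(ch) + 1
    return index

def babel_text(index: int) -> str:
    """Inverse of babel_index: recover the text stored at a position."""
    k = len(ALPHABET)
    chars = []
    while index > 0:
        index, digit = divmod(index - 1, k)
        chars.append(ALPHABET[digit])
    return "".join(reversed(chars))

text = "most of what we write is derivative"
index = babel_index(text)
assert babel_text(index) == text

print("bits to store the index:", index.bit_length())
print("bits to store the text :", math.ceil(len(text) * math.log2(len(ALPHABET))))
```

The index needs about as many bits as the text itself, so pointing into a library that contains everything saves nothing. Any gain has to come from a library that ranks likely text first, which is what a statistical model of language does.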

Aperocky · on Oct 18, 2019

It’s basically conservation of energy/entropy being the reason. As in you cannot violate physics.