Simple pixel-by-pixel "diffs" are of course possible, but only useful in trivial cases.
What if someone changed the color scheme (which affects almost all pixels)? What if someone moved some part of the image to another part? What about resolution/size changes and multiple layers? What about vector graphics?
And the most important thing: how can that be displayed in a way that non-technical people can easily understand?
For hard-core techies, there is of course the option to use a text-based image format. For raster graphics, the "plain" variants of PNM come to mind. For vector graphics, SVG or EPS might be good choices. Then, a good (indentation-aware) textual diff should produce sensible results - especially if only details were changed.
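To illustrate the text-format idea, here is a minimal sketch that writes two tiny grayscale images as plain PGM (the "P2" member of the PNM family) and runs an ordinary line-based diff over them. The helper name plain_pgm, the file names and the sample pixel values are made up for this example, and putting one image row on each text line is my own choice to keep the diff readable; it is not something the format requires.

```python
import difflib


def plain_pgm(pixels, maxval=255):
    """Serialize a 2D list of gray values as plain PGM (P2) text lines.

    One image row per text line is an assumption made here so that a
    line-based diff points at the changed image row.
    """
    height = len(pixels)
    width = len(pixels[0])
    lines = ["P2", f"{width} {height}", str(maxval)]
    lines += [" ".join(str(value) for value in row) for row in pixels]
    return lines


# Hypothetical sample data: only the center pixel differs.
old_image = [[0, 0, 0], [0, 255, 0], [0, 0, 0]]
new_image = [[0, 0, 0], [0, 128, 0], [0, 0, 0]]

diff_lines = list(
    difflib.unified_diff(
        plain_pgm(old_image),
        plain_pgm(new_image),
        fromfile="old.pgm",
        tofile="new.pgm",
        lineterm="",
    )
)
print("\n".join(diff_lines))
```

With a detail change like this, the diff touches a single line. A global change such as a new color scheme would still rewrite nearly every row, so the text format does not help there.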
"merge"
Automatic merges are only useful if they aren't too "clever". That's important for text and especially important for graphics.
So if two distinct areas of an image are edited, a simple merge can and will work. But overlapping changes or even global changes should always result in a conflict.
However, if only trivial merges are desirable, most changes will cause conflicts, which would not be much different from the current situation.
Also, that kind of merge already happens automatically with the current (text) merge anyway, provided that formats like PNM, SVG and EPS are used.
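The "not too clever" rule above (distinct areas merge, overlapping or global changes conflict) could be sketched roughly like this for raster data. This is only an illustration under my own assumptions: images are same-sized 2D lists of pixel values, "area" means the bounding box of the changed pixels, and the names MergeConflict, changed_box, boxes_overlap and merge_images are invented for the example.

```python
class MergeConflict(Exception):
    """Raised when both sides changed overlapping areas of the image."""


def changed_box(base, edited):
    """Return the bounding box (top, left, bottom, right) of all pixels
    that differ from base, or None if nothing changed."""
    coords = [
        (y, x)
        for y, row in enumerate(base)
        for x, value in enumerate(row)
        if edited[y][x] != value
    ]
    if not coords:
        return None
    ys = [y for y, _ in coords]
    xs = [x for _, x in coords]
    return (min(ys), min(xs), max(ys), max(xs))


def boxes_overlap(a, b):
    """True if two inclusive bounding boxes share at least one pixel."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def merge_images(base, ours, theirs):
    """Three-way merge that only accepts edits in distinct areas.

    Assumes all three images have identical dimensions.
    """
    box_ours = changed_box(base, ours)
    box_theirs = changed_box(base, theirs)
    if box_ours and box_theirs and boxes_overlap(box_ours, box_theirs):
        raise MergeConflict(f"overlapping edits: {box_ours} vs {box_theirs}")
    # No overlap: take our pixel where we changed it, otherwise theirs
    # (which equals the base pixel wherever they did not edit).
    return [
        [o if o != b else t for b, o, t in zip(row_b, row_o, row_t)]
        for row_b, row_o, row_t in zip(base, ours, theirs)
    ]
```

Note that a global change such as a new color scheme produces a bounding box covering the whole image, so it conflicts with any other edit. That is exactly the behavior asked for above, and also why most real-world changes would end up as conflicts.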
I don't think image comparison is the problem here. The problem is that designers work in rather different flows. To us a file can be in many different states at once.
Kind of like a developer who has 10 KLOC but keeps commenting 8K of them in and out all the time.
In other words, the document has many states at once, depending on what you make visible or not.
Both problems seem "hard but solvable" to me. Isn't there software doing that already?