
I came here to post something similar. I'll add that many of the hard things we do with Git seem to involve re-ordering or re-combining the underlying changes. If we want to make it easier to reason about changes to a set of changes, then I think we really want those changes to have some properties which they don't currently have.

It's powerful for Git to treat changes as line-by-line text diffs, because it allows us to manage changes to any textual data. But what if, instead, we borrowed an idea from distributed databases, and implemented all changes as commutative operations on a Conflict-free Replicated Data Type (CRDT)?
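To make "commutative operations on a CRDT" concrete, here is a minimal sketch of the simplest CRDT I know of, a grow-only set. The `GSet` class and the idea of treating function definitions as set elements are my own illustration, not something any existing VCS does.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class GSet:
    """Grow-only set: the only permitted change is adding an element."""
    items: frozenset = field(default_factory=frozenset)

    def add(self, item):
        return GSet(self.items | {item})

    def merge(self, other):
        # Union is commutative, associative, and idempotent,
        # so replicas converge regardless of merge order.
        return GSet(self.items | other.items)


# Hypothetical example: each element stands in for one definition.
base = GSet().add("def foo(): ...")
alice = base.add("def bar(): ...")
bob = base.add("def baz(): ...")

# Merging in either order yields the same state.
assert alice.merge(bob) == bob.merge(alice)
```

Note how restrictive this is: nothing can ever be removed or edited, which hints at the cost discussed next.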

I think almost every example of difficult rebasing would get significantly easier, but at what cost? We'd have to completely rethink how we write programs, because this would drastically limit the types of changes to a program that were valid. I wouldn't be surprised if this would require us to develop in entirely new languages.

There might be some meat to this idea, but again, I don't think we'd get there by mining existing Git graphs.




Git doesn't operate on diffs. It stores full content, using delta compression. It's a subtle difference, but it means Git can create "ours" reverse merges that have no diff yet radically change the content of the repo.

What you're talking about is patch theory, which is used by Darcs and Pijul. Pijul does a better job of explaining the theory.

At the end of the day, the point of version control is to keep a universally consistent snapshot of a sequence of bytes. Patch theory only tells you how to resolve conflicts. TreeDoc, etc. simply resolve the conflicts differently, based on consistency of ordering, since patches may be applied out of order for it to be a CRDT.


Curious, have you compared this idea to what Darcs does? I don't know Darcs well enough to do justice to it, but it sounds related.


An example of what I'm thinking about, which I don't think Darcs can do (I'd love to be wrong):

Alice and Bob both branch off of master at the same point. In Alice's branch, she moves function `foo` into a different module/file. In Bob's branch, he changes `foo` to handle a new condition. Both wish to merge into master.

Whoever merges later is going to have a merge conflict, and have to resolve it manually, using their human understanding of the semantics of both changes. It's clear to me how that conflict should likely be resolved, but as long as those changes are presented as text diffs, I don't expect my VCS to be smart enough to figure that out on its own.

It would be interesting to explore other ways of representing changes, such that a computer would understand how to compose them in more situations like this.
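As a rough sketch of what such a representation might look like for the Alice/Bob case: model the program as a mapping of modules to functions, and express each change as an operation on a named function rather than on lines of text. Everything here (`move_function`, `edit_function`, the dict layout, the module names) is hypothetical and only meant to illustrate the idea.

```python
# Hypothetical model: a program is {module name: {function name: body}}.

def move_function(name, dst):
    """Change that moves function `name` into module `dst`, wherever it lives now."""
    def apply(program):
        result = {mod: dict(funcs) for mod, funcs in program.items()}
        src = next((mod for mod, funcs in result.items() if name in funcs), None)
        if src is not None:
            result.setdefault(dst, {})[name] = result[src].pop(name)
        return result
    return apply


def edit_function(name, new_body):
    """Change that replaces the body of function `name`, wherever it lives now."""
    def apply(program):
        result = {mod: dict(funcs) for mod, funcs in program.items()}
        for funcs in result.values():
            if name in funcs:
                funcs[name] = new_body
        return result
    return apply


master = {"utils": {"foo": "return x"}, "core": {}}
alice = move_function("foo", "core")               # Alice moves foo
bob = edit_function("foo", "return x if x else 0")  # Bob changes foo

# Because both changes refer to foo by identity, the merge order doesn't matter.
assert alice(bob(master)) == bob(alice(master))
```

The trick is that both changes address `foo` by name instead of by file and line, so neither invalidates the other's context.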

You can quickly come up with examples of changes which conflict in a way that should probably always require human intervention: say Alice and Bob each wish to assign a different value to the same constant.
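A small sketch of why that case is different, using a made-up `set_constant` operation and a naive `commute` check that only tests one starting state (so it's an illustration, not a proof of commutativity):

```python
def set_constant(name, value):
    """Hypothetical change: bind constant `name` to `value`."""
    def apply(env):
        return {**env, name: value}
    return apply


def commute(a, b, state):
    # Naive check: do the two changes give the same result in either order,
    # starting from this particular state?
    return a(b(state)) == b(a(state))


master = {"TIMEOUT": 30}
alice = set_constant("TIMEOUT", 60)
bob = set_constant("TIMEOUT", 10)
carol = set_constant("RETRIES", 3)

assert not commute(alice, bob, master)  # same constant, different values
assert commute(alice, carol, master)    # unrelated constants compose fine
```

When the order changes the outcome, no representation can pick a winner without knowing intent, so a human has to decide.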

So, I don't expect that you could completely remove the need for developers to manually resolve tricky conflicts. At least, not without completely changing how we express changes to programs, which may well be a non-starter for practical purposes.


There is a product called SemanticMerge that does this.


neat! thanks


I'm unfamiliar with Darcs, but thanks for calling it to my attention. Based on a quick look, it appears Darcs uses text diffs, so it's not quite what I'm talking about, but it's definitely interesting.



