Hacker News

Nice breakdown. I agree that data versioning is the one area with limited standardized options. I would add that, in addition to versioning the data, there is the related problem of integrating the three areas of versioning: tying the "data version" to the "model version" and the "code version". That seems to me like a good place to start in tackling data versioning, or is that too trivial? Is there a product out there that already does this?



Pachyderm, a project I work on, is probably as close as you'll find to something that ties all three together. In my mind the major unsolved problem here was data versioning, so that's the first thing we tackled. Code versioning is already quite well solved, so we just integrate with existing tools for that.

I'm not convinced that model versioning is actually distinct from data versioning; after all, models are just data. So without an established system for versioning models, the way Git + GitHub is for code, treating models as data and versioning them that way is good enough for government work.

From what I can tell, CometML isn't quite versioning models so much as tracking versions of models. It expects models to be stored and versioned elsewhere, but it gives you a way to get deeper insight into how those models are performing, how they're changing, the hyperparameters used to train them, etc. Tracking this is also a very important problem, and CometML seems to solve it quite elegantly.
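To make the "treat models as data" idea concrete, here's a minimal sketch of tying the three versions together in one manifest: the code version from the current git commit, and the data and model versions as content hashes of their files. This is a hypothetical illustration, not Pachyderm's or CometML's actual mechanism; the function and field names are made up.

```python
import hashlib
import json
import subprocess


def sha256_file(path):
    """Content-address a file: the hash serves as its version."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()


def make_manifest(data_path, model_path):
    """Tie the code, data, and model versions together in one record.

    Assumes this runs inside a git repository; the model is versioned
    exactly like the data, as a content hash of its serialized file.
    """
    commit = subprocess.check_output(
        ["git", "rev-parse", "HEAD"]
    ).decode().strip()
    return {
        "code": commit,
        "data": sha256_file(data_path),
        "model": sha256_file(model_path),
    }


if __name__ == "__main__":
    # Hypothetical paths; in practice these would be your dataset
    # snapshot and the serialized model artifact.
    manifest = make_manifest("train.csv", "model.pkl")
    print(json.dumps(manifest, indent=2))
```

Committing a manifest like this alongside each training run gives you a reproducible triple: check out the commit, fetch the data and model blobs by hash, and you can verify exactly what produced what.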





