CometML wants to do for machine learning what GitHub did for code (techcrunch.com)
200 points by randomerr on April 5, 2018 | 58 comments



Data science has 3 areas for "versioning":

1. code versioning
2. data versioning
3. model versioning

Code versioning is primarily dominated by GitHub and is a fairly saturated space (Bitbucket, GitLab). Data versioning is either not happening, or being done through regular data pulls, database snapshots, etc. It is not well standardized or adopted. CometML is tackling model versioning.

It would be really nice to have a single solution for all of these but that is unlikely. Hopefully new standards evolve from this.


Nice breakdown. I agree that data versioning is the one area with limited standardized options. I would add that in addition to versioning the data, there is also the related problem of integrating the 3 areas of versioning... tying the "data version" to the "model version" and the "code version". That seems to me like it might be a good place to start in tackling data versioning, or is that too trivial? Is there a product out there that already does this?


Pachyderm, a project I work on, is probably as close as you'll find to something that ties all 3 together. In my mind the major unsolved problem here was the data versioning, so that's the first thing we tackled. Code versioning is already quite well solved, so we just integrate with existing tools for that. I'm not convinced that model versioning is actually distinct from data versioning; models are just data, after all. So I think that without an established system for versioning models, the way Git + GitHub is for code, treating models as data and versioning them that way is good enough for government work.

From what I can tell, CometML isn't so much versioning models as tracking versions of models. It expects models to be stored and versioned elsewhere, but it gives you a way to get deeper insight into how those models are performing, how they're changing, the hyperparameters used to train them, etc. Tracking this is also a very important problem, and one that CometML seems to solve quite elegantly.


Interesting. Can you point me to a deeper discussion of this division of "versioning"?

I'm inclined to think something like Django data migrations or EntityFramework Code First Migrations tackles what I immediately thought of as "model versioning" and to some degree "data versioning" (though incomplete or probably impossible, for some things).


We actually do code and model versioning (and simple data versioning). One thing to keep in mind is that code/results/hyperparams must be coupled. If you have a git branch with some training code and you do not know what the hyperparams/results are then it's not very valuable.


Could you talk more about a simple data versioning architecture? I have wondered about this coupling problem and would love to hear more.


I was mostly referring to coupling code with results. For example, you have a script that loads a dataset from S3 and then trains a neural network. If you only use git, you're likely to lose the hyperparams (which are often passed as command line arguments) and your metrics/results.
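To make the failure mode concrete, here's a hypothetical minimal script (not Comet's code) that does the coupling by hand: the hyperparams only exist on the command line and the metric only in memory, so unless you explicitly write them out keyed to the commit, git alone never sees them.

    # train.py: a hypothetical sketch of DIY coupling of code/hyperparams/results,
    # writing them to a file keyed by the current git commit.
    import argparse, json, os, subprocess

    parser = argparse.ArgumentParser()
    parser.add_argument("--lr", type=float, default=0.01)
    parser.add_argument("--batch-size", type=int, default=32)
    args = parser.parse_args()

    # ... load the dataset from S3 and train the network here ...
    val_loss = 0.42  # placeholder for the real result

    commit = subprocess.check_output(["git", "rev-parse", "HEAD"]).decode().strip()
    os.makedirs("runs", exist_ok=True)
    with open(os.path.join("runs", f"{commit}.json"), "w") as f:
        json.dump({"hyperparams": vars(args), "val_loss": val_loss}, f, indent=2)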


This software performs model versioning. https://mitdbg.github.io/modeldb/


1. and 3. can be combined with git lfs


Can you talk about how CometML fits into the real world state of ML training tracking, which is a pretty terrible Wild West of non-reproducible practices and processes?

There have been articles and comments here on HN about the sorry state of ML trackability, with papers being published on models whose training is not reproducible because no one really knows how they were trained. One in particular (I apologize for not having retained the link) described researchers starting with partially trained models they had lying around (with undocumented and unknown prior training applied), manually changing hyperparameters mid-training while watching the learning progress, swapping different training sets in and out, and so on.

From what I see, the problems in ML reproducibility aren't in the code; they are in the external human processes that are used to drive and train the models (essentially bad DevOps practices more than bad dev practices). Do you help with these kinds of real-world trackability and reproducibility scenarios?


That's a great point! Machine learning reproducibility is a huge problem, both in academia and within companies. Comet.ml tracks every run of your script, the hyperparams (pulled automatically from the ML library when possible) and your results. So in the example you gave, we would be able to track the data scientist loading pre-trained weights, manually changing hyperparams, and the dataset's hash.

In such complex cases there's still a little discipline required on the user's part, but overall it just means including a few calls to our Experiment object.
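For context, a rough sketch of what those Experiment calls can look like (based on Comet's documented Python SDK; the method names and hyperparameter values here are illustrative and may differ across SDK versions):

    from comet_ml import Experiment

    experiment = Experiment(api_key="your-key", project_name="my-project")

    hyperparams = {"lr": 0.001, "batch_size": 64, "epochs": 10}
    experiment.log_parameters(hyperparams)  # ties the hyperparams to this run

    for epoch in range(hyperparams["epochs"]):
        # ... real training step goes here ...
        val_loss = 1.0 / (epoch + 1)  # placeholder metric
        experiment.log_metric("val_loss", val_loss, step=epoch)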


My sense is that CometML is tackling some but not all of the problems you're talking about here. Specifically, I think it's pretty focused on tracking ML models so you can keep track of how models are performing, how they're improving, etc. But I think, and the other comment seems to suggest, that external systems and practices will be required to make these results reproducible. Making ML experiments reproducible is a different and fairly challenging problem: you need a way to essentially snapshot everything that goes into training a model, including training data, code, tuning parameters, maybe even the specific hardware it needs to run on. As you mention, this is as much a DevOps problem as it is an ML problem, maybe even more so. I think you'd have good luck pairing CometML with a system like Pachyderm [0] (which, full disclosure, is a system I helped write). Pachyderm can handle the DevOps-y reproducibility piece and be paired with CometML to get insight into what's actually going on with the models.

[0] https://github.com/pachyderm


I'm no expert, but reading that it sounds like a bunch of people poking at knobs and dials in order to see which dials work and which don't, and then not understanding why...?


Their pricing is ridiculous. $50 per user per month for their cheapest plan that works with private spaces, $149 per user per month for self hosted.

If you're a company with 15 people working on a single model you're going to be paying more than a team with five people who have 20 different models they are working on. The actual load and cost for the service is completely detached from the price.

I was strongly considering looking at this for a project, but not at that cost.


Great to see ML "governance" work being done on the training part of the pipeline. Seems like this provides Domino Data Lab-style dashboards but without the walled-garden environment.

I've yet to see similarly great initiatives tackling the deployment part. E.g. something you can stick on top of your model's API (or scheduled batch predictive outputs), as well as incoming instances, to monitor usage patterns, population shifts over time, probability distributions, newly appearing missing values or categorical levels, logs, etc., in order to provide warning lights indicating that a retraining might be in order.

Google's "What's your ML test score" paper provides some great insights, but I hope someone will tackle this with a turnkey solution as well.


Thanks! We indeed solve a similar pain point as Domino, but unlike them we allow you to train your models on your own infra/laptop.

As for monitoring production models that's something we're also working on. It was important to get the training part out first so we can measure those distributions changing.


Hi, I’m one of the founders of Comet.ml. We built comet.ml to allow machine learning teams to automatically track their machine learning code, experiments, hyperparameters and results. We think that reproducibility is really important so we’re also giving free access to students, academics and open source projects.

Feedback is welcome. Ask me anything.


Are you planning to open source it?

A lot of your competitors have, like http://pipeline.ai/, https://github.com/pachyderm/pachyderm and recently https://github.com/polyaxon/polyaxon.


@ah- thanks for mentioning Polyaxon, and congrats to the CometML team for building this nice tool. It's good to see so many projects trying to solve problems related to reproducibility in ML/DL. Many people have had to build an internal tool at the companies they work for to solve this issue, and many got frustrated after joining a new team and not being able to reproduce any results.

I would like to outline a couple of differences between CometML and Polyaxon. As mentioned before, we are also trying to solve issues related to technical debt in ML, but not only that: Polyaxon also tries to simplify training and scheduling of parallel and distributed learning. There are a couple of other differences. I see CometML as a dashboard; Polyaxon does not have an extensive dashboard like CometML's, but it leverages Tensorboard for most of the visualisations, and we use the CLI or the API for programmatic access to the platform. Most importantly, Polyaxon aims to be open source and to be installed on premise or in the cloud. It solves the code-tracking issue with an internal git server and a Docker registry, and since someone else mentioned that the resources used to run an experiment could be an issue for future reproducibility, Polyaxon restarts experiments with the same resources and Dockerfiles. It also tracks hyperparams as part of the configuration.

For hyperparam tuning and suggestion, Polyaxon can also do hyperparam search based on a couple of algorithms, and the next release will also include a service similar to Vizier for suggesting more experiments/groups of experiments based on a given search space.

Disclaimer: I am the author of Polyaxon


Looks awesome. I'd love to give it a try with my team. Would you be open to extending the free teams plan to high tech nonprofits in addition to the access you provide to students, academics, and open source projects? We tend to work on distributed projects frequently with industry experts doing pro bono work for us, and something like comet could simplify our collaboration. The size of these pro bono project teams tend to ebb and flow, much like an open source project, so effective collaboration tools are critical for ramping up new folks as well as retaining learning when folks cycle off.


Sure. Shoot us an email and we'll get you started! [email protected]


Hi, looks great! Love the feature of tracking code changes. But how does this work? Does it upload the code to the servers every time I launch a training?


It depends on whether it's a git project, but pretty much yes.


So you would consider the latest committed version as the 'code of the current experiment'? Maybe to rephrase: I start a training with my local code, then I change one variable in the code or comment out some processing and start the next training. Would I need to do something for Comet to know how the code has changed?


No. You'll be able to see both runs with the code diffs.


Does "did what GitHub did for code" sound like a negative thing to anyone else?

It turned a decentralized platform (git) into basically the only place individuals store code.


Since a few people commented on the cost of using CometML (or its competitors), I wanted to suggest an open source project (that's been around for a while) I found helpful for organizing ML experiments and tracking. It has two different frontends to choose from (I like SacredBoard). If you like open source this might be the ML experiment tracker for you!

edit (forgot the link): https://github.com/IDSIA/sacred


Did you forget to suggest the project? Is it "sacred"?

https://github.com/IDSIA/sacred


+1 for sacred. It's changed my life.


Cool, thanks.

For those of you who want to tinker, there's a much rougher, open source library based on Vuejs, postgres, and Flask with some momentum on GitHub right now, LabNotebook https://github.com/henripal/labnotebook

(Disclaimer: I'm one of the authors)


I get so excited when I see ___ML but then I'm let down when I realize the context is Machine Learning instead of Meta Language. Not that Machine Learning isn't cool, but I really like ML family languages.


When I saw that name, the first thing I thought of was some new SML compiler, or something like MoscowML.


I have been anxiously waiting for something like this! Very excited to try it out.

How does it handle data storage? Could we use CometML to store our continuously growing set of labeled data or is there a smart way to link it to Google Cloud Platform?


We do not store your data as it's usually very sensitive. You can host your data on GCloud or AWS and use Comet.ml to track where it came from and whether it changed between experiments.


I would like to use it but I think the price doesn't justify the tool. For a team of 5 people GitHub is $25 a month; you are $745 a month. I do understand a price a bit higher than GitHub's, but not 30 times more expensive.


I like that the price is so high. That makes it at least seem to be a sustainable business. And when you pay your ML people 10k a month, the $25 is less than the coffee they will drink in the office.


145 per user per month, not 25. Not all countries pay 10k a month to their employees


Thanks @jorgemf. Keep in mind that $745 also includes unlimited usage of our hyper-parameter optimization service.


Is hyper-parameter optimization a cherry on top or one of the core value propositions?

IME hyper-parameter optimization doesn't require much in terms of implementation effort (e.g. [1]), but requires compute. I would be surprised if a professional ML/DS user were to seriously consider paying for the implementation of the optimization.

[1] https://people.eecs.berkeley.edu/~kjamieson/hyperband.html
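To illustrate the point that the implementation side is cheap, here's a toy random-search sketch (the train() function is a made-up stand-in; in practice each call is a full, expensive training run, which is where the compute cost lives):

    import random

    def train(lr, batch_size):
        # hypothetical stand-in for a full training run returning validation loss
        return (lr - 0.01) ** 2 + abs(batch_size - 64) / 1000.0

    space = {
        "lr": lambda: 10 ** random.uniform(-4, -1),
        "batch_size": lambda: random.choice([16, 32, 64, 128]),
    }

    # draw 50 random configs and keep the one with the lowest loss
    best = min(
        ({name: draw() for name, draw in space.items()} for _ in range(50)),
        key=lambda cfg: train(**cfg),
    )
    print("best config:", best)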


Are there any similarities or distinctions between this article (link below) and how your system works to tune hyper-parameters?

https://blog.coast.ai/lets-evolve-a-neural-network-with-a-ge...


This article seems to discuss genetic algorithms, which could be used for hyperparam optimization. We use another method called Bayesian (GP) hyperparam optimization. According to our internal benchmarks and academic research, Bayesian methods outperform genetic algorithms. Another thing to keep in mind is that we automate the entire process for you; you only need to provide a list of parameters you'd like to tune.
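For readers curious what GP-based hyperparam optimization looks like in practice, here's a rough sketch using the third-party scikit-optimize library; this is not Comet's implementation, and the objective is a made-up stand-in for a real validation run:

    from skopt import gp_minimize
    from skopt.space import Real, Integer

    def objective(params):
        lr, hidden_units = params
        # stand-in for training a model and returning its validation loss
        return (lr - 0.01) ** 2 + abs(hidden_units - 128) / 1000.0

    result = gp_minimize(
        objective,
        dimensions=[Real(1e-4, 1e-1, prior="log-uniform", name="lr"),
                    Integer(16, 256, name="hidden_units")],
        n_calls=25,       # each call fits a GP to past results and picks the next point
        random_state=0,
    )
    print(result.x, result.fun)  # best hyperparams and best loss found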


That service doesn't justify the price tag for me.


How do you handle dataset/checkpoint management and versioning? Ideally with powerful dataset filtering options. Right now we're using Excel and pandas dataframes, but we're interested if your tool does it well.


Since we do not host your data we cannot provide filtering on the actual dataset content. We do allow you to track where the data came from and whether it changed (by hash). Same for checkpoints: you can log their location (S3/local path) and hash.
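A sketch of what that hash-based tracking can look like on the user's side (the S3 path and local file are hypothetical, and the Comet method names are based on its documented SDK and may differ across versions):

    import hashlib
    from comet_ml import Experiment

    def file_sha256(path, chunk_size=1 << 20):
        # hash the dataset file in chunks so large files don't blow up memory
        h = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(chunk_size), b""):
                h.update(chunk)
        return h.hexdigest()

    experiment = Experiment(api_key="your-key", project_name="my-project")
    experiment.log_other("dataset_location", "s3://my-bucket/train.csv")    # hypothetical path
    experiment.log_other("dataset_sha256", file_sha256("data/train.csv"))   # local copy of the data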


CometML vs Tensorboard? Thanks in advance for your feedback.


Both CometML and Tensorboard help track metrics/weights during training in a similar way. CometML also tracks your hyperparams, code, dependencies. We also allow you to compare models, collaborate by sharing projects and experiments and much more. TB is an amazing tool but it doesn't help with reproducibility.

If you're already using Tensorboard, just throw in our one-liner, comet_ml.Experiment(api_key="your-key"), and you'll get everything TB gives you + our added value.
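Roughly what that looks like dropped into an existing Keras + TensorBoard script (a sketch; the automatic logging relies on importing comet_ml before the ML framework, as Comet's docs describe, so treat the details as an assumption):

    from comet_ml import Experiment  # import before keras so Comet can hook into training
    experiment = Experiment(api_key="your-key")

    import numpy as np
    from tensorflow import keras

    model = keras.Sequential([keras.layers.Dense(1, input_shape=(10,))])
    model.compile(optimizer="adam", loss="mse")

    x, y = np.random.rand(256, 10), np.random.rand(256, 1)
    model.fit(x, y, epochs=3,
              callbacks=[keras.callbacks.TensorBoard(log_dir="./tb_logs")])
    # TensorBoard still gets its logs; Comet additionally records metrics, hyperparams and code.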


That's cool! I also wrote my own service https://losswise.com


Cool. I'm guessing from trying it out that you're using Highcharts. I've run into really unpleasant memory leaks/slowness when streaming data (especially when the existing chart already has thousands of data points). Are you seeing something similar?


Yes using Highcharts. You've had issues with Highcharts? Yeah it's not designed to stream data extremely rapidly but it's a great "good enough" product, especially for something like Losswise where the differentiation is the overall design and architecture and developer experience, not the prettiest possible graphs.


Yes... For example, if I'm running three experiments at the same time, auto-refreshing the chart every two seconds, it essentially slows the app to a crawl after a thousand points or so. So we reverted to manual updates.

If you know of any better alternatives for data streaming, I'm curious. I tried benchmarking a couple libs recently: https://github.com/henripal/ChartingLibBenchmark


As a Highcharts developer, I had a look at your benchmarking, and have some thoughts about optimizing for Highcharts. The first step is to turn off animation, which helps a lot. The default Highcharts animation on addPoint is 250ms, so with a refresh rate of 100ms you will get a lot of redrawing going on for nothing. The second thing that possibly optimizes a bit is to use hard-coded axis values so that it doesn't have to recompute axis values for each iteration.

With those modifications the performance is much better: http://jsfiddle.net/highcharts/1o5ghqc8/


Yeah I have no idea. If you need really high performance that Highcharts doesn't provide you probably need to write your own specialized charting library.


Do you have any information on pricing?


Current feature set is completely free as the service is in early beta stage. We plan to charge later for additional features and services. The project started as an internal project for Mathpix.com and therefore already pays for itself.


Great idea if AWS had not already solved this with the release of SageMaker.


SageMaker is awesome but it's a very different product. There is nothing in SageMaker that lets you track performance over time and no fancy dashboard. That being said SageMaker's pricing is actually reasonable, and adding a dashboard on top of it shouldn't be that complicated.


Sorry if this is off topic but does anyone else just instantly close the page or spew profanity when those stupid chat bubbles pop up on a page? No I don't want to talk to your ridiculous sales chat.


Is there an explore option like on github?



