Right now it runs in dev mode on a single EC2 t3.large instance with a load average of ~0.30, but the inference load is quite tiny at the moment: around 3-4 reranking requests per second. And yes, as a typical open-source project it still crashes from time to time :)
There is an option to run this thing in a distributed mode:
* training is done in a separate batch job running on Apache Flink (and on k8s, using Flink's Kubernetes integration)
* feature updates are done in a separate streaming Flink job, writing everything to Redis
* the API fetches the latest feature values from Redis and runs the ML model.
The dev mode I mentioned earlier is when all three of these components are bundled together in a single process to make it easier to play with the tool. But we haven't spent much time testing the distributed setup, as this is still a hobby side-project and we're limited in the time we can spend developing it.
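The serving path described above can be sketched roughly as follows. This is an illustrative mock only, not Metarank's actual API: a plain dict stands in for the Redis feature store, and `score` is a placeholder linear model in place of the real trained ranker.

```python
# Sketch of the serving path: the API looks up the latest per-item feature
# values (Redis in distributed mode; a dict stands in for it here) and
# scores candidate items with the trained model. All names are hypothetical.

from typing import Dict, List

# Stand-in for the Redis feature store: item id -> latest feature vector.
feature_store: Dict[str, List[float]] = {
    "item_a": [0.9, 0.1],
    "item_b": [0.4, 0.7],
}

def score(features: List[float]) -> float:
    # Placeholder for the real ML model: a fixed linear scorer.
    weights = [0.8, 0.2]
    return sum(w * f for w, f in zip(weights, features))

def rerank(candidates: List[str]) -> List[str]:
    # Fetch the latest features for each candidate, score, sort descending.
    scored = [(c, score(feature_store.get(c, [0.0, 0.0]))) for c in candidates]
    return [c for c, _ in sorted(scored, key=lambda x: x[1], reverse=True)]

print(rerank(["item_b", "item_a"]))  # -> ['item_a', 'item_b']
```

In dev mode the feature store and the model live in the same process as this handler; in distributed mode the dict is replaced by reads from Redis, which the streaming Flink job keeps up to date.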
From reading some of the repository and the architecture overview, I think this is true, but could you confirm that users of Metarank can train their own models from scratch?
What approximate cloud infrastructure budget would you expect for an ecommerce website with 100K buyers per month and typical purchase habits? I am new to Flink. We use Redis in production.
The training dataset is not that huge (see https://github.com/metarank/ranklens/ for details, it's open-source), so we do a full retraining directly on the node right after the deployment, and it takes around 1 minute to finish. We also run the same process inside the CI: https://github.com/metarank/metarank/blob/master/run_e2e.sh