As I'm working on a very similar problem right now, the difficulty is that to save a fitted sklearn model you have to pickle it (a pickled random forest of decent size is several megabytes). Then, at classification time, you have to import pickle, sklearn (and numpy), unpickle the object, run the example through the classifier, and extract the output. Perhaps the Openscoring model is more efficient?
You can use `all_model_filenames = joblib.dump(model, filename)` after fitting in your dev environment. joblib will store each numpy array in the model's data structure as an independent file, and `all_model_filenames[0] == filename` refers to the file holding the main pickle structure.
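A minimal sketch of the dump step, assuming a random forest; the training data and the `model.joblib` filename are just placeholders:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification
import joblib

# Hypothetical training data; substitute your own.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X, y)

# joblib.dump returns the list of files it wrote. Depending on the
# joblib version, the numpy arrays inside the model may be written as
# separate sidecar files next to the main pickle or bundled into one file.
all_model_filenames = joblib.dump(model, 'model.joblib')
assert all_model_filenames[0] == 'model.joblib'
```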
Then, on your prediction servers, ensure that you have a copy of all the files in `all_model_filenames` in the same folder. You can then load the model with `model = joblib.load(all_model_filenames[0], mmap_mode='r')`. This makes it possible to use shared memory (memory mapping) for the parameters of a large random forest, so that all the Gunicorn, Celery or Storm worker processes running on the same server use the same memory pages. That makes it a very efficient way to deploy large models on RAM-constrained servers.
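On the prediction side, the load step might look like the sketch below; `model.joblib` again assumes the filename used above, and the feature count is a placeholder:

```python
import joblib

# mmap_mode='r' memory-maps the numpy arrays instead of copying them into
# each process's heap, so concurrent worker processes on the same machine
# share the same read-only memory pages.
model = joblib.load('model.joblib', mmap_mode='r')

# Hypothetical single example with the same number of features as training.
prediction = model.predict([[0.0] * 20])
```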
You can even use Docker to ship the model as part of a container image and treat the model as binary software configuration.
As I said, run a separate service for this. That way you only have to load the model (or even train it) once per service process. That is one thing the Openscoring service also does...
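As an illustration only (the framework choice is my assumption, not part of the answer), a small Flask service that loads the model once per process might look like this; running it under Gunicorn with `mmap_mode='r'` lets the workers share the mapped parameters:

```python
import joblib
from flask import Flask, jsonify, request

app = Flask(__name__)

# Loaded once per worker process, at import time; with mmap_mode='r'
# the workers on one machine share the underlying memory pages.
model = joblib.load('model.joblib', mmap_mode='r')

@app.route('/predict', methods=['POST'])
def predict():
    # Expects a JSON body like {"features": [[...], [...]]}.
    features = request.get_json()['features']
    return jsonify(predictions=model.predict(features).tolist())

if __name__ == '__main__':
    app.run()
```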
If, like me, you are more familiar with Python than Java, then that would be the more attractive option.