Ask HN: Any good self-hosted image recognition software?

yeldarb · on Sept 22, 2022

Roboflow does! You can train your own model or choose from over 10,000 models others have trained & shared[1].

To self-host, just run:

npx @roboflow/inference-server

Then you can POST an image to any of the models at localhost:9001, e.g.:

base64 yourImage.jpg | curl -d @- "http://localhost:9001/some-model/1?api_key=xxxx"

And you get back JSON predictions.
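For calling the local server from Python instead of the shell, here is a minimal standard-library sketch. It assumes the server accepts the same request the curl command sends (a base64 body posted with curl's default form content type). `some-model/1` and `xxxx` are the placeholders from the example above, and the shape of the returned JSON is not assumed.

```python
import base64
import json
import urllib.request

# Placeholders copied from the curl example; substitute your own model and key.
SERVER = "http://localhost:9001"
MODEL = "some-model/1"
API_KEY = "xxxx"


def build_request(image_bytes: bytes) -> urllib.request.Request:
    """Build the POST that `base64 yourImage.jpg | curl -d @- ...` sends."""
    # b64encode emits one unbroken line, matching curl -d @- (which strips newlines).
    body = base64.b64encode(image_bytes)
    url = f"{SERVER}/{MODEL}?api_key={API_KEY}"
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        # curl -d sends this content type by default.
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


def predict(path: str) -> dict:
    """Send an image file to the local server and decode the JSON reply."""
    with open(path, "rb") as f:
        req = build_request(f.read())
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```

`build_request` is kept separate from `predict` so the request can be inspected without a running server; `predict("yourImage.jpg")` does the actual round trip.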

There are also client libs[2] and sample code[3] for pretty much any language you might want to use it in. You can also run any of the models directly in a browser with WebGL[4], in a native mobile app[5], or on an edge device[6].

[1] https://universe.roboflow.com

[2] https://github.com/roboflow-ai/roboflow-python

[3] https://github.com/roboflow-ai/roboflow-api-snippets

[4] https://docs.roboflow.com/inference/web-browser

[5] https://docs.roboflow.com/inference/mobile-ios-on-device

[6] e.g. https://docs.roboflow.com/inference/luxonis-oak

throwaway675309 · on Sept 24, 2022

I see references to API keys, so I'm a little confused when you say that it's self-hosted. If it makes calls to your servers, then it's not really self-hosted. Their pricing page seems to indicate that this is definitely not something you can run in a self-contained manner, but I would love to be proven wrong.

I also really think that you should've at least put a disclaimer that you're affiliated with them.

EDIT: All the deployment options (post model training) seem to indicate that there is pricing involved.

giantg2 · on Sept 22, 2022

Wow, thanks for all the info!

fhaltmayer · on Sept 22, 2022

Usually this is done in three steps: first, use a neural network to draw a bounding box around the object; second, generate a vector embedding of the object; third, run a similarity search over those embeddings.

The first step is accomplished by training a detection model to generate the bounding box around your object. This can usually be done by fine-tuning an already trained detection model. For this step, the data you need is all the images of the object you have, each with a bounding box drawn around the object; the version of the object doesn't matter here.
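The comment doesn't name a label format, but as one concrete illustration, assuming YOLO-style text labels (one line per box: class index, then box center x, center y, width and height, each normalized by the image size), a pixel bounding box could be converted as below. With a single object class, the class index is simply 0. `to_yolo_line` is a hypothetical helper name.

```python
from __future__ import annotations


def to_yolo_line(class_id: int, box: tuple[float, float, float, float],
                 img_w: int, img_h: int) -> str:
    """Convert a pixel box (x_min, y_min, x_max, y_max) to a YOLO-style label line.

    The line holds: class x_center y_center width height,
    with the four coordinates normalized to the 0..1 range.
    """
    x_min, y_min, x_max, y_max = box
    if not (0 <= x_min < x_max <= img_w and 0 <= y_min < y_max <= img_h):
        raise ValueError(f"box {box} does not fit inside a {img_w}x{img_h} image")
    x_c = (x_min + x_max) / 2 / img_w
    y_c = (y_min + y_max) / 2 / img_h
    w = (x_max - x_min) / img_w
    h = (y_max - y_min) / img_h
    return f"{class_id} {x_c:.6f} {y_c:.6f} {w:.6f} {h:.6f}"
```

For example, `to_yolo_line(0, (100, 50, 300, 250), 400, 300)` gives `"0 0.500000 0.500000 0.500000 0.666667"`.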

The second step involves a general-purpose image classification model that's been pretrained on broad data (VGG, etc.) and a vector search engine/vector database. You would start by using the image classification model to generate vector embeddings (https://frankzliu.com/blog/understanding-neural-network-embe...) of all the different versions of the object. The more ground-truth images you have, the better, but you don't need as many as you would to train a classifier. Once you have your versions of the object as embeddings, you store them in a vector database (for example Milvus: https://github.com/milvus-io/milvus).
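To make the storage and lookup step concrete, here is a tiny in-memory stand-in for a vector database such as Milvus; it is not Milvus's API. It assumes the embeddings were already produced by a pretrained model and uses brute-force cosine similarity. Several reference images of the same version can be stored under the same label. `TinyVectorIndex` is a hypothetical name.

```python
import math


class TinyVectorIndex:
    """In-memory stand-in for a vector database, for illustration only.

    Stores (label, embedding) pairs and does brute-force cosine-similarity
    search. A real vector database adds persistence and approximate
    nearest-neighbor indexes so lookups stay fast on large collections.
    """

    def __init__(self) -> None:
        self._items: list[tuple[str, list[float]]] = []

    @staticmethod
    def _normalize(vec: list[float]) -> list[float]:
        norm = math.sqrt(sum(x * x for x in vec))
        if norm == 0:
            raise ValueError("cannot index or query a zero vector")
        return [x / norm for x in vec]

    def add(self, label: str, embedding: list[float]) -> None:
        # Normalizing at insert time turns cosine similarity into a dot product.
        self._items.append((label, self._normalize(embedding)))

    def search(self, query: list[float], k: int = 1) -> list[tuple[str, float]]:
        """Return the k most similar (label, cosine_similarity) pairs, best first."""
        q = self._normalize(query)
        scored = [(label, sum(a * b for a, b in zip(q, vec)))
                  for label, vec in self._items]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:k]


if __name__ == "__main__":
    # Toy 3-dimensional vectors; real embeddings have hundreds or thousands of dimensions.
    index = TinyVectorIndex()
    index.add("version_a", [0.9, 0.1, 0.0])
    index.add("version_b", [0.1, 0.9, 0.2])
    print(index.search([0.8, 0.2, 0.1])[0][0])  # version_a
```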

Now, whenever you want to identify the object in an image, you run the image through the detection model to find the object, then run the cropped-out image of the object through the vector embedding model. With this embedding you can then search the vector database, and the closest results will most likely be the version of the object.
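The query flow just described can be sketched as a small function. The detector, embedding model and vector search are passed in as callables, since the comment leaves those choices open; `identify`, `crop` and the list-of-pixels image type are hypothetical names for illustration. With NumPy image arrays the crop would simply be `image[y_min:y_max, x_min:x_max]`.

```python
from typing import Callable, Optional, Sequence

Image = list[list[tuple[int, int, int]]]  # rows of RGB pixels, illustration only
Box = tuple[int, int, int, int]           # (x_min, y_min, x_max, y_max) in pixels


def crop(image: Image, box: Box) -> Image:
    """Slice the detected object out of the image, clamping the box to its bounds."""
    h, w = len(image), len(image[0])
    x_min, y_min, x_max, y_max = box
    x_min, x_max = max(0, x_min), min(w, x_max)
    y_min, y_max = max(0, y_min), min(h, y_max)
    return [row[x_min:x_max] for row in image[y_min:y_max]]


def identify(
    image: Image,
    detect: Callable[[Image], Sequence[Box]],       # step 1: detection model
    embed: Callable[[Image], list[float]],          # step 2: embedding model
    search: Callable[[list[float]], Optional[str]], # step 3: vector DB lookup
) -> list[tuple[Box, Optional[str]]]:
    """Detect -> crop -> embed -> nearest-neighbor lookup, once per detected box."""
    results = []
    for box in detect(image):
        patch = crop(image, box)
        if not patch or not patch[0]:
            continue  # box fell entirely outside the image
        results.append((box, search(embed(patch))))
    return results
```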

Hopefully this helps as a general rundown of what it would look like. Here is an example using Milvus and Towhee: https://github.com/towhee-io/examples/tree/3a2207d67b10a246f....

Disclaimer: I am a part of those two open source projects.

giantg2 · on Sept 23, 2022

Thanks!

HanClinto · on Sept 22, 2022

If you can do it without machine learning, then OpenCV and Python is the easiest way to go. Example of finding an object in a known reference image ("needle") in another target image ("haystack") with OpenCV: https://docs.opencv.org/3.4/d1/de0/tutorial_py_feature_homog...
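The linked tutorial matches keypoint descriptors between the needle and haystack images and keeps only matches that pass Lowe's ratio test before estimating a homography with `cv.findHomography`. OpenCV does the matching for you; the sketch below only spells out the ratio test in plain Python on toy descriptors, so the brute-force search and the 0.7 default are illustrative. `ratio_test_matches` is a hypothetical name.

```python
import math


def ratio_test_matches(query: list[list[float]], train: list[list[float]],
                       ratio: float = 0.7) -> list[tuple[int, int]]:
    """Lowe's ratio test: keep a match only if the best candidate is clearly
    closer than the second best. Returns (query_index, train_index) pairs."""
    if len(train) < 2:
        raise ValueError("need at least two train descriptors for a ratio test")
    matches = []
    for qi, q in enumerate(query):
        # Euclidean distance from this query descriptor to every train descriptor.
        dists = sorted((math.dist(q, t), ti) for ti, t in enumerate(train))
        (best, best_idx), (second, _) = dists[0], dists[1]
        if best < ratio * second:
            matches.append((qi, best_idx))
    return matches
```

An ambiguous descriptor, one roughly equidistant from several candidates, is dropped, which is what keeps the later homography fit from being thrown off by bad matches.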

If you need something a little more complex, or something that can recognize a wider variety of object variations within a single class, then you might like to experiment with something like Teachable Machine to see how you can train your own machine learning model. You can then export this trained model, download it, and run it locally with something like Python or JavaScript on your own computer: https://teachablemachine.withgoogle.com/

Use that site to capture example images of each class of object from your webcam and see if this tool can work for you.

giantg2 · on Sept 22, 2022

Thanks! I'll look into these.

csteubs · on Sept 22, 2022

Liner.ai and Lobe.ai are both solid GUI-based platforms that run well on desktops. Your hardware will of course affect training speed, but it still runs well (albeit a bit slowly) on a 2014-era HP tower I use for Windows. Heads up: Lobe is a Microsoft project and doesn't support the newer M-series Apple chips.

danbrooks · on Sept 22, 2022

Sounds like an "object detection" task. There are various Python libraries (e.g. PyTorch, TensorFlow) that handle this sort of thing. You might need to create a training set.

giantg2 · on Sept 22, 2022

There's a bit more to it than object detection, since I also want individual identification (telling one object apart from others of the same type).

danbrooks · on Sept 23, 2022

Two ways to do that:
1. Separate classes for each type.
2. Object detection, then classification. This is commonly used for street sign identification.

astrange · on Sept 22, 2022

The keyword for that is instance or panoptic segmentation.

fiat_fandango · on Sept 22, 2022

I've been looking for tools that make it possible to overlay graphics onto someone's face in real time. Basically Facebook's AR suite, but something I can a) use at 1080p and b) actually record from or pipe into OBS.

giantg2 · on Sept 23, 2022

Thanks!

oth001 · on Sept 22, 2022

YOLOv5?

giantg2 · on Sept 22, 2022

Thanks!

Kalanos · on Sept 22, 2022

TensorFlow is just a downloadable Python library.

giantg2 · on Sept 22, 2022

I guess I'm confusing this with a Google Cloud service. I'll look into this further. Thanks!

NortySpock · on Sept 22, 2022

OpenCV and Python?

giantg2 · on Sept 22, 2022

Thanks!