goodmattg's comments

I love this! I looked into doing a similar project; you're competing against Hudl but using the phone instead of custom hardware (always preferred). Highlight segmentation may be a challenge even with SOTA CV methods, but there are a lot of directions you can go in.


Yeah, Hudl is definitely a beast in the space, though I think of them as more focused on orgs and teams than on consumers.

Re: highlight segmentation, it was and still is a challenge that we work on. In the beginning we had a hard time dealing with false positives, where our models thought a shot was in but it wasn't. This has gotten better over time with more data and is hovering near 92% accuracy these days, but obviously not 100%. I'm optimistic that as data grows and SOTA methods get better this will keep improving; it isn't the hardest ML problem in the world.
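
For anyone curious what "dealing with false positives" usually boils down to: the main lever is the classifier's decision threshold. A toy sketch with made-up probabilities and labels (not the product's model or data):

    import numpy as np
    from sklearn.metrics import precision_score, recall_score

    # Hypothetical per-clip "shot made" probabilities from a classifier,
    # plus ground-truth labels; purely illustrative numbers.
    probs  = np.array([0.95, 0.80, 0.62, 0.55, 0.30, 0.10])
    labels = np.array([1,    1,    0,    1,    0,    0])

    for threshold in (0.5, 0.7, 0.9):
        preds = (probs >= threshold).astype(int)
        # Raising the threshold trades missed makes (recall) for fewer
        # false positives (precision), i.e. fewer "phantom" highlights.
        print(threshold,
              precision_score(labels, preds),
              recall_score(labels, preds))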


A few thoughts as I've done academic research & built products in this area:

- if you're using SMPL body parameters this will have to stay research / open-source

- is this leveraging some sort of monocular depth estimation to estimate the wall in 3D space? Also, do you have assumed camera parameters, or is that also estimated? If there isn't any depth information, this will be highly inaccurate on any cliff routes, but still useful on flat wall climbing.

Overall, a good idea (one I've also thought about building as a climber). The tricky part that I'm impressed you have a solution to is path planning up the wall. Even assuming a flat wall with no depth estimation, it still looks effective.


Yes, this will be open source.

This is an end-to-end system that just takes in video frames. Camera parameters are one of the things that are predicted. It gives promising results for a wide variety of environments (cliffs, different types of bouldering walls, different outdoor walls, etc.), though it's not always accurate. Path planning is also part of the end-to-end system. Will share more details in the paper.


Sweet, can't wait to read it! I'll also be at CVPR this year if you're presenting.


> the tricky part that I'm impressed you have a solution to is path planning up the wall.

I'm assuming this is evolutionary / brute force of some nature, given OP's comments about it being expensive to run.
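
If it is brute force, it could be as simple as enumerating every reachable hold-to-hold sequence and picking the best one. A toy sketch with made-up hold positions and reach limit (no relation to OP's actual system):

    # Hypothetical hold positions (x, y) in meters, bottom to top of the wall.
    holds = [(0.2, 0.1), (0.5, 0.6), (0.3, 1.1), (0.7, 1.5), (0.4, 2.0)]
    MAX_REACH = 0.8  # assumed maximum move distance between holds

    def reachable(a, b):
        return ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5 <= MAX_REACH

    def all_paths(current, goal, path):
        """Enumerate every upward hold-to-hold path to the top (exponential, hence slow)."""
        if current == goal:
            yield path
            return
        for h in holds:
            if h not in path and h[1] > current[1] and reachable(current, h):
                yield from all_paths(h, goal, path + [h])

    paths = list(all_paths(holds[0], holds[-1], [holds[0]]))
    best = min(paths, key=len)  # e.g. pick the path with the fewest moves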


Not OP, but why does it have to be open sourced? Copyleft license?


Commercial license - the research group formed a corporate entity that licenses the body model and all derived work (SMPL-X, etc.): https://meshcapade.com/SMPL


Sell enough units to convince Khosla Ventures that you aren't just burning their money, forcing them to double down on the vision and bring in new investors for the next round. Rinse and repeat until the Large Action Model is either extraordinarily valuable or you've torched all the money.


It's true, the lack of readily available public transportation between the cities puts us behind other regions (not to mention other countries). I've been doing the ATX -> HTX drive for years.

Best private alternative I've found is Vonlane (https://vonlane.com/) - takes longer than flying but it's a business class bus so you can get work done.


This is a stellar project


Joining the nuance parade - Google expanded their entry points to Search years ago via Maps and the URL bar.

Do I use traditional Google Search (google.com) to find things? Rarely. Do I use Google Maps to find bars / restaurants / order food? All the time.

I guess the nuance is what we're trying to find. LLMs swallowed the listicle; I don't need google.com to find an autogenerated list of "best restaurants in London". But if I'm in London at lunch and I need a coffee nearby, it's still useful.


I also purchased and read this. If you enjoy the minutiae of scientific advances you'll love this, but it's not a page-turner. It's also a helpful scientific / historical primer for anyone who wants to understand graphics (pixels, shaders, etc.)


This is great, but how is your performance on multi-table joins? We've been working on a homebrew solution using OpenAI internally conditioned on our schema, and can't bridge the multi-table gap.

Also, I'm skeptical that this generalizes; do you have measures in place to prevent query hallucination?


On simpler multi-table joins we've been able to produce good results, and we've done a lot of prompt engineering to make sure it takes the schema very seriously, which prevents hallucinations too. We're always finding new edge cases and fixing those as we go.
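
Not their implementation, but a minimal sketch of what schema-conditioned SQL generation can look like; the schema, model name, and refusal convention below are all assumptions:

    from openai import OpenAI

    # Hypothetical two-table schema; the real product's schema handling isn't public.
    SCHEMA = """
    CREATE TABLE users (id INT PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INT PRIMARY KEY, user_id INT REFERENCES users(id), total NUMERIC);
    """

    SYSTEM_PROMPT = (
        "You are a SQL assistant. Answer ONLY with a single SQL query. "
        "Use ONLY the tables and columns in this schema; if the question "
        "cannot be answered from the schema, reply with 'CANNOT ANSWER'.\n"
        + SCHEMA
    )

    def question_to_sql(question: str) -> str:
        client = OpenAI()  # reads OPENAI_API_KEY from the environment
        resp = client.chat.completions.create(
            model="gpt-4o-mini",      # illustrative model choice
            temperature=0,            # deterministic output reduces made-up columns
            messages=[
                {"role": "system", "content": SYSTEM_PROMPT},
                {"role": "user", "content": question},
            ],
        )
        return resp.choices[0].message.content.strip()

    print(question_to_sql("Total order value per user name?"))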


I understand this is point-cloud diffusion, but Ajay Jain et al. (BAIR + Google Research) accomplished the first version I saw of this back in June with their Dream Fields paper (CVPR'22) [1].

As always, this goes to show that if you can't be the first, be the loudest. OpenAI has the most well-oiled media machine I've seen in a while.

Seeing the waves of publicity OpenAI gets with every new release, I think we're seeing a new model for big-tech AI research groups. It isn't enough to just hire world-class research talent that publishes area-defining papers. There has to be a commensurate investment in media to publicize the research. Obviously, if you don't have the research, you have nothing to market. But it should say something that OpenAI prioritizes great design, communication, and publicity in addition to the world-class research team. It wouldn't surprise me if we see Google AI / DeepMind / FAIR double down with their own investments to expand the media presence of their AI orgs.

[1] https://ajayj.com/dreamfields


It uses coordinate-based neural networks to model the scene volumetrically. However, in the case of this paper, the scene is not represented with an MLP; instead, it directly learns a voxel grid representation.

For an excellent review check out Advances in Neural Rendering: https://arxiv.org/abs/2111.05849
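
For intuition, the difference is roughly an MLP lookup versus a trilinearly-interpolated lookup into a learned dense grid. A toy sketch (not the paper's code; channel layout and resolution are assumptions):

    import torch
    import torch.nn.functional as F

    class VoxelGridField(torch.nn.Module):
        """Toy direct voxel-grid scene representation (density + RGB).

        Illustrative only; real methods add spherical harmonics,
        coarse-to-fine grids, and regularization.
        """
        def __init__(self, resolution: int = 128):
            super().__init__()
            # 4 channels per voxel: density + RGB, learned directly (no MLP).
            self.grid = torch.nn.Parameter(
                torch.zeros(1, 4, resolution, resolution, resolution)
            )

        def forward(self, xyz: torch.Tensor) -> torch.Tensor:
            # xyz: (N, 3) points in [-1, 1]^3.
            # grid_sample expects coords shaped (1, N, 1, 1, 3).
            coords = xyz.view(1, -1, 1, 1, 3)
            feats = F.grid_sample(self.grid, coords, align_corners=True)
            return feats.view(4, -1).t()  # (N, 4): sigma + rgb per point

    field = VoxelGridField()
    samples = torch.rand(1024, 3) * 2 - 1   # random points along rays
    sigma_rgb = field(samples)              # differentiable trilinear lookup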


> learn a voxel grid representation

But isn't that what photogrammetry does?


I think photogrammetry produces point clouds


Yes, and then polygonal models (and other things) are built from those.

For anyone who wants a more technical dive into the photogrammetry pipeline, here's a video I made for a company called Mapware for NVIDIA GTC 21: https://youtu.be/ktDVWzshR4w?t=331


Some techniques for downsampling point clouds use voxel grid representations, but in general you're mapping pixel data from varied images to each other in space and producing points from that to try to capture surface geometry.
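
A bare-bones version of that kind of voxel-grid downsampling, for reference (libraries like Open3D ship a tuned voxel_down_sample):

    import numpy as np

    def voxel_downsample(points: np.ndarray, voxel_size: float) -> np.ndarray:
        """Collapse a point cloud to one (averaged) point per occupied voxel.

        points: (N, 3) array of XYZ coordinates. Minimal sketch only.
        """
        # Integer voxel index for each point.
        idx = np.floor(points / voxel_size).astype(np.int64)
        # Group points that fall in the same voxel.
        _, inverse, counts = np.unique(idx, axis=0, return_inverse=True,
                                       return_counts=True)
        # Average the members of each voxel.
        sums = np.zeros((counts.size, 3))
        np.add.at(sums, inverse, points)
        return sums / counts[:, None]

    cloud = np.random.rand(100_000, 3)             # dummy dense point cloud
    sparse = voxel_downsample(cloud, voxel_size=0.05)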


Typically it creates polygonal models with the photos used to directly texture them.


So basically Agisoft Photoscan, photogrammetry software based on casting rays through a voxel grid?


That's not how Photoscan works.


But it does? Agisoft will first estimate depth maps and then project them into a voxel volume for extracting the high-resolution mesh. Debug logging even lists the voxel grid dimensions.
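
For reference, "project depth maps into a voxel volume" is essentially textbook back-projection plus accumulation. A generic sketch of that step, not Agisoft's actual pipeline (which also fuses signed distances, filters, and meshes):

    import numpy as np

    def fuse_depth_map(volume, depth, K, cam_to_world, voxel_size, origin):
        """Mark voxels hit by back-projected depth pixels (hit counts)."""
        h, w = depth.shape
        u, v = np.meshgrid(np.arange(w), np.arange(h))
        z = depth.ravel()
        valid = z > 0
        # Back-project pixels to camera space using intrinsics K.
        x = (u.ravel() - K[0, 2]) * z / K[0, 0]
        y = (v.ravel() - K[1, 2]) * z / K[1, 1]
        pts_cam = np.stack([x, y, z, np.ones_like(z)], axis=1)[valid]
        # Transform to world space with the 4x4 camera pose.
        pts_world = (cam_to_world @ pts_cam.T).T[:, :3]
        # Convert world points to voxel indices and accumulate hits.
        idx = np.floor((pts_world - origin) / voxel_size).astype(int)
        inside = np.all((idx >= 0) & (idx < np.array(volume.shape)), axis=1)
        np.add.at(volume, tuple(idx[inside].T), 1)
        return volume

    volume = np.zeros((256, 256, 256), dtype=np.int32)  # hit counts per voxel
    # Call fuse_depth_map once per image, then extract a surface
    # (e.g. marching cubes) from the fused volume.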

