goodmattg's comments

I love this! I looked into doing a similar project; you're competing against Hudl but using the phone instead of custom hardware (always preferred). Highlight segmentation may be a challenge even with SOTA CV methods, but there are a lot of directions you can go in.


Yeah, Hudl is definitely a beast in the space, though I think of them as more focused on orgs and teams than on consumers.

Re: highlight segmentation, it was and still is a challenge that we work on. In the beginning we had a hard time dealing with false positives, where our models thought a shot was in but it wasn't. This has gotten better over time with more data and is hovering near 92% accuracy these days, but obviously not 100%. I'm optimistic that as data grows and SOTA methods get better this will keep improving; it isn't the hardest ML problem in the world.
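
For anyone curious what "dealing with false positives" usually boils down to: the main lever is the classifier's decision threshold. A toy sketch with made-up probabilities and labels (not the product's model or data):

    import numpy as np
    from sklearn.metrics import precision_score, recall_score

    # Hypothetical per-clip "shot made" probabilities from a classifier,
    # plus ground-truth labels; purely illustrative numbers.
    probs  = np.array([0.95, 0.80, 0.62, 0.55, 0.30, 0.10])
    labels = np.array([1,    1,    0,    1,    0,    0])

    for threshold in (0.5, 0.7, 0.9):
        preds = (probs >= threshold).astype(int)
        # Raising the threshold trades missed makes (recall) for fewer
        # false positives (precision), i.e. fewer "phantom" highlights.
        print(threshold,
              precision_score(labels, preds),
              recall_score(labels, preds))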


A few thoughts as I've done academic research & built products in this area:

- if you're using SMPL body parameters this will have to stay research / open-source

- is this leveraging some sort of monocular depth estimation to estimate the wall in 3D space? Also, do you have assumed camera parameters, or is that also estimated? If there isn't any depth information, this will be highly inaccurate on any cliff routes, but still useful on flat wall climbing.

Overall, a good idea (one I've also thought about building as a climber). The tricky part that I'm impressed you have a solution to is path planning up the wall. Even assuming a flat wall with no depth estimation, it still looks effective.


Yes, this will be open source.

This is an end-to-end system that just takes in video frames. Camera parameters are one of the things that are predicted. It gives promising results for a wide variety of environments (cliffs, different types of bouldering walls, different outdoor walls, etc.), though it's not always accurate. Path planning is also part of the end-to-end system. Will share more details in the paper.


Sweet, can't wait to read it! I'll also be at CVPR this year if you're presenting.


> the tricky part that I'm impressed you have a solution to is path planning up the wall.

I'm assuming this is evolutionary / brute force of some nature, given OP's comments about it being expensive to run.
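
If it is brute force, it could be as simple as enumerating every reachable hold-to-hold sequence and picking the best one. A toy sketch with made-up hold positions and reach limit (no relation to OP's actual system):

    # Hypothetical hold positions (x, y) in meters, bottom to top of the wall.
    holds = [(0.2, 0.1), (0.5, 0.6), (0.3, 1.1), (0.7, 1.5), (0.4, 2.0)]
    MAX_REACH = 0.8  # assumed maximum move distance between holds

    def reachable(a, b):
        return ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5 <= MAX_REACH

    def all_paths(current, goal, path):
        """Enumerate every upward hold-to-hold path to the top (exponential, hence slow)."""
        if current == goal:
            yield path
            return
        for h in holds:
            if h not in path and h[1] > current[1] and reachable(current, h):
                yield from all_paths(h, goal, path + [h])

    paths = list(all_paths(holds[0], holds[-1], [holds[0]]))
    best = min(paths, key=len)  # e.g. pick the path with the fewest moves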


Not OP, but why does it have to be open sourced? Copyleft license?


Commercial license - the research group formed a corporate entity that licenses the body model and all derived work (SMPL-X, etc.): https://meshcapade.com/SMPL


Sell enough units to convince Khosla Ventures that you aren't just burning their money, forcing them to double down on the vision and bring in new investors for the next round. Rinse and repeat until the Large Action Model is either extraordinarily valuable or you've torched all the money.


It's true, the lack of readily available public transportation between the cities puts us behind other regions (not to mention other countries). I've been doing the ATX -> HTX drive for years.

Best private alternative I've found is Vonlane (https://vonlane.com/) - takes longer than flying but it's a business class bus so you can get work done.


This is a stellar project


Joining the nuance parade - Google expanded their entry points to Search years ago via Maps and the URL bar.

Do I use traditional Google Search (google.com) to find things? Rarely. Do I use Google Maps to find bars / restaurants / order food? All the time.

I guess the nuance is what we're trying to find. LLMs swallowed the listicle; I don't need google.com to find an autogenerated list of "best restaurants in London". But if I'm in London at lunch and I need a coffee nearby, it's still useful.


I also purchased and read this. If you enjoy the minutiae of scientific advances you'll love this, but it's not a page-turner. It's also a helpful scientific / historical primer for anyone who wants to understand graphics (pixels, shaders, etc.)


This is great, but how is your performance on multi-table joins? We've been working on a homebrew solution using OpenAI internally conditioned on our schema, and can't bridge the multi-table gap.

Also, I'm skeptical that this generalizes; do you have measures in place to prevent query hallucination?


On simpler multi-table joins we've been able to produce good results, and we've done a lot of prompt engineering to make sure it takes the schema very seriously, which prevents hallucinations too. We're always finding new edge cases and fixing those as we go.
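
Not their implementation, but a minimal sketch of what schema-conditioned SQL generation can look like; the schema, model name, and refusal convention below are all assumptions:

    from openai import OpenAI

    # Hypothetical two-table schema; the real product's schema handling isn't public.
    SCHEMA = """
    CREATE TABLE users (id INT PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INT PRIMARY KEY, user_id INT REFERENCES users(id), total NUMERIC);
    """

    SYSTEM_PROMPT = (
        "You are a SQL assistant. Answer ONLY with a single SQL query. "
        "Use ONLY the tables and columns in this schema; if the question "
        "cannot be answered from the schema, reply with 'CANNOT ANSWER'.\n"
        + SCHEMA
    )

    def question_to_sql(question: str) -> str:
        client = OpenAI()  # reads OPENAI_API_KEY from the environment
        resp = client.chat.completions.create(
            model="gpt-4o-mini",      # illustrative model choice
            temperature=0,            # deterministic output reduces made-up columns
            messages=[
                {"role": "system", "content": SYSTEM_PROMPT},
                {"role": "user", "content": question},
            ],
        )
        return resp.choices[0].message.content.strip()

    print(question_to_sql("Total order value per user name?"))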


I understand this is point-cloud diffusion, but Ajay Jain et al. (BAIR + Google Research) accomplished the first version I saw of this back in June with their Dream Fields paper (CVPR'22) [1].

As always, this goes to show that if you can't be the first, be the loudest. OpenAI has the most well-oiled media machine I've seen in a while.

Seeing the waves of publicity OpenAI gets with every new release, I think we're seeing a new model for big-tech AI research groups. It isn't enough to just hire world-class research talent that publishes area-defining papers. There has to be a commensurate investment in media to publicize the research. Obviously, if you don't have the research, you have nothing to market. But it should say something that OpenAI prioritizes great design, communication, and publicity in addition to the world-class research team. It wouldn't surprise me if we see Google AI / DeepMind / FAIR double down with their own investments to expand the media presence of their AI orgs.

[1] https://ajayj.com/dreamfields


It uses coordinate-based neural networks to model the scene volumetrically. However, in the case of this paper, the scene is not represented with an MLP; instead, it directly learns a voxel grid representation.

For an excellent review check out Advances in Neural Rendering: https://arxiv.org/abs/2111.05849
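
For intuition, the difference is roughly an MLP lookup versus a trilinearly-interpolated lookup into a learned dense grid. A toy sketch (not the paper's code; channel layout and resolution are assumptions):

    import torch
    import torch.nn.functional as F

    class VoxelGridField(torch.nn.Module):
        """Toy direct voxel-grid scene representation (density + RGB).

        Illustrative only; real methods add spherical harmonics,
        coarse-to-fine grids, and regularization.
        """
        def __init__(self, resolution: int = 128):
            super().__init__()
            # 4 channels per voxel: density + RGB, learned directly (no MLP).
            self.grid = torch.nn.Parameter(
                torch.zeros(1, 4, resolution, resolution, resolution)
            )

        def forward(self, xyz: torch.Tensor) -> torch.Tensor:
            # xyz: (N, 3) points in [-1, 1]^3.
            # grid_sample expects coords shaped (1, N, 1, 1, 3).
            coords = xyz.view(1, -1, 1, 1, 3)
            feats = F.grid_sample(self.grid, coords, align_corners=True)
            return feats.view(4, -1).t()  # (N, 4): sigma + rgb per point

    field = VoxelGridField()
    samples = torch.rand(1024, 3) * 2 - 1   # random points along rays
    sigma_rgb = field(samples)              # differentiable trilinear lookup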


> learn a voxel grid representation

But isn't that what photogrammetry does?


I think photogrammetry produces point clouds


Yes, and then polygonal models (and other things) are built from those.

For anyone who wants a more technical dive into the photogrammetry pipeline, here's a video I made for a company called Mapware for NVIDIA GTC 21: https://youtu.be/ktDVWzshR4w?t=331


Some techniques for downsampling point clouds use voxel grid representations, but in general you're mapping pixel data from varied images to each other in space and producing points from that to try to capture surface geometry.
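
A bare-bones version of that kind of voxel-grid downsampling, for reference (libraries like Open3D ship a tuned voxel_down_sample):

    import numpy as np

    def voxel_downsample(points: np.ndarray, voxel_size: float) -> np.ndarray:
        """Collapse a point cloud to one (averaged) point per occupied voxel.

        points: (N, 3) array of XYZ coordinates. Minimal sketch only.
        """
        # Integer voxel index for each point.
        idx = np.floor(points / voxel_size).astype(np.int64)
        # Group points that fall in the same voxel.
        _, inverse, counts = np.unique(idx, axis=0, return_inverse=True,
                                       return_counts=True)
        # Average the members of each voxel.
        sums = np.zeros((counts.size, 3))
        np.add.at(sums, inverse, points)
        return sums / counts[:, None]

    cloud = np.random.rand(100_000, 3)             # dummy dense point cloud
    sparse = voxel_downsample(cloud, voxel_size=0.05)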


Typically it creates polygonal models with the photos used to directly texture them.


So basically Agisoft Photoscan, photogrammetry software based on casting rays through a voxel grid?


That's not how Photoscan works.


But it does? Agisoft will first estimate depth maps and then project them into a voxel volume for extracting the high-resolution mesh. Debug logging even lists the voxel grid dimensions.
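
For reference, "project depth maps into a voxel volume" is essentially textbook back-projection plus accumulation. A generic sketch of that step, not Agisoft's actual pipeline (which also fuses signed distances, filters, and meshes):

    import numpy as np

    def fuse_depth_map(volume, depth, K, cam_to_world, voxel_size, origin):
        """Mark voxels hit by back-projected depth pixels (hit counts)."""
        h, w = depth.shape
        u, v = np.meshgrid(np.arange(w), np.arange(h))
        z = depth.ravel()
        valid = z > 0
        # Back-project pixels to camera space using intrinsics K.
        x = (u.ravel() - K[0, 2]) * z / K[0, 0]
        y = (v.ravel() - K[1, 2]) * z / K[1, 1]
        pts_cam = np.stack([x, y, z, np.ones_like(z)], axis=1)[valid]
        # Transform to world space with the 4x4 camera pose.
        pts_world = (cam_to_world @ pts_cam.T).T[:, :3]
        # Convert world points to voxel indices and accumulate hits.
        idx = np.floor((pts_world - origin) / voxel_size).astype(int)
        inside = np.all((idx >= 0) & (idx < np.array(volume.shape)), axis=1)
        np.add.at(volume, tuple(idx[inside].T), 1)
        return volume

    volume = np.zeros((256, 256, 256), dtype=np.int32)  # hit counts per voxel
    # Call fuse_depth_map once per image, then extract a surface
    # (e.g. marching cubes) from the fused volume.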

