
This is an awesome project! Thanks for taking the time to document this. What's next on your plate? How do we follow you?


All companies that are doing something with the current iteration of "AI" are underwater. Sam Altman says that the pro tier of ChatGPT is losing money, Adobe is losing tons on Firefly, ... This is pretty typical for Silicon Valley though: we always burn investor money to corner the market and then tech usually catches up. Most enterprises don't need to be first adopters.


> This is pretty typical for silicon valley though, we always burn investor money to corner the market and then tech usually catches up.

Yes, though the cost breakdown has traditionally been large upfront development costs and low-to-moderate running costs. This time around the running costs are astronomical, and Moore's law ain't what it used to be.


It seems like a repeat of Uber's play but at 10x scale. Lose tens of billions of dollars scaling a product that loses you money on every sale, in the hopes that it will position you well for the massive disruption of [self-driving | AGI]. Uber's play didn't shake out, so now they are very, very slowly digging out of a $30bn hole. I guess it's just a question of how big the AI hole gets before they either make AGI or give up and start shoveling.


While there is no fundamental reason Uber can't just turn a profit (and it looks like they have for the last few years), deep-learning models have very hard physical constraints on how much they cost to run.


I would argue that physical cars driven by people have much harder cost constraints than LLMs, which can see huge cost savings (for the same-performance model) as hardware improves. I agree they aren't perfect parallels, but in principle there's nothing stopping AI companies from massively cutting R&D and raising prices until marginal revenue is positive; it would just mean accepting not getting "take over the world" level profitability, or getting run out of town by someone willing to keep burning money.


> we always burn investor money to corner the market and then tech usually catches up

This is most R&D. You research, build a prototype, bring it to market, and it only hits profitability at volume.


Sam Altman’s admission that ChatGPT Pro loses money was about operating profits, not including the R&D that went into it.


"We don't make money off the $200/mo option" is embarrassing.


Ah, the WeWork strategy...


In ML everything is a tradeoff. The article strongly suggests using dot product similarity, and it's a great metric in some situations, but dot product similarity has some issues too:

- not normalized (unlike cosine similarity)

- heavily favors large vectors

- unbounded output

- ...

Basically, do not carelessly use any similarity metric.
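A tiny illustration of the magnitude issue (toy vectors made up for this comment, not from the article):

    import numpy as np

    def dot_sim(u, v):
        # Unbounded and magnitude-sensitive.
        return float(np.dot(u, v))

    def cosine_sim(u, v):
        # Normalized to [-1, 1]; only the angle matters.
        return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

    a = np.array([1.0, 2.0, 3.0])
    b = 10 * a  # same direction, 10x the length

    print(dot_sim(a, a), dot_sim(a, b))        # 14.0 vs 140.0 -- favors the large vector
    print(cosine_sim(a, a), cosine_sim(a, b))  # 1.0 vs 1.0 -- identical by angle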


Traditional word embeddings (like word2vec) were trained using logistic regression, so probably the closest would be σ(u·v), which is of course nicely bounded.

(The catch is that during training the logistic regression is done on the word and context vectors, but the two have a high degree of similarity. People would even sum the word and context vectors, or train with the word and context vectors being the same, without much loss.)
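A minimal sketch of that bounded score, with made-up vectors:

    import numpy as np

    def sigmoid_sim(u, v):
        # sigma(u . v): always in (0, 1), in the spirit of word2vec's
        # logistic-regression objective.
        return 1.0 / (1.0 + np.exp(-np.dot(u, v)))

    u = np.array([0.5, -1.2, 0.3])
    v = np.array([0.4, -0.9, 0.1])

    print(sigmoid_sim(u, v))        # ~0.79 for similar vectors
    print(sigmoid_sim(u, -v))       # ~0.21 for opposed vectors
    print(sigmoid_sim(u, 100 * v))  # saturates near 1.0 instead of growing without bound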


Awesome project! I love the idea of not having to send my data to a big company and trust their TOS.

The effectiveness of a coding assistant is directly proportional to context length, and the open models you can run on your own computer are usually much smaller. Would love to see something more quantified about the usefulness on more complex codebases.


I hope for a proliferation of 100% local coding assistants, but for now the recommendation of "works best on a $10K+ GPU" is a showstopper, and we are forced to use the "big company". :(


It’s not really that bad. You can run some fairly big models on an Apple Silicon machine costing £2k (M4 Pro Mac Mini with 64GB RAM).


The author appears to be confused about the difference between research and production. In research, more generic approaches typically win because they get resourced much better (plus ever-growing compute and data have helped, though there is no guarantee that these will continue).

On the production side of "AI" (I don't love the term being thrown around this loosely, as true AI should include planning, etc., not just inference), the only question is how well you solve the one problem that's in front of you. In most business use cases today that problem is narrow.

LLMs drive a minuscule (but growing) amount of value today. Recommender systems drive a huge amount of value. Recommender systems are very specialized.


> This reinforce the idea that there is no real strategic advantage in owning a model

For these models, probably not. But for proprietary things that are mission-critical and purpose-built (think Adobe Creative Suite), the calculus is very different.

MS, Google, and Amazon all win from providing infra for open-source models. I have no idea what game Meta is playing.


> I have no idea what game Meta is playing

Based on their business moves in recent history, I’d guess most of them are playing Farmville.


Meta's entire business model is to own users and their content.

Whether it be Facebook, Instagram, Threads, Messenger, WhatsApp, etc. their focus is to acquire users, keep them in their platforms, and own their content - because /human attention is fundamentally valuable/.

Meta owns 40% of the most popular social media platforms today, but their attention economies face great threats: YouTube, TikTok, Telegram, WeChat, and many more threaten to unseat them every year.

Most importantly, the quality of content on these platforms greatly influences their popularity. If Meta can accelerate AI development in all forms, then content quality across all apps/platforms can be equalized: video on YouTube or TikTok will be no higher quality than on Facebook or Instagram, and messages on Threads will be no more engaging than those on Twitter. Their recent experiments with AI-generated profiles[0] signal this is the case.

Once content quality - and luring creators to your platform - is neutralized as a business challenge that affects end users lurking on the platform and how effectively they can be retained, it becomes easier for Meta to retain any user who enters their platforms and to gain an effective attention monopoly, without needing to keep buying apps that could otherwise succeed theirs.

And so, it is in their benefit to give away their models 'for free', 'speed up' the industry's development efforts in general, de-risk other companies surpassing their efforts, etc.

[0] https://thebaynet.com/meta-faces-backlash-over-ai-generated-...


Or, to put it another way:

Meta makes money from ads. To make more money, they either need to capture more of their users' time and show more ads, or show better ads that users click more often. Meta is betting on AI models making it easier to do both.

Better generative AI means you can make more ads faster, which means there are more ad variants to a/b test across, which means it's easier to find an ad that users will click.

To make users stay on their platforms, Meta figures out what content will keep them there, and then shows them that content. Before gen AI, they were only able to show existing content from real users, but sometimes the "ideal" thing for you hasn't been created yet. They bet on the fact that they'll be able to use AI to create hyper-personalized content for their users that engages them better than human-made content.


Word. I was mostly just making a joke about FarmVille, the classic engagement-vampire Facebook game.


Can you explain how development of better generative AI (which I assume is what you mean when you say AI) will mean that “content quality across all apps/platforms can be equalized”? Unless you mean the content quality will go to shit equally everywhere (as it did in their AI profile experiment) I’m not sure I understand what you’re saying.


Meta’s definition of quality is not the same as your definition of quality. For them, quality is (within reason) what drives “engagement” (aka time spent in their apps).

It might be that many people's aesthetic sensibility is that AI-generated content is slop, but I'd still bet that tailored-perfectly-to-you content (and ads) will be highly engaging.


> I have no idea what game Meta is playing

I think they're commoditizing their complement [1]. Engaging content helps Meta, and LLMs make it easier to create that content. Their business model has never been selling API access and releasing the model enables the community to improve it for them.

[1] https://gwern.net/complement


Meta seems to be playing the “commoditize your complements” game. Which is good for the rest of us who get close to SotA open weights models.


Wait, I thought every pornsite already had age verification, that modal that pops up and says "are you sure you are over 18?"


If you're in several of the states, they'll ask for a picture of your ID now.


As an ML practitioner I am not a big fan of the concept of the singularity, and I wholly agree that IF it happens it will be more like a sunrise than a light switch, but...

I think thinking about it IS useful. Problems like value alignment are really difficult. No one alive knows what to do about value alignment, and even if it takes us 500 years to reach a real singularity, it may take that long to solve the value alignment problem.


I worked at Meta, and I am not sure the hiring was ever "diverse".

DEI always seemed like an activity they did for show. This changes nothing honestly.


Great work!

Do you see any artifacts from having trained on synthetic data? Is there a natural benchmark dataset (real tables in the wild)?

In my experience synthetic data can only take you so far: it has all the quirks the dataset creator can think of, but the real value is usually in the patterns they cannot. Vision took a huge leap forward with the release of the ImageNet dataset.


Thanks a lot! We don't see clear artifacts from the synth data. Part of the "trick" is to keep the capacity of our model low; it has only about 11M parameters. That forces the model to "learn an in-context learning algorithm", or in other words to "do in-context learning rather than in-weights learning". Adding real data on top will help, agreed! The synthetic data is very broad: we started with a synth data prior that was just BNN samples with differing sizes and thus super broad. Our new data samples functions more densely that are simpler to explain, but it can still sample almost any function (with the constraint that our networks aren't infinitely complex).
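For a rough feel of what "a prior that was just BNN samples" can look like, here is a hypothetical sketch of drawing one synthetic regression task from a random small network; the sizes, activation, and noise level are illustrative assumptions, not our actual data-generating code:

    import numpy as np

    rng = np.random.default_rng(0)

    def sample_synthetic_task(n_samples=128, n_features=5, hidden=16):
        # Random weights stand in for one draw from a (very loose) BNN prior.
        W1 = rng.normal(0, 1, size=(n_features, hidden))
        b1 = rng.normal(0, 1, size=hidden)
        W2 = rng.normal(0, 1, size=(hidden, 1))

        X = rng.normal(0, 1, size=(n_samples, n_features))
        y = np.tanh(X @ W1 + b1) @ W2          # targets = the random network's output
        y += rng.normal(0, 0.1, size=y.shape)  # a little observation noise
        return X, y.ravel()

    X, y = sample_synthetic_task()
    print(X.shape, y.shape)  # (128, 5) (128,)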

