I am still dismayed at how quickly we gave up on including the pre-training data as a requirement for "open-source" LLMs.
As someone who thinks of LLMs as akin to Lisp expert systems (but in natural language): it's like including the C source code for your Lisp compiler, but claiming the Lisp applications are merely "data" and shouldn't be included.
You forgot the most egregious term, which is that users have to abide by an acceptable use policy that only allows them to use it for what Meta says they can.
Is there some entropy or randomness at play here? Or some sort of RAG? Even if it were RAG, the "reasoning" is very different and doesn't mention the clear censorship in the initial prompt that the one I linked does.
It's funny they don't have Hugging Face as a partner. Literally the biggest face of open LLMs, sitting right in Europe, but somehow it's not a partner.
When you are the front runner, you don't associate with the also-rans and the wannabes. They will drag you down and drown you in their endless discussions and alignment meetings.
Great to see Unsloth here! How long did the training process take?
Also, a modified version of the same original Colab couldn't get a 135M model to learn the XML tags, so do you think 8 billion parameters should be the minimum to use this?
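To be concrete about what I mean by "learn the XML tags", here's roughly the kind of format reward I have in mind; the tag names and scoring are hypothetical, not the notebook's actual code:

```python
import re

# Hypothetical tag names for illustration; the real notebook may use different ones.
FORMAT_RE = re.compile(
    r"<reasoning>.*?</reasoning>\s*<answer>.*?</answer>", re.DOTALL
)

def format_reward(completions):
    """Give a reward only when the completion follows the XML layout.

    A tiny model (e.g. 135M) can go through GRPO without ever discovering
    this structure, so the format reward stays flat the whole run.
    """
    return [1.0 if FORMAT_RE.search(text) else 0.0 for text in completions]

# Example: only the first completion earns the format reward.
print(format_reward([
    "<reasoning>2+2=4</reasoning><answer>4</answer>",
    "The answer is 4.",
]))  # -> [1.0, 0.0]
```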
I may have misunderstood, but it seems the OP's intent was to get the benefits of RAG, which Cline enables, since it performs what I would consider RAG under the hood.
I ran the distilled models locally; some of the censorship is there.
But on their hosted chat, DeepSeek has some keyword-based filters: the moment it generates the Chinese president's name or other controversial keywords, the "thinking" stops abruptly!
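Roughly what I imagine is happening server-side, as a sketch only; the blocklist and token stream are made up for illustration, not DeepSeek's actual setup:

```python
# Hypothetical keyword filter sitting on top of a streaming response.
BLOCKLIST = {"example-blocked-term"}  # placeholder, not the real list

def filtered_stream(token_stream):
    """Yield tokens until any blocklisted keyword appears in the output so far."""
    emitted = []
    for token in token_stream:
        emitted.append(token)
        if any(term in "".join(emitted) for term in BLOCKLIST):
            # Abort immediately, which is what the abrupt "thinking" cutoff looks like.
            return
        yield token

# Example: generation stops as soon as the blocked term is completed.
tokens = ["The ", "example-", "blocked-term", " is ..."]
print("".join(filtered_stream(tokens)))  # -> "The example-" (cut off mid-stream)
```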
The distilled versions I've run through Ollama are absolutely censored and don't even populate the <think></think> section for some of those questions.
This has been the problem with a lot of long-context use cases. It's not just about the model supporting it, but also about having sufficient compute and inference time. This is exactly why I was excited for Mamba and now possibly Lightning attention.
That said, the new DCA on which these models base their long context could be an interesting area to watch.
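To put rough numbers on the compute/memory side (back-of-the-envelope only; the model dimensions below are assumptions for illustration, not any specific model's config):

```python
# Back-of-the-envelope KV-cache size for standard attention.
def kv_cache_gb(context_len, n_layers=32, n_kv_heads=8, head_dim=128,
                bytes_per_value=2):
    """Memory for keys + values across all layers, in GiB (fp16 = 2 bytes)."""
    per_token = n_layers * n_kv_heads * head_dim * 2 * bytes_per_value  # K and V
    return context_len * per_token / 2**30

for ctx in (8_192, 131_072, 1_048_576):
    print(f"{ctx:>9} tokens -> ~{kv_cache_gb(ctx):.0f} GiB of KV cache")
# ->    8192 tokens -> ~1 GiB
#     131072 tokens -> ~16 GiB
#    1048576 tokens -> ~128 GiB

# Standard attention compute also scales quadratically with context length,
# so going from 8k to 1M tokens is ~128x the cache and ~16,000x the attention
# FLOPs, which is why linear/SSM-style approaches and chunked-attention
# schemes are interesting.
```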
For those who don't know, he is the gg of `gguf`. Thank you for all your contributions! Literally the core of Ollama, LM Studio, Jan, and multiple other apps!
They collaborate! Her name is Justine Tunney; she took her “execute everywhere” work with Cosmopolitan to make Llamafile, using the llama.cpp work that Georgi has done.
She actually stole that code from a user named slaren and was personally banned by Georgi from the llama.cpp repo for about a year because of it. Also, it was just lazy-loading the weights; it wasn't actually a 50% reduction.
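For anyone wondering what "just lazy-loading" means here: a minimal sketch of the idea using mmap, not the actual llama.cpp/llamafile code, with a placeholder weights path. Pages are only faulted into RAM as they're touched, so resident memory right after "loading" looks small even though nothing about the model shrank:

```python
import mmap
import os

def open_weights_lazy(path):
    """Map the weights file; pages are read from disk only when accessed."""
    fd = os.open(path, os.O_RDONLY)
    size = os.fstat(fd).st_size
    mm = mmap.mmap(fd, size, access=mmap.ACCESS_READ)
    os.close(fd)  # mmap keeps its own duplicate of the descriptor
    return mm     # slicing mm[i:j] pages in only those bytes

def open_weights_eager(path):
    """Pull the entire file into RAM up front, for comparison."""
    with open(path, "rb") as f:
        return f.read()

# With the lazy version, memory usage right after loading looks tiny, but the
# same bytes get paged in as soon as inference actually reads the tensors.
```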
But unlike GeoGuessr, it uses web search [1]

[1] https://youtu.be/P2QB-fpZlFk?si=7dwlTHsV_a0kHyMl