Hacker News | amrrs's comments

Its thinking process for guessing a place is even more fascinating. Even o4-mini-high is quite good[1] and very fast.

But unlike a GeoGuessr player, it uses web search.

[1] https://youtu.be/P2QB-fpZlFk?si=7dwlTHsV_a0kHyMl


The entire licensing situation is such a mess, and Mark Zuckerberg still thinks Llama 4 is open source!

> no commercial use above 700M MAU

> prefix "Llama" in the name of any redistribution, e.g. fine-tunes

> display "Built with Llama"

> include the license notice in all redistributions


Who has above 700M MAU and doesn't have their own LLM?


Well, Wikipedia, but I take your point.


I am still dismayed by how quickly we gave up on including the pre-training data as a requirement for "open-source" LLMs.

As someone who thinks of LLMs as akin to Lisp expert systems (but in natural language): it's like including the C source code of your Lisp compiler, but claiming the Lisp applications are merely "data" and needn't be included.


You forgot the most egregious term: users have to abide by an acceptable use policy that only allows them to use it for what Meta says they can.


It still names Elon Musk as the biggest misinformation spreader - https://x.com/i/grok/share/5N2eKM8sRiaCQB6eOoYZUUwIv


I wonder how that differs from the sibling post with the exact same prompt? https://x.com/i/grok/share/fov27TB0Zn9jH5ZYIV70nTqN2

Is there some entropy or randomness at play here? Or some sort of RAG? Even if it were RAG, the "reasoning" is very different and doesn't mention the clear censorship in the initial prompt that the one I linked does mention.


See the "temperature" parameter for LLMs.
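To spell that out: here is a minimal sketch of temperature-scaled sampling (plain NumPy, not any particular provider's API) showing why two runs of the same prompt can pick different tokens.

```python
import numpy as np

def sample_next_token(logits, temperature=1.0, rng=None):
    """Sample a token index from raw logits after temperature scaling.

    Temperature near 0 approaches greedy (deterministic) decoding;
    higher values flatten the distribution and add run-to-run randomness.
    """
    rng = rng or np.random.default_rng()
    scaled = np.asarray(logits, dtype=np.float64) / max(temperature, 1e-8)
    probs = np.exp(scaled - scaled.max())   # numerically stable softmax
    probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs))

logits = [2.0, 1.5, 0.3, -1.0]              # made-up scores for four candidate tokens
print(sample_next_token(logits, temperature=1.0))    # can differ between runs
print(sample_next_token(logits, temperature=0.01))   # near-greedy, almost always 0
```

With a nonzero temperature (the default in most chat UIs), two shares of the "same" conversation can legitimately diverge, RAG or not.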


It's funny they don't have Hugging Face as a partner. Literally the biggest face of open LLMs, sitting right in Europe, but somehow it's not a partner.


Hugging Face is an American start-up founded by French people. It resides in Brooklyn, New York. Saying they sit right in Europe is dishonest.

Although, as a Dutch person, I'd like to point out Brooklyn technically is Dutch. ;)


When you are the front runner, you don't associate with the also-rans and the wannabes. They will drag you down and drown you in their endless discussion and alignment meetings.


What would you expect HF to provide? Are they in possession of larger supercomputers than the research institutions involved?


Great to see Unsloth here. How long did the training process take?

Also, a different version of the same original Colab didn't get a 135M model to learn the XML tags, so do you think 8 billion parameters should be the minimum for this?


Hey! Oh the notebook takes 1 to 3 hours on a free GPU - it's best to run it for 1 day for a full run - faster GPUs can be much better.

Yep - bigger models should help! Definitely give Llama a try!
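For anyone curious what this style of training roughly looks like, here's a stripped-down sketch using plain TRL (the dataset, model name, and `format_reward` below are my own illustrative placeholders, not the notebook's actual code): the reward only scores whether a completion emits the expected XML tags, which is the behaviour the 135M model reportedly failed to learn.

```python
import re
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

def format_reward(completions, **kwargs):
    """Illustrative reward: 1.0 if the completion wraps its output in the expected XML tags."""
    pattern = r"<reasoning>.*?</reasoning>\s*<answer>.*?</answer>"
    return [1.0 if re.search(pattern, c, re.DOTALL) else 0.0 for c in completions]

# Placeholder dataset with a "prompt" column; swap in the notebook's own data.
dataset = load_dataset("trl-lib/tldr", split="train")

trainer = GRPOTrainer(
    model="Qwen/Qwen2-0.5B-Instruct",   # placeholder; per the advice above, a bigger Llama works better
    reward_funcs=format_reward,
    args=GRPOConfig(
        output_dir="grpo-out",
        per_device_train_batch_size=2,
        num_generations=4,              # completions sampled per prompt for the group-relative advantage
        max_completion_length=256,
        max_steps=250,                  # a full run is far longer (roughly a day on a free GPU)
    ),
    train_dataset=dataset,
)
trainer.train()
```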


Sorry, just trying to clarify - why would you use Cline (which is a coding assistant) for RAG?


I may have misunderstood, but it seems the OP's intent was to get the benefits of RAG, which Cline enables, since it performs what I would consider RAG under the hood.
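For anyone confused by the terminology, here's a bare-bones sketch of the retrieve-then-generate loop being called "RAG" (naive word overlap instead of a real embedding index, and a stubbed `call_llm` standing in for whatever model the tool actually uses):

```python
def retrieve(query: str, documents: list[str], k: int = 3) -> list[str]:
    """Rank documents by naive word overlap with the query and keep the top k."""
    q_words = set(query.lower().split())
    ranked = sorted(documents, key=lambda d: len(q_words & set(d.lower().split())), reverse=True)
    return ranked[:k]

def call_llm(prompt: str) -> str:
    raise NotImplementedError("stand-in for the assistant's actual model call")

def answer_with_rag(query: str, documents: list[str]) -> str:
    # Stuff the retrieved snippets into the prompt so the model answers from them.
    context = "\n\n".join(retrieve(query, documents))
    return call_llm(f"Answer using only this context:\n{context}\n\nQuestion: {query}")
```

A coding assistant that indexes your repo and pulls relevant files into the prompt is doing essentially the same thing, just with code files as the documents.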


I ran the distilled models locally; some of the censorship is there.

But on their hosted chat, DeepSeek has some keyword-based filters - the moment it generates the Chinese president's name or other controversial keywords, the "thinking" stops abruptly!


The distilled versions I've run through Ollama are absolutely censored and don't even populate the <think></think> section for some of those questions.
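If anyone wants to reproduce that locally, something like this is enough to see whether the reasoning trace appears (assumes the `ollama` Python package and a local Ollama server with the distilled model already pulled; the prompt is just an example):

```python
import ollama  # pip install ollama

resp = ollama.chat(
    model="deepseek-r1:7b",   # one of the distilled checkpoints on Ollama
    messages=[{"role": "user", "content": "What happened in Tiananmen Square in 1989?"}],
)
text = resp["message"]["content"]

# On refused topics the reply comes back without any <think>...</think> section at all.
print("<think>" in text)
print(text[:300])
```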


This has been the problem with a lot of long-context use cases. It's not just the model's support but also the compute and inference time required. This is exactly why I was excited about Mamba, and now possibly Lightning Attention.

Even so, the new DCA, which these models use to provide long context, could be an interesting area to watch.


He is already running a hacker house in Bangalore with cohorts of researchers and practitioners getting together - https://turingsdream.co/

So probably it's picking one of those ideas or investing in it!


For those who don't know, he is the "gg" in `gguf`. Thank you for all your contributions! Literally the core of Ollama, LM Studio, Jan, and multiple other apps!


A. Legend. Thanks for having DeepSeek available so quickly in LM Studio.


well hot damn! killing it!


They collaborate! Her name is Justine Tunney - she took her "execute everywhere" work with Cosmopolitan to make llamafile using the llama.cpp work that Georgi has done.


She actually stole that code from a user named slaren and was personally banned by Georgi from the llama.cpp repo for about a year because of it. Also, it was just lazy-loading the weights; it wasn't actually a 50% reduction.

https://news.ycombinator.com/item?id=35411909
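For context on what "lazy loading" means here, a toy NumPy illustration (not the actual llama.cpp change): a memory-mapped file exposes the same weights but only faults pages in when they're touched, so resident memory looks smaller even though nothing about the model itself shrank.

```python
import numpy as np

# Write some fake "weights" to disk, then open them two ways.
weights = np.random.rand(1_000_000).astype(np.float32)
weights.tofile("weights.bin")

eager = np.fromfile("weights.bin", dtype=np.float32)           # all bytes read up front
lazy = np.memmap("weights.bin", dtype=np.float32, mode="r")    # pages loaded on first access

# Same data either way; the memmap just defers the I/O until the values are used.
print(eager[0], lazy[0])
```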


That seems like a false narrative, which is strange because you could have just read the explanation from Jart a little further down in the thread:

https://news.ycombinator.com/item?id=35413289


Someone did? Could you please share a link?

