Yeah, this is my absolute dream language: something that lets you prototype as easily as Python but then compiles as efficiently and safely as Rust. I thought Rust might actually fit the bill here, and it is quite good, but it's still far from easy to prototype in - lots of sharp edges with, say, modifying arrays while iterating, complex types, and concurrency. Maybe Rust can be something like this with enough unsafe, but I haven't tried. I've also been meaning to try more TypeScript for this kind of thing.
You should give Julia a shot.
That’s basically it. You can start with super dynamic code in a REPL and gradually hammer it into stricter and hyper-efficient code. It doesn’t have a borrow checker, but it’s expressive enough that you can write something similar as a package (see BorrowChecker.jl).
Some Common Lisp implementations like SBCL have supported this style of development for many years. Everything is dynamically typed by default but as you specify more and more types the compiler uses them to make the generated code more efficient.
I quite like Common Lisp, but I don't believe any existing implementation gets you anywhere near the same level of compile-time safety. Maybe something like Typed Racket, but that's still only doing a fraction of what Rust does.
At my company we seem to have moved a little in the opposite direction of observability 2.0. We moved away from the paid observability tools to something built on OSS with the usual split between metrics, logs and traces. It seems to be mostly for cost reasons. The sheer amount of observability data you can collect in wide events grows incredibly fast and most of it ends up never being read. It sucks but I imagine most companies do the same over time?
> The sheer amount of observability data you can collect in wide events grows incredibly fast and most of it ends up never being read.
That just means you have to be smart about retention. You don't need permanent logs of every request that hits your application. (And, even if you do for some reason, archiving logs older than X days to colder, cheaper storage still probably makes sense.)
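As a rough sketch of what that tiering can look like (the bucket name, prefix, and day counts below are made up, not a recommendation), an S3 lifecycle rule can move old objects to colder storage and eventually expire them:

# hypothetical bucket/prefix and retention windows
aws s3api put-bucket-lifecycle-configuration \
  --bucket example-observability-logs \
  --lifecycle-configuration '{
    "Rules": [{
      "ID": "archive-then-expire",
      "Status": "Enabled",
      "Filter": {"Prefix": "events/"},
      "Transitions": [{"Days": 30, "StorageClass": "GLACIER"}],
      "Expiration": {"Days": 365}
    }]
  }'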
> That just means you have to be smart about retention.
It's not a problem of retention. It's a problem caused by the sheer volume of data. Telemetry data must be stored for at least N days in order to be useful, and if you decide to track all the telemetry types involved in "wide events" throughout that period, then you need to make room to persist all of it. If you're bundling efficient telemetry types like metrics with data-intensive telemetry like logs in the same events, the data you need to store quickly adds up.
Agreed. A new wide-event pipeline should make full use of cheaper storage options - object storage like S3 - for both cold and hot data, while still maintaining performance.
I'm totally in favor of cold storage. Just be careful about how you store the data, the granularity of the files, and how frequently you think you'll want to access that data in the future, because what kills you in these services is the API cost. Oh, and deleting data also triggers API costs AFAIK, so there is that too...
HDD-based persistent disks usually have much lower IO latency compared to S3 (microseconds vs hundreds of milliseconds). This can improve query performance a lot.
sc1 HDD-based volumes are cheaper than S3, while st1-based volumes are only 2x more expensive than S3 ( https://aws.amazon.com/ebs/pricing/ ). So there is little economic sense in using S3 over HDD-based persistent volumes.
> The sheer amount of observability data you can collect in wide events grows incredibly fast and most of it ends up never being read.
Yes! I know of at least 3 anecdotal "oh shit" stories of teams being chewed out by upper management when bills from SaaS observability tools got into the hundreds of thousands because of logging. Turns out that uploading a full stack dump on error can lead to TBs of data that, as you said, most likely no-one will look at ever again.
I agree with the broad point - as an industry we still fail to think of logging as a feature to be specified and tested like everything else. We use logging frameworks to indiscriminately and redundantly dump everything we can think of, instead of adopting a pattern of apps and libraries that produce thoughtful, structured event streams. It’s too easy to just chuck another log.info in; having to consider the type and information content of an event results in lower volumes and higher quality of observability data.
A small nitpick, but having loads of data that “most likely no-one will look at ever again” is OK to an extent, for the data that is there to diagnose incidents. It’s not useful most of the time, until it’s really, really useful. But it’s a matter of degree, and dumping the same information redundantly is pointless and infuriating.
This is one reason why it’s nice to create readable specs from telemetry, with traces/spans initiated from test drivers and passed through the stack (rather than trying to make natural language executable the way Cucumber does it - that’s a lot of work and complexity for non-production code). Then our observability data gets looked at many times before there’s a production incident, in order to diagnose test failures. And hopefully the attributes we added to diagnose tests are also useful for similar diagnostics in prod.
I'm currently working with Coroot, which is an open source project trying to create a solution for this issue of logs and other telemetry sources being too much for any team to reasonably have time to parse manually. Data is automatically imported using eBPF and Coroot will provide insights into RCA (with things like mapped incident timeframes) to help with anything overlooked in dumps.
Worked for all the AI labs as well. Turns out you can steal the entirety of copyrighted works in existence on the internet without consequence if the resulting company is big enough.
The very top of the US government is partnering with and supporting OpenAI, Meta etc. None of these lawsuits are going to amount to anything more than a slap on the wrist. Their logic seems to be that AI is going to be a matter of national importance, and other countries will infringe the copyright anyways, so it must be allowed for the US AI industry to stay competitive.
There's a list of cases here [1]. The case against GitHub Copilot was already mostly dismissed, despite it producing samples identical to license-restricted code. The cat is so far out of the bag anyway, with many open models and datasets containing the stolen data - there is nothing anyone can do about it now.
My gitignore is just a pile of things I _always_ want to ignore:
# OS
.DS_Store
# Editors
.vscode-server/
.idea/
# Javascript
.npm/
node_modules/
...more stuff per language
I find it really nice to just keep all that in one place and not have to configure it per project. There's nothing too broad or situational in there that might get in the way - those kinds of things can go into the project-specific .gitignores.
There's also `git status --ignored` to check if you're missing anything.
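In case it helps anyone: the "one place" is just git's global excludes file, which you point at once (the path below is arbitrary, use whatever you like):

# tell git about a global ignore file
git config --global core.excludesFile ~/.gitignore_global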
My reason for having each and every common ignore in each individual repo is that on the off chance that someone else wants to contribute to any of my projects, those files are already ignored for those people too.
And also, sometimes I work from different machines and I don’t really want to have yet another dotfile to sync between all my current and future machines.
(Yes, I know dotfile managers exist, but I literally only care about syncing my zsh config files and one or two other dotfiles mainly so I do that with my own little shell script and basically don’t care about syncing other dotfiles.)
Generally agree with you, but I'm not going to clutter my project's .gitignore with stuff like .DS_Store that's the user's responsibility to keep in their own ignore list.
E.g. each JS project gets /.npm and /node_modules, each Python project gets .pyc, etc...
The editor is generally one per project with checked-in config (.vscode), and if you want to use vim, have your own rules in ~, which you likely have anyway.
Also, the two are not exclusive: .env can be checked in and still be listed in .gitignore.
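A tiny sketch of that, assuming .env is already listed in .gitignore:

# force-add the ignored file once; after that, the ignore rule no longer applies to it
git add -f .env
git commit -m "check in default .env"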
I just mean, it’s intentionally not a fancy setup with all kinds of things.
Just the most essential stuff and some symlinks. For the few dotfiles I really care about.
In an ideal world, I wouldn’t need any dotfiles at all. And my home directories would only contain files that I myself put there. Like my photos, my music, and code that I write. Those kinds of things.
That’s the main reason I don’t like managing all of my dotfiles. Because I don’t really want them in the first place. The only thing I want less in my home dirs but which I also have to live with is all of the other garbage that programs I use put there on their own. Like caches and different weird files that the programs decided should be there.
I like it. It's pretty basic, but it's very good for a broad audience and covers things many people don't understand. I liked that you mentioned not to anthropomorphize the model. We would greatly benefit from 50+ year old policymakers (and others) taking the course, even more than 19-year-old freshmen.
Besides the current drama, I'm glad someone of his stature agrees with and can call out the horrible processes and tooling involved in the kernel. Using email and a pile of hacks to mess around with patches just sounds nuts and makes it so much harder to understand or contribute. I don't think decentralized necessitates such a terrible workflow - you can run a local website with a distributed DB, distributed git forges exist, you can use federated chats instead of email, there has to be a better way than email.
I don’t think there is enough demonstrable benefit to sacrifice the ubiquity and flexibility of email for a newer solution, especially for existing maintainers who are probably very comfortable with the current workflow.
Being harder to understand and contribute to is bad, but unless there is a proposal for a similarly flexible system with minimal downsides and massive advantages, the preferences of existing maintainers should dominate over those of potential future contributors. Especially factoring in how expensive a migration would be.
I can understand this mindset, but I also think this is how communities die. They go to great lengths to avoid inconveniencing existing members while neglecting to welcome new ones. In the short term, the better choice is always to favor the existing contributors right up until they start dropping out and there's no one left to replace them.
Linux is so ubiquitous and important that that might never happen; maybe it will just become increasingly captured by corporate contributors who can at least build long-lasting repositories of knowledge and networks of apprenticeship to help onboard newbies. Open source in name only.
I really like the way sourcehut integrates mailing list patches with a good UI. I’d like to see that become more common in some of these “classic” open source projects.
AFAIK Linus tried GitHub in the past, but had several significant complaints about it hiding information, messing with basic git operations, generating bad commit messages, etc. So it is not as if they wouldn't use something better; there just isn't anything that has feature parity with a workflow they have been optimizing for decades.
That optimization includes things like email filters and email client customization individualized to longtime contributors, not to mention that it is just what Linus and others are used to. Those long-time contributors have had years, or decades, to incrementally set up their tools and become familiar with the workflow. The problem is that new contributors and maintainers don't have that, and learning the workflow and setting up tools so that the email-based workflow is manageable is daunting and takes a lot of time.
I won't contest that there are advantages to the Linux kernel's workflow, but there are downsides too, and a major one is that it scares off potential contributors.
That said GitHub definitely is far from perfect as well, and has different strengths and weaknesses from email based flows. As do any other options.
But just because there isn't currently anything that is strictly better doesn't mean things can't be improved. There is clearly a problem with onboarding new developers to the Linux workflow. That should be acknowledged and a solution sought. That solution doesn't have to be switching to GitHub or similar. Maybe there just needs to be better documentation on how to set up the necessary tools, oriented towards developers used to the GitHub process. Maybe there needs to be better tooling. Maybe the mailing lists need to be organized better, or automatically add metadata to emails in a standard, machine-readable format. Etc.
I always thought it was a pretty blatant "vibe check" to filter out people who are so uncomfortable with software that they can't customize their environment to create an email workflow that works for them.
That sounds about right - the medium is the message. If you can't stand the clunky-but-working, decades-old, patch process, you probably won't stand the clunky-but-working decades-old code.
I'm grateful the kernel still supports MIPS, which means an old appliance of mine still works perfectly fine and is upgradable. I would be very sad if someone were to rip out support for an old MIPS arch just because it's old and unwieldy.
I've contributed to a couple of projects that use email-based workflows. I can customize my environment, but it takes a lot of time, and I would rather do something else than figure out how to filter the firehose of a mailing list down to the few emails I actually care about, or learn how to use a new email client that is slightly better at handling patches.
The first few times, it took me longer to figure out how to send a patch than it did to fix the bug I was writing a patch for.
I guess technically that’s true, but it cannot possibly take long to learn how to use `git format-patch`, and everyone should already know how to attach a file to an email. Even if you have to spend half an hour reading the entire output of `git format-patch --help`, is that really enough to prevent you from sending your first patch?
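For anyone who hasn't done it, the basic flow is roughly this (the recipient address and patch filename below are just placeholders):

# turn the last commit into a mailable patch file (produces something like 0001-....patch)
git format-patch -1 HEAD
# attach that file to an email, or send it directly if git send-email is configured
git send-email --to=maintainer@example.org 0001-example.patch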
Ok, let me get this straight. You diagnosed a problem in the Linux kernel. You debugged it and figured out how to fix it. You edited the kernel source code, recompiled it, and tested your fix. After all that, if you have to read a man page you’ll just give up?
By that same token, there’s no reason for you to expect kernel developers to adopt a different way of working either. Their time is even more valuable than yours.
As someone who has never used mailing lists for software development before: how much harder/less advantageous would it be to migrate to an issue- or thread-based approach, like with GitHub?
- Distributing patches via email is more scalable than web hosting. Even GitHub could not host the level of activity happening on the LKML mailing list
- Web hosting has a variety of access and legal problems when working with international teams; email crosses borders more easily
- Email is a decentralized and distributed system that prevents any single company from controlling a project's development infrastructure (release infrastructure is another story, but multiple companies will generally manage their own release process anyway)
This is totally wrong. The majority of immigrants that people have a problem with are poor immigrants from rural areas of India. They spend their life savings to get a college degree which used to guarantee a work visa and path to residence.
Yes, precisely, but the corporations asking for cheap labor did want poor immigrants with no options, and the government caved and did basically no vetting for millions of people.
I’m adjacent to some people who do AoC competitively, and it’s clear many of the top 10 and maybe half of the top 100 this year were heavily LLM-assisted or done wholly by LLMs in a loop. They won first place on many days. It was disappointing to the community that people cheated and went against the community’s wishes, but it’s clear LLMs can do much better than described here.
To be fair, there's probably not a lot that "humans" add when the solutions come in at 4 and 5 seconds respectively. That's clearly 100% automated. Most humans can't even read the problem in that timeframe.
A better takeaway would be that proper use of LLMs involves more than what OP did here (proper prompting, looping the answer through a code check, etc.).
I think you're trying to dunk on me without actually knowing much about the matter. You can look at the leaderboard times and the github repos too. They are fully automated to fetch input, prompt LLMs and submit the answer within 10 seconds.