I have been working on and thinking about data and ML models for many years, and this is probably the most concise and lucid summary of a post-NN philosophy I have heard so far. It also resonates with me. Thanks for sharing.
Unlike some of those top comments, I actually like your curation a lot. I prefer it to The Economist, the New York Book Review, and the other sites people linked in the comments. Good job!
I know you have an RSS feed, and the crowd here is probably all for RSS, but I would love it if I could leave my email somewhere and get a notification whenever a new post goes up.
I was thinking along those lines in the first half of my twenties. Now, in the second half, I feel lucky to have found a topic that I am going deep on, and I am using it to go wide into some other connected areas.
In hindsight, there was a period where I had to actively force myself to stick with one topic longer than usual in order to go deep. Having cleared that hurdle, I now have much less trouble integrating adjacent topics.
I also agree with the advice that focusing on one topic improves professional success. Whether that holds long-term is TBD, though.
My strategy is to chunk the topics I'm interested in: I stay on one topic for at least a day, but usually longer, reading several books on it, experimenting, and taking notes.
Quick question: What's the reason for making "Transform" part of the Feature Store definition? I've been evaluating a couple of feature stores (incl. Tecton and Feast; great job by the way, willempienaar) and I'm wondering whether that doesn't complicate things, especially if you already have your own data processing pipelines.
With Tecton, transformations are an optional component of the system. As with Feast, you can bypass the Transform component and ingest data directly from external pipelines. You typically do this when you already have data pipelines and want to make their values available in a Feature Store.
However, if you don't have existing stream/batch data pipeline infrastructure that your data scientists and data engineers can easily contribute to, a Feature Store's Transform component is an easy way for them to be fully self-sufficient. Tecton makes it easy to express feature transformations using Spark's native DataFrame API, Python, SQL, or Tecton's DSL.
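To make the two modes concrete, here is a toy, in-memory stand-in. Every name in it is made up for illustration; this is not Tecton's or Feast's actual API, just the shape of "ingest precomputed values" vs. "let the store run the transformation":

```python
import pandas as pd

class ToyFeatureStore:
    """Made-up stand-in for a feature store; illustrative only."""

    def __init__(self):
        self._features = {}    # feature name -> DataFrame of feature values
        self._transforms = {}  # feature name -> store-managed transform

    def ingest(self, name, df):
        """Mode 1: bypass Transform and register values computed by your own pipelines."""
        self._features[name] = df

    def register_transform(self, name):
        """Mode 2: hand the transformation itself to the store."""
        def decorator(fn):
            self._transforms[name] = fn
            return fn
        return decorator

    def materialize(self, name, raw_df):
        """Run a store-managed transform and persist its output."""
        self._features[name] = self._transforms[name](raw_df)

store = ToyFeatureStore()

# An existing Spark/Airflow job already computed these values; just register them.
store.ingest("user_order_count_30d",
             pd.DataFrame({"user_id": [1, 2], "user_order_count_30d": [4, 9]}))

# Or let the store own the computation, from (preprocessed) raw data.
@store.register_transform("user_total_spend")
def user_total_spend(orders: pd.DataFrame) -> pd.DataFrame:
    return orders.groupby("user_id", as_index=False)["amount"].sum()

store.materialize("user_total_spend",
                  pd.DataFrame({"user_id": [1, 1, 2], "amount": [10.0, 5.0, 7.5]}))
```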
Besides self-sufficiency, there are a few other advantages to having a feature store manage your feature transformations:
- Feature Versioning: If you change a feature transformation, the Feature Store will know to increment the version of that feature and ensure that you don't accidentally mix features that were computed using two different implementations
- End-to-end lineage tracking and reproducibility: If a feature store manages your transformations, it can tie exact feature definitions all the way through to a training data set and the model that's used in production. So if, years later, you want to reproduce a model from a certain point in the past, a Feature Store that supports transformations can recreate that model as long as the raw data still exists
- Trust: A data scientist is more likely to trust, and then reuse, another user's feature if they can peek under the hood and see how the feature is actually calculated
- On-Demand Features: These transformations cannot be executed by existing data processing pipelines because they have to be computed in real time, at the moment the prediction is made, in the operational environment (see the sketch after this list)
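To illustrate that last point, here is a minimal sketch of an on-demand feature. All names are hypothetical; the point is that one input only exists at prediction time, so no batch or stream pipeline could have precomputed the value:

```python
def amount_vs_user_avg(request: dict, stored: dict) -> float:
    """Runs in the serving path, at the moment of the prediction request."""
    amount = request["transaction_amount"]  # request-time value: only known now
    avg = stored["user_avg_amount_30d"]     # precomputed, fetched from the online store
    return amount / max(avg, 1e-9)          # guard against division by zero

# At serving time:
request = {"user_id": 42, "transaction_amount": 250.0}
stored = {"user_avg_amount_30d": 83.0}      # looked up in the feature store
feature_vector = {"amount_vs_user_avg": amount_vs_user_avg(request, stored)}
```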
In reality, you will frequently see multi-stage data processing workflows in an organization: a lot of the data cleaning and preprocessing happens in the organization's standard, ML-independent data processing infrastructure, and a Feature Store then picks up that preprocessed data and turns it into feature values.
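A rough sketch of that handoff, with all names illustrative: the upstream stage knows nothing about ML, and the feature-store Transform only ever sees cleaned data:

```python
import pandas as pd

def preprocess(raw: pd.DataFrame) -> pd.DataFrame:
    """Stage 1: the org-wide, ML-independent pipeline (dedupe, fix types)."""
    clean = raw.drop_duplicates().copy()
    clean["amount"] = clean["amount"].astype(float)
    return clean

def to_features(clean: pd.DataFrame) -> pd.DataFrame:
    """Stage 2: the Feature Store's Transform turns cleaned rows into feature values."""
    return (clean.groupby("user_id", as_index=False)["amount"].mean()
                 .rename(columns={"amount": "user_avg_amount"}))

raw = pd.DataFrame({"user_id": [1, 1, 2, 2], "amount": ["10", "10", "5", "7"]})
features = to_features(preprocess(raw))
```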
"Operation Transformation" = "a system that supports collaboration functionalities by separating the high-level transformation (or integration) control from the low-level transformation functions"
Source: OT's Wikipedia article
But I felt the same. I had never heard of "Operational Transformation" before, and both OT and its expansion were equally opaque to me.
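For anyone else who bounced off that definition: the "low-level transformation functions" it mentions are small functions like the classic insert-insert transform. A minimal sketch (the equal-position tiebreak by site id is one common convention, not the only one):

```python
def transform_insert(op1, op2):
    """Rewrite op1 so it can be applied after op2 has already been applied.

    Each op is (position, text, site_id); this is the insert-insert case.
    """
    p1, t1, s1 = op1
    p2, t2, s2 = op2
    if p1 < p2 or (p1 == p2 and s1 < s2):
        return (p1, t1, s1)            # op1 still lands before op2's insertion
    return (p1 + len(t2), t1, s1)      # shift right past op2's inserted text

def apply(doc, op):
    p, t, _ = op
    return doc[:p] + t + doc[p:]

# Two sites concurrently insert at the same index of "abc":
opA = (1, "X", 0)  # site 0
opB = (1, "Y", 1)  # site 1

site0 = apply(apply("abc", opA), transform_insert(opB, opA))
site1 = apply(apply("abc", opB), transform_insert(opA, opB))
assert site0 == site1 == "aXYbc"  # both sites converge on the same document
```

The "high-level control" part is then the protocol that decides which operations are concurrent and in which order they get transformed.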
Hearing friends talk about flaking, I would say it is a very common theme. I would also think that you may not want to meet up with that person, either because you are not in the same city, or because you match with a lot of different people, which may also change over time.
I also thought of it more as a self-validation thing: a source of more information, recommendations, etc. What you would get back might be an online conversation or a feed of relevant websites.
There's plenty to discuss whenever you've read something someone else has read. Just think about book clubs that get together and talk about a book everyone has just read, based on questions they might have shared beforehand. You can always share your insights into the book, your opinions on the content, connections to other things, etc. It's funny to me that people are saying that conversations around these things would get stale.
My tip would be to have a group of friends, potentially online in a group chat or in person, all read one or two _short_ pieces that you have selected so you can discuss them, and then have another person select pieces, and so on. I highly doubt that you will find another person out in the ether who has read a lot of the same stuff as you, at the level you want or expect.
I haven't tried Mixnode yet, but the way I understand it, it lets you query websites and retrieve their HTML content that you can then parse, without you having to crawl the sites yourself. Looking at their GitHub, they seem to use WARC, so they may also let you request snapshots of a site at certain timestamps?
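Not Mixnode's API (as I said, I haven't tried it), but if they really are sitting on WARC archives, reading one locally with the warcio library looks roughly like this; the file name is a placeholder:

```python
from warcio.archiveiterator import ArchiveIterator

with open("example.warc.gz", "rb") as stream:  # placeholder file
    for record in ArchiveIterator(stream):
        if record.rec_type == "response":      # skip request/metadata records
            url = record.rec_headers.get_header("WARC-Target-URI")
            date = record.rec_headers.get_header("WARC-Date")  # capture timestamp
            html = record.content_stream().read()  # raw payload you can then parse
            print(url, date, len(html))
```

The WARC-Date header is recorded per capture, which is why snapshots at certain timestamps seem plausible.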
That being said, I find this highly interesting if it works like that. We are working on a peer-to-peer system that lets you query a semantic database, populated mostly with public web data but with strong guarantees of accurate and timely data, and this could be a great way to write more robust linked-data converters.
I am wondering whether the technique described in this Nature publication from yesterday [1], which could potentially let quantum computers operate at room temperature, could be used for the atomic storage as well. Does anyone know?