
Not the author, just thought this was a cool collection of simulations to display physics concepts.

Recently used this one [1] to explain far/nearsightedness

[1] https://ophysics.com/l16.html


This is the data/methodology behind NYC’s official COVID statistics, such as: https://www1.nyc.gov/site/doh/covid/covid-19-data-totals.pag...


1) How similar is this to creating a ResNet [1]? What are some key differences and similarities?

2) Has a CNN version of this been implemented in PyTorch?

[1] https://arxiv.org/pdf/1512.03385.pdf (Figure 2)


Also from the Lancet link:

* VRC01 was shown around a decade ago to be one of several antibodies generated...that achieves broad neutralisation of several HIV strains

* 97% of participants who received an HIV vaccine immunogen candidate developed VRC01-class IgG B cells, precursors to broadly neutralising VRC01-class antibodies

* Results from another set of studies presented at HIVR4P (the Antibody Mediated Prevention [AMP] trials) showed that although intravenous administration of the VRC01 antibody at 8-week intervals did prevent infections with some strains of HIV, only 30% of the strains circulating in the trial regions of sub-Saharan Africa, South America, Switzerland, and the USA were sensitive to VRC01

To me this appears to be a great breakthrough but doesn't seem like a silver bullet.


>> The biota formed with the Great Oxidation Event, a temporary increase in atmospheric oxygen, and became extinct from marine anoxia when the event was terminated by the drop in oxygen levels of the Lomagundi Excursion Event. The biota represents the earliest known experiment in multicellularity, with no extant multicellular descendants.


This is a great article and I'm happy the author mentions "an explanation of combustion exothermicity in terms of Pauling electronegativities is not convincing". I was incorrectly taught this view and it took me some time to unlearn it.

I keep seeing comments about how certain science topics are initially presented in an overly complicated fashion. But there are two forces at play here: correctness versus accessibility. The theory presented in this paper "predicts most heats of combustion with an error of only a few percent". This is good enough for most practical applications, especially if your goal is to introduce students to this topic.

But this is not the most "correct" description of reality that we currently have. A better model that decreases the error would incorporate information about the 3D structures/conformations of the molecules and some funky terms from quantum mechanics.

Anybody publishing literature about or teaching something like organic chemistry is stuck with this balancing act of correctness versus accessibility. This difficulty is compounded by the fact that you basically have to unlearn some of the stuff you picked up on the path to "accessibility" in order to properly move into the "correctness" phase. I genuinely sympathize with anybody dealing with the balancing act.

After studying organic chemistry for an extended period of time, it dawned on me that the well-thought-out explanations in my textbooks were just post-hoc rationalizations the field uses to avoid delving into the true quantum mechanical nature of the reactions. I'm happy I sacrificed some correctness for the huge amount of accessibility I got. But I'm also happy to unlearn some of the stuff on my path to correctness.


I've been using R nonstop for pretty much 5+ years. I'm happy that there's established competition coming from Python and new competition coming from Julia. Having these languages compete over similar types of programmers pushes each one to be better, which is awesome. I'm not a die-hard R person, I'd be more than happy to switch under the right circumstances.

But...I think one thing gets overlooked way too often. For "data scientists" or "statisticians" or [insert new term here], the majority of our non-modeling time is spent on just plain old data wrangling. To me, R is unbeatable here. I've tried Python (~2 years ago) and pre-1.0 Julia.

Using tidyverse you can do pretty much anything to any dataset, often *without a monstrous number of keystrokes*. (The pipe syntax is awesome.) If you really need speed you can always switch over to data.table for uglier but faster code. I really tried, but I could never replicate the "brain cycles to keystrokes" speed of R in Python/Julia. That is, being able to intuitively and quickly convert my thoughts into readable data wrangling code.
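
To make the "brain cycles to keystrokes" point concrete, here is the kind of pipeline I mean (the `sales` data frame and its columns are invented for illustration):

  library(dplyr)

  sales %>%
    filter(!is.na(price), quantity > 0) %>%    # drop bad rows
    mutate(revenue = price * quantity) %>%     # derive a new column
    arrange(desc(revenue))                     # sort by it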

Sure the base R language is not that "fast" and Julia/Python benchmarks are way faster. But in practice this doesn't matter to me. Most of the performance sensitive packages are written in C/C++/Fortran anyway (rstan, brms, glmnet, caret). I don't care that I could write 3x faster loops. The extra 5 seconds for that one piece of code doesn't make up for the absence of a good data wrangling ecosystem.

My message to the Julia team: You can get a very large portion of the R userbase to switch over if you focus on a Julia version of the tidyverse (especially dplyr). I know that DataFrames.jl exists but it just doesn't even come close. There's a difference between "you can do this in Julia too" and "here's a clean/intuitive way to do this better without extra baggage".

I'm sorry if the above seems harsh. I genuinely appreciate the Julia team's efforts. I can only imagine how hard it is to create a new language. I just wanted to be honest.


I deeply loathe R for its terrible type idiosyncrasies, syntax, and slowness.

However, even I must admit that it is incredibly good at what it was meant to do - analyse and display data. (And yes, the tidyverse is a huge improvement of the syntax, although it's telling that they basically reinvented the language to do so.)

As an ecological modeller, I create my actual simulation models in Julia, because it is a much, much better language for any real programming. But I still analyse the output in R.


I don't understand how people can loathe R. If you take a functional approach, especially using pipes, dplyr and a split, apply, combine style, it is quite beautiful. Much nicer than trying to, say, divide a time period by an integer in Go.
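
For example, a split-apply-combine reads almost like the sentence describing it (data frame and columns made up):

  library(dplyr)

  flights %>%
    group_by(carrier) %>%                                       # split
    summarise(mean_delay = mean(dep_delay, na.rm = TRUE)) %>%   # apply + combine
    arrange(desc(mean_delay))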


> If you take a functional approach, especially using pipes, dplyr and a split, apply, combine style, it is quite beautiful

Sure, but what if you don't? Sometimes, this is the right way to do things, other times there are other approaches that are more natural/beautiful. In many cases, a loop with conditionals is much easier to understand.


I use a lot of R, and like many aspects of it. But the fact that `f(stop("Hi!"))` may or may not throw an error depending on the internals of `f` is a little maddening. (And there are tons of similar issues.)
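
A concrete illustration of what I mean:

  f <- function(x) 42        # never touches x, so no error is raised
  g <- function(x) x + 1     # uses x, so the error fires

  f(stop("Hi!"))   # returns 42
  g(stop("Hi!"))   # Error: Hi!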


Isn't that just lazy evaluation?


When it comes to data wrangling, one huge advantage of Julia over tidyverse/R dataframes/Pandas is that you can write a damn for loop and it won't be brutally slow.

It's so much simpler and faster to use a loop that says "pick this row only if this and that and this other thing are simultaneously true" vs having to construct an algebra of column filters to do the same.


I think that is absolutely a fair criticism. Personally, I rarely run into an issue where I am absolutely bottlenecked by a slow loop. But this sort of thing drew me to Julia in the first place.

There was also an R update in ~2017 that introduced some JIT speed-ups for loops, which made a noticeable difference.

If this is a problem you run into often, I suggest converting your object to a data.table. You can pass a function row-wise over the object very quickly:

https://stackoverflow.com/questions/25431307/r-data-table-ap...
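
Roughly the pattern, sketched here with an invented predicate and column names:

  library(data.table)

  dt <- as.data.table(df)                     # df: your existing data frame
  row_ok <- function(a, b, c) a > 0 && b < 10 && !is.na(c)   # invented per-row check

  dt[, keep := row_ok(col_a, col_b, col_c), by = seq_len(nrow(dt))]
  result <- dt[keep == TRUE]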


I think loops are not ideal for data analysis. They are prone to human error, especially ones that modify the data, and in a way that can be hard to sort out (i.e. iterating over the dimensions of the wrong object). A stepwise creation of new logical fields using mutate, followed by a vectorised ifelse command, is more robust, and you can clearly see the steps of the logic.
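
Something like this, for instance (column names invented):

  library(dplyr)

  df %>%
    mutate(is_adult = age >= 18,                       # step 1: logical field
           is_local = country == "US",                 # step 2: another one
           group    = ifelse(is_adult & is_local,      # step 3: vectorised ifelse
                             "target", "other"))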


I also like pipe syntax and I've found there is nice support for it in Julia. There are some nice packages that improve on the Base version [1].

Have you checked queryverse [2]?

[1] https://github.com/jkrumbiegel/Chain.jl [2] https://www.queryverse.org


I haven't heard of queryverse, thank you for that. This also brings up a good point I wanted to highlight.

I get that Julia is a young language with a growing ecosystem. But the lack of "one obvious way to do something" may scare new users away.

"I want to quickly wrangle data. Do I use Query.jl, DataFramesMeta.jl, SplitApplyCombine.jl or something else?"

"I need pipes to help me wrangle data more efficiently do I use Base Julia, Chain.jl, Pipe.jl, or Lazy.jl?"

For a new R user it seems so much simpler:

1. run "library(dplyr)" 2. Google "how to XYZ in dplyr" 3. ??? 4. Profit


I mean, I get your point. Julia has a bit of the Lisp Curse http://winestockwebdesign.com/Essays/Lisp_Curse.html Writing a performant and easy-to-use data wrangling library for R is a bunch of work and means dealing with C/C++ etc. So few people are willing to do so, and most just contribute to a small number of libraries like dplyr. (I feel like there are at least 2 other major competitors to that in R?) Whereas in Julia it's really easy to write a new data wrangling library. It's just not that much work. So people: A) do it just for fun / student projects (none of those ones are, though). B) do it because they have a non-trivially resolvable opinion (e.g. Queryverse has a marginally more performant but marginally harder to use system for missing data).

The nice thing about Julia, especially for tabular data (thanks to Tables.jl), is that everything works together. It's actually completely possible to mix and match all of those libraries in a single data processing pipeline. While that is generally a weird thing to do, it does mean that if an external package uses any of them, it slots into a pipeline built on another. (One common case: Queryverse has CSVFiles.jl, but CSV.jl is generally faster, and you can just swap one for the other inside a Query.jl pipeline.)

I absolutely agree this makes learning harder.

---

Also that particular example:

> "I need pipes to help me wrangle data more efficiently do I use Base Julia, Chain.jl, Pipe.jl, or Lazy.jl?"

It's piping. Something would have to be massively screwed up if any of those options were more or less efficient than the others. The only question is what semantics you want. Each is pretty opinionated about how piping should look.


The Lisp Curse was written by a then-inexperienced web developer, with (then, and likely still now) zero Lisp experience, based on extrapolating something he read about Lisp in an essay by Mark Tarver. He prefers it not be submitted to HN due to the embarrassment, yet for some reason keeps the article up (probably because it generates traffic).


> For a new R user it seems so much simpler:

> 1. run "library(dplyr)" 2. Google "how to XYZ in dplyr" 3. ??? 4. Profit

I beg to differ here. There’s much to be said for using data.table and base R instead of the tidyverse.

This article is worth a read in my view: https://github.com/matloff/TidyverseSkeptic


Yeah, NSE (non-standard evaluation) is really annoying to work with in dplyr/tidyverse codebases, and this definitely inhibits people from building on top of them.

They are an 80% solution for a lot of data analytic needs, but base-R is 100% the right choice if you want your code to run for a long time without needing updates.

I've never really gotten into data.table for some reason, normally dplyr is fast enough, or I'm using something more efficient than R.


What a constructive, positive, down-to-earth, well-written comment, and what a nice reprieve from everything that's broken about the tone of web discussions these days. You point out that there's still another player in this space (R), but not in a way that's whiny, dismissive, or doctrinaire, and you celebrate the healthy competition. You suggest a streamlined path toward Julia ecosystem maturity, rooted in real-world needs. Nicely done!

I have no real dog in this fight, but I hope Julia team members (and/or aspiring Julia ecosystem contributors) will read and consider your point.


This whole thread seems to be quite civilized. I can see no name-calling or off-topic rants, only a frank exchange of opinions, mixed in with some facts.

Your post seems to indicate that there is some sort of 'fight' going on, or that the tone is broken. I disagree. If most web discussions were like this one, we would have fewer problems in this world.


Oh, that's exactly what I mean -- when I say "everything that's broken about the tone of web discussions these days", I'm talking about threads and topics other than this one. I don't see any 'fight' here, and that's what's so refreshing.


All right! I got the impression you were contrasting that particular post with the rest of this discussion, but apparently not. Still slightly confused here. Oh well, carry on.


Another big area where R has an edge over Python (and I guess Julia, but I'm not sure) is making quick yet presentable plots of data that contain different factors you want to show together. The matplotlib equivalent requires tracking different indices and manually adding a layer for each one.
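
Presumably this refers to ggplot2, where that kind of grouping is one aesthetic mapping away (column names invented):

  library(ggplot2)

  ggplot(df, aes(x = dose, y = response, colour = treatment)) +
    geom_point() +
    geom_smooth(method = "lm", se = FALSE) +
    facet_wrap(~ species)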



Gadfly is amazing too, and Makie is the future.


Here's a neat website that captures this: https://www.r-graph-gallery.com/

If you click into any of the plots and scroll down you can see how little code is needed for most of these plots.

For example: https://www.r-graph-gallery.com/135-stacked-density-graph.ht...


I worked with R and Python over the last 3 years, but I've been learning and dabbling with Julia since 0.6. With [PyCall.jl] and [RCall.jl] available, the transition to Julia can already be fairly easy for Python/R users.

I agree that most of the time data wrangling is super comfortable in R, thanks to the syntax flexibility exploited by the big packages (tidyverse/data.table/etc). At the same time, Julia and R share a bigger Lisp heritage than Python does, because R is also a Lisp-ish language (see [Advanced R, Metaprogramming]). My main gripe with the R ecosystem is not that most of the performance-sensitive packages are written in C/C++/Fortran, but that they are so deeply interconnected with the R environment that porting them to Julia (which also provides an easy, good interface to C/C++/Fortran and more; see the [Julia Interop] repo) seems impossible for some of them.

I also think Julia reaches a broader scientific programming public than R: it overlaps with Python sometimes, but it also gives the Matlab/Octave public a better alternative. I don't expect all the habits from those communities to merge into the Julia ecosystem. On the other hand, I think Julia's bigger reach will help it avoid falling into the "base" vs "tidyverse" vs "something else in-between" split that R is in now.

[PyCall.jl]: https://github.com/JuliaPy/PyCall.jl

[RCall.jl]: https://github.com/JuliaInterop/RCall.jl

[Julia Interop]: https://github.com/JuliaInterop

[Advanced R, Metaprogramming] by Hadley Wickham: https://adv-r.hadley.nz/metaprogramming.html


Out of curiosity, when was the last time you looked at DataFrames.jl? A huge amount has happened in the last year. Plus, if you want more tidy-like syntax, you can go with Query.jl (or DataFramesMeta.jl, though that isn't quite finished updating to the new DataFrames syntax), or if you just want pipes on DataFrame operations, there's Pipe.jl and Chain.jl.

I don't think your comments are harsh, you need what you need and you like what you like. I do mostly data wrangling too, but feel much less constrained with Julia than with tidyr. Sometimes having constraints and one right way to do things is good, but it's not for me.

Also worth noting it's not necessarily on the language developers to do this. Even in R, tidyverse is in packages, not in the base language.


My experience with R was somewhat different. R was my first computational language in 2006 (version 2.3, IIRC), and parsing real life data (biological, in my case) into a format acceptable to R was a non-trivial exercise. I had somebody write me a Perl script to parse the raw data into a clean CSV, but that had its own problems. The tools that became the kernel of the tidyverse (created 2014) were just beginning to show up, and even magrittr pipes were many years away. The only tidyverse tool even close to mature at the time was ggplot. For me data munging was the limiting factor, and at some point I discovered many people prefer Python for these initial steps. In 2013 I learnt Python with the explicit aim of data munging, while continuing analyses in R. With Pandas I could cover 80% of my use case for R, and eventually dropped R completely. Again, this predates the creation of the tidyverse, which I noted with some irony.

For what it's worth, Hadley Wickham was asked in a Reddit AMA several years ago which platform he'd choose if he were just starting out. He pointed to Julia as his pick.


> My message to the Julia team: You can get a very large portion of the R userbase to switch over if you focus on a Julia version of the tidyverse (especially dplyr).

How about the Queryverse?

https://www.queryverse.org/


Couldn't agree more! Every time I look at Julia, I check whether there's an alternative to the tidyverse (esp. dplyr and tidyr) yet.


If we removed dplyr, R scripts would absolutely scream, so I find the speed argument for 'why switch to X' unconvincing. If users cared so deeply about speed, almost no one would be using the tidyverse; instead we'd all be using base R or data.table.

Multiple dispatch? Hmm, is this really a problem I'm going to come across in the real world when 90% of our time is spent ingesting a poorly-formatted CSV, doing some quick plots, and perhaps building a model to test something out? If the goal of Julia is to replace R/Python, then their priorities feel way off the mark.


> If the goal of Julia is to replace R/Python then their priorities feel way off the mark

There's a lot more to scientific computing than wrangling tabular data. Julia is competing in that overall space with R/Python/Fortran/Java/C++. If R or Pandas is better at data wrangling, then Julia won't win out there. But so be it. No PL is best at everything.


> There's a lot more to scientific computing than wrangling tabular data.

Also a point that gets ignored way too often. My original post differentiated between time spent writing models and time spent data wrangling.

I would never even attempt to write a symplectic integrator in base R (OK maybe Rcpp would be fine but that's not really "R"). Julia, by design, is better at that. But the R ecosystem is so good that I can use the best practical implementation of a symplectic integrator to solve common modeling problems via RStan.

Yes, Stan is a standalone framework that can be accessed from Julia as well. But the following workflow can be done much more easily in R (a rough sketch follows below):

  1) Read in badly formatted CSV data
  2) Wrangle the data into a useable form
  3) Do some basic exploratory analysis (including plots)
  4) Write several models in brms/raw Stan (via rstan)
  5) Simulate from the priors and reset them to more sensible values
  6) Run the model over the data to generate the posterior
  7) Plot/run posterior predictive checks, counterfactual analysis, outlier analysis (PSIS or WAIC), etc.

Again, the above represents my common use case. I fully appreciate that people use Julia to do awesome stuff like "the exploration of chaos and nonlinear dynamics" [0]. I understand that the modern R ecosystem isn't really built for this.

[0] https://juliadynamics.github.io/DynamicalSystems.jl/latest/
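
For reference, a rough sketch of steps 1) through 7) in code. The dataset, columns, and priors are all invented, and the brms calls are from memory, so treat this as a sketch rather than a recipe:

  library(readr)
  library(dplyr)
  library(brms)

  raw <- read_csv("trial_data.csv")                  # 1) badly formatted CSV
  dat <- raw %>%                                     # 2) wrangle
    filter(!is.na(outcome)) %>%
    mutate(dose_std = as.numeric(scale(dose)))
                                                     # 3) exploratory plots omitted
  priors <- set_prior("normal(0, 1)", class = "b")   # 4) model + priors

  prior_fit <- brm(outcome ~ dose_std, data = dat,   # 5) simulate from the priors
                   family = gaussian(), prior = priors,
                   sample_prior = "only")
  pp_check(prior_fit)

  fit <- brm(outcome ~ dose_std, data = dat,         # 6) run over the data
             family = gaussian(), prior = priors)

  pp_check(fit)                                      # 7) posterior predictive checks
  loo(fit)                                           #    PSIS-LOO for outliers/comparison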


Totally agree there. It is not a replacement, and it is trying to solve a different problem. I don't believe Julia contributors are lying awake at night upset that other languages exist, feeling they need to put a stop to that. My point (put across clumsily, I see) is that IF that were their goal, then they are going about it the wrong way, as most R/Python users have different priorities. But it is a moot point, as that would be an absurd motivation to create a whole new language.


> is this really a problem that I'm going to come across in the real-world when 90% of our time is spent ingesting a poorly-formatted csv, doing some quick plots and perhaps building a model to test something out

Yes, multiple dispatch is not some highfalutin ivory tower concept that only comes up in specialized code. For example, the model in question could define custom plotting recipes[1] so that you can just call plot() and have it produce something useful.

Also, why shouldn't dplyr perform comparably against data.table? Seems like there would be no need for a fragmented library ecosystem here if the abstractions the tidyverse is built upon were lower-cost. Moreover, what if my data isn't CSV or in a table-like shape at all? "real world" does not mean the same thing across different domains.

[1] http://docs.juliaplots.org/latest/recipes/


> Yes, multiple dispatch is not some highfalutin ivory tower concept that only comes up in specialized code. For example, the model in question could define custom plotting recipes[1] so that you can just call plot() and have it produce something useful.

This is literally the whole conception behind generic functions in R (print, plot, summary etc).

I agree it's great, but Julia is building on a lot of prior art here.
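
For anyone unfamiliar, the S3 flavour of this in R is single dispatch on the class of the first argument (S4 adds multiple dispatch); a tiny example:

  m <- structure(list(coef = c(1.2, -0.4)), class = "my_model")

  print.my_model <- function(x, ...) {
    cat("my_model with coefficients:", x$coef, "\n")
    invisible(x)
  }

  print(m)   # dispatches to print.my_model, just like plot()/summary() methods do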


For sure, and one would be remiss not to mention Dylan, CL/CLOS and Clojure here as well. My quibble was with the claim that multiple dispatch rarely shows up in practice, which you've pretty clearly shown is not the case in R!


Yup, the R-FAQ specifically calls out Dylan and CL as influences.


'highfalutin ivory tower' is a great name for a band :D

Naturally you are correct and I am wrong to dismiss it as unimportant. What I'm saying is that the majority of R/Python users today are not looking for ultimate speed or sophisticated programming paradigms. Most users are doing the unsexy bread and butter of 'take some tabular data -> analyse it -> report on it', and I want to dismiss the argument of 'users will migrate to Julia because of these nifty features' because it ignores the very reasons existing users use these tools in the first place. It would be as absurd as proclaiming Excel users will switch to Python because the accounts department suddenly cares about NLP.


Can somebody please explain the challenges associated with miniaturizing and speeding up the ELISA test?

On wiki [0] I see:

> In the most simple form of an ELISA, antigens from the sample to be tested are attached to a surface. Then, a matching antibody is applied over the surface so it can bind the antigen. This antibody is linked to an enzyme and then any unbound antibodies are removed. In the final step, a substance containing the enzyme's substrate is added. If there was binding the subsequent reaction produces a detectable signal, most commonly a color change.

What are the pain points in this process?

[0] https://en.wikipedia.org/wiki/ELISA


The big game changer for these microfluidic ELISA chips is the increased surface-area-to-volume ratio. While this can speed up incubation quite a bit, you can also run into problems at low concentrations, where (1) you can start depleting the sample, and (2) mass action kinetics no longer apply.

Also figuring out how to reliably manufacture these to within a certain tolerance is a very difficult feat of engineering.

I haven’t been active in this field for a while though, so these may no longer be issues.



I'm on a path to becoming an MD. I was in finance before and discovered HN probably 4 years ago.

I can't speak for the original poster but I think HN users are attracted to complexity and raw truth. Medicine and programming have a lot in common in that regard.

An example is the discovery and isolation of insulin [0]. Banting barely convinced somebody to give him lab space for 2 measly months. He then experimented with tying off the ducts of dog pancreases or removing the pancreases altogether. He realized he could keep a severely diabetic dog alive with injections from another dog's pancreatic juices. There was some drama around the subsequent purification of insulin and the Nobel Prize.

The full story almost sounds like software hacking and startup drama.

[0] https://www.sciencehistory.org/historical-profile/frederick-...


I stumbled upon this article after reading some more recent commentary on MSG [0], in which the author wrote:

"There’s no evidence to substantiate the claim that MSG causes ill effects in most people who consume it. A minority of people are hypersensitive to glutamate and MSG in food—added or natural—and in a study where MSG was given at 3 grams, in the absence of food, sensitive individuals had short-term, transient adverse reactions."

[0] https://peterattiamd.com/should-we-still-be-worried-about-ms...

