
This post states that a data scientist uses compact languages such as SQL and R.

Genuine question - do people really believe that being able to write and understand complex SQL makes you a data scientist?

I ask because I've been writing some of the nastiest, most difficult-looking SQL around for probably at least 15 years. And yet, I would NOT call myself a data scientist just because I know and can work with data and use SQL. It might make me a data engineer.

What would make me a scientist is the process, method and rigor I apply to data-driven research and in practice. It's not about what tool I use or how complicated that tool is.

I often get a whiff of imposter syndrome over this because, if being "great at SQL and R" is enough to get the big bucks as a data scientist, then I'm clearly doing it wrong. But, then again, maybe I'm being too literal thinking that a scientist means something different.




I've been working as a data scientist for several years and have written some pretty gnarly-looking SQL myself. I have a background in math and hard science, so I have some understanding of the scientific method as well. While I respect our DBAs, I wouldn't call any of them qualified to be data scientists.

While I have been able to hold my own in this job I went back to school to pursue a graduate degree (partly) because being in the field has shown me how much more there is to know. While it's easy enough to train a simple model in R there are so many ways to fool yourself and produce an invalid analysis and so many variations on otherwise-simple problems.

It seems this field has a lot of variation. A glorified report writer might get the DS title but they're not going to get the really cool jobs.

If you're interested in data science, try out a Kaggle competition and try to place high. The variety of methods and tricks people try to improve their entries can be illuminating, I think.


I'll preface this by saying I've not had a look at any Kaggle competition, but I always assumed Kaggle competitions were on par with programming competitions in terms of how the skills transfer professionally. A great programmer is not necessarily great at programming competitions, after all.

Am I way off here?


No, there's way more to data science than competitions. But for someone who is already a data engineer more or less, I think it could be a good window into the complexity of modeling.


Nope. Kaggle just covers the modelling part, which is normally much easier than figuring out how to solve business problems using data.


Firstly, it states that "a data scientist uses compact languages such as SQL and R". It doesn't state "everyone who uses SQL is a data scientist".

That said, the term "data scientist" itself is a bit frustrating. It gets thrown around a lot as if it is a well-defined role, and it is anything but. In my experience, the role of a "data scientist" is about as well defined as the role of an "engineer": it has connotations about the type of work and maybe a few shared skills, but the specifics of what an "engineer" does and their skillset vary widely depending on whether they are a software engineer, an electrical engineer, or a civil engineer.

So while I think that most data scientists know SQL or use SQL frequently, I don't think that all data scientists use it, nor do I think that everyone who uses SQL works in a role that would probably be considered that of a data scientist.


That covers the "data" aspect. For my work, however, the "scientist" aspect is just as important. While I'm expected to use SQL and R to generate reports, I need the thought process of an epidemiologist to construct my analytic samples. I also require the scientific knowledge and background to interface with MDs and clinical PhDs, who need me to bridge the gap between data and science.


> do people really believe that being able to write and understand complex SQL makes you a data scientist?

Many data scientists use R and SQL. That does not mean that many of those who use R and/or SQL are data scientists.

Many lawyers use Word. Yet I’m not a lawyer just because I use Word.


Your second sentence does not follow from your first. Just because Y's do X, doesn't mean everyone who does X is a Y.


You're being too hard on yourself and you should go apply for the big bucks. Most scientists barely deserve the title.



