This post states that a data scientist uses compact languages such as SQL and R.
Genuine question - do people really believe that being able to write and understand complex SQL makes you a data scientist?
I ask because I've been writing some of the nastiest, most difficult-looking SQL around for probably at least 15 years. And yet I would NOT call myself a data scientist just because I know and can work with data and use SQL. It might make me a data engineer.
What would make me a scientist is the process, method and rigor I apply to data-driven research and in practice. It's not about what tool I use or how complicated that tool is.
I often get a whiff of imposter syndrome over this because if being "great at SQL and R" is enough to get the big bucks as a data scientist, then I'm clearly doing it wrong. But then again, maybe I'm being too literal in thinking that "scientist" means something more than that.
I've been working as a data scientist for several years and have written some pretty gnarly-looking SQL myself. I have a background in math and hard science, so I have some understanding of the scientific method as well. While I respect our DBAs, I wouldn't call any of them qualified to be data scientists.
While I have been able to hold my own in this job, I went back to school to pursue a graduate degree (partly) because being in the field has shown me how much more there is to know. It's easy enough to train a simple model in R, but there are so many ways to fool yourself and produce an invalid analysis, and so many variations on otherwise-simple problems.
It seems this field has a lot of variation. A glorified report writer might get the DS title but they're not going to get the really cool jobs.
If you're interested in data science, try out a Kaggle competition and try to place high. The variety of methods and tricks people try to improve their entries can be illuminating, I think.
I'll preface this by saying I haven't looked at any Kaggle competition, but I always assumed Kaggle competitions were on par with programming competitions in terms of how the skills transfer professionally. A great programmer is not necessarily great at programming competitions, after all.
No, there's way more to data science than competitions. But for someone who is already a data engineer more or less, I think it could be a good window into the complexity of modeling.
Firstly, it states that "a data scientist uses compact languages such as SQL and R". It doesn't state "everyone who uses SQL is a data scientist".
That said, the term "data scientist" itself is a bit frustrating. It gets thrown around a lot as if it were a well-defined role, and it is anything but. In my experience, the role of a "data scientist" is about as well defined as the role of an "engineer": it has connotations about the type of work and maybe a few shared skills, but the specifics of what an "engineer" does and what their skillset is vary widely depending on whether they are a software engineer, an electrical engineer, or a civil engineer.
So while I think that most data scientists know SQL or use SQL frequently, I don't think that all data scientists use it, nor do I think that everyone who uses SQL works in a role that would probably be considered that of a data scientist.
That covers the "data" aspect. For my work, however, the "scientist" aspect is just as important. While I'm expected to use SQL and R to generate reports, I need the thought process of an epidemiologist to construct my analytic samples. I also need the scientific knowledge and background to interface with MDs and clinical PhDs, who need me to bridge the gap between data and science.