The Big Data Brain Drain (jakevdp.github.io)
82 points by barry-cotter on Nov 14, 2014 | 31 comments



> Those with the skills mentioned in this article could easily ask for several times that compensation in a first-year industry job, and would find themselves working on interesting problems in a setting where their computational skills are utilized and valued.

I have to admit that this one always baffled me from the reverse perspective. I always hear these discussions of the great talent shortage and how desperate people are for software engineers. I hear tales of people taking six week crash courses on coding and getting well paying industry jobs. But I just can't believe them.

I don't consider myself a great coder, but I'll forgo humility for a moment to say that I'm decent. Over the course of my PhD, I needed to:

* Write cluster based numeric code for the university supercomputer

* Reverse engineer an undocumented binary network protocol with a packet sniffer

* Work in six languages in a single day (C, Tcl, Labtalk, Scheme, Labview, Python)

* Write code so low level that I had to debug it with an oscilloscope

* Design and implement a data acquisition system that accepted concurrent input from hardware, network, and user sources

* Port a Monte-Carlo simulation to the GPU

* Write user facing visualization systems

I guess that what I'm trying to say is that I can easily FizzBuzz.

After my PhD, I looked into going into industry, but I couldn't even get a no-thank-you. People complain about dealing with recruiters bugging them about job possibilities, but I couldn't even get one to take my CV. I relocated across an ocean to an employer who didn't even pay for said relocation partially because they're the only people who even got back to me.

Don't get me wrong - I love my current position, but I've known plenty of academics who were miserable. Individuals who were more skilled than myself and sometimes only making as much as Zappos pays their warehouse staff.

If the talent shortage is that bad, why aren't more of these people being poached? Why aren't we seeing a larger brain drain?


While there is a great demand, hiring people is actually a lot more complicated than most would-be employees have an idea of. If you're not from the same country (as you seem to be implying), it is even more problematic. The thing is, most employers are anal about hiring the right kind of person because an employee represents a significant investment: not only do you have to pay for his/her benefits, but also he/she will be writing code that will go into their infrastructure, and they want to be really really sure that the person is good.

Now, how do employers find this kind of person? One way is to go to well known universities; most US universities have recruiting events that are chock-full of tech companies, and this is by far the best place to get a great entry-level position. Once you work at a known company for even a little while, that experience is a signal to other companies (and recruiters) that you are someone who is hireable and they will then actively pursue you. That is the impression that I got, and it has fairly matched with my personal experience.

Outside of university recruiting events, I think your best shot is Meetups. There will definitely be some of those in any city; go there. Introduce yourself, ask questions, learn new stuff and talk to people. I know this is really really hard for a lot of people with an academic background, who tend to be rather shy. But hey, if you want to work for a great company with the nice pay and benefits, then you have to step outside your comfort zone.

I cannot over-emphasize the need for networking for advancing your career.


The problem isn't your CV or you. It's that the job your credentials seem to command isn't the type of gig any company is willing to risk hiring a fresh-out-of-college guy to do. Your academic creds, while wonderful, are simply NOT real world experience.

Your PhD leads recruiters to think you'll demand 6 figures and a SR position. But your experience doesn't show you could actually do the role you'd be expected to. If a hiring manager puts their signature on the offer letter and it turns out you suck, it looks BAD on them, REALLY bad.

I've done enough hiring of SR folks to form the opinion that the problem isn't lack of talent, it's lack of provable talent. As a hiring manager, in most companies, it's better to leave a role open than hire someone who sucks or has a chance of sucking.

The fix on your side is the path almost everyone I know that's in a high level position has taken: startups and small companies. You slog it out in an underpaid gig for a few years; think of it as a residency. Then start looking for a real gig.


What do you mean by SR?


Senior. Basically an engineer you can hand a project (1 month to 1 year time frame) to, and they'll take it from there, calling in resources when necessary.


I think it's less important what you've done and more important how you've done it. Replace all this with a single well-written, well-documented open-source project that people actually use and your situation would be different. I'm not trying to diminish your accomplishments, but rather diminish how much the industry values this kind of stuff. The problems you typically work on in industry are much more cut-and-dry, but the important part is not getting a solution in the first place, but a maintainable, well-tested one. Without making too strong of a claim as to whether it's correct, there is a strong negative association for academic code in the industry. I think there's actually a lot of validity to this criticism, though: for each of the projects listed, did you maintain the project at all? Were there other users for the code, or was it just you (or your research group)? This is a place where the academic experience is lacking as preparation for engineering work.


All I have is anecdotes, but I'll hazard some guesses.

The talent shortage has a strong geographic component. In SV or NYC, job search sites (a poor way to gauge actual demand, but whatever) seem to list 1,000s of available Java/C# gigs, with substantial numbers of listings that include JavaScript. But I would probably lump those three languages together as placeholder flags to indicate "standard enterprise developer" jobs. Most of these job listings seem to fall at the entry level, with a steep fall-off in mid-level offerings, and a few (mythical?) mega-$$ listings. These types of jobs feel more "credential sensitive" to me - if your resume (not CV) doesn't say Comp Sci or maybe MIS, you probably get binned immediately.

Leave those regions, and job listings seem to drop. If we can trust these listings to reflect some level of region-comparable demand, we might guess that in Seattle, WA or Chicago, IL, there is about 1/2 the demand of SV and NYC. Cut it down even more in places like Austin, TX or RTP, NC, maybe 1/6 or 1/8 the demand of SV and NYC? I'm sure we could include a bunch of other cities in those tiers, but maybe those are loosely representative.

I often wonder how much job demand looks inflated on these job boards though. A quick glance at Java in NYC, NY on indeed.com shows a few big staffing companies with hundreds of listings. I usually assume there are a bunch of duplicate listings from staffing companies for a single position.

I think a lot of lower-level gigs out there are filled by kind-of-passable candidates not making anything like the SV salary numbers that get thrown out on HN frequently. Among strong engineers I personally know in my region, several are frequently contacted by recruiters for jobs that are, at best, sideways moves for them (e.g. doing the same gig at some different place for roughly the same money). A PhD (and maybe even an MS) might scare employers at this level?

The other part of this, especially regarding the original article, is that deep, strong statistical/machine learning/magic fairy dust hacking gigs are much less numerous than the standard enterprise dev jobs and that those gigs are even more restrictive in recruiting nature.

Meaning, if you are really recruiting for a top-notch "data science" person, you are probably(?) looking for intensive credentials or portfolios. So MS or PhD in Applied Math or Statistics with easily demonstrated programming skills or a set of completed projects cast in the "data science" space. And if you're not recruiting at that level, the "data science" label means "query some data sources to generate a tabular report, maybe with some rollups."


A recruiter here in Chicago tells me he's dying for entry level ASP.NET and back-end Java coders. I think a lot of these companies just run these guys through at largely unimpressive wages, run a bunch of candidates through as temp to perm, and hope to god one guy out of 20 isn't a complete moron. I've had the displeasure of working with some of these guys. They have some paper certs and some basic understanding of what they're doing, but they have zero big-picture view and no love of what they are doing. Don't get me started on their work from a security or stability perspective. These guys are just in the wrong field. They heard the siren's song of being a coder, did some coding bootcamp or took a community college class or two, and are shoved into big departments churning out junk code.

I think these jobs echo how a lot of basic IT support staff in the late 90s and early 2000s got work. Companies were scrambling for someone to sit down with end users and explain how Outlook worked, install software, and maybe reboot the occasional server. These guys all ran to braindump sites, got a MCSE or CCNA, and now are lifers at the company they fooled into hiring them.

There's something somewhat sad about all this. These people probably would have been better off in a different field but are now stuck in unpromotable positions because they just don't have what it takes to move up. Our own support guy is in his 40s and is barely competent at providing basic level 1 PC desktop support. He's totally in the wrong field. I'm not saying this stuff is a calling, but in some ways it is. Like a lot of technical jobs, you kinda have to make it your religion and invest a bit of your personal time into it because it's so fast moving that if you treat it like a 9-5 desk job, you'll fall behind very quickly.


> I think a lot of these companies just run these guys through at largely unimpressive wages,

This.

When I graduated with a MS in CS 3 years ago, I interviewed at over a dozen companies. I got offers at all of them. The highest offer I got was for 45k. The lowest was 32k. Outside of SV and NYC and the big tech hubs, wages are terrible. For comparison, waiters at the restaurant I worked at (by no means high end) made up to 40k.

Companies want to pay dirt-cheap wages and get tons of highly qualified applicants and shockingly they just can't find anyone!

That said, three years in and my salary has almost doubled (after job-hopping - which did make me sad, as that first job was a lot of fun even if it was a lot of work), so at least there's that, but that's just about the cap for private sector work in the region.


> Like a lot of technical jobs, you kinda have to make it your religion and invest a bit of your personal time into it because it's so fast moving that if you treat it like a 9-5 desk job, you'll fall behind very quickly.

I feel this statement should be emphasized a lot more. The technology scene is changing so fast that you HAVE to invest time in learning new things all the time, or at bare minimum hang out with people who do. I feel that this is kinda the flip side to working in technology: people see the great pay and benefits and wonder why they are being paid so well. Well, it's because your skills are in demand... for now. They may not be very soon, so be sure that your skills are up to date.


I have a PhD in mathematics and left academia ultimately to end up as a data scientist. Some people at previous prospective employers felt intimidated by my PhD despite the fact that I took steps to downplay it ("No, you don't have to call me Dr. Maney. 'Jack' is fine."). And I've missed out on a few jobs because of the impression that I'd be too bored.

And yes, I've taken on "data science" positions where I essentially became a chart monkey. That, however, is altogether another rant...


So, here's the deal. I've run and hired data science teams.

The first obvious problem is location.

The second obvious problem is money -- you're probably wildly overestimating the value of math, and underestimating the value of knowing your way, in a non-theoretical sense, around the standard ML and NLP tools, and the toolkits that implement them.

Third, I dump all resumes that say mathematics, particularly PhD math, unless there is a very strong indication you can code. The problem is this: most data science, in practice, is at least 50% coding / coding type activities. Data is scattered across multiple databases, requires serious cleaning and merging, requires merging external data sources, etc. I also need at least some productization of the output, rather than tossing a hacked-together R/Python script with a pile of un-automated db extractions to an engineer.

So I'm faced with a dilemma: if you can't code, where code means talk to multiple databases, extract data from them in a somewhat self-sufficient manner, and deal with a lot of the other tasks, I either have an incredibly unproductive data scientist or I have to dedicate half to 3/4 of an engineer to you, leaving me with a severe negative productivity increase from hiring you, because I don't exactly have too many available engineers.

Now, as I'm sure you'll protest, you can learn all these things! Yes, you may well be able to (though some math people never get particularly competent at them.) However, that's 6-18 months where you'll be pretty unproductive for me, and I need the productivity much sooner than that.

And yes, lots of companies essentially lie about data science jobs, and they're really data engineering / writing data pipelines. I left a former employer after 8 weeks because of this...


When did I say I couldn't code?

When did I say that I wasn't comfortable munging data from various sources?

I don't know the source of the chip on your shoulder. I have no idea who you think you're responding to when you read my post above, but it isn't me.


So you want help or not? If that description doesn't apply to you, fine. But in the commenter's experience, they'd encountered all that.

Frustration can make anybody testy. But unless this topic is a rhetorical question, then it'd be nice to respect honest responses that add to the conversation.


When did I say that I wanted help?

In case it isn't clear, I neither need nor want the kind of "help" that is being offered in this thread, thanks.


[flagged]


0_o

Whatever you say... By all means, please continue living in your tiny little revenge-fueled fantasy world where I'm permanently unemployed.


Where do you get all this stuff? We're talking away here, and out comes this vitriol and bitterness. There's so much to be had from this community. But instead, we get sarcasm.

Here's a thought: people don't just hire folks based on their skills. At root, they have to fit in. Which at a minimum means no corrosive attitude; no offensive meanness; a minimum of courtesy and respect. Because no matter how smart or skilled somebody is, there's always somebody else at about the same level, who isn't offending everybody in sight.

Ah, I see. I assumed because this thread was appended to the 'data scientist can't get employed' thread that your comments related to that. My mistake. Sorry if I offended.


> My mistake. Sorry if I offended.

Apology accepted.


[flagged]


I sincerely hope that we never cross paths. I would never want to work on the same team as you.

For what it's worth, though, you do have a point in that location is also an issue. Regardless, your "help" was not asked for and is most definitely not needed.


I think a lot of the problems with your teams may have been you.


That sounds pretty darn impressive. Maybe the PhD made you look a bit academic? Where did you come from and where did you move?


You either weren't looking hard enough or looking in the wrong places if you think there aren't many companies interested in your skill-set.

Send your resume to the email in my profile, if you'd like. We're always looking for guys like you and we can most certainly put those PhD credentials to work.


Those are not the things that most developers do. When I'm hiring you, I need to know that you can write clean, maintainable, well tested code. We interface with SQL, web services, messaging, rules systems, etc. IE, the majority of development is not "low level", it is "high level".

My company (Red Hat) is always looking to hire. Yes, there are lots of developers looking for work, but lots of them just are not very good. The shortage is a shortage of capable developers.


A more recent article addressing this essay from the same author:

https://jakevdp.github.io/blog/2014/08/22/hacking-academia/

The tl;dr summary from this second article referencing the first:

  a quick summary is this: scientific research in many 
  disciplines is becoming more and more dependent on the
  careful analysis of large datasets. This analysis requires
  a skill-set as broad as it is deep: scientists must be 
  experts not only in their own domain, but in statistics, 
  computing, algorithm building, and software design as 
  well. Many researchers are working hard to attain these 
  skills; the problem is that academia's reward structure is 
  not well-poised to reward the value of this type of work. 
  In short, time spent developing high-quality reusable 
  software tools translates to less time writing and 
  publishing, which under the current system translates to 
  little hope for academic career advancement.

I think the HN title is somewhat misleading; the central thesis from the original article is stated as:

  the skills required to be a successful scientific
  researcher are increasingly indistinguishable from the 
  skills required to be successful in industry

This is probably a stunningly awesome outcome - it means that industry has a place for advanced degree holders who will not find a classic academic position. In the linked essay above, from the same author:

  the number of PhDs granted each year far exceeds the 
  number of academic positions available, so it is simply 
  impossible for every graduate to remain in academia.

I wish people would not use 'big data' as a label in these discussions. I think the essential truth is that being able to apply a scientific, quantitative thought process to problems combined with the ability to write software to provide others with solutions to those problems is valuable across academia and industry. That doesn't really have much to do with the 'big data' meme flaming across the skies these days.


Academia can easily retain us. Pay us more and support our work.

I love my field very much; only three things have me considering leaving academia. Poor salary outlook: PhD + 10 years of domain expertise? $50k. Poor job liquidity: zero geographic choice. And professors rarely appear to be happy, with terrible work-life balance due to administrative overhead and the need to secure future funds.

All of these concerns can be resolved with money, through both higher salaries and better funding. If they were resolved, I wouldn't be looking outside at all.

If you have great management and need someone who's good with precision hardware, sensing, and data analysis in the Seattle area, please get in touch.


An interesting read (along with the sister article pointed out by tom_b, linked in the original). It is relevant for me as within about a year I will leave my current position as a postdoctoral scientist. I consider myself more heavily weighted towards the "domain specialist" and "probability/statistics" foundation, but would still be considered a data scientist by many. The article is concerned with making academia more attractive so as to stem the loss of data scientists to industry, but I am also wondering whether or not there might be some fit within industry that has perks like those in academia. So I will try to ask for some advice, as someone with an open mind towards both academic and industrial positions, but with no experience in the latter...

In applying for positions in industry, what sort of strategies are there to maximize time allocated to discretionary research, independent projects, etc.? Let's say I am willing to work a maximum of 60 hours a week, and that I have some idea in mind for a minimum salary (I dare not say the actual number). How can I negotiate a contract where 10-15 of those hours go towards personal projects that might be only loosely aligned with the firm's objective? Is this even possible for someone just starting out? For example, could I take a pay deduction? Or should I just think about reducing the total number of hours I work for the company?

Does anyone have experience applying for research grants within industry (of small to medium size)?


The "unreasonable effectiveness of data" in finding publishable results and creating commercially viable products ought to be blinding us (or rather "them" - AI researchers of whom I'm not one) to better ways to learn. A child learns to speak and recognize objects from much less data than contemporary machine learning/data mining/AI/whatever you call it needs.

Of course it's much better to keep your program small and your data sets huge than the other way around - we'd all do it if we could. But it ought to be a lot like keeping a 2^64-N look-up table to implement 32b integer multiplication; it's ultimately larger and costlier than figuring out how multiplication works, and you give wrong results for the N entries missing in your table.

Or something along those lines. (Of course natural languages are "scruffier"/not as "neat" as multiplication - but demonstrably not as scruffy as to be impossible to reasonably learn in any way except reading everything ever written and translated into another language.)
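
To make the look-up table analogy above concrete, here is a minimal sketch scaled down to 4-bit operands, so the full table has 2^8 = 256 entries rather than 2^64. The function names, the seed, and the choice of which entries go missing are illustrative assumptions, not anything taken from the article or the comment.

```python
import random

BITS = 4  # scaled down from 32 so the full table is 2**(2*BITS) = 256 entries


def shift_add_multiply(a: int, b: int) -> int:
    """Multiply two non-negative integers by shift-and-add,
    i.e. by 'figuring out how multiplication works'."""
    result = 0
    while b:
        if b & 1:        # lowest bit of b set: add the current shifted a
            result += a
        a <<= 1          # a doubles each step
        b >>= 1          # move on to the next bit of b
    return result


def build_partial_table(missing: int, seed: int = 0) -> dict:
    """Memorize every product except `missing` randomly chosen operand pairs."""
    pairs = [(a, b) for a in range(2**BITS) for b in range(2**BITS)]
    dropped = set(random.Random(seed).sample(pairs, missing))
    return {p: p[0] * p[1] for p in pairs if p not in dropped}


def table_multiply(table: dict, a: int, b: int):
    """Look the answer up; returns None when the pair was never memorized."""
    return table.get((a, b))


def count_table_failures(table: dict) -> int:
    """Count operand pairs for which the table cannot give the right product."""
    return sum(
        1
        for a in range(2**BITS)
        for b in range(2**BITS)
        if table_multiply(table, a, b) != a * b
    )
```

With N entries left out, the table answers exactly 256 - N of the inputs correctly and has nothing to say about the other N, while the few lines of `shift_add_multiply` are right on every input and would stay the same size at 32 bits.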


>"A child learns to speak and recognize objects from much less data than contemporary machine learning/data mining/AI/whatever you call it needs."

You think?

When a kid starts talking at 1.5, 2, 3 years or whenever, they're doing so after being exposed to many thousands of hours of input, much of it effectively guided.

Also, as far as I know, we don't have nearly enough understanding of the way the brain works to make a useful comparison between "records" fed to a model and all the analog input that goes into making a person.


I think the key factor here is that with the new tools and power in computation, big data is everywhere. Intelligent and creative people can use these tools to create models and new hypotheses that they can refine and sharpen with the new data. This is like Lisp and its meta capabilities. We can design programs that create general hypotheses (macros), and then those tests are macro-expanded to create functions that specialize and refine the model and hypothesis. The power of computation and the availability of rich sources of data (sensors, NLP, Twitter, and so on) allow us to think in ways that were a dead end before the big data era.

It is the new tools and powerful computational capabilities that reward those with the inspiration and creativity required to get the most out of them. Industry or Academia is a false dichotomy; what is needed here is a new way of thinking to explore and create hypotheses in a way that was not possible before the big data era.


I'm also interested in the unreasonable effectiveness of data in both reinforcing an illusion of control and rationalizing our preexisting prejudices - a commonality between quants in finance, data science in business, and mainstream science (as Ioannidis is teaching us.)

Anything that improves and grounds statistics in some sort of concrete, more falsifiable context is a good thing. Maybe new people attacking from a different perspective can help.


Whenever I hear about "brain drain" it always means "someone provides better conditions".



