I can do one quickly. I need to take affiliations from papers and work out which organisation(s) they're talking about. How would you solve this problem, assuming the affiliations are already extracted for you?
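To make that concrete: a naive first pass a candidate might sketch is normalising the strings and fuzzy-matching against a curated organisation list. A minimal sketch, assuming you have such a list; the `KNOWN_ORGS` entries, the helper names, and the 0.85 cutoff are all made up for illustration:

```python
# Naive first pass: normalise affiliation strings and fuzzy-match them
# against a curated list of organisation names. KNOWN_ORGS and the 0.85
# cutoff are illustrative placeholders, not real data or a tuned value.
from difflib import SequenceMatcher

KNOWN_ORGS = ["University of Cambridge", "Max Planck Society", "CERN"]

def normalize(s: str) -> str:
    return " ".join(s.lower().replace(",", " ").split())

def match_affiliation(affiliation: str, threshold: float = 0.85):
    """Return the best-matching known organisation, or None below the cutoff."""
    best_org, best_score = None, 0.0
    for org in KNOWN_ORGS:
        score = SequenceMatcher(None, normalize(affiliation), normalize(org)).ratio()
        if score > best_score:
            best_org, best_score = org, score
    return best_org if best_score >= threshold else None

print(match_affiliation("University of Cambridge, UK"))  # -> University of Cambridge
print(match_affiliation("Dept. of Physics, MIT"))        # -> None (not in the list)
```

Even this toy version surfaces the failure modes you'd want a candidate to spot: abbreviations, departments versus parent institutions, multiple affiliations in one string, and so on.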
What are the top-level concerns, how do you break the problem down, how might it scale, etc.? There are a lot of questions I'd expect to get to, and this would be done with the team, as if we were all working on the problem together.
I find it useful to see how well people can talk through the problem; it leads easily into questions about licensing and rights of reuse, types of errors, etc. If they suggest an approach they've used before, can they explain the likely failure cases and benefits? Are there workarounds, detection methods? For example, if you're doing text classification then tfidf+svm is a solid first thing to try, and there are easy ways it can fail which we could talk about.
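For anyone unfamiliar, that baseline is only a few lines with scikit-learn. A minimal sketch, with toy texts and labels standing in for real data:

```python
# The tfidf+svm baseline mentioned above, as a scikit-learn pipeline.
# The texts and labels are toy placeholders; the real work is in the
# data and the evaluation, not in these few lines.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

texts = [
    "deep learning for protein structure prediction",
    "novel results in particle physics",
    "invoice for lab equipment",
    "quarterly budget meeting notes",
]
labels = ["science", "science", "admin", "admin"]

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
clf.fit(texts, labels)
print(clf.predict(["new paper on protein folding"]))  # -> ['science'] on this toy data
```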
There's a lot you can cover in an hour, and it tests whether someone can explain a potential solution to the team effectively, just as they would have to day-to-day. We can bring up specific types of problems we face within it, what we've tried, and we can constrain the problem or lead someone to starting points if it's a bit overwhelming.
edit - I guess this would fall under data science fundamentals, but I think the approach works for CS fundamentals too. What data structures could you use? What are the tradeoffs? It's not about finding the one optimal solution, but about how to proceed.
> For example, if you're doing text classification then tfidf+svm is a solid first thing to try
This is screening for specific ___domain knowledge (text processing) not general programming aptitude. That's ok if you want specific kinds of prior knowledge on Day 1 but it is not a way to hire generally smart people.
> I guess this would fall under data science fundamentals, but I think the approach works for CS fundamentals too. What data structures could you use? What are the tradeoffs? It's not about finding the one optimal solution, but about how to proceed.
This is exactly how most algorithm interview questions work.
I'm trying to understand what the OP meant by "real problems" not "academic puzzles". It sounded like they avoided hard algorithms yet "got a sense" of CS fundamentals somehow.
> This is screening for specific ___domain knowledge (text processing) not general programming aptitude. That's ok if you want specific kinds of prior knowledge on Day 1 but it is not a way to hire generally smart people.
Not really; the ___domain knowledge here is much more the bibliometrics stuff, which we usually don't need. What I do need is someone who knows they can't just take any data source they find, throw the latest deep learning hotness at it, and call it a day because the F score is over some arbitrary threshold.
You can absolutely use this to hire generally smart people. What you're right about is that I won't be hiring people who are generally smart but have no basic understanding of the types of solutions they'll need to work on and with. Given our team size and where we are, that's completely fine for me right now.
I think this helps find people that:
1. Are able to talk through a problem
2. Understand the kinds of issues they may face, and can discuss how to deal with them (including business-level work, when to use humans and when not to, etc.)
3. Have experience working on the kinds of problems they are going to face
The tfidf+svm example was not intended as "ah yes, they said the algorithm I wanted" but as a springboard into further discussion. Maybe they talk about word vectors instead; whatever it is, can they explain the pros and cons? Where might it fail, or more importantly, what would they want to test? Where do we get training data, how long might that take for reasonable quality, how do you measure that, etc.?
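On "how do you measure that": one concrete answer I'd hope to hear is cross-validated F1 on labelled data rather than a single lucky train/test split. A sketch, with generated placeholder data standing in for whatever you actually collect:

```python
# Measuring the baseline with k-fold cross-validation on macro F1,
# rather than trusting a single train/test split. The generated texts
# and labels are placeholders for real labelled data.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

texts = [f"protein folding study number {i}" for i in range(10)] + \
        [f"quarterly budget report number {i}" for i in range(10)]
labels = ["science"] * 10 + ["admin"] * 10

clf = make_pipeline(TfidfVectorizer(), LinearSVC())
scores = cross_val_score(clf, texts, labels, cv=5, scoring="f1_macro")
print(f"macro F1: {scores.mean():.2f} +/- {scores.std():.2f}")
```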
> This is exactly how most algorithm interview questions work.
The puzzles I think they're talking about are "here is a theoretical problem, find the optimal solution". Like "you have X eggs and need to find the highest floor you can drop them from without them breaking", or the classic "implement a doubly linked list with a single pointer", despite the fact that _almost nobody_ would actually implement that.
I'm talking about taking an actual issue they're likely to face and talking it through. Maybe that's "how would you implement the user flow for X, given that we've got institutional customers with multiple clients" or "we need to do rate limiting and have X servers, how would you go about that?". For the latter, that might lead into a discussion of the complexity tradeoffs of different approaches, the cost of letting someone exceed their rate, of cutting someone off early, etc. Those aren't really my field, so maybe the questions are a bit off, but the point is: can they contribute to a discussion on the way forward for a problem that represents something they will realistically face?
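To make the rate-limiting one concrete (with the caveat that it's not my field): a single-server token bucket is the usual starting point, and the interesting part of the discussion is what breaks when you try to coordinate it across X servers. A sketch; the class and parameter names are hypothetical:

```python
# A single-server token bucket, the usual starting point for the
# rate-limiting discussion above. Coordinating buckets across X servers
# (shared store? sticky routing? approximate per-server limits?) is
# exactly the tradeoff conversation the question is meant to open.
import time

class TokenBucket:
    def __init__(self, rate: float, capacity: float):
        self.rate = rate          # tokens added per second
        self.capacity = capacity  # maximum burst size
        self.tokens = capacity
        self.updated = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill based on elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(rate=5, capacity=10)   # 5 req/s, bursts of 10
print([bucket.allow() for _ in range(12)])  # roughly the first 10 True, then False
```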