It amazes me how quickly our industry has forgotten the need for DBAs. With these MongoDBs, MPP cloud dbs and Hadoops, everyone seems to have assumed that engineers can now do all db work. This is reflected in the titles too: Data "Engineer".
But from my perspective, this is delusional. There is a lot that goes into a DBA's experience that is not solved by the performance improvements in databases over the past decade. But there are more choices. 20-30 years ago, you would have been forced to write code on Oracle and you would have asked for help before deciding how to structure the data. Today, with more choices, you just read some online opinions and jump on one without any internal resource to guide you.
Not saying the world of Oracle was great, but the young on this thread (me included) would benefit from respecting the experience of the old.
I agree. The issue I see is that, to be very honest, it's pretty tough to hire a very good DBA who knows the new cool technologies. I've worked with some DBAs, but they have no experience with Cassandra or whatever !MySQL !Postgres !Oracle !SQL, and they also have a very difficult time integrating themselves with the developers. It turns out the developers have a better understanding of Cassandra than the DBAs and DevOps/Ops. As Ops we just learn from them and from incidents.... Good DBAs also tend to do a lot of testing and development besides reading manuals and drawing on past experience.
So having benchmark tests is great as a general guideline for what works under different architectures/schema designs. Unfortunately, benchmark results are highly sensitive to the initial choices. I am a big fan of BigQuery (enough to go through Google's vetting process), but there are plenty of performance issues I've run into with it that would have been easily resolved on Redshift. Here are some concrete examples:
1) Running a Query across several very small tables. It turns out that occasionally querying small tables causes heavy network traffic within Google's distributed system. The solution on Redshift would be to adjust distribution. On Google, however, you don't have any control over this. You just have to hope that Google's algorithms pick up the issue based on usage (they don't).
2) Joining large tables. Avoid joining large tables in BigQuery. In Redshift the join would have been handled by making sure that the sortkey is set on the column used for the join on the (typically) right table, and then having a common distkey between the two tables (this way the relevant data from both tables lives on the same node). BigQuery just throws resources at the problem. Well, it turns out that throwing resources at the problem is super slow (think 5-15 seconds on Redshift vs. 200 seconds on BQ).
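To make the distkey point concrete, here is a rough Python sketch of why a shared distribution key helps. It is a toy model, not Redshift's actual implementation: the node count, the CRC32 hash, and the `orders`/`users` tables are all made up for illustration, and it ignores the sortkey side entirely. The idea is just that if both tables are placed by the same key, every join match can be found on a single node without moving rows around.

```python
from collections import defaultdict
import zlib

NUM_NODES = 4  # hypothetical cluster size, chosen only for this sketch


def node_for(key, num_nodes=NUM_NODES):
    """Map a distribution-key value to a node using a stable hash."""
    return zlib.crc32(str(key).encode("utf-8")) % num_nodes


def distribute(rows, key, num_nodes=NUM_NODES):
    """Place each row (a dict) on the node chosen by its distribution key."""
    nodes = defaultdict(list)
    for row in rows:
        nodes[node_for(row[key], num_nodes)].append(row)
    return nodes


def colocated_join(left_nodes, right_nodes, key):
    """Inner join, node by node; equal keys always hash to the same node."""
    joined = []
    for node_id, left_rows in left_nodes.items():
        # Build a lookup from the right-hand rows stored on this node only.
        lookup = defaultdict(list)
        for right_row in right_nodes.get(node_id, []):
            lookup[right_row[key]].append(right_row)
        for left_row in left_rows:
            for right_row in lookup.get(left_row[key], []):
                joined.append({**left_row, **right_row})
    return joined


# Made-up sample data: user 5 has an order but no user record,
# and user 4 has a record but no orders.
orders = [{"user_id": u, "order_id": i} for i, u in enumerate([1, 2, 2, 3, 5])]
users = [{"user_id": u, "name": f"user{u}"} for u in [1, 2, 3, 4]]

result = colocated_join(
    distribute(orders, "user_id"),
    distribute(users, "user_id"),
    "user_id",
)
```

Each node only ever looks at its own slice of both tables, which is the property you lose when the two tables are distributed on different columns and rows have to be shuffled across the network first.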
Re: Snowflake. Can't speak to it as I haven't had personal experience. I have worked with data people whose opinions fell on both the favorable and negative sides of the spectrum. This just suggests to me that, just like Redshift and BigQuery, Snowflake is not a universal solution. You really need to understand:
1) what your goals are for the usage among varying consumers
2) what skill sets the various users of the database have
Not interested in a job, but would love to get to know your team better. I think there might be some alignment. Anyways, you can check out what I do here: https://www.caura.co
So I hate to disappoint everyone. Such a contract really cannot be judged from the way it was summarized in an email. One paragraph has to be interpreted in the context of everything else.
Moreover, everyone uses this language. It is funny, but I suspect that most of Silicon Valley just recycles the same 3-4 contracts, which individual lawyers modify slightly.
I have now worked with 60+ tech companies (Looker, Gigster, Strava, etc) - with 15 of them I had to look over the verbiage on my own.
With the first couple, I was just as suspicious. But after discussing with lawyers, I learned that there are two major issues:
1) not everything put in the contract is enforceable. In fact, the fact that all lawyers recycle the same contract does not make it any more enforceable. It is simply leverage for bullying that lawyers depend on, should something occur
2) individual paragraphs have to be interpreted in the context of the entire engagement. In other words, did you have access to the Client's data on other projects? If yes, then you bet your IP rights should be waived as they pertain to those projects.
Anything not enforceable shouldn't be in the contract in the first place. The only purpose of contracts is to enforce earlier promises when people don't agree anymore.
Caura Consulting | Data Analyst | Remote Only | Part-Time
Caura Consulting, a sponsoring organization of http://www.innerjoins.org, is looking for a very experienced Analyst to help grow the non-profit community, InnerJoin. Your job is to facilitate conversations, help answer some analytical questions, and prompt new topics for discussions.
Daily availability is required. Hours are flexible.