SQL Server 2005 vs. MySQL for a startup?
9 points by tejus on Feb 16, 2008 | 29 comments
We are wondering whether to use SQL Server 2005 or MySQL as our database. The database will store the following information (for stocks & bonds):

1. Market activity information – price, volume, open, high, low, close for various instruments.

2. Fundamental information – balance sheet line items, income statement line items, news stories, etc.

3. Technical indicator values – commonly used moving averages, RSIs, etc.

Typical use scenario will be that a browser will request the following information:

1. Data for a stock/bond.

2. Data for a certain day/time (across a bunch of stocks).

The predominant concern here is: SPEED. The application should be able to handle requests very quickly and spit out the necessary data to a .NET client.

Do you have any idea what might be the fastest solution in this situation? Can you ask your coders for any suggestions?

I am tempted to go with MySQL since it is free, versus the high licensing costs for SQL Server.

The main reason I am considering SQL Server is because of a notion in my head that if it's a .NET/Windows application, SQL Server may be faster.

Thanks!




PostgreSQL is free, and possibly less encumbered than MySQL in the long run. PostgreSQL is more fully open source, whereas MySQL is owned by Sun Microsystems and is not free for all commercial cases; PostgreSQL is.

MySQL gets its speed reputation from the MyISAM table type. But you don't want that for financial data; you want a stronger database. MySQL has several other table types that are more robust and standards-compliant, but not as fast.

PostgreSQL is plenty fast for large installations when well tuned. It also has good referential integrity, a long enough history that you can trust it, and stored procedures. MySQL is simpler to get started with.

I've never heard that SQL Server was faster, but I'd expect it has very nice integration with .NET, since it's the same company. That nice integration will undoubtedly lock you into using their database forever, though. And who knows how their pricing will change?

Whichever you choose, you will gain if you have at least one person around who has used the DB (and, for MySQL, the table type) before, even if only once in a while.

Paid consultants are available for any database you choose.


I tend to be fairly tech-agnostic. So even though I prefer a Unixy/OSS environment for web apps, I researched MS offerings for a startup idea last summer. I'd say that if you're going with a .NET stack, you're probably better off going with MSFT if you can afford it.

Have you considered an Action Pack subscription? If not, you might want to check into it: https://partner.microsoft.com/Indonesia/program/managemember... You get a +lot+ of MSFT software for around $300. Granted, the license is only for two years, but that should be enough to get a startup off the ground. If you don't have enough money to buy MSFT licenses after 2 years, you have bigger problems than licensing costs.

And as long as you're willing to keep going the MSFT way, there are ways to get cheap bulk subscriptions to MSFT licenses; you just have to keep up with the MSFT certification process. They actually make it pretty easy to start your own software company based on the MSFT stack.

I really can't speak to the speed of SQL Server vs. MySQL, however. It sounds like most of the data in your DB is going to be text-based. I'd be more concerned about speed if you were going to be serving video or a lot of graphics out of your DB.

One option you might consider, before investing a year or two in a DB, is downloading a trial version of each RDBMS and doing some tests.

MSSQL Server 2005 download: http://msdn2.microsoft.com/en-us/bb498185.aspx

MySQL 5 download: http://dev.mysql.com/downloads/mysql/5.0.html#downloads

Your schema sounds pretty simple, so design your schema and run some tests. Find out how quickly your server runs with each DB. I bet if you spent a week or two running tests like that, the time spent would more than pay for itself in the long term. Platform and DB choices tend to lock you in for a long time, and if you hate your platform, those are headaches you'll have for quite a while.
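To make that concrete, here's roughly the kind of throwaway harness I mean, in C# (the connection string, table, and query are made up; run the same loop against each candidate DB through its own connector):

    using System;
    using System.Data.SqlClient;
    using System.Diagnostics;

    class QueryTimer
    {
        static void Main()
        {
            // Made-up connection string and query -- substitute your own schema.
            const string connStr = "Server=localhost;Database=marketdata;Integrated Security=true;";
            const string sql = "SELECT price, volume FROM daily_bars WHERE symbol = 'MSFT'";

            var watch = Stopwatch.StartNew();
            using (var conn = new SqlConnection(connStr))
            using (var cmd = new SqlCommand(sql, conn))
            {
                conn.Open();
                for (int i = 0; i < 1000; i++)              // repeat to smooth out noise
                    using (var reader = cmd.ExecuteReader())
                        while (reader.Read()) { }           // just drain the rows
            }
            watch.Stop();
            Console.WriteLine("1000 runs: {0} ms", watch.ElapsedMilliseconds);
        }
    }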


It sounds like your dataset will fit comfortably in memcached. Run that on top of whatever DBMS you trust and whose SQL dialect you prefer to work with. For me that means Postgres or Oracle, although MySQL 6 looks promising and I might have to drop my grudge against them once it's released. SQL Server 2005 is very buggy and sooner or later it will bite you in terrible ways.

Bear in mind when you take this advice that the DBMSs I dislike the most are also the ones I have the most experience with, so it's possible that Postgres and Oracle are just as bad and I just haven't discovered it yet :-)

Don't store technical indicators in your database; compute them on-the-fly. Memoize them if necessary, but it's probably better not to since I don't think you'll be CPU-bound and the added latency will be negligible.
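In C#, computing something like a moving average per request is just a small loop. A sketch (it assumes the closing-price series is already sitting in memory or your cache):

    using System;

    static class Indicators
    {
        // Compute a simple moving average on request instead of storing it.
        // 'closes' is whatever closing-price series you already have in memory.
        public static double SimpleMovingAverage(double[] closes, int period)
        {
            if (closes.Length < period)
                throw new ArgumentException("not enough data points");

            double sum = 0.0;
            for (int i = closes.Length - period; i < closes.Length; i++)
                sum += closes[i];      // average of the last 'period' closes
            return sum / period;
        }
    }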

If you're dealing with live quotes, don't store them in your database. Just keep them in memcached and re-fetch them if your server goes down. Do all your DB writes at once, at the end of the day after the market closing bell. (Aside from user-specific data, which obviously needs to be committed immediately.)
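A rough sketch of that pattern in C#, with a plain Dictionary standing in for memcached (a real memcached client would slot in where the Dictionary is; the class and method names are made up):

    using System;
    using System.Collections.Generic;

    // Ticks only touch the cache; the database sees one batch write after the close.
    class QuoteCache
    {
        readonly Dictionary<string, double> latest = new Dictionary<string, double>();

        public void OnQuote(string symbol, double price)
        {
            latest[symbol] = price;                  // overwrite in place, no DB write per tick
        }

        public bool TryGet(string symbol, out double price)
        {
            return latest.TryGetValue(symbol, out price);
        }

        public void FlushAtClose(Action<string, double> writeToDb)
        {
            foreach (KeyValuePair<string, double> kv in latest)   // single end-of-day batch
                writeToDb(kv.Key, kv.Value);
        }
    }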


If speed is the overriding concern, see if you can use Berkeley DB, which has ACID transactions and blows away all relational databases in speed. It sounds like your data model may be simple enough for it.

You should also consider a mix of both: you could use BDB as a kind of cache for rapid delivery while, say, the relational database handles the non-time-critical processing and storage.


Start with an open-source DB (you can't go wrong with Postgres or MySQL) -- if you start hitting a wall, look into a database optimized specifically for financial data, like Kx (http://kx.com/).

Do you have any SLA/other constraints on how fast you have to deliver financial data?


I chose SQL Server 2005 for Loopt, but that's because at the time I made the choice MySQL didn't meet all my requirements. I should have looked more into PostgreSQL, but didn't.

In your case, it sounds like MySQL should work just fine. For fast response times, just partition the data across a lot of cheap servers that can keep it all in RAM. This approach lends itself to a free database. That said, speed will be affected much more by what pre-processing you do before inserting the data (to optimize lookups), and how you design your schema than by your database vendor choice.
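A sketch of what the routing side of that partitioning might look like in C# (the server names are obviously made up, and a production setup would want a hash that is stable across processes, unlike string.GetHashCode()):

    // Each symbol hashes to one of N cheap servers that keep their slice in RAM.
    static class Partitioner
    {
        public static string ServerFor(string symbol, string[] servers)
        {
            int hash = symbol.GetHashCode() & 0x7fffffff;   // force non-negative
            return servers[hash % servers.Length];
        }
    }

    // e.g. Partitioner.ServerFor("IBM", new[] { "db1", "db2", "db3" })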

If you go with MySQL or PostgreSQL, be prepared to fix bugs in the ADO connectors from .Net to the database. I've used PostgreSQL with .Net and npgsql needs some work. Perhaps MySQL connectors are better.

I'd start with PostgreSQL or MySQL and only switch to SQL Server 2005 if you run into problems. The switching cost won't be that high if you don't use stored procedures and stick to using ADO.Net generic functionality.
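For example, sticking to the generic ADO.Net base classes looks something like this in C# (the provider invariant names are whatever your connectors register in machine.config/app.config; treat the ones in the comment as examples):

    using System.Data.Common;

    static class Db
    {
        // e.g. "System.Data.SqlClient", "MySql.Data.MySqlClient", "Npgsql"
        public static DbConnection Open(string providerName, string connStr)
        {
            DbProviderFactory factory = DbProviderFactories.GetFactory(providerName);
            DbConnection conn = factory.CreateConnection();
            conn.ConnectionString = connStr;
            conn.Open();
            return conn;
        }
    }

    // Callers work only against DbConnection/DbCommand/DbDataReader, so switching
    // databases later is a config change, not a code change.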

Microsoft has been a great partner and their licensing model hasn't caused us any problems, so don't be afraid to go with them if they're the right choice. You'll take some heat for it though ;)


> If you go with MySQL or PostgreSQL, be prepared to fix bugs in the ADO connectors from .Net to the database.

Likewise for SQL Server and ODBC, but then the bugs are in SQL Server so you can't fix them. However, .NET has a perfectly good ODBC client library. Use that, not ADO, to talk to MySQL.
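Something like this, using the built-in System.Data.Odbc classes (the driver name and connection details are placeholders for whatever your MySQL ODBC setup uses):

    using System;
    using System.Data.Odbc;

    class OdbcToMySql
    {
        static void Main()
        {
            // Driver name should match whatever your MySQL ODBC driver registers
            // as (3.51 was the common one around then); the rest is made up.
            string connStr = "Driver={MySQL ODBC 3.51 Driver};" +
                             "Server=localhost;Database=marketdata;User=app;Password=secret;";

            using (OdbcConnection conn = new OdbcConnection(connStr))
            using (OdbcCommand cmd = new OdbcCommand("SELECT symbol, price FROM quotes", conn))
            {
                conn.Open();
                using (OdbcDataReader reader = cmd.ExecuteReader())
                    while (reader.Read())
                        Console.WriteLine("{0}: {1}", reader.GetString(0), reader.GetDouble(1));
            }
        }
    }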


If you find a legitimate bug in SQL Server, you can open a support ticket, and Microsoft will fix it. Frequently these fixes will make it into the next service pack, or if they're severe, a Windows update. This is all at no (extra) cost to you.

In speed tests I've seen, ODBC was as much as 10x slower than npgsql at talking to PostgreSQL. I didn't look into why, and just used ADO.Net.


I don't think I ever encountered a bug for which I'd be able to write a coherent support ticket. They tended to be horrendous mandelbugs that were difficult to reliably reproduce due to caching. If it were an open source project then I could have at least set some watchpoints and sent a useful backtrace, but with closed source, SOL.


Oracle has a native driver for .NET.

Google for ODP.NET. If it's anything like as good as their pure Java JDBC drivers, it will perform pretty well.


Before I get raked over the coals:

npgsql is great. It is up to 10x faster than the PostgreSQL ODBC driver for our workload. It has a few tiny issues that are irritating but not big problems, and I plan to fix them and submit patches when I have time.


The raw performance of SQL Server and MySQL is pretty much a wash. If you base your decision primarily on this, you are overlooking quite a few more important issues. Hardware (RAID level, RAM, CPU), DB driver, programming language, and caching all matter more.

Non-speed issues:

1. SQL Server 2000 had locking problems such that many transactions would deadlock and automatically get rolled back. This was due to page locking. I think MS fixed this in 2005 with record-level locking.

2. SQL Server has very nice stored procedure support. This can be extremely useful. If you think you may need this capability, then SQL Server is the better choice (a small example of calling one from ADO.NET follows).
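For reference, calling a stored procedure from ADO.NET is only a few lines of C# (the procedure name and parameter are made up for illustration):

    using System.Data;
    using System.Data.SqlClient;

    static class Procs
    {
        public static DataTable GetDailyBars(SqlConnection conn, string symbol)
        {
            using (SqlCommand cmd = new SqlCommand("usp_GetDailyBars", conn))
            {
                cmd.CommandType = CommandType.StoredProcedure;
                cmd.Parameters.AddWithValue("@symbol", symbol);

                DataTable table = new DataTable();
                new SqlDataAdapter(cmd).Fill(table);   // runs the proc, loads the result set
                return table;
            }
        }
    }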

If you do decide to go with SQL Server, you will probably end up having to get a copy of VS 2008 and do some C# programming. It's almost impossible to maintain SQL Server without it, unless you want to maintain the DB by hand every day.

It doesn't sound like you have much db experience. Get a couple of experienced db admins in to talk with you before you make any big decisions, even if it's just a two hour consult.


Skip both of them.

Just use memory, with a (very frequent) periodic write of the entire dataset to a file named after the time you saved it: write/read a block of memory (a big array) to/from disk. What's better is that modifying or accessing this data is just normal, straightforward memory access - very likely less work than writing SQL queries and very likely easier to unit test. So this should not only be faster, it should also take you less work. (A rough sketch of the write/read part is at the end of this comment.)

You probably only care about ~10,000 US stocks & bonds, maybe 50 numbers per, all of which are represented well by a 64-bit double (most probably fine in a 32-bit float... yes even US bond prices - they trade in 32nds - binary works fine). 10,000 instruments x 50 numbers x 8 bytes = 4M bytes. All 3 dimensions could increase by an order of magnitude and the answer would probably remain the same or very similar.

You mentioned two requests, and a primary goal of SPEED.

1. Data for a stock/bond - it's in memory. Heck, there's a good chance you can fit the entire current dataset in your CPU cache. You just can't beat that (well you don't need to - put down that VHDL tutorial).

2. Data for a certain day/time (across a bunch of stocks): just read the file off disk for that day/time. Even if this isn't immediately up to your needs, normal optimizations should buy you tons of performance here - buy RAM and disk mirrors. The data should be contiguous on disk, so you should have very minimal seeking (which is normally the speed killer). You could come up with something that beats this by pulling out fewer than all the stocks at once, but your complexity would skyrocket. This should at least beat most setups with MySQL or SQL Server 2005 without a metric ton of tuning.

I'm guessing you forgot to mention that you need a time-series for price and volume information (if only for a graph to throw up alongside some boring numbers). Consider just handling that separately, especially as all the other numbers you're interested in don't change anywhere close to as often (and those that do are based off the price/volume and can be trivially calculated on the fly). Some headaches to watch out for though: if you try opening 10,000 file handles at the same time, something's likely to get mad at you. Also note that if you're thinking of opening and closing 10,000 files a lot, you're likely going to be sending the disk seeking around like crazy updating meta-data on disk.
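Here's the rough shape of that write/read idea in C# (the flat array layout - instruments x fields, 8-byte doubles - and the directory are invented; the point is one contiguous write and one contiguous read per snapshot):

    using System;
    using System.IO;

    class SnapshotStore
    {
        public static void Save(double[] snapshot, DateTime asOf, string dir)
        {
            string path = Path.Combine(dir, asOf.ToString("yyyyMMdd-HHmmss") + ".bin");
            using (BinaryWriter writer = new BinaryWriter(File.Create(path)))
                foreach (double value in snapshot)
                    writer.Write(value);                        // doubles laid out back to back
        }

        public static double[] Load(string path)
        {
            byte[] raw = File.ReadAllBytes(path);
            double[] snapshot = new double[raw.Length / 8];     // 8 bytes per double
            Buffer.BlockCopy(raw, 0, snapshot, 0, raw.Length);  // one block copy back into memory
            return snapshot;
        }
    }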


Speed, I have found, depends more on the SQL query you are using and, once you have the data, the method you use to read and display it. If you have highly optimized queries and you are using arrays or GetString (not sure what they call those in .NET), then you will be in good shape.
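For what it's worth, in .NET that means reading straight off the data reader rather than loading a heavier DataSet. A sketch (query and columns made up):

    using System;
    using System.Data.SqlClient;

    static class ReaderExample
    {
        public static void DumpQuotes(SqlConnection conn)
        {
            using (SqlCommand cmd = new SqlCommand("SELECT symbol, price FROM quotes", conn))
            using (SqlDataReader reader = cmd.ExecuteReader())
            {
                object[] row = new object[reader.FieldCount];
                while (reader.Read())
                {
                    reader.GetValues(row);                   // whole row into an array
                    Console.WriteLine("{0}: {1}", reader.GetString(0), row[1]);
                }
            }
        }
    }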

For $42 from Amazon you can get the SQL Server 2005 Developer edition, which is the "enterprise" version licensed just for development, to get you started: http://www.amazon.com/Microsoft-SQL-Server-Developer-2005/dp...

Personally, I am sticking with MSSQL 2000 because I know and already own it. You can get that on eBay for about $800.

I hate that I tend to get out of date on Microsoft stuff; I stick with what I have and don't experiment once something is working the way I want it. I did buy the MSSQL 2005 Developer edition because the price is right; one day I will try it out.


I'd go with MySQL. There are better tools, it's less expensive, and it's easier to scale. I don't have any straight-up "speed" evidence beyond the anecdotal, though.

If you're worried about the MySQL license, you should try PostgreSQL. I've been using it for three years and have had nothing but extremely positive results. Version 8.3 is stupidly fast.


> 1. Data for a stock/bond 2. Data for a certain day/time (across a bunch of stocks). The predominant concern here is: SPEED.

I'm a big fan of PostgreSQL but based on your description you might want to look into one of the existing kdb solutions from www.kx.com. Those guys have been solving stuff in that ___domain for the last two decades.


Go with what you have experience in. If you build it and it is not fast enough, then throw caching at it. There probably won't be a huge difference between most databases, if used effectively.

If you have special requirements, such as true isolation in transactions, be aware that this is "leading edge" for SQL Server 2005, and probably for MySQL. On the other hand, Oracle has been doing it for years.

If the data is being streamed in, and timestamped, and stored, you may want to just hold it in memory (32 Gigs holds a lot of data) or write to flat files. For simple data types, that are used in a predictable way, not much beats well-written custom code.


Have you thought of trying Amazon's SimpleDB? I just got my beta login and it seems pretty awesome so far.

What about CouchDB? It's still young, but it can be clustered and will probably way outperform any relational DB for the stuff you're describing.

Last time I did benchmarks (a few years ago), MSSQL totally blew away MySQL and Postgres on Windows. It turns out this was because I was using a "dependent subquery", which in my opinion is a much simpler way to write a lot of queries.

Now that I use MySQL (and Linux and Solaris) exclusively, I don't use dependent subqueries at all, and all is well.


I would go for SQL Server 2005/8 in a .NET environment.

I have never had any speed problems. Also, if you are going to use the ASP.NET membership stuff, it makes sense to go with it. There is a MySQL membership provider, but it's not great from my playing around with it.

Also, the cost is not going to be that much, because you can do an awful lot of processing with one MS database server.

I am sure with some googling you can find some benchmarking of all the major DBs.

I use discountasp.net for my .NET apps as a starting point.

One last thing: finding someone to work on your MS SQL database might be easier.


The financial industry does not use databases for this kind of thing much - for keeping records and back-office trade post-processing, yes, but not for market data. Try Wikipedia for Event Stream Processing, Complex Event Processing, TimesTen (now inside Oracle), Streambase... all very expensive. And if you don't need this kind of speed, then probably all of your listed databases are fine; none has a significant (10x) lead over the others. Work with what matches your skill set.


The actual design and implementation of your data model and application are going to matter far more for performance. My suggestion would be to start on MySQL and migrate to another platform if it becomes necessary. You also have to ask what kind of skill sets you have around and what you are likely to have in the future. A well-configured and well-maintained installation of either platform will outperform a bad one. So: are you Windows geeks or Linux geeks?


MySQL Server is cheaper and faster. However, MySQL Server is technically only free if you use it with GPL2 applications or never distribute your app.


I think the key is the meaning of the word "distribute." I believe it is generally accepted that using MySQL in a web based application provided as a service doesn't constitute distribution (though there are some questions about situations when the app is hosted on servers operated by a 3rd party).


I was suspicious, but I read MySQL's page on licensing and that seems to be true. I'm quite shocked about it, I thought you were free to use MySQL in non-open products without having to buy their commercial license.


You can use it all you want; just don't distribute it. E.g., you are prohibited from taking it, making tweaks, and releasing your own DB software...


I would argue that, assuming it's stock trading, your concern should be transaction safety, so you need a database with ACID support. Beyond that, which one you choose is based on your experience.

Speed is really not a concern anymore, because you can use memcached or another distributed caching system when you really need that much scalability and fast access to raw data.


MySQL on Linux is better than its Windows version, and the Windows version, I believe, has a licensing cost. I guess they call it mysql-max and mysql-nt, though maybe that was the older versions.

If you are going with .NET, then your server, I guess, is a Windows server, so go with MS SQL.

I would not go with .NET unless other things depended on it and there was a gun pointed at my head :)


Neither, pgsql


Do you have any evidence that the database is going to be your bottleneck, at this point in the game?



