Rockset’s primary use-cases are:
1/ developers building low-latency operational applications, especially ones combining real-time data sets with other structured data sets (e.g. you are building a microservice to relieve pressure from your OLTP system)
2/ data scientists wanting to quickly test hypotheses on different structured and semi-structured datasets without having to stand up any servers or do any ETL or data prep. (You can suspend collections/documents in Rockset when you aren't using them -- our pricing page currently lists only Active Documents pricing.)
Rockset is mutable, which allows it to keep itself in sync with any data source, unlike columnar data warehouses, which are not optimized for data manipulation.
Rockset’s strong dynamic typing allows it to treat JSON as a data representation format rather than a special data type or a storage format. So, once you load JSON data into Rockset, you can access all fields at all levels without any special JSON operators or functions.
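The idea -- that once JSON is loaded, fields are addressable at any depth without JSON-specific operators -- can be sketched with a toy dotted-path resolver. This is plain Python, not Rockset's implementation, and the document and field names are made up for illustration:

```python
def resolve(doc, dotted_path):
    """Walk a nested dict/list structure by a dotted path,
    the way SQL dot notation addresses nested JSON fields."""
    node = doc
    for part in dotted_path.split("."):
        if isinstance(node, list):
            node = node[int(part)]  # numeric segments index into arrays
        else:
            node = node[part]       # string segments index into objects
    return node

# A JSON document as it might be ingested.
doc = {
    "user": {"name": "ada", "address": {"city": "San Mateo"}},
    "tags": ["realtime", "sql"],
}

print(resolve(doc, "user.address.city"))  # San Mateo
print(resolve(doc, "tags.0"))             # realtime
```

In SQL with dot notation, the same access would read like `SELECT user.address.city FROM my_collection` rather than requiring a JSON-extraction function on a special JSON column type.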
Comparing Snowflake with Rockset is perhaps akin to comparing Teradata with Elasticsearch. Both are useful systems, but they were built for very different use cases.
The biggest thing Rockset has in common with Snowflake is the philosophy that data management systems have to be built from the ground up for the cloud to take full advantage of cloud economics. Our blog (https://rockset.com/blog/) has a few posts on this already, and we will write more.
The SQL with dot notation is nice, and something other databases have struggled to do with JSON. As for updates, it looks like _id is set automatically, and there's no UPDATE endpoint listed in the docs, so how is data mutated (by key)?
And my comment was more about the price, because at a low volume of 10M records, any database system can already do fast queries. From the whitepaper, it looks like the indexing is intensive and that's where most of the cost is coming from, which explains the active/passive storage and the tiered rocksdb-cloud setup.
Interesting use case for prototyping, but I don't see how it is cost-effective for higher-scale usage. Congrats on the launch though.
The add_docs() API always upserts, so yes, updates go through "_id". The system auto-assigns an "_id" only when one is not supplied by the user and no existing field is mapped as the "_id" field at collection creation time. If you want replace-document behavior, you will have to call delete_docs() before add_docs().
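A toy model of these semantics, with a plain Python dict standing in for a collection. The add_docs/delete_docs functions here are local sketches of the behavior described above, not the real Rockset client, and they assume the upsert patches supplied fields into an existing document with the same "_id" (which is why delete-then-add is needed for full replacement):

```python
import uuid

collection = {}  # _id -> document

def add_docs(docs):
    """Upsert: merge supplied fields into any existing doc with the same _id.
    Auto-assign an _id only when the caller did not supply one."""
    for doc in docs:
        _id = doc.setdefault("_id", str(uuid.uuid4()))
        collection.setdefault(_id, {}).update(doc)

def delete_docs(ids):
    for _id in ids:
        collection.pop(_id, None)

add_docs([{"_id": "u1", "name": "ada", "city": "SF"}])
add_docs([{"_id": "u1", "city": "San Mateo"}])  # upsert: patches city, keeps name
print(collection["u1"])  # {'_id': 'u1', 'name': 'ada', 'city': 'San Mateo'}

delete_docs(["u1"])                             # replace-document behavior:
add_docs([{"_id": "u1", "city": "San Mateo"}])  # delete first, then add
print("name" in collection["u1"])  # False -- old fields are gone
```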
Our backend architecture is quite scalable and actually grows and shrinks with the demand continuously.
And yes, all documents are automatically indexed and replicated for fast query performance, which is more expensive than just storing them in "_id"->"doc" format. For our use cases and value prop, this one-time indexing cost pays for itself several times over by saving time during query processing.
Rockset | Senior Software Engineer, Lead Front-end Engineer | San Mateo, CA | Onsite | Full time
At Rockset we are building the next generation of cloud-native data infrastructure. Our team includes founding members of RocksDB, Hadoop Distributed File System, Facebook's search engine (Unicorn) and social graph serving engine (TAO). We are backed by Greylock Partners and Sequoia Capital. We are building our infrastructure on top of Kubernetes on AWS, and are using systems like RocksDB, Kafka, Zookeeper, gRPC and Terraform. Most of our codebase is in C++ and Java.
Rockset | Senior Software Engineer | San Mateo, CA | Onsite | Full time
At Rockset we are building the next generation of cloud-native data infrastructure. Our team includes founding members of RocksDB, Hadoop Distributed File System, Facebook's search engine (Unicorn) and social graph serving engine (TAO). We are backed by Greylock Partners and Sequoia Capital.
We are building our infrastructure on top of Kubernetes on AWS, and are using systems like RocksDB, Kafka, Zookeeper, gRPC and Terraform. Most of our codebase is in C++ and Java.
Rockset | Senior Infrastructure Engineer, Lead Frontend Engineer, Software Engineer | San Mateo, CA | Onsite | Full time
At Rockset we are building the next generation of cloud-native data infrastructure. Our team includes founding members of RocksDB, Hadoop Distributed File System, Facebook's search engine (Unicorn) and social graph serving engine (TAO). We are backed by Greylock Partners and Sequoia Capital.
We are building our infrastructure on top of Kubernetes on AWS, and are using systems like RocksDB, Kafka, Zookeeper, gRPC and Terraform. Most of our codebase is in C++ and Java.