Wednesday, June 19, 2013

2 reasons why we selected SimpleDB instead of DynamoDB

If you search Google for "SimpleDB vs DynamoDB", you will find a lot of helpful posts. Most of them give you 3 to 7 reasons to choose DynamoDB. Today, however, I'll share our experience of choosing SimpleDB instead of DynamoDB.

I ran into some issues when using DynamoDB in production, and finally found that SimpleDB fits my case perfectly. I think the choice between SimpleDB and DynamoDB should NOT rest on the performance or general benefits of either service, but on their limitations and the real requirements of your product.

Some background: I have some data previously stored in MongoDB, and the amount of data will most likely not exceed 2 GB in SimpleDB. We decided to stop maintaining our own MongoDB database servers and to use AWS SimpleDB or DynamoDB instead, to reduce the cost of ops.

Both SimpleDB and DynamoDB are key/value databases. There are some workarounds for storing a JSON document, but they introduce additional cost. The data structure in my MongoDB is not too complicated and can be converted to key/value pairs. So, before you choose SimpleDB or DynamoDB as your database backend, you must understand this fundamental point.
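To give an idea of what "convert to key-value pairs" can look like, here is a minimal sketch that flattens a nested document into dotted attribute names. The class name DocumentFlattener, the dotted naming scheme, and the sample book data are all made up for illustration; this is not the conversion code from our product. Values are stored as strings, which is how SimpleDB treats attribute values.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: flattens a nested document into flat attribute/value pairs.
final class DocumentFlattener {

    // Returns a flat map such as {"author.name" -> "Alice"} for a nested input.
    static Map<String, String> flatten(Map<String, Object> document) {
        Map<String, String> flat = new LinkedHashMap<>();
        flattenInto("", document, flat);
        return flat;
    }

    private static void flattenInto(String prefix, Map<String, Object> node, Map<String, String> out) {
        for (Map.Entry<String, Object> entry : node.entrySet()) {
            // Nested keys are joined with a dot (an arbitrary choice for this sketch).
            String key = prefix.isEmpty() ? entry.getKey() : prefix + "." + entry.getKey();
            Object value = entry.getValue();
            if (value instanceof Map) {
                @SuppressWarnings("unchecked")
                Map<String, Object> child = (Map<String, Object>) value;
                flattenInto(key, child, out);
            } else {
                // Every leaf value becomes a string.
                out.put(key, String.valueOf(value));
            }
        }
    }

    public static void main(String[] args) {
        // Made-up sample document.
        Map<String, Object> author = new LinkedHashMap<>();
        author.put("name", "Alice");
        author.put("country", "US");
        Map<String, Object> book = new LinkedHashMap<>();
        book.put("id", "b-1");
        book.put("count", 70569);
        book.put("author", author);
        System.out.println(flatten(book));
    }
}
```

Arrays and very deep nesting are not handled here; that is the kind of "additional cost" a real conversion would have to deal with.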

Reason 1: Not flexible on indexing. With DynamoDB you have to define the indexed fields before creating the table, and they cannot be modified afterwards. This really limits future changes.

DynamoDB supports 2 modes of data lookup, "Query" and "Scan". "Query" is based on the hash key and secondary keys and gives high performance. However, when you query data, the hash key must be set. For example, suppose "id" is our hash key. When we query by "id", all is well and we get the best performance. But when we query only by a field "name", we have to switch to "Scan" because the hash key is not used.

The performance of "Scan" is totally unacceptable because AWS scans every record. I created a sample DynamoDB table with 100,000 records, each with 6 fields. With "Scan", it takes 2 to 6 minutes to select ONE record by adding a condition on one field. Here is the testing code in Java:

// Assumes the AWS SDK for Java DynamoDB classes and java.util.Date are imported,
// "mapper" is a configured DynamoDBMapper, and Book is a mapped class.
DynamoDBScanExpression scan = new DynamoDBScanExpression();

// Filter on a non-key attribute: count == 70569.
scan.addFilterCondition("count", new Condition()
        .withAttributeValueList(new AttributeValue().withN("70569"))
        .withComparisonOperator(ComparisonOperator.EQ));

System.out.println("1=> " + new Date());
PaginatedScanList<Book> list = mapper.scan(Book.class, scan);
System.out.println("2=> " + new Date());

// toArray() forces every page of the scan result to be loaded.
Object[] all = list.toArray();
System.out.println(all.length); // should be 1
System.out.println("3=> " + new Date()); // 2 to 6 minutes after the "2=>" timestamp, in most cases around 2 minutes
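Why the two lookup modes differ so much can be pictured with a plain in-memory analogy. The sketch below is NOT DynamoDB code and says nothing about how AWS implements things internally; the class QueryVersusScan and its sample data are invented. It only shows the shape of the problem: a lookup by key goes straight to the record, while a filter on a non-key field has to visit every record.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// In-memory analogy only; none of this is DynamoDB API.
final class QueryVersusScan {

    static final class Book {
        final String id;
        final int count;
        Book(String id, int count) { this.id = id; this.count = count; }
    }

    // "Query": direct lookup through the hash key.
    static Book queryById(Map<String, Book> table, String id) {
        return table.get(id);
    }

    // "Scan": every record is visited and the filter is applied to each one.
    static List<Book> scanByCount(Map<String, Book> table, int count) {
        List<Book> matches = new ArrayList<>();
        for (Book book : table.values()) {
            if (book.count == count) {
                matches.add(book);
            }
        }
        return matches;
    }

    // Builds a made-up table where record i has id "id-i" and count i.
    static Map<String, Book> sampleTable(int size) {
        Map<String, Book> table = new HashMap<>();
        for (int i = 0; i < size; i++) {
            table.put("id-" + i, new Book("id-" + i, i));
        }
        return table;
    }

    public static void main(String[] args) {
        Map<String, Book> table = sampleTable(100_000);
        System.out.println(queryById(table, "id-70569").count);
        System.out.println(scanByCount(table, 70569).size());
    }
}
```

Both calls find the same single record, but the second one walks all 100,000 entries to get there.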

SimpleDB does not have this limitation. SimpleDB creates an index for EVERY field in a table (AWS actually uses the term "domain", and MongoDB uses "collection"). I modified the code a little and tested it on SimpleDB. Here are the results:

  • Query 500 items with no condition (using "limit" to get the first 500 items in a "select" call): about 400 ms to complete. The sample application was running on my local machine. If it were running on EC2, it should be within 100 ms.
  • Query 500 items with 1 condition: also about 400 ms to complete.
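For reference, the "select" calls in these tests take an SQL-like expression string. Below is a small sketch of how such an expression could be assembled. The helper class SelectExpressionBuilder and the domain name "books" are hypothetical, not the exact code I ran; the quoting follows the SimpleDB rules as I understand them (names in backticks, values in single quotes, a quote character escaped by doubling it).

```java
// Hypothetical helper for building a SimpleDB select expression string.
final class SelectExpressionBuilder {

    // Domain and attribute names are wrapped in backticks; a backtick is escaped by doubling it.
    static String quoteName(String name) {
        return "`" + name.replace("`", "``") + "`";
    }

    // Values are wrapped in single quotes; a single quote is escaped by doubling it.
    static String quoteValue(String value) {
        return "'" + value.replace("'", "''") + "'";
    }

    // select * from `domain` where `attribute` = 'value' limit N
    static String selectWhereEquals(String domain, String attribute, String value, int limit) {
        return "select * from " + quoteName(domain)
                + " where " + quoteName(attribute) + " = " + quoteValue(value)
                + " limit " + limit;
    }

    public static void main(String[] args) {
        // prints: select * from `books` where `count` = '70569' limit 500
        System.out.println(selectWhereEquals("books", "count", "70569", 500));
    }
}
```

Keep in mind that SimpleDB compares values as strings, so range conditions on numbers usually need zero-padded values.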
Reason 2: Not cost effective for our case. DynamoDB charges by provisioned capacity of reads/writes per second. Please note that the capacity counts the records you read or write, not the API calls, no matter whether you use batch operations or not. Here are more details from my test. I used the batch API to send 1000 records of more than 1000 bytes each. It takes 50 seconds to finish the batch when the write capacity is set to 20 writes/second. While keeping my application running, I changed the capacity in the AWS console to 80 writes/second; one batch then takes 12 to 25 seconds to complete (ideally it should be 1000/80 = 12.5 seconds; the extra time comes from network latency, because I'm sending more than 1 megabyte of data per API call).
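The arithmetic behind these numbers can be written down as a tiny sketch. It assumes that one write capacity unit covers one write per second of an item up to 1 KB (my reading of the DynamoDB documentation) and that each test record fits into a single unit, which matches the 50-second observation. The class and method names are made up, and network latency is ignored.

```java
// Back-of-the-envelope estimate; ignores network latency and throttling behaviour.
final class WriteCapacityEstimate {

    // Assumption: items are billed in 1 KB steps, rounded up, with a minimum of one unit.
    static long writeUnitsPerItem(long itemBytes) {
        return Math.max(1, (itemBytes + 1023) / 1024);
    }

    // Ideal time to push all items through the provisioned write capacity.
    static double secondsToWrite(long items, long itemBytes, long provisionedWritesPerSecond) {
        return (double) (items * writeUnitsPerItem(itemBytes)) / provisionedWritesPerSecond;
    }

    public static void main(String[] args) {
        System.out.println(secondsToWrite(1000, 1000, 20)); // 50.0
        System.out.println(secondsToWrite(1000, 1000, 80)); // 12.5
    }
}
```

If the records were larger than 1 KB, each would consume more than one unit and the same batch would take proportionally longer.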

In our case, we may read 500 records into memory in one go, and then read nothing for the next 10 minutes. With SimpleDB we can complete the read in 500 milliseconds. To match that with DynamoDB we have to set the read capacity to 1000 reads/second, which costs $94.46 per month (via the AWS Simple Monthly Calculator). With SimpleDB, it may cost less than 1 dollar.
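Here is where the 1000 reads/second figure comes from, and how little of that capacity our access pattern would actually use. This is only a rough sketch: it assumes each record consumes one read unit, and the class ReadBurstEstimate is made up for illustration.

```java
// Rough estimate of peak capacity versus actual usage for a bursty read pattern.
final class ReadBurstEstimate {

    // Capacity needed to read `records` items within `burstSeconds`.
    static double requiredReadsPerSecond(int records, double burstSeconds) {
        return records / burstSeconds;
    }

    // Fraction of the provisioned capacity used over a whole period.
    static double utilization(int records, double periodSeconds, double provisionedReadsPerSecond) {
        return records / (periodSeconds * provisionedReadsPerSecond);
    }

    public static void main(String[] args) {
        // 500 records in 0.5 seconds, once every 10 minutes (600 seconds).
        double peak = requiredReadsPerSecond(500, 0.5);
        System.out.println(peak); // 1000.0
        System.out.printf("%.3f%%%n", 100 * utilization(500, 600, peak));
    }
}
```

We would be paying for 1000 reads/second around the clock while using well under 1% of it, which is why the pay-per-use pricing of SimpleDB suits us better.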

Conclusion: DynamoDB is really designed to be a high performance database. SimpleDB has more flexibility. What I mean by "really designed for high performance" is that, if you choose DynamoDB, you must make sure your architecture is well designed for high traffic dynamic content. If you have designed your architecture for high traffic dynamic content and high performance, DynamoDB may match your requirements perfectly. In our case, SimpleDB is enough: it has excellent flexibility and is cost effective. Before looking for comparisons of SimpleDB and DynamoDB, design your architecture first. DynamoDB is good, but it is not a fit for everyone.

2 comments:

  1. Nice to see a post like this.

    Reason 1: Not flexible on indexing.

    I totally agree that DynamoDB is not flexible here. You cannot add or delete indexes. For your use case, it seems like you could make "id" the hash key and "name" the range key, maybe adding some other columns as local secondary indexes as well. But anyway, if your usage changes again you would need to nuke your table and re-create the table and schema - yuck!

    Reason 2: Not cost effective for our case.
    Hmm....this sucks too. One possible thing: have you thought about dialing up read capacity in the morning and dialing it down every night?



  2. I also agree with these two points. I am adding three more points to the comparison -

    1. Amazon SimpleDB offers simplicity and flexibility whereas Amazon DynamoDB offers good performance and incremental scalability.
    2. Amazon SimpleDB pricing is based on your actual box usage whereas DynamoDB is priced according to how much request capacity you have requested.
    3. Amazon SimpleDB can be useful for those who need a non-relational database for storage of smaller, non-structural data whereas Amazon DynamoDB can be useful for those who need a fast, highly scalable non-relational database.
