Wednesday, June 19, 2013

2 reasons why we selected SimpleDB instead of DynamoDB

If you search Google for "SimpleDB vs DynamoDB", you will find a lot of helpful posts. Most of them give you 3 to 7 reasons to choose DynamoDB. Today, however, I'll share our experience of choosing SimpleDB instead of DynamoDB.

I ran into some issues when using DynamoDB in production, and eventually found that SimpleDB fits my case perfectly. I think the choice between SimpleDB and DynamoDB should NOT rest on the performance or general benefits of either service, but on their limitations and the real requirements of your product.

Some background: I have some data that was previously stored in MongoDB; it will most likely not exceed 2 GB in SimpleDB. We decided to stop maintaining our own MongoDB servers and to use AWS SimpleDB or DynamoDB instead, to reduce our ops cost.

Both SimpleDB and DynamoDB are key/value databases. There are workarounds for storing a JSON document, but they introduce additional cost. The data structure in my MongoDB is not too complicated and can be converted to key/value pairs. Before you choose SimpleDB or DynamoDB as your database backend, you must understand this fundamental point.

Reason 1: Not flexible on indexing. With DynamoDB you have to define the indexed fields before creating the table, and they cannot be modified afterwards. This really limits future changes.

DynamoDB supports 2 modes of data lookup, "Query" and "Scan". "Query" is based on the hash key and secondary keys, and gives high performance. However, when you query data, the "hash" key must be set. For example, suppose we have the "id" key as the hash key. When we query by "id", all is well and we get the best performance. But when we query only by a field "name", we have to fall back to "Scan" because the hash key is not used.

The performance of "Scan" was not acceptable for us, because AWS scans every record. I created a sample DynamoDB table with 100,000 records, each with 6 fields. With "Scan", it takes 2 to 6 minutes to select ONE record with a condition on one field. Here is the testing code in Java:

// "mapper" is a DynamoDBMapper and "Book" is the class mapped to the test table.
DynamoDBScanExpression scan = new DynamoDBScanExpression();
// Filter on the non-key attribute "count", so the hash key cannot be used.
scan.addFilterCondition("count", new Condition()
        .withAttributeValueList(new AttributeValue().withN("70569"))
        .withComparisonOperator(ComparisonOperator.EQ));
System.out.println("1=> " + new Date());
PaginatedScanList<Book> list = mapper.scan(Book.class, scan);
System.out.println("2=> " + new Date());
// toArray() forces every page of the scan result to be loaded.
Object[] all = list.toArray();
System.out.println(all.length); // should be 1
System.out.println("3=> " + new Date()); // 2 ~ 6 minutes after the date printed at "2=>", in most cases around 2 minutes
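To make the difference between the two lookup modes concrete, here is a minimal in-memory sketch. It does not call DynamoDB at all: the class `QueryVsScanSketch`, its `Book` type and the `book-N` ids are hypothetical stand-ins, assuming a table whose only index is the hash key "id". It only counts how many records each kind of lookup has to examine.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// In-memory model only: this is NOT the DynamoDB API. It mimics a table whose
// single index is the hash key "id", to show why a lookup on any other field
// has to touch every record.
class QueryVsScanSketch {

    // Hypothetical record type with two of the six fields from the test table.
    static final class Book {
        final String id;
        final int count;

        Book(String id, int count) {
            this.id = id;
            this.count = count;
        }
    }

    private final Map<String, Book> byHashKey = new HashMap<>();
    long recordsExamined = 0;

    void put(Book book) {
        byHashKey.put(book.id, book);
    }

    // "Query": the hash key is supplied, so one index lookup is enough.
    Book queryById(String id) {
        recordsExamined++;
        return byHashKey.get(id);
    }

    // "Scan": no hash key, so the filter is evaluated against every record.
    List<Book> scanByCount(int count) {
        List<Book> matches = new ArrayList<>();
        for (Book book : byHashKey.values()) {
            recordsExamined++;
            if (book.count == count) {
                matches.add(book);
            }
        }
        return matches;
    }

    // Builds a table of `size` records with ids "book-0" .. "book-(size-1)".
    static QueryVsScanSketch sampleTable(int size) {
        QueryVsScanSketch table = new QueryVsScanSketch();
        for (int i = 0; i < size; i++) {
            table.put(new Book("book-" + i, i));
        }
        return table;
    }

    public static void main(String[] args) {
        QueryVsScanSketch table = sampleTable(100_000);

        table.recordsExamined = 0;
        table.queryById("book-70569");
        System.out.println("query examined " + table.recordsExamined + " record(s)");

        table.recordsExamined = 0;
        List<Book> found = table.scanByCount(70569);
        System.out.println("scan examined " + table.recordsExamined
                + " record(s) to return " + found.size());
    }
}
```

The real service adds network round trips and throughput limits on top of this, so the sketch shows only where the extra work comes from, not the actual timings I measured.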

SimpleDB does not have this limitation. SimpleDB creates an index for EVERY field in a table (AWS actually uses the term "domain", and MongoDB uses "collection"). I modified the code slightly and tested it against SimpleDB. Here are the results:

  • Query 500 items with no condition (using "limit" to get the first 500 items in one "select" call): about 400 ms to complete. The sample application was running on my local machine. If it were running on EC2, I would expect it to finish within 100 ms.
  • Query 500 items with 1 condition: also about 400 ms to complete.
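For reference, the two "select" calls measured above can be expressed as SimpleDB select expressions. The sketch below only builds the expression strings and does not call AWS. The helper class `SimpleDbSelectSketch` and the domain name `books` are assumptions for illustration, and the quoting follows the SimpleDB rules as I understand them (backticks around names, single quotes around values, with embedded quote characters doubled).

```java
// Builds SimpleDB select expressions as plain strings. The class name and the
// "books" domain are hypothetical; no AWS call is made here.
class SimpleDbSelectSketch {

    // Domain and attribute names are quoted with backticks; an embedded
    // backtick is escaped by doubling it.
    static String quoteName(String name) {
        return "`" + name.replace("`", "``") + "`";
    }

    // Values are quoted with single quotes; an embedded single quote is
    // escaped by doubling it.
    static String quoteValue(String value) {
        return "'" + value.replace("'", "''") + "'";
    }

    // First `limit` items of a domain, with no condition.
    static String selectAll(String domain, int limit) {
        return "select * from " + quoteName(domain) + " limit " + limit;
    }

    // First `limit` items where one attribute equals a value. SimpleDB stores
    // every value as a string, so the number is passed as text.
    static String selectWhereEquals(String domain, String attribute,
                                    String value, int limit) {
        return "select * from " + quoteName(domain)
                + " where " + quoteName(attribute) + " = " + quoteValue(value)
                + " limit " + limit;
    }

    public static void main(String[] args) {
        System.out.println(selectAll("books", 500));
        System.out.println(selectWhereEquals("books", "count", "70569", 500));
    }
}
```

Because every attribute is indexed, the second expression does not need any key to be declared up front, which is the flexibility I was missing in DynamoDB.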

Reason 2: Not cost effective for our case. DynamoDB charges by provisioned read/write capacity per second. Note that capacity is counted per record read or written, not per API call, whether or not you use the batch API. Here are more details from my test. I used the batch API to send 1000 records of more than 1000 bytes each. With the write capacity set to 20 per second, the batch took 50 seconds to finish. While keeping my application running, I changed the capacity in the AWS console to 80 per second, and one batch then took 12 to 25 seconds to complete (ideally it should be 1000/80 = 12.5 seconds; the extra time comes from network latency, because I'm sending more than 1 megabyte of data per API call).
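The batch timings follow directly from the provisioned capacity. Here is a small sketch of that arithmetic, assuming (as in my test) that each record consumes one unit of write capacity and ignoring network latency. The class and method names are made up for illustration.

```java
// Back-of-the-envelope model of the batch test: time = records / capacity.
// Assumes one write capacity unit per record and no network latency.
class WriteCapacitySketch {

    // Seconds needed to push `records` writes through a table provisioned
    // at `writesPerSecond`.
    static double secondsToWrite(int records, int writesPerSecond) {
        return (double) records / writesPerSecond;
    }

    public static void main(String[] args) {
        System.out.println("20 writes/s: " + secondsToWrite(1000, 20) + " s");
        System.out.println("80 writes/s: " + secondsToWrite(1000, 80) + " s");
    }
}
```

With 1000 records this gives 50 seconds at 20 writes per second and 12.5 seconds at 80, which matches the lower bound of what I measured.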

In our case, we may read 500 records from SimpleDB into memory and then read nothing for the next 10 minutes. With SimpleDB we can complete that in 500 milliseconds. With DynamoDB, we would have to set the read capacity to 1000 reads per second to match, and that would cost $94.46 per month (via the AWS Simple Monthly Calculator). With SimpleDB, it may cost less than 1 dollar.
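A rough way to see why provisioned capacity is a poor fit for this access pattern is to compare what we would have to provision with what we would actually use. The sketch below uses only the numbers from this post (500 records, a 500 ms target, one burst every 10 minutes) and assumes one read unit per record; the class and method names are hypothetical.

```java
// Compares the capacity needed for a short burst with how much of it is used
// over a whole cycle. Assumes one read capacity unit per record.
class ReadBurstSketch {

    // Reads per second needed to fetch `records` within `windowSeconds`.
    static double requiredReadsPerSecond(int records, double windowSeconds) {
        return records / windowSeconds;
    }

    // Fraction of the provisioned capacity actually consumed when
    // `recordsPerCycle` reads happen once every `cycleSeconds`.
    static double utilization(int recordsPerCycle,
                              double provisionedReadsPerSecond,
                              double cycleSeconds) {
        return recordsPerCycle / (provisionedReadsPerSecond * cycleSeconds);
    }

    public static void main(String[] args) {
        double needed = requiredReadsPerSecond(500, 0.5);
        double used = utilization(500, needed, 600);
        System.out.println("provision: " + needed + " reads/s");
        System.out.println("utilization: " + (used * 100) + " %");
    }
}
```

Under these assumptions we would provision 1000 reads per second and use well under 0.1% of it, so almost all of the monthly charge would pay for idle capacity.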

Conclusion: DynamoDB is really designed to be a high-performance database. SimpleDB has more flexibility. What I mean by "really designed for high performance" is that if you choose DynamoDB, you must make sure your architecture is well designed for high-traffic dynamic content. If your architecture targets high-traffic dynamic content and high performance, DynamoDB may match your needs perfectly. In our case, SimpleDB is enough: it offers excellent flexibility and is cost effective. Before looking for comparisons of SimpleDB and DynamoDB, design your architecture first. DynamoDB is good, but it is not a fit for everyone.


Sunday, June 2, 2013

Cross Platform - Initial Idea

I worked on a commercial product for 7 years. It has more than 400 million dollars in revenue per year. The product runs on Windows and Mac, with a lite version on the web, Android, and iPhone/iPad, and has data interoperability across all of these platforms. We investigated various techniques for cross-platform development using C#/C++/Objective-C, with frameworks like Qt, as well as other approaches like HTML+CSS+JavaScript for cross-platform features. I want to share my working experience with some of the technologies that support cross-platform development.

Decades ago, when the second operating system came into the world, the need for cross-platform development was born. We need to choose our target platforms based on current market share. Here are the major target platforms:
- Desktop
 - Microsoft Windows
 - Mac OS X, Apple Inc.
 - Linux (my favorite distribution is Ubuntu)

[Chart: desktop platform market share, February 2013]

- Mobile
 - Google Android; there are also differences between handhelds and tablets.
 - Apple iOS; there are also differences between iPhone and iPad.

[Chart: mobile platform market share, February 2013]

For this series of articles, I'll start with this roadmap:
- Programming languages for cross-platform development
- Review of frameworks for cross-desktop development, e.g. Qt, Mono, wxWidgets
- Review of the web as a platform: HTML5, Native Client
- Review of frameworks supporting multiple mobile platforms, e.g. PhoneGap, Appcelerator/Titanium