Tuesday, August 6, 2013

7 Reasons You Should Use MongoDB over DynamoDB

Even recently I migrated from MongoDB to DynamoDB, and shared 3 reason to use DynamoDB. I still love MongoDB, really good NoSQL solution. Here are some points for you to make decision on using MongoDB over DynamoDB.

Reason 1: Use MongoDB if your indexing fields might be altered later.
With DynamoDB, it's NOT possible to alter indexing after being created. I have to admit that there are workarounds. For example, you can create a new table and import data from the old one. But no one is straightforward and you need some trade off if using workaround. Back to indexing, DynamoDB allows you define a hash key to make the data well-distributed, and then adding range key and secondary index. When query from table, hash key must be used, and then either range key or one of secondary indices. No complex query supported. The hash key, range key and secondary index key definition can NOT be changed in future. So your database structure must be well designed before going production. By the way, the secondary key will occupy additional storage. If you have 1G data, and if you create index and "project" all attribute to the index, then your actually cost of storage will be 2G data. If you project only the hash and range key value to index, then you need to query twice to get the whole record. Actually the API allows you to invoke query only once, but the cost to "read" capacity is twice. In addition, you can still "scan" the data and filter by conditions on un-indexed key, but please check the data in my previous post, scan could be 100 times (or more) slow than query.

Reason 2: Use MongoDB if your need features of document database as your NoSQL solution.
If you will save document like this:
{
  _id: 1,
  name: { first: 'John', last: 'Backus' },
  birth: new Date('Dec 03, 1924'),
  death: new Date('Mar 17, 2007'),
  contribs: [ 'Fortran', 'ALGOL', 'Backus-Naur Form', 'FP' ],
  awards: [{
      award: 'National Medal of Science',
      year: 1975,
      by: 'National Science Foundation'
  }, {
      award: 'Turing Award',
      year: 1977,
      by: 'ACM'
  }]
} 
(sample document from MongoDB technical document)
With document database, you'll be able to query by name.first, or if some value exists in sub-document of awards. However, DynamoDB is key-value database, and support only value or set, no sub-document supported, and no complex index and query supported. It's not possible to save sub-document { first: 'John', last: 'Backus' } to name, accordingly, not possible to query by name.first.

Reason 3: Use MongoDB if you are going to use Perl, Erlang, or C++.
Official AWS SDK support Java, JavaScript, Ruby, PHP, Python, and .NET, while MongoDB supports more. I used node.js to build my backend server, both AWS SDK for node.js and mongoose SDK for MongoDB works very well. It's really amazing to use mongoose for MongoDB. It's in active development and the defect I report to mongoose can be fixed soon. I have also experience of using AWS SDK for Java and morphia for MongoDB, both of them works perfect! SDK for AWS and MongoDB are all well designed and widely used. But if your programing language is not listed in official support list, you may need to evaluate the quality of the SDK carefully. I have ever used non-official Java SDK for AWS SimpleDB, it's also good. But I can still easily get defect, for example, when using Boolean in object persistence modal, the Java SDK for SimpleDB cannot handle this type and will introduce some bad result.

Reason 4: Use MongoDB if you may exceed the limits of DynamoDB.
Please be careful about the limits and read them carefully if you are evaluating DynamoDB. You may easy to exceed some of the limits. For example, the value you stored in an item(value of a key) cannot exceed 64k bytes. It's easy to exceed 64k bytes when you allow user to input content. User may input a 100k bytes text as article title just because of pasting it by mistake. There is also workaround. I divide the content to multiple keys if it exceed the limits, and aggregate to one key in the post processing stage after reading the data from DynamoDB server. For example, the content of an article database may exceed 64k bytes, then in the pre-processing stage when storing to DynamoDB, I divide it to article.content0, article.content1, article.content2 and so on. After reading from DynamoDB, I will check if keys article.content0 exists, and if article.content0 exists, then continue to check article.content1, and combine the value in these fields to article.content and remove the article.content0, article.content1, and so on. This will introduce the complexity of your code and introduce additional dependency to your code. MongoDB does not have these limitations.

Reason 5: Use MongoDB if you are going to have data type other than string, number, and base 64 encoded binary.
In addition to string, number, binary, and array, MongoDB supports date, boolean, and a MongoDB specified type "Object ID". I use mongoose.js, and it supports these data type well. When you define data structure for object mapping, you can specify the correct type. Date and Boolean are quite important types. With DynamoDB you can use number as alternative, but still, need additional logic in your code to handle them. With MongoDB you can get all these data types by nature.

Reason 6: Use MongoDB if you are going to query by regular expression.
RegEx query might be an edge case, but in case this happens in your situation. DynamoDB provided a way to query by checking if a string or binary start with some substring, and provided the "CONTAINS" and "NOT_CONTAINS" filter when you do "scan". But you know "scan" is quite slow. With MongoDB, you can query easily on any key or sub document with RegEx, for example, if you want to query by user's name for "John" or "john", you can query by a simple regular expression {"name" => qr/[Jj]ohn/}, while this cannot be completed in DynamoDB by 1 query.

Reason 7: Use MongoDB if you are a big funs of document database.
10gen is the company backing MongoDB. They are very active on community. I asked question on stackoverflow, and Dylan, a Solution Architect of MongoDB, actively follows up my question, helped me analyze the issue, looked for the cause, also gave some very good suggestions on MongoDB. This is really a very good experience. In addition, the MongoDB community are willing to listen to users. Amazon is big company, it's not easy to getting touch with the people inside, not to mention impacting their decision and roadmap.

Bonus Tips: Read carefully on DynamoDB document if you are going to use it.
For example, there is an API "batchWriteItem". This API may return no error but give a field with key "UnprocessedItems" in result. This is somewhat anti-pattern. When I invoke a call, the result could be either success or failed. But this API gives a different status: "partial correct". You need to manually re-submit those "UnprocessedItems" again until there is no item in it. I didn't notice this because it's never happens during the testing. However, when there are big traffic, and the count of request to DynamoDB exceeded your quote for several seconds, this may happen.

Hold on, before you made the decision on using MongoDB, please read 3 Reasons You Should Use DynamoDB over MongoDB.

3 Reasons You Should Use DynamoDB over MongoDB

Recently I post a blog to share my experience of migrating from MongoDB to DynamoDB. Migration is smooth, and here are a summary of 7 reason we did the migration:

Reason 1: Use DynamoDB if you are NOT going to have an employee to manage the database servers. 
This is the top 1 reason I migrate from MongoDB to DynamoDB. We are launching a startup, and we have a long list of user requirements from early adoption users. We want to make them satisfied. I need to develop the Windows/Mac OS/Ubuntu software and iPhone/Android apps, also need to work on server to provide data synchronization among these apps. Kelly is not a technically people and didn't have experience on managing servers. Someone may said that people can be a web developer with 21 days. However, that's really not easy for server troubleshooting. With only 15k users and 1.4 million records, I start to get into serious troubles. From the last post,  the more data I stored, the more worse the database latency. In future when I set sharding and replica set for shardings, I can imagine that database management may take a big portion of my time in future. With DynamoDB, you can totally avoid any database management stuff. AWS manages it very well. I've migrated the database for one week, everything works very well.

Reason 2: Use DynamoDB if you didn't have budget for dedicated database servers.
Because I didn't have too much traffic and data records. I used 2 linode VPS as database servers, 1G RAM, 24G disk. The 2 database server is grouped as replica set, and no sharding yet. Ideally they should support my current data scale very well. However it's not true. Upgrading database servers will take more cost, and may still not be able to resolve the issue. There are some managed MongoDB services, but I may not be able to stand for the cost. With current user base, the MongoDB database occupied 8G disk on data and 2G disk on journal file. With managed mongodb service, I need to select 25G plan and starting from US$500 monthly fee. If I got more traffic and users, it would cost too much. Before migration, I tested on DynamoDB, migrating all the data to DynamoDB, that is, 1.4 million records. The actually space is less than 300M. I'm not sure how managed mongodb service, I use command in mongo console to get the disk usage statistics. My first week of cost on DynamoDB is, US$0.05. That's the last week of July, let's see how much it will cost in August.

Reason 3: Use DynamoDB if you are going to integrate with other Amazon Web Services.
For example, full text index of the database. There are solutions for MongoDB, but you need to setup additional servers for indexing and search, and understand the system. The good thing is that MongoDB provided full text index, but I can imagine that full text index for multiple languages is not easy, especially the Chinese word segmentation. Amazon CloudSearch is a solution for DynamoDB full text index. Another example could be AWS Elastic MapReduce, it can be integrated with DynamoDB very easily. Also for database backup and restore, Amazon has other services to integrate with DynamoDB. In my opinion, as the major NoSQL database in Amazon Web Services, DynamoDB will have more and more features, and you can speed up development and reduce the cost of server management by integrating Amazon Web Services.

However, DynamoDB has it's shortcomings. Before you made the decision on using DynamoDB, please read 7 Reasons You Should Use MongoDB over DynamoDB.