Monday, July 29, 2013

LEAN7: Migrate from MongoDB to AWS DynamoDB + SimpleDB

Migrate from MongoDB to DynamoDB + SimpleDB: New Server Side Architecture Ready for More Users

We now have 14,000 registered users, a small portion of whom are paying users. TeamViz seems to be gaining recognition: sales, though still very small, are growing every month. However, I have started to run into trouble with the server architecture mentioned in this post. The issue is that the MongoDB-backed database locks up for several minutes roughly every 2 hours, for no reason I can identify. Initially, all requests were held for 2 minutes every 2 hours and 7 minutes. Now it has become worse: all requests are held for 7 minutes every 2 hours and 7 minutes. I asked this question on Stack Overflow, but there is no answer yet. So I can either increase the capacity of the servers or move to another database server. We are small, so I can afford to try different solutions.

Because all connections are held for several minutes, the connection count on the load balancer looks like this. (At first I thought the servers were under attack, but no one attacks a server every 2 hours and 7 minutes for a whole month, right? ^_^)


So there are several possible solutions: use another NoSQL database, or use a managed NoSQL database. My first move was to look at other NoSQL database servers. I read a comparison of NoSQL solutions, this link about NoSQL benchmarks, and this link about Couchbase. Every NoSQL database has its pros and cons.

I then talked with Kelly about the cost of servers, the cost of a managed service, and the possibility of moving to another NoSQL provider or even to MySQL. The conclusion was that the current MongoDB issue is just the start: we would likely spend more and more time managing databases and resolving performance problems or other unknown issues. That would cost a lot of energy, while our focus is on building a better product. Playing with NoSQL and other cutting-edge technology is a lot of fun, but that is not our goal. Moving to a managed database service lets us focus on adding features and fixing issues in the product itself, and we already have a long list of both.

So we moved to Amazon AWS DynamoDB and, to reduce cost, put part of the data on AWS SimpleDB. The server side was almost entirely rewritten to handle the database change. I took this chance to practice the Promise pattern on node.js (it works great!) and to leverage the middleware mechanism provided by the Express framework. In addition, data from DynamoDB and SimpleDB is held in memcache. Everything has worked well for 24 hours (except for some error logs from memcache).
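
The idea of holding DynamoDB and SimpleDB data in memcache can be sketched as a cache-aside lookup. This is only an illustration in Python, not the actual node.js server code: the class name `CachedStore`, the `db_reads` counter, and the in-memory dictionaries standing in for memcache and for the managed database are all assumptions made for the example.

```python
from typing import Callable, Dict, Optional

Record = Dict[str, str]


class CachedStore:
    """Cache-aside wrapper: check the cache first, fall back to the database."""

    def __init__(self, fetch_from_db: Callable[[str], Optional[Record]]) -> None:
        self._fetch_from_db = fetch_from_db   # stand-in for a DynamoDB/SimpleDB lookup
        self._cache: Dict[str, Record] = {}   # stand-in for memcache
        self.db_reads = 0                     # counts how often the database is hit

    def get(self, key: str) -> Optional[Record]:
        cached = self._cache.get(key)
        if cached is not None:
            return cached                     # cache hit: no database round trip
        self.db_reads += 1
        record = self._fetch_from_db(key)
        if record is not None:
            self._cache[key] = record         # populate the cache for later reads
        return record

    def put(self, key: str, record: Record,
            write_to_db: Callable[[str, Record], None]) -> None:
        write_to_db(key, record)              # write to the database first
        self._cache.pop(key, None)            # then drop the stale cache entry


# Usage with an in-memory dict standing in for the managed database.
fake_db: Dict[str, Record] = {"user:1": {"name": "alice"}}
store = CachedStore(fake_db.get)
first = store.get("user:1")    # miss: goes to the database
second = store.get("user:1")   # hit: served from the cache
```

Invalidating on write rather than updating the cache in place keeps the sketch simple; a real deployment would also need expiry times and a policy for cache errors.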

Here is the picture 10 hours after the migration. The huge periodic traffic spikes have disappeared.

Here is the new architecture of the database and sync servers.

You may be concerned about accessing AWS from Linode; currently it's fine. We have more than 1.3 million items in one DynamoDB table, and fetching one record by key from DynamoDB takes 25 to 45 ms from the Linode network. SimpleDB holds fewer than 20k items and also responds in 25 to 45 ms.

Some notes about the new architecture:
- Why Linode: much cheaper than AWS EC2.
- Why AWS DynamoDB and SimpleDB: we don't want to worry about managing a database.
- memcached servers are supposed to work independently; we use Couchbase because it provides automatic clustering.
- Still, the design goal is to scale out. Every machine is independent. We can add more sync servers and memcached servers independently.
- Future plan: we still need a message queue. AWS SQS does not provide a way to post an event to multiple subscribers simultaneously; RabbitMQ can do that. But a message queue is not urgent so far.
- Future blog: I will share more experience with SimpleDB and DynamoDB.
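
The difference behind the message queue note (one event delivered to a single consumer versus one event delivered to every subscriber) can be shown with a small in-process sketch. It uses no real broker: `WorkQueue` and `FanoutExchange` are hypothetical names for this example only, the latter loosely modeled on what RabbitMQ calls a fanout exchange.

```python
from collections import deque
from typing import Deque, Dict, Optional


class WorkQueue:
    """Point-to-point queue: each message is delivered to exactly one consumer."""

    def __init__(self) -> None:
        self._messages: Deque[str] = deque()

    def publish(self, message: str) -> None:
        self._messages.append(message)

    def receive(self) -> Optional[str]:
        return self._messages.popleft() if self._messages else None


class FanoutExchange:
    """Fan-out: every bound subscriber queue gets its own copy of each message."""

    def __init__(self) -> None:
        self._queues: Dict[str, Deque[str]] = {}

    def bind(self, subscriber: str) -> None:
        self._queues.setdefault(subscriber, deque())

    def publish(self, message: str) -> None:
        for queue in self._queues.values():
            queue.append(message)             # one copy per subscriber

    def receive(self, subscriber: str) -> Optional[str]:
        queue = self._queues[subscriber]
        return queue.popleft() if queue else None


# With a work queue, only the first consumer to ask sees the event.
work = WorkQueue()
work.publish("task-updated")
seen_by_a = work.receive()
seen_by_b = work.receive()

# With fan-out, both sync servers see the same event.
exchange = FanoutExchange()
exchange.bind("sync-server-a")
exchange.bind("sync-server-b")
exchange.publish("task-updated")
copy_a = exchange.receive("sync-server-a")
copy_b = exchange.receive("sync-server-b")
```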

Sunday, July 14, 2013

LEAN6: 3 Reasons Not to Do an Unnecessary SDK Upgrade

3 Reasons Not to Do an Unnecessary SDK Upgrade

I use ExtJS to build my productivity tool, TeamViz. ExtJS recently released 4.2.1 while I was still using 4.1.1a. After checking the 4.2.1 release notes, I was excited to see some fixes and performance improvements, so I decided to upgrade. I read the upgrade guide from 4.1 to 4.2 and estimated that it would take about an hour. In fact, I spent 2 days on it. Here are more details about what happened during this upgrade.

  • Dependency Tools. My project was generated with Sencha Cmd, which creates an initial framework based on Ext JS so you can start working quickly. First I replaced the library with ExtJS 4.2, and it worked well. But when I used Sencha Cmd to compile the project, errors occurred. Some things changed in the ExtJS 4.2 framework, so just replacing the JS/CSS/resource files does not work: Sencha Cmd relies on some auto-generated config files. So I decided to upgrade Sencha Cmd from 3.0 to 3.1 as well. I also generated the project again using the command sencha -sdk ~/ext-4.2.1.883 generate app TeamViz ./TeamViz, and then replaced files based on the generated sample project. Later, to compile on 32-bit and 64-bit Ubuntu machines and on Windows, I also needed to upgrade the Sencha Cmd toolset there.
  • Fixes or Regressions. Every time a new version of an app or SDK is released, there are bound to be some fixes and some regressions. After the upgrade, I had issues with mouse enter/leave events. My instant tools on items were broken: they worked in the normal case but broke in some special scenarios. After digging into the Ext JS 4.2 code, I found it was a regression in Ext JS 4.2 and wrote a workaround. The workaround may become technical debt for future releases, but it is the most efficient fix so far.
  • Undocumented API. When I implemented the complicated drag & drop in my app, I used an undocumented API; in fact, I injected some code into the Ext JS drag & drop process. When I upgraded to ExtJS 4.2, the part I had hacked had changed. I needed to do a full test pass to find the breakage and then fix it. I think there may be other potential issues that I have not found yet.

Actually, the upgrade was not necessary: there was no bug report directly related to the SDK, and the existing version worked very well. For a startup, where every day counts, an upgrade like this may not be worth it once you compare the risk with the benefit.

Wednesday, July 10, 2013

SDK to Sync Tasks: Dropbox vs Evernote vs Google Apps Tasks vs Jira

Today Dropbox published a blog post about their new Datastore API; the amazing feature is offline support. I have previously investigated other popular task API providers and want to share a quick summary. I don't discuss Outlook/SkyDrive/calendar stuff here, and focus on companies that intend to be service providers.

1. Introduction to Providers


  • Dropbox: Datastore API in beta, a well-designed and elegant API for tasks.
  • Evernote: Evernote does not provide a real SDK or functionality for tasks, but personally I want to make Evernote a task/project management tool. You can attach your own data to a note, which would be enough for client tools to filter the notes marked as tasks and categorize them. The API documentation is here.
  • Google: Google Apps Tasks API. Google has provided the Tasks API for several years, and there are some tools and Chrome plugins built on it.
  • Jira: The enterprise project management tool. It also provides a REST API. Jira provides a best-in-class feature set.

2. Features, Pros, Cons

  • Dropbox
    • Features: 
      • Provides a datastore API to handle tables and records. The datastore API handles a generic remote key-value database. You can easily build your task management tool on it.
      • Temporary offline support. The SDK keeps working when your app goes offline temporarily, with all its data stored locally. Accordingly, it provides a way to sync data and resolve conflicts.
      • SDK: Provides a JavaScript SDK for the web, plus iOS and Android SDKs.
    • Pros:
      • Flexibility: Because the API handles a generic remote NoSQL database, it gives app developers enough flexibility to add their own fields and store what they need.
      • Temporary Offline Support: this is essential for mobile apps because they can easily go offline. I can imagine the Dropbox API greatly improving the user experience on mobile devices.
      • SDKs for JavaScript, iOS, and Android can bootstrap the integration quickly.
      • Potentially, when you need larger storage for the content/attachments of a task, Dropbox would be the best candidate.
    • Cons:
      • It's still in beta, so there is not enough support for server-side search/filter. With a big data set, this would be a problem in the current release. However, I expect Dropbox to improve it very quickly!
  • Evernote:
    • Features:
      • Evernote does not provide a way to directly create tasks and projects. It provides an SDK to create and manage notes. Notes can contain rich-format text, images, and other resources. You can categorize them by notebooks or tags. Application data can be attached to notes, so you can manage status/estimates/priorities with the application data of a note. A task management model for Evernote could be:
        • Put all task notes in a special notebook
        • Use Tags/Parent Tags to build hierarchy of projects
      • SDK: Objective-C, Java, PHP, Ruby, Python, Perl, C#, C++, ActionScript
    • Pros:
      • All your data is visible in the Evernote client tools on Web, Windows, Mac, iOS and Android. The official Evernote apps are of very high quality.
      • You can search on the server and leverage great Evernote features like OCR. This is unique compared with all the other providers.
    • Cons:
      • Although you can add tasks/checkboxes in a note, that is not a direct way to manage them.
      • Evernote is designed for notes; you need some workarounds to make it work as a task management tool.
  • Google Apps Tasks API:
    • Features:
    • Pros:
      • Better for integration with other Google apps.
      • Simple but complete features for task management.
    • Cons:
      • No way to extend. For example, if I want to add an estimate to a task, there is no such task property, and there is no flexibility to add custom fields.
  • Jira
    • Features:
      • Jira is already an enterprise task management tool for team planning and project tracking.
      • SDK: REST API
    • Pros:
      • Really feature-rich, and generally you can get everything done on the web.
      • You can deploy Jira Server to your private cloud or internal network.
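
The two extensible models in this comparison, a generic table/record store (as with Dropbox) and notes carrying application data inside a special notebook (as with Evernote), can be sketched as plain data structures. This is a rough illustration that calls neither SDK: the `Record` and `Note` classes, the field names, and the notebook name "Tasks" are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Record:
    """Generic key-value record, in the spirit of a table/record datastore."""
    record_id: str
    fields: Dict[str, object] = field(default_factory=dict)


@dataclass
class Note:
    """A note with a notebook, tags, and app-specific key-value data."""
    title: str
    notebook: str
    tags: List[str] = field(default_factory=list)
    application_data: Dict[str, str] = field(default_factory=dict)


TASK_NOTEBOOK = "Tasks"  # hypothetical name of the special notebook for task notes


def task_notes(notes: List[Note], project_tag: str) -> List[Note]:
    """Return the notes that live in the task notebook and carry the project tag."""
    return [n for n in notes
            if n.notebook == TASK_NOTEBOOK and project_tag in n.tags]


# Record model: a custom field such as "estimate_hours" needs no schema change.
task_record = Record("t1", {"title": "Write blog post", "done": False,
                            "estimate_hours": 2})

# Note model: status and priority ride along as application data.
notes = [
    Note("Write blog post", TASK_NOTEBOOK, ["TeamViz"],
         {"status": "open", "priority": "high"}),
    Note("Meeting minutes", "Journal", ["TeamViz"]),
    Note("Fix sync bug", TASK_NOTEBOOK, ["Other"], {"status": "open"}),
]
teamviz_tasks = task_notes(notes, "TeamViz")
```

In both models the task-specific fields are just extra keys, which is the flexibility the Google Tasks API lacks according to the comparison above.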

3. Summary of Unique Features

  • Dropbox: Allows temporary offline use and handles sync/conflict resolution well inside the SDK, so developers don't need to worry about it. Also provides the best flexibility in app design.
  • Evernote: Rich content format in a note, and powerful search capability.
  • Google Apps Tasks: Complete API dedicated to simple task management.
  • Jira: Provides a way to deploy the server to your internal network.

Finally, let's get back to TeamViz, my task management tool. The goal is to support completely offline work. Users can use it as a standalone tool and can also sync with other desktop apps and mobile apps. None of the models above meets my goal; the closest one is what Dropbox released today, the Datastore API. But it supports only temporary offline use: you still need to be online to access data.
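
To make the offline goal more concrete, here is a minimal sketch of a client that keeps all data locally, records edits made while offline, and merges them field by field on the next sync. It is not TeamViz's implementation and not how the Dropbox SDK resolves conflicts: the `OfflineStore` class, the plain dictionary standing in for the server, and the "local edit wins per field" policy are all assumptions chosen to keep the example short.

```python
from typing import Dict, List, Tuple

Task = Dict[str, object]
Change = Tuple[str, str, object]   # (task_id, field name, new value)


class OfflineStore:
    """Local task store that works offline and syncs queued changes later."""

    def __init__(self) -> None:
        self.tasks: Dict[str, Task] = {}   # full local copy, usable offline
        self.pending: List[Change] = []    # edits not yet sent to the server

    def set_field(self, task_id: str, name: str, value: object) -> None:
        self.tasks.setdefault(task_id, {})[name] = value
        self.pending.append((task_id, name, value))

    def sync(self, server: Dict[str, Task]) -> None:
        # Push: apply each queued edit to the server, one field at a time,
        # so edits to other fields made elsewhere are left untouched.
        for task_id, name, value in self.pending:
            server.setdefault(task_id, {})[name] = value
        self.pending.clear()
        # Pull: replace the local copy with the merged server state.
        self.tasks = {task_id: dict(task) for task_id, task in server.items()}


server: Dict[str, Task] = {"t1": {"title": "Draft", "done": False}}
client = OfflineStore()
client.sync(server)                      # initial download
client.set_field("t1", "done", True)     # edit made while offline
server["t1"]["title"] = "Final draft"    # another device edits a different field
client.sync(server)                      # both edits survive the merge
```

When two devices change the same field, this policy silently keeps whichever edit syncs last, so a real tool would need timestamps or a user-visible conflict step.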