Sunday, May 26, 2013

LEAN5: Do Users Really Need Data Synchronization?

Do users really need data synchronization across multiple devices?

As mentioned in this post, I started a new version of PomodoroApp and finally supported data synchronization across multiple devices. The initial idea was that PomodoroApp is a cross-platform application and will eventually run on both iPhone and Android. I put a lot of effort into supporting data sync and spent money on servers. Let's see the result:

After data sync became available, visits to my site kept growing at the same steady rate (the red line with a constant slope). The new version with data synchronization does not appear to have had any impact on the slope. So although users keep arriving at a steadily increasing rate, I had hoped more of them would be interested in buying a license. The "pricing-plan" page was set as the goal page, so you can see the conversion rate. The conversion rate to the "pricing-plan" page stays around 20%, unchanged since the new feature was introduced.

So, on the desktop side, data synchronization may not be a "must-have" feature. I thought of it as a "should-have"; actually it's a "nice-to-have". Building it was somewhat anti-lean-startup: data sync is not what users requested most, and users had never mentioned mobile support before. What they write to me about most is still user experience.

The good thing is that more and more customers are now asking for a mobile version. The mobile app has not been released yet. Let's see what happens once data sync across mobile and desktop is available.

LEAN4: 3 Lessons Learned Creating a Cross-Platform App

3 Lessons Learned Creating a Cross-Platform App

Following the summary of last year in this post, I'll continue sharing what I did recently. My ambition is to support 5 platforms: Windows, Mac OS X, Ubuntu, iPhone, and Android. It has not gone as smoothly as I expected, and here are some lessons learned.

  • Don't use web technology to build native desktop applications unless you have no choice. My first version of PomodoroApp was built on Qt with C++. Not counting my time in school, I have 10 years of experience using C++ in commercial software. Starting in 2002, I worked as a part-time programmer on backend services for GIS applications, and I now enjoy the new features of C++11. Native applications work really well with the platform, and C++ makes optimization easier for me. However, because of the rise of web technologies, I decided to use ExtJS and TideSDK for the new version. The simple initial idea was to share code with PomodoroApp on mobile devices: I could use Appcelerator Titanium or PhoneGap to reduce the effort of building apps for iPhone and Android, and Sencha provides Sencha Touch for mobile, which can share most of its code with ExtJS. Ideally, I could share 80% of the code across Windows/Mac OS X/Ubuntu/iOS/Android, and the only thing I would need to revise for each platform is the UI. However, here is a list of issues that I had to resolve:
    • Limitations. Desktop and mobile devices each have specific limitations. For example, the SQL API in TideSDK is synchronous, while on mobile the SQL API available in a web page is the HTML5 SQLite API, which has a 5 MB limit. Worse, it provides only an asynchronous API, so the code logic differs from the desktop version. Being clear about the limitations up front is important because some features may not be achievable.
    • Performance. To some extent it depends on the libraries selected. Web technology may look good at the beginning, but I'm sure it gets really sluggish as more and more components are added to one page. In addition, with ExtJS as a single-page application, it's really easy to leak memory.
    • Dependencies. C++ can access platform features, while JavaScript cannot. What your application can achieve depends on what the cross-platform SDK provides. For example, in PomodoroApp 2.x I updated the application icon on the dock bar/task bar. With TideSDK, that is impossible unless you add a new API to TideSDK.
  • Don't clutter the library that holds your core business logic with platform guards. It seemed acceptable to handle platform differences in my library with something like if(platform is Windows){...}. However, I did not imagine the complexity at the very beginning, until I started porting to mobile devices. When 5 platforms need to be supported, the guard sections in the code are where the bugs really come from, especially in JavaScript, because there is no compile-time verification.
  • Triple your estimate when porting from desktop to mobile, even if you have lots of reusable code. There may be lots of issues that never happened on the desktop side. For example, Apple App Store review rejected my app several times, and I had to resolve every issue they raised. Generally I needed 5 working days to get a review result.
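
The first two lessons can be illustrated with a small sketch. Instead of scattering if(platform is ...) guards through the core logic, the platform differences live in one adapter table, and the core code is written against a single callback-style interface, so a synchronous desktop SQL API and an asynchronous mobile one look the same to it. Every name here (createStorage, adapters, fakeSyncDb, fakeAsyncDb, countTasks) is hypothetical; none of them is a TideSDK, Titanium, or PomodoroApp API, and the two fake databases only stand in for the real ones.

```javascript
// Hypothetical stand-ins for the two kinds of SQL API described above.
// fakeSyncDb returns rows directly; fakeAsyncDb delivers them via a callback.
const fakeSyncDb = {
  query(sql) {
    return [{ id: 1, title: 'Write blog post' }];
  },
};
const fakeAsyncDb = {
  query(sql, onRows) {
    setTimeout(() => onRows([{ id: 1, title: 'Write blog post' }]), 0);
  },
};

// One adapter per platform family. Each adapter exposes the same
// query(sql, callback) shape, so the core logic never has to branch.
const adapters = {
  desktop: (db) => ({
    query(sql, callback) {
      let rows;
      try {
        rows = db.query(sql); // synchronous call
      } catch (err) {
        callback(err);
        return;
      }
      callback(null, rows);
    },
  }),
  mobile: (db) => ({
    query(sql, callback) {
      db.query(sql, (rows) => callback(null, rows)); // asynchronous call
    },
  }),
};

// The only place that knows which platform the app is running on.
function createStorage(platform, db) {
  const makeAdapter = adapters[platform];
  if (!makeAdapter) {
    throw new Error('Unsupported platform: ' + platform);
  }
  return makeAdapter(db);
}

// Core business logic: no platform guards in here.
function countTasks(storage, callback) {
  storage.query('SELECT id, title FROM tasks', (err, rows) => {
    if (err) {
      callback(err);
      return;
    }
    callback(null, rows.length);
  });
}

countTasks(createStorage('desktop', fakeSyncDb), (err, n) => console.log('desktop', n));
countTasks(createStorage('mobile', fakeAsyncDb), (err, n) => console.log('mobile', n));
```

With this layout, supporting another platform means adding one entry to the adapter table rather than touching every guard in the library, and an unknown platform fails loudly at startup instead of silently taking a wrong branch.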

Saturday, May 18, 2013

LEAN3: Updates After 1 Year

Updates After 1 Year

It has been about 1 year since my last blog post about The Lean Startup in April 2012. What happened during this year?
  • My main responsibility changed from desktop software to cloud services. It is a perfect chance to start working on new technologies. The bad thing is that I was really busy and exhausted, and didn't have time to take care of PomodoroApp. I spent a lot of time learning the new technologies, programming languages, and existing system design of the new division. Everything on PomodoroApp was stopped for 8 months, until last Christmas.
  • My baby was born. Kelly and I are very happy to have our first baby. Kelly spent a lot of time taking care of the baby and didn't have enough time for marketing, customer service, or UX design.
  • Traffic to the PomodoroApp website is increasing every month at a very steady rate. You can see the traffic report from Google Analytics below:

  • I'm getting more and more familiar with JavaScript and cloud technologies, and I'm now confident with cloud technologies.
At the end of 2012, I had a 2-week vacation. I decided to use the 2 full weeks for a brand new version of PomodoroApp, with the following major changes:
  • Programming Language and Libraries: Versions 1.x and 2.x are built on Qt, a C++ library. I used an old but mature UI technology, CSS styling, to create beautiful UI controls. Qt and C++ are powerful and fast on every platform. I also wrote some code in Objective-C on Mac and against COM interfaces on Windows to handle OS-specific features, e.g. the dock bar icon. With my recent experience, I'll shift to TideSDK and ExtJS. With TideSDK I can create beautiful and unique desktop apps using web technologies. ExtJS is a web application library with a lot of samples, but it may need more time to learn. Other libraries such as jQuery UI were also considered, but they do not have the features I need in place.
  • Data Synchronization Service: PomodoroApp is cross-platform, currently on Windows/Mac/Linux, and will support iPhone/Android as well, so there is no reason not to synchronize data across devices. I use node.js + MongoDB as the backend and host servers on Linode, Digital Ocean, and Windows Azure. That's a really strange infrastructure architecture. The purpose is to reduce cost and to try servers from different providers.
  • Mobile Platform: Appcelerator Titanium is really fantastic for cross-platform mobile development. Again, it needs some time to learn at the beginning.
So, with ideas on every aspect settled, I started to work full of energy, because I don't often have this kind of long vacation for my own projects. I started the first release of PomodoroApp at the end of 2011, mainly because I had a 2-week vacation and needed to take advantage of it. I have to admit that I also made some bad decisions when building the idea. I've listed some topics and will write some blog posts to share my experience and lessons learned. So, what does this have to do with "Lean Startup"? This is a summary of what happened in the past year. I'll start a new blog post about the details of this new version.

Sunday, May 12, 2013

A Story of "Design for Failure"

Now that we are in the era of cloud computing, what is the most important factor you can imagine for cloud computing? You may think of scaling. It could be; scaling is very important as your business gets bigger and bigger. You may think of backup; it always should be on the list. You may also think of programmable computing resources. That's a really important concept from AWS. Machines are programmable: you can programmatically add or delete a machine within seconds, instead of purchasing one from a vendor and deploying it to a data center. You can allocate a new reliable database without depending on an operations team. However, as a startup, my business is starting from scratch, and I do everything myself. In my practice, "Design for Failure" is really the top priority from the very beginning.

With AWS providing EC2 and other vendors providing VPSes, it is common sense to use a VPS instead of building your own data center when you are not that big. Scaling is not so important yet because I'm still very small; a few machines are enough to support the current number of users. But I did design for scaling in the future. Design for failure? Yes, I had considered it, but not very seriously. My VPS provider, Linode, claims 99.95% availability and has a very good reputation in the industry. I trusted them.

Some background on my online service. I released a new version of the desktop application PomodoroApp at the end of 2012, with support for data synchronization across computers. Users rely on my server to sync data. It's yet another new service on the Internet, and no one knows it. I can't tell whether tomorrow will bring 1 new user or 1,000. Although I designed a reliable and scalable server architecture, I deployed a minimum viable architecture to reduce cost; perhaps nobody will use the service next week. There are 2 web servers: one hosts my website, and the other hosts a node.js server for data synchronization. It provides only REST services; I'll call it the sync server. There is also 1 MongoDB database server instance. Each one can be a single point of failure. That's acceptable if I have 99.95% availability. My sync server runs under very low load, so I configured it to also be a secondary in the MongoDB replica set. The server code also supports accessing data from the replica set.
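
To put the 99.95% figure in perspective, here is a small worked calculation. It is only a sketch: it assumes the availability figure applies to each server separately, that the servers fail independently, and (for the last line) that a request needs all three of them, none of which is stated by the provider.

```javascript
const MINUTES_PER_DAY = 24 * 60;

// Downtime allowed by an availability figure over a period of `days` days.
function downtimeBudgetMinutes(availability, days) {
  return (1 - availability) * days * MINUTES_PER_DAY;
}

// Availability of a chain in which every component must be up,
// assuming the components fail independently of each other.
function serialAvailability(availability, componentCount) {
  return Math.pow(availability, componentCount);
}

console.log(downtimeBudgetMinutes(0.9995, 30).toFixed(1));  // minutes per 30 days
console.log(downtimeBudgetMinutes(0.9995, 365).toFixed(1)); // minutes per year
console.log(serialAvailability(0.9995, 3).toFixed(5));      // three servers in a chain
```

Under these assumptions, 99.95% allows roughly 21.6 minutes of downtime per 30 days (about 263 minutes per year), and three such servers in a chain drop to about 99.85% combined.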

Everything ran very well for the next 2 months. I kept improving the server and adding new features. Users came to my service from Google, blogs, Facebook, and Twitter, and their number grew at a steady rate. When I had new code, I needed just 1 second to restart the service. On February 17th, 2013, the database server went out of service for an unknown reason. Nobody knows the cause; Linode technical support managed to fix the issue. When the database server went down, the secondary database on the sync server became primary, and all data reads/writes switched to the database on my sync server automatically. This may take about 1 minute, depending on the timeout settings. So the outage of the database server had no impact on my sync service.
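
The automatic switch described above relies on the client side knowing every member of the replica set. As a minimal sketch, this is how a node.js server might build a MongoDB connection string that lists both members. The host names, database name, and replica set name are made up for illustration, and buildReplicaSetUri is a hypothetical helper; the post does not show the actual configuration.

```javascript
// Builds a standard MongoDB connection string that names all known
// replica set members plus the replicaSet option.
function buildReplicaSetUri(hosts, database, replicaSetName) {
  if (hosts.length === 0) {
    throw new Error('At least one host is required');
  }
  return 'mongodb://' + hosts.join(',') + '/' + database +
    '?replicaSet=' + encodeURIComponent(replicaSetName);
}

// Placeholder hosts: the dedicated database server and the sync server.
const replicaSetUri = buildReplicaSetUri(
  ['db1.example.com:27017', 'sync1.example.com:27017'],
  'pomodoro',
  'rs0'
);
console.log(replicaSetUri);
```

A driver that is given all members can find the current primary by itself, so after a failover the application only has to tolerate (or retry) the operations that failed while the election was in progress.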

However, I was just lucky in the Feb 17 incident. Only 3 days later, my sync server went down, and I could not even restart it from the Linode management console. The outage lasted 55 minutes. I got alerts from the monitoring service Pingdom and reports from customers. This was the first lesson: a single point of failure does fail. I decided to add more sync servers. Consequently, a load balancer became necessary in front of the 2 sync servers. In addition, I added a 3rd replica set member that is delayed 1 hour behind the primary. If any data gets corrupted, I can recover it from this backup server. You may ask why a 1-hour delay instead of 24 hours. Ideally there should be multiple delayed replica set members. In my production environment, the user count is still small, and there is no need for sharding so far. But my new features and my changes to existing code are only tested in the dev environment, and deploying them may damage the production data. I need a backup plan for that case. Even though there are still SPOFs, it's much better :)
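
The three-member layout can be sketched as a replica set configuration object of the kind passed to rs.initiate() or rs.reconfig() in the mongo shell. The host names and the set name are placeholders, and checkDelayedMembers is a hypothetical helper, not a MongoDB API. The slaveDelay field is the name used by MongoDB releases of that period; newer releases call it secondaryDelaySecs.

```javascript
// Shape of a replica set configuration document. Hosts are placeholders.
const replicaSetConfig = {
  _id: 'rs0',
  members: [
    { _id: 0, host: 'db1.example.com:27017' },   // dedicated database server
    { _id: 1, host: 'sync1.example.com:27017' }, // secondary on a sync server
    {
      _id: 2,
      host: 'backup1.example.com:27017',
      priority: 0,      // a delayed member must never be elected primary
      hidden: true,     // keeps clients from reading stale data from it
      slaveDelay: 3600, // seconds behind the primary (1 hour)
    },
  ],
};

// Hypothetical sanity check: every delayed member should be
// priority 0 and hidden. Returns a list of problems found.
function checkDelayedMembers(config) {
  const problems = [];
  for (const member of config.members) {
    if (!member.slaveDelay) {
      continue; // not a delayed member
    }
    if (member.priority !== 0) {
      problems.push(member.host + ': delayed member needs priority 0');
    }
    if (member.hidden !== true) {
      problems.push(member.host + ': delayed member should be hidden');
    }
  }
  return problems;
}

console.log(checkDelayedMembers(replicaSetConfig));
```

The delay is a trade-off: a short delay keeps the backup close to current data, while a long delay leaves more time to notice a bad deployment before it reaches the backup.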

The real disaster happened on May 11. I was about to deploy a new version that resolved some database issues; the new version handles index creation on the database. I use a web-based admin tool to manage my MongoDB instances. When I connected to the production database for final release testing, I happened to find a duplicated index on the collection. I wasn't sure why this had happened, so I deleted one in the admin tool. The tool reported that both indexes were deleted. Later, when I continued my testing and tried to sync data to the server, I got an error that the commit to the database had failed. This had never happened before. Then I used the MongoDB console to check the collection. What surprised me was that the whole collection was lost and could not be created again. I shut down the MongoDB server and then tried to restart it. Failed! The database log indicated "exception: BSONObj size: 0 (0x00000000) is invalid. Size must be between 0 and 16793600(16MB) First element: EOO". Googling the exception did not help much. Oh my, in the end I had to recover the database. Fortunately, I had one replica set member that mirrors the database in real time and another that is delayed by 1 hour. I spent about 2 hours fixing the issue, but my sync service stayed online and functioned well, because I had run "stepDown" on my primary and the secondary was now working as primary. Doing this troubleshooting did not hurt my online service. MongoDB really did an excellent job with the replica set pattern.

Initially I decided to recover the database from the replica set member with the 1-hour delay. But it's in another datacenter; I used scp to copy the data files and got only 1.7 MB/s, with 9 GB of data in total. That would have taken a long time to copy. Then I checked the new primary database and fortunately found that the new primary (the old secondary) was in good shape; its data files were not corrupted. So I stopped the primary database and spent about 2 minutes copying all the files at a 29 MB/s transfer speed within the same datacenter. Again, it's still a very small business. A 2-minute outage is acceptable because my client software supports offline mode: it has a local database and can work in places without Internet access. When the network is available, it syncs to the server. Some users have even disabled the sync feature because they don't want to upload any data to the server. After all the files were copied, I restarted MongoDB. It took several seconds to recover the uncommitted data from the oplog and to start replicating from the primary server. Everything works well now. MongoDB rocks!
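
How long would the cross-datacenter copy mentioned above have taken? A quick estimate, assuming decimal units (1 GB = 1,000,000,000 bytes) and a constant transfer rate; transferSeconds is just an illustrative helper.

```javascript
const MB = 1e6; // bytes, decimal units assumed
const GB = 1e9;

// Time needed to move totalBytes at a constant bytesPerSecond.
function transferSeconds(totalBytes, bytesPerSecond) {
  if (bytesPerSecond <= 0) {
    throw new RangeError('Transfer rate must be positive');
  }
  return totalBytes / bytesPerSecond;
}

// 9 GB over scp at 1.7 MB/s between datacenters.
const crossDatacenterMinutes = transferSeconds(9 * GB, 1.7 * MB) / 60;
console.log(Math.round(crossDatacenterMinutes) + ' minutes');
```

That comes to roughly an hour and a half, which is why copying from a healthy member in the same datacenter was the better choice.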

Even though I have the ultimate backup plan designed and tested in my client software, the incident still made me very tense. My backup plan is this: if the whole database is lost, I can still recover all the data. My client software supports offline mode and keeps a copy of all the user's data. Automatic data recovery from the user's machine to the server is already in place.
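
The post does not describe how this recovery works internally. As a purely illustrative sketch of the idea, a client that keeps a full local copy could compare it with what the server reports and upload whatever is missing or older on the server. The record fields (id, updatedAt) and the function name recordsToUpload are assumptions, not PomodoroApp's actual data model.

```javascript
// Returns the local records that the server lacks or holds in an older
// version. Both arguments are arrays of { id, updatedAt, ... } objects.
function recordsToUpload(localRecords, serverRecords) {
  const serverById = new Map(serverRecords.map((record) => [record.id, record]));
  return localRecords.filter((local) => {
    const remote = serverById.get(local.id);
    return !remote || remote.updatedAt < local.updatedAt;
  });
}

// Hypothetical local data on a user's machine.
const localCopy = [
  { id: 'a', updatedAt: 100, title: 'Task A' },
  { id: 'b', updatedAt: 200, title: 'Task B' },
];

// If the server database was lost, everything is uploaded again.
console.log(recordsToUpload(localCopy, []).length);
// If the server only has an old version of one record, both are uploaded.
console.log(recordsToUpload(localCopy, [{ id: 'a', updatedAt: 50 }]).length);
```

The same comparison covers both ordinary syncing after offline work and the worst case where the server starts from an empty database.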

This story is my first real disaster so far. I respect my VPS provider Linode, and I respect the software communities that provide Linux, node.js, and MongoDB. But it really is a must to keep "design for failure" the top priority even when you are very small. Hardware may have outages, software may have bugs, and I/O or memory may be corrupted. Hackers may want your server. People say the only thing that never changes is change. My lesson is that the only thing that never fails is failure. Without these lessons, "Design for Failure" would never have had such a tremendous impact on my future designs.