So you've determined that deploying your application in the Amazon cloud environment provides you with substantial advantages over traditional deployment approaches.
First off, you've made a good decision: scalability is easy, and cost is flexible because it is based on what you consume. In addition, you no longer need to pay people to maintain the hardware you once used. Now it's time to get down to brass tacks and move your application to the cloud. It seems so easy, but there are some important challenges to address. This post will attempt to help you with a few of the application profiles and challenges we commonly see from our customers.
What do you really need?
To answer that question, you might ask yourself the following:
- What is my operating environment? (Ubuntu Linux 10.04, Windows 2008 Server, RedHat EL 6, ...)
- Where does my application data live? (Local storage, single backend database, NoSQL data store, cloud data service)
- How do we handle local content? (We don't - it's all static files, admins upload graphic content to the local filesystem, etc.)
- What are our high availability requirements? (Load balanced frontend with clustered backend data stores)
- How do we deploy our code? (git checkout, svn checkout, standard build and deploy process, standard application deploy (e.g. Joomla, WordPress))
Local Content
One of the more challenging questions is "How do we handle local content?". Many standard applications like Joomla or WordPress, as well as many custom-built applications, store your written content in a backend database but store graphic content and other static elements in files on the local filesystem. The servers hosting this content are expected to be dynamically provisioned, so you will need a solution to get all of this content to the correct servers. Some of the possible solutions include:
- Configure EBS volumes for local storage and replicate between volumes with rsync (or deltasync under Windows). This can work really well if you have a primary server where content updates are made. From there you use a dynamic configuration file to replicate your content to the "child" servers from that single point.
- Mount Amazon S3 storage in your operating environment, much as you would a Windows share or an NFS mount. There can be performance challenges with this approach at high data volumes, but it is a very suitable solution for many applications, including those that behave similarly to Joomla or WordPress.
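The first option above (a primary server pushing content to "child" servers) can be sketched roughly as follows. This is a minimal illustration, not a finished tool: the host names, the `deploy` user and the paths are all hypothetical, and it assumes rsync over SSH is already set up between the servers. The host list stands in for the dynamic configuration file.

```python
import subprocess

# Hypothetical inventory; in practice this would be generated from
# whatever tracks your currently running child servers.
CHILD_HOSTS = ["app-1.internal", "app-2.internal"]


def build_rsync_commands(source_dir, dest_dir, hosts, user="deploy"):
    """Return one rsync argument list per child host."""
    # A trailing slash makes rsync copy the directory contents rather
    # than the directory itself.
    source = source_dir.rstrip("/") + "/"
    commands = []
    for host in hosts:
        target = f"{user}@{host}:{dest_dir.rstrip('/')}/"
        # -a preserves permissions and timestamps, -z compresses in transit,
        # --delete removes files on the child that were removed on the primary.
        commands.append(["rsync", "-az", "--delete", source, target])
    return commands


def replicate(source_dir, dest_dir, hosts, run=subprocess.run):
    """Run rsync for every host and return the hosts that failed."""
    failed = []
    commands = build_rsync_commands(source_dir, dest_dir, hosts)
    for host, command in zip(hosts, commands):
        result = run(command)
        if result.returncode != 0:
            failed.append(host)
    return failed
```

Calling `replicate("/var/www/uploads", "/var/www/uploads", CHILD_HOSTS)` from cron on the primary would push changes out on a schedule. Returning the failed hosts rather than raising lets one unreachable child be retried without blocking the others.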
Database Storage
In concept, this is a really simple problem: I have a database, and I need it replicated and always available. Unfortunately, the concept is where the simplicity stops, unless you are able to use Amazon's SimpleDB for your backend database needs. Few packaged products support this approach out of the box yet. If you are writing your own application and the limited feature set of SimpleDB meets your needs, then you're in luck and you should skip to the next section. Otherwise, keep reading.
Many of the NoSQL databases like Mongo, Cassandra, Tokyo Tyrant, Kyoto Tycoon, Redis, etc. support native data replication. If you are using one or more of those, you probably have a pretty technical team. You will need to set up replication between two EC2 nodes (or more if you sharded the data set). If the connection system for your chosen NoSQL store handles failover and connection location (as is the case with Cassandra and Mongo), you should have a relatively easy time aside from disk I/O. On the other hand, if you are using something like Tokyo or Kyoto, you need to deal with failing connections over from master to slave yourself. One approach for this is to set up an Elastic IP and script the moving of that IP, triggered when the master node becomes unreachable, from at least two of your app servers. Another approach is to make your application deal with redundant database connections. Doing this requires two pieces to work well:
- The application should be connection aware and resilient (i.e. reconnect on connection loss).
- The database driver you are using will need to be overridden or enhanced to take a primary and secondary database in the configuration. It's pretty easy to put in place with a simple code shim in front of the driver.
This second approach is more work and potentially an operational problem. What happens when the primary fails, the secondary takes over, and the primary comes back but is corrupt? In my experience, this is not an uncommon scenario.
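A shim of the kind described above might look something like this. It is only a sketch under stated assumptions: `FailoverConnection` is a hypothetical name, the two factories are whatever callables open a connection with your real driver, and the driver is assumed to raise `ConnectionError` on connection loss (many drivers use their own exception types, which you would catch instead). To avoid the corrupt-primary scenario, the sketch stays on the secondary once it has failed over and only returns to the primary when an operator asks it to.

```python
import time


class FailoverConnection:
    """Wraps a primary and a secondary connection factory."""

    def __init__(self, connect_primary, connect_secondary, retries=2, delay=0.0):
        self._factories = [connect_primary, connect_secondary]
        self._retries = retries
        self._delay = delay
        self._conn = None
        self.active = 0  # 0 = primary, 1 = secondary

    def _connect(self):
        last_error = None
        # Start from the currently active node so that a recovered primary
        # is not picked up again without an explicit reset.
        for index in range(self.active, len(self._factories)):
            try:
                self._conn = self._factories[index]()
                self.active = index
                return
            except ConnectionError as exc:
                last_error = exc
        raise ConnectionError("no database reachable") from last_error

    def reset_to_primary(self):
        """Call only after the primary has been verified as healthy."""
        self._conn = None
        self.active = 0

    def call(self, method, *args, **kwargs):
        """Invoke a method on the live connection, reconnecting on loss."""
        for attempt in range(self._retries + 1):
            try:
                if self._conn is None:
                    self._connect()
                return getattr(self._conn, method)(*args, **kwargs)
            except ConnectionError:
                self._conn = None  # force a reconnect on the next attempt
                if attempt == self._retries:
                    raise
                time.sleep(self._delay)
```

With a real driver, usage would be along the lines of `db = FailoverConnection(open_primary, open_secondary)` followed by `db.call("get", key)`, where `open_primary` and `open_secondary` are your own functions.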
Before we move on to relational data stores (the easy topic), a word of caution with respect to Tokyo Tyrant: DO NOT use the b-tree data store. I need to write a separate post on our experiences with Tokyo Tyrant to really explain that, but to keep it short, we experienced: a) AMAZING performance, b) massive operational INSTABILITY and CORRUPTION at scale. The hash database seems to be much more reliable. Performance of Kyoto Tycoon seems to be a bit better, but we have not used it in production. If your data set is predictable, it is important that you figure out the proper settings for apow, fpow and ncnum.
Above I mentioned disk I/O as a potential problem you may experience. The best approach is to apply either RAID-0 or RAID-1/10 to a bunch of EBS volumes attached to each of your DB servers. Split the core data store and the transaction logging storage onto different RAID sets. Availability of EBS volumes is pretty good, but not much better than that of individual disks, so if you choose to use RAID-0, you really need to trust your replication, failover and failback schemes. I would not use RAID-0 for an RDBMS like PostgreSQL.
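To see why RAID-0 demands that much trust, a back-of-the-envelope calculation helps. The sketch below assumes volumes fail independently, that nothing is rebuilt during the period considered, and it uses a made-up 1% failure rate purely for illustration (it is not a published EBS figure). Real failures can be correlated, so treat the numbers as a rough comparison only.

```python
def raid0_loss_probability(volume_failure_rate, volumes):
    """A stripe is lost if any single member volume fails."""
    return 1 - (1 - volume_failure_rate) ** volumes


def raid10_loss_probability(volume_failure_rate, volumes):
    """A stripe of mirrored pairs is lost if both halves of any pair fail."""
    if volumes % 2:
        raise ValueError("RAID-10 needs an even number of volumes")
    pair_loss = volume_failure_rate ** 2
    return 1 - (1 - pair_loss) ** (volumes // 2)


# Illustrative only: four volumes, each with an assumed 1% chance of failing.
ASSUMED_RATE = 0.01
raid0 = raid0_loss_probability(ASSUMED_RATE, 4)    # roughly 3.9%
raid10 = raid10_loss_probability(ASSUMED_RATE, 4)  # roughly 0.02%
```

Under these assumptions, striping four volumes makes losing the array about four times as likely as losing a single volume, while mirroring the same four volumes makes it far less likely.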
On the relational database topic, if you are using Oracle or MySQL, Amazon's pre-packaged, highly available RDS solution is the way to go unless you have some special needs which are not covered by it. If you are using SQL Server, PostgreSQL, etc., you can set up active/passive clustering using EBS and an Elastic IP. You will really want to use a combination of RAID-1 and RAID-10 for all of your EBS volumes. It's probably a good idea to put another safety net in place by configuring a hot standby that uses log shipping on a separate EC2 instance with RAIDed EBS in a different availability zone.
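The Elastic IP failover used for active/passive setups (and for the NoSQL master/slave case earlier) could be scripted along these lines. This is a sketch, not a complete solution: `FailoverWatchdog` is a hypothetical name, and `move_address` is a callable you supply that re-associates the Elastic IP with the standby instance (for example, by wrapping the EC2 AssociateAddress call in whichever SDK or CLI you use).

```python
import socket


def is_reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


class FailoverWatchdog:
    """Moves the address to the standby after N consecutive failed probes."""

    def __init__(self, probe, move_address, threshold=3):
        self._probe = probe
        self._move_address = move_address
        self._threshold = threshold
        self._failures = 0
        self.failed_over = False

    def tick(self):
        """Run one probe; call this on a timer from each watching server."""
        if self.failed_over:
            return  # never flap back automatically
        if self._probe():
            self._failures = 0
            return
        self._failures += 1
        if self._failures >= self._threshold:
            self._move_address()
            self.failed_over = True
```

Requiring several consecutive failures keeps a single dropped packet from triggering a failover. A TCP probe only proves that the port is open; a probe that runs a real query would be a stronger signal.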
Operating Environment - Choosing an AMI
This section is coming soon.
High Availability
Front end load balancing for high availability is made easy with Amazon's Elastic Load Balancing (ELB). You can simply configure ELB to direct the appropriate application traffic to your application servers, either within a single availability zone or across multiple availability zones. For this to work as you expect, it is crucial to configure good application health checks. A simple HTTP check is usually not sufficient. You should configure a check that really validates that your application is working.
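As an illustration of a check that goes beyond "the web server answers", here is a minimal sketch of a health endpoint that runs a set of application-level checks and returns 503 if any of them fail. The `/healthz` path, the port and the check names are hypothetical; each check is just a callable that raises an exception when something is wrong.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def run_checks(checks):
    """Run each named check and return (http_status, report)."""
    report = {}
    for name, check in checks.items():
        try:
            check()
            report[name] = "ok"
        except Exception as exc:  # any failure marks the instance unhealthy
            report[name] = f"failed: {exc}"
    healthy = all(value == "ok" for value in report.values())
    return (200 if healthy else 503), report


def make_handler(checks, path="/healthz"):
    """Build a request handler class that serves the health report."""

    class HealthHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            if self.path != path:
                self.send_error(404)
                return
            status, report = run_checks(checks)
            body = json.dumps(report).encode("utf-8")
            self.send_response(status)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    return HealthHandler


def serve(checks, port=8080):
    """Serve the health endpoint until the process is stopped."""
    HTTPServer(("0.0.0.0", port), make_handler(checks)).serve_forever()
```

You would then point the load balancer's health check at that path, with checks that do things like run a trivial database query or read a known file from your content store. Keep the checks cheap, since they run on every probe.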
The database section covered several of the nuances to database high availability. If your application happens to have other service components which need high availability, you can probably use the mechanisms outlined in the database section to achieve high availability.
Code Deployment
If you are using a packaged app, you really do not need to worry about this topic. Otherwise, it is something that is often overlooked, and a little thought ahead of time can make your use of any type of infrastructure easier. Unfortunately, this topic can get religious for a lot of organizations, so please take my suggestions with a grain of salt and realize that there is no one-size-fits-all solution.
The first question I always ask is: are you deploying compiled code (Java, C++, Erlang, etc.), interpreted code (like Ruby, PHP, Perl, Python, JavaScript, etc.), or both?
If your environment runs exclusively or partially from compiled software, you undoubtedly have some sort of reproducible build process which results in a deployable artifact. We have found that the single easiest way to handle this scenario for deployment is to use a build system like Hudson, which can be made to compile and build your software in an automated manner, track the release/build and push it to S3. Once it's in S3, you can set up a key that all of your app servers have for authentication, then pull the release packages directly from S3 and deploy them using whatever is most convenient for your OS, like Debian packages or RPMs. This makes it easy for you to track consistency within the compiled code base.
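The app-server side of that flow might be sketched like this. Everything named here is an assumption for illustration: the `releases/<app>/...` key layout is hypothetical, the download from S3 is left to whichever client you already use, and the sketch assumes your build job records a SHA-256 checksum for each package so a server can verify what it pulled before installing it.

```python
import hashlib


def release_key(app, build_number, extension):
    """Hypothetical S3 key layout: releases/<app>/<app>-<build>.<ext>."""
    return f"releases/{app}/{app}-{build_number}.{extension}"


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large packages do not fill memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def install_command(package_path):
    """Pick the package manager command based on the file extension."""
    if package_path.endswith(".deb"):
        return ["dpkg", "-i", package_path]
    if package_path.endswith(".rpm"):
        return ["rpm", "-Uvh", package_path]
    raise ValueError(f"unsupported package type: {package_path}")


def verified_install_command(package_path, expected_sha256):
    """Refuse to install a package whose checksum does not match."""
    if sha256_of(package_path) != expected_sha256:
        raise ValueError(f"checksum mismatch for {package_path}")
    return install_command(package_path)
```

Putting the build number in the key means every server can report exactly which build it is running, which is where the consistency tracking comes from.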
If you also use or exclusively use interpreted code like Ruby or PHP, the single easiest approach we have found is to use git. The production environment then has a checkout of your release branch (usually master), and your updates are just a 'git pull' and a graceful restart of the app server. Since there are always dependencies, using something like Chef to automate the update of dependencies can be a huge time saver, but it has a significant upfront cost you may or may not be willing to incur. If not, just having a standard dependency update script as part of your source repository will make management of software versions a lot easier. You can do all of the same stuff with Subversion or other version control systems, but it's not quite as easy.
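A small wrapper around that pull-and-restart routine could look like the following. It is a sketch with made-up defaults: the dependency script path and the restart command are placeholders for whatever your repository and app server actually use.

```python
import subprocess


def deploy_steps(branch="master",
                 deps_script="./scripts/update_deps.sh",
                 restart_cmd=("sudo", "service", "myapp", "reload")):
    """Return the ordered commands for one update of a production checkout."""
    return [
        ["git", "fetch", "origin"],
        ["git", "checkout", branch],
        # --ff-only refuses to create a merge commit if the checkout has drifted.
        ["git", "pull", "--ff-only", "origin", branch],
        [deps_script],
        list(restart_cmd),
    ]


def deploy(repo_dir, steps, run=subprocess.run):
    """Run each step inside repo_dir and stop at the first failure.

    Returns the failing command, or None if every step succeeded.
    """
    for step in steps:
        result = run(step, cwd=repo_dir)
        if result.returncode != 0:
            return step
    return None
```

Stopping at the first failure matters here: if the dependency update fails, the app server is not restarted against a half-updated environment.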
Closing Thoughts
This post started off as a writeup of deployment approaches for different types of common application architectures in the Amazon Web Services environment. As I started to write it, it became clear that I was starting in the middle with that approach, so I decided to start at the beginning with an overview of the challenges. The next post in this series will be about common deployment scenarios for LAMP-based applications and will get into depth on a few of the more challenging details we have encountered. There is another post I started a while ago and will finish soon regarding the trials and tribulations of using Tokyo Tyrant in a high-volume production environment. It's a good data store for some things as long as you use it correctly.
Have fun and please contact me if you have questions on how to get your application going in the Amazon environment to meet your business needs.
-Jerry Champlin, CEO, Absolute Performance












