I recently experienced a flood of traffic on a Facebook app I created (mostly for the sake of education, not with any intention of marketing)
Needless to say, I did not think about scalability when I created the app. I’m now in a position where my meager virtual server hosted by MediaTemple isn’t cutting it at all, and it’s really coming down to raw I/O of the machine. Since this project has been so educating to me so far, I figured I’d take this as an opportunity to understand the Amazon EC2 platform.
The app itself is created in PHP (using Zend Framework) with a MySQL backend. I use application caching wherever possible with memcached. I’ve spent the weekend playing around with EC2, spinning up instances, installing the packages I want, and mounting an EBS volume to an instance.
But what’s the next logical step that is going to yield good results for scalability? Do I fire up an AMI instance for the MySQL and one for the Apache service? Or do I just replicate the instances out as many times as I need them and then do some sort of load balancing on the front end? Ideally, I’d like to have a centralized database because I do aggregate statistics across all database rows, however, this is not a hard requirement (there are probably some application specific solutions I could come up with to work around this)
I know this is probably not a straight forward answer, so opinions and suggestions are welcome.
So many questions – all of them good though.
In terms of scaling, you’ve a few options.
The first is to start with a single box. You can scale upwards – with a more powerful box. EC2 have various sized instances. This involves a server migration each time you want a bigger box.
Easier is to add servers. You can start with a single instance for Apache & MySQL. Then when traffic increases, create a separate instance for MySQL and point your application to this new instance. This creates a nice layer between application and database. It sounds like this is a good starting point based on your traffic.
Next you’ll probably need more application power (web servers) or more database power (MySQL cluster etc.). You can have your DNS records pointing to a couple of front boxes running some load balancing software (try Pound). These load balancing servers distribute requests to your webservers. EC2 has Elastic Load Balancing which is an alternative to managing this yourself, and is probably easier – I haven’t used it personally.
Something else to be aware of – EC2 has no persistent storage. You have to manage persistent data yourself using the Elastic Block Store. This guide is an excellent tutorial on how to do this, with automated backups.
I recommend that you purchase some reserved instances if you decide EC2 is the way forward. You’ll save yourself about 50% over 3 years!
Finally, you may be interested in services like RightScale which offer management services at a cost. There are other providers available.
First step is to separate concerns. I’d split off with a separate MySQL server and possibly a dedicated memcached box, depending on how high your load is there. Then I’d monitor memory and CPU usage on each box and see where you can optimize where possible. This can be done with spinning off new Media Temple boxes. I’d also suggest Slicehost for a cheaper, more developer-friendly alternative.
Some more low-budget PHP deployment optimizations:
- Using a more efficient web server like nginx to handle static file serving and then reverse proxy app requests to a separate Apache instance
- Implement PHP with FastCGI on top of nginx using something like PHP-FPM, getting rid of Apache entirely. This may be a great alternative if your Apache needs don’t extend far beyond
mod_rewriteand simpler Apache modules.
If you prefer a more high-level, do-it-yourself approach, you may want to check out Scalr (code at Google Code). It’s worth watching the video on their web site. It facilities a scalable hosting environment using Amazon EC2. The technology is open source, so you can download it and implement it yourself on your own management server. (Your Media Temple box, perhaps?) Scalr has pre-built AMIs (EC2 appliances) available for some common use cases.
- web: Utilizes nginx and its many capabilities: software load balancing, static file serving, etc. You’d probably only have one of these, and it would probably implement some sort of connection to Amazon’s EBS, or persistent storage solution, as mentioned by dcaunt.
- app: An application server with Apache and PHP. You’d probably have many of these, and they’d get created automatically if more load needed to be handled. This type of server would hold copies of your ZF app.
- db: A database server with MySQL. Again, you’d probably have many of these, and more slave instances would get created automatically if more load needed to be handled.
- memcached: A dedicated memcached server you can use to have centralized caching, session management, et cetera across all your app instances.
The Scalr option will probably take some more configuration changes, but if you feel your scaling needs accelerating quickly it may be worth the time and effort.