Now I’ll tackle the last 6 factors:
7) Port binding
Again, throwing Java and PHP under the bus, they tie back to “factor” number six, where everything is stateless. They also ban PID files, daemon processes, etc… What they’re proposing is so far outside anything I’ve done, or seen anyone do, in a real-world app that I don’t even know how to respond.
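Here is roughly what the port-binding factor asks for. The app embeds its own HTTP server and binds to a port taken from the environment, instead of being deployed into Apache or a J2EE container. This is a minimal sketch using only Python's standard library `http.server`. The `PORT` variable name, the 5000 fallback, and the handler are illustrative assumptions, not anything the 12-factor text mandates.

```python
import os
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical fallback used when no PORT is supplied by the environment.
DEFAULT_PORT = 5000


class HelloHandler(BaseHTTPRequestHandler):
    """Tiny illustrative handler: answers every GET with a plain-text body."""

    def do_GET(self):
        body = b"hello\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def resolve_port(env=None):
    """Read the listening port from the environment (PORT is an assumed name)."""
    env = os.environ if env is None else env
    return int(env.get("PORT", DEFAULT_PORT))


def make_server(port, host="127.0.0.1"):
    """Bind the app's own HTTP server; no external web server or container."""
    return HTTPServer((host, port), HelloHandler)
```

A process would then start with `make_server(resolve_port()).serve_forever()`. That is what "vending your own network server" amounts to in practice: every process owns its listening socket.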
8) Concurrency
I’m not sure how spinning up large numbers of processes on the same hardware works with the idea of having to vend all your own network servers. How do you handle port binding, port conflict, and port discovery across a cluster?
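The port-conflict half of that question is easy to demonstrate. The sketch below uses Python's standard `socket` module, and the helper names are hypothetical. A second process asking for a port that is already listening on the same host fails with `OSError` (EADDRINUSE on Linux).

Binding to port 0 lets the kernel hand out a free port instead. Each process's actual port then has to be published somewhere before anything can find it. That is the discovery problem raised above, and this sketch does not solve it.

```python
import socket


def bind_listener(port, host="127.0.0.1"):
    """Open a listening TCP socket; port=0 asks the kernel for any free port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, port))
        sock.listen()
    except OSError:
        sock.close()
        raise
    return sock


def port_is_taken(port, host="127.0.0.1"):
    """Return True if another socket on this host already holds the port."""
    try:
        probe = bind_listener(port, host)
    except OSError:
        return True
    probe.close()
    return False
```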
9) Disposability
This ties back to “factors” six and eight: stateless, standalone processes which can spawn and die quickly without any impact on the overall system. This forces you to avoid caching, anything complex, and so on. I’d much rather have big, powerful instances with caches, stateful data, a mature container, solid network systems, and all that.
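The 12-factor text describes disposability largely in terms of shutdown. A process should stop gracefully when it receives SIGTERM, and a worker should hand unfinished jobs back to its queue. Below is a minimal sketch of that contract, assuming a hypothetical `GracefulWorker` that processes an iterable of jobs. How unfinished jobs actually get returned to a queue is left out.

```python
import signal
import threading


class GracefulWorker:
    """Hypothetical worker that stops taking new jobs once SIGTERM arrives."""

    def __init__(self):
        self.stopping = threading.Event()
        self.processed = []

    def request_stop(self, signum=None, frame=None):
        # Matches the (signum, frame) handler signature; only records intent.
        self.stopping.set()

    def install(self):
        # signal.signal() may only be called from the main thread.
        signal.signal(signal.SIGTERM, self.request_stop)

    def run(self, jobs):
        for job in jobs:
            if self.stopping.is_set():
                break  # remaining jobs are left unprocessed for someone else
            self.processed.append(job)
        return self.processed
```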
Building a system around stateless job queues, on the assumption that instances can and will die often enough to warrant it, seems like an odd place to spend your effort. Big JBoss instances are VERY stable. They run and they run and they run for months at a time without issue. There’s no need to build complex queues and job management systems that rely heavily on your external systems (your database, message queues, Redis, etc…) to handle all your state because you’re scared to do it yourself. I’m not sure why relying on MySQL or RabbitMQ for long-running state management is any better than just having your app do it. Less parsing, less network overhead, and less re-work.
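The trade-off being weighed here can be shown with an in-process cache. In the sketch below, `functools.lru_cache` memoizes a stand-in for an expensive lookup; the `product_details` function and its return value are hypothetical.

In a long-running instance, the second call costs nothing. `cache_clear()` stands in for a process restart, after which the work has to be redone. That is the re-work a frequently recycled process keeps paying for.

```python
from functools import lru_cache

# Counts how many times the expensive path actually executes.
CALLS = {"count": 0}


@lru_cache(maxsize=1024)
def product_details(sku):
    """Stand-in for a DB query plus parsing; the result is cached per process."""
    CALLS["count"] += 1
    # Return an immutable tuple so callers can't mutate the cached value.
    return (sku, f"Product {sku}")
```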
10) Dev/prod parity
This “factor” really covers two separate points. First, it pushes frequent production deployments, every few hours, made by the code developers (not an ops team). Second, it requires that developer instances use the same backing services, operating system, etc… as production.
The first point is problematic to me on a few fronts. The frequency of releases implies no real QA process, beyond perhaps some automated unit tests. While waterfall-esque QA isn’t a cure-all, it DOES generally reduce the number of issues that make it into production. I understand the agile approach happily makes this trade-off in exchange for being able to get fixes deployed quickly. However, for large players, production bugs, downtime, and impact are VERY costly. Many large eCommerce sites will lose hundreds of thousands of dollars per hour of downtime (downtime being defined here as anything that prevents users from easily placing orders – not just a site availability outage). Many sites are used for other critical systems or hold large amounts of private information. Minor code bugs can have catastrophic impacts if they are pushed to production without testing. Developers are awful testers. I say this as a developer. We think about changes in terms of the intended change, and may test that successfully. But we won’t run the entire site regression suite, and hence we miss unintended impacts of our code change. We also tend not to have the meticulous attention to detail that a good QA person does. A developer unit testing a change is VERY different from a good QA person testing the same change or running a full regression test suite.
It’s also not PCI compliant to have code developers pushing code changes into production (the idea is that it’s too easy for a malicious developer to push a malicious change out to prod without anyone knowing).
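The control in question is usually implemented as separation of duties: the person who wrote a change is not the person who approves or performs its release to production. Below is a minimal, hypothetical sketch of such a gate. The `Release` fields and the `may_deploy` rule are illustrative assumptions, not wording from the PCI DSS standard.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Release:
    """Hypothetical record of a change headed for production."""
    commit_author: str
    approved_by: Optional[str] = None


def may_deploy(release: Release) -> bool:
    """Allow a production push only if someone other than the author approved it."""
    return release.approved_by is not None and release.approved_by != release.commit_author
```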
The second point I agree with, to a degree, but not entirely. I run RHEL 5 in production, but am not going to force any developers to run that as their laptop OS, as it’s a pretty terrible desktop system. OS X is my preferred OS and works pretty well for building things for deploying on Linux systems, including RHEL. I can agree with wanting things to be as similar as is easily and legally possible. If you can run the same DB software locally as is used in production, great! But you probably can’t use full production data locally (whether due to size, privacy concerns, or other security factors), so there will always be differences. Some external systems you can tie into the same as production; others you’ll need to tie into testing/staging versions, which differ from their production counterparts more often than you might expect (PayPal is a great example of this). Others may not be accessible at all, and will need to be ignored or stubbed out for local work.
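The “stub it out for local work” case usually comes down to choosing a backing-service implementation from configuration. Below is a minimal sketch, assuming a hypothetical `PAYMENT_BACKEND` environment variable and a `PaymentGateway` interface defined here for illustration. A real sandbox or live client for a provider such as PayPal would be registered alongside the stub and is deliberately not shown.

```python
import os
from typing import Dict, List, Mapping, Optional, Protocol, Tuple


class PaymentGateway(Protocol):
    """Hypothetical interface every payment backend implements."""

    def charge(self, order_id: str, cents: int) -> str: ...


class StubGateway:
    """Local-only stand-in: records charges and approves them, no network calls."""

    def __init__(self) -> None:
        self.charges: List[Tuple[str, int]] = []

    def charge(self, order_id: str, cents: int) -> str:
        self.charges.append((order_id, cents))
        return f"stub-{order_id}"


def gateway_from_env(env: Optional[Mapping[str, str]] = None,
                     registry: Optional[Dict[str, type]] = None) -> PaymentGateway:
    """Pick a backend by name; no silent default, so prod can't fall back to the stub."""
    env = os.environ if env is None else env
    registry = {"stub": StubGateway} if registry is None else registry
    name = env.get("PAYMENT_BACKEND")
    if name not in registry:
        raise ValueError(f"unknown or missing PAYMENT_BACKEND: {name!r}")
    return registry[name]()
```

Refusing to guess when the variable is missing is a deliberate choice: a misconfigured production box fails loudly instead of quietly approving fake charges.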
11) Logs
They say that all apps should just write to stdout in an unbuffered stream for any and all logging. Can you imagine if all the apps you install did that? If the Redis server just dumped to stdout instead of writing its own log file? Ugh. What a mess. There are also often very good reasons that different sets of log output should be handled, directed, and stored differently. I don’t want to put the onus on everyone who runs the app (developers or otherwise) to properly configure something like syslog-ng and ensure the app’s output is directed there, etc… Way overly complex. Also, unbuffered writes to stdout carry a massive CPU (and occasionally IO) performance penalty. High-volume logs, especially in production, under heavy load, or when things are going wrong, can generate a TON of lines per second. Buffering these in memory in the app and writing them in batches to a file is much more efficient than dumping them unbuffered to stdout on the assumption that stdout is being re-routed through another system entirely (syslog-ng, maybe a remote Splunk server, maybe other things), parsed, and then written out to a file or DB somewhere else.
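Python's standard library supports the buffered-to-file approach directly. `logging.handlers.MemoryHandler` buffers records in memory and hands them to a target handler (here a `FileHandler`) in batches. It flushes either when the buffer fills or as soon as a record at or above `flushLevel` arrives.

Below is a minimal sketch; the logger names, file paths, and capacity are assumptions. Giving each concern its own logger and file is one way to keep different sets of log output stored separately.

```python
import logging
import logging.handlers


def build_buffered_logger(name, path, capacity=500):
    """Logger that batches records in memory and writes them to its own file."""
    target = logging.FileHandler(path, encoding="utf-8")
    target.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(name)s %(message)s"))
    # Flush to disk when `capacity` records are buffered, or immediately on ERROR+.
    buffered = logging.handlers.MemoryHandler(
        capacity, flushLevel=logging.ERROR, target=target
    )
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.addHandler(buffered)
    logger.propagate = False  # keep these records out of the root logger / stdout
    return logger, buffered
```

For example, you might call `build_buffered_logger("app.access", "/var/log/myapp/access.log")` and `build_buffered_logger("app.audit", "/var/log/myapp/audit.log")`; both paths are hypothetical. Buffered records are written out when the handler is flushed or closed, for instance by `logging.shutdown()` at interpreter exit. The flip side is that records still sitting in the buffer are lost if the process dies abruptly.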
12) Admin processes
Nothing really to say here.
Despite the initial claim that these rules apply to any programming language, I think it’s pretty clearly not the case. These rules are a terrible fit for PHP, Java (J2EE, Seam, ATG), and probably many other languages and frameworks. Maybe they work great with RoR; I don’t know.
Some of the points here are sound. Others I think are misguided even if you’re not using a language or framework that pushes you away from the given rule by its very nature. Some of them fall down on things like PCI compliance, performance, or other real world aspects of running a high performance, high availability, complex production web app.
Not a bad read, but certainly not even close to guidelines I’d recommend anyone take to heart.