The Twelve Factor App Review – part 1

There is a site I’ve seen around lately called The Twelve Factor App which lists 12 rules for building a web application.

The goals are noble: to build scalable, portable, easily deployable web applications, while being language and framework agnostic. Unfortunately I found several of the points to run against my experience and personal opinion, so I wanted to go over the 12 tenets they recommend and offer my thoughts on each.  Please keep in mind that much of my experience is on ATG, J2EE, and large complex applications.

1) Codebase

Overall this is pretty normal, but in some cases I don’t completely agree with the idea that shared code can only be handled through independently built-packaged libraries included via some sort of dependency manager.  I haven’t really used a good J2EE level dependency manager, perhaps Maven makes this easy?

2) Dependencies

I have several issues with this “factor”.  Firstly, Java doesn’t have a packaging system for distributing libraries and common app bundles.  Next, I’m not sure what a “dependency declaration manifest” actually DOES at the app level or what “dependency isolation tools” I should be using.  ATG has module dependencies defined within the MANIFEST.MF files which is great, but isn’t something you can use to handle extra-ear dependencies (JBoss, JDBC drivers, native apps, etc…).

Further they expect the app to provide/package ALL system tools it may need.  The examples they use are ImageMagick and curl.  This is crazy for many reasons.  First, many of these tools are different on each platform.  Packaging/building/installing them on each different platform is a massive effort and not something easily bundled into your app, nor should your Java/Ruby/PHP developers have to deal with multi-platform C++ build issues.  Secondly, most platforms have their own package installation and dependency management system (yum, apt-get, etc…) which ensures supported, platform-specific versions that not only will work, but may also be required to stay within vendor support on Enterprise Linux distros, for instance.

Also: where does it end?  It’s turtles all the way down.  ImageMagick on RHEL 5, installed via  yum, has 148 dependencies.  Curl has 310.  A J2EE application, in EAR form, may depend on: JBoss 5.1 EAP, JDK 1.6.0_27 (the correct version for the platform and 32 or 64 bit), JDK extended crypto policy files, Oracle JDBC driver jar file, various JBoss server config overrides, curl, NFS mounts, ImageMagick, mplayer, Apache front end configured with SSL certs, proxy configs, etc…  It’s insane to say all that needs to be handled within your EAR deployment mechanism.

3) Config

This starts off making sense: keep your configs separate from your code.  Don’t hardcode configs into your source code.  But then they say you should use environment variables to handle all of your configs….  They also say you shouldn’t “group” config values based on the environment (dev, stage, prod, etc…).  Both ATG and JBoss Seam have very sensible configuration setups with config files, environment level groupings for environment specific overrides, etc…  It works great.  Much better than trying to deal with env variables…
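
To make the contrast concrete, here is a rough Java sketch of the two styles.  It is not how ATG or Seam actually implement their config layers; the class name, property keys, and JDBC URLs are all made up for illustration.  The first method merges shared defaults with one environment's overrides (the grouped style I prefer), the second reads a single value from the environment (the style the site recommends).

```java
import java.util.Map;
import java.util.Properties;

// Hypothetical sketch: names and keys below are illustrative assumptions only.
final class ConfigSketch {

    /** Grouped style: start from shared defaults, then apply one environment's overrides. */
    static Properties layered(Properties defaults, Properties overrides) {
        Properties merged = new Properties();
        merged.putAll(defaults);
        merged.putAll(overrides); // later entries win, so overrides replace defaults
        return merged;
    }

    /** Twelve-factor style: each value is its own environment variable. */
    static String fromEnv(Map<String, String> env, String name, String fallback) {
        String value = env.get(name);
        return (value == null || value.isEmpty()) ? fallback : value;
    }

    public static void main(String[] args) {
        Properties defaults = new Properties();
        defaults.setProperty("jdbc.url", "jdbc:oracle:thin:@localhost:1521:dev");
        defaults.setProperty("session.timeoutMinutes", "30");

        // Only the values that differ in production need to be listed here.
        Properties prod = new Properties();
        prod.setProperty("jdbc.url", "jdbc:oracle:thin:@db.example.com:1521:prod");

        Properties effective = layered(defaults, prod);
        System.out.println(effective.getProperty("jdbc.url"));
        System.out.println(effective.getProperty("session.timeoutMinutes"));

        // The same lookup done through the process environment instead.
        System.out.println(fromEnv(System.getenv(), "JDBC_URL", "(not set)"));
    }
}
```

With the grouped style the production file only has to carry the one value that changes, while with env variables every value has to be set individually on every machine.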

4) Backing Services

No argument here.  Nothing that shouldn’t already be common sense either.

5) Build, release, run

No argument here.  Nothing that shouldn’t already be common sense either.

6) Processes

Heading into the deep end again:)  They push for stateless, share-nothing processes that rely on a database (or similar service) for all persistent or shared data/state.  That’s okay for very simple or very low traffic applications, but in my experience more complex and/or higher traffic web apps benefit hugely from having sticky sessions with local RAM based session state.
A user’s session on a serious eCommerce application contains a lot of data with somewhat complex relationships: basic user profile data (username, name, gender), extended user profile data (shopping preferences, shipping address(es), shopping history), current cart data (which could be as simple as a list of SKUs, or could be full copies of SKU data to avoid catalog data changes from impacting cart items, with cloned pricing to avoid pricing changes impacting the cart, coupons, promotions, shipping calculations, shipping methods, tax lookup data, and more), session history data (as simple as breadcrumbs or as complex as browsing history data being used to calculate recommendations, promotions, cross-sell, and up-sell), and checkout flow state and related data.

They seem to recommend that at the end of every page request, this complex relationship of Objects (Java components or beans or whatever you use) should be serialized or mapped back to database backing tables (manually or via an ORM), persisted in a series of inserts and updates (the most expensive type of operations) to a remote database across the network, and those objects should be cleaned up/purged/garbage collected.  Then at the start of every page request, a cookie identifier should be used to identify all of that data again, probably with multiple separate sequential lookups, select it out of the database, parse it back into new Objects you’ve created again, and bound together based on their relationships, before you can service the request.

This is CPU, memory, and network intensive.  You’d be creating massive additional load on your database and dramatically increasing your app server CPU utilization and increasing the request response time.  It’s MUCH easier to build session state data once, keep it around for the life of the session, and just use intelligent sticky session routing on your load balancer/proxy.
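
Here is a rough Java sketch of the difference, under heavy assumptions: the Session class is a toy with nothing but a list of SKUs, and a plain in-memory map stands in for the remote database, so it only counts store round trips and says nothing about real latency or load.  All names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: a map stands in for the remote database.
final class SessionSketch {

    /** Toy session; a real one holds profile, cart, pricing, history, and more. */
    static final class Session implements Serializable {
        private static final long serialVersionUID = 1L;
        final List<String> cartSkus = new ArrayList<>();
    }

    static final Map<String, byte[]> remoteStore = new HashMap<>();    // pretend database
    static final Map<String, Session> localSessions = new HashMap<>(); // app server RAM
    static int storeRoundTrips = 0;

    static Session load(String id) {
        storeRoundTrips++;
        byte[] bytes = remoteStore.get(id);
        if (bytes == null) {
            return new Session();
        }
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return (Session) in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("could not rebuild session " + id, e);
        }
    }

    static void save(String id, Session session) {
        storeRoundTrips++;
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(session);
        } catch (IOException e) {
            throw new IllegalStateException("could not persist session " + id, e);
        }
        remoteStore.put(id, buffer.toByteArray());
    }

    /** Stateless style: rebuild the session, use it, persist it, then let it be collected. */
    static int addToCartStateless(String id, String sku) {
        Session session = load(id);
        session.cartSkus.add(sku);
        save(id, session);
        return session.cartSkus.size();
    }

    /** Sticky style: the session object stays in local memory between requests. */
    static int addToCartSticky(String id, String sku) {
        Session session = localSessions.computeIfAbsent(id, key -> new Session());
        session.cartSkus.add(sku);
        return session.cartSkus.size();
    }

    public static void main(String[] args) {
        addToCartStateless("cookie-123", "SKU-1");
        addToCartStateless("cookie-123", "SKU-2");
        System.out.println("store round trips after stateless requests: " + storeRoundTrips);
        addToCartSticky("cookie-123", "SKU-1");
        addToCartSticky("cookie-123", "SKU-2");
        System.out.println("store round trips after sticky requests: " + storeRoundTrips);
    }
}
```

Every stateless request pays for one load and one save, plus the serialization work on both sides, while the sticky version never touches the store at all.  The sticky version does depend on the load balancer sending the user back to the same server.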

It’s late, so I’ll address the last 6 tenets in another post!  Thanks for reading!

11 thoughts on “The Twelve Factor App Review – part 1”

  1. We are currently using Maven to build and package a fairly big ATG application (more than 1500 Java source files). It works well now, but it took at least a couple of months for a very experienced engineer to get it to a state where it could reliably produce dependencies in ATG’s MANIFEST.MF files from Maven’s pom.xml files: all of this was done by creating a new Maven plugin and a new Maven archetype. Also, if you start using Maven, put a repository manager (such as Artifactory) in front of it to make your builds less dependent on Internet connectivity and repository servers all over the world.

    I agree with your observation about the app needing to embark all the dependencies, this just doesn’t apply to our world where we don’t control the deployment environment. However, I’ve recently started to package JBoss and ATG as RPM packages and being able to declare dependencies in the spec file helps a lot in achieving a reliable and repeatable installation.

    • Good to know about Maven. I’ve just never worked on any projects where people used it, so I don’t have any real hands-on experience with it. Do you find it’s easier to have Maven deal with the MANIFEST.MF files and other dependencies, versus just editing those files by hand when needed in SVN?

      I agree that if you’re able to manage an RPM repo, and your deployment environments are standardized around a specific Linux distro, then that can help immensely with setting up new environments. You can also use tools like puppet and chef in that arena. I just think that’s outside the core application developer’s area of responsibility.

      • I find dependency management to be Maven’s most useful feature. If I need to add a new dependency to a project, I can simply add the entry to the pom.xml and it is immediately available to everyone working on the project and to the continuous integration server, without having to add binaries to the source repo. Like Sebastiano, we have a simple custom Maven plugin that reads the dependencies from the pom and updates the ATG-Class-Path manifest entry accordingly.

      • From a developer point of view, editing Maven’s pom.xml files is not much different from editing MANIFEST.MF files. Thankfully adding dependencies is still something that occurs in the vast majority of cases in the first months of a project and less and less as the project matures.

        • I guess my question, not having any hands-on Maven experience, is what does Maven do better than just a “normal” ANT/Jenkins/ATG Manifest file based build system that’s worth the time and effort it seems to have taken to get working smoothly?

          Also would either of you be able to release the Maven customizations or would you be interested in writing up a guest blog post or something covering how you use Maven for ATG apps, etc…? I know I’d love to learn more about it, and I’m sure many others would too!

          • Sorry for the belated reply.

            I have appreciated the convention that Maven imposes on a project by forcing the layout of the sources (src/main/java, etc.) and of the generated classes (target/classes, target/test-classes). I have also appreciated being able to declare dependencies on jar files or on other ATG modules and having Maven generate Eclipse, IntelliJ, and ATG manifest files and also call runAssembler with all the correct modules. The generated EAR can then be uploaded as a Maven artifact to a repository where other tools can make use of it.

            In my opinion the result was worth the effort, and we’re looking forward to extending this way of working to our future ATG projects. The only objection I have is that there’s far too much XML to write, and this is why I’m inclined to try something like buildr, which can interact with Maven repositories and seems less inflexible in enforcing its convention (though picking up buildr and Ruby at the same time is not an easy task).

            I don’t think unfortunately that we will be able to release the Maven customizations, but I’ll talk to the engineer who worked on them and I’ll see if he is available to write up something.

            • Thanks for the info! Would love to see some info or code released so we don’t have to go through the same learning curve from ground zero, if possible.

          • Not a ton of Maven experience here, but I’d say that the biggest advantage is that Maven in use feels very apt-flavored, in that “it just works” sort of way.

            Of course, when it *doesn’t* “just work” it’s a pain and a half to sort out. This is compounded by the fact that maven is its own worst enemy, in that it enables/encourages you to get a lot done with fairly shallow knowledge. Therefore, when you get stuck, it’s hard to find deep maven expertise to get you unstuck.

            In more practical terms, Maven seems to have two major things going for it. The first is a hate-it-or-love-it thing: the archetypes which dictate the project layout and components, much as the servlet spec dictates the webapp layout, except that this is the layout for the entire development project.

            Let’s face it, the vast majority of the time most of these details are quibbles. Like logging or code conventions, the benefit of picking and sticking to a common approach usually vastly outweighs minor disagreements you may have with specific details of the chosen approach. However, as above, it “just works” great… until it doesn’t. Still, I’d give it at least an 8 out of 10 here.

            Second is the dependency management. Again, it feels very apt-like in that a lot of work is done for you automatically. It also reminds me a fair bit of ruby gems (though I’m much less experienced with ruby). It’s also somewhat ant/jar like, in that (unlike with apt or ruby gems) maven also helps with building and managing the packages for your own internal code.

            Let’s say you have a project that in turn relies on twenty other projects, half of which are jakarta projects and the like, and half are other internal projects.

            You add a dependency to a project by manually editing the pom.xml. This isn’t very apt-like, but it’s usually not that hard, although it can get finicky at times. My suspicion is that most of that finickiness is due to the “shallow” problem I mentioned above, and that the pom.xml really shouldn’t require too much work to get a clear, thorough understanding and avoid these issues in the future.

            Then you kick off a maven build. Maven checks your local repo (typically located in ~/.m2/repository) for cached packages, then it goes to the project’s home repo to fetch it. You might also have an organizational-level repo that your maven client will check.

            For the internal projects, your pom.xml has dependencies on these internal Maven packages, so it fetches those if you have an organizational Maven server, or builds them if you have the code checked out and in the right place.

  2. 1. Version control: Duh.
    2. Dependencies: There are tons of dependency management tools these days: Maven, Ivy, Gems, apt, Gradle, etc. This should be a non-issue. If your project is big enough, it should *easily* pay for itself. That does not mean you won’t have issues. It should save you more time than it costs over the long haul. In a Maven project, do a ‘mvn dependency:tree’ to get an idea. Maven also allows people to use whatever mainstream IDE they like via tools to import to IDEA or Eclipse or Netbeans config. Several companies I have worked for keep an internal Maven repo to share Java code across teams, team members, and projects. Makes sense if you need it.
    3. Config: I generally agree although having tons of environment variables seems ridiculous. I’ve had one that pointed to the configuration’s property files with modes, JNDI urls, SOAP urls, etc in the file.
    4. Backend services: No one wants a monolithic bunch of junk to deal with. Decouple presentation , business code, orm stuff, long running stuff, queued logic…. whatever.
    5. Separate Build and Run: Even in Perl, you can run a syntax check for the ‘compile’ part. Then make it run some sanity checks… I’ve done this with simple Makefiles. This seems like a no brainer.
    6. Processes: Add something to your basket on Amazon’s site and see it in your basket on their mobile app. Stateless => scalable. In the long run you save to storage not memory for very large (wide) scalability.
    7. Port Binding: Seems weird to me. You could bundle Tomcat or Jetty, but that is against most folks’ recommendations. It *is* the common platform rather than coding to many OS’s. Grails for example serves up HTTP but it’s only intended for development use. If you are using JNDI DataSources why would you do that? Deploying a WAR is superior in that example. This seems like saying use no middleware. Well to me, that’s silly. I don’t want to write all that. Maybe that’s just me though. Or I missed something. There is a point though – don’t let that stuff be a bottleneck.
    8. Concurrency: There’s goodness here but it doesn’t come through well in their description. It’s really about modularizing component parts of an application and being able to add (duplicate running) components to the layers easily without changing code, just a couple of config files perhaps. Think scaling a layer horizontally. The problem is that all the stuff like adding middleware or home brew (Queues and Remote Invocations) adds complexity and cost that may not be needed day 1. Should business logic call persist methods over the wire on day 1 when you have 10 users? Stuff like Spring Integration, for example, can help take you from an in-app decoupling to a multi-process / multi-part system.
    9. Disposability: This goes along with Stateless. A little overstated though. Don’t let an interruption destroy your data.
    10. Parity: The stuff stated is way oversimplified. There is goodness here too…but again way too simple. Some pragmatism should be had. Get as much good as you can. Data can be an issue too. Think performance tuning on SQL queries for large data sets – a 40 million row update in a table with tens of billions of rows. There is value, but not every desktop can hold the data set! Maybe you shouldn’t use SQL in that case…but that’s another thought.
    11. Logs: Seems cool to me, Unix style.
    12. Admin: This is weird. Why not build some of that into the app/system? Really make it nice. Protect it with roles or something. Amazon Web Services seems like it was in-house and super awesome, then made commercially available. Make your app data-driven from config in the database. Make your application have admin screens to manipulate that data. Lots of options, not just running scripts of sorts.

    All of these points have a place. Just like anything – When in Rome. Don’t code a web app for 20 million users like you were coding a departmental app in VB using Access. Use the Zen of the platform you are targeting. This is their Zen, but not the only Zen out there.

  3. Thanks heaps for posting this, I think discussion of 12 factor is much needed. I actually wish you’d written one post per factor, so we could go to town on each one in the comments. There are some that I don’t fully understand. Forgive me if the following is obvious.

    I don’t think a “dependency declaration manifest” does anything at the app level. It’s just the text file that declares your dependencies, am I right? So it’s only used when building/deploying, not at runtime.

    As for “dependency isolation tools”, in the Python world that means “virtualenv”, which lets you install one version of a dependency for your app, while also having different versions installed on the same machine for other apps.

  4. I think the 12-factor methodology is best understood as the set of constraints that Heroku decided to adopt in order to provide a scalable, polyglot platform.

    1. Codebase: Heroku uses a Git repository per app. But I think the rule of a repository per app, with shared code in libraries, is good anyway, just for the sake of avoiding tangled messes.

    2. Dependencies: See this blog post for the motivation: https://blog.heroku.com/archives/2011/6/28/the_new_heroku_4_erosion_resistance_explicit_contracts

    Basically, Heroku doesn’t want apps to have implicit assumptions about their host environments. More to the point, Heroku wants to be able to update the platform without inadvertently breaking apps. This is why this factor includes the rule about bundling third-party executables. I’m not sure what I think of this; bundling one’s own copies of tools like curl, rather than using the OS package manager, seems like a nightmare for ongoing maintenance, security updates, and the like.

    3. Config: Heroku needed a language-neutral way for add-ons, like their own Postgres service, to expose their configurations to the app. The Unix process environment kind of sucks for this, because it’s a flat key-value store where all of the values are strings. But I suppose it’s the best thing that met Heroku’s goals.

    6. Processes: I’m not convinced that RAM-based sessions are a good thing. What happens if you do a rolling restart of all the application processes while the user is using the app? Maybe this isn’t much of a concern for the enterprise apps that you work with, but for the web apps I work on, I want to be able to continuously deploy new stuff any time, day or night, while people are using the app. Stateless processes are the only way to do that, as far as I can tell.
