AWS assume role with Fog

Amazon Web Services (AWS) provides users with a wealth of services and a suite of ways in which to keep those services secure. Unfortunately, software libraries are not always able to keep up with all the new features available. While wiring up our Rails application to deposit files into an AWS S3 bucket, we ran into just such a problem. AWS has developed an extensive role-based security structure, and as part of their Ruby AWS SDK they allow a user, or role, to assume another role which may have completely different access.

Our troubles came from the fact that the interface provided by the fog-aws gem does not currently offer any way to tell it to assume a different role. It does provide a way to use a role your server may already have been assigned, and a nice mechanism to re-negotiate your credentials when they are about to expire. So what do we do when something doesn’t quite work the way we want? Monkey patch!

Below is a link to a Gist showing how we were able to leverage the things fog-aws already did the way we wanted, and override the one thing we needed done differently.

https://gist.github.com/peterwells/39a5c31d934fa8eb0f2c
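For flavor, here is a condensed sketch of the underlying idea, minus the refresh logic the Gist handles: ask STS for temporary credentials yourself (here via the aws-sdk gem) and hand them to fog. The role ARN, session name, and bucket are placeholders.

require 'aws-sdk'
require 'fog/aws'

# Ask STS for temporary credentials for the role we want to assume.
# The ARN and session name below are placeholders.
sts     = Aws::STS::Client.new(region: 'us-east-1')
assumed = sts.assume_role(
  role_arn:          'arn:aws:iam::123456789012:role/s3-depositor',
  role_session_name: 'fog-s3-upload'
)
creds = assumed.credentials

# Hand the short-lived credentials to fog. The session token is the
# critical piece -- S3 rejects STS-issued keys without it.
storage = Fog::Storage.new(
  provider:              'AWS',
  aws_access_key_id:     creds.access_key_id,
  aws_secret_access_key: creds.secret_access_key,
  aws_session_token:     creds.session_token
)

storage.directories.get('our-bucket').files.create(key: 'report.pdf', body: File.read('report.pdf'))

The catch is that credentials issued this way expire (after an hour by default), which is exactly why the real patch plugs into fog’s re-negotiation machinery rather than doing a one-shot handoff like this.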


Lessons in Rails: Email is slow

Now that my first Rails application is in production, there are many things I wish I had known at the onset. Not the least of these is how slow (to send) and tedious (to implement) email messages are within the framework. First, the creation and maintenance of mailers is just tiresome: each message needs multiple views, not to mention keeping the text up to date. Second, being new to pretty much all things web-based, I initially put all of the email sending inside the main request/response cycle of the controllers. Needless to say, once we began load testing we started to see terrible performance: a lot of timeouts and ugly blank screens coming back. Upon investigation, we determined that the emails were the culprits. Being a functional requirement, and therefore unable to be done away with, what were we to do?

We decided to do the same thing you do with anything that is slow and bothersome: we made it someone else’s problem. To address the maintenance issue we created one mailer to rule them all, and pushed all aspects of all the messages into a database table. This, combined with a nice variable substitution object, allows the end users to maintain the emails their system is sending out. This simple move made everyone on both sides extremely happy (and happy customers make happy bosses).
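A stripped-down sketch of the idea (the table, class, and column names here are illustrative, not our actual schema):

# One mailer to rule them all: message subjects and bodies live in an
# email_templates table (columns: name, subject, body) and may contain
# %{placeholder} markers that get filled in at send time. The lone view
# simply renders @body.
class SystemMailer < ActionMailer::Base
  default from: 'noreply@example.edu'

  def templated(template_name, recipient, vars = {})
    template = EmailTemplate.find_by!(name: template_name)
    @body    = template.body % vars.symbolize_keys   # Ruby's built-in %{} substitution
    mail(to: recipient, subject: template.subject % vars.symbolize_keys)
  end
end

# SystemMailer.templated('welcome', user.email, name: user.name).deliver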

Next, we farmed the sending of the messages out to a background job queue. Since we are using a Postgres database, we decided on the Que gem. The setup for this was super easy, and it can even be configured to work with your testing framework so as not to invalidate that wonderful suite you have going. Implementation was straightforward and we instantly saw improved performance.
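The handoff itself is tiny. A sketch with Que’s classic API (names again illustrative):

# The mailer call moves into a background worker. With Que, a job is a
# class with a run method; enqueueing is a single INSERT into Postgres.
class EmailJob < Que::Job
  def run(template_name, recipient, vars = {})
    SystemMailer.templated(template_name, recipient, vars).deliver
  end
end

# In the controller, instead of sending inline:
EmailJob.enqueue('welcome', @user.email, 'name' => @user.name)

Because the job table lives in the same Postgres database as everything else, the enqueue even joins whatever transaction the controller already has open, which is a nice safety property for free.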

I plan to use this paradigm for every application I have in the future, and I hope this helps another new RoR developer avoid learning this lesson the hard way.

Enthusiasm – it’s contagious

I just interviewed a guy today for a position as a Ruby on Rails developer. This would be his first software development position, having previously worked as a customer service rep for an insurance company. His background includes a degree in film, so not your typical coder back story. He was a confident guy, well-spoken and quite personable. While speaking with him I could feel how excited he was at the prospect of continuing to work with the platform. But it wasn’t just what he would be doing, it was also where he would be doing it. He complimented the campus (something you hear a lot about Notre Dame), was even happy about the color in the trees, and he said he couldn’t wait to help work on things that would influence change. To him, working in a higher education setting, with the ability to impact the students and make a positive change for the world as a whole, was quite invigorating.

I found myself coming out of that interview with a new energy for my position. I think too often we start to take for granted where we are in life, and forget to really appreciate what we are doing and the influence we can actually have in the world. I have a really cool career, I get the opportunity to work with some really great people, and no matter how long the walk in to work is every day, the view is always incredible. So as we look to the future in our careers, always trying to get somewhere better, remember to take a moment and appreciate what made you so excited when you first started down the path you are on.

Why test? Simplicity. Security.

One of the biggest complaints I’ve heard about Test Driven Development is that developers don’t feel like they have the time to create the additional code in the tests. I completely understand the feeling. Let’s face it, as developers we hardly ever hear from our customers, “Take all the time you need.” It always seems like we either need to have the desired functionality finished yesterday, or we have so many projects going at once that work feels more like a juggling act, doing just enough on each one to keep the wolves at bay.

But I would argue that this time crunch is exactly the reason we need to be writing tests, and letting these tests drive our solutions. A recent example demonstrates how testing creates simpler, better defined code you can feel better about.

A month or so back I was tasked with adding a new member class to the Ruby on Rails application I am working on. This required adding fields to forms, adding entries to some underlying tables, and substantial changes to the object we had backing the forms. All pretty routine stuff. Because I am fairly new to Rails, and I wasn’t completely sure what behavior I wanted the application to have, I just started writing the application code, figuring I would go back and cover it with tests once finished. Adding the needed fields and database entries was pretty straightforward, taking very little time. I then spent the better part of a day putting the needed functionality into the form-backing object (namely adding and updating the new member class).

I started by copying similar behavior already in place for a different type of user in the system. Seems like a logical step, right? Should be pretty safe and straightforward. It wasn’t long before I started getting very uneasy about the code I was writing. There were more than a few conditionals, fragments of the same code in multiple places, and, most importantly, I wasn’t sure it was even doing the thing I needed to have done! This is a very scary way to feel. I quickly realized this was not going to work and backed out every change I had made to the object, choosing instead to start over the next day with some fresh eyes and a partner for pair programming.

The next morning we started out fresh, being sure not to write a line of application code without first having a test to cover what was being done. In true pair-programming/TDD fashion we alternated writing test and application code. By the time we were done, we had written fewer lines of application code than I had the day before, had a nice little suite of tests, and, most importantly, we were confident the application now did the thing we needed it to do! So not only did we achieve better code through testing, we did it in less time than I had spent by myself doing it the wrong way.

So remember: you may not have a lot of time to get the work done, but you always have a choice in how you do the work. And in the long run, better written code backed by tests will help you sleep better at night.


A Public Rails App From Scratch in a Single Command

This weekend, I tweeted the public URL of an AWS instance.  Like most instances during this experimentation phase, it was not meant to live forever, so I’ll reproduce it here:

_________________________________________________________________________________________________________

This Rails application was deployed to AWS with a single command. What happens when it runs?

  1. A shell script passes a CloudFormation template containing the instance and security group definitions to Boto.
  2. Boto kicks off the stack creation on AWS.
  3. A CloudInit script in the instance definition bootstraps puppet and git, then downloads repos from GitHub.
  4. Puppet provisions rvm, ruby, and rails.
  5. Finally, CloudInit runs WEBrick as a daemon.

To do: 

  1. Reduce CloudInit script to merely bootstrap Puppet.
  2. Let Puppet Master and Capistrano handle instance and app provisioning, respectively.
  3. Get better feedback on errors that may occur during this process.
  4. Introduce an Elastic Load Balancer and set autoscale to 2.
  5. Get a VPN tunnel to campus and access ND data.
  6. Work toward automatic app redeployments triggered by git pushes.

Onward! 

Brandon

_________________________________________________________________________________________________________

A single command, eh? It looks a bit like this:

./run_cloudformation.py us-east-1 MyTestStack cf_template.json brich tagfile

Alright!  So let’s dig into exactly how it works.  All code can be found here in SVN.  The puppet scripts and app are in my GitHub account, which I’ll link as necessary.

It starts with CloudFormation, as described in my post on that topic.  The following template creates a security group, the instance, and a public IP.  This template is called rails_instance_no_wait.json.  That’s because the Cloud Init tutorial has you create this “wait handle” that prevents the CF console from showing “complete” until the provisioning part is done.  I’m doing so much in this step that I removed the wait handle to prevent a timeout.  As I mention below, this step could/should be much more streamlined, so later we should be able to reintroduce this.
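For reference, the wait-handle pattern from that tutorial looks roughly like this (resource names are mine; the Timeout is in seconds), with a cfn-signal call at the end of the user-data script posting success or failure to the handle’s presigned URL:

    "WaitHandle" : {
      "Type" : "AWS::CloudFormation::WaitConditionHandle"
    },

    "ProvisioningDone" : {
      "Type" : "AWS::CloudFormation::WaitCondition",
      "DependsOn" : "BrandonTestInstance",
      "Properties" : {
        "Handle" : { "Ref" : "WaitHandle" },
        "Timeout" : "1800"
      }
    },

Anyway, here is rails_instance_no_wait.json in full: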

{
  "AWSTemplateFormatVersion" : "2010-09-09",

  "Description" : "Creates security groups, an Amazon Linux instance, and a public IP for that instance.",

  "Resources" : {

    "SGOpenRailsToWorld" : {
      "Type" : "AWS::EC2::SecurityGroup",
      "Properties" : {
        "GroupDescription" : "Rails web server access from SA-VPN",
        "VpcId" : "vpc-1f47507d",
        "SecurityGroupIngress" : [ { 
        	"IpProtocol" : "tcp", 
  		"CidrIp" : "0.0.0.0/0",
		"FromPort" : "3000", 
  		"ToPort" : "3000"
    	} ]
      }
    },

    "BrandonTestInstance" : {
        "Type" : "AWS::EC2::Instance",
        "Properties" : {
            "ImageId" : "ami-83e4bcea",
            "InstanceType" : "t1.micro",
            "KeyName" : "brich_test_key",
            "SecurityGroupIds" : [ 
                { "Ref" : "SGWebTrafficInFromCampus" },
                { "Ref" : "SGSSHInFromSAVPN" },
                { "Ref" : "SGOpenRailsToWorld" }
            ],
            "SubnetId" : "subnet-4a73423e",
            "Tags" : [
              {"Key" : "Name", "Value" : "Brandon Test Instance" },
              {"Key" : "Application", "Value" : { "Ref" : "AWS::StackId"} },
              {"Key" : "Network", "Value" : "Private" }
            ],
            "UserData" : 
            { "Fn::Base64" : { "Fn::Join" : ["",[
            "#!/bin/bash -ex","\n",
            "yum -y update","\n",
            "yum -y install puppet","\n",
            "yum -y install subversion","\n",
            "yum -y install git","\n",
            "git clone https://github.com/catapultsoftworks/puppet-rvm.git /usr/share/puppet/modules/rvm","\n",
            "git clone https://github.com/catapultsoftworks/websvc-puppet.git /tmp/websvc-puppet","\n",
            "puppet apply /tmp/websvc-puppet/rvm.pp","\n",
            "source /usr/local/rvm/scripts/rvm","\n",
            "git clone https://github.com/catapultsoftworks/cap-test.git /home/ec2-user/cap-test","\n",
            "cd /home/ec2-user/cap-test","\n",
            "bundle install","\n",
            "rails s -d","\n"
 ]]}}
       }
    },

    "PublicIPForTestInstance" : {
        "Type" : "AWS::EC2::EIP",
        "Properties" : {
            "InstanceId" : { "Ref" : "BrandonTestInstance" },
            "Domain" : "vpc"
        }
    }

  },

"Outputs" : {
    "BrandonPublicIPAddress" : {
        "Value" : { "Ref" : "PublicIPForTestInstance" }
    },

    "BrandonInstanceId" : {
        "Value" : { "Ref" : "BrandonTestInstance" }
    }
}

}

So we start with a security group.  This opens port 3000 (the default Rails port) to the world.  It could just as easily have been opened to the campus IP range, the ES-VPN, or something else.  You’ll note that I am making reference to an already-existing VPC.  This is one of those governance things: VPCs and subnets are relatively permanent constructs, so we will just have to use IDs for static architecture like that.  Note that I have altered the ID for publication.

Please also notice that I have omitted an SSH security group!  Look ma, no hands!

    "SGOpenRailsToWorld" : {
      "Type" : "AWS::EC2::SecurityGroup",
      "Properties" : {
        "GroupDescription" : "Rails web server access from SA-VPN",
        "VpcId" : "vpc-1f37ab7d",
        "SecurityGroupIngress" : [ { 
        	"IpProtocol" : "tcp", 
  		"CidrIp" : "0.0.0.0/0",
		"FromPort" : "3000", 
  		"ToPort" : "3000"
    	} ]
      }
    },
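If we did want to scope the rule to campus instead, only the CidrIp would change; something like this (the range here is illustrative):

        "SecurityGroupIngress" : [ {
          "IpProtocol" : "tcp",
          "CidrIp" : "129.74.0.0/16",
          "FromPort" : "3000",
          "ToPort" : "3000"
        } ]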

Next up is the instance itself.  Its parameters are pretty straightforward:

  • the ID of the base image we want to use: in this case a 64-bit Amazon Linux box.
  • the image sizing: t1.micro, one of the least powerful (and therefore cheapest!) instance types
  • the subnet which will house the instance (again obscured).
  • the instance key (previously generated and stored on my machine as a pem file).
    • note that we will never use this key in this demo! We can’t: no SSH access!
  • tags for the instance: metadata like who created the thing.  CloudFormation will also add some tags to every resource in the stack.
  • user data.  This is the “Cloud Init” part, which I will describe in more detail, below.

Bootstrapping with Cloud Init (User Data)

Much of what I’m doing with Cloud Init comes from this Amazon documentation. There is a more powerful version of this called cfn-init, as demonstrated here, but in my opinion it’s overkill.  Cfn-init looks like it’s trying to be Puppet-lite, but that’s why we have actual Puppet.  The “user data” approach is basically just a shell script, and though I have just under a dozen lines here, ideally you’d have under five: just enough to bootstrap the instance and let better automation tools handle the rest.  This also lets the instance resource JSON be more reusable and less tied to the app you’ll deploy on it.  Anyway, here it is:

"UserData" : 
            { "Fn::Base64" : { "Fn::Join" : ["",[
            "#!/bin/bash -ex","\n",
            "yum -y update","\n",
            "yum -y install puppet","\n",
            "yum -y install git","\n",
            "git clone https://github.com/catapultsoftworks/puppet-rvm.git /usr/share/puppet/modules/rvm","\n",
            "git clone https://github.com/catapultsoftworks/websvc-puppet.git /tmp/websvc-puppet","\n",
            "puppet apply /tmp/websvc-puppet/rvm.pp","\n",
            "source /usr/local/rvm/scripts/rvm","\n",
            "git clone https://github.com/catapultsoftworks/cap-test.git /home/ec2-user/cap-test","\n",
            "cd /home/ec2-user/cap-test","\n",
            "bundle install","\n",
            "rails s -d","\n"
 ]]}}

So you can see what’s happening here.  We use yum to update the instance, then install puppet and git.  The first git clone pulls down a puppet module that installs rvm.
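As I said above, ideally the user data would be under five lines: just enough to bootstrap. Something like this sketch, where the puppet master hostname is a placeholder:

#!/bin/bash -ex
yum -y update
yum -y install puppet
# check in with a puppet master and let it drive everything else
puppet agent --server puppet.example.edu --waitforcert 60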

A note about Amazon Linux version numbering

Why did I fork this manifest into my own account?  Well, there is a dependency in there on a curl library, which apparently changed names on CentOS at some point.  So there is conditional code in the original manifest that chooses which name to use based on the version number. Unfortunately, even though Amazon Linux is rightly recognized as a CentOS variant, this part fails, because Amazon uses its own version numbering.  I fixed it, but without being sure how the authors would want to handle this, I avoided a pull request and submitted an issue to them.  We’ll see.

A note about GitHub

I went to GitHub for two reasons:

  1. It’s public.  Our SVN server is locked down to campus, and without a VPN tunnel, I can’t create a security rule to get there.
  2. I could easily fork that repo.

Let’s add these to the pile of good reasons to use GitHub Organizations.

More Puppet: Set up Rails

Anyway, with the rvm module installed, we use another puppet manifest to invoke and install it with puppet apply, the standalone client-side version of puppet. It installs my ruby/rails versions of choice: 1.9.3 / 2.3.14.  I also set up bundler and sqlite (so that I can run a default rails app).

The application

The next git clone downloads the application.  It’s a very simple app with one root route. It’s called cap-test because the next thing I want to do is deploy it with capistrano. The only thing to note here is that the Gemfile contains the gem “therubyracer,” a JavaScript runtime.  I could have satisfied this requirement by installing nodejs, but that looks like a bit of a pain since there’s no yum repo for it.  This was the simplest solution.

Starting the server

No magic here… I just let the root user that’s running the provisioning also start up the server.  It’s running on port 3000, which is already open to the world, so it’s now publicly available.

That public IP

The CloudFormation template also creates an “elastic IP” and assigns it to the instance.  This is just a public IP generated in AWS’s space.  Not sure why it has to be labeled “elastic,” though apparently it’s because the IP can be remapped to a different instance on the fly.

    "PublicIPForTestInstance" : {
        "Type" : "AWS::EC2::EIP",
        "Properties" : {
            "InstanceId" : { "Ref" : "BrandonTestInstance" },
            "Domain" : "vpc"
        }
    }

You’ll also notice the output section of the CF template includes this IP reference.  This causes the IP to show up in the CloudFormation console under “output” and should be something I can return from the command line.  Speaking of which…
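Something like this (boto 2; an untested sketch) reads them back:

# print the stack outputs, e.g. BrandonPublicIPAddress
stack = conn.describe_stacks('MyTestStack')[0]
for output in stack.outputs:
    print output.key + ": " + output.value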

Oh yeah, that boto script

So this thing doesn’t just run itself.  You can run it manually through the CloudFormation GUI (nah), use the AWS CLI tools, or use Boto.  Here’s the usage on that script:

/Users/brich/devops/boto> ./run_cloudformation.py -h
usage: run_cloudformation.py [-h]
                             region stackName stackFileName creator tagFile

Run a cloudformation script. Make sure your script contains a detailed description! This script does not currently accept stack input params.

Creator: Brandon Rich Last Updated: 5 Dec 2013

positional arguments:
  region         e.g. us-east-1
  stackName      e.g. TeamName_AppName_TEST (must be unique)
  stackFileName  JSON file for your stack. Make sure it validates!
  creator        Your netID. Will be attached to stack as "creator" tag.
  tagFile        optional. additional tags for the stack. newline delimited
                 key-value pairs separated by colons. ie team:sfis
                 functional:registrar

optional arguments:
  -h, --help     show this help message and exit

You can see what arguments it takes.  The tags file is optional, but it will always put your netid down as a “creator” tag.  Some key items are not yet implemented:

  • input arguments.  supported by CloudFormation, these would let us re-use templates for multiple apps.  critical for things like passing in datasource passwords to the environment (to keep them out of source control)
  • event tracking (feedback for status changes as the stack builds; see the sketch below)
  • json / aws validation of the template (also sketched below)

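A rough sketch of the last two, reusing names from the script below (untested):

import time

# validation: boto raises an exception on a bad template, e.g. one that
# references a security group that doesn't exist in the file
conn.validate_template(template_body=stack_json)

# event tracking: poll until the stack settles
while True:
    stack = conn.describe_stacks(stackID)[0]
    for event in conn.describe_stack_events(stackID):
        print event.timestamp, event.logical_resource_id, event.resource_status
    if stack.stack_status.endswith(('COMPLETE', 'FAILED')):
        break
    time.sleep(15)
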
Anyway, here’s the implementation.

import boto.cloudformation

# (argparse setup omitted -- see the usage text above)
conn = boto.cloudformation.connect_to_region(args.region)

# read the tag file; every stack gets a "creator" tag at minimum
lines = [line.strip() for line in open(args.tagFile)]

print "tags:"
print "(creator: " + args.creator + ")"

tagdict = { "creator" : args.creator }
for line in lines:
    if not line:
        continue
    # split on the first colon only, so values may contain colons
    key, val = line.split(':', 1)
    tagdict[key] = val
    print "(" + key + ": " + val + ")"

# To do: validate before creating, i.e.
# aws cloudformation validate-template --template-body file://path-to-file.json
#result = conn.validate_template( template_body=stack_json )
# example of bad validation (aside from invalid json): referencing a
# security group that doesn't exist in the file

with open(args.stackFileName, "r") as stackfile:
    stack_json = stackfile.read()

try:
    stackID = conn.create_stack(
        stack_name=args.stackName,
        template_body=stack_json,
        disable_rollback=False,
        timeout_in_minutes=10,
        tags=tagdict
    )
    events = conn.describe_stack_events(stackID)
    print events
except Exception as e:
    print str(e)

That’s it.  Here’s the script for OIT SVN users.  Again, run it like so:

./run_cloudformation.py us-east-1 MyTestStack cf_template.json brich tagfile

So this took a while to write up in a blog post, but there’s not much magic here.  It’s just connecting pieces together.  Hopefully, as I tackle those outstanding items from the page replica above, we’ll start to see some impressive levels of automation!

Onward!