A Good Day for DevOps at Notre Dame

Last week, several new processes and technologies were asked to sink or swim as the OIT and the Office of the Registrar brought two new Ruby on Rails applications to production.  I’m pleased to announce that due to a confluence of thorough planning, robust automatic deployment processes, and engaged collaboration across multiple OIT units, the apps are live and swimming, swimming, swimming.


What do we have now that we didn’t have before?

  • two Rails apps in production
  • a Banner API poised to serve as a key data integration point for the future
  • an automated app deployment tool
  • a new workflow that empowers developers and speeds app deployment
  • puppet manifests to create consistency between developer VMs and deployment environments (satisfying point 10 of the Twelve-Factor App)

 

What else?

  • the experience to extend API services to other data sources and consumers
  • an architectural framework for future Rails app development
  • a boatload of fresh Ruby on Rails knowledge

 

Automation + Collaboration = Innovation.  Sound familiar?  These new practices and processes are enhancing our agility, velocity, and ability to deliver quality functionality to users.

Big Thanks

I have often observed that some of the most fulfilling times working in the OIT are on outage weekends.  Communication is quick and actions are decisive as disparate OIT teams come together, often in the same room, to bring new functionality to our campus constituents. That unity of purpose is the heart of DevOps, and I am pleased to say that I have seen it happen on a day-to-day basis recently. Let me highlight some of the people and teams who made this week a success, and who are laying the foundation for a bright future of ND application development.

Information Security

Jason Williams and his team were attentive and helpful in defining best practices for handling database and API credentials — something that is a little different in the new technology stack.  Not only that, but when we needed Webinspect scans done or firewall rules put in place quickly, Jason’s team was ready to jump in and take action to help us go live.

Database Administration

Fred Nwangana’s team was involved from early on, helping shape how Rails applications would work in our environment.  Together we determined that this moment presents a great opportunity to decouple custom apps from the Banner database.  Vincent Melody in particular was a great help in provisioning database resources and helping drive forward our process standardization.

Change Control

Julie Stogsdill and Matt Pollard’s contributions have been tremendous.  I came to them with Launchpad and a pretty clear agenda of putting TEST environment deployments in developers’ hands.  Rather than objecting to this idea, they helped me find ways to integrate the process into our change control system.  The new workflow is even more flexible than I had hoped, and has already allowed us to push important changes to production, via RFC, without a hint of dread that the process is too slow.

System Administration / Virtualization

I wrote puppet manifests to provision our servers, but I would have gotten nowhere in our local infrastructure without help from Chris Fruewirth’s team.  Milind Saraph and Joseph Franco, plus John Pozivilko from the virtualization team, were a great help in creating hosts in VMware, assigning IPs, updating systems, and answering lots of questions when my limited sysadmin knowledge hit a wall.  Plus, we are all going to be working toward increasing puppet infrastructure management in the future.  Good stuff ahead there!

Just the Beginning

People used to ask me how the new job was going.  There were so many things up in the air; how could I really give an answer?  So I’d say something like “ask me in six months.”  Well, now you can ask me any time, because the apps are live, the processes are working, and we are ready to take on new development challenges. There’s still more to tackle: expanding configuration management; exploring cloud infrastructure; implementing comprehensive monitoring.  But for now, I want to pause and say “thank you” to everyone who helped get us to this point.  Onward!

RFC workflow for Launchpad

Now that we are actually getting some Rails code to production, I have worked with the Change Control team and Change Advisory Board to incorporate Launchpad into the OIT change control process.  This process is similar to the old one, with some fantastic new features:

  • The developer (submitter) will get the BUILD TEST task
  • Upon receiving this task, the developer can deploy with Launchpad as many times as necessary (incrementing tags — see below).
  • Each TEST/PROD deploy generates a notification to change control.  They will be checking for an associated RFC!
    • A forthcoming change to Launchpad will include a field to give the RFC number, further reinforcing this
  • Change control will update the BUILD PROD task to use the latest deployed tag.  You may want to state this explicitly in the closure text, in addition to pasting the deploy history.

 

  • Rules:
    • Deploy tags only (more on git tagging)
      • Always include a tag message summarizing the changes
      • Tag convention:  v1.0.1, v3.2.21, v1.2.4a
        • First digit:  major releases.  Very rare, for large milestones in the project.  (Note: not large bundles of updates… we should be more iterative than ever now!)
        • Second digit: significant feature additions or enhancements
        • Third digit: minor additions, tweaks, or bug fixes.  This number can get high if necessary!
        • Letter:  optional, rare, only for hotfixes
    • DO NOT ALTER TAGS.  This new process allows you to iterate tag numbers in TEST.  It’s easy to make new ones.  DO IT!
    • Document deployments in Assyst.  Upon closing the task (when you’re ready for PROD), paste into Assyst a list of all your deployments
      • See <app_web_root>/version.txt, a file generated by Launchpad, to help with this.
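The tagging rules above, expressed as git commands.  This uses a throwaway sandbox repo so the commands are self-contained; the tag numbers and messages are hypothetical:

```shell
# Sandbox repo so this runs anywhere; in practice you'd be in your app's checkout
cd "$(mktemp -d)"
git init -q .
git config user.name "Dev"
git config user.email "dev@example.edu"
git commit -q --allow-empty -m "initial commit"

# Always use annotated tags (-a) with a message summarizing the changes
git tag -a v1.0.2 -m "Add favorite-color service"

# Need another TEST iteration?  Never alter v1.0.2 -- cut a new tag instead
git tag -a v1.0.3 -m "Fix favorite-color response format"

git tag -l
# prints:
# v1.0.2
# v1.0.3

# (In a real repo, publish the tag with: git push origin v1.0.3)
```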

Here’s a sample RFC:

Banner Web Services v1.0.2 
------------------------------------------------ 
v1.0.2 contains a new service to return a student's favorite color

Test 
Step 1 - [YOU, THE SUBMITTER] - api-internal-test.dc.nd.edu 
a. Use Launchpad (launchpad.dc.nd.edu) to deploy app "ndapi" to the TEST environment. 
    App: NDAPI
    Environment: test
    Task: Deploy:cold
    Tag: v1.0.2
    Do_Migration: True

Step 2 - [FUNCTIONAL_USER] - Test using attached testing spreadsheet. 

Step 3 - Webinspect 
[ATTACH WEBINSPECT PLAN]

Prod 
Step 1 - [Bruce Stump|Pete Bouris] - api-internal-prod.dc.nd.edu 
a. Use Launchpad (launchpad.dc.nd.edu) to deploy app "ndapi" to the PROD environment. 
    App: NDAPI
    Environment: production
    Task: Deploy:cold
    Tag: v1.0.2
    Do_Migration: True

Step 2 - [FUNCTIONAL_USER] - Test using attached testing spreadsheet.

Note that you must be specific in your Launchpad steps for the person running the prod deploy.  Soon, I will release command line tools / API endpoints for Launchpad that will make this less error-prone.

This is a great step forward, enabling developers to react quickly to issues that pop up during functional testing.  Thanks to the Change Control team and the CAB for their time, attention, and approval of this new process!

Launchpad: A Rails app deployment platform

Capistrano is a great tool for building scripts that execute on remote hosts.  While its functionality lends itself to many different applications, it’s a de facto standard for deploying Ruby on Rails apps.  A few months ago, I used it to automate app deployments and other tasks such as restarting server processes, and behold, it was very good.

I had provisioned each of the remote hosts using Puppet, so I knew that my machine configurations were good.  This meant that I could use the same capistrano scripts for multiple apps, as long as they used the same server stack and ran on one of these hosts.  In short, consistency enables automation.

However, there are a few issues with this approach.

  • Distribution of Credentials.  Capistrano needs a login to the remote host.  I can’t just give passwords or pem files to developers; our separation of responsibilities policy doesn’t allow it.
  • Proliferation of Cap Scripts.  I can’t hand over scripts to developers and expect them to stay the same.  I need to centralize these things and maintain one copy in one place.
  • Visibility.  I need these automated tools to work in tandem with our change control processes.  That means auditing and logging.
  • Access Control.  If I’m going to centralize, I need some way to say who can do what.

Enter Launchpad.

This is my solution: a web app that wraps all this functionality.  Launchpad has the following features:

  • A centralized repository of application data
    • git urls
    • deploy targets (dev, test, prod)
    • remote hosts
  • A UI for running capistrano tasks
  • Fine-grained access control per app/environment/task
  • Notification groups for deployment events (partially implemented)
  • Full audit trails of all actions taken in the system and the resulting output
  • Support for multiple stacks / capistrano scripts
  • JSON API (deploying soon)

 

Launchpad owns the remote host credentials, so users never have to see them.  As a result, I can give developers the ability to deploy outside of dev in a way that is safe, consistent, and thoroughly auditable.  My next blog post will outline the ways in which our Change Control team has worked to accommodate this new ability.

Right now, the only stack implemented in Launchpad is an NGINX/Unicorn stack for Rails apps, but there really is no limit to what we can deploy with this tool on top of capistrano.

Launchpad is available to internal OIT developers; see me for details.

Better, Faster, More Consistent

It wasn’t long ago that OIT wasted time and energy having DBAs manually execute SQL scripts created by developers.  Then, Sharif Nijim developed the “autodeploy” tool that allows us to run SQL scripts automatically from SVN tags.  Developers have a faster way to run SQL without imposing on DBAs, and DBAs have their valuable time freed up for more important work.  We have never looked back.  I’m hoping Launchpad will do the same with application deployments.  Onward!

Calling Oracle Stored Procedures from Ruby with ruby-plsql

Isn’t it nice when something just works?  We are building Ruby on Rails apps on top of Oracle, so we’re using the Oracle Enhanced ActiveRecord adapter on top of the ruby-oci8 driver library.

The ActiveRecord adapter gives us a nice AR wrapper around our existing Oracle schema, which is great, but what about when I want to work with stored procedures or functions?  Turns out the author of this adapter, Raimonds Simanovskis, has a gem just for this called ruby-plsql.

Include the gem in your Gemfile:

gem 'ruby-plsql'

Then, write an initializer that hooks it to your existing ActiveRecord connection (config/initializers/plsql.rb):

plsql.activerecord_class = ActiveRecord::Base

After that, calling a procedure is easy.  Oracle return types are automatically cast to Ruby types.  Oracle exceptions are raised as OCIError, which carries “code” and “sql” attributes; call its “message” method to get the full error output.

Here I call an Oracle procedure, idcard.nd_is_valid_pin using the plsql object provided in the gem:

begin
  ok_pin = plsql.idcard.nd_is_valid_pin( new_pin )
  if ok_pin
    plsql.idcard.update_pin_pr( @info.ndid, params[:old_pin], pin )
  else
    raise Errors::InvalidInput
  end
rescue OCIError => e
  render json: { error: e.message }, status: :unprocessable_entity
end

That’s it!  Nice and easy, and “rsims” is two for two.

ActiveRecord PSA: nil vs RecordNotFound

I’ve been meaning to get back into blogging here, and I think I have been blocked by the fact that many of the posts in my mental backlog are somewhat large in scope.  So here’s a useful bit of ActiveRecord trivia that I just learned.

When no records are found, the “find” method (i.e., Person.find('badinput')) throws ActiveRecord::RecordNotFound.

However, any find_by_* method, such as Person.find_by_netid(‘badinput’), will return nil.

This was rather confusing as I focused on the error handling semantics of the Banner API.  I want that exception.  Good news, though:

find_by_netid!('badinput') throws the exception.  The bang changes the behavior, as it often does, though not always in the way you may expect.

TLDR:

2.0.0-p353 :001 > Person.find('badinput')
ActiveRecord::RecordNotFound: Couldn't find Person with ndid=badinput <snip>

2.0.0-p353 :002 > Person.find_by_netid('badinput')
 => nil 

2.0.0-p353 :003 > Person.find_by_netid!('badinput')
ActiveRecord::RecordNotFound: Couldn't find Person with netid = badinput <snip>

So that’s it.  Maybe this will get me back in the habit.  Happy coding!

Recent CITS Tech Session Material

A while back, Scott Kirner handed responsibility for the CITS (née ES) Tech Sessions to me.  With all the technology changes happening in OIT right now, there are plenty of exciting topics to learn and discuss.  So far in 2014, we have had presentations on the following:

  1. Responsive Web Design
  2. Provisioning a Rails development environment with Vagrant
  3. Git / GitHub basics (thanks Peter Wells!)

Having been justly called out for not providing access to my presentation material, I will now play catch-up and share some slides!  Be aware that these decks only provide partial information; each meeting had a significant live demo component.  They probably need some tweaking and they definitely need context.  For my part, I have planned for weeks to write detailed blog posts on each topic (especially the second one, as I had hardly any time to discuss capistrano).  I seem to be writing a lot today, so maybe I’ll get to them soon.  It’s important to share this information broadly!

For now, try this: the CITS Tech Session resource folder.  Everything’s in there, but let me provide some details.

  1. Responsive Web Design slides
    1. Demo for this was pretty bare-bones.  I put it into GitHub.  Hopefully it makes some sense…
  2. Rails deployment in Vagrant
    1. probably the most underserved.  Lots of good info on Vagrant, but not detailed enough on the puppet / capistrano part.
    2. Git repo for the vagrantfile that builds the rails/nginx/unicorn stack (+ oracle client)
    3. Git repo for the main manifest
      1. The modules used by this manifest are all downloaded in the shell provisioner of that vagrantfile, so you can see them there.  They’re all in  NDOIT public repos.
    4. Git repo for the CAS Rails test app — authenticates a user to CAS from the root page, then displays some CAS metadata.
    5. The Vagrantfile used to actually download and deploy that app automatically, but I have removed that step.
    6. This probably deserves three different blog posts
    7. The puppet modules and CAS test app are extended from code started by Peter Wells!
  3. Rails deployment — not a CITS tech session, but it describes a progression on the work from #2, above.  I demoed a remote deployment to a totally fresh machine with a “deploy” user and an /apps directory — much like we might do in production.
    1. This presentation was aimed at Ops staff, so I get into the stack a bit more.
    2. I also created an “autodeploy” script to wrap capistrano, to try to show one way in which our current RFC process could accommodate such deployment mechanisms.  I hope for something even more flexible in the future.

No slides from today, but my last two blog posts will provide some information about the GitHub part.  If you want to learn Git, the official site has some great documentation.  Here are Git Basics and Basic Branching and Merging.  Git’s easy branching is one of the most interesting and exciting parts of working with Git, and will be the foundation for multi-developer coding in the future.

As I have mentioned elsewhere, I know not everyone can make each session.  Blog posts will certainly help make the content accessible, but in addition, I am 100% open to doing recap sessions if there are enough people who want it!  Heck, I’ll even sit down with you one-on-one.  So please reach out to me.  The more we can share our combined knowledge, the better developers we’ll be.

Using SSH with GitHub for Fun and Profit

Please see my previous post about joining the NDOIT GitHub Organization.

You can easily clone a git repository using the https URL that appears in your browser when you visit it on GitHub.  However, you can also use an SSH key pair.  It takes a little setup, but you will want to do this for two reasons:

  1. It’s required to use git from the command line after enabling two-factor security
  2. It’s necessary for ssh agent forwarding, which lets you…
    1. use the remote deployment scripts I am developing in capistrano
    2. use ssh with github on vagrant (or any other machine you ssh to) without redoing these steps

So here’s what you want to do:

STEP 1: Follow the instructions on this blog to generate an SSH key  pair and register its public key with your GitHub account.  Note the platform selection tabs at the top of that page, and please be aware that these instructions work for Mac and Linux, but GitHub encourages Windows users to use the Windows native GUI app.

However, I am not recommending anyone proceed with Rails development on Github using Windows.  Many of you have seen the demos I’ve given on developing in Vagrant, and we’ve got Student Team developers building their new app on Linux VMs.  We want to develop as Unix natives!  I am happy to personally assist anyone who needs help making this transition.

STEP 2: Set up two-factor authentication on your GitHub account.  The easiest way to do this is to set up your smartphone with the Google Authenticator app, which will act as a keyfob for getting into GitHub.

STEP 3: Use SSH on the command line.  There are two ways to do this:

  1. Use the SSH URL when you first do your git clone
    1. Find the SSH URL on the repository’s page on GitHub.
    2. do git clone SSH_URL and get on with your life.  You’ll never need the Google Authenticator.
  2. Modify your existing git checkout directory to use SSH
    1. Check your remotes by typing git remote -v
    2. You’ll see something like this:
      1. origin https://github.com/ndoit/muninn (fetch)
        origin https://github.com/ndoit/muninn (push)
    3. That means you have a remote site called “origin” which represents github.  This is the remote URL you use for push/pull.  We need to change it to use SSH!
    4. That’s easy.  Have the SSH URL handy as shown above.
    5. git remote -h   tells you the help details, but here’s what we’ll do:
      1. git remote set-url origin SSH_URL
      2. Where SSH_URL is your ssh URL, of course.
    6. push/ pull as normal!
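Putting those steps together (another self-contained sandbox; the HTTPS URL comes from the example output above, and the SSH URL follows GitHub’s standard git@github.com:owner/repo.git shape):

```shell
# Sandbox repo standing in for your existing checkout
cd "$(mktemp -d)"
git init -q .
git remote add origin https://github.com/ndoit/muninn

git remote -v    # shows the https:// URL for both fetch and push

# Re-point "origin" at the SSH URL
git remote set-url origin git@github.com:ndoit/muninn.git

git remote -v    # now shows the SSH URL; push/pull as normal
```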

SSH Forwarding and Vagrant

Another vital result of enabling SSH is that you can now perform SSH Agent Forwarding.  What does that mean?  Imagine the following scenario:

  1. You create an SSH keypair for use with GitHub as shown above, on your laptop
  2. You launch a vagrant VM for Rails development
  3. You try to git clone via SSH
  4. FORBIDDEN!

The problem is that the SSH key you registered with GitHub is on your laptop, but the VM is a whole other machine.  Fortunately, we can use SSH agent forwarding to use your laptop’s keys on a remote machine.

In Vagrant, this is a one-liner in the Vagrantfile:  config.ssh.forward_agent = true

Or use -A when using ssh from the command line:  ssh user@someplace -A

Now your keys travel with you, and ssh git@github.com will result in a personal greeting, rather than “Permission denied.”
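If you always want forwarding for a given host, you can set it once in your SSH client config instead of remembering -A.  ForwardAgent is a standard OpenSSH client option; the host name here is just an example, and the sketch writes to a scratch file where real life would use ~/.ssh/config:

```shell
# Append a per-host entry (sandbox file standing in for ~/.ssh/config)
cfg="$(mktemp)"
cat >> "$cfg" <<'EOF'
Host someplace.nd.edu
    ForwardAgent yes
EOF
cat "$cfg"
```

Once you are on the remote machine, `ssh-add -l` should list your laptop’s identities, and `ssh -T git@github.com` should greet you by name.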

Conclusion

If you’re using GitHub, you need to do all of this.  I can help you.  When you’re done, you’ll be more secure and generally be more attractive to your friends and colleagues.  Plus, you’ll be able to do remote deployments, which is a very good topic for my next blog post. See you next time!

GitHub Organization: NDOIT

In advance of today’s CITS Tech Session on Git and GitHub, I wanted to make OIT readers aware that we have created a GitHub Organization.  GitHub organizations are like regular GitHub accounts, but you can assign individual GitHub users to them and manage their privileges on repos in the org.  Ours is named NDOIT, and you can find it at https://github.com/ndoit. Many thanks to Chris Frederick for pushing to get this set up, and to Scott Kirner and Todd Hill for finding the funding!  Here are a few important points:

  • How to join this organization
    • If you don’t already have one, create your own github account. For this purpose, my recommendation is to create an account under your nd.edu email address.
    • Provide that account name to Chris or me, and we can add you.
  • How to use the org
    • Look for the account-context drop-down on the left-hand side of your main GitHub landing page after you log in.
    • Change the “account context” to NDOIT, and you’ll see all our shared repos.
  • Public vs private
    • GitHub’s philosophy and pricing model both favor public, open-source repositories.  As such, we have a very limited number of private repos.
    • Because private repositories are scarce, please do not create any without first getting approval.  We have not yet defined a formal process for this, so please talk to me (Brandon Rich).  New Rails apps will get preference in this respect.
  • What sorts of things can be public?
    • Here is another area where I’m afraid the formal process is not nailed down.  Please discuss with me if you think you have a repo that can be public.  As long as there is nothing private or confidential, we can probably make it work.
    • Examples thus far have been puppet repos, rails demo apps, and the BI portal, which went through the Technology Transfer office.
  • What about SVN?
    • SVN is still an appropriate tool for many things:
      • Anything sensitive / non-public that will not go into a private github repo
      • Anything that uses autodeploy

That’s it for this topic.  Please see the follow-up, Using SSH with GitHub for Fun and Profit.

10 Things a Modern IT Professional Should Do


You work in IT. You’re in higher education. It’s 2014. Get a move on!

1. Back up your files.

When your machine crashes, you’ll get no sympathy. The whole world knows you should back up your computer. There’s no excuse. Go get CrashPlan or some other way to make sure your stuff isn’t lost. No, don’t go on to number two until you’ve finished this. It’s that important.

2. Use a password manager.

Most people only use a couple of passwords frequently enough to remember them. Go get PasswordSafe (pwSafe on the Mac) and start generating random passwords for your accounts. Don’t bother remembering them, just store ’em in your password manager and copy/paste when you need them. Secure this password manager with one hard password (the only one you have to remember).

3. Use a collaboration tool to share files rather than emailing them.

Email attachments fill up your mailbox and affect performance. Now think about how that might affect your recipients. Just use a tool like Box and send a share link. Much easier and you can control it later if you need to.

4. Learn some basic scripting, even if it’s just an Excel formula.

You don’t need to be a programmer to do some basic string manipulation or some basic conditionals (if-then). Excel or Google Spreadsheets are incredibly powerful tools in their own right.
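Spreadsheets are the gentlest entry point, but even a few lines of shell cover the same ideas.  A sketch with made-up data (the file and fields are invented for illustration):

```shell
# Work in a scratch directory with a tiny made-up CSV
cd "$(mktemp -d)"
printf 'netid,credits\njdoe,12\nasmith,6\n' > enrollment.csv

# Basic string manipulation: grab the first column, skipping the header
cut -d, -f1 enrollment.csv | tail -n +2

# A basic conditional (if-then): flag anyone under 12 credits
while IFS=, read -r netid credits; do
  if [ "$netid" = "netid" ]; then
    continue                            # skip the header row
  fi
  if [ "$credits" -lt 12 ]; then
    echo "$netid is part-time"          # prints: asmith is part-time
  fi
done < enrollment.csv
```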

5. Find a reliable IT news source relevant to your area (news site, mailing list, blog, twitter).

Make it part of your daily habit to check your RSS (try feedly.com) or Twitter feeds. Seek out and follow a couple of good news sites, tech journalists, or smart people who keep tabs on IT stuff. Let these people be your filter and don’t be in the dark about what’s going on in your industry.

6. Build relationships with people who can help you improve, especially from other departments or organizations.

We learn from each other – by listening, by teaching, or by doing. Join campus communities (e.g., Developer Meetups or Mobile Community of Practice), higher ed user groups, Educause constituent groups, industry consortiums, etc. If it doesn’t exist, create one.

7. Contribute back to the community – an open source project, present at a conference, write a blog post, etc.

You’re standing on the shoulders of giants, and with luck your successors may just stand on yours. We benefit from so many other generous contributions and we’re fortunate to work in higher education, where our counterparts are willing to swap stories and strategies. Be a part of the larger community.

8. Listen to your customers.

Work is happier when your customers are happier. What can you do that will make their lives better? How do you know without talking to them? Get out there and watch them use your software, have them brainstorm ideas, or just listen to them complain. Just knowing that they’re being heard will make your customers happier.

9. Figure out your smartphone.

There’s nothing sillier than an IT person who is computer illiterate, and that counts for your smartphone as well. It’s a magical supercomputer in your pocket. It probably cost you a bunch of money and it does way more than phone, email, and Angry Birds. You can tie in with your VoIP phone, access files via Box, edit Google Docs, submit expense reports, and other wizardry.

10. Break stuff.

Our world is full of settings screens, drop-down menus, config files, old hardware, and cheap hosting. Sadly, too many people are afraid of what might go wrong and they never discover the latest features, time-saving methods, or opportunities to innovate. Try out the nightly build (in dev, maybe). Ask vendors about loaning you some demo hardware. Challenge yourself on whether there’s another way to get it done. You’re in IT. Don’t be afraid to break something – it can be fixed. And who knows? You might just learn a thing or two.

Our Finest Hour


Last Friday (1/24/2014) saw our campus experience two service disruptions.  Both were service provider issues – one with Box, the other with Google.  In both cases, the issues were resolved in a timely manner, and both the Box and Google status pages were kept up to date.

What was required from OIT to resolve these interruptions?  Communication and vendor management.  We did not have to burn hours and hours of people time digging in to determine the root cause of these issues.  We did not have to bring in pizza and soda, flip charts and markers, bagels and coffee, and more pizza and more soda to fuel ourselves to deal with this issue.  We did not work late and long into the night, identifying and resolving the root cause.

Why?  Because we have vendors we trust, employing more engineers than we have humans in all of OIT, singularly focused on their product.

I was part of the team that worked through our Exchange issues last fall.  I will be the first to tell you that the week we spent getting to operational stability was a phenomenal team effort.  We had an engineer from Microsoft and all of the considerable talent OIT could bring to bear to solve our issues.  And solve them we did.  Yes, we worked late into the night.  Yes, we fueled ourselves with grease and caffeine.  Yes, we used giant post-its and all the typical crisis management tools.

Everyone on the team did not get much sleep, did not spend the typical evenings with their families, and displayed their remarkable devotion to their profession and to our University.  Everyone on the team was elated when we solved a gnarly set of technical issues and got to operational stability.  And everyone on the team did not relive that experience this past Friday.  They were able to focus on their core mission.

We have moved all the way up the stack of abstraction in the space offered by both Box and Google.  It is something to celebrate, as it allows us to relentlessly focus on delivering value to the students, faculty, and staff at Our Lady’s University.