
Positive Trajectory

Bob and I gave a brief update on where the OIT stands in its cloud strategy and the progress we have made with our IaaS provider of choice, Amazon Web Services.  It was essentially a highly summarized, non-technical version of the presentation we gave in D.C. earlier this year.

The most interesting thing that came out of the update was a follow-on conversation with one of the non-technical users on campus.  This person was not aware of the implications of moving Conductor and its associated sites to AWS.  We explained that her departmental website is a Conductor site, and as such, she is directly benefiting from the improved performance and availability.

It was in the context of this conversation that the beauty of what we are doing struck me.  Functionally, end users should not even realize that the infrastructure running their web presence or line-of-business applications has transitioned to AWS.  If the work is well executed, these users should remain oblivious to the changes being made on their behalf and in their best interest.

The fact that this person had no idea anything had changed is a testament to our success to date: as stated previously, performance essentially doubled at roughly half the cost.  All of this serves the best interest of the University, allowing us to focus our efforts on initiatives that are core to the institutional mission.

Very, very excited about what is to come.

Onward!

Why test? Simplicity. Security.

One of the biggest complaints I’ve heard about Test-Driven Development is that developers don’t feel like they have the time to create the additional code in the tests. I completely understand the feeling. Let’s face it: as developers, we hardly ever hear from our customers, “Take all the time you need.” It always seems like we either need to have the desired functionality finished yesterday, or we have so many projects going at once that work feels more like a juggling act: doing just enough on each one to keep the wolves at bay.

But I would argue that this time crunch is exactly the reason we need to be writing tests, and letting these tests drive our solutions. A recent example demonstrates how testing creates simpler, better-defined code you can feel better about.

A month or so back, I was tasked with adding a new member class to a new Ruby on Rails application I am working on. This required adding fields to forms, adding entries to some underlying tables, and making substantial changes to the object we had backing the forms. All pretty routine stuff. Because I am fairly new to Rails, and I wasn’t completely sure what behavior I wanted the application to have, I just started writing the application code, figuring I would go back and cover it with tests once finished. Adding the needed fields and database entries was straightforward and took very little time. I then spent the better part of the day putting the needed functionality into the form-backing object (namely, adding and updating the new member class).

I started by copying similar behavior already in place for a different type of user in the system. Seems like a logical step, right? It should have been safe and straightforward. But it wasn’t long before I grew very uneasy about the code I was writing. There were more than a few conditionals, fragments of the same code in multiple places, and, most importantly, I wasn’t sure it was even doing the thing I needed it to do! This is a scary way to feel. I quickly realized this was not going to work, so I backed out every change I had made to the object and chose instead to start over the next day with fresh eyes and a partner to pair-program with.

The next morning we started fresh, careful not to write a line of application code without first having a test to cover what was being done. In true pair-programming/TDD fashion, we alternated writing test code and application code. By the time we were done, we had written fewer lines of application code than I had previously, we had a nice little suite of tests, and, most importantly, we were confident the application now did the thing we needed it to do! So not only did we achieve better code through testing, we did it in less time than I had spent by myself doing it the wrong way.
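
To make that rhythm concrete, the first test in such an alternation might look something like this sketch; the Registration form object and its fields are invented here for illustration:

# spec/models/registration_spec.rb -- hypothetical example; Registration
# and the "staff" member class are invented for illustration.
require "rails_helper"

RSpec.describe Registration do
  it "saves a member with the new member class" do
    registration = Registration.new(name: "Pat Smith", member_class: "staff")

    expect(registration.save).to be true
    expect(registration.member_class).to eq "staff"
  end
end

Write a test like this first, watch it fail, then write just enough application code to make it pass, and repeat.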

So remember: you may not have a lot of time to get the work done, but you always have a choice in how you do the work. And in the long run, better written code backed by tests will help you sleep better at night.


A Good Day for DevOps at Notre Dame

Last week, several new processes and technologies were asked to sink or swim as the OIT and the Office of the Registrar brought two new Ruby on Rails applications to production.  I’m pleased to announce that due to a confluence of thorough planning, robust automatic deployment processes, and engaged collaboration across multiple OIT units, the apps are live and swimming, swimming, swimming.


What do we have now that we didn’t have before?

  • two Rails apps in production
  • a Banner API poised to serve as a key data integration point for the future
  • an automated app deployment tool
  • a new workflow that empowers developers and speeds app deployment
  • puppet manifests to create consistency between developer VMs and deployment environments (satisfying point 10 of the Twelve-Factor App)


What else?

  • the experience to extend API services to other data sources and consumers
  • an architectural framework for future Rails app development
  • a boatload of fresh Ruby on Rails knowledge


Automation + Collaboration = Innovation.  Sound familiar?  These new practices and processes are enhancing our agility, velocity, and ability to deliver quality functionality to users.

Big Thanks

I have often observed that some of the most fulfilling times working in the OIT are on outage weekends.  Communication is quick and actions are decisive as disparate OIT teams come together, often in the same room, to bring new functionality to our campus constituents. That unity of purpose is the heart of DevOps, and I am pleased to say that I have seen it happen on a day-to-day basis recently. Let me highlight some of the people and teams who made this week a success, and who are laying the foundation for a bright future of ND application development.

Information Security

Jason Williams and his team were attentive and helpful in defining best practices for handling database and API credentials — something that is a little different in the new technology stack.  Not only that, but when we needed Webinspect scans done or firewall rules put in place quickly, Jason’s team was ready to jump in and take action to help us go live.

Database Administration

Fred Nwangana’s team was involved from early on, helping shape how Rails applications would work in our environment.  Together we determined that this moment presents a great opportunity to decouple custom apps from the Banner database.  Vincent Melody in particular was a great help in provisioning database resources and helping drive forward our process standardization.

Change Control

Julie Stogsdill and Matt Pollard’s contributions have been tremendous.  I came to them with Launchpad and a pretty clear agenda of putting TEST environment deployments in developers’ hands.  Rather than objecting to this idea, they helped me find ways to integrate the process into our change control system.  The new workflow is even more flexible than I had hoped, and has already allowed us to push important changes to production, via RFC, without a hint of dread that the process is too slow.

System Administration / Virtualization

I wrote puppet manifests to provision our servers, but I would have gotten nowhere in our local infrastructure without help from Chris Fruewirth’s team.  Milind Saraph and Joseph Franco, plus John Pozivilko from the virtualization team, were a great help in creating hosts in VMware, assigning IPs, updating systems, and answering lots of questions when my limited sysadmin knowledge hit a wall.  We are also all going to be working toward increased puppet infrastructure management in the future.  Good stuff ahead there!

Just the Beginning

People used to ask me how the new job was going.  There were so many things up in the air; how could I really give an answer?  So I’d say something like “ask me in six months.”  Well, now you can ask me any time, because the apps are live, the processes are working, and we are ready to take on new development challenges. There’s still more to tackle: expanding configuration management; exploring cloud infrastructure; implementing comprehensive monitoring.  But for now, I want to pause and say “thank you” to everyone who helped get us to this point.  Onward!

RFC workflow for Launchpad

Now that we are actually getting some Rails code to production, I have worked with the Change Control team and Change Advisory Board to incorporate Launchpad into the OIT change control process.  This process is similar to the old one, with some fantastic new features:

  • The developer (submitter) will get the BUILD TEST task
  • Upon receiving this task, the developer can deploy with Launchpad as many times as necessary (incrementing tags — see below).
  • Each TEST/PROD deploy generates a notification to change control.  They will be checking for an associated RFC!
    • A forthcoming change to Launchpad will include a field to give the RFC number, further reinforcing this
  • Change control will update the BUILD PROD task to use the latest deployed tag.  You may want to state this explicitly in the closure text, in addition to pasting the deploy history.


  • Rules:
    • Deploy tags only (more on git tagging)
      • Always include a tag message summarizing the changes
      • Tag convention:  v1.0.1, v3.2.21, v1.2.4a
        • First digit:  major releases.  Very rare, reserved for large milestones in the project.  (Note: not large BUNDLES of updates… we should be more iterative than ever now!)
        • Second digit: significant feature additions or enhancements
        • Third digit: minor additions, tweaks, or bug fixes.  This number can get high if necessary!
        • Letter:  optional, rare, only for hotfixes
    • DO NOT ALTER TAGS.  This new process allows you to iterate tag numbers in TEST.  It’s easy to make new ones.  DO IT!
    • Document deployments in Assyst.  Upon closing the task (when you’re ready for PROD), paste into Assyst a list of all your deployments
      • See <app_web_root>/version.txt, a file generated by Launchpad, to help with this.
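
For example, cutting and publishing the annotated tag for the sample release below would look something like this (the tag message here is illustrative):

# Create an annotated tag with a message summarizing the changes,
# then publish it so Launchpad can deploy it.
git tag -a v1.0.2 -m "Add service returning a student's favorite color"
git push origin v1.0.2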

Here’s a sample RFC:

Banner Web Services v1.0.2 
------------------------------------------------ 
v1.0.2 contains a new service to return a student's favorite color

Test 
Step 1 - [YOU, THE SUBMITTER] - api-internal-test.dc.nd.edu 
a. Use Launchpad (launchpad.dc.nd.edu) to deploy app "ndapi" to the TEST environment. 
    App: NDAPI
    Environment: test
    Task: Deploy:cold
    Tag: v1.0.2
    Do_Migration: True

Step 2 - [FUNCTIONAL_USER] - Test using attached testing spreadsheet. 

Step 3 - Webinspect 
[ATTACH WEBINSPECT PLAN]

Prod 
Step 1 - [Bruce Stump|Pete Bouris] - api-internal-prod.dc.nd.edu 
a. Use Launchpad (launchpad.dc.nd.edu) to deploy app "ndapi" to the PROD environment. 
    App: NDAPI
    Environment: production
    Task: Deploy:cold
    Tag: v1.0.2
    Do_Migration: True

Step 2 - [FUNCTIONAL_USER] - Test using attached testing spreadsheet.

Note that you must be specific in your Launchpad steps for the person running the prod deploy.  Soon, I will release command line tools / API endpoints for Launchpad that will make this less error-prone.

This is a great step forward, enabling developers to react quickly to issues that pop up during functional testing.  Thanks to the Change Control team and the CAB for their time, attention, and approval of this new process!

Launchpad: A Rails app deployment platform

Capistrano is a great tool for building scripts that execute on remote hosts.  While its functionality lends itself to many different applications, it’s a de facto standard for deploying Ruby on Rails apps.  A few months ago, I used it to automate app deployments and other tasks such as restarting server processes, and behold, it was very good.

I had provisioned each of the remote hosts using Puppet, so I knew that my machine configurations were good.  This meant that I could use the same capistrano scripts for multiple apps, as long as they used the same server stack and ran on one of these hosts.  In short, consistency enables automation.
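
As a rough sketch (assumed values, not our actual scripts), a shared Capistrano deploy configuration along these lines could serve every app on the stack; the repository URL and hostname below are placeholders:

# config/deploy.rb -- minimal sketch; repo URL and hostname are placeholders
set :application, "ndapi"
set :repo_url,    "git@git.example.nd.edu:ndapi.git"
set :deploy_to,   "/var/www/ndapi"

# Hosts provisioned from the same Puppet manifests share one stack,
# so this one script can deploy any app that runs on them.
server "api-internal-test.dc.nd.edu", user: "deploy", roles: %w[app web db]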

However, there are a few issues with this approach.

  • Distribution of Credentials.  Capistrano needs a login to the remote host.  I can’t just give passwords or pem files to developers; our separation of responsibilities policy doesn’t allow it.
  • Proliferation of Cap Scripts.  I can’t hand over scripts to developers and expect them to stay the same.  I need to centralize these things and maintain one copy in one place.
  • Visibility.  I need these automated tools to work in tandem with our change control processes.  That means auditing and logging.
  • Access Control.  If I’m going to centralize, I need some way to say who can do what.

Enter Launchpad.

This is my solution: a web app that wraps all this functionality.  Launchpad has the following features:

  • A centralized repository of application data
    • git urls
    • deploy targets (dev, test, prod)
    • remote hosts
  • A UI for running capistrano tasks
  • Fine-grained access control per app/environment/task
  • Notification groups for deployment events (partially implemented)
  • Full audit trails of all actions taken in the system and the resulting output
  • Support for multiple stacks / capistrano scripts
  • JSON API (deploying soon)


Launchpad owns the remote host credentials, so users never have to see them.  As a result, I can give developers the ability to deploy outside of dev in a way that is safe, consistent, and thoroughly auditable.  My next blog post will outline the ways in which our Change Control team has worked to accommodate this new ability.

Right now, the only stack implemented in Launchpad is an NGINX/Unicorn stack for Rails apps, but there really is no limit to what we can deploy with this tool on top of capistrano.
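
To give a feel for the pattern (a hypothetical sketch, not Launchpad’s actual internals), a wrapper can shell out to capistrano and persist the output for auditing; the Deployment model and the apps directory here are invented:

require "open3"

# Hypothetical sketch: run a capistrano task and record its output to
# support an audit trail. Deployment and /srv/apps are invented names.
def run_cap_task(app:, stage:, task:, tag:)
  cmd = ["bundle", "exec", "cap", stage, task, "TAG=#{tag}"]
  output, status = Open3.capture2e(*cmd, chdir: "/srv/apps/#{app}")
  Deployment.create!(app: app, stage: stage, task: task, tag: tag,
                     output: output, success: status.success?)
end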

Launchpad is available to internal OIT developers; see me for details.

Better, Faster, More Consistent

It wasn’t long ago that OIT wasted time and energy having DBAs manually execute SQL scripts created by developers.  Then, Sharif Nijim developed the “autodeploy” tool that allows us to run SQL scripts automatically from SVN tags.  Developers have a faster way to run SQL without imposing on DBAs, and DBAs have their valuable time freed up for more important work.  We have never looked back.  I’m hoping Launchpad will do the same with application deployments.  Onward!

AWS Public Sector Symposium

It has been a while since our last update, so now is the perfect time to give a little perspective on where we stand with our Amazon initiatives.

Amazon graciously invited Bob and me to present the Notre Dame AWS story at its fifth annual Public Sector Symposium.


We described how we got started with AWS in the context of migrating www.nd.edu 18 months ago.  Over the past 18 months, we have experienced zero service issues in this, our most reliable and resilient service offering.  Zero.

Since then, as mentioned in our spring 2014 update when Notre Dame hosted the Common Solutions Group in May, we have reviewed progress to date in the context of governance, people and processes, and our initiatives to grow our institutional understanding of how to operate in the cloud.

Bob and I had fun presenting, and were able to give a synopsis of:

  • Conductor Migration:  an ND-engineered content management system that migrated to AWS in March 2014.  You can read all about Conductor’s operational statistics here. The short story is that the migration was a massive success: improved performance, reduced cost, increased use of AWS tools (RDS, ElastiCache), and a sense of joy on the part of the engineers involved in the move.
  • ND Mobile App:  an instantiation of the Kurogo framework providing a revamped mobile experience.  Zero performance issues associated with the rollout.
  • AAA in the Cloud:  replicating our authentication store in Amazon.  We use Route 53 to heartbeat against local resources so that in a failure condition, login.nd.edu redirects traffic originating from off campus to the authentication store in Amazon (a rough sketch of this kind of DNS failover record appears after this list).  This was tested in June and performed as designed.  Since the preponderance of our main academic applications, including Sakai, Box, and Gmail, are hosted off campus, the result of this effort is that we have separated service availability from local border network incidents.  If campus is offline for some reason, off-campus students, faculty, and staff are still able to use the University’s SaaS services.
  • Backups:  We are in the midst of testing Panzura, a cloud storage gateway product. Preliminary testing looks promising – look for an update on this ongoing testing soon.
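
For the curious, a failover record pair of the kind described above could be defined with the AWS SDK for Ruby roughly as follows; the zone ID, health check ID, and target hostnames are placeholders, not our actual configuration:

# Rough sketch of Route 53 DNS failover; all identifiers are placeholders.
require "aws-sdk-route53"

route53 = Aws::Route53::Client.new(region: "us-east-1")
route53.change_resource_record_sets(
  hosted_zone_id: "Z123EXAMPLE",
  change_batch: { changes: [
    # Primary record: served while the on-campus health check passes.
    { action: "UPSERT",
      resource_record_set: {
        name: "login.nd.edu", type: "CNAME", ttl: 60,
        set_identifier: "campus-primary", failover: "PRIMARY",
        health_check_id: "11111111-2222-3333-4444-555555EXAMPLE",
        resource_records: [{ value: "login-campus.example.nd.edu" }] } },
    # Secondary record: Route 53 fails over here when the check fails.
    { action: "UPSERT",
      resource_record_set: {
        name: "login.nd.edu", type: "CNAME", ttl: 60,
        set_identifier: "aws-secondary", failover: "SECONDARY",
        resource_records: [{ value: "login-aws.example.amazonaws.com" }] } }
  ] }
)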

We are absolutely thrilled with the progress being made, and look forward to continuing to push the envelope.  More to come!

Onward!


Calling Oracle Stored Procedures from Ruby with ruby-plsql

Isn’t it nice when something just works?  We are building Ruby on Rails apps on top of Oracle, so we’re using the Oracle Enhanced ActiveRecord adapter on top of the ruby-oci8 driver library.

The ActiveRecord adapter gives us a nice AR wrapper around our existing Oracle schema, which is great, but what about when I want to work with stored procedures or functions?  Turns out the author of this adapter, Raimonds Simanovskis, has a gem just for this called ruby-plsql.

Include the gem in your Gemfile:

gem 'ruby-plsql'

Then write an initializer that hooks it to your existing ActiveRecord connection (config/initializers/plsql.rb):

plsql.activerecord_class = ActiveRecord::Base

After that, calling a procedure is easy.  Oracle return types are automatically cast to Ruby types.  Oracle exceptions are raised as OCIError, which has “code” and “sql” attributes; you can also call the “message” method on the exception to get the full error output.

Here I call an Oracle procedure, idcard.nd_is_valid_pin, using the plsql object provided by the gem:

# Validate the new PIN with the Oracle procedure, then update it.
begin
  ok_pin = plsql.idcard.nd_is_valid_pin(new_pin)
  if ok_pin
    plsql.idcard.update_pin_pr(@info.ndid, params[:old_pin], new_pin)
  else
    raise Errors::InvalidInput
  end
rescue OCIError => e
  # e.message carries the full Oracle error output
  render json: { error: e.message }, status: :unprocessable_entity
end

That’s it!  Nice and easy, and “rsims” is two for two.

ActiveRecord PSA: nil vs RecordNotFound

I’ve been meaning to get back into blogging here, and I think I have been blocked by the fact that many of the posts in my mental backlog are somewhat large in scope.  So here’s a useful bit of ActiveRecord trivia that I just learned.

When no records are found, the “find” method (i.e., Person.find('badinput')) throws ActiveRecord::RecordNotFound.

However, any find_by_* method, such as Person.find_by_netid('badinput'), will return nil.

This was rather confusing as I worked through the error-handling semantics of the Banner API.  I want that exception.  Good news, though:

find_by_netid!('badinput') throws the exception.  The bang changes the behavior, as it often does, though not always in the way you may expect.

TLDR:

2.0.0-p353 :001 > Person.find('badinput')
ActiveRecord::RecordNotFound: Couldn't find Person with ndid=badinput <snip>

2.0.0-p353 :002 > Person.find_by_netid('badinput')
 => nil 

2.0.0-p353 :003 > Person.find_by_netid!('badinput')
ActiveRecord::RecordNotFound: Couldn't find Person with netid = badinput <snip>

So that’s it.  Maybe this will get me back in the habit.  Happy coding!

Why We Test

Despite design principles, despite best intentions, things can go wrong.  This is true even when all of the design parameters are quantifiable, as the occasional spectacular Formula 1 engine failure demonstrates.  What happens when some of the design constraints are not fully understood?  Specifically, consider the following excerpt from Amazon’s page detailing instance specifications:

Instance Family      Instance Type  Processor Arch    vCPU  ECU         Memory (GiB)  Instance Storage (GB)  EBS-optimized Available  Network Performance
Storage optimized    i2.xlarge      64-bit            4     14          30.5          1 x 800 SSD            Yes                      Moderate
Storage optimized    i2.2xlarge     64-bit            8     27          61            2 x 800 SSD            Yes                      High
Storage optimized    i2.4xlarge     64-bit            16    53          122           4 x 800 SSD            Yes                      High
Storage optimized    i2.8xlarge     64-bit            32    104         244           8 x 800 SSD            -                        10 Gigabit*4
Storage optimized    hs1.8xlarge    64-bit            16    35          117           24 x 2,048*3           -                        10 Gigabit*4
Storage optimized    hi1.4xlarge    64-bit            16    35          60.5          2 x 1,024 SSD*2        -                        10 Gigabit*4
Micro instances      t1.micro      32-bit or 64-bit   1     Variable*5  0.615         EBS only               -                        Very Low
10 Gigabit is a measure we can understand.  But what does Moderate mean?  How does it compare to High, or Very Low?  It is possible to find out, but only if we test.

According to Amazon’s page on instance types (emphasis added):

Amazon EC2 allows you to provision a variety of instance types, which provide different combinations of CPU, memory, disk, and networking. Launching new instances and running tests in parallel is easy, and we recommend measuring the performance of applications to identify appropriate instance types and validate application architecture. We also recommend rigorous load/scale testing to ensure that your applications can scale as you intend.

Using LoadUIWeb by SmartBear, we were able to take a given workload and run it on different instance types to gain a better understanding of the performance differences.  Running 100 simultaneous virtual users with a 9 millisecond think time between pages, we ran the same test against an M1.small and an M1.medium.  Here are some screenshots illustrating what we discovered:

M1.small performance test output: [screenshot]

M1.medium performance test output: [screenshot]

Looking at the numbers, we see that the response time of the small is roughly twice that of the medium.  Since neither CPU, memory, nor disk activity was a constraint, this serves as a warning: the quality-of-service limitations placed on different instance types must be taken into consideration when doing instance size analysis.

When in doubt, test it out!

What’s in a User Story?


As I started out in Agile software development as a lowly intern, I was bombarded by jargon. Terms like ‘scrum’, ‘sprint’, ‘velocity’, and ‘TCC’ came flying at me right from the get-go. With hardly any time to build a mental dictionary of these new terms, I found myself in meetings with product owners, a senior developer, and a QA person, all talking about how to implement the current ‘user story’. Having little to no background, I more or less mentally checked out until the meeting was over and we had a list of things that actually needed to be done, which meant I could go do what I knew: write some code.

Being the intern, I never had much insight into where these magical statements of desired functionality came from. All I knew was that when I finished my current work, another developer and I would pull the next user story from the backlog into TCC and schedule a meeting like the one above. I didn’t give any thought to how these nuggets of work came about; I only viewed each as the next problem to tackle. And I will admit that I was quite happy in my ignorance. Work was full of doing the thing I enjoyed, without much distraction.

Skip ahead a few years, and I now find myself on the other side of that coin. Often my days are filled with meetings, attempting to discern what it is a customer really needs, as opposed to what they think they need. This can be a difficult exercise for anyone very technical in nature, because functional users are rarely specific about what they need. I mean, why can’t they just tell me how much remote file storage they need, or exactly what firewall rules they need put into place? It’s their application, after all; they should know how to make it work! And really, I just want to get to coding as soon as possible.

Unfortunately, the world does not work that way, and we need a way to capture what a customer WANTS, not what they technically NEED. This is where the user story comes in. User stories are a great way to capture the function the customer needs to perform. For example: “As a user, I need a place to store uploaded pictures to include in reports later.” It sounds simple, has a straightforward objective, and is easily understood by everyone involved. It gives no indication of where to put the pictures, what format they will be stored in, or how many a single person may store. Those are all details to be sorted out before implementation. Because we aren’t bogged down at every step by all the details, the project team can spend their time figuring out what they want their program to do. Which is great, because who really knows how an entire system is going to be implemented until you get in and start building it anyway?

The important thing about a user story is to capture WHAT the user needs to do and WHY they need to do it. During the TCC, these two parts serve as the focus for implementation, and getting them down in a clear form makes everything easier going forward. I’ve included links below to some other resources that elaborate on this topic. Thank you for reading, and happy requirements gathering!

Agile Atlas | User Stories

User Stories and User Story Examples by Mike Cohn

Project Advantages of User Stories as Requirements