{"id":503,"date":"2016-06-10T18:39:18","date_gmt":"2016-06-10T18:39:18","guid":{"rendered":"http:\/\/blogs.nd.edu\/devops\/?p=503"},"modified":"2016-06-10T18:39:18","modified_gmt":"2016-06-10T18:39:18","slug":"onbase-to-aws-move-the-data","status":"publish","type":"post","link":"https:\/\/sites.nd.edu\/devops\/2016\/06\/10\/onbase-to-aws-move-the-data\/","title":{"rendered":"OnBase to AWS: Move the Data"},"content":{"rendered":"<h1>Over 7 million files?<\/h1>\n<p>As we were pondering moving OnBase, one of the first considerations was\u00a0how to move over five years&#8217; worth of documents from our local data center to our primary data center in AWS. \u00a0The challenge wasn&#8217;t massive in big data terms: 7 million files, 2 terabytes. \u00a0Due to file transfer overhead,\u00a0it is more efficient to\u00a0transfer\u00a0one big file\u00a0instead of\u00a0copying\u00a0millions of tiny files.<\/p>\n<div id=\"attachment_510\" style=\"width: 235px\" class=\"wp-caption alignnone\"><a href=\"http:\/\/blogs.nd.edu\/devops\/files\/2016\/06\/Weldon_Spring_Site_containment_stairway_1.jpg\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-510\" class=\"size-medium wp-image-510\" src=\"http:\/\/blogs.nd.edu\/devops\/files\/2016\/06\/Weldon_Spring_Site_containment_stairway_1-225x300.jpg\" alt=\"By Kbh3rd (Own work) [CC BY 4.0 (http:\/\/creativecommons.org\/licenses\/by\/4.0)], via Wikimedia Commons\" width=\"225\" height=\"300\" srcset=\"https:\/\/sites.nd.edu\/devops\/files\/2016\/06\/Weldon_Spring_Site_containment_stairway_1-225x300.jpg 225w, https:\/\/sites.nd.edu\/devops\/files\/2016\/06\/Weldon_Spring_Site_containment_stairway_1.jpg 256w\" sizes=\"auto, (max-width: 225px) 100vw, 225px\" \/><\/a><p id=\"caption-attachment-510\" class=\"wp-caption-text\">Over 7 million files!<\/p><\/div>\n<div id=\"attachment_504\" style=\"width: 310px\" class=\"wp-caption alignnone\"><a 
href=\"http:\/\/blogs.nd.edu\/devops\/files\/2016\/06\/BigBoulder.jpg\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-504\" class=\"wp-image-504 size-medium\" src=\"http:\/\/blogs.nd.edu\/devops\/files\/2016\/06\/BigBoulder-300x209.jpg\" alt=\"BigBoulder\" width=\"300\" height=\"209\" srcset=\"https:\/\/sites.nd.edu\/devops\/files\/2016\/06\/BigBoulder-300x209.jpg 300w, https:\/\/sites.nd.edu\/devops\/files\/2016\/06\/BigBoulder.jpg 768w, https:\/\/sites.nd.edu\/devops\/files\/2016\/06\/BigBoulder-431x300.jpg 431w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/a><p id=\"caption-attachment-504\" class=\"wp-caption-text\">7 million documents zipped into one large file.<\/p><\/div>\n<p>Since OnBase is a Windows-centric platform, the need to retain Windows file permissions as part of the data transfer was of prime concern. \u00a0Fundamentally, there were two ways to approach this: trickling the data out using multiple robocopy threads, or doing a bulk data transfer.<\/p>\n<h1>A word on CIFS in AWS<\/h1>\n<p>A brief aside on providing <a href=\"https:\/\/technet.microsoft.com\/en-us\/library\/cc939973.aspx\">CIFS<\/a> storage in AWS. 
\u00a0A straightforward way to get CIFS is\u00a0to use\u00a0Windows Server 2012 and <a href=\"https:\/\/technet.microsoft.com\/en-us\/library\/cc753479(v=ws.10).aspx\">Distributed File System<\/a>\u00a0from Microsoft.<\/p>\n<div style=\"width: 648px\" class=\"wp-caption alignnone\"><img loading=\"lazy\" decoding=\"async\" class=\"\" src=\"http:\/\/image.slidesharecdn.com\/stg401-131119164938-phpapp01\/95\/nfs-and-cifs-options-for-aws-stg401-aws-reinvent-2013-20-638.jpg?cb=1434517834\" alt=\"http:\/\/www.slideshare.net\/AmazonWebServices\/nfs-and-cifs-options-for-aws-stg401-aws-reinvent-2013\/20\" width=\"638\" height=\"359\" \/><p class=\"wp-caption-text\">CIFS in AWS<\/p><\/div>\n<p>You could also use a product from a company such\u00a0as <a href=\"http:\/\/www.averesystems.com\/\">Avere<\/a> or <a href=\"http:\/\/panzura.com\/products\/panzura-controllers\/\">Panzura<\/a> to present CIFS. \u00a0These products are storage caching devices that use RAM, <a href=\"https:\/\/aws.amazon.com\/ebs\/\">EBS<\/a>, and <a href=\"https:\/\/aws.amazon.com\/s3\/\">S3<\/a> in a tiered fashion, serving as a\u00a0translation layer between S3 object storage and CIFS. \u00a0Our current configuration makes use of Panzura, striped EBS volumes, and S3.<\/p>\n<h1>So, let&#8217;s move this data<\/h1>\n<div style=\"width: 2570px\" class=\"wp-caption alignnone\"><img loading=\"lazy\" decoding=\"async\" class=\"\" src=\"https:\/\/www.oreablog.com\/wp-content\/uploads\/2016\/05\/Tortoise-and-hare-0141.jpg\" width=\"2560\" height=\"1536\" \/><p class=\"wp-caption-text\">Robocopy versus bulk transfer<\/p><\/div>\n<p>The initial goal was to get the data from its current CIFS system to a CIFS environment in AWS with all relevant metadata in place. 
\u00a0We evaluated a number of options, including:<\/p>\n<ol>\n<li>Zipping up the directory structure and using AWS <a href=\"https:\/\/aws.amazon.com\/importexport\/\">Snowball<\/a>\u00a0for the transfer.<\/li>\n<li>Zipping up the directory structure and using <a href=\"http:\/\/docs.aws.amazon.com\/AmazonS3\/latest\/dev\/mpuoverview.html\">S3 multipart upload<\/a> to pump the data into S3.<\/li>\n<li>Running <a href=\"https:\/\/technet.microsoft.com\/en-us\/library\/cc733145(v=ws.11).aspx\">Robocopy<\/a> to local storage on a virtual machine, using Windows backup\u00a0to produce a single file,\u00a0transmitting that backup file to S3, copying it to a local EBS volume, and finally restoring.<\/li>\n<li>Using NetBackup\u00a0to\u00a0back up to S3 and then restoring to EC2.<\/li>\n<li>Zipping the file structure, gathering metadata with <a href=\"https:\/\/technet.microsoft.com\/en-us\/library\/cc753525(v=ws.11).aspx\">Icacls<\/a>, transmitting to S3, copying from S3 to EBS, and restoring.<\/li>\n<\/ol>\n<p>The robocopy approach would take weeks to transfer all of the data, so we started trickling the data out with robocopy immediately. \u00a0That said, the\u00a0team was interested in seeing if it was possible to compress that data transfer time to fit within an outage weekend.<\/p>\n<h2>Snowball<\/h2>\n<div style=\"width: 410px\" class=\"wp-caption alignnone\"><img loading=\"lazy\" decoding=\"async\" class=\"\" src=\"https:\/\/media.amazonwebservices.com\/blog\/2015\/ie_sb_device_4.png\" width=\"400\" height=\"485\" \/><p class=\"wp-caption-text\">1 PB of capacity&#8230;overkill for 2 TB<\/p><\/div>\n<p>So we tested Snowball. \u00a0The first try\u00a0didn&#8217;t go so well. \u00a0The Snowball was dead on arrival, and we had to get a second one shipped to us. 
\u00a0Ultimately, it worked, but it was overkill for the volume of data we needed to move.<\/p>\n<h2>Zip, transmit, unzip<\/h2>\n<p>We broke down the transfer process into three basic steps:<\/p>\n<ol>\n<li>Prepare and package the data<\/li>\n<li>Transmit the data<\/li>\n<li>Rehydrate the data<\/li>\n<\/ol>\n<p>Transmitting was the easy part. \u00a0We have a 10 Gbps network connection and were able to use multipart upload to pump data to S3 at 6 Gbps, transmitting\u00a0a 200 GB test file\u00a0in under an hour.<\/p>\n<div style=\"width: 510px\" class=\"wp-caption alignnone\"><img loading=\"lazy\" decoding=\"async\" class=\"\" src=\"http:\/\/www.bunnyslippers.com\/blog\/wp-content\/uploads\/2013\/12\/fast-rabbit-run.jpg\" width=\"500\" height=\"337\" \/><p class=\"wp-caption-text\">10 Gbps network connection + S3 multipart upload == speedy transfer<\/p><\/div>\n<h1>Reality Check<\/h1>\n<p>While discussing the different packaging\/rehydration options, we talked a bit more about how OnBase actually works. \u00a0It turns out that it manages files a bit like a rotating log file. \u00a0That is, it writes to a directory, then switches to a new directory and starts writing to that. \u00a0After the files are written, the old directory essentially becomes read-only.<\/p>\n<p>That took the time pressure off the bulk of our data. \u00a0We could robocopy out the bulk of our data at our leisure. 
\u00a0On cutover weekend, we will migrate\u00a0only the directory or directories still being written to.<\/p>\n<p>Problem solved.<\/p>\n<h1>What did we learn?<\/h1>\n<ol>\n<li>We can move data really quickly if we need to.<\/li>\n<li>CIFS on AWS is feasible, but the two aren&#8217;t a match made in heaven.<\/li>\n<li>You really need a comprehensive understanding of how your application works when you plan for any migration.<\/li>\n<li>With a full understanding of what we needed to do, the simple, slow, tortoise approach of trickling data with robocopy met our needs.<\/li>\n<\/ol>\n<p>&nbsp;<\/p>\n<p>Stay tuned for adventures in load balancing!<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Over 7 million files? As we were pondering moving OnBase, one of the first considerations was\u00a0how to move over five years&#8217; worth of documents from our local data center to our primary data center in AWS. \u00a0The challenge wasn&#8217;t massive &hellip; <a href=\"https:\/\/sites.nd.edu\/devops\/2016\/06\/10\/onbase-to-aws-move-the-data\/\">Continue reading <span 
class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1551,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[65088,65095],"tags":[],"class_list":["post-503","post","type-post","status-publish","format-standard","hentry","category-aws","category-cloud-infrastructure"],"_links":{"self":[{"href":"https:\/\/sites.nd.edu\/devops\/wp-json\/wp\/v2\/posts\/503","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/sites.nd.edu\/devops\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/sites.nd.edu\/devops\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/sites.nd.edu\/devops\/wp-json\/wp\/v2\/users\/1551"}],"replies":[{"embeddable":true,"href":"https:\/\/sites.nd.edu\/devops\/wp-json\/wp\/v2\/comments?post=503"}],"version-history":[{"count":10,"href":"https:\/\/sites.nd.edu\/devops\/wp-json\/wp\/v2\/posts\/503\/revisions"}],"predecessor-version":[{"id":516,"href":"https:\/\/sites.nd.edu\/devops\/wp-json\/wp\/v2\/posts\/503\/revisions\/516"}],"wp:attachment":[{"href":"https:\/\/sites.nd.edu\/devops\/wp-json\/wp\/v2\/media?parent=503"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/sites.nd.edu\/devops\/wp-json\/wp\/v2\/categories?post=503"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/sites.nd.edu\/devops\/wp-json\/wp\/v2\/tags?post=503"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}