Git – Version Control and Reproducibility

Welcome.

Everything is fine.

Git & GitHub links/instructions

Since we’ll be using git to turn in assignments for the rest of the semester, it’s worth getting comfortable with it now.

Here’s a nice explanation of git and GitHub that uses the analogy of an author writing a book.

https://blog.red-badger.com/blog/2016/11/29/gitgithub-in-plain-english

When git goes wrong:

http://ohshitgit.com

Resolve a conflict

https://help.github.com/articles/resolving-a-merge-conflict-using-the-command-line/

Turning in assignments – only one per pair/group

  1. Complete the exercise on your local machine. Commit your changes.
  2. Push your repo to your own GitHub repository.
  3. Go to the GitHub website and open your own repository (which is a fork of my original repo).
  4. Use “Pull Request” to turn in your assignment (upper middle of the screen, click “New Pull Request”).
  5. Make sure to type your name and your collaborator’s name into the text box (ex: jones-rivaldi submission).
  6. Click “Create Pull Request”.
  7. Done!
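
Steps 1 and 2 above can be sketched on the command line roughly as follows. This is a minimal sketch rather than the required workflow: the file name `exercise1.sh`, the commit message, the name and email, and the throwaway bare repository standing in for your GitHub fork are all assumptions, chosen so the example runs without a network connection. With a real fork, `origin` would already point at your GitHub repository after cloning.

```shell
# A throwaway bare repository stands in for your GitHub fork (assumption).
workdir=$(mktemp -d)
git init --quiet --bare "$workdir/fork.git"

# Clone it, as you would clone your fork from GitHub.
git clone --quiet "$workdir/fork.git" "$workdir/assignment" 2>/dev/null
cd "$workdir/assignment"

# Identify yourself for this repository only (hypothetical name and email).
git config user.name "Jones Rivaldi"
git config user.email "jones-rivaldi@example.com"

# Step 1: complete the exercise and commit your changes.
echo 'echo "hello"' > exercise1.sh
git add exercise1.sh
git commit --quiet -m "Complete exercise 1"

# Step 2: push the current branch to your own repository.
git push --quiet origin HEAD
```

Steps 3 through 6 happen in the browser, so they have no command-line equivalent here.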

String manipulation with sed and grep

What is string manipulation and why do we care about this?

As biologists, or more generally as people interested in working with data, a lot of what we want to do will require us to manipulate many characters at the same time.

By definition:

  • A character is a data type whose instances hold a single character value (one letter, digit, or symbol).
  • A string is a data type for working with multiple characters as a unit; in many languages it is immutable.

For our purposes, we can consider strings as information that we don’t want to use for numerical calculations. DNA sequences, column or row names, and categorical/qualitative data values will generally be strings. You might want to remove the primers from a lot (like a lot) of sequences at the same time, or you might want to remove whitespace from a dataset you found online.
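
As a rough illustration of the two tasks just mentioned, the sketch below uses `sed` to strip a primer from the start of each sequence and to trim stray whitespace, and `grep` to count matching lines. The primer `GGATCC`, the sample reads, and the function names `strip_primer` and `trim_whitespace` are made up for this example.

```shell
# Remove a primer from the start of every line read on standard input.
# $1 is the primer sequence (hypothetical helper, not a standard command).
strip_primer() {
    sed "s/^$1//"
}

# Delete leading and trailing spaces and tabs from every line.
trim_whitespace() {
    sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//'
}

# Three made-up reads; two of them start with the primer GGATCC.
reads=$(printf 'GGATCCAAATTT\nCCCGGGAAA\nGGATCCTTTCCC\n')

# Count how many reads start with the primer.
printf '%s\n' "$reads" | grep -c '^GGATCC'

# Print the reads with the primer removed.
printf '%s\n' "$reads" | strip_primer GGATCC

# Clean up a value padded with spaces and a tab.
printf '  liver tissue \t\n' | trim_whitespace
```

Anchoring the pattern with `^` matters here: without it, `sed` would also delete the first copy of the primer sequence found in the middle of a read.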

Most of our string manipulation is covered by the links that accompany the for loops material; here are a couple of useful comics for some of those commands, though. ‘awk’ is new, but you’re probably going to run into it during your Google adventures. In some ways it is its own programming language and is very much worth learning; we just didn’t quite have time to fit it into our classwork.

https://www.hackerearth.com/practice/algorithms/string-algorithm/basics-of-string-manipulation/tutorial/

https://pythonforbiologists.com/printing-and-manipulating-text/
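
To give a flavour of ‘awk’, mentioned above, here is a small sketch that treats each line as a row of tab-separated fields. The sample sheet and the function names `liver_samples` and `total_reads` are invented for illustration; we did not cover awk in class, so treat this as a starting point for your own reading.

```shell
# A made-up, tab-separated sample sheet: sample name, tissue, read count.
samples=$(printf 'S1\tliver\t1200\nS2\tkidney\t800\nS3\tliver\t400\n')

# Print the first column (sample name) of rows whose second column is "liver".
liver_samples() {
    awk -F'\t' '$2 == "liver" { print $1 }'
}

# Add up the third column across all rows and print the total at the end.
total_reads() {
    awk -F'\t' '{ sum += $3 } END { print sum }'
}

printf '%s\n' "$samples" | liver_samples
printf '%s\n' "$samples" | total_reads
```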

For loops in bash

A collection of really useful links for bash scripting

For loops:

Here are some links to tutorials I’ve compiled so you can get some extra practice using/crafting for loops. All of these will contain information we haven’t covered yet in addition to the basic for loop.

https://jvns.ca/blog/2017/03/26/bash-quirks/

https://astrobiomike.github.io/bash/for_loops

https://ryanstutorials.net/bash-scripting-tutorial/bash-loops.php

http://tldp.org/LDP/abs/html/loops1.html#EX22
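
Before diving into those tutorials, here is a minimal sketch of the basic pattern: loop over a set of files and run the same command on each. The scratch directory, the FASTA file names and contents, and the function name `count_sequences` are all made up for this example.

```shell
# Work in a scratch directory with three made-up FASTA files.
scratch=$(mktemp -d)
printf '>seq1\nACGT\n>seq2\nGGCC\n' > "$scratch/sampleA.fasta"
printf '>seq1\nTTAA\n' > "$scratch/sampleB.fasta"
printf '>seq1\nACGT\n>seq2\nCCGG\n>seq3\nATAT\n' > "$scratch/sampleC.fasta"

# Report how many sequences each FASTA file in directory $1 holds.
# Header lines in FASTA start with ">", so counting them counts sequences.
count_sequences() {
    for fasta in "$1"/*.fasta; do
        count=$(grep -c '^>' "$fasta")
        echo "$(basename "$fasta"): $count sequences"
    done
}

count_sequences "$scratch"
```

Quoting `"$fasta"` keeps the loop working even if a file name contains spaces.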

Warning about using the output of ‘ls’ as a set for a for loop:

http://mywiki.wooledge.org/ParsingLs
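
One way to see the problem for yourself is the small sketch below, which counts loop iterations both ways in a directory where one file name contains a space. The directory and file names are made up for the demonstration.

```shell
# Scratch directory containing a file name with a space in it (made up).
demo=$(mktemp -d)
touch "$demo/plate 1.csv" "$demo/plate2.csv"

# Fragile: the unquoted output of ls is split on whitespace,
# so "plate 1.csv" is treated as two separate words.
ls_count=0
for f in $(ls "$demo"); do
    ls_count=$((ls_count + 1))
done

# Safer: let the shell expand the glob and quote the variable.
glob_count=0
for f in "$demo"/*.csv; do
    [ -e "$f" ] || continue   # skip the literal pattern if nothing matches
    glob_count=$((glob_count + 1))
done

echo "ls loop saw $ls_count items; glob loop saw $glob_count files"
```

There are only two files, but the `ls` loop runs three times because the space splits one name in two. The glob loop runs once per file.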

More bash goodies:

http://www.kfirlavi.com/blog/2012/11/14/defensive-bash-programming/

https://google.github.io/styleguide/shell.xml

Test your skills!!

https://cmdchallenge.com/

Environment to test out code if you think something weird might be going on with your setup (warning – there might also be something weird with this setup, I haven’t played with it a whole lot).

https://repl.it/languages

Regex Practice

(Comic: www.xkcd.com/208)

Lots of options for practice – choose your favorite!

https://regexr.com/

https://regexone.com/

http://rubular.com/

Regex combined with sed and awk: https://likegeeks.com/regex-tutorial-linux/?epik=0wDgLEvIWHzZ9
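
If you would rather practice in the terminal, here is a short sketch of regular expressions used with `grep -E` and `sed -E`. The sample identifiers and the function names `three_digit_samples` and `swap_parts` are invented for this example.

```shell
# Made-up sample identifiers, one per line.
ids=$(printf 'sample_001\nsample_42\ncontrol_007\nSample_100\n')

# grep -E uses extended regular expressions.
# Keep lines that are exactly "sample_" followed by three digits.
three_digit_samples() {
    grep -E '^sample_[0-9]{3}$'
}

# sed -E with capture groups: turn "name_number" into "number-name".
swap_parts() {
    sed -E 's/^([A-Za-z]+)_([0-9]+)$/\2-\1/'
}

printf '%s\n' "$ids" | three_digit_samples
printf '%s\n' "$ids" | swap_parts
```

Note that the first pattern is case-sensitive and anchored at both ends, so `Sample_100` and `sample_42` are both rejected.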

Regex golf – match a string with the shortest possible expression:

https://alf.nu/RegexGolf

Bash scripting cheatsheet: https://devhints.io/bash
Common/useful bash one-liners: http://www.bashoneliners.com/
Friend’s github page with too much awesome information to put it into any other category – spend some time digging around: https://github.com/crazyhottommy/getting-started-with-genomics-tools-and-resources