October 2018
This session will be a recap of what we saw in practical 1 and practical 2. You can also use this session to revisit the parts of the last practicals that are still not entirely clear.
REMEMBER: The important thing is that you understand what you are doing. It is better to understand a few exercises well than to finish all of them without entirely knowing what is going on.
To get back into gear in terms of thinking about programming, we’ll start with a quick recap exercise on the material from the past few weeks:
Consider the following code:
answer <- rep(x = c(42, 24), times = 42)
What would be the outcome of the code mean(answer)?
What is the position of antilope in the vector c("cameleopard", "eop4a", "kiloparsec", "antilope")? In R, there are a few ways of getting a result. Find three ways to answer this question.
Write a for loop that stores the square roots of the numbers 16 to 49 in a vector called my_square_roots, instead of just printing the results. What's the value of the 3rd iteration? What is the sum of the square roots of the numbers 16 to 49?
Write a for loop that calculates the first 20 numbers in a Fibonacci sequence starting from 1. Hint - we cannot use positions equal to or lower than 0 in a vector. Start with a vector containing the first 2 numbers of the sequence (1 and 1 in this case) and run through a loop starting at position 3 of your sequence to calculate the rest.
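The hint above describes a general loop pattern. First create the vector, then fill in the first two positions by hand, then let the loop compute each later position from the ones before it. The sketch below applies that pattern to a made-up recurrence instead of the Fibonacci rule: each term is the previous term plus twice the one before it. The names n_terms and my_sequence are hypothetical. Writing each result into position i of a pre-allocated vector is also the general way to store loop results instead of printing them.

```r
# Illustration only: a made-up recurrence, NOT the Fibonacci rule
# a[i] = a[i - 1] + 2 * a[i - 2], starting from 1 and 1
n_terms <- 8
my_sequence <- numeric(n_terms)      # pre-allocate a vector of n_terms zeros
my_sequence[1:2] <- c(1, 1)          # the first two terms are given by hand
for (i in 3:n_terms) {
  # each new term is built from the two positions before it
  my_sequence[i] <- my_sequence[i - 1] + 2 * my_sequence[i - 2]
}
my_sequence   # 1 1 3 5 11 21 43 85
```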
Write a function that calculates the first x numbers of a Fibonacci sequence starting from number y, where x and y are inputs to the function. Hint - The starting number y will be the first two positions of your resulting vector.
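To turn a loop into a function, the values that change become arguments. The sketch below uses a hypothetical function, my_recurrence(), built on a made-up recurrence rather than the Fibonacci rule: each term is the previous term plus twice the one before it. Its two inputs are the number of terms and the starting number. As the hint suggests, the starting number fills the first two positions.

```r
# Hypothetical function illustrating a loop with two inputs;
# the rule a[i] = a[i - 1] + 2 * a[i - 2] is made up, NOT Fibonacci
my_recurrence <- function(n_terms, start) {
  if (n_terms <= 2) {
    return(rep(start, n_terms))    # nothing to calculate beyond the start values
  }
  result <- numeric(n_terms)
  result[1:2] <- start             # the starting number fills positions 1 and 2
  for (i in 3:n_terms) {
    result[i] <- result[i - 1] + 2 * result[i - 2]
  }
  result
}
my_recurrence(n_terms = 6, start = 1)  # 1 1 3 5 11 21
my_recurrence(n_terms = 6, start = 2)  # 2 2 6 10 22 42
```

The check at the top avoids a common R pitfall. With n_terms = 2, the expression 3:n_terms evaluates to c(3, 2), so the loop would run backwards instead of being skipped.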
Palindromes are arrangements of words or letters which read the same way whether you read them backwards or forwards, such as the phrase ‘Never odd or even’. In molecular biology, many restriction enzyme sites are palindromic.
Before starting to code, think about the steps that you would need to go through in order to judge if something is a palindrome or not. Write these steps as comments in an R script window, then think about how you can tell the computer to execute those steps. Only start writing the code when you have a plan of what you want to do. Don’t be afraid to test lines independently in the console and to use easy test cases where you know the answer in order to check that your function works.
The bottom section of an R help sheet normally has examples of how a command can be used. Sometimes one of these examples will be a way to solve the problem that you are currently working on. The strsplit help sheet is particularly interesting in relation to this question.
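Following the advice above to plan the steps as comments first, here is one possible breakdown for the example phrase ‘Never odd or even’. It uses tolower(), gsub(), strsplit(), rev() and paste(). This is a sketch that assumes spaces are the only characters to ignore; punctuation would need extra handling.

```r
phrase <- "Never odd or even"
# Step 1: standardise the case and remove spaces so that only letters are compared
letters_only <- gsub(" ", "", tolower(phrase))
# Step 2: split into single characters (strsplit returns a list, so take [[1]])
chars <- strsplit(letters_only, split = "")[[1]]
# Step 3: reverse the characters and glue them back into one string
reversed <- paste(rev(chars), collapse = "")
# Step 4: a palindrome reads the same forwards and backwards
reversed == letters_only   # TRUE
```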
5’ ACCTAGGT 3’
||||||||
3’ TGGATCCA 5’
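The lower strand in the diagram is the complement of the upper one. Reading it 5’ to 3’ means reversing it. The minimal sketch below checks the example this way. It assumes an upper-case sequence containing only A, C, G and T, and uses chartr() to swap each base for its partner.

```r
dna <- "ACCTAGGT"
# Swap each base for its complementary base: A<->T, C<->G
complement <- chartr(old = "ACGT", new = "TGCA", x = dna)   # "TGGATCCA"
# Reverse the complement so that it reads 5' to 3'
rev_comp <- paste(rev(strsplit(complement, split = "")[[1]]), collapse = "")
rev_comp          # "ACCTAGGT"
rev_comp == dna   # TRUE: the sequence is its own reverse complement
```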
Protein-coding regions in the genome can be predicted by detecting open reading frames. An open reading frame normally begins with the start codon ‘ATG’ and ends at one of three possible stop codons, ‘TGA’, ‘TAA’ and ‘TAG’. The sequence in between these two points is arranged in 3-base codons.
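Because an open reading frame is read in 3-base codons, it can help to split a sequence into codons when testing ideas. The sketch below does this for the short example sequence ATGGATTTTTAG using substring(), which accepts vectors of start and end positions.

```r
dna <- "ATGGATTTTTAG"
n <- nchar(dna)
codon_starts <- seq(from = 1, to = n - 2, by = 3)   # 1 4 7 10
codons <- substring(dna, first = codon_starts, last = codon_starts + 2)
codons   # "ATG" "GAT" "TTT" "TAG"
```

If the length is not a multiple of 3, any leftover bases at the end are not included in a codon.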
Hint - an open reading frame should always start with the start codon 'ATG' and end with one of the stop codons. Additionally, the length of the entire sequence should be a multiple of 3. If any of these features are not met, the sequence is not a functional open reading frame.
ATGGATTTTTAG
ATGGATTTTCTAG
CTAATGGATTTTTGAAT
atgctaaactaa
TCGATTAA
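The three checks in the hint translate directly into R. The sketch below wraps them in a hypothetical function, looks_like_orf(). It converts the input with toupper() first, so a lower-case sequence such as atgctaaactaa is treated like its upper-case version. Like the hint, it does not look for stop codons in the middle of the sequence.

```r
# Hypothetical helper: applies the three checks from the hint to one sequence
looks_like_orf <- function(dna) {
  dna <- toupper(dna)                   # ignore letter case
  n <- nchar(dna)
  if (n < 6) return(FALSE)              # too short to hold a start and a stop codon
  starts_ok <- substr(dna, 1, 3) == "ATG"
  ends_ok   <- substr(dna, n - 2, n) %in% c("TGA", "TAA", "TAG")
  length_ok <- n %% 3 == 0
  starts_ok && ends_ok && length_ok
}

test_seqs <- c("ATGGATTTTTAG", "ATGGATTTTCTAG", "CTAATGGATTTTTGAAT",
               "atgctaaactaa", "TCGATTAA")
sapply(test_seqs, looks_like_orf)   # TRUE FALSE FALSE TRUE FALSE (named by sequence)
```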
"forward_only"
if there is an open reading frame only on the forward strand, "reverse_only"
if only on the reverse, "both"
if in both or "none" if in neither.Run the following line of code to import the butterfly_sample
and the butterfly_reference
data frames:
butterfly_sam_url <- "http://wurmlab.github.io/SBC361-programming-in-R/butterfly_sample.csv"
butterfly_sample <- read.csv(butterfly_sam_url, header = TRUE)
butterfly_ref_url <- "http://wurmlab.github.io/SBC361-programming-in-R/butterfly_reference.csv"
butterfly_reference <- read.csv(butterfly_ref_url, header = TRUE)
The butterfly_reference data frame contains the species name and the common name of a number of butterflies.
The butterfly_sample data frame contains information on butterflies caught in sweep netting surveys in two locations under different pesticide treatments (locations A and B). This data was collected by multiple people, who have recorded the common names of the species that they encountered (without using a standard letter case). In order to be able to compare the diversity between the two different sites, you will need to standardise the names.
Write a function that standardises the names in butterfly_sample.
TIP: There are several ways to do this. Remember that R is case sensitive, so you will need to account for case differences in your function. grep and gsub both allow you to set an ignore.case = TRUE option. Alternatively, you could use the R commands toupper() and tolower(). Use the help pages to see how these work. You can access them by typing a question mark before the command, e.g. ?toupper.
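The sketch below illustrates the two approaches in the tip. It uses a made-up two-row reference table and a few inconsistently cased observations. The column names species and common are assumptions and may not match those of butterfly_reference.

```r
# Made-up mini reference table; the real butterfly_reference may use different column names
reference <- data.frame(
  species = c("Pieris rapae", "Vanessa atalanta"),
  common  = c("Small White", "Red Admiral"),
  stringsAsFactors = FALSE
)
observed <- c("small white", "RED ADMIRAL", "Small White")

# Option 1: lower-case both sides, then look up each observed name in the reference
idx <- match(tolower(observed), tolower(reference$common))
latin_names <- reference$species[idx]
latin_names   # "Pieris rapae" "Vanessa atalanta" "Pieris rapae"

# Option 2: gsub() with ignore.case = TRUE replaces one name whatever its case
gsub("small white", "Pieris rapae", observed, ignore.case = TRUE)
# "Pieris rapae" "RED ADMIRAL" "Pieris rapae"
```

Option 1 handles every name in a single lookup. Option 2 needs one gsub() call per species.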
This question is an extension of question Q5. Again, we will analyse a data frame containing the species of butterflies observed in two locations (A and B). This time, however, some of the people who did the sampling recorded the common name of the species, while others recorded the Latin name (all with inconsistent letter case).
butterfly_sam_bonus_url <- "http://wurmlab.github.io/SBC361-programming-in-R/butterfly_sample_bonus.csv"
butterfly_sample_bonus <- read.csv(butterfly_sam_bonus_url, header = TRUE)
One of the difficulties of this exercise is that you will have to perform a different process depending on whether the sample already has its Latin name or not. You may find if statements helpful (you will have to look these up). Alternatively, you may want to subset the data into two groups (where the transformation is either from Latin to Latin or from common to Latin), and do the transformation independently on each.
This question is not easy! But it is typical of the sort of thing researchers do from day to day and a very good test of all the things you have learned this year!
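One way to use an if statement here is to handle one name at a time. If the lower-cased name is already one of the Latin names, return the correctly cased Latin name; otherwise, look it up among the common names. The sketch below assumes a made-up reference table with columns species and common, and a hypothetical helper, to_latin(). sapply() then applies the helper to every name.

```r
# Made-up reference table; column names are assumptions
reference <- data.frame(
  species = c("Pieris rapae", "Vanessa atalanta"),
  common  = c("Small White", "Red Admiral"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: converts ONE name (Latin or common, any case) to its Latin name
to_latin <- function(name, reference) {
  name_lower <- tolower(name)
  if (name_lower %in% tolower(reference$species)) {
    # Already Latin: return the correctly cased version from the reference
    reference$species[match(name_lower, tolower(reference$species))]
  } else {
    # Common name: look up the matching Latin name (NA if it is not in the reference)
    reference$species[match(name_lower, tolower(reference$common))]
  }
}

mixed <- c("PIERIS RAPAE", "red admiral", "vanessa ATALANTA")
standardised <- sapply(mixed, to_latin, reference = reference, USE.NAMES = FALSE)
standardised   # "Pieris rapae" "Vanessa atalanta" "Vanessa atalanta"
```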
Write a function that plays rock, scissors, paper (rock_scissors_paper(your_play)).
NOTE: You will need to use if/else statements in R. Have a look online to see how they work.
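The general shape of an if / else if / else chain in R is shown below. To illustrate only the syntax, it uses a hypothetical function, sign_label(), that is unrelated to the game. Each condition must be a single TRUE or FALSE value. For whole vectors, ifelse() is the vectorised alternative.

```r
# Hypothetical example: only the if / else if / else syntax matters here
sign_label <- function(x) {
  if (x < 0) {
    "negative"
  } else if (x == 0) {
    "zero"
  } else {
    "positive"
  }
}
sign_label(-3)   # "negative"
sign_label(0)    # "zero"
sign_label(5)    # "positive"
```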