My classwork from bimm143
Sam Fisher (A18131929)
Functions are the heart of using R. Everything we do involves calling and using functions (from data input, analysis, to results output).
All functions have at least 3 things: a name, input arguments, and a body.
Let’s write a silly wee function to add some numbers:
add <- function(x) {
  x + 1
}
Let’s try it out
add(100)
[1] 101
Will this work with a vector input?
add(c(100, 200, 300))
[1] 101 201 301
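The vector call works because R's arithmetic operators are vectorized: `x + 1` adds 1 to every element of `x`. A minimal sketch, redefining the same one-argument `add()` so the snippet runs by itself (the names `v` and `res` are only illustrative):

```r
# Same one-argument add() as in the notes
add <- function(x) {
  x + 1
}

v <- c(100, 200, 300)
res <- add(v)

# The result has one element per input element
length(res) == length(v)
res
```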
Modify it to be more useful by adding a second argument, y, with a default value:
add <- function(x, y = 1) {
  x + y
}
add(100, 10)
[1] 110
Will this still work?
add(100)
[1] 101
plot(1:10, col="blue", type="b")

log(10, base=10)
[1] 1
Note: Input arguments can be either required or optional. The latter have a fall-back default that is specified in the function definition with an equals sign.
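The difference between the two kinds of argument can be seen by leaving each one out. This is a small sketch using the two-argument `add()` from above; wrapping the failing call in `tryCatch()` is an addition here, used only so the snippet keeps running after the error, and `msg` is an illustrative name.

```r
add <- function(x, y = 1) {
  x + y
}

# y is optional: when omitted it falls back to its default of 1
add(100)

# x is required: calling add() with no arguments raises an error,
# which tryCatch() converts into its message text here
msg <- tryCatch(add(), error = function(e) conditionMessage(e))
msg
```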
All functions in R look like this:
name <- function(arg) {body}
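These parts can also be inspected on an existing function. The sketch below uses base R's `formals()` and `body()` on the `add()` function from earlier; the variable names `arg_names` and `default_y` are just for illustration.

```r
add <- function(x, y = 1) {
  x + y
}

# The arguments, in the order they were declared
arg_names <- names(formals(add))
arg_names

# The default value stored for the optional argument y
default_y <- formals(add)$y
default_y

# The body: the code that runs when the function is called
body(add)
```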
The sample() function in R…
sample(1:10, 4)
[1] 8 10 7 2
Q. Return 12 numbers picked randomly from the input 1:10.
sample(1:10, 12, replace=TRUE)
[1] 5 1 3 3 1 9 3 4 5 7 7 10
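`replace = TRUE` is needed here because 12 draws exceed the 10 available values; without replacement, `sample()` stops with an error. A short sketch of both cases follows. The seed value is arbitrary and not part of the original notes; it is there only so reruns give the same draws.

```r
set.seed(143)  # arbitrary seed (an assumption), only for repeatable draws

# Without replacement, asking for more draws than values fails
too_many <- tryCatch(sample(1:10, 12), error = function(e) "error")
too_many

# With replacement, values may repeat, so 12 draws are possible
draws <- sample(1:10, 12, replace = TRUE)
length(draws)
```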
Q. Write the code to generate a 12-nucleotide-long DNA sequence.
sample(c("A", "T", "G", "C"), 12, replace=TRUE)
[1] "C" "G" "A" "C" "T" "T" "C" "G" "T" "A" "T" "C"
Q. Write a first version of a function called generate_dna() that generates a random DNA sequence of user-specified length n.
generate_DNA <- function(n = 10) {
  sample(c("A", "T", "G", "C"), size = n, replace = TRUE)
}
generate_DNA()
[1] "T" "G" "G" "A" "T" "A" "C" "A" "G" "A"
Q. Modify your function to return a FASTA-like sequence, so that rather than [1] "G" "A" "C" "A" "T" we get "GACAT". We want an actual sequence returned.
generate_DNA <- function(n = 10) {
  # Named bases rather than seq to avoid masking base R's seq()
  bases <- sample(c("A", "T", "C", "G"), size = n, replace = TRUE)
  paste(bases, collapse = "")
}
generate_DNA()
[1] "TGAACAGTGC"
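The key step is the `collapse` argument of `paste()`, which joins the elements of one vector into a single string. Using `sep` instead would leave the vector in separate pieces. A small sketch with a fixed input (the names `bases`, `joined` and `not_joined` are illustrative):

```r
bases <- c("G", "A", "C", "A", "T")

# collapse joins all elements of the vector into one string
joined <- paste(bases, collapse = "")
joined          # "GACAT"
nchar(joined)   # 5 characters in a single string

# sep only separates different arguments, so the vector stays in 5 pieces
not_joined <- paste(bases, sep = "")
length(not_joined)
```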
Q. Give the user an option to return either FASTA-format output or the standard multi-element vector format.
generate_DNA <- function(n = 10, fasta = TRUE) {
  ans <- sample(c("A", "T", "C", "G"), size = n, replace = TRUE)
  if (fasta) {
    # Collapse the vector into a single FASTA-like string
    ans <- paste(ans, collapse = "")
    cat("Hello...")
  } else {
    cat("... is it me you're looking for...")
  }
  return(ans)
}
generate_DNA(10)
Hello...
[1] "TCCGTACACA"
generate_DNA(10, fasta=FALSE)
... is it me you're looking for...
[1] "A" "T" "T" "C" "G" "G" "T" "C" "C" "A"
Q. Write a function called generate_protein() that generates a protein sequence of user-specified length in FASTA-like format.
generate_protein <- function(n = 10, fasta = TRUE) {
  # One-letter codes for the 20 standard amino acids
  amino <- c("A","R","N","D","C","E","Q","G","H","I","L","K","M","F","P","S","T","W","Y","V")
  residues <- sample(amino, size = n, replace = TRUE)
  if (fasta) {
    return(paste(residues, collapse = ""))
  } else {
    return(residues)
  }
}
generate_protein(15)
[1] "EARIAFDKQEKLCQT"
generate_protein(15, fasta=FALSE)
[1] "C" "W" "A" "A" "W" "Y" "R" "H" "Y" "P" "W" "S" "F" "L" "T"
Q. Use your new generate_protein() function to generate one sequence of each length from 6 to 12 amino acids and check whether any of them are unique or already occur in nature (i.e. are found in the NR database at NCBI).
for (i in 6:12) {
  cat(">", i, "\n", sep = "")
  cat(generate_protein(i), "\n")
}
>6
VWRMAV
>7
TPWRMTC
>8
PVRDHCKI
>9
VAHCFRCQT
>10
YDSILCYHGW
>11
FKDETVGGYGM
>12
ICTMPSMVFANV
Identical matches in the NR database were found only for the generated sequences of length 6 and 7. The longer sequences (lengths 8 to 12) had no identical match.
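For the database search, the header and sequence lines printed by the loop could instead be collected and written to a file, rather than copied from the console. The following is only a sketch: it redefines `generate_protein()` in compact form so it runs by itself, writes to a temporary file, and the names `lens`, `seqs`, `fasta_lines` and `out_file` are illustrative assumptions.

```r
# Compact version of generate_protein() so this snippet is self-contained
generate_protein <- function(n = 10, fasta = TRUE) {
  amino <- c("A","R","N","D","C","E","Q","G","H","I",
             "L","K","M","F","P","S","T","W","Y","V")
  residues <- sample(amino, size = n, replace = TRUE)
  if (fasta) paste(residues, collapse = "") else residues
}

lens <- 6:12

# One random sequence per length, returned as a character vector
seqs <- vapply(lens, generate_protein, character(1))

# Interleave header and sequence lines: >6, sequence, >7, sequence, ...
fasta_lines <- as.vector(rbind(paste0(">", lens), seqs))

# Write to a temporary file (hypothetical location for illustration)
out_file <- tempfile(fileext = ".fa")
writeLines(fasta_lines, out_file)
```

Because the sequences are random, each run produces a different file unless a seed is set first with `set.seed()`.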