Chapter 1 exercise setup
Exercise 1.8 setup
The installation instructions in the exercise statement appear to be outdated. The code below worked on my machine. Note that when R asked whether I would like to update packages from the binary version, I said no. (When I said yes, R gave an error.)
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install(c("Biostrings", "BSgenome.Celegans.UCSC.ce2", "BSgenome"))
You can see the various genome data sets available by loading the BSgenome library and calling available.genomes().
Once you have the needed packages installed, you can access the sequence data for this exercise via the following commands.
suppressMessages(library("BSgenome.Celegans.UCSC.ce2"))
Celegans
## Worm genome:
## # organism: Caenorhabditis elegans (Worm)
## # provider: UCSC
## # provider version: ce2
## # release date: Mar. 2004
## # release name: WormBase v. WS120
## # 7 sequences:
## # chrI chrII chrIII chrIV chrV chrX chrM
## # (use 'seqnames()' to see all the sequence names, use the '$' or '[[' operator
## # to access a given sequence)
seqnames(Celegans)
## [1] "chrI" "chrII" "chrIII" "chrIV" "chrV" "chrX" "chrM"
Celegans$chrM
## 13794-letter "DNAString" instance
## seq: CAGTAAATAGTTTAATAAAAATATAGCATTTGGGTT...TATTTATAGATATATACTTTGTATATATCTATATTA
class(Celegans$chrM)
## [1] "DNAString"
## attr(,"package")
## [1] "Biostrings"
length(Celegans$chrM)
## [1] 13794
The Biostrings package provides functions to summarize the sequence. For example:
library("Biostrings")
lfM = letterFrequency(Celegans$chrM, letters=c("A", "C", "G", "T"))
lfM
## A C G T
## 4335 1225 2055 6179
sum(lfM)
## [1] 13794
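The counts are often easier to interpret as proportions. The following is a minimal sketch in base R, not part of the exercise statement: it re-enters the four counts printed above by hand so that it runs without the Bioconductor packages, and the names propM and gcM are my own.

```r
# Counts copied from the letterFrequency() output above (chrM of ce2).
lfM <- c(A = 4335, C = 1225, G = 2055, T = 6179)

# Convert counts to proportions; these sum to 1 by construction.
propM <- lfM / sum(lfM)
round(propM, 3)

# GC content: the fraction of letters that are C or G.
gcM <- unname((lfM["C"] + lfM["G"]) / sum(lfM))
round(gcM, 3)
```

With the packages loaded, the same division works directly on the lfM object computed above, so the hand-entered vector is only a convenience for this sketch.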