Chapter 1 exercise setup
Exercise 1.8 setup
The installation instructions in the exercise statement appear to be outdated; the code below worked on my machine. Note that when asked whether I would like to update packages from the binary version, I said no. (When I said yes, R gave an error.)
if (!requireNamespace("BiocManager", quietly = TRUE))
  install.packages("BiocManager")
BiocManager::install(c("Biostrings", "BSgenome.Celegans.UCSC.ce2", "BSgenome"))
You can see the various genome data sets available by loading the BSgenome library and typing available.genomes().
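As a rough illustration of that lookup, the sketch below filters the list of available genomes down to the C. elegans packages. It assumes the BSgenome package is installed and that a network connection is available; the variable names genomes and celegans_pkgs are made up for this example, and the guard simply returns an empty result when either assumption fails.

```r
# Sketch: list the BSgenome data packages and keep the C. elegans ones.
# Assumes BSgenome is installed and the Bioconductor repository is reachable.
if (requireNamespace("BSgenome", quietly = TRUE)) {
  # Fall back to an empty vector if the repository cannot be queried
  genomes <- tryCatch(BSgenome::available.genomes(),
                      error = function(e) character(0))
  celegans_pkgs <- grep("Celegans", genomes, value = TRUE)
} else {
  celegans_pkgs <- character(0)
}
print(celegans_pkgs)
```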
Once you have the needed packages installed, you can access the sequence data for this exercise via the following commands.
suppressMessages(library("BSgenome.Celegans.UCSC.ce2"))
Celegans
## Worm genome:
## # organism: Caenorhabditis elegans (Worm)
## # provider: UCSC
## # provider version: ce2
## # release date: Mar. 2004
## # release name: WormBase v. WS120
## # 7 sequences:
## # chrI chrII chrIII chrIV chrV chrX chrM
## # (use 'seqnames()' to see all the sequence names, use the '$' or '[[' operator
## # to access a given sequence)
seqnames(Celegans)
## [1] "chrI" "chrII" "chrIII" "chrIV" "chrV" "chrX" "chrM"
Celegans$chrM
## 13794-letter "DNAString" instance
## seq: CAGTAAATAGTTTAATAAAAATATAGCATTTGGGTT...TATTTATAGATATATACTTTGTATATATCTATATTA
class(Celegans$chrM)
## [1] "DNAString"
## attr(,"package")
## [1] "Biostrings"
length(Celegans$chrM)
## [1] 13794
The Biostrings package provides functions to summarize the sequence. For example:
library("Biostrings")
lfM = letterFrequency(Celegans$chrM, letters=c("A", "C", "G", "T"))
lfM
## A C G T
## 4335 1225 2055 6179
sum(lfM)
## [1] 13794
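The counts above can also be expressed as proportions, which is often the more convenient form when comparing nucleotide usage against a reference distribution. The following is a minimal base-R sketch; it hard-codes the counts printed above instead of loading the genome package, and the names counts, props and gc_content are illustrative only.

```r
# Nucleotide counts for chrM, copied from the letterFrequency() output above
counts <- c(A = 4335, C = 1225, G = 2055, T = 6179)

# Proportion of each nucleotide (the four values sum to 1)
props <- counts / sum(counts)
print(round(props, 3))

# GC content: fraction of bases that are G or C
gc_content <- sum(counts[c("G", "C")]) / sum(counts)
print(gc_content)
```

With the packages loaded, the same calculation should apply directly to the named vector returned by letterFrequency, for example lfM / sum(lfM).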