Chapter 1 exercise setup

Exercise 1.8 setup

The installation instructions in the exercise statement appear to be outdated. The code below worked on my machine. Note that when asked whether I would like to update packages from the binary version, I said no. (When I said yes, R gave an error.)

if (!requireNamespace("BiocManager", quietly = TRUE))
  install.packages("BiocManager")
BiocManager::install(c("Biostrings", "BSgenome.Celegans.UCSC.ce2", "BSgenome"))

You can see the various genome data sets available by loading the BSgenome library and typing available.genomes().

Once you have the needed packages installed, you can access the sequence data for this exercise via the following commands.

suppressMessages(library("BSgenome.Celegans.UCSC.ce2"))
Celegans
## Worm genome:
## # organism: Caenorhabditis elegans (Worm)
## # provider: UCSC
## # provider version: ce2
## # release date: Mar. 2004
## # release name: WormBase v. WS120
## # 7 sequences:
## #   chrI   chrII  chrIII chrIV  chrV   chrX   chrM                              
## # (use 'seqnames()' to see all the sequence names, use the '$' or '[[' operator
## # to access a given sequence)
seqnames(Celegans)
## [1] "chrI"   "chrII"  "chrIII" "chrIV"  "chrV"   "chrX"   "chrM"
Celegans$chrM
##   13794-letter "DNAString" instance
## seq: CAGTAAATAGTTTAATAAAAATATAGCATTTGGGTT...TATTTATAGATATATACTTTGTATATATCTATATTA
class(Celegans$chrM)
## [1] "DNAString"
## attr(,"package")
## [1] "Biostrings"
length(Celegans$chrM)
## [1] 13794

The Biostrings package provides functions to summarize the sequence. For example:

library("Biostrings")
lfM = letterFrequency(Celegans$chrM, letters=c("A", "C", "G", "T"))
lfM
##    A    C    G    T 
## 4335 1225 2055 6179
sum(lfM)
## [1] 13794
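
The counts above can also be turned into proportions, which are often easier to compare than raw counts. The following is a minimal sketch, not part of the exercise statement: to keep it runnable without the Bioconductor packages, the counts are hard-coded from the output above, and the names props and gc_content are my own. With Biostrings loaded, I believe letterFrequency(Celegans$chrM, letters=c("A", "C", "G", "T"), as.prob=TRUE) returns the proportions directly.

```r
# Nucleotide counts for chrM, copied from the letterFrequency() output above
# (hard-coded here as an assumption so the sketch runs in base R).
lfM <- c(A = 4335, C = 1225, G = 2055, T = 6179)

# Convert counts to proportions; these sum to 1 by construction.
props <- lfM / sum(lfM)
print(round(props, 3))

# GC content: the fraction of the sequence that is either G or C.
gc_content <- unname(props["G"] + props["C"])
print(gc_content)
```

Dividing by sum(lfM) rather than by length(Celegans$chrM) keeps the proportions summing to 1 even if a sequence contained letters other than A, C, G, and T; for chrM the two totals agree (13794).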
Bailey Fosdick
Assistant Professor of Statistics

My research interests include the development of statistical methods for analyzing network data, with particular attention to applications in ecology and the social sciences.
