FASTA-io-legacy {Biostrings}    R Documentation

Legacy functions to read/write FASTA formatted files

Description

readFASTA and writeFASTA read from and write to a FASTA file.

Note that the object returned by readFASTA or passed to writeFASTA is a standard list. For faster and more memory-efficient alternatives that return or accept an XStringSet object, see the read.DNAStringSet function and family.

Usage

readFASTA(file, checkComments=TRUE, strip.descs=TRUE)
writeFASTA(x, file="", desc=NULL, append=FALSE, width=80)

Arguments

file Either a character string naming a file or a connection. If "" (the default for writeFASTA), then the function writes to the standard output connection (the console) unless redirected by sink.
checkComments Whether or not comments (lines beginning with a semicolon) should be found and removed.
strip.descs Whether or not the ">" marking the beginning of each description line should be removed. Note that this argument is new in Biostrings >= 2.8. In previous versions readFASTA kept the ">".
x If desc is not specified (i.e. NULL), a list like the one returned by readFASTA. If desc is specified (see below), x can also be a list-like object with XString elements (for example an XStringSet, XStringViews or BSgenome object) or just a character vector.
desc If NULL (the default), x must be a list like the one returned by readFASTA, and all the sequences in x are written to the file. Otherwise desc must be a character vector, no longer than the number of sequences in x, containing the descriptions of the sequences in x that are to be written to the file.
append TRUE or FALSE. If TRUE output will be appended to file; otherwise, it will overwrite the contents of file. See ?cat for the details.
width The maximum number of letters per line of sequence.
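As an illustration of what the width argument controls, the sketch below wraps a single sequence string into lines of at most width letters using base R only. The helper wrap_sequence is hypothetical and is not part of Biostrings; it only mimics the line layout described above and makes no claim about how writeFASTA is implemented.

```r
# Hypothetical helper (not a Biostrings function): split one sequence string
# into chunks of at most `width` letters, one chunk per output line.
wrap_sequence <- function(s, width = 80) {
  n <- nchar(s)
  if (n == 0) return(character(0))
  starts <- seq(1, n, by = width)            # first letter of each line
  ends <- pmin(starts + width - 1, n)        # last letter of each line
  substring(s, starts, ends)
}

# A 10-letter sequence wrapped at 4 letters per line gives 3 lines.
wrapped <- wrap_sequence("ACGTACGTAC", width = 4)
wrapped
```

Only the last line of a wrapped sequence can be shorter than width.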

Details

FASTA is a simple file format for biological sequence data. A file may contain one or more sequences. Each sequence is preceded by a description line that begins with a >.
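To make the record layout concrete, here is a minimal base-R sketch that groups lines of FASTA text into records. The function parse_fasta_lines is hypothetical and written only for illustration; it is not the readFASTA implementation. It assumes that comment lines start with a semicolon, that sequence lines of one record are simply concatenated, and that each record is represented by components named desc and seq.

```r
# Hypothetical parser for illustration only (not readFASTA itself).
parse_fasta_lines <- function(lines) {
  lines <- lines[nzchar(lines)]              # drop blank lines
  lines <- lines[!startsWith(lines, ";")]    # drop comment lines
  is_desc <- startsWith(lines, ">")          # description lines start a record
  record_id <- cumsum(is_desc)               # record number for every line
  keep <- record_id > 0                      # ignore text before the first ">"
  lines <- lines[keep]
  record_id <- record_id[keep]
  records <- lapply(split(seq_along(lines), record_id), function(idx) {
    list(desc = sub("^>", "", lines[idx[1]]),           # strip the leading ">"
         seq  = paste(lines[idx[-1]], collapse = ""))   # join sequence lines
  })
  unname(records)
}

example_lines <- c(">seq1 first record", "ACGT", "TTGA",
                   ";a comment line", ">seq2", "GGCC")
records <- parse_fasta_lines(example_lines)
str(records)
```

Because the sketch performs no validation of the sequence letters, any non-empty line that is neither a description nor a comment is treated as sequence data.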

FASTA is a widely used format in biology. It is a relatively simple markup. I am not aware of a standard. It might be nice to check to see if the data that were parsed are sequences of some appropriate type, but without a standard that does not seem possible.

There are many other packages that provide similar, but not identical, capabilities. The one in the package seqinr seems most similar, but it separates the biological sequence into single-character strings, which is too inefficient for large problems.

Value

For readFASTA: A list with one element per FASTA record in the file. Each element has two parts: the description of the record (accessed as desc in the Examples below) and a character string containing the biological sequence.
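The sketch below shows one way to work with a list of this shape without reading a file. The list ff is a hand-built stand-in, not actual readFASTA output, and the component name seq for the sequence string is an assumption here; only desc is confirmed by the Examples on this page.

```r
# Hand-built stand-in for a readFASTA result (assumed components: desc, seq).
ff <- list(
  list(desc = "rec1", seq = "ACGT"),
  list(desc = "rec2 reverse complement", seq = "GGCCA")
)

# Pull out all descriptions and the length of every sequence.
descs <- vapply(ff, function(rec) rec$desc, character(1))
seq_lengths <- vapply(ff, function(rec) nchar(rec$seq), integer(1))

# Keep only the records whose description mentions "reverse complement".
ff_rc <- ff[grepl("reverse complement", descs, fixed = TRUE)]
```

Using vapply rather than sapply guarantees a character or integer vector even when the list is empty.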

Author(s)

R. Gentleman, H. Pages. Improvements to writeFASTA by Kasper D. Hansen

See Also

read.DNAStringSet, fasta.info, write.XStringSet, read.table, scan, write.table, BSgenome-class

Examples

  f1 <- system.file("extdata", "someORF.fa", package="Biostrings")
  ff <- readFASTA(f1, strip.descs=TRUE)
  desc <- sapply(ff, function(x) x$desc)
  desc

  ## Keep the "reverse complement" sequences only:
  ff2 <- ff[grep("reverse complement", desc, fixed=TRUE)]

  ## Write them to a FASTA file:
  temp_file <- file.path(tempdir(), "temp.fa")
  writeFASTA(ff2, file=temp_file)

  ## Write only the first 2 sequences to a FASTA file, with modified descriptions:
  writeFASTA(ff2, file=temp_file, desc=c("a", "b"))

  ## Write a genome to a FASTA file:
  library(BSgenome.Celegans.UCSC.ce2)
  writeFASTA(Celegans, file=temp_file, desc=seqnames(Celegans))

[Package Biostrings version 2.18.2 Index]