Grouping-class {IRanges}    R Documentation

Grouping objects

Description

In this man page, we call "grouping" the action of dividing a collection of NO objects into NG groups (some of which may be empty). The Grouping class and subclasses are containers for representing groupings.

The Grouping core API

Let's give a formal description of the Grouping core API:

Groups G_i are indexed from 1 to NG (1 <= i <= NG).

Objects O_j are indexed from 1 to NO (1 <= j <= NO).

Every object must belong to one group and only one.

Given that empty groups are allowed, NG can be greater than NO.

Grouping an empty collection of objects (NO = 0) is supported. In that case, all the groups are empty. Only in that case can NG also be zero (meaning there are no groups).
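The constraints above can be checked on a plain integer vector. The following is a minimal base R sketch, not the IRanges implementation: grp and NG are hypothetical stand-ins for the object-to-group mapping and the number of groups.

```r
## Hypothetical stand-in for a grouping (not an IRanges object):
## grp[j] is the index i of the group that O_j belongs to.
grp <- c(1L, 1L, 4L, 2L, 4L)   # NO = 5 objects
NG  <- 6L                      # groups 3, 5 and 6 are empty
NO  <- length(grp)

## Every object belongs to exactly one group with a valid index.
valid <- !anyNA(grp) && all(grp >= 1L) && all(grp <= NG)

## Empty groups are allowed, so NG can exceed NO.
n_empty <- sum(tabulate(grp, nbins = NG) == 0L)

## With NO = 0, the constraints hold trivially, even when NG = 0.
grp0   <- integer(0)
valid0 <- all(grp0 >= 1L) && all(grp0 <= 0L)
```

Storing NG separately is a choice made for this sketch: empty groups leave no trace in grp, so their number cannot be recovered from the mapping alone.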

If x is a Grouping object:

length(x): Returns the number of groups (NG).

names(x): Returns the names of the groups.

nobj(x): Returns the number of objects (NO). Equivalent to length(togroup(x)).

Going from groups to objects:

x[[i]]: Returns the indices of the objects (the j's) that belong to G_i. The j's are returned in ascending order. This provides the mapping from groups to objects (one-to-many mapping).

grouplength(x, i=NULL): Returns the number of objects in G_i. Works in a vectorized fashion (unlike x[[i]]). grouplength(x) is equivalent to grouplength(x, seq_len(length(x))). If i is not NULL, grouplength(x, i) is equivalent to sapply(i, function(ii) length(x[[ii]])).

members(x, i): Equivalent to x[[i]] if i is a single integer. Otherwise, if i is an integer vector of arbitrary length, it's equivalent to sort(unlist(sapply(i, function(ii) x[[ii]]))).

vmembers(x, L): A version of members that works in a vectorized fashion with respect to the L argument (L must be a list of integer vectors). Returns lapply(L, function(i) members(x, i)).
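The group-to-object accessors above can be mimicked in base R. This is a sketch under stated assumptions: groups is a hypothetical plain list standing in for a Grouping object, and the *_sketch functions are illustrative names, not part of IRanges.

```r
## Hypothetical object-to-group mapping and number of groups.
grp <- c(1L, 1L, 4L, 2L, 4L)
NG  <- 5L

## One list element per group; explicit levels keep the empty groups.
groups <- unname(split(seq_along(grp), factor(grp, levels = seq_len(NG))))

## Number of objects in each requested group (all groups if i is NULL).
grouplength_sketch <- function(groups, i = NULL) {
  if (is.null(i)) i <- seq_along(groups)
  vapply(groups[i], length, integer(1))
}

## Members of several groups, pooled and sorted.
members_sketch <- function(groups, i) {
  sort(as.integer(unlist(groups[i])))
}

## Same as members_sketch, applied to each element of the list L.
vmembers_sketch <- function(groups, L) {
  lapply(L, function(i) members_sketch(groups, i))
}

groups[[4]]                         # 3 5
grouplength_sketch(groups)          # 2 1 0 2 0
members_sketch(groups, c(4L, 1L))   # 1 2 3 5
```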

Going from objects to groups:

togroup(x, j=NULL): Returns the index i of the group that O_j belongs to. This provides the mapping from objects to groups (many-to-one mapping). Works in a vectorized fashion. togroup(x) is equivalent to togroup(x, seq_len(nobj(x))): both return the entire mapping in an integer vector of length NO. If j is not NULL, togroup(x, j) is equivalent to y <- togroup(x); y[j].

tofactor(x): Like togroup, except a factor is formed with the level set defined as seq_len(length(x)).

togrouplength(x, j=NULL): Returns the number of objects that belong to the same group as O_j (including O_j itself). Equivalent to grouplength(x, togroup(x, j)).
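The object-to-group accessors can be sketched in the same way. Again, grp, NG and the *_sketch functions are hypothetical base R stand-ins chosen for illustration, not the IRanges methods themselves.

```r
## Hypothetical object-to-group mapping and number of groups.
grp <- c(1L, 1L, 4L, 2L, 4L)
NG  <- 5L

## Group index of each requested object (all objects if j is NULL).
togroup_sketch <- function(grp, j = NULL) {
  if (is.null(j)) grp else grp[j]
}

## Same mapping as a factor whose levels are seq_len(NG).
tofactor_sketch <- function(grp, NG) {
  factor(grp, levels = seq_len(NG))
}

## Size of the group that each requested object belongs to.
togrouplength_sketch <- function(grp, NG, j = NULL) {
  sizes <- tabulate(grp, nbins = NG)
  sizes[togroup_sketch(grp, j)]
}

togroup_sketch(grp, c(4L, 1L))             # 2 1
togrouplength_sketch(grp, NG)              # 2 2 2 1 2
nlevels(tofactor_sketch(grp, NG))          # 5
```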

Given that length, names and [[ are defined for Grouping objects, those objects can be considered List objects. In particular, as.list works out-of-the-box on them.

One important property of any Grouping object x is that unlist(as.list(x)) is always a permutation of seq_len(nobj(x)). This is a direct consequence of the fact that every object in the grouping belongs to one group and only one.
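The permutation property can be verified on a small example. The sketch below uses a hypothetical plain list (groups) in place of as.list(x), and also shows that the object-to-group mapping can be rebuilt from the list of groups.

```r
## Hypothetical object-to-group mapping and number of groups.
grp <- c(1L, 1L, 4L, 2L, 4L)
NG  <- 5L

## Stand-in for as.list(x): one element per group, empty groups included.
groups <- unname(split(seq_along(grp), factor(grp, levels = seq_len(NG))))

## Flattening the groups visits every object index exactly once.
flat <- unlist(groups)
is_permutation <- identical(sort(flat), seq_along(grp))

## The mapping from objects to groups is recoverable from the groups.
sizes     <- vapply(groups, length, integer(1))
recovered <- integer(length(flat))
recovered[flat] <- rep(seq_along(groups), sizes)
```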

The H2LGrouping and Dups subclasses

[DOCUMENT ME]

The Partitioning subclass

A Partitioning container represents a block-grouping, i.e. a grouping where each group contains objects that are neighbors in the original collection of objects. More formally, a grouping x is a block-grouping if and only if togroup(x) is sorted in non-decreasing order (ties are allowed, so it need not be strictly increasing).

A block-grouping object can also be seen (and manipulated) as a Ranges object where all the ranges are adjacent starting at 1 (i.e. it covers the 1:NO interval with no overlap between the ranges).
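Both views of a block-grouping can be illustrated in base R. This is a sketch only: is_block_grouping is a hypothetical helper, and start, end and width are plain integer vectors rather than the Ranges accessors.

```r
## A grouping is a block-grouping when its object-to-group mapping is
## sorted, ties allowed.
is_block_grouping <- function(grp) !is.unsorted(grp)

is_block_grouping(c(1L, 1L, 2L, 4L, 4L))   # TRUE
is_block_grouping(c(1L, 1L, 4L, 2L, 4L))   # FALSE

## The same block-grouping seen as adjacent ranges covering 1:NO.
grp   <- c(1L, 1L, 2L, 4L, 4L)
NG    <- 4L
width <- tabulate(grp, nbins = NG)   # 2 1 0 2
end   <- cumsum(width)               # 2 3 3 5
start <- end - width + 1L            # 1 3 4 4

## Adjacent: each range starts right after the previous one ends.
adjacent <- identical(start, c(0L, end[-NG]) + 1L)
```

In this sketch the empty group 3 shows up as a zero-width range whose start is one more than its end.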

Note that a Partitioning object is both a particular type of Grouping object and a particular type of Ranges object. Therefore, all the methods defined for Grouping and Ranges objects can also be used on a Partitioning object. See ?Ranges for a description of the Ranges API.

The Partitioning class is virtual with 2 concrete subclasses: PartitioningByEnd (only stores the end of the groups, allowing fast mapping from groups to objects), and PartitioningByWidth (only stores the width of the groups).
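The two storage schemes carry the same information, since ends are the cumulative sums of widths. The base R sketch below illustrates this and why storing the ends makes lookups cheap; members_by_end and togroup_by_end are hypothetical helpers, not IRanges functions, and the use of findInterval is an assumption of this sketch rather than a statement about the package internals.

```r
end <- c(4L, 7L, 7L, 8L, 15L)

## Converting between the two representations.
width <- diff(c(0L, end))            # 4 3 0 1 7
round_trip <- identical(cumsum(width), end)

## Group -> objects: only end[i - 1] and end[i] are needed.
members_by_end <- function(end, i) {
  start <- if (i == 1L) 1L else end[i - 1L] + 1L
  seq_len(end[i] - start + 1L) + start - 1L
}

## Object -> group: first group whose end is >= j (binary search).
togroup_by_end <- function(end, j) {
  findInterval(j - 1L, end) + 1L
}

members_by_end(end, 2L)                      # 5 6 7
members_by_end(end, 3L)                      # integer(0)
togroup_by_end(end, c(1L, 5L, 8L, 15L))      # 1 2 4 5
```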

Constructors

H2LGrouping(high2low=integer()): [DOCUMENT ME]

Dups(high2low=integer()): [DOCUMENT ME]

PartitioningByEnd(end=integer(), names=NULL): Returns the PartitioningByEnd object made of the partitions ending at the values specified by end. end must contain sorted non-negative integer values. If the names argument is non-NULL, it is used to name the partitions.

PartitioningByWidth(width=integer(), names=NULL): Returns the PartitioningByWidth object made of the partitions with the widths specified by width. width must contain non-negative integer values. If the names argument is non-NULL, it is used to name the partitions.

Note that these constructors don't recycle their names argument (to remain consistent with what `names<-` does on standard vectors).
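The argument constraints and the names behavior mentioned above can be illustrated in base R. valid_end and valid_width are hypothetical predicates written for this sketch; they are not the validity methods used by the package. The last lines show what `names<-` does on a standard vector; how the constructors handle a names argument of the wrong length is not described here, so the sketch makes no claim about it.

```r
## Hypothetical predicates mirroring the documented constraints.
valid_end <- function(end) {
  !anyNA(end) && all(end >= 0) && !is.unsorted(end)
}
valid_width <- function(width) {
  !anyNA(width) && all(width >= 0)
}

valid_end(c(4, 7, 7, 8, 15))     # TRUE: ties give empty partitions
valid_end(c(4, 3))               # FALSE: not sorted
valid_width(c(4, 3, 0, 1, 7))    # TRUE

## `names<-` on a standard vector does not recycle a short value.
v <- 1:3
names(v) <- c("a", "b")
names(v)                         # "a" "b" NA
```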

Author(s)

H. Pages and P. Aboyoun

See Also

List-class, Ranges-class, IRanges-class, successiveIRanges, cumsum, diff

Examples

  showClass("Grouping")  # shows (some of) the known subclasses

  ## ---------------------------------------------------------------------
  ## A. H2LGrouping OBJECTS
  ## ---------------------------------------------------------------------
  high2low <- c(NA, NA, 2, 2, NA, NA, NA, 6, NA, 1, 2, NA, 6, NA, NA, 2)
  x <- H2LGrouping(high2low)
  x

  ## The Grouping core API:
  length(x)
  nobj(x)  # same as 'length(x)' for H2LGrouping objects
  x[[1]]
  x[[2]]
  x[[3]]
  x[[4]]
  x[[5]]
  grouplength(x)  # same as 'unname(sapply(x, length))'
  grouplength(x, 5:2)
  members(x, 5:2)  # all the members are put together and sorted
  togroup(x)
  togroup(x, 5:2)
  togrouplength(x)  # same as 'grouplength(x, togroup(x))'
  togrouplength(x, 5:2)

  ## The List API:
  as.list(x)
  sapply(x, length)

  ## ---------------------------------------------------------------------
  ## B. Dups OBJECTS
  ## ---------------------------------------------------------------------
  x_dups <- as(x, "Dups")
  x_dups
  duplicated(x_dups)  # same as 'duplicated(togroup(x_dups))'

  ## The purpose of a Dups object is to describe the groups of duplicated
  ## elements in a vector-like object:
  x <- c(2, 77, 4, 4, 7, 2, 8, 8, 4, 99)
  x_high2low <- high2low(x)
  x_high2low  # same length as 'x'
  x_dups <- Dups(x_high2low)
  x_dups
  togroup(x_dups)
  duplicated(x_dups)
  togrouplength(x_dups)  # frequency for each element
  table(x)

  ## ---------------------------------------------------------------------
  ## C. Partitioning OBJECTS
  ## ---------------------------------------------------------------------
  x <- PartitioningByEnd(end=c(4, 7, 7, 8, 15), names=LETTERS[1:5])
  x  # the 3rd partition is empty

  ## The Grouping core API:
  length(x)
  nobj(x)
  x[[1]]
  x[[2]]
  x[[3]]
  grouplength(x)  # same as 'unname(sapply(x, length))' and 'width(x)'
  togroup(x)
  togrouplength(x)  # same as 'grouplength(x, togroup(x))'
  names(x)

  ## The Ranges core API:
  start(x)
  end(x)
  width(x)

  ## The List API:
  as.list(x)
  sapply(x, length)

  ## Replacing the names:
  names(x)[3] <- "empty partition"
  x

  ## Coercion to an IRanges object:
  as(x, "IRanges")

  ## Other examples:
  PartitioningByEnd(end=c(0, 0, 19), names=LETTERS[1:3])
  PartitioningByEnd()  # no partition
  PartitioningByEnd(end=integer(9))  # all partitions are empty

  ## ---------------------------------------------------------------------
  ## D. RELATIONSHIP BETWEEN Partitioning OBJECTS AND successiveIRanges()
  ## ---------------------------------------------------------------------
  mywidths <- c(4, 3, 0, 1, 7)

  ## The 3 following calls produce the same ranges:
  x1 <- successiveIRanges(mywidths)  # IRanges instance.
  x2 <- PartitioningByEnd(end=cumsum(mywidths))  # PartitioningByEnd instance.
  x3 <- PartitioningByWidth(width=mywidths)  # PartitioningByWidth instance.
  stopifnot(identical(as(x1, "PartitioningByEnd"), x2))
  stopifnot(identical(as(x1, "PartitioningByWidth"), x3))

[Package IRanges version 1.14.4 Index]