The RBS Generator is a tool for the design of ribosome binding sites (RBSs) with user-defined relative translation rates (rTR). It combines predictions of the deep learning model SAPIENs with the genetic algorithm GRASP (Generation via Recursive Adaption of Sequence Populations), which allows highly efficient generative 5’-UTR design in extremely large sequence spaces.
User manual
RBS Generator is based on GRASP, a genetic algorithm that efficiently searches extremely large sequence pools for variants meeting user-defined criteria. For RBS design, it uses SAPIENs rTR predictions to score iteratively mutated populations of candidate 5’-UTRs until they converge to the desired rTR value (i.e., the design objective). Starting from randomly generated seed sequences, populations of 50 sequences are created through in silico mutation, and mutants are propagated across generations depending on how closely their predicted rTR matches the design objective. This heuristic process of “in silico evolution” is repeated until several RBS designs matching the objective are found; these are suggested to the user for experimental implementation.
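The evolutionary loop described above can be sketched as a generic genetic algorithm. This is a minimal illustration under stated assumptions, not GRASP itself: `toy_predict_rtr` is a hypothetical stand-in for SAPIENs (it merely scores the fraction of A/G bases), and the selection scheme (keep the best half, refill it with single-point mutants) as well as the names `evolve` and `mutate` are our own choices.

```python
import random

BASES = "ACGT"


def toy_predict_rtr(seq: str) -> float:
    """Hypothetical stand-in for SAPIENs (NOT the real model).

    Scores the fraction of A/G bases on the 0-100,000 a.u. rTR scale.
    """
    return 100_000 * sum(b in "AG" for b in seq) / len(seq)


def mutate(seq: str, mutable: list, rng: random.Random) -> str:
    """Return a copy of seq with one mutable position changed to another base."""
    chars = list(seq)
    pos = rng.choice(mutable)
    chars[pos] = rng.choice([b for b in BASES if b != chars[pos]])
    return "".join(chars)


def evolve(template: str, mutable: list, target: float,
           generations: int = 30, pop_size: int = 50, seed: int = 0) -> list:
    """Sketch of an evolutionary search toward a target rTR (not GRASP itself)."""
    rng = random.Random(seed)

    def error(s: str) -> float:
        return abs(toy_predict_rtr(s) - target)

    # Seed population: the template with all mutable positions randomized.
    population = []
    for _ in range(pop_size):
        chars = list(template)
        for i in mutable:
            chars[i] = rng.choice(BASES)
        population.append("".join(chars))

    for _ in range(generations):
        population.sort(key=error)                  # best match first
        survivors = population[: pop_size // 2]     # keep the best half
        children = [mutate(rng.choice(survivors), mutable, rng)
                    for _ in range(pop_size - len(survivors))]
        population = survivors + children
    return sorted(population, key=error)


template = "AATATCTTAGCTAAATA"
mutable = list(range(5, 12))  # 0-based indices of the 7 central positions
best = evolve(template, mutable, target=80_000)[0]
print(best, round(toy_predict_rtr(best)))
```

Because the best half always survives, the best error can never get worse from one generation to the next; GRASP's actual selection and mutation rules may well differ.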
Input
First, the design objective must be specified by selecting one of three options: Maximize and Minimize increase or decrease the rTR to the highest or lowest possible level, respectively. With Target rTR, the user must additionally specify a desired target rTR value within the full rTR range (0–100,000 a.u.).
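One way to picture the three objectives is as a single score where higher is always better. The sketch below is an assumption about how such scoring could work, not the tool's internal code; the objective labels `"maximize"`, `"minimize"` and `"target"` and the function `fitness` are hypothetical. The predictions reuse the values from the example output further down this page.

```python
from typing import Optional

RTR_MIN, RTR_MAX = 0.0, 100_000.0  # full rTR range (a.u.) stated in the manual


def fitness(predicted_rtr: float, objective: str,
            target: Optional[float] = None) -> float:
    """Score a prediction so that higher is always better.

    objective: "maximize", "minimize" or "target" (hypothetical labels for
    the Maximize, Minimize and Target rTR options).
    """
    if objective == "maximize":
        return predicted_rtr
    if objective == "minimize":
        return -predicted_rtr
    if objective == "target":
        if target is None or not RTR_MIN <= target <= RTR_MAX:
            raise ValueError("Target rTR must lie within 0-100,000 a.u.")
        return -abs(predicted_rtr - target)
    raise ValueError(f"unknown objective: {objective!r}")


# Rank three predictions against a target of 88,000 a.u.
predictions = {
    "AATATAAGGAGGAAATA": 92723.224,
    "AATATATGGAGGAAATA": 87790.006,
    "AATATGGGAGGAAAATA": 86443.381,
}
best = max(predictions, key=lambda s: fitness(predictions[s], "target", 88_000))
print(best)  # AATATATGGAGGAAATA
```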
Second, sequence constraints are specified that define the mutational freedom and any desired restrictions under which the subsequent optimization takes place. Under Template, the user can specify the 5’-UTR sequence from which the optimization should start (the 17 nts upstream of the start codon). In most cases, this should be the sequence of the parent genetic construct (e.g., a plasmid) that is to be optimized with the designed sequences. Via Mutable positions, the user can indicate which positions in the Template may be mutated by GRASP. Positions are earmarked as mutable with N (full mutational freedom) or any other degenerate base following the IUPAC code.
Example:
| Template: | AATATCTTAGCTAAATA[ATG] |
| Mutable positions: | AATATNNNNNNNAAATA |
→ Result: The RBS Generator will iteratively create sequence populations, allowing mutations only at the Template positions marked with N (here positions 6–12, CTTAGCT). N: A, C, T, or G allowed. The [ATG] start codon must not be added to the query.
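To make the IUPAC rule concrete, the sketch below expands a Mutable positions string into per-position base sets and counts the resulting search space. The function names are hypothetical, and treating plain A/C/G/T entries as fixed positions is an assumption consistent with the example above, where they simply repeat the Template.

```python
from math import prod

# IUPAC nucleotide codes and the DNA bases each one allows.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def allowed_bases(template: str, mutable_positions: str) -> list:
    """Expand a Mutable positions string into the bases allowed per position.

    Plain A/C/G/T entries fix a position to that base (assumption);
    degenerate codes open it up to every base the code stands for.
    """
    if len(template) != len(mutable_positions):
        raise ValueError("Template and Mutable positions must have equal length")
    return [IUPAC[code] for code in mutable_positions.upper()]


def search_space_size(template: str, mutable_positions: str) -> int:
    """Number of distinct sequences reachable under the constraints."""
    return prod(len(bases) for bases in allowed_bases(template, mutable_positions))


print(search_space_size("AATATCTTAGCTAAATA", "AATATNNNNNNNAAATA"))  # 16384 (4**7)
```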
While adjusting Template and Mutable positions will suffice for most purposes, RBS Generator offers the possibility to specify further sequence constraints and search parameters via the Advanced options dropdown. Here, the user can additionally set the maximum number of mutations and the maximum number of consecutive mutations allowed in the target sequence relative to the template (see mouseovers for further information). Further, the Number of top sequences can be adjusted to receive between 1 and 50 (default: 10) of the top solutions (i.e., those best matching the design objective) found across all generations.
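The two mutation limits can be read as a cap on the Hamming distance to the template and a cap on the longest run of adjacent mutated positions. A minimal sketch, assuming that interpretation (the tool's exact definitions are given in its mouseovers); `max_mutations`, `max_consecutive` and all helper names are hypothetical. The candidates are taken from the example output on this page.

```python
def mutation_stats(template: str, candidate: str) -> tuple:
    """Return (total mutations, longest run of consecutive mutations)."""
    if len(template) != len(candidate):
        raise ValueError("sequences must have equal length")
    total = longest = run = 0
    for t, c in zip(template, candidate):
        if t != c:
            total += 1
            run += 1
            longest = max(longest, run)
        else:
            run = 0
    return total, longest


def passes_constraints(template: str, candidate: str,
                       max_mutations: int, max_consecutive: int) -> bool:
    """Check a candidate against both hypothetical Advanced-options limits."""
    total, longest = mutation_stats(template, candidate)
    return total <= max_mutations and longest <= max_consecutive


def top_sequences(scored: list, n: int = 10) -> list:
    """Return the n (sequence, error) pairs with the smallest error."""
    if not 1 <= n <= 50:
        raise ValueError("Number of top sequences must be between 1 and 50")
    return sorted(scored, key=lambda pair: pair[1])[:n]


template = "AATATCTTAGCTAAATA"
print(mutation_stats(template, "AATATATGGAGGAAATA"))  # (6, 5)
print(passes_constraints(template, "AATATGGGAGGAAAATA",
                         max_mutations=5, max_consecutive=3))  # True
```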
Output
As its main output, RBS Generator returns the top list of 5’-UTR designs and their predicted rTRs, i.e., the designs that best match the design objective (for Target rTR, those that most closely approximate the target value). As secondary output, the uncertainty of each prediction is given in percent of the full rTR scale (i.e., 1% uncertainty corresponds to ±1,000 a.u.). This parameter can additionally help users select designs for which the model is “most sure” about its predictions (<3% uncertainty is recommended).
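Converting between the two uncertainty units is a simple proportion of the 100,000 a.u. scale. As a sketch (helper names hypothetical), applied to the three designs from the example output on this page together with the recommended 3% cutoff:

```python
RTR_SCALE = 100_000  # full rTR scale in a.u.


def uncertainty_au(uncertainty_percent: float) -> float:
    """Convert an uncertainty in % of the full rTR scale into ± a.u."""
    return uncertainty_percent * RTR_SCALE / 100


def confident(designs: list, max_percent: float = 3.0) -> list:
    """Keep (sequence, rTR, uncertainty %) tuples below the recommended cutoff."""
    return [d for d in designs if d[2] < max_percent]


designs = [
    ("AATATAAGGAGGAAATA", 92723.224, 0.063),
    ("AATATATGGAGGAAATA", 87790.006, 0.078),
    ("AATATGGGAGGAAAATA", 86443.381, 0.147),
]
for seq, rtr, unc in confident(designs):
    print(f"{seq}  {rtr:,.0f} ± {uncertainty_au(unc):,.0f} a.u.")
```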
Example output (*.txt):
| # sequence | rTR_(a.u.) | uncertainty_(%) |
| AATATAAGGAGGAAATA | 92723.224 | 0.063 |
| AATATATGGAGGAAATA | 87790.006 | 0.078 |
| AATATGGGAGGAAAATA | 86443.381 | 0.147 |
| ... | ... | ... |
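For downstream analysis, the result file can be read into simple records. This sketch assumes the *.txt file has whitespace- or tab-separated columns and a header line starting with `#`, which is inferred from the example only; the actual delimiter may differ. `Design` and `parse_results` are hypothetical names.

```python
from typing import NamedTuple


class Design(NamedTuple):
    sequence: str
    rtr_au: float
    uncertainty_percent: float


def parse_results(text: str) -> list:
    """Parse the result table, assuming whitespace-separated columns and a
    header line starting with '#' (format inferred from the example only)."""
    designs = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the header
        sequence, rtr, uncertainty = line.split()
        designs.append(Design(sequence, float(rtr), float(uncertainty)))
    return designs


example = """# sequence\trTR_(a.u.)\tuncertainty_(%)
AATATAAGGAGGAAATA\t92723.224\t0.063
AATATATGGAGGAAATA\t87790.006\t0.078
AATATGGGAGGAAAATA\t86443.381\t0.147
"""
for d in parse_results(example):
    print(d.sequence, d.rtr_au, d.uncertainty_percent)
```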
Additionally, a graphical output file is generated that shows the optimization process over the generations (line plot), as well as graphical representations of the top solutions detailing their sequences and predicted rTRs (bar plot) and their positional nucleotide composition (logo plot).
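The data behind a frequency logo plot is a per-position nucleotide frequency matrix over the top sequences. A minimal sketch of computing it (the function name is hypothetical, and the tool's logo may instead be scaled by information content), using the three example designs from this page:

```python
from collections import Counter


def position_frequencies(sequences: list) -> list:
    """Per-position A/C/G/T frequencies across equally long sequences."""
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("all sequences must have the same length")
    matrix = []
    for i in range(length):
        counts = Counter(s[i] for s in sequences)
        matrix.append({b: counts[b] / len(sequences) for b in "ACGT"})
    return matrix


top = ["AATATAAGGAGGAAATA", "AATATATGGAGGAAATA", "AATATGGGAGGAAAATA"]
for pos, column in enumerate(position_frequencies(top), start=1):
    print(pos, {b: round(f, 2) for b, f in column.items() if f > 0})
```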
Example graphical output (*.png):
Important considerations:
Via the Mutable positions field, the user defines the size of the search space in which optimal solutions are sought. If only a few positions are earmarked as mutable, the SAPIENs prediction model allows an exhaustive search for the best sequences, which is performed automatically for small sequence spaces (up to 4,096 sequences). In these cases, no heuristic optimization is needed, and correspondingly the line plot described above is not shown. While this yields globally optimal designs within the defined search space, the limited mutational freedom may restrict the accessible range of the rTR scale and lead to cases where the design objective cannot be approximated to a satisfactory degree. If possible, the mutational freedom should be increased in such cases to obtain improved designs.
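The switch between exhaustive and heuristic search can be sketched as follows. Only the 4,096-sequence threshold (equal to 4^6, i.e., six fully degenerate N positions) comes from this manual; the function names and the toy scoring function (counting G bases) are hypothetical stand-ins for the real SAPIENs-based scoring.

```python
from itertools import product
from math import prod

EXHAUSTIVE_LIMIT = 4_096  # largest search space searched exhaustively


def enumerate_designs(allowed: list):
    """Yield every sequence given the allowed bases at each position."""
    for combo in product(*allowed):
        yield "".join(combo)


def choose_strategy(allowed: list) -> str:
    """Pick exhaustive search for small spaces, heuristic search otherwise."""
    size = prod(len(bases) for bases in allowed)
    return "exhaustive" if size <= EXHAUSTIVE_LIMIT else "heuristic"


def exhaustive_best(allowed: list, score, n: int = 10) -> list:
    """Score every sequence and return the n highest-scoring ones."""
    return sorted(enumerate_designs(allowed), key=score, reverse=True)[:n]


allowed = ["A"] * 5 + ["ACGT"] * 6 + ["A"] * 5  # 4**6 = 4,096 sequences
if choose_strategy(allowed) == "exhaustive":
    print(exhaustive_best(allowed, score=lambda s: s.count("G"), n=3))
```

Scoring every sequence guarantees the best design within the space, which is why exhaustive search is only practical while the space stays this small.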
When you use RBS Generator in your published work, please do not forget to cite:
- Höllerer, S., Papaxanthos, L., Gumpinger, A. C., Fischer, K., Beisel, C., Borgwardt, K., Benenson, Y., & Jeschek, M. (2020). Large-scale DNA-based phenotypic recording and deep learning enable highly accurate sequence-function mapping. Nature Communications, 11(1), 3551. https://doi.org/10.1038/s41467-020-17222-4
- For the genetic algorithm: publication coming soon!