rbsXpress - Glossary

5’-UTR

The 5’-untranslated region (5’-UTR) is the part of the mRNA that precedes the protein-coding sequence (CDS), i.e. it extends from the transcriptional start site (TSS) to the nucleotide immediately before the start codon (in polycistronic mRNAs, it also covers the region between two CDSs).

CDS

The protein-coding sequence (CDS) contains the primary sequence information that the ribosome translates into a polypeptide. It starts with a start codon (mainly AUG, in prokaryotes also GUG and CUG) and ends with a stop codon. While not part of the RBS itself, the CDS is known to affect translation initiation, mainly through mRNA secondary structures. RBSs are therefore context-dependent and need to be optimized case by case. More details on sequence features that impact RBS behaviour, such as the CDS, can be found here.

Degenerate RBS/sequence

A degenerate RNA or DNA sequence is one that contains at least one ambiguous/variable nucleotide (a “degenerate nucleotide” or “degenerate base”). Degenerate bases can be designated using the IUPAC nucleotide code. As an example, the degenerate sequence AGN corresponds to one (or a mix) of the sequences AGA, AGC, AGG and AGU.
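The AGN example can be reproduced with a few lines of Python. This is a minimal sketch, not part of rbsXpress: the names `IUPAC_RNA` and `expand_degenerate` are made up for illustration, and the mapping simply follows the IUPAC nucleotide code with T treated as U.

```python
from itertools import product

# IUPAC nucleotide codes mapped to the RNA bases they stand for (T is read as U).
IUPAC_RNA = {
    "A": "A", "C": "C", "G": "G", "U": "U", "T": "U",
    "R": "AG", "Y": "CU", "S": "CG", "W": "AU", "K": "GU", "M": "AC",
    "B": "CGU", "D": "AGU", "H": "ACU", "V": "ACG", "N": "ACGU",
}

def expand_degenerate(seq):
    """Return every unambiguous RNA sequence encoded by a degenerate sequence."""
    options = [IUPAC_RNA[base] for base in seq.upper()]
    return ["".join(combo) for combo in product(*options)]

print(expand_degenerate("AGN"))  # ['AGA', 'AGC', 'AGG', 'AGU']
```

The number of encoded variants is the product of the number of bases allowed at each position, so it grows quickly with every additional degenerate base (NNN already encodes 64 sequences).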

GRASP

GRASP (Generation via Recursive Adaptation of Sequence Populations) is a genetic algorithm that efficiently searches extremely large sequence pools for variants meeting user-defined criteria. For RBS design, it uses deep learning-based rTR prediction to score iteratively diversified populations of candidate 5’-UTRs until they converge to the desired rTR value (i.e. the design objective). Starting from randomly generated seed sequences, variant populations are created through in silico mutation, and mutants are propagated across generations depending on how closely their predicted rTR matches the design objective.

This heuristic process of “in silico evolution” is repeated until several sequence designs matching the objective are found.
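The loop described above can be sketched as a small genetic algorithm. This is an illustrative toy under stated assumptions, not the GRASP implementation: `toy_predict_rtr` is a made-up placeholder scorer (purine fraction scaled to 0 to 100,000) standing in for the deep learning predictor, and the population size, selection scheme, tolerance and sequence length are all hypothetical choices.

```python
import random

BASES = "ACGU"

def toy_predict_rtr(seq):
    """Placeholder scorer (NOT the rbsXpress model): purine fraction scaled to 0..100000."""
    return 100_000 * sum(base in "AG" for base in seq) / len(seq)

def mutate(seq, rng):
    """Replace one randomly chosen position with a randomly chosen base."""
    pos = rng.randrange(len(seq))
    return seq[:pos] + rng.choice(BASES) + seq[pos + 1:]

def evolve(target, length=17, pop_size=50, generations=200,
           tolerance=3_000, wanted=5, seed=0):
    """Evolve random seed sequences until `wanted` distinct designs hit the target."""
    rng = random.Random(seed)
    population = ["".join(rng.choice(BASES) for _ in range(length))
                  for _ in range(pop_size)]
    hits = []
    for _ in range(generations):
        # Rank candidates by how closely the predicted rTR matches the objective.
        population.sort(key=lambda s: abs(toy_predict_rtr(s) - target))
        hits = [s for s in dict.fromkeys(population)
                if abs(toy_predict_rtr(s) - target) <= tolerance]
        if len(hits) >= wanted:
            return hits[:wanted]
        # Propagate the better half and refill the population with their mutants.
        survivors = population[: pop_size // 2]
        offspring = [mutate(rng.choice(survivors), rng)
                     for _ in range(pop_size - len(survivors))]
        population = survivors + offspring
    return hits  # may hold fewer than `wanted` designs if the search did not converge

designs = evolve(target=80_000)
for design in designs:
    print(design, round(toy_predict_rtr(design)))
```

Keeping the better half unchanged means the best candidate never gets worse from one generation to the next, while the mutants supply the diversity needed to find several distinct designs.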

IUPAC nucleotide code

The IUPAC (International Union of Pure and Applied Chemistry) nucleotide code is a standardized system to represent nucleotide sequences with ambiguity. In addition to the single-letter codes for adenine (A), cytosine (C), guanine (G), thymine (T) and uracil (U), it provides codes for variable nucleotides. The full code is given below:

Code	Base(s)	Description
A	A	Adenine
C	C	Cytosine
G	G	Guanine
T	T	Thymine (in DNA)
U	U	Uracil (in RNA)
R	A, G	Purines
Y	C, T/U	Pyrimidines
S	G, C	Strong interactions (3 H-bonds)
W	A, T/U	Weak interactions (2 H-bonds)
K	G, T/U	Keto
M	A, C	Amino
B	C, G, T/U	Not A
D	A, G, T/U	Not C
H	A, C, T/U	Not G
V	A, C, G	Not T/U
N	A, C, G, T/U	Any nucleotide (unspecified)
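The table can also be read in reverse, to find the single code that covers a given set of bases. The sketch below uses this to collapse aligned sequences into one degenerate sequence; the names `BASES_TO_CODE` and `degenerate_consensus` are illustrative and not part of rbsXpress, and U is read as T for simplicity.

```python
# IUPAC codes mapped to the DNA bases they represent (same content as the table).
IUPAC_DNA = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Reverse lookup: set of bases -> IUPAC code.
BASES_TO_CODE = {frozenset(bases): code for code, bases in IUPAC_DNA.items()}

def degenerate_consensus(seqs):
    """Return the degenerate sequence covering all given equal-length sequences."""
    seqs = [s.upper().replace("U", "T") for s in seqs]
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("need at least one sequence, all of equal length")
    # Each alignment column becomes the code for the set of bases seen in it.
    return "".join(BASES_TO_CODE[frozenset(column)] for column in zip(*seqs))

print(degenerate_consensus(["AGA", "AGG"]))         # AGR
print(degenerate_consensus(["ACT", "GCT", "TCT"]))  # DCT
```

Note that the resulting degenerate sequence encodes all combinations of the observed bases, so it may cover more sequences than were put in when several positions vary.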

RBS

The ribosome binding site (RBS) is a sequence upstream of the protein-coding sequence in prokaryotic mRNAs and therefore part of the 5’-UTR. It contains the well-known Shine-Dalgarno (SD) sequence and controls the rate-limiting initiation step of translation, and thus the resulting protein level in bacterial cells. Modifying the RBS sequence (i.e. RBS engineering) combines several attractive features, such as access to orders-of-magnitude changes in protein levels, the relative adjustment of expression within polycistronic mRNAs and, most importantly, the possibility to predict RBS “strength” in silico using RBS sequences as queries. These features make RBSs an excellent target for expression level engineering in Synthetic Biology and Metabolic Engineering. For more information about RBSs and the optimization of multi-protein systems, please have a look here.

RedLibs

RedLibs (short for reduced libraries) is an algorithm developed to generate genetic variant libraries that uniformly span a range of a desired numerical functional/phenotypic property. RedLibs was initially developed for RBS prediction data as input, which is commonly strongly skewed towards weak RBSs (i.e. low rTRs). To reach a better distribution and hence a more efficient search for optimal expression levels, RedLibs generates subsets of RBSs from the skewed input library, which have the following three key characteristics: First, they are encoded by a single degenerate sequence, allowing for facile cloning and experimental implementation of the library with conventional, cheap oligos. Second, they have a user-defined size (i.e. number of RBS variants) smaller than that of the input library, which ensures that they can be screened efficiently even at low experimental throughput. Third, they uniformly span the entire accessible expression level/rTR range, avoiding the skew and redundancy of randomly produced libraries such as the input.

Note that RedLibs accepts not only RBS prediction data but any data composed of pairs of sequences and numerical values as input. More details on RedLibs can be found in this publication, which also provides examples and protocols for RBS library cloning and testing.
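The idea of picking one degenerate sequence of a given size whose variants spread evenly over the value range can be illustrated with a brute-force toy. This is a sketch under stated assumptions, not the RedLibs implementation: it enumerates every degenerate trinucleotide, uses the Kolmogorov-Smirnov distance to a uniform distribution as an assumed quality measure, and runs on made-up, deliberately skewed input values. All function names are hypothetical.

```python
from itertools import product
from math import prod

IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def expand(degenerate):
    """List all unambiguous sequences encoded by a degenerate sequence."""
    return ["".join(c) for c in product(*(IUPAC[b] for b in degenerate))]

def ks_distance_to_uniform(values, lo, hi):
    """Largest gap between the empirical CDF of values and a uniform CDF on [lo, hi]."""
    xs = sorted((v - lo) / (hi - lo) for v in values)
    n = len(xs)
    return max(max(abs((i + 1) / n - x), abs(i / n - x)) for i, x in enumerate(xs))

def best_degenerate_library(data, length, size):
    """Exhaustively pick the degenerate sequence of `size` variants closest to uniform."""
    lo, hi = min(data.values()), max(data.values())
    best = None
    for codes in product(IUPAC, repeat=length):
        if prod(len(IUPAC[c]) for c in codes) != size:
            continue  # library would not have the requested number of variants
        degenerate = "".join(codes)
        members = expand(degenerate)
        if any(m not in data for m in members):
            continue  # skip libraries containing variants without input data
        distance = ks_distance_to_uniform([data[m] for m in members], lo, hi)
        if best is None or distance < best[0]:
            best = (distance, degenerate)
    return best

# Toy input: all 64 trinucleotides with a made-up value skewed towards low numbers.
kmers = ["".join(p) for p in product("ACGT", repeat=3)]
toy_data = {kmer: 100_000 * (i / 63) ** 3 for i, kmer in enumerate(kmers)}

distance, library = best_degenerate_library(toy_data, length=3, size=4)
print(library, expand(library), round(distance, 3))
```

Exhaustive enumeration is only feasible for very short sequences (15 codes per position); it is used here purely to keep the example short.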

rTR (a.u.)

The rTR (relative translation rate; a.u.) is the proxy for the relative strength of RBSs as predicted by rbsXpress. It corresponds to the linear slope of cell-specific protein accumulation within an interval of 290 minutes after induction. The rTR is normalized to a scale ranging from 0 (“very weak”) to 100,000 (“very strong”) a.u. for convenience. More details can be found here.
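As an illustration of what a "linear slope" of an accumulation curve means, the sketch below fits a least-squares line to a time course. The time points and signal values are made up, the function name `least_squares_slope` is hypothetical, and the subsequent normalization to the 0 to 100,000 scale is not shown because its details are not given here.

```python
def least_squares_slope(times, signal):
    """Slope of the best-fit straight line through (time, signal) points."""
    n = len(times)
    mean_t = sum(times) / n
    mean_s = sum(signal) / n
    covariance = sum((t - mean_t) * (s - mean_s) for t, s in zip(times, signal))
    variance = sum((t - mean_t) ** 2 for t in times)
    return covariance / variance

# Made-up example: signal sampled over 290 minutes, rising by 2 units per minute.
times = [0, 58, 116, 174, 232, 290]
signal = [5 + 2 * t for t in times]

slope = least_squares_slope(times, signal)
print(slope)  # close to 2 signal units per minute
```

A steeper slope corresponds to faster protein accumulation per cell and therefore to a stronger RBS.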

SAPIENs

SAPIENs (Sequence-Activity Prediction In Ensemble of Networks) is an ensemble of ten residual neural networks that forms the core prediction model underlying rbsXpress. It was trained on approximately 250,000 experimentally tested RBS sequences. It takes a maximum of 17 nt upstream of the AUG start codon as input and predicts the relative translation rate (rTR) for each query with high accuracy (R²: 0.93, mean absolute error: 0.039). Furthermore, it provides an estimate of the confidence of each prediction. More information on the model as well as the underlying data generation can be found here.

Please note that the current version does not consider sequence features outside the 17 nt upstream of AUG, which will be addressed in newer versions.
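To make the 17 nt input window concrete, the sketch below cuts out the nucleotides directly upstream of a given AUG and turns them into a one-hot matrix. The function names are hypothetical, and one-hot encoding is only an assumption about how such a sequence could be fed to a neural network; the actual input encoding used by SAPIENs is not stated here.

```python
def upstream_window(mrna, start_index, width=17):
    """Return up to `width` nt directly upstream of the AUG at `start_index`."""
    if mrna[start_index:start_index + 3] != "AUG":
        raise ValueError("start_index does not point at an AUG start codon")
    return mrna[max(0, start_index - width):start_index]

def one_hot(seq, alphabet="ACGU"):
    """Encode each nucleotide as a 4-element indicator vector (assumed encoding)."""
    return [[1 if base == letter else 0 for letter in alphabet] for base in seq]

# Example mRNA: a 31 nt 5'-UTR followed by the start of a CDS.
utr = "GCGCGCUUAACUUUAAGAAGGAGAUAUACAU"
mrna = utr + "AUGGCU"

window = upstream_window(mrna, start_index=len(utr))
print(window)              # AAGAAGGAGAUAUACAU
print(len(one_hot(window)))  # 17
```

Everything further upstream than the window, as well as the CDS itself, is ignored in this sketch, in line with the limitation noted above.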

Uncertainty

Besides the rTR, rbsXpress reports an estimate of the uncertainty of each prediction, which ranges from 0 to 100%. The uncertainty is calculated from the deviation within the ensemble of ten neural networks that make up the core predictor of rbsXpress (see also: SAPIENs). It is a secondary parameter that can help the user select RBS sequences for which the model is “most sure” about its predictions (the lower the percentage, the better).
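How a spread within an ensemble can be turned into a percentage is sketched below. The exact formula used by rbsXpress is not given here, so this example assumes one plausible choice: the standard deviation of the ten predictions relative to their mean, capped at 100%. The function name and the example values are made up.

```python
from statistics import mean, stdev

def ensemble_summary(predictions):
    """Return (mean prediction, uncertainty in %) for one query sequence.

    Uncertainty is ASSUMED here to be the relative standard deviation of the
    ensemble members, capped at 100 %; rbsXpress may compute it differently.
    """
    center = mean(predictions)
    spread = stdev(predictions)
    if center == 0:
        uncertainty = 0.0 if spread == 0 else 100.0
    else:
        uncertainty = min(100.0, 100.0 * spread / abs(center))
    return center, uncertainty

# Made-up rTR predictions from ten networks for two different queries.
agreeing = [50100, 49900, 50050, 49950, 50000, 50020, 49980, 50010, 49990, 50000]
scattered = [20000, 80000, 35000, 65000, 50000, 45000, 55000, 30000, 70000, 50000]

print(ensemble_summary(agreeing))
print(ensemble_summary(scattered))
```

Both queries have the same mean prediction, but the second one would be flagged as far less certain because the individual networks disagree.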

Uniformity

The uniformity is a score that indicates the quality of libraries designed by RedLibs. It can reach a maximum of 100%, which corresponds to a library that perfectly matches a uniform target distribution. Generally, libraries with values above 80% can be considered a good match to the uniform target. For more details on how RedLibs evaluates and compares the distribution of different libraries, please refer to the corresponding publication.

Legal Disclaimer

This website is free to all users and licensed under CC BY-ND 4.0.

All tools provided on this website are for synthetic biology research and educational purposes. While efforts have been made to ensure the accuracy and reliability of the tools, they are provided "as is" without warranty of any kind, express or implied, including but not limited to warranties of merchantability, fitness for a particular purpose, or non-infringement.

By using this tool, you acknowledge and agree that: The tools’ outputs are generated based on computational models and algorithms that may have limitations or inaccuracies. The developers, contributors, and distributors of this tool shall not be held liable for any direct, indirect, incidental, consequential, or punitive damages, including but not limited to loss of data, profits, or research opportunities, arising out of or related to your use or inability to use the tool. It is your responsibility to ensure that all use of the tool and any subsequent research or application complies with applicable laws, regulations, and ethical standards. The developers of this tool are not responsible for any misuse or unlawful activity arising from its use.

Data you provide may be collected and stored solely for the purpose of delivering and improving the service, which includes generating statistical insights to enhance user experience. We are committed to safeguarding your data and will not share, sell, or otherwise distribute your information to third parties, except as required by law.