General notes
This is an aggregation of titbits and useful notes that the developer feels are worth sharing.
Using the NUPACK website
If you compare the test results with the online NUPACK website, you will often find a disparity: ToeholdTools suggests that an RNA activates the toehold switch, but the website disagrees. This is due to a difference in the thermodynamic models used: this package uses NUPACK 4, whereas the NUPACK website uses NUPACK 3.
Note
This information is accurate as of 28 September 2021. However, the developer understands that NUPACK hopes to release a new version of their website integrating NUPACK 4, so this information may be out of date by the time you read it.
However, if you must emulate the website’s behavior, we provide the class thtools.utils.ModelNu3, which subclasses nupack.Model. Most of the ToeholdTools suite accepts a model argument, to which you can pass an instance of this class.
Performance
For performance, thtools.core is compiled via Cython and uses pathos.multiprocessing to distribute work across CPU cores [McKerns et al., 2012].
The majority of the runtime is taken up by NUPACK simulations, over which we have no control. However, we have made a reasonable effort to reduce the time spent analysing the results that NUPACK returns. Future releases may improve on this by internally using GIL-free processing of typed memoryviews with OpenMP.
Tracking progress
Since testing against databases containing thousands of RNAs will take several minutes, we provide a few options for viewing the progress.
First, you can set thtools.core.USE_TIMER to True. This will make the test print from each CPU core, both when it has finished running the NUPACK simulation and when it has finished processing the result. This will not slow down the test at all, but it may not be helpful: since the majority of the time is taken up by NUPACK’s algorithms, there will be little time difference between the first set of printing (after the initial simulations) and the end of the whole test.
Second, you can use the generate() methods of ToeholdTest and CelsiusRangeTest in lieu of run(). This will return a generator that you can iterate through using something like tqdm to track the progress. By itself, that will not be helpful, since there is only one worker process per CPU core by default (and so the progress bar will not update until the very end), but you can change that using the chunks_per_node argument.
Note
For ToeholdTest only, the caveat is that using many small chunks instead of a few large ones reduces performance significantly, since NUPACK cannot cache free energy values across separate simulations.
For example:
import thtools as tt
from tqdm.autonotebook import tqdm

my_fasta = tt.FParser.fromspecies("Acyrthosiphon pisum")  # grab miRNAs from miRBase
my_test = tt.autoconfig(
    ths=(
        "UUAGCCGCUGUCACACGCAC"
        "AGGGAUUUACAAAAAGAGGA"
        "GAGUAAAAUGCUGUGCGUGC"
        "ACCAUAAAACGAACAUAGAC"
    ),
    rbs="AGAGGAGA",
    triggers=my_fasta.seqs,
    names=my_fasta.ids,  # maps each name -> trigger
    const_rna=[],
    set_size=1,
)
my_generator = my_test.generate(max_size=3, n_samples=100, chunks_per_node=10)
for _ in tqdm(my_generator, total=my_fasta.num):
    pass  # iterate through progress bar
result = my_test.result
print(result.prettify(dp=4))
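To build some intuition for why chunks_per_node changes how often a progress bar can move, here is a minimal, library-free sketch. It does not use thtools, NUPACK or multiprocessing, and the way ToeholdTools actually splits its work is not shown in these notes. The functions split_into_chunks and progress_updates are hypothetical helpers written for illustration only, under the assumption that progress can only be reported when a whole chunk finishes.

```python
from math import ceil


def split_into_chunks(items, n_nodes, chunks_per_node=1):
    """Split items into at most n_nodes * chunks_per_node contiguous chunks.

    Hypothetical helper: an assumed chunking scheme, not the thtools one.
    """
    n_chunks = max(1, min(len(items), n_nodes * chunks_per_node))
    size = max(1, ceil(len(items) / n_chunks))
    return [items[i:i + size] for i in range(0, len(items), size)]


def progress_updates(items, n_nodes, chunks_per_node=1):
    """Yield the cumulative number of finished items after each chunk."""
    done = 0
    for chunk in split_into_chunks(items, n_nodes, chunks_per_node):
        done += len(chunk)
        yield done


triggers = [f"rna_{i}" for i in range(1000)]  # stand-in for a trigger database

coarse = list(progress_updates(triggers, n_nodes=4))
fine = list(progress_updates(triggers, n_nodes=4, chunks_per_node=10))

print(len(coarse))  # 4 update points: one per chunk, one chunk per node
print(len(fine))    # 40 update points: ten chunks per node
```

With one chunk per node, the four chunks would run in parallel and finish at roughly the same time, so all four updates would arrive together at the end. Smaller chunks give more frequent updates, at the cost described in the note above for ToeholdTest.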
Naturally, since the generate() methods return generators, not the results, you cannot receive the result the same way as in run(). However, the results of both ToeholdTest and CelsiusRangeTest are stored in the result attribute.
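The general pattern of a generator that reports progress while storing its final result on the object can be sketched in plain Python. ToyTest below is a hypothetical stand-in, not part of thtools. It assumes one yield per trigger and that run() simply exhausts generate(), neither of which is confirmed by these notes.

```python
class ToyTest:
    """Hypothetical class mimicking a generate()/run()/result pattern."""

    def __init__(self, triggers):
        self.triggers = list(triggers)
        self.result = None  # filled in once the generator is exhausted

    def generate(self):
        """Yield once per trigger, then store the scores on self.result."""
        scores = []
        for trigger in self.triggers:
            scores.append(len(trigger))  # placeholder for a real simulation
            yield trigger
        self.result = scores

    def run(self):
        """Exhaust generate() and return the stored result."""
        for _ in self.generate():
            pass
        return self.result


test = ToyTest(["AUGC", "GG", "UUUAAA"])
gen = test.generate()
print(test.result)  # None: generators are lazy, so nothing has run yet
for _ in gen:
    pass  # a progress bar could wrap this loop
print(test.result)  # [4, 2, 6]
```

In this sketch the result attribute is only meaningful after the loop has consumed the whole generator, so it is read after iteration rather than taken from the return value of generate().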