This tutorial walks you through small SeqAn programs. It is intended to give you a short overview of what to expect in the other tutorials and how to use this documentation.
Difficulty | Easy |
---|---|
Duration | 30 min |
Prerequisite tutorials | Quick Setup (using CMake) |
Recommended reading | Ranges, Concepts |
Every page in the tutorials begins with this section. It is recommended that you do the "prerequisite tutorials" before the current one. You should also have a look at the links provided in "recommended reading" and maybe keep them open in separate tabs/windows as reference.
These tutorials briefly introduce C++ features that are not well known; however, they do not teach programming in C++! If you know how to program in another language but are not familiar with C++ and/or the significant changes in the language in recent years, we recommend the following resources:
Most good tutorials start with an easy Hello World! program. So have a look:
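A minimal sketch of such a program could look like this. It assumes that SeqAn is set up as described in the Quick Setup tutorial and that seqan3::debug_stream is provided by the header seqan3/core/debug_stream.hpp:

```cpp
#include <seqan3/core/debug_stream.hpp> // provides seqan3::debug_stream

int main()
{
    // debug_stream is used like any other output stream.
    seqan3::debug_stream << "Hello World!\n";
    return 0;
}
```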
You may ask why we do not use std::cout or std::cerr for console output. Actually, for the given text it does not make a difference, since seqan3::debug_stream prints to std::cerr as well. However, the debug stream provides convenient output for SeqAn's types as well as widely used data structures (e.g. std::vector), which is especially helpful when you debug or develop your program (that's where the name originates).
Write a program that creates a std::vector of type int and initialise the vector with a few values. Then print the vector with seqan3::debug_stream. Does your program also work with std::cerr?
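As a hint for the last question: the standard library defines no operator<< for std::vector, so with std::cerr the elements have to be written one by one. The following sketch uses only the standard library. The helper names write_vector and vector_to_string and the bracketed output format are choices made for this illustration; they are not part of SeqAn.

```cpp
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: writes the elements of a vector to any output stream
// (e.g. std::cerr), separated by commas and enclosed in square brackets.
void write_vector(std::ostream & os, std::vector<int> const & values)
{
    os << '[';
    for (std::size_t i = 0; i < values.size(); ++i)
    {
        if (i != 0)
            os << ',';
        os << values[i];
    }
    os << ']';
}

// Hypothetical convenience wrapper that returns the formatted text as a string.
std::string vector_to_string(std::vector<int> const & values)
{
    std::ostringstream buffer{};
    write_vector(buffer, values);
    return buffer.str();
}
```

Calling write_vector(std::cerr, numbers) from main prints the vector, whereas std::cerr << numbers does not compile. With seqan3::debug_stream, the single statement seqan3::debug_stream << numbers is enough.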
After we have seen the Hello World! program, we want to go a bit further and parse arguments from the command line. The following snippet shows you how this is done in SeqAn. Here the program expects a string argument in the program call and prints it to your terminal.
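Implementing a program with seqan3::argument_parser requires three steps: first, initialise the parser with your program's name and the argc and argv variables; second, register the (positional) options that the program expects; third, run the parser, ideally inside a try-catch block, since it reports invalid input by throwing an exception.
You will see that the entered text is now in the buffer variable input. The argument parser provides far more functionality than we can show at this point, e.g. validation of arguments and different option types. We refer you to the respective tutorial if you want to know more.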
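The three steps can be sketched as follows. This is an illustration, not a definitive implementation: the program name and the help text are placeholders, and the exception is caught as std::exception because the names of the parser's own exception types differ between SeqAn releases.

```cpp
#include <exception>
#include <string>

#include <seqan3/argument_parser/all.hpp> // seqan3::argument_parser
#include <seqan3/core/debug_stream.hpp>   // seqan3::debug_stream

int main(int argc, char * argv[])
{
    std::string input{}; // buffer that receives the positional argument

    // Step 1: initialise the parser with a program name, argc and argv.
    seqan3::argument_parser parser{"My-first-program", argc, argv};

    // Step 2: register the expected positional option with a description.
    parser.add_positional_option(input, "Give me text.");

    // Step 3: run the parser; invalid input is reported via an exception.
    try
    {
        parser.parse();
    }
    catch (std::exception const & error) // generic catch, see the note above
    {
        seqan3::debug_stream << "[PARSER ERROR] " << error.what() << '\n';
        return -1;
    }

    seqan3::debug_stream << "The text was: " << input << '\n';
    return 0;
}
```

If the binary is called myprogram, running ./myprogram hello stores hello in input.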
You have just been introduced to one of the modules of SeqAn, the Argument Parser. Modules structure the SeqAn library into logical units, for instance alignment, alphabet, argument_parser, io, search and some more. See the API Reference (Modules) section in the navigation column for a complete overview.
Some modules consist of submodules, and the module structure is represented by the file hierarchy in the include directory. Whenever you use functions of a module, make sure to include the correct header file. Each directory in the SeqAn sources contains an all.hpp file which includes all the functionality of the respective (sub-)module. For small examples and quick prototyping, you can just include these all.hpp headers. However, for larger projects we recommend you include only the necessary headers, because this will reduce the compile time measurably.
Let's look at some functions of the IO module: SeqAn provides fast and easy access to biological file formats. The following code example demonstrates the interface of seqan3::sequence_file_input.
Can you imagine anything easier? After you have initialised the instance with a filename, you can simply step through the file in a for loop and retrieve the fields via structured bindings. The returned fields are SEQ, ID and QUAL to retrieve sequences, ids and qualities, respectively. The latter is empty unless you read FastQ files. The appropriate file format is detected by SeqAn from your filename's suffix.
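A sketch of the loop described above. It assumes that a file named seq.fasta exists in the working directory, that the class is provided by the header seqan3/io/sequence_file/input.hpp, and that the record fields are bound in the order sequence, id, quality:

```cpp
#include <filesystem>

#include <seqan3/core/debug_stream.hpp>      // seqan3::debug_stream
#include <seqan3/io/sequence_file/input.hpp> // seqan3::sequence_file_input

int main()
{
    // The file format (here FASTA) is deduced from the file extension.
    std::filesystem::path const filename{"seq.fasta"};
    seqan3::sequence_file_input file_in{filename};

    // Each record is decomposed into its fields via structured bindings.
    for (auto & [seq, id, qual] : file_in)
    {
        seqan3::debug_stream << "ID:   " << id << '\n';
        seqan3::debug_stream << "SEQ:  " << seq << '\n';
        seqan3::debug_stream << "QUAL: " << qual << '\n'; // empty for FASTA input
    }
    return 0;
}
```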
Here is the content of seq.fasta, so you can try it out! Pass the filename to your program as a command line argument (e.g. call ./myprogram seq.fasta).
Note that the same code can also read FastQ files and the qual variable will not be empty then. If you like, try it!
SeqAn uses snake_case for almost everything, including class names. Only C++ concepts are named using CamelCase.
We have two sequences from the file above now, so let us align them. The pairwise sequence alignment is one of the core algorithms in SeqAn and is used by several library components and apps. It is strongly optimised for speed and parallel execution while providing exact results and a generic interface.
The algorithm returns a range of result objects, which is the reason for the loop here (in this case the range has length 1). Instead of passing a single pair of sequences, we could give a vector of sequence pairs to the algorithm, which then executes all alignments in parallel and stores the results in various seqan3::alignment_result objects. The second argument to seqan3::align_pairwise is the configuration, which allows you to specify a lot of parameters for the alignment computation, for instance score functions, banded alignment and whether you wish to compute a traceback or not. The configurations have their own namespace seqan3::align_cfg and can be combined via the logical OR operator (|) for building combinations. Check out the alignment tutorial if you want to learn more.
We avoid using namespace seqan3; in our examples. This has the additional benefit of easily distinguishing between library features and standard C++. The only exception is string literals, where we often use using seqan3::operator""_dna4; for convenience.
SeqAn relies on modern C++, so class templates are often written like ordinary types (without <>). We also always use {} to initialise objects and not (), which is only used for function calls. In general the style should be much easier for newcomers.
Now that you have reached the end of this first tutorial, you know what SeqAn code looks like and you are able to write some first code fragments. Let's go into more detail with the module-based tutorials!