Hello World!

Most good tutorials start with an easy Hello World! program. So have a look:

#include <seqan3/core/debug_stream.hpp> // for debug_stream
 
int main()
{
    seqan3::debug_stream << "Hello World!\n";
}

Note: This is a code snippet. You will see many code snippets in our documentation. Most of them are compilable as-is, but some are only valid in their context, e.g. they depend on other code snippets given before/after the current one or other statements implied by the text. You can copy'n'paste freely from these examples, this implies no copyright-obligations (however distributing SeqAn or an application using it does, see Copyright and Citing).

You may ask, why we do not use std::cout or std::cerr for console output. Actually, for the given text it does not make a difference since seqan3::debug_stream prints to std::cerr as well. However, the debug stream provides convenient output for SeqAn's types as well as widely used data structures (e.g. std::vector), which is especially helpful when you debug or develop your program (that's where the name originates).

Assignment 1: Debug stream

Write a program that creates a std::vector of type int and initialise the vector with a few values. Then print the vector with seqan3::debug_stream. Does your program also work with std::cerr?

Solution

#include <iostream>                          // for std::cerr
#include <vector>                            // for std::vector
#include <seqan3/core/debug_stream.hpp>      // for debug_stream
 
int main()
{
    std::vector<int> vec{-1,0,1};
    seqan3::debug_stream << vec << '\n';    // => [-1,0,1]
    // std::cerr << vec << '\n';            // compiler error: no operator<< for std::vector<int>
}

Note: This is an assignment with solution. You will find assignments in the tutorials to practise the discussed contents. We believe that programming them will help you to memorise better and makes the tutorials more interesting and interactive. The solutions provide the intended use; but often there are multiple ways to solve an assignment, so don't worry too much if your solution is different from ours.

Parse command line arguments

After we have seen the Hello World! program, we want to go a bit further and parse arguments from the command line. The following snippet shows you how this is done in SeqAn. Here the program expects a string argument in the program call and prints it to your terminal.

#include <string>                         // for std::string
#include <seqan3/argument_parser/all.hpp> // for argument_parser
#include <seqan3/core/debug_stream.hpp>   // for debug_stream
 
int main(int argc, char * argv[])
{
    // Create a buffer for the input.
    std::string input{};
    // Initialise the Argument Parser and add an option.
    seqan3::argument_parser parser("My-first-program", argc, argv);
    parser.add_positional_option(input, "Give me text.");
 
    // Parse the given arguments and catch possible errors.
    try
    {
        parser.parse();
    }
    catch (seqan3::argument_parser_error const & ext)
    {
        seqan3::debug_stream << "[PARSER ERROR] " << ext.what() << '\n';
        return 0;
    }
 
    seqan3::debug_stream << "The text was: " << input << "\n";
}

Implementing a program with seqan3::argument_parser requires three steps:

Initialise the seqan3::argument_parser with your program's name and pass the argc and argv variables.
Register (positional) options in the parser object. In this way it knows which options to expect and it can generate the help page for your program. You will learn more about the option types in the Argument Parser Tutorial.
Run the parser. As it throws exceptions on wrong user behaviour, it should be surrounded with a try-catch block.

You will see that the entered text is now in the buffer variable input. The argument parser provides way more functionality than we can show at this point, e.g. validation of arguments and different option types. We refer you to the respective tutorial if you want to know more.

Note: You may have spotted that the blue coloured keywords link you directly to the respective API documentation. This is helpful if you need further information on a function, concept or class. We recommend you to open them in separate browser tabs such that you can easily switch back to the tutorial.

Modules in SeqAn

You have just been introduced to one of the Modules of SeqAn, the Argument Parser. Modules structure the SeqAn library into logical units, as there are for instance alignment, alphabet, argument_parser, io, search and some more. See the API Reference (Modules) section in the navigation column for a complete overview.

Some modules consist of submodules and the module structure is represented by the file hierarchy in the include directory. Whenever you use functions of a module, make sure to include the correct header file. Each directory in the SeqAn sources contains an all.hpp file which includes all the functionality of the respective (sub-) module. For small examples and quick prototyping, you can just include these all.hpp-headers. However, for larger projects we recommend you include only the necessary headers, because this will reduce the compile time measurably.

Note: If you remember the name of a function or class, but don't know which (sub-)module it belongs to, you can enter it in the search bar (top-right).

Read sequence files

Let's look at some functions of the IO module: SeqAn provides fast and easy access to biological file formats. The following code example demonstrates the interface of seqan3::sequence_file_input.

#include <string>
 
#include <seqan3/core/debug_stream.hpp>         // for debug_stream
#include <seqan3/io/sequence_file/input.hpp>    // for sequence_file_input
 
int main ()
{
    std::filesystem::path tmp_dir = std::filesystem::temp_directory_path(); // get the tmp directory
 
    // Initialise a file input object with a FastA file.
    seqan3::sequence_file_input file_in{tmp_dir/"seq.fasta"};
 
    // Retrieve the sequences and ids.
    for (auto & [seq, id, qual] : file_in)
    {
        seqan3::debug_stream << "ID:     " << id << '\n';
        seqan3::debug_stream << "SEQ:    " << seq << '\n';
        seqan3::debug_stream << "Empty Qual." << qual << '\n';  // qual is empty for FastAfiles
    }
 
    return 0;
}

Can you imagine anything easier? After you have initialised the instance with a filename, you can simply step through the file in a for loop and retrieve the fields via structured bindings. The returned fields are SEQ, ID and QUAL to retrieve sequences, ids and qualities, respectively. The latter is empty unless you read FastQ files. The appropriate file format is detected by SeqAn from your filename's suffix.

Here is the content of seq.fasta, so you can try it out!

>seq1
ACGTGATG
>seq2
AGTGATACT

Assignment 2: Read a FastA file

Combine the code from above to read a FastA file and store its sequences in a std::vector of type seqan3::dna5_vector (which is a common DNA sequence type in SeqAn). Use the argument parser for obtaining the filename as command line argument to your program (e.g. call ./myprogram seq.fasta).

Solution

#include <string>                               // for std::string
#include <vector>                               // for std::vector
 
#include <seqan3/alphabet/nucleotide/dna5.hpp>  // for dna5 datastrucutre
#include <seqan3/argument_parser/all.hpp>       // for argument_parser
#include <seqan3/core/debug_stream.hpp>         // for debug_stream
#include <seqan3/io/sequence_file/input.hpp>    // for sequence_file_input
 
int main(int argc, char * argv[])
{
    // Receive the filename as program argument.
    std::string filename{};
    seqan3::argument_parser parser("My-first-program", argc, argv);
    parser.add_positional_option(filename, "The filename of the file to read.");
 
    try
    {
        parser.parse();
    }
    catch (seqan3::argument_parser_error const & ext)
    {
        seqan3::debug_stream << "[PARSER ERROR] " << ext.what() << '\n';
        return 0;
    }
 
    seqan3::debug_stream << "Reading file " << filename << "\n";
 
    // Create the vector to store sequences of type seqan3::dna5_vector.
    std::vector<seqan3::dna5_vector> sequences;
 
    // Iterate through the file and store the sequences.
    seqan3::sequence_file_input file_in{filename};
    for (auto & [ seq, id, qual ] : file_in)
    {
        sequences.push_back(seq);
    }
 
    seqan3::debug_stream << sequences << '\n';
    return 0;
}
 

Note that the same code can also read FastQ files and the qual variable will not be empty then. If you like, try it!

Note: SeqAn3 uses snake_case for almost everything, also class names. Only C++ concepts are named using CamelCase.

Align two sequences

We have two sequences from the file above now – so let us align them. The pairwise sequence alignment is one of the core algorithms in SeqAn and used by several library components and apps. It is strongly optimised for speed and parallel execution while providing exact results and a generic interface.

#include <vector>                                         // for std::vector
#include <tuple>                                          // for std::tie
 
#include <seqan3/alignment/aligned_sequence/all.hpp>      // for alignment stream operator <<
#include <seqan3/alignment/configuration/all.hpp>         // for all configs in the seqan3::align_cfg namespace
#include <seqan3/alignment/pairwise/all.hpp>              // for seqan3::align_pairwise
#include <seqan3/alphabet/nucleotide/dna5.hpp>            // for dna5 datastrucutre
#include <seqan3/argument_parser/all.hpp>                 // for argument_parser
#include <seqan3/core/debug_stream.hpp>                   // for debug_stream
#include <seqan3/io/sequence_file/all.hpp>                // for sequence_file_input and sequence_file_output
#include <seqan3/std/filesystem>                          // for tmp_dir
 
int main()
{
    auto tmp_dir = std::filesystem::temp_directory_path();
    std::string filename{tmp_dir/"seq.fasta"};
    {
        // Create a /tmp/seq.fasta file.
        seqan3::sequence_file_output file_out{filename};
 
        using seqan3::operator""_dna5;
 
        file_out.emplace_back("ACGTGATG"_dna5, std::string{"seq1"});
        file_out.emplace_back("AGTGATACT"_dna5, std::string{"seq2"});
    }
 
    // Initialise a file input object and a vector
    seqan3::sequence_file_input file_in{filename};
    std::vector<seqan3::dna5_vector> sequences;
 
    for (auto & [ seq, id, qual ] : file_in)
    {
        sequences.push_back(seq);
    }
 
    // Call a global pairwise alignment with edit distance and traceback.
    for (auto && res : align_pairwise(std::tie(sequences[0], sequences[1]),
                                      seqan3::align_cfg::method_global{} |
                                      seqan3::align_cfg::edit_scheme |
                                      seqan3::align_cfg::output_alignment{} |
                                      seqan3::align_cfg::output_score{}))
    {
        // Print the resulting score and the alignment.
        seqan3::debug_stream << res.score() << '\n';      // => -4
        seqan3::debug_stream << res.alignment() << '\n';  // =>       0     .    :
                                                          //            ACGTGATG--
                                                          //            | |||||
                                                          //            A-GTGATACT
    }
    std::filesystem::remove(filename);
    return 0;
}

The algorithm returns a range of result objects – which is the reason for the loop here (in this case the range has length 1). Instead of passing a single pair of sequences, we could give a vector of sequence pairs to the algorithm which then executes all alignments in parallel and stores the results in various seqan3::alignment_result objects. The second argument to seqan3::align_pairwise is the configuration which allows you to specify a lot of parameters for the alignment computation, for instance score functions, banded alignment and whether you wish to compute a traceback or not. The configurations have their own namespace seqan3::align_cfg and can be combined via the logical OR operator (|) for building combinations. Check out the alignment tutorial if you want to learn more.

Note: We encourage you to avoid declaring using namespace seqan3;. This has the additional benefit of easily distinguishing between library features and standard C++. The only exception are string literals, where we often use using seqan3::operator""_dna4; for convenience.; We use a lot of Modern C++ in SeqAn so some things might look alien at first, e.g. type templates are used like ordinary types in many situations (no <>). We also always use {} to initialise objects and not () which is only used for function calls. In general the style should be much easier for newcomers.

Now that you reached the end of this first tutorial, you know how SeqAn code looks like and you are able to write some first code fragments. Let's go more into detail with the module-based tutorials!

Difficulty	Easy
Duration	30 min
Prerequisite tutorials	Quick Setup (using CMake)
Recommended reading	Ranges, Concepts

Table of Contents