JOBIM 2025 - GrAuFlow: a Snakemake workflow for pangenome graph augmentation using short-read data

Poster presented at JOBIM 2025 as part of Antoine Malet’s apprenticeship within the GraPanPhy project (SPE, INRAE), focusing on the establishment of genome-wide graphs for the analysis and monitoring of plant fungal pathogen populations.

Background

Pangenome graphs are gaining popularity in genomic analysis because they address the bias introduced by using a single reference genome in population variant analyses. However, as new sequencing data are constantly being produced, these graphs must be updated to incorporate new genomic resources. When new fully sequenced genomes become available, reconstructing the graph is often the most convenient method. For short sequences, such as those from amplicon sequencing, augmenting the graph may be more straightforward, as only a small portion of the graph is modified. In this study, we are interested in augmenting a graph with fragmented genomes assembled from short reads. These data represent a valuable resource of genetic diversity that is not currently exploited in graphs, for which the use of telomere-to-telomere (T2T) genomes is recommended.

Results

In this context, we are developing a workflow called GrAuFlow (Graph Augmentation Workflow) using the Snakemake workflow manager [1]. First, GrAuFlow assembles Illumina short-read data with the SPAdes assembly toolkit [2], retaining only contigs that pass stringent quality filters. The contigs are then split into long-read-like sequences and mapped onto the graph using different tools: Palss [3], GraphAligner [4] and SVarp [5], before graph augmentation with vg augment [6]. GrAuFlow then extracts structural variants (SVs) from each strategy and retains only those that are well supported and exceed a minimum length. Finally, the SVs are compared across strategies, and the graph is modified only with variants that are consistent across all graph augmentation tools. To test our approach, we use Zymoseptoria tritici, a fungal pathogen responsible for septoria tritici blotch of wheat.
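The contig fragmentation and the cross-tool SV comparison described above can be illustrated with a minimal Python sketch. It is not GrAuFlow's actual code: the window length, overlap, minimum SV length, support threshold, position tolerance and length ratio are all assumed values, and the `SV` record, `fragment_contig`, `same_event` and `consensus_svs` are hypothetical names introduced here for illustration. How SV calls are extracted from each tool's output is not covered.

```python
from dataclasses import dataclass
from typing import Dict, Iterator, List, Tuple


def fragment_contig(seq: str, read_len: int = 10_000,
                    step: int = 5_000) -> Iterator[Tuple[int, str]]:
    """Yield (offset, chunk) pairs that tile a contig with overlapping windows.

    Window size and step are assumptions, not values from the poster.
    """
    if len(seq) <= read_len:
        yield 0, seq
        return
    # The last window may be shorter than read_len but always reaches the end.
    for start in range(0, len(seq) - read_len + step, step):
        yield start, seq[start:start + read_len]


@dataclass(frozen=True)
class SV:
    """Hypothetical, tool-agnostic structural variant record."""
    chrom: str
    pos: int
    svtype: str   # e.g. "INS" or "DEL"
    length: int
    support: int  # number of supporting pseudo-reads


def same_event(a: SV, b: SV, pos_tol: int = 50, len_ratio: float = 0.8) -> bool:
    """Treat two calls as the same event if type, position and size agree."""
    if a.chrom != b.chrom or a.svtype != b.svtype:
        return False
    if abs(a.pos - b.pos) > pos_tol:
        return False
    longest = max(a.length, b.length)
    return longest > 0 and min(a.length, b.length) / longest >= len_ratio


def consensus_svs(calls_by_tool: Dict[str, List[SV]],
                  min_len: int = 50, min_support: int = 2) -> List[SV]:
    """Keep SVs that pass the filters and are matched in every tool's call set."""
    filtered = {
        tool: [sv for sv in calls
               if sv.length >= min_len and sv.support >= min_support]
        for tool, calls in calls_by_tool.items()
    }
    tools = list(filtered)
    if not tools:
        return []
    first, others = tools[0], tools[1:]
    # Report the first tool's representation of each shared event.
    return [
        sv for sv in filtered[first]
        if all(any(same_event(sv, other) for other in filtered[tool])
               for tool in others)
    ]


if __name__ == "__main__":
    # Toy call sets; the dictionary keys are only labels.
    calls = {
        "palss": [SV("chr1", 1000, "INS", 300, 5), SV("chr1", 5000, "DEL", 40, 9)],
        "graphaligner": [SV("chr1", 1010, "INS", 290, 3)],
        "svarp": [SV("chr1", 990, "INS", 310, 4), SV("chr2", 200, "DEL", 500, 6)],
    }
    for sv in consensus_svs(calls):
        print(sv)
```

Requiring agreement across every tool favours precision over sensitivity: an event missed or filtered out in a single call set is discarded, which is in line with the conservative selection described above.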

Conclusion

Based on graphs generated by Minigraph and Minigraph-Cactus from 8 genomes of Zymoseptoria tritici, we confirm that short-read data can add new information to a pangenome graph. Nevertheless, this approach is limited to medium-sized variants. Structural variants that are difficult to assemble because of repeat content or complex events may not be detected, so the approach is best suited to enriching specific loci of interest.

References

[1] Mölder F, Jablonski KP, Letcher B, Hall MB, Tomkins-Tinch CH, Sochat V, et al. Sustainable data analysis with Snakemake. F1000Research. 2021 Apr;10:33. Available from: http://dx.doi.org/10.12688/f1000research.29032.2.
[2] Prjibelski A, Antipov D, Meleshko D, Lapidus A, Korobeynikov A. Using SPAdes De Novo Assembler. Current Protocols in Bioinformatics. 2020 Jun;70(1). Available from: http://dx.doi.org/10.1002/cpbi.102.
[3] Denti L, Bonizzoni P, Brejova B, Chikhi R, Krannich T, Vinar T, et al. Pangenome graph augmentation from unassembled long reads. 2025 Feb. Available from: http://dx.doi.org/10.1101/2025.02.07.637057.
[4] Rautiainen M, Marschall T. GraphAligner: rapid and versatile sequence-to-graph alignment. Genome Biology. 2020 Sep;21(1). Available from: http://dx.doi.org/10.1186/s13059-020-02157-2.
[5] Soylev A, Ebler J, Pani S, Rausch T, Korbel J, Marschall T. SVarp: pangenome-based structural variant discovery. 2024 Feb. Available from: http://dx.doi.org/10.1101/2024.02.18.580171.
[6] Garrison E, Sirén J, Novak AM, Hickey G, Eizenga JM, Dawson ET, et al. Variation graph toolkit improves read mapping by representing genetic variation in the reference. Nature Biotechnology. 2018 Oct;36(9):875–879. Available from: http://dx.doi.org/10.1038/nbt.4227.

This work was carried out as part of Antoine Malet’s apprenticeship with the BioinfoBioger bioinformatics platform.

Link to the poster: https://hal.science/hal-05209521