Abstract
We introduce BlasTax, a standalone software tool that wraps the BLAST algorithm for finding regions of similarity among nucleotide or amino acid sequences. BlasTax is designed to serve both general users of local BLAST who seek a simple, user-friendly interface and taxonomists engaged in phylogenomics and museomics projects. Its graphical user interface makes various BLAST functions accessible without separately installing the BLAST+ executables. BlasTax introduces several advanced modes: retrieving matching reads from FASTQ files produced by high-throughput sequencing of archival DNA from recent or historical collection material, appending matching sequences to existing alignments, and decontaminating sequence data sets by removing sequences of non-target taxa. The program also includes functions for preparing sequence files to be used as reference or query for BLAST, as well as utilities for merging sequences based on species labels, codon trimming, and codon-aware multiple sequence alignment.
