Abstract
While powerful and user-friendly software suites exist for phylogenetics, and an impressive cybertaxonomic infrastructure of online species databases has been set up in the past two decades, software targeted explicitly at facilitating alpha-taxonomic work, i.e., delimiting and diagnosing species, is still in its infancy. Here we present iTaxoTools, a project to develop a bioinformatic toolkit for taxonomy, based on open-source Python code, including tools focusing on species delimitation and diagnosis and centered on specimen identifiers. At the core of iTaxoTools is user-friendliness, with numerous autocorrect options for data files and with intuitive graphical user interfaces. Standalone executables for the individual tools, or a suite of tools with a launcher window, will be distributed for Windows, Linux, and Mac OS systems, and the tools will in the future also be implemented on a web server. The initial version (iTaxoTools 0.1) distributed with this paper (https://github.com/iTaxoTools/iTaxoTools-Executables) contains graphical user interface (GUI) versions of six species delimitation programs (ABGD, ASAP, DELINEATE, GMYC, PTP, tr2) and a simple threshold-clustering delimitation tool. It also contains new Python implementations of existing algorithms, including tools to compute pairwise DNA distances, ultrametric time trees based on non-parametric rate smoothing, species-diagnostic nucleotide positions, and standard morphometric analyses. Other utilities convert among different formats of molecular sequences, geographical coordinates, and units; merge, split, and prune sequence files, tables, and species partition files; and perform simple statistical tests. As a future perspective, we envisage iTaxoTools becoming part of a bioinformatic pipeline for next-generation taxonomy that accelerates the inventory of life while maintaining high-quality species hypotheses.
The open-source code and binaries of all tools are available from GitHub (https://github.com/iTaxoTools), and further information is available from the website (http://itaxotools.org).
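To make two of the tool types named above more concrete (pairwise DNA distances and threshold-clustering delimitation), the following is a minimal Python sketch. It is not the iTaxoTools implementation: the function names `p_distance` and `threshold_clusters`, the toy alignment, the choice of uncorrected p-distance, the single-linkage rule, and the threshold of 0.15 are all assumptions made for illustration only.

```python
from itertools import combinations


def p_distance(seq_a, seq_b):
    """Uncorrected p-distance: differing sites / compared sites.

    Sites where either sequence has a gap or an ambiguous base are skipped
    (an illustrative choice, not necessarily what iTaxoTools does).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to equal length")
    valid = set("ACGT")
    compared = differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in valid and b in valid:
            compared += 1
            differing += a != b
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def threshold_clusters(sequences, threshold):
    """Single-linkage clustering keyed by specimen identifier.

    Two specimens end up in the same candidate species if they are connected
    by a chain of pairwise distances that are all <= threshold.
    """
    parent = {name: name for name in sequences}

    def find(name):
        # Union-find lookup with path halving.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for a, b in combinations(sequences, 2):
        if p_distance(sequences[a], sequences[b]) <= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for name in sequences:
        clusters.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical toy alignment; keys are specimen identifiers.
alignment = {
    "spec01": "ACGTACGTAC",
    "spec02": "ACGTACGTAT",
    "spec03": "TTGTACCTGA",
}
partition = threshold_clusters(alignment, threshold=0.15)
print(partition)
```

In this toy data set, spec01 and spec02 differ at one of ten sites and are grouped together, whereas spec03 differs from both at five of ten sites and forms its own cluster. Real analyses would typically use corrected distances and an empirically justified threshold.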