Review
Published: 2021-07-23

iTaxoTools 0.1: Kickstarting a specimen-based software toolkit for taxonomists

Department of Evolutionary Biology, Zoological Institute, Technische Universität Braunschweig, Mendelssohnstraße 4, 38106 Braunschweig, Germany
Institut de Systématique, Évolution, Biodiversité (ISYEB), Muséum national d’Histoire naturelle, CNRS, Sorbonne Université, EPHE, Université des Antilles, 57 rue Cuvier, CP 50, 75005 Paris, France
Institut de Systématique, Évolution, Biodiversité (ISYEB), Muséum national d’Histoire naturelle, CNRS, Sorbonne Université, EPHE, Université des Antilles, 57 rue Cuvier, CP 50, 75005 Paris, France
Independent researcher, 49 rue Eugène Carrière, 75018 Paris, France
A.N. Severtsov Institute of Ecology and Evolution, Russian Academy of Sciences, Leninsky prospect 33, 119071 Moscow, Russian Federation
Department of Evolutionary Biology, Zoological Institute, Technische Universität Braunschweig, Mendelssohnstraße 4, 38106 Braunschweig, Germany
GFBio - Gesellschaft für Biologische Daten e.V., c/o Research II, Campus Ring 1, 28759 Bremen, Germany
Department of Evolutionary Biology, Zoological Institute, Technische Universität Braunschweig, Mendelssohnstraße 4, 38106 Braunschweig, Germany
School of Electrical and Computer Engineering, National Technical University of Athens, Iroon Polytechniou St 9, 15780 Athens, Greece
Faculty of Mathematics and Natural Sciences, Institute for Biochemistry and Biology, University of Potsdam, Potsdam, Germany
Institut de Systématique, Évolution, Biodiversité (ISYEB), Muséum national d’Histoire naturelle, CNRS, Sorbonne Université, EPHE, Université des Antilles, 57 rue Cuvier, CP 50, 75005 Paris, France
Department of Biology, Washington University, 1 Brookings Drive, Saint Louis, MO 63130, USA
Keywords: integrative taxonomy, molecular diagnosis, species delimitation, ABGD, PTP, GMYC, TR2, DELINEATE, Limes

Abstract

While powerful and user-friendly software suites exist for phylogenetics, and an impressive cybertaxonomic infrastructure of online species databases has been set up in the past two decades, software targeted explicitly at facilitating alpha-taxonomic work, i.e., delimiting and diagnosing species, is still in its infancy. Here we present a project to develop a bioinformatic toolkit for taxonomy, based on open-source Python code, including tools focusing on species delimitation and diagnosis and centered around specimen identifiers. At the core of iTaxoTools is user-friendliness, with numerous autocorrect options for data files and with intuitive graphical user interfaces. Assembled standalone executables for all tools, or a suite of tools with a launcher window, will be distributed for Windows, Linux, and Mac OS systems, and will in the future also be implemented on a web server. The initial version (iTaxoTools 0.1) distributed with this paper (https://github.com/iTaxoTools/iTaxoTools-Executables) contains graphical user interface (GUI) versions of six species delimitation programs (ABGD, ASAP, DELINEATE, GMYC, PTP, tr2) and a simple threshold-clustering delimitation tool. There are also new Python implementations of existing algorithms, including tools to compute pairwise DNA distances, ultrametric time trees based on non-parametric rate smoothing, species-diagnostic nucleotide positions, and standard morphometric analyses. Other utilities convert among different formats of molecular sequences, geographical coordinates, and units; merge, split and prune sequence files, tables and species partition files; and perform simple statistical tests. As a future perspective, we envisage iTaxoTools becoming part of a bioinformatic pipeline for next-generation taxonomy that accelerates the inventory of life while maintaining high-quality species hypotheses.
The open-source code and binaries of all tools are available from GitHub (https://github.com/iTaxoTools), and further information is available from the website (http://itaxotools.org).
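
To illustrate what a simple threshold-clustering delimitation based on pairwise DNA distances can look like, the following is a minimal Python sketch. It is not the iTaxoTools implementation: the function names (`p_distance`, `threshold_clusters`), the specimen identifiers, the sequences, and the 0.15 threshold are all hypothetical, and the sketch assumes uncorrected p-distances on pre-aligned sequences with single-linkage joining.

```python
from itertools import combinations
from typing import Dict, List


def p_distance(a: str, b: str) -> float:
    """Uncorrected p-distance between two aligned sequences.

    Positions where either sequence has a gap or a non-ACGT symbol are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    valid = set("ACGT")
    compared = 0
    diffs = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in valid and y in valid:
            compared += 1
            if x != y:
                diffs += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return diffs / compared


def threshold_clusters(seqs: Dict[str, str], threshold: float) -> List[List[str]]:
    """Single-linkage clustering of specimens (assumed rule for this sketch).

    Two specimens end up in the same cluster if they are connected by a
    chain of pairwise distances that are all <= threshold.
    """
    parent = {name: name for name in seqs}

    def find(x: str) -> str:
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(seqs, 2):
        if p_distance(seqs[a], seqs[b]) <= threshold:
            parent[find(a)] = find(b)

    groups: Dict[str, List[str]] = {}
    for name in seqs:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical specimen identifiers and toy aligned sequences.
specimens = {
    "SPEC001": "ACGTACGTAC",
    "SPEC002": "ACGTACGTAT",
    "SPEC003": "TTGTACCTAA",
    "SPEC004": "TTGTACCTAA",
}
partition = threshold_clusters(specimens, threshold=0.15)
print(partition)  # [['SPEC001', 'SPEC002'], ['SPEC003', 'SPEC004']]
```

Keying the input by specimen identifier mirrors the specimen-centered design described above: the resulting partition is a list of specimen-identifier groups rather than a list of sequences.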
