BlueTools scientists have developed a new cloud-based workflow, the Metagenomics-Toolkit, that uses machine learning to optimise computational efficiency. Built by our partners at Bielefeld University, the toolkit streamlines metagenomic data analysis, making it more scalable, cost-effective, and reproducible.
The Metagenomics-Toolkit covers all essential steps of a metagenome workflow (quality control, assembly, binning, and annotation) and adds more advanced capabilities. These include plasmid identification through multiple approaches, recovery of unassembled microbial community members, and insights into microbial interdependencies via dereplication, co-occurrence analysis, and genome-scale metabolic modelling. The toolkit also integrates a machine learning-optimised assembly step that dynamically adjusts the peak RAM allocated to metagenome assemblers.
Metagenomic studies generate vast amounts of sequencing data, demanding substantial computational power. Traditional workflows often struggle with resource allocation, leading to inefficiencies and high costs. The Metagenomics-Toolkit addresses this challenge with a machine learning-based system that accurately predicts RAM requirements for metagenome assemblers, minimising unnecessary resource usage and reducing reliance on high-memory hardware.
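The idea of predicting memory needs before launching an assembly can be illustrated with a deliberately simple sketch. It is not the toolkit's actual model: the single input feature (dataset size in gigabases), the linear fit, the training numbers, the safety margin, and the list of machine sizes are all assumptions made for illustration.

```python
# Hypothetical training data: (input size in gigabases, observed peak RAM in GB).
# These numbers are invented for illustration only.
TRAINING_RUNS = [(1.0, 12.0), (2.0, 20.0), (4.0, 36.0), (8.0, 68.0)]

# Hypothetical memory sizes (GB) of the machine types available in a cluster.
RAM_FLAVORS_GB = [16, 32, 64, 128, 256]


def fit_line(points):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict_ram_gb(input_gbases, model, safety_margin=1.2):
    """Predicted peak RAM in GB, inflated by a safety margin."""
    slope, intercept = model
    return (slope * input_gbases + intercept) * safety_margin


def pick_flavor(required_gb, flavors=RAM_FLAVORS_GB):
    """Smallest available machine size that covers the requirement, or None."""
    for size in sorted(flavors):
        if size >= required_gb:
            return size
    return None


model = fit_line(TRAINING_RUNS)
required = predict_ram_gb(3.0, model)
print(f"request {pick_flavor(required)} GB for a 3 Gbase dataset")
```

Rounding up to the smallest sufficient machine size, rather than always requesting the largest one, is what keeps high-memory hardware free for the datasets that actually need it.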
While the toolkit can run on individual user workstations, it is specifically optimised for efficient execution in cloud-based clusters.
To showcase its capabilities, researchers applied the toolkit to 757 metagenomic datasets from untreated sewage samples collected worldwide. The goal was to identify microbial species that are consistently present across these samples and so to investigate whether a sewage core microbiome exists. This demonstration highlights the toolkit’s ability to support large-scale microbial surveillance and paves the way for future applications such as monitoring antimicrobial resistance (AMR) genes and tracking pathogenic organisms on a global scale.
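One common way to make "consistently present" concrete is a prevalence cutoff: a species counts as core if it is detected in at least a given fraction of samples. The sketch below assumes that definition; the threshold value, the function name, and the sample and species labels are hypothetical and are not taken from the study.

```python
def core_species(samples, min_prevalence=0.9):
    """Return the species detected in at least `min_prevalence` of the samples.

    `samples` maps a sample name to the collection of species detected in it.
    """
    if not samples:
        return set()
    counts = {}
    for detected in samples.values():
        # set() guards against a species being listed twice in one sample.
        for species in set(detected):
            counts[species] = counts.get(species, 0) + 1
    cutoff = min_prevalence * len(samples)
    return {species for species, n in counts.items() if n >= cutoff}


# Placeholder data: three samples with made-up species labels.
EXAMPLE_SAMPLES = {
    "site_A": {"species_1", "species_2"},
    "site_B": {"species_1", "species_3"},
    "site_C": {"species_1", "species_2"},
}

print(sorted(core_species(EXAMPLE_SAMPLES, min_prevalence=1.0)))
```

Lowering the cutoff widens the core set, so the chosen threshold needs to be reported alongside any core microbiome result.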
By reducing computational costs and increasing reproducibility, the Metagenomics-Toolkit enables more extensive microbiome studies. Its open-source availability on GitHub ensures that researchers worldwide can implement it for various applications, from environmental monitoring to public health research.
Read the full article: https://www.biorxiv.org/content/10.1101/2024.10.22.619569v2