COMMBAT DB and software

Version 1.1 - September 2025

COMMBAT is released under the AGPL-3 license


BGC selection



Matrix selection



Submit job




Contact us if you have questions or want to report a bug




COMMBAT software


The source code for the COMMBAT software is available at https://gitlab.uliege.be/Silvia.RibeiroMonteiro/commbat.git under the AGPL-3 license.

Download database


The formatted antiSMASH DB for the COMMBAT software can be downloaded here. Follow the instructions in the software README file to properly install and use the database.

Publications


If you have found the COMMBAT web service useful, please cite:
  • For the Meta-Analysis leading to the Methodology:
    • The Transcriptional Architecture of Bacterial Biosynthetic Gene Clusters
      Ribeiro Monteiro Silvia, Kerdel Yasmine, Gathot Julianne, Rigali Sébastien
      Journal of Natural Products (2025) doi: 10.1021/acs.jnatprod.5c00529
  • For the Methodology:
    • Enhanced prediction of expression control in bacterial biosynthetic gene clusters via genomic and functional data integration
      Ribeiro Monteiro Silvia, Rigali Sébastien
      Microbial Genomics (2025) doi: 10.1099/mgen.0.001512
  • For the Webtool and Software:
    • COMMBAT: A Web Platform for Exploring Expression Control of Biosynthetic Gene Clusters
      Ribeiro Monteiro Silvia, Rigolet Augustin, Jeunehomme Clément, Gathot Julianne, Kerdel Yasmine, Henry Matthias, Augustijn Hannah E, Medema Marnix M, Van Wezel Gilles P, Rigali Sébastien
      Manuscript under review (2026)

COMMBAT Help page


Welcome to the documentation of the COMMBAT (COnditions for Microbial Metabolite Biosynthesis Activated Transcription) framework dedicated to the prediction of transcription factor binding sites (TFBSs) within bacterial biosynthetic gene clusters (BGCs).

The help page contains the following sections:

WARNING: We recommend using Chrome, Edge or Firefox for this website. Certain functionalities may not work properly on Safari.

COMMBAT Workflow


COMMBAT provides users with an up-to-date collection of BGCs from the MIBiG and antiSMASH databases, together with ungapped position weight matrices (PWMs) sourced from four transcription factor binding site (TFBS) repositories. Users can also upload their own BGCs of interest and/or use their own PWMs.

COMMBAT offers two complementary modes of analysis:

Application #1: predict BGCs potentially regulated by a specific transcription factor (TF).
Application #2: predict TFs most likely to regulate a given BGC.

The workflow consists of three main steps:

Step 1: Select or upload one or more BGCs, depending on the COMMBAT application type. See the “BGC selection” section for further details.
Step 2: Select, upload, or create one or more PWMs, depending on the COMMBAT application type. See the “Matrix selection” section for further details.
Step 3: Launch the COMMBAT job, in which the software scans all selected BGCs with the selected PWM(s). Results are provided within a few minutes to a few hours, depending on the number of BGCs and PWMs selected. The result output page includes an interactive cloud plot and table, and a TFBS mapping viewer. The output results can be adjusted via a customization panel.

Workflow of the two types of COMMBAT applications. Application #1, prediction of all BGCs controlled by one TF; Application #2, prediction of all TFs potentially controlling the expression of one specific BGC. The result output page includes an Interactive Cloud Plot, a TFBS Mapping Viewer, and an Interactive Result Table.


BGC selection


For the COMMBAT option “Multiple BGCs with one matrix”, users can choose one, multiple, or all biosynthetic gene clusters (BGCs), depending on the needs of their analysis. Three types of repositories are available: MIBiG, antiSMASH DB, and Your BGC Collection.

MIBiG repository: Select specific BGCs by their reference IDs from the MIBiG repository or by the name of the producing organism(s). For larger-scale analyses, you can retrieve all BGCs from a bacterial group (from phylum down to species level) by entering the taxon name(s).


antiSMASH DB: This database contains over 250,000 bacterial BGCs from over 35,000 species (version 4). You may select up to 20 bacterial species by name or accession number (e.g. GCF_008931305.1). This limitation helps preserve computing resources and ensures faster analysis time.
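The selection rule above (at most 20 species, given by name or accession number) can be sketched as a small pre-submission check. This is only an illustration: the function name `check_species_selection` is hypothetical, and the regular expression assumes the standard NCBI assembly accession layout (`GCA_` or `GCF_`, nine digits, a version suffix) seen in the example GCF_008931305.1. It does not describe how the COMMBAT server itself validates input.

```python
import re

# Assumed pattern for NCBI assembly accessions such as GCF_008931305.1.
ASSEMBLY_ACCESSION = re.compile(r"^GC[AF]_\d{9}\.\d+$")
MAX_SPECIES = 20  # limit stated in the help text


def check_species_selection(entries):
    """Split entries into accessions and species names, enforcing the limit."""
    cleaned = [entry.strip() for entry in entries if entry.strip()]
    if len(cleaned) > MAX_SPECIES:
        raise ValueError(
            f"{len(cleaned)} species selected; at most {MAX_SPECIES} are allowed"
        )
    accessions = [e for e in cleaned if ASSEMBLY_ACCESSION.match(e)]
    names = [e for e in cleaned if not ASSEMBLY_ACCESSION.match(e)]
    return accessions, names


accessions, names = check_species_selection(
    ["GCF_008931305.1", "Streptomyces coelicolor", "  "]
)
```

Entries that do not match the accession pattern are simply treated as species names here; a real form would also need to check those names against the database.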


Your BGC Collection: For the COMMBAT option “Multiple BGCs with one matrix” (A), users can upload the BGCs of a specific organism from their own collection directly by entering the antiSMASH job ID. When using the COMMBAT option “One BGC versus multiple matrices” (B), provide both the antiSMASH job ID and the contig+region numbers of the specific BGC.

Matrix selection


For the COMMBAT option “Multiple BGCs with one matrix” users can choose a position weight matrix (PWM) for a transcription factor of interest from one of four databases: COMMBAT DB, RegPrecise, Prodoric, or Logomotif.


Alternatively, users can create a custom PWM by providing the sequences bound by their transcription factor in FASTA format, either by typing them in or by uploading a file.


When using the COMMBAT option “One BGC versus multiple matrices”, users have three ways to select PWMs from the databases:

By transcription factor name: Enter the name of your TF(s) of interest one by one.
By taxonomy: Retrieve all PWMs associated with a specific bacterial taxon (e.g., the family Streptomycetaceae).
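To illustrate what building an ungapped PWM from FASTA-formatted binding sites involves, here is a minimal sketch. It uses a generic log-odds construction with a pseudocount and a uniform background, both of which are assumptions made for the example. COMMBAT relies on the PREDetector software for PWM scoring, and its exact weighting scheme may differ. All function names are hypothetical.

```python
import math
from collections import Counter


def read_fasta_sequences(text):
    """Return the upper-cased sequences found in FASTA-formatted text."""
    sequences, current = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current:
                sequences.append("".join(current).upper())
                current = []
        else:
            current.append(line)
    if current:
        sequences.append("".join(current).upper())
    return sequences


def build_pwm(sequences, pseudocount=1.0, background=0.25):
    """Build a log-odds PWM (one dict per position) from equal-length sites."""
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("an ungapped PWM needs binding sites of equal length")
    total = len(sequences) + 4 * pseudocount
    pwm = []
    for pos in range(length):
        counts = Counter(seq[pos] for seq in sequences)
        pwm.append({
            base: math.log2(((counts[base] + pseudocount) / total) / background)
            for base in "ACGT"
        })
    return pwm


def score_site(pwm, site):
    """Sum the per-position weights of a candidate site."""
    return sum(column[base] for column, base in zip(pwm, site.upper()))


example_fasta = """>site1
ACGT
>site2
ACGA
>site3
ACGT
"""
sites = read_fasta_sequences(example_fasta)
pwm = build_pwm(sites)
```

With these three toy sites, `score_site(pwm, "ACGT")` is higher than the score of any other 4-mer, since ACGT is the consensus of the input sequences.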
By transcription factor family: Retrieve all PWMs belonging to a specific TF family (e.g., LacI family).

COMMBAT result interpretation


The output results page of COMMBAT provides both graphical visualizations and tabular outputs organized into four sections: (A) Customization Panel, (B) Interactive Cloud Plot, (C) Interactive Result Table, and (D) TFBS Mapping Viewer. The type of cloud plot depends on the analysis setup (“Multiple BGCs with one matrix” or “One BGC versus multiple matrices”) and also varies based on whether the selected or uploaded BGCs are all known (MIBiG repository) or include cryptic BGCs (antiSMASH DB and Your BGC collection).

Application #1a: “Multiple BGCs with one matrix” using MIBiG.

Overview of the four sections of the COMMBAT output results page using option “Multiple BGCs vs One Matrix” on the MIBiG database. ( A ) Customization Panel. Allows users to modify the COMMBAT output by adjusting display options and filters. All changes are applied in real time and immediately update the Cloud Plot and Results Table. Users can also highlight specific BGCs (such as control BGCs with TF-TFBS interaction previously validated experimentally) to gauge the relevance of the COMMBAT score (red circles in the Cloud Plot). ( B ) Interactive Cloud Plot. A visual representation where each circle corresponds to a predicted TFBS within a BGC. BGCs are ranked in decreasing order of their highest TFBS COMMBAT score. Clicking a circle reveals the corresponding BGC identity in both the Customization Panel and the Results Table. ( C ) Interactive Result Table. The table lists by default TFBSs from BGCs according to the COMMBAT score. Users can sort the table results by any column, organizing the data in either ascending or descending order (numerical values are sorted accordingly, while text-based columns follow alphabetical order). It is also possible to directly look for a specific result by typing key words in the search window above the table. ( D ) TFBS Mapping Viewer. View of the genetic organization of a selected BGC (which can be selected in the Customization Panel, and the Interactive Cloud Plot and Results Table), including its functional gene categories, and the TFBS locations within the BGC.
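The ordering used in the Cloud Plot (BGCs ranked in decreasing order of their highest TFBS COMMBAT score) can be sketched as follows. The keys `BGC` and `COMMBAT_score` follow the result table headers. The BGC identifiers and score values are made up for the example, and the function name `rank_bgcs` is hypothetical.

```python
def rank_bgcs(hits):
    """Rank BGCs by their best TFBS COMMBAT score, highest first."""
    best = {}
    for hit in hits:
        bgc, score = hit["BGC"], hit["COMMBAT_score"]
        if bgc not in best or score > best[bgc]:
            best[bgc] = score
    return sorted(best.items(), key=lambda item: item[1], reverse=True)


# Illustrative hits only: one dict per predicted TFBS.
example_hits = [
    {"BGC": "BGC_A", "COMMBAT_score": 0.42},
    {"BGC": "BGC_B", "COMMBAT_score": 0.91},
    {"BGC": "BGC_A", "COMMBAT_score": 0.77},
    {"BGC": "BGC_C", "COMMBAT_score": 0.15},
]
ranking = rank_bgcs(example_hits)
```

The same grouping applies to the “One BGC versus multiple matrices” mode by keying on the `matrix` column instead of `BGC`.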

Application #1b: “Multiple BGCs with one matrix” using antiSMASH results.

COMMBAT also supports predictions on BGCs identified by antiSMASH, either retrieved from the database or via job ID. Using the ‘KnownClusterBlast’ option, antiSMASH assigns each predicted BGC a similarity percentage to known MIBiG clusters. COMMBAT converts this into a Novelty score (0–1), where 0 indicates complete similarity and 1 indicates no similarity, highlighting cryptic BGCs with potential for novel metabolite production. The output page retains the four standard panels, with the Customization Panel and Results Table now including Novelty scores. In the Interactive Cloud Plot, BGCs are positioned by combining their COMMBAT score (x-axis) and Novelty score (y-axis), enabling identification of cryptic BGCs with predicted regulatory control.

Overview of the four sections of the COMMBAT output results page using option “Multiple BGCs vs One Matrix” on antiSMASH results. ( A ) Customization Panel with the filter for adjusting the Novelty score threshold in addition to the filter for the COMMBAT score threshold. ( B ) Interactive Cloud Plot. Visual representation where each circle corresponds to a predicted TFBS within a BGC. TFBSs of BGCs are placed according to the COMMBAT score (x axis) and BGCs are ranked according to their Novelty score (y axis). ( C ) Interactive Result Table. ( D ) TFBS Mapping Viewer with localization of the two TFBSs selected in the Interactive Result Table.

Application #2: “One BGC vs Multiple Matrices”

As a second application, COMMBAT enables prediction of all TFs that may regulate the expression of a given BGC (option “One BGC versus multiple matrices”). The output page retains the four panels described above, with the difference that in the interactive cloud plot TFBSs are ranked by top-scoring PWMs, highlighting TFs most likely to control the selected BGC. In the Customization Panel, users can tag one or more reference TFs previously validated as regulators of the selected BGC by selecting their associated PWMs. These references are highlighted in the Interactive Cloud Plot to provide benchmarks to assess the likelihood that additional TFs may also control the BGC’s expression.

Overview of the four sections of the COMMBAT output results page using option “One BGC vs Multiple Matrices” ( A ) Customization Panel allowing tagging of experimentally validated TFs as benchmarks to assess other candidate regulators (see selected PWMs 1, 2, 3 in the “Tag reference matrices” window). ( B ) Interactive Cloud Plot. PWMs are ranked in decreasing order of their highest TFBS COMMBAT score and the selected reference TFs/PWMs (1, 2, 3) are highlighted in red. ( C ) Interactive Result Table. In the example, the table only shows the TFBSs associated with the PWM selected in the Customization Panel. ( D ) TFBS Mapping Viewer with localization of the TFBS selected in the Interactive Result Table.

Customization panel


COMMBAT enables users to configure a set of parameters and options that directly affect both the interactive cloud plot and the results table.
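As a rough picture of what the score filters in the panel do, the sketch below keeps only the hits at or above a COMMBAT score threshold and, for antiSMASH-derived results, a Novelty score threshold. Treating both thresholds as minimum values is an assumption, as is the function name `filter_hits`. The keys follow the result table headers and the data are illustrative.

```python
def filter_hits(hits, min_commbat_score=0.0, min_novelty_score=None):
    """Keep hits meeting the COMMBAT and (optional) Novelty thresholds."""
    kept = []
    for hit in hits:
        if hit["COMMBAT_score"] < min_commbat_score:
            continue
        # The Novelty filter only applies when a threshold is given,
        # i.e. for antiSMASH DB or Your BGC Collection results.
        if min_novelty_score is not None and hit["novelty_score"] < min_novelty_score:
            continue
        kept.append(hit)
    return kept


# Illustrative values only.
example_hits = [
    {"BGC": "region_1", "COMMBAT_score": 0.85, "novelty_score": 0.90},
    {"BGC": "region_2", "COMMBAT_score": 0.85, "novelty_score": 0.10},
    {"BGC": "region_3", "COMMBAT_score": 0.30, "novelty_score": 0.95},
]
cryptic_candidates = filter_hits(
    example_hits, min_commbat_score=0.5, min_novelty_score=0.5
)
```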


Interactive Result Table


Depending on the selected COMMBAT application ( “Multiple BGCs with one matrix” or “One BGC versus multiple matrices” ) and the chosen BGC database, the fields displayed in the interactive table header may vary.

Table headers common across all applications and databases:

  • BGC: The name and ID of the analyzed BGC.
  • class: The biosynthetic class of the analyzed BGC.
  • matrix: The name of the analyzed transcription factor.
  • locus: The name and ID of the gene.
  • function: The function of the protein produced by the gene.
  • category: The functional gene category of the gene (which was determined following the standardized classification framework provided by MIBiG).
  • species: The producing organism’s species.
  • sequence: The transcription factor binding site nucleotide sequence.
  • position: The position of the binding site relative to the translation start codon of the gene.
  • target_score: Score of a transcription factor binding site (TFBS) according to the functional categories associated with the TFBS (e.g., regulatory, core biosynthetic, transport, other) and the type of region (e.g., upstream region, regulatory region, coding sequence).
  • interaction_score: Score of the interaction between a TF and its predicted TFBS obtained by normalizing the PWM score between 0 and 1. The maximum score is the consensus of the PWM and is set to 1 for each TF.
  • COMMBAT_score: The COMMBAT score estimates the likelihood of a BGC to be regulated by a specific TF (see explanation in ‘COMMBAT score calculation' section).

Table headers only in ‘MIBiG’ predictions:

  • compounds: Compounds produced by the MIBiG BGC.

Table headers only in ‘antiSMASH DB’ and ‘Your BGC collection’ predictions:

  • accession: The genome accession number of the producing organism.
  • knownclusterblast: Closest MIBiG BGC to the predicted BGC.
  • compounds_MIBiG: Compounds produced by the closest MIBiG BGC.
  • class_MIBiG: The biosynthetic class of the closest MIBiG BGC.
  • novelty_score: The Novelty score shows how similar a predicted BGC is to known BGCs from the MIBiG database (see explanation in ‘Novelty score calculation' section).

Users can also choose to display co-transcribed genes and/or the PWM score in the table:

  • PWM_score: The score of the interaction between a TF and its predicted TFBS based on the sequences used to generate the position weight matrix (PWM). The PWM score is calculated by the PREDetector software.
  • cotr_locus: The names and IDs of the co-transcribed genes.
  • cotr_functions: The functions of the proteins produced by the co-transcribed genes.
  • cotr_categories: The functional gene categories of the co-transcribed genes (which were determined following the standardized classification framework provided by MIBiG).

COMMBAT score calculation


The COMMBAT scoring method includes two key components: the INTERACTION SCORE and the TARGET SCORE (see figure below), and is calculated according to the expression:


where:

  • I is the INTERACTION SCORE evaluating the binding affinity between a TF and its predicted binding site. I is calculated as the ratio of the PWM score of a predicted TFBS (I_TFBS) to the maximum PWM score (I_max), i.e. I = I_TFBS / I_max. Scores of predicted TFBSs are thus normalized on a scale from 0 to 1, where a score of 1 corresponds to the consensus sequence, representing the highest binding affinity for the TF.
  • T is the TARGET SCORE which incorporates two aspects: (i) The Region Score ( R ) which reflects the genomic location or type of region bound by the TF, and (ii) The Function Score ( F ) which considers the functional classes of genes within the BGC that are predicted to be regulated by the TF. F is calculated following the prediction of all co-transcribed genes whose expression may be under control of the same TFBS. The selected F score, max( F ), corresponds to the score of the gene within the transcription unit whose functional category is predicted to most significantly impact the expression of the BGC.
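The two parts of these definitions that are fully specified in the text, the normalization of the PWM score and the choice of max( F ) over a transcription unit, can be sketched as below. The per-category function scores are placeholder values invented for the example, not the weights used by COMMBAT, and the function names are hypothetical. The sketch deliberately does not combine I, R, and F into a final score.

```python
def interaction_score(pwm_score, max_pwm_score):
    """Ratio of a TFBS PWM score to the maximum (consensus) PWM score."""
    if max_pwm_score <= 0:
        raise ValueError("the maximum PWM score must be positive")
    return pwm_score / max_pwm_score


def max_function_score(categories, function_scores):
    """Highest function score among the genes of one transcription unit."""
    return max(function_scores[category] for category in categories)


# Placeholder weights for illustration only (NOT COMMBAT's actual values).
hypothetical_function_scores = {
    "regulatory": 1.0,
    "core biosynthetic": 0.8,
    "transport": 0.5,
    "other": 0.2,
}

# A TFBS scoring 8.0 against a PWM whose consensus scores 10.0.
i_score = interaction_score(8.0, 10.0)

# A transcription unit with two co-transcribed genes.
f_score = max_function_score(
    ["transport", "core biosynthetic"], hypothetical_function_scores
)
```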

Genomic and Functional Data integration into the COMMBAT scoring method. ( a ) Definition of the different types of regions and the different genes’ functional categories in BGCs. Upstream region: intergenic region between a start and a start or a stop codon; Coding region: between a start and a stop codon; terminator region: intergenic region between two stop codons; regulatory region: region encompassing both the upstream and coding region of a gene where most TFBS are found. The gene functions are color-coded according to the scheme applied across MIBiG entries. ( b ) Proposed integration of genomic (Region) and functional (Function) data into the COMMBAT scoring method.

Novelty score calculation


The Novelty score evaluates whether a BGC is cryptic (its genetic material is not yet associated with a metabolite) or instead similar to characterized clusters from the MIBiG database. The score is based on the similarity percentage given by the ‘KnownClusterBlast' option of antiSMASH. COMMBAT calculates the Novelty score as 1 minus the similarity expressed as a fraction (the similarity percentage divided by 100), yielding values ranging from 0 to 1, where 0 indicates complete similarity (100%) and 1 indicates no similarity (0%).
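As a worked example of this conversion, the following sketch maps a KnownClusterBlast similarity percentage to a Novelty score. The function name `novelty_score` is hypothetical, and the input check is an assumption about sensible bounds.

```python
def novelty_score(similarity_percent):
    """Convert a KnownClusterBlast similarity (0-100 %) to a Novelty score (0-1)."""
    if not 0 <= similarity_percent <= 100:
        raise ValueError("similarity must be a percentage between 0 and 100")
    return 1 - similarity_percent / 100


fully_similar = novelty_score(100)   # 0.0, identical to a known MIBiG cluster
no_similarity = novelty_score(0)     # 1.0, no similarity to any known cluster
partly_similar = novelty_score(75)   # 0.25
```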