MAIA: Integrating Genome Assemblies

Introduction

MAIA is an algorithm for integrating multiple genome assemblies, for example assemblies originating from:

  • Different runs of a de novo assembler
  • Different data types
  • Comparative assembly

You will need:

  1. A set of assemblies
  2. A relatively closely related reference genome

The corresponding paper is available via PubMed.

Overview

Integrating several assemblies with MAIA involves the following steps:

  1. Multiple de novo and comparative assemblies are created using specialized assemblers.
  2. The resulting contigs are pairwise aligned to each other to find end-to-end overlaps.
  3. An overlap graph is constructed, in which nodes represent contigs and edges represent overlaps. A forward and a reverse edge are added between each pair of overlapping nodes, but these are indicated by a single undirected edge for simplicity. A start node and an end node are determined using a reference genome. Edges are assigned weights based on several properties of the alignments and contigs, combined using weighted Z-scores.
  4. An orientation is assigned to the contigs by traversing the graph depth-first in order of weight (indicated by the numbers). When an edge would assign a reverse orientation to a node that has already been assigned a forward orientation via another edge, the edge is recognized as conflicting and removed.
  5. Oriented contigs and end-to-end overlaps form a directed graph.
  6. The highest-scoring path is found using a tabu search procedure; this path yields the assembly of a chromosome.
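
Steps 3 and 4 can be sketched in a few lines. The Python below is illustrative only and is not MAIA's MATLAB code. The property names (overlap_length, identity), the coefficients, and the per-edge flips flag are assumptions made for the example, and the sketch assumes at most one overlap per contig pair.

```python
from statistics import mean, pstdev


def zscores(values):
    """Standardize a list of numbers to zero mean and unit variance."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def edge_weights(properties, coefficients):
    """Combine per-edge properties into one weight via weighted Z-scores.

    properties:   {name: [value for each edge]}  (hypothetical features)
    coefficients: {name: relative importance}    (hypothetical values)
    """
    n_edges = len(next(iter(properties.values())))
    weights = [0.0] * n_edges
    for name, values in properties.items():
        for i, z in enumerate(zscores(values)):
            weights[i] += coefficients[name] * z
    return weights


def assign_orientations(edges, start):
    """Orient contigs depth-first, following heavier edges first.

    edges: list of (contig_a, contig_b, weight, flips); flips is True when
    the overlap joins the two contigs on opposite strands.
    Returns (orientation per contig, list of removed conflicting edges).
    """
    neighbours = {}
    for a, b, w, flips in edges:
        neighbours.setdefault(a, []).append((w, b, flips))
        neighbours.setdefault(b, []).append((w, a, flips))

    orientation = {start: "+"}
    removed = []

    def visit(node):
        # Heaviest edges are explored first.
        for w, other, flips in sorted(neighbours.get(node, []), key=lambda e: -e[0]):
            expected = orientation[node]
            if flips:
                expected = "-" if expected == "+" else "+"
            if other not in orientation:
                orientation[other] = expected
                visit(other)
            elif orientation[other] != expected:
                # The edge contradicts an orientation assigned earlier.
                edge = tuple(sorted((node, other)))
                if edge not in removed:
                    removed.append(edge)

    visit(start)
    return orientation, removed


if __name__ == "__main__":
    pairs = [("c1", "c2", False), ("c2", "c3", True), ("c1", "c3", False)]
    weights = edge_weights(
        {"overlap_length": [1200, 800, 300], "identity": [0.99, 0.97, 0.90]},
        {"overlap_length": 1.0, "identity": 1.0},
    )
    edges = [(a, b, w, f) for (a, b, f), w in zip(pairs, weights)]
    print(assign_orientations(edges, "c1"))
```

In this toy graph the two heaviest edges orient c2 forward and c3 reverse; the weakest edge (c1 to c3) then disagrees with the orientation already given to c3 and is removed.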

Overlap graph visualization

MAIA produces one .xgmml file per chromosome, which you can visualize in Cytoscape (File > Import > Network). Inspecting the graph may help you interpret the output.
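
If you prefer a quick look at a graph without opening Cytoscape, an .xgmml file can also be read as ordinary XML. The sketch below assumes only the generic XGMML layout (node elements with id and label attributes, edge elements with source and target attributes); which additional attributes MAIA writes is not documented here, and the file name in the usage comment is hypothetical.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip an XML namespace prefix such as '{http://...}node'."""
    return tag.rsplit("}", 1)[-1]


def summarize_xgmml(source):
    """Return ({node id: label}, [(source id, target id), ...]) for an XGMML graph.

    source may be a file name or an open file object.
    """
    root = ET.parse(source).getroot()
    nodes, edges = {}, []
    for element in root.iter():
        name = local_name(element.tag)
        if name == "node":
            nodes[element.get("id")] = element.get("label")
        elif name == "edge":
            edges.append((element.get("source"), element.get("target")))
    return nodes, edges


# Usage (hypothetical file name):
# nodes, edges = summarize_xgmml("chr9.xgmml")
# print(len(nodes), "contigs,", len(edges), "overlaps")
```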

Requirements

  • A Unix operating system
  • MATLAB R2009b or later (with the Bioinformatics Toolbox)
  • The MAIA MATLAB code: MAIA v0.5
  • The MUMmer package (nucmer and delta-filter)
  • The GAIMC Graph toolbox for Matlab

Installation

  • Extract the MAIA code and add the folder to your MATLAB path (File > Set Path > Add with Subfolders)
  • Install MUMmer and make sure nucmer and delta-filter are on your Unix PATH
  • Install the GAIMC Graph toolbox
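
To confirm that the MUMmer executables can be found, a small check like the following may help. It is a generic PATH lookup and not part of MAIA; note that the PATH seen by MATLAB can differ from your shell's, depending on how MATLAB is launched.

```python
import shutil


def find_tools(names=("nucmer", "delta-filter")):
    """Map each executable name to its resolved path, or None if not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in find_tools().items():
        print(f"{name}: {path or 'NOT FOUND on PATH'}")
```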

Usage

  1. Start MATLAB.
  2. Try the CENPK chromosome 9 example in the 'example' folder:
    1. cd into the ./maia/example folder
    2. Run the example by typing:
      >> maia('assembly_list.txt','./data/ref_genome/chr9_s288c.fa')
      
    3. The example folder now contains the file maia_assembly.fa and one Cytoscape .xgmml file per chromosome (only one in this case).
  3. Now run maia with your own data:
    >> maia('tab delimited assembly list', 'reference genome')
    

    The tab delimited assembly list should be in the format:

    AssemblyName1 TAB FastaFileName1 TAB Zscore1
    AssemblyName2 TAB FastaFileName2 TAB Zscore2
    ... etc...
    
  4. Check out the optional parameters with:
    >> help maia
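
The tab-delimited assembly list described in step 3 can be written by hand or generated with a short script. The sketch below is one way to do it in Python; the assembly names, FASTA file names, and Z-score values are placeholders, and how MAIA interprets the third column is not described in this document.

```python
import csv


def write_assembly_list(path, assemblies):
    """Write (name, fasta_file, zscore) rows as one tab-delimited line each."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerows(assemblies)


if __name__ == "__main__":
    # Placeholder entries: replace with your own assemblies and values.
    write_assembly_list(
        "assembly_list.txt",
        [
            ("AssemblyName1", "FastaFileName1", 1),
            ("AssemblyName2", "FastaFileName2", 1),
        ],
    )
```

The resulting file name is then passed as the first argument to maia.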
    

Last update: December 15th, 2010. Contact: Jurgen Nijkamp