Viewing Multiple Alignments and Trees

Step 1: Introduction

This tutorial was updated using Genome Workbench version 3.6.0

This tutorial demonstrates how to manage views in Genome Workbench. After completing it you will know how to create new views, use options to manipulate the views, and move views between different locations. You will also learn how views communicate with each other and how a selection made in one view is reflected in the other views of Genome Workbench.

In order to get the full benefit of this tutorial you will need to download the sample data from https://ftp.ncbi.nlm.nih.gov/toolbox/gbench/tutorial/Tutorial3/collembola_COI_prot_aln_tree.gbp

You should complete the Basic Operation tutorial first.

Step 2: Open sample project in the Project View

Open Genome Workbench. Choose the open folder icon from the main toolbar or choose File=>Open from the menu bar. Choose Project from the left side of the dialog, click the folder icon on the right side. Then navigate to the collembola_COI_prot_aln_tree.gbp file that you have downloaded, select it, and click Open. Then click Finish. The system will open the project in the Project view.

Open project

The sample project contains two folders:

  1. set_of_collembola_sequences – contains sequences that were used to create multiple alignment and construct the tree.
  2. Tool Results folder with multiple alignment created by the MUSCLE program and tree created by the Phylogenetic Tree Builder Tool/Neighbor Joining method.

MUSCLE is not distributed with Genome Workbench, but it can be downloaded to your local computer from https://www.drive5.com/muscle/ and run from Genome Workbench if you provide the path to the executable file.

Step 3: Opening a Multiple Alignment View

Open a multiple alignment view on the protein alignment. You can do this by selecting the multiple alignment in the project tree (the MUSCLE alignment), right-clicking, and choosing Open New View. An easier way is to double-click the item; this will bring up the Open View dialog, as shown below. You can also choose View=>Open New View from the main menu.

Open in_MSA

Select Multiple Alignment View and click Next.

Step 4: Multiple Alignment View Features

The default view for the multiple alignment will appear. You will see an image like the one below.

MSA view no color

There are several features to note:

  • For zooming in/out use Z + Left Click/Drag (or push the mouse wheel and drag), for zooming to a range use R + Left Click/Drag
  • A tooltip will appear if you hover over a location. This tooltip shows information about the sequence annotation, and the sequence/alignment positions and statistics
  • The header row contains a set of column headings. You can rearrange the columns using drag-and-drop. The set of columns visible is up to you to decide: right-click on the header and choose Settings to bring up a menu to select/unselect columns.

MSA properties

Step 5: Coloration Methods

Now we will look at alternate ways to score and color an alignment. Genome Workbench provides a variety of means for scoring alignments. If you right-click in the alignment view you will see a context menu like the one below. Choose Coloration => Select Method... to see the list of available schemes for coloring proteins.

MSA coloring options

Note: there is no selected method by default, but Genome Workbench will remember the method you have used (if any) and will show it next time you open the MSA view.

MSA coloring options 2

Let us select the hydropathy scale method. This method scores each amino acid residue in the alignment independently and colors it on a scale between hydrophobic (red) and hydrophilic (blue). Click on Hydropathy Scale and click Select. Once this is set you should see the multiple alignment view change to look like the image below.

MSA coloring hydropathy
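As a rough illustration of how a per-residue score can be turned into a color, the sketch below maps each residue to a point on a blue-white-red scale. It is not Genome Workbench's implementation: the Kyte-Doolittle values are an assumed scale (the tutorial does not name the one the tool uses), white as the neutral color is a placeholder, and the names KYTE_DOOLITTLE, blend and hydropathy_color are hypothetical.

```python
# Kyte-Doolittle hydropathy values (assumed scale; not confirmed by the tutorial).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

RED = (255, 0, 0)          # hydrophobic end of the scale
BLUE = (0, 0, 255)         # hydrophilic end of the scale
NEUTRAL = (255, 255, 255)  # placeholder neutral color (an assumption)


def blend(c1, c2, t):
    """Linear blend from c1 (t=0) to c2 (t=1)."""
    return tuple(round(a + (b - a) * t) for a, b in zip(c1, c2))


def hydropathy_color(residue):
    """Map one residue to an RGB color; gaps and unknown letters stay neutral."""
    score = KYTE_DOOLITTLE.get(residue.upper())
    if score is None:
        return NEUTRAL
    if score >= 0:
        return blend(NEUTRAL, RED, score / 4.5)
    return blend(NEUTRAL, BLUE, -score / 4.5)
```

Because every residue is scored on its own, the color of a cell depends only on the letter in that cell, not on the rest of the column.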

Step 6: Settings for Coloration Methods

Each coloration method offers its own configuration settings. While many of these settings are not the ones that most people would want to change, some of these are notable, so let us look at how to change them. Right-click in the multiple alignment view and choose Coloration => Method Properties....

You should see the menu as in the image below.

MSA coloring hydropathy prop

Choosing this brings up a properties dialog for the coloration scheme.

MSA coloring hydropathy prop 2

There are several things to note here:

  • The colors used are configurable. The default for the hydropathy scale is red for hydrophobic and blue for hydrophilic residues. In addition, the color used for neutral is provided. You may change each of these colors. In addition, there is a slider above the color scale so you can select the degree of gradation between colors.
  • When using consensus scoring, you can choose to provide a window for averaging across an alignment. The default value is 1. Consider the average score for a column to consist of the averages of all residues in that column. You can change this to include adjacent columns as well. For hydropathy this provides means for identifying regions of the alignment that are more or less hydrophobic or hydrophilic than expected.
  • There is a check box to toggle consensus scoring. Toggling this changes the calculation of score so that coloration is based on the difference from the average score in a column rather than on the single score provided for the amino acid. Choosing this allows you to investigate variance within a column. Please check Use consensus now and click OK. You should see the screen change to match the image below (might need to adjust gray color – make it lighter to see differences clearly).

MSA coloring hydropathy prop 3
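The idea behind consensus scoring and the averaging window can be sketched in a few lines. This is an illustration under stated assumptions, not the tool's code: the window is taken to be an odd number of columns centered on the current one, gaps are skipped, and the function names column_averages and consensus_deviation are hypothetical.

```python
def column_averages(rows, score, window=1):
    """Average score per column, smoothed over `window` adjacent columns.

    rows   -- aligned sequences of equal length ('-' marks a gap)
    score  -- dict mapping a residue letter to a numeric score
    window -- odd number of columns to average over (1 = the column itself)
    """
    ncols = len(rows[0])
    half = window // 2
    averages = []
    for col in range(ncols):
        lo, hi = max(0, col - half), min(ncols, col + half + 1)
        values = [score[row[c]] for row in rows for c in range(lo, hi)
                  if row[c] in score]
        averages.append(sum(values) / len(values) if values else 0.0)
    return averages


def consensus_deviation(rows, score, window=1):
    """Per-residue difference from the (windowed) column average."""
    avg = column_averages(rows, score, window)
    return [[score[r] - avg[c] if r in score else None
             for c, r in enumerate(row)] for row in rows]
```

With a window of 1, a column in which every sequence carries the same residue yields a deviation of zero everywhere, which is why fully conserved columns lose their color when Use consensus is checked.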

Step 7: Adding a Phylogenetic Tree View

Next let us add a phylogenetic tree view. Select the Phylogenetic tree item in the Project Tree.

Tree selected in project

Right-click the selected tree and choose Open New View, or select View => Open New View from the main menu.

TreeView open dialog

Click Tree View. You should see a view like the one in the image below appear.

TreeView

This tree was constructed from the alignment in this project. It was obtained by running the tool at Tools => Run Tool, choosing Phylogenetic Tree Builder Tool, and using the Neighbor Joining algorithm. The tree is midpoint rooted.

Step 8: Phylogenetic Tree View Features

The phylogenetic tree view offers a variety of ways to manipulate and edit trees. We will discuss a few of these below.

Layout Options

The phylogenetic tree offers several different methods to lay out the nodes in the tree. To choose a layout, right-click on the tree image and select Layout in the pop-up menu.

TreeView menu

The available options are:

  • Rectangular Cladogram - the default view
  • Slanted Cladogram - provides a triangular view of the tree
  • Radial Tree - shows the tree in radial format
  • Circular Tree - shows the tree in circular format

The image below demonstrates our sample tree in the Circular layout.

TreeView circular

Searching

The phylogenetic tree offers a powerful search implemented via the search bar at the top of the window. Two search methods are implemented within the single interface:

  • Simple string matching
  • Full query search

Simple string matching allows you to type in some text and then press enter or Start to search for that text within all node properties in the current tree. If your text includes blanks, enclose it in quotes to force the search tool to use simple string matching. If the text in the search box has blank spaces and is not enclosed in quotes, the search engine will attempt to parse it according to the query language syntax.

TreeView search tooltip

After a query is executed, the matching nodes replace any currently selected nodes. To make the results easier to see, check the Filter box on the toolbar, which draws the nodes not selected by the query semi-transparently.

TreeView search filtered

Full query search lets you build logical queries, much as you would select records in an SQL database. A query compares node properties with other properties or with values of your choice. Queries can be built from comparisons, such as equal and greater-than, combined with logical operators, such as AND and OR. Logical operators may be given in upper or lower case. As you type a query, node property names are highlighted in blue. To execute the query, press Enter in the search box or click the Start button. While a query is running, you cannot manipulate the tree. If a query takes too long, click the Stop button to stop it.

The valid query elements include:

  • String, numeric and boolean values (such as 5, 0.2, true, "mitochondrial")
  • Node properties (such as seq-id, dist, organism, cluster-id)
  • Simple comparisons: <, <=, >, >=, =, !=
  • The 'Like' Comparison which allows wildcards: organism like Desoria*
  • 'Between' comparison: dist between 0.02 and 0.05
  • 'In' comparison: seq-id in (AAT66216, AAT66240.1)
  • Logical Operators: AND, OR, XOR, NOT

Some valid queries for the sample project are:

organism = "Archisotoma polaris" and seq-id = "AAT66228"

dist between 0.002 and 0.003 or seq-id==AAT66206

label like "AAT6619*" xor dist > 0.002

seq-id in (AAT66197, AAT66220, AAT66229)
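To make the meaning of these queries concrete, the sketch below expresses the same four conditions as Python predicates over a handful of toy nodes. It only mirrors the logic of the comparisons; it is not the query engine. The node values (organism names other than those quoted in this tutorial, and all dist values) are invented for illustration, 'between' is assumed to be inclusive, matching is assumed to be case-sensitive, and the helper names like, between and select are hypothetical.

```python
from fnmatch import fnmatchcase

# Toy nodes; the property values are invented for illustration only.
nodes = [
    {"seq-id": "AAT66197", "label": "AAT66197",
     "organism": "Hypogastrura concolor", "dist": 0.0200},
    {"seq-id": "AAT66206", "label": "AAT66206",
     "organism": "Desoria sp.", "dist": 0.0010},
    {"seq-id": "AAT66228", "label": "AAT66228",
     "organism": "Archisotoma polaris", "dist": 0.0025},
]


def like(value, pattern):
    """'like' comparison: '*' matches any run of characters."""
    return fnmatchcase(value, pattern)


def between(value, low, high):
    """'between' comparison, treated here as inclusive (an assumption)."""
    return low <= value <= high


def select(predicate):
    """Return the seq-ids of all nodes for which the predicate is true."""
    return [n["seq-id"] for n in nodes if predicate(n)]


# organism = "Archisotoma polaris" and seq-id = "AAT66228"
q1 = select(lambda n: n["organism"] == "Archisotoma polaris"
            and n["seq-id"] == "AAT66228")

# dist between 0.002 and 0.003 or seq-id==AAT66206
q2 = select(lambda n: between(n["dist"], 0.002, 0.003)
            or n["seq-id"] == "AAT66206")

# label like "AAT6619*" xor dist > 0.002
q3 = select(lambda n: like(n["label"], "AAT6619*") != (n["dist"] > 0.002))

# seq-id in (AAT66197, AAT66220, AAT66229)
q4 = select(lambda n: n["seq-id"] in ("AAT66197", "AAT66220", "AAT66229"))
```

Note how xor selects a node only when exactly one of its two conditions holds: the first toy node satisfies both and is therefore excluded from q3.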

If a search returns multiple nodes (example label like AAT6622*) you can view the nodes one-at-a-time.

TreeView All

To view the nodes one-at-a-time uncheck the Select All check box and then use Prev and Next arrow buttons to go through the selected nodes individually. The search result is illustrated below.

TreeView one by one

Distances

Our example neighbor joining tree is a distance tree: every child node has a distance to its parent node, and the tree is rendered in accordance with the real distances. Distances can be removed from the tree to show the topology more clearly. To remove distances, open the context (right-click) menu, select Layout, and remove the checkmark from the Use Distances option.

TreeView no distances

You should see a tree similar to the image below. A distance-free tree can be shown in any of the available layouts.

TreeView no distances 2

Labeling

The phylogenetic tree displays the sequence identifier at each node. To customize it, select the Settings option in the right-click context menu. In the Properties dialog open the Labels tab.

This dialog contains simple and custom labeling options for choosing which of the properties available in the tree appear in the label. Let us select Custom Labels and construct the label $(label) - $(organism). The drop-down and the Insert button on this page can be used to insert properties without needing to know the syntax. In the testing area of the dialog you can see how your custom labels will look in the tree.

TreeView properties labels dialog custom
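The $(name) placeholders in a custom label behave like a simple template. A minimal sketch of that substitution is shown here; it assumes that a property missing from a node is replaced by an empty string (the tool's actual behavior is not stated in this tutorial), and the function name expand_label is hypothetical.

```python
import re


def expand_label(template, properties):
    """Replace each $(name) with the node's property value.

    A property that the node does not have becomes an empty string
    (an assumption made for this sketch).
    """
    return re.sub(r"\$\(([^)]+)\)",
                  lambda m: str(properties.get(m.group(1), "")),
                  template)


node = {"label": "AAT66228", "organism": "Archisotoma polaris"}
print(expand_label("$(label) - $(organism)", node))
```

Text outside the placeholders, such as the " - " separator, is copied into the label unchanged.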

Once the labels are set, click the OK button, and see the phylogenetic tree view has changed: each terminal node (leaf) is now marked with the sequence accession as well as the species name.

TreeView properties labels custom

Node Properties dialog

Sometimes it may be desirable to add a new property or update the properties of some nodes in the tree. This can be done using the Node Properties dialog. To open this dialog, either right-click on the node and select Properties in the context menu, or hover over the node, wait for the tooltip, and click the information icon (i) in the tooltip.

Node properties dialog

To add a new property, type a property name in the Name box, click the Add button, and observe that the list of properties has been updated. Then select the new property in the list by clicking on it (the name should appear in the Name box), add a new value, and click the Update button. You should see the updated property in the list. Notice that a color can be easily selected in the dialog.

Node properties dialog2

The sections below explain how to manage the properties marker, $NODE_BOUNDED, $LABEL_COLOR, and $EDGE_COLOR, and which parameters to use for them.

Node Markers

To highlight individual nodes, use the property marker. Add this property as a Name in the Node Properties dialog. You can display this dialog by using the Properties option in the right-click context menu for the node or by clicking on the information (i) icon in the node tooltip. The marker value may include one or more colors and, optionally, a size parameter. The colors are specified as RGB values between 0 and 255 in square brackets, e.g. [64 0 128]. The numbers may be separated by commas and/or spaces. If a fourth value, commonly called the alpha channel, is given between the brackets, it is ignored. When multiple colors are given, the marker is divided evenly between the given colors, and looks much like a pie chart.

TreeView properties node markers

Examples to try:

Example for Property Marker | Expected Result
[255 0 0] | Red marker, default size
[64 0 128] | Dark purple marker, default size
[0 255 0] [255 0 0] size=4 | Marker that is 50% red and 50% green with large size
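The structure of a marker value can be sketched as a small parser. This is an illustration of the format described above, not the tool's own parser: the function names parse_marker and pie_slices are hypothetical, and representing the "pie chart" as angle ranges is an assumption made for the example.

```python
import re


def parse_marker(value):
    """Parse a marker value into (list of RGB tuples, size or None)."""
    colors = []
    for group in re.findall(r"\[([^\]]*)\]", value):
        # Numbers may be separated by commas and/or spaces.
        parts = [int(p) for p in re.split(r"[,\s]+", group.strip()) if p]
        if len(parts) not in (3, 4) or not all(0 <= p <= 255 for p in parts):
            raise ValueError(f"bad color: [{group}]")
        colors.append(tuple(parts[:3]))  # a fourth (alpha) value is ignored
    size = re.search(r"size\s*=\s*(\d+)", value, flags=re.IGNORECASE)
    return colors, int(size.group(1)) if size else None


def pie_slices(colors):
    """Divide the marker evenly: (start angle, end angle, color) per slice."""
    step = 360.0 / len(colors)
    return [(i * step, (i + 1) * step, c) for i, c in enumerate(colors)]
```

For the last example in the table, parse_marker returns two colors and a size of 4, and pie_slices assigns each color half of the circle.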

Subtree Boundaries

The phylogenetic tree supports adding a colored boundary to one or more subtrees. Boundaries are added as a property $NODE_BOUNDED to the parent node of the subtree using the "Node Properties" dialog. You can display this dialog by using the Properties option in the right-click context menu for the node or by clicking on the information (i) icon in the node tooltip. There are several parameters for a boundary including its shape, color, border width and whether or not the boundary should include text. It is also possible to define different boundary shapes for each of the different layout methods. Parameters other than the shape will remain the same for each layout method.

Parameters for the boundary regions are not case-sensitive. Colors are specified in the format [0..255, 0..255, 0..255, 0..255] for red, green, blue and, optionally, alpha. The numbers may be separated by spaces and/or commas. Parameters that require a value are specified in the form "parameter=x", and the possible values for 'x' are shown below. Boolean parameters can be 'true', 'yes', 'y', 'false', 'no', or 'n'. Parameters such as color and border that apply to more than one boundary shape will be applied to all applicable shapes.

TreeView subtree rectangle background

TreeView subtree triangle background

Parameters for the $NODE_BOUNDED property

Shape Parameters

The following parameters specify the shapes to be used for different layouts. If the same boundary shape is to be used for all layouts, specify only the 'Shape' parameter. To override the 'Shape' parameter for other layouts, specify the shape for that layout.

  1. Shape={Rectangle, RoundedRectangle, Triangle}
  2. RectCladogram={Rectangle, RoundedRectangle, Triangle}
  3. SlantedCladogram={Rectangle, RoundedRectangle, Triangle}
  4. Radial={Rectangle, RoundedRectangle, Triangle}
  5. Circular={Rectangle, RoundedRectangle, Triangle}

Appearance Parameters (apply to all the different shapes)

  • [0..255, 0..255, 0..255, 0..255] - The boundary color is specified as [r, g, b, a] without the 'keyword=' syntax and it can include an optional transparency, or alpha, value where 0 is fully transparent and 255 is fully opaque.
  • Border=n - 'Border' expands the overall shape by a specified number of pixels.
  • Corner=n - 'Corner' rounds off the corners in RoundedRectangles and Triangles.
  • DrawEdge={true, false} or {yes (y), no (n)} - The 'DrawEdge' parameter adds a 1-pixel border to the boundary.
  • EdgeColor=[0..255, 0..255, 0..255, 0..255] - The edge color defaults to black but can be changed with the 'EdgeColor' parameter.
  • IncludeText={true, false} or {yes (y), no (n)} - If 'IncludeText' is true, the boundary shape will be expanded to include node labels.

Triangle Parameters (apply only to triangles)

  • AxisAligned={true, false} or {yes (y), no (n)} - If 'AxisAligned' is true, then the shape is aligned with the nearest x or y axis. This defaults to 'true'.
  • TextBox={true, false} or {yes (y), no (n)} - The 'TextBox' parameter forces the text of the bounded nodes to be placed in a square box rather than expanding the triangle to include the text.
  • TriOffset=n - 'TriOffset' is the distance behind the root node at which the triangle apex should be placed. It defaults to '40' units.

Examples to try

Example for Property $NODE_BOUNDED | Expected Result
[0 255 0 64] Shape=RoundedRectangle | Light green rectangle with rounded corners, includes text
[0 255 0 255] shape=Rectangle IncludeText=true | Green rectangle boundary for all layouts with text included but no border or edge.
[255 0 0 128] Shape=Triangle corner=5 border=5 textbox=false drawedge=y AxisAligned=false | Red triangle that does not include a text box and has rounded corners and a black edge.
[0 0 255 128] shape=RoundedRectangle SlantedCladogram=Triangle Radial=RoundedRectangle drawedge=5 corner=5 border=5 textbox=false IncludeText=false AxisAligned=false | Blue rectangle with rounded corners for the rectangular cladogram layout, triangle for slanted cladogram layout and rounded rectangle for radial layout. Boundaries will not be expanded to include text. Corners will be rounded, and a 5-pixel border will expand the boundary size.
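The way a $NODE_BOUNDED value breaks down into a color, case-insensitive parameters, boolean words, and per-layout shape overrides can be sketched as follows. This is a reading of the rules described in this section, not the tool's parser; the function names parse_color, parse_node_bounded and shape_for are hypothetical, and storing every key and shape name in lower case is a choice made for the example.

```python
import re

TRUE_WORDS = {"true", "yes", "y"}
FALSE_WORDS = {"false", "no", "n"}
BOOLEAN_KEYS = {"drawedge", "includetext", "axisaligned", "textbox"}


def parse_color(text):
    """'[r g b]' or '[r, g, b, a]' -> tuple of ints."""
    return tuple(int(p) for p in re.split(r"[,\s]+", text.strip("[] ")) if p)


def parse_node_bounded(value):
    """Parse a $NODE_BOUNDED value into a dict with lower-cased keys."""
    params = {}
    # EdgeColor=[...] has to be picked out before the bare boundary color.
    edge = re.search(r"edgecolor\s*=\s*(\[[^\]]*\])", value, flags=re.IGNORECASE)
    if edge:
        params["edgecolor"] = parse_color(edge.group(1))
        value = value.replace(edge.group(0), " ")
    bare = re.search(r"\[[^\]]*\]", value)
    if bare:
        params["color"] = parse_color(bare.group(0))
        value = value.replace(bare.group(0), " ")
    for key, val in re.findall(r"(\w+)\s*=\s*(\S+)", value):
        key, val = key.lower(), val.lower()
        if key in BOOLEAN_KEYS and val in TRUE_WORDS | FALSE_WORDS:
            params[key] = val in TRUE_WORDS
        elif val.isdigit():
            params[key] = int(val)
        else:
            params[key] = val
    return params


def shape_for(params, layout):
    """Layout-specific shape if given, otherwise the general 'shape'."""
    return params.get(layout.lower(), params.get("shape"))
```

For the last example in the table, shape_for returns the triangle for the SlantedCladogram layout and falls back to the rounded rectangle for layouts that have no override.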

Color Labels

To color individual labels for the terminal nodes (leaves) you need to use the property $LABEL_COLOR. Add this property as a Name in the Node Properties dialog. You can display this dialog by using the Properties option in the right-click context menu for the node or by clicking on the information (i) icon in the node tooltip. Select and add the new color value using the color option available in the Node Properties dialog. You should see the labels colored with the new color.

TreeView label coloring

Color Branches

To color individual branches, use the property $EDGE_COLOR. Select the node of interest, open the tooltip for this node, and click the information (i) icon in the tooltip to open the Node Properties dialog. Add the property $EDGE_COLOR in the Name box and click the Add button. Select the newly added property and add a value (color) in the Value box, click Update and OK to close the dialog. Observe that the new color has been assigned to the branch from the selected node to its parent node.

Note: in case the child node has a cluster color set (property is cluster-id, see example in the Loading Attributes section of this tutorial) the cluster color will take precedence over $EDGE_COLOR.

TreeView branch coloring

The color on the branch will be blended between the child's color and its parent's color if the $EDGE_GRADIENT property set for the corresponding child node has a value of 1, as in the image below:

TreeView branch with gradient
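One simple way to picture such a blend is a linear interpolation between the two colors along the branch. The sketch below assumes exactly that; the tutorial does not say how Genome Workbench computes the gradient, and the function name blend_edge and the steps parameter are hypothetical.

```python
def blend_edge(child_rgb, parent_rgb, steps=5):
    """Colors along an edge, from the child's color to the parent's color.

    Linear interpolation in RGB is an assumption made for this sketch.
    """
    if steps < 2:
        raise ValueError("steps must be at least 2")
    colors = []
    for i in range(steps):
        t = i / (steps - 1)  # 0.0 at the child, 1.0 at the parent
        colors.append(tuple(round(c + (p - c) * t)
                            for c, p in zip(child_rgb, parent_rgb)))
    return colors
```

The first and last entries of the returned list are always the child's and the parent's colors; the entries in between shade gradually from one to the other.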

Loading Attributes

To update the properties of nodes in a tree from a file, right-click on the background to bring up the context menu, and then select Load Attributes. The Load Attributes feature updates the properties of nodes in a tree by loading them from a flat file. The attributes in the file can include both updates to existing node attributes and new attributes. The sequence identifier property, seq-id, is used as the key to match nodes in the file to nodes in the tree. This implies that the feature cannot be used to directly update nodes that do not have a seq-id.

TreeView Load Attributes menu

The file that provides the updates to the node properties has a well-defined format, an example of which is shown below:

#BKBTA-1
#seq-id cluster-id label dist
AAT66197 2 Hypogastrura concolor 0.02

The first line of any attribute file must contain the file identifier #BKBTA-1.

The next line (#seq-id cluster-id label dist) must specify the names of all the node properties that are given in the file. The list of property names should start with # and the individual properties should be separated by spaces or tabs. The first property has to be a key value that can be used to look up the elements in the tree that will be modified by the corresponding row in the attribute table. Attribute rows in which the first element (the key value) does not match any node in the tree are ignored.

After these two lines, the following lines contain the actual node identifiers and properties to update (AAT66197 2 Hypogastrura concolor 0.02). Additionally, any lines after the first two lines that start with # are read as comments and will be ignored. The list of properties for each node must be separated by tabs, not spaces.
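The format rules above can be checked with a short reader. This is a sketch of the rules as described in this tutorial, not the loader used by Genome Workbench: the function name parse_attributes is hypothetical, and skipping empty fields (so that a row may leave a property unset) is an assumption.

```python
def parse_attributes(text):
    """Parse attribute-file text into (property names, {key: {property: value}})."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "#BKBTA-1":
        raise ValueError("first line must be the file identifier #BKBTA-1")
    if len(lines) < 2 or not lines[1].startswith("#"):
        raise ValueError("second line must list the property names after '#'")
    names = lines[1][1:].split()  # header: separated by spaces or tabs
    updates = {}
    for line in lines[2:]:
        if not line.strip() or line.startswith("#"):
            continue  # blank line or comment
        fields = line.split("\t")  # data rows: tabs only
        key = fields[0].strip()
        # Empty fields are skipped (an assumption made for this sketch).
        updates[key] = {name: val.strip()
                        for name, val in zip(names[1:], fields[1:])
                        if val.strip()}
    return names, updates


sample = ("#BKBTA-1\n"
          "#seq-id cluster-id label dist\n"
          "AAT66197\t2\tHypogastrura concolor\t0.02\n")
names, updates = parse_attributes(sample)
```

Splitting data rows on tabs only is what allows a value such as the label "Hypogastrura concolor" to contain a space.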

Below are two examples of attribute files to try.

Example 1 provides a cluster-id for the set of nodes in the sample project:

#BKBTA-1
#seq-id cluster-id
#Add a cluster id to the tree
AAT66197    9
AAT66196    9
AAT66189    9
AAT66223    2
AAT66236    2
AAT66216    2
AAT66230    2
AAT66203    2
AAT66195    2

TreeView Load Attributes uploaded
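A file like Example 1 can also be produced by a short script, which helps avoid accidentally separating fields with spaces instead of tabs. The sketch below builds the file text from a cluster assignment; the function name format_attributes and the clusters dictionary are hypothetical, and the cluster numbers are those used in Example 1.

```python
def format_attributes(names, rows, comment=None):
    """Build attribute-file text: identifier, header, optional comment, rows."""
    lines = ["#BKBTA-1", "#" + " ".join(names)]
    if comment:
        lines.append("#" + comment)
    for row in rows:
        if len(row) != len(names):
            raise ValueError(f"expected {len(names)} fields, got {len(row)}: {row}")
        lines.append("\t".join(str(field) for field in row))  # tabs, not spaces
    return "\n".join(lines) + "\n"


clusters = {
    9: ["AAT66197", "AAT66196", "AAT66189"],
    2: ["AAT66223", "AAT66236", "AAT66216", "AAT66230", "AAT66203", "AAT66195"],
}
rows = [(seq_id, cluster) for cluster, ids in clusters.items() for seq_id in ids]
text = format_attributes(["seq-id", "cluster-id"], rows,
                         "Add a cluster id to the tree")
```

The resulting text can be written to a file and loaded with Load Attributes.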

Example 2 contains the following attributes: marker - for a node color, $NODE_BOUNDED - for a label background color, and $LABEL_COLOR - for a label color:

#BKBTA-1
#seq-id marker $NODE_BOUNDED $LABEL_COLOR
#Add markers, background, and label color to the tree
AAT66197    [255 0 128 64]    [0 255 0 64]
AAT66196    [255 0 128 64]    [0 255 0 64]
AAT66189    [255 0 128 64]    [0 255 0 64]
AAT66223    [0 0 255 64]        [255 0 0 64]
AAT66236    [0 0 255 64]        [255 0 0 64]
AAT66216    [0 0 255 64]        [255 0 0 64]
AAT66230    [0 0 255 64]
AAT66203    [0 0 255 64]
AAT66195    [0 0 255 64]

To download attribute file examples (attributes_clusters.txt and attributes_others.txt), please follow this link: https://ftp.ncbi.nlm.nih.gov/toolbox/gbench/tutorial/Tutorial3

Saving Images

If you need to save a screen capture of the current tree, select Save Images... from the File menu to bring up the Save Images dialog.

TreeView save images

The dialog allows you to save the tree as a single image, or to divide the image into equal-sized tiles (sub-images) and save those to a directory. When saving the images, you can, via Printing Guides, display cutting markers and names of adjacent image tiles in the image margins. This is useful for saving images that will be printed and then reassembled into a poster presentation.

In the Save Images dialog, use the Partitions slider to subdivide the image into multiple sub-images, each of which will be saved to a separate file in the directory name given by Directory. The names of the image files are displayed on each tile and are a combination of File Name and the image's index given according to the numbering scheme in Numbering. Use the Image Size to specify the size of each individual image saved and use proportions to set the width-to-height ratio to make images as small as possible or to force them to a standard (paper) size. Click on every individual image to preview it before printing.

TreeView save images preview

TreeView save images saved

Re-rooting Tree

By default, the tree constructed in Genome Workbench is “rooted” at some node. To see the topology more clearly, re-root the tree either with the Set Midpoint Root option or by using the longest branch as an outgroup and placing the root at the middle of it.

To place the root at the middle of the desired branch, right-click on the branch and, when you see that it has been highlighted in red, select “Place Root at Middle of Branch”.

TreeView reroot at branch

This will split the selected edge in two and place a new root for the tree in the middle, see image for the re-rooted tree below. If you do not want the root to be exactly in the middle, then after the tree is re-rooted, you can edit the distance property “dist” of the two children of the new root node to represent the position you prefer.

TreeView rerooted at branch

Midpoint rooting is also supported. This computes all the leaf to leaf distances in the tree and selects the longest one. The distance-based midpoint of the path between these leaves is then found and a new node is added at that point. The added node is then made to be the new root of the tree.
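The midpoint computation described above can be sketched on a toy tree. This follows the description step by step (all leaf-to-leaf distances, the longest path, then the point halfway along it) and is not Genome Workbench's code; the tree, its node names and distances are invented, the child-to-parent dictionary is a representation chosen for the example, and the function names are hypothetical.

```python
from itertools import combinations

# Toy tree as child -> (parent, distance to parent); values are invented.
TREE = {
    "A": ("n1", 1.0),
    "B": ("n1", 3.0),
    "n1": ("root", 1.0),
    "C": ("root", 6.0),
}


def path_to_root(tree, node):
    """Nodes from `node` up to the root, inclusive."""
    path = [node]
    while path[-1] in tree:
        path.append(tree[path[-1]][0])
    return path


def leaf_path(tree, a, b):
    """Nodes on the path from leaf a to leaf b via their common ancestor."""
    up_a, up_b = path_to_root(tree, a), path_to_root(tree, b)
    common = next(n for n in up_a if n in up_b)
    return up_a[:up_a.index(common) + 1] + up_b[:up_b.index(common)][::-1]


def edge_length(tree, u, v):
    """Length of the edge between two adjacent nodes."""
    return tree[u][1] if u in tree and tree[u][0] == v else tree[v][1]


def path_length(tree, path):
    """Sum of the edge lengths along a path of adjacent nodes."""
    return sum(edge_length(tree, u, v) for u, v in zip(path, path[1:]))


def midpoint(tree):
    """Return (u, v, offset): the new root lies `offset` from u on edge u-v."""
    parents = {p for p, _ in tree.values()}
    leaves = [n for n in tree if n not in parents]
    longest = max((leaf_path(tree, a, b) for a, b in combinations(leaves, 2)),
                  key=lambda p: path_length(tree, p))
    half = path_length(tree, longest) / 2
    walked = 0.0
    for u, v in zip(longest, longest[1:]):
        length = edge_length(tree, u, v)
        if walked + length >= half:
            return u, v, half - walked
        walked += length
```

In this toy tree the longest leaf-to-leaf path runs from B to C with a total length of 10, so the new root is placed 5 units from each of them, which falls on the edge between "root" and "C".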

Use the option menu to select “Set Midpoint Root”, which searches the tree for its midpoint and re-roots the tree there. Applied to the tree that was re-rooted at a branch in the previous step, this restores the midpoint rooting (see the images below).

TreeView reroot at midpoint

TreeView rerooted at midpoint

Step 9: Arranging Windows

One of the powerful features in Genome Workbench is the ability to move views where you would like them, create tabbed stacks, and resize any view. Our goal here is to take the Search view and the Active Objects Inspector view and dock them with the tab group on the bottom left. We will then resize the Active Objects Inspector and use it to inspect the nodes in our Phylogenetic Tree.

Click on the Search view tab and drag it over the title bar in the bottom left view. As you drag, you'll see the dock icons appear giving you choices where to put the view. Choose the center icon when over the bottom left view.

Do the same thing with the Active Objects Inspector. Then resize the bottom panel by moving the mouse over the divider and when it changes to a double arrow, click and drag. All the frames are resizable using the same technique.

Feel free to experiment with moving, docking and undocking, and resizing windows to find the set up that works for you.

Search view docking

Then go ahead and click on a node on our phylogenetic tree and the Active Objects Inspector will show the items dynamically. The items displayed change based on where you click in the tree.

Once this is completed, you should have a view that looks like the image below.

TreeView subbranch selected

Step 10: Interactions Between Views

So why would you want to go to the trouble of arranging views like this? The primary reason to do this is to see several aspects of the same data simultaneously. Genome Workbench provides this ability. To see it in action, open a Multiple Alignment View on the MUSCLE alignment in the Project Tree View.

Dock the Multiple Alignment View on the bottom of the Genome Workbench window like we did in the previous step. Your view should be like the view below.

TreeView MSA adjusted

If you click on a node in the Phylogenetic Tree view, the corresponding rows will highlight in the Multiple Alignment View. There can be many rows in the Multiple Alignment View so there are two ways to see the relevant rows.

The first way is to right-click (or control-click) on a description in the Multiple Alignment View and select Hide/Show -> Show Only Selected from the contextual menu.

MSA show only selected

MSA only selected shown

The second way is to right-click (or control-click) on a description in the Multiple Alignment View and select Move Selected Items Up from the contextual menu.

MSA selected moved up

You can reverse this operation at any time by right clicking and selecting Hide/Show -> Show All.

Step 11: Finished

This completes this tutorial. In this tutorial, we covered:

  • How to create different kinds of views on your data (Multiple Alignment View, Phylogenetic Tree View, Active Objects Inspector)
  • How to use scoring and coloration schemes in the multiple alignment view to see differences in your data.
  • How to manipulate the phylogenetic tree view to provide more informative displays.
  • How to arrange views to provide several different views of the same data on the screen at once.
  • How to see selections shown between different views.

Tree View on the Web

Tree View on the Web is also available at https://0-www-ncbi-nlm-nih-gov.brum.beds.ac.uk/projects/treeview/

Current Version is 3.7.1 (released October 13, 2021)

Last updated: 2021-08-13T19:08:14Z