Systems Biology

Most of the software in systems biology is targeted to either the specifics of proteomics analysis, interpreting mass spectra, etc., or to the visualization of interaction networks. I am going to focus on the latter.

Cytoscape is a joint effort between groups at the Institute for Systems Biology in Seattle, the University of California, San Diego, and Memorial Sloan Kettering Cancer Center in New York. They have built a general-purpose network visualization tool with plugin capabilities.

The idea is that people will contribute plugins that allow integration and overlay of different types of data. The stable version is 1.1.1, but they have an alpha v2.0 available which includes a new open source graph library. They have a couple of tutorials that you can work through on their web site, and I encourage you to look at those. Here is a screenshot of a simple interaction network from yeast, which has gene expression data overlaid on the nodes, with green indicating relatively high expression and red indicating low expression.

Screenshot taken from Cytoscape

While this sort of software can produce some impressive networks, it can be difficult to demonstrate a simple, practical use of the data. So although Cytoscape is an impressive tool, I’m going to use a different application for our example. Osprey is a Java application from Mike Tyers’ group at Mt. Sinai in Toronto. It is free for academics, and commercial users can get a free trial license for 30 days. Their home page is here:

In our example, we’re going to look at a signaling pathway that plays a critical role in development of the fly embryo (it’s the “Wnt” pathway, for those in the know). Names of genes and proteins are used interchangeably, which can be a bit confusing. We care about protein interactions, but we often use genetic techniques to discover them.

We’ll start by defining two proteins of interest and then use the database to find other proteins with which both of them interact. This is an example of exploring the large dataset to discover interactions that might not otherwise be apparent.

Fire up the application, and when Database Connection Settings pops up, select Fly Grid and click Continue. This configures Osprey to use the GRID database of interactions in the fruit fly, Drosophila.

You will then see a blank “canvas” with a toolbar on the top and a panel to the left. We will add two nodes to the canvas, where each node represents a gene from Drosophila. Click on the red circle with a black cross in the toolbar to bring up the Add Node window. Enter the gene names “axin” and “beta-catenin” as shown and click Add.

Add Node

Two dots will appear on the canvas. Left-click on either of them, and the left panel will display information about the gene. An important thing to notice is the list of alternate names for each of these genes. What we are calling “Axin” is also known as “axn,” “din,” “CT6340,” and “0442/30.” Gene nomenclature is a nightmare throughout biology, and perhaps nowhere more so than in the Drosophila community, where researchers went through a phase of giving genes names such as “disheveled,” “Van Gogh,” and “Mothers Against Decapentaplegic” -- the products of a bunch of no-good graduate students with too much time on their hands, if you ask me! This site lists more examples of their tomfoolery:
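If you end up handling these names in your own scripts, the usual defense is a lookup table that maps every alternate name onto one canonical name. Here is a minimal Python sketch of that idea. The aliases for Axin are the ones listed above; the function names and the case-insensitive matching are my own assumptions and have nothing to do with how Osprey or GRID store names internally.

```python
# Alternate names for Axin as listed in the text above.
# The lookup scheme itself is illustrative, not Osprey's or GRID's.
ALIASES = {
    "Axin": ["axn", "din", "CT6340", "0442/30"],
}

def build_alias_index(aliases):
    """Map every name (canonical or alternate, lowercased) to its canonical name."""
    index = {}
    for canonical, alternates in aliases.items():
        for name in [canonical, *alternates]:
            index[name.lower()] = canonical
    return index

def canonical_name(name, index):
    """Return the canonical gene name, or None if the name is unknown."""
    return index.get(name.strip().lower())

index = build_alias_index(ALIASES)
print(canonical_name("CT6340", index))  # Axin
print(canonical_name("AXN", index))     # Axin
```

A real table would need to cope with the same alias pointing at more than one gene, which this sketch silently ignores.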

But back to our example! Holding down the left button on a selected node lets you move it around the canvas. Place the two nodes so that they are separate but still have plenty of space around them. Select both nodes with Ctrl-Z or by sweeping them. Then go to the Insert menu and select “All interactions for selected nodes.” Osprey will then go out to the Fly GRID database and fetch all nodes that are connected to these two. When it is finished, you will have something like this, depending on your node placement.

Screenshot taken from Osprey

This looks like the seed head of a dandelion! It is telling us that beta-catenin has a lot of interactions, but Axin has only a few.

What we care about are proteins that interact with both Axin and beta-catenin. We can remove everything but these using the filters in the lower-left panel of the application. Under Connection Filters, click on Minimum. Enter “2” in the pop-up window and click “filter”.
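For the programmers in the audience, the filter is easy to picture as a graph operation. The Python sketch below assumes that the Minimum filter keeps every node with at least the given number of connections, which is my reading of its behavior rather than anything taken from the Osprey source. The partners P1 to P4 are placeholder names and the edges are made up for illustration.

```python
def add_edge(graph, a, b):
    """Record an undirected interaction between a and b."""
    graph.setdefault(a, set()).add(b)
    graph.setdefault(b, set()).add(a)

def filter_min_connections(graph, minimum):
    """Keep nodes with at least `minimum` connections (single pass),
    and drop edges that led to removed nodes."""
    keep = {node for node, nbrs in graph.items() if len(nbrs) >= minimum}
    return {node: graph[node] & keep for node in keep}

# Toy network: P1-P4 are placeholders, not real fly genes.
graph = {}
for partner in ["P1", "P2", "P3", "P4"]:
    add_edge(graph, "beta-catenin", partner)
for partner in ["P2", "P4"]:
    add_edge(graph, "Axin", partner)

filtered = filter_min_connections(graph, 2)
print(sorted(filtered))  # ['Axin', 'P2', 'P4', 'beta-catenin']

# The same answer, asked directly: who interacts with both seed proteins?
print(sorted(graph["Axin"] & graph["beta-catenin"]))  # ['P2', 'P4']
```

In a network grown from just two seed nodes, any other node with two connections must touch both seeds, which is why a minimum of 2 picks out the shared partners.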

This has simplified things, but all of our nodes are collapsed on each other. We need to update the layout of the nodes in order to see what’s going on. Select all the nodes and go to Layout -> Circular -> One Circle to spread things out.
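A one-circle layout is about the simplest layout there is: space the nodes at equal angles around a circle. The Python sketch below shows the arithmetic. The function name, radius, and center are my own choices, and Osprey's actual implementation may well differ.

```python
import math

def circle_layout(nodes, radius=1.0, center=(0.0, 0.0)):
    """Place nodes at equal angular steps around a single circle."""
    nodes = list(nodes)
    count = len(nodes)
    positions = {}
    for i, node in enumerate(nodes):
        angle = 2 * math.pi * i / count
        positions[node] = (
            center[0] + radius * math.cos(angle),
            center[1] + radius * math.sin(angle),
        )
    return positions

# P2 and P4 are placeholder names for interaction partners.
layout = circle_layout(["Axin", "beta-catenin", "P2", "P4"], radius=100.0)
for name, (x, y) in layout.items():
    print(f"{name}: ({x:.1f}, {y:.1f})")
```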

Finally, we'll add a bit more complexity by selecting all of the nodes and going to Insert -> Only interactions within selected nodes.
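In graph terms, this step asks the database for every interaction whose two endpoints are both already on the canvas. A minimal Python sketch, assuming the database can be treated as a plain list of pairs; the edge list is a toy one and P1 to P3 are placeholder names, not real fly genes.

```python
def interactions_within(edges, selected):
    """Return only the edges whose two endpoints are both in the selection."""
    chosen = set(selected)
    return [(a, b) for a, b in edges if a in chosen and b in chosen]

# Stand-in for the interaction database (toy data for illustration).
edges = [
    ("Axin", "beta-catenin"),
    ("Axin", "P1"),
    ("beta-catenin", "P1"),
    ("P1", "P2"),
    ("P2", "P3"),
]

# Nodes currently on the canvas; P3 is not among them.
selected = ["Axin", "beta-catenin", "P1", "P2"]

for a, b in interactions_within(edges, selected):
    print(a, "-", b)
```

The P2 to P3 edge is dropped because P3 is not selected, so no new nodes appear on the canvas, only new lines between existing ones.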

The color of each node represents the primary function of that gene or protein. You will see references in the information panel to “GO component,” “GO process,” etc. “GO” refers to the Gene Ontology project, a set of three ontologies that try to categorize biological processes, structures, etc. to provide a framework and controlled vocabulary for molecular biology. In our example, the purple nodes are involved in cell organization and the light blue ones are involved in signal transduction (communication between and within cells).

This network has almost all nodes linked to all other nodes. In reality, not all proteins will physically interact with each other. Some of the interactions shown here are inferred from genetic experiments, and in some of those a single perturbation can have multiple indirect effects. So understanding the evidence behind each interaction is important.

The edges of the graph, the lines between the nodes, are colored according to the evidence that supports that interaction. You can click on these to get the detailed information. Click on the link between “pan” and beta-catenin and the upper-left panel will show you the experimental technique(s) used to define the interaction.

Below that is a button called PubMed. This will open up your browser and point it to the PubMed database of biomedical literature at the National Library of Medicine. It will display abstracts for the papers that demonstrated the interaction, and in many cases these will include links to the full text of the papers, some of which provide free access.

Click on other nodes and build out the network by adding new interactions. Keep the complexity manageable with the various filters and explore the literature that supports the interactions. Notice how some proteins are involved in huge numbers of interactions whereas others are quite limited. Get a feel for the complexity of the data and the amount of work that has gone into its discovery.

Who are the Players in Systems Biology?

Systems biology initiatives are popping up all over the place at the moment. These range from new standalone institutes to loose collaborations between existing labs. Here are a few of the leading lights in the field.

Institute for Systems Biology (ISB) in Seattle
ISB is a non-profit institute set up by Lee Hood that works on bioinformatics, genomics, and proteomics, with an emphasis on new technologies. Lee is an eloquent evangelist for systems biology and his talks are well worth hearing if you get the chance.

MIT Computational and Systems Biology Initiative (CSBi)
MIT has taken the approach of coordinating work in existing labs across campus.

Bio-X at Stanford University
Bio-X is a new program that is bringing together biologists, physicians, engineers, chemists, and computer scientists to work on big problems in biology. The program is a combination of campus labs and a central hub in the form of a dramatic new building.

Final Thoughts

By its very nature, systems biology demands input from a wide range of scientific disciplines. Every aspect of the work involves complex data management and analysis, and that means there are plenty of opportunities for creative developers. The big centers are an obvious focus for the work, but there are many smaller labs around the world that are broadening their horizons to make use of, and contribute to, these growing resources. Do some background reading, see who is working near you, and see where your skills might be useful. This turbulent interface where different areas of science flow into each other is an exciting place for developers like us to work in.


Here are some important papers on systems biology with links to the free full text or PDF of each paper.

"The Digital Code of DNA"
Hood, L. and Galas, D. (2003) Nature 421:444-448

"Regulatory Gene Networks and the Properties of the Developmental Process"
Davidson, E.H., McClay, D.R. and Hood, L. (2003) Proc. Natl. Acad. Sci. USA 100:1475-1480.

"Systematic Identification of Protein Complexes in Saccharomyces cerevisiae by Mass Spectrometry"
Yuen Ho et al. (2002) Nature 415:180-3

"BIND: the Biomolecular Interaction Network Database"
Bader GD, Betel D, Hogue CW. (2003) Nucleic Acids Res. 31(1):248-50

"Cytoscape: A Software Environment for Integrated Models of Biomolecular Interaction Networks"
Shannon P, et al. (2003) Genome Res. 13(11):2498-504

"Osprey: A Network Visualization System"
Breitkreutz, BJ., Stark, C., Tyers M. (2003) Genome Biology 4(3):R22

Robert Jones runs Craic Computing, a small bioinformatics company in Seattle that provides advanced software and data analysis services to the biotechnology industry. He was a bench molecular biologist for many years before programming got the better of him.
