An ultimate goal of systems biology is to understand a complex biological process in sufficient detail that we can build a computational model of it. That would let us run simulations of its behavior and gain a quantitative understanding of its function.
This goal has been pursued for quite some time now. Enzyme kinetics is the area of biochemistry that quantifies the mechanisms and rates at which proteins catalyze the chemical reactions of their substrates. Some of the seminal work in that field was done almost a hundred years ago by the pioneers of biochemistry. Today, we have models that describe how certain proteins operate in exquisite detail. But taking the next step up in complexity, from two or three proteins to even a small network, is proving incredibly difficult.
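To see how far enzyme kinetics got us, consider the classic Michaelis-Menten rate law from that pioneering era: the reaction rate rises with substrate concentration and saturates at Vmax, reaching half-maximal rate when the substrate concentration equals Km. A few lines of Python capture the whole model (the numbers here are arbitrary, just for illustration):

```python
def michaelis_menten(s, vmax, km):
    """Reaction rate v at substrate concentration s, per Michaelis-Menten."""
    return vmax * s / (km + s)

# By definition of km, the rate is half of vmax when s equals km.
print(michaelis_menten(s=2.0, vmax=10.0, km=2.0))  # 5.0
```

A single purified enzyme really is this tractable; the trouble starts when dozens of such equations become coupled in a network.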
Part of the problem is that we cannot replicate most systems in vitro, in the test tube, in the way that we can with purified enzymes and their substrates. Another part of the problem is that many systems involve the regulation of gene expression and protein synthesis, each of which involves a huge number of different proteins and many, many unknown interactions. People have had some success in modeling specific regulatory networks in bacteria or yeast at a qualitative level, but no general approach has emerged.
Eric Davidson’s work on sea urchin development at Caltech is a dramatic example of where we might end up. The early-stage development of an embryo involves exquisite cascades of regulation. The switching on and off of genes in those first few hours determines the fate of the early cells, whether they give rise to the nervous system, the gut, or the muscles of the organism. The sea urchin happens to be an excellent experimental system in which to study development.
Through years of painstaking work, Eric and his group have identified many of the genes involved in the early stages of embryo development. They know which genes are turned on at which stage and can determine how each of them is regulated. Now they are getting to the really fun part and have assembled all of their data into a network that resembles an electronic circuit with a series of gates.
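The circuit-with-gates picture maps naturally onto what modelers call a Boolean network: each gene is on or off, and its next state is a logic gate over the states of its regulators. Here is a minimal sketch; the gene names and wiring are invented for illustration and are not Davidson's actual network:

```python
# A toy Boolean model of a regulatory cascade. Gene names and wiring
# are made up -- each gene's next state is a logic gate over its regulators.
rules = {
    "geneA": lambda s: s["signal"],                    # buffer: follows the signal
    "geneB": lambda s: s["geneA"] and not s["geneC"],  # AND-NOT gate
    "geneC": lambda s: s["geneA"] or s["geneB"],       # OR gate
}

def step(state):
    """Update every gene synchronously; inputs like 'signal' stay fixed."""
    new = dict(state)
    for gene, rule in rules.items():
        new[gene] = rule(state)
    return new

state = {"signal": True, "geneA": False, "geneB": False, "geneC": False}
for hour in range(4):
    print(hour, state)
    state = step(state)
```

Stepping the clock plays the same role as the Hours control in BioTapestry: you watch genes switch on and off as the cascade propagates.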
Eric’s group has worked with the Institute for Systems Biology (ISB) in Seattle to develop software to display their network. You can download ISB BioTapestry here: sugp.caltech.edu/endomes. The biology behind the network is a bit too involved for it to make a good hands-on example for this article (translation: it’s too complicated for me), but it’s worth taking a look. Fire up the application, click on one of the document icons on the left panel, such as “PMC hourly”, and then use the Hours control at the bottom left to see how genes are turned on and off during the first few hours of development of the fertilized egg.
[Screenshot taken from ISB BioTapestry]
An important use of interaction networks is to predict how a system will respond to specific changes in its environment or to a genetic defect in one of its components. For example, you might explore how normal cells in a tissue become malignant by looking at the effect of perturbations on a network of relevant proteins. Armed with a hypothesis, you can go back into the lab and see if it holds up in the real world.
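One simple way to frame such a perturbation computationally is to walk the directed interaction graph from the perturbed component and collect everything downstream. The sketch below uses a tiny fragment of the well-known p53 pathway purely as an illustration; any real prediction would need the full network and far more than reachability:

```python
from collections import deque

# Illustrative directed network: edges point from regulator to target.
network = {
    "p53":  ["mdm2", "p21"],
    "mdm2": ["p53"],
    "p21":  ["cdk2"],
    "cdk2": ["rb"],
    "rb":   [],
}

def affected_by_knockout(network, gene):
    """Predict which nodes lie downstream of a knocked-out gene (BFS)."""
    seen, queue = set(), deque(network.get(gene, []))
    while queue:
        node = queue.popleft()
        if node not in seen:
            seen.add(node)
            queue.extend(network.get(node, []))
    return seen

print(sorted(affected_by_knockout(network, "p21")))  # ['cdk2', 'rb']
```

Crude as it is, this kind of reachability analysis is a first-pass hypothesis generator: it tells you which readouts to monitor when you go back into the lab.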
A classic approach to experimental perturbation is to knock out a specific gene by a targeted mutation and see what happens. This approach has been used for decades, but the technologies of today allow us to generate vast numbers of mutations and to monitor the expression of thousands of genes. Rather than looking at specific responses to single mutations, we can now look at everything going on in the cell. This “wide-angle” view lets us see changes in things that we never thought to look at before.
High-throughput proteomics technologies are yielding a huge amount of data. No doubt about it, proteomics represents a major advance. But this is not the same as DNA sequencing, where we have, in effect, digital information. Comparing and integrating proteomics data is complicated by cell-to-cell variation and by the ambiguities that emerge when results from different techniques are compared. Put simply, the data are messy.
Probably the largest database of protein interactions is the Biomolecular Interaction Network Database (BIND), based at Mt. Sinai Hospital in Toronto (www.blueprint.org/bind/bind.php). Currently, this database holds around 96,000 interactions between 34,000 sequences from 871 organisms. A nice feature of the site is the tutorial page, which presents three small networks in their real biological context; you can find those here: www.blueprint.org/bind/bind_tutorials.html. Click on the link at the bottom of each page to see the specific interactions described, and then click through for more detailed information, including the supporting evidence. Note that some of the image maps do not work as advertised at the time of this writing. Also based at Mt. Sinai is The GRID, which we will access as part of our worked example. The relationship between these two groups is not clear to me.
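Once you have downloaded a set of interactions, the first practical step is usually to load the pairs into a graph and ask simple questions, such as how many partners each protein has. This sketch assumes a simple "protein A, tab, protein B" layout; the actual BIND and GRID export formats carry more columns, and the pairs below are made up for illustration:

```python
from collections import defaultdict
from io import StringIO

# Stand-in for a downloaded file: one tab-separated interaction per line.
# These yeast ORF pairings are invented, not taken from BIND or The GRID.
data = StringIO("YAL001C\tYPL190C\n"
                "YAL001C\tYDR362C\n"
                "YPL190C\tYDR362C\n")

partners = defaultdict(set)
for line in data:
    a, b = line.strip().split("\t")
    partners[a].add(b)   # interactions are undirected,
    partners[b].add(a)   # so record the edge both ways

for protein in sorted(partners):
    print(protein, len(partners[protein]))
```

Using sets rather than lists means duplicate records, which are common when databases merge evidence from several experiments, are collapsed automatically.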
Another major source is the Database of Interacting Proteins (DIP) from David Eisenberg’s group at UCLA (dip.doe-mbi.ucla.edu). DIP focuses on validated protein-protein interactions and is free for non-commercial access. As an aside, I hope to return to the issues of “free” access to data and software within bioinformatics in the future. It is a messy area that can cause all sorts of problems for non-academic users such as myself.
This site has links to some of the other interaction databases: www.hgmp.mrc.ac.uk/GenomeWeb/prot-interaction.html.