Experiment-11: Protein Function Annotation
To perform protein function annotation
Proteins have many parameters that can be used to classify and characterize them, such as molecular weight, isoelectric point, cellular location and biological function. Elucidating and cataloguing this information about proteins is known as annotation. This information is very useful for determining protein function.
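As an illustration of how one such parameter can be derived from sequence alone, the sketch below estimates the average molecular weight of an unmodified peptide by summing standard average residue masses and adding one water molecule for the free termini. The function name molecular_weight and the test sequences are assumptions made for this example; post-translational modifications and non-standard residues are not handled.

```python
# Average residue masses in daltons (amino acid mass minus one water),
# for the 20 standard amino acids in one-letter code.
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167,
    "V": 99.1326, "T": 101.1051, "C": 103.1388, "L": 113.1594,
    "I": 113.1594, "N": 114.1038, "D": 115.0886, "Q": 128.1307,
    "K": 128.1741, "E": 129.1155, "M": 131.1926, "H": 137.1411,
    "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_MASS = 18.01528  # one water accounts for the free N- and C-termini


def molecular_weight(sequence: str) -> float:
    """Approximate average molecular weight (Da) of an unmodified peptide."""
    sequence = sequence.strip().upper()
    if not sequence:
        raise ValueError("sequence is empty")
    unknown = set(sequence) - set(RESIDUE_MASS)
    if unknown:
        raise ValueError(f"unsupported residue codes: {sorted(unknown)}")
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER_MASS


if __name__ == "__main__":
    # Hypothetical short peptide, used only to show the call.
    print(round(molecular_weight("GASPVT"), 2))
```

Dedicated tools also report the isoelectric point and other parameters; this sketch covers only the mass calculation.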
Protein sequence and structure prediction is of major importance for determining the structure and function of proteins, and vast amounts of such data are stored, organized and analyzed for this purpose. Proteins are made up of amino acid subunits that form a chain, which ultimately folds into the functional protein. The amino acid sequence determines the tertiary structure of the protein and hence its function, and it is also the basis for detecting homology with other proteins.
Due to genome sequencing it is now understood that a large fraction of the genes and their end products are involved in the core biological functions common to most organisms. Knowledge of such shared proteins in one organism can often be used to infer their functions in other organisms.
Protein annotation covers three domains:
1) Cellular components, the parts of a cell or its extracellular environment in which the protein is located.
2) Molecular functions, the elemental activities of a protein at the molecular level, such as binding or catalysis.
3) Biological process, operations or sets of molecular events with a defined beginning and end, pertinent to the functioning of integrated living units: cells, tissues, organs, and organisms.
Cellular component refers to the place in the cell where a protein is active. It includes terms such as ‘ribosome’ or ‘proteasome’, which name structures where multiple proteins are found.
Example: nuclear membrane or Golgi apparatus.
Molecular function is defined as the biochemical activity of a protein, including specific binding to ligands or structures. The definition also covers activities that the protein has the potential to perform. Examples: enzyme, transporter or ligand.
Biological process or pathway refers to the biological objective to which a protein contributes. A process often involves a chemical or physical transformation and is brought about by a group of proteins working together. Examples of biological process terms include ‘cell growth and maintenance’ and ‘signal transduction’.
The following is an example of protein annotation:
The protein cytochrome c can be described by its molecular function as an oxidoreductase, whose function (biological process) is oxidative phosphorylation and induction of cell death, and which is found (cellular component) in the mitochondrial matrix and mitochondrial inner membrane.
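The cytochrome c example can be written down as a small record with one field per annotation domain. This is only a minimal sketch of how such an annotation might be held in a program; the class name ProteinAnnotation and its summary method are hypothetical and do not correspond to any particular database schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ProteinAnnotation:
    """One protein described across the three annotation domains."""
    protein: str
    molecular_function: List[str] = field(default_factory=list)
    biological_process: List[str] = field(default_factory=list)
    cellular_component: List[str] = field(default_factory=list)

    def summary(self) -> str:
        """Return a one-line, human-readable description of the annotation."""
        return (
            f"{self.protein}: "
            f"function = {', '.join(self.molecular_function)}; "
            f"process = {', '.join(self.biological_process)}; "
            f"component = {', '.join(self.cellular_component)}"
        )


# The terms below are taken directly from the cytochrome c example in the text.
cytochrome_c = ProteinAnnotation(
    protein="cytochrome c",
    molecular_function=["oxidoreductase"],
    biological_process=["oxidative phosphorylation", "induction of cell death"],
    cellular_component=["mitochondrial matrix", "mitochondrial inner membrane"],
)

if __name__ == "__main__":
    print(cytochrome_c.summary())
```

Keeping each domain as a list reflects the fact that a protein can carry more than one term per domain, as cytochrome c does for process and component.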
Usually such details about a protein are stored in a database or an ontology, depending on availability. A database is a collection of data that is organized so that its contents can easily be accessed, managed, and updated. An ontology, in turn, attempts to describe all entities within an area of reality and all relationships between those entities. Data can be annotated to varying levels depending on the amount and completeness of available information. Ontologies can be vital tools in enabling researchers to turn data into knowledge.
A very commonly used ontology is the Gene Ontology, which is described in detail below:
The Gene Ontology (GO) project is a collaborative effort to address the need for consistent descriptions of gene products in different databases. The GO Consortium has grown to include many databases, for plants, animals and microbial genomes.
The GO project has developed three structured controlled vocabularies (ontologies) that describe gene products in terms of their associated biological processes, cellular components and molecular functions in a species-independent manner. There are three separate aspects to this effort: first, the development and maintenance of the ontologies themselves; second, the annotation of gene products, which entails making associations between the ontologies and the genes and gene products in the collaborating databases; and third, the development of tools that facilitate the creation, maintenance and use of ontologies.
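The second aspect, associating ontology terms with gene products, can be pictured as a table of (gene product, ontology, term) rows that is indexed by term. The sketch below is an assumption-laden illustration, not the GO Consortium's file format or software: the Association type, the index_by_term function and the gene product identifiers org1:cytc, org2:cytc and org2:protX are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple, Set, Tuple

# The three ontologies named in the text.
ASPECTS = ("biological_process", "cellular_component", "molecular_function")


class Association(NamedTuple):
    """A single link between a gene product and an ontology term."""
    gene_product: str
    aspect: str
    term: str


def index_by_term(
    associations: Iterable[Association],
) -> Dict[Tuple[str, str], Set[str]]:
    """Map each (aspect, term) pair to the set of gene products annotated with it."""
    index: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for assoc in associations:
        if assoc.aspect not in ASPECTS:
            raise ValueError(f"unknown aspect: {assoc.aspect}")
        index[(assoc.aspect, assoc.term)].add(assoc.gene_product)
    return dict(index)


# Hypothetical gene products from two organisms sharing the same terms,
# which is what a species-independent vocabulary makes possible.
ASSOCIATIONS = [
    Association("org1:cytc", "molecular_function", "oxidoreductase"),
    Association("org2:cytc", "molecular_function", "oxidoreductase"),
    Association("org1:cytc", "biological_process", "oxidative phosphorylation"),
    Association("org2:protX", "biological_process", "signal transduction"),
]

if __name__ == "__main__":
    index = index_by_term(ASSOCIATIONS)
    print(sorted(index[("molecular_function", "oxidoreductase")]))
```

Because the terms are shared across organisms, a single lookup by term returns gene products from every contributing database that uses it.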
The Gene Ontology site lists the available tools and databases. Tools include GoAnna, Gopubchem, AgBase and AmiGO; databases include Genes2Diseases, ToppGene Suite and Gene Ontology For Functional Analysis.
Other annotation tools include the Protein Family Alignment Annotation Tool and JAFA.
While GO provides a lot of information about proteins that helps determine their function, it does not cover the following aspects:
- Protein domains or structural features.
- Detailed information about the protein itself.
- Protein-protein interactions.
- Environment, evolution and expression.
- Anatomical or histological features above the level of cellular components, including cell types.
- Literature of the protein and experiments done.
- Protein pathways and interactions.