
5.10: The Human Genome Projects

Shortly after their press conferences, the two groups that had spent several years racing to sequence the human genome published their findings.

These achievements were monumental, but before we examine them, let us be clear as to what they were not.

What was not found

The number of genes was much smaller than predicted

The two groups came up with slightly different estimates of the number of protein-encoding genes, but both fell in the range of 30,000 to 38,000. That total is:

  • barely twice the number of genes in
    • Drosophila (~17,000 genes)
    • C. elegans (<22,000 genes)
  • only about a third of the 100,000 genes that many had predicted would be found

The protein-coding sequences of these genes account for only 1–2% of the total DNA in the cell. (By 2011, the estimate had been revised downward to some 21,000.)
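The comparisons above are simple ratios. A minimal Python sketch of that arithmetic, using only the figures quoted in this section; because the C. elegans value is an upper bound ("<22,000"), ratios against it are lower bounds.

```python
# Gene counts quoted in the text (protein-encoding genes)
HUMAN_ESTIMATES = (30_000, 38_000)  # range of the two groups' 2001 estimates
HUMAN_2011 = 21_000                 # revised estimate by 2011
DROSOPHILA = 17_000                 # approximate
C_ELEGANS = 22_000                  # upper bound ("<22,000")
PREDICTED = 100_000                 # the widely expected total

def fold(a, b):
    """Ratio a / b, rounded to two decimals."""
    return round(a / b, 2)

for human in (*HUMAN_ESTIMATES, HUMAN_2011):
    print(
        f"{human:>6} genes: "
        f"{fold(human, DROSOPHILA)}x Drosophila, "
        f"{fold(human, C_ELEGANS)}x C. elegans, "
        f"{fold(human, PREDICTED)} of the predicted total"
    )
```

Both 2001 estimates work out to roughly 1.4 to 2.2 times the invertebrate counts and roughly a third of the predicted 100,000.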

Are the tiny roundworm and fruit fly almost as complex as we are?

Probably not, although we share many homologous genes (called "orthologs") with both of these animals. However, many of our protein-encoding genes produce more than one protein product (e.g., by alternative splicing of the primary transcript of the gene). On average, each of our open reading frames (ORFs) produces 2 to 3 different proteins. So the human "proteome" (our total set of proteins) may be 10 or more times larger than that of the fruit fly and roundworm.
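Alternative splicing by exon skipping can be illustrated with a toy model. The sketch below assumes a hypothetical gene whose first and last exons are always retained while each internal "cassette" exon may be included or skipped. The exon names are made up, and real splicing is regulated, so a cell typically uses only some of the possible combinations.

```python
from itertools import product

def splice_isoforms(first_exon, cassette_exons, last_exon):
    """Enumerate mRNAs produced by exon skipping.

    The first and last exons are always kept; each cassette exon is either
    included or skipped. Exon order is preserved, as in real splicing.
    """
    isoforms = []
    for choices in product([True, False], repeat=len(cassette_exons)):
        middle = [exon for exon, keep in zip(cassette_exons, choices) if keep]
        isoforms.append([first_exon, *middle, last_exon])
    return isoforms

# Hypothetical gene: E1 and E4 constitutive, E2 and E3 optional
for isoform in splice_isoforms("E1", ["E2", "E3"], "E4"):
    print("-".join(isoform))
```

Two optional exons already allow four distinct mRNAs (E1-E2-E3-E4, E1-E2-E4, E1-E3-E4, E1-E4), and hence potentially four proteins, from a single gene; n optional exons allow up to 2^n.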

A larger proportion of our genome encodes transcription factors or serves as control elements (e.g., enhancers) to which these transcription factors bind. The combinatorial use of these elements probably provides much greater flexibility of gene expression than is found in Drosophila and C. elegans.
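Combinatorial control can be sketched with a toy model in which an enhancer is active only in cells that contain every transcription factor it requires. The cell types, factors, and genes below are hypothetical placeholders, and real enhancers integrate activators and repressors in more complex ways than this simple AND rule.

```python
# Hypothetical cell types, each defined by the transcription factors it contains
cells = {
    "cell_1": {"TF_A", "TF_B"},
    "cell_2": {"TF_B", "TF_C"},
    "cell_3": {"TF_A", "TF_C"},
}

# Hypothetical genes, each with an enhancer requiring a combination of factors
enhancers = {
    "gene_x": {"TF_A", "TF_B"},
    "gene_y": {"TF_B"},
    "gene_z": {"TF_C"},
}

def expression_pattern(required, cells):
    """Cell types where every factor in `required` is present (toy AND logic)."""
    return sorted(name for name, present in cells.items() if required <= present)

for gene, required in enhancers.items():
    print(gene, expression_pattern(required, cells))
```

With only three factors reused in different combinations, the three genes end up with three different expression patterns; with n factors there are 2^n - 1 non-empty combinations an enhancer could require.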

Gene diversity and density

Some genes are giants, such as dystrophin, with its 79 exons spread over 2.4 million base pairs of DNA, and titin, whose 363 exons can encode a single protein of as many as ~38,000 amino acids. The average human gene contains 4 exons totaling 1,350 base pairs and thus (at three base pairs per codon) encodes an average protein of 450 amino acids. Gene density varies widely among chromosomes, from 23 genes per million base pairs on chromosome 19 (for a total of 1,400 genes) to only 5 genes per million base pairs on chromosome 13.
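The figures in this paragraph are linked by simple arithmetic: each codon is three base pairs, and a gene count divided by a gene density gives a chromosome length. A small Python sketch of those calculations, using only the numbers quoted above and treating every codon as encoding one amino acid (i.e., ignoring the stop codon):

```python
BP_PER_CODON = 3  # base pairs per codon

def amino_acids_encoded(coding_bp):
    """Approximate protein length for a coding sequence (every codon counted as one residue)."""
    return coding_bp // BP_PER_CODON

def coding_bp_needed(amino_acids):
    """Minimum coding sequence, in base pairs, for a protein of this length."""
    return amino_acids * BP_PER_CODON

def implied_length_mb(total_genes, genes_per_mb):
    """Chromosome length (million bp) implied by a gene count and a gene density."""
    return total_genes / genes_per_mb

print(amino_acids_encoded(1_350))               # average gene: 450 amino acids
print(1_350 / 4)                                # average coding length per exon: 337.5 bp
print(coding_bp_needed(38_000))                 # titin: at least 114000 bp of coding sequence
print(round(implied_length_mb(1_400, 23), 1))   # chromosome 19: about 60.9 million bp
```

Titin's coding sequence alone is thus well over 100,000 base pairs, and the length implied for chromosome 19 (about 61 million base pairs) is close to that chromosome's actual size of roughly 59 million base pairs, a useful sanity check on the quoted density.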

Humans have many genes not found in invertebrates

Humans, and presumably most vertebrates, have genes not found in invertebrate animals like Drosophila and C. elegans. These include genes encoding:

  • antibodies and T cell receptors for antigen (TCRs)
  • the transplantation antigens of the major histocompatibility complex (MHC; in humans, the HLA complex)
  • cell-signaling molecules including the many types of cytokines
  • the molecules that participate in blood clotting
  • mediators of apoptosis (some of these proteins also occur in Drosophila and C. elegans, but we have a much richer assortment of them)

Gene duplication

Both groups added to the list of human genes that have arisen by repeated duplication (e.g., by unequal crossing over) from a single precursor gene; examples include the several hundred genes for olfactory receptors and the various globin genes.
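Unequal crossing over can be sketched as a list operation. In this toy Python model (not a molecular simulation), a chromosome is a list of hypothetical gene labels, and two homologs swap segments at break points that do not line up, so one product gains a copy of gene "G" and the other loses it. Repeating the process builds a tandem family of copies, which in real genomes then diverge by mutation.

```python
def unequal_crossover(chrom1, chrom2, i, j):
    """Break chrom1 before index i and chrom2 before index j, then swap tails.

    If the homologs are misaligned (i != j), the segment between the two
    break points is duplicated in one product and deleted from the other.
    """
    product1 = chrom1[:i] + chrom2[j:]
    product2 = chrom2[:j] + chrom1[i:]
    return product1, product2

# Two identical homologs carrying one copy of gene G between genes A and B
homolog = ["A", "G", "B"]
dup, deletion = unequal_crossover(homolog, homolog, 2, 1)
print(dup)       # ['A', 'G', 'G', 'B']
print(deletion)  # ['A', 'B']

# Repeated rounds: each misaligned exchange adds one more copy of G
chrom = ["A", "G", "B"]
for _ in range(3):
    k = chrom.count("G")  # the G copies occupy indices 1..k
    chrom, _shorter = unequal_crossover(chrom, chrom, k + 1, k)
print(chrom)  # ['A', 'G', 'G', 'G', 'G', 'B']
```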

Repetitive DNA

Both groups confirmed the presence of large amounts of repetitive DNA. In fact, this DNA, in which similar sequences occur over and over, is one of the main obstacles to assembling the sequenced fragments in the proper order. The repeats include:

  • LINEs (long interspersed elements)
  • SINEs (short interspersed elements), including Alu elements
  • Retrotransposons
  • DNA transposons

All told, repetitive DNA probably accounts for over 50% of our total genome.
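Why repeats hamper assembly can be seen with a toy example. Sequencing produces short reads that must be placed back in order; a read lying entirely inside a repeat matches more than one location, while a read that extends into unique flanking sequence can be placed unambiguously. The sequences below are invented for illustration, and real assemblers use far more elaborate methods than this exact-match search.

```python
def placements(genome, read):
    """Return every start position where `read` matches `genome` exactly."""
    n = len(read)
    return [i for i in range(len(genome) - n + 1) if genome[i:i + n] == read]

REPEAT = "AGGTCAGT"  # hypothetical repeated element
genome = "CCAT" + REPEAT + "TTGA" + REPEAT + "GCAA"

print(placements(genome, "GGTCA"))   # inside the repeat: [5, 17], ambiguous
print(placements(genome, "CATAGG"))  # reaches unique flank: [1], unambiguous
```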

What remains to be done?
