Chromatin organisation
The human genome consists of
approximately 3.0 × 10^9 nucleotide pairs, giving a length of more than
1 metre if fully stretched, yet it is packaged within the nucleus of
each individual cell, which measures approximately 5 µm in diameter.
The human genome is made up of 23 pairs of chromosomes (diploid number
46): 22 pairs of autosomes and one pair of sex chromosomes (XX in
females, XY in males).
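The stretched length quoted above can be checked with a short
calculation. The sketch below assumes a rise of 0.34 nm per base pair
for B-form DNA, a standard textbook value that is not stated in the
text; the function and variable names are illustrative only.

```python
# Assumption: 0.34 nm rise per base pair (B-form DNA); not given in the text.
RISE_PER_BP_NM = 0.34


def stretched_length_m(base_pairs: float, rise_nm: float = RISE_PER_BP_NM) -> float:
    """Return the end-to-end length, in metres, of fully stretched DNA."""
    return base_pairs * rise_nm * 1e-9


HAPLOID_BP = 3.0e9          # genome size quoted in the text
NUCLEUS_DIAMETER_M = 5e-6   # nuclear diameter quoted in the text

haploid_m = stretched_length_m(HAPLOID_BP)
diploid_m = stretched_length_m(2 * HAPLOID_BP)

print(f"Haploid genome: {haploid_m:.2f} m")
print(f"Diploid genome: {diploid_m:.2f} m")
print(f"Diploid length / nuclear diameter: {diploid_m / NUCLEUS_DIAMETER_M:,.0f}")
```

Under these assumptions one haploid copy is about 1 m long, so a diploid
nucleus holds roughly 2 m of DNA, several hundred thousand times its own
diameter.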
DNA is packed inside the nucleus
in association with a number of proteins, around which it is extensively
coiled and folded to form nucleosomes. Each nucleosome is built around a
histone octamer containing two copies each of histones H2A, H2B, H3 and
H4. Histones contain large amounts of positively charged amino acids,
mainly lysine and arginine, which bind electrostatically to the
negatively charged phosphate groups of the DNA backbone. The DNA makes
1.65 left-handed turns around each histone octamer, covering a total of
146 bp of double-stranded DNA. The next 50 bp link one nucleosome to
the next and also interact with another histone (H1), forming a thicker
fibre with six nucleosomes per turn, known as the solenoid. Besides
histones there are other proteins that make up what is known as the
nuclear scaffold. One of these proteins is the enzyme topoisomerase type
II. Solenoids in turn fold into chromatin fibres of approximately
200 nm in diameter, which eventually make up the chromatids, 600-700 nm
in diameter. Histones do not dissociate from the DNA during replication
in the S phase of the cell cycle; instead, new histones assemble on the
lagging strand.
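As a rough illustration of the scale of this packaging, the figures
above (146 bp per octamer, 50 bp of linker, six nucleosomes per
solenoid grouping) can be combined in a back-of-the-envelope estimate.
The sketch assumes, as an idealisation, that the whole 3.0 × 10^9 bp
genome is packaged uniformly; the names are illustrative only.

```python
CORE_BP = 146                  # DNA wrapped around one histone octamer (from the text)
LINKER_BP = 50                 # linker DNA between nucleosomes (from the text)
NUCLEOSOMES_PER_SOLENOID = 6   # grouping of six nucleosomes (from the text)
GENOME_BP = 3_000_000_000      # approximate haploid genome size (from the text)


def nucleosome_count(genome_bp: int, core_bp: int = CORE_BP,
                     linker_bp: int = LINKER_BP) -> int:
    """Number of complete core-plus-linker repeats that fit in genome_bp."""
    return genome_bp // (core_bp + linker_bp)


nucleosomes = nucleosome_count(GENOME_BP)
solenoid_units = nucleosomes // NUCLEOSOMES_PER_SOLENOID
core_histones = nucleosomes * 8  # eight core histone proteins per octamer

print(f"Nucleosomes per haploid genome: {nucleosomes:,}")
print(f"Groups of six nucleosomes:      {solenoid_units:,}")
print(f"Core histone molecules needed:  {core_histones:,}")
```

The estimate comes out at roughly 15 million nucleosomes per haploid
genome, which gives a sense of why histone genes are present in many
copies.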
Histones are encoded by clusters
of genes that are repeated many times in the genome and are highly
conserved across species. Histone genes also lack introns.
In sperm heads, histone proteins are replaced by protamines.

Figure 1. Structure of nucleosome
Satellite DNA
Re-association kinetics and sedimentation
equilibrium centrifugation showed that when eukaryotic DNA is sheared
and analysed, a main DNA band and one or two satellite peaks are
observed. Re-association kinetics also showed that the satellites
observed in eukaryotic genomes result from the re-association of
repetitive DNA sequences, of which there are two main types:
moderately repetitive and highly repetitive. Only 10% of the human
genome is thought to behave as single copy DNA.
Highly repetitive sequences are short sequences
that are repeated a large number of times, usually occurring as tandem
repeats. Satellite DNA is found in specific areas of the chromosomes
known as heterochromatin. Such areas include those around the
centromere. The centromere is the region of the chromosome to which the
spindle fibres attach during mitosis and meiosis, helping the
chromosome to move to one of the poles during anaphase. This region is
known as the CEN, and in yeasts it consists of approximately 225 bp
divided into three regions. Region II is the largest and is about 95%
AT rich, while region III is thought to be the most important for
centromere function, since its sequences are needed to bind the spindle
fibres. In humans, the alphoid family of repetitive sequences is found
at the centromere; each repeat is about 170 bp long, and the repeats are
present in tandem arrays of up to 1 million base pairs.
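A description such as "95% AT rich" refers to the fraction of bases in
a region that are adenine or thymine. A minimal sketch of that
calculation follows; the 20-base fragment is invented for illustration
and is not a real centromere sequence.

```python
def at_content(sequence: str) -> float:
    """Fraction of bases in `sequence` that are A or T (0.0 to 1.0)."""
    bases = sequence.upper()
    if not bases:
        raise ValueError("sequence must not be empty")
    return sum(base in "AT" for base in bases) / len(bases)


# Hypothetical fragment, made up for this example only.
fragment = "ATTTAATATAAATTTAGATA"
print(f"AT content: {at_content(fragment):.0%}")
```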
Another very important structure of the chromosome
is the telomere, which also consists of repetitive DNA sequences.
Telomeres are found at the tips of linear chromosomes. They contain
telomeric sequences, which consist of short tandem repeats, and
telomere-associated sequences, which are found adjacent to and within
the telomere. With each cycle of DNA replication the telomeres become
shorter, so that they serve as an internal biological clock that
reflects the age of the cell. In germ cells (but not in somatic cells)
telomeres are maintained by an RNA-containing enzyme known as
telomerase. In immortalised human cancer cells, the activation of
telomerase is a very important step in the transition to malignancy.
Repetitive DNA
Moderately repetitive DNA can be found either
interspersed or in tandem across the genome. There are two main
types of interspersed repetitive elements, short and long. The
short interspersed elements, or SINEs, are less than 500 bp long and can
be found as many as 500,000 times in the genome. An example of a SINE is
the Alu element found in mammals. The long interspersed elements,
or LINEs, are about 6400 bp long and can be found as many as 40,000
times. Moderately repetitive DNA can be clustered, and some functional
genes also fall within this category, including those coding for 5.8S,
18S and 28S rRNA in humans, which are clustered on the p arms of
chromosomes 13, 14, 15, 21 and 22. There are also tandem repeats such as
the variable number tandem repeats (VNTRs), which consist of repeats of
15 to 100 bp and were very useful for forensic work. Another type of
tandem repeat is the short tandem repeat (STR), which can be a di-,
tri-, tetra- or even pentanucleotide repeat. These repeats are also
used for genetic identification in forensic DNA analysis.
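Forensic identification with STRs rests on counting how many times a
short unit is repeated at a given locus. The following is a minimal
sketch of such a count using a regular expression, not a description of
any laboratory method; the function name, the default thresholds and
both sample sequences are hypothetical.

```python
import re
from typing import List, Tuple


def find_strs(seq: str, min_unit: int = 2, max_unit: int = 5,
              min_repeats: int = 3) -> List[Tuple[int, str, int]]:
    """Return (start, unit, repeat_count) for each tandem repeat in seq.

    A repeat is a unit of min_unit..max_unit bases that occurs at least
    min_repeats times back to back. Runs of a single base are skipped.
    """
    if min_repeats < 2:
        raise ValueError("min_repeats must be at least 2")
    # Lazy unit match, so the shortest repeating unit is reported.
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(seq.upper()):
        unit = match.group(1)
        if len(set(unit)) == 1:
            continue  # e.g. AAAAAA is a homopolymer run, not a di-/tri- repeat
        hits.append((match.start(), unit, len(match.group(0)) // len(unit)))
    return hits


# Two hypothetical alleles of the same locus, differing only in repeat number.
allele_a = "TT" + "GATA" * 4 + "CC"
allele_b = "TT" + "GATA" * 6 + "CC"

print(find_strs(allele_a))
print(find_strs(allele_b))
```

The two alleles give the same repeat unit at the same position but
different repeat counts, which is the kind of difference that STR
profiling compares between individuals.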
Structure of the Eukaryotic gene
It is estimated that there are about 20,000 to
50,000 genes in the human genome that code for proteins, which is less
than twice the number found in much simpler organisms. The structure
of the human protein-coding gene is quite complex. Eukaryotic genes
vary greatly in size, ranging from less than 1 kb (histones) to
as much as 2500 kb for the dystrophin gene. A typical gene consists of
coding and non-coding sequences known as exons and introns,
respectively. The exons (the coding parts) are retained in the mature
mRNA and eventually translated into protein. An exon is usually small
and codes for a single protein domain, averaging 150 nucleotides
encoding about 50 amino acids. Each amino acid is encoded by a triplet
known as a codon, and most amino acids are encoded by more than one
codon. On the other hand, the non-coding intervening introns are
relatively large and can be as long as 20,000 bp. The sequence within
introns appears largely random, but it can contain regulatory sequences
that affect the splicing mechanism. Introns are transcribed into the
primary RNA but are eventually removed (spliced out) and so do not form
part of the mature mRNA molecule. The number of introns and exons
varies greatly between genes: a gene can consist of just two or three
exons, or of more than 20.
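The removal of introns from the primary transcript can be sketched as
a simple string operation. The sequences below are toy examples
invented for illustration, and the exon coordinates are supplied by
hand; in the cell, splice sites are recognised by the splicing
machinery rather than given in advance.

```python
from typing import List, Tuple


def splice(primary_transcript: str, exons: List[Tuple[int, int]]) -> str:
    """Join exon segments (0-based, end-exclusive) in order, dropping introns."""
    return "".join(primary_transcript[start:end] for start, end in exons)


# Hypothetical two-exon gene with one intervening intron.
exon1 = "AUGGCC"
intron = "GUAAGUCCCAG"
exon2 = "UUUUAA"

primary = exon1 + intron + exon2
exon_coords = [
    (0, len(exon1)),                          # first exon
    (len(exon1) + len(intron), len(primary)),  # second exon
]

mature_mrna = splice(primary, exon_coords)
print("Primary transcript:", primary)
print("Mature mRNA:       ", mature_mrna)
```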

Figure 2. Typical Structure of a Eukaryotic Gene
Besides the introns and coding exons, genes also
have other regulatory elements that mainly affect how the gene
itself is expressed and regulated. The 5' and 3' untranslated regions
usually consist of sequences that serve this purpose. The 5' region
upstream of the transcriptional start site usually makes up what is
known as the promoter region. The promoter region consists of sequences
such as the TATA box, where RNA polymerase binds to initiate
transcription. Further upstream there is the CCAAT box, which also
plays a part in the regulation of transcription. Usually there are a
number of other consensus sequences to which proteins known as
transcription factors bind to control transcription. A number of
enhancers and/or silencers, which can be found close to or sometimes
quite distant from the gene itself, are involved in the regulation of
gene expression. Sequences at the 3' end of the gene also act as
regulators and terminators of transcription, as well as signals for
polyadenylation of the mRNA molecules.
The Genetic Code
The sequence of nucleotides found in exons codes for
the sequence of amino acids assembled during translation, forming
different protein domains. It was shown that a triplet of bases
specifies the ribosomal translation of a given amino acid. All amino
acids are coded by more than one codon (the code is degenerate), with
the exceptions of tryptophan and methionine. In each codon the last base
has reduced specificity, so codons that differ only in the last base
often encode the same amino acid; for several amino acids all four such
codons do. As a result, many random mutations at this base do not alter
the amino acid sequence. The code also has three codons that act as
termination signals and a start codon, AUG, which codes for methionine,
so that the first amino acid of a newly synthesised protein is
methionine. This code is shared by all living organisms, although some
variations exist in the mitochondrial genome.
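The properties described above (degeneracy, single-codon amino acids,
stop codons, the AUG start) can be checked against the standard
genetic code. The sketch below builds the standard codon table from
the usual compact 64-letter encoding with bases in the order U, C, A,
G; the `translate` helper and the example mRNA are illustrative only,
and mitochondrial variations are not modelled.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# Standard genetic code, one letter per codon in UCAG order; "*" marks a stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna: str) -> str:
    """Translate from the first AUG up to (not including) the first stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


codons_per_aa = Counter(CODON_TABLE.values())
single_codon = sorted(aa for aa, n in codons_per_aa.items() if n == 1)
stop_codons = sorted(codon for codon, aa in CODON_TABLE.items() if aa == "*")

print("Amino acids with one codon:", single_codon)   # M = Met, W = Trp
print("Stop codons:", stop_codons)
print("GCN codons encode:", {CODON_TABLE["GC" + b] for b in BASES})
print("GAN codons encode:", {CODON_TABLE["GA" + b] for b in BASES})
print("Peptide:", translate("AUGGCCUUUUAA"))
```

The GCN family shows the case where all four third-base variants give
the same amino acid (alanine), while the GAN family splits between
aspartate and glutamate, so third-base changes are often, but not
always, silent.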

Gene clusters or families
In the human genome there are a number of related
genes found in clusters on the same chromosome or even scattered on
different chromosomes, that have similar functions or are switched on
and off through a lifetime. Among the families of genes there are:
- The α- and β-globin gene clusters found on chromosomes 16 and 11,
  respectively
- Ribosomal RNA, myosin and actin
- The major histocompatibility complex (MHC), also known as HLA, on
  chromosome 6
The mitochondrial genome
The mitochondria are organelles
found within eukaryotic cells. They are thought to be of prokaryotic
origin, having become integrated into the cell through an evolutionary
process of symbiosis. A single cell contains a number of mitochondria,
up to 1500 in a liver cell. Mitochondria multiply within the cell by
division, and each mitochondrion has its own genetic material as well as
ribosomes that are smaller than those found in the cytoplasm. The
mitochondrial genome is circular, consists of 16.6 kb, and carries genes
such as those for transfer RNA, 12S and 16S rRNA, a number of
cytochrome c oxidase subunits, cytochrome b, ATPase subunits and eight
protein-coding genes. Although mtDNA is double stranded, a small part of
it appears to be triple stranded due to repetitive synthesis of a short
segment of the heavy strand DNA. The genes encoded on mtDNA do not
contain non-coding regions, unlike those in the nuclear genome, and both
strands are transcribed and translated. There are also small variations
in the genetic code, where some codons code for different amino acids.
On the other hand, some genes necessary for mitochondrial function are
encoded in the nuclear DNA. These characteristics support the hypothesis
that mitochondria originated as prokaryotic cells. It is now known that
variations within the mitochondrial genome can also lead to some
diseases in humans. In addition, mtDNA is usually inherited through the
maternal line (with some very rare exceptions), since the oocyte
contains multiple copies whereas the sperm cell has only four
mitochondria, located at the neck of the sperm, which does not penetrate
the oocyte at fertilization. Since mtDNA is inherited only through the
maternal line, does not usually undergo recombination and mutations are
rare, mtDNA analysis can be used to study the origins of populations
through the maternal line and also for forensic purposes.
Reference
Concepts of Genetics, 5th Edition (1997), Prentice Hall Inc, New Jersey, USA
Human Genetics, 3rd Edition (1997), Springer-Verlag, Berlin Heidelberg, New York
Human Molecular Genetics (2004), Garland Publishing, New York, USA
Some images taken from Wikipedia, the free online encyclopedia