Written by Mustafa Bilgic • Updated 20 Feb 2026 • Molecular Biology

DNA Sequence Analyser

Use letters A, T, G, C only. Spaces and line breaks are ignored. Maximum 300 bases.
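The input rules above can be sketched in a few lines of Python. This is only an illustration, not the analyser's actual code: the function name `clean_sequence` and the choice to accept lower-case letters are assumptions.

```python
import re

MAX_BASES = 300  # limit stated in the input rules above


def clean_sequence(raw: str) -> str:
    """Remove whitespace, upper-case the input, and check it is valid DNA."""
    # Spaces and line breaks are ignored; accepting lower case is an assumption.
    seq = re.sub(r"\s+", "", raw).upper()
    if not seq:
        raise ValueError("Sequence is empty.")
    if re.search(r"[^ATGC]", seq):
        raise ValueError("Invalid characters detected. Please use only A, T, G, and C.")
    if len(seq) > MAX_BASES:
        raise ValueError(f"Sequence has {len(seq)} bases; the maximum is {MAX_BASES}.")
    return seq


print(clean_sequence("atg c\nTA"))  # ATGCTA
```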



DNA Structure and Base Pairing Rules

Deoxyribonucleic acid (DNA) is the molecule that carries genetic information in all living organisms. It consists of two complementary strands wound around each other in a right-handed double helix — a structure first described by Watson and Crick in 1953, building on X-ray crystallography work by Rosalind Franklin.

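Each strand is made of nucleotides, with each nucleotide containing a phosphate group, a deoxyribose sugar, and one of four nitrogenous bases: adenine (A), thymine (T), guanine (G), or cytosine (C).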

Complementary Base Pairing

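The two strands of DNA are held together by hydrogen bonds between complementary base pairs. The rules are strict and universal: adenine (A) pairs with thymine (T) via two hydrogen bonds, and cytosine (C) pairs with guanine (G) via three hydrogen bonds.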

A and G are purines (double-ring structures); T and C are pyrimidines (single-ring structures). A purine always pairs with a pyrimidine, ensuring the double helix maintains a constant width along its entire length.
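As a minimal sketch of these pairing rules, the Python below derives the complementary strand from a given strand. The function names are illustrative. The reverse complement is included because the two strands run in opposite directions, so the partner strand is conventionally written reversed.

```python
# Pairing rules: A<->T and G<->C
COMPLEMENT = str.maketrans("ATGC", "TACG")


def complement(seq: str) -> str:
    """Return the base-by-base partner of each base in seq."""
    return seq.translate(COMPLEMENT)


def reverse_complement(seq: str) -> str:
    """Return the partner strand written in its own 5' to 3' direction."""
    return complement(seq)[::-1]


print(complement("ATGCCG"))          # TACGGC
print(reverse_complement("ATGCCG"))  # CGGCAT
```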

RNA Base Pairing Rules

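RNA (ribonucleic acid) differs from DNA in three ways: it is single-stranded, it uses ribose sugar instead of deoxyribose, and it contains uracil (U) instead of thymine (T). During transcription, each base on the DNA template pairs with its RNA partner: A with U, T with A, G with C, and C with G.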

Chargaff's Rules

Erwin Chargaff discovered in 1950 that in any double-stranded DNA molecule, the percentage of adenine always equals the percentage of thymine (%A = %T), and the percentage of cytosine always equals the percentage of guanine (%C = %G). This is a direct consequence of complementary base pairing. Chargaff's rules provided key evidence for the double helix model.

An important consequence: in double-stranded DNA, you only need to know the GC content of one strand to know it for the whole molecule, because every G on one strand is matched by a C on the other and vice versa. The AT content is 100% minus the GC content.
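Chargaff's rules can be checked numerically. The sketch below builds the partner strand from a single strand, counts bases across both strands, and reports percentages. The function name `duplex_composition` is illustrative, and the input is assumed to contain only A, T, G, and C.

```python
from collections import Counter

PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def duplex_composition(strand: str) -> dict:
    """Percentage of each base across both strands of a double helix."""
    partner = "".join(PAIRS[base] for base in strand)
    counts = Counter(strand + partner)
    total = len(strand) * 2
    return {base: 100 * counts[base] / total for base in "ATGC"}


comp = duplex_composition("AAGCT")
print(comp)  # {'A': 30.0, 'T': 30.0, 'G': 20.0, 'C': 20.0}
```

In this example the single strand AAGCT has a GC content of 40% (2 of 5 bases), and so does the full duplex (20% G plus 20% C).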

GC Content and DNA Stability

GC content is the percentage of bases that are guanine or cytosine. Because G-C pairs have three hydrogen bonds (versus only two for A-T pairs), DNA with a higher GC content is more thermally stable: it requires more heat energy to separate the two strands (denature the DNA). This has practical implications: in PCR design, primer GC content affects the melting temperature, and GC content helps explain why some organisms tolerate extreme temperatures.
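A short sketch of the calculation follows. The `gc_content` function is the definition given above. The `wallace_tm` function is an addition not taken from this page: it uses the Wallace rule (2 °C per A or T, 4 °C per G or C), a rough rule of thumb for the melting temperature of short primers, and should not be read as the analyser's own method.

```python
def gc_content(seq: str) -> float:
    """Percentage of bases in seq that are G or C."""
    if not seq:
        raise ValueError("Sequence is empty.")
    return 100 * (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq: str) -> int:
    """Rough melting temperature in °C for a short primer (Wallace rule)."""
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


print(gc_content("ATGCGCTA"))  # 50.0
print(wallace_tm("ATGCGCTA"))  # 24
```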

Transcription and Translation

The central dogma of molecular biology describes the flow of genetic information: DNA → RNA → Protein. This two-step process is how genes encode protein structure.

Transcription: DNA → mRNA

In eukaryotic cells, transcription takes place in the nucleus. RNA polymerase unwinds the DNA double helix and reads the template strand (3' to 5') to synthesise a complementary mRNA strand (5' to 3'). The resulting mRNA has the same sequence as the non-template (coding) strand, but with U replacing T.
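The two views of transcription described here give the same mRNA, which the sketch below illustrates. The function names are illustrative, and the template strand is assumed to be written 3' to 5' so that the output reads 5' to 3'.

```python
def transcribe_template(template_3to5: str) -> str:
    """Pair each template base with its RNA partner (A-U, T-A, G-C, C-G)."""
    return template_3to5.translate(str.maketrans("ATGC", "UACG"))


def transcribe_coding(coding_5to3: str) -> str:
    """Shortcut: the mRNA matches the coding strand with U in place of T."""
    return coding_5to3.replace("T", "U")


coding = "ATGGCCTAA"    # coding strand, 5' to 3'
template = "TACCGGATT"  # template strand, 3' to 5'

print(transcribe_template(template))  # AUGGCCUAA
print(transcribe_coding(coding))      # AUGGCCUAA
```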

The mRNA then leaves the nucleus through nuclear pores and enters the cytoplasm where translation occurs.

Translation: mRNA → Protein

Ribosomes read the mRNA sequence in triplets of bases called codons. Each codon specifies one amino acid (or a start/stop signal). Transfer RNA (tRNA) molecules carry amino acids and have anticodons complementary to the mRNA codons. Amino acids are joined by peptide bonds to form a polypeptide chain.

The genetic code has 64 codons (4³) but only 20 amino acids, so the code is degenerate (redundant): most amino acids are coded for by multiple codons. Three codons (UAA, UAG, UGA) are stop codons that terminate translation. AUG is the usual start codon and codes for methionine.
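Translation can be sketched as a table lookup. The code below builds the standard genetic code from a compact 64-letter string (bases ordered U, C, A, G; `*` marks a stop codon) and reads an mRNA in triplets from the first AUG. The function name `translate` and the one-letter amino acid output are choices made for this illustration.

```python
BASES = "UCAG"
# Standard genetic code, one letter per codon, in UCAG order; '*' = stop
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(mrna: str) -> str:
    """Translate from the first AUG until a stop codon or the end of the mRNA."""
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no start codon found
    protein = []
    for pos in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[pos:pos + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("GGAUGGCCUUUUAAGG"))  # MAF
```

Here the reading frame starts at AUG (M), continues through GCC (A) and UUU (F), and ends at the stop codon UAA.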

Applications of DNA Analysis

Gel Electrophoresis (DNA Fingerprinting): DNA fragments are separated by size using an electric field through an agarose gel. Smaller fragments travel further. This technique forms the basis of forensic DNA profiling (using STR loci) and is used in paternity testing, crime scene investigation, and evolutionary studies.

PCR (Polymerase Chain Reaction): PCR amplifies specific DNA sequences exponentially, producing billions of copies from a tiny original sample. It requires two primers that flank the target sequence and a heat-stable DNA polymerase (such as Taq polymerase). PCR is essential for COVID-19 testing, genetic disease diagnosis, and ancient DNA studies.
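The exponential growth can be made concrete with a small calculation. The sketch below assumes an idealised reaction in which every copy is duplicated each cycle; the `efficiency` parameter is an illustrative addition to model less-than-perfect doubling.

```python
def pcr_copies(initial_copies: int, cycles: int, efficiency: float = 1.0) -> float:
    """Expected copy number after a number of cycles.

    efficiency = 1.0 means perfect doubling each cycle (an idealisation).
    """
    return initial_copies * (1 + efficiency) ** cycles


print(pcr_copies(1, 30))  # 1073741824.0, about a billion copies from one molecule
```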

Restriction Enzymes: These are bacterial enzymes that cut DNA at specific recognition sequences (restriction sites). For example, EcoRI recognises GAATTC and cuts between the G and the first A. They are essential tools in genetic engineering for cutting and joining DNA segments (recombinant DNA technology). Many restriction enzymes, EcoRI included, leave fragments with single-stranded overhangs called "sticky ends" that can be joined to other cut DNA using DNA ligase.
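A simple digest can be sketched as a string search. The code below looks for the EcoRI site GAATTC on one strand of a linear molecule and cuts after the G. The function name `digest` and the `cut_offset` parameter are illustrative, and the sketch ignores the second strand, so it shows fragment boundaries rather than the overhangs themselves.

```python
from typing import List


def digest(seq: str, site: str = "GAATTC", cut_offset: int = 1) -> List[str]:
    """Cut a linear sequence at every occurrence of site.

    cut_offset is the number of bases into the site where the cut falls
    (1 for EcoRI, which cuts between G and AATTC).
    """
    fragments = []
    start = 0
    pos = seq.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(seq[start:cut])
        start = cut
        pos = seq.find(site, pos + 1)
    fragments.append(seq[start:])  # remainder after the last cut
    return fragments


print(digest("AAGAATTCTTGAATTCGG"))  # ['AAG', 'AATTCTTG', 'AATTCGG']
```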

Human Genome: The human genome contains approximately 3.2 billion base pairs encoding around 20,000–25,000 protein-coding genes. The Human Genome Project, completed in 2003, took 13 years and cost roughly $3 billion. Today, a whole human genome can be sequenced in under 24 hours for under £1,000, using next-generation sequencing technologies.

Frequently Asked Questions

What are the base pairing rules for DNA?

In DNA, adenine (A) always pairs with thymine (T) via two hydrogen bonds, and cytosine (C) always pairs with guanine (G) via three hydrogen bonds. These complementary base pairs form the rungs of the DNA double helix. This strict complementarity means that if you know the sequence of one strand, you can immediately determine the sequence of the opposite (complementary) strand.

How is RNA different from DNA in terms of bases?

RNA uses uracil (U) instead of thymine (T). During transcription, when the DNA template strand is read, adenine in DNA pairs with uracil in RNA (A-U), while cytosine still pairs with guanine (C-G). RNA is also single-stranded and uses ribose sugar instead of deoxyribose. There are several types of RNA: messenger RNA (mRNA), transfer RNA (tRNA), and ribosomal RNA (rRNA).

What is GC content and why does it matter?

GC content is the percentage of bases in a DNA sequence that are guanine (G) or cytosine (C). G-C base pairs are held together by three hydrogen bonds, while A-T pairs have only two. DNA with higher GC content is therefore more thermally stable and requires more energy to denature (separate the strands). This matters in PCR design, where primer GC content affects the melting temperature, and in understanding why some organisms tolerate extreme temperatures.

What is Chargaff's rule?

Chargaff's rules state that in double-stranded DNA, the amount of adenine equals the amount of thymine (%A = %T), and the amount of cytosine equals the amount of guanine (%C = %G). This is a direct consequence of complementary base pairing. The total percentage of purines (A+G) always equals the total percentage of pyrimidines (T+C). These rules provided crucial evidence supporting the double helix model proposed by Watson and Crick in 1953.

What is the difference between transcription and translation?

Transcription is the process in which a DNA sequence is copied into a messenger RNA (mRNA) molecule; in eukaryotes it occurs in the nucleus. RNA polymerase reads the DNA template strand (3' to 5') and synthesises mRNA (5' to 3'). Translation is the subsequent process in which ribosomes in the cytoplasm read the mRNA sequence to build a polypeptide (protein). Each three-base mRNA codon codes for a specific amino acid. Together, these processes convert genetic information into functional proteins.

How many base pairs does the human genome contain?

The human genome contains approximately 3.2 billion base pairs (3.2 × 10⁹ bp) spread across 23 pairs of chromosomes. Only about 1.5% of the genome codes for proteins; the rest includes regulatory regions, introns, and repetitive sequences. If all the DNA from a single human cell were stretched out end to end, it would be about 2 metres long. The Human Genome Project declared the reference sequence essentially complete in 2003.
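The 2 metre figure can be checked with rough arithmetic. The sketch below assumes a rise of about 0.34 nm per base pair (the usual value for B-form DNA) and two genome copies per cell; neither number is stated on this page, and the function name is illustrative.

```python
BASE_PAIRS_PER_COPY = 3.2e9  # from the text above
RISE_PER_BP_NM = 0.34        # assumed spacing between base pairs in B-form DNA
COPIES_PER_CELL = 2          # assumed: one genome copy from each parent


def dna_length_metres(base_pairs: float, copies: int = 1) -> float:
    """Stretched-out length of DNA in metres."""
    return base_pairs * RISE_PER_BP_NM * 1e-9 * copies


print(round(dna_length_metres(BASE_PAIRS_PER_COPY, COPIES_PER_CELL), 2))  # 2.18
```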

What are restriction enzymes and how are they used?

Restriction enzymes (restriction endonucleases) are bacterial proteins that cut double-stranded DNA at specific recognition sequences of 4–8 base pairs. They evolved as a bacterial defence against viral DNA. In genetic engineering, restriction enzymes are used to cut DNA at precise locations, creating fragments with compatible "sticky ends" that can be joined to other DNA sequences using DNA ligase. This is the basis of recombinant DNA technology used to produce insulin, vaccines, and genetically modified organisms.
