
Formalin-fixed paraffin-embedded (FFPE) tissue is most often the primary source of cancer samples for characterising tumours. FFPE specimens are routinely obtained from patients during biopsies or surgical resections and preserved for clinical diagnosis and research, making FFPE the gold standard for long-term tissue preservation. The process uses a fixative (formalin) and an embedding medium (paraffin) to keep tissue morphology and cellular structure as close to their natural state as possible during long-term storage. This makes FFPE tissue extremely useful for laboratory techniques such as immunohistochemistry, histopathology, and molecular analysis, and thus a valuable resource for cancer research. Because the DNA and other cellular contents are preserved along with the tissue, researchers can extract genetic material from tumour tissue long after the sample was collected.
The Double-Edged Nature of FFPE
While formalin is excellent at preserving tissue structure, the same chemistry can damage nucleic acids. Formalin forms unnatural chemical crosslinks (DNA-DNA, DNA-protein, and protein-protein) that make it difficult to extract DNA and prepare it for sequencing. Beyond crosslinking, formalin fixation also causes other forms of DNA damage, including cytosine deamination, oxidative damage, nicks, gaps, and abasic sites. Combined with the often limited amount of tissue available from diagnostic biopsies, these changes make it challenging to recover enough high-quality DNA for downstream applications such as whole-genome sequencing.
Formalin fixation therefore presents a paradox: it preserves tissue beautifully for research and long-term storage, yet the same process can complicate the genetic analysis researchers hope to perform years later. A helpful analogy: imagine ordering a 500-piece puzzle and receiving a 50,000-piece puzzle instead. The picture is still there, but it has been shattered into tens of thousands of tiny fragments, and the effort required to reconstruct the original image increases dramatically.
“It’s an excitingly frustrating challenge”
From Tissue to Genetic Data
Extracting genetic information from FFPE samples for cancer research is neither straightforward nor as simple as working with fresh tissue. Great care is needed at every analytical stage to mitigate common issues that could otherwise lead to mutations being misinterpreted in the genomic data. Despite these challenges, there has been significant progress in developing solutions for whole-genome sequencing of FFPE samples.
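One well-known way such misinterpretation arises is through the cytosine deamination mentioned earlier: a damaged cytosine can be read as a thymine, so a C>T change (seen as G>A on the opposite strand) may appear in the data without being a real tumour mutation. The sketch below computes the fraction of single-base changes that carry this signature. The function name and input format are hypothetical, and a high fraction is only a warning sign, not proof, because some genuine mutational processes also favour C>T changes.

```python
# A C>T change on one DNA strand appears as G>A on the complementary strand,
# so both are counted as the same deamination-like signature.
DEAMINATION_CHANGES = {("C", "T"), ("G", "A")}


def deamination_fraction(snvs: list[tuple[str, str]]) -> float:
    """Fraction of single-base changes, given as (ref, alt) pairs, that are C>T or G>A."""
    if not snvs:
        return 0.0
    hits = sum(1 for ref, alt in snvs if (ref, alt) in DEAMINATION_CHANGES)
    return hits / len(snvs)


# Hypothetical variant calls from one sample, as (reference base, observed base).
calls = [("C", "T"), ("G", "A"), ("C", "T"), ("A", "G"),
         ("T", "C"), ("C", "T"), ("G", "A"), ("C", "A")]
frac = deamination_fraction(calls)
print(f"C>T/G>A fraction: {frac:.3f}")
```

Comparing this fraction between FFPE samples and, where available, matched fresh samples is one way a team might judge whether fixation damage is inflating the apparent mutation count.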
The first step is to extract DNA from FFPE tumour tissue, which is usually cut into thin sections and mounted on slides. The extraction procedure has to be carefully optimised for efficient DNA isolation: in practice, a well-controlled process that reverses formalin-induced crosslinks without causing further damage, while still recovering sufficient DNA. Automating the extraction workflow standardises the procedure, improving accuracy, efficiency, and productivity. Quality analysis of the extracted DNA is then crucial for selecting samples that are suitable for sequencing. This analysis measures the degree of damage in FFPE-derived DNA using metrics that correlate with DNA quality. These metrics help predict sequencing outcomes and flag unsuitable samples early, so they can be excluded before sequencing, reducing costs and avoiding low-quality, unusable data.
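The sample-selection step can be pictured as a simple filter on quantity and quality measurements. The sketch below is illustrative only: the field names, the "integrity score", and both cut-off values are hypothetical placeholders, since real metrics and thresholds are lab- and protocol-specific and are not given in this article.

```python
from dataclasses import dataclass

# Hypothetical cut-offs chosen only for illustration.
MIN_YIELD_NG = 100.0   # minimum total DNA recovered, in nanograms
MIN_INTEGRITY = 3.0    # minimum value of a hypothetical integrity score


@dataclass
class SampleQC:
    sample_id: str
    yield_ng: float         # total DNA recovered, in nanograms
    integrity_score: float  # hypothetical score: higher means less damaged DNA


def passes_qc(s: SampleQC) -> bool:
    """True if the sample meets both the quantity and the quality cut-off."""
    return s.yield_ng >= MIN_YIELD_NG and s.integrity_score >= MIN_INTEGRITY


def triage(samples: list[SampleQC]) -> tuple[list[str], list[str]]:
    """Split samples into those sent to sequencing and those excluded early."""
    keep = [s.sample_id for s in samples if passes_qc(s)]
    drop = [s.sample_id for s in samples if not passes_qc(s)]
    return keep, drop


batch = [
    SampleQC("FFPE-01", yield_ng=250.0, integrity_score=4.1),
    SampleQC("FFPE-02", yield_ng=60.0, integrity_score=5.0),   # too little DNA
    SampleQC("FFPE-03", yield_ng=300.0, integrity_score=1.8),  # too damaged
]
keep, drop = triage(batch)
print("sequence:", keep)
print("exclude:", drop)
```

Requiring both conditions reflects the point made above: a sample needs enough DNA and DNA of good enough quality before it is worth the cost of sequencing.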
The DNA extracted from FFPE tissue is then prepared for whole-genome sequencing (WGS), which gives a comprehensive view of the entire genome. The human genome contains about 3.2 billion base pairs (base pairs are the fundamental building blocks of DNA). Because this is far too much for a sequencing instrument to read in one piece, the DNA is first divided into many short fragments, and the prepared collection of fragments is called a “library”. During library preparation, short synthetic DNA sequences called adapters are attached to the fragments so that the sequencer can read them. Because FFPE tissue contains damaged DNA, this step is challenging and requires a well-designed library preparation and sequencing protocol to obtain meaningful data. Only samples that pass rigorous quality control checks for DNA quality and quantity proceed to library preparation, and every step must be precise to target, repair, and amplify the remaining usable DNA fragments. Each of the damage mechanisms described above causes sequence information to be lost, so the amount of usable DNA recovered from each sample must be maximised.
The libraries then undergo a quality control step to verify that fragment sizes fall within the expected range and that enough material is available to generate large volumes of high-quality sequencing data. Sequencing is performed on a high-throughput platform such as the Illumina NovaSeq X Plus. The sequencer produces billions of short reads, so each position in the genome is read many times over by independent DNA fragments. This repeated “reading” makes the digital DNA sequence more accurate and gives confidence that the mutations observed are truly present in the tumour DNA rather than random sequencing errors.
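How many times each position is read (the sequencing “coverage” or depth) follows from simple arithmetic: the average coverage is roughly the number of reads times the read length, divided by the genome size (the Lander-Waterman average, C = N × L / G). The sketch below uses the 3.2 billion base pair genome size from the text; the 150 bp read length and 30x target are assumed, illustrative values rather than this project's actual settings, and real coverage is uneven across the genome.

```python
def mean_coverage(num_reads: int, read_length: int, genome_size: int) -> float:
    """Average times each base is read: total bases sequenced / genome size."""
    return num_reads * read_length / genome_size


def reads_needed(target_coverage: int, read_length: int, genome_size: int) -> int:
    """Smallest whole number of reads that reaches the target mean coverage."""
    total_bases = target_coverage * genome_size
    return -(-total_bases // read_length)  # ceiling division on integers


GENOME_SIZE = 3_200_000_000  # ~3.2 billion base pairs, as stated in the text
READ_LENGTH = 150            # assumed read length, illustrative only
TARGET = 30                  # assumed target coverage, illustrative only

n = reads_needed(TARGET, READ_LENGTH, GENOME_SIZE)
cov = mean_coverage(n, READ_LENGTH, GENOME_SIZE)
print(f"{n:,} reads of {READ_LENGTH} bp give about {cov:.1f}x coverage")
```

Under these assumptions the answer is hundreds of millions of reads for a single genome, which is why a high-throughput instrument producing billions of reads is needed.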
Downstream computational processes then reveal the full picture of genetic changes present in the tumour. The sequencing data are analysed on high-performance computing systems, where billions of short reads are put back together, usually by aligning each read to a reference human genome, like rebuilding a massive puzzle, to create a complete picture of the patient’s genetic information.
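To make the “puzzle” step concrete, the toy sketch below places short reads onto a reference sequence using a k-mer index: it looks up where each read's first few letters occur, then checks that the whole read matches there. This is a simplified, exact-match illustration under stated assumptions (tiny made-up sequences, no sequencing errors, no mutations); production aligners handle mismatches, insertions and deletions, and genomes of billions of bases, and none of the function names here come from a real tool.

```python
from collections import defaultdict


def build_kmer_index(reference: str, k: int) -> dict[str, list[int]]:
    """Map every k-letter substring of the reference to the positions where it starts."""
    index: dict[str, list[int]] = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return dict(index)


def place_read(read: str, reference: str, index: dict[str, list[int]], k: int) -> list[int]:
    """Return every reference position where the whole read matches exactly."""
    seed = read[:k]  # the read's first k letters act as a lookup "seed"
    hits = []
    for pos in index.get(seed, []):
        # Confirm the rest of the read matches, not just the seed.
        if reference[pos:pos + len(read)] == read:
            hits.append(pos)
    return hits


reference = "ACGTTGCATGACCTGATCCGTA"  # made-up stand-in for a reference genome
k = 4
index = build_kmer_index(reference, k)
for read in ["TGCATGAC", "CTGATCCG", "GGGGAAAA"]:
    print(read, place_read(read, reference, index, k))
```

Indexing the reference once and then looking up each read is what makes placing billions of reads feasible; comparing every read against every reference position would be far too slow.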
All of these intricate molecular techniques help us unlock valuable genomic information in the field of cancer research, transforming years-old FFPE blocks into previously unseen genomic data from the African continent.