The complete sequence of a human genome
Image credit: Gio.tto / Shutterstock

In 2003, the landmark Human Genome Project determined the sequence of about 92% of the human genome. That sequence essentially covered human euchromatin, the loosely packaged part of the genome that contains many genes encoding proteins with key roles in our physiology. For almost two decades, however, researchers struggled to decipher the remaining 8%, a smaller, tightly packaged part of the genome known as heterochromatin. Its main feature is that it is not responsible for producing proteins. That was one reason scientists initially prioritized euchromatin; another was that sequencing heterochromatin is extremely demanding. In other words, much more advanced genomic tools were needed to take a deep dive into this part of the genome. As a result, there was for a long time a huge gap in our knowledge of some basic cellular functions. The reference genome contained many long stretches of unknown bases, and even the euchromatic genome had not been adequately resolved, as many errors (such as erroneous duplications) had been observed. That has now changed with a flagship study by the Telomere-to-Telomere (T2T) Consortium, which brings together researchers from various academic institutions and the National Institutes of Health (NIH) in the United States.
Using Merfin and long-read methods
With cutting-edge techniques and renewed determination, the team was able to complete what the Human Genome Project started, correcting errors found in euchromatic regions and providing a complete picture of the heterochromatic regions. One of the most important tools they used is Merfin, which helps polish some of the most difficult sequences in the human genome. More specifically, the tool makes it possible to check the accuracy of the assembled sequence, find potentially erroneous stretches, and then correct those errors. In addition, the researchers drew on the complementary strengths of PacBio HiFi and Oxford Nanopore ultra-long-read sequencing, both of which are used to resolve large and complex genomes with almost 100 percent accuracy. Both are known as long-read methods.
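To give a feel for what checking an assembled sequence against the underlying reads can look like, here is a toy Python sketch. It is not Merfin's actual algorithm or interface: the function names, the sequences, and the choice of k = 5 are all made up for illustration. The idea it assumes is simply that every short substring (k-mer) of a correct assembly should also appear somewhere in the reads, so k-mers with no read support point to a likely error.

```python
from collections import Counter


def kmers(seq, k):
    """Yield every overlapping substring of length k in seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def count_read_kmers(reads, k):
    """Count how often each k-mer occurs across all reads."""
    counts = Counter()
    for read in reads:
        counts.update(kmers(read, k))
    return counts


def unsupported_positions(assembly, read_counts, k):
    """Return start positions of assembly k-mers never seen in the reads."""
    return [i for i, km in enumerate(kmers(assembly, k)) if read_counts[km] == 0]


# Hypothetical toy data: three overlapping reads and two candidate assemblies.
reads = ["ACGTACGGA", "GTACGGATT", "ACGGATTCA"]
assembly_ok = "ACGTACGGATTCA"
assembly_bad = "ACGTACCGATTCA"  # one substituted base at index 6 (G -> C)
k = 5

read_counts = count_read_kmers(reads, k)
print(unsupported_positions(assembly_ok, read_counts, k))   # []
print(unsupported_positions(assembly_bad, read_counts, k))  # [2, 3, 4, 5, 6]
```

In this toy case a single wrong base leaves five consecutive k-mers without read support, which is the kind of signal a polishing step could use to locate and then fix the error. Real tools work with far longer reads, larger k, and read-depth statistics rather than a simple present-or-absent test.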
A human DNA blueprint without gaps
In short, the study presents gapless telomere-to-telomere assemblies (that is, from one end of the chromosome to the other) for all 22 human autosomes and the X chromosome, comprising 3,054,815,472 base pairs of nuclear DNA, together with a 16,569-bp mitochondrial genome. The newly assembled and sequenced regions include all centromeric satellite arrays, the short arms of the acrocentric chromosomes, and recent segmental duplications, which opens these previously unknown regions to comprehensive functional and variant studies. In a way, this is the first meticulous view of the layout of our human DNA. The long-read methods mentioned above have opened the door to understanding the most cumbersome, repeat-rich parts of the human genome.
Towards personalized medicine
We are still a long way from fully sequencing the genome at the individual level, but this work will now inform studies of diseases associated with the heterochromatic genome, especially cancers associated with centromere abnormalities (the centromere is the constricted region of a chromosome that divides it into a short arm and a long arm). “This 8% of the genome has not been overlooked because of a lack of importance but rather because of technological limitations,” the research team said in the groundbreaking Science paper. “High-accuracy long-read sequencing has finally removed this technological barrier, enabling comprehensive studies of genomic variation across the entire human genome, which we expect to drive future discovery in human genomic health and disease,” they added. In any case, this study (and the accompanying research efforts) will substantially affect genome analysis and is an important step towards assembling models that represent the human genetic code. It will also open the door to personalized medicine and genome editing in the future, to the benefit of all of us.