Repetitive DNA sequences around the centromere show a history of human genetic variation.

When scientists announced the full sequence of the human genome in 2003, their announcement was a little premature. In fact, almost 20 years later, about 8% of the genome had never been fully sequenced, mainly because it consists of highly repetitive pieces of DNA that are difficult to align with the rest. But a three-year-old consortium has now filled in this remaining DNA, providing the first complete, gapless genome sequence that scientists and doctors can refer to.

The newly completed genome, named T2T-CHM13, represents a major upgrade from the current reference genome, called GRCh38, which is used by doctors to look for disease-related mutations and by scientists studying the evolution of human genetic variation.

Among other things, the new DNA sequences reveal never-before-seen detail about the region around the centromere, where chromosomes are grabbed and pulled apart when cells divide, ensuring that each “daughter” cell inherits the right number of chromosomes. Variability in this region may also provide new insights into how our human ancestors evolved in Africa.

“Revealing the complete sequence of these previously missing genome regions told us so much about how they are organized, something that was completely unknown for many chromosomes,” said Nicolas Altemose, a postdoctoral fellow at the University of California, Berkeley, and a co-author of four new papers on the completed genome. “In the past, we had only the blurriest picture of what was there, and now it is crystal clear down to single-base-pair resolution.”

Altemose is the first author of a paper describing the base pair sequences around the centromere. An article explaining how the sequencing was done will appear in the April 1 issue of the journal Science, while Altemose’s centromere paper and four others describing what the new sequences tell us are summarized in the journal, with the full papers published online.
Four companion papers, including one for which Altemose is the first author, will also appear online April 1 in Nature Methods.

The sequencing and analysis were performed by a group of more than 100 people, called the Telomere-to-Telomere Consortium, or T2T, named after the telomeres that cap the ends of all chromosomes. The consortium’s gapless version of all 22 autosomes and the X sex chromosome consists of 3.055 billion base pairs, the units that make up our chromosomes and genes, and 19,969 protein-coding genes. Of the protein-coding genes, the T2T group found about 2,000 new ones, most of them disabled, but 115 of which may still be expressed. They also found about 2 million additional variants in the human genome, 622 of which appear in medically relevant genes.

“In the future, when someone has their genome sequenced, we will be able to identify all the variants in their DNA and use this information to better guide their health care,” said Adam Phillippy, one of the leaders of T2T and a senior investigator at the National Human Genome Research Institute (NHGRI) of the National Institutes of Health. “Completing the sequence of the human genome was really like putting on a new pair of glasses. Now that we can see everything clearly, we are one step closer to understanding what all this means.”

The evolving centromere

The new DNA sequences in and around the centromere account for 6.2% of the entire genome, or nearly 190 million base pairs, or nucleotides. Of the other newly added sequences, most are located around the telomeres at the end of each chromosome and in the regions surrounding the ribosomal genes. The entire genome consists of only four types of nucleotides, which, in groups of three, encode the amino acids used to make proteins. Altemose’s main research involves finding and exploring areas of chromosomes where proteins interact with DNA.

The spindles (green) that pull chromosomes apart during cell division attach to a protein complex called the kinetochore, which latches onto the chromosome at a place called the centromere, a region that contains highly repetitive DNA sequences. Comparing the sequences of these repeats revealed where mutations have accumulated over millions of years, reflecting the relative age of each repeat. The repeats in the active centromere tend to be the youngest and most recently duplicated sequences in the region, and they have remarkably low DNA methylation. Around the active centromere on both sides are older repeats, probably the remnants of former centromeres, with the oldest ones farthest from the active centromere. The researchers hope that new experimental methods will help reveal why centromeres evolve from the middle, as well as why this pattern is so closely linked to kinetochore binding and low DNA methylation. (Credit: Nicolas Altemose, UC Berkeley)

“Without proteins, DNA is nothing,” said Altemose, who earned a Ph.D. in bioengineering jointly from UC Berkeley and UC San Francisco in 2021 after receiving a D.Phil. in statistics from the University of Oxford. “DNA is a set of instructions with no one to read it unless it has proteins around to organize it, regulate it, repair it when it is damaged and replicate it.
“Protein-DNA interactions are really where all the action to regulate the genome takes place, and being able to map where certain proteins bind to the genome is very important for understanding their function.”

After the T2T consortium sequenced the missing DNA, Altemose and his team used new techniques to find the site inside the centromere where a large protein complex called the kinetochore grips the chromosome firmly so that other machines in the nucleus can pull chromosome pairs apart.

“When that goes wrong, you end up with missegregated chromosomes, and that leads to all sorts of problems,” he said. “If this happens in meiosis, it means that you may have chromosomal abnormalities that lead to miscarriage or congenital diseases. If it happens in somatic cells, you could end up with cancer: basically, cells that have massive misregulation.”

What they found in and around the centromeres were layers of new sequences overlaying layers of older sequences, as if through evolution new centromere regions have been laid down repeatedly to bind to the kinetochore. Older regions are characterized by more random mutations and deletions, indicating that they are no longer used by the cell. Newer sequences, where the kinetochore binds, are much less variable and also less methylated. The addition of a methyl group is an epigenetic tag that tends to silence genes.

All layers in and around the centromere are made up of repeating lengths of DNA, based on a unit about 171 base pairs long, which is roughly the length of DNA that wraps around a group of proteins to form a nucleosome, keeping the DNA packed and compact. These 171-base-pair units form even larger repeat structures that are duplicated many times in tandem, creating a large region of repetitive sequences around the centromere.
The T2T team focused on only one human genome, which was obtained from a non-cancerous tumor called a hydatidiform mole, essentially a human embryo that rejected the maternal DNA and duplicated its paternal DNA instead. Such embryos die and transform into tumors. But the fact that this mole had two identical copies of the paternal DNA (both with the father’s X chromosome, instead of different DNA from both mother and father) made it easier to sequence. The researchers also released this week the complete sequence of a Y chromosome from a different source, which took almost as long to assemble as the rest of the genome combined, Altemose said. The analysis of this new Y chromosome sequence will appear in a future publication.

When researchers compared the centromeric regions of 1,600 people from around the world, they found that those without recent African ancestry mostly had two types of sequence variants. The proportions of these two variants are represented by the black and light gray wedges within the circles, which are placed on the map near the location where each group of individuals was sampled. Those from Africa or other regions with a large percentage of people of recent African ancestry, such as the Caribbean, had much more centromeric sequence variation, represented by the multicolored wedges. Such variants could help track how centromeric regions evolve, as well as how these genetic variants relate to health and disease. (Credit: Nicolas Altemose, UC Berkeley)

Altemose and his team, which included UC Berkeley scientist Sasha Langley, also used the new reference genome as a scaffold to compare the centromeric DNA of 1,600 people around the world, revealing significant differences in both the sequence and the copy number of the repetitive DNA around the centromere. Previous studies have shown that when groups of ancient humans migrated from Africa to the rest of the world, they took only a small sample of genetic variants with them.
Altemose and his team confirmed that this pattern extends to the centromeres. “What we found is that in people with recent ancestry outside the African continent, their centromeres, at least on the X chromosome, tend to fall into two large groups, while most of the interesting variation is in people of recent African ancestry,” Altemose said. “This is not entirely surprising, given what we know about the rest of the genome. But what it suggests is that if we want to see the interesting variation in them …