Researchers from the University of Eastern Finland, Aalto University, and the University of Oulu have introduced a computational method called KMAP, designed to explore patterns in DNA sequences more intuitively. By projecting short DNA sequences, known as k-mers, into a two-dimensional space, KMAP makes biologically significant DNA motifs easier to visualize and interpret. The approach helps researchers see how regulatory elements behave in different biological contexts.
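For readers unfamiliar with k-mers, the idea can be sketched in a few lines of Python. This is only an illustration, not KMAP's code: the function names, the toy sequences, and the choice of Hamming distance as the similarity measure are assumptions made for this example. It shows how sequences are cut into overlapping k-mers and how pairwise distances between them, the kind of input a two-dimensional projection could start from, might be computed.

```python
from collections import Counter


def count_kmers(sequences, k):
    """Count every overlapping substring of length k across all sequences."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def hamming(a, b):
    """Number of positions at which two equal-length k-mers differ."""
    if len(a) != len(b):
        raise ValueError("k-mers must have equal length")
    return sum(x != y for x, y in zip(a, b))


# Toy input sequences, made up for illustration only.
reads = ["ACGTAC", "ACGTTC"]
kmer_counts = count_kmers(reads, k=4)

# Pairwise distances between the distinct k-mers; an embedding method
# would place k-mers with small distances close together in 2D.
kmers = sorted(kmer_counts)
distances = {(a, b): hamming(a, b) for a in kmers for b in kmers}
```

In this toy case the 4-mer ACGT occurs in both sequences, while CGTA and CGTT differ at a single position and would therefore sit near each other in any distance-preserving layout.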
The new study, recently published by the team, demonstrates KMAP’s capabilities in a variety of applications. One key example is its use in re-analyzing data from Ewing sarcoma, a rare type of cancer. The researchers discovered that the transcriptional repressor ETV6 binds to and blocks enhancer regions that are normally targeted by the transcription factor FLI1, thus contributing to disease progression. However, when ETV6 is degraded, these enhancers become accessible again, allowing FLI1 and other transcription factors—BACH1, OTX2, KCNH2, and possibly an unidentified one—to bind and regulate gene expression.
Importantly, the study also uncovered a previously uncharacterized DNA motif, CCCAGGCTGGAGTGC, which frequently appears within 70 base pairs of BACH1 and OTX2 binding sites. This close spatial clustering suggests the motif could represent a novel regulatory element, potentially significant in cancer biology.
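A proximity check of this kind can be sketched as follows. This is a simplified illustration rather than the study's analysis: the helper names, the toy sequence, and the site coordinate are hypothetical, distances are measured from the motif's start position to a single site coordinate, and only the forward strand is searched.

```python
def find_motif(sequence, motif):
    """Return 0-based start positions of every (possibly overlapping) motif hit."""
    hits = []
    start = sequence.find(motif)
    while start != -1:
        hits.append(start)
        start = sequence.find(motif, start + 1)
    return hits


def motif_near_sites(sequence, motif, site_positions, window=70):
    """Return motif start positions lying within `window` bp of any site position."""
    return [
        pos
        for pos in find_motif(sequence, motif)
        if any(abs(pos - site) <= window for site in site_positions)
    ]


MOTIF = "CCCAGGCTGGAGTGC"  # motif reported in the article

# Toy sequence and binding-site coordinate, made up for illustration only.
toy_sequence = "A" * 100 + MOTIF + "A" * 200 + MOTIF + "A" * 50
toy_sites = [150]

near = motif_near_sites(toy_sequence, MOTIF, toy_sites)
```

Here the first copy of the motif starts 50 bp from the hypothetical site and is kept, while the second copy lies more than 70 bp away and is excluded.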
KMAP’s utility extends beyond cancer research. The method was also applied to study DNA repair following CRISPR-Cas9 genome editing at the AAVS1 locus in human cells. After editing, cells repaired the DNA in varied ways. By analyzing thousands of repair outcomes, KMAP identified four common sequence patterns, each linked to a different cellular repair pathway. This insight may inform the design of more accurate gene-editing strategies and help predict how cells are likely to respond to such interventions.
“KMAP offers a more intuitive way to investigate motifs in DNA sequence data,” explains Dr. Lu Cheng from the University of Eastern Finland, the study’s lead author. “By visualizing the distribution of short DNA sequences, we can better interpret regulatory patterns and understand how they change under different biological conditions.”
Professor Gonghong Wei from the University of Oulu adds, “KMAP is a versatile tool that can be applied to many types of sequencing data. In cancer research, it helps identify regulatory elements from ChIP-seq data, and it also holds promise for studying RNA-binding proteins and their binding preferences. Its ability to reveal structure in complex sequence data makes it broadly useful across molecular biology.”
By making complex genomic patterns easier to read, KMAP offers a new way to decode the regulatory language of DNA, with potential applications in cancer research, gene editing, and beyond.
By Impact Lab