Data Too Complex for Valid Conclusions -- MEDICA - World Forum for Medicine

Digital data tools have allowed scientists to see that nature is more complex than previously thought. Although they do not yet know the overarching biological rules, such as how multiple signalling pathways interrelate to drive cancer development, they are trying to play the game as if they do, said the review's lead author, Robert Clarke, Ph.D., D.Sc., professor of oncology and physiology & biophysics at Georgetown University Medical Center (GUMC). “The answers to our questions are probably there in the data,” he said, “but the issue is whether we can get them using these complex tools and, also, how we will know they are right when we see them.”

Despite this lack of understanding, many published studies link specific “biomarkers” (genes, mRNA or proteins) with some aspect of cancer development or treatment, and the results often appear to be statistically valid, Clarke said. “But it is not clear that that solution is complete or is necessarily correct. It may be partly right and may be intuitively pleasing because you are getting what you expected to see from an experiment. That could be a trap, a self-fulfilling prophecy.”
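A quick worked figure shows how results can look statistically valid by chance alone. Assume a hypothetical screen of m = 20,000 candidate genes, none of which is truly linked to the outcome, each tested at the conventional significance threshold α = 0.05. The expected number of false “hits”, and the stricter per-test threshold under a Bonferroni correction (one common safeguard, named here only as an illustration, not as the review's method), are:

```latex
\mathbb{E}[\text{false positives}] = m\,\alpha = 20\,000 \times 0.05 = 1\,000,
\qquad
\alpha_{\text{Bonferroni}} = \frac{\alpha}{m} = \frac{0.05}{20\,000} = 2.5 \times 10^{-6}.
```

Under these assumptions, a screen could report roughly a thousand statistically “significant” biomarkers even though none is real, unless the number of tests is taken into account.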

And while the findings may “fit” the tumour samples in which they were tested, they may not hold when other tumour tissue is studied, and researchers often do not take that extra step, the authors said in their article. “The lack of rigorous validation is a problem that currently plagues cancer research,” Clarke added.
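The validation problem can be illustrated with a small simulation. This is a minimal sketch under assumed conditions, not the review's method: every hypothetical “gene” and the outcome are pure random noise, so no real biomarker exists. Genes are ranked by the strength of their correlation with the outcome in a discovery cohort, and the top-ranked ones are then re-checked in an independent validation cohort of new samples. The gene count, cohort size, number of selected genes and random seed are arbitrary choices.

```python
import random
import statistics


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def simulate(n_genes=2000, n_samples=20, top_k=10, seed=0):
    """Return mean |r| of the top_k 'biomarkers' in discovery vs. validation."""
    rng = random.Random(seed)

    def cohort():
        # Hypothetical data: each gene's expression and the outcome are
        # independent standard-normal noise, so any association is chance.
        genes = [[rng.gauss(0, 1) for _ in range(n_samples)] for _ in range(n_genes)]
        outcome = [rng.gauss(0, 1) for _ in range(n_samples)]
        return genes, outcome

    disc_genes, disc_outcome = cohort()
    val_genes, val_outcome = cohort()  # same genes, new independent samples

    # Discovery: rank every gene by the absolute strength of its correlation.
    disc_r = [abs(pearson(g, disc_outcome)) for g in disc_genes]
    top = sorted(range(n_genes), key=lambda i: disc_r[i], reverse=True)[:top_k]

    # Validation: re-measure only the top-ranked genes in the new cohort.
    mean_disc = statistics.fmean(disc_r[i] for i in top)
    mean_val = statistics.fmean(abs(pearson(val_genes[i], val_outcome)) for i in top)
    return mean_disc, mean_val


if __name__ == "__main__":
    discovery, validation = simulate()
    print(f"mean |r| in discovery:  {discovery:.2f}")
    print(f"mean |r| in validation: {validation:.2f}")
```

On noise alone, the selected genes show strong correlations in the discovery cohort that largely disappear in the validation cohort, which is exactly the kind of spurious “fit” that testing on other tumour tissue is meant to expose.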

Another pitfall in using the new technology is the “curse of multi-dimensionality,” Clarke said. “You have a lot of measurements, and the statistical model gets very complicated. So sometimes you don’t have enough computing power to derive the right answer or you get an answer that is only true for part of the data.” In other words, scientists “do not always know what they do not know” when looking at multi-dimensional data sets.

Source: Georgetown University
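One face of the curse of multi-dimensionality that Clarke describes can be seen with simple counting. The sketch below assumes a hypothetical panel of 20,000 measured genes (roughly the number of human protein-coding genes) and counts how many candidate terms a model would have to consider if it allowed single genes, pairwise interactions and three-way interactions. It illustrates the combinatorics only; the function name and panel size are illustrative, not taken from the review.

```python
from math import comb


def candidate_terms(n_features, max_order):
    """Count distinct k-way combinations of features for k = 1..max_order."""
    return {k: comb(n_features, k) for k in range(1, max_order + 1)}


if __name__ == "__main__":
    # Hypothetical panel size, roughly the number of human protein-coding genes.
    n_genes = 20_000
    for k, count in candidate_terms(n_genes, 3).items():
        print(f"{k}-way terms: {count:,}")
```

The pairwise count alone is about 2 × 10^8 and the three-way count exceeds 10^12. Both numbers dwarf the number of tumour samples a single study is likely to profile, so most candidate terms cannot be estimated reliably from the data, and searching them exhaustively can outstrip the available computing power.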