As their contribution to the President’s Precision Medicine Initiative, the New York Genome Center and IBM are expanding an existing collaboration, in which they use the IBM Watson for Genomics system to generate and analyze genomic and clinical data from glioblastoma patients, to include a new initiative focused on cancer in general.
The so-called NYGC-IBM Cancer Alliance is launching a six-month pilot that will sequence tumors from 200 cancer patients and compare how different types of sequencing approaches help inform and improve cancer care. They’ll use Watson to analyze variant, gene expression, and other kinds of information. They hope to be able to identify likely driver alterations for these patients’ cancers as well as locate actionable and potentially actionable targets for approved, investigational, and off-label drugs. They’ll provide treating physicians with clinical reports that include the relevant pathogenic variants and therapy options.
The partners also plan to build a comprehensive and open repository of raw whole-genome and exome data as well as variant calls, phenotype data, epigenetic data, and outcomes data collected from participating patients.
As part of the effort with IBM, NYGC is collaborating with several academic institutions to gather DNA, RNA, epigenetic, and other information from patients. Robert Darnell, NYGC’s founding director and CEO, told GenomeWeb that the center and its collaborators have begun recruiting patients for the pilot trial. The pilot will focus on multiple cancer subtypes and researchers are prioritizing patients with aggressive tumors that have resisted conventional therapies. Recruited patients will be consented and informed that their data will be put into a database that will be made available for research.
“We intend this to be a live study,” Darnell said. “These are patients who are needy in terms of having a difficult prognosis in front of them. We hope if we do find potentially actionable targets in them, [that we’ll] be able to convey that information back to the referring physician, who may or may not take that information and pursue [them] in a clinically validated way.”
Each patient will receive whole-genome sequencing, and some will get whole-exome sequencing while others will be tested with gene panels, he said.
Participation is not limited to patients in the New York area, although the NYGC is starting the recruitment process with the hospitals that are associated with its founding partners. It’s a convenient starting point since patients may be local and “we have agreements already set in place,” Darnell said. However, “There’s no reason why it needs to be [in NY].” In future studies, clinical centers from around the country that are interested in participating in the effort will be eligible to do so.
“We will be looking for additional support to increase this from 200 [patients] to 2,000 and beyond,” he said. A larger-scale effort will require significant financial resources to support the sort of in-depth analysis that the partners propose. “We are looking for philanthropic and industry partners of any sort who might be interested in helping us build a dataset that is very deep and thoughtfully filtered,” Darnell said.
The NYGC-IBM alliance dates back to 2014, when the center first announced that it was partnering with IBM on a cancer study. For that clinical research study, which aimed to find better treatments for glioblastoma, a particularly aggressive brain cancer, they planned to use a prototype of Watson that was specifically designed to handle genomic data. The plan was to have Watson combine genomic data with information from biomedical literature and drug databases to identify driver mutations and more effective treatments that target those mutations. That study collected data from approximately 20 recently diagnosed patients from nine participating hospitals in New York State. The study is ongoing but is nearing its conclusion, Darnell said.
Over the course of the glioblastoma study, researchers sequenced patients’ tumor and normal DNA and searched for cancer-specific variants by comparing data from both samples. Researchers at NYGC and their collaborators at partner hospitals including neurosurgeons and cancer biologists then analyzed each individual variant on a case-by-case basis.
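The tumor-versus-normal comparison described above can be pictured as a simple set subtraction. The following is a minimal sketch of that idea only, not the NYGC pipeline: the function name `somatic_candidates`, the tuple layout, and the variant values are all hypothetical, and real somatic variant calling also weighs read depth, allele fraction, and sequencing error.

```python
def somatic_candidates(tumor_variants, normal_variants):
    """Return variants seen in the tumor sample but not in the matched normal.

    Each variant is a (chromosome, position, ref, alt) tuple. This layout is an
    assumption made for illustration, not a format used by the study.
    """
    normal = set(normal_variants)
    return sorted(v for v in set(tumor_variants) if v not in normal)


# Made-up variants for illustration only.
tumor = [("chr7", 100, "A", "T"), ("chr17", 200, "C", "T"), ("chr1", 50, "G", "A")]
normal = [("chr1", 50, "G", "A")]  # inherited variant, present in both samples

candidates = somatic_candidates(tumor, normal)
print(candidates)
```

Variants shared by both samples are treated as inherited and dropped; what remains is the list of tumor-specific candidates that a team would then review case by case.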
“The interesting thing that we found in the course of the glioblastoma study is that … there’s real value to doing this, and there’s value to doing it with a team effort, and there’s value to doing it with looking at different platforms and comparing them,” Darnell told GenomeWeb. “The more you look and the more deep the data is, the more you find in terms of identifying what we think are the right things.”
That sounds simple “but in reality, it is complicated because there are different choices and different pathways that might be synergistically attackable … and there might be pathways that are untreatable, [and] you need to filter among the potential pathways where to spend your time and effort,” he continued. For each variant, researchers need to be able to search the literature and identify relevant downstream and upstream pathways as well as evaluate how cellular activities change when pathways are perturbed, among other kinds of information.
It’s a more complicated search space than was required for the television game show Jeopardy!, on which Watson competed and won against humans in 2011. Watson not only has to learn to understand the language of scientific publications but also has to combine sequence, phenotype, and clinical information from health records with outcomes data, and then reason across all of it. The sheer quantity of data that these sorts of genomics-based studies generate is also challenging. For example, each patient in the new study is expected to have about a terabyte of data associated with them, according to Darnell.
“Very early on, and even in the glioblastoma project, we got both excited and anxious that this [sort of analysis] is not scalable,” Darnell said. “It’s an interesting academic exercise if it’s done in 12 patients, but it doesn’t scale to transforming medicine in the context of the precision medicine initiative.” That’s why the NYGC initially partnered with IBM on the glioblastoma study and has agreed to collaborate on a second study. In addition to providing Watson for Genomics, “IBM has been very supportive of us in building out the infrastructure that we need for the compute and the storage for that amount of data for this project,” he said.
During the glioblastoma study, researchers at the NYGC and on the IBM team collaborated on Watson’s training and development. Their efforts included building, testing, and comparing analysis pipelines to see which approaches work best, as well as working to fill gaps in Watson’s interpretation abilities. They’ll continue to incorporate information from the new pilot into the system. “It learns with each iteration, and we are at the very early iterations of teaching Watson and bringing it up to speed,” Darnell said.
For the 200-person pilot, the partners will use an updated iteration of Watson for Genomics that has been trained on all kinds of cancer datasets, expanding its scope beyond just glioblastoma. This is the result of multiple partnerships that IBM inked with early adopters at institutions such as Washington University in St. Louis, Yale University, and the University of North Carolina, Steve Harvey, vice president, IBM Watson Health, told GenomeWeb this week.
In the last year, he said, these partners have used Watson to analyze subsets of sequence data they collected from patients with lymphoma and melanoma, and pancreatic, ovarian, brain, lung, breast, and colorectal cancers, and that information has been used to further train and improve Watson for Genomics. “We are pretty excited about the results that we’ve seen,” he said. Harvey added that his team has written a paper with the University of North Carolina that describes some early results, though it’s not clear at present when the paper will be published.
Part of the cancer pilot study will involve comparing the different sequencing approaches and evaluating the quality and quantity of the data that these approaches yield. For each patient, the partners intend to, at a minimum, compare their whole-exome sequence with results from a gene panel, but they also hope to compare whole-genome sequences from the patients.
“Both the panel and the exome require hybridization capture, [so] you are pulling out parts of the DNA from the tumor to look at just the parts that a priori we now think are important, [that is] the coding sequences of a small subset of genes or the coding sequences of all genes,” Darnell said. The limitation of this approach is that capture introduces bias into data. “So things that are very GC-rich may be captured but not be washed off very well, and things that are very AT-rich get washed off too easily, so for each exon you get uneven coverage.”
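To make the GC-rich and AT-rich distinction concrete, the sketch below computes the GC fraction of a target sequence and labels the extremes. It is illustrative only: the function names, the example sequences, and the 0.35 and 0.65 cutoffs are assumptions chosen for the example, not values from the study or from any capture kit.

```python
def gc_fraction(seq):
    """Fraction of bases in seq that are G or C (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def label_targets(targets, low=0.35, high=0.65):
    """Label each named target as AT-rich, GC-rich, or balanced.

    The low/high cutoffs are arbitrary illustrative thresholds.
    """
    labels = {}
    for name, seq in targets.items():
        gc = gc_fraction(seq)
        if gc < low:
            labels[name] = "AT-rich"
        elif gc > high:
            labels[name] = "GC-rich"
        else:
            labels[name] = "balanced"
    return labels


# Made-up sequences for illustration only.
targets = {"exon_a": "GGCGCCGG", "exon_b": "ATTATAAT", "exon_c": "ATGCATGC"}
labels = label_targets(targets)
print(labels)
```

Targets at either extreme are the ones Darnell describes as prone to uneven coverage after capture, so a comparison of platforms might look at coverage separately for each label.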
Whole-genome sequencing does not require hybridization capture, so it doesn’t introduce the same biases into the data, and it also offers better exon coverage. Unlike the capture-based methods, WGS also covers non-coding regions of the genome, which hold a lot of information that could be vital for understanding how tumors work.
This kind of multi-modal analysis “gives us a chance to compare the cost and benefit of these different types of platforms,” Darnell said, “and it may vary from tumor type to tumor type or clinical situation to clinical situation. … We are very sensitive to the fact that payors have to ultimately pay for this, and we need to demonstrate where and in what kind of clinical situation there’s value for doing the different platforms.”