Putting ML to work for cancer patients
By Joyce Dall'Acqua Peterson
A patient with cancer is unlikely to meet the pathologist who has analyzed her biopsy sample, yet the pathologist’s analysis will drive many decisions around her treatment options.
Pathologist Todd Sheridan describes his job this way: “I look through a microscope to evaluate slides containing tissue from a biopsy or resection. Usually this is a histopathology image that we call H&E because the sample is treated with two histological stains, hematoxylin and eosin.” Pathologists are trained to recognize cancer cells, he says, “and we have a number of techniques to improve our diagnostic accuracy such as immunochemistry, which uses antibodies to confirm the presence of certain markers on the surface of cancer cells.”
But, Sheridan says, pathology has lagged behind radiology and some other medical fields in deploying advanced computer image analysis to improve tumor sample interpretation. Sheridan holds joint appointments with Hartford Pathology Associates at Hartford Hospital in Connecticut and The Jackson Laboratory for Genomic Medicine in Farmington, where he works with Chuang on a project to harness the power of machine learning for pathology.
“If we can provide pathologists with tools to interpret tumor images faster and more precisely,” Chuang says, “cancer patients will benefit from more targeted and effective treatment approaches.”
Chuang was the lead author of a study published in Nature Communications that harnessed the power of machine learning to capture detailed information about cancers from scanned images. From The Cancer Genome Atlas, the researchers used 27,815 scanned images from over 19 cancer types to “train” their convolutional neural network (CNN) software to reveal how cancers from different organs are related.
Today researchers around the world are beginning to explore the possibilities of computer image analysis, including Chuang’s collaborators at Yale University, Boston University and the University of Massachusetts-Boston. His lab at JAX, he says, brings special capabilities to the challenge.
“One of our leading strengths is our ability to integrate data from multiple cancer types simultaneously,” Chuang says. “For example, we have found image features that can predict cell density in tumors, and these features are accurate in breast tumors, lung tumors, bladder tumors and many others. In addition, we have special expertise in combining protein imaging data with histopathological images, which allows us to make better clinical predictions, and also identify those cases where the H&E data are already capturing the key predictive signals.”
Pathologists can tell a lot from an H&E image, Chuang says, “such as whether cells are dying or whether a tumor has an unusual appearance. But they can’t ‘see’ the genetic aberrations that are driving the cancer, such as the amplification of the HER2 gene that is common in breast cancer, unless they use a special HER2 stain as well as the H&E.”
When a pathologist identifies this HER2 amplification, he says, the patient is usually treated with targeted agents such as the drug trastuzumab (better known as Herceptin). “This treatment can be very effective for these patients, and lead to very good outcomes,” Chuang says.
Working with collaborators at the Yale University School of Medicine, the Chuang lab analyzed about 200 H&E samples for HER2 amplifications. After correcting for some variations in image processing, he says, they achieved an accuracy of about 80 percent. “What’s remarkable about this is that pathologists cannot identify HER2 status directly from H&E images; we now can,” he says.
Rubinstein has joint appointments in surgical oncology at Hartford Hospital and computational oncology in the Chuang lab. “What Todd and I bring to the team is our clinical perspective,” she says. “As we build tools from these data sets, how are we going to shape those insights that we get from deep learning into a clinical tool that might actually impact patient care?”
Today, Rubinstein points out, genomic sequencing of tumors accurately identifies genetic mutations that drive cancers. “High-throughput sequencing provides a very detailed look into what tumors are doing, how they are evolving and what pathways are driving them. But it’s expensive to sequence tumors, and not everyone has access to that kind of care.”
Instead, she says, “if we can marry molecular data and deep learning-based imaging data, those H&E-stained images that are already produced during a standard clinical workflow could provide valuable information for clinicians. These tools can be implemented without any additional assays being performed on the tumor. You can implement them remotely, so the patient doesn’t have to be at a center that has specialized capacity for molecular testing.
“Bringing high-level computational insights to any patient, without additional cost, would be a great leveler,” Rubinstein says.
Chuang says the team is already bringing this data-driven image analysis approach to other tumor types, including colon cancer, melanoma and sarcomas.
“The thing about this kind of imaging data,” he says, “is that there’s a natural intuition to interpreting it. Imaging data tells you the shapes of cells, where they are, what kinds of cells are active, whether there are dead cells or big spaces between them. But there may be very important mechanistic things in the cells that we just can’t see.” Sequence data, in contrast, provides great detail, “but it’s just less intuitive.”
Chuang says the machine-learning image analysis techniques his lab is refining are part of a new wave of advanced data types, such as spatial transcriptomics, and spatial protein and metabolomic profiling. “And we're working on methods to integrate these so medical images can communicate much more to clinicians.”