Machine learning: mathematics in flux

JAX's Matthew Gerring on the Bar Harbor campus. Photo credit: Tiffany Laufer

Rooted in mathematics, machine learning and artificial intelligence (AI) may seem futuristic. At this point, however, they are tried-and-true computational methods used by the research community. But what is machine learning, really? Matthew Gerring has a simple answer.

“Machine learning is based on statistics,” he says. “It generates models from training data and applies those to novel datasets, resulting in predictions.”
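To make that description concrete, here is a minimal sketch of the train-then-predict pattern, assuming the simplest possible statistical model: a straight line fitted by ordinary least squares. The function names (`fit_line`, `predict`) and the toy numbers are illustrative assumptions, not anything used at JAX.

```python
# Minimal sketch: generate a model from training data, then apply it to novel data.
# fit_line and predict are hypothetical helper names; the data are made up.

def fit_line(xs, ys):
    """Estimate slope and intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(model, xs):
    """Apply the fitted model to inputs it has not seen before."""
    slope, intercept = model
    return [slope * x + intercept for x in xs]

# Training data: the model is generated from these observations.
train_x = [1.0, 2.0, 3.0, 4.0]
train_y = [2.0, 4.0, 6.0, 8.0]
model = fit_line(train_x, train_y)

# Novel data: the model produces predictions for new inputs.
predictions = predict(model, [5.0, 6.0])
print(predictions)
```

Deep learning models have far more parameters than this two-parameter line, but the workflow has the same shape: estimate parameters from training data, then reuse them on data the model has not seen.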

Teaching the tools

Gerring, a senior manager in computational sciences at The Jackson Laboratory (JAX), has been working in computational science for most of his career. While his work has varied from computational fluid dynamics to big physics experiments and geology, Gerring has consistently held an interest in developing models for predicting nature. How the machine learning and deep learning revolution can be applied at JAX has become a focus of his recent efforts.

Vivek Kumar’s research uses machine learning to automate the analysis of mouse behavior and movement, and it has given Gerring and his team one of the most challenging undertakings they have had at JAX. The Kumar laboratory requires an algorithm to predict the phenotypes (measurable traits) of mouse models without human intervention.

“You can determine their behavior via video, and deep learning and machine learning can go deeper by analyzing those videos and drawing conclusions about individual mice. The researchers don't have to interact with them. This gives a much more consistent categorization of phenotype, for example.”

How does one make an algorithm capable of capturing a constant stream of input, like 24/7 video footage in the Kumar lab? It starts with the basics: a hypothesis. “(Researchers) make interpretations, and each of those experts provides data for input into your model. You'll get a virtual version, and each will have their strengths and their biases.”

Gerring notes that formulating an algorithm is not a universal solution. Machine learning is as limited and biased as the data it is supplied with. It takes constant refinement to teach the software to perform as well as possible.

Maturing machine learning

Machine learning is currently a buzzworthy term, as it has become more accessible and recognizable to the public. Gerring, however, sees the hype around implementing computational tools as reflecting more of an evolution than a revolution.

“People have noticed them now in the wider community,” he says. “They've noticed how good they are because they have them on their smartphones, and they're able to identify butterflies, birds and plants, things like that. But these models have existed in mathematics for a long time, and they've just gotten better. And while I think the technology is relatively mature, that doesn't mean that we've exploited it in research to the best we can. It is in the application of this technology that we have most to gain.”

Digital researchers of the future?

From Alexa to Tesla, the concept of a “digital twin,” originally pioneered by NASA, has been integrated into society, going far beyond asking about the local weather or modeling a rocket production system. Digital twins integrated with machine learning are already being used in medicine and engineering, and we have only scratched the surface of their usefulness.

“I think there's a long way to go until there's a digital twin in biological science,” says Gerring, “but you can imagine what it would be like by watching science fiction films or reading science fiction books where they often are able to talk to the computer, presumably using something like natural language processing. They get the computer's opinion based on the different models that the computer has about reality. That is something we're far away from in our science now, but the individual software components on which we are working now will add up to that capability one day.”

For JAX, the limitation currently lies in getting the vast volumes of scientific data to the correct algorithms and generating useful results for the researchers, rather than in any underlying limitation of machine learning research. It is a software engineering and integration problem, and Gerring says it is not far from being solved.

“You can't interact with (computers) and use them as a kind of second researcher yet, but this is happening at a simpler level already, right? We can search very quickly and succinctly for papers for example, and we couldn't do that 20 years ago. These models will become better integrated. Natural language processing will get better. Looking forward, I think at JAX there is the potential to employ a digital twin when investigating. That's something I think will really mature, and I'm looking forward to helping make that happen.”