Combing Statistics for Clues to Future Mass Shootings
Yew-Meng Koh, Ph.D. | Assistant Professor of Mathematics
Sandy Hook. Marjory Stoneman Douglas High School. The Capital Gazette newspaper. The Route 91 Harvest Music Festival in Las Vegas. Orlando’s Pulse nightclub. The First Baptist Church of Sutherland Springs, Texas. The Tree of Life synagogue in Pittsburgh. The roll call and the death toll keep rising, steadily and seemingly unchecked, and random mass shootings are becoming part of the fabric of life in America.
But what if they are not random? What if an algorithm could be created that would predict not only where and when the next multiple murder might occur, but also the type of person most likely to commit it?
That is the timely and Herculean challenge that Dr. Yew-Meng Koh has undertaken. “Basically, what we want to do is categorize, try to cluster, the different shooting incidents that have taken place in the U.S.,” explains Koh, a native of Malaysia who specializes in statistical analysis.
“And when I say ‘cluster,’ what I mean is to put like shootings — and some become more alike in many ways over time — to put them together. Then once we have achieved that, to look within the clusters we have formed to see if we can identify any patterns that are shared by the incidents. And to compare across clusters to see what differences jump out at us as well. That’s the objective.”
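To make the idea of clustering concrete, here is a minimal sketch in Python. It is not Koh’s actual method, which the article does not specify. The records are invented placeholders rather than rows from any real data set, and the function names `mismatch`, `cluster` and `shared`, along with the 0.34 threshold, are assumptions chosen for illustration. The sketch groups records that differ on at most one of three attributes, then lists what every member of a group has in common.

```python
from itertools import combinations

# Invented placeholder records: (location type, age band, posted beforehand).
# These are NOT real incidents; they only illustrate the mechanics.
records = [
    ("workplace", "50+", "no"),
    ("workplace", "50+", "no"),
    ("workplace", "30-49", "no"),
    ("school", "under 30", "yes"),
    ("school", "under 30", "yes"),
    ("public", "under 30", "yes"),
]


def mismatch(a, b):
    """Fraction of attributes on which two records differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def cluster(rows, threshold):
    """Single-linkage grouping: link any two records within `threshold`."""
    parent = list(range(len(rows)))

    def find(i):
        # Follow parent links to the representative of i's group.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(rows)), 2):
        if mismatch(rows[i], rows[j]) <= threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(rows)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


def shared(rows, members):
    """Attribute positions whose value is identical for every member."""
    first = rows[members[0]]
    return {
        k: first[k]
        for k in range(len(first))
        if all(rows[m][k] == first[k] for m in members)
    }


clusters = cluster(records, threshold=0.34)
for members in clusters:
    print(members, shared(records, members))
```

Looking within each group corresponds to the `shared` step. Comparing the two printed summaries side by side corresponds to comparing across clusters.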
He’s discovering intriguing patterns about how age and ethnicity correlate with two things: some shooters’ social media posts that give advance notice of their intentions, and the locations where they commit the crimes.
Funded by a Nyenhuis grant from the college and support from the Michigan Space Grant Consortium, Koh selected students John McMorris ’19 and Tyler Gast ’19 to perform the bulk of the preliminary research in summer 2018. “While I was away in Malaysia, I Skyped them every day to check on their progress, and also to give them new tasks to do,” Koh says.
For the foundation of their research, Koh, Gast and McMorris used statistics compiled by the progressive nonprofit magazine Mother Jones, which created a database of mass shootings in America from 1982 to 2018, as part of an in-depth investigation.
“We had to check that the Mother Jones data was accurate, because that data set seems to be open to everyone,” Koh notes. “We were wondering whether people had access to make changes. So the first thing I had my students do was ensure that all the data was substantiated. We cross-verified it with the FBI’s data set.”
Defining terms was important, too. Ideas about what constitutes a “mass killing” vary widely, and consequently so does the total number of mass killings that various agencies and media outlets report. Koh settled on this definition: “at least three or more killings in a single incident, which may or may not include the shooter.” He also used the FBI definition of an “active shooter”: one or more individuals actively engaged in killing or attempting to kill people in a populated area.
He considered other variables, too: shooters’ age, race, and mental health issues (if any); where incidents occurred; and whether shooters announced their intentions beforehand on social media. “I had my students verify all these facts, and we added in a few other pieces of information like geographic region,” Koh says.
Then came the process of summarizing and comparing the variables. “There’s a statistical technique called principal component analysis, or PCA, which tries to crystallize information from many variables and basically gives you what you should be looking at as the biggest contributors to your investigation,” Koh explains.
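For readers curious about what PCA does mechanically, the following is a rough sketch in plain Python, not a reconstruction of Koh’s analysis. It assumes the variables are already numeric (categorical variables such as location would first need to be encoded), and the three-column data set and the function name `pca_first_component` are hypothetical. It finds the single direction along which the data vary most, using power iteration on the covariance matrix, and reports the share of total variance that direction captures.

```python
import math


def pca_first_component(rows, iterations=200):
    """Return (unit direction, share of variance) for the first component."""
    n, d = len(rows), len(rows[0])

    # Center each column on its mean.
    means = [sum(r[k] for r in rows) / n for k in range(d)]
    centered = [[r[k] - means[k] for k in range(d)] for r in rows]

    # Sample covariance matrix (d x d).
    cov = [
        [sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
        for a in range(d)
    ]

    # Power iteration: repeatedly multiply by the covariance matrix and
    # renormalize, which converges to the dominant eigenvector.
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]

    # Variance along v, divided by total variance (trace of cov).
    along = sum(
        v[a] * sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)
    )
    total = sum(cov[k][k] for k in range(d))
    return v, along / total


# Hypothetical numeric data: column 2 tracks column 1, column 3 barely moves.
data = [
    (1.0, 2.1, 0.5),
    (2.0, 3.9, 0.4),
    (3.0, 6.2, 0.6),
    (4.0, 7.8, 0.5),
    (5.0, 10.1, 0.4),
]

direction, share = pca_first_component(data)
print(direction, share)
```

In this toy data the first component loads most heavily on the second column and almost not at all on the third, which is the sense in which PCA points to “the biggest contributors.” A real analysis would more likely rely on an established statistics library than on a hand-written routine like this one.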
Among Koh’s preliminary findings is that mental stability does not necessarily play a role in a shooter’s desire to announce his or her intentions in advance. “We have found that people who have, and do not have, mental health issues will post on social media, so that is not an indicator of the mental health of the individual,” he says.
However, in what Koh considers one of his most interesting findings so far, he discovered that a shooter’s race is strongly correlated with whether the individual posts in advance about a planned attack. “The majority of white shooters will post their intentions on social media, whereas the majority of nonwhites, especially Asian and Hispanic shooters, would not.”
Koh has also noticed a correlation between workplace mass shootings and the age of the perpetrator. Older shooters, aged 50 and above, are responsible for most workplace shootings. And analyzing the age distribution of shooters, he found that older shooters more often attack in the workplace than in all other types of locations combined. “And older shooters tend not to post on social media, either,” he adds.
Most interestingly, perhaps, Koh has found in looking across his clusters that the number of victims may correlate with the race of the shooter. “It seems to be that white shooters tend to leave more fatalities, but that could be clouded by the fact that the demographic of the U.S. is majority white,” he says.
This winter Koh has continued to process and analyze data. He’s aware that other researchers also may be scouring data looking for patterns that might help prevent the next mass shooting, but a fresh set of eyes and a distinctive statistical perspective can’t hurt the effort. He hopes to publish his findings this year.