UMD Researcher Awarded $364K to Make Data Analysis Faster and More Accessible
Modern datasets are massive—they can contain billions of values and an overwhelming amount of information. The data itself is invaluable and can provide vital insights, but how can we efficiently make sense of it?
Laxman Dhulipala, an assistant professor of computer science with a dual appointment in the University of Maryland Institute for Advanced Computer Studies (UMIACS), is building open-source tools to help researchers do just that.
He is part of a multi-institutional team that has been awarded $364K from the National Science Foundation to help us better understand complex data using techniques called clustering algorithms.
“Clustering algorithms are tools that organize data points based on how similar they are,” explains Dhulipala. “They create clusters of similar values while keeping different types of data separate.”
In our own lives, we all belong to a range of clusters: some small, some large, some nested within others. For example, you may have a cluster of high school friends. Its members, in turn, belong to the larger clusters of their respective college alumni, and all of them belong to broader communities of people from their home country.
“Identifying the natural cluster structure of our social groups helps explain our own histories and identities,” Dhulipala says. “Similarly, clustering algorithms uncover the underlying structure of huge amounts of data, helping us organize it better.”
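The idea of small clusters nested inside larger ones can be illustrated with a minimal sketch. It is not the team's algorithm: the function name `cluster_by_gap`, the sample values, and both thresholds are made up for illustration. It groups one-dimensional values by starting a new cluster whenever the gap to the previous value is too large, a simple form of single-linkage clustering.

```python
def cluster_by_gap(values, max_gap):
    """Group 1-D values; a gap larger than max_gap starts a new cluster."""
    clusters = []
    for v in sorted(values):
        if clusters and v - clusters[-1][-1] <= max_gap:
            clusters[-1].append(v)   # close enough: join the current cluster
        else:
            clusters.append([v])     # too far away: open a new cluster
    return clusters

# Hypothetical data points (illustrative only).
values = [18, 19, 21, 40, 42, 45, 70, 72]

fine = cluster_by_gap(values, max_gap=3)     # strict notion of "similar"
coarse = cluster_by_gap(values, max_gap=20)  # looser notion of "similar"

print(fine)    # [[18, 19, 21], [40, 42, 45], [70, 72]]
print(coarse)  # [[18, 19, 21, 40, 42, 45], [70, 72]]
```

With the looser threshold, the first two fine-grained clusters merge into one larger cluster, much as a group of high school friends sits inside a larger alumni community.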
Clustering algorithms have a wide range of applications, from computational biology to social network analysis to security and privacy. In fact, people have been using clustering techniques to solve problems for well over a century. In 1854, a physician named John Snow traced a cholera outbreak to contaminated water by using a very early form of clustering to pinpoint the exact water pump in London that was the source.
However, despite their long history, most existing clustering algorithms are too slow to handle big datasets. Dhulipala’s project, a collaboration among the University of Maryland, Massachusetts Institute of Technology (MIT), and Brown University, aims to address this limitation. The researchers are designing algorithms that build “sparse” graphs, which simplify the data while preserving its essential information, so that it can be organized more efficiently. They are also developing parallel algorithms that can exploit multiple processing cores at the same time and thus run faster.
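One common way to sparsify data, shown here only as a rough sketch and not necessarily the approach the team uses, is to keep an edge from each point to just its nearest neighbors instead of comparing every pair, and then read clusters off the connected pieces of the resulting graph. The function names, the sample points, and the choice of `k` below are all illustrative assumptions. The brute-force neighbor search is kept for readability and is itself slow on large inputs.

```python
import math

def knn_edges(points, k):
    """Undirected edges linking each point to its k nearest neighbors."""
    edges = set()
    for i, p in enumerate(points):
        others = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(p, points[j]),
        )
        for j in others[:k]:
            edges.add((min(i, j), max(i, j)))
    return edges

def connected_components(n, edges):
    """Group node indices 0..n-1 into components using union-find."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in edges:
        parent[find(a)] = find(b)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

# Hypothetical 2-D points forming two well-separated groups.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]

sparse = knn_edges(points, k=1)
clusters = connected_components(len(points), sparse)

print(len(sparse))  # 4 edges, versus 15 in the complete graph on 6 points
print(clusters)     # [[0, 1, 2], [3, 4, 5]]
```

The sparse graph keeps 4 of the 15 possible edges yet still separates the two groups. This sketch runs on a single core and does not attempt the parallelism described above.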
One of the team’s most important goals is to enhance the project’s accessibility and transparency, so they plan to develop an open-source toolkit that will make their new algorithms readily available to scientists and researchers.
“We want to make sure that other people benefit from what we’re doing. It’s fun to just think about and solve problems, but we also want to share our findings with the world,” Dhulipala says. “I believe that trying to make the world more open and transparent is part of our responsibility as scientists.”
He adds that this transparency is what will ensure the responsible use of such technology. However, he also warns that there can be disadvantages and risks involved in understanding the structure of data, since these algorithms can be used for malicious purposes too.
“That’s why I believe these algorithms shouldn’t be used behind closed doors at companies,” he says. “We should do what we can to understand the power, the limitation, and the effects of these technologies.”
Dhulipala acknowledges the importance of working with his counterparts—Ellis Hershkowitz, an assistant professor of computer science at Brown University; Julian Shun, an associate professor of electrical engineering and computer science at MIT; and their graduate students—to help overcome research challenges along the way.
“The grant has been a great opportunity to combine our strengths, find opportunities for collaboration, and enhance interaction between our graduate students,” Dhulipala says.
He also appreciates UMIACS’ role in ensuring the project’s smooth, efficient implementation.
“I’m truly grateful for the support from UMIACS,” Dhulipala says. “The resources, especially the staff support, have been extremely helpful.”
He emphasizes how the staff’s professionalism and responsiveness helped him navigate the complex process of writing research proposals.
Dhulipala believes that his team’s work could have a powerful impact on scientific research, on various industries (especially the tech industry), and even on policy. With efficient and accessible clustering algorithms, the team aims to help researchers, companies, and government agencies use data to solve the world’s most complex problems.
—Story by Aleena Haroon, UMIACS communications group