Research
I am currently working on a survey of a sample of introductory data science courses at the undergraduate level. The purpose of the survey is to document and better understand the current state of courses at this level. We want the survey to serve as a reference for undergraduate professors who are (a) designing an introductory data science course at their institution and looking for guidelines and materials to help them get started; or (b) looking to enhance an existing introductory data science course and/or foster collaborations with other institutions. The survey will eventually appear in the background section of a paper we are writing on the introductory data science curriculum developed by Dr. Çetinkaya-Rundel. The goal is not a comparative study of the courses we review, but rather a snapshot of the variety of approaches to teaching introductory data science.
In addition to my pedagogical research, I would like to continue working in the area of my doctoral research. My dissertation involved creating an extension of what is known as the ratio cut, a clustering metric commonly used to partition a graph or network into two clusters of roughly balanced size. Graph partitioning problems such as this are typically NP-hard. However, there exists a fast heuristic for the ratio cut problem based on matrix decomposition methods. The metric that I created in my dissertation extends the ratio cut in two ways: it allows the graph to be partitioned into more than two clusters, and it allows divisive hierarchical clustering techniques to be applied to the graph. Support for divisive hierarchical clustering is particularly useful for gene expression similarity graphs, because gene expression tissue clusters tend to be highly nested. In continuing this research, I hope to develop a similarly fast heuristic for this metric, like the one that exists for the ratio cut.
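To make the two-cluster case concrete, here is a minimal Python sketch. It assumes the standard definition of the ratio cut, cut(A, B) * (1/|A| + 1/|B|), and the usual spectral relaxation, which splits the graph on the sign of the eigenvector belonging to the second-smallest eigenvalue of the graph Laplacian. The function names and the toy graph are hypothetical illustrations and are not taken from the dissertation; the extended multi-cluster metric is not shown.

```python
import numpy as np


def ratio_cut(adjacency, labels):
    """Ratio cut of a two-way partition: cut(A, B) * (1/|A| + 1/|B|).

    Assumes the standard two-cluster definition; `labels` is a boolean
    array marking which nodes belong to cluster A.
    """
    labels = np.asarray(labels, dtype=bool)
    size_a = labels.sum()
    size_b = (~labels).sum()
    if size_a == 0 or size_b == 0:
        return np.inf  # a one-sided split is not a valid partition
    # Total weight of the edges that cross between the two clusters.
    cut_weight = adjacency[np.ix_(labels, ~labels)].sum()
    return cut_weight * (1.0 / size_a + 1.0 / size_b)


def spectral_bisection(adjacency):
    """Heuristic two-way split based on an eigendecomposition.

    Builds the unnormalized Laplacian L = D - W and splits the nodes on
    the sign of the eigenvector for the second-smallest eigenvalue.
    """
    laplacian = np.diag(adjacency.sum(axis=1)) - adjacency
    # eigh returns eigenvalues in ascending order for symmetric matrices.
    _, eigenvectors = np.linalg.eigh(laplacian)
    return eigenvectors[:, 1] > 0


# Hypothetical toy graph: two triangles joined by a single edge (2-3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adjacency = np.zeros((6, 6))
for i, j in edges:
    adjacency[i, j] = adjacency[j, i] = 1.0

labels = spectral_bisection(adjacency)
print(labels, ratio_cut(adjacency, labels))
```

On this toy graph the split separates the two triangles, cutting only the bridging edge. Which triangle is labeled True depends on the arbitrary sign of the eigenvector, so only the grouping is meaningful.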