Project Portfolio

Enhancing DataTrail Data Science Education: The BaltimoreTrails R Package and Dashboard

The BaltimoreTrails R package and its associated dashboard form a toolkit for integrating, manipulating, visualizing, and interactively accessing Baltimore datasets. Developed as part of the DataTrail initiative by the Johns Hopkins Bloomberg School of Public Health, they aim to give students a more localized, interactive, and relevant learning experience.

An R Shiny App to Introduce and Apply Survival Analysis Ideas

Together with Tiffany Hsieh and Bowen Chen, I am working on this R Shiny web-application to introduce survival analysis concepts and build intuition for them from both theoretical and applied perspectives. We outline several basic survival analysis concepts in a non-mathematical manner and then apply them to a publicly available Moderna vaccine survival dataset from GitHub that we wanted to explore further.

Pediatric Traumatic Brain Injury (TBI) Mortality Prediction Web-Application

This pediatric TBI prediction web-application is part of my honors thesis on classifying pediatric patient mortality when the outcome is imbalanced. The predictive model implemented here is the best-performing classifier from the thesis: a C5.0 decision tree trained on data resampled with the Synthetic Minority Over-sampling Technique (SMOTE). Clinicians recommended two potential applications: situations where diagnosis by a clinician is not possible, and use as an indirect clinician's aid that gives parents or guardians an approximate survival estimate. Simplicity, speed, and portability were the main priorities when developing the application.
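To give a rough sense of how SMOTE rebalances a rare outcome, here is a minimal sketch in Python. It is not the thesis code: the function name `smote`, the toy data, and the choice of `k` are all hypothetical, and only numeric features are handled. Each synthetic case is placed at a random point on the line segment between a minority-class case and one of its nearest minority-class neighbours.

```python
import math
import random


def smote(minority, n_synthetic, k=3, seed=0):
    """Create n_synthetic points by interpolating between minority-class neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # k nearest minority-class neighbours of the base point (excluding itself)
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        partner = rng.choice(neighbours)
        # The new point lies at a random position on the segment base -> partner
        gap = rng.random()
        synthetic.append(tuple(b + gap * (q - b) for b, q in zip(base, partner)))
    return synthetic


# Hypothetical toy data: two numeric features for five minority-class cases
minority_cases = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.5), (1.2, 2.2), (1.8, 2.1)]
new_cases = smote(minority_cases, n_synthetic=10)
```

Because every synthetic case is an interpolation, it always falls inside the range already covered by the observed minority cases. In practice the synthetic cases would be added to the training data only, never to the data used for evaluation.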

An Intuitive Introduction to Metropolis-Hastings Algorithm Sampling and Diagnostics

The first aim of this stochastic models and simulation project is to understand the Metropolis-Hastings algorithm and several Markov chain Monte Carlo (MCMC) diagnostic methods at a more intuitive and visual level through plots that are both animated and interactive. The second aim is to present this material in a cohesive and compact manner to those unfamiliar with MCMC and the R programming language. The time-dependent nature of a Markov chain and the number of visually appealing parameters make the algorithm well suited to animated illustrations.
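For readers who have not seen the algorithm, the following is a minimal sketch of a random-walk sampler, written in Python for illustration rather than taken from the project. The target (a standard normal density), the step size, and all names are assumptions. Because the Gaussian proposal is symmetric, the Hastings correction cancels and the acceptance rule reduces to comparing target densities.

```python
import random
import math


def metropolis_hastings(log_target, n_samples, start=0.0, step=1.0, seed=0):
    """Random-walk Metropolis-Hastings with a symmetric Gaussian proposal."""
    rng = random.Random(seed)
    current = start
    samples = []
    accepted = 0
    for _ in range(n_samples):
        proposal = current + rng.gauss(0.0, step)
        # Log of the acceptance ratio; the proposal terms cancel by symmetry
        log_ratio = log_target(proposal) - log_target(current)
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            current = proposal
            accepted += 1
        # The chain records the current state whether or not the move was accepted
        samples.append(current)
    return samples, accepted / n_samples


def log_std_normal(x):
    """Log density of a standard normal, up to an additive constant."""
    return -0.5 * x * x


chain, acceptance_rate = metropolis_hastings(log_std_normal, n_samples=20000)
chain_mean = sum(chain) / len(chain)
chain_var = sum((x - chain_mean) ** 2 for x in chain) / len(chain)
```

The list `chain` is the kind of time-ordered sequence that a trace plot or an animation would display, and `acceptance_rate` is one of the simplest diagnostics to monitor when tuning `step`.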

Comparing Variable Selection Techniques on Simulated Data

The aim of this project is to simulate multivariate data from three underlying linear models with varying degrees of correlation among predictors and then observe how well different variable selection methods perform on each dataset by fitting a linear regression model with the selected variables. We first simulate 50 datasets for every underlying model from Section 7 of the paper “Regression Shrinkage and Selection via the Lasso” by Tibshirani (1996). We then fit the models according to several variable selection and shrinkage methods and use the Akaike Information Criterion (AIC), Bayesian Information Criterion (BIC), and adjusted R-squared as performance metrics.
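As a rough illustration of how the three metrics trade fit against model size, the sketch below computes them from a fitted model's residual sum of squares. It is written in Python and is not the project's code; the function name and the example numbers are hypothetical. The AIC and BIC here use the Gaussian form up to an additive constant, so the values may differ from those reported by statistical software by terms that do not depend on the model, which leaves comparisons between models on the same dataset unchanged.

```python
import math


def selection_metrics(rss, tss, n, p):
    """Metrics for a linear model with n observations and p predictors.

    rss: residual sum of squares, tss: total sum of squares.
    The intercept is counted as an estimated coefficient but not in p.
    """
    k = p + 1  # estimated coefficients, including the intercept
    fit_term = n * math.log(rss / n)
    return {
        "aic": fit_term + 2 * k,
        "bic": fit_term + k * math.log(n),
        "adj_r2": 1 - (rss / (n - p - 1)) / (tss / (n - 1)),
    }


# Hypothetical example: a 3-predictor model versus an 8-predictor model
# fitted to the same 20 observations
small_model = selection_metrics(rss=150.0, tss=600.0, n=20, p=3)
full_model = selection_metrics(rss=140.0, tss=600.0, n=20, p=8)
```

In this made-up case the full model lowers the residual sum of squares only slightly, so all three metrics favor the smaller model: its AIC and BIC are lower and its adjusted R-squared is higher.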

Shiny-Based Personal Project Portfolio Web-Application

I created this project as my previous personal website to improve my R Shiny and HTML programming skills. Because Shiny applications are more dynamic and are built to visualize statistical analyses and make them more interactive, I decided to retire this web-application in favor of the current website.