Project Portfolio

Multi-Agentic Oncology Value Scorecard Creation with LLMs

This project explores the use of large language models (LLMs) to replicate and generate oncology value scorecards, such as those produced under the ASCO and ISPOR frameworks. These frameworks evaluate cancer treatments based on clinical benefit, toxicity, and other patient-centered outcomes. The project implements and compares three LLM-based approaches: multi-agent systems, single-LLM pipelines, and retrieval-augmented generation (RAG), using data from ClinicalTrials.gov, PubMed, and OpenFDA. A fourth, more advanced MOA-based multi-agent framework was later integrated for enhanced synthesis and traceability. Scorecard outputs were benchmarked against published gold standards to assess accuracy and reproducibility.

Exploratory COVID-19 Modeling: Because There Weren’t Already Enough Predictive COVID-19 Projects

This project explores the dynamics of the COVID-19 pandemic through three lenses: healthcare system strain, pandemic fatigue, and policy effectiveness. We developed predictive models for ICU utilization using LSTM and ensemble methods, achieving strong generalizability across countries and time periods. Pandemic fatigue was operationalized and detected using social media sentiment, policy data, and case trends, reaching over 89% balanced accuracy. We also applied causal and time-series methods, including wavelet coherence, to study the delayed effects of policy interventions.

Enhancing DataTrail Data Science Education: The BaltimoreTrails R Package and Dashboard

The BaltimoreTrails R package and its associated dashboard form a toolkit for integrating, manipulating, visualizing, and interactively accessing Baltimore datasets. Developed as part of the DataTrail initiative by the Johns Hopkins Bloomberg School of Public Health, they aim to provide a more localized, interactive, and relevant learning experience for students.

An R Shiny App to Introduce and Apply Survival Analysis Ideas

Together with Tiffany Hsieh and Bowen Chen, I am working on this R Shiny web-application to introduce survival analysis concepts and build intuition for them from both theoretical and applied perspectives. We outline several basic survival analysis concepts in a non-mathematical manner and then apply them to a publicly available Moderna vaccine survival dataset on GitHub that we wanted to explore further.

Pediatric Traumatic Brain Injury (TBI) Mortality Prediction Web-Application

This pediatric TBI prediction web-application is part of my honors thesis on classifying pediatric patient mortality when the outcome is imbalanced. The predictive model implemented here is the best-performing classifier from that work: a C5.0 decision tree trained on data resampled with the Synthetic Minority Over-sampling Technique (SMOTE). Clinicians recommended two potential applications to us: situations where diagnosis by a clinician is not possible, and use as an indirect clinician's aid that gives parents or guardians an approximate survival estimate. Simplicity, speed, and portability were the main priorities in developing the application.

An Intuitive Introduction to Metropolis-Hastings Algorithm Sampling and Diagnostics

The first aim of this stochastic models and simulation project is to understand the Metropolis-Hastings algorithm and several Markov chain Monte Carlo (MCMC) diagnostic methods at a more intuitive and visual level, through plots that are both animated and interactive. My second aim is to present this material in a cohesive and compact manner to those unfamiliar with MCMC and the R programming language. The time-dependent nature of a Markov chain, together with the number of parameters that lend themselves to visualization, makes the topic well suited to animated illustrations.
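
For readers new to the algorithm, a minimal sketch of random-walk Metropolis-Hastings may help fix ideas. It is written in Python rather than the project's R, and it is not the project's code: the standard normal target, the Gaussian proposal, and all function and parameter names (`log_std_normal`, `metropolis_hastings`, `step`) are assumptions chosen for illustration.

```python
import math
import random


def log_std_normal(x):
    """Log-density of a standard normal, up to an additive constant."""
    return -0.5 * x * x


def metropolis_hastings(log_target, n_samples, start=0.0, step=1.0, seed=0):
    """Random-walk Metropolis-Hastings for a one-dimensional target.

    Returns the chain (one value per iteration) and the acceptance rate.
    """
    rng = random.Random(seed)
    current = start
    current_logp = log_target(current)
    samples = []
    accepted = 0
    for _ in range(n_samples):
        # Symmetric Gaussian proposal centred on the current state.
        proposal = current + rng.gauss(0.0, step)
        proposal_logp = log_target(proposal)
        # With a symmetric proposal the acceptance ratio reduces to the
        # ratio of target densities, computed here on the log scale.
        log_alpha = proposal_logp - current_logp
        if log_alpha >= 0 or rng.random() < math.exp(log_alpha):
            current, current_logp = proposal, proposal_logp
            accepted += 1
        # A rejected proposal repeats the current state in the chain.
        samples.append(current)
    return samples, accepted / n_samples


if __name__ == "__main__":
    chain, rate = metropolis_hastings(log_std_normal, 20000, seed=42)
    print("acceptance rate:", round(rate, 3))
    print("sample mean:", round(sum(chain) / len(chain), 3))
```

Plotting the returned chain against the iteration index gives a trace plot, one of the simplest MCMC diagnostics, and changing `step` shows how the proposal width trades off acceptance rate against how quickly the chain moves.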

Comparing Variable Selection Techniques on Simulated Data

The aims of this project are to simulate multivariate data from three underlying linear models with varying degrees of correlation among predictors, and then to observe how effectively different variable selection methods perform on each dataset by fitting a linear regression model with the selected variables. We first simulate 50 datasets for every underlying model from Section 7 of the paper “Regression Shrinkage and Selection via the Lasso” by Tibshirani (1996). We then fit the models according to several variable selection and shrinkage methods and use the Akaike Information Criterion (AIC), the Bayesian Information Criterion (BIC), and the adjusted R-squared value as performance metrics.
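
The two ingredients described above, correlated predictors and the comparison metrics, can be sketched as follows. This is an illustrative Python sketch, not the project's R code. The function names, the autoregressive construction that gives predictors i and j correlation rho^|i-j|, and the default values (20 observations, eight coefficients, rho = 0.5, sigma = 3) are assumptions in the style of the paper's first example and should be checked against the paper. AIC and BIC are computed from the residual sum of squares under Gaussian errors, up to an additive constant that does not change model rankings on the same data.

```python
import math
import random


def simulate_dataset(n=20, beta=(3.0, 1.5, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0),
                     rho=0.5, sigma=3.0, rng=None):
    """Simulate y = X beta + sigma * noise with corr(x_i, x_j) = rho^|i-j|.

    All defaults are illustrative assumptions, not the project's settings.
    """
    rng = rng or random.Random(0)
    scale = math.sqrt(1.0 - rho * rho)
    X, y = [], []
    for _ in range(n):
        # Each predictor depends on the previous one, so every predictor
        # has unit variance and the correlation decays geometrically.
        row = [rng.gauss(0.0, 1.0)]
        for _ in range(1, len(beta)):
            row.append(rho * row[-1] + scale * rng.gauss(0.0, 1.0))
        X.append(row)
        signal = sum(b * x for b, x in zip(beta, row))
        y.append(signal + sigma * rng.gauss(0.0, 1.0))
    return X, y


def selection_criteria(y, fitted, n_predictors):
    """AIC, BIC (up to a constant) and adjusted R-squared for a linear fit.

    n_predictors excludes the intercept, which is counted separately.
    """
    n = len(y)
    mean_y = sum(y) / n
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    tss = sum((yi - mean_y) ** 2 for yi in y)
    k = n_predictors + 1  # coefficients including the intercept
    aic = n * math.log(rss / n) + 2 * k
    bic = n * math.log(rss / n) + k * math.log(n)
    r2 = 1.0 - rss / tss
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - n_predictors - 1)
    return {"aic": aic, "bic": bic, "adj_r2": adj_r2}
```

Lower AIC and BIC and higher adjusted R-squared indicate a better trade-off between fit and model size. BIC penalizes each extra coefficient by log(n) instead of 2, so it favors smaller models once n exceeds about 7.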

Shiny-Based Personal Project Portfolio Web-Application

I created this project as my previous personal website to improve my R Shiny and HTML programming skills. Because Shiny applications are dynamic and intended mainly for visualizing statistical analyses and making them interactive, I decided to retire this web-application in favor of the current website.