Sitemap

A list of all the posts and pages found on the site. For the robots out there, an XML version is available for digesting as well.

Pages

Posts

Breaking Deep Research: Where Retail User LLM Search Agents Fail and Why Verification Still Falls on You

15 minute read

Deep research tools from OpenAI, Google, and Perplexity promise source-grounded synthesis, but their reliability depends heavily on what they are searching. Professional tools connected to peer-reviewed databases like PubMed have a built-in quality gate: the barrier to publication is literally peer review. But for code repositories, blog posts, and other sources with no editorial barrier, deep research inherits every error, exaggeration, and fabrication in the source material. Digging into some of the recent literature myself, I found studies reporting citation accuracy as low as 40%, along with fabricated references. I tested all three tools on the same library-evaluation question as a small experiment to show how they can fail (à la proof by contradiction). One tool recommended an abandoned library as its top pick with an apparently fabricated release date. The issue of AI-generated content infiltrating peer review itself is a separate and important problem, but not within the scope of this post.

Pure Indexing, Flirting with Factors, and Finding Intermediaries: Thoughts on Addressing Three-Fund Portfolio Inefficiencies

44 minute read

After reading Ang’s ‘Asset Management: A Systematic Approach to Factor Investing’ and rereading the newest 50th anniversary edition of Malkiel’s ‘A Random Walk Down Wall Street,’ I wanted to explore how funds like Dimensional’s DFUS address indexing inefficiencies without fully committing to factor tilts. The goal: capture market beta while addressing index inefficiencies such as adverse selection. Five-factor regressions indicate DFUS is a VTI equivalent for US equities, but a VXUS equivalent remains unsolved. This is not financial advice and I am just your average retail investor.
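The five-factor regressions mentioned above can be sketched in miniature. Everything below is simulated placeholder data, not DFUS returns or actual Fama-French factor series; the point is only the mechanics of estimating alpha and the market beta by ordinary least squares.

```python
import numpy as np

rng = np.random.default_rng(0)

n_months = 240
# Simulated monthly factor returns standing in for MKT-RF, SMB, HML, RMW, CMA.
factors = rng.normal(0.0, 0.03, size=(n_months, 5))

# A market-like fund: beta near 1 on the market factor, near-zero loadings
# elsewhere, and no true alpha (these loadings are illustrative assumptions).
true_betas = np.array([1.0, 0.05, 0.0, 0.0, 0.0])
fund_excess = factors @ true_betas + rng.normal(0.0, 0.002, size=n_months)

# OLS with an intercept column; the intercept estimate is the fund's alpha.
X = np.column_stack([np.ones(n_months), factors])
coefs, *_ = np.linalg.lstsq(X, fund_excess, rcond=None)
alpha, betas = coefs[0], coefs[1:]
```

A fund that is "a VTI equivalent" in this framing would show an estimated market beta close to 1 and an alpha statistically indistinguishable from zero on the real factor data.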

portfolio

Multi-Agentic Oncology Value Scorecard Creation with LLMs

Based on some great discussions with my former Pfizer colleagues Brett South, Ajit Jadhav, Jay Ronquillo, Jon Mauer, and Stephen Watt, this project aims to replicate established oncology value frameworks, such as the ISPOR Scorecard and ASCO Value Framework, using Large Language Models (LLMs) to validate their capabilities in reproducing human-derived scorecards. The project implements and compares three LLM-based approaches (multi-agent systems, single LLM pipelines, and retrieval-augmented generation) using data from ClinicalTrials.gov, PubMed, and OpenFDA. A fourth MOA-based multi-agent framework was later integrated for enhanced synthesis and traceability.

Enhancing DataTrail Data Science Education: The BaltimoreTrails R Package and Dashboard

The BaltimoreTrails R package and associated dashboard are a comprehensive toolkit designed to facilitate the integration, manipulation, visualization, and interactive exploration of Baltimore datasets. Developed as part of the DataTrail initiative by the Johns Hopkins Bloomberg School of Public Health, the package and dashboard aim to provide a more localized, interactive, and relevant learning experience for students.

An R Shiny App to Introduce and Apply Survival Analysis Ideas

Together with Tiffany Hsieh and Bowen Chen, I am working on this R Shiny web application to introduce ideas and provide intuition for survival analysis concepts from both theoretical and applied perspectives. Starting from a publicly available Moderna vaccine survival dataset on GitHub that we wanted to explore further, we outline several basic survival analysis concepts in a non-mathematical manner and then apply those concepts to the dataset.
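One of the core concepts such an app introduces is the Kaplan-Meier product-limit estimator, which can be computed from scratch in a few lines. The follow-up times and event flags below are made-up toy data, not the Moderna dataset the project analyzes.

```python
import numpy as np

def kaplan_meier(times, events):
    """Return (unique event times, Kaplan-Meier survival estimates S(t))."""
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=bool)
    event_times = np.unique(times[events])
    surv, estimates = 1.0, []
    for t in event_times:
        at_risk = np.sum(times >= t)          # subjects still under observation at t
        d = np.sum((times == t) & events)     # events occurring exactly at time t
        surv *= 1.0 - d / at_risk             # product-limit update
        estimates.append(surv)
    return event_times, np.array(estimates)

times = [2, 3, 3, 5, 8, 8, 9, 12]   # toy follow-up times
events = [1, 1, 0, 1, 1, 0, 0, 1]   # 1 = event observed, 0 = censored

t, s = kaplan_meier(times, events)
```

Censored subjects (event flag 0) drop out of the risk set after their follow-up time without triggering a step in the curve, which is exactly what distinguishes survival analysis from a naive event-rate calculation.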

Pediatric Traumatic Brain Injury (TBI) Mortality Prediction Web-Application

This pediatric TBI prediction web application is part of my honors thesis on imbalanced-outcome pediatric patient mortality classification, where the best-performing C5.0 decision tree classifier trained on Synthetic Minority Over-sampling Technique (SMOTE) subsampled data is the predictive model implemented here. Potential applications of such a web application, as recommended to us by clinicians, include settings where diagnosis by a clinician is not possible, or use as an indirect clinical aid to give parents or guardians an approximate survival estimate. Simplicity, speed, and portability were the main priorities considered when developing the application.
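The SMOTE idea behind the training data can be sketched from scratch: synthesize new minority-class points by interpolating between a minority sample and one of its k nearest minority neighbors. This is an illustrative reimplementation on toy 2-D points, not the exact routine or data used in the thesis.

```python
import numpy as np

def smote(X_minority, n_new, k=3, rng=None):
    """Generate n_new synthetic minority samples by neighbor interpolation."""
    rng = rng or np.random.default_rng(0)
    X = np.asarray(X_minority, dtype=float)
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(len(X))
        # Distances from sample i to every minority sample (including itself).
        d = np.linalg.norm(X - X[i], axis=1)
        neighbors = np.argsort(d)[1:k + 1]       # k nearest, skipping the point itself
        j = rng.choice(neighbors)
        gap = rng.random()                       # interpolation weight in [0, 1)
        synthetic.append(X[i] + gap * (X[j] - X[i]))
    return np.array(synthetic)

# Toy minority class: four points at the corners of the unit square.
minority = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
new_points = smote(minority, n_new=6)
```

Because each synthetic point lies on a segment between two real minority samples, the oversampled class fills in its own region of feature space rather than duplicating observations, which is what helps tree-based classifiers like C5.0 on imbalanced outcomes.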

An Intuitive Introduction to Metropolis-Hastings Algorithm Sampling and Diagnostics

The first aim of this stochastic models and simulation project is to understand the Metropolis-Hastings algorithm and several Markov chain Monte Carlo (MCMC) diagnostic methods at a more intuitive and visual level through plots that are both animated and interactive. My second aim is to present the first in a cohesive and compact manner to those unfamiliar with MCMC and the R programming language. The time-dependent nature of a Markov chain, and the number of visually appealing parameters involved, make it well suited to animated illustrations.
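The algorithm itself fits in a few lines. Here is a compact random-walk Metropolis-Hastings sampler targeting a standard normal density, as a plain-Python companion to the animated R visuals; the target, proposal scale, starting point, and chain length are all arbitrary choices for illustration.

```python
import numpy as np

def metropolis_hastings(log_target, x0, n_steps, step=1.0, seed=0):
    """Random-walk Metropolis-Hastings; returns (chain, acceptance rate)."""
    rng = np.random.default_rng(seed)
    chain = np.empty(n_steps)
    x, lp = x0, log_target(x0)
    accepted = 0
    for i in range(n_steps):
        proposal = x + rng.normal(0.0, step)    # symmetric random-walk proposal
        lp_new = log_target(proposal)
        # Accept with probability min(1, target(proposal) / target(x)).
        if np.log(rng.random()) < lp_new - lp:
            x, lp = proposal, lp_new
            accepted += 1
        chain[i] = x                            # rejected moves repeat the state
    return chain, accepted / n_steps

def log_std_normal(x):
    return -0.5 * x * x                         # log-density up to an additive constant

chain, accept_rate = metropolis_hastings(log_std_normal, x0=5.0, n_steps=20_000)
burned = chain[2_000:]                          # discard burn-in
```

Animating `chain` over its index is exactly the kind of time-dependent view the project exploits: you can watch the walker leave the poor starting point, settle into the target's bulk, and occasionally stall on rejected proposals.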

Comparing Variable Selection Techniques on Simulated Data

The aims of this project are to simulate multivariate data from three underlying linear models with varying degrees of correlation among predictors, and then to observe how well different variable selection methods perform on each dataset by fitting a linear regression model with the selected variables. We first simulate 50 datasets for each underlying model from Section 7 of the paper “Regression Shrinkage and Selection via the Lasso” by Tibshirani (1996). We then fit models according to several variable selection and shrinkage methods and consider the Akaike Information Criterion (AIC), Bayesian Information Criterion (BIC), and adjusted R-squared value as performance metrics.
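The simulation setup can be sketched as follows: a sparse linear model with AR(1)-correlated predictors (the design resembles Tibshirani's Example 1, but the sample size and coefficients here are assumptions for illustration), plus a helper computing the three reported metrics for an OLS fit on a chosen variable subset.

```python
import numpy as np

rng = np.random.default_rng(1)

n, p, rho, sigma = 50, 8, 0.5, 3.0
beta = np.array([3.0, 1.5, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0])  # sparse truth

# Correlation between predictors i and j decays as rho^|i-j|.
cov = rho ** np.abs(np.subtract.outer(np.arange(p), np.arange(p)))
X = rng.multivariate_normal(np.zeros(p), cov, size=n)
y = X @ beta + rng.normal(0.0, sigma, size=n)

def fit_metrics(X_sub, y):
    """OLS on a predictor subset; return (AIC, BIC, adjusted R^2)."""
    n, k = X_sub.shape
    Xd = np.column_stack([np.ones(n), X_sub])
    coef, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    resid = y - Xd @ coef
    rss = resid @ resid
    tss = np.sum((y - y.mean()) ** 2)
    aic = n * np.log(rss / n) + 2 * (k + 1)
    bic = n * np.log(rss / n) + np.log(n) * (k + 1)
    adj_r2 = 1 - (rss / (n - k - 1)) / (tss / (n - 1))
    return aic, bic, adj_r2

full = fit_metrics(X, y)                       # all 8 predictors
true_subset = fit_metrics(X[:, [0, 1, 4]], y)  # only the truly nonzero predictors
```

A selection method is then judged by whether the subset it picks scores well on these criteria across the 50 replicates; BIC's heavier per-parameter penalty typically favors sparser subsets than AIC does.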

Shiny-Based Personal Project Portfolio Web-Application

I created this project as my previous personal website to improve my R Shiny and HTML programming skills. Because Shiny applications are geared toward dynamic, interactive statistical analyses rather than static portfolio content, I retired that web application in favor of this current website.

publications

talks

teaching
