STEFAN X. SOH

Computer Science @ Elmhurst University

Building ML pipelines and predictive models while playing NCAA Division III basketball. Founder of SohAI. Passionate about quantitative research, sports analytics, and automation.

3.6
GPA
Dec '27
GRADUATION
Stefan Soh - Elmhurst Bluejays Point Guard
#3

IN THE NEWS

Featured Articles & Recognition

LATEST
📰

Featured in Elmhurst University Athletics

Read about my journey as a student-athlete combining data science with Division III basketball.


FEATURED PROJECTS

ML, Data Science, Sports Analytics & Automation

01

NBA BetIQ

Built a leak-free ML pipeline that predicts NBA game outcomes from 33 market-based features derived from Vegas odds and betting data. Achieved 72% accuracy with well-calibrated probabilities, and deployed the model as a prediction API using FastAPI and Docker.

Python SQL XGBoost FastAPI Docker pandas
02

Attention Market Simulator

Built a market-microstructure-inspired simulation that models human attention as a financial market, with dopamine spikes acting as volatility shocks. Trained ML models for regime classification and direction forecasting on 15+ engineered features, including order flow imbalance, liquidity metrics, and rolling volatility.

Python NumPy pandas scikit-learn Reinforcement Learning Market Microstructure
03

Unfelt Time

Modeled relativistic time dilation across 48,000 cities using gravitational and rotational physics equations. Built an interactive Streamlit visualization app that pairs neural network approximations with LLM-generated explanations to make complex physics concepts accessible.

Python TensorFlow Streamlit Neural Networks Physics Simulation Visualization
04

SohAI - AI Automation Business

Founded an AI automation consultancy generating $750/month in recurring revenue. Built Python-based ETL pipelines and web scraping workflows that extracted 300+ verified leads. Engineered automated outreach systems integrating the ChatGPT API, Bland.ai, LangChain, and n8n workflows.

Python BeautifulSoup Selenium n8n ChatGPT API LangChain Bland.ai

WORK EXPERIENCE

Building products, analyzing data, and creating value

Founder / Data Automation Engineer

SohAI
Remote
Oct 2024 - Present
  • Built Python-based ETL pipelines and web scraping workflows (BeautifulSoup, Selenium, pandas) that extracted 300+ verified leads across dental, orthodontic, chiropractic, and real estate datasets
  • Engineered automated outreach systems integrating the ChatGPT API, Bland.ai, LangChain, and n8n workflows, generating $750/month in recurring revenue from real estate and healthcare clients
  • Deployed AI phone agents and lead qualification systems handling 100+ interactions per month

Data Intern

Move For Hunger
Remote
Jan 2025 - May 2025
  • Cleaned and standardized 5,000+ donor records using Python (pandas, NumPy) and SQL, identifying 100+ high-value donor prospects through statistical analysis and segmentation
  • Built automated ETL pipelines using Python scripts and SQL queries, reducing manual reporting time by 15 hours/week
  • Built donor engagement dashboards and quarterly summaries in Excel for executive leadership

Investment Research Intern

Cannataro Family Partners
New York, NY
Jun 2021 - Jul 2021
  • Analyzed portfolio performance across 12+ equity positions, using Excel financial models (DCF, comparable company analysis, ratio analysis) to evaluate $2M+ in holdings
  • Conducted sector research and competitive analysis on 8 potential investments, contributing to 2 portfolio allocation decisions
  • Built Excel dashboards tracking multi-year performance metrics, exposure analysis, and risk-adjusted returns

TECHNICAL SKILLS

Tools & Technologies I Work With

Languages

Python SQL JavaScript C++

ML & Data Science

scikit-learn TensorFlow/Keras XGBoost pandas NumPy statsmodels Reinforcement Learning

Tools & Tech

FastAPI Docker Git/GitHub Streamlit n8n BeautifulSoup Selenium Excel

Mathematics

Statistical Modeling Probability Theory Quantitative Analysis Time Series Analysis Discrete Math

LET'S CONNECT

Interested in data science, sports analytics, or AI automation? Looking for quantitative research interns? Let's talk!