King AI Capital

Match Outcome Predictor (Random Forest Classifier)


Overview

This application predicts the outcome of upcoming English Premier League fixtures, providing probability estimates for each result: Home Win, Draw, or Away Win. Rather than relying on bookmaker lines alone, the model uses machine learning and statistical signals to generate data-driven forecasts.

Data Pipeline

Data Acquisition:
Historical Premier League match results are scraped from site X with Selenium, which is needed because the site renders its content with JavaScript and blocks simple automated requests. The scraper works through one season at a time and extracts structured match data from each.
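As a rough illustration of the season-by-season scraping loop, here is a minimal sketch. The source names neither the site nor its markup, so SEASON_URL, the CSS selector, and the four-column row layout (date, home team, score, away team) are all hypothetical placeholders, not the project's actual scraper.

```python
from typing import Optional

# Hypothetical URL pattern: the real site ("site X") is not named in this document.
SEASON_URL = "https://example.com/premier-league/{season}/results"


def parse_row(cells: list[str]) -> Optional[dict]:
    """Turn one table row into a match record.

    Assumes a hypothetical column order: date, home team, score ("2-1"), away team.
    Returns None for rows that do not fit (header rows, postponed games).
    """
    if len(cells) != 4 or "-" not in cells[2]:
        return None
    home_goals, _, away_goals = cells[2].partition("-")
    if not (home_goals.strip().isdigit() and away_goals.strip().isdigit()):
        return None
    return {
        "date": cells[0].strip(),
        "home": cells[1].strip(),
        "away": cells[3].strip(),
        "home_goals": int(home_goals),
        "away_goals": int(away_goals),
    }


def scrape_season(season: str) -> list[dict]:
    """Load one season page in headless Chrome and parse every table row."""
    # Imported here so parse_row stays usable without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    options = webdriver.ChromeOptions()
    options.add_argument("--headless=new")
    driver = webdriver.Chrome(options=options)
    try:
        driver.implicitly_wait(10)  # give JavaScript-rendered rows time to appear
        driver.get(SEASON_URL.format(season=season))
        matches = []
        # "table tr" is a placeholder selector for the results table.
        for row in driver.find_elements(By.CSS_SELECTOR, "table tr"):
            cells = [td.text for td in row.find_elements(By.TAG_NAME, "td")]
            record = parse_row(cells)
            if record is not None:
                matches.append(record)
        return matches
    finally:
        driver.quit()  # always release the browser, even on errors
```

Keeping the row parser separate from the browser code means the parsing rules can be checked against saved rows without launching a browser.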

Data Preparation:
The dataset is cleaned and checked for consistency. Exploratory analysis is then used to confirm that feature distributions are stable and to identify and remove anomalies.

Feature Engineering:
Over 40 engineered features are built, capturing team form, goal differences, recent results, head-to-head records, Elo-style adjustments, home/away factors, and rolling performance metrics. The feature set is designed to provide predictive strength across varied match conditions.
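To make two of these feature families concrete, the sketch below computes an Elo-style rating difference and a rolling points total for each match. The constants (K_FACTOR, HOME_ADVANTAGE, the 1500 starting rating, the five-match window) and the function names are assumptions chosen for illustration; the source does not state the values or formulas actually used.

```python
from collections import defaultdict, deque

K_FACTOR = 20.0        # assumed update step; not specified in the source
HOME_ADVANTAGE = 60.0  # assumed rating bonus for the home side


def expected_home_score(home_elo: float, away_elo: float) -> float:
    """Standard Elo expectation, with the home side given a fixed bonus."""
    diff = (away_elo - (home_elo + HOME_ADVANTAGE)) / 400.0
    return 1.0 / (1.0 + 10.0 ** diff)


def build_features(matches, window: int = 5):
    """Return one feature dict per match, using only earlier matches.

    `matches` must be in chronological order; each item is a dict with
    keys home, away, home_goals, away_goals.
    """
    elo = defaultdict(lambda: 1500.0)
    recent_points = defaultdict(lambda: deque(maxlen=window))
    rows = []
    for m in matches:
        home, away = m["home"], m["away"]
        # Features are read BEFORE the result is applied, to avoid leakage.
        rows.append({
            "elo_diff": elo[home] - elo[away],
            "home_form": sum(recent_points[home]),
            "away_form": sum(recent_points[away]),
        })
        # Translate the result into an Elo score and league points.
        if m["home_goals"] > m["away_goals"]:
            score, points = 1.0, (3, 0)
        elif m["home_goals"] == m["away_goals"]:
            score, points = 0.5, (1, 1)
        else:
            score, points = 0.0, (0, 3)
        # Zero-sum rating update: what the home side gains, the away side loses.
        delta = K_FACTOR * (score - expected_home_score(elo[home], elo[away]))
        elo[home] += delta
        elo[away] -= delta
        recent_points[home].append(points[0])
        recent_points[away].append(points[1])
    return rows
```

The important design point is ordering: each row is built from the state before the match is played, so no feature can contain information about the result it is meant to predict.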

Feature Selection & Correlation:
Pearson and Spearman correlation tests, along with feature importance scoring from preliminary models, are used to prune weak or redundant features, leaving a robust and efficient set for training.
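The correlation half of this step can be sketched in plain Python as follows. The 0.9 threshold, the greedy keep-first strategy, and the function names are assumptions made for illustration. The importance-based pruning from preliminary models is not shown.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    # A constant column has no defined correlation; treat it as 0.
    return cov / (sd_x * sd_y) if sd_x and sd_y else 0.0


def ranks(values):
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def prune_redundant(features: dict, threshold: float = 0.9):
    """Greedily keep features, skipping any that is strongly correlated
    (by Pearson or Spearman) with a feature already kept."""
    kept = []
    for name, column in features.items():
        redundant = any(
            max(abs(pearson(column, features[k])),
                abs(spearman(column, features[k]))) > threshold
            for k in kept
        )
        if not redundant:
            kept.append(name)
    return kept
```

Checking both coefficients catches pairs that are monotonically related but not linearly related, which Pearson alone can understate.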

Model Architecture

Model Choice:
A Random Forest Classifier (RFC) was chosen because it handles multi-class classification natively, captures non-linear feature interactions, and reduces overfitting by averaging the predictions of many decision trees.

Target Structuring:
The target is encoded as one of three discrete classes: Home Win, Draw, or Away Win. This multi-class setup lets the model output a probability for each possible outcome, with the three probabilities summing to 1.
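A minimal sketch of this encoding, assuming the label is derived from the final score. The integer codes 0, 1, 2 and the names below are illustrative choices, not taken from the source.

```python
# Assumed integer codes for the three classes.
HOME_WIN, DRAW, AWAY_WIN = 0, 1, 2
LABELS = {HOME_WIN: "Home Win", DRAW: "Draw", AWAY_WIN: "Away Win"}


def encode_result(home_goals: int, away_goals: int) -> int:
    """Map a final score to one of the three target classes."""
    if home_goals > away_goals:
        return HOME_WIN
    if home_goals == away_goals:
        return DRAW
    return AWAY_WIN


def describe(probabilities) -> dict:
    """Pair each class probability with its human-readable label.

    `probabilities` is one row of model output, ordered by class code.
    """
    return {LABELS[code]: p for code, p in enumerate(probabilities)}
```

For example, describe([0.5, 0.3, 0.2]) pairs 0.5 with Home Win, 0.3 with Draw, and 0.2 with Away Win.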

Hyperparameter Optimization:
Hyperparameters such as n_estimators, max_depth, min_samples_leaf, and max_features were optimized using randomized search and cross-validation to achieve the best balance between accuracy and generalisation.
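The search described above could look roughly like the following scikit-learn sketch. The data is synthetic, and the candidate values, the number of iterations, the fold count, and the log-loss scoring choice are all assumptions; the source lists only the hyperparameter names and the use of randomized search with cross-validation.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

# Synthetic stand-in data: 300 matches, 8 features, labels 0/1/2.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 8))
y = rng.integers(0, 3, size=300)

# Illustrative search space; the actual ranges are not given in the source.
param_distributions = {
    "n_estimators": [50, 100, 200],
    "max_depth": [3, 5, 8, None],
    "min_samples_leaf": [1, 5, 10, 20],
    "max_features": ["sqrt", "log2", 0.5],
}

search = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions=param_distributions,
    n_iter=5,                # number of random configurations to try
    cv=3,                    # stratified 3-fold cross-validation
    scoring="neg_log_loss",  # rewards good probabilities, not just correct labels
    random_state=0,
)
search.fit(X, y)

# The best configuration is refit on all the data by default.
best_model = search.best_estimator_
probabilities = best_model.predict_proba(X[:5])  # one row of 3 probabilities per match
print(search.best_params_)
print(probabilities.round(3))
```

Because match data is chronological, a time-ordered splitter such as scikit-learn's TimeSeriesSplit may be a better fit than shuffled folds. Whether the project does this is not stated.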

Testing & Validation:
The model has been validated on unseen fixtures using metrics including log loss, accuracy, and probability calibration analysis to ensure reliable predictive strength.
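For reference, the three metrics can be computed as in the plain-Python sketch below. The function names and the equal-width binning used for the calibration check are illustrative assumptions; the source does not describe how its calibration analysis is performed.

```python
from math import log


def log_loss(y_true, probs, eps: float = 1e-15) -> float:
    """Mean negative log-probability assigned to the true class (lower is better)."""
    total = 0.0
    for label, p in zip(y_true, probs):
        # Clip to avoid log(0) when a model is fully confident and wrong.
        total -= log(min(max(p[label], eps), 1.0 - eps))
    return total / len(y_true)


def accuracy(y_true, probs) -> float:
    """Share of matches where the most probable class was the actual result."""
    hits = 0
    for label, p in zip(y_true, probs):
        predicted = max(range(len(p)), key=p.__getitem__)
        hits += predicted == label
    return hits / len(y_true)


def calibration_table(y_true, probs, outcome: int, n_bins: int = 5):
    """For one outcome, group predictions into equal-width probability bins and
    return (mean predicted probability, observed frequency, count) per non-empty bin."""
    bins = [[] for _ in range(n_bins)]
    for label, p in zip(y_true, probs):
        index = min(int(p[outcome] * n_bins), n_bins - 1)
        bins[index].append((p[outcome], 1 if label == outcome else 0))
    table = []
    for entries in bins:
        if entries:
            mean_predicted = sum(pred for pred, _ in entries) / len(entries)
            observed = sum(hit for _, hit in entries) / len(entries)
            table.append((round(mean_predicted, 3), round(observed, 3), len(entries)))
    return table
```

A well-calibrated model shows mean predicted probabilities close to observed frequencies in every bin. As a baseline, always predicting one third for each outcome gives a log loss of ln 3, about 1.0986, so a useful three-class model should score below that.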

Model Evaluation & Success

The model has been tested extensively on dynamically updated, unseen data. While predicting every exact outcome is impossible, it consistently delivers well-calibrated probability distributions that give users an informed statistical edge.

We do not show historical backtests, as past results are not necessarily indicative of future performance. This is a predictive tool that blends machine learning with data-driven probability modelling, not a guarantee.

If we could predict every match perfectly, we’d already own Luton Town FC, with Ronaldo sitting on the bench as our impact sub! Until then, this tool is here to help you make smarter, more data-informed decisions.

Deployment

The model, feature pipelines, and dynamic fixture engine are deployed to Streamlit Cloud. Predictions are updated daily with the latest fixtures and engineered features.

Scalability

Although the current version is focused on the English Premier League, the system is structured for easy expansion to other leagues with tailored feature sets and retraining to reflect local outcome tendencies.