CricCatapult

The Modern Cricket Analytics Platform

Production-grade machine learning for cricket data. Train models, make predictions, analyze matches - all from a single Python library.

Trusted by data scientists, AI engineers, and cricket analysts worldwide.

Why CricCatapult?

Pre-Trained Models: ML models work instantly. No training required, no API keys, no costs.
Lightning Fast: Sub-10ms predictions. Works offline. Production-ready.
AI-Native: Built for AI agents, automation, and modern workflows. JSON-first API.
Battle-Tested: Presented at Carnegie Mellon Sports Analytics Conference. Used in production.

Quick Start

Install

pip install criccatapult

That’s it. Models are included.

Make Predictions

from CricCatapult.ml import get_predictor

predictor = get_predictor()

# Predict match outcome
result = predictor.predict_match_outcome("India", "Australia")
print(f"{result['predicted_winner']} wins with {result['confidence']}% confidence")

# Predict player performance
from CricCatapult.ml import PlayerStatsSynthesizer
stats = PlayerStatsSynthesizer.get_player_stats("Virat Kohli")
performance = predictor.predict_player_performance(stats)
print(f"Predicted: {performance['predicted_runs']} runs")

Download Data

from CricCatapult import Cricsheet
cs = Cricsheet()
cs.IPL_csv()  # Download all IPL data

# Or use the CLI
criccatapult-cli cricsheet --type ipl --format json

For AI Agents

CricCatapult is designed for AI agents and automation workflows.

Structured Output

criccatapult-cli --format json predict-match \
  --team1 "India" --team2 "Australia"

{
  "predicted_winner": "India",
  "confidence": 55.4,
  "win_probability": {
    "India": 55.4,
    "Australia": 44.6
  }
}
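An agent consuming this output needs only the standard library. The sketch below parses the sample payload shown above and cross-checks its fields. The consistency checks (probabilities summing to about 100, the predicted winner having the highest probability) are assumptions drawn from this one sample, not a documented contract. In practice `raw` would be the `stdout` of `subprocess.run([...], capture_output=True, text=True)` running the command above.

```python
import json

# Sample payload copied from the structured-output example above.
raw = """
{
  "predicted_winner": "India",
  "confidence": 55.4,
  "win_probability": {"India": 55.4, "Australia": 44.6}
}
"""


def summarize(payload: str) -> dict:
    """Parse CLI JSON output and sanity-check the probability fields."""
    data = json.loads(payload)
    probs = data["win_probability"]
    # Assumption: the per-team percentages should add up to roughly 100.
    total = sum(probs.values())
    if abs(total - 100.0) > 0.5:
        raise ValueError(f"win probabilities sum to {total}, expected ~100")
    # Assumption: the predicted winner is the team with the highest probability.
    favourite = max(probs, key=probs.get)
    if favourite != data["predicted_winner"]:
        raise ValueError("predicted_winner disagrees with win_probability")
    return {"winner": favourite, "confidence": data["confidence"]}


print(summarize(raw))
```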

Natural Language

criccatapult-cli ask "Who won the last IPL?"
criccatapult-cli ask "Predict India vs Pakistan"
criccatapult-cli ask "Show Virat Kohli stats"
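How `ask` maps questions to commands is not documented here. Purely as an illustration, a keyword-based router for the example questions might look like this; the regular expressions and intent names (`predict_match`, `player_stats`, `download`) are hypothetical and are not the package's parser.

```python
import re
from typing import Dict, Tuple

# Hypothetical intent patterns for illustration; the real `ask` parser may differ.
MATCH_RE = re.compile(r"predict\s+(.+?)\s+vs\.?\s+(.+)", re.IGNORECASE)
STATS_RE = re.compile(r"show\s+(.+?)(?:'s)?\s+stats", re.IGNORECASE)
DOWNLOAD_RE = re.compile(r"download\s+(\w+)\s+data", re.IGNORECASE)


def route(question: str) -> Tuple[str, Dict[str, str]]:
    """Map a free-text question to an (intent, arguments) pair."""
    q = question.strip().rstrip("?")
    m = MATCH_RE.search(q)
    if m:
        return "predict_match", {"team1": m.group(1), "team2": m.group(2)}
    m = STATS_RE.search(q)
    if m:
        return "player_stats", {"player": m.group(1)}
    m = DOWNLOAD_RE.search(q)
    if m:
        return "download", {"type": m.group(1).lower()}
    # Anything unmatched is passed through for a fallback handler.
    return "unknown", {"question": question}


print(route("Predict India vs Pakistan"))
```

A pattern router like this is predictable and easy to test, but it only understands phrasings it was written for.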

Production Ready

from flask import Flask, request, jsonify
from CricCatapult.ml import get_predictor

app = Flask(__name__)
predictor = get_predictor()

@app.route('/predict', methods=['POST'])
def predict():
    data = request.json
    result = predictor.predict_match_outcome(
        data['team1'],
        data['team2']
    )
    return jsonify(result)

Machine Learning

Player Performance Prediction

XGBoost Regressor

Predicts runs scored in next innings with confidence intervals.

Features

Historical averages, strike rates, recent form, consistency metrics, momentum indicators.

Performance

Mean Absolute Error: 15 runs. Inference: <10ms.

from CricCatapult.ml import get_predictor, PlayerStatsSynthesizer

predictor = get_predictor()
stats = PlayerStatsSynthesizer.get_player_stats("Rohit Sharma")

# Returns: predicted_runs, confidence, confidence_interval
prediction = predictor.predict_player_performance(stats)
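To make the feature list above concrete, here is a minimal pure-Python sketch of several of those features using their standard cricket definitions: batting average (runs per dismissal), strike rate (runs per 100 balls) and recent form (mean runs over the last few innings), plus a standard deviation as one possible consistency metric. The `Innings` record type and the choice of standard deviation are assumptions; the library's actual feature code may differ.

```python
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import List, Optional


@dataclass
class Innings:
    """One batting innings (hypothetical record type for this sketch)."""
    runs: int
    balls: int
    dismissed: bool


def batting_features(innings: List[Innings], recent_n: int = 5) -> dict:
    """Compute simple batting features from innings in chronological order."""
    if not innings:
        raise ValueError("need at least one innings")
    runs = [i.runs for i in innings]
    total_balls = sum(i.balls for i in innings)
    dismissals = sum(1 for i in innings if i.dismissed)
    career_avg: Optional[float] = sum(runs) / dismissals if dismissals else None
    return {
        # Batting average: runs per dismissal (undefined if never dismissed).
        "career_avg": career_avg,
        # Strike rate: runs per 100 balls faced.
        "career_sr": 100 * sum(runs) / total_balls if total_balls else None,
        # Recent form: mean runs over the last `recent_n` innings.
        "recent_form": mean(runs[-recent_n:]),
        # Consistency proxy: population standard deviation of runs (lower = steadier).
        "runs_stdev": pstdev(runs),
    }


history = [Innings(45, 30, True), Innings(10, 12, True),
           Innings(72, 48, False), Innings(33, 25, True)]
print(batting_features(history, recent_n=3))
```

Note that the not-out 72 adds to total runs but not to dismissals, which is why the average (about 53.3) sits above the mean score per innings.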

Match Outcome Prediction

Random Forest Classifier

Predicts match winner with win probability percentages.

Features

Team strength, historical win rates, toss decisions, venue factors.

Performance

Accuracy: 58.5%. Inference: <10ms.

from CricCatapult.ml import get_predictor

predictor = get_predictor()

# Returns: predicted_winner, confidence, win_probability
result = predictor.predict_match_outcome("England", "Pakistan")
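The two headline numbers have simple definitions: mean absolute error (MAE) is the average absolute gap between predicted and actual runs, so an MAE of 15 means predictions miss by 15 runs on average in either direction, and accuracy is the share of matches whose predicted winner actually won. A minimal sketch of both, on toy numbers rather than real evaluation data:

```python
from typing import Sequence


def mean_absolute_error(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Average absolute gap between real and predicted runs."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def accuracy_pct(actual: Sequence[str], predicted: Sequence[str]) -> float:
    """Percentage of matches where the predicted winner actually won."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and equal length")
    correct = sum(a == p for a, p in zip(actual, predicted))
    return 100.0 * correct / len(actual)


# Toy numbers for illustration only; these are not CricCatapult results.
print(mean_absolute_error([30, 5, 80], [25, 20, 60]))  # (5 + 15 + 20) / 3
print(accuracy_pct(["India", "England", "Pakistan", "India"],
                   ["India", "Australia", "Pakistan", "India"]))
```

For context, random guessing between two teams averages 50%, which is a natural baseline for judging a 58.5% classifier.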

Feature Engineering

Extract features from raw cricket data for custom models.

from CricCatapult.ml import CricketFeatureEngineer, load_cricsheet_data

df = load_cricsheet_data('ipl_matches.csv')
engineer = CricketFeatureEngineer()

# Extract batting features: career_avg, career_sr, boundary_pct, powerplay_sr, death_sr, etc.
batting = engineer.extract_batting_features(df, "Virat Kohli")

# Extract bowling features: economy, wickets, dot_ball_pct, death_economy, etc.
bowling = engineer.extract_bowling_features(df, "Jasprit Bumrah")
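Several of these ball-level features follow from a handful of counting rules. The sketch below computes a boundary percentage, economy rate and dot-ball percentage from ball-by-ball rows. The row layout (column names modelled on Cricsheet's ball-by-ball CSV, with extras already parsed to integers and 0 when absent) is an assumption, and the package's own definitions, for example of a dot ball, may differ.

```python
from typing import Any, Dict, List, Optional

# Hypothetical ball-by-ball row; column names are modelled on Cricsheet's
# ball-by-ball CSV and are an assumption for this sketch.
Row = Dict[str, Any]


def boundary_pct(rows: List[Row], batter: str) -> float:
    """Share of balls faced that were hit for four or six."""
    # Wides are not counted as balls faced; no-balls are.
    faced = [r for r in rows if r["striker"] == batter and not r["wides"]]
    if not faced:
        return 0.0
    boundaries = sum(1 for r in faced if r["runs_off_bat"] in (4, 6))
    return 100.0 * boundaries / len(faced)


def bowling_summary(rows: List[Row], bowler: str) -> Dict[str, Optional[float]]:
    """Economy rate and dot-ball percentage for one bowler."""
    bowled = [r for r in rows if r["bowler"] == bowler]
    legal = [r for r in bowled if not r["wides"] and not r["noballs"]]
    if not legal:
        return {"economy": None, "dot_ball_pct": None}
    # Runs charged to the bowler: off the bat plus wides and no-balls
    # (byes and leg-byes are not charged to the bowler).
    conceded = sum(r["runs_off_bat"] + r["wides"] + r["noballs"] for r in bowled)
    # A dot ball here is a legal delivery with nothing scored off the bat.
    dots = sum(1 for r in legal if r["runs_off_bat"] == 0)
    return {
        "economy": 6 * conceded / len(legal),  # runs per six legal balls
        "dot_ball_pct": 100.0 * dots / len(legal),
    }
```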

Custom Training

Fine-tune models on your data using Google Colab (free GPUs).

Step 1: Upload notebook from notebooks_training/

Step 2: Upload your Cricsheet CSV data

Step 3: Run all cells (5-10 minutes)

Step 4: Download and deploy

# Replace pre-trained model
mv your_model.pkl CricCatapult/models/player_performance_model.pkl

# Predictions now use your model
criccatapult-cli predict-player --player "Your Player"
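`mv` overwrites the shipped model with no way back. A slightly safer variant, sketched below with only the standard library, keeps a timestamped backup and swaps the new file in with `os.replace`. The `install_model` helper is hypothetical, and the target path is taken from the step above; in a pip-installed package the models directory will typically live under your environment's `site-packages` instead.

```python
import os
import shutil
import time
from pathlib import Path
from typing import Optional, Union

PathLike = Union[str, Path]


def install_model(new_model: PathLike, target: PathLike) -> Optional[Path]:
    """Swap in a retrained model file, keeping a timestamped backup of the old one.

    Returns the backup path, or None if no model existed at `target`.
    """
    new_model, target = Path(new_model), Path(target)
    backup = None
    if target.exists():
        stamp = time.strftime("%Y%m%d-%H%M%S")
        backup = target.with_name(f"{target.name}.{stamp}.bak")
        shutil.copy2(target, backup)
    # Copy next to the target first, then rename over it. On POSIX systems
    # os.replace is atomic when both paths are on the same filesystem, so a
    # reader never sees a half-written model file.
    tmp = target.with_name(target.name + ".tmp")
    shutil.copy2(new_model, tmp)
    os.replace(tmp, target)
    return backup
```

Rolling back is the same call in reverse: `install_model(backup_path, target_path)`.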

Command Line Interface

ML Predictions

# Player performance
criccatapult-cli predict-player --player "Virat Kohli"

# Match outcome
criccatapult-cli predict-match --team1 "India" --team2 "Australia"

# Structured JSON output
criccatapult-cli --format json predict-player --player "Rohit Sharma"
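The CLI is described below as built on argparse with JSON and CSV output. The sketch here is a hypothetical reconstruction, not the package's source, showing one way to get the `criccatapult-cli --format json predict-player ...` shape used above. Because `--format` lives on the top-level parser, in this sketch it must come before the subcommand. Only text and JSON rendering are shown.

```python
import argparse
import json
from typing import Any, Dict


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="criccatapult-cli")
    # Global option: must appear before the subcommand name in this sketch.
    parser.add_argument("--format", choices=["text", "json"], default="text")
    sub = parser.add_subparsers(dest="command", required=True)

    match = sub.add_parser("predict-match", help="predict a match winner")
    match.add_argument("--team1", required=True)
    match.add_argument("--team2", required=True)

    player = sub.add_parser("predict-player", help="predict a player's runs")
    player.add_argument("--player", required=True)
    return parser


def render(result: Dict[str, Any], fmt: str) -> str:
    """Emit machine-readable JSON or a simple key: value listing."""
    if fmt == "json":
        return json.dumps(result)
    return "\n".join(f"{key}: {value}" for key, value in result.items())


args = build_parser().parse_args(
    ["--format", "json", "predict-match", "--team1", "India", "--team2", "Australia"]
)
print(args.command, args.team1, args.team2, args.format)
```

`render` would then be applied to whatever the predictor returns, so the same command serves both humans and scripts.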

Data Downloads

# Indian Premier League
criccatapult-cli cricsheet --type ipl

# All T20 matches
criccatapult-cli cricsheet --type t20 --gender male

# Recent matches
criccatapult-cli cricsheet --type recent --days 7

Natural Language

criccatapult-cli ask "Download IPL data"
criccatapult-cli ask "Predict India vs Australia"
criccatapult-cli ask "Show Kohli's stats"

Analytics

# Usage dashboard
criccatapult-cli dashboard --days 30

Terminal Interface

Launch the interactive terminal UI:

criccatapult

Navigate with arrow keys. Press Q to quit.

Python API

Cricsheet Data

Download historical match data from Cricsheet.org.

from CricCatapult import Cricsheet

cs = Cricsheet()

# League data
cs.IPL_csv()               # Indian Premier League
cs.bigbashleague_csv()     # Big Bash League
cs.pakistanleague_csv()    # Pakistan Super League
cs.caribbeanleague_csv()   # Caribbean Premier League

# Format data
cs.t20_csv(gender="male")        # T20 matches
cs.odi_csv(gender="female")      # ODI matches
cs.test_matches_csv("both")      # Test matches

# Recent data
cs.recent_csv(gender="male", days=7)  # Last 7 days
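If you already have a full download, a recent-matches filter can be approximated locally by filtering on the match date. This sketch assumes the CSV has one row per delivery with `match_id` and ISO-format `start_date` columns, as in Cricsheet's ball-by-ball CSV; whether the files written by `Cricsheet` use that layout is an assumption.

```python
import csv
from datetime import date, timedelta
from typing import Set


def recent_match_ids(csv_path: str, days: int, today: date) -> Set[str]:
    """Return IDs of matches that started within the last `days` days.

    Assumes `match_id` and ISO-format (YYYY-MM-DD) `start_date` columns.
    """
    cutoff = today - timedelta(days=days)
    ids = set()
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            if date.fromisoformat(row["start_date"]) >= cutoff:
                ids.add(row["match_id"])
    return ids
```

For example, `recent_match_ids("ipl_matches.csv", days=7, today=date.today())`; passing `today` explicitly keeps the function easy to test.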

Player Analytics

Comprehensive player statistics and career analysis.

from CricCatapult import Player
player = Player("Virat Kohli")

# Career overview
career = player.get_career_df()

# Personal information
name = player.get_personal_info("full_name")
style = player.get_personal_info("batting_style")

# Teams represented
teams = player.get_teams()

# Format-specific stats
odi_batting = player.get_format_df(
    format_num=2, view='match', action='batting'
)

Match Analysis

Detailed match scorecards and visualizations.

from CricCatapult import Series
import matplotlib.pyplot as plt
# Initialize with the ESPNcricinfo series ID and match ID
series = Series(series_id=1298423, match_id=1298436)

# Scorecards
first_innings = series.batting_df(bat_first=True)
bowling_figures = series.bowling_df(bowl_first=True)

# MVP statistics
mvp = series.mvp()

# Visualizations
series.manhattan()  # Runs per over
series.worm()       # Cumulative runs

plt.show()
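Both charts are simple aggregations of ball-by-ball data: a Manhattan plots the total runs in each over, and a worm plots the running total at the end of each over. The sketch below computes both series; the `(over_index, runs)` input format is an assumption, and this is not how `series.manhattan()` or `series.worm()` are implemented internally.

```python
from itertools import accumulate
from typing import Dict, Iterable, List, Tuple


def manhattan(deliveries: Iterable[Tuple[int, int]]) -> List[int]:
    """Total runs per over from (over_index, runs) pairs, over_index 0-based."""
    per_over: Dict[int, int] = {}
    for over, runs in deliveries:
        per_over[over] = per_over.get(over, 0) + runs
    n_overs = max(per_over) + 1 if per_over else 0
    # Overs with no recorded deliveries count as 0 so bars stay aligned.
    return [per_over.get(o, 0) for o in range(n_overs)]


def worm(runs_per_over: List[int]) -> List[int]:
    """Cumulative runs at the end of each over."""
    return list(accumulate(runs_per_over))


balls = [(0, 1), (0, 4), (0, 0), (1, 6), (1, 1), (2, 0)]
bars = manhattan(balls)
print(bars, worm(bars))  # [5, 7, 0] [5, 12, 12]
```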

Live Scores

Real-time cricket scores worldwide.

from CricCatapult import Scoreboard

scoreboard = Scoreboard()

# All live matches
live = scoreboard.scores()

# As DataFrame
df = scoreboard.scores_df()
ongoing = df[df['Status'] == 'Live']

Records

Cricket records by team, tournament, year, ground.

from CricCatapult import Records

records = Records()
# IPL records
ipl_runs = records.get_ipl_records(
    format='batting', record='most_runs_career'
)

# Big Bash League
bbl_wickets = records.get_bbl_records(
    format='bowling', record='most_wickets_career'
)

Venue Information

Cricket ground locations and interactive maps.

from CricCatapult import Location

location = Location(match_id="1329821")
# Venue name
venue = location.get_location()

# Interactive map
map_obj = location.get_map()
map_obj.save("ground_map.html")

Deployment

REST API

Deploy as a production API service.

from flask import Flask, request, jsonify
from CricCatapult.ml import get_predictor

app = Flask(__name__)
predictor = get_predictor()

@app.route('/api/v1/predict/match', methods=['POST'])
def predict_match():
    data = request.json
    result = predictor.predict_match_outcome(data['team1'], data['team2'])
    return jsonify(result)

@app.route('/api/v1/predict/player', methods=['POST'])
def predict_player():
    data = request.json
    result = predictor.predict_player_performance(data['stats'])
    return jsonify(result)

if __name__ == '__main__':
    # Flask's built-in development server; in production, run the app
    # behind a WSGI server such as Gunicorn instead.
    app.run(host='0.0.0.0', port=8000)
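The handlers above index `data['team1']` directly, so a request missing a field surfaces as a server error rather than a clear 400 response. A small validation step helps; the sketch below is plain Python so it can be tested without Flask, and the `PayloadError` class and `parse_match_payload` helper are hypothetical names, not part of CricCatapult.

```python
from typing import Any, Tuple


class PayloadError(ValueError):
    """Raised when a request body is missing fields or has the wrong types."""


def parse_match_payload(data: Any) -> Tuple[str, str]:
    """Validate a match-prediction body and return (team1, team2)."""
    if not isinstance(data, dict):
        raise PayloadError("body must be a JSON object")
    teams = []
    for key in ("team1", "team2"):
        value = data.get(key)
        if not isinstance(value, str) or not value.strip():
            raise PayloadError(f"'{key}' must be a non-empty string")
        teams.append(value.strip())
    # Case-insensitive check so "India" vs "india" is rejected too.
    if teams[0].casefold() == teams[1].casefold():
        raise PayloadError("team1 and team2 must be different teams")
    return teams[0], teams[1]
```

Inside `predict_match`, one option is to call `parse_match_payload(request.get_json(silent=True))` and, on `PayloadError`, return `jsonify(error=str(err)), 400`.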

Batch Processing

Process multiple predictions efficiently.

from CricCatapult.ml import get_predictor, PlayerStatsSynthesizer

predictor = get_predictor()

players = ["Virat Kohli", "Rohit Sharma", "KL Rahul", "Rishabh Pant"]

results = []
for player in players:
    stats = PlayerStatsSynthesizer.get_player_stats(player)
    prediction = predictor.predict_player_performance(stats)
    results.append({
        'player': player,
        'predicted_runs': prediction['predicted_runs'],
        'confidence': prediction['confidence']
    })

print(results)

Background Jobs

Integrate with task queues for async processing.

from celery import Celery
from CricCatapult.ml import get_predictor

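# A result backend is required for task.get() to retrieve return values
app = Celery('tasks',
             broker='redis://localhost:6379/0',
             backend='redis://localhost:6379/0')
predictor = get_predictor()

@app.task
def predict_match_async(team1, team2):
    result = predictor.predict_match_outcome(team1, team2)
    return result

# Queue a prediction (from client code, with a worker running)
task = predict_match_async.delay("India", "Australia")
result = task.get(timeout=10)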

Enterprise Features

No External Services - Predictions call no external APIs. No rate limits. No third-party downtime.

Data Privacy - All processing happens locally. Your data never leaves your infrastructure.

Offline Capable - Models work without internet after installation.

Version Control - Models are files. Version them with Git. Roll back anytime.

Audit Trail - Built-in usage analytics. Track every prediction.

Custom Training - Fine-tune on your proprietary data. Keep your competitive edge.

Support

Documentation: You’re reading it. Comprehensive guides for every feature.
GitHub: https://github.com/aadrijupadya/CricCatapult
PyPI: https://pypi.org/project/criccatapult/
Issues & Features: Open an issue on GitHub. We respond within 24 hours.

FAQ

Do I need API keys?

No. Predictions run locally and no feature requires an API key. Data downloads and live scores do need an internet connection.

What’s the cost?

Free. Open source. No hidden fees.

Can I use this in production?

Yes. Built for production. Used by multiple organizations.

How accurate are predictions?

Player model: MAE 15 runs. Match model: 58.5% accuracy. Improve with custom training.

Does it work offline?

Yes. After installation, all predictions work offline.

Can AI agents use this?

Yes. Designed for AI agents. JSON output, CLI interface, deterministic results.

How big are the models?

2.1MB total. Negligible impact on package size.

Can I train custom models?

Yes. Google Colab notebooks included. Train on your data.

What about data freshness?

Cricsheet.org updates regularly. Download anytime with one command.

Is this production-ready?

Yes. Battle-tested. Proper error handling. Comprehensive tests.

Architecture

ML Models - XGBoost for regression. Random Forest for classification. Joblib serialization.

Feature Engineering - Pandas-based pipelines. NumPy for numerical operations. Scikit-learn preprocessing.

CLI - Argparse for parsing. JSON/CSV output formats. Subprocess-safe for AI agents.

TUI - Textual framework. Modern terminal interface. Keyboard navigation.

Data Pipeline - Requests for HTTP. BeautifulSoup for parsing. Pandas for transformation.

Caching - SQLite for analytics. File-based for queries. Automatic cleanup.
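The enterprise and architecture notes mention SQLite-backed usage analytics and a `dashboard --days 30` view. The sketch below shows one way such an audit log could work with Python's built-in `sqlite3`; the table name, columns and helper functions are hypothetical and do not describe the package's actual schema.

```python
import json
import sqlite3
import time
from typing import Any, Dict


def open_log(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a prediction log; the schema is hypothetical."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS predictions (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               ts REAL NOT NULL,      -- Unix timestamp
               kind TEXT NOT NULL,    -- e.g. 'match' or 'player'
               inputs TEXT NOT NULL,  -- JSON-encoded request
               output TEXT NOT NULL   -- JSON-encoded result
           )"""
    )
    return conn


def log_prediction(conn: sqlite3.Connection, kind: str,
                   inputs: Dict[str, Any], output: Dict[str, Any]) -> None:
    """Record one prediction; `with conn` commits on success."""
    with conn:
        conn.execute(
            "INSERT INTO predictions (ts, kind, inputs, output) VALUES (?, ?, ?, ?)",
            (time.time(), kind, json.dumps(inputs), json.dumps(output)),
        )


def counts_since(conn: sqlite3.Connection, days: int) -> Dict[str, int]:
    """Number of predictions per kind over the last `days` days."""
    cutoff = time.time() - days * 86400
    rows = conn.execute(
        "SELECT kind, COUNT(*) FROM predictions WHERE ts >= ? GROUP BY kind",
        (cutoff,),
    ).fetchall()
    return dict(rows)
```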

License

Open source. Check the repository for details.