CricCatapult
The Modern Cricket Analytics Platform
Production-grade machine learning for cricket data. Train models, make predictions, analyze matches - all from a single Python library.
Trusted by data scientists, AI engineers, and cricket analysts worldwide.
Why CricCatapult?
- Pre-Trained Models
ML models work instantly. No training required, no API keys, no costs.
- Lightning Fast
Sub-10ms predictions. Works offline. Production-ready.
- AI-Native
Built for AI agents, automation, and modern workflows. JSON-first API.
- Battle-Tested
Presented at Carnegie Mellon Sports Analytics Conference. Used in production.
Quick Start
Install
pip install criccatapult
That’s it. Models are included.
Make Predictions
from CricCatapult.ml import get_predictor
predictor = get_predictor()
# Predict match outcome
result = predictor.predict_match_outcome("India", "Australia")
print(f"{result['predicted_winner']} wins with {result['confidence']}% confidence")
# Predict player performance
from CricCatapult.ml import PlayerStatsSynthesizer
stats = PlayerStatsSynthesizer.get_player_stats("Virat Kohli")
performance = predictor.predict_player_performance(stats)
print(f"Predicted: {performance['predicted_runs']} runs")
Download Data
from CricCatapult import Cricsheet
cs = Cricsheet()
cs.IPL_csv()  # Download all IPL data
# Or use the CLI
criccatapult-cli cricsheet --type ipl --format json
For AI Agents
CricCatapult is designed for AI agents and automation workflows.
Structured Output
criccatapult-cli --format json predict-match \
--team1 "India" --team2 "Australia"
{
  "predicted_winner": "India",
  "confidence": 55.4,
  "win_probability": {
    "India": 55.4,
    "Australia": 44.6
  }
}
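An agent consuming this output would typically parse and sanity-check it before acting on it. The following is a minimal sketch, assuming the payload always has the shape shown above; `parse_prediction` is a hypothetical helper written for illustration and is not part of CricCatapult.

```python
import json


def parse_prediction(raw: str) -> dict:
    """Parse predict-match JSON and check that it is internally consistent.

    Assumes the payload shape shown above; this helper is illustrative
    and not part of the CricCatapult API.
    """
    data = json.loads(raw)
    for key in ("predicted_winner", "confidence", "win_probability"):
        if key not in data:
            raise ValueError(f"missing field: {key}")
    probs = data["win_probability"]
    if data["predicted_winner"] not in probs:
        raise ValueError("predicted_winner has no win_probability entry")
    # Percentages should sum to roughly 100 (allow for rounding error).
    if abs(sum(probs.values()) - 100.0) > 0.5:
        raise ValueError("win probabilities do not sum to 100")
    return data


# The sample payload from this section.
sample = """
{
  "predicted_winner": "India",
  "confidence": 55.4,
  "win_probability": {"India": 55.4, "Australia": 44.6}
}
"""
prediction = parse_prediction(sample)
print(prediction["predicted_winner"], prediction["confidence"])
```

In a real workflow, `raw` would be the CLI's standard output, for example captured with `subprocess.run(..., capture_output=True, text=True).stdout`.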
Natural Language
criccatapult-cli ask "Who won the last IPL?"
criccatapult-cli ask "Predict India vs Pakistan"
criccatapult-cli ask "Show Virat Kohli stats"
Production Ready
from flask import Flask, request, jsonify
from CricCatapult.ml import get_predictor
app = Flask(__name__)
predictor = get_predictor()
@app.route('/predict', methods=['POST'])
def predict():
    data = request.json
    result = predictor.predict_match_outcome(
        data['team1'],
        data['team2']
    )
    return jsonify(result)
Machine Learning
Player Performance Prediction
- XGBoost Regressor
Predicts runs scored in next innings with confidence intervals.
- Features
Historical averages, strike rates, recent form, consistency metrics, momentum indicators.
- Performance
Mean Absolute Error: 15 runs. Inference: <10ms.
from CricCatapult.ml import get_predictor, PlayerStatsSynthesizer
predictor = get_predictor()
stats = PlayerStatsSynthesizer.get_player_stats("Rohit Sharma")
prediction = predictor.predict_player_performance(stats)
# Returns: predicted_runs, confidence, confidence_interval
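For readers unfamiliar with the metric quoted above, Mean Absolute Error is the average absolute gap between predicted and actual runs. The sketch below shows the arithmetic on made-up numbers; it is not the project's evaluation code, and the sample values are purely illustrative.

```python
def mean_absolute_error(actual, predicted):
    """Average of |actual - predicted| over paired observations."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Made-up numbers for illustration only, not real model output.
actual_runs = [42, 7, 88, 30]
predicted_runs = [35, 20, 70, 33]

mae = mean_absolute_error(actual_runs, predicted_runs)
print(f"MAE: {mae:.2f} runs")  # absolute errors 7, 13, 18, 3 -> MAE: 10.25 runs
```

An MAE of 15 runs therefore means that, on average, a prediction lands 15 runs above or below the actual score.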
Match Outcome Prediction
- Random Forest Classifier
Predicts match winner with win probability percentages.
- Features
Team strength, historical win rates, toss decisions, venue factors.
- Performance
Accuracy: 58.5%. Inference: <10ms.
from CricCatapult.ml import get_predictor
predictor = get_predictor()
result = predictor.predict_match_outcome("England", "Pakistan")
# Returns: predicted_winner, confidence, win_probability
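The accuracy figure above is the share of matches where the predicted winner matched the actual winner. Below is a minimal sketch of that calculation on invented results; it is not the project's evaluation code, and the team lists are illustrative only.

```python
def accuracy_pct(predicted_winners, actual_winners):
    """Percentage of matches where the predicted winner was correct."""
    if len(predicted_winners) != len(actual_winners) or not actual_winners:
        raise ValueError("need two non-empty sequences of equal length")
    correct = sum(p == a for p, a in zip(predicted_winners, actual_winners))
    return 100.0 * correct / len(actual_winners)


# Invented results for illustration only.
predicted = ["India", "England", "Australia", "Pakistan", "India"]
actual = ["India", "Pakistan", "Australia", "Pakistan", "Australia"]

score = accuracy_pct(predicted, actual)
print(f"Accuracy: {score:.1f}%")  # 3 of 5 correct -> Accuracy: 60.0%
```

For a two-team contest, picking a winner at random is right about half the time, so 50% is the natural baseline to compare against.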
Feature Engineering
Extract features from raw cricket data for custom models.
from CricCatapult.ml import CricketFeatureEngineer, load_cricsheet_data
df = load_cricsheet_data('ipl_matches.csv')
engineer = CricketFeatureEngineer()
# Extract batting features
batting = engineer.extract_batting_features(df, "Virat Kohli")
# career_avg, career_sr, boundary_pct, powerplay_sr, death_sr, etc.
# Extract bowling features
bowling = engineer.extract_bowling_features(df, "Jasprit Bumrah")
# economy, wickets, dot_ball_pct, death_economy, etc.
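To give a feel for what batting features of this kind measure, here is a small standalone sketch that derives an average, a strike rate, and a boundary percentage from ball-by-ball data. It is not CricketFeatureEngineer's implementation: the input format is hypothetical, and defining boundary percentage as the share of balls hit for four or six is an assumption (the library may define it differently).

```python
def batting_summary(balls):
    """Summarise one batter's deliveries.

    balls: list of (runs_off_bat, was_dismissed) tuples, one per legal
    delivery faced. This input format is hypothetical.
    """
    runs = sum(r for r, _ in balls)
    faced = len(balls)
    dismissals = sum(1 for _, out in balls if out)
    boundaries = sum(1 for r, _ in balls if r in (4, 6))
    return {
        # Batting average is undefined for a batter who was never dismissed.
        "average": runs / dismissals if dismissals else None,
        "strike_rate": 100.0 * runs / faced if faced else 0.0,
        # Assumption: boundary % = share of balls hit for four or six.
        "boundary_pct": 100.0 * boundaries / faced if faced else 0.0,
    }


# Invented deliveries for illustration only.
balls = [(1, False), (4, False), (0, False), (6, False), (0, True),
         (2, False), (4, False), (1, True)]
summary = batting_summary(balls)
print(summary)
```

Here 18 runs from 8 balls with 2 dismissals gives an average of 9.0, a strike rate of 225.0, and a boundary percentage of 37.5.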
Custom Training
Fine-tune models on your data using Google Colab (free GPUs).
Step 1: Upload notebook from notebooks_training/
Step 2: Upload your Cricsheet CSV data
Step 3: Run all cells (5-10 minutes)
Step 4: Download and deploy
# Replace pre-trained model
mv your_model.pkl CricCatapult/models/player_performance_model.pkl
# Predictions now use your model
criccatapult-cli predict-player --player "Your Player"
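Because `mv` overwrites the bundled model, it can be worth keeping a copy of the old file so you can roll back. The sketch below shows one way to do that with the standard library; `install_model` is a hypothetical helper, and the demonstration runs against placeholder files in a temporary directory rather than a real CricCatapult installation.

```python
import shutil
import tempfile
from pathlib import Path


def install_model(new_model: Path, target: Path) -> Path:
    """Copy new_model over target, keeping the old file as '<target>.bak'.

    Returns the backup path. Both paths are supplied by the caller;
    nothing here is specific to CricCatapult.
    """
    backup = target.with_name(target.name + ".bak")
    if target.exists():
        shutil.copy2(target, backup)
    shutil.copy2(new_model, target)
    return backup


# Demonstration with placeholder files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "player_performance_model.pkl"
    target.write_bytes(b"old model")
    new_model = Path(tmp) / "your_model.pkl"
    new_model.write_bytes(b"new model")

    backup = install_model(new_model, target)
    installed = target.read_bytes()
    backed_up = backup.read_bytes()
    print(installed, backed_up)
```

To roll back, copy the `.bak` file over the model path again.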
Command Line Interface
ML Predictions
# Player performance
criccatapult-cli predict-player --player "Virat Kohli"
# Match outcome
criccatapult-cli predict-match --team1 "India" --team2 "Australia"
# Structured JSON output
criccatapult-cli --format json predict-player --player "Rohit Sharma"
Data Downloads
# Indian Premier League
criccatapult-cli cricsheet --type ipl
# All T20 matches
criccatapult-cli cricsheet --type t20 --gender male
# Recent matches
criccatapult-cli cricsheet --type recent --days 7
Natural Language
criccatapult-cli ask "Download IPL data"
criccatapult-cli ask "Predict India vs Australia"
criccatapult-cli ask "Show Kohli's stats"
Analytics
# Usage dashboard
criccatapult-cli dashboard --days 30
Terminal Interface
Launch the interactive terminal UI:
criccatapult
Navigate with arrow keys. Press Q to quit.
Python API
Cricsheet Data
Download historical match data from Cricsheet.org.
from CricCatapult import Cricsheet
cs = Cricsheet()
# League data
cs.IPL_csv()  # Indian Premier League
cs.bigbashleague_csv()  # Big Bash League
cs.pakistanleague_csv()  # Pakistan Super League
cs.caribbeanleague_csv()  # Caribbean Premier League
# Format data
cs.t20_csv(gender="male")  # T20 matches
cs.odi_csv(gender="female")  # ODI matches
cs.test_matches_csv("both")  # Test matches
# Recent data
cs.recent_csv(gender="male", days=7)  # Last 7 days
Player Analytics
Comprehensive player statistics and career analysis.
from CricCatapult import Player
player = Player("Virat Kohli")
# Career overview
career = player.get_career_df()
# Personal information
name = player.get_personal_info("full_name")
style = player.get_personal_info("batting_style")
# Teams represented
teams = player.get_teams()
# Format-specific stats
odi_batting = player.get_format_df(
    format_num=2, view='match', action='batting'
)
Match Analysis
Detailed match scorecards and visualizations.
from CricCatapult import Series
import matplotlib.pyplot as plt
# Initialize with match IDs from ESPNCricinfo
series = Series(series_id=1298423, match_id=1298436)
# Scorecards
first_innings = series.batting_df(bat_first=True)
bowling_figures = series.bowling_df(bowl_first=True)
# MVP statistics
mvp = series.mvp()
# Visualizations
series.manhattan()  # Runs per over
series.worm()  # Cumulative runs
plt.show()
Live Scores
Real-time cricket scores worldwide.
from CricCatapult import Scoreboard
scoreboard = Scoreboard()
# All live matches
live = scoreboard.scores()
# As DataFrame
df = scoreboard.scores_df()
ongoing = df[df['Status'] == 'Live']
Records
Cricket records by team, tournament, year, ground.
from CricCatapult import Records
records = Records()
# IPL records
ipl_runs = records.get_ipl_records(
    format='batting', record='most_runs_career'
)
# Big Bash League
bbl_wickets = records.get_bbl_records(
    format='bowling', record='most_wickets_career'
)
Venue Information
Cricket ground locations and interactive maps.
from CricCatapult import Location
location = Location(match_id="1329821")
# Venue name
venue = location.get_location()
# Interactive map
map_obj = location.get_map()
map_obj.save("ground_map.html")
Deployment
REST API
Deploy as a production API service.
from flask import Flask, request, jsonify
from CricCatapult.ml import get_predictor
app = Flask(__name__)
predictor = get_predictor()
@app.route('/api/v1/predict/match', methods=['POST'])
def predict_match():
    data = request.json
    result = predictor.predict_match_outcome(
        data['team1'], data['team2']
    )
    return jsonify(result)
@app.route('/api/v1/predict/player', methods=['POST'])
def predict_player():
    data = request.json
    result = predictor.predict_player_performance(data['stats'])
    return jsonify(result)
if __name__ == '__main__':
    app.run(host='0.0.0.0', port=8000)
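The handlers above index straight into the request body, so a missing or malformed field surfaces as a server error rather than a clear client error. One possible approach is to validate the payload first. The helper below is a hypothetical sketch, not part of CricCatapult, and the specific rules (non-empty strings, two different teams) are assumptions about what a sensible request looks like.

```python
def validate_match_payload(data):
    """Return (team1, team2) or raise ValueError with a client-facing message.

    Hypothetical helper; the validation rules are assumptions.
    """
    if not isinstance(data, dict):
        raise ValueError("request body must be a JSON object")
    teams = []
    for key in ("team1", "team2"):
        value = data.get(key)
        if not isinstance(value, str) or not value.strip():
            raise ValueError(f"'{key}' must be a non-empty string")
        teams.append(value.strip())
    if teams[0].lower() == teams[1].lower():
        raise ValueError("team1 and team2 must be different")
    return teams[0], teams[1]


team1, team2 = validate_match_payload({"team1": " India ", "team2": "Australia"})
print(team1, team2)
```

Inside a Flask view, you could pass `request.get_json(silent=True)` to this helper (it returns `None` instead of raising when the body is not valid JSON), catch `ValueError`, and respond with `jsonify({"error": str(exc)}), 400`.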
Batch Processing
Process multiple predictions efficiently.
from CricCatapult.ml import get_predictor, PlayerStatsSynthesizer
predictor = get_predictor()
players = ["Virat Kohli", "Rohit Sharma", "KL Rahul", "Rishabh Pant"]
results = []
for player in players:
    stats = PlayerStatsSynthesizer.get_player_stats(player)
    prediction = predictor.predict_player_performance(stats)
    results.append({
        'player': player,
        'predicted_runs': prediction['predicted_runs'],
        'confidence': prediction['confidence']
    })
print(results)
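In the loop above, one failing lookup stops the whole batch. A variant that records failures per player and carries on might look like the sketch below. `batch_predict` is a hypothetical helper, and the two stub functions stand in for the real stats and prediction calls so the example runs without the library; their return values are invented.

```python
def batch_predict(players, get_stats, predict):
    """Run predict(get_stats(player)) for each player, isolating failures.

    Returns (results, failures), where failures maps player -> error message.
    """
    results, failures = [], {}
    for player in players:
        try:
            prediction = predict(get_stats(player))
        except Exception as exc:  # keep going; record what went wrong
            failures[player] = str(exc)
            continue
        results.append({
            "player": player,
            "predicted_runs": prediction["predicted_runs"],
            "confidence": prediction["confidence"],
        })
    return results, failures


# Stubs for illustration only; values are invented.
def fake_get_stats(player):
    if player == "Unknown Player":
        raise KeyError(player)
    return {"name": player}


def fake_predict(stats):
    return {"predicted_runs": 30.0, "confidence": 0.6}


results, failures = batch_predict(
    ["Virat Kohli", "Unknown Player", "Rohit Sharma"],
    fake_get_stats,
    fake_predict,
)
print(len(results), list(failures))
```

With the library installed, the same helper would be called as `batch_predict(players, PlayerStatsSynthesizer.get_player_stats, predictor.predict_player_performance)`.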
Background Jobs
Integrate with task queues for async processing.
from celery import Celery
from CricCatapult.ml import get_predictor
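# A result backend is needed for task.get() to return the prediction
app = Celery('tasks', broker='redis://localhost:6379', backend='redis://localhost:6379')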
predictor = get_predictor()
@app.task
def predict_match_async(team1, team2):
    result = predictor.predict_match_outcome(team1, team2)
    return result
# Queue prediction
task = predict_match_async.delay("India", "Australia")
result = task.get()
Enterprise Features
No External Services - Predictions run without external APIs, so there are no rate limits and no third-party downtime.
Data Privacy - All processing happens locally. Your data never leaves your infrastructure.
Offline Capable - Models work without internet after installation.
Version Control - Models are files. Version them with Git. Roll back anytime.
Audit Trail - Built-in usage analytics. Track every prediction.
Custom Training - Fine-tune on your proprietary data. Keep your competitive edge.
Support
- Documentation
You’re reading it. Comprehensive guides for every feature.
- GitHub
- PyPI
- Issues & Features
Open an issue on GitHub. We respond within 24 hours.
FAQ
Do I need API keys?
No. Everything works locally. No external services required.
What’s the cost?
Free. Open source. No hidden fees.
Can I use this in production?
Yes. Built for production. Used by multiple organizations.
How accurate are predictions?
Player model: MAE 15 runs. Match model: 58.5% accuracy. Improve with custom training.
Does it work offline?
Yes. After installation, all predictions work offline.
Can AI agents use this?
Yes. Designed for AI agents. JSON output, CLI interface, deterministic results.
How big are the models?
2.1MB total. Negligible impact on package size.
Can I train custom models?
Yes. Google Colab notebooks included. Train on your data.
What about data freshness?
Cricsheet.org updates regularly. Download anytime with one command.
Is this production-ready?
Yes. Battle-tested. Proper error handling. Comprehensive tests.
Architecture
ML Models - XGBoost for regression. Random Forest for classification. Joblib serialization.
Feature Engineering - Pandas-based pipelines. NumPy for numerical operations. Scikit-learn preprocessing.
CLI - Argparse for parsing. JSON/CSV output formats. Subprocess-safe for AI agents.
TUI - Textual framework. Modern terminal interface. Keyboard navigation.
Data Pipeline - Requests for HTTP. BeautifulSoup for parsing. Pandas for transformation.
Caching - SQLite for analytics. File-based for queries. Automatic cleanup.
License
Open source. Check the repository for details.