Practice Exercises

Master baseball analytics through hands-on practice. Work through exercises that reinforce concepts from each chapter, from basic data wrangling to advanced predictive modeling.

48 Total Exercises
6 Chapters
3 Difficulty Levels

How to Practice Effectively

Get the most out of these exercises with our recommended approach

1. Read the Chapter First

Make sure you've completed the corresponding chapter before attempting exercises. Understanding the concepts will make problem-solving much easier.

2. Try Before You Peek

Attempt each exercise on your own before looking at hints or solutions. Struggling through problems builds deeper understanding than copying answers.

3. Experiment & Extend

After solving an exercise, try modifying it. Use different players, seasons, or metrics. This reinforces learning and builds real analysis skills.

Understanding Difficulty Levels

Exercises are categorized by difficulty to help you build skills progressively. Start with easy exercises to build confidence, then work your way up.

Easy
Foundation Building

Basic operations like loading data, simple calculations, and fundamental visualizations. Perfect for beginners or warming up.

Medium
Skill Development

Multi-step problems requiring data manipulation, metric calculations, and combining multiple concepts. The core of your learning.

Hard
Advanced Application

Complex problems requiring creative problem-solving, advanced techniques, and often combining skills from multiple chapters.

Exercises by Difficulty
Easy 0 exercises
Medium 11 exercises
Hard 37 exercises

Showing: Medium exercises (11 of 48)

Medium Exercises

Exercise 1.2
Data Hierarchy Exploration
Medium
Using baseball data from 2023:

1. Get game-level data for your favorite team
2. Calculate the team's wins and losses
3. Aggregate to find total runs scored and allowed across the season
4. Calculate the team's Pythagorean winning percentage using the formula from the Preface
5. Compare actual winning percentage to Pythagorean expectation

**R Version Hint:**

```r
# Use baseballr to get team game logs
library(baseballr)
library(tidyverse)

# Example for Yankees (team_id = 147)
yankees_games <- mlb_team_schedule(season = 2023,
                                   team_id = 147)
# Then aggregate and calculate...
```

**Python Version Hint:**

```python
# Use pybaseball to get team game logs
from pybaseball import schedule_and_record

# Example for Yankees
yankees_games = schedule_and_record(2023, 'NYY')
# Then aggregate and calculate...
```
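
Steps 2 through 5 come down to a few aggregations over the game log. The following is a minimal sketch, assuming the log has already been reduced to (runs scored, runs allowed) pairs and that the Preface uses the classic exponent of 2. Both the exponent and the sample scores are assumptions, so check them against the Preface and your own data:

```python
def pythagorean_win_pct(runs_scored, runs_allowed, exponent=2.0):
    """Expected winning percentage from season run totals.

    exponent=2.0 is the classic value (an assumption here; use
    whatever exponent the Preface specifies).
    """
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


# Hypothetical (runs_scored, runs_allowed) pairs, one per game.
games = [(5, 3), (2, 4), (7, 1), (3, 4), (6, 2)]

# Step 2: wins and losses.
wins = sum(1 for rs, ra in games if rs > ra)
losses = sum(1 for rs, ra in games if rs < ra)

# Step 3: season run totals.
total_rs = sum(rs for rs, _ in games)
total_ra = sum(ra for _, ra in games)

# Steps 4 and 5: expected vs. actual winning percentage.
actual_pct = wins / (wins + losses)
expected_pct = pythagorean_win_pct(total_rs, total_ra)
diff = actual_pct - expected_pct  # positive: more wins than the runs suggest

print(f"actual {actual_pct:.3f}, expected {expected_pct:.3f}, diff {diff:+.3f}")
```

With a real game log, the same four sums would be computed from the runs-scored and runs-allowed columns of the returned table.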
Exercise 2.2
Grouping and Team Analysis
Medium
Calculate team-level statistics from individual player data:

1. Which team had the highest average OPS among qualified batters?
2. What's the correlation between team home runs and team wins (you'll need to join with standings data)?
3. Calculate each team's offensive balance: standard deviation of WAR among their top 5 hitters
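
The grouping pattern behind these questions can be sketched with pandas on a toy table. The team codes, column names, and numbers below are made up for illustration; real data would use whatever columns your batting table provides:

```python
import pandas as pd

# Hypothetical player-level rows standing in for a real batting table.
players = pd.DataFrame({
    "Team": ["AAA", "AAA", "AAA", "BBB", "BBB", "BBB"],
    "OPS": [0.900, 0.750, 0.700, 0.800, 0.810, 0.790],
    "WAR": [6.0, 2.0, 1.0, 3.0, 3.5, 2.5],
    "HR": [35, 18, 10, 22, 25, 20],
})

# Question 1: average OPS per team, highest first.
team_ops = players.groupby("Team")["OPS"].mean().sort_values(ascending=False)
best_team = team_ops.index[0]

# Question 2 (first half): team home run totals, ready to be joined
# with a standings table on the team column.
team_hr = players.groupby("Team")["HR"].sum()


# Question 3: standard deviation of WAR among each team's top hitters.
# The toy teams have only 3 hitters, so nlargest(5) returns all of them.
def top_n_war_std(war, n=5):
    return war.nlargest(n).std()


balance = players.groupby("Team")["WAR"].apply(top_n_war_std)

print(team_ops, team_hr, balance, sep="\n\n")
```

A lower `balance` value means a team's top hitters contribute more evenly. For the second question, merging `team_hr` with a wins column and calling `Series.corr` on the two columns gives the correlation.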
Exercise 2.4
Joins and Data Integration
Medium
Combine batting and pitching data:

1. Identify two-way players (appear in both datasets)
2. For two-way players, calculate their combined WAR
3. Compare offensive WAR vs. pitching WAR
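
An inner join is one way to find players who appear in both tables. The sketch below uses made-up rows, and `playerid` is a stand-in for whatever shared ID column your batting and pitching tables actually carry:

```python
import pandas as pd

# Hypothetical rows; "playerid" is an assumed shared key.
batting = pd.DataFrame({
    "playerid": [1, 2, 3],
    "Name": ["Player A", "Player B", "Player C"],
    "WAR": [4.0, 2.5, 0.3],
})
pitching = pd.DataFrame({
    "playerid": [3, 4],
    "Name": ["Player C", "Player D"],
    "WAR": [3.1, 1.8],
})

# Step 1: an inner join keeps only players present in both tables.
two_way = batting.merge(
    pitching,
    on=["playerid", "Name"],
    how="inner",
    suffixes=("_bat", "_pit"),
)

# Step 2: combined WAR. Step 3: offensive minus pitching WAR.
two_way["WAR_total"] = two_way["WAR_bat"] + two_way["WAR_pit"]
two_way["WAR_gap"] = two_way["WAR_bat"] - two_way["WAR_pit"]

print(two_way)
```

On real data, a minimum plate appearance and innings pitched filter before the join helps exclude position players who pitched briefly in a blowout.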
Exercise 3.1
Multi-Source Data Integration
Medium
Using both FanGraphs (via baseballr/pybaseball) and Statcast data:

1. Retrieve 2024 batting statistics for players with 400+ PA
2. For the top 10 hitters by wRC+, get their Statcast data
3. Compare their wOBA (FanGraphs) to their xwOBA (Statcast)
4. Which players are most over-performing or under-performing their expected stats?

**R Solution Sketch:**
```r
library(baseballr)
library(tidyverse)

# 1. Get FanGraphs data
batters <- fg_batter_leaders(2024, 2024, qual = 400)
top10 <- batters %>% arrange(desc(`wRC+`)) %>% head(10)

# 2 & 3. Get Statcast data for each
# (Would loop through players and use statcast_search with their IDs)
# Compare wOBA vs xwOBA

# 4. Calculate differences
# top10 %>% mutate(woba_diff = wOBA - xwOBA)
```

**Python Solution Sketch:**
```python
from pybaseball import batting_stats, statcast_batter, playerid_lookup
import pandas as pd

# 1. Get batting stats
batters = batting_stats(2024, qual=400)
top10 = batters.nlargest(10, 'wRC+')

# 2. Get Statcast data for top 10
# (Would need to lookup MLBAM IDs and query statcast_batter)

# 3 & 4. Compare wOBA vs xwOBA
# Calculate differences to identify over/under-performers
```
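
Once both values are in hand for each player, the comparison in steps 3 and 4 is a subtraction and a sort. A minimal sketch with placeholder names and numbers (not real 2024 values):

```python
# Hypothetical (player, wOBA, xwOBA) rows; real values would come from
# FanGraphs and Statcast respectively.
rows = [
    ("Player A", 0.410, 0.385),
    ("Player B", 0.395, 0.402),
    ("Player C", 0.380, 0.351),
]

# Positive diff: results ahead of contact quality (over-performing).
# Negative diff: results behind contact quality (under-performing).
diffs = sorted(
    ((name, round(woba - xwoba, 3)) for name, woba, xwoba in rows),
    key=lambda pair: pair[1],
    reverse=True,
)

most_over = diffs[0]
most_under = diffs[-1]

for name, diff in diffs:
    print(f"{name}: {diff:+.3f}")
```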
Exercise 3.2
Historical Trends with Lahman
Medium
Using the Lahman database:

1. Calculate the league-average batting average by decade (1920-present)
2. Identify the decade with the highest and lowest scoring
3. Plot the trend of strikeouts per game over time
4. Compare the "Steroid Era" (1995-2005) to the "Modern Era" (2015-2024) in terms of HR rate, K rate, and BA
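
For the first question, the detail to get right is the aggregation: a league batting average for a decade is total hits divided by total at-bats, not the mean of yearly averages. A minimal sketch, assuming rows shaped like the Lahman `Teams` table (`yearID`, `AB`, `H`) and using made-up totals:

```python
from collections import defaultdict

# Hypothetical season totals; the numbers are for illustration only.
seasons = [
    {"yearID": 1998, "AB": 5500, "H": 1500},
    {"yearID": 1999, "AB": 5600, "H": 1540},
    {"yearID": 2015, "AB": 5450, "H": 1380},
    {"yearID": 2016, "AB": 5500, "H": 1400},
]

totals = defaultdict(lambda: {"AB": 0, "H": 0})
for row in seasons:
    decade = row["yearID"] // 10 * 10  # 1998 -> 1990
    totals[decade]["AB"] += row["AB"]
    totals[decade]["H"] += row["H"]

# Divide summed hits by summed at-bats so that seasons with more
# at-bats carry proportionally more weight.
decade_avg = {d: t["H"] / t["AB"] for d, t in sorted(totals.items())}

for decade, avg in decade_avg.items():
    print(f"{decade}s: {avg:.3f}")
```

The same bucketing works for runs, strikeouts, and home runs; for the era comparison in the fourth question, replace the decade expression with a year-range check.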
Exercise 11.1
Fantasy Player Valuation
Medium
Using projection data for 10 players across all five standard hitting categories (R, HR, RBI, SB, AVG):

1. Calculate replacement-level statistics (use the worst player's projections)
2. Define standings gain points (SGP) denominators for each category: the amount of a stat needed to gain one point in the standings (assume reasonable league spreads)
3. Calculate SGP for each player
4. Convert SGP to auction values (12-team league, $260 budget)
5. Visualize the relationship between a player's projected home runs and their auction value

**Extension**: How does the value of a player with extreme stolen base totals (50+) change if the league adds OBP as a sixth category?
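
The SGP and auction-value steps can be sketched as below. Everything numeric is hypothetical: the denominators, the three sample projections, and the $48 pool are placeholders. AVG is treated like a counting stat for brevity, whereas a fuller version would weight it by at-bats:

```python
CATEGORIES = ["R", "HR", "RBI", "SB", "AVG"]

# Hypothetical SGP denominators; real values depend on the league.
DENOMS = {"R": 20.0, "HR": 8.0, "RBI": 20.0, "SB": 8.0, "AVG": 0.02}


def sgp(player, replacement, denoms=DENOMS):
    """Standings gain points above the replacement-level player."""
    return sum((player[c] - replacement[c]) / denoms[c] for c in CATEGORIES)


def auction_values(sgp_by_player, budget, min_bid=1.0):
    """Give every player the minimum bid, then split the remaining
    budget in proportion to SGP above replacement."""
    positive = {p: max(s, 0.0) for p, s in sgp_by_player.items()}
    total_sgp = sum(positive.values())
    marginal = budget - min_bid * len(positive)
    return {p: min_bid + marginal * s / total_sgp for p, s in positive.items()}


# Hypothetical projections; player C is the worst and serves as replacement.
players = {
    "A": {"R": 100, "HR": 30, "RBI": 100, "SB": 10, "AVG": 0.290},
    "B": {"R": 80, "HR": 20, "RBI": 80, "SB": 26, "AVG": 0.270},
    "C": {"R": 60, "HR": 14, "RBI": 60, "SB": 2, "AVG": 0.250},
}
replacement = players["C"]

sgps = {name: sgp(proj, replacement) for name, proj in players.items()}
values = auction_values(sgps, budget=48.0)

for name in players:
    print(f"{name}: SGP {sgps[name]:.2f}, value ${values[name]:.2f}")
```

For the exercise itself, a 12-team league at $260 per team has $3,120 in total; only the share spent on hitters belongs in `budget`, and that hitter/pitcher split is a league-specific assumption.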
Exercise 14.1
Free Agent Cost Analysis
Medium
Using 2024 free agent data, calculate cost per WAR for at least 10 free agent signings. Then:

a) Compare cost per WAR across different position groups (pitchers vs hitters, premium positions vs corner positions)

b) Analyze whether older players (33+) cost more or less per WAR than younger free agents (28-30)

c) Identify which signing appears most efficient (best value) and least efficient (worst value)

**Data to collect:**
- Player name, position, age
- Contract terms (years, AAV)
- Projected WAR for first year (use Steamer or ZiPS)

**Hint:** Check FanGraphs or Baseball Prospectus for a free agent tracker and projections.
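
Once the data is collected, the calculations are short. A minimal sketch with made-up signings (names, ages, salaries, and projections are all placeholders):

```python
from collections import defaultdict

# Hypothetical signings: (player, group, age, AAV in $M, projected WAR).
signings = [
    ("Player A", "pitcher", 29, 24.0, 3.0),
    ("Player B", "hitter", 34, 15.0, 1.5),
    ("Player C", "hitter", 28, 18.0, 3.0),
    ("Player D", "pitcher", 33, 12.0, 1.0),
]


def cost_per_war(aav, war):
    """First-year cost per projected win, in $M per WAR."""
    if war <= 0:
        raise ValueError("cost per WAR is undefined for WAR <= 0")
    return aav / war


# (a) Pool dollars and WAR within each group before dividing, so one
#     low-WAR contract does not dominate the group's figure.
pool = defaultdict(lambda: [0.0, 0.0])
for _, group, _, aav, war in signings:
    pool[group][0] += aav
    pool[group][1] += war
group_cost = {g: cost_per_war(aav, war) for g, (aav, war) in pool.items()}

# (c) Rank individual signings from most to least efficient.
ranked = sorted(signings, key=lambda s: cost_per_war(s[3], s[4]))
best, worst = ranked[0][0], ranked[-1][0]

print(group_cost, best, worst)
```

Part (b) follows the same pooling pattern with an age bucket in place of the position group.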
Exercise 14.2
Trade Surplus Value
Medium
Evaluate a recent blockbuster trade using surplus value analysis. Choose a trade from the past 2 years involving multiple players.

**Your analysis should:**

a) Calculate total surplus value for each side of the trade: for each year of control, take projected WAR × market rate ($ per WAR) - expected salary, then sum across years

b) Apply discount rates to future value (use 5-10%)

c) Determine which team "won" the trade based on surplus value

d) Discuss how competitive windows might make the trade beneficial for both sides despite unequal surplus value

**Suggested trades:**
- Juan Soto to Padres (2022)
- Tyler Glasnow to Dodgers (2023)
- Dylan Cease to Padres (2024)
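
Parts (a) and (b) combine into a single discounted sum. The sketch below treats the first season of control as year 0 (undiscounted), which is one possible convention, and uses placeholder WAR projections, salaries, market rate, and discount rate rather than figures from any real trade:

```python
def surplus_value(war_by_year, salary_by_year, dollars_per_war, discount_rate):
    """Discounted surplus in $M: sum of (WAR * $/WAR - salary) per year.

    Year 0 is the first season of control and is left undiscounted.
    """
    if len(war_by_year) != len(salary_by_year):
        raise ValueError("need one salary for each projected season")
    return sum(
        (war * dollars_per_war - salary) / (1 + discount_rate) ** year
        for year, (war, salary) in enumerate(zip(war_by_year, salary_by_year))
    )


# Hypothetical inputs: 3 years of control, $9M per WAR, 8% discount rate.
team_x_gets = surplus_value([4.0, 3.5, 3.0], [10.0, 15.0, 20.0], 9.0, 0.08)
team_y_gets = surplus_value([1.0, 2.0, 2.5], [0.7, 0.7, 0.7], 9.0, 0.08)

print(f"Team X receives {team_x_gets:.1f}M, Team Y receives {team_y_gets:.1f}M")
```

For a side that receives several players, sum `surplus_value` over each of them. Rerunning with rates across the 5-10% range shows how sensitive the verdict is to the discount assumption.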
Exercise 15.1
Age-Adjusted Performance Analysis
Medium
**Task**: Analyze a prospect's performance adjusting for age relative to league average. Using the provided data, calculate age-adjusted metrics and determine if the prospect is performing above or below expectations.

**Data**:
```
Prospect: SS, Age 20
Level: High-A (League Avg Age: 22.8)
Stats: .275 AVG, .345 OBP, .485 SLG, 15 HR, 285 PA
       12.2% BB%, 24.5% K%, .210 ISO
```

**Questions**:
1. Calculate the prospect's age-adjusted wRC+ (assume league average is 100)
2. How does the strikeout rate compare when adjusted for age?
3. Based on age-adjusted metrics, is this prospect ahead or behind the development curve?
4. What level should this prospect be promoted to next, and why?
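
A starting point is to quantify the age gap and sanity-check the stat line. The sketch below also includes a simple linear age adjustment; `points_per_year` is a hypothetical knob rather than an established constant, so take its value from the chapter's method:

```python
def age_gap(player_age, league_avg_age):
    """Years younger (positive) or older (negative) than the league."""
    return league_avg_age - player_age


def age_adjusted(raw_value, gap_years, points_per_year):
    """Linear age adjustment; points_per_year is a hypothetical knob."""
    return raw_value + gap_years * points_per_year


# Values taken from the exercise data.
gap = age_gap(20, 22.8)
iso = round(0.485 - 0.275, 3)    # ISO = SLG - AVG, matches the listed .210
strikeouts = round(0.245 * 285)  # K% * PA, approximate strikeout count

print(f"{gap:.1f} years younger than league, ISO {iso:.3f}, ~{strikeouts} K")
```

For a stat where lower is better, such as K%, the adjustment would be subtracted rather than added.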
Exercise 15.2
Breakout Candidate Identification
Medium
**Task**: Using the swing decision and contact quality metrics below, identify which prospect is most likely to break out in the next season.

**Prospect Comparison**:

| Metric | Prospect A | Prospect B | Prospect C |
|--------|-----------|-----------|-----------|
| Current wRC+ | 105 | 118 | 98 |
| Chase Rate Change | -4.5% | -1.2% | +2.1% |
| Zone Contact Change | +3.2% | +1.8% | -0.5% |
| Avg EV Change | +2.1 mph | +0.8 mph | +3.5 mph |
| Barrel Rate | 8.5% | 11.2% | 6.8% |
| Age | 22 | 24 | 21 |

**Questions**:
1. Calculate a composite breakout score for each prospect
2. Which prospect shows the most promising leading indicators?
3. What specific improvements drive your choice?
4. What realistic wRC+ would you project for each prospect next season?
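
One possible composite, sketched below, standardizes each change metric across the three prospects and sums the z-scores with equal weights. The equal weighting and the choice to leave out age, barrel rate, and current wRC+ are assumptions of this sketch, not a prescribed method:

```python
from statistics import mean, pstdev

# Leading-indicator changes from the comparison table.
prospects = {
    "A": {"chase": -4.5, "zone_contact": 3.2, "avg_ev": 2.1},
    "B": {"chase": -1.2, "zone_contact": 1.8, "avg_ev": 0.8},
    "C": {"chase": 2.1, "zone_contact": -0.5, "avg_ev": 3.5},
}

# +1 when a larger value is better, -1 when smaller is better
# (chasing fewer pitches is an improvement).
DIRECTION = {"chase": -1, "zone_contact": 1, "avg_ev": 1}


def breakout_scores(data, direction=DIRECTION):
    """Equal-weight sum of z-scores across the change metrics."""
    scores = {name: 0.0 for name in data}
    for metric, sign in direction.items():
        values = [row[metric] for row in data.values()]
        mu, sigma = mean(values), pstdev(values)
        for name, row in data.items():
            scores[name] += sign * (row[metric] - mu) / sigma
    return scores


scores = breakout_scores(prospects)
ranking = sorted(scores, key=scores.get, reverse=True)

for name in ranking:
    print(f"Prospect {name}: {scores[name]:+.2f}")
```

Changing the weights or adding an age term can reorder the ranking, which is worth exploring when answering the third question.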
Exercise 15.3
International Prospect Translation
Medium
**Task**: Translate the following KBO statistics to MLB equivalents and project first-year MLB performance.

**Player Data**:
```
Player: OF, Age 26
League: KBO (Korea Baseball Organization)
Stats: .318 AVG, .385 OBP, .538 SLG, 28 HR, 550 PA
       9.5% BB%, 15.2% K%, .220 ISO
Previous MLB exposure: None
```

**Questions**:
1. Translate the KBO statistics to MLB equivalents using appropriate league factors
2. What MLB slash line would you project for Year 1?
3. What is the biggest risk factor in this projection?
4. How would your projection change if the player were age 23 instead of 26?
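
Mechanically, a league translation multiplies each rate stat by a league factor. In the sketch below the factors are placeholders chosen only to make the code run; they are not empirical KBO-to-MLB estimates and should be replaced with the factors from the chapter:

```python
def translate_line(stats, factors):
    """Scale each rate stat by its league translation factor."""
    return {name: round(value * factors[name], 3) for name, value in stats.items()}


# Slash line from the exercise data.
kbo_line = {"AVG": 0.318, "OBP": 0.385, "SLG": 0.538}

# PLACEHOLDER factors, not empirical estimates.
placeholder_factors = {"AVG": 0.90, "OBP": 0.90, "SLG": 0.85}

mlb_line = translate_line(kbo_line, placeholder_factors)
translated_iso = round(mlb_line["SLG"] - mlb_line["AVG"], 3)  # ISO = SLG - AVG

print(mlb_line, translated_iso)
```

Passing a different factor set for a younger player is one way to explore the fourth question, since age affects how much further development to expect.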

Build Your Skills Progressively

Follow this recommended path through the exercises

Data Wrangling

Start here! Learn to load, clean, and manipulate baseball data.

Chapters 1-3

Visualization

Create compelling charts and visualizations of baseball data.

Chapter 4

Metrics & Analysis

Calculate and interpret sabermetric and Statcast metrics.

Chapters 5-8

Advanced Topics

Machine learning, custom metrics, and interactive apps.

Chapters 9-12

Need to Review the Material?

Head back to the chapters to refresh your understanding before tackling exercises.