Medium Exercises (11 exercises)
Introduction to MLB Analytics (1 exercise)
Data Hierarchy Exploration
1. Get game-level data for your favorite team
2. Calculate the team's wins and losses
3. Aggregate to find total runs scored and allowed across the season
4. Calculate the team's Pythagorean winning percentage using the formula from the Preface
5. Compare actual winning percentage to Pythagorean expectation
**R Version Hint:**
```r
# Use baseballr to get team game logs
library(baseballr)
library(tidyverse)
# Example for the Yankees: game-by-game results from Baseball-Reference
yankees_games <- bref_team_results("NYY", 2023)
# Then aggregate and calculate (check names(yankees_games) for the runs columns)...
```
**Python Version Hint:**
```python
# Use pybaseball to get team game logs
from pybaseball import schedule_and_record
# Example for Yankees
yankees_games = schedule_and_record(2023, 'NYY')
# Then aggregate and calculate...
```
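Once you have each game's score, steps 2 through 5 come down to a few sums. The sketch below works on plain (runs scored, runs allowed) pairs rather than on a pybaseball or baseballr data frame. It assumes the classic exponent of 2; if the Preface's formula uses a different exponent (1.83 is a common refinement), pass it as `exponent`. `season_summary` and the four-game sample are hypothetical.

```python
def season_summary(games, exponent=2.0):
    """Summarize a season from (runs_scored, runs_allowed) pairs.

    Assumes no tied games, so every game is a win or a loss.
    """
    wins = sum(1 for rs, ra in games if rs > ra)
    losses = sum(1 for rs, ra in games if rs < ra)
    runs_scored = sum(rs for rs, _ in games)
    runs_allowed = sum(ra for _, ra in games)
    actual_pct = wins / (wins + losses)
    # Pythagorean expectation: RS^k / (RS^k + RA^k)
    pythag_pct = runs_scored**exponent / (runs_scored**exponent + runs_allowed**exponent)
    return {
        "wins": wins,
        "losses": losses,
        "runs_scored": runs_scored,
        "runs_allowed": runs_allowed,
        "actual_pct": actual_pct,
        "pythag_pct": pythag_pct,
        "difference": actual_pct - pythag_pct,  # positive = "lucky" team
    }


# Toy four-game sample (hypothetical scores, not real Yankees results)
print(season_summary([(5, 3), (2, 4), (7, 1), (3, 6)]))
```

With `schedule_and_record`, the pairs could come from something like `list(zip(yankees_games['R'], yankees_games['RA']))`, assuming those column names match your pybaseball version.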
Data Wrangling for Baseball (2 exercises)
Grouping and Team Analysis
1. Which team had the highest average OPS among qualified batters?
2. What's the correlation between team home runs and team wins (you'll need to join with standings data)?
3. Calculate each team's offensive balance: standard deviation of WAR among their top 5 hitters
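Once you have per-team numbers (from a pandas `groupby` or an equivalent dplyr summary), items 2 and 3 need only a correlation and a standard deviation. Below is a pure-Python sketch with hypothetical helper names. It uses the sample standard deviation (n − 1); swap in `statistics.pstdev` if you prefer the population version.

```python
import statistics


def offensive_balance(war_values, top_n=5):
    """Sample standard deviation of WAR among a team's top_n hitters (by WAR).

    Lower values mean production is spread more evenly across the lineup.
    """
    top = sorted(war_values, reverse=True)[:top_n]
    if len(top) < 2:
        raise ValueError("need at least two hitters to measure spread")
    return statistics.stdev(top)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


# Hypothetical inputs, for illustration only
print(offensive_balance([6.0, 4.0, 3.0, 2.0, 1.0, 0.5]))
print(pearson([180, 210, 250], [75, 84, 95]))  # team HR vs. team wins
```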
Joins and Data Integration
1. Identify two-way players (players who appear in both the batting and pitching datasets)
2. For two-way players, calculate their combined WAR
3. Compare each two-way player's offensive WAR with their pitching WAR
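Items 1 and 2 amount to an inner join followed by a sum. This minimal sketch assumes each dataset has been reduced to a `{player_id: WAR}` mapping keyed on a shared ID. Joining on IDs rather than names avoids mismatches from accents or duplicate names. `two_way_players` and the IDs are hypothetical.

```python
def two_way_players(batting_war, pitching_war):
    """Inner-join two {player_id: WAR} mappings on player ID.

    Returns {player_id: (offensive_war, pitching_war, combined_war)}
    for players present in both mappings.
    """
    shared = batting_war.keys() & pitching_war.keys()
    return {
        pid: (batting_war[pid], pitching_war[pid], batting_war[pid] + pitching_war[pid])
        for pid in sorted(shared)
    }


# Hypothetical IDs and WAR values, for illustration only
print(two_way_players({"p1": 4.0, "p2": 1.5}, {"p1": 3.0, "p3": 2.5}))
```

In real data, a position player who made a mop-up pitching appearance, or a pitcher with a few plate appearances, will also show up in both tables. Consider filtering on minimum PA and innings before treating a player as two-way.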
The Baseball Data Ecosystem (2 exercises)
Multi-Source Data Integration
1. Retrieve 2024 batting statistics for players with 400+ PA
2. For the top 10 hitters by wRC+, get their Statcast data
3. Compare their wOBA (FanGraphs) to their xwOBA (Statcast)
4. Which players most outperform or underperform their expected stats?
**R Solution Sketch:**
```r
library(baseballr)
library(tidyverse)
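# 1. Get FanGraphs data (named arguments: positional order differs across baseballr versions)
batters <- fg_batter_leaders(startseason = 2024, endseason = 2024, qual = 400)
# The wRC+ column may be named `wRC+` or wRC_plus depending on the baseballr version
top10 <- batters %>% arrange(desc(`wRC+`)) %>% head(10)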
# 2 & 3. Get Statcast data for each
# (Would loop through players and use statcast_search with their IDs)
# Compare wOBA vs xwOBA
# 4. Calculate differences
# top10 %>% mutate(woba_diff = wOBA - xwOBA)
```
**Python Solution Sketch:**
```python
from pybaseball import batting_stats, statcast_batter, playerid_lookup
import pandas as pd
# 1. Get batting stats
batters = batting_stats(2024, qual=400)
top10 = batters.nlargest(10, 'wRC+')
# 2. Get Statcast data for top 10
# (Would need to lookup MLBAM IDs and query statcast_batter)
# 3 & 4. Compare wOBA vs xwOBA
# Calculate differences to identify over/under-performers
```
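Once both numbers sit side by side, items 3 and 4 come down to a difference and a sort. This minimal sketch assumes you have already matched each player's FanGraphs wOBA to his Statcast xwOBA under a common key, such as the MLBAM ID. `expected_gap` is a hypothetical helper. FanGraphs and Baseball Savant may compute wOBA with slightly different weights, so treat very small gaps as noise.

```python
def expected_gap(woba, xwoba):
    """Rank players by wOBA minus xwOBA, largest gap first.

    woba, xwoba: {player_key: value}. Only players present in both are ranked.
    A positive gap suggests results have outrun quality of contact (a possible
    regression candidate); a negative gap suggests the reverse.
    """
    shared = woba.keys() & xwoba.keys()
    gaps = [(player, woba[player] - xwoba[player]) for player in shared]
    # Sort by gap descending, breaking ties by player key for a stable order
    return sorted(gaps, key=lambda pair: (-pair[1], pair[0]))


# Hypothetical values, for illustration only
print(expected_gap({"A": 0.400, "B": 0.350, "C": 0.380},
                   {"A": 0.370, "B": 0.365, "C": 0.380}))
```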
Historical Trends with Lahman
1. Calculate the league-average batting average by decade (1920s to the present)
2. Identify the decades with the highest and lowest run scoring (runs per game)
3. Plot the trend of strikeouts per game over time
4. Compare the "Steroid Era" (1995-2005) to the "Modern Era" (2015-2024) in terms of HR rate, K rate, and BA
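Decade rates should be ratios of league totals (total hits divided by total at-bats), not averages of per-team averages, so that every at-bat counts equally. The sketch assumes rows shaped like Lahman's `Teams` table (`yearID`, `G`, `AB`, `H`, `R`, `SO`), already loaded as dictionaries. `decade_rates` and the toy rows are hypothetical. Drop rows with missing `SO` before calling it.

```python
from collections import defaultdict

STATS = ("G", "AB", "H", "R", "SO")


def decade_rates(team_seasons):
    """Aggregate Lahman Teams-style rows into league-wide rates by decade."""
    totals = defaultdict(lambda: dict.fromkeys(STATS, 0))
    for row in team_seasons:
        decade = row["yearID"] // 10 * 10  # e.g. 1925 -> 1920
        for stat in STATS:
            totals[decade][stat] += row[stat]
    return {
        decade: {
            "BA": t["H"] / t["AB"],        # ratio of totals, not mean of team BAs
            "R_per_G": t["R"] / t["G"],    # runs per team-game
            "SO_per_G": t["SO"] / t["G"],  # strikeouts per team-game
        }
        for decade, t in sorted(totals.items())
    }


# Hypothetical rows, for illustration only
rows = [
    {"yearID": 1921, "G": 154, "AB": 5000, "H": 1400, "R": 700, "SO": 400},
    {"yearID": 1925, "G": 152, "AB": 5200, "H": 1500, "R": 800, "SO": 380},
    {"yearID": 2015, "G": 162, "AB": 5500, "H": 1400, "R": 690, "SO": 1300},
]
for decade, rates in decade_rates(rows).items():
    print(decade, rates)
```

Each game appears once in each participating team's `G`, so these are per-team-game rates; double them for both teams combined. The highest- and lowest-scoring decades are then `max` and `min` over the result, keyed on `R_per_G`.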
Fantasy Player Valuation
1. Calculate replacement-level statistics (use the worst player's projections)
2. Define SGP denominators for each category (the amount of the stat needed to gain one place in the standings; assume reasonable league spreads)
3. Calculate SGP (Standings Gain Points) for each player
4. Convert SGP to auction values (12-team league, $260 budget)
5. Visualize the relationship between a player's projected home runs and their auction value
**Extension**: How does the value of a player with extreme stolen base totals (50+) change if the league adds OBP as a sixth category?
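Steps 3 and 4 can be sketched for counting categories as below. These are hypothetical choices, not a standard:
- SGP is measured against the replacement line from step 1.
- Ratio categories such as AVG or OBP are left out, because they need a marginal-hits style conversion first.
- `roster_size` defaults to 23 per team.
- Every player passed to `auction_values` belongs to the draftable pool of `teams * roster_size` players with positive SGP.

```python
def sgp(player, replacement, denominators):
    """Standings Gain Points above replacement, summed over counting categories.

    player, replacement: {category: projected season total}
    denominators: {category: amount of the stat worth one standings point}
    """
    return sum(
        (player[cat] - replacement[cat]) / denominators[cat]
        for cat in denominators
    )


def auction_values(sgp_by_player, teams=12, budget=260, roster_size=23):
    """Convert SGP into dollars: a $1 minimum bid plus a share of the surplus pool.

    Intended for the full draftable pool (teams * roster_size players, all with
    positive SGP); the values then sum to teams * budget.
    """
    surplus_pool = teams * budget - teams * roster_size  # dollars above $1 bids
    total_sgp = sum(sgp_by_player.values())
    return {
        name: 1 + surplus_pool * value / total_sgp
        for name, value in sgp_by_player.items()
    }


# Hypothetical projections and denominators, for illustration only
player = {"HR": 30, "R": 90, "RBI": 95, "SB": 10}
replacement = {"HR": 15, "R": 70, "RBI": 65, "SB": 5}
denominators = {"HR": 7.5, "R": 20.0, "RBI": 20.0, "SB": 5.0}
print(sgp(player, replacement, denominators))
```

With the exercise's settings, the league budget is 12 × $260 = $3,120. Under the assumed 23-man rosters, that covers 276 players, leaving $2,844 above the $1 minimums to distribute in proportion to SGP.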
Team Building & Roster Construction (2 exercises)
Free Agent Cost Analysis
a) Compare cost per WAR across different position groups (pitchers vs hitters, premium positions vs corner positions)
b) Analyze whether older players (33+) cost more or less per WAR than younger free agents (28-30)
c) Identify which signing appears most efficient (best value) and least efficient (worst value)
**Data to collect:**
- Player name, position, age
- Contract terms (years, AAV)
- Projected WAR for first year (use Steamer or ZiPS)
**Hint:** Check FanGraphs or Baseball Prospectus for free-agent trackers and projections.
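Parts (a) through (c) share one calculation: AAV divided by projected first-year WAR. The sketch uses hypothetical field names (`name`, `group`, `age`, `aav`, `proj_war`) and made-up signings. Group figures are pooled (total AAV over total WAR) so a single near-zero projection cannot dominate a group. Signings with non-positive projections are skipped when ranking individual deals.

```python
from collections import defaultdict


def cost_per_war(signings, group_of):
    """Pooled $/WAR per group: total AAV divided by total projected WAR."""
    aav, war = defaultdict(float), defaultdict(float)
    for s in signings:
        g = group_of(s)
        aav[g] += s["aav"]
        war[g] += s["proj_war"]
    return {g: aav[g] / war[g] for g in aav if war[g] > 0}


def rank_signings(signings):
    """Per-signing $/WAR, most efficient (cheapest per WAR) first."""
    rated = [(s["name"], s["aav"] / s["proj_war"]) for s in signings if s["proj_war"] > 0]
    return sorted(rated, key=lambda pair: pair[1])


def age_bucket(s):
    """Age groups from part (b); everyone else falls into 'other'."""
    if s["age"] >= 33:
        return "33+"
    if 28 <= s["age"] <= 30:
        return "28-30"
    return "other"


# Made-up signings, for illustration only (AAV in millions of dollars)
signings = [
    {"name": "Pitcher X", "group": "P", "age": 31, "aav": 20.0, "proj_war": 4.0},
    {"name": "Pitcher Y", "group": "P", "age": 34, "aav": 10.0, "proj_war": 1.0},
    {"name": "Shortstop Z", "group": "H", "age": 29, "aav": 30.0, "proj_war": 5.0},
]
print(cost_per_war(signings, lambda s: s["group"]))
print(cost_per_war(signings, age_bucket))
print(rank_signings(signings))
```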
Trade Surplus Value
**Your analysis should:**
a) Calculate total surplus value for each side of the trade: (projected WAR × market rate per WAR − expected salary), summed over the years of team control
b) Discount future seasons' value at an annual rate of 5-10%
c) Determine which team "won" the trade based on surplus value
d) Discuss how competitive windows might make the trade beneficial for both sides despite unequal surplus value
**Suggested trades:**
- Juan Soto to Padres (2022)
- Tyler Glasnow to Dodgers (2023)
- Dylan Cease to Padres (2024)
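Parts (a) and (b) combine into one discounted sum per player, which you then total for each side of the trade. In this minimal sketch, `surplus_value` and `side_total` are hypothetical helpers. The dollars-per-WAR rate is an input you must choose, since published estimates vary by source and season. The first season of control is treated as year 0 and left undiscounted, which is one convention among several.

```python
def surplus_value(proj_war, salaries, dollars_per_war, discount_rate):
    """Present value of (WAR x $/WAR - salary) over the years of team control.

    proj_war, salaries: per-season lists; index 0 is the first season of
    control and is not discounted, index t is divided by (1 + r) ** t.
    """
    if len(proj_war) != len(salaries):
        raise ValueError("need one salary per projected season")
    return sum(
        (war * dollars_per_war - salary) / (1 + discount_rate) ** t
        for t, (war, salary) in enumerate(zip(proj_war, salaries))
    )


def side_total(players, dollars_per_war, discount_rate):
    """Sum surplus over every (proj_war, salaries) pair one team received."""
    return sum(surplus_value(w, s, dollars_per_war, discount_rate) for w, s in players)


# Hypothetical player: 2 WAR per year for two seasons at $5M, priced at $8M/WAR
print(surplus_value([2.0, 2.0], [5.0, 5.0], dollars_per_war=8.0, discount_rate=0.10))
```

Use the same $/WAR and discount rate for both sides so the comparison in part (c) is like for like.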
Age-Adjusted Performance Analysis
**Data**:
```
Prospect: SS, Age 20
Level: High-A (League Avg Age: 22.8)
Stats: .275 AVG, .345 OBP, .485 SLG, 15 HR, 285 PA
12.2% BB%, 24.5% K%, .210 ISO
```
**Questions**:
1. Calculate the prospect's age-adjusted wRC+ (assume league average is 100)
2. How does the strikeout rate compare when adjusted for age?
3. Based on age-adjusted metrics, is this prospect ahead or behind the development curve?
4. What level should this prospect be promoted to next, and why?
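The exercise leaves the age-adjustment method open. One simple, hypothetical approach is linear: credit (or debit) a fixed amount for each year the player is younger (or older) than the league's average age. The `per_year` value is an assumption you set and defend, not an established constant, and the helper names are illustrative. As a sanity check, the sketch also recovers approximate walk and strikeout counts from the given rates.

```python
def age_gap(player_age, league_avg_age):
    """Years younger (+) or older (-) than the league's average age."""
    return league_avg_age - player_age


def age_adjust(value, player_age, league_avg_age, per_year, higher_is_better=True):
    """Linear age adjustment: shift `value` by `per_year` for each year of age gap.

    For stats where lower is better (such as K%), youth lowers the adjusted
    value, crediting the player for producing it against older competition.
    """
    shift = age_gap(player_age, league_avg_age) * per_year
    return value + shift if higher_is_better else value - shift


# Prospect from the exercise: age 20 in a High-A league averaging 22.8
pa, bb_rate, k_rate = 285, 0.122, 0.245
print("Age gap:", round(age_gap(20, 22.8), 1))  # years younger than league
print("Approx. BB, K:", round(pa * bb_rate), round(pa * k_rate))
print("ISO check:", round(0.485 - 0.275, 3))  # SLG - AVG, should match .210
```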
Breakout Candidate Identification
**Prospect Comparison**:
| Metric | Prospect A | Prospect B | Prospect C |
|--------|-----------|-----------|-----------|
| Current wRC+ | 105 | 118 | 98 |
| Chase Rate Change | -4.5% | -1.2% | +2.1% |
| Zone Contact Change | +3.2% | +1.8% | -0.5% |
| Avg EV Change | +2.1 mph | +0.8 mph | +3.5 mph |
| Barrel Rate | 8.5% | 11.2% | 6.8% |
| Age | 22 | 24 | 21 |
**Questions**:
1. Calculate a composite breakout score for each prospect
2. Which prospect shows the most promising leading indicators?
3. What specific improvements drive your choice?
4. What realistic wRC+ would you project for each prospect next season?
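One way to build item 1's composite score:
1. Standardize each metric across the three prospects (z-scores).
2. Flip the sign where lower is better (chase-rate change and age).
3. Take a weighted sum.

The equal weights, the inclusion of current wRC+, and the helper names are assumptions. With only three players the z-scores are coarse, so treat the result as a ranking aid rather than a forecast. The metric values are copied from the table above.

```python
import statistics

# Values from the comparison table; changes are in percentage points or mph.
PROSPECTS = {
    "A": {"wrc_plus": 105, "chase_chg": -4.5, "zone_contact_chg": 3.2,
          "ev_chg": 2.1, "barrel_pct": 8.5, "age": 22},
    "B": {"wrc_plus": 118, "chase_chg": -1.2, "zone_contact_chg": 1.8,
          "ev_chg": 0.8, "barrel_pct": 11.2, "age": 24},
    "C": {"wrc_plus": 98, "chase_chg": 2.1, "zone_contact_chg": -0.5,
          "ev_chg": 3.5, "barrel_pct": 6.8, "age": 21},
}

# +1 where higher is better, -1 where lower is better (less chasing, younger).
DIRECTION = {"wrc_plus": 1, "chase_chg": -1, "zone_contact_chg": 1,
             "ev_chg": 1, "barrel_pct": 1, "age": -1}


def breakout_scores(prospects, direction, weights=None):
    """Weighted sum of direction-adjusted z-scores (population SD) per prospect."""
    if weights is None:
        weights = {metric: 1.0 for metric in direction}  # assumed equal weights
    scores = {name: 0.0 for name in prospects}
    for metric, sign in direction.items():
        values = [p[metric] for p in prospects.values()]
        mean, sd = statistics.fmean(values), statistics.pstdev(values)
        for name, p in prospects.items():
            z = (p[metric] - mean) / sd if sd else 0.0
            scores[name] += weights.get(metric, 0.0) * sign * z
    return scores


for name, score in sorted(breakout_scores(PROSPECTS, DIRECTION).items()):
    print(name, round(score, 2))
```

Dropping `wrc_plus` from the weights (or setting its weight to 0) scores only the change metrics, which is closer to the idea of leading indicators in question 2.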
International Prospect Translation
**Player Data**:
```
Player: OF, Age 26
League: KBO (Korean Baseball Organization)
Stats: .318 AVG, .385 OBP, .538 SLG, 28 HR, 550 PA
9.5% BB%, 15.2% K%, .220 ISO
Previous MLB exposure: None
```
**Questions**:
1. Translate the KBO statistics to MLB equivalents using appropriate league factors
2. What MLB slash line would you project for Year 1?
3. What is the biggest risk factor in this projection?
4. How would your projection change if the player were age 23 instead of 26?
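Item 1 is often handled with multiplicative translation factors, one per stat. This minimal sketch works under that assumption. The factor values are inputs you would take from your own research or a published KBO-to-MLB translation study; none are supplied here. `translate_line` and `slash` are hypothetical helpers. Factors below 1 for AVG, OBP, and SLG, and above 1 for K%, would express the step up in competition.

```python
# KBO line from the exercise (BB% and K% in percent)
KBO_LINE = {"AVG": 0.318, "OBP": 0.385, "SLG": 0.538, "BB%": 9.5, "K%": 15.2}


def translate_line(line, factors):
    """Multiply each stat by its translation factor; unlisted stats keep factor 1.

    ISO is recomputed from the translated SLG and AVG instead of being scaled
    on its own, so the slash line stays internally consistent.
    """
    out = {stat: value * factors.get(stat, 1.0) for stat, value in line.items()}
    out["ISO"] = out["SLG"] - out["AVG"]
    return out


def slash(line):
    """Format AVG/OBP/SLG the way the exercise presents them (e.g. .318/.385/.538)."""
    return "/".join(f"{line[s]:.3f}".lstrip("0") for s in ("AVG", "OBP", "SLG"))


# Placeholder factors (none given) reproduce the KBO line; substitute researched values.
print(slash(translate_line(KBO_LINE, {})))
```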