Chapter 12: Building Interactive Applications

The power of baseball analytics increases dramatically when insights become interactive. Static reports and visualizations have their place, but interactive applications allow users to explore data dynamically, ask their own questions, and discover patterns at their own pace. This chapter teaches you to build professional-quality web applications using three leading frameworks: Shiny for R, Streamlit for Python, and Dash by Plotly. By chapter's end, you'll have created deployable MLB analytic...

What You'll Learn
  • Introduction to Shiny (R)
  • Streamlit (Python)
  • Dash by Plotly
  • Deployment Options

12.1 Introduction to Shiny (R)

Shiny, developed by Posit (formerly RStudio), has become the dominant framework for building interactive web applications in R. Released in 2012, Shiny allows R programmers to create sophisticated web apps without learning HTML, CSS, or JavaScript. Every Shiny app contains two essential components: a user interface (UI) that controls appearance and layout, and a server function that contains the reactive logic powering interactivity.

12.1.1 Shiny Basics

Understanding Shiny's structure and reactive programming model is essential before building complex applications. Let's start with a minimal but complete Shiny app that demonstrates core concepts.

# app.R - Minimal Shiny Application
library(shiny)

# Define UI
ui <- fluidPage(
  titlePanel("MLB Home Run Calculator"),

  sidebarLayout(
    sidebarPanel(
      numericInput("ab",
                   "At Bats:",
                   value = 500,
                   min = 1,
                   max = 700),
      sliderInput("hr_rate",
                  "HR Rate (%):",
                  value = 5,
                  min = 0,
                  max = 15,
                  step = 0.1)
    ),

    mainPanel(
      h3("Projected Home Runs"),
      textOutput("hr_projection"),
      br(),
      plotOutput("hr_plot")
    )
  )
)

# Define Server
server <- function(input, output, session) {

  # Reactive calculation
  projected_hrs <- reactive({
    input$ab * (input$hr_rate / 100)
  })

  # Text output
  output$hr_projection <- renderText({
    paste0("Expected HRs: ", round(projected_hrs(), 1))
  })

  # Plot output
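  output$hr_plot <- renderPlot({
    # A zero or missing projection gives sd = 0 and a non-finite density,
    # so wait for a positive value before plotting
    req(projected_hrs() > 0)
    hr_values <- seq(0, projected_hrs() * 2, length.out = 100)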
    density_values <- dnorm(hr_values,
                            mean = projected_hrs(),
                            sd = sqrt(projected_hrs()))

    plot(hr_values, density_values,
         type = "l",
         lwd = 2,
         col = "darkblue",
         xlab = "Home Runs",
         ylab = "Probability Density",
         main = "Distribution of Possible HR Outcomes")
    abline(v = projected_hrs(),
           col = "red",
           lwd = 2,
           lty = 2)
  })
}

# Run the application
shinyApp(ui = ui, server = server)

This simple app illustrates Shiny's fundamental architecture. The UI defines two input widgets (numericInput and sliderInput) and two outputs (textOutput and plotOutput). The server function contains one reactive expression, projected_hrs(), and two render functions that depend on it. When a user edits the at-bats field or moves the HR-rate slider, Shiny automatically updates both the text projection and the plot, with no explicit event handling required.
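
The arithmetic behind the app can be checked outside Shiny. The sketch below reproduces the projection for the default inputs and compares the standard deviation used in the plot, sqrt(expected HRs), with the binomial standard deviation. Treating each at-bat as an independent trial is an assumption made here for illustration; the variable names are not part of the app.

```r
# Default inputs from the app
ab <- 500
hr_rate <- 5                     # percent, as shown on the slider

p <- hr_rate / 100
expected_hr <- ab * p            # projection shown in the text output

# Spread used by the app's dnorm() call (a Poisson-style approximation)
plot_sd <- sqrt(expected_hr)

# Spread if every at-bat were an independent trial with probability p
binomial_sd <- sqrt(ab * p * (1 - p))

cat("Expected HRs:", expected_hr, "\n")
cat("Plot sd:", round(plot_sd, 3), " Binomial sd:", round(binomial_sd, 3), "\n")
```

For small home run rates the two values are close, which is why the simpler square-root form is a reasonable choice for a quick visual.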

Reactive Programming Concept: Shiny's reactive programming model is its most powerful feature and, at first, its most confusing. In traditional programming, code executes sequentially from top to bottom. In reactive programming, what runs, and when, is determined by the dependencies between reactive elements.

Consider our example: projected_hrs() is a reactive expression that depends on input$ab and input$hr_rate. Whenever either input changes, Shiny automatically knows to re-execute projected_hrs(). Similarly, output$hr_projection and output$hr_plot depend on projected_hrs(), so they automatically re-render when it changes. You don't write code to detect changes or trigger updates—Shiny's reactive framework handles this automatically.

Three types of reactive elements exist in Shiny:

  1. Reactive sources: User inputs (input$ab, input$hr_rate) that users can change
  2. Reactive conductors: Reactive expressions created with reactive() that depend on other reactives and cache their results
  3. Reactive endpoints: Outputs (output$hr_projection) and observers that perform actions when their dependencies change

Understanding the reactive graph—the network of dependencies between these elements—is crucial for building complex applications efficiently. Reactive expressions automatically cache their results and only re-execute when their dependencies change, making apps responsive even with expensive computations.
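
Caching can be observed directly from the R console. The following is a minimal sketch, assuming the shiny package is installed: reactiveVal() objects stand in for input$ab and input$hr_rate, isolate() reads a reactive outside an app, and run_count is a counter added here purely to show how often the reactive body executes.

```r
library(shiny)

run_count <- 0                 # illustration only: counts executions of the body

ab <- reactiveVal(500)         # stands in for input$ab
hr_rate <- reactiveVal(5)      # stands in for input$hr_rate

projected_hrs <- reactive({
  run_count <<- run_count + 1
  ab() * hr_rate() / 100
})

first  <- isolate(projected_hrs())   # body executes
second <- isolate(projected_hrs())   # no dependency changed, cached value reused
runs_before_change <- run_count

ab(600)                              # change a reactive source
third <- isolate(projected_hrs())    # dependency changed, body executes again
runs_after_change <- run_count
```

Comparing runs_before_change with runs_after_change shows that the second read did not recompute anything, while changing ab invalidated the cached value.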

12.1.2 Project: Player Dashboard

Now let's build a practical MLB player dashboard that lets users look up any hitter with at least 100 at-bats in a season since 2015, view their statistics, and see visualizations. This complete application demonstrates Shiny's power for real baseball analytics.

# player_dashboard.R - Complete MLB Player Dashboard
library(shiny)
library(dplyr)
library(ggplot2)
library(Lahman)  # MLB historical database

# Prepare data
batting_data <- Lahman::Batting %>%
  filter(yearID >= 2015) %>%
  left_join(Lahman::People %>%
              select(playerID, nameFirst, nameLast),
            by = "playerID") %>%
  mutate(
    full_name = paste(nameFirst, nameLast),
    BA = round(H / AB, 3),
    OBP = round((H + BB + HBP) / (AB + BB + HBP + SF), 3),
    SLG = round((H + X2B + 2*X3B + 3*HR) / AB, 3)
  ) %>%
  filter(AB >= 100)  # Minimum 100 AB for meaningful stats

# UI
ui <- fluidPage(
  titlePanel("MLB Player Statistics Dashboard"),

  sidebarLayout(
    sidebarPanel(
      selectInput("player",
                  "Select Player:",
                  choices = sort(unique(batting_data$full_name)),
                  selected = "Mike Trout"),

      checkboxGroupInput("stats_display",
                        "Statistics to Display:",
                        choices = c("Batting Average" = "BA",
                                   "On-Base Percentage" = "OBP",
                                   "Slugging Percentage" = "SLG",
                                   "Home Runs" = "HR",
                                   "RBIs" = "RBI",
                                   "Stolen Bases" = "SB"),
                        selected = c("BA", "OBP", "SLG", "HR")),

      hr(),

      h4("Dashboard Info"),
      p("Data: 2015-2022 seasons"),
      p("Minimum 100 AB per season"),
      p("Source: Lahman database")
    ),

    mainPanel(
      tabsetPanel(
        tabPanel("Season Statistics",
                 br(),
                 tableOutput("season_stats")),

        tabPanel("Career Trends",
                 br(),
                 plotOutput("trend_plot", height = "500px")),

        tabPanel("Performance Distribution",
                 br(),
                 plotOutput("distribution_plot", height = "500px")),

        tabPanel("Career Summary",
                 br(),
                 verbatimTextOutput("career_summary"))
      )
    )
  )
)

# Server
server <- function(input, output, session) {

  # Reactive: Filter data for selected player
  player_data <- reactive({
    batting_data %>%
      filter(full_name == input$player) %>%
      arrange(yearID)
  })

  # Reactive: Calculate career totals
  career_totals <- reactive({
    data <- player_data()

    list(
      seasons = nrow(data),
      total_ab = sum(data$AB, na.rm = TRUE),
      total_h = sum(data$H, na.rm = TRUE),
      total_hr = sum(data$HR, na.rm = TRUE),
      total_rbi = sum(data$RBI, na.rm = TRUE),
      total_sb = sum(data$SB, na.rm = TRUE),
      avg_ba = round(sum(data$H, na.rm = TRUE) /
                      sum(data$AB, na.rm = TRUE), 3),
      best_hr_season = max(data$HR, na.rm = TRUE),
      best_hr_year = data$yearID[which.max(data$HR)]
    )
  })

  # Output: Season statistics table
  output$season_stats <- renderTable({
    data <- player_data()

    # Select columns based on user choice
    base_cols <- c("yearID", "teamID", "G", "AB", "H")
    selected_cols <- c(base_cols, input$stats_display)

    data %>%
      select(all_of(selected_cols)) %>%
      rename(Year = yearID, Team = teamID, Games = G,
             `At Bats` = AB, Hits = H)
  }, striped = TRUE, hover = TRUE, bordered = TRUE)

  # Output: Career trend plot
  output$trend_plot <- renderPlot({
    req(input$stats_display)  # Require at least one stat selected

    data <- player_data()

    # Prepare data for plotting
    plot_data <- data %>%
      select(yearID, all_of(input$stats_display)) %>%
      tidyr::pivot_longer(cols = -yearID,
                         names_to = "stat",
                         values_to = "value")

    ggplot(plot_data, aes(x = yearID, y = value, color = stat)) +
      geom_line(linewidth = 1.2) +  # linewidth replaces size for lines (ggplot2 >= 3.4.0)
      geom_point(size = 3) +
      facet_wrap(~ stat, scales = "free_y", ncol = 2) +
      theme_minimal(base_size = 14) +
      theme(legend.position = "none",
            strip.background = element_rect(fill = "lightblue",
                                          color = "black"),
            strip.text = element_text(face = "bold")) +
      labs(x = "Season",
           y = "Value",
           title = paste("Career Trends:", input$player)) +
      scale_x_continuous(breaks = seq(2015, 2025, 2))
  })

  # Output: Distribution plot
  output$distribution_plot <- renderPlot({
    data <- player_data()

    par(mfrow = c(2, 2))

    # Batting average distribution
    if("BA" %in% input$stats_display) {
      hist(data$BA,
           col = "skyblue",
           border = "white",
           main = "Batting Average Distribution",
           xlab = "BA",
           breaks = 10)
      abline(v = mean(data$BA, na.rm = TRUE),
             col = "red",
             lwd = 2,
             lty = 2)
    }

    # Home runs distribution
    if("HR" %in% input$stats_display) {
      barplot(data$HR,
              names.arg = data$yearID,
              col = "coral",
              border = "white",
              main = "Home Runs by Season",
              xlab = "Season",
              ylab = "Home Runs")
    }

    # OBP distribution
    if("OBP" %in% input$stats_display) {
      hist(data$OBP,
           col = "lightgreen",
           border = "white",
           main = "OBP Distribution",
           xlab = "OBP",
           breaks = 10)
      abline(v = mean(data$OBP, na.rm = TRUE),
             col = "red",
             lwd = 2,
             lty = 2)
    }

    # SLG distribution
    if("SLG" %in% input$stats_display) {
      hist(data$SLG,
           col = "lavender",
           border = "white",
           main = "SLG Distribution",
           xlab = "SLG",
           breaks = 10)
      abline(v = mean(data$SLG, na.rm = TRUE),
             col = "red",
             lwd = 2,
             lty = 2)
    }
  })

  # Output: Career summary
  output$career_summary <- renderPrint({
    totals <- career_totals()

    cat("CAREER SUMMARY:", input$player, "\n")
    cat(strrep("=", 50), "\n\n")
    cat("Seasons Played:", totals$seasons, "\n")
    cat("Total At Bats:", totals$total_ab, "\n")
    cat("Total Hits:", totals$total_h, "\n")
    cat("Total Home Runs:", totals$total_hr, "\n")
    cat("Total RBIs:", totals$total_rbi, "\n")
    cat("Total Stolen Bases:", totals$total_sb, "\n\n")
    cat("Career Batting Average:", totals$avg_ba, "\n")
    cat("Best HR Season:", totals$best_hr_season,
        "in", totals$best_hr_year, "\n")
  })
}

# Run application
shinyApp(ui = ui, server = server)

This dashboard demonstrates several advanced Shiny concepts. The selectInput dropdown automatically populates with all available players from the dataset. The checkboxGroupInput allows users to customize which statistics appear in tables and plots. The tabsetPanel organizes different views—season statistics, career trends, distributions, and summary—into separate tabs for clean organization.

Notice how reactive expressions efficiently structure the logic. The player_data() reactive filters the full dataset once, and multiple outputs reuse this filtered data. The career_totals() reactive computes its aggregates once and caches the result until the selected player changes. This reactive architecture keeps the app responsive even with larger datasets. Keep in mind that batting_data holds only rows from 2015 onward with at least 100 at-bats, so the "career" totals cover just those rows rather than a player's full career.

The career trends plot uses facet_wrap() to create small multiples of each selected statistic, allowing easy comparison across years. The distribution plot conditionally displays different visualizations based on which statistics the user selected. This kind of dynamic, user-driven visualization would be tedious to create in static reports but becomes straightforward with Shiny.
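
The aggregation inside career_totals() can also live in a plain function, which makes it easy to check without launching the app. The sketch below uses base R and a small made-up data frame. The function name summarise_career and the toy numbers are assumptions for illustration. Unlike the app, it counts distinct years rather than rows, because the Lahman Batting table stores one row per player, year, and stint, so a traded player has several rows in one season.

```r
# Summarise a player's rows; returns NULL when there is nothing to summarise
summarise_career <- function(data) {
  if (nrow(data) == 0) return(NULL)

  total_ab <- sum(data$AB, na.rm = TRUE)
  total_h  <- sum(data$H, na.rm = TRUE)

  list(
    seasons      = length(unique(data$yearID)),  # distinct years, not rows
    total_ab     = total_ab,
    total_h      = total_h,
    total_hr     = sum(data$HR, na.rm = TRUE),
    # Career average is total hits over total at-bats, not a mean of averages
    avg_ba       = if (total_ab > 0) round(total_h / total_ab, 3) else NA_real_,
    best_hr_year = data$yearID[which.max(data$HR)]
  )
}

# Made-up rows: one full season, then a season split across two stints
toy <- data.frame(
  yearID = c(2019, 2020, 2020),
  AB     = c(500, 100, 150),
  H      = c(150, 20, 45),
  HR     = c(30, 5, 10)
)

career <- summarise_career(toy)
unweighted_ba <- round(mean(toy$H / toy$AB), 3)  # shown only for comparison
```

Inside the app, the reactive would then reduce to career_totals <- reactive(summarise_career(player_data())), keeping the reactive layer thin and the calculation testable.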

12.1.3 Project: Pitch Tracker

Pitch location analysis is central to modern baseball analytics. This application visualizes pitch locations, allowing users to filter by pitcher, pitch type, and outcome.

# pitch_tracker.R - Interactive Pitch Location Visualization
library(shiny)
library(ggplot2)
library(dplyr)

# Simulate pitch data (in production, load from Statcast or similar)
set.seed(42)
generate_pitch_data <- function(n = 500) {
  pitchers <- c("Gerrit Cole", "Jacob deGrom", "Shane Bieber",
                "Walker Buehler", "Corbin Burnes")
  pitch_types <- c("Fastball", "Slider", "Curveball", "Changeup")
  outcomes <- c("Ball", "Called Strike", "Swinging Strike",
                "Foul", "In Play")

  data.frame(
    pitcher = sample(pitchers, n, replace = TRUE),
    pitch_type = sample(pitch_types, n, replace = TRUE,
                       prob = c(0.5, 0.2, 0.15, 0.15)),
    plate_x = rnorm(n, 0, 0.7),  # Horizontal location (ft from center)
    plate_z = rnorm(n, 2.5, 0.5), # Vertical location (ft from ground)
    velocity = rnorm(n, 93, 5),
    spin_rate = rnorm(n, 2300, 300),
    outcome = sample(outcomes, n, replace = TRUE,
                    prob = c(0.3, 0.2, 0.15, 0.2, 0.15))
  )
}

pitch_data <- generate_pitch_data(1000)

# UI
ui <- fluidPage(
  titlePanel("MLB Pitch Tracker"),

  sidebarLayout(
    sidebarPanel(
      selectInput("pitcher_select",
                  "Select Pitcher:",
                  choices = c("All Pitchers",
                             unique(pitch_data$pitcher)),
                  selected = "All Pitchers"),

      checkboxGroupInput("pitch_type_select",
                        "Pitch Types:",
                        choices = unique(pitch_data$pitch_type),
                        selected = unique(pitch_data$pitch_type)),

      checkboxGroupInput("outcome_select",
                        "Outcomes:",
                        choices = unique(pitch_data$outcome),
                        selected = unique(pitch_data$outcome)),

      hr(),

      sliderInput("velocity_range",
                  "Velocity Range (mph):",
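                  # Round outward so the default range covers every pitch
                  # and the endpoints line up with step = 1
                  min = floor(min(pitch_data$velocity)),
                  max = ceiling(max(pitch_data$velocity)),
                  value = c(floor(min(pitch_data$velocity)),
                           ceiling(max(pitch_data$velocity))),
                  step = 1),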

      hr(),

      h4("Pitch Counts"),
      tableOutput("pitch_counts")
    ),

    mainPanel(
      plotOutput("pitch_location_plot", height = "600px"),
      hr(),
      h4("Summary Statistics"),
      verbatimTextOutput("pitch_summary")
    )
  )
)

# Server
server <- function(input, output, session) {

  # Reactive: Filtered pitch data
  filtered_data <- reactive({
    data <- pitch_data

    # Filter by pitcher
    if(input$pitcher_select != "All Pitchers") {
      data <- data %>% filter(pitcher == input$pitcher_select)
    }

    # Filter by pitch type
    data <- data %>%
      filter(pitch_type %in% input$pitch_type_select)

    # Filter by outcome
    data <- data %>%
      filter(outcome %in% input$outcome_select)

    # Filter by velocity
    data <- data %>%
      filter(velocity >= input$velocity_range[1],
             velocity <= input$velocity_range[2])

    data
  })

  # Output: Pitch location plot
  output$pitch_location_plot <- renderPlot({
    data <- filtered_data()

    # Create strike zone rectangle
    strike_zone <- data.frame(
      x = c(-0.83, 0.83, 0.83, -0.83, -0.83),
      y = c(1.5, 1.5, 3.5, 3.5, 1.5)
    )

    ggplot(data, aes(x = plate_x, y = plate_z)) +
      # Strike zone
      geom_path(data = strike_zone,
                aes(x = x, y = y),
                color = "black",
                linewidth = 1.5) +
      # Home plate (annotate() draws one segment instead of one per pitch)
      annotate("segment", x = -0.83, xend = 0.83, y = 0, yend = 0,
               color = "gray30", linewidth = 2) +
      # Pitch locations
      geom_point(aes(color = pitch_type, shape = outcome),
                 size = 4, alpha = 0.6) +
      # Customization
      coord_fixed(ratio = 1) +
      theme_minimal(base_size = 14) +
      theme(panel.grid.major = element_line(color = "gray90"),
            panel.grid.minor = element_blank(),
            legend.position = "right") +
      labs(x = "Horizontal Location (ft from center)",
           y = "Height (ft from ground)",
           title = paste("Pitch Locations:",
                        ifelse(input$pitcher_select == "All Pitchers",
                              "All Pitchers",
                              input$pitcher_select)),
           color = "Pitch Type",
           shape = "Outcome") +
      scale_color_brewer(palette = "Set1") +
      xlim(-2.5, 2.5) +
      ylim(0, 5)
  })

  # Output: Pitch counts table
  output$pitch_counts <- renderTable({
    filtered_data() %>%
      group_by(pitch_type) %>%
      summarise(Count = n(),
                `Avg Velo` = round(mean(velocity), 1),
                .groups = "drop") %>%
      arrange(desc(Count))
  }, striped = TRUE)

  # Output: Summary statistics
  output$pitch_summary <- renderPrint({
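    data <- filtered_data()

    # Show a message when the filters exclude every pitch; otherwise
    # mean() returns NaN and min()/max() return Inf/-Inf with warnings
    validate(need(nrow(data) > 0, "No pitches match the current filters."))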

    cat("FILTERED PITCH SUMMARY\n")
    cat(strrep("=", 50), "\n\n")
    cat("Total Pitches:", nrow(data), "\n\n")

    cat("Average Velocity:", round(mean(data$velocity), 1), "mph\n")
    cat("Velocity Range:", round(min(data$velocity), 1), "-",
        round(max(data$velocity), 1), "mph\n\n")

    cat("Average Spin Rate:", round(mean(data$spin_rate), 0), "rpm\n\n")

    cat("Strike Zone Analysis:\n")
    in_zone <- sum(data$plate_x >= -0.83 & data$plate_x <= 0.83 &
                   data$plate_z >= 1.5 & data$plate_z <= 3.5)
    cat("  In Zone:", in_zone,
        paste0("(", round(100*in_zone/nrow(data), 1), "%)\n"))
    cat("  Outside Zone:", nrow(data) - in_zone,
        paste0("(", round(100*(nrow(data)-in_zone)/nrow(data), 1), "%)\n"))
  })
}

# Run application
shinyApp(ui = ui, server = server)

This pitch tracker demonstrates sophisticated filtering and visualization. Users can filter pitches by multiple criteria simultaneously—pitcher, pitch type, outcome, and velocity—and the plot updates instantly. The strike zone overlay provides critical context for evaluating pitch location quality.

The coord_fixed(ratio = 1) ensures the strike zone appears with correct proportions rather than being distorted by the plot window's aspect ratio. The combination of color (pitch type) and shape (outcome) allows users to see patterns in both dimensions simultaneously. With real pitch-tracking data you might notice, for example, that curveballs cluster lower in the zone and generate more swinging strikes, or that high fastballs produce more foul balls. The simulated data used here draws pitch type, location, and outcome independently, so it will not show such patterns.
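
The in-zone test used in the summary output can be pulled into a small vectorized helper. This is a sketch under the same assumptions as the app: a fixed zone 0.83 ft either side of center and from 1.5 ft to 3.5 ft high. The function names and the five sample pitches are made up for illustration.

```r
# TRUE for each pitch that crosses the plate inside the fixed zone
in_strike_zone <- function(plate_x, plate_z,
                           half_width = 0.83, bottom = 1.5, top = 3.5) {
  abs(plate_x) <= half_width & plate_z >= bottom & plate_z <= top
}

# Count and percentage of in-zone pitches, safe for an empty selection
zone_summary <- function(plate_x, plate_z) {
  n <- length(plate_x)
  if (n == 0) return(c(in_zone = 0, in_zone_pct = NA_real_))
  hits <- sum(in_strike_zone(plate_x, plate_z))
  c(in_zone = hits, in_zone_pct = round(100 * hits / n, 1))
}

# Five made-up pitches: middle, too high, too far right, too far left, bottom edge
px <- c(0.0, 0.5, 1.2, -0.9, 0.2)
pz <- c(2.5, 3.6, 2.0,  2.0, 1.5)

zone_result <- zone_summary(px, pz)
print(zone_result)
```

A real strike zone varies with the batter's stance and height, so a fixed rectangle is only an approximation.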

12.1.4 Advanced Shiny Features

As your applications grow more complex, several advanced Shiny features become essential for maintaining code quality and performance.

Reactive Values: Sometimes you need to store state that isn't tied to an input widget. The reactiveValues() function creates a list of reactive values you can read and write programmatically.

server <- function(input, output, session) {
  # Create reactive values object
  values <- reactiveValues(
    click_count = 0,
    last_player = NULL,
    analysis_cache = list()
  )

  # Update values when button clicked
  observeEvent(input$analyze_button, {
    values$click_count <- values$click_count + 1
    values$last_player <- input$player_select
  })

  # Use reactive values in outputs
  output$click_display <- renderText({
    paste("Analyses run:", values$click_count)
  })
}

Reactive values enable patterns like undo/redo functionality, maintaining analysis history, or implementing custom validation logic that depends on multiple inputs.
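
As one illustration of the history pattern, the sketch below keeps a list of analyzed players in a reactiveValues object and removes the most recent entry on undo. It assumes the shiny package is installed and runs at the console by wrapping reads in isolate(). The helper names record_analysis and undo_last and the player labels are hypothetical. In an app, the same bodies would sit inside observeEvent() handlers.

```r
library(shiny)

values <- reactiveValues(click_count = 0, history = character(0))

# Append a player to the history and bump the counter
record_analysis <- function(values, player) {
  isolate({
    values$click_count <- values$click_count + 1
    values$history <- c(values$history, player)
  })
  invisible(values)
}

# Drop the most recent history entry, if there is one
undo_last <- function(values) {
  isolate({
    n <- length(values$history)
    if (n > 0) {
      values$history <- values$history[-n]
    }
  })
  invisible(values)
}

record_analysis(values, "Player A")
record_analysis(values, "Player B")
undo_last(values)

history_now <- isolate(values$history)
clicks_now  <- isolate(values$click_count)
```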

Observe and observeEvent: While reactive expressions return values and cache results, observers execute side effects. Use observe() for effects that should run whenever any dependency changes, and observeEvent() for effects triggered by specific inputs.

server <- function(input, output, session) {
  # Update choices when category changes
  # (get_players_by_category() is a placeholder for your own lookup function)
  observeEvent(input$category, {
    new_choices <- get_players_by_category(input$category)
    updateSelectInput(session, "player", choices = new_choices)
  })

  # Save settings whenever they change
  observe({
    settings <- list(
      theme = input$theme,
      stats = input$stats_display,
      filters = input$filters
    )
    saveRDS(settings, "user_settings.rds")
  })
}

The updateSelectInput() function demonstrates programmatically changing input widgets—here updating the player dropdown based on category selection. This creates hierarchical filtering patterns common in complex applications.

Modules Basics: As applications grow, organizing code becomes crucial. Shiny modules allow you to create reusable components with their own namespace, preventing ID conflicts and enabling composition.

# Module UI function
playerCardUI <- function(id) {
  ns <- NS(id)  # Namespace function

  tagList(
    selectInput(ns("player"), "Select Player:", choices = NULL),
    tableOutput(ns("stats")),
    plotOutput(ns("plot"))
  )
}

# Module server function
playerCardServer <- function(id, player_data) {
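  moduleServer(id, function(input, output, session) {
    # Fill the dropdown once data is available (assumes a `name` column)
    observe({
      updateSelectInput(session, "player",
                        choices = sort(unique(player_data()$name)))
    })
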
    output$stats <- renderTable({
      player_data() %>% filter(name == input$player)
    })

    output$plot <- renderPlot({
      # Plot code here
    })
  })
}

# Use module in main app
ui <- fluidPage(
  playerCardUI("player1"),
  playerCardUI("player2")
)

server <- function(input, output, session) {
  # load_player_data() is a placeholder for your own data-loading function
  data <- reactive({ load_player_data() })

  playerCardServer("player1", data)
  playerCardServer("player2", data)
}

Modules are essential for large applications. You might create modules for different analysis types (hitting, pitching, fielding), different visualization styles (tables, plots, maps), or different user roles (coach view, scout view, analyst view). Each module encapsulates its logic and can be developed and tested independently.
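
Independent testing can be sketched with shiny::testServer(), which runs a module's server logic without a browser. The example below is a simplified variant of playerCardServer, not the version above: it uses base R subsetting, returns its filtered reactive so the test can read it, and works on a made-up data frame with assumed name and HR columns.

```r
library(shiny)

# Simplified module server: filter shared data by the module's own input
playerCardServer <- function(id, player_data) {
  moduleServer(id, function(input, output, session) {
    selected <- reactive({
      req(input$player)                        # wait until a player is chosen
      df <- player_data()
      df[df$name == input$player, , drop = FALSE]
    })

    output$stats <- renderTable(selected())

    selected  # returned so callers and tests can read the filtered rows
  })
}

# Made-up data for the check
toy_data <- data.frame(name = c("Player A", "Player B"), HR = c(30, 12))

# Drive the module as a user would: set an input, then inspect the result
testServer(playerCardServer, args = list(player_data = reactive(toy_data)), {
  session$setInputs(player = "Player B")
  stopifnot(nrow(selected()) == 1)
  stopifnot(selected()$HR == 12)
})
```

Because the module receives its data as an argument, the same check works with any data frame that has the assumed columns.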

R
# app.R - Minimal Shiny Application
library(shiny)

# Define UI
ui <- fluidPage(
  titlePanel("MLB Home Run Calculator"),

  sidebarLayout(
    sidebarPanel(
      numericInput("ab",
                   "At Bats:",
                   value = 500,
                   min = 1,
                   max = 700),
      sliderInput("hr_rate",
                  "HR Rate (%):",
                  value = 5,
                  min = 0,
                  max = 15,
                  step = 0.1)
    ),

    mainPanel(
      h3("Projected Home Runs"),
      textOutput("hr_projection"),
      br(),
      plotOutput("hr_plot")
    )
  )
)

# Define Server
server <- function(input, output, session) {

  # Reactive calculation
  projected_hrs <- reactive({
    input$ab * (input$hr_rate / 100)
  })

  # Text output
  output$hr_projection <- renderText({
    paste0("Expected HRs: ", round(projected_hrs(), 1))
  })

  # Plot output
  output$hr_plot <- renderPlot({
    hr_values <- seq(0, projected_hrs() * 2, length.out = 100)
    density_values <- dnorm(hr_values,
                            mean = projected_hrs(),
                            sd = sqrt(projected_hrs()))

    plot(hr_values, density_values,
         type = "l",
         lwd = 2,
         col = "darkblue",
         xlab = "Home Runs",
         ylab = "Probability Density",
         main = "Distribution of Possible HR Outcomes")
    abline(v = projected_hrs(),
           col = "red",
           lwd = 2,
           lty = 2)
  })
}

# Run the application
shinyApp(ui = ui, server = server)
R
# player_dashboard.R - Complete MLB Player Dashboard
library(shiny)
library(dplyr)
library(ggplot2)
library(Lahman)  # MLB historical database

# Prepare data
batting_data <- Lahman::Batting %>%
  filter(yearID >= 2015) %>%
  left_join(Lahman::People %>%
              select(playerID, nameFirst, nameLast),
            by = "playerID") %>%
  mutate(
    full_name = paste(nameFirst, nameLast),
    BA = round(H / AB, 3),
    OBP = round((H + BB + HBP) / (AB + BB + HBP + SF), 3),
    SLG = round((H + X2B + 2*X3B + 3*HR) / AB, 3)
  ) %>%
  filter(AB >= 100)  # Minimum 100 AB for meaningful stats

# UI
ui <- fluidPage(
  titlePanel("MLB Player Statistics Dashboard"),

  sidebarLayout(
    sidebarPanel(
      selectInput("player",
                  "Select Player:",
                  choices = sort(unique(batting_data$full_name)),
                  selected = "Mike Trout"),

      checkboxGroupInput("stats_display",
                        "Statistics to Display:",
                        choices = c("Batting Average" = "BA",
                                   "On-Base Percentage" = "OBP",
                                   "Slugging Percentage" = "SLG",
                                   "Home Runs" = "HR",
                                   "RBIs" = "RBI",
                                   "Stolen Bases" = "SB"),
                        selected = c("BA", "OBP", "SLG", "HR")),

      hr(),

      h4("Dashboard Info"),
      p("Data: 2015-2022 seasons"),
      p("Minimum 100 AB per season"),
      p("Source: Lahman database")
    ),

    mainPanel(
      tabsetPanel(
        tabPanel("Season Statistics",
                 br(),
                 tableOutput("season_stats")),

        tabPanel("Career Trends",
                 br(),
                 plotOutput("trend_plot", height = "500px")),

        tabPanel("Performance Distribution",
                 br(),
                 plotOutput("distribution_plot", height = "500px")),

        tabPanel("Career Summary",
                 br(),
                 verbatimTextOutput("career_summary"))
      )
    )
  )
)

# Server
server <- function(input, output, session) {

  # Reactive: Filter data for selected player
  player_data <- reactive({
    batting_data %>%
      filter(full_name == input$player) %>%
      arrange(yearID)
  })

  # Reactive: Calculate career totals
  career_totals <- reactive({
    data <- player_data()

    list(
      seasons = nrow(data),
      total_ab = sum(data$AB, na.rm = TRUE),
      total_h = sum(data$H, na.rm = TRUE),
      total_hr = sum(data$HR, na.rm = TRUE),
      total_rbi = sum(data$RBI, na.rm = TRUE),
      total_sb = sum(data$SB, na.rm = TRUE),
      avg_ba = round(sum(data$H, na.rm = TRUE) /
                      sum(data$AB, na.rm = TRUE), 3),
      best_hr_season = max(data$HR, na.rm = TRUE),
      best_hr_year = data$yearID[which.max(data$HR)]
    )
  })

  # Output: Season statistics table
  output$season_stats <- renderTable({
    data <- player_data()

    # Select columns based on user choice
    base_cols <- c("yearID", "teamID", "G", "AB", "H")
    selected_cols <- c(base_cols, input$stats_display)

    data %>%
      select(all_of(selected_cols)) %>%
      rename(Year = yearID, Team = teamID, Games = G,
             `At Bats` = AB, Hits = H)
  }, striped = TRUE, hover = TRUE, bordered = TRUE)

  # Output: Career trend plot
  output$trend_plot <- renderPlot({
    req(input$stats_display)  # Require at least one stat selected

    data <- player_data()

    # Prepare data for plotting
    plot_data <- data %>%
      select(yearID, all_of(input$stats_display)) %>%
      tidyr::pivot_longer(cols = -yearID,
                         names_to = "stat",
                         values_to = "value")

    ggplot(plot_data, aes(x = yearID, y = value, color = stat)) +
      geom_line(size = 1.2) +
      geom_point(size = 3) +
      facet_wrap(~ stat, scales = "free_y", ncol = 2) +
      theme_minimal(base_size = 14) +
      theme(legend.position = "none",
            strip.background = element_rect(fill = "lightblue",
                                          color = "black"),
            strip.text = element_text(face = "bold")) +
      labs(x = "Season",
           y = "Value",
           title = paste("Career Trends:", input$player)) +
      scale_x_continuous(breaks = seq(2015, 2025, 2))
  })

  # Output: Distribution plot
  output$distribution_plot <- renderPlot({
    data <- player_data()

    par(mfrow = c(2, 2))

    # Batting average distribution
    if("BA" %in% input$stats_display) {
      hist(data$BA,
           col = "skyblue",
           border = "white",
           main = "Batting Average Distribution",
           xlab = "BA",
           breaks = 10)
      abline(v = mean(data$BA, na.rm = TRUE),
             col = "red",
             lwd = 2,
             lty = 2)
    }

    # Home runs distribution
    if("HR" %in% input$stats_display) {
      barplot(data$HR,
              names.arg = data$yearID,
              col = "coral",
              border = "white",
              main = "Home Runs by Season",
              xlab = "Season",
              ylab = "Home Runs")
    }

    # OBP distribution
    if("OBP" %in% input$stats_display) {
      hist(data$OBP,
           col = "lightgreen",
           border = "white",
           main = "OBP Distribution",
           xlab = "OBP",
           breaks = 10)
      abline(v = mean(data$OBP, na.rm = TRUE),
             col = "red",
             lwd = 2,
             lty = 2)
    }

    # SLG distribution
    if("SLG" %in% input$stats_display) {
      hist(data$SLG,
           col = "lavender",
           border = "white",
           main = "SLG Distribution",
           xlab = "SLG",
           breaks = 10)
      abline(v = mean(data$SLG, na.rm = TRUE),
             col = "red",
             lwd = 2,
             lty = 2)
    }
  })

  # Output: Career summary
  output$career_summary <- renderPrint({
    totals <- career_totals()

    cat("CAREER SUMMARY:", input$player, "\n")
    cat(strrep("=", 50), "\n\n")
    cat("Seasons Played:", totals$seasons, "\n")
    cat("Total At Bats:", totals$total_ab, "\n")
    cat("Total Hits:", totals$total_h, "\n")
    cat("Total Home Runs:", totals$total_hr, "\n")
    cat("Total RBIs:", totals$total_rbi, "\n")
    cat("Total Stolen Bases:", totals$total_sb, "\n\n")
    cat("Career Batting Average:", totals$avg_ba, "\n")
    cat("Best HR Season:", totals$best_hr_season,
        "in", totals$best_hr_year, "\n")
  })
}

# Run application
shinyApp(ui = ui, server = server)
R
# pitch_tracker.R - Interactive Pitch Location Visualization
library(shiny)
library(ggplot2)
library(dplyr)

# Simulate pitch data (in production, load from Statcast or similar)
set.seed(42)
generate_pitch_data <- function(n = 500) {
  pitchers <- c("Gerrit Cole", "Jacob deGrom", "Shane Bieber",
                "Walker Buehler", "Corbin Burnes")
  pitch_types <- c("Fastball", "Slider", "Curveball", "Changeup")
  outcomes <- c("Ball", "Called Strike", "Swinging Strike",
                "Foul", "In Play")

  data.frame(
    pitcher = sample(pitchers, n, replace = TRUE),
    pitch_type = sample(pitch_types, n, replace = TRUE,
                       prob = c(0.5, 0.2, 0.15, 0.15)),
    plate_x = rnorm(n, 0, 0.7),  # Horizontal location (ft from center)
    plate_z = rnorm(n, 2.5, 0.5), # Vertical location (ft from ground)
    velocity = rnorm(n, 93, 5),
    spin_rate = rnorm(n, 2300, 300),
    outcome = sample(outcomes, n, replace = TRUE,
                    prob = c(0.3, 0.2, 0.15, 0.2, 0.15))
  )
}

pitch_data <- generate_pitch_data(1000)

# UI
ui <- fluidPage(
  titlePanel("MLB Pitch Tracker"),

  sidebarLayout(
    sidebarPanel(
      selectInput("pitcher_select",
                  "Select Pitcher:",
                  choices = c("All Pitchers",
                             unique(pitch_data$pitcher)),
                  selected = "All Pitchers"),

      checkboxGroupInput("pitch_type_select",
                        "Pitch Types:",
                        choices = unique(pitch_data$pitch_type),
                        selected = unique(pitch_data$pitch_type)),

      checkboxGroupInput("outcome_select",
                        "Outcomes:",
                        choices = unique(pitch_data$outcome),
                        selected = unique(pitch_data$outcome)),

      hr(),

      sliderInput("velocity_range",
                  "Velocity Range (mph):",
                  # Round outward so the 1-mph step lands on whole numbers
                  min = floor(min(pitch_data$velocity)),
                  max = ceiling(max(pitch_data$velocity)),
                  value = c(floor(min(pitch_data$velocity)),
                           ceiling(max(pitch_data$velocity))),
                  step = 1),

      hr(),

      h4("Pitch Counts"),
      tableOutput("pitch_counts")
    ),

    mainPanel(
      plotOutput("pitch_location_plot", height = "600px"),
      hr(),
      h4("Summary Statistics"),
      verbatimTextOutput("pitch_summary")
    )
  )
)

# Server
server <- function(input, output, session) {

  # Reactive: Filtered pitch data
  filtered_data <- reactive({
    data <- pitch_data

    # Filter by pitcher
    if(input$pitcher_select != "All Pitchers") {
      data <- data %>% filter(pitcher == input$pitcher_select)
    }

    # Filter by pitch type
    data <- data %>%
      filter(pitch_type %in% input$pitch_type_select)

    # Filter by outcome
    data <- data %>%
      filter(outcome %in% input$outcome_select)

    # Filter by velocity
    data <- data %>%
      filter(velocity >= input$velocity_range[1],
             velocity <= input$velocity_range[2])

    data
  })

  # Output: Pitch location plot
  output$pitch_location_plot <- renderPlot({
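    data <- filtered_data()
    # Show a message instead of an empty plot when every pitch is filtered out
    validate(need(nrow(data) > 0, "No pitches match the current filters."))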

    # Create strike zone rectangle
    strike_zone <- data.frame(
      x = c(-0.83, 0.83, 0.83, -0.83, -0.83),
      y = c(1.5, 1.5, 3.5, 3.5, 1.5)
    )

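    ggplot(data, aes(x = plate_x, y = plate_z)) +
      # Strike zone (`linewidth` replaces `size` for lines in ggplot2 >= 3.4.0)
      geom_path(data = strike_zone,
                aes(x = x, y = y),
                color = "black",
                linewidth = 1.5) +
      # Home plate: annotate() draws one segment instead of one per data row
      annotate("segment", x = -0.83, xend = 0.83, y = 0, yend = 0,
               color = "gray30", linewidth = 2) +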
      # Pitch locations
      geom_point(aes(color = pitch_type, shape = outcome),
                 size = 4, alpha = 0.6) +
      # Customization
      coord_fixed(ratio = 1) +
      theme_minimal(base_size = 14) +
      theme(panel.grid.major = element_line(color = "gray90"),
            panel.grid.minor = element_blank(),
            legend.position = "right") +
      labs(x = "Horizontal Location (ft from center)",
           y = "Height (ft from ground)",
           title = paste("Pitch Locations:",
                        ifelse(input$pitcher_select == "All Pitchers",
                              "All Pitchers",
                              input$pitcher_select)),
           color = "Pitch Type",
           shape = "Outcome") +
      scale_color_brewer(palette = "Set1") +
      xlim(-2.5, 2.5) +
      ylim(0, 5)
  })

  # Output: Pitch counts table
  output$pitch_counts <- renderTable({
    filtered_data() %>%
      group_by(pitch_type) %>%
      summarise(Count = n(),
                `Avg Velo` = round(mean(velocity), 1),
                .groups = "drop") %>%
      arrange(desc(Count))
  }, striped = TRUE)

  # Output: Summary statistics
  output$pitch_summary <- renderPrint({
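    data <- filtered_data()
    # Avoid NaN/Inf output and division by zero when no pitches remain
    validate(need(nrow(data) > 0, "No pitches match the current filters."))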

    cat("FILTERED PITCH SUMMARY\n")
    cat(strrep("=", 50), "\n\n")
    cat("Total Pitches:", nrow(data), "\n\n")

    cat("Average Velocity:", round(mean(data$velocity), 1), "mph\n")
    cat("Velocity Range:", round(min(data$velocity), 1), "-",
        round(max(data$velocity), 1), "mph\n\n")

    cat("Average Spin Rate:", round(mean(data$spin_rate), 0), "rpm\n\n")

    cat("Strike Zone Analysis:\n")
    in_zone <- sum(data$plate_x >= -0.83 & data$plate_x <= 0.83 &
                   data$plate_z >= 1.5 & data$plate_z <= 3.5)
    cat("  In Zone:", in_zone,
        paste0("(", round(100*in_zone/nrow(data), 1), "%)\n"))
    cat("  Outside Zone:", nrow(data) - in_zone,
        paste0("(", round(100*(nrow(data)-in_zone)/nrow(data), 1), "%)\n"))
  })
}

# Run application
shinyApp(ui = ui, server = server)
R
server <- function(input, output, session) {
  # Create reactive values object
  values <- reactiveValues(
    click_count = 0,
    last_player = NULL,
    analysis_cache = list()
  )

  # Update values when button clicked
  observeEvent(input$analyze_button, {
    values$click_count <- values$click_count + 1
    values$last_player <- input$player_select
  })

  # Use reactive values in outputs
  output$click_display <- renderText({
    paste("Analyses run:", values$click_count)
  })
}
R
server <- function(input, output, session) {
  # Update choices when category changes
  observeEvent(input$category, {
    new_choices <- get_players_by_category(input$category)
    updateSelectInput(session, "player", choices = new_choices)
  })

  # Save settings whenever they change
  observe({
    settings <- list(
      theme = input$theme,
      stats = input$stats_display,
      filters = input$filters
    )
    saveRDS(settings, "user_settings.rds")
  })
}
R
# Module UI function
playerCardUI <- function(id) {
  ns <- NS(id)  # Namespace function

  tagList(
    selectInput(ns("player"), "Select Player:", choices = NULL),
    tableOutput(ns("stats")),
    plotOutput(ns("plot"))
  )
}

# Module server function
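playerCardServer <- function(id, player_data) {
  moduleServer(id, function(input, output, session) {
    # Populate the dropdown, which the UI creates with no choices
    observeEvent(player_data(), {
      updateSelectInput(session, "player",
                        choices = sort(unique(player_data()$name)))
    })

    output$stats <- renderTable({
      req(input$player)  # Wait until a player has been selected
      player_data() %>% filter(name == input$player)
    })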

    output$plot <- renderPlot({
      # Plot code here
    })
  })
}

# Use module in main app
ui <- fluidPage(
  playerCardUI("player1"),
  playerCardUI("player2")
)

server <- function(input, output, session) {
  data <- reactive({ load_player_data() })

  playerCardServer("player1", data)
  playerCardServer("player2", data)
}

12.2 Streamlit (Python)

Streamlit, released in 2019, quickly became one of Python's most popular frameworks for building data applications. Its philosophy differs markedly from Shiny's: instead of separate UI and server components, a Streamlit app is a single Python script that runs from top to bottom. This simplicity makes Streamlit easy to learn, though it trades some of Shiny's fine-grained control for development speed.

12.2.1 Streamlit Basics

A Streamlit app is just a Python script. Streamlit reruns the entire script from top to bottom whenever a user interacts with a widget. This sounds inefficient, but caching is available on request: functions you mark with a caching decorator are skipped on reruns when their inputs have not changed.

# streamlit_basic.py - Minimal Streamlit Application
import streamlit as st
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Title and description
st.title("MLB Home Run Calculator")
st.write("Calculate projected home runs based on at-bats and HR rate")

# Widgets (automatically create variables)
ab = st.number_input("At Bats", min_value=1, max_value=700, value=500)
hr_rate = st.slider("HR Rate (%)", min_value=0.0, max_value=15.0,
                    value=5.0, step=0.1)

# Calculations
projected_hrs = ab * (hr_rate / 100)

# Display results
st.subheader("Projected Home Runs")
st.metric("Expected HRs", f"{projected_hrs:.1f}")

# Create visualization
st.subheader("Distribution of Possible Outcomes")

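# Normal approximation: mean and variance both equal the projection
# (a Poisson-style assumption). The floors avoid dividing by zero at a 0% HR rate.
variance = max(projected_hrs, 0.1)
hr_values = np.linspace(0, max(projected_hrs * 2, 1.0), 100)
density = (1 / np.sqrt(2 * np.pi * variance)) * \
          np.exp(-((hr_values - projected_hrs)**2) / (2 * variance))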

# Plot
fig, ax = plt.subplots(figsize=(10, 6))
ax.plot(hr_values, density, linewidth=2, color='darkblue')
ax.axvline(projected_hrs, color='red', linestyle='--', linewidth=2)
ax.set_xlabel("Home Runs")
ax.set_ylabel("Probability Density")
ax.set_title("Distribution of Possible HR Outcomes")
ax.grid(True, alpha=0.3)

st.pyplot(fig)

Run this with streamlit run streamlit_basic.py. The script executes from top to bottom each time a user changes an input. Notice how simple this is compared to Shiny—no separate UI and server, no reactive expressions, just straightforward Python code.
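
The curve plotted above is a normal density whose mean and variance are both set to the projected home run total, which is a Poisson-style simplification rather than a fitted model. As a minimal sketch using only the standard library, the hand-written formula can be checked against statistics.NormalDist. The function name hr_density is an illustrative choice and is not part of the original app.

```python
import math
from statistics import NormalDist


def hr_density(x: float, projected_hrs: float) -> float:
    """Normal density with mean = variance = projected_hrs."""
    variance = projected_hrs
    return (1 / math.sqrt(2 * math.pi * variance)) * math.exp(
        -((x - projected_hrs) ** 2) / (2 * variance)
    )


# 500 at-bats at a 5% HR rate projects to 25 home runs
projected = 500 * (5 / 100)
reference = NormalDist(mu=projected, sigma=math.sqrt(projected))

# The hand-written formula and the library density agree at several points
for x in (15, 25, 35):
    assert math.isclose(hr_density(x, projected), reference.pdf(x))

# Share of the curve within two standard deviations of the projection
low = projected - 2 * math.sqrt(projected)
high = projected + 2 * math.sqrt(projected)
coverage = reference.cdf(high) - reference.cdf(low)
print(f"{low:.0f} to {high:.0f} HR covers {coverage:.1%} of the curve")
```

Under this approximation a 25-homer projection spreads across roughly 15 to 35 home runs, which is a useful reminder that the single number shown by st.metric() hides a wide range of plausible seasons.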

Widgets and Caching: Streamlit provides widgets for every input type: st.selectbox(), st.multiselect(), st.checkbox(), st.radio(), st.text_input(), st.date_input(), and many more. Each returns the current value directly.

Caching prevents expensive recomputations. The @st.cache_data decorator caches function results based on input arguments.

import streamlit as st
import pandas as pd

@st.cache_data
def load_batting_data():
    """Load batting data (cached)"""
    # This expensive operation only runs once
    df = pd.read_csv("batting_stats.csv")
    df['BA'] = df['H'] / df['AB']
    return df

# This loads from cache on subsequent runs
data = load_batting_data()

st.write(f"Loaded {len(data)} player seasons")

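Use @st.cache_data for data loading and transformations; it stores a serialized copy of the return value and hands each caller a fresh copy. For shared resources such as database connections and ML models, use @st.cache_resource, which returns the same object to every caller without copying it.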
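
To make the distinction concrete, here is a rough standard-library sketch of the two caching behaviours. It is not Streamlit's implementation: cache_data_like and cache_resource_like are hypothetical stand-ins that only mimic the copy-versus-shared-object difference, and they key on positional arguments only.

```python
import functools
import pickle


def cache_data_like(func):
    """Hypothetical stand-in: store a pickled result, return a fresh copy each call."""
    store = {}

    @functools.wraps(func)
    def wrapper(*args):
        if args not in store:              # run the function once per distinct input
            store[args] = pickle.dumps(func(*args))
        return pickle.loads(store[args])   # callers cannot mutate the cached value
    return wrapper


def cache_resource_like(func):
    """Hypothetical stand-in: store the object itself and share it with every caller."""
    store = {}

    @functools.wraps(func)
    def wrapper(*args):
        if args not in store:
            store[args] = func(*args)
        return store[args]                 # same object every time, no copy
    return wrapper


calls = {"data": 0}


@cache_data_like
def load_rows(season):
    calls["data"] += 1                     # counts how often the body really runs
    return [{"season": season, "HR": 30}]


@cache_resource_like
def get_connection(name):
    return {"name": name, "open": True}    # placeholder for a real connection object


first, second = load_rows(2022), load_rows(2022)
first[0]["HR"] = 999                       # mutating one copy leaves the cache intact

print(calls["data"])                                       # the body ran only once
print(second[0]["HR"], load_rows(2022)[0]["HR"])           # other copies are unchanged
print(get_connection("stats") is get_connection("stats"))  # one shared object
```

The copy is what makes cached data safe to modify in place, and it is also why a copying cache is a poor fit for connections or large models, which are either not serializable or too expensive to duplicate on every rerun.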

12.2.2 Project: MLB Analytics Dashboard

Let's build a complete MLB analytics dashboard in Streamlit, equivalent to our Shiny player dashboard.

# mlb_dashboard.py - Complete MLB Analytics Dashboard
import streamlit as st
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Page configuration
st.set_page_config(
    page_title="MLB Analytics Dashboard",
    page_icon="⚾",
    layout="wide"
)

# Custom CSS
st.markdown("""
    <style>
    .main {
        padding: 0rem 1rem;
    }
    .stMetric {
        background-color: #f0f2f6;
        padding: 10px;
        border-radius: 5px;
    }
    </style>
""", unsafe_allow_html=True)

# Title
st.title("⚾ MLB Player Statistics Dashboard")
st.markdown("---")

# Load and prepare data
@st.cache_data
def load_data():
    """Load and prepare batting statistics"""
    # Simulated data - in production, load from Lahman or similar
    np.random.seed(42)

    players = ["Mike Trout", "Mookie Betts", "Aaron Judge",
               "Ronald Acuna Jr.", "Freddie Freeman", "Shohei Ohtani",
               "Juan Soto", "Bryce Harper", "Manny Machado", "Jose Ramirez"]
    years = range(2015, 2023)
    teams = ["LAA", "LAD", "NYY", "ATL", "LAD", "LAA", "WSN", "PHI", "SD", "CLE"]

    data = []
    for player, team in zip(players, teams):
        for year in years:
            ab = np.random.randint(450, 650)
            h = int(ab * np.random.uniform(0.240, 0.320))
            hr = np.random.randint(15, 45)
            bb = np.random.randint(50, 120)
            hbp = np.random.randint(3, 15)
            sf = np.random.randint(3, 10)
            doubles = int(h * np.random.uniform(0.15, 0.25))
            triples = np.random.randint(0, 8)
            rbi = np.random.randint(60, 120)
            sb = np.random.randint(5, 30)

            ba = round(h / ab, 3) if ab > 0 else 0
            obp = round((h + bb + hbp) / (ab + bb + hbp + sf), 3) \
                  if (ab + bb + hbp + sf) > 0 else 0
            slg = round((h + doubles + 2*triples + 3*hr) / ab, 3) \
                  if ab > 0 else 0
            ops = round(obp + slg, 3)

            data.append({
                'Player': player,
                'Year': year,
                'Team': team,
                'G': np.random.randint(120, 163),  # upper bound is exclusive
                'AB': ab,
                'H': h,
                'HR': hr,
                'RBI': rbi,
                'SB': sb,
                'BB': bb,
                'BA': ba,
                'OBP': obp,
                'SLG': slg,
                'OPS': ops
            })

    return pd.DataFrame(data)

# Load data
with st.spinner("Loading player data..."):
    batting_data = load_data()

# Sidebar filters
st.sidebar.header("Filters & Options")

selected_player = st.sidebar.selectbox(
    "Select Player",
    options=sorted(batting_data['Player'].unique()),
    index=0
)

selected_stats = st.sidebar.multiselect(
    "Statistics to Display",
    options=['BA', 'OBP', 'SLG', 'OPS', 'HR', 'RBI', 'SB'],
    default=['BA', 'OBP', 'SLG', 'HR']
)

year_range = st.sidebar.slider(
    "Year Range",
    min_value=int(batting_data['Year'].min()),
    max_value=int(batting_data['Year'].max()),
    value=(int(batting_data['Year'].min()),
           int(batting_data['Year'].max()))
)

st.sidebar.markdown("---")
st.sidebar.info("""
**Dashboard Info**
- Data: 2015-2022 seasons
- Simulated statistics
- Interactive visualizations
""")

# Filter data for selected player and year range
player_data = batting_data[
    (batting_data['Player'] == selected_player) &
    (batting_data['Year'] >= year_range[0]) &
    (batting_data['Year'] <= year_range[1])
].sort_values('Year')

# Main content area
if len(player_data) == 0:
    st.warning("No data available for selected filters")
    st.stop()

# Header with player name
st.header(f"{selected_player}")

# Key metrics in columns
col1, col2, col3, col4, col5 = st.columns(5)

with col1:
    st.metric("Seasons", len(player_data))

with col2:
    total_hr = player_data['HR'].sum()
    st.metric("Total HR", f"{total_hr:,}")

with col3:
    total_rbi = player_data['RBI'].sum()
    st.metric("Total RBI", f"{total_rbi:,}")

with col4:
    career_ba = player_data['H'].sum() / player_data['AB'].sum()
    st.metric("Career BA", f"{career_ba:.3f}")

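with col5:
    # Simplified OBP: HBP and SF are not stored in the dataset, so this is
    # (H + BB) / (AB + BB) and will differ slightly from the season OBP values
    career_obp = (player_data['H'].sum() + player_data['BB'].sum()) / \
                 (player_data['AB'].sum() + player_data['BB'].sum())
    st.metric("Career OBP", f"{career_obp:.3f}")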

st.markdown("---")

# Tabs for different views
tab1, tab2, tab3, tab4 = st.tabs([
    "Season Statistics",
    "Career Trends",
    "Distributions",
    "Career Summary"
])

with tab1:
    st.subheader("Season-by-Season Statistics")

    # Select columns to display
    display_cols = ['Year', 'Team', 'G', 'AB', 'H'] + selected_stats
    display_data = player_data[display_cols].copy()

    # Format only the rate stats that are actually displayed
    rate_formats = {stat: '{:.3f}' for stat in ['BA', 'OBP', 'SLG', 'OPS']
                    if stat in selected_stats}
    st.dataframe(
        display_data.style.format(rate_formats),
        use_container_width=True,
        height=400
    )

    # Download button
    csv = display_data.to_csv(index=False)
    st.download_button(
        label="Download CSV",
        data=csv,
        file_name=f"{selected_player}_stats.csv",
        mime="text/csv"
    )

with tab2:
    st.subheader("Career Trends")

    if len(selected_stats) == 0:
        st.info("Please select at least one statistic to display")
    else:
        # Determine grid layout
        n_stats = len(selected_stats)
        n_cols = 2
        n_rows = (n_stats + n_cols - 1) // n_cols

        fig, axes = plt.subplots(n_rows, n_cols,
                                figsize=(15, 5*n_rows))
        if n_rows == 1:
            axes = axes.reshape(1, -1)

        for idx, stat in enumerate(selected_stats):
            row = idx // n_cols
            col = idx % n_cols
            ax = axes[row, col]

            ax.plot(player_data['Year'], player_data[stat],
                   marker='o', linewidth=2, markersize=8,
                   color='steelblue')
            ax.set_xlabel("Season", fontsize=12)
            ax.set_ylabel(stat, fontsize=12)
            ax.set_title(f"{stat} Trends", fontsize=14, fontweight='bold')
            ax.grid(True, alpha=0.3)
            ax.set_xticks(player_data['Year'])
            ax.tick_params(axis='x', rotation=45)

        # Hide extra subplots
        for idx in range(n_stats, n_rows * n_cols):
            row = idx // n_cols
            col = idx % n_cols
            axes[row, col].axis('off')

        plt.tight_layout()
        st.pyplot(fig)

with tab3:
    st.subheader("Performance Distributions")

    if len(selected_stats) == 0:
        st.info("Please select at least one statistic to display")
    else:
        # Create distribution plots
        n_stats = len(selected_stats)
        n_cols = 2
        n_rows = (n_stats + n_cols - 1) // n_cols

        fig, axes = plt.subplots(n_rows, n_cols,
                                figsize=(15, 5*n_rows))
        if n_rows == 1:
            axes = axes.reshape(1, -1)

        colors = ['skyblue', 'coral', 'lightgreen', 'lavender',
                 'wheat', 'lightpink', 'lightcyan']

        for idx, stat in enumerate(selected_stats):
            row = idx // n_cols
            col = idx % n_cols
            ax = axes[row, col]

            if stat in ['HR', 'RBI', 'SB']:
                # Bar plot for counting stats
                ax.bar(player_data['Year'], player_data[stat],
                      color=colors[idx % len(colors)],
                      edgecolor='black', alpha=0.7)
                ax.set_xlabel("Season", fontsize=12)
                ax.set_ylabel(stat, fontsize=12)
            else:
                # Histogram for rate stats
                ax.hist(player_data[stat], bins=8,
                       color=colors[idx % len(colors)],
                       edgecolor='black', alpha=0.7)
                ax.axvline(player_data[stat].mean(),
                          color='red', linestyle='--',
                          linewidth=2, label='Mean')
                ax.set_xlabel(stat, fontsize=12)
                ax.set_ylabel("Frequency", fontsize=12)
                ax.legend()

            ax.set_title(f"{stat} Distribution",
                        fontsize=14, fontweight='bold')
            ax.grid(True, alpha=0.3, axis='y')

        # Hide extra subplots
        for idx in range(n_stats, n_rows * n_cols):
            row = idx // n_cols
            col = idx % n_cols
            axes[row, col].axis('off')

        plt.tight_layout()
        st.pyplot(fig)

with tab4:
    st.subheader("Career Summary")

    col1, col2 = st.columns([1, 1])

    with col1:
        st.markdown("### Cumulative Statistics")

        summary_data = {
            'Statistic': [
                'Seasons Played',
                'Total Games',
                'Total At Bats',
                'Total Hits',
                'Total Home Runs',
                'Total RBIs',
                'Total Stolen Bases',
                'Total Walks'
            ],
            'Value': [
                str(len(player_data)),  # keep the column all-string for display
                f"{player_data['G'].sum():,}",
                f"{player_data['AB'].sum():,}",
                f"{player_data['H'].sum():,}",
                f"{player_data['HR'].sum():,}",
                f"{player_data['RBI'].sum():,}",
                f"{player_data['SB'].sum():,}",
                f"{player_data['BB'].sum():,}"
            ]
        }

        st.dataframe(
            pd.DataFrame(summary_data),
            use_container_width=True,
            hide_index=True
        )

    with col2:
        st.markdown("### Career Averages")

        # Career SLG weights each season's SLG by its at-bats, which
        # recovers total bases divided by total at-bats
        career_slg = (player_data['SLG'] * player_data['AB']).sum() / \
                     player_data['AB'].sum()

        avg_data = {
            'Statistic': [
                'Career Batting Average',
                'Career OBP',
                'Career SLG',
                'Career OPS',
                'Avg HR per Season',
                'Avg RBI per Season',
                'Best HR Season',
                'Best BA Season'
            ],
            'Value': [
                f"{career_ba:.3f}",
                f"{career_obp:.3f}",
                f"{career_slg:.3f}",
                f"{career_obp + career_slg:.3f}",
                f"{player_data['HR'].mean():.1f}",
                f"{player_data['RBI'].mean():.1f}",
                f"{player_data['HR'].max()} ({player_data.loc[player_data['HR'].idxmax(), 'Year']})",
                f"{player_data['BA'].max():.3f} ({player_data.loc[player_data['BA'].idxmax(), 'Year']})"
            ]
        }

        st.dataframe(
            pd.DataFrame(avg_data),
            use_container_width=True,
            hide_index=True
        )

    # Best season
    st.markdown("### Best Season (by OPS)")
    best_season = player_data.loc[player_data['OPS'].idxmax()]

    cols = st.columns(6)
    cols[0].metric("Year", int(best_season['Year']))
    cols[1].metric("BA", f"{best_season['BA']:.3f}")
    cols[2].metric("HR", int(best_season['HR']))
    cols[3].metric("RBI", int(best_season['RBI']))
    cols[4].metric("OPS", f"{best_season['OPS']:.3f}")
    cols[5].metric("Games", int(best_season['G']))

# Footer
st.markdown("---")
st.caption("MLB Analytics Dashboard | Data: 2015-2022 | Built with Streamlit")

Run this with streamlit run mlb_dashboard.py. This complete dashboard demonstrates Streamlit's strengths: rapid development, clean layout with columns and tabs, built-in widgets, and easy data display.

Notice the st.set_page_config() call at the top. It controls the page title, icon, and layout, and Streamlit has traditionally required it to be the first Streamlit command in the script. The layout="wide" setting uses the full browser width, which suits dashboards.

The st.columns() function creates side-by-side layouts. We use it for metrics at the top and for the two-column career summary. The st.tabs() function organizes content into tabs, just like our Shiny dashboard.

Streamlit's st.metric() provides attractive metric displays with optional deltas. The st.dataframe() function renders interactive tables with built-in column sorting and resizing. The st.download_button() function lets users download data directly; in Shiny the same feature takes a downloadButton() in the UI paired with a downloadHandler() on the server.
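
The rate statistics in load_data() follow the standard definitions, and career figures are best computed from summed counting stats, not by averaging season rates. The sketch below restates those formulas in plain Python. The Season dataclass and the sample numbers are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Season:
    """Counting stats for one season (illustrative container, not from the dashboard)."""
    ab: int
    h: int
    doubles: int
    triples: int
    hr: int
    bb: int
    hbp: int
    sf: int


def batting_average(s: Season) -> float:
    return s.h / s.ab if s.ab else 0.0


def on_base_pct(s: Season) -> float:
    denom = s.ab + s.bb + s.hbp + s.sf
    return (s.h + s.bb + s.hbp) / denom if denom else 0.0


def slugging_pct(s: Season) -> float:
    # Total bases = H + 2B + 2*3B + 3*HR (every hit already counts one base)
    total_bases = s.h + s.doubles + 2 * s.triples + 3 * s.hr
    return total_bases / s.ab if s.ab else 0.0


def combine(seasons: list[Season]) -> Season:
    """Sum counting stats so career rates are weighted by playing time."""
    fields = ("ab", "h", "doubles", "triples", "hr", "bb", "hbp", "sf")
    return Season(**{f: sum(getattr(s, f) for s in seasons) for f in fields})


seasons = [
    Season(ab=500, h=150, doubles=30, triples=5, hr=25, bb=60, hbp=5, sf=5),
    Season(ab=300, h=75, doubles=15, triples=1, hr=10, bb=30, hbp=2, sf=3),
]
career = combine(seasons)

print(f"Season 1: BA {batting_average(seasons[0]):.3f}, "
      f"OBP {on_base_pct(seasons[0]):.3f}, SLG {slugging_pct(seasons[0]):.3f}")
print(f"Career:   BA {batting_average(career):.3f}, "
      f"OPS {on_base_pct(career) + slugging_pct(career):.3f}")
```

Here the career average is .281, whereas averaging the two season averages (.300 and .250) would give .275, because the shorter season would count as much as the longer one.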

12.2.3 Comparing Shiny vs Streamlit

Both frameworks excel at building interactive data applications but make different trade-offs. Understanding these differences helps you choose the right tool for each project.

| Feature | Shiny (R) | Streamlit (Python) |
| --- | --- | --- |
| Learning Curve | Moderate - requires understanding reactive programming | Easy - just write Python scripts top-to-bottom |
| Development Speed | Slower initially, faster once you understand reactivity | Very fast - minimal boilerplate |
| Customization | High - full control over UI with htmltools, JavaScript | Moderate - limited to provided components |
| Performance | Excellent - reactive system only recomputes what changed | Good - aggressive caching required for complex apps |
| Deployment | shinyapps.io (free tier), Shiny Server, Connect | Streamlit Cloud (free), standard Python hosting |
| Ecosystem | Mature - extensive packages, large community | Growing rapidly - newer but very active |
| Layout Control | Precise - full CSS/HTML control possible | Simplified - column-based layouts, less flexibility |
| State Management | Explicit reactivity, complex but powerful | Automatic reruns, simpler but less control |
| Interactivity | Highly interactive - websocket connection | Interactive but entire script reruns |
| Best For | Complex applications, fine-grained control, R users | Rapid prototyping, Python users, quick dashboards |
| Code Organization | Modules for large apps, clear separation | Can become messy in large apps without discipline |
| Authentication | Available via packages and Shiny Server Pro | Limited in free tier, requires custom solutions |

When to Choose Shiny:


  • Your team primarily uses R

  • You need fine-grained control over reactivity and updates

  • You're building a complex application with many interconnected pieces

  • You need sophisticated authentication and user management

  • You want to integrate deeply with R's statistical packages

When to Choose Streamlit:


  • Your team primarily uses Python

  • You need to build a prototype quickly

  • You're creating a straightforward analytics dashboard

  • You want minimal code and easy deployment

  • You're comfortable with the app rerunning frequently

Hybrid Approach: Some teams use both: Streamlit for quick internal dashboards and prototypes, and Shiny for production applications that require more control. The best choice depends on your team's skills, project requirements, and timeline.
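
Because a Streamlit script reruns on every interaction, ordinary variables reset each time; values that must survive a rerun go in st.session_state. The sketch below mirrors the click counter from the Shiny reactiveValues example. The helper name record_analysis and the widget labels are illustrative assumptions, and Streamlit is imported inside main() so the counting logic can be exercised without it.

```python
from collections.abc import MutableMapping


def record_analysis(state: MutableMapping, player: str) -> int:
    """Increment the persistent click counter and remember the last player."""
    state["click_count"] = state.get("click_count", 0) + 1
    state["last_player"] = player
    return state["click_count"]


def main() -> None:
    import streamlit as st  # imported lazily; only needed when the app runs

    player = st.selectbox("Select Player", ["Mike Trout", "Aaron Judge"])
    if st.button("Analyze"):
        # st.session_state persists across reruns within one browser session
        record_analysis(st.session_state, player)

    st.write("Analyses run:", st.session_state.get("click_count", 0))


# A plain dict stands in for st.session_state when checking the logic
demo_state: dict = {}
record_analysis(demo_state, "Mike Trout")
record_analysis(demo_state, "Aaron Judge")
print(demo_state)
```

To use this as an app, call main() at the bottom of the script and launch it with streamlit run. Keeping the state logic in a plain function like this is one way to stop larger Streamlit apps from becoming hard to test.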

                ax.axvline(player_data[stat].mean(),
                          color='red', linestyle='--',
                          linewidth=2, label='Mean')
                ax.set_xlabel(stat, fontsize=12)
                ax.set_ylabel("Frequency", fontsize=12)
                ax.legend()

            ax.set_title(f"{stat} Distribution",
                        fontsize=14, fontweight='bold')
            ax.grid(True, alpha=0.3, axis='y')

        # Hide extra subplots
        for idx in range(n_stats, n_rows * n_cols):
            row = idx // n_cols
            col = idx % n_cols
            axes[row, col].axis('off')

        plt.tight_layout()
        st.pyplot(fig)

with tab4:
    st.subheader("Career Summary")

    col1, col2 = st.columns([1, 1])

    with col1:
        st.markdown("### Cumulative Statistics")

        summary_data = {
            'Statistic': [
                'Seasons Played',
                'Total Games',
                'Total At Bats',
                'Total Hits',
                'Total Home Runs',
                'Total RBIs',
                'Total Stolen Bases',
                'Total Walks'
            ],
            'Value': [
                len(player_data),
                f"{player_data['G'].sum():,}",
                f"{player_data['AB'].sum():,}",
                f"{player_data['H'].sum():,}",
                f"{player_data['HR'].sum():,}",
                f"{player_data['RBI'].sum():,}",
                f"{player_data['SB'].sum():,}",
                f"{player_data['BB'].sum():,}"
            ]
        }

        st.dataframe(
            pd.DataFrame(summary_data),
            use_container_width=True,
            hide_index=True
        )

    with col2:
        st.markdown("### Career Averages")

        avg_data = {
            'Statistic': [
                'Career Batting Average',
                'Career OBP',
                'Career SLG',
                'Career OPS',
                'Avg HR per Season',
                'Avg RBI per Season',
                'Best HR Season',
                'Best BA Season'
            ],
            'Value': [
                f"{career_ba:.3f}",
                f"{career_obp:.3f}",
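                # AB-weighted mean of season SLG equals career total bases / AB
                f"{(player_data['SLG'] * player_data['AB']).sum() / player_data['AB'].sum():.3f}",
                f"{career_obp + (player_data['SLG'] * player_data['AB']).sum() / player_data['AB'].sum():.3f}",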
                f"{player_data['HR'].mean():.1f}",
                f"{player_data['RBI'].mean():.1f}",
                f"{player_data['HR'].max()} ({player_data.loc[player_data['HR'].idxmax(), 'Year']})",
                f"{player_data['BA'].max():.3f} ({player_data.loc[player_data['BA'].idxmax(), 'Year']})"
            ]
        }

        st.dataframe(
            pd.DataFrame(avg_data),
            use_container_width=True,
            hide_index=True
        )

    # Best season
    st.markdown("### Best Season (by OPS)")
    best_season = player_data.loc[player_data['OPS'].idxmax()]

    cols = st.columns(6)
    cols[0].metric("Year", int(best_season['Year']))
    cols[1].metric("BA", f"{best_season['BA']:.3f}")
    cols[2].metric("HR", int(best_season['HR']))
    cols[3].metric("RBI", int(best_season['RBI']))
    cols[4].metric("OPS", f"{best_season['OPS']:.3f}")
    cols[5].metric("Games", int(best_season['G']))

# Footer
st.markdown("---")
st.caption("MLB Analytics Dashboard | Data: 2015-2022 | Built with Streamlit")
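The career OBP in the dashboard above uses a simplified formula, (H + BB) / (AB + BB). The standard definition also counts hit-by-pitches and sacrifice flies. The sketch below shows both side by side. It assumes HBP and SF totals are available, which the app's dataset may not provide; the function name career_obp and the sample totals are hypothetical.

```python
def career_obp(hits, walks, at_bats, hbp=0, sac_flies=0):
    """On-base percentage: (H + BB + HBP) / (AB + BB + HBP + SF).

    With hbp and sac_flies left at 0 this reduces to the simplified
    formula used in the dashboard.
    """
    denominator = at_bats + walks + hbp + sac_flies
    if denominator == 0:
        return 0.0  # no plate appearances recorded; avoid dividing by zero
    return (hits + walks + hbp) / denominator


# Made-up career totals, for illustration only
simplified = career_obp(hits=1500, walks=600, at_bats=5000)
full = career_obp(hits=1500, walks=600, at_bats=5000, hbp=50, sac_flies=40)
print(f"simplified: {simplified:.3f}, full: {full:.3f}")
```

The two versions usually differ by only a few points, but the full formula is the one to use when comparing against published OBP figures.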

12.3 Dash by Plotly

Dash, created by Plotly, is Python's answer to Shiny's level of control. While Streamlit optimizes for simplicity, Dash provides fine-grained control over layout and callbacks, making it ideal for complex enterprise applications.

Dash applications use React.js under the hood, with Python callbacks handling server-side logic. This architecture provides the responsiveness of modern web applications while keeping logic in Python.

# dash_example.py - MLB Batting Average Calculator
import dash
from dash import dcc, html, Input, Output
import plotly.graph_objs as go
import pandas as pd
import numpy as np

# Initialize app
app = dash.Dash(__name__)

# Layout
app.layout = html.Div([
    html.H1("MLB Batting Average Calculator",
            style={'textAlign': 'center', 'color': '#2c3e50'}),

    html.Div([
        html.Div([
            html.Label("At Bats:"),
            dcc.Slider(
                id='at-bats-slider',
                min=100,
                max=700,
                step=10,
                value=500,
                marks={i: str(i) for i in range(100, 701, 100)},
                tooltip={"placement": "bottom", "always_visible": True}
            )
        ], style={'width': '45%', 'display': 'inline-block', 'padding': '20px'}),

        html.Div([
            html.Label("Hits:"),
            dcc.Slider(
                id='hits-slider',
                min=0,
                max=250,
                step=5,
                value=150,
                marks={i: str(i) for i in range(0, 251, 50)},
                tooltip={"placement": "bottom", "always_visible": True}
            )
        ], style={'width': '45%', 'display': 'inline-block', 'padding': '20px'})
    ]),

    html.Div([
        html.H2(id='ba-display', style={'textAlign': 'center', 'color': '#27ae60'})
    ], style={'padding': '20px'}),

    html.Div([
        dcc.Graph(id='ba-comparison')
    ], style={'padding': '20px'})
])

# Callbacks
@app.callback(
    [Output('ba-display', 'children'),
     Output('ba-comparison', 'figure')],
    [Input('at-bats-slider', 'value'),
     Input('hits-slider', 'value')]
)
def update_output(at_bats, hits):
    # Calculate batting average
    ba = hits / at_bats if at_bats > 0 else 0

    # Display text
    ba_text = f"Batting Average: {ba:.3f}"

    # Create comparison chart
    league_averages = {
        'Excellent': 0.300,
        'Good': 0.270,
        'Average': 0.250,
        'Below Average': 0.230,
        'Poor': 0.200
    }

    fig = go.Figure()

    # Add league average bars
    fig.add_trace(go.Bar(
        x=list(league_averages.keys()),
        y=list(league_averages.values()),
        name='League Standards',
        marker_color='lightblue'
    ))

    # Add user's BA as a line
    fig.add_trace(go.Scatter(
        x=list(league_averages.keys()),
        y=[ba] * len(league_averages),
        mode='lines',
        name='Your BA',
        line=dict(color='red', width=3, dash='dash')
    ))

    fig.update_layout(
        title='Batting Average Comparison',
        xaxis_title='Category',
        yaxis_title='Batting Average',
        yaxis_range=[0, 0.35],
        hovermode='x unified',
        template='plotly_white'
    )

    return ba_text, fig

# Run server
if __name__ == '__main__':
    # Recent Dash releases use app.run(); older versions used app.run_server()
    app.run(debug=True, port=8050)

Run this with python dash_example.py and visit http://localhost:8050.

Dash's architecture differs from both Shiny and Streamlit. The layout explicitly defines all UI components using Python objects (html.Div, dcc.Slider, dcc.Graph). Callbacks use decorators to specify inputs and outputs, with function arguments matching the input order.

The @app.callback decorator creates a reactive relationship: whenever at-bats-slider or hits-slider changes, the update_output function runs, returning new values for both ba-display and ba-comparison. This explicit callback system provides clarity about what triggers what, though it requires more boilerplate than Streamlit.
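One detail the example leaves open: the sliders allow up to 250 hits with as few as 100 at bats, so the callback can report a batting average above 1.000. A possible approach, sketched below, is to keep the calculation in a plain function that the callback calls, which also makes it testable without starting a server. The names batting_average and format_ba are hypothetical helpers, not part of Dash.

```python
def batting_average(hits, at_bats):
    """Return hits / at_bats, or None when the inputs are not a valid pair."""
    if at_bats <= 0 or hits < 0 or hits > at_bats:
        return None
    return hits / at_bats


def format_ba(hits, at_bats):
    """Build the text a callback could return for the 'ba-display' output."""
    ba = batting_average(hits, at_bats)
    if ba is None:
        return "Invalid input: hits must be between 0 and at bats"
    return f"Batting Average: {ba:.3f}"


print(format_ba(150, 500))   # a valid combination
print(format_ba(250, 100))   # more hits than at bats
```

Inside update_output, the first return value could then be format_ba(hits, at_bats) instead of the inline f-string.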

Dash excels with Plotly visualizations—the dcc.Graph component provides rich interactivity including hover tooltips, zooming, and panning built-in. For applications heavy on interactive visualizations, Dash's tight Plotly integration is a major advantage.

When to Choose Dash:


  • You need production-grade applications with fine-grained control

  • Your visualizations are primarily Plotly charts

  • You're building for enterprise deployment with authentication requirements

  • You want React.js-like component architecture

  • You need to embed in larger web applications

Dash requires more code than Streamlit but offers finer control in return, and unlike Shiny it keeps the whole application in Python. For MLB analytics applications requiring sophisticated interactive visualizations with corporate deployment requirements, Dash is often the best choice.

12.4 Deployment Options

Building an application is only half the journey—deployment makes it accessible to users. Each framework offers multiple deployment paths with different trade-offs between cost, ease, and control.

12.4.1 shinyapps.io

Posit's shinyapps.io provides the easiest deployment path for Shiny applications. The free tier allows 5 applications with 25 active hours per month—sufficient for portfolios and small projects.

Deployment Steps:

  1. Install the rsconnect package:
install.packages('rsconnect')
  2. Create a free account at https://www.shinyapps.io/
  3. Configure your account (find these values in your shinyapps.io dashboard):
library(rsconnect)
rsconnect::setAccountInfo(
  name='your-account-name',
  token='your-token',
  secret='your-secret'
)
  4. Deploy your app:
# From within your app directory
library(rsconnect)
deployApp()

The deployApp() function automatically packages your app, uploads it to shinyapps.io, and provides a URL like https://your-account.shinyapps.io/your-app-name/.

Important Considerations:

  • Dependencies: Ensure your app.R or ui.R/server.R files load all required packages. shinyapps.io automatically installs packages, but explicit library() calls are required.
  • Data Files: If your app loads external data, include those files in the same directory. The deployment process uploads everything in the app directory.
  • File Paths: Use relative paths, not absolute paths. Instead of read.csv("C:/data/batting.csv"), use read.csv("batting.csv") and place the file in your app directory.
  • Free Tier Limits:
    • 5 applications maximum
    • 25 active hours per month (when someone is using the app)
    • Apps sleep after 15 minutes of inactivity
    • Limited memory (1 GB)

For serious projects, paid tiers ($9-$149/month) increase limits and add features like authentication and custom domains.

12.4.2 Streamlit Cloud

Streamlit Cloud provides free hosting for Streamlit applications with GitHub integration. It's even simpler than shinyapps.io and works well for portfolios.

Deployment Steps:

  1. Push your Streamlit app to a GitHub repository (create an empty repository on GitHub first and substitute its URL below):
git init
git add streamlit_app.py
git add requirements.txt  # List of Python packages
git commit -m "Initial commit"
git branch -M main
git remote add origin https://github.com/your-username/your-repo.git
git push -u origin main
  2. Visit https://streamlit.io/cloud and sign in with GitHub
  3. Click "New app" and select:
    • Your GitHub repository
    • The branch (usually main)
    • The main file path (e.g., streamlit_app.py)
  4. Click "Deploy"

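Streamlit Cloud automatically detects your requirements.txt file and installs dependencies. Once the build finishes, your app is live at a public URL: older deployments use the form https://share.streamlit.io/your-username/your-repo/main/streamlit_app.py, while newer ones are served from a streamlit.app subdomain.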

Creating requirements.txt:

Your app needs a requirements.txt file listing all dependencies:

streamlit==1.28.0
pandas==2.0.3
numpy==1.24.3
matplotlib==3.7.2
seaborn==0.12.2
pybaseball==2.2.5

Generate this automatically:

pip freeze > requirements.txt

Or create manually with only the packages your app actually uses (recommended—keeps deployments faster and more reliable).
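To help build a minimal list by hand, one option is to scan the app's source for the modules it imports. The sketch below uses only the standard library; the helper name top_level_imports and the sample source are hypothetical. It reports import names, which do not always match the package name on PyPI (for example, the statsapi module is installed as MLB-StatsAPI), so treat the output as a starting point.

```python
import ast
import sys


def top_level_imports(source):
    """Return the sorted top-level module names imported by Python source."""
    modules = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                modules.add(alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            if node.module and node.level == 0:  # skip relative imports
                modules.add(node.module.split(".")[0])
    return sorted(modules)


# Stand-in for the contents of streamlit_app.py
example = """
import streamlit as st
import pandas as pd
import matplotlib.pyplot as plt
from pybaseball import batting_stats
import os
"""

# sys.stdlib_module_names exists on Python 3.10+; on older versions
# nothing is filtered out and standard-library modules stay in the list.
stdlib = getattr(sys, "stdlib_module_names", frozenset())
third_party = [m for m in top_level_imports(example) if m not in stdlib]
print(third_party)
```

Pinning versions for just these packages keeps requirements.txt short while still making the deployment reproducible.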

Important Considerations:

  • Free Tier: Unlimited public apps, 1 private app, shared resources
  • GitHub Integration: Changes pushed to GitHub automatically redeploy
  • Secrets Management: Store API keys and passwords using Streamlit's secrets management (not in code!)
  • Resource Limits: Free tier has CPU and memory limits; complex apps may need optimization
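As a rough illustration of the secrets point above, the helper below looks a value up in a mapping first and falls back to an environment variable, so the same code can run locally and in the cloud. In a Streamlit app the mapping would typically be st.secrets, assuming a secrets file has been configured; here plain dictionaries stand in for both sources. The helper name get_secret and the key MLB_API_KEY are hypothetical.

```python
import os


def get_secret(name, secrets=None, environ=None):
    """Return a secret from a mapping (such as st.secrets), else from the environment."""
    environ = os.environ if environ is None else environ
    if secrets is not None and name in secrets:
        return secrets[name]
    if name in environ:
        return environ[name]
    raise KeyError(f"Secret {name!r} is not configured")


# Plain dicts stand in for st.secrets and os.environ in this demonstration
from_mapping = get_secret("MLB_API_KEY", secrets={"MLB_API_KEY": "value-from-secrets"})
from_env = get_secret("MLB_API_KEY", environ={"MLB_API_KEY": "value-from-env"})
print(from_mapping, from_env)
```

Failing loudly with a KeyError when nothing is configured tends to be easier to debug than silently continuing with an empty key.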

Streamlit Cloud is ideal for portfolio projects, team dashboards, and public analytics tools. For private enterprise applications, Streamlit also offers paid enterprise hosting.

12.4.3 Heroku / Cloud Providers

For production applications requiring more control, major cloud providers (Heroku, AWS, Google Cloud, Azure) offer robust hosting with scalability and custom configurations.

Heroku Deployment (simplified example for Streamlit):

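  1. Install the Heroku CLI and create an account at https://heroku.com
  2. Create the required files:

Procfile:

web: streamlit run streamlit_app.py --server.port=$PORT --server.address=0.0.0.0

setup.sh (optional; the Procfile above already passes the port and address on the command line, so this script only takes effect if the Procfile runs it first, e.g. web: sh setup.sh && streamlit run streamlit_app.py):

mkdir -p ~/.streamlit/

cat > ~/.streamlit/config.toml <<EOF
[server]
headless = true
port = $PORT
enableCORS = false
EOF

requirements.txt (as before)

  3. Commit the files and deploy:
heroku login
heroku create your-app-name
git add Procfile setup.sh requirements.txt streamlit_app.py
git commit -m "Add Heroku configuration"
git push heroku main
heroku open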

Docker Deployment (shown here for Streamlit; the same approach works for Dash or Shiny with a different base image and start command):

Create a Dockerfile:

FROM python:3.9-slim

WORKDIR /app

COPY requirements.txt .
RUN pip install -r requirements.txt

COPY . .

EXPOSE 8501

CMD ["streamlit", "run", "streamlit_app.py"]

Build and run:

docker build -t mlb-dashboard .
docker run -p 8501:8501 mlb-dashboard

Deploy to any cloud provider supporting Docker (AWS ECS, Google Cloud Run, Azure Container Instances).

Cloud Provider Comparison:

Provider          Complexity   Cost            Scalability    Best For
shinyapps.io      Low          Free-$149/mo    Limited        Shiny apps, portfolios
Streamlit Cloud   Low          Free-$250/mo    Moderate       Streamlit apps, teams
Heroku            Moderate     $7-$250/mo      Good           Quick deployment, full control
AWS/GCP/Azure     High         Variable        Excellent      Enterprise, custom needs
Self-hosted       High         Hardware only   Full control   Complete control needed

For MLB analytics applications, start with the framework-specific platforms (shinyapps.io or Streamlit Cloud) for simplicity. Graduate to cloud providers when you need custom domains, authentication, integration with other services, or guaranteed uptime SLAs.

12.5 Exercises

These exercises challenge you to build complete interactive applications, combining skills from throughout this chapter and earlier chapters.

Exercise 1: Build a Team Comparison Application

Create an interactive application (your choice of Shiny or Streamlit) that allows users to compare two MLB teams across multiple seasons.

Requirements:


  • Allow selection of two teams from dropdown menus

  • Allow selection of year range (e.g., 2015-2022)

  • Display comparison visualizations:

    • Win-loss records over time (line plot)

    • Head-to-head records if applicable

    • Key statistics comparison (team batting average, ERA, home runs)

    • Radar chart or parallel coordinates plot comparing multiple metrics

  • Include summary statistics for each team

  • Allow users to toggle between different visualization types

  • Add export functionality (download plot or data)

Data Source: Use the Lahman database (R: Lahman package, Python: pybaseball or download from https://www.seanlahman.com/baseball-archive/statistics/)

Bonus Challenges:


  • Add playoff appearance indicators

  • Include player roster comparison (top performers)

  • Calculate and display team-level advanced metrics (runs created, pythagorean win expectation)

  • Add animation showing how teams' performance changed season by season

This exercise tests your ability to structure complex filtering logic, create multiple coordinated visualizations, and build intuitive user interfaces.
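For the Pythagorean win expectation mentioned in the bonus challenges, the classic formula estimates winning percentage as RS^2 / (RS^2 + RA^2), where RS is runs scored and RA is runs allowed. A minimal sketch follows; the function name and the sample run totals are hypothetical, and returning 0.5 when no runs have been recorded is an assumption made here to avoid dividing by zero. Some analysts prefer an exponent near 1.83 instead of 2, which is why it is left as a parameter.

```python
def pythagorean_win_pct(runs_scored, runs_allowed, exponent=2.0):
    """Expected winning percentage from runs scored and runs allowed."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    if rs + ra == 0:
        return 0.5  # assumption: treat a team with no runs either way as even
    return rs / (rs + ra)


# Made-up season totals, for illustration only
expected = pythagorean_win_pct(runs_scored=800, runs_allowed=700)
print(f"Expected win%: {expected:.3f}")
print(f"Expected wins over 162 games: {expected * 162:.1f}")
```

Comparing this estimate with a team's actual record is a simple way to flag clubs that over- or under-performed their run differential.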

Exercise 2: Create a Live Dashboard with Real-Time Data

Build a dashboard that displays live or recent MLB data, updating automatically.

Requirements:


  • Fetch recent MLB data (use MLB's Stats API or Baseball Reference)

  • Display today's games with current scores (or most recent game day)

  • Show player leaderboards (batting average, home runs, ERA, etc.)

  • Update automatically every 5 minutes (or on button press)

  • Include filters for:

    • Minimum plate appearances/innings pitched

    • Specific teams or divisions

    • Date ranges

  • Visualize trends over the current season

  • Highlight notable performances (hot streaks, milestone achievements)

Data Source:


  • Python: statsapi package (pip install MLB-StatsAPI)

  • R: Use httr or jsonlite to query MLB Stats API directly

  • API documentation: https://github.com/toddrob99/MLB-StatsAPI

Example API Usage (Python):

import statsapi

# Get today's games
games = statsapi.schedule(date='2024-08-15')

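# Get season hitting statistics for a player, looked up by name (first match)
player_id = statsapi.lookup_player('trout')[0]['id']
player_stats = statsapi.player_stat_data(player_id, 'hitting', 'season')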

Bonus Challenges:


  • Add push notifications when interesting events occur (no-hitter in progress, player approaching milestone)

  • Create a "game tracker" showing play-by-play for selected game

  • Add predictive elements (win probability graphs using historical data)

  • Cache data appropriately to avoid overwhelming the API

This exercise develops skills in working with external APIs, handling real-time data updates, and managing application state over time.
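The five-minute refresh and the caching bonus challenge can share one mechanism: a small time-to-live (TTL) cache that reuses the last response until it expires. The sketch below is framework-independent and uses only the standard library; the TTLCache class and fake_fetch function are hypothetical, and the injected clock exists only so expiry can be demonstrated without waiting. Streamlit users may prefer the built-in st.cache_data decorator with its ttl argument.

```python
import time


class TTLCache:
    """Cache values per key and refetch them once they are older than ttl_seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._store = {}  # key -> (expires_at, value)

    def get(self, key, fetch):
        """Return the cached value for key, calling fetch() only when it is missing or stale."""
        now = self.clock()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]
        value = fetch()
        self._store[key] = (now + self.ttl, value)
        return value


calls = []

def fake_fetch():
    """Stand-in for an API request; records how many times it was called."""
    calls.append(1)
    return {"request_number": len(calls)}


fake_now = [0.0]  # controllable clock for the demonstration
cache = TTLCache(ttl_seconds=300, clock=lambda: fake_now[0])

first = cache.get("schedule", fake_fetch)
second = cache.get("schedule", fake_fetch)   # within 5 minutes: served from cache
fake_now[0] = 301.0
third = cache.get("schedule", fake_fetch)    # expired: fetched again
print(f"API calls made: {len(calls)}")
```

In a real dashboard, fetch would wrap the actual API request, and the default time.monotonic clock would be used.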

Exercise 3: Deploy Your Application and Add Advanced Features

Take one of your previous applications (from Exercise 1, 2, or your own creation) and deploy it publicly, adding production-ready features.

Requirements:

Deployment:


  • Deploy to shinyapps.io, Streamlit Cloud, or Heroku

  • Ensure all dependencies are properly specified

  • Verify the app works in the deployed environment

  • Get a custom URL if possible (e.g., mlb-stats-yourname.streamlit.app)

Professional Features:


  • Add comprehensive documentation (About page, usage instructions)

  • Implement error handling (what happens if data fails to load?)

  • Add loading indicators for slow operations

  • Include data source attribution and last-updated timestamps

  • Optimize performance (caching, efficient queries)

  • Make the UI responsive (works on mobile devices)

  • Add a contact/feedback mechanism

Advanced Features (choose at least 2):


  • User authentication (allow users to save preferences)

  • Database integration (store user queries, preferences, or custom player lists)

  • Email report functionality (send daily/weekly summaries)

  • Social sharing (share specific views or insights)

  • A/B testing different UI layouts

  • Analytics tracking (which features do users use most?)

Portfolio Integration:


  • Create a project page describing the app

  • Write up design decisions and challenges encountered

  • Include screenshots and usage examples

  • Share on LinkedIn, Twitter/X, or personal website

Bonus Challenges:


  • Implement continuous deployment (GitHub pushes automatically deploy)

  • Add automated testing (unit tests for data processing functions)

  • Create multiple language support (English, Spanish, Japanese)

  • Build a REST API alongside the UI for programmatic access

This exercise simulates real-world application development, where functionality is only part of the challenge—reliability, usability, and deployment matter equally.
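For the error-handling requirement, one simple pattern is to wrap data loading so that a failure yields a fallback value and a message the UI can display, instead of crashing the app. The sketch below is one possible shape; load_with_fallback and broken_loader are hypothetical names, and catching every exception is a deliberate simplification.

```python
def load_with_fallback(loader, fallback):
    """Call loader(); on failure return the fallback and a user-facing message."""
    try:
        return loader(), None
    except Exception as exc:  # broad on purpose: any load failure should degrade gracefully
        return fallback, f"Could not load data ({type(exc).__name__}): {exc}"


def broken_loader():
    """Stand-in for a data source that is unavailable."""
    raise FileNotFoundError("batting.csv not found")


data, error = load_with_fallback(broken_loader, fallback=[])
if error is not None:
    print(error)  # in an app, show this with the framework's error widget
```

In Streamlit, for instance, the message could be passed to st.error, with the rest of the page rendering from the fallback data.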


Chapter Summary

Interactive applications transform static analyses into explorable tools. This chapter equipped you with three powerful frameworks—Shiny for R, Streamlit for Python, and Dash for advanced control—each with different strengths for different use cases.

Shiny's reactive programming model provides fine-grained control and sophisticated state management, making it ideal for complex applications where you need precise control over what updates when. Its mature ecosystem and integration with R's statistical packages make it the natural choice for R-based analytics teams.

Streamlit's simplicity and rapid development cycle make it perfect for prototyping and internal dashboards. By treating apps as simple Python scripts that rerun on interaction, Streamlit eliminates boilerplate and lets you focus on analysis rather than application architecture. Its growing popularity and active development community ensure continued improvement and expanding capabilities.

Dash occupies a middle ground, providing React.js-like control while staying in Python. Its explicit callback system and tight Plotly integration make it ideal for visualization-heavy applications requiring production-grade quality and deployment.

Deployment platforms like shinyapps.io and Streamlit Cloud democratize access to your work, letting you share analyses with colleagues, showcase skills to employers, or build public tools advancing baseball understanding. Understanding deployment options and their trade-offs enables you to choose appropriate hosting for each application's requirements.

The exercises challenge you to build complete, deployable applications—the kind of portfolio projects that demonstrate both technical skill and understanding of user needs. Interactive applications are increasingly essential in baseball analytics, whether you're presenting findings to front office staff, creating tools for coaches and scouts, or building public analysis platforms.

As you continue developing interactive applications, remember that the best tools balance functionality with usability. Technical sophistication means little if users can't understand or navigate your application. Design with your audience in mind, test with real users, iterate based on feedback, and continuously refine based on how people actually interact with your tools.

The next chapter explores advanced analytics topics, building on both the analytical methods from earlier chapters and the interactive presentation skills from this chapter. You're now equipped to not just perform sophisticated analyses, but to make those analyses accessible and actionable through well-designed interactive applications.
