fix: correct progress bar total when using gradient accumulation with max_steps (#1227)
When training with accumulate_grad_batches > 1 and max_steps set, the
default TQDMProgressBar overflows past 100%: its total is derived from
max_steps, which counts optimizer steps, while on_train_batch_end fires
once per forward pass, i.e. accumulate_grad_batches times per optimizer
step.
GradAccumProgressBar multiplies total_train_batches by
accumulate_grad_batches so the total matches the actual number of
forward passes, keeping the bar accurate throughout training.
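A minimal sketch of the mismatch and the correction (illustrative numbers
and variable names only, not the actual patch):

```python
# Hypothetical example: with 4-step gradient accumulation and max_steps=250,
# the trainer performs 250 optimizer steps but 1000 forward passes.
accumulate_grad_batches = 4
max_steps = 250

# Buggy total: counted in optimizer steps, so the bar reaches 100% while
# on_train_batch_end keeps firing for the remaining forward passes.
buggy_total = max_steps  # 250

# Corrected total: scaled by the accumulation factor so it matches the
# number of forward passes, as GradAccumProgressBar does.
fixed_total = max_steps * accumulate_grad_batches  # 1000
```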
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>