Weights And Biases
W&B: log ML experiments, sweeps, model registry, dashboards.
Skill metadata
| Source | Bundled (installed by default) |
| Path | skills/mlops/evaluation/weights-and-biases |
| Version | 1.0.0 |
| Author | Orchestra Research |
| License | MIT |
| Dependencies | wandb |
| Tags | MLOps, Weights And Biases, WandB, Experiment Tracking, Hyperparameter Tuning, Model Registry, Collaboration, Real-Time Visualization, PyTorch, TensorFlow, HuggingFace |
Reference: full SKILL.md
Info: The following is the complete skill definition that Hermes loads when this skill is triggered. This is what the agent sees as instructions when the skill is active.
Weights & Biases: ML Experiment Tracking & MLOps
When to Use This Skill
Use Weights & Biases (W&B) when you need to:
- Track ML experiments with automatic metric logging
- Visualize training in real-time dashboards
- Compare runs across hyperparameters and configurations
- Optimize hyperparameters with automated sweeps
- Manage model registry with versioning and lineage
- Collaborate on ML projects with team workspaces
- Track artifacts (datasets, models, code) with lineage
Users: 200,000+ ML practitioners | GitHub Stars: 10.5k+ | Integrations: 100+
Installation

    # Install W&B
    pip install wandb

    # Log in (prompts for your API key and stores it locally)
    wandb login

    # Or provide the API key through an environment variable
    export WANDB_API_KEY=your_api_key_here

Quick Start
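
Before starting a run in a non-interactive job, it can be useful to confirm that an exported API key is actually visible to the Python process. The following is a minimal, stdlib-only sketch. `wandb_api_key_is_set` is a hypothetical helper, not a wandb function, and a False result only means the environment variable is unset (credentials saved by `wandb login` are stored separately).

```python
import os


def wandb_api_key_is_set(env=None):
    """Return True if a non-empty WANDB_API_KEY is present in the environment.

    `env` defaults to os.environ; passing a plain dict makes the check testable.
    """
    env = os.environ if env is None else env
    return bool(env.get("WANDB_API_KEY", "").strip())
```

A script could call `wandb_api_key_is_set()` at startup and print a reminder to run `wandb login` or export the key when it returns False.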
Basic Experiment Tracking

    import wandb

    # Initialize a run
    run = wandb.init(
        project="my-project",
        config={
            "learning_rate": 0.001,
            "epochs": 10,
            "batch_size": 32,
            "architecture": "ResNet50",
        },
    )

    # Training loop
    for epoch in range(run.config.epochs):
        # Your training code: train_epoch() and validate() are placeholders
        # assumed to return a (loss, accuracy) pair
        train_loss, train_acc = train_epoch()
        val_loss, val_acc = validate()

        # Log metrics
        wandb.log({
            "epoch": epoch,
            "train/loss": train_loss,
            "val/loss": val_loss,
            "train/accuracy": train_acc,
            "val/accuracy": val_acc,
        })

    # Finish the run
    wandb.finish()

With PyTorch

    import torch
    import wandb

    # Initialize
    wandb.init(project="pytorch-demo", config={
        "lr": 0.001,
        "epochs": 10,
    })

    # Access config
    config = wandb.config

    # Training loop (model, criterion, optimizer, and train_loader are
    # assumed to be defined elsewhere)
    for epoch in range(config.epochs):
        for batch_idx, (data, target) in enumerate(train_loader):
            # Forward pass
            output = model(data)
            loss = criterion(output, target)

            # Backward pass
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

            # Log every 100 batches
            if batch_idx % 100 == 0:
                wandb.log({
                    "loss": loss.item(),
                    "epoch": epoch,
                    "batch": batch_idx,
                })

    # Save model
    torch.save(model.state_dict(), "model.pth")
    wandb.save("model.pth")  # Upload to W&B

    wandb.finish()

Core Concepts
1. Projects and Runs
Project: Collection of related experiments
Run: Single execution of your training script

    # Create/use project
    run = wandb.init(
        project="image-classification",
        name="resnet50-experiment-1",  # Optional run name
        tags=["baseline", "resnet"],   # Organize with tags
        notes="First baseline run",    # Add notes
    )

    # Each run has a unique ID
    print(f"Run ID: {run.id}")
    print(f"Run URL: {run.url}")

2. Configuration Tracking
Track hyperparameters automatically:

    config = {
        # Model architecture
        "model": "ResNet50",
        "pretrained": True,

        # Training params
        "learning_rate": 0.001,
        "batch_size": 32,
        "epochs": 50,
        "optimizer": "Adam",

        # Data params
        "dataset": "ImageNet",
        "augmentation": "standard",
    }

    wandb.init(project="my-project", config=config)

    # Access config during training
    lr = wandb.config.learning_rate
    batch_size = wandb.config.batch_size

3. Metric Logging
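
The metric names used throughout this skill follow a slash-separated `split/name` pattern such as `train/loss` and `val/accuracy`. A minimal sketch of building such a payload from per-split dictionaries is given here. `prefix_metrics` and `merge_metrics` are hypothetical helper names and are not part of the wandb API.

```python
def prefix_metrics(prefix, metrics):
    """Return a new dict whose keys are '<prefix>/<name>'."""
    return {f"{prefix}/{name}": value for name, value in metrics.items()}


def merge_metrics(epoch, train, val):
    """Combine per-split metrics into one flat dict for a single log call."""
    payload = {"epoch": epoch}
    payload.update(prefix_metrics("train", train))
    payload.update(prefix_metrics("val", val))
    return payload
```

The dictionary returned by `merge_metrics` can be passed directly to `wandb.log(...)`, which keeps naming consistent across training scripts.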

    # Log scalars
    wandb.log({"loss": 0.5, "accuracy": 0.92})

    # Log multiple metrics
    wandb.log({
        "train/loss": train_loss,
        "train/accuracy": train_acc,
        "val/loss": val_loss,
        "val/accuracy": val_acc,
        "learning_rate": current_lr,
        "epoch": epoch,
    })

    # Log with custom x-axis
    wandb.log({"loss": loss}, step=global_step)

    # Log media (images, audio, video)
    wandb.log({"examples": [wandb.Image(img) for img in images]})

    # Log histograms
    wandb.log({"gradients": wandb.Histogram(gradients)})

    # Log tables
    table = wandb.Table(columns=["id", "prediction", "ground_truth"])
    wandb.log({"predictions": table})

4. Model Checkpointing
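
One common pattern, offered here as an assumption rather than something this skill prescribes, is to upload a checkpoint only when a validation metric improves instead of after every epoch. The sketch below tracks the best value seen so far. `BestMetricTracker` is a hypothetical helper and does not call wandb itself.

```python
class BestMetricTracker:
    """Remember the best value seen so far for a metric such as val/loss."""

    def __init__(self, goal="minimize"):
        if goal not in ("minimize", "maximize"):
            raise ValueError("goal must be 'minimize' or 'maximize'")
        self.goal = goal
        self.best = None

    def is_improvement(self, value):
        """Return True and record `value` if it beats the best seen so far."""
        if self.best is None:
            improved = True
        elif self.goal == "minimize":
            improved = value < self.best
        else:
            improved = value > self.best
        if improved:
            self.best = value
        return improved
```

In a training loop, the checkpoint save and upload calls from this section would run only inside `if tracker.is_improvement(val_loss):`.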

    import torch
    import wandb

    # Save model checkpoint
    checkpoint = {
        'epoch': epoch,
        'model_state_dict': model.state_dict(),
        'optimizer_state_dict': optimizer.state_dict(),
        'loss': loss,
    }

    torch.save(checkpoint, 'checkpoint.pth')

    # Upload to W&B
    wandb.save('checkpoint.pth')

    # Or use Artifacts (recommended)
    artifact = wandb.Artifact('model', type='model')
    artifact.add_file('checkpoint.pth')
    wandb.log_artifact(artifact)

Hyperparameter Sweeps
Automatically search for optimal hyperparameters.
Define Sweep Configuration

    sweep_config = {
        'method': 'bayes',  # or 'grid', 'random'
        'metric': {
            'name': 'val/accuracy',
            'goal': 'maximize'
        },
        'parameters': {
            'learning_rate': {
                # log_uniform_values takes min/max as actual values;
                # plain log_uniform interprets them as exponents
                'distribution': 'log_uniform_values',
                'min': 1e-5,
                'max': 1e-1
            },
            'batch_size': {
                'values': [16, 32, 64, 128]
            },
            'optimizer': {
                'values': ['adam', 'sgd', 'rmsprop']
            },
            'dropout': {
                'distribution': 'uniform',
                'min': 0.1,
                'max': 0.5
            }
        }
    }

    # Initialize sweep
    sweep_id = wandb.sweep(sweep_config, project="my-project")

Define Training Function

    def train():
        # Initialize run
        run = wandb.init()

        # Access sweep parameters
        lr = wandb.config.learning_rate
        batch_size = wandb.config.batch_size
        optimizer_name = wandb.config.optimizer

        # Build model with sweep config
        model = build_model(wandb.config)
        optimizer = get_optimizer(optimizer_name, lr)

        # Training loop (NUM_EPOCHS, build_model, get_optimizer, train_epoch,
        # and validate are placeholders for your own code)
        for epoch in range(NUM_EPOCHS):
            train_loss = train_epoch(model, optimizer, batch_size)
            val_acc = validate(model)

            # Log metrics
            wandb.log({
                "train/loss": train_loss,
                "val/accuracy": val_acc,
            })

    # Run sweep
    wandb.agent(sweep_id, function=train, count=50)  # Run 50 trials

Sweep Strategies
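
To make the strategies in this section concrete, the stdlib-only sketch here enumerates the trials of a small grid and draws log-uniform samples for a learning rate. It illustrates the ideas only and is not how W&B implements sweeps. `grid_trials` and `sample_log_uniform` are hypothetical helper names.

```python
import itertools
import math
import random


def grid_trials(parameters):
    """Enumerate every combination of the 'values' lists, as a grid search would."""
    names = sorted(parameters)
    value_lists = [parameters[name]["values"] for name in names]
    return [dict(zip(names, combo)) for combo in itertools.product(*value_lists)]


def sample_log_uniform(low, high, rng):
    """Draw a value between low and high whose logarithm is uniformly distributed."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))


grid = grid_trials({
    "lr": {"values": [0.001, 0.01, 0.1]},
    "batch_size": {"values": [16, 32, 64]},
})
print(len(grid))  # 3 * 3 = 9 combinations

# Seeded generator so the draws are reproducible
rng = random.Random(0)
samples = [sample_log_uniform(1e-5, 1e-1, rng) for _ in range(1000)]
```

Grid size grows multiplicatively with each added parameter, which is why random or Bayesian search is usually preferred for larger spaces. Log-uniform sampling spreads trials evenly across orders of magnitude, so roughly half of the samples above fall below 1e-3.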

    # Grid search - exhaustive
    sweep_config = {
        'method': 'grid',
        'parameters': {
            'lr': {'values': [0.001, 0.01, 0.1]},
            'batch_size': {'values': [16, 32, 64]}
        }
    }

    # Random search
    sweep_config = {
        'method': 'random',
        'parameters': {
            'lr': {'distribution': 'uniform', 'min': 0.0001, 'max': 0.1},
            'dropout': {'distribution': 'uniform', 'min': 0.1, 'max': 0.5}
        }
    }

    # Bayesian optimization (recommended)
    sweep_config = {
        'method': 'bayes',
        'metric': {'name': 'val/loss', 'goal': 'minimize'},
        'parameters': {
            # log_uniform_values takes min/max as actual values
            'lr': {'distribution': 'log_uniform_values', 'min': 1e-5, 'max': 1e-1}
        }
    }

Artifacts
Track datasets, models, and other files with lineage.
Log Artifacts

    # Create artifact
    artifact = wandb.Artifact(
        name='training-dataset',
        type='dataset',
        description='ImageNet training split',
        metadata={'size': '1.2M images', 'split': 'train'}
    )

    # Add files
    artifact.add_file('data/train.csv')
    artifact.add_dir('data/images/')

    # Log artifact
    wandb.log_artifact(artifact)

Use Artifacts
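
Artifact references in this skill take the form `name:alias`, for example `training-dataset:latest`. As a rough illustration of how such a reference breaks down, the sketch below splits one into its parts. `split_artifact_ref` is a hypothetical helper, the optional `entity/project/` prefix handling and the fallback alias of `latest` are assumptions of this sketch, and none of it is part of the wandb API.

```python
def split_artifact_ref(ref, default_alias="latest"):
    """Split 'name:alias' (optionally 'entity/project/name:alias') into parts."""
    path, sep, alias = ref.partition(":")
    if not sep:
        # No explicit alias given; fall back to the sketch's default
        alias = default_alias
    *scope, name = path.split("/")
    return {"scope": "/".join(scope), "name": name, "alias": alias}
```

Pinning an explicit version alias instead of `latest` makes a run reproducible, because `latest` moves whenever a new version is logged.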

    # Download and use artifact
    run = wandb.init(project="my-project")

    # Download artifact
    artifact = run.use_artifact('training-dataset:latest')
    artifact_dir = artifact.download()

    # Use the data
    data = load_data(f"{artifact_dir}/train.csv")

Model Registry

    # Log model as artifact
    model_artifact = wandb.Artifact(
        name='resnet50-model',
        type='model',
        metadata={'architecture': 'ResNet50', 'accuracy': 0.95}
    )

    model_artifact.add_file('model.pth')
    wandb.log_artifact(model_artifact, aliases=['best', 'production'])

    # Link to model registry (run is the active run returned by wandb.init())
    run.link_artifact(model_artifact, 'model-registry/production-models')

Integration Examples
HuggingFace Transformers

    from transformers import Trainer, TrainingArguments
    import wandb

    # Initialize W&B
    wandb.init(project="hf-transformers")

    # Training arguments with W&B
    training_args = TrainingArguments(
        output_dir="./results",
        report_to="wandb",  # Enable W&B logging
        run_name="bert-finetuning",
        logging_steps=100,
        save_steps=500
    )

    # Trainer automatically logs to W&B
    trainer = Trainer(
        model=model,
        args=training_args,
        train_dataset=train_dataset,
        eval_dataset=eval_dataset
    )

    trainer.train()

PyTorch Lightning

    from pytorch_lightning import Trainer
    from pytorch_lightning.loggers import WandbLogger
    import wandb

    # Create W&B logger
    wandb_logger = WandbLogger(
        project="lightning-demo",
        log_model=True  # Log model checkpoints
    )

    # Use with Trainer
    trainer = Trainer(
        logger=wandb_logger,
        max_epochs=10
    )

    trainer.fit(model, datamodule=dm)

Keras/TensorFlow

    import wandb
    from wandb.keras import WandbCallback

    # Initialize
    wandb.init(project="keras-demo")

    # Add callback
    model.fit(
        x_train, y_train,
        validation_data=(x_val, y_val),
        epochs=10,
        callbacks=[WandbCallback()]  # Auto-logs metrics
    )

Visualization & Analysis
Custom Charts

    # Log custom visualizations
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.plot(x, y)
    wandb.log({"custom_plot": wandb.Image(fig)})

    # Log confusion matrix
    wandb.log({
        "conf_mat": wandb.plot.confusion_matrix(
            probs=None,
            y_true=ground_truth,
            preds=predictions,
            class_names=class_names
        )
    })

Reports
Create shareable reports in the W&B UI:
- Combine runs, charts, and text
- Markdown support
- Embeddable visualizations
- Team collaboration
Best Practices
1. Organize with Tags and Groups

    wandb.init(
        project="my-project",
        tags=["baseline", "resnet50", "imagenet"],
        group="resnet-experiments",  # Group related runs
        job_type="train"             # Type of job
    )

2. Log Everything Relevant
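
The code-version example in this section refers to a `git_commit_hash` value without showing where it comes from. One way to obtain it, sketched here with only the standard library, is to ask git directly. `current_git_commit` is a hypothetical helper name, and the sketch assumes the `git` executable is on the PATH; it returns None when git is missing or the directory is not a repository.

```python
import subprocess


def current_git_commit(cwd=None):
    """Return the current commit hash, or None if it cannot be determined."""
    try:
        result = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            cwd=cwd,
            capture_output=True,
            text=True,
            check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        # git is not installed, or cwd is not inside a repository
        return None
    return result.stdout.strip() or None


git_commit_hash = current_git_commit()
```

Because the helper can return None, a caller may want to skip logging the value or substitute a placeholder string in that case.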

    # Log system metrics
    wandb.log({
        "gpu/util": gpu_utilization,
        "gpu/memory": gpu_memory_used,
        "cpu/util": cpu_utilization
    })

    # Log code version
    wandb.log({"git_commit": git_commit_hash})

    # Log data splits
    wandb.log({
        "data/train_size": len(train_dataset),
        "data/val_size": len(val_dataset)
    })

3. Use Descriptive Names
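
Descriptive names are easier to keep consistent when they are generated from the hyperparameters rather than typed by hand. The following is a minimal sketch of that idea. `make_run_name` is a hypothetical helper, and the `key` followed by `value` layout joined with hyphens is just one possible convention.

```python
def make_run_name(model, **hparams):
    """Build a name like 'bert-base-lr0.001-bs32' from a model tag and hyperparameters."""
    # Keyword arguments keep their call order, so the name is deterministic
    parts = [model] + [f"{key}{value}" for key, value in hparams.items()]
    return "-".join(parts)


name = make_run_name("bert-base", lr=0.001, bs=32, epoch=10)
print(name)  # bert-base-lr0.001-bs32-epoch10
```

The resulting string can be passed as the `name` argument of `wandb.init`.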

    # ✅ Good: Descriptive run names
    wandb.init(
        project="nlp-classification",
        name="bert-base-lr0.001-bs32-epoch10"
    )

    # ❌ Bad: Generic names
    wandb.init(project="nlp", name="run1")

4. Save Important Artifacts

    # Save final model
    artifact = wandb.Artifact('final-model', type='model')
    artifact.add_file('model.pth')
    wandb.log_artifact(artifact)

    # Save predictions for analysis
    predictions_table = wandb.Table(
        columns=["id", "input", "prediction", "ground_truth"],
        data=predictions_data
    )
    wandb.log({"predictions": predictions_table})

5. Use Offline Mode for Unstable Connections

    import os
    import wandb

    # Enable offline mode (set before calling wandb.init)
    os.environ["WANDB_MODE"] = "offline"

    wandb.init(project="my-project")
    # ... your code ...

    # Sync later
    # wandb sync <run_directory>

Team Collaboration
Section titled “Team Collaboration”Share Runs

    # Runs are automatically shareable via URL
    run = wandb.init(project="team-project")
    print(f"Share this URL: {run.url}")

Team Projects
- Create team account at wandb.ai
- Add team members
- Set project visibility (private/public)
- Use team-level artifacts and model registry
Pricing
- Free: Unlimited public projects, 100GB storage
- Academic: Free for students/researchers
- Teams: $50/seat/month, private projects, unlimited storage
- Enterprise: Custom pricing, on-prem options
Resources
- Documentation: https://docs.wandb.ai
- GitHub: https://github.com/wandb/wandb (10.5k+ stars)
- Examples: https://github.com/wandb/examples
- Community: https://wandb.ai/community
- Discord: https://wandb.me/discord
See Also
- references/sweeps.md - Comprehensive hyperparameter optimization guide
- references/artifacts.md - Data and model versioning patterns
- references/integrations.md - Framework-specific examples