
Explainable AI for Industrial Automation

A predictive maintenance system built on Google Cloud Vertex AI with Sampled Shapley explainability to detect operational faults in physical machinery. This page demonstrates local and global explainability, quantitative model performance, counterfactual what-if analysis, and an open-source simulation pipeline — bridging the gap between black-box ML and transparent, trustworthy AI for industrial systems.

Live Telemetry Feed (Simulated)
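The feed is rendered client-side, but the underlying stream can be simulated in a few lines. A minimal sketch, assuming the same four fields as the inference payload shown later; the nominal and fault ranges are illustrative assumptions, not manufacturer specifications:

```python
import random

def sample_telemetry(rng, fault=False):
    """Draw one synthetic telemetry reading. The nominal and fault
    ranges below are illustrative assumptions, not machine specs."""
    if fault:
        return {
            "joint_torque": rng.uniform(55.0, 75.0),   # Nm, elevated
            "vibration_rms": rng.uniform(0.08, 0.16),  # elevated
            "cycle_time_ms": rng.uniform(1400, 2000),  # degraded response
            "encoder_error": rng.uniform(0.05, 0.15),  # drift
        }
    return {
        "joint_torque": rng.uniform(35.0, 45.0),       # Nm, nominal
        "vibration_rms": rng.uniform(0.02, 0.06),
        "cycle_time_ms": rng.uniform(900, 1200),
        "encoder_error": rng.uniform(0.0, 0.03),
    }

rng = random.Random(42)
# Ten readings with a fault injected every fifth sample
feed = [sample_telemetry(rng, fault=(i % 5 == 0)) for i in range(10)]
```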





SHAP Output Samples (Vertex AI Endpoint)

Sample predictions from the deployed Vertex AI endpoint with Sampled Shapley attributions. Each row represents a real inference call with its corresponding feature-level explanations.

ID    Joint Torque (Nm)  Vibration RMS  Cycle Time (ms)  Encoder Err   Prediction  Confidence
#001  62.4 (+0.38)       0.12 (+0.52)   1180 (-0.03)     0.08 (+0.28)  FAULT       94.2%
#002  38.1 (-0.12)       0.03 (-0.22)    950 (-0.05)     0.01 (-0.08)  NORMAL      97.8%
#003  55.0 (+0.30)       0.09 (+0.45)   1600 (+0.14)     0.01 (-0.04)  FAULT       89.1%
#004  42.5 (-0.08)       0.05 (-0.15)   1100 (-0.02)     0.07 (+0.22)  NORMAL      62.3%
#005  71.8 (+0.55)       0.15 (+0.61)   1850 (+0.18)     0.12 (+0.35)  FAULT       99.4%

Model Performance Metrics

Quantitative evaluation on the held-out industrial telemetry test set (n = 2,480 samples). Model: Gradient Boosted Trees trained on Vertex AI AutoML.

Accuracy: 96.3%
Precision: 94.7%
Recall: 97.1%
F1 Score: 0.958
AUC-ROC: 0.993
Explanation Fidelity: 0.91
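As a sanity check, F1 is the harmonic mean of precision and recall, so it can be recomputed from the reported values; the small discrepancy against the table is rounding (the table value was presumably computed from unrounded precision and recall):

```python
precision, recall = 0.947, 0.971  # reported held-out test-set metrics

# F1 is the harmonic mean of precision and recall
f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 3))  # → 0.959, matching the table's 0.958 up to rounding
```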

System Architecture

End-to-end data flow from physical sensor telemetry through the ML inference pipeline to the explainable prediction output.

Sensors (PLC / OPC-UA)
  → Frontend (HTML + Plotly.js)
  → Cloud Run (Flask API)
  → Vertex AI (Sampled Shapley)
  → XAI Output (SHAP chart)
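The Cloud Run stage can be a thin Flask proxy between the frontend and the Vertex AI endpoint. A minimal sketch, assuming the same payload shape as the inference example below; `PROJECT_ID` and `ENDPOINT_ID` are placeholders, and error handling is omitted:

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route("/explain", methods=["POST"])
def explain():
    """Forward one telemetry reading to the Vertex AI endpoint and
    return the prediction plus Sampled Shapley attributions."""
    from google.cloud import aiplatform  # deferred: needs GCP credentials
    aiplatform.init(project="PROJECT_ID", location="us-central1")
    endpoint = aiplatform.Endpoint("ENDPOINT_ID")
    response = endpoint.explain(instances=[request.get_json()])
    attrs = response.explanations[0].attributions[0].feature_attributions
    return jsonify({
        "prediction": response.predictions[0],
        "attributions": dict(attrs),
    })

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)  # Cloud Run's default container port
```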

SHAP Integration (Python)

# Vertex AI prediction with Sampled Shapley explanations
from google.cloud import aiplatform

aiplatform.init(project="naylinnaung", location="us-central1")
endpoint = aiplatform.Endpoint("ENDPOINT_ID")

# Build telemetry inference payload
instances = [{
    "joint_torque": 55.2,
    "vibration_rms": 0.09,
    "cycle_time_ms": 1200,
    "encoder_error": 0.02,
}]

# Request prediction + XAI attributions
response = endpoint.explain(instances=instances)

# Extract Shapley values
prediction = response.predictions[0]
attributions = response.explanations[0].attributions[0]
feature_attrs = dict(attributions.feature_attributions)

# Sort by impact for visualization
sorted_attrs = sorted(
    feature_attrs.items(),
    key=lambda x: abs(x[1]),
    reverse=True,
)
print(sorted_attrs)
# [('vibration_rms', 0.48), ('joint_torque', 0.35), ...]

Global Explainability

Aggregated feature importance across the entire training distribution (n = 12,400 samples). This reveals which physical parameters the model considers most critical for fault prediction system-wide.
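A common way to aggregate local attributions into a global ranking is the mean absolute attribution per feature. A toy sketch using the first three rows of the sample table above as input (the real system aggregates over all 12,400 training samples):

```python
import numpy as np

features = ["joint_torque", "vibration_rms", "cycle_time_ms", "encoder_error"]

# Rows: per-sample Shapley attributions (values from the sample table)
local_attrs = np.array([
    [+0.38, +0.52, -0.03, +0.28],
    [-0.12, -0.22, -0.05, -0.08],
    [+0.30, +0.45, +0.14, -0.04],
])

# Global importance = mean absolute attribution per feature
global_importance = np.abs(local_attrs).mean(axis=0)
ranking = sorted(zip(features, global_importance), key=lambda kv: -kv[1])
print(ranking[0][0])  # → vibration_rms dominates in this toy subset
```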

Counterfactual What-If Scenarios

Explore how changing a single parameter shifts the AI's decision boundary. Click any scenario to auto-populate the telemetry inputs above.

High Torque Only
What happens when joint torque spikes to 72 Nm while other sensors remain normal?
→ FAULT (87.3%) — Torque attribution: +0.55
All Sensors Nominal
Baseline healthy state—all telemetry within manufacturer specifications.
→ NORMAL (98.1%) — All attributions negative
Cycle Time Spike
Cycle time increases to 2200ms (degraded actuator response) — does the model catch it?
→ FAULT (71.6%) — Cycle time attribution: +0.24
Multi-Fault Cascade
Simultaneous degradation: torque, vibration, and encoder drift all elevated.
→ FAULT (99.7%) — All attributions strongly positive
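Each scenario above is a single-feature sweep against the deployed model. A sketch of that pattern with a stand-in scoring function — the real system calls `endpoint.explain`; the weights and nominal values in `fault_score` are purely illustrative:

```python
import math

def fault_score(joint_torque, vibration_rms, cycle_time_ms, encoder_error):
    """Illustrative stand-in for the deployed classifier: a weighted
    deviation above nominal operating values, squashed to (0, 1)."""
    z = (0.08 * (joint_torque - 45.0)
         + 12.0 * (vibration_rms - 0.05)
         + 0.002 * (cycle_time_ms - 1100.0)
         + 8.0 * (encoder_error - 0.03))
    return 1.0 / (1.0 + math.exp(-z))

baseline = dict(joint_torque=42.5, vibration_rms=0.05,
                cycle_time_ms=1100, encoder_error=0.02)

# Sweep one feature while holding the others at the healthy baseline
for torque in (42.5, 55.0, 72.0):
    p = fault_score(**{**baseline, "joint_torque": torque})
    label = "FAULT" if p >= 0.5 else "NORMAL"
    print(f"torque={torque:5.1f} Nm -> {label} ({p:.2f})")
```

The 72 Nm point crosses the decision boundary with everything else nominal, mirroring the "High Torque Only" scenario.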

Open-Source Simulation Pipeline

The full simulation pipeline is open-sourced for community review, extension, and feedback. It enables researchers and engineers to replicate the XAI methodology for their own industrial use cases.

1. Data Generation — Synthetic telemetry data generator with configurable fault injection patterns (bearing wear, encoder drift, thermal runaway).
2. Model Training — Gradient Boosted Trees via scikit-learn with hyperparameter tuning via Optuna; export to the Vertex AI Model Registry.
3. SHAP Explanation — Compute TreeSHAP locally or Sampled Shapley via Vertex AI; generates both local and global explanation artifacts.
4. Visualization — Plotly.js dashboard for interactive exploration; supports force plots, waterfall charts, and beeswarm global views.
5. Deployment — One-click deploy to GCP Cloud Run; includes a Dockerfile, app.yaml, and a CI/CD GitHub Actions workflow.
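Steps 1–2 can be sketched end to end with scikit-learn. The fault-injection ranges below are illustrative assumptions, and Optuna tuning plus Vertex AI export are omitted for brevity:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 1000

# Step 1: synthetic telemetry with injected faults (illustrative ranges)
y = rng.integers(0, 2, n)                   # 0 = NORMAL, 1 = FAULT
X = np.column_stack([
    rng.normal(40 + 20 * y, 4),             # joint_torque (Nm)
    rng.normal(0.04 + 0.08 * y, 0.015),     # vibration_rms
    rng.normal(1050 + 500 * y, 120),        # cycle_time_ms
    rng.normal(0.02 + 0.07 * y, 0.02),      # encoder_error
])

# Step 2: gradient boosted trees on a held-out split
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)
print(f"holdout accuracy: {model.score(X_te, y_te):.3f}")
```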