LLM Concepts for Industrial
Automation and Robotics
A deep technical guide to Large Language Models — architecture, training, deployment, and real-world applications in SCADA, PLC, IIoT, collaborative robotics, and autonomous control systems.
Tokenization & Embeddings
How LLMs read industrial text data
What is Tokenization?
Tokenization splits raw text into subword units called tokens. Most LLMs use Byte Pair Encoding (BPE): GPT-2's vocabulary has ~50,000 subwords, while GPT-4's cl100k_base tokenizer has ~100,000. For industrial text:
FAULT_JOINT_3 → [FAULT] [_JOINT] [_3]
E-STOP TRIGGERED → [E] [-] [STOP] [TRIG] [GERED]
PLC_ALARM_072 → [PLC] [_ALARM] [_07] [2]
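As a toy illustration of how BPE assembles subwords, the sketch below applies a fixed list of learned merges to a fault tag. The merge list is hypothetical, not taken from any real tokenizer:

```python
def bpe_merge_steps(word, merges):
    """Apply a fixed list of learned BPE merges to a word, left to right."""
    tokens = list(word)  # start from individual characters
    for a, b in merges:
        i, out = 0, []
        while i < len(tokens):
            # merge adjacent pair (a, b) into a single subword token
            if i < len(tokens) - 1 and tokens[i] == a and tokens[i + 1] == b:
                out.append(a + b)
                i += 2
            else:
                out.append(tokens[i])
                i += 1
        tokens = out
    return tokens

# Hypothetical merges "learned" from an industrial corpus
merges = [("F", "A"), ("FA", "U"), ("FAU", "L"), ("FAUL", "T")]
print(bpe_merge_steps("FAULT_3", merges))  # ['FAULT', '_', '3']
```

A real tokenizer learns tens of thousands of such merges by repeatedly fusing the most frequent adjacent pair in the training corpus.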
Embeddings in Industrial Context
Each token is mapped to a high-dimensional vector (e.g., 768D or 4096D). Semantically similar SCADA alarms cluster together in embedding space:
- "Motor Overheat" ≈ "Thermal Runaway" (nearby vectors)
- "E-Stop" ≠ "Speed Limit" (distant vectors)
- Used for semantic search over maintenance logs
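The "nearby vs. distant" intuition above is usually measured with cosine similarity. A minimal sketch over hypothetical 3-dimensional alarm embeddings (real embeddings have hundreds or thousands of dimensions):

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical embeddings for illustration only
motor_overheat  = [0.9, 0.8, 0.1]
thermal_runaway = [0.85, 0.75, 0.2]
e_stop          = [-0.7, 0.1, 0.9]

print(cosine(motor_overheat, thermal_runaway))  # near 1.0: semantically close
print(cosine(motor_overheat, e_stop))           # much lower: unrelated alarms
```

Semantic search over maintenance logs works the same way: embed the query, then rank stored log entries by cosine similarity.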
For vocabulary \(V\), each token \(t_i \in V\) is mapped by embedding matrix \(W_E \in \mathbb{R}^{|V| \times d}\):
\[ \mathbf{e}_i = W_E[t_i] \in \mathbb{R}^d \]
where \(d\) is the embedding dimension (e.g., 768 for GPT-2, 4096 for LLaMA 3 8B). Positional encoding \(\mathbf{p}_i\) is added: \(\mathbf{x}_i = \mathbf{e}_i + \mathbf{p}_i\)
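A pure-Python sketch of this lookup-plus-position step, using toy dimensions and sinusoidal positional encodings (one common choice; learned positional embeddings are equally valid):

```python
import math

def positional_encoding(pos, d):
    """Sinusoidal p_i: sin on even dimensions, cos on odd dimensions."""
    return [math.sin(pos / 10000 ** (j / d)) if j % 2 == 0
            else math.cos(pos / 10000 ** ((j - 1) / d))
            for j in range(d)]

def embed_sequence(token_ids, W_E):
    """x_i = W_E[t_i] + p_i for each position i in the sequence."""
    d = len(W_E[0])
    return [[e + p for e, p in zip(W_E[t], positional_encoding(i, d))]
            for i, t in enumerate(token_ids)]

# Toy embedding matrix: |V| = 3 tokens, d = 4
W_E = [[0.1, 0.2, 0.3, 0.4],
       [0.5, 0.6, 0.7, 0.8],
       [0.9, 1.0, 1.1, 1.2]]
X = embed_sequence([2, 0, 1], W_E)  # e.g. a 3-token alarm string
print(len(X), len(X[0]))            # 3 positions, each a 4-D vector
```

At position 0 the encoding is [0, 1, 0, 1], so the first output row is simply the token's embedding shifted by that pattern; later positions get distinct shifts, which is what lets attention distinguish token order.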
Transformer Architecture
The core engine of every modern LLM
Multi-Head Attention
Runs \(h\) parallel attention heads, each learning different relationship patterns between tokens — e.g., one head may track joint dependencies, another detects temporal fault sequences.
Add & LayerNorm
Residual connections prevent vanishing gradients in deep networks (e.g., 96 layers in GPT-3). LayerNorm stabilizes training across variable-length industrial sequences.
Feed-Forward MLP
A 2-layer MLP with GELU activation expands dimensionality 4× then compresses back. This is where factual knowledge (e.g., PLC error codes) is believed to be stored.
import torch
import torch.nn as nn

class TransformerBlock(nn.Module):
    """Single Transformer block for industrial sequence modeling."""
    def __init__(self, d_model=512, n_heads=8, d_ff=2048, dropout=0.1):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, dropout=dropout, batch_first=True)
        self.ff = nn.Sequential(
            nn.Linear(d_model, d_ff),
            nn.GELU(),
            nn.Linear(d_ff, d_model),
            nn.Dropout(dropout)
        )
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x, mask=None):
        # Self-attention + residual
        attn_out, _ = self.attn(x, x, x, attn_mask=mask)
        x = self.norm1(x + self.dropout(attn_out))
        # Feed-forward + residual
        x = self.norm2(x + self.ff(x))
        return x

# Example: model robot joint torque sequences (batch=4, seq_len=32, features=512)
model = TransformerBlock(d_model=512, n_heads=8)
robot_seq = torch.randn(4, 32, 512)   # 4 robots, 32 timesteps, 512 features
output = model(robot_seq)             # Shape: [4, 32, 512]
print(f"Output shape: {output.shape}")  # torch.Size([4, 32, 512])
Self-Attention Mechanism
How LLMs relate every token to every other token
\[ \text{Attention}(Q, K, V) = \text{softmax}\!\left(\frac{QK^\top}{\sqrt{d_k}}\right) V \]
Q (Query) — "What am I looking for?" (current sensor reading)
K (Key) — "What do I offer?" (historical timestep label)
V (Value) — "What information do I carry?" (actual sensor value)
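The formula above can be traced in a few lines of plain Python, using toy 2-D vectors, a single head, and no learned projections:

```python
import math

def attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V over plain Python lists of vectors."""
    d_k = len(K[0])
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))
    out = []
    for q in Q:
        scores = [dot(q, k) / math.sqrt(d_k) for k in K]  # scaled dot products
        m = max(scores)                                   # subtract max for stability
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        weights = [e / z for e in exps]                   # softmax: sums to 1
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])           # weighted sum of values
    return out

# The query matches the first key far more strongly than the second,
# so the output is pulled toward the first value row
Q = [[1.0, 0.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[10.0, 0.0], [0.0, 10.0]]
print(attention(Q, K, V))
```

In a real model Q, K, and V are linear projections of the same token embeddings, and this computation runs in parallel across every head and every position.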
Attention Heatmap — Robot Fault Sequence
[Figure: simulated attention weights for a cobot fault detection sequence; brighter = stronger attention.]
Pretraining
Building world knowledge from massive corpora
Causal Language Modeling (CLM)
The model predicts the next token given all previous tokens. Trained on trillions of tokens — including technical manuals, ISO standards, and engineering documentation. Loss function:
\[ \mathcal{L} = -\sum_{t=1}^{T} \log P(x_t \mid x_1, \ldots, x_{t-1}) \]
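Numerically, this loss is just the summed negative log-probability the model assigned to the correct next token at each step. A toy check:

```python
import math

def clm_loss(next_token_probs):
    """L = -sum_t log P(x_t | x_1 .. x_{t-1})."""
    return -sum(math.log(p) for p in next_token_probs)

# Probabilities the model assigned to the correct next token at each step
print(clm_loss([0.9, 0.5, 0.8]))  # lower is better; perfect prediction gives 0.0
```

Training simply pushes these per-step probabilities toward 1 across trillions of tokens; perplexity is the exponential of this loss averaged per token.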
Scale & Parameters
- GPT-4 — ~1.8 trillion parameters (unconfirmed estimate)
- LLaMA 3.1 405B — 405B params, open weights
- Gemini 2.0 — natively multimodal, vision + text
- DeepSeek-R1 — reasoning-optimized, 671B-parameter MoE
Fine-Tuning (LoRA / RLHF)
Specializing LLMs for industrial domains
Full Fine-Tuning
Updates all model weights on domain data (e.g., your company's SCADA logs and maintenance records). Requires significant GPU resources; best when maximum domain accuracy justifies the cost, as in production safety-critical systems.
LoRA (Low-Rank Adaptation)
Injects trainable rank-decomposition matrices into attention layers. Trains only ~0.1% of parameters. Ideal for fine-tuning on cobot fault datasets with limited GPU budget:
\[ W' = W + \Delta W = W + BA \]
where \(B \in \mathbb{R}^{d \times r},\ A \in \mathbb{R}^{r \times k},\ r \ll d\)
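A quick arithmetic sketch of why this is cheap: for a square weight with \(d = k = 4096\) and rank \(r = 16\), the adapter trains well under 1% of the full matrix:

```python
def lora_params(d, k, r):
    """Trainable parameters: full delta-W (d*k) vs. low-rank B (d*r) + A (r*k)."""
    full = d * k
    low_rank = d * r + r * k
    return full, low_rank

full, low_rank = lora_params(4096, 4096, 16)
print(full, low_rank, f"{low_rank / full:.2%}")  # 16777216 131072 0.78%
```

Applied only to the q_proj and v_proj matrices of each layer, this is where the ~0.1% trainable-parameter figure for an 8B model comes from.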
RLHF
Reinforcement Learning from Human Feedback aligns LLM outputs with expert preferences. Used to train models to give safe, conservative responses in safety-critical automation contexts — preventing hallucinated control commands.
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling,
                          TrainingArguments, Trainer)
from peft import LoraConfig, get_peft_model, TaskType
from datasets import load_dataset

# Load base model (4-bit quantization keeps an 8B model on a single GPU)
model_name = "meta-llama/Llama-3.1-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name, load_in_4bit=True)

# LoRA configuration for industrial fine-tuning
lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=16,                                  # Rank — higher = more capacity
    lora_alpha=32,                         # Scaling factor
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"]    # Attention layers only
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
# Output: trainable params: 8,388,608 || all params: 8,038,572,032 || trainable%: 0.10%

# Load your industrial dataset
# Format: {"prompt": "ALARM: Motor_03 Overcurrent", "completion": "Root cause: ..."}
dataset = load_dataset("json", data_files={
    "train": "data/scada_fault_train.jsonl",
    "eval": "data/scada_fault_eval.jsonl"
})

# Tokenize prompt + completion into causal-LM training examples
def tokenize(example):
    text = example["prompt"] + "\n" + example["completion"]
    return tokenizer(text, truncation=True, max_length=512)

dataset = dataset.map(tokenize, remove_columns=["prompt", "completion"])

training_args = TrainingArguments(
    output_dir="./cobot-llm-lora",
    num_train_epochs=3,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    learning_rate=2e-4,
    fp16=True,
    logging_steps=50,
    save_steps=200,
    evaluation_strategy="steps",
    eval_steps=200,
)
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["eval"],
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
model.save_pretrained("./cobot-llm-finetuned")
Retrieval-Augmented Generation (RAG)
Grounding LLMs in real industrial knowledge
from langchain_community.vectorstores import Chroma
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.document_loaders import DirectoryLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.chains import RetrievalQA
from langchain_community.llms import Ollama

# 1. Load industrial documentation (PLC manuals, maintenance logs, ISO standards)
loader = DirectoryLoader("./industrial_docs/", glob="**/*.pdf")
docs = loader.load()
splitter = RecursiveCharacterTextSplitter(chunk_size=512, chunk_overlap=64)
chunks = splitter.split_documents(docs)

# 2. Embed and store in a vector database
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
vectordb = Chroma.from_documents(chunks, embeddings, persist_directory="./chroma_plc_db")

# 3. Build the RAG chain with a local LLM (runs on-premises — no cloud data leakage)
llm = Ollama(model="llama3.1:8b", temperature=0.1)  # Low temperature for factual answers
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    retriever=vectordb.as_retriever(search_kwargs={"k": 5}),
    chain_type="stuff",
    return_source_documents=True
)

# 4. Query the system
result = qa_chain.invoke({
    "query": "What is the troubleshooting procedure for Allen-Bradley E-Stop fault code F07?"
})
print(result["result"])
print("\nSources:", [d.metadata["source"] for d in result["source_documents"]])
Agentic AI & Tool Use
LLMs that plan, act, and control systems autonomously
ReAct Pattern (Reason + Act)
The LLM alternates between Thought → Action → Observation loops. Applied to robot control:
- Thought: "Joint 3 torque exceeded threshold"
- Action: call read_sensor("joint_3_torque")
- Observation: "87.3 Nm — 15% above nominal"
- Action: call reduce_speed(axis=3, factor=0.8)
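The loop above can be stripped down to a few lines of Python, with a scripted policy standing in for the LLM. The tool names mirror the example and are hypothetical:

```python
def react_loop(llm_step, tools, max_iters=4):
    """Toy ReAct loop: the policy emits (thought, action, args) until it finishes."""
    observation = None
    for _ in range(max_iters):
        thought, action, args = llm_step(observation)
        if action == "finish":
            return args[0]
        observation = tools[action](*args)  # execute the tool, feed result back
    return "max iterations reached"

# Scripted policy standing in for the LLM
def policy(obs):
    if obs is None:
        return ("Check joint 3 torque", "read_sensor", ("joint_3_torque",))
    if "87.3" in obs:
        return ("Above nominal, slow down", "reduce_speed", (3, 0.8))
    return ("Torque handled", "finish", ("Speed reduced on axis 3",))

# Stub tools; a real deployment would call OPC-UA / robot APIs here
tools = {
    "read_sensor": lambda tag: "87.3 Nm",
    "reduce_speed": lambda axis, factor: f"axis {axis} speed x{factor}",
}
print(react_loop(policy, tools))  # Speed reduced on axis 3
```

In a real agent the policy is the LLM itself, parsing its own "Action:" lines from generated text, which is exactly what the LangChain agent below automates.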
ROSGPT Framework
Integrates GPT-4 with ROS2 (Robot Operating System 2), translating natural language commands directly into robotic control commands. Example pipeline:
- "Move arm to pick position" → ROS2 MoveIt trajectory
- "Slow down on approach" → velocity controller update
- "Report joint states" → topic subscriber query
from langchain.agents import AgentExecutor, create_react_agent
from langchain_core.tools import tool
from langchain_openai import ChatOpenAI
from langchain import hub

# Define industrial tools the LLM agent can call
@tool
def read_plc_tag(tag_name: str) -> str:
    """Read a real-time value from a PLC tag via OPC-UA."""
    from opcua import Client
    client = Client("opc.tcp://kepware-server:4840")  # Kepware OPC-UA server
    client.connect()
    try:
        node = client.get_node(f"ns=2;s=PLC.{tag_name}")
        value = node.get_value()
    finally:
        client.disconnect()
    return f"{tag_name} = {value}"

@tool
def trigger_alarm(alarm_id: str, message: str) -> str:
    """Trigger a SCADA alarm with the given ID and message."""
    import requests
    # POST to SCADA REST API (e.g., Ignition Gateway)
    resp = requests.post("http://ignition-gateway/api/alarm",
                         json={"id": alarm_id, "message": message, "priority": "High"})
    return f"Alarm {alarm_id} triggered: {resp.status_code}"

@tool
def get_robot_joint_state(robot_id: str) -> str:
    """Get current joint positions and velocities of a collaborative robot."""
    import requests
    resp = requests.get(f"http://robot-api/v1/robots/{robot_id}/joints")
    return str(resp.json())

# Build the LLM agent
llm = ChatOpenAI(model="gpt-4o", temperature=0)
tools = [read_plc_tag, trigger_alarm, get_robot_joint_state]
agent = create_react_agent(llm, tools, hub.pull("hwchase17/react"))
agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True, max_iterations=6)

# Run an autonomous industrial task
result = agent_executor.invoke({
    "input": "Check if Robot_01 joint 3 torque is safe. If over 80 Nm, "
             "trigger alarm FAULT_J3 and log the current state."
})
print(result["output"])
Industrial Applications Matrix
LLM concepts mapped to real automation systems
Industrial LLM Deployment
On-premises vs cloud vs edge strategies
Cloud (Google Vertex AI)
Best for: non-real-time analytics, SCADA report generation, maintenance scheduling.
- Gemini 2.0 Flash via API
- Vertex AI Pipelines for batch inference
- Latency: 500ms–2s
On-Premises
Best for: safety-critical systems, data sovereignty, air-gapped plants.
- Ollama + LLaMA 3.1 8B / 70B
- vLLM inference server
- Latency: 100–500ms (GPU)
Edge (Cobot / PLC)
Best for: real-time control decisions, offline environments, robot-side inference.
- Phi-3 Mini / Gemma 2B quantized
- ONNX Runtime on NVIDIA Jetson
- Latency: <50ms
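The three tiers can be reduced to a simple selection rule on the latency budget. A sketch using the figures quoted above (thresholds and tier names are illustrative, not a sizing tool):

```python
def choose_tier(latency_budget_ms, air_gapped=False):
    """Map a latency budget (and air-gap constraint) to a deployment tier."""
    if latency_budget_ms < 50:
        return "edge"          # quantized Phi-3 Mini / Gemma 2B on Jetson
    if latency_budget_ms <= 500 or air_gapped:
        return "on-premises"   # Ollama / vLLM with LLaMA 3.1 8B / 70B
    return "cloud"             # Vertex AI / Gemini API

print(choose_tier(30))                     # edge
print(choose_tier(2000, air_gapped=True))  # on-premises
print(choose_tier(1500))                   # cloud
```

In practice the decision also weighs model quality, GPU availability, and data-sovereignty policy; latency is simply the hardest constraint to relax after deployment.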