LLM Concepts for Industrial
Automation and Robotics
A deep technical guide to Large Language Models — architecture, training, deployment, and real-world applications in SCADA, PLC, IIoT, collaborative robotics, and autonomous control systems.
Tokenization & Embeddings
How LLMs read industrial text data
What is Tokenization?
Tokenization splits raw text into subword units called tokens. Most LLMs use Byte Pair Encoding (BPE): GPT-2's vocabulary has ~50,000 subwords, while GPT-4's cl100k_base tokenizer has ~100,000. For industrial text:
FAULT_JOINT_3 → [FAULT] [_JOINT] [_3]
E-STOP TRIGGERED → [E] [-] [STOP] [TRIG] [GERED]
PLC_ALARM_072 → [PLC] [_ALARM] [_07] [2]
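As a toy illustration of how BPE assembles subwords, the sketch below applies a fixed list of learned merges to a fault tag. The merge list is hypothetical, not taken from any real tokenizer:

```python
def bpe_merge_steps(word, merges):
    """Apply a fixed list of learned BPE merges to a word, left to right."""
    tokens = list(word)  # start from individual characters
    for a, b in merges:
        i, out = 0, []
        while i < len(tokens):
            # merge adjacent pair (a, b) into a single subword token
            if i < len(tokens) - 1 and tokens[i] == a and tokens[i + 1] == b:
                out.append(a + b)
                i += 2
            else:
                out.append(tokens[i])
                i += 1
        tokens = out
    return tokens

# Hypothetical merges "learned" from an industrial corpus
merges = [("F", "A"), ("FA", "U"), ("FAU", "L"), ("FAUL", "T")]
print(bpe_merge_steps("FAULT_3", merges))  # ['FAULT', '_', '3']
```

A real tokenizer learns tens of thousands of such merges by repeatedly fusing the most frequent adjacent pair in the training corpus.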
Embeddings in Industrial Context
Each token is mapped to a high-dimensional vector (e.g., 768D or 4096D). Semantically similar SCADA alarms cluster together in embedding space:
- "Motor Overheat" ≈ "Thermal Runaway" (nearby vectors)
- "E-Stop" ≠ "Speed Limit" (distant vectors)
- Used for semantic search over maintenance logs
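The "nearby vs. distant" intuition above is usually measured with cosine similarity. A minimal sketch over hypothetical 3-dimensional alarm embeddings (real embeddings have hundreds or thousands of dimensions):

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical embeddings for illustration only
motor_overheat  = [0.9, 0.8, 0.1]
thermal_runaway = [0.85, 0.75, 0.2]
e_stop          = [-0.7, 0.1, 0.9]

print(cosine(motor_overheat, thermal_runaway))  # near 1.0: semantically close
print(cosine(motor_overheat, e_stop))           # much lower: unrelated alarms
```

Semantic search over maintenance logs works the same way: embed the query, then rank stored log entries by cosine similarity.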
For vocabulary \(V\), each token \(t_i \in V\) is mapped by embedding matrix \(W_E \in \mathbb{R}^{|V| \times d}\):
\[ \mathbf{e}_i = W_E[t_i] \in \mathbb{R}^d \]
where \(d\) is the embedding dimension (e.g., 768 for GPT-2, 4096 for LLaMA 3 8B). Positional encoding \(\mathbf{p}_i\) is added: \(\mathbf{x}_i = \mathbf{e}_i + \mathbf{p}_i\)
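A pure-Python sketch of this lookup-plus-position step, using toy dimensions and sinusoidal positional encodings (one common choice; learned positional embeddings are equally valid):

```python
import math

def positional_encoding(pos, d):
    """Sinusoidal p_i: sin on even dimensions, cos on odd dimensions."""
    return [math.sin(pos / 10000 ** (j / d)) if j % 2 == 0
            else math.cos(pos / 10000 ** ((j - 1) / d))
            for j in range(d)]

def embed_sequence(token_ids, W_E):
    """x_i = W_E[t_i] + p_i for each position i in the sequence."""
    d = len(W_E[0])
    return [[e + p for e, p in zip(W_E[t], positional_encoding(i, d))]
            for i, t in enumerate(token_ids)]

# Toy embedding matrix: |V| = 3 tokens, d = 4
W_E = [[0.1, 0.2, 0.3, 0.4],
       [0.5, 0.6, 0.7, 0.8],
       [0.9, 1.0, 1.1, 1.2]]
X = embed_sequence([2, 0, 1], W_E)  # e.g. a 3-token alarm string
print(len(X), len(X[0]))            # 3 positions, each a 4-D vector
```

At position 0 the encoding is [0, 1, 0, 1], so the first output row is simply the token's embedding shifted by that pattern; later positions get distinct shifts, which is what lets attention distinguish token order.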
Transformer Architecture
The core engine of every modern LLM
Multi-Head Attention
Runs \(h\) parallel attention heads, each learning different relationship patterns between tokens — e.g., one head may track joint dependencies, another detects temporal fault sequences.
Add & LayerNorm
Residual connections prevent vanishing gradients in deep networks (e.g., 96 layers in GPT-3). LayerNorm stabilizes training across variable-length industrial sequences.
Feed-Forward MLP
A 2-layer MLP with GELU activation expands dimensionality 4× then compresses back. This is where factual knowledge (e.g., PLC error codes) is believed to be stored.
import torch
import torch.nn as nn

class TransformerBlock(nn.Module):
    """Single Transformer block for industrial sequence modeling."""
    def __init__(self, d_model=512, n_heads=8, d_ff=2048, dropout=0.1):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, dropout=dropout, batch_first=True)
        self.ff = nn.Sequential(
            nn.Linear(d_model, d_ff),
            nn.GELU(),
            nn.Linear(d_ff, d_model),
            nn.Dropout(dropout)
        )
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x, mask=None):
        # Self-attention + residual
        attn_out, _ = self.attn(x, x, x, attn_mask=mask)
        x = self.norm1(x + self.dropout(attn_out))
        # Feed-forward + residual
        x = self.norm2(x + self.ff(x))
        return x

# Example: model robot joint torque sequences (batch=4, seq_len=32, features=512)
model = TransformerBlock(d_model=512, n_heads=8)
robot_seq = torch.randn(4, 32, 512)   # 4 robots, 32 timesteps, 512 features
output = model(robot_seq)             # Shape: [4, 32, 512]
print(f"Output shape: {output.shape}")  # torch.Size([4, 32, 512])
Self-Attention Mechanism
How LLMs relate every token to every other token
\[ \text{Attention}(Q, K, V) = \text{softmax}\!\left(\frac{QK^\top}{\sqrt{d_k}}\right) V \]
Q (Query) — "What am I looking for?" (current sensor reading)
K (Key) — "What do I offer?" (historical timestep label)
V (Value) — "What information do I carry?" (actual sensor value)
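The formula above can be traced in a few lines of plain Python, using toy 2-D vectors, a single head, and no learned projections:

```python
import math

def attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V over plain Python lists of vectors."""
    d_k = len(K[0])
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))
    out = []
    for q in Q:
        scores = [dot(q, k) / math.sqrt(d_k) for k in K]  # scaled dot products
        m = max(scores)                                   # subtract max for stability
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        weights = [e / z for e in exps]                   # softmax: sums to 1
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])           # weighted sum of values
    return out

# The query matches the first key far more strongly than the second,
# so the output is pulled toward the first value row
Q = [[1.0, 0.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[10.0, 0.0], [0.0, 10.0]]
print(attention(Q, K, V))
```

In a real model Q, K, and V are linear projections of the same token embeddings, and this computation runs in parallel across every head and every position.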
Attention Heatmap — Robot Fault Sequence
[Figure: simulated attention weights for a cobot fault detection sequence; brighter = stronger attention.]
Pretraining
Building world knowledge from massive corpora
Causal Language Modeling (CLM)
The model predicts the next token given all previous tokens. Trained on trillions of tokens — including technical manuals, ISO standards, and engineering documentation. Loss function:
\[ \mathcal{L} = -\sum_{t=1}^{T} \log P(x_t \mid x_1, \ldots, x_{t-1}) \]
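Numerically, this loss is just the summed negative log-probability the model assigned to the correct next token at each step. A toy check:

```python
import math

def clm_loss(next_token_probs):
    """L = -sum_t log P(x_t | x_1 .. x_{t-1})."""
    return -sum(math.log(p) for p in next_token_probs)

# Probabilities the model assigned to the correct next token at each step
print(clm_loss([0.9, 0.5, 0.8]))  # lower is better; perfect prediction gives 0.0
```

Training simply pushes these per-step probabilities toward 1 across trillions of tokens; perplexity is the exponential of this loss averaged per token.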
Scale & Parameters
- GPT-4 — ~1.8 trillion parameters (unconfirmed estimate)
- LLaMA 3.1 405B — 405B params, open weights
- Gemini 2.0 — natively multimodal, vision + text
- DeepSeek-R1 — reasoning-optimized, 671B-parameter MoE
Fine-Tuning (LoRA / RLHF)
Specializing LLMs for industrial domains
Full Fine-Tuning
Updates all model weights on domain data (e.g., your company's SCADA logs and maintenance records). Requires significant GPU resources; best when maximum domain accuracy justifies the cost, as in production safety-critical systems.
LoRA (Low-Rank Adaptation)
Injects trainable rank-decomposition matrices into attention layers. Trains only ~0.1% of parameters. Ideal for fine-tuning on cobot fault datasets with limited GPU budget:
\[ W' = W + \Delta W = W + BA \]
where \(B \in \mathbb{R}^{d \times r},\ A \in \mathbb{R}^{r \times k},\ r \ll d\)
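A quick arithmetic sketch of why this is cheap: for a square weight with \(d = k = 4096\) and rank \(r = 16\), the adapter trains well under 1% of the full matrix:

```python
def lora_params(d, k, r):
    """Trainable parameters: full delta-W (d*k) vs. low-rank B (d*r) + A (r*k)."""
    full = d * k
    low_rank = d * r + r * k
    return full, low_rank

full, low_rank = lora_params(4096, 4096, 16)
print(full, low_rank, f"{low_rank / full:.2%}")  # 16777216 131072 0.78%
```

Applied only to the q_proj and v_proj matrices of each layer, this is where the ~0.1% trainable-parameter figure for an 8B model comes from.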
RLHF
Reinforcement Learning from Human Feedback aligns LLM outputs with expert preferences. Used to train models to give safe, conservative responses in safety-critical automation contexts — preventing hallucinated control commands.
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling,
                          TrainingArguments, Trainer)
from peft import LoraConfig, get_peft_model, TaskType
from datasets import load_dataset

# Load base model (4-bit quantization keeps an 8B model on a single GPU)
model_name = "meta-llama/Llama-3.1-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name, load_in_4bit=True)

# LoRA configuration for industrial fine-tuning
lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=16,                                  # Rank — higher = more capacity
    lora_alpha=32,                         # Scaling factor
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"]    # Attention layers only
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
# Output: trainable params: 8,388,608 || all params: 8,038,572,032 || trainable%: 0.10%

# Load your industrial dataset
# Format: {"prompt": "ALARM: Motor_03 Overcurrent", "completion": "Root cause: ..."}
dataset = load_dataset("json", data_files={
    "train": "data/scada_fault_train.jsonl",
    "eval": "data/scada_fault_eval.jsonl"
})

# Tokenize prompt + completion into causal-LM training examples
def tokenize(example):
    text = example["prompt"] + "\n" + example["completion"]
    return tokenizer(text, truncation=True, max_length=512)

dataset = dataset.map(tokenize, remove_columns=["prompt", "completion"])

training_args = TrainingArguments(
    output_dir="./cobot-llm-lora",
    num_train_epochs=3,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    learning_rate=2e-4,
    fp16=True,
    logging_steps=50,
    save_steps=200,
    evaluation_strategy="steps",
    eval_steps=200,
)
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["eval"],
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
model.save_pretrained("./cobot-llm-finetuned")
Retrieval-Augmented Generation (RAG)
Grounding LLMs in real industrial knowledge
from langchain_community.vectorstores import Chroma
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.document_loaders import DirectoryLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.chains import RetrievalQA
from langchain_community.llms import Ollama

# 1. Load industrial documentation (PLC manuals, maintenance logs, ISO standards)
loader = DirectoryLoader("./industrial_docs/", glob="**/*.pdf")
docs = loader.load()
splitter = RecursiveCharacterTextSplitter(chunk_size=512, chunk_overlap=64)
chunks = splitter.split_documents(docs)

# 2. Embed and store in a vector database
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
vectordb = Chroma.from_documents(chunks, embeddings, persist_directory="./chroma_plc_db")

# 3. Build the RAG chain with a local LLM (runs on-premises — no cloud data leakage)
llm = Ollama(model="llama3.1:8b", temperature=0.1)  # Low temperature for factual answers
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    retriever=vectordb.as_retriever(search_kwargs={"k": 5}),
    chain_type="stuff",
    return_source_documents=True
)

# 4. Query the system
result = qa_chain.invoke({
    "query": "What is the troubleshooting procedure for Allen-Bradley E-Stop fault code F07?"
})
print(result["result"])
print("\nSources:", [d.metadata["source"] for d in result["source_documents"]])
Agentic AI & Tool Use
LLMs that plan, act, and control systems autonomously
ReAct Pattern (Reason + Act)
The LLM alternates between Thought → Action → Observation loops. Applied to robot control:
- Thought: "Joint 3 torque exceeded threshold"
- Action: call read_sensor("joint_3_torque")
- Observation: "87.3 Nm — 15% above nominal"
- Action: call reduce_speed(axis=3, factor=0.8)
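The loop above can be stripped down to a few lines of Python, with a scripted policy standing in for the LLM. The tool names mirror the example and are hypothetical:

```python
def react_loop(llm_step, tools, max_iters=4):
    """Toy ReAct loop: the policy emits (thought, action, args) until it finishes."""
    observation = None
    for _ in range(max_iters):
        thought, action, args = llm_step(observation)
        if action == "finish":
            return args[0]
        observation = tools[action](*args)  # execute the tool, feed result back
    return "max iterations reached"

# Scripted policy standing in for the LLM
def policy(obs):
    if obs is None:
        return ("Check joint 3 torque", "read_sensor", ("joint_3_torque",))
    if "87.3" in obs:
        return ("Above nominal, slow down", "reduce_speed", (3, 0.8))
    return ("Torque handled", "finish", ("Speed reduced on axis 3",))

# Stub tools; a real deployment would call OPC-UA / robot APIs here
tools = {
    "read_sensor": lambda tag: "87.3 Nm",
    "reduce_speed": lambda axis, factor: f"axis {axis} speed x{factor}",
}
print(react_loop(policy, tools))  # Speed reduced on axis 3
```

In a real agent the policy is the LLM itself, parsing its own "Action:" lines from generated text, which is exactly what the LangChain agent below automates.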
ROSGPT Framework
Integrates GPT-4 with ROS2 (Robot Operating System 2), translating natural language commands directly into robotic control commands. Example pipeline:
- "Move arm to pick position" → ROS2 MoveIt trajectory
- "Slow down on approach" → velocity controller update
- "Report joint states" → topic subscriber query
from langchain.agents import AgentExecutor, create_react_agent
from langchain_core.tools import tool
from langchain_openai import ChatOpenAI
from langchain import hub

# Define industrial tools the LLM agent can call
@tool
def read_plc_tag(tag_name: str) -> str:
    """Read a real-time value from a PLC tag via OPC-UA."""
    from opcua import Client
    client = Client("opc.tcp://kepware-server:4840")  # Kepware OPC-UA server
    client.connect()
    try:
        node = client.get_node(f"ns=2;s=PLC.{tag_name}")
        value = node.get_value()
    finally:
        client.disconnect()
    return f"{tag_name} = {value}"

@tool
def trigger_alarm(alarm_id: str, message: str) -> str:
    """Trigger a SCADA alarm with the given ID and message."""
    import requests
    # POST to SCADA REST API (e.g., Ignition Gateway)
    resp = requests.post("http://ignition-gateway/api/alarm",
                         json={"id": alarm_id, "message": message, "priority": "High"})
    return f"Alarm {alarm_id} triggered: {resp.status_code}"

@tool
def get_robot_joint_state(robot_id: str) -> str:
    """Get current joint positions and velocities of a collaborative robot."""
    import requests
    resp = requests.get(f"http://robot-api/v1/robots/{robot_id}/joints")
    return str(resp.json())

# Build the LLM agent
llm = ChatOpenAI(model="gpt-4o", temperature=0)
tools = [read_plc_tag, trigger_alarm, get_robot_joint_state]
agent = create_react_agent(llm, tools, hub.pull("hwchase17/react"))
agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True, max_iterations=6)

# Run an autonomous industrial task
result = agent_executor.invoke({
    "input": "Check if Robot_01 joint 3 torque is safe. If over 80 Nm, "
             "trigger alarm FAULT_J3 and log the current state."
})
print(result["output"])
Industrial Applications Matrix
LLM concepts mapped to real automation systems
Industrial LLM Deployment
On-premises vs cloud vs edge strategies
Cloud (Google Vertex AI)
Best for: non-real-time analytics, SCADA report generation, maintenance scheduling.
- Gemini 2.0 Flash via API
- Vertex AI Pipelines for batch inference
- Latency: 500ms–2s
On-Premises
Best for: safety-critical systems, data sovereignty, air-gapped plants.
- Ollama + LLaMA 3.1 8B / 70B
- vLLM inference server
- Latency: 100–500ms (GPU)
Edge (Cobot / PLC)
Best for: real-time control decisions, offline environments, robot-side inference.
- Phi-3 Mini / Gemma 2B quantized
- ONNX Runtime on NVIDIA Jetson
- Latency: <50ms
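The three tiers can be reduced to a simple selection rule on the latency budget. A sketch using the figures quoted above (thresholds and tier names are illustrative, not a sizing tool):

```python
def choose_tier(latency_budget_ms, air_gapped=False):
    """Map a latency budget (and air-gap constraint) to a deployment tier."""
    if latency_budget_ms < 50:
        return "edge"          # quantized Phi-3 Mini / Gemma 2B on Jetson
    if latency_budget_ms <= 500 or air_gapped:
        return "on-premises"   # Ollama / vLLM with LLaMA 3.1 8B / 70B
    return "cloud"             # Vertex AI / Gemini API

print(choose_tier(30))                     # edge
print(choose_tier(2000, air_gapped=True))  # on-premises
print(choose_tier(1500))                   # cloud
```

In practice the decision also weighs model quality, GPU availability, and data-sovereignty policy; latency is simply the hardest constraint to relax after deployment.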