Case Study – Manufacturing Dispatching
This demo shows an Optimate AI application to manufacturing dispatching on a simplified real-world problem.
In this problem, Optimate AI learned the dispatching algorithm over thousands of simulations of the production process. Optimate AI found an optimal dispatching policy that respects the problem structure (e.g., which operations are feasible on which machines) and minimizes the total delay (tardiness) of all inbound orders.
Manufacturing dispatching is a scheduling routine in which a manager faces inbound production tasks (orders) and must assign each order to a specific machine (or, for complex production, to a sequence of operations on different machines).
Dispatching is distinct from planning. In manufacturing planning, managers usually care about long-term machine utilization (often referred to as OEE, Overall Equipment Effectiveness) because it relates both to OPEX (operational expenses) and to the ROI (return on investment) of the machines.
In dispatching, by contrast, managers mostly care about order tardiness (lateness), although they still try to maximize utilization as much as possible.
The critical challenge that defines the complexity of dispatching is uncertainty. In real life, everything is subject to uncertainty: deviations in production time, machine failures, personnel sickness, urgent unplanned orders, and so on.
The Demo Factory has 4 different types of cutting machines operated by computer programs (also known as CNC machines):
3-axis milling machine (5 machines)
6-axis milling machine (5 machines)
3-axis plasma cutting machine (3 machines)
6-axis laser cutting machine (2 machines)
Each machine type has different characteristics in terms of surface complexity, precision, and the materials it can cut. For example, a 6-axis CNC machine can carve more complex geometry than a 3-axis one, and laser cutting is much more precise than milling.
Our Demo Factory produces 4 types of machine parts (products):
Body A (complex geometry)
Body B (simple geometry)
Plate A (high precision)
Plate B (made of extra-strong alloy)
Below is a mapping of products to machines (i.e., where each product type can be produced) and the processing time (in minutes) for each feasible machine-product combination.
Not surprisingly, some machines cannot produce certain products, and different machine types have different processing times for the same product.
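The mapping above can be sketched as a feasibility table. Note that the machine names and all processing times below are illustrative placeholders, not the actual values from the case study's table:

```python
# Hypothetical product-machine mapping. The actual processing times from the
# case study's table are not reproduced here; these numbers are illustrative.
# None marks an infeasible machine-product combination.
PROCESSING_TIME = {
    "mill_3axis":   {"Body A": None, "Body B": 30,   "Plate A": None, "Plate B": 40},
    "mill_6axis":   {"Body A": 50,   "Body B": 25,   "Plate A": None, "Plate B": 35},
    "plasma_3axis": {"Body A": None, "Body B": None, "Plate A": None, "Plate B": 20},
    "laser_6axis":  {"Body A": None, "Body B": None, "Plate A": 15,   "Plate B": None},
}

def feasible_machines(product):
    """Return the machine types that can produce the given product."""
    return [m for m, times in PROCESSING_TIME.items() if times[product] is not None]
```

For example, under these placeholder values only the 6-axis milling machines can produce the complex-geometry Body A.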
The goal of the dispatching process is to minimize the makespan: the total production time of all inbound orders. This time also includes the orders' wait time (before production starts).
The demo shows a simulation of the production process in which dispatching is controlled by Optimate AI.
The dashboard shows the following information:
Gantt chart that shows the utilization of every machine over time
Task buffer. The number of products of each type that still need to be produced
Tasks left. The total number of tasks (orders) over time
Machines. Which type of task is currently assigned to each machine
Available resources. The total number of idle machines over time
How does it work
Dispatching is performed by a Reinforcement Learning Agent. The Agent was trained over thousands of simulations of the manufacturing process.
In this case, the Agent was trained with the classical Advantage Actor-Critic algorithm (A2C), but the Actor and Critic neural networks have different inputs.
The Actor estimates the overall utility of assigning a particular order to a specific machine. For every feasible "order-machine" pair, it returns a number that characterizes the merit of that assignment.
The Actor's input (state) is simple in our case:
Operation time to produce the required part (for the order-machine pair under consideration)
Time to the order's deadline (order deadline minus current time)
Total queue size
Queue size of similar orders (same product to produce)
Number of idle machines that can process this order
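The five features above can be packed into a small state vector. This is a minimal sketch; the argument and variable names are illustrative, not from the original implementation:

```python
import numpy as np

def actor_state(op_time, deadline, now, queue, product, idle_count):
    """Pack the five Actor input features for one order-machine pair.

    queue is a list of product names currently waiting to be produced.
    All names here are assumptions for illustration.
    """
    return np.array([
        op_time,                                # processing time for this pair
        deadline - now,                         # time left until the order's deadline
        len(queue),                             # total queue size
        sum(1 for p in queue if p == product),  # queued orders of the same product
        idle_count,                             # idle machines able to process this order
    ], dtype=np.float32)
```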
The Actor is a neural network with one fully-connected hidden layer (64-1) and ReLU activation.
Selection of feasible actions
Actor Network and its input
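Such a 64-1 network can be sketched in a few lines of numpy. The weight initialization is an assumption; in practice the weights are learned via A2C:

```python
import numpy as np

rng = np.random.default_rng(0)

# One hidden layer of 64 ReLU units, scalar output: the pair's utility.
# Random weights stand in for the trained parameters (illustrative only).
W1 = rng.normal(scale=0.1, size=(64, 5))
b1 = np.zeros(64)
w2 = rng.normal(scale=0.1, size=64)
b2 = 0.0

def actor_utility(state):
    """Score a single feasible order-machine pair from its 5-feature state."""
    h = np.maximum(0.0, W1 @ state + b1)  # fully-connected layer + ReLU
    return w2 @ h + b2                    # scalar utility of the assignment
```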
The Critic estimates the state's value, which is used to calculate the Actor's policy gradients during the learning phase.
In our case, the Critic uses a different state representation: a 15 (number of machines) × 20 (max queue length for every product type) matrix. Each cell of the matrix is filled with the time to order finish (the deadline of the particular order minus the current time and minus the order's completion time).
The Critic has two fully-connected layers (128-128-1) with ReLU activation. The Critic also has an additional fully-connected layer to estimate the utility of the NOOP action (do nothing).
Critic Network and its input
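Building the Critic's 15 × 20 input can be sketched as follows. The `orders` layout (a dict from machine index to queued `(deadline, completion_time)` pairs) is an assumption for illustration; the exact encoding in the case study may differ:

```python
import numpy as np

N_MACHINES, MAX_QUEUE = 15, 20

def critic_state(orders, now):
    """Build the 15 x 20 Critic input matrix.

    orders: dict mapping machine index -> list of (deadline, completion_time)
    tuples for orders queued on that machine (illustrative layout).
    Unused cells stay at zero.
    """
    state = np.zeros((N_MACHINES, MAX_QUEUE), dtype=np.float32)
    for m, queue in orders.items():
        for j, (deadline, completion) in enumerate(queue[:MAX_QUEUE]):
            # time to order finish: deadline minus current time minus completion time
            state[m, j] = deadline - now - completion
    return state
```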
The reward is
r = -|deadline_time - operation_finish_time|
So the Agent tries to minimize order delay, but it is also penalized for finishing an order too early. Ideally, the Agent should produce a schedule that finishes each operation exactly on its deadline.
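The reward formula above translates directly to code: the further the finish time is from the deadline, in either direction, the larger the penalty.

```python
def reward(deadline_time, operation_finish_time):
    """Negative absolute gap to the deadline: both late and early finishes are penalized."""
    return -abs(deadline_time - operation_finish_time)
```

Finishing exactly on the deadline yields the maximum possible reward of 0.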
How learning works: at every step of the simulation, the environment computes the feasible actions ("order-machine" pairs) and sends them to the Actor. The Actor estimates the value of every pair. The Critic adds the value of the NOOP action, and we end up with a vector of action values.
We apply softmax to choose an action. In the training phase, we sample an action from the resulting probabilities (for exploration). At runtime, we apply argmax to select the most valuable action (assigning an order to a particular machine, or doing nothing).
Selecting optimal action
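The selection step described above can be sketched as one function. The function name and signature are assumptions; the softmax-sample vs. argmax split follows the text:

```python
import numpy as np

def select_action(action_values, training, rng=np.random.default_rng(0)):
    """Choose among feasible assignments plus NOOP.

    action_values: 1-D array of Actor utilities for feasible order-machine
    pairs, with the Critic's NOOP utility appended as the last entry.
    """
    z = action_values - action_values.max()    # shift for numerical stability
    probs = np.exp(z) / np.exp(z).sum()        # softmax over action values
    if training:
        return int(rng.choice(len(probs), p=probs))  # sample for exploration
    return int(np.argmax(probs))                     # greedy at runtime
```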
The Agent was trained over about 10,000 simulation episodes; each episode is about 1,000 simulation steps long.
The simulation is discrete-event, so a step occurs only when a new order arrives or a manufacturing operation finishes.
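The discrete-event mechanic can be sketched with a time-ordered event heap; simulation time jumps straight from one event to the next instead of ticking uniformly. Event kinds and payloads below are illustrative:

```python
import heapq

def push_event(events, time, kind, payload):
    """Schedule an event; the heap keeps events ordered by time."""
    heapq.heappush(events, (time, kind, payload))

def next_step(events):
    """Pop the earliest event; simulation time jumps directly to it."""
    return heapq.heappop(events)

# At each popped event the dispatcher would recompute the feasible
# order-machine pairs and query the Agent for an action.
events = []
push_event(events, 5.0, "order_arrival", "Body A")
push_event(events, 3.0, "operation_finished", "machine_7")
```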
Compared to the baseline policy (assigning the order with the closest deadline to a machine so as to minimize that order's tardiness), the Agent improved the target metric by 14% (from an average of -3,960 to an average of -3,445 per episode), even on such a small task!
Agent's performance in comparison to baseline heuristic