In traditional manufacturing, quality inspection comes at the end. The part is produced and inspected - and only then does it become clear whether it is within specification. If not: scrap, rework, complaints. The reaction to a quality problem is always retrospective: something has gone wrong, now we analyze why.
Predictive quality reverses this logic. Instead of checking after production whether a part is good, it predicts during production whether a part will be good - based on process parameters, machine data and historical samples. Quality assurance shifts from final inspection to process control.
That sounds like a vision of the future, but it is not. The underlying technologies - machine learning, process data integration, real-time monitoring - are mature. The limiting factor is not the technology but the data basis: to introduce predictive quality, the first thing you need is not AI specialists but complete, linked process data.
This article explains what predictive quality actually means, which data you need, what a typical ML workflow looks like and where the method is already delivering measurable results in practice.
THE MOST IMPORTANT FACTS IN BRIEF
Predictive quality forecasts the quality of a part during production - not afterwards. It is based on process parameters, machine data and historical quality results, which ML models condense into an error probability.
The critical success factor is not the ML model but the data basis. Predictive quality requires complete, time-synchronized data: process parameters per serial number, quality characteristics per serial number and time stamps set by the machine. Without this link, there is no basis for training.
Typical use cases are inline defect prediction, adaptive inspection planning and early warning of process drift. The greatest effects arise where inspection costs are high (destructive testing, laboratory analyses) or scrap costs are high (expensive components, late defect detection).
Getting started is easier than expected: an initial proof of concept with one characteristic, 3-6 months of historical data and a random forest model delivers initial results in 2-4 weeks - without a data science team.
IN A NUTSHELL
Predictive quality is not a replacement for quality inspection - it moves inspection forward in the process. The aim is not to abolish inspections, but to detect defects before they occur.
The key question in predictive quality is: Which constellation of process parameters is most likely to lead to a quality problem? The answer is in your data - if it is linked.
ML models are only as good as their training data. A perfect algorithm on bad data delivers bad predictions. Data quality is the bottleneck, not model complexity.
Predictive quality is a data-driven approach that predicts the likely quality of a product before or during production - based on process parameters, machine conditions and historical patterns.
| Aspect | Traditional quality inspection | Predictive quality |
|---|---|---|
| Point in time | After production (final inspection) | During production (inline prediction) |
| Question | Is this part good? | Will this part be good? |
| Data basis | Inspection result of the individual part | Process parameters + historical patterns |
| Reaction time | Reactive: Defect has already occurred | Proactive: Intervention before defect occurs |
| Inspection scope | 100 % or random sample - independent of risk | Risk-based: more testing with high error probability |
1. Process parameters as quality indicators
Every quality characteristic - dimensional accuracy, strength, surface quality - is influenced by process parameters: temperature, pressure, speed, feed rate, cycle time, tool condition. Predictive quality identifies which parameter combinations have historically led to quality problems.
2. Machine learning for pattern recognition
ML models recognize complex, non-linear relationships between process parameters and quality results - relationships that are difficult for humans to spot. A model learns from historical data: "If temperature > X and pressure < Y and tool age > Z, then probability of error = 73%."
3. Inline integration for real-time forecasting
The trained model is integrated into the production line. At each production step, the current process parameters are recorded, transferred to the model and a probability of error is calculated - in real time, while the part is still in the line.
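To make the inline step concrete, here is a minimal Python sketch of what happens per cycle once a model has been trained. It assumes a logistic regression model; the coefficients, the field names (`temperature`, `pressure`, `tool_age`) and the 0.5 alarm threshold are hypothetical placeholders, not values from a real line.

```python
import math
from dataclasses import dataclass


@dataclass
class CycleData:
    temperature: float  # °C, read from the PLC for this cycle
    pressure: float     # bar
    tool_age: int       # cycles since the last tool change


# Hypothetical coefficients of an already trained logistic regression model.
INTERCEPT = -20.0
WEIGHTS = {"temperature": 0.08, "pressure": -0.05, "tool_age": 0.0005}


def error_probability(cycle: CycleData) -> float:
    """Return the predicted probability that the part from this cycle will be NOK."""
    z = INTERCEPT + sum(w * getattr(cycle, name) for name, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))  # logistic (sigmoid) function


def on_cycle_completed(serial: str, cycle: CycleData, threshold: float = 0.5) -> str:
    """Called once per cycle while the part is still in the line."""
    p = error_probability(cycle)
    if p >= threshold:
        return f"{serial}: {p:.0%} error probability - check immediately"
    return f"{serial}: OK ({p:.1%} error probability)"


print(on_cycle_completed("SN-001", CycleData(temperature=220.0, pressure=90.0, tool_age=2000)))
print(on_cycle_completed("SN-002", CycleData(temperature=245.0, pressure=70.0, tool_age=9000)))
```

In a real deployment the model would come from the training pipeline (for example a serialized random forest) rather than hand-written coefficients; the surrounding logic of scoring each cycle and comparing against a threshold stays the same.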
| Value | Description |
|---|---|
| 72 % | Reduction of undetected errors through inline prediction |
| Ø 34 % | Reduction in inspection effort through risk-based inspection planning |
| 2.3× | ROI in the first year with successful implementation |
| 68 % | Predictive quality projects fail due to data quality, not ML |
Predictive quality is not the only approach to quality assurance during production. Comparing it with other methods shows when each approach makes sense.
| Aspect | SPC (Statistical Process Control) | Inline testing | Predictive quality |
|---|---|---|---|
| Basic principle | Statistical monitoring of process parameters on control charts | 100% inspection of every part during production | ML-based forecast of the probability of defects |
| Data basis | Process parameter samples | Inspection result of each part | Process parameters + historical quality results |
| Detection logic | Rule-based: Intervention limits, trend rules | Measurement: Actual value vs. tolerance | Pattern-based: Parameter constellations |
| What is recognized? | Process drift, outliers | Defective parts (after creation) | Risk of error (before occurrence) |
| Typical use | Process monitoring, early warning | Quality assurance, reject sorting | Adaptive inspection planning, process control |
SPC is useful when:
Inline testing is useful if:
Predictive quality makes sense when:
The approaches are not mutually exclusive - they complement each other. Predictive quality can build on SPC data and control inline testing based on risk.
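For contrast with the pattern-based approach, the rule-based detection logic of SPC from the comparison table can be sketched in a few lines of Python. This is a simplified illustration: the reference data, the 3-sigma limits and the 7-point run rule are assumptions (published rule sets use runs of 7 to 9 points), and sigma is taken from the sample standard deviation rather than the moving range an individuals chart would normally use.

```python
import statistics


def control_limits(reference):
    """Center line and 3-sigma control limits from an in-control reference sample.

    Simplification: sigma is the sample standard deviation; an individuals chart
    would usually estimate it from the average moving range instead."""
    center = statistics.fmean(reference)
    sigma = statistics.stdev(reference)
    return center - 3 * sigma, center, center + 3 * sigma


def spc_alarms(values, lcl, center, ucl, run_length=7):
    """Apply two classic SPC rules and return (index, reason) pairs."""
    alarms = []
    run, side = 0, 0
    for i, v in enumerate(values):
        # Rule 1: single point outside the control limits.
        if v < lcl or v > ucl:
            alarms.append((i, "outside control limits"))
        # Rule 2: run of consecutive points on the same side of the center line.
        s = 1 if v > center else -1 if v < center else 0
        if s != 0 and s == side:
            run += 1
        else:
            run = 1 if s != 0 else 0
        side = s
        if run == run_length:
            alarms.append((i, f"{run_length} points on one side of the center line"))
    return alarms


lcl, center, ucl = control_limits([9, 11, 9, 11])
print(spc_alarms([10.5] * 7 + [15.0], lcl, center, ucl))
```

Both rules only look at one parameter at a time against fixed limits; predictive quality instead learns which combinations of several parameters precede defects.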
The most common reason for failed predictive quality projects is not an unsuitable ML model, but missing or unlinked data. Before discussing algorithms, check the data basis.
| Data category | Examples | Source | Critical requirement |
|---|---|---|---|
| Process parameters | Temperature, pressure, speed, feed rate, cycle time, tool age | PLC, MES, sensors | Available per serial number or cycle |
| Quality data | Test result (OK/NOT OK), measured value, error code, Cpk | QMS, test bench, laboratory | Linked per serial number |
| Context data | Shift, operator, batch, tool ID, environmental conditions | MES, ERP, manual | Consistently recorded, can be referenced |
Predictive quality learns from the question: "Which process parameters produced which quality result?"
This question can only be answered if the process parameters and the quality result can be assigned to the same unit. That requires a common key used consistently across all systems - typically the serial number or a combination of batch + cycle number.
Problem in practice: In many companies, process parameters are stored in the MES, inspection results in the QMS and batch information in the ERP - without a common key. The data exists, but it cannot be linked.
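What "linking" means in practice can be sketched as a join on the serial number. The Python below is a dependency-free stand-in for what would usually be a SQL join or a pandas merge; the record layouts (`serial`, `timestamp`, `result`) and the MES/QMS sample rows are hypothetical. It also applies the time stamp plausibility check from the data requirements, and reports units that cannot be linked instead of silently dropping them.

```python
from datetime import datetime

# Hypothetical extracts: process parameters from the MES, test results from the QMS.
mes_rows = [
    {"serial": "SN-001", "timestamp": "2024-03-04T06:01:12+00:00", "temperature": 221.5, "pressure": 88.0},
    {"serial": "SN-002", "timestamp": "2024-03-04T06:01:40+00:00", "temperature": 243.0, "pressure": 71.5},
    {"serial": "SN-003", "timestamp": "2024-03-04T06:02:09+00:00", "temperature": 222.1, "pressure": 87.2},
]
qms_rows = [
    {"serial": "SN-001", "timestamp": "2024-03-04T06:20:03+00:00", "result": "OK"},
    {"serial": "SN-002", "timestamp": "2024-03-04T06:20:31+00:00", "result": "NOK"},
]


def link_by_serial(process_rows, quality_rows):
    """Join process parameters and quality results on the serial number.

    Returns (linked_records, unlinked_serials). Each linked record carries the
    process parameters plus a binary label: 1 = NOK, 0 = OK."""
    quality_by_serial = {row["serial"]: row for row in quality_rows}
    linked, unlinked = [], []
    for proc in process_rows:
        qual = quality_by_serial.get(proc["serial"])
        if qual is None:
            unlinked.append(proc["serial"])  # no inspection result for this unit
            continue
        # Plausibility check: the inspection must not be older than the process step.
        if datetime.fromisoformat(qual["timestamp"]) < datetime.fromisoformat(proc["timestamp"]):
            unlinked.append(proc["serial"])
            continue
        record = {key: value for key, value in proc.items() if key != "timestamp"}
        record["label"] = 1 if qual["result"] == "NOK" else 0
        linked.append(record)
    return linked, unlinked


linked, unlinked = link_by_serial(mes_rows, qms_rows)
print(f"{len(linked)} linked units, not linkable: {unlinked}")
```

The share of unlinked serials is itself a useful first metric: if it is high, data integration is the first thing to fix.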
| Dimension | Minimum requirement | Check question |
|---|---|---|
| Link | Process parameter + quality characteristic per unit | Can I say for each serial number: these parameters → this result? |
| Completeness | < 5 % missing values for key parameters | How many data points have NULL values for temperature, pressure, etc.? |
| Time stamp | Machine set, ISO 8601, synchronized | Are process timestamp and inspection timestamp consistent? |
| Sample size | At least 500-1,000 units, of which 50-100 are defects | Do we have enough positive examples AND enough negative examples? |
| Labeling | Clear quality classification (OK/NOK or measured value) | Is the quality result clearly defined, with no ambiguous categories? |
| Depth of history | At least 3-6 months, better 12 months | Does the data set cover seasonal fluctuations, batch changes, tool changes? |
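Several of these check questions can be answered automatically once the data is linked. The sketch below assumes linked records with an ISO 8601 `timestamp` and a 0/1 `label` (1 = defect); the field names are hypothetical, and the thresholds are the minimum values from the table (< 5 % missing, 500 units with 50 defects, about 3 months of history).

```python
from datetime import datetime, timedelta


def readiness_report(records, key_fields, label_field="label", time_field="timestamp"):
    """Check a linked data set against the minimum requirements from the table above.

    Assumes each record has an ISO 8601 time stamp and a 0/1 label (1 = defect)."""
    n = len(records)
    missing_share = {f: sum(1 for r in records if r.get(f) is None) / n for f in key_fields}
    defects = sum(1 for r in records if r.get(label_field) == 1)
    times = [datetime.fromisoformat(r[time_field]) for r in records]
    history = max(times) - min(times)
    return {
        "units": n,
        "defects": defects,
        "missing_share": missing_share,
        "history_days": history.days,
        "completeness_ok": all(share < 0.05 for share in missing_share.values()),  # < 5 % missing
        "sample_size_ok": n >= 500 and defects >= 50,  # lower end of 500-1,000 units / 50-100 defects
        "history_ok": history >= timedelta(days=90),  # roughly 3 months
    }
```

Linking, labeling quality and coverage of batch or tool changes still need a human look; the report only covers the quantitative criteria.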
| Data source | Typical content | Suitability for predictive quality |
|---|---|---|
| PLC / control system | Cycle data, target/actual values, machine states | ✓ High - raw data with high resolution |
| MES | Production order, process parameters, time stamp, serial number | ✓ High - structured and linked |
| QMS | Test results, error coding, Cpk values | ✓ High - labels for model training |
| ERP | Batch, supplier, material, order | ○ Medium - context data, rarely real-time |
| Excel / manual | Various, often unstructured | ✗ Low - data quality mostly insufficient |
| Sensor technology (IIoT) | Vibration, temperature, current, acoustics | ✓ High - if integrated into database |
Machine learning sounds complex - but the basic workflow is structured and comprehensible. The following six phases describe how a functioning predictive quality model is created from raw data.
Problem definition and target value
What exactly should the model predict?
Activities:
Output: Documented problem definition with target value, scope and success criteria
Data collection and integration
What data do we need - and how do we get it together?
Activities:
Output: Integrated raw data set with process parameters and quality results per unit
Data cleansing and feature engineering
How do we make the data modelable?
Activities:
Output: Cleaned, feature-engineered data set, split into training and test sets
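Two details of this phase are easy to get wrong, so here is a small sketch of each: a chronological train/test split and a simple backward-looking feature. The 20 % test share, the `timestamp` field and the rolling-mean feature are illustrative assumptions, not a prescribed feature set.

```python
from collections import deque


def time_based_split(records, test_share=0.2, time_field="timestamp"):
    """Chronological split: train on older units, test on the most recent ones.

    A random split would let the model learn from units produced after the ones it
    is tested on, which overstates performance on drifting processes.
    Assumes uniformly formatted ISO 8601 time stamps, so string order = time order."""
    ordered = sorted(records, key=lambda r: r[time_field])
    cut = int(len(ordered) * (1 - test_share))
    return ordered[:cut], ordered[cut:]


def add_rolling_mean(records, field, window=5):
    """Feature engineering example: mean of `field` over the current and previous
    window - 1 cycles. Records must already be in chronological order."""
    recent = deque(maxlen=window)
    for r in records:
        recent.append(r[field])
        r[f"{field}_mean_{window}"] = sum(recent) / len(recent)
    return records
```

Because the rolling mean only looks backwards, it can also be computed on the full chronological data set before splitting, which gives the first test units a complete history.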
Model selection and training
Which model fits - and how good is it?
Activities:
Typical model selection according to data situation:
| Data situation | Recommended model | Justification |
|---|---|---|
| < 1,000 data points, few features | Logistic regression | Robust with little data |
| 1,000-10,000 data points, medium complexity | Random forest | Good balance of accuracy/interpretability |
| > 10,000 data points, many features | XGBoost, LightGBM | Highest accuracy with large data sets |
| Sequential data (time series) | LSTM, Transformer | Captures temporal dependencies |
Output: Trained model with documented hyperparameters
Model evaluation and interpretation
How good is the model - and why does it make which decisions?
Activities:
Critical metrics for predictive quality:
| Metric | Meaning | Target value (orientation) |
|---|---|---|
| Recall (Sensitivity) | Percentage of actual errors that were detected | > 85 % |
| Precision | Percentage of risk warnings that were actually errors | > 70 % |
| F1 score | Harmonic mean of precision and recall | > 0.75 |
| AUC-ROC | Overall discriminatory power of the model | > 0.85 |
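These metrics are computed from the model's predictions on the test set. Below is a dependency-free sketch with labels 1 = defect and 0 = good part; in practice a library such as scikit-learn (`recall_score`, `precision_score`, `f1_score`, `roc_auc_score`) does the same. The pairwise AUC formula is the rank-based definition and is only meant for illustration on small data sets.

```python
def classification_metrics(y_true, y_pred):
    """Recall, precision and F1 for binary labels (1 = defect, 0 = good part)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    recall = tp / (tp + fn) if tp + fn else 0.0     # share of real defects caught
    precision = tp / (tp + fp) if tp + fp else 0.0  # share of warnings that were real defects
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"recall": recall, "precision": precision, "f1": f1}


def auc_roc(y_true, scores):
    """AUC-ROC as the probability that a randomly chosen defect gets a higher
    score than a randomly chosen good part (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

Recall and precision depend on the chosen alarm threshold, while AUC-ROC is threshold-independent. That is why it is useful for comparing models before the threshold is tuned.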
Output: Evaluated model with documented performance and interpretable results
Deployment and monitoring
How does the model get into production - and does it stay good?
Activities:
Output: Productive model with monitoring and retraining process
Predictive quality is not an end in itself - the benefits arise from specific use cases. The following five use cases show where predictive quality has the greatest leverage in practice.
Inline fault prediction with real-time warning
"This part has a 78% probability of dimensional deviation - check immediately."
Situation: Injection molding line, 12 process parameters per cycle. Dimensional deviation is only detected in the final inspection - 40 cycles later.
Predictive quality approach:
Result:
| Before | After |
|---|---|
| 0.8 % scrap | 0.2 % scrap |
| Error detected after Ø 18 min. | Error detected after Ø 12 sec. |
Adaptive test planning (risk-based)
"Parts with low probability of error: random sample. Parts with high probability: 100%."
Situation: Assembly line, non-destructive final inspection possible, but time-consuming (45 sec./part). Current: 100 % inspection.
Predictive quality approach:
Result:
| Before | After |
|---|---|
| 100 % inspection | 38 % inspection |
| 0 defects slipped through | 0 defects slipped through |
| Inspection costs 100 % | Inspection costs 41 % |
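The routing rule behind adaptive inspection planning can be sketched as follows. The 0.2 risk threshold and the 10 % sampling rate are hypothetical; in practice both would be validated on historical data so that no defects slip through, as in the example above.

```python
import random


def needs_inspection(error_probability, high_risk=0.2, sample_rate=0.1, rng=random):
    """Risk-based routing: every high-risk part is inspected, low-risk parts only
    with probability `sample_rate` (random sample)."""
    if error_probability >= high_risk:
        return True
    return rng.random() < sample_rate


rng = random.Random(42)  # fixed seed for reproducibility
predicted = [0.01, 0.35, 0.02, 0.04, 0.61, 0.03]
decisions = [needs_inspection(p, rng=rng) for p in predicted]
print(f"{sum(decisions)} of {len(predicted)} parts routed to inspection")
```

Keeping a random sample of low-risk parts matters: without it, defects the model misses would never be observed, and the model's recall could not be monitored.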
Replacement for destructive testing
"Instead of destroying the part, we predict the strength from the process parameters."
Situation: Weld seam strength test, destructive. Current: 1 in 50 parts is sampled and destroyed; the other 49 remain unverified.
Predictive quality approach:
Result:
| Before | After |
|---|---|
| 2 % destructively tested | 0.5 % destructively tested |
| 49 of 50 parts without verification | Each part with predicted strength |
| Ø 3 complaints/month | Ø 0.4 complaints/month |
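Predicting a strength value is a regression task rather than a classification task. The sketch below reduces it to ordinary least squares on a single process parameter; the parameter (welding current), the units, the specification limit and the safety margin are all hypothetical, and a real model would use several parameters (for example a random forest regressor). Parts whose predicted strength is too close to the limit are still sent to destructive testing.

```python
def fit_line(x, y):
    """Ordinary least squares for y = a + b * x with a single process parameter."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def needs_destructive_test(parameter, model, spec_min, margin):
    """Flag a part for destructive testing if its predicted strength minus a
    safety margin (e.g. derived from the model's test-set error) is below spec."""
    a, b = model
    predicted_strength = a + b * parameter
    return predicted_strength - margin < spec_min


# Hypothetical history: welding current (kA) vs. measured tensile strength (N).
current_ka = [10, 11, 12, 13, 14]
strength_n = [300, 320, 340, 360, 380]
model = fit_line(current_ka, strength_n)
print(needs_destructive_test(12.5, model, spec_min=330, margin=15))
```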
Early warning of process drift
"The process is drifting - in 200 cycles Cpk will fall below 1.33."
Situation: Milling process, Cpk value drops slowly with tool wear. Current: Reaction only at Cpk < 1.33.
Predictive quality approach:
Result:
| Before | After |
|---|---|
| Reaction with Cpk < 1.33 | Warning 3 hours before Cpk < 1.33 |
| Ø 120 parts in the limit range | Ø 8 parts in the limit range |
| Unplanned tool changes | Planned tool changes |
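A simple version of this early warning computes Cpk per window of cycles and extrapolates the trend. In the sketch below, the window size (how many cycles feed one Cpk value) and the linear trend are simplifying assumptions. Tool wear is often non-linear, and an ML model could replace the straight line.

```python
import statistics


def cpk(values, lsl, usl):
    """Process capability index: distance from the mean to the nearer
    specification limit, in units of 3 standard deviations."""
    mean = statistics.fmean(values)
    sigma = statistics.stdev(values)
    return min(usl - mean, mean - lsl) / (3 * sigma)


def windows_until_threshold(cpk_history, threshold=1.33):
    """Fit a straight line through a Cpk history (one value per window of cycles)
    and estimate how many windows remain until Cpk reaches `threshold`.
    Returns None if Cpk is not falling."""
    n = len(cpk_history)
    mean_x, mean_y = (n - 1) / 2, statistics.fmean(cpk_history)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(cpk_history))
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    slope = sxy / sxx
    if slope >= 0:
        return None
    current = mean_y + slope * (n - 1 - mean_x)  # fitted value at the latest window
    return max(0.0, (threshold - current) / slope)


print(windows_until_threshold([1.70, 1.65, 1.60, 1.55, 1.50]))
```

Multiplying the result by the window size gives an estimate in cycles, which is the form of the warning quoted above.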
Root cause analysis through feature importance
"The three most important drivers of rejects are coolant temperature, tool age and the shift change."
Situation: Scrap rate fluctuates between 0.3 % and 1.2 % with no recognizable pattern. 14 process parameters are recorded.
Predictive quality approach:
Result:
| Insight | Measure |
|---|---|
| Coolant temperature > 28°C correlates with 3× reject rate | Coolant temperature sensor + warning at > 26°C |
| Tool age > 8,000 cycles: significant increase | Tool change interval reduced to 7,500 |
| First 30 minutes after shift change: increased scrap rate | Start-up procedure with reference part introduced |
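One model-agnostic way to rank drivers like these is permutation importance: shuffle one feature and measure how much the model's accuracy drops. The sketch below uses a hypothetical stand-in model and synthetic rows. It illustrates permutation importance, which is a different technique from SHAP; scikit-learn ships a production version as `sklearn.inspection.permutation_importance`.

```python
import random


def accuracy(model, rows, labels):
    """Share of rows for which the model predicts the correct label."""
    return sum(model(row) == label for row, label in zip(rows, labels)) / len(labels)


def permutation_importance(model, rows, labels, features, seed=0):
    """Drop in accuracy when one feature's values are shuffled across all rows.
    A large drop means the model relies heavily on that feature."""
    rng = random.Random(seed)
    baseline = accuracy(model, rows, labels)
    importance = {}
    for feature in features:
        shuffled = [row[feature] for row in rows]
        rng.shuffle(shuffled)
        permuted = [{**row, feature: value} for row, value in zip(rows, shuffled)]
        importance[feature] = baseline - accuracy(model, permuted, labels)
    # Most important feature first.
    return dict(sorted(importance.items(), key=lambda item: item[1], reverse=True))


# Hypothetical stand-in for a trained model: it only looks at coolant temperature.
def scrap_model(row):
    return 1 if row["coolant_temp"] > 28 else 0


rows = [{"coolant_temp": 30 if i % 2 else 25, "tool_age": i * 100} for i in range(200)]
labels = [scrap_model(row) for row in rows]
print(permutation_importance(scrap_model, rows, labels, ["tool_age", "coolant_temp"]))
```

As the limitations section notes, a high importance shows correlation only. Process knowledge is needed to confirm that, for example, coolant temperature actually causes the rejects.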
Predictive quality is not a technology project - it is a quality improvement project that uses technology. The value is not created by the model, but by the measures that follow from the findings.
- Amadeus Lederle, CTE, CSP Intelligence GmbH
Predictive quality does not require a perfect data basis - but it does require a sufficient one. The following checklist shows whether your company meets the minimum requirements.
| Dimension | Minimum requirement | ✓ Ready | ✗ Not yet |
|---|---|---|---|
| Process parameters digital | At least 5 relevant parameters per unit recorded digitally | Parameters are saved automatically for each cycle | Parameters only recorded on paper or not at all |
| Quality results digital | Test results (OK/NOK or measured values) in the system | QMS or MES with structured test reports | Inspection results on paper or in isolated Excel files |
| Linking | Common key (serial number) for parameters + quality | Serial number used throughout | No clear assignment possible |
| Depth of history | At least 3 months of data, better 6-12 months | Data has been stored for > 6 months | Data regularly deleted or available for < 3 months |
| Proportion of errors | At least 50 documented error cases in the data set | Rejects/NOK parts are systematically recorded and categorized | Too few error cases or not categorized |
| Data quality | < 10 % missing values in key fields | Mandatory fields technically enforced, time stamp set by machine | High proportion of missing values, manual input without validation |
| Resources | At least 1 person with process knowledge + data affinity (20% FTE for 3 months) | Quality/process engineer with Excel/SQL skills available | No dedicated resource, only "on the side" |
If one or more dimensions show "Not yet", this is no reason to write off predictive quality - it shows where investments need to be made first.
| Gap | First step |
|---|---|
| Process parameters not digital | Set up PLC data export, check MES connection |
| No link | Introduce serial number as a mandatory field in MES and QMS |
| Too little historical data | Start today with systematic recording - the basis will be there in 3-6 months |
| Too few error cases | That's good! Extend scope if necessary (several lines, longer period) |
| Insufficient data quality | Define mandatory fields, check timestamp synchronization |
Predictive quality does not have to start as a major project.
The following implementation path is designed for a pragmatic proof of concept: 8-12 weeks, one quality characteristic, one product.
Define pilot scope
Activities:
Output: Documented pilot scope with target value and success criteria
Extract and link data
Activities:
Output: Linked analysis data set
Exploratory data analysis
Activities:
Output: Analysis report with identified patterns and feature candidates
Train and evaluate model
Activities:
Output: Trained model with documented performance
Validation on the shop floor
Activities:
Output: Validated model with shop floor feedback
Decision and next steps
Activities:
Output: Decision + roadmap
| Value | Description |
|---|---|
| 8-12 weeks | Typical proof of concept duration |
| 1 FTE × 30-40 % | Typical resource requirements (process engineer with data affinity) |
| Ø 67 % | PoCs leading to productization |
| < € 10,000 | Typical PoC costs (excluding license costs) |
Predictive quality is not a panacea. The method has clear limitations - and typical mistakes during implementation can be avoided if you are aware of them.
| Limit | Explanation | Consequence |
|---|---|---|
| Only known failure modes | ML models recognize patterns that occur in the training data - new, unknown failure modes are not predicted | Predictive quality supplements but does not replace systematic FMEA and process control |
| Correlation ≠ causality | A model finds statistical correlations, not physical causes | Feature importance shows correlation - process understanding is needed to confirm causality |
| Data quality as a bottleneck | Garbage in = garbage out. Incomplete or incorrectly linked data provides incorrect models | Data quality is a prerequisite, not a by-product |
| Model drift | Processes change (new material, tool change, seasonal effects) - the model is left behind | Continuous monitoring and regular retraining are mandatory |
| False positives/negatives | No model is perfect - there are always false alarms and overlooked errors | Threshold values must be tailored to the use case (more alarms or fewer?) |
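The threshold trade-off in the last row can be made explicit. One option is to pick the highest probability threshold that still reaches a required recall on validation data, which keeps false alarms as low as possible under that constraint. The sketch assumes a 0/1 label list (1 = defect) and predicted probabilities. The 0.85 default mirrors the orientation value from the metrics table and is not a universal recommendation.

```python
def choose_threshold(y_true, scores, min_recall=0.85):
    """Return the highest threshold whose recall on validation data still meets
    `min_recall`. Higher thresholds mean fewer alarms, so this minimizes false
    positives under the recall constraint."""
    positives = sum(y_true)
    if positives == 0:
        raise ValueError("need at least one defect in the validation data")
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= t)
        if tp / positives >= min_recall:
            return t
    raise ValueError("min_recall must be <= 1.0")


y_val = [1, 1, 1, 1, 0, 0, 0, 0]
p_val = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]
print(choose_threshold(y_val, p_val, min_recall=0.75))
```

For a use case where a missed defect is very expensive (safety parts), a higher `min_recall` makes sense. Where false alarms stop the line, the constraint can be set on precision instead.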
| Error | Why it happens | How to avoid it |
|---|---|---|
| Scope too broad | "We want to predict all defects on all lines" | Start with one feature, one line, one product |
| Data integration underestimated | "The data is there" - but not linked | Allow 50% of the PoC time for data integration |
| Black box model without interpretation | Model says "error", but nobody understands why | Use feature importance and SHAP analysis right from the start |
| No shop floor involvement | Data scientists build the model without consulting process experts | Involve process engineers from day 1 |
| One-off training | Model is trained, then never touched again | Define monitoring + retraining cycle |
| Overengineering | "We need deep learning and real-time streaming" | Random forest on clean data beats LSTM on bad data |
PRACTICAL TIP
Predictive quality rarely fails because of the algorithm; it fails because of missing or unlinked data. IPM solves precisely this problem: all process parameters, test results and traceability data are stored in an integrated database, with the serial number as the common key.
For predictive quality, this means:
The result: Data integration, typically 50% of the effort of a predictive quality PoC, is eliminated. Instead of spending weeks on data collection, you can start analyzing immediately.
Do we need a data science team for predictive quality?
Not to get started. An initial proof of concept can be carried out by a process engineer with basic Python knowledge - or with low-code ML platforms (DataRobot, Azure AutoML, H2O). The core competence is not ML, but process understanding: Which parameters are relevant? Which data is reliable? What do the results mean? External expertise can be useful for productization and scaling - but the PoC should be driven internally.
How many data points do I need as a minimum?
As a rule of thumb: at least 500-1,000 data points in total, including at least 50-100 error cases (positive class). Highly imbalanced data (e.g. 0.1 % rejects) makes this difficult - techniques such as oversampling (SMOTE) or adapted loss functions can help here. More important than the absolute number is the variability in the data: Does it cover different batches, tools, shifts and seasons?
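To show what SMOTE does, here is a minimal sketch. It creates synthetic defect samples by interpolating between a real defect and one of its nearest defect neighbours. The feature tuples and the choice of k = 3 are illustrative. For real projects, the `SMOTE` class from imbalanced-learn is the usual choice. Two caveats apply: features should be scaled first, because the method is distance-based, and oversampling belongs on the training set only, never on the test set.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Minimal SMOTE sketch: create `n_new` synthetic defect samples, each on the
    line between a random minority sample and one of its k nearest minority
    neighbours. `minority` is a list of numeric feature tuples (at least 2)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [m for j, m in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda m: math.dist(base, m))[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position on the connecting line, in [0, 1)
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


# Hypothetical scaled feature vectors of four documented defects.
defects = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
print(smote(defects, n_new=3))
```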
What is the difference between classification and regression?
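Classification predicts a category (e.g. OK/NOK), while regression predicts a continuous value (e.g. a measured dimension or strength). Classification is more common for predictive quality - but regression can be more valuable if the specific measured value is important (e.g. as a replacement for destructive testing).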
How do we explain the model decision?
Modern ML models are no longer black boxes. Two approaches:
Interpretability is crucial for acceptance on the shop floor and for deriving improvement measures.
How often does the model need to be retrained?
That depends on the process stability. Typical intervals:
Monitoring is more important than a fixed rhythm: if the model performance drops significantly on new data, retraining is necessary - regardless of the calendar.
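A performance-based retraining trigger of this kind can be sketched as a rolling recall check over the most recent confirmed defects. This sketch assumes that the true outcome of each defect becomes known later (e.g. from sampling inspections or complaints). The window of 50 defects and the tolerated drop of 10 percentage points are illustrative values, not recommendations.

```python
from collections import deque


class RecallMonitor:
    """Track recall over the most recent confirmed defects and flag when
    retraining is due."""

    def __init__(self, baseline_recall, max_drop=0.10, window=50):
        self.baseline = baseline_recall  # recall measured on the test set at deployment
        self.max_drop = max_drop
        self.hits = deque(maxlen=window)  # 1 = defect was predicted, 0 = defect was missed

    def record_defect(self, was_predicted: bool) -> None:
        self.hits.append(1 if was_predicted else 0)

    def retraining_needed(self) -> bool:
        if len(self.hits) < self.hits.maxlen:
            return False  # not enough confirmed defects yet to judge
        recall = sum(self.hits) / len(self.hits)
        return recall < self.baseline - self.max_drop


monitor = RecallMonitor(baseline_recall=0.9)
```

The same pattern works for precision (tracking confirmed false alarms) or for drift in the input parameters themselves.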
What does the introduction of predictive quality cost?
The costs vary greatly depending on the scope and data situation. Orientation:
| Phase | Typical costs |
|---|---|
| PoC (one feature, 8-12 weeks) | 5,000-20,000 € (internal, without tools) |
| Productization (one use case) | 20,000-80,000 € (incl. integration) |
| Scaling (several lines/products) | 50,000-200,000 €/year (incl. platform) |
The ROI results from reduced scrap, lower inspection effort, avoided complaints and earlier defect detection. A 2-3× ROI in the first year after successful implementation is typical.
Can we also use predictive quality for predictive maintenance?
Yes - the basic logic is identical. The difference:
Both use process parameters and historical patterns. The same data is often relevant - and a common data model can serve both use cases.