1. Introduction
Initially, the industry adopted virtual twins in the form of simulation tools that represented the physics of materials, processes, structures, and systems through physics-based models. These computational tools transformed engineering science and technology, offering optimized design tools, and became essential in almost all industries by the end of the 20th century.
Despite the revolution that Simulation Based Engineering—SBE—experienced, some domains resisted fully assimilating simulation into their practices for different reasons:
Computational issues related to the treatment of very complex material models involved in very complex processes, requiring a numerical resolution difficult to attain. Some examples in polymer processing concern reactive extrusion or foaming, among many others.
Modeling issues when addressing materials with poorly known rheologies, as usually encountered in multi-phasic reactive flows where multiple reactions occur.
The extremely multi-parametric space defined by both the material and the process, where the processed material properties and performances strongly depend on several parameters related, for example in the case of reactive extrusion, to the nature of the reactants or the processing parameters, like the flow rate and viscosity, the processing temperature, etc.
In these circumstances, the use of data and the construction of the associated models relating the material and processing parameters to some quantities of interest—QoI, by using advanced artificial intelligence techniques, seems an appealing procedure for improving predictions, enhancing optimization procedures and enabling real-time decision making [1].
1.1. Data-Driven Modeling
Engineered artificial intelligence—EAI—covers different data-science functionalities enabling: (i) multidimensional data visualization; (ii) data classification; (iii) modeling the input/output relationship to obtain quantitative predictions; (iv) extracting knowledge from data; (v) explaining for certification purposes; and (vi) creating dynamic data-driven application systems.
The present work aims at creating a model able to relate material and processing parameters to the processed material properties and performances. For this reason, in what follows, we will focus on the description and use of different strategies for accomplishing that purpose.
In the past, science was based on the extraction of models, these being simply the causal relation linking causes (inputs) and responses (outputs). This (intelligent) extraction or discovery was performed by smart (and trained) human minds from the data provided by the direct observation of reality or from engineered experimental tests. Then, with the discovered, derived, or postulated model, predictions were performed, leading to the validation or rejection of these models. Thus, physics-based models, often in the form of partial differential equations, were manipulated by using numerical techniques, with the help of powerful computers.
However, sometimes models are not available, or they are not accurate enough. In that case, the most natural route consists of extracting the model from the available data (a number of inputs and their associated outputs). When data are abundant and the time of response is not a constraint, deep learning could constitute the best alternative. However, some industrial applications are subjected to: (i) scarce data and (ii) the necessity of learning on-the-fly under stringent real-time constraints.
Some models, such as those encountered in mechanics, are subjected to thermodynamic consistency restrictions, which impose energy conservation and entropy production. In our former works [2,3,4,5], we proved that such a route constitutes a very valuable framework for deriving robust models able to assimilate the available data while fulfilling first principles. However, some models cannot be cast into a physical framework because they involve heterogeneous data, sometimes discrete and even categorical. Imagine for a while a product performance that depends on four factors: (i) the temperature of the oven that produces the part; (ii) the process time; (iii) the commercial name of the involved material; and (iv) the given name of the employee who processed it. It is evident that processes whose representing data-points (involving four dimensions in this particular example) are close to each other do not necessarily exhibit similar performances. In that case, prior to employing techniques that operate in vector spaces, in general based on metrics and distances, the data must be mapped into such a vector space. For this purpose, we recently proposed the so-called Code2Vect [6], revisited later.
Nonlinear regressions, relating a given output to the set of input parameters, are subjected to a major issue: complexity. In other words, the number of terms of a usual polynomial approximation depends on the number of parameters and on the approximation degree. Thus, $D$ parameters and a degree $Q$ imply on the order of $D^Q$ terms and, consequently, the same amount of data is needed to define them. In our recent works, we proposed the so-called sparse Proper Generalized Decomposition, sPGD [7], able to circumvent the just-referred issue. It is based on the use of separated representations with an adaptive degree of approximation, defined on unstructured data settings, and on sparse sensing to extract the most compact representations.
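To make this combinatorial growth concrete, consider a worked example (ours, not taken from [7]): a complete polynomial basis of degree $Q$ in $D$ variables contains $\binom{D+Q}{Q}$ monomials, so that
$$D = 10,\; Q = 2 \;\Rightarrow\; \binom{12}{2} = 66 \ \text{terms}, \qquad D = 10,\; Q = 3 \;\Rightarrow\; \binom{13}{3} = 286 \ \text{terms},$$
each term requiring, roughly, one data-point to be identified.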
Assuming that the model is expressible as a matrix relating arrays of inputs and outputs (as standard Dynamic Mode Decomposition—DMD—performs with dynamical systems [8,9]), both expressible in a low-dimensional space (an assumption at the heart of all model order reduction techniques), the rank of the matrix (the discrete description of the model) is assumed to be low [10].
1.2. Reactive Polymers Processing
Reactive extrusion is considered to be an effective tool for the continuous polymerization of monomers, the chemical modification of polymers, and the reactive compatibilization of polymer blends. In particular, co-rotating and counter-rotating twin-screw extruders have proven to be a relevant technical and economical solution for the reactive processing of thermoplastic polymers. The literature dedicated to reactive extrusion shows that a very broad spectrum of chemical reactions and polymer systems has been studied [11,12,13,14,15].
The many advantages of using the extruder as a chemical reactor can be described as follows: (i) polymerization and/or chemical modifications can be carried out in bulk, in the absence of solvents, and the process is fast and continuous (residence time of the order of a few minutes); (ii) if necessary, devolatilization is effective, leading to the rapid removal of residual monomers and/or reaction by-products; and (iii) the screw design is modular, allowing the implementation of complex formulations (fillers, plasticizers, etc.).
However, there are also some disadvantages in using an extruder as a chemical reactor, such as: (i) the high viscosity of the molten polymers, which leads to self-heating and therefore to side reactions (thermal degradation, for example); (ii) the short residence time, which limits reactive extrusion to fast reactions; and (iii) the difficulty of scaling up to industrial pilots and plants.
In terms of modeling and simulation, various strategies [16] can be considered, since one needs to deal with a large number of highly nonlinear and coupled phenomena. Actually, the modeling strategy depends on the objectives in terms of process understanding, material development, machine design, process optimization, and control. For example, in the case of free radical grafting of polyolefins, a two-phase stochastic model to describe mass transport and kinetics based on reactive processing data was proposed in [17].
Regarding process optimization, a simple 1D simulation approach provides a global description of the process all along the screws, whereas 3D models allow a more or less accurate description of the flow field in the different fully filled zones of the extruder. However, most of these simulations are based on simplified steady-state 1D models (e.g., the Ludovic© software [18]).
Actually, the main processing parameters, such as residence time, temperature, and extent of the reaction, are assumed homogeneously distributed in any axial cross-section. The use of one-dimensional models allows significant reductions of the simulation effort (computing time savings). In any case, the flow model is coupled with reaction kinetics that impact the fluid rheology [19].
Thus, one-dimensional models are especially appropriate when addressing optimization or control in reactive extrusion. In particular, the model proposed in [20] predicts the transient and steady-state behaviors, i.e., pressure, monomer conversion, temperature, and residence time distribution, under different operating conditions.
However, these simulations require several sub-models establishing constitutive equations (viscosity, chemical kinetics, mass and heat transfers). Establishing them takes time, as well as the intuition and accumulated knowledge of experienced specialists. Furthermore, it is important to note that, despite the impressive effort spent by hundreds of researchers and thousands of published papers, no constitutive equation exists describing, for example, the behavior of complex polymer formulations such as reactive extrusion systems.
In summary, such a process is quite complex and would require a detailed study of the influence of the nature of the polymers and chemical reactions (kinetics and rheology) and of the processing conditions (temperature, screw speed, flow rate, screw profile). Nevertheless, a deterministic answer for each of these parameters is out of reach, and we actually believe that the understanding of such a process is quite unrealistic with the usual approaches.
1.3. Objectives of the Study
The present work aims at addressing a challenge in terms of industrial applications that is not necessarily based on improving the understanding of the process itself, but on replacing the complex fluid and complex flow by an alternative modeling approach able to extract the link between the process outputs and inputs, which is key for transforming experience into knowledge.
A model of a complex process could be envisaged with two main objectives: (i) online process control from the collected and assimilated data; (ii) offline process optimization, trying to extract the optimal process parameters enabling the target properties and performances. Even if the modeling procedure addressed in this work could be used in both domains, the present work mainly focuses on the second one, the process modeling for its optimization; however, as soon as data can be collected in real time, with the model available, process control can be attained without major difficulties.
Many works exist, each using a different data-driven modeling technique, a diversity that makes it difficult to understand whether there is an optimal technique for each model or whether most of them apply and perform similarly. Thus, this paper first aims at comparing several techniques and then, using one of them, recently proposed by the authors and performing in the multi-parametric setting, at addressing some potential uses.
2. Modeling
In this section, we revisit some regression techniques that will be employed later for modeling reactive extrusion. For additional details and valuable references, the interested reader can refer to Appendix A.
In many applications, like chemical and process engineering or materials processing, product performances depend on a series of parameters related to both the considered materials and the processing conditions. The number of involved parameters is denoted by $D$ and each parameter by $p_i$, $i = 1, \dots, D$, all of them grouped in the array $\mathbf{p} = (p_1, \dots, p_D)^T$.
The process results in a product characterized by different properties or performances, in number smaller or greater than $D$. In what follows, for the sake of simplicity and without loss of generality, we will assume that we are interested in a single scalar output denoted by $y$.
From the engineering point of view, one is interested in discovering the functional relation between the quantity of interest—QoI—$y$ and the involved parameters, mathematically $y = f(p_1, \dots, p_D) = f(\mathbf{p})$, because it offers a practical and useful way for optimizing the product by choosing the most adequate parameters $\mathbf{p}$.
There are many techniques for constructing such a functional relation, usually known as regression, some of them sketched below and detailed in Appendix A, where several valuable references are given.
2.1. From Linear to Nonlinear Regression
The simplest choice consists in the linear relationship
$$y(\mathbf{p}) = w_0 + w_1\, p_1 + \dots + w_D\, p_D, \qquad (1)$$
so that, if $M$ data are available, that is, $M$ couples $(\mathbf{p}^s, y^s)$, $s = 1, \dots, M$, the previous equation can be written in the matrix form
$$\begin{pmatrix} y^1 \\ \vdots \\ y^M \end{pmatrix} = \begin{pmatrix} 1 & p_1^1 & \cdots & p_D^1 \\ \vdots & \vdots & & \vdots \\ 1 & p_1^M & \cdots & p_D^M \end{pmatrix} \begin{pmatrix} w_0 \\ w_1 \\ \vdots \\ w_D \end{pmatrix}, \qquad (2)$$
where $p_i^s$ denotes the value of parameter $p_i$ at measurement $s$, with $i = 1, \dots, D$ and $s = 1, \dots, M$. The previous linear system can be expressed in the more compact matrix form
$$\mathbf{y} = \mathbf{P}\, \mathbf{w}. \qquad (3)$$
Thus, the regression coefficients $\mathbf{w}$ are computed by simple inversion of Equation (3),
$$\mathbf{w} = \mathbf{P}^{-1}\, \mathbf{y},$$
from which the original regression form (1) can be rewritten as
$$y(\mathbf{p}) = \mathbf{w}^T\, \tilde{\mathbf{p}}, \qquad (4)$$
where $\tilde{\mathbf{p}} = (1, p_1, \dots, p_D)^T$.
When the number of measurements becomes larger than the number of unknowns ($D+1$), i.e., $M > D+1$, the problem can be solved in a least-squares sense.
However, linear regressions sometimes become too poor for describing nonlinear solutions, and in that case one is tempted to extend the regression (1) by increasing the polynomial degree. Thus, the quadratic counterpart of Equation (1) reads
$$y(\mathbf{p}) = w_0 + \sum_{i=1}^{D} w_i\, p_i + \sum_{i=1}^{D} \sum_{j=i}^{D} w_{ij}\, p_i\, p_j,$$
where the number of unknown coefficients ($w_0$, $w_i$, $w_{ij}$) is $1 + D + D(D+1)/2$, which roughly scales with $D^2$. When considering third degree approximations, the number of unknown coefficients scales with $D^3$, and so on.
Thus, higher degree approximations are limited to cases involving few parameters, and multi-parametric cases must use low degree approximations, because the available data are usually limited due to the cost of experiments and time.
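To make this scaling concrete, the following minimal Python sketch (synthetic data and illustrative names, not the authors' code) builds the complete polynomial design matrix and fits it in a least-squares sense; the number of coefficients quickly exceeds the number of measurements as the degree grows.

```python
import numpy as np
from itertools import combinations_with_replacement

def polynomial_design_matrix(P, degree):
    """Build the design matrix containing all monomials up to 'degree'
    for the M x D parameter array P (a column of ones is included)."""
    M, D = P.shape
    columns = [np.ones(M)]
    for q in range(1, degree + 1):
        for idx in combinations_with_replacement(range(D), q):
            columns.append(np.prod(P[:, list(idx)], axis=1))
    return np.column_stack(columns)

# Synthetic example: D = 6 parameters, M = 35 measurements (made-up data)
rng = np.random.default_rng(0)
P = rng.uniform(-1.0, 1.0, size=(35, 6))
y = 1.0 + P @ rng.normal(size=6) + 0.1 * rng.normal(size=35)

for degree in (1, 2, 3):
    X = polynomial_design_matrix(P, degree)
    w, *_ = np.linalg.lstsq(X, y, rcond=None)   # least-squares fit of the coefficients
    print(f"degree {degree}: {X.shape[1]} coefficients for {len(y)} data points")
```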
The so-called sparse PGD—sPGD—[7] tries to encompass both wishes in multi-parametric settings: higher degree and few data. For that purpose, the regression reads
$$y(\mathbf{p}) \approx \sum_{m=1}^{N} \prod_{j=1}^{D} F_j^m(p_j),$$
where the different single-valued functions $F_j^m$ are a priori unknown and are determined sequentially using an alternated directions fixed point algorithm. As at each step one looks for a single single-valued function, a higher degree can be envisaged for expressing it in a richer (higher degree) approximation basis, while keeping the number of available data-points (measurements) reduced.
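The Python sketch below illustrates the separated-representation idea under simplifying assumptions of our own: a fixed Chebyshev degree, parameters already mapped onto [-1, 1], a greedy rank-one enrichment, and none of the adaptive-degree or sparse-sensing ingredients of the actual sPGD [7]; all names are illustrative.

```python
import numpy as np
from numpy.polynomial import chebyshev as C

def fit_separated_regression(P, y, n_modes=5, degree=2, n_iter=50):
    """Greedy separated regression y(p) ~ sum_m prod_j F_j^m(p_j), where each
    univariate function is expanded on a Chebyshev basis and updated by an
    alternated directions fixed point (illustrative sketch only).
    Parameters in P are assumed to be already mapped onto [-1, 1]."""
    M, D = P.shape
    modes = []                                  # one (D, degree+1) coefficient array per mode
    residual = y.astype(float).copy()
    for _ in range(n_modes):
        coeffs = np.full((D, degree + 1), 0.1)  # initial guess for the current mode
        for _ in range(n_iter):
            for j in range(D):
                # product of the other directions' functions at every sample
                others = np.ones(M)
                for k in range(D):
                    if k != j:
                        others *= C.chebval(P[:, k], coeffs[k])
                # least-squares update of direction j (Chebyshev Vandermonde weighted by 'others')
                V = C.chebvander(P[:, j], degree) * others[:, None]
                coeffs[j], *_ = np.linalg.lstsq(V, residual, rcond=None)
        mode_values = np.ones(M)
        for k in range(D):
            mode_values *= C.chebval(P[:, k], coeffs[k])
        residual = residual - mode_values       # greedy enrichment: next mode fits the residual
        modes.append(coeffs)
    return modes

def evaluate_separated_regression(modes, P):
    """Evaluate the separated regression at the rows of P."""
    y = np.zeros(P.shape[0])
    for coeffs in modes:
        prod = np.ones(P.shape[0])
        for k in range(P.shape[1]):
            prod *= C.chebval(P[:, k], coeffs[k])
        y += prod
    return y
```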
2.2. Code2Vect
This technique, deeply revisited in Appendix A, proposes mapping the points $\mathbf{p}^s$, $s = 1, \dots, M$, into another vector space, such that the distance between the images $\boldsymbol{\xi}^s$ and $\boldsymbol{\xi}^r$ of any pair of data-points scales with the difference of their respective outputs, that is, with $|y^s - y^r|$.
Thus, by using this condition for all the data-point pairs, the mapping is obtained, enabling, for any other input array $\mathbf{p}$, the computation of its image $\boldsymbol{\xi}$. If $\boldsymbol{\xi}$ is very close to $\boldsymbol{\xi}^s$, one can expect its output $y$ to be very close to $y^s$, i.e., $y \approx y^s$. In the most general case, an interpolation of the output is envisaged.
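A schematic sketch of the idea follows: a linear mapping is trained by gradient descent so that distances between mapped points scale with output differences, and the output at a new point is interpolated among its nearest mapped neighbors. The loss, optimizer, and interpolation rule below are our own illustrative choices, not the implementation of [6].

```python
import numpy as np

def fit_code2vect_mapping(X, y, dim=3, lr=1e-2, n_epochs=500, seed=0):
    """Learn a linear map W so that ||W x_i - W x_j||^2 ~ |y_i - y_j|
    for every pair of training points (simple gradient-descent sketch)."""
    rng = np.random.default_rng(seed)
    M, D = X.shape
    W = rng.normal(scale=0.1, size=(dim, D))
    pairs = [(i, j) for i in range(M) for j in range(i + 1, M)]
    for _ in range(n_epochs):
        grad = np.zeros_like(W)
        for i, j in pairs:
            d = W @ (X[i] - X[j])                    # difference in the target space
            err = d @ d - abs(y[i] - y[j])           # distance-vs-output mismatch
            grad += 4.0 * err * np.outer(d, X[i] - X[j])
        W -= lr * grad / len(pairs)
    return W

def predict_code2vect(W, X_train, y_train, x_new, k=3):
    """Inverse-distance interpolation of the output among the k nearest
    mapped training points (one possible choice for the final step)."""
    xi = W @ x_new
    dists = np.linalg.norm(X_train @ W.T - xi, axis=1)
    nearest = np.argsort(dists)[:k]
    weights = 1.0 / (dists[nearest] + 1e-12)
    return np.sum(weights * y_train[nearest]) / np.sum(weights)
```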
2.3. iDMD, Support Vector Regression, and Neural Networks
Inspired by Dynamic Mode Decomposition—DMD—[8,9], one could look for the vector $\mathbf{w}$ minimizing the functional [10]
$$\mathcal{F}(\mathbf{w}) = \sum_{s=1}^{M} \left( y^s - \mathbf{w}^T \mathbf{p}^s \right)^2,$$
whose minimization results in the calculation of the vector $\mathbf{w}$, which in turn allows defining the regression $y(\mathbf{p}) = \mathbf{w}^T \mathbf{p}$.
Appendix A and the references therein propose alternative formulations.
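A minimal sketch of this DMD-inspired linear regression follows, written for the multi-output case (anticipating Section 4) with made-up data and illustrative names; it simply solves the least-squares problem for a single matrix relating inputs and outputs.

```python
import numpy as np

# Minimal sketch: fit a single matrix W mapping inputs to outputs in the
# least-squares sense, then predict (data and shapes are illustrative).
P = np.random.default_rng(1).uniform(size=(35, 6))    # 35 samples, 6 parameters
Y = P @ np.random.default_rng(2).normal(size=(6, 5))  # 35 samples, 5 outputs

# W minimizes sum_s ||y^s - W p^s||^2
W = np.linalg.lstsq(P, Y, rcond=None)[0].T             # shape (5, 6)

p_new = np.full(6, 0.5)
y_pred = W @ p_new                                      # predicted outputs
```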
Neural Networks—NN—perform the same minimization, introducing specific treatments of the nonlinearities, while addressing the multi-output case by using a number of hidden neuron layers [21].
Finally, Support Vector Regression—SVR—shares some ideas with the so-called Support Vector Machine—SVM—[22], the latter being widely used for supervised classification. In SVR, the regression reads
$$y(\mathbf{p}) = \mathbf{w}^T \mathbf{p} + b,$$
and the flatness is enforced by minimizing the functional
$$\frac{1}{2}\, \|\mathbf{w}\|^2,$$
while enforcing as constraints a regularized form of
$$\left| y^s - \mathbf{w}^T \mathbf{p}^s - b \right| \leq \varepsilon, \quad s = 1, \dots, M.$$
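As a usage illustration (not the configuration used later in this work), scikit-learn's SVR implements the epsilon-insensitive formulation sketched above; the kernel and hyper-parameters below are arbitrary choices on synthetic data.

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
P = rng.uniform(size=(35, 6))                   # 35 samples, 6 parameters (made-up)
y = P @ rng.normal(size=6) + 0.05 * rng.normal(size=35)

# Epsilon-insensitive SVR; inputs are standardized before fitting
model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0, epsilon=0.01))
model.fit(P, y)
y_hat = model.predict(P[:5])                    # predictions for a few samples
```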
3. Experiments
The purpose of this project is the dispersion of a thermosetting (TS) polymer in a polyolefin matrix using reactive extrusion, by in situ polymerization of the thermoset phase from an epoxide resin and an amine crosslinker. Here, polypropylene (PP) has been chosen as the polyolefin matrix. A maleic-anhydride-grafted PP (PP-g-MA) has been used to ensure a good compatibility between the PP and the thermoset phases.
These studies were carried out as part of a project with TOTAL on the basis of a HUTCHINSON patent [23]. This patent describes the process for preparing a reinforced and reactive thermoplastic phase by dispersing an immiscible reactive reinforcing agent (e.g., an epoxy resin as precursor of the thermoset dispersed phase). This process is characterized by a high shear rate in the extruder combined with the in situ grafting, branching, and/or crosslinking of the dispersed phase. These in situ reactions permit the crosslinking of the reinforcing agent as well as the compatibility of the blend with or without compatibilizer or crosslinker. The result of this process is a compound with a homogeneous reinforced phase with a fine dispersion (at the micrometer scale), leading to an improvement of the mechanical properties of the thermoplastic polymer. The experiments carried out in the framework of the present project are mainly based on some experiments described in the patent. However, new complementary experiments have been carried out to complete the study.
3.1. Materials
The main polypropylene used as the matrix is the homopolymer polypropylene PPH3060 from TOTAL. Two other polypropylenes have been used to study the influence of the viscosity, and several impact copolymer polypropylenes have also been tested in order to combine a good impact resistance with the reinforcement brought by the thermoset phase. A PP-g-MA (PO1020 from Exxon) with around 1 wt% of maleic anhydride has been used as a compatibilizer between the polypropylene matrix and the thermoset phase. All the polypropylenes used are listed in Table 1 with their main characteristics.
Concerning the thermoset phase, three systems have been studied. As a common point, these three systems are based on epoxy resins that are DGEBA derivatives with two epoxide groups; two different resins (DER 667 and DER 671 from DOW Chemicals) have been used. The first two systems, named R1 and R2 here, are both constituted of an epoxy resin mixed with an amine at the stoichiometry. The first uses the DER 667 with a triamine (Jeffamine T403 from Huntsman) that is sterically hindered, whereas the second one uses the DER 671 with a cyclic diamine (norbornanediamine from TCI Chemicals). Melamine has also been tested in one of the formulations. The third system, named R5 here, mixes the epoxy resin DER 671 with a phenolic hardener (DEH 84 from DOW Chemicals) that is a blend of three molecules: 70 wt% of an epoxy resin, a diol, and less than 1 wt% of a phenolic amine. These systems have been chosen in order to see the influence of the structure, molar mass, and chemical nature on the in situ generation of the thermoset phase within our polyolefin matrix.
Table 2 summarizes the systems studied.
The kinetics of these chemical systems have been studied from the variation of the complex shear modulus in a time sweep experiment with an ARES-G2 rheometer (TA Instruments). The experiments have been performed at temperatures from 115 °C to 165 °C using a 25 mm plate–plate geometry, with a 1 mm gap, at a frequency ω = 10 rad/s and a constant strain of 1%. The kinetics have been measured on a stoichiometric premix of the reactants. The gel times of the systems have thus been identified as the crossover point between the loss and storage moduli. Note that the reaction is too fast to be followed at temperatures beyond T = 165 °C. Consequently, an extrapolation according to an Arrhenius law allowed us to determine the gel time of the systems at T = 200 °C (the barrel temperature of the extruder). The results give a gel time lower than 10 s for the three systems (t_gel(R1) = 4.5 s, t_gel(R2) = 10 s, and t_gel(R5) < 1 s), so we made the hypothesis that the reaction time is much lower than 1 min and thus that the reaction is totally completed at the die exit of the extruder. Moreover, a Dynamic Mechanical Analysis (DMA) showed that the main mechanical relaxation temperature Tα associated with the Tg of the thermoset phase is close to 80 °C, which is the Tg observed for bulk TS systems.
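The Arrhenius-type extrapolation mentioned above can be sketched as follows; the gel times and temperatures in the script are made-up illustrative values, not the measured data of systems R1, R2, and R5.

```python
import numpy as np

# Hypothetical example: fit ln(t_gel) = ln(A) + Ea/(R*T) to measured gel times
# and extrapolate to the barrel temperature (all numbers below are illustrative).
R = 8.314                                                     # J/(mol K)
T_meas = np.array([115.0, 135.0, 150.0, 165.0]) + 273.15      # K
t_gel = np.array([120.0, 45.0, 20.0, 8.0])                    # s (made-up data)

slope, intercept = np.polyfit(1.0 / T_meas, np.log(t_gel), 1)
Ea = slope * R                                                # apparent activation energy (J/mol)

T_extrusion = 200.0 + 273.15
t_gel_200 = np.exp(intercept + slope / T_extrusion)
print(f"Extrapolated gel time at 200 °C: {t_gel_200:.1f} s")
```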
The influence of the addition of silica on the final properties has been studied with two different silicas (Aerosil R974 and Aerosil 200).
3.2. Methods
3.2.1. Extrusion Processing
The formulations have been prepared in one single step with a co-rotating twin-screw extruder (Leistritz ZSE18, L/D = 60, D = 18 mm), with the screw profile described in Figure 1.
Two different temperature profiles have been used, one at 230 °C and the other at 200 °C, both with lower temperatures for the first blocks to minimize clogging effects at the inlet. These temperature profiles are described in Figure 2.
Several screw rotation speeds and flow-rates have been used to study the influence of the process on the final materials (N = 300, 600, 450, 800 rpm; 3, 5, 6, 10 kg/h).
The solid materials were mixed and introduced at the entrance through a hopper for the pellets and with a powder feeder for the micronized powders. As for the liquid reagents, they were injected over the third block with an HPLC pump. The formulations are cooled by air at the exit of the extruder and then pelletized.
3.2.2. Characterization
Tensile test specimens (5A) and impact test specimens have been injection-molded with a Babyplast injection press at 200 °C and 100 bar. The Young modulus has been determined by a tensile test at a speed of 1 mm/min, and the stress at yield, elongation at break, and stress at break have been measured at a tensile speed of 50 mm/min. The impact strength has been measured by Charpy tests on notched samples at room temperature.
4. Data-Driven Modeling: Comparing Different Machine Learning Techniques
As previously mentioned, a model that links the material and processing parameters with the processed material properties is of crucial interest. With such a model, two major opportunities could be envisaged: the first one concerns the possibility of inferring the processed material properties for any choice of manufacturing parameters; the second, for given target properties, the possibility of inferring the processing parameters enabling them.
In this particular case, the process parameters are grouped in the six-entry array $\mathbf{p} = (p_1, \dots, p_6)^T$, whereas the processed material properties are grouped in the five-entry array $\mathbf{y} = (y_1, \dots, y_5)^T$, containing the Young modulus, the yield stress, the stress at break, the strain at break, and the impact strength.
As previously discussed, our main aim is extracting (discovering) the regression relating the inputs (material and processing parameters) $\mathbf{p}$ to the outputs (processed material properties) $\mathbf{y}$, a regression that can be written in the form
$$y_i = f_i(\mathbf{p}), \quad i = 1, \dots, 5,$$
where $f_i$ represents the linear or nonlinear regression associated with the $i$-th output, or, when proceeding in a compact form by creating the multi-valued regression relating the whole input and output data-pairs,
$$\mathbf{y} = \mathbf{F}(\mathbf{p}).$$
The intrinsic material and processing complexity justifies the nonexistence of valuable and reliable physics-based models able to predict the material evolution and the process-induced properties. For this reason, in the present work, the data-driven route is retained, based on the use of regression techniques such as the ones previously summarized.
The available data come from the experiments described in the previous section and consist of 59 pairs of arrays $(\mathbf{p}^s, \mathbf{y}^s)$, $s = 1, \dots, 59$, all of them reported in Table A1 and Table A2 included in Appendix B (for the sake of completeness and to allow researchers to test alternative regression procedures).
Table A1 groups the set of input parameters involved in the regression techniques. The hyper-parameter MaskIn is a Boolean mask indicating whether a data-point is included in the training set or excluded from it, in the latter case serving to quantify the regression performance. On the other hand, Table A2 groups the responses (experimental measurements) for each processing condition.
As indicated in the introduction, one of the objectives of the present paper is analyzing whether different machine learning techniques perform similarly or whether their performances are significantly different. For this purpose, this section aims at comparing the techniques introduced in Section 2, whereas the next section will focus on the use of one of them.
In order to compare the performances of the different techniques, an error metric was introduced to compare the regression predictions. In particular, we consider the most standard error, the Root Mean Squared Error (RMSE). When applied to the different regression results, it offers a first indication of the prediction performances.
Table 3 reports the errors associated with each regression when evaluating the output of interest, that is, the array $\mathbf{y}$ for a given input $\mathbf{p}$, for all the data reported in Table A1 and Table A2. Because the different outputs (the components of the array $\mathbf{y}$) present significant differences in their typical magnitudes, Table 4 reports the relative errors, computed as the ratio between the difference of predicted and measured data and the measured data.
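For completeness, the error metrics used here read (standard definitions, with $M$ the number of considered data-points):
$$\mathrm{RMSE}_i = \sqrt{\frac{1}{M} \sum_{s=1}^{M} \left( y_i^{s,\mathrm{pred}} - y_i^{s,\mathrm{meas}} \right)^2}, \qquad \mathrm{RE}_i^{s} = \frac{\left| y_i^{s,\mathrm{pred}} - y_i^{s,\mathrm{meas}} \right|}{\left| y_i^{s,\mathrm{meas}} \right|},$$
the latter corresponding to the relative error just described.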
The sparse PGD—sPGD—employed second degree Chebyshev polynomials and performed a regression for each of the quantities of interest according to Equation (14). The use of low degree polynomials avoided overfitting, constituting a compromise for ensuring a reasonable predictability for data inside and outside the training data-set. From a computational point of view, 20 enrichments ($N$ in Equation (A16)) were needed for defining the finite sum involved in the separated representation that constitutes the regression of each output of interest $y_i$.
Code2Vect addressed the low-data limit constraint by imposing a linear mapping between the representation (data) and target (metric) spaces, avoiding spurious oscillations when making predictions on the data outside the training set.
Concerning the iDMD, because of the reduced amount of available data, the simplest option, consisting of a unique matrix relating the input–output data pairs in the training set (linear model), was considered; i.e., with respect to Equation (15), it was assumed that the regression reduces to the linear mapping $\mathbf{y} = \mathbf{W}\mathbf{p}$, and the matrix $\mathbf{W}$ ensuring that linear mapping was obtained by following the rationale described in Section 2. The computed regression performs very well despite the assumption of a linear behavior.
The quite standard Neural Network we considered (among a very large variety of possible choices) presents a severe overfitting phenomenon in the low-data limit addressed here. This limitation is not intrinsic to NNs and could be alleviated by considering richer architectures, better optimizers, and better parameters, which is out of the scope of the present study.
The main conclusion of this section is the fact that similar results are obtained independently of the considered technique, results that seem quite good from an engineering point of view. Even if the errors seem quite high, it is important to note that: (i) the highest errors concern the variables exhibiting the largest dispersion in the measurements; (ii) the prediction errors are of the same order as the dispersion amplitudes; and (iii) we only considered 35 of the 59 available data-points for the training (regression construction), while the reported errors were calculated by using the whole available data (the 59 data-points). The next section proves that the prediction quality increases with the amount of points involved in the regression construction (training).
5. Data-Driven Process Modeling
In view of the reported results, it can be stressed that all the analyzed techniques show similar performances and work reasonably well in the low-data limit (only 60% of the 59 available data points composed the training data-set used in the regressions).
As can be noticed, some quantities of interest, such as the Young modulus and the stress at break, are quite well predicted, whereas the predictions of the other quantities were less accurate. There is a strong correlation between this predictability and the experimental dispersion noticed when measuring these other quantities, like the strain at break. That dispersion represents without any doubt a limit on the predictability, which should be addressed within a probabilistic framework. All the mechanical tests were performed on five samples from the same extrusion experiment, the final value being the average of these five tests. The confidence interval is estimated at 10% for the Young modulus and yield stress, and 20% for the elongation and stress at break.
Extracting a model of a complex process could serve for real-time control purposes, but also, as is the case in the present work, for understanding the main tendencies of each quantity of interest with respect to each process or material parameter (the latter constituting the regression inputs), enabling process and material optimization.
In order to perform that sensitivity analysis, we consider a given quantity of interest and evaluate its evolution with respect to each of the input parameters. When considering the dependence on a particular input parameter, all the others are fixed at their mean values, even if any other choice is possible.
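The one-at-a-time exploration just described can be sketched as follows, assuming a fitted regression is available as a callable `predict` (names and structure are illustrative, not the code used in this study).

```python
import numpy as np

def one_at_a_time_curves(predict, P_train, n_points=50):
    """For each parameter, vary it over its observed range while the other
    parameters are fixed at their mean values (one-at-a-time sensitivity)."""
    p_mean = P_train.mean(axis=0)
    curves = {}
    for j in range(P_train.shape[1]):
        values = np.linspace(P_train[:, j].min(), P_train[:, j].max(), n_points)
        P_query = np.tile(p_mean, (n_points, 1))
        P_query[:, j] = values                 # only parameter j varies
        curves[j] = (values, predict(P_query))
    return curves
```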
Figure 3 shows the evolution of the considered quantities of interest with respect to the six input parameters, using the lowest order sPGD modes to extract the main tendencies.
From these AI-based metamodels, one should be able to identify the process conditions and the concentration of the TS phase in order to enhance a certain mechanical property. Thus, in order to increase the stress at break, increasing the content of thermoset seems a good option, with all the other properties (Young modulus, stress at yield, strain at break, and impact strength) being almost insensitive to that parameter. A more detailed analysis, involving multi-objective optimization (making use of the Pareto front) and its experimental validation, constitutes a work in progress, out of the scope of the present work.
To further analyze the accuracy of the methodology and the convergence behavior, in what follows, we consider one of the regression techniques previously described and employed, the sPGD, and perform a convergence analysis, by evaluating the evolution of the error with respect to the size of the training data-set.
The training set was progressively enriched, starting from 30 data points and then considering 35, 41, 47, and finally 53 (corresponding approximately to 50%, 60%, 70%, 80%, and 90% of the available data-set). The error was calculated again by considering both the training and test data-sets.
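The convergence study can be reproduced schematically as follows; a plain linear least-squares regression stands in for the sPGD, and the split sizes mirror those above (a sketch with illustrative names only).

```python
import numpy as np

def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def convergence_study(P, y, sizes=(30, 35, 41, 47, 53), seed=0):
    """Fit a simple linear least-squares regression on growing training sets
    and report the RMSE over the whole data-set (illustrative stand-in for
    the sPGD used in the paper)."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(y))
    X = np.column_stack([np.ones(len(y)), P])       # linear design matrix
    results = {}
    for n in sizes:
        train = order[:n]
        w, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)
        results[n] = rmse(y, X @ w)                 # error over all data-points
    return results
```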
Table 5 reports the results on the elastic modulus prediction and clearly proves, as expected, that the prediction accuracy increases with the size of the training set, the error decreasing from around 15% to slightly below 10%.
It is important to note that one could further decrease the error on the training set, but overfitting issues would then occur and the error would increase tremendously out of the training set, compromising robustness. The errors reported here are a good compromise between accuracy inside and outside the training set.
In order to facilitate the reproducibility of the solution, in what follows we give the explicit form of the sPGD regression. As previously discussed, the sPGD makes use of a separated representation of the parametric solution, whose expression for a generic quantity of interest reads
$$y(\mathbf{p}) \approx \sum_{m=1}^{N} \prod_{j=1}^{D} F_j^m(p_j).$$
More explicitly, each univariate function $F_j^m$ is approximated using an approximation basis,
$$F_j^m(p_j) = \sum_{k=1}^{K} a_{j,k}^m\, T_k(p_j),$$
with $T_k$ the chosen basis functions.
When approximating the elastic modulus, whose results were reported in Table 5, we considered six parameters, i.e., $D = 6$, and a polynomial Chebyshev basis (requiring a pre-mapping of the parameter intervals onto the reference interval $[-1, 1]$, where the Chebyshev polynomials are defined). The number of modes $N$ (terms involved in the finite sum separated representation) and the number of interpolation functions per dimension $K$ were kept fixed. The coefficients related to Equation (18), when applied to the elastic modulus approximation, are reported in Appendix C.
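Given such coefficients, the separated Chebyshev representation can be evaluated as sketched below; the coefficient array layout (modes × parameters × basis functions) is an assumption made for illustration and must be adapted to the format of Appendix C.

```python
import numpy as np
from numpy.polynomial import chebyshev as C

def evaluate_spgd(coeffs, p, p_min, p_max):
    """Evaluate y(p) = sum_m prod_j sum_k a[m, j, k] T_k(p_j'), where p_j' is
    parameter j mapped onto [-1, 1]; 'coeffs' has shape (N_modes, D, K)."""
    p_ref = 2.0 * (np.asarray(p, dtype=float) - p_min) / (p_max - p_min) - 1.0
    y = 0.0
    for mode in coeffs:                        # loop over the N modes
        prod = 1.0
        for j, a_j in enumerate(mode):         # loop over the D parameters
            prod *= C.chebval(p_ref[j], a_j)   # Chebyshev expansion of F_j^m
        y += prod
    return y
```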
An important limitation, inherent to machine learning strategies, is the fact that factors other than the ones considered as inputs could be determinant for explaining the selected outputs. This point constitutes a work in progress.
6. Conclusions
We showed in this paper that different machine learning techniques are relevant in the low-data limit, for constructing the model that links material properties and process parameters in reactive polymer processing. Actually, these techniques are undeniably effective in complex processes such as reactive extrusion. More precisely, this work was based on the in situ synthesis of a thermoset phase during its mixing/dispersion with a thermoplastic polymer phase, which is certainly one of the most complex cases in the processing of polymers.
We proved that a variety of procedures can be used for performing the data-driven modeling, whose accuracy increases with the size of the training-set. Then, the constructed regression can be used for predicting the different quantities of interest, for evaluating their sensitivity to the parameters, crucial for offline process optimization, and also for real-time process monitoring and control.