14 views (last 30 days)

Show older comments

Hi, I would like to know if is it possible to apply Genetic Algorithm (GA) or Simulated Annealing (SA) optimization methods to a parametric system of ODEs in order to estimate the best set of parameters for curve fitting to an experimental data set.

I've written a code to do that, using nnlinfit and lsqcurvefit. With the first one I've obtained good results in fitting but some parameters had negative values, so I used lsqcurvefit (with trust-region-reflective algorithm) with lower bounds=0 but the system have a lot of local minima and is very susceptible to the initial values and to the upper bounds, giving always different results.

The ODEs system consists in 5 equations and 6 parameters, and for lsqcurvefit I've used this line script

[parameters] = lsqcurvefit(@odesystem,parr0,time,experimentaldata,zeros(size(p)),[ ])

parr0 is the vector of the parameters' initial values; I've tried to change it several times obtaining always different solutions.

For these reasons I've tried to replace lsqcurvefit with GA or SA but It doesn't work because they don't recognize as valid the function @odesystem used in the lsqcurvefit script.

In the lsqcurvefit script @odesystem is a function that uses ode23s to solve the parametric system.

Maybe I have to rewrite the code in order to give to GA or SA a correct form of the function to be used by these algorithm, do you have an idea how to solve this problem?

Do you think that the usage of GA or SA is not the correct way to obtain the best set of parameters because of the system complexity?

Thank you in advance.

Star Strider
on 27 Feb 2019

It is definitely possible to use the ga function to estimate the parameters of a system of differential equations. I experimented with this on my own.

An example using ga (based on a previous comparmental kinetics modeling problem that used lsqcurvefit):

% NOTES:

%

% 1. The ‘theta’ (parameter) argument has to be first in your

% ‘kinetics’ funcition,

% 2. You need to return ALL the values from ‘DifEq’ since you are fitting

% all the values

function C=kinetics(theta,t)

c0=[1;0;0;0];

[T,Cv]=ode45(@DifEq,t,c0);

%

function dC=DifEq(t,c)

dcdt=zeros(4,1);

dcdt(1)=-theta(1).*c(1)-theta(2).*c(1);

dcdt(2)= theta(1).*c(1)+theta(4).*c(3)-theta(3).*c(2)-theta(5).*c(2);

dcdt(3)= theta(2).*c(1)+theta(3).*c(2)-theta(4).*c(3)+theta(6).*c(4);

dcdt(4)= theta(5).*c(2)-theta(6).*c(4);

dC=dcdt;

end

C=Cv;

end

t=[0.1

0.2

0.4

0.6

0.8

1

1.5

2

3

4

5

6];

c=[0.902 0.06997 0.02463 0.00218

0.8072 0.1353 0.0482 0.008192

0.6757 0.2123 0.0864 0.0289

0.5569 0.2789 0.1063 0.06233

0.4297 0.3292 0.1476 0.09756

0.3774 0.3457 0.1485 0.1255

0.2149 0.3486 0.1821 0.2526

0.141 0.3254 0.194 0.3401

0.04921 0.2445 0.1742 0.5277

0.0178 0.1728 0.1732 0.6323

0.006431 0.1091 0.1137 0.7702

0.002595 0.08301 0.08224 0.835];

theta0=[1;1;1;1;1;1];

% [theta,Rsdnrm,Rsd,ExFlg,OptmInfo,Lmda,Jmat]=lsqcurvefit(@kinetics,theta0,t,c);

ftns = @(theta) norm(c-kinetics(theta,t));

PopSz = 500;

Parms = 6;

opts = optimoptions('ga', 'PopulationSize',PopSz, 'InitialPopulationMatrix',randi(1E+4,PopSz,Parms)*1E-3, 'MaxGenerations',2E3, 'PlotFcn','gaplotbestf');

t0 = clock;

fprintf('\nStart Time: %4d-%02d-%02d %02d:%02d:%07.4f\n', t0)

% [B,fval,exitflag,output,population,scores] = ga(ftns, 53, [],[],[],[],zeros(53,1),Inf(53,1),[],[],opts)

[theta,fval,exitflag,output] = ga(ftns, Parms, [],[],[],[],zeros(Parms,1),Inf(Parms,1),[],[],opts)

t1 = clock;

fprintf('\nStop Time: %4d-%02d-%02d %02d:%02d:%07.4f\n', t1)

GA_Time = etime(t1,t0)

fprintf('\nElapsed Time: %23.15E\t\t%02d:%02d:%02d.%03d\n', GA_Time, hmsdv(GA_Time))

fprintf(1,'\tRate Constants:\n')

for k1 = 1:length(theta)

fprintf(1, '\t\tTheta(%d) = %8.5f\n', k1, theta(k1))

end

tv = linspace(min(t), max(t));

Cfit = kinetics(theta, tv);

figure(1)

hd = plot(t, c, 'p');

for k1 = 1:size(c,2)

CV(k1,:) = hd(k1).Color;

hd(k1).MarkerFaceColor = CV(k1,:);

end

hold on

hlp = plot(tv, Cfit);

for k1 = 1:size(c,2)

hlp(k1).Color = CV(k1,:);

end

hold off

grid

xlabel('Time')

ylabel('Concentration')

legend(hlp, 'C_1(t)', 'C_2(t)', 'C_3(t)', 'C_4(t)', 'Location','N')

Note the coding of the ‘ftns’ fitness function, and its relation to the objective function I originally wrote to use with lsqcurvefit. I wrote this as my own experiment, so it is not throughly as comment-documented as I usually do for code I post here. I will provide as much of an explanation as I can for anything that is not clear.

Genetic algorithms take a while to converge, however in my experience, they are generally successful if you give them the opportunity.

Star Strider
on 28 Feb 2019

As always, my pleasure.

Note that in my code, I constrained the parameters to be greater than zero, since the kinetic parameters are all positive by definition.

I doubt that the replicates would cause problems providing your differential equations and data account for them (the same replicates for the independent and dependent variables, for example). The only effect those would have is to give the replicates more ‘weight’, in the sense that they would have more influence on the fitness (and therefore the estimated parameters) than if they were not replicated. If all the data were replicated, it would simply slow down the ga function.

If you are having problems getting ga to fit your data, I would use the original ‘opts’ structure in my code, where ‘Parms’ is the number of parameters you want to fit. I scaled the initial population as:

'InitialPopulationMatrix',randi(1E+4,PopSz,Parms)*1E-3

so scale it to provide appropriate initial parameter magnitudes. (Here they are all in the range of about 1, however you can scale their magnitudes individually by multiplying them with a vector of magnitudes, and using the bsxfun function, most efficiently done in a separate line of code. I chose randi because it is simply easier to define the initial range of parameters with it.) I designed it to begin with a large and very wide-ranging initial population, specifically in order to easily find the best parameter set in its parameter space. It does this, although at the expense of speed, since larger populations produce slower convergence.

I have occasionally had to run ga a few times with the same problem if it ended up going wildly off course the first time. A large initial population (I chose 500) usually prevents this. Also, if you are having problems getting a good fit of your model to your data, go over your model carefully to be certain you coded it correctly. It has been my personal experience that small errors can cause significant problems.

alesmaz
on 18 Jun 2019

Star Strider
on 18 Jun 2019

crt56 ak
on 8 Aug 2021

Edited: crt56 ak
on 8 Aug 2021

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!