Privacy-preserving ADP for secure tracking control of AVRs against unreliable communication

> 0, $\iota=[\iota_1,\ldots,\iota_m]^T$. The function $h(\cdot)$ is assumed to be a monotonic odd function satisfying $h(0)=0$. In this article, $h(\cdot)$ is chosen as the hyperbolic tangent, $h(x)=(e^{x}-e^{-x})/(e^{x}+e^{-x})=\tanh(x)$.

According to optimal control theory, Equation 11 is a Lyapunov function for Equation 3, and the Hamiltonian function can be derived as

$$H(z,\mu,V(z))=\gamma_1\bar{\gamma}^2+z^TQz+\bar{U}(\mu)+\nabla V(z)\big(f(z)+g(z)u+\gamma\big), \tag{13}$$

with $\nabla V(z)=\partial V(z)/\partial z$. Defining $V^*(z)$ as the minimum value of Equation 11, Bellman's principle of optimality gives

$$0=H(z,\mu,V^*(z))=\gamma_1\bar{\gamma}^2+z^TQz+\bar{U}(\mu)+\nabla V^*(z)\big(f(z)+g(z)u^*+\gamma\big), \tag{14}$$

and the optimal control $u^*$ is obtained from $\partial H(z,\mu,V^*(z))/\partial u^*=0$:

$$u^*=\theta_1\tanh\!\Big(-\frac{1}{2\theta_1}R^{-1}g^T(z)\nabla V^*(z)\Big)+u_d. \tag{15}$$

Substituting Equation 15 into Equation 12 yields

$$\bar{U}(\mu^*)=\nabla V^{*T}(z)g(z)\tanh(D(z))+\theta_1^2\sum_{i=1}^{m}\ln\big(1-\tanh^2(D_i(z))\big), \tag{16}$$

where $D(z)=\frac{1}{2\theta_1}R^{-1}g^T(z)\nabla V^*(z)$ and $\mu^*=u^*-u_d$. Then, the HJB equation can be derived as

$$H(z,\mu^*,V^*(z))=\gamma_1\bar{\gamma}^2+z^TQz+\nabla V^*(z)\big(f(z)+\gamma\big)+\theta_1^2\sum_{i=1}^{m}\ln\big(1-\tanh^2(D_i(z))\big)=0. \tag{17}$$
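The logarithmic term in Equations 16 and 17 is what one gets from integrating an inverse-tanh input penalty. The sketch below checks this numerically for a scalar input. It assumes that Equation 12 has the commonly used form Ū(μ) = 2θ1 ∫ tanh⁻¹(ι/θ1) R dι from 0 to μ with R = 1; that form, the function names, and the numerical values are illustrative assumptions, not taken from the article.

```python
import math

def penalty_closed_form(mu, theta1):
    """Closed form of 2*theta1 * integral_0^mu atanh(s/theta1) ds (scalar input, R = 1).

    The integrand is an ASSUMED form of the input penalty, used for illustration only.
    """
    ratio = mu / theta1
    return 2.0 * theta1 * mu * math.atanh(ratio) + theta1 ** 2 * math.log(1.0 - ratio ** 2)

def penalty_quadrature(mu, theta1, steps=20000):
    """The same integral evaluated with the composite midpoint rule, as an independent check."""
    h = mu / steps
    total = 0.0
    for k in range(steps):
        s = (k + 0.5) * h
        total += math.atanh(s / theta1)
    return 2.0 * theta1 * total * h

theta1 = 2.0                          # hypothetical saturation bound
D = 0.7                               # hypothetical value of D(z) at some state
mu_star = -theta1 * math.tanh(D)      # saturated deviation u* - u_d, so |mu_star| < theta1

closed = penalty_closed_form(mu_star, theta1)
numeric = penalty_quadrature(mu_star, theta1)
log_term = theta1 ** 2 * math.log(1.0 - math.tanh(D) ** 2)

print(closed, numeric, log_term)
```

Under these assumptions the closed form and the quadrature agree, and the difference `closed - log_term` is the part of the penalty that multiplies tanh(D).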

As the preceding analysis shows, obtaining the optimal controller in Equation 15 requires solving the HJB Equation 17, a nonlinear partial differential equation that is notoriously difficult to solve analytically. To overcome this difficulty, an iterative algorithm based on ADP is employed to obtain an approximate solution. The details of this iterative algorithm are presented in Algorithm 1.


Algorithm 1. Encrypted guaranteed cost policy iteration algorithm.

Lemma 1. Under the encrypted PI process of Algorithm 1, which incorporates encryption and decryption steps for secure control of the tracking error dynamics in an AVR, the resulting control $u_\varsigma$ ensures asymptotic stability of the system dynamics. Additionally, $V_\varsigma(z)$ converges to the optimal value function $V^*(z)$ as $\varsigma\to\infty$, so that $u_\varsigma$ converges to the optimal control $u^*$.

Proof. Before any iteration, the initial control $u_1$ is assumed admissible. For every $u_\varsigma$ produced during the iterations, consider the Lyapunov function $V_\varsigma(z)$, which satisfies

$$\dot{V}_\varsigma(z)=\nabla V_\varsigma(z)\dot{z}=\nabla V_\varsigma(z)\big(f(z)+g(z)u_\varsigma+\gamma\big). \tag{20}$$

According to the HJB Equation 17, we can derive

$$\nabla V_\varsigma(z)\big(f(z)+g(z)u_\varsigma+\gamma\big)=-\gamma_1\bar{\gamma}^2-z^TQz-\bar{U}(\mu_\varsigma), \tag{21}$$

where $\mu_\varsigma=u_\varsigma-u_d$. Then, substituting Equation 21 into Equation 20 yields

$$\dot{V}_\varsigma(z)=-\gamma_1\bar{\gamma}^2-z^TQz-\bar{U}(\mu_\varsigma)\le 0. \tag{22}$$

Therefore, every iterate keeps the error dynamics asymptotically stable. Moreover, each policy improvement step minimizes the associated value function, as in the Kleinman method, which guarantees convergence: as the iteration count $\varsigma\to\infty$, $V_\varsigma(z)\to V^*(z)$ and $u_\varsigma\to u^*$. This concludes the proof. □

By Lemma 1, the iterative process with secure encryption and decryption converges, and the resulting control approaches the optimal control as the approximation errors diminish.
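The evaluate-then-improve structure invoked in the proof can be seen in the simplest setting where the Kleinman method applies exactly. The sketch below runs policy iteration on a scalar linear system with a quadratic cost. It deliberately omits encryption, input saturation, and the disturbance term, so it is an illustrative special case rather than Algorithm 1; the system parameters and the function name are hypothetical.

```python
import math

def kleinman_scalar(a, b, q, r, k0, iterations=10):
    """Policy iteration for dx/dt = a*x + b*u with cost integral of q*x^2 + r*u^2.

    The policy is u = -k*x. Returns the sequence of value coefficients p
    (value function p*x^2) and the final gain.
    """
    if a - b * k0 >= 0:
        raise ValueError("initial gain must be stabilizing (admissible)")
    k = k0
    history = []
    for _ in range(iterations):
        # Policy evaluation: solve 2*(a - b*k)*p + q + r*k^2 = 0 for p.
        p = (q + r * k * k) / (2.0 * (b * k - a))
        history.append(p)
        # Policy improvement: the minimizing gain for the current value function.
        k = b * p / r
    return history, k

a, b, q, r = 1.0, 1.0, 1.0, 1.0       # hypothetical open-loop unstable plant
history, k_final = kleinman_scalar(a, b, q, r, k0=3.0)

# Positive root of the scalar Riccati equation 2*a*p - (b^2/r)*p^2 + q = 0.
p_riccati = r * (a + math.sqrt(a * a + b * b * q / r)) / (b * b)

print(history[:4], p_riccati)
```

Starting from an admissible gain, the value coefficients decrease monotonically toward the Riccati solution, which mirrors the roles of admissibility and monotone improvement in the proof of Lemma 1.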

4 Critic neural network design

In this section, the PI update equations are used to design a critic neural network (CNN) that approximates the solution of the HJB Equation 17 at each iteration step. By the universal approximation property of NNs, there exist ideal weights $W^*$ such that the ideal value function can be written as

$$V^*(z)=W^{*T}\varphi(z)+\epsilon_1(z), \tag{23}$$

where $\varphi(z)\in\mathbb{R}^{\alpha}$ denotes the vector of activation functions, $\alpha$ is the number of neurons, and $\epsilon_1(z)$ is the NN approximation error. Using Equation 23, the HJB Equation 17 becomes

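$$\gamma_1\bar{\gamma}^2+z^TQz+\big(W^{*T}\nabla\varphi(z)+\nabla\epsilon_1^T(z)\big)\big(f(z)+\gamma\big)+\theta_1^2\sum_{i=1}^{m}\ln\big(1-\tanh^2(H_i(z))\big)=0, \tag{24}$$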

where

$$H_i(z)=H_{1i}(z)+H_{2i}(z)=\frac{1}{2\theta_1}R^{-1}g^T(z)\nabla\varphi^T(z)W^*+\frac{1}{2\theta_1}R^{-1}g^T(z)\nabla\epsilon_1^T(z), \tag{25}$$

with $\nabla\varphi(z)=\partial\varphi/\partial z$ and $\nabla\epsilon_1(z)=\partial\epsilon_1/\partial z$. Therefore, by defining the residual error $\epsilon_H$, Equation 24 can be rewritten as

$$\gamma_1\bar{\gamma}^2+z^TQz+W^{*T}\nabla\varphi(z)\big(f(z)+\gamma\big)+\epsilon_H+\theta_1^2\sum_{i=1}^{m}\ln\big(1-\tanh^2(H_{1i}(z))\big)=0, \tag{26}$$

where

$$\epsilon_H=\nabla\epsilon_1^T(z)\big(f(z)+\gamma\big)-\theta_1^2\sum_{i=1}^{m}\frac{1}{O_{1i}(z)}\tanh(O_{2i}(z))\big(1-\tanh^2(O_{2i}(z))\big), \tag{27}$$

with $O_{1i}(z)\in[1-\tanh^2(D_i(z)),\,1-\tanh^2(H_{1i}(z))]$ and $O_{2i}(z)\in[D_i(z),\,H_{1i}(z)]$. Note that if the number of hidden-layer neurons $\alpha$ is sufficiently large, the residual error $\epsilon_H$ approaches zero. Under the Lipschitz assumption on the system dynamics, $\epsilon_H$ is bounded on a compact set, that is, $\|\epsilon_H\|\le\bar{\epsilon}_H$. Therefore, based on Equation 23, the ideal optimal control is

$$u^*=\theta_1\tanh\!\Big(-\frac{1}{2\theta_1}R^{-1}g^T(z)\nabla\varphi^T(z)W^*\Big)+u_d+\epsilon_2, \tag{28}$$

where $\epsilon_2=-\frac{1}{2}\sum_{i=1}^{m}\big(1-\tanh^2(\psi_i)\big)R^{-1}g^T(z)\nabla\epsilon_1$ with $\psi_i\in[D_i,\,H_{1i}]$.

Since the ideal weights are unknown, the value function is approximated as

$$\hat{V}(z)=\hat{W}^T\varphi(z), \tag{29}$$

where $\hat{W}$ is the estimate of $W^*$. The corresponding approximate control is then

$$\hat{u}=-\theta_1\tanh\!\Big(\frac{1}{2\theta_1}R^{-1}g^T(z)\nabla\varphi^T(z)\hat{W}\Big)+u_d. \tag{30}$$
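To make Equations 29 and 30 concrete, the following sketch evaluates the critic output and the resulting control for a two-dimensional error state and a single input. The quadratic basis, the weights, the input vector g, and all numerical values are hypothetical choices made for illustration; the article does not specify them in this section.

```python
import math

def phi_jacobian(z):
    """Jacobian of the ASSUMED basis phi(z) = [z1^2, z1*z2, z2^2]; one row per neuron."""
    z1, z2 = z
    return [[2.0 * z1, 0.0], [z2, z1], [0.0, 2.0 * z2]]

def value_estimate(w_hat, z):
    """Critic output in the form of Equation 29: w_hat^T phi(z)."""
    z1, z2 = z
    phi = [z1 * z1, z1 * z2, z2 * z2]
    return sum(w * p for w, p in zip(w_hat, phi))

def control_estimate(w_hat, z, g, r_inv, theta1, u_d):
    """Single-input form of Equation 30 with scalar R^{-1} = r_inv."""
    jac = phi_jacobian(z)
    # Gradient of the value estimate: (d phi / d z)^T w_hat.
    grad_v = [sum(jac[j][i] * w_hat[j] for j in range(3)) for i in range(2)]
    g_dot_grad = sum(gi * vi for gi, vi in zip(g, grad_v))
    return -theta1 * math.tanh(r_inv * g_dot_grad / (2.0 * theta1)) + u_d

w_hat = [1.0, 0.5, 2.0]   # hypothetical critic weights
g = [0.0, 1.0]            # hypothetical constant input vector
theta1, u_d, r_inv = 1.5, 0.2, 1.0

v_hat = value_estimate(w_hat, [0.4, -0.3])
u_hat = control_estimate(w_hat, [0.4, -0.3], g, r_inv, theta1, u_d)
u_far = control_estimate(w_hat, [100.0, -100.0], g, r_inv, theta1, u_d)

print(v_hat, u_hat, u_far)
```

Because of the tanh, the deviation of the control from u_d never exceeds θ1 in magnitude, even for a state far from the origin, which is the point of the saturated form.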
