Bio-inspired neural networks for decision-making mechanisms and neuromodulation for motor control in a differential robot

1. Introduction

Decision-making is a process that allows animals to increase their chances of survival. It includes, for example, knowing when to flee from a threat, avoiding the consumption of spoiled food, or performing attack or breeding behaviors. Understanding how decision-making mechanisms work within the cerebral cortex and generating a model of their output behaviors has been a focus of research in neuroscience. Hurtado-López et al. (2017) and Hurtado-López and Ramirez-Moreno (2019) describe a neural network model that mimics social behavior in mice involving breeding and attack interactions. As mentioned in Hikosaka et al. (2018), the basal ganglia control body movements and are also involved in behavioral changes in animals. Héricé et al. (2016) propose a neural network model of the basal ganglia based on spiking neurons; the model performs second-level decision-making as observed in primates. Other experiments, performed on Drosophila flies, consist of introducing them into a flight simulator containing green and blue colored regions; if the fly stood on the blue regions it received a heat punishment. The results gave evidence that these insects can adjust their flight behavior based on visual color information. In Wei et al. (2017), a model based on Spiking Neural Networks (SNN) and postsynaptic plasticity is proposed to describe mathematically both the decision-making behavior based on the visual information received by Drosophila and the learning process.

Providing robotic navigation systems with the capacities mentioned above is of great interest as a way to enhance their efficiency and autonomy. Zhao et al. (2020) developed an SNN model that describes the experimental behavior of Drosophila and implemented it in a UAV (Unmanned Aerial Vehicle). The results show that, with the proposed model, the UAV learns to make decisions quickly from the available visual information, similar to the original experiment. The approach closest to ours is that of Pardo-Cabrera et al. (2022), in which a bio-inspired navigation and exploration system for a robotic hexapod is developed. In that work, a network of social behaviors in mice, proposed in Hurtado-López et al. (2017) and Hurtado-López and Ramirez-Moreno (2019), is modified to perform homing, exploration, and approaching behaviors in robots. We propose a decision network to perform exploration in robots and, as a novelty, implement a network inspired by the basal ganglia, proposed by Ramirez-Moreno and Sejnowski (2012), to moderate the decision taken by the main network, reducing the reactivity of the system and providing greater safety in the navigation of the mobile platform.

Using bio-inspired neural networks allows numerous animal and human behaviors and kinematics to be adapted to autonomous robots. For instance, fast learning mechanisms for continuous adaptation, or flexible plasticity in sensory pathways, generate stable self-organized locomotion that copes with failures and adapts to different walking conditions in robots. In addition, bio-inspired networks can be combined with distributed neural CPGs, proprioceptive sensory adaptation, and body-environment interaction, achieving adaptive and flexible interlimb coordination for walking robots, as mentioned in Miguel-Blanco and Manoonpong (2020).

Additionally, frequency adaptation is useful for controlling the locomotion of an automaton. Previous results show that integrating motor pattern mechanisms and adaptation with a CPG-RBF network leads to locomotion control of a hexapod robot in a complex environment. This kind of frequency adaptation not only significantly reduces energy use but is also comparable to the biological behaviors observed in animal locomotion (Thor et al., 2021).

As seen in previous works, there are numerous bio-inspired neural network architectures for the motor control of robots and automata. In this work, a novel bio-inspired neural network was designed to control the right and left actuators of a differential robot. For the latter, a reciprocal lateral inhibition circuit was used, which projects periodic cyclic signals and generates antagonistic nonlinear oscillations. The neuronal activity of such synaptic circuits with reciprocal lateral inhibition is typical of the motor control systems of periodic tasks in vertebrates, such as breathing, swimming, or walking.

Other approaches make neat use of CPGs to control a sprawling quadruped robot (Suzuki et al., 2021), contributing decentralized control with cross-coupled sensory feedback to shape body-limb coordination. This differs from previous CPG-based research that relies on inter-oscillator couplings or on gait patterns based on geometric mechanics.

Ngamkajornwiwat et al. (2020) propose an online self-adaptive locomotion control technique based on the integration of a modular neural locomotion control (MNCL) and an artificial hormone mechanism (AHM) for a walking hexapod robot. Their contribution allows robot control without requiring its kinematics, an environmental model, or exteroceptive sensors. The technique relies only on the correlation between a predicted foot contact signal and the incoming foot contact signal from proprioceptive sensors, achieving steering and velocity regulation of the robot.

More recent research introduces two concepts for developing bio-inspired neural networks for the motor control of an automaton: the Self-Organizing Map (SOM) and the Spiking Neural Network (SNN). Zahra et al. (2022) integrate both architectures: the SNN in a motor cortex-like differential map that transforms motor plans from task-space to joint-space motor commands, and the SOM in a static map that correlates the joint spaces of the robot and of a teaching agent. This allows a robotic arm to learn from human actions, that is, to learn by imitation.

Spiking Neural Network models are based on the temporal sequences of action potential firing. Usually, the leaky integrate-and-fire (LIF) neuron model is used in these networks. This model involves biophysical properties of the neuron such as membrane capacitance, conductance, and resting potential.
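A minimal sketch can make the LIF quantities above concrete. The parameter values below (capacitance, leak conductance, thresholds) are illustrative choices, not taken from any of the cited works:

```python
def simulate_lif(i_input, dt=1e-4, t_end=0.1,
                 c_m=1e-9, g_leak=50e-9, v_rest=-65e-3,
                 v_thresh=-50e-3, v_reset=-65e-3):
    """Euler integration of a leaky integrate-and-fire neuron:

        c_m * dV/dt = -g_leak * (V - v_rest) + i_input

    A spike is emitted when V crosses v_thresh, after which V resets.
    Returns the list of spike times (seconds)."""
    n = int(t_end / dt)
    v = v_rest
    spikes = []
    for k in range(n):
        dv = (-g_leak * (v - v_rest) + i_input) / c_m
        v += dt * dv
        if v >= v_thresh:
            spikes.append(k * dt)
            v = v_reset
    return spikes

# A 1 nA step current drives repetitive firing, while 0.5 nA stays
# subthreshold (steady-state depolarization 10 mV < the 15 mV gap).
print(len(simulate_lif(1e-9)), len(simulate_lif(0.5e-9)))
```

With these values the membrane time constant is c_m/g_leak = 20 ms, so the first spike occurs roughly 28 ms after current onset.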

Current research shows wide use of SNNs in the navigation tasks of mobile automata. In Cao et al. (2015), a three-layer SNN-based controller is designed and implemented for target tracking with a mobile robot. Environmental and target information is provided by CCD cameras, encoders, and ultrasonic sensors. The authors implemented a learning strategy based on Hebb's rule to modify the synaptic weights of the neural network (NN) in charge of the tracking task. In contrast, the synaptic weights of an NN specialized in obstacle avoidance are defined by the designer and do not change over time. This strategy gives the obstacle avoidance task more relevance than the tracking task.

Shim and Li (2017), Lobov et al. (2020), and Liu et al. (2022) addressed the use of SNNs for the motor control of mobile robots. Liu et al. (2022) proposed a biological autonomous learning algorithm based on reward-modulated spike-timing-dependent plasticity (STDP). With it, an automaton can improve its obstacle-avoidance decision-making after a few trial-and-error sessions in new environments, providing robustness to the exploration task. Approaching the cognitive and perception functions instrumented in automaton behaviors, Macktoobian and Khataminejad (2016) developed a high-level cognitive behavior on a reactive agent, a Braitenberg vehicle (BV). Low-level perception is obtained with an SNN-based curved trajectory detection (CDT) model, with which the motion of an agent in the environment is detected. The vehicle's control for producing the desired behaviors, depending on this perception, is designed by an engineering method, obtaining approaching and fleeing behaviors.

Neurons respond to stimuli by generating action potentials. To describe the state of a neuron, the mean firing rate (MFR) of these action potentials can be used. The dynamics of NNs based on MFR models offer, at first sight, a clearer picture of a neuron's expected behavior than SNNs do. SNN-based architectures found in the literature have solved decision-making and motor control tasks; as mentioned in Suzuki et al. (2021), Thor et al. (2021), and Pardo-Cabrera et al. (2022), MFR models have been implemented to solve these tasks as well. The literature reviewed in this work shows that MFR models satisfying both decision-making and motor control for mobile automaton navigation have not been widely explored. The present work proposes the design and implementation of a bio-inspired navigation framework for an automaton, using the mathematical MFR neural model described by Wilson and Cowan (1972).

SNN models have an advantage in single-event-based learning built on Hebb's rule. In our work, the MFR model assimilates this advantage of SNN models through the meta-control network. Single-event-based learning makes it possible to modify the performance of the automaton by setting the network's parameters in a single trial. In this work, the network's parameters are not modified, so the mechanism cannot be considered learning; the improvement in performance is instead obtained through the meta-control network. This network allows the automaton's behavior to adapt to significant environmental changes (pop-up novelties) and dynamic obstacles by properly modulating the velocity applied to the robot's wheels. The experimental results show an improvement in the obstacle avoidance task when the meta-control network is involved. The literature consulted shows a preference for SNN models over mean-firing-rate models, at the cost of losing a certain mathematical simplicity. As already mentioned, the advantage of our model is the combination of such mathematical simplicity with the formulation of single-event-based learning.

2. Materials and methods

This section will explain the implemented bio-inspired neural circuits, the neural network design itself, the adaptation stage, and the signal processing.

2.1. Bio-inspired neural network design

As seen in Figure 1, the information perceived from the environment is captured by a LiDAR sensor and processed by the Signal processing block (Section 2.5). From this block, the signals Ar and Al, and S1 and S2, are produced. The Ar and Al signals convey information about the presence or absence of obstacles in the right and left areas, respectively. These signals enter the Short-term memory circuits block (Section 2.2.1), which extends their information in time. The projections from the previous block enter the Memory linear chain block (Section 2.2.2), which retains and increases the intensity of the projections. In the Comparison circuit block (Section 2.2.3), the projections from the Memory linear chain block are compared, promoting a faster decision by the Competitive Neural Network (WTA) block (Section 2.2.4). In the Competitive Neural Network (WTA) block, the projections of the previous block compete, and a proper motion decision is obtained among rightward, leftward, and forward. The Adaptation stage block (Section 2.2.5) detects a tendency among the motions executed over a time interval and thus adapts the parameters of the Non-linear oscillation generating circuit block (Section 2.2.7), which produces the signals for the automaton's motor execution. Finally, the Meta-control circuit block (Section 2.2.6) modulates the rightward, leftward, and forward movements, improving performance in situations where a novelty must be prioritized over a previously weighted decision. The Meta-control circuit block is fed by the S1 signal, which carries the information of any new obstacle, and by the S2 signal, obtained by forced complementarity. Forced complementarity is understood as a decremental response to an incremental stimulus, obtained by subtracting the stimulus from a threshold.
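The forced-complementarity operation above can be sketched in a couple of lines. The threshold value of 100 mirrors the firing value of S1 in Section 2.5, and the clipping at zero is our own assumption to keep the result a valid (non-negative) firing rate:

```python
def forced_complement(s1, threshold=100.0):
    """Forced complementarity: a decremental response to an incremental
    stimulus, obtained by subtracting the stimulus from a threshold.
    Clipped at zero so the output remains a non-negative rate."""
    return max(0.0, threshold - s1)

# When the novelty signal S1 is silent, S2 is fully active, and vice versa.
print(forced_complement(0.0), forced_complement(100.0))
```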


Figure 1. Architecture of the bio-inspired network for the exploration behavior.

The neuron model used in this work takes inspiration from the basic negative feedback loop described by Wilson and Cowan (1972), in which connections with arrow endings represent excitatory projections and connections with circled endings represent inhibitory projections.

The response (R) of a neuron to a single stimulus (P) is described by the differential equation in Equation (1) (Wilson and Cowan, 1972), where τ is the time constant.

dR/dt = (1/τ)(−R + Ψ(M, P, σ))    (1)

Ψ(M, P, σ) is the Naka-Rushton activation function (Wilson and Cowan, 1972), implemented as a mathematical approximation of these responses. M is the maximum firing rate for a very intense stimulus, and σ, called the half-saturation constant, determines the stimulus value at which Ψ(M, P, σ) reaches half of its maximum. The mathematical representation is given in Equation (2).

Ψ(M, P, σ) = M P² / (σ² + P²) for P ≥ 0;   Ψ(M, P, σ) = 0 for P < 0    (2)
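Equations (1) and (2) can be checked numerically with a simple Euler integration. The parameter values below (τ, M, σ, step size) are illustrative choices, not the paper's:

```python
def naka_rushton(p, m=100.0, sigma=30.0):
    """Naka-Rushton activation: Psi = M * P^2 / (sigma^2 + P^2) for P >= 0,
    and 0 for negative stimuli. sigma is the half-saturation constant."""
    return m * p**2 / (sigma**2 + p**2) if p > 0 else 0.0

def integrate_rate(p, tau=0.02, dt=1e-4, t_end=0.2, r0=0.0):
    """Euler integration of Equation (1): dR/dt = (1/tau)(-R + Psi(M,P,sigma)).
    The response decays toward the activation-function value."""
    r = r0
    for _ in range(int(t_end / dt)):
        r += dt / tau * (-r + naka_rushton(p))
    return r

# With P = sigma = 30, Psi reaches half of M, so R settles near 50.
print(round(integrate_rate(30.0), 2))
```

After ten time constants the response has effectively converged to its steady state Ψ(M, P, σ).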

In Figure 8A, the O1 projection is in charge of giving the order to generate the oscillations that perform a left-turn behavior. It feeds a parameter adaptation stage; note the order of the connections coming from AP and AQ, the units obtained in the left- and right-turn adaptation stages. The left-turn adaptation (AP) presents an excitatory connection, contrary to the inhibitory connection of the right-turn adaptation (AQ); our aim is to prolong this behavior over time. The projections of this stage continue to the non-linear oscillation generator circuit, where the left-turn adaptation unit is added again to increase the difference between the widths of the oscillations and generate a torque that changes the orientation of the robot. The mathematical representation is given in Equations (36)–(39).

dRn/dt = (1/τ7)(−Rn + Ψ(A − ρG, λ, B + R(n+2)))    (40)

dRi/dt = (1/τ8)(−Ri + βR(i−2))    (41)

λ = Kj f(O2 + AQ − AP) − dR(2−(n−1)) + (−1)ⁿ ψAQ    (42)

n∈, i∈, j∈    (43)

For the generation of the right-turn oscillations (Figure 8B), the same structure and principle are used. However, care must be taken, once again, with the connections of the adaptation units: in this case, the right-turn adaptation (AQ) has excitatory connections and the left-turn adaptation (AP) has inhibitory connections. Likewise, the right-turn adaptation unit is added to the oscillation generator circuit to generate the difference in the width of the oscillations, in this case on the signal opposite to that of the left turn, so that the robot rotates in the opposite direction. The mathematical representation is given in Equations (40)–(43).

dFn/dt = (1/τ7)(−Fn + Ψ(A − ρG, K(2j+1)λ, B + F(n+2)))    (44)

dFi/dt = (1/τ8)(−Fi + βF(i−2))    (45)

λ = f(O3 − AP − AQ) − dF(2−(n−1))    (46)

n∈, i∈, j∈    (47)

Finally, for the generation of the oscillations corresponding to forward motion (Figure 8C), both adaptations (AP and AQ) project inhibitory connections, taking into account that, in the established design, forward motion is expected to be less predominant. The mathematical representation is given in Equations (44)–(47).
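The oscillation circuits above rely on reciprocal inhibition with adaptation. As a hedged stand-in for the paper's own equations, the well-known Matsuoka half-center model reproduces the same qualitative effect: two mutually inhibiting units whose outputs oscillate in anti-phase, like the antagonist wheel drives described here. Parameters are standard textbook values, not the paper's:

```python
import numpy as np

def matsuoka_half_center(t_end=10.0, dt=1e-3,
                         tau_r=0.25, tau_a=0.5, beta=2.5, w=2.5, s=1.0):
    """Two mutually inhibiting units with slow self-adaptation (Matsuoka's
    half-center oscillator), used as a stand-in for the paper's
    reciprocal-inhibition circuit. Outputs y1, y2 oscillate in anti-phase."""
    n = int(t_end / dt)
    x = np.array([0.1, 0.0])   # membrane states (slightly asymmetric start)
    v = np.zeros(2)            # adaptation (fatigue) states
    y1, y2 = np.empty(n), np.empty(n)
    for k in range(n):
        y = np.maximum(x, 0.0)                     # rectified firing rates
        x += dt / tau_r * (s - x - beta * v - w * y[::-1])  # mutual inhibition
        v += dt / tau_a * (y - v)                  # slow adaptation
        y1[k], y2[k] = y
    return y1, y2

y1, y2 = matsuoka_half_center()
# After the transient, the two outputs alternate in anti-phase.
```

Mutual inhibition makes the symmetric state unstable (one unit wins), while the slow adaptation fatigues the winner and hands activity to the other unit, yielding sustained antagonistic oscillation.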


Figure 8. Non-linear oscillation generator circuit. (A) Non-linear oscillation circuit for turning left. (B) Non-linear oscillation generator circuit for turning right. (C) Non-linear oscillation generator circuit for forward motion.

2.3. Software configuration

For the virtual implementation, we used the Burger robot from the TurtleBot3 open-source libraries (Open Source Robotics Foundation, 2020). The simulated environment was built in the Gazebo simulator (Foundation, 2014). The middleware used was ROS (Robotics, 2021).

2.4. Hardware configuration

A TurtleBot3 Burger platform was used as the mobile automaton. This robot is configured with a 360-degree LDS-01 LiDAR sensor, a Raspberry Pi 3 Model B board for processing, and an OpenCR board for hardware control. The wheel actuators are Dynamixel XL430-W250 motors. The whole system is powered by a 3-cell 11.1 V, 2.2 Ah LiPo battery. The robot dimensions are shown in Figure 9.


Figure 9. Turtlebot3 Burger model dimensions. Taken from Robotis (2022).

The robotic platform was configured with the ROS Kinetic middleware installed on a Raspbian Buster operating system. The processing of the bio-inspired exploration system was tested both on the embedded board and on an external computing unit, the latter configured with ROS Noetic, Ubuntu 20.04, an 8th-generation Intel Core i7 processor, and 16 GB of RAM. The communication between the embedded board and the computing unit was done via WiFi.

2.5. Signal processing

Considering that the objective of the terrestrial navigation platform is to perform obstacle-avoiding exploration behavior, the information captured by the LiDAR sensor is used to generate the input signals to the bio-inspired network, processed as shown in Figure 10. Only the frontal information provided by the sensor was considered, divided into two areas, Ar (0°–90°) and Al (90°–180°). A safety area of 0.5 m radius was defined, so that points belonging to the degree range of the Ar area are classified into points inside the safety area (Pri) and points outside it (Pro), and likewise for the Pli and Plo points of Al. Points inside the safety area are penalized with a value of −1, while points outside the safety area are assigned a value of +1. The values assigned to the areas Al and Ar are thus defined as shown in Equations (48), (49). This processing is intended to determine in which direction (right or left) obstacles are closer to the robot, so that the robot heads toward the clearest area.


Figure 10. Signal processing. Take l for left and r for right. Take i for inside and o for outside.

As mentioned in Section 2.2.6, the aim of incorporating a basal ganglia-inspired meta-control network is to mediate the decisions made by the main network. The meta-control network is meant to act on decisions where the robot's environment changes dramatically, for instance, in the presence of dynamic objects. To detect this, a record of the area values obtained at instant t − 1 is kept and compared with those obtained at instant t. If a difference greater than a threshold ϵ exists, a value of 100 is given to the signal S1 of the meta-control network in Equation (50) (Section 2.2.6).

Ar = Σ_{k=0°}^{90°} (Prok + Prik);   Prok = +1, Prik = −1    (48)

Al = Σ_{k=90°}^{180°} (Plok + Plik);   Plok = +1, Plik = −1    (49)

S1 = 100 if |Ar_{t−1} − Ar_t| / Ar_{t−1} > ϵ and |Al_{t−1} − Al_t| / Al_{t−1} > ϵ;   S1 = 0 otherwise    (50)

3. Results
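The processing in Equations (48)–(50) can be sketched as follows. The 1°-per-beam resolution, the absolute values in the relative-change test, and the zero-division guard are our assumptions, added for robustness:

```python
import numpy as np

def area_scores(ranges_deg, safety_radius=0.5):
    """Score the frontal LiDAR scan: +1 for a point outside the safety
    area, -1 for a point inside it. Ar covers 0-90 deg (right area),
    Al covers 90-180 deg (left area), as in Equations (48) and (49).
    ranges_deg is assumed to hold one range reading per degree (181 beams)."""
    scores = np.where(np.asarray(ranges_deg) > safety_radius, 1, -1)
    ar = int(scores[0:91].sum())    # right area, 0..90 deg
    al = int(scores[90:181].sum())  # left area, 90..180 deg
    return ar, al

def novelty_signal(prev, curr, eps=0.3):
    """Equation (50): fire S1 = 100 when both areas change by more than a
    relative threshold eps between consecutive instants, else S1 = 0."""
    ar0, al0 = prev
    ar1, al1 = curr
    if ar0 != 0 and al0 != 0 and \
       abs(ar0 - ar1) / abs(ar0) > eps and abs(al0 - al1) / abs(al0) > eps:
        return 100
    return 0

# Clear 2 m corridor: every frontal point lies outside the 0.5 m safety area.
clear = [2.0] * 181
print(area_scores(clear))  # (91, 91)
```

A sudden obstacle that flips many beams from outside to inside the safety area drives both area values down at once, which is exactly the condition that fires S1.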

In this section, the results obtained from both the simulation part and its implementation in the TurtleBot3 Burger robot are presented. The performance of the automaton in the exploration task and in the obstacle avoidance task was measured.

3.1. Simulation

3.1.1. Exploration task

To evaluate the performance of the exploration behavior, as well as its obstacle avoidance, controlled by the bio-inspired neural network, the exploration environments proposed in Yan et al. (2015) were adapted and simulated in Gazebo. In this work, the environments have a maximum exploration area of 4 m². The maze walls are rigid and fully reflective surfaces, and the corridor width is at least 3 times the outside diameter of the robot. These mazes are denominated loop, cross, zigzag, and traditional maze. In Figures 11A–D, navigation along an established path is evaluated. In Figures 11E–H, the environments are simulated until a collision or a deadlock situation takes place. The simulation results for these environments are presented in Figure 11.


Figure 11. Simulation environment results. Figures on the left side show the Gazebo simulation environment without the meta-control circuit. The right-side images show the trajectory traced by the robot in the exploration behavior with the meta-control circuit. (A, B) Loop. (C, D) Zigzag. (E, F) Cross. (G, H) Traditional maze. In (B, D, F, H), one can observe the better performance obtained with the meta-control network, which allows a longer trajectory in (F, H).

Figure 11 illustrates the performance of the automaton without the meta-control network (left column, Figures 11A, C, E, G) and with the meta-control network (right column, Figures 11B, D, F, H). The network modulates the right and left behaviors, allowing better performance in the right column along the same path.

In the results shown in Figure 11, the mazes have a total area of 4.0 × 4.0 m, with walls 1.0 m high and corridors 0.50 m wide. The environments in Figures 11A–D have 0.15 m thick walls, and the environments in Figures 11E–H have 0.05 m thick walls. The LiDAR sensor has a 360° field of view with a reading range of 0.12–3.50 m. Considering the safety area defined around the robot (Section 2.5), this field of view is reduced to 180° and a range of 0.12–0.50 m. In environments such as the cross or traditional maze, widening the corridors would cause the automaton to make a late decision among its three behaviors at intersections, due to the resulting change in its field of view, colliding with the outside corners while taking a wide-open curve. Considering the average speed of 0.04 m/s at which the automaton travels, this does not favor such decision-making. The opposite is true for the loop and zigzag environments, where the automaton only has to decide on one of its behaviors.

3.2. Implementation

The bio-inspired neural network with neuromodulation designed in this work was deployed on the TurtleBot3 Burger automaton in order to measure its performance.

3.2.1. Exploration task

To evaluate the performance of the automaton in a real environment, hand-made mazes were built, as shown in Figure 12. Each environment covers a minimum of 1.0 m² and a maximum of 2.0 m², except for the simple maze, which was built freely.


Figure 12. Implementation environment results. The path made by the automaton is drawn with red lines in each type of environment. Green circles are initial positions and blue circles are final positions. (A) Loop. (B) Cross. (C, D) Traditional maze, part 1 and part 2, respectively. (E) Zigzag. (F) Simple maze. One can observe that the automaton completed the (A, E, F) environments successfully. In the (B) environment, the automaton started in the middle of the cross and finished circling around the environment. In the (C, D) environments, the automaton's trajectory ends at its starting point.

Figure 13 shows the signals obtained in the physical implementation of a zigzag environment. Figure 13A illustrates the information obtained from the real environment and its processing over time; the top left image exhibits the processing of the LiDAR points inside a corridor of the zigzag maze, where blue points are inside the safety zone and red points are outside it. The S1, Ar, and Al curves are the signals described in Section 2.5, sampled at an interval of 360 ms. Notice that S1 fires when there is a sufficiently large change in the values of Ar and Al from one instant to the next. For instance, near sample 99, the Ar signal changes from 90 to 50 and Al from 30 to 0; then S1 jumps from 0 to 100. The automaton's trajectory seen in Figure 12E is a result of processing the Ar and Al signals; their largest values appear when the robot executes turns. The projection of the meta-control network is shown in Figure 13B. Approaching sample 230 of the S1, Ar, and Al signals, there were more obstacles inside the left area, so the robot must turn to the right. Figure 13C shows the wheels' motor action corresponding to this time, sampled at an interval of 1.0 ms. The blue signal corresponds to the left wheel and the orange to the right. The blue oscillations are wider than the orange ones, so the left wheel spins more than the right wheel, and the right turn is made. When S1 fires, the modulation of the wheels' motor action is applied, generating a reduction in the amplitude of the oscillating signals. This reduces the automaton's velocity, giving it time to make a better decision.


Figure 13. Implementation of the zigzag environment. (A) Real signals obtained from the environment and their processing over time. The top left image exhibits the processing of the LiDAR points inside a corridor of the zigzag environment. Blue dots correspond to points inside the safety area and red dots to points outside it. The S1, Ar, and Al curves are the input signals for the bio-inspired network (Section 2.5), sampled at an interval of 360 ms. The automaton's trajectory seen in Figure 12E is a result of processing the Ar and Al signals; their largest values appear when the robot executes turns. (B) Meta-control circuit projection G. (C) Motor control signals of the mobile automaton, sampled at an interval of 1.0 ms. Blue and orange signals correspond to the left wheel and the right wheel, respectively.

3.2.2. Meta-control circuit test

The performance of the neuromodulation network was tested by putting an obstacle (a box) in the automaton's field of vision. The automaton automatically avoids the obstacle and continues exploring (see Supplementary Videos 6, 8).

3.3. Metrics

To quantify the performance of the exploration in the different established environments, the following metrics are proposed:

• Covered distance (Td): Distance covered by the robot, measured in meters.

• Elapsed time (Tt): Time spent, in seconds.

• Average speed (Tv): Quotient of Td and Tt.

• Exploration area (Ea): Percentage of the total environment area covered by the robot.

The metric values obtained for each simulated environment are presented in Table 2A.


Table 2. Metrics values for simulation environments in Figure 11.

To compare quantitatively the automaton's trajectory with the optimum trajectory, point-to-point metrics were added. The automaton's trajectory was evaluated against the optimum trajectory, defined as the path that stays in the middle of the corridors. In this comparison, the RMSE, mean error, standard deviation of the error, minimum error, and maximum error were computed for each axis. The results of the error metrics for each environment presented in Figure 11 are shown in Tables 2B, C.
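The point-to-point error metrics can be sketched as follows, assuming the executed and optimum trajectories have been resampled to the same number of (x, y) points:

```python
import numpy as np

def trajectory_errors(actual, reference):
    """Point-to-point error metrics between an executed trajectory and a
    reference (corridor-center) trajectory, computed per axis: RMSE, mean,
    standard deviation, minimum, and maximum of the signed error."""
    err = np.asarray(actual, float) - np.asarray(reference, float)  # (N, 2)
    return {
        "rmse": np.sqrt((err**2).mean(axis=0)),
        "mean": err.mean(axis=0),
        "std": err.std(axis=0),
        "min": err.min(axis=0),
        "max": err.max(axis=0),
    }

# Toy trajectory that weaves +-0.1 m around a straight corridor center line.
actual = [(0.0, 0.1), (1.0, -0.1), (2.0, 0.1), (3.0, -0.1)]
reference = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
m = trajectory_errors(actual, reference)
print(m["rmse"])  # per-axis RMSE: 0.0 in x, 0.1 in y
```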

4. Discussion and conclusion

The framework proposed in this work faces strong difficulties when navigating much more complex mazes (see Figures 11B, D), while the automaton shows very good performance in environments like those seen in Figures 11A, C. That difficulty is linked to the analysis of the environment information: reducing the analysis to a specific area caused a delay in decision-making when an object appeared suddenly in front of the robot in open environments. In the short term, this problem could be solved by increasing the safety area; nevertheless, this could affect the performance in reduced-space environments such as those shown in Figures 11A, C, since more of the sensed area would be penalized with negative values. For this reason, we propose as future work the development and implementation of a bio-inspired strategy that dynamically adjusts the robot's safety area depending on the environment (wide or narrow areas).

The discussion presented above is also supported by the information in Table 2A. It shows the good performance exhibited by the adapted cortical synaptic circuits (Section 2.2) in exploring unstructured environments in their entirety (Ea). In addition, the performance of the automaton with and without the meta-control network is shown in the error metrics in Tables 2B, C; the results illustrate that better performance is obtained with this network. On average, the TurtleBot3 Burger's navigation speed was approximately 0.04 m/s. Compared with Miguel-Blanco and Manoonpong (2020), our exploration system is slow, similar to the one developed by Pardo-Cabrera et al. (2022).

A first approximation of the motor control of a mobile automaton was proposed in Guerrero-Criollo et al. (2022). In that work, the input signals were simulated rather than captured by a robust sensing system, and the meta-control network responsible for detecting novelties was absent. In the present work, we implement both the sensing part of the system, which measures environmental data for the inputs, and the meta-control network, and the bio-inspired network was deployed on the TurtleBot3 Burger embedded system. The design, simulation, and implementation of this bio-inspired neural network allow a differential robot to perform safe exploration, where the exploration task is defined as the behavior of traversing a terrain indefinitely while avoiding obstacles. A framework is proposed to extract information from a LiDAR sensor and generate the input signals to the neural network online. Additionally, a modulatory or meta-control network inspired by the basal ganglia is implemented; this network modulates the exploration behavior of the robot by reducing its speed progressively when drastic changes occur in the robot's environment.
