Autonomous Robot Obstacle Avoidance

Stony Brook University

As a "mini project" for my Advanced Controls class at Stony Brook, I utilized MATLAB to plan a path for a "robot" to autonomously navigate through an obstacle course. There were several hard requirements for this project, which were as follows:

  1. The path had to take the robot from the start point at (1,1) to the end point at (9,9) on a 10x10 grid
  2. The path must fully avoid any obstacles placed on the grid by any user 
  3. The trajectory had to be smooth and avoid sharp turns

The rest was up to me! The very first thing to do for a project like this is to break it down into smaller, discrete tasks that all build towards the end goal and can be completed one after the other. For this project, these tasks were building the environment, generating the path, smoothing the path, then designing the observer and controller to guide the robot past the obstacles.

Environment Creation

First things first, an obstacle course for the little robot to traverse had to be designed. This was pretty straightforward; it just took a little while to properly format the "obstacle" matrix so that it could be displayed correctly in the output graphics. The rectangles are generated using the patch function, which takes the corner coordinates of each rectangle as inputs and "fills" the region they enclose. After some fiddling, I ended up with the environment shown to the right.

[Figure: obstacle course layout (obsLayout.png)]
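
As a rough sketch of how this can be set up (the obstacle coordinates below are made up for illustration, not the actual course), each obstacle can be stored as a row of corner coordinates and drawn with patch:

% Hypothetical obstacle list: each row is [xmin ymin xmax ymax]
obstacles = [3 2 4 6;
             6 5 7 9;
             2 7 5 8];

figure; hold on; axis([0 10 0 10]); axis square; grid on;
for k = 1:size(obstacles, 1)
    xc = obstacles(k, [1 3 3 1]);   % x-coordinates of the four corners
    yc = obstacles(k, [2 2 4 4]);   % y-coordinates of the four corners
    patch(xc, yc, 'k');             % fill the enclosed rectangle
end
plot(1, 1, 'go', 9, 9, 'rx');       % start (1,1) and goal (9,9)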

Path Generation

For the actual path generation, there are two approaches that could be taken to get the robot through the course: reinforcement learning and dynamic programming.

Reinforcement learning relies on the robot navigating the maze by trial and error, as there's no initial data set to base decisions on. The robot takes an action to interact with the environment, then receives either a positive reward (getting closer to the end point or avoiding an obstacle) or a negative reward (hitting an obstacle or wandering away from the end point), with the ultimate goal being to maximize the total reward. Dynamic programming, on the other hand, solves complex problems, like navigating the maze, by breaking them down into much simpler sub-problems and solving those piece by piece. The solution to each piece is saved so it can be referenced later (a process called memoization), which reduces computational expense. More importantly, this allows the solutions to the sub-problems to be analyzed and then combined to produce the best possible (or optimal) solution to the overall problem. Both methods are valid approaches for path generation in this case, but reinforcement learning proved too computationally demanding for my poor computer, so dynamic programming was utilized.

[Animation: cost-to-go values growing across the grid (Value_growth7.gif)]

With dynamic programming, unlike reinforcement learning, there isn't really a reward system; instead, each potential motion is assigned a weight, which is also known as the "cost to go". The path is then determined by either minimizing or maximizing the cost to go. For this application, minimization was used, so each obstacle is assigned a high cost to go, and the overall cost decreases as the end point is approached. This is illustrated in the graphic to the left, where the end point, at (9,9), is the lowest point on the grid and every obstacle's value is higher than the start point at (1,1). 
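
Written out in the standard grid formulation (my notation, not the project's original figure), the cost to go J at each free cell satisfies a simple recursion over its neighboring cells, anchored at the goal:

J(s) = \min_{s' \in \mathcal{N}(s)} \left[ c(s, s') + J(s') \right], \qquad J(9, 9) = 0

where \mathcal{N}(s) is the set of neighbors of cell s and c(s, s') is the cost of moving between them, with obstacle cells assigned a very large cost so the minimization never routes through them. Sweeping the grid with this update until the values stop changing is what produces the growing cost-to-go surface in the animation.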

The algorithm stores the cost to go at each index on the grid, then references these values to determine the actual path the robot will follow to avoid the obstacles. This allowed for successful generation of a path that avoided the obstacles, as shown below!

[Animation: generated path avoiding the obstacles (Obs_Avoidance214.gif)]
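
A minimal MATLAB sketch of the idea (the grid size, unit step cost, obstacle mask, and 4-connected move set here are illustrative placeholders, not the project's exact implementation):

% Hypothetical 10x10 setup; the real obstacle mask differs
N = 10;
blocked = false(N, N);
blocked(3:4, 2:6) = true;                % placeholder obstacles

J = inf(N, N);                           % cost to go at every cell
J(9, 9) = 0;                             % the goal costs nothing
moves = [1 0; -1 0; 0 1; 0 -1];          % 4-connected neighborhood

% Value iteration: sweep the grid until the cost-to-go map settles
changed = true;
while changed
    changed = false;
    for i = 1:N
        for j = 1:N
            if blocked(i, j), continue; end
            for m = 1:4
                ni = i + moves(m, 1);  nj = j + moves(m, 2);
                if ni >= 1 && ni <= N && nj >= 1 && nj <= N
                    cand = 1 + J(ni, nj);     % unit cost per step
                    if cand < J(i, j)
                        J(i, j) = cand;  changed = true;
                    end
                end
            end
        end
    end
end

% Greedy descent on J recovers the path (assumes the goal is reachable)
route = [1 1];
while ~isequal(route(end, :), [9 9])
    i = route(end, 1);  j = route(end, 2);
    best = [i j];  bestJ = J(i, j);
    for m = 1:4
        ni = i + moves(m, 1);  nj = j + moves(m, 2);
        if ni >= 1 && ni <= N && nj >= 1 && nj <= N && J(ni, nj) < bestJ
            best = [ni nj];  bestJ = J(ni, nj);
        end
    end
    route(end+1, :) = best; %#ok<AGROW>
end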

Path Smoothing

Although the path was successfully generated, the turns it takes are a bit too sharp and would be difficult for a ground or aerial vehicle to actually follow. A straightforward fix is to smooth the path by performing an optimization that minimizes changes in the path, which can be done once an objective function is defined.

 
J = \sum_{i=1}^{n} \lVert XY_i - xy_i \rVert^2 + \alpha \sum_{i=2}^{n-1} \lVert XY_{i+1} - 2\,XY_i + XY_{i-1} \rVert^2
 

The equation above (my objective function) is the first piece needed for an optimization algorithm. Here xy_i represents a point in the original data set (calculated by the path generation algorithm) and XY_i represents the corresponding point in the data set generated by smoothing the trajectory. The first term minimizes the distance between the original and smoothed trajectories. The weighting factor, α, is applied to the second term and penalizes sharp turns (large changes in direction between consecutive points) in the smoothed trajectory.

In addition to the objective function, some constraints are needed to keep the optimization on the right track. The start and end points of the smoothed trajectory must match those of the old trajectory to achieve the main task, and the smoothed trajectory still needs to avoid all the obstacles. The start and end point constraints are simple enough to define, but guaranteeing the robot doesn't collide with obstacles requires defining "buffers" around them. This is done by creating virtual obstacles, larger than the originals, and then calculating a new path which avoids the buffers (and, by definition, the obstacles). The size of the buffers, as well as the penalty weight α, were adjusted until feasible results were obtained.
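
One way this setup could look in MATLAB (a sketch assuming fmincon from the Optimization Toolbox and the obstacle-list format from earlier; the tuning values and buffered-rectangle check are hypothetical, not the project's exact code):

% xy is the n-by-2 list of waypoints from the path generator;
% the decision variable z stacks the smoothed points column-wise
n = size(xy, 1);
alpha = 5;  buf = 0.5;                       % hypothetical tuning values

% Objective: stay near the original path, penalize second differences
obj = @(z) sum(sum((reshape(z, n, 2) - xy).^2)) + ...
      alpha * sum(sum(diff(reshape(z, n, 2), 2, 1).^2));

% Linear equality constraints pin the start and end points
Aeq = zeros(4, 2*n);
Aeq(1, 1) = 1;    Aeq(2, n) = 1;             % first/last x
Aeq(3, n+1) = 1;  Aeq(4, 2*n) = 1;           % first/last y
beq = [xy(1,1); xy(n,1); xy(1,2); xy(n,2)];

opts = optimoptions('fmincon', 'Display', 'off');
Z = fmincon(obj, xy(:), [], [], Aeq, beq, [], [], ...
            @(z) bufferCon(z, obstacles, buf), opts);
XY = reshape(Z, n, 2);                       % smoothed trajectory

% Each point must clear every obstacle inflated by the buffer width;
% fmincon enforces c <= 0 (note: max() makes this nonsmooth, which is
% tolerable for a sketch but worth refining in practice)
function [c, ceq] = bufferCon(z, obstacles, buf)
    P = reshape(z, [], 2);
    c = [];
    for k = 1:size(obstacles, 1)
        r = obstacles(k, :) + [-buf -buf buf buf];  % inflated rectangle
        margin = max([r(1) - P(:,1), P(:,1) - r(3), ...
                      r(2) - P(:,2), P(:,2) - r(4)], [], 2);
        c = [c; -margin];                    % outside iff margin >= 0
    end
    ceq = [];
end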

Observer & Controller Design

With the path generated and smoothed for easy implementation, it was time to actually design a controller to guide the robot along it. Controller design here was done using state space representation, which is a mathematical model of a physical system built from a set of input variables, output variables, and state variables. The state variables explicitly define quantities inside the system itself that change over time, such as the force applied by a compressed spring or the velocity of a point on a body, and are selected for the specific system at hand.

In this particular problem, the actual states are not known; only the output indices (the x and y values determined by the path generation algorithm) are known. This poses a significant problem, as I wanted to use pole placement to control the robot; pole placement is a robust feedback control method that's straightforward to implement, but it requires that the states of the system are known at all discrete points in time. A controller on its own isn't enough here - the system also requires an observer. An observer provides the controller with estimates of the current states, based on the inputs to the system and the resulting outputs. In practical applications, it's often pretty difficult to directly measure the physical states of a system, but the outputs are fairly easy to measure, making observers powerful tools in control systems.
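
For a discrete-time state space model, the standard full state (Luenberger) observer has the form

\hat{x}_{k+1} = A\hat{x}_k + Bu_k + L\left(y_k - C\hat{x}_k\right)

where \hat{x}_k is the state estimate and L is the observer gain. Subtracting this from the true dynamics shows that the estimation error e_k = x_k - \hat{x}_k evolves as e_{k+1} = (A - LC)e_k, so choosing L to place the eigenvalues of A - LC inside the unit circle drives the error to zero.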

To actually put each piece of the control system together and navigate the maze, a powerful concept known as the separation principle was utilized. The separation principle allows the designs of the observer and controller to be decoupled from one another; the system can be controlled as if the states were known, while the states themselves are estimated by the observer. This does require that the system is completely controllable and observable; otherwise, something in the physical system itself needs to be modified (better/different sensors, more actuators, etc.). In this case, the system was fully observable and controllable, so the separation principle was usable!
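
In MATLAB, both gains can be computed with the same pole placement routine thanks to duality. The sketch below assumes the Control System Toolbox; the model matrices and pole locations are placeholders, not the project's actual system:

% Hypothetical discrete-time model; the real A, B, C differ
A = [1 0.1; 0 1];  B = [0.005; 0.1];  C = [1 0];

% The separation principle only applies if both checks pass
assert(rank(ctrb(A, B)) == size(A, 1), 'system is not controllable');
assert(rank(obsv(A, C)) == size(A, 1), 'system is not observable');

K = place(A, B, [0.8, 0.85]);       % controller gain (placeholder poles)
L = place(A', C', [0.4, 0.45])';    % observer gain, via the dual system

% Closed-loop simulation with the observer in the loop
x = [1; 0];  xhat = [0; 0];         % true state vs. initial estimate
for k = 1:100
    u = -K * xhat;                           % control uses the ESTIMATE
    y = C * x;                               % measured output
    xhat = A*xhat + B*u + L*(y - C*xhat);    % observer update
    x = A*x + B*u;                           % true plant update
end

Placing the observer poles well inside the controller poles (i.e., making the error dynamics faster than the closed-loop dynamics) is what lets the estimate converge before the robot reaches its first obstacle.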

Simulation Results

A full state observer was used here, mainly due to ease of implementation, as well as the fact that the error converged quickly enough that a reduced order observer was not required to avoid the obstacles. As seen in the plots below, the states and the estimates from the observer are very similar, and the error converges quickly. The error converged to zero in under 1 "second" (in rover time), and the first obstacle is not encountered until 1.5 seconds have passed, so the chosen observer was sufficient. However, there were fluctuations in convergence, and peaking was observed, so future work should utilize a reduced order observer.

While the estimates and error look good, one final check was performed to ensure the output from the observer was correct. The desired trajectory was compared to the estimated position values generated by the observer; if the values mimic one another, the observer is functioning well and will provide accurate data to the controller. Based on the plots of the position data, once the observer error converged, the position was estimated as intended, so the design was sound! As mentioned earlier, the residual error could be partially corrected by using a reduced order observer, as well as by adjusting the controller and observer gain matrices.