Thursday, March 19, 2009

Goto and Exceptions

It slipped quietly under the radar for some. For others, it raised quite a stir. No, I'm not talking about PHP's implementation of namespaces (a battle that's finally done and over much to everyone's relief). I'm referring to the infamous goto statement.

Many programmers have had the "goto is evil" mantra drilled into them from early on. The basis of this can probably be traced back to Edsger Dijkstra's March 1968 letter to the editor in Communications of the ACM (though he wasn't the first to argue against goto). Dijkstra felt the proliferation of goto at the time was producing unmaintainable "spaghetti" code. Fast-forward more than 40 years and the controversial feature is still alive and well... and about to make its debut appearance in PHP.

I found myself discussing some of the new features in PHP 5.3 with a friend a few days ago after he read my previous post about anonymous functions and closures. Our conversation eventually turned towards goto and whether it was possible or not to use it responsibly.

goto re-routes the execution flow of a program from one section of its code to another. Even the most basic control structures like if wouldn't exist in high-level languages like C, Java, and even PHP, if it weren't for some type of goto-like operation. High-level languages are either implemented in low-level languages like x86 assembly, which has the equivalent JMP instruction, or are implemented in other high-level languages which in turn are implemented in low-level languages. Constructs like conditional statements, loops, and functions all ultimately reduce down to a goto operation.
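To make the reduction concrete, here is a minimal sketch of a counting loop written with nothing but if and goto, which is roughly the shape a while loop takes once it has been lowered to jumps. The label and variable names are my own and purely illustrative.

```php
<?php
// equivalent to: while ($i < 5) { $sum += $i; $i++; }
$i = 0;
$sum = 0;

LOOP_START:
// the loop condition becomes a conditional jump past the body
if ($i >= 5) goto LOOP_END;
$sum += $i;
$i++;
// the closing brace becomes an unconditional jump back to the test
goto LOOP_START;

LOOP_END:
echo $sum, "\n";
```

It works, but the intent of the code is much harder to see than it is in the while version, which is exactly why the dedicated constructs exist.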

goto itself isn't much of a necessity in high-level languages because dedicated constructs like conditional statements, loops, etc. are available to the programmer, and these are much easier for humans to work with than goto. Yet goto can still be useful under certain circumstances. For example, some programmers use goto to direct the execution flow to dedicated error-handling logic elsewhere in a program in languages that lack exception handling (such as C).

So is it possible to use goto responsibly in high-level languages, particularly in PHP? I've come to the conclusion that it is indeed possible:
  • to emulate exception handling in a procedural environment (otherwise good, but not if the language offers real exception handling, which PHP does)

  • to overcome some of the perceived limitations in the try/catch exception handling model often found in an OOP environment (useful and arguably good because the alternatives can lead to fragile code)
Let's start right in by comparing some sample code to decide if using goto to emulate exceptions is responsible use. (Everyone loves code, right?) Here's a function with the error-handling code interspersed throughout it:
// return an array of employees, else -1 on error
function retrieveEmployees()
{
    // assume DB_HOSTNAME et al are defined elsewhere as
    // constants
    $db = mysql_connect(DB_HOSTNAME, DB_USERNAME, DB_PASSWORD);
    if ($db === false) {
        echo "Unable to connect to database server.";
        return -1;
    }

    $success = mysql_select_db(DB_SCHEMA);
    if (!$success) {
        mysql_close($db);
        echo "Unable to select database.";
        return -1;
    }

    $query = "SELECT id, last_name, first_name FROM employees
        ORDER BY last_name ASC, first_name ASC";
    $result = mysql_query($query, $db);
    if ($result === false) {
        mysql_close($db);
        echo "Unable to execute query.";
        return -1;
    }

    $employees = array();
    while ($row = mysql_fetch_assoc($result)) {
        $employees[] = $row;
    }

    mysql_close($db);
    return $employees;
}
Now, here's the same function refactored to use goto. The actual error-handling code has been moved to the end of the function, where it is out of the way of a programmer trying to read through the function's code and to understand its purpose.
function retrieveEmployees()
{
    $db = mysql_connect(DB_HOSTNAME, DB_USERNAME, DB_PASSWORD);
    if ($db === false) goto CONNECT_ERROR;

    $success = mysql_select_db(DB_SCHEMA);
    if (!$success) goto SCHEMA_ERROR;

    $query = "SELECT id, last_name, first_name FROM employees
        ORDER BY last_name ASC, first_name ASC";
    $result = mysql_query($query, $db);
    if ($result === false) goto QUERY_ERROR;

    $employees = array();
    while ($row = mysql_fetch_assoc($result)) {
        $employees[] = $row;
    }

    mysql_close($db);
    return $employees;

    // possible errors
    CONNECT_ERROR:
    echo "Unable to connect to database server.";
    return -1;

    SCHEMA_ERROR:
    mysql_close($db);
    echo "Unable to select database.";
    return -1;

    QUERY_ERROR:
    mysql_close($db);
    echo "Unable to execute query.";
    return -1;
}
The use of goto in this case makes it arguably easier to follow the logic and understand the purpose of retrieveEmployees() because you no longer have to visually sift through code chaff to find the proverbial wheat.

Incidentally, this is the same goal exceptions have: moving error handling out of the way to make the code easier to understand. PHP has supported exceptions since the nascent days of PHP 5, so let's refactor the code again. This time I'll make use of exceptions.
// extend Exception class
class ConnectErrorException extends Exception { }
class SchemaErrorException extends Exception { }
class QueryErrorException extends Exception { }

function retrieveEmployees()
{
    try {
        $db = mysql_connect(DB_HOSTNAME, DB_USERNAME,
            DB_PASSWORD);
        if ($db === false) throw new ConnectErrorException();

        $success = mysql_select_db(DB_SCHEMA);
        if (!$success) throw new SchemaErrorException();

        $query = "SELECT id, last_name, first_name FROM
            employees ORDER BY last_name ASC, first_name ASC";
        $result = mysql_query($query, $db);
        if ($result === false) throw new QueryErrorException();

        $employees = array();
        while ($row = mysql_fetch_assoc($result)) {
            $employees[] = $row;
        }

        mysql_close($db);
        return $employees;
    }
    catch (ConnectErrorException $e) {
        echo "Unable to connect to database server.";
        return -1;
    }
    catch (SchemaErrorException $e) {
        mysql_close($db);
        echo "Unable to select database.";
        return -1;
    }
    catch (QueryErrorException $e) {
        mysql_close($db);
        echo "Unable to execute query.";
        return -1;
    }
}
Comparing the second and third iterations of retrieveEmployees(), one could argue that using goto results in a cleaner syntax than actual exception handling. With goto, the programmer doesn't need to extend the base Exception class, position the code in question within try/catch blocks, or make sure all of the braces are matched correctly. Instead, the programmer only has to provide a target label and the actual goto call.

Without the overhead of instantiating an Exception object, there's also a slight performance boost. The results of a highly (un-)scientific benchmark I ran to compare goto error-handling against exceptions using the above examples showed goto is a little over 4% faster.

Unfortunately, those benefits are meager when you take a closer look and see the drawbacks of using goto. Exceptions allow the programmer to signal that an error occurred, but goto is a direct jump to the error-handling logic. Each label must be unique. The programmer may soon find himself duplicating code and devising creative label names to handle the same type of errors in slightly different ways. Another drawback is that PHP requires the label to be within the same scope as the goto call. This means the following code will generate a fatal error:
function generateError()
{
    goto GENERATED_ERROR;
}

GENERATED_ERROR:
die("Oops!");
Exceptions can offer more flexibility since the thrown exception bubbles up the call stack until it is caught by a suitable catch block:
class GeneratedErrorException extends Exception { }

function generateError()
{
    throw new GeneratedErrorException();
}

try {
    generateError();
}
catch (GeneratedErrorException $e) {
    die("Oops!");
}
Exceptions were designed explicitly for the purpose of error-handling. There's only one way to get into the error-handling code of a catch block: by a throw call. The code that follows a goto label, on the other hand, can be executed as part of the normal flow of execution, or as a backwards redirect. Exceptions have their own detractors, but even so they don't have nearly the bad rap that goto does. Using goto will probably draw the wrath of many anti-goto programmers upon you faster than a speeding photon, while using exceptions will win you job interviews.

The conclusion for my first talking point is that using goto to emulate exceptions is useful, but it's better to use exceptions instead of faking things if the language provides them (which PHP does).
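As a rough illustration of the "signal, don't jump" difference, here is a minimal sketch in which the exception object carries the detail, so a single catch block serves every failure and no unique label is needed per error. StepFailedException, runSteps() and report() are hypothetical names invented for this sketch; they stand in for the database calls used earlier.

```php
<?php
// hypothetical exception type, for illustration only
class StepFailedException extends Exception { }

// hypothetical worker: fails at the named step, if one is given
function runSteps($failAt = null)
{
    $steps = array("connect", "select", "query");
    foreach ($steps as $step) {
        if ($step === $failAt) {
            throw new StepFailedException("Unable to $step.");
        }
    }
    return "ok";
}

function report($failAt = null)
{
    try {
        return runSteps($failAt);
    }
    catch (StepFailedException $e) {
        // one handler; the thrown object says what went wrong
        return "Error: " . $e->getMessage();
    }
}

echo report(), "\n";
echo report("query"), "\n";
```

The throw also crosses the function boundary between runSteps() and report(), which is something a goto label cannot do.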

Exceptions are well suited to the event-driven execution model where the execution flow is determined by events (such as the user clicking on a graphical interface component or pressing a key on the computer). The runtime loop waits for something to happen, possibly for an infinite amount of time, and then executes the logic assigned by the programmer to an appropriate action when it detects an event. If there is a problem that will prevent the action from being completed successfully then an exception can handle it, terminate that thread, and return the flow to the main wait-state. Users can fix the problem and try again without the program terminating.
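The shape of that model can be sketched in a few lines, assuming a fixed array in place of a real stream of user actions. handleEvent() and ActionFailedException are hypothetical names used only for this illustration.

```php
<?php
// hypothetical exception type, for illustration only
class ActionFailedException extends Exception { }

// hypothetical handler: the "delete" action always fails
function handleEvent($event)
{
    if ($event === "delete") {
        throw new ActionFailedException("cannot $event right now");
    }
    return "handled $event";
}

// a fixed queue stands in for a stream of user actions
$events = array("open", "delete", "save");
$log = array();

foreach ($events as $event) {
    try {
        $log[] = handleEvent($event);
    }
    catch (ActionFailedException $e) {
        // abandon this one action and return to the loop
        $log[] = "failed: " . $e->getMessage();
    }
}

echo implode("\n", $log), "\n";
```

The failed action is dropped, but the loop survives and goes on to handle the next event.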

Unfortunately, PHP doesn't enjoy this execution model, and it's been my experience that exceptions are just as easy to abuse when used in batch/imperative models as goto is. All too often a programmer will catch the exception and then simply terminate the program.
class FileAccessException extends Exception { }

$filename = "test.txt";

try {
    $fp = @fopen($filename, "r");
    if ($fp === false) {
        throw new FileAccessException("cannot open $filename");
    }
}
catch (FileAccessException $e) {
    die("Error: " . $e->getMessage());
}
Is this really effective error handling? It might be if the script is processing a batch job or generating a web page. There is nothing the requestor can do in those cases to address whatever it is that caused the error, and the only reasonable course of action is to gracefully terminate the script. But what if the script were running interactively from a command prompt? PHP is primarily used to generate web content, but as more and more people realize its flexibility and the benefits of reducing the number of programming languages across all parts of an application, CLI scripting with PHP is becoming more popular. In that case, a more proper action for the script to take would be to inform its users what the error was and suggest some possible steps they can take to resolve it.
// return whether the script is allowed read access ("r"),
// or write access ("w") to a file
function testPermission($filename, $access = "r")
{
    clearstatcache();
    list(,,$mode,,$uid,$gid) = stat($filename);
    $perms = array("u_r" => (bool)($mode & 0400),
                   "u_w" => (bool)($mode & 0200),
                   "g_r" => (bool)($mode & 0040),
                   "g_w" => (bool)($mode & 0020),
                   "o_r" => (bool)($mode & 0004),
                   "o_w" => (bool)($mode & 0002));
    // posix_getpwuid() returns an associative array, so read
    // its "name" element instead of unpacking it with list()
    $owner = posix_getpwuid($uid);
    $user = $owner["name"];
    $group = posix_getgrgid($gid);

    $effective = posix_getpwuid(posix_geteuid());
    $eUser = $effective["name"];
    $isUser = ($user == $eUser);
    $isGroup = in_array($eUser, $group["members"]);
    $isOther = !($isUser || $isGroup);

    return (($isUser && $perms["u_" . $access]) ||
            ($isGroup && $perms["g_" . $access]) ||
            ($isOther && $perms["o_" . $access]));
}

try {
    $fp = @fopen($filename, "r");
    if ($fp === false) {
        throw new FileAccessException();
    }
}
catch (FileAccessException $e) {
    if (!file_exists($filename)) {
        die("$filename does not exist. Please create the " .
            "file.\n");
    }
    else {
        if (!testPermission($filename, "r")) {
            die("Please check read permissions on " .
                "$filename.\n");
        }
        else {
            die("Unknown error attempting to access " .
                "$filename.\n");
        }
    }
}
The code is a bit smarter now about the error and offers users some guidance as to what needs to be done to resolve it, but the program still terminates. Not all exceptions should be fatal. In this case, it would be better for the code to give the users an opportunity to fix the error and then try to re-read the file instead of forcing them to start the program over again.

Whereas in event-driven execution users can simply retry the action, there is no clean way to retry the code that triggered the error procedurally. One possibility is to surround the action in a do/while loop.
// prompt the user whether to retry an action
function promptRetry()
{
    do {
        echo "Type 'R' to retry or 'Q' to quit: ";
        $retry = strtoupper(trim(fread(STDIN, 2)));
        if ($retry == "R") {
            return true;
        }
        else if ($retry == "Q") {
            return false;
        }
        else {
            echo "Invalid entry. ";
        }
    }
    while (true);
}

do {
    $retry = false;
    try {
        $fp = @fopen($filename, "r");
        if ($fp === false) {
            throw new FileAccessException();
        }
    }
    catch (FileAccessException $e) {
        if (!file_exists($filename)) {
            echo "$filename does not exist. Please create " .
                "the file.\n";
        }
        else {
            if (!testPermission($filename, "r")) {
                echo "Please check read permissions on " .
                    "$filename.\n";
            }
            else {
                echo "Unknown error attempting to access " .
                    "$filename.\n";
            }
        }
        if (!promptRetry()) exit();
        $retry = true;
    }
}
while ($retry);
I don't like how the $retry variable is set within the catch block to trigger the reiteration of the surrounding do/while loop. It seems a bit fragile to set variables within catch blocks in order to influence the behavior of code outside the block. Instead, the example can be refactored to use goto. This eliminates having to keep track of $retry altogether and just redirects the execution flow itself.
TRY_OPEN_FILE:
try {
    $fp = @fopen($filename, "r");
    if ($fp === false) {
        throw new FileAccessException();
    }
}
catch (FileAccessException $e) {
    if (!file_exists($filename)) {
        echo "$filename does not exist. Please create " .
            "the file.\n";
    }
    else if (!testPermission($filename, "r")) {
        echo "Please check read permissions on " .
            "$filename.\n";
    }
    else {
        echo "Unknown error attempting to access " .
            "$filename.\n";
    }
    if (!promptRetry()) exit();
    goto TRY_OPEN_FILE;
}
The conclusion on my second talking point is that goto can be used to overcome some of the perceived limitations of working with exceptions in a batch/procedural execution model. Without the support of an event-driven execution model, and without a dedicated retry-type statement, programmers need to resort to looping constructs. The code that results can be brittle and difficult to maintain over time. goto offers an elegant and succinct alternative.
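The file example needs a user at the keyboard, so here is a self-contained sketch of the same retry-by-goto pattern that runs unattended. Everything in it is hypothetical: flakyOperation() simply fails twice before succeeding, and the $maxAttempts cap is my own addition (it is not part of the example above) so the sketch cannot loop forever.

```php
<?php
// hypothetical exception type, for illustration only
class FlakyException extends Exception { }

// hypothetical operation: fails on the first two calls
function flakyOperation()
{
    static $calls = 0;
    $calls++;
    if ($calls < 3) {
        throw new FlakyException("attempt $calls failed");
    }
    return "succeeded on attempt $calls";
}

$attempts = 0;
$maxAttempts = 5; // assumption: bound the retries

TRY_OPERATION:
try {
    $attempts++;
    $result = flakyOperation();
}
catch (FlakyException $e) {
    if ($attempts >= $maxAttempts) {
        die("Giving up: " . $e->getMessage() . "\n");
    }
    // jump back to the label and run the try block again
    goto TRY_OPERATION;
}

echo $result, "\n";
```

There is still a counter here, but it only limits the number of attempts; the decision to retry is expressed by the goto itself rather than by a sentinel checked outside the catch block.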

13 comments:

  1. Very cool Tim! I always appreciate your ability to think through concepts and best practices yourself and not take tips from anybody (even Dijkstra, though :-p ?) until thoroughly challenged and discussed.

    That said...: It seems you're making two points, the first of which even you concede is not best practice (swapping exceptions with goto implementation). The second argument, however, is intriguing.

    If scoped appropriately, the code in the final snippet is sexy. It's readable, clean and concise. My only question would be: can the same logic flow be implemented without the use of goto. I'd argue yes and a good example would be your second to last snippet. I think even that snippet could be optimized but I'd argue that even if it looks uglier than the final it's ultimately better than using goto.

    In closing: Dijkstra's smarter than you :-p

    Keep challenging! This is good discourse!!

  2. I'm glad you appreciate the discourse!

    I was a bit nervous putting this out there honestly in fear of someone reading it wrong and then getting me blacklisted from all the cool PHP conferences and chat rooms because I wrote "favorably" about goto. But I wanted to do something different-- not the same old "goto is evil" mantra. Best practices are in place for a reason, but that doesn't mean we should blindly follow an idea and not at least think about the hows and whys.

    Yes, the first point is that using goto to fake exceptions would be good if PHP didn't already provide exception handling. There is more to exception handling than just jumping to a section of code, and the actual exception handling mechanism provided by a language should be used if one is available since it is "safer" and more flexible.

    The second point is that goto can be used responsibly to augment the existing exception handling mechanism in PHP. You can stop an event if there was an error when you're running in an event-driven environment without terminating the entire application. But PHP is procedural, and too many people just catch the error and kill the entire script. That's the equivalent of your web browser closing itself every time you run across a 404.

    Of course retrying a section of code that threw an exception can be accomplished without using goto, but there's an interplay between the outer loop and setting the loop's sentinel value from within the catch block that I don't like. My sample is just a short code sample, but over a large section of code with a lot of catch statements it's too easy to lose track of the sentinel.

    If you can provide a cleaner alternative to retrying a block without the brittleness of a do/while loop or the impurity of goto, I'd be happy to ponder it.

  3. Well met, colleague. I guess one thing that I'm having a hard time understanding is how the implementation with goto (final code snippet) is *not* event driven. Obviously, retrying a section of code that failed with a given input with the same input will produce the same result, so invariably, the use case will be event driven (asking for user input) even if just via CLI.

  4. As an example of catching exceptions the right way, I work with Zend Framework, where if an exception is thrown it is subsequently caught in the front controller, which displays a nice error page (defined as a view in the error controller), and with a little work even wraps the error message in the common layout of the pages.

  5. Hi Tim

    This subject has been discussed in the PHP community for almost 5 years, and in the programming community for almost 30. And here we go again...

    I believe this post will do more harm than good. In a way, you are encouraging new developers to use GOTO, by justifying the unjustifiable. Cleaner syntax? Limitations with the object model? Performance boost? No, no, no. Please don't. PHP programmers should NOT use GOTO.

    It's not clear what limitations you see in the object model. The last example doesn't prove your point, it confuses people even more. While some people argue that GOTO is useful for error handling when writing procedural, I don't know any Java or PHP programmer that uses GOTO to overcome "limitations" in the exception handling model. Some of your examples are wrong, others can be refactored.

    So what's the next excuse: "GOTO's can be very handy if you are writing a... parser?". If you are using GOTO in PHP, well, then you are probably doing something wrong. But if you are writing a parser in PHP, then you are doing everything wrong.

    > CLI scripting with PHP is becoming more popular.

    Is it? Last time I checked, all the sys admins are still using Perl and Python. And this trend has continued to grow in the last 10 years.

    > Another drawback is that PHP requires the label to be within the same scope as the goto call.

    You see this as a drawback? This is the only thing that can save your company from spending time and money developing a software that is unmaintainable.

    > All too often a programmer will catch the exception and then simply terminate the program.

    How often? I've never seen something like that in my entire life. You can't use that as an excuse, or to say that exceptions are easy to abuse. Everything is easy to abuse when it comes to programming languages. The only limit is your knowledge and experience (or your imagination).

    Your last procedural example, the TRY_OPEN_FILE, is a clear example of why you shouldn't use goto as opposed to if/then and do/while. What you did there has a name, and I believe it's called: spaghetti code.

    > But I wanted to do something different-- not the same old "goto is evil"

    Well, maybe the reason why people, including award-winning computer scientists, keep saying that GOTO is evil is, believe it or not, because it really is :)

  6. Ultimately I agree with you, Federico, about the danger of goto and that there is no reasonable application of it in high level languages. I disagree that somehow a discourse like this would confuse a programmer that's worth their weight. If this were a forum for a 100 level course in university, I'd agree. If we stop challenging even the most seemingly axiomatic of our assumptions, however, we'd never improve as individuals or as an industry.

  7. I'm sure there is a good reason to use GOTO somewhere sometime in PHP code.

    If nothing else, if Sara needed it, then whatever she was doing is probably the best example.

    It may not fit well in a blog post, but there it is.

    I don't think these examples make a good use-case of goto, however. Sorry.

  8. I much prefer a nice "goto label" over a "break 4" out of some nasty nested logic. It is hard to work backwards to see exactly where that break will end up, but with a label you can see it easily. As long as you can't goto labels outside of your scope and you can goto a spot inside a deeper construct, which is how I understand PHP goto is implemented, I don't see the issue at all. Just another handy tool for the pragmatic PHP programmer.

  9. Goto can be handy and useful! i like its implementations and how easy to use!

    but goto is unlike other methods where it got a specific structure that i can trace easily!

    so the main problem is that it's going to affect the language and its developers in the long term!

    how? because it will affect the way programmers' minds work in accomplishing their work when they use goto!

    so at the end, we may end up with programmers making their job done in messy way!

    which is of course Not desired by any client or customer who wants an application to be maintainable!
    and this will make clients avoid language all together for the sake of maintainablity!

  10. Your very first example (the one that didn't use goto) was complete spaghetti code with return statements scattered all through it. I stopped reading at this point, it's hard to trust somebody on the usefulness of goto when their non goto containing code is a mess.

    I might come in the future and overlook your mistakes to see if you have a point somewhere in there, but right now I feel I would be only wasting my time by reading any further.

  11. Return statements are Goto statements. It's simply another way of saying:

    goto exit_label;

    That was inherited from C language, unfortunately, since people didn't pay attention to the way Pascal handled functions. A function in pascal is like a variable that you could assign a value to.
    function bar: integer;
    begin
    bar := YourResultHere;
    end;

  12. I read your first example and stopped reading. It appeared as though you purposefully coded it poorly to show how gotos are better. Write that first example without using multiple return statements, use if/elseif/else statements and an enum or equivalent for storing which error occurred, and you'll find it's cleaner than the goto version, much easier to follow, and not fragile at all. Some developers prefer multiple return statements over using if/elses; their code might LOOK arguably more AESTHETIC, but it's not cleaner and it's harder to maintain.

  13. Tim,

    Isn't it more elegant if you had used a "break" statement in your second to last example?

    If you had put a "break" statement in the "try" block, then you wouldn't need any "retry" variable to keep track.

    You could use a "for(;;)" or a "while(true)" loop, and insert a break if it successfully opens a file.
