Sunday, June 28, 2009

Currying in PHP

What happens if you don't have all the arguments handy for a function, but you want to give whatever arguments you do have now and then provide the rest of them to the function later? This is called currying, and is a core concept in functional programming. It's messy, but it is possible to curry functions in PHP now that closures have been added.

First, let me show you how currying looks in a functional language. Here's a basic example in OCaml/F#:
let do_math op x y =
    match op with
      '+' -> x + y
    | '-' -> x - y
    | _   -> failwith "Invalid op"

let add = do_math '+'

let inc = add 1
let dec = add (-1)
;;
A function named do_math is defined that accepts an operator and two operands. The function's return value will be either the sum or difference of the operands, depending on whether the given operator is + or -. Notice how do_math is then called with a single argument. OCaml doesn't raise an error; it simply returns a function that "remembers" the first argument and accepts the remaining two arguments later (this is an over-simplified and slightly inaccurate statement, but a good enough description for our purpose here). This intermediate function can be used elsewhere, as in the bindings for inc and dec.

Now here's a version of the do_math() function in PHP:
function do_math($op, $x, $y) {
    switch ($op) {
        case '+':
            return $x + $y;

        case '-':
            return $x - $y;

        default:
            throw new Exception("Invalid op");
    }
}
Unfortunately, PHP will throw warnings if you call do_math() without the three arguments it expects.

Warning: Missing argument 2 for do_math(), called in /home/tboronczyk/curry.php on line 16 and defined in /home/tboronczyk/curry.php on line 2

Warning: Missing argument 3 for do_math(), called in /home/tboronczyk/curry.php on line 16 and defined in /home/tboronczyk/curry.php on line 2


Whereas functional languages have currying "built-in," you must explicitly code this ability in an imperative language. Doing so in PHP requires the use of closures:
function do_math($op) {
    return function ($x) use ($op) {
        return function ($y) use ($op, $x) {
            switch ($op) {
                case "+":
                    return $x + $y;

                case "-":
                    return $x - $y;

                default:
                    throw new Exception("Invalid op");
            }
        };
    };
}
It's also possible to extend the function using func_num_args() and func_get_arg() functions, anonymous functions, and closures, so that any number of parameters can be given at a time.
function do_math() {
    if (func_num_args() >= 1) $op = func_get_arg(0);
    if (func_num_args() >= 2) $x = func_get_arg(1);
    if (func_num_args() == 3) $y = func_get_arg(2);

    switch (func_num_args()) {
        case 1:
            return function () use ($op) {
                if (func_num_args() >= 1) $x = func_get_arg(0);
                if (func_num_args() == 2) $y = func_get_arg(1);

                switch (func_num_args()) {
                    case 1:
                        return function ($y) use ($op, $x) {
                            return do_math($op, $x, $y);
                        };

                    case 2:
                        return do_math($op, $x, $y);

                    default:
                        trigger_error(
                            "invalid argument count",
                            E_USER_WARNING);
                }
            };

        case 2:
            return function ($y) use ($op, $x) {
                return do_math($op, $x, $y);
            };

        case 3:
            switch ($op) {
                case "+":
                    return $x + $y;

                case "-":
                    return $x - $y;

                default:
                    throw new Exception("Invalid op");
            }

        default:
            trigger_error("invalid argument count",
                E_USER_WARNING);
    }
}
It's messy... but it works! Now you are able to pass one or two arguments to do_math(), capture the intermediate function that's returned, and pass the remaining argument(s) later.
$add = do_math("+");

$inc = $add(1);
$dec = $add(-1);

echo do_math("-", 3, 2);       // 1
echo do_math("+", 1, 1);       // 2
echo $inc(2);                  // 3
echo $add(2, 2);               // 4
echo $dec(6);                  // 5
echo $add($inc(4), $dec(2));   // 6
The switch statements are rather unmanageable and the spaghettification of code grows exponentially with the addition of each argument. The pattern is straightforward, though. You may want to consider writing a code generator to handle the dirty work of retrofitting a function into one capable of being curried rather than writing them all by hand. Of course, if you know of a better way to curry functions in PHP then let me know by leaving a comment!
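One possible way to avoid hand-writing the nested switches is a generic wrapper built on reflection. This is only a sketch and not part of the original post: the names curry() and $curried are hypothetical, and it assumes the wrapped function declares a fixed number of parameters with no optional ones.

```php
// Hypothetical helper: wrap a function so that its arguments can be
// supplied across several calls.
function curry($fn, array $bound = array())
{
    // ask PHP how many parameters the wrapped function declares
    $reflection = new ReflectionFunction($fn);
    $needed = $reflection->getNumberOfParameters();

    return function () use ($fn, $bound, $needed) {
        // combine the arguments given earlier with the new ones
        $args = array_merge($bound, func_get_args());
        if (count($args) >= $needed) {
            return call_user_func_array($fn, $args);
        }
        // still short of arguments: remember them and wait for more
        return curry($fn, $args);
    };
}

// the plain three-argument do_math() logic, written as a closure
$do_math = function ($op, $x, $y) {
    switch ($op) {
        case "+": return $x + $y;
        case "-": return $x - $y;
        default:  throw new Exception("Invalid op");
    }
};

$curried = curry($do_math);
$add = $curried("+");
$inc = $add(1);

echo $inc(2), "\n";     // 3
echo $add(2, 2), "\n";  // 4
```

The trade-off is a reflection lookup on every partial call, which may or may not matter depending on how hot the code path is.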

Update 06/29/09: Someone asked me what the "real-world use" for all this would be. Currying is used all the time in functional programming, but the hassle of explicitly enabling the behavior in PHP makes that a valid question. My motivation was just to see if it was possible and share my results. Indeed it is. Functions can be curried in any language that supports closures. But for those who want something a little more concrete, let's consider callback functions.

In a previous post I gave the following example to illustrate the use of closures:
$userPercent = 0.5;
$userList = array_filter($percentVowels,
    function($percent) use ($userPercent) {
        return ($percent >= $userPercent);
    });
It showed an anonymous function being used with array_filter() to filter an array. The array is filtered based on a dynamic value, and a closure is used to "inject" the threshold rather than using a global statement. The same could also be accomplished with currying.

The problem is array_filter() expects a callback function that accepts one argument--the current array element. Currying will allow us to prepare the function with the filtering threshold, and the intermediate function can be used as the callback.
function callback($userPercent) {
    return function($percent) use ($userPercent) {
        return ($percent >= $userPercent);
    };
}
$userList = array_filter($percentVowels, callback(0.5));

Sunday, June 14, 2009

Kember Identity

Ever wonder if there is an MD5 hash the same as the original input? Nope, me neither. But Mr. Kember does, and he's asking the world to help him find out if such a thing exists. There's no fame if you find it for him (he's humbly named it the "Kember Identity" already), but you might make a little cash. Check out his web page for the details. Go ahead and enter his contest if you're feeling gullible... er, lucky!

The MD5 algorithm returns a fixed-length 128-bit hash, so there are 2^128 possible values. The hash is typically expressed as a series of 32 hexadecimal digits. Since the input string and its hash must be the same to reflect the Kember Identity, you wouldn't need to test random strings like "ruby on rails rots your brain"; you only need to test strings that are 32 characters long and contain the numbers 0 through 9 and letters a through f, like 8d112b3c68248c12f178188c1b921ec1.

Kember suggests testing values at random because the range of candidates is so large (2^128 is 340,282,366,920,938,463,463,374,607,431,768,211,456). Unfortunately, there are a few problems with this approach. It actually takes less time to test all values sequentially than through random selection, since random selection can retest candidates that have already been tried.

Additionally, one has to consider the possibility that such a value doesn't exist. The odds of any single candidate being the Kember Identity are quite small: roughly 1 in 2^128, assuming MD5 output behaves like a uniformly random value. So how would you know when all possible values have been tested, proving the Kember Identity doesn't exist, if the values are tested randomly? You wouldn't.

The only reliable way to programmatically identify whether the Kember Identity exists and which hashes exhibit it is to test each hash sequentially and record the results.
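A sequential search of this kind could be sketched roughly as follows. The function names nextCandidate() and searchRange() are hypothetical, and the range searched is tiny; it only illustrates walking the 32-digit hex space in order so that progress can be recorded and resumed.

```php
// Hypothetical helper: return the 32-digit hex string that follows $hex,
// or false once the space is exhausted.
function nextCandidate($hex)
{
    $digits = "0123456789abcdef";
    for ($i = strlen($hex) - 1; $i >= 0; $i--) {
        $value = strpos($digits, $hex[$i]);
        if ($value < 15) {
            $hex[$i] = $digits[$value + 1];
            return $hex;
        }
        $hex[$i] = "0"; // carry into the next digit to the left
    }
    return false; // wrapped past ffff...f
}

// Test $count candidates in order starting at $start. Returns the matching
// candidate if one is found, otherwise null.
function searchRange($start, $count)
{
    $candidate = $start;
    for ($n = 0; $n < $count && $candidate !== false; $n++) {
        if (md5($candidate) === $candidate) {
            return $candidate;
        }
        $candidate = nextCandidate($candidate);
    }
    return null;
}

$found = searchRange(str_repeat("0", 32), 1000);
echo ($found === null) ? "no match in this range\n" : "found: $found\n";
```

Because the candidates are visited in a fixed order, a real run would only need to store the last value tested to know exactly which part of the space has been ruled out.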

The whole thing might not bother me if money wasn't involved. Just send Mr. Kember your $5 entry fee and you're eligible to win the prize pot if your script is first to find the magical hash! But I have a few questions:
  • How do I contact Mr. Kember to receive my prize when I find a hash that exhibits the Kember Identity?

  • What happens to my $5 and the rest of the prize money if it is proven the Identity doesn't exist?

  • At 60-million hashes an hour, it would take over 646,987,670,262,051,588,140,743 millennia to verify them all. How long does Mr. Kember plan on holding on to the prize money?
While it might not be a scam (it says explicitly that it's not a scam somewhere on the irrationally highlighted contest page), it isn't well thought out.
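The time estimate above can be checked with a few lines of arithmetic. This is a rough sketch using floating-point numbers; the 60 million hashes per hour comes from the post, while the year length of 365.2422 days is an assumption that appears to reproduce the figure quoted.

```php
// Assumptions: 60 million hashes per hour and a year of 365.2422 days.
$candidates    = pow(2, 128);       // float, about 3.4e38
$hashesPerHour = 60e6;
$hoursPerYear  = 24 * 365.2422;

$hours     = $candidates / $hashesPerHour;
$millennia = $hours / $hoursPerYear / 1000;

// prints a value on the order of 6.47e23 millennia
printf("%.3e millennia\n", $millennia);
```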

Thursday, June 11, 2009

What's Wrong with OOP

Proponents of Object Oriented Programming feel the paradigm yields code that is better organized, easier to understand and maintain, and reusable. They view procedural programming code as unwieldy spaghetti and embrace OO-centric design patterns as the "right way" to do things. They argue objects are easier to grasp because they model how we view the world. If the popularity of languages like Java and C# is any indication, they may be right. But after almost 20 years of OOP in the mainstream, there's still a large portion of programmers who resist it. If objects truly model the way people think of things in the real world, then why do people have a hard time understanding and working in OOP?

I suspect the problem might be the focus on objects instead of actions. If I may quote from Steve Yegge's Execution in the Kingdom of Nouns:

Verbs in Javaland are responsible for all the work, but as they are held in contempt by all, no Verb is ever permitted to wander about freely. If a Verb is to be seen in public at all, it must be escorted at all times by a Noun.

OOP focuses primarily on the object and expresses actions in terms of the object's abilities. An airplane object can be flown (Airplane.fly()). A door object can be opened (Door.open()). But we really don't view the world in terms of objects and what actions can be done to them. It's backwards. We view the world in terms of ourselves and our abilities. We are the ultimate object. (And no, I don't mean a God object.)

Imagine you are returning from a trip to the local flower garden. Would you say "I smelled the flowers" or "The flowers were smelled by me"? Now you want to tell a friend to go and smell the flowers. Would you say "Go and smell the flowers" or "The flowers must be smelled by you"? When we convey instructions, we give them in terms relative to ourselves. What is programming but conveying instructions to a computer process about how some sort of work should be done on our behalf?

How to make a Peanut Butter Sandwich:
  • Get Jar of Peanut Butter

  • Get Loaf of Bread

  • Get Knife

  • ...
All of these things (jar of peanut butter, bread, and knife) can be thought of as objects.
class PeanutButterJar extends Jar ...
Express them as such and they take on methods.
PeanutButterJar.open()
Knife.spread(PeanutButter, Bread)
Whoa! In my world, peanut butter doesn't do anything but sit there and taste yummy. And I'd start looking for a good exorcist the moment a knife starts performing actions all by itself. A more realistic transcription of the instructions would be:
You.open(PeanutButterJar)
You.spread(Knife, PeanutButter, Bread)
Specifying You as a universal object would seem rather redundant as the scenario progressed, so a good language designer would remove the need to expressly identify it. This would yield
open(PeanutButterJar)
spread(Knife, PeanutButter, Bread)
which starts to look vaguely procedural.

The truth is, procedural and OOP languages express the complexity of their problem space in different ways. Procedural code is flat and wide with functions. OO code is hierarchical with inheritance. OO code is not inherently better organized than procedural code merely because it is encapsulated in objects. A reusable library can consist of functions just as easily as a collection of objects.

The way some OOP languages (like Java and C#) force objects on the programmer borders on the absurd. If I'm writing a library of reusable code that needs to maintain its own state, then of course writing classes with proper encapsulation and data hiding makes sense. On the other hand, if I'm generating a web page with some data stored in a database, then some procedural code and a handful of function calls makes more sense. One of the things I like so much about PHP is that it allows the programmer to decide which paradigm is best suited for the task at hand.

Sadly though, that decision isn't always left to the programmer who has been tasked with developing and maintaining a system. Management can tend to focus too much on buzzword compliance. A procedural programming language designed today would never receive widespread adoption if it didn't offer some sort of OO construct, despite both paradigms having produced successful libraries and applications. And the programmers who don't learn to think beyond themselves will be unfairly left in the dust.

Tuesday, April 28, 2009

Death Knell for MySQL

Someone asked me, "What do you think about the Oracle/Sun buyout as it pertains to MySQL?" Well, since you're asking...

I thought it was bad for MySQL when Sun bought them despite what others were saying at the time. It turns out I was right. I think Oracle will be worse, and this time the blogosphere is saying it'll probably be bad. Now the question is, just how bad will it be? Here are my predictions:
  • I'm sure Oracle realizes they need to tread lightly on the subject of MySQL or else risk the wrath of the open source community. They may integrate some of MySQL to improve Oracle, but they won't promote the continued development of MySQL proper (Berkeley DB anyone?). That is, Oracle won't actively kill MySQL, but they'll let it languish and continue the slow and painful death that began before Sun came along. I don't see a financial benefit to Oracle for keeping MySQL healthy. If MySQL does survive, it might be branded as "Oracle Lite."

  • Core developers will continue work on MySQL in the form of Drizzle, a fork based on MySQL 6.0. Drizzle's focus is on refactoring the MySQL code base and scaling down the feature set-- views, triggers, stored procedures, etc. will be available through modules but not in the core-- to provide a fast and efficient RDBMS for web-based and distributed applications. Drizzle will become very popular as a MySQL alternative for dedicated community members and web developers, and enterprise users who require a larger feature set will migrate to PostgreSQL (and Pythonistas rejoice en masse).
If a commercial company buys control of an open-source project, but then the project's community and core developers fork the codebase and continue development, then the company has effectively only purchased rights to a particular branch. It's legal, but it's not a palatable situation for commercial corporations who might be looking to buy up open source applications. I doubt we'll see Oracle starting a SCO-like court battle over MySQL... but we sure are living in interesting times. Welcome to the era of new law.

I'm primarily a PHP developer so I'll most likely migrate to Drizzle if and when that time comes. A lot of what I do could probably be done with SQLite, but I don't particularly care for the way SQLite does some things. That's another story for another day...

Wednesday, April 1, 2009

Certification Failure

Some employers look favorably on certifications, or even require them; other employers couldn't care less. Some people are certified in something but clueless when it comes to actually using the technology. Some people get certifications like they're going out of style just because they can. Some people cheat on the exam. So how much stock should one put in certifications? I'm not sure I know the answer to that. I guess it depends on the certification, what the testing environment is like, who runs the certification program, etc.

Today I ran across PHP-Rocks during my daily web-surfing. It's a small site that offers a set of tutorials ranging from beginner up to advanced, and a PHP "certification" exam. The exam piqued my interest. It was free to take, and I was curious as to what type of questions it asked, so I signed up. Of course I often sign up with a dummy account and fake email address when I do such things because I don't intend on becoming a regular visitor to the site, nor do I care to be placed on some spam mailing list. I chose "Joe Biteme" as my name for this excursion.

I answered randomly, not taking the exam seriously (like I said, I was more interested in what type of questions they were asking rather than actually getting their "certification"). I utterly failed it with a miserable 26.6667%! But I figure if they don't feel guilty about offering me the opportunity to pay them $5 to email me the certificate for a failed exam, then I probably shouldn't feel guilty about making a mockery of their exam process (and perhaps even the exam itself) by registering a fake identity and answering randomly.

The resulting certificate shows I successfully completed the PHP developer exam with a fail!


In full disclosure, yes I took (and passed) the Zend Certified Engineer exam for PHP5 offered by Zend, and yes I took it much more seriously than I did PHP-Rocks' exam. Also, it's not my purpose to single out a particular web site... I just found their snafu too humorous not to share.

Thursday, March 19, 2009

Goto and Exceptions

It slipped quietly under the radar for some. For others, it raised quite a stir. No, I'm not talking about PHP's implementation of namespaces (a battle that's finally done and over much to everyone's relief). I'm referring to the infamous goto statement.

Many programmers have had the "goto is evil" mantra drilled into them from early on. The basis of this can probably be traced back to Edsger Dijkstra's March 1968 letter to the editor in Communications of the ACM (though he wasn't the first to argue against goto). Dijkstra felt the proliferation of goto at the time was producing unmaintainable "spaghetti" code. Fast-forward more than 40 years and the controversial feature is still alive and well... and about to make its debut appearance in PHP.

I found myself discussing some of the new features in PHP 5.3 with a friend a few days ago after he read my previous post about anonymous functions and closures. Our conversation eventually turned towards goto and whether it was possible or not to use it responsibly.

goto re-routes the execution flow of a program from one section of its code to another. Even the most basic control structures like if wouldn't exist in high-level languages like C, Java, and even PHP, if it weren't for some type of goto-like operation (high-level languages are either implemented in low-level languages like x86 Assembler which has the equivalent operation JMP, or are implemented in other high-level languages which in turn are implemented in low-level languages). Constructs like conditional statements, loops, and functions all ultimately reduce down to a goto operation.
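To make the reduction concrete, here is a small sketch of a while loop rewritten with nothing but a conditional test and goto. The function names sumWithWhile() and sumWithGoto() are made up for illustration, and the goto version assumes PHP 5.3 or later.

```php
// sum the integers below $n with an ordinary while loop
function sumWithWhile($n)
{
    $i = 0;
    $sum = 0;
    while ($i < $n) {
        $sum += $i;
        $i++;
    }
    return $sum;
}

// the same loop expressed with goto
function sumWithGoto($n)
{
    $i = 0;
    $sum = 0;
LOOP_TEST:
    if ($i >= $n) goto LOOP_DONE;   // conditional jump out of the loop
    $sum += $i;
    $i++;
    goto LOOP_TEST;                 // unconditional jump back to the test
LOOP_DONE:
    return $sum;
}

echo sumWithWhile(5), " ", sumWithGoto(5), "\n"; // 10 10
```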

goto itself isn't much of a necessity in high-level languages because dedicated constructs like conditional statements, loops, etc. are available to the programmer, and these are much easier for humans to work with than goto. Yet goto can still be useful under certain circumstances. For example, some programmers use goto to direct the execution flow to dedicated error-handling logic elsewhere in a program in languages that lack exception handling (such as C).

So is it possible to use goto responsibly in high-level languages, particularly in PHP? I've drawn the conclusion that it is indeed possible:
  • to emulate exception handling in a procedural environment (otherwise good, but not if the language offers real exception handling, which PHP does)

  • to overcome some of the perceived limitations in the try/catch exception handling model often found in an OOP environment (useful and arguably good because the alternatives can lead to fragile code)
Let's start right in by comparing some sample code to decide if using goto to emulate exceptions is responsible use. (Everyone loves code, right?) Here's a function with the error-handling code interspersed throughout it:
// return an array of employees, else -1 on error
function retrieveEmployees()
{
// assume DB_HOSTNAME et al are defined elsewhere as
// constants
$db = mysql_connect(DB_HOSTNAME, DB_USERNAME, DB_PASSWORD);
if ($db === false) {
echo "Unable to connect to database server.";
return -1;
}

$success = mysql_select_db(DB_SCHEMA);
if (!$success) {
mysql_close($db);
echo "Unable to select database.";
return -1;
}

$query = "SELECT id, last_name, first_name FROM employees
ORDER BY last_name ASC, first_name ASC";
$result = mysql_query($query, $db);
if ($result === false) {
mysql_close($db);
echo "Unable to execute query.";
return -1;
}

$employees = array();
while ($row = mysql_fetch_assoc($result)) {
$employees[] = $row;
}

mysql_close($db);
return $employees;
}
Now, here's the same function refactored to use goto. The actual error-handling code has been moved to the end of the function, where it is out of the way of a programmer trying to read through the function's code and to understand its purpose.
function retrieveEmployees()
{
$db = mysql_connect(DB_HOSTNAME, DB_USERNAME, DB_PASSWORD);
if ($db === false) goto CONNECT_ERROR;

$success = mysql_select_db(DB_SCHEMA);
if (!$success) goto SCHEMA_ERROR;

$query = "SELECT id, last_name, first_name FROM employees
ORDER BY last_name ASC, first_name ASC";
$result = mysql_query($query, $db);
if ($result === false) goto QUERY_ERROR;

$employees = array();
while ($row = mysql_fetch_assoc($result)) {
$employees[] = $row;
}

mysql_close($db);
return $employees;

// possible errors
CONNECT_ERROR:
echo "Unable to connect to database server.";
return -1;

SCHEMA_ERROR:
mysql_close($db);
echo "Unable to select database.";
return -1;

QUERY_ERROR:
mysql_close($db);
echo "Unable to execute query.";
return -1;
}
The use of goto in this case makes it arguably easier to follow the logic and understand the purpose of retrieveEmployees() because you no longer have to visually sift through code chaff to find the proverbial wheat.

Incidentally, exceptions have the same goal: move error handling out of the way to make the code easier to understand. PHP has supported exceptions since the nascent days of PHP 5, so let's refactor the code again. This time I'll make use of exceptions.
// extend Exception class
class ConnectErrorException extends Exception { }
class SchemaErrorException extends Exception { }
class QueryErrorException extends Exception { }

function retrieveEmployees()
{
try {
$db = mysql_connect(DB_HOSTNAME, DB_USERNAME,
DB_PASSWORD);
if ($db === false) throw new ConnectErrorException();

$success = mysql_select_db(DB_SCHEMA);
if (!$success) throw new SchemaErrorException();

$query = "SELECT id, last_name, first_name FROM
employees ORDER BY last_name ASC, first_name ASC";
$result = mysql_query($query, $db);
if ($result === false) throw new QueryErrorException();

$employees = array();
while ($row = mysql_fetch_assoc($result)) {
$employees[] = $row;
}

mysql_close($db);
return $employees;
}
catch (ConnectErrorException $e) {
echo "Unable to connect to database server.";
return -1;
}
catch (SchemaErrorException $e) {
mysql_close($db);
echo "Unable to select database.";
return -1;
}
catch (QueryErrorException $e) {
mysql_close($db);
echo "Unable to execute query.";
return -1;
}
}
Comparing the second and third iterations of retrieveEmployees(), one could argue that using goto results in a cleaner syntax than actual exception handling. With goto, the programmer doesn't need to extend the base Exception class, position the code in question within try/catch blocks, and make sure all of the braces are matched correctly. Instead, the programmer only has to provide a target label and the actual goto call.

Without the overhead of instantiating an Exception object, there's also a slight performance boost. The results of a highly (un-)scientific benchmark I ran to compare goto error-handling against exceptions using the above examples showed goto is a little over 4% faster.

Unfortunately, those benefits are meager when you take a closer look and see the drawbacks of using goto. Exceptions allow the programmer to signal that an error occurred, but goto is a direct jump to the error-handling logic. Each label must be unique. The programmer may soon find himself duplicating code and devising creative label names to handle the same type of errors in slightly different ways. Another drawback is that PHP requires the label to be within the same scope as the goto call. This means the following code will generate a fatal error:
function generateError()
{
    goto GENERATED_ERROR;
}

GENERATED_ERROR:
die("Oops!");
Exceptions can offer more flexibility since the thrown exception bubbles up the call stack until it is caught by a suitable catch block:
class GeneratedErrorException extends Exception { }

function generateError()
{
    throw new GeneratedErrorException();
}

try {
    generateError();
}
catch (GeneratedErrorException $e) {
    die("Oops!");
}
Exceptions were designed explicitly for the purpose of error-handling. There's only one way to get into the error-handling code of a catch block-- by a throw call. The code that follows a goto label, on the other hand, can be executed as part of the normal flow of execution, or as a backwards redirect. Exceptions have their own detractors, but even so they don't have nearly the bad rap that goto does. Using goto will probably draw the wrath of many anti-goto programmers upon you faster than a speeding photon, while using exceptions will win you job interviews.
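The fall-through risk is easy to demonstrate. In this sketch (the function name process() and the label HANDLE_ERROR are invented for the example), the "error handler" runs even when nothing went wrong, simply because no return statement sits in front of the label.

```php
// Hypothetical example: a forgotten return lets normal execution fall
// through into the code under the error label.
function process($fail)
{
    $log = array();
    if ($fail) goto HANDLE_ERROR;

    $log[] = "work done";
    // no return here, so execution simply continues past the label

HANDLE_ERROR:
    $log[] = "error handler ran";
    return $log;
}

print_r(process(false)); // contains both "work done" and "error handler ran"
```

A catch block cannot be entered this way, which is part of what makes exceptions the safer default.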

The conclusion for my first talking point is that using goto to emulate exceptions is useful, but it's better to use exceptions instead of faking things if the language provides them (which PHP does).

Exceptions are well suited to the event-driven execution model where the execution flow is determined by events (such as the user clicking on a graphical interface component or pressing a key on the computer). The runtime loop waits for something to happen, possibly for an infinite amount of time, and then executes the logic assigned by the programmer to an appropriate action when it detects an event. If there is a problem that will prevent the action from being completed successfully then an exception can handle it, terminate that thread, and return the flow to the main wait-state. Users can fix the problem and try again without the program terminating.

Unfortunately, PHP doesn't enjoy this execution model, and it's been my experience that exceptions are just as easy to abuse when used in batch/imperative models as goto is. All too often a programmer will catch the exception and then simply terminate the program.
class FileAccessException extends Exception { }

$filename = "test.txt";

try {
    $fp = @fopen($filename, "r");
    if ($fp === false) {
        throw new FileAccessException("cannot open $filename");
    }
}
catch (FileAccessException $e) {
    die("Error: " . $e->getMessage());
}
Is this really effective error handling? It might be if the script is processing a batch job or generating a web page. There is nothing the requestor can do in those cases to address whatever it is that caused the error, and the only reasonable course of action is to gracefully terminate the script. But what if the script were running interactively from a command prompt? PHP is primarily used to generate web content, but as more and more people realize its flexibility and the benefits of reducing the number of programming languages across all parts of an application, CLI scripting with PHP is becoming more popular. In that case, a more proper action for the script to take would be to inform its users what the error was and suggest some possible steps they can take to resolve it.
// return whether the script is allowed read access ("r")
// or write access ("w") to a file
function testPermission($filename, $access = "r")
{
    clearstatcache();
    list(,,$mode,,$uid,$gid) = stat($filename);
    $perms = array("u_r" => (bool)($mode & 0400),
                   "u_w" => (bool)($mode & 0200),
                   "g_r" => (bool)($mode & 0040),
                   "g_w" => (bool)($mode & 0020),
                   "o_r" => (bool)($mode & 0004),
                   "o_w" => (bool)($mode & 0002));

    // posix_getpwuid() returns an associative array, so read the
    // "name" key instead of unpacking it with list()
    $owner = posix_getpwuid($uid);
    $user = $owner["name"];
    $group = posix_getgrgid($gid);

    $effective = posix_getpwuid(posix_geteuid());
    $eUser = $effective["name"];
    $isUser = ($user == $eUser);
    $isGroup = in_array($eUser, $group["members"]);
    $isOther = !($isUser || $isGroup);

    return (($isUser && $perms["u_" . $access]) ||
            ($isGroup && $perms["g_" . $access]) ||
            ($isOther && $perms["o_" . $access]));
}

try {
$fp = @fopen($filename, "r");
if ($fp === false) {
throw new FileAccessException();
}
}
catch (FileAccessException $e) {
if (!file_exists($filename)) {
die("$filename does not exist. Please create the " .
"file.\n");
}
else {
if (!testPermission($filename, "r")) {
die("Please check read permissions on " .
"$filename.\n");
}
else {
die("Unknown error attempting to access " .
"$filename.\n");
}
}
}
The code is a bit smarter now about the error and offers users some guidance as to what needs to be done to resolve it, but the program still terminates. Not all exceptions should be fatal. In this case, it would be better for the code to give the users an opportunity to fix the error and then try to re-read the file instead of forcing them to start the program over again.

Whereas in event-driven execution users can simply retry the action, there is no clean way to retry the code that triggered the error procedurally. One possibility is to surround the action in a do/while loop.
// prompt the user whether to retry an action
function promptRetry()
{
do {
echo "Type 'R' to retry or 'Q' to quit: ";
$retry = strtoupper(trim(fread(STDIN, 2)));
if ($retry == "R") {
return true;
}
else if ($retry == "Q") {
return false;
}
else {
echo "Invalid entry. ";
}
}
while (true);
}

do {
$retry = false;
try {
$fp = @fopen($filename, "r");
if ($fp === false) {
throw new FileAccessException();
}
}
catch (FileAccessException $e) {
if (!file_exists($filename)) {
echo "$filename does not exist. Please create " .
"the file.\n";
}
else {
if (!testPermission($filename, "r")) {
echo "Please check read permissions on " .
"$filename.\n";
}
else {
echo "Unknown error attempting to access " .
"$filename.\n";
}
}
if (!promptRetry()) exit();
$retry = true;
}
}
while ($retry);
I don't like how the $retry variable is set within the catch block to trigger the reiteration of the surrounding do/while loop. It seems a bit fragile to set variables within catch blocks in order to influence the behavior of code outside the block. Instead, the example can be refactored to use goto. This eliminates having to keep track of $retry altogether and just redirects the execution flow itself.
TRY_OPEN_FILE:
try {
    $fp = @fopen($filename, "r");
    if ($fp === false) {
        throw new FileAccessException();
    }
}
catch (FileAccessException $e) {
    if (!file_exists($filename)) {
        echo "$filename does not exist. Please create " .
            "the file.\n";
    }
    else if (!testPermission($filename, "r")) {
        echo "Please check read permissions on " .
            "$filename.\n";
    }
    else {
        echo "Unknown error attempting to access " .
            "$filename.\n";
    }
    if (!promptRetry()) exit();
    goto TRY_OPEN_FILE;
}
The conclusion on my second talking point is that goto can be used to overcome some of the perceived limitations of working with exceptions in a batch/procedural execution model. Without the support of an event-driven execution model, and without a dedicated retry-type statement, programmers need to resort to looping constructs. The code that results can be brittle and difficult to maintain over time. goto offers an elegant and succinct alternative.
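For comparison, the looping approach can also be tucked away behind a small helper so the retry flag never leaks into the calling code. This is only a sketch of one alternative and not something from the original discussion; retryUntil() and its two callback parameters are hypothetical names, and it relies on PHP 5.3 closures.

```php
// Hypothetical helper: run $action until it succeeds, asking $shouldRetry
// (which receives the exception) whether to try again after each failure.
function retryUntil($action, $shouldRetry)
{
    while (true) {
        try {
            return $action();
        }
        catch (Exception $e) {
            if (!$shouldRetry($e)) {
                throw $e; // give up and let the caller deal with it
            }
        }
    }
}

// usage: an action that fails twice before succeeding
$attempts = 0;
$result = retryUntil(
    function () use (&$attempts) {
        $attempts++;
        if ($attempts < 3) {
            throw new Exception("attempt $attempts failed");
        }
        return "ok";
    },
    function ($e) {
        return true; // always retry in this example
    });

echo "$result after $attempts attempts\n"; // ok after 3 attempts
```

Whether this reads better than a label and a goto is a matter of taste; it trades the explicit jump for an extra level of indirection.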

Saturday, March 7, 2009

Anonymous Functions and Closures

Of all the new goodies that are promised in PHP 5.3, the two which I think I am the most excited about are anonymous functions and closures.

Anonymous functions are functions that are defined without being bound to a proper name. Typically, anonymous functions are used only a limited number of times and for a specific purpose; you could think of them as "throw-away" functions if you'd like.

Let's consider the following example which illustrates a standard function used as a callback:
function percentVowels_callback($word) {
    $word = strtolower($word);
    $chars = count_chars($word);
    $numVowels = 0;
    foreach (array("a", "e", "i", "o", "u") as $vowel) {
        $numVowels += $chars[ord($vowel)];
    }
    return $numVowels / strlen($word);
}

$animals = array("aardvark", "elephant", "iguana", "orangutan",
    "urchin");
$percentVowels = array_map("percentVowels_callback", $animals);
The array_map() function accepts the name of a function and an array, and produces a new array by applying the callback function to each element of the input array. The callback function I've defined accepts a string and returns the percentage of its characters that are vowels, expressed as a fraction between 0 and 1. For the sake of simplicity, I am only considering the letters A, E, I, O and U to be vowels, even though sometimes Y and rarely even W can be treated as vowels as well.
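To make the result concrete, here is a self-contained sketch that runs the same callback over the five animals and prints each value. The rounding to three places and the output format are additions for illustration only.

```php
<?php
function percentVowels_callback($word) {
    $word = strtolower($word);
    // count_chars() returns an array of counts indexed by byte value
    $chars = count_chars($word);
    $numVowels = 0;
    foreach (array("a", "e", "i", "o", "u") as $vowel) {
        $numVowels += $chars[ord($vowel)];
    }
    return $numVowels / strlen($word);
}

$animals = array("aardvark", "elephant", "iguana", "orangutan",
    "urchin");
$percentVowels = array_map("percentVowels_callback", $animals);

// array_map() keeps the order of the input, so the indexes line up
foreach ($animals as $i => $animal) {
    echo $animal . ": " . round($percentVowels[$i], 3) . "\n";
}
// aardvark: 0.375
// elephant: 0.375
// iguana: 0.667
// orangutan: 0.444
// urchin: 0.333
```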

Since the function is only used once by this array_map() statement, it may make sense to refactor percentVowels_callback() as an anonymous function. Typically the purpose of a function is to eliminate repetitive code and build reusable components... but the purpose of anonymous functions is different. Anonymous functions group together a related set of statements.

Earlier versions of PHP (starting at version 4.0.1) provided limited support for defining anonymous functions with create_function(). Here's an example which shows the call to array_map() with percentVowels_callback() refactored as an anonymous function using create_function().
$percentVowels_callback = create_function('$word', '
$word = strtolower($word);
$chars = count_chars($word);
$numVowels = 0;
foreach (array("a", "e", "i", "o", "u") as $vowel) {
$numVowels += $chars[ord($vowel)];
}
return $numVowels / strlen($word);');

$percentVowels = array_map($percentVowels_callback, $animals);
Depending on your preference and coding style, the callback can be defined inline as well. With a named function, depending on how the application is organized, you may need to jump to a different section of the file or to a different file altogether to inspect the function's body when you are tracing through code, and then find your way back to the calling location afterwards. Defining the callback inline keeps the function's body visible where it is most pertinent.
$percentVowels = array_map(create_function('$word', '
$word = strtolower($word);
$chars = count_chars($word);
$numVowels = 0;
foreach (array("a", "e", "i", "o", "u") as $vowel) {
$numVowels += $chars[ord($vowel)];
}
return $numVowels / strlen($word);'), $animals);
The create_function() function accepts two strings--the first listing the variable names to serve as the anonymous function's arguments, the second containing the code for the function's body--and returns a unique string which can be used to identify the function.

create_function() does have a few drawbacks, though. Because the argument list and function body are provided as strings, you must be careful to make sure certain characters within the string are escaped properly. You need to escape any single-quotation marks that appear if your strings are single-quoted, or you need to escape double-quotation marks and dollar signs if your strings are double-quoted. Moreover, you also lose the benefits of any syntax highlighting your IDE may provide since it highlights the strings as... well, strings! Overall, the approach is cumbersome and clunky.

As of version 5.3, PHP will offer better support for anonymous functions and a new syntax which supports closures. The new syntax for anonymous functions is more similar to the manner in which JavaScript and other event-driven languages define them.
$percentVowels_callback = function($word) {
$word = strtolower($word);
$chars = count_chars($word);
$numVowels = 0;
foreach (array("a", "e", "i", "o", "u") as $vowel) {
$numVowels += $chars[ord($vowel)];
}
return $numVowels / strlen($word);
};

$percentVowels = array_map($percentVowels_callback, $animals);
Gone is the clumsiness of create_function() and its string arguments. The anonymous function's arguments and body are provided as PHP code which can be highlighted correctly by your IDE. The one caveat to the new syntax is that there MUST be a trailing semi-colon after the function's closing brace, as $percentVowels_callback = some value is a regular assignment statement.

The anonymous function can also be defined inline using the new syntax.
$percentVowels = array_map(function ($word) {
$word = strtolower($word);
$chars = count_chars($word);
$numVowels = 0;
foreach (array("a", "e", "i", "o", "u") as $vowel) {
$numVowels += $chars[ord($vowel)];
}
return $numVowels / strlen($word);
}, $animals);
So far I've shown you a rather uninspired example using an anonymous function, so let's expand it a bit and make things more interesting. Suppose the application should reduce the list of percentages to those that are equal to or greater than a user-specified value. For this we can use array_filter().

The array_filter() function accepts an input array and the name of a function, and applies the callback function to each element of the input array. If the callback returns true then the element tested is included in the final output array; otherwise it is left out. We have a slight problem, though-- the documentation for array_filter() shows it only passes one value to the callback (the current array element to be examined). We need access to the user-specified value as well. Closures will allow us to "reach out" of the anonymous function's scope to see the value of $userPercent.
$userPercent = 0.5;
$userList = array_filter($percentVowels,
function($percent) use ($userPercent) {
return ($percent >= $userPercent);
});
Closures allow you controlled access to values in the parent scope of a function. The new syntax introduces the use keyword to specify which variables should be imported. The anonymous function is passed the current element of the $percentVowels array by array_filter(), but it also has access to the value of $userPercent, which is needed for the comparison.
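Because the imported value travels with the function, a closure can also be built in one place and handed to array_filter() somewhere else. The sketch below assumes a hypothetical helper named atLeast() and hard-codes rounded versions of the five values the array_map() call above produces; neither is part of the original example.

```php
<?php
// Rounded values for aardvark, elephant, iguana, orangutan, urchin.
$percentVowels = array(0.375, 0.375, 0.667, 0.444, 0.333);

// Hypothetical helper: returns a callback that remembers $min.
function atLeast($min) {
    return function ($percent) use ($min) {
        return ($percent >= $min);
    };
}

$userList = array_filter($percentVowels, atLeast(0.4));

// array_filter() preserves the original keys, so the result is
// array(2 => 0.667, 3 => 0.444)
print_r($userList);
```

The preserved keys are handy here, since they still point back at the matching entries in $animals.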

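Closures in PHP import a variable from the parent scope by value, the same as if it were passed as an argument to the function. The value is copied when the closure is defined, so manipulating the variable inside the function has no effect outside of it. The exception is a variable imported by reference, which is done by prefixing it with an ampersand in the use clause.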
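The difference between the two kinds of import is easiest to see with a counter. This is a small sketch with made-up variable names, not code from the application above.

```php
<?php
$count = 0;

// Imported by value: each call works on a private copy of the
// value $count had when the closure was defined.
$incrementCopy = function () use ($count) {
    $count++;
    return $count;
};

// Imported by reference: the closure shares $count with the
// parent scope.
$incrementShared = function () use (&$count) {
    $count++;
    return $count;
};

var_dump($incrementCopy());    // int(1)
var_dump($incrementCopy());    // int(1) again, the copy starts over
var_dump($count);              // int(0), parent scope untouched

var_dump($incrementShared());  // int(1)
var_dump($incrementShared());  // int(2)
var_dump($count);              // int(2), parent scope changed
```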

However, it is important to keep in mind that a closure is not the same thing as using global. global references variables in the global scope of the script. Closures, on the other hand, can only bind to variables in the parent scope of a function, which makes them considerably safer than using global. I'll leave you with this final example which highlights the difference between them:
$x = 42;

function foo() {
    function fizz() {
        // global reaches all the way out to the script's global scope
        global $x;
        echo $x;    // prints 42
    }
    fizz();
    bar();
}

function bar() {
    // use can only bind to bar()'s own scope, where $x is not defined
    $bizz = function () use ($x) { echo $x; };
    $bizz();    // PHP complains about an undefined variable; nothing is printed
}

foo();
For more information on anonymous functions and closures, check out these pages: