Sunday, June 28, 2009

Currying in PHP

What happens if you don't have all the arguments handy for a function, but you want to give whatever arguments you do have now and then provide the rest of them to the function later? This is called currying, and is a core concept in functional programming. It's messy, but possible to curry functions in PHP now that closures have been added.

First, let me show you how currying looks in a functional language. Here's a basic example in OCaml/F#:
let do_math op x y =
match op with
'+' -> x + y
| '-' -> x – y
| _ -> failwith "Invalid op"

let add = do_math '+'

let inc = add 1
let dec = add (-1)
;;
A function named do_math is defined that accepts an operator and two operands. The function's return value will be either the sum or difference of the operands, depending on whether the given operator is + or -. Notice how do_math is then called with a single argument. OCaml doesn't raise an error; it simply returns a function that "remembers" the first argument and accepts the remaining two arguments later (this is an over-simplified and slightly inaccurate statement, but a good enough description for our purpose here). This intermediate function can be used elsewhere, as in the bindings for inc and dec.

Now here's a version of the do_math() function in PHP:
function do_math($op, $x, $y) {
switch ($op) {
case '+':
return $x + $y;

case '-':
return $x - $y;

default:
throw new Exception("Invalid op");
}
}
Unfortunately, PHP will throw warnings if you call do_math() without the three arguments it expects.

Warning: Missing argument 2 for do_math(), called in /home/tboronczyk/curry.php on line 16 and defined in /home/tboronczyk/curry.php on line 2

Warning: Missing argument 3 for do_math(), called in /home/tboronczyk/curry.php on line 16 and defined in /home/tboronczyk/curry.php on line 2


Whereas functional languages have currying "built-in," you must explicitly code this ability in an imperative language. Doing so in PHP requires the use of closures:
function do_math($op) {
return function ($x) use ($op) {
return function ($y) use ($op, $x) {
switch ($op) {
case "+":
return $x + $y;

case "-":
return $x - $y;

default:
throw new Exception("Invalid op");
}
};
};
}
It's also possible to extend the function using func_num_args() and func_get_arg() functions, anonymous functions, and closures, so that any number of parameters can be given at a time. This is a more general form of currying known as "partial functions".
function do_math() {
if (func_num_args() >= 1) $op = func_get_arg(0);
if (func_num_args() >= 2) $x = func_get_arg(1);
if (func_num_args() == 3) $y = func_get_arg(2);

switch (func_num_args()) {
case 1:
return function () use ($op) {
if (func_num_args() >= 1) $x = func_get_arg(0);
if (func_num_args() == 2) $y = func_get_arg(1);

switch (func_num_args()) {
case 1:
return function ($y) use ($op, $x) {
return do_math($op, $x, $y);
};

case 2:
return do_math($op, $x, $y);

default:
trigger_error(
"invalid argument count",
E_USER_WARNING);
}
};

case 2:
return function ($y) use ($op, $x) {
return do_math($op, $x, $y);
};

case 3:
switch ($op) {
case "+":
return $x + $y;

case "-":
return $x - $y;

default:
throw new Exception("Invalid op");
}

default:
trigger_error("invalid argument count",
E_USER_WARNING);
}
}
It's messy... but it works! Now you are able to pass one or two arguments to do_math(), capture the intermediate function that's returned, and pass the remaining argument(s) later.
$add = do_math("+");

$inc = $add(1);
$dec = $add(-1);

echo do_math("-", 3, 2);
echo do_math("+", 1, 1);
echo $inc(2);
echo $add(2, 2);
echo $dec(6);
echo $add($inc(4), $dec(2));
The switch statements are rather unmanageable and the spaghettification of code grows exponentially with the addition of each argument. This pattern is straight forward, though. You may want to consider writing a code generator to handle the dirty work of retrofitting a function to one capable of being curried rather than writing them all by hand. Of course, if you know of a better way to curry functions in PHP then let me know by leaving a comment!

Update 06/29/09: Someone asked me what the "real-world use" for all this would be. Currying is used all the time in functional programming, but the hassle of explicitly enabling the behavior in PHP makes that a valid question. My motivation was just to see if it were possible and share my results. Indeed it is. Functions can be curried in any language that supports closures. But for those who want something a little more concrete, let's consider callback functions.

In a previous post I gave the following example to illustrate the use of closures:
$userPercent = 0.5;
$userList = array_filter($percentVowels,
function($percent) use ($userPercent) {
return ($percent >= $userPercent);
});
It showed an anonymous function being used with array_filter() to filter an array. The array is filtered based on a dynamic value, and a closure is used to "inject" the threshold rather than using a global statement. The same could also be accomplished with currying.

The problem is array_filter() expects a callback function that accepts one argument--the current array element. Currying will allow us to prepare the function with the sorting threshold, and the intermediate function can be used as the callback.
function callback($userPercent) {
return function($percent) use ($userPercent) {
return ($percent >= $userPercent);
};
}
$userList = array_filter($percentVowels, callback(0.5));

Sunday, June 14, 2009

Kember Identity

Ever wonder if there is an MD5 hash the same as the original input? Nope, me neither. But Mr. Kember does and he's asking the world to help him find out if such a thing exists. There's no fame if you find it for him (he's humbly named it the "Kember Identity" already)—but you might make a little cash. Check out his web page for the details. Go ahead and enter his contest if you're feeling gullible lucky!

The MD5 algorithm returns a fixed-length 128-bit hash, so there are 2128 possible values. The hash is typically expressed as a series of 32 hexadecimal values. Since the input string and its hash must be the same to reflect the Kember Identity, you wouldn't need to test random strings like "ruby on rails rots your brain"; you only need to test strings that are 32-characters long and contain the numbers 0 though 9 and letters a through f like 8d112b3c68248c12f178188c1b921ec1.

Kember suggests testing values at random because the range of candidates is so large (2128 is 34,028,236,692,093,846,346,337,460,743,177). Unfortunately, there're a few problems with this approach:It actually takes less time to test all values sequentially than through random-selection.

Additionally, one has to consider the possibility that such a value doesn't exist. The odds of finding the Kember Identity are actually quite small: 1/((2128!)/( 2128!)(1-2128)!). So how would you know when all possible values have been tested proving the Kember identity doesn't exist if the values are tested randomly? You don't.

The only reliable way to programmatically identify whether the Kember Identity exists and what hashes exhibit it is to test each hashes sequentially and record the results.

The whole thing might not bother me if money wasn't involved. Just send Mr. Kember your $5 entry fee and you're eligible to win the prize pot if your script is first to find the magical hash! But I have a few questions:
  • How do I contact Mr. Kember to receive my prize when I find a hash that exhibits the Kember Identity?

  • What happens to my $5 and the rest of the prize money if it is proven the Identity doesn't exist?

  • At 60-million hashes an hour, it would take over 646,987,670,262,051,588,140,743 millennia to verify them all. How long does Mr. Kember plan on holding on to the prize money?
While it might not be a scam (it says explicitly that it's not a scam somewhere on the irrationally highlighted contest page), it isn't well thought out.

Thursday, June 11, 2009

What's Wrong with OOP

Proponents of Object Oriented Programming feel the paradigm yields code that is better organized, easier to understand and maintain, and reusable. They view procedural programming code as unwieldy spaghetti and embrace OO-centric design patterns as the "right way" to do things. They argue objects are easier to grasp because they model how we view the world. If the popularity of languages like Java and C# is any indication, they may be right. But after almost 20 years of OOP in the mainstream, there's still a large portion of programmers who resist it. If objects truly model the way people think of things in the real world, then why do people have a hard time understanding and working in OOP?

I suspect the problem might be the focus on objects instead of actions. If I may quote from Steve Yegge's Execution in the Kingdom of Nouns:

Verbs in Javaland are responsible for all the work, but as they are held in contempt by all, no Verb is ever permitted to wander about freely. If a Verb is to be seen in public at all, it must be escorted at all times by a Noun.

OOP focuses primarily on the object and expresses actions in terms of the object's abilities. A airplane object can be flown (Airplane.fly()). A door object can be opened (Door.open()). But we really don't view the world in terms of objects and what actions can be done to them. It's backwards. We view the world in terms of ourselves and our abilities. We are the ultimate object. (And no, I don't mean a God object.)

Imagine you are returning from a trip to the local flower garden. Would you say "I smelled the flowers" or "The flowers were smelled by me?" Now you want to tell a friend to go and smell the flowers. Would you say "Go and smell the flowers" or "The flowers must be smelled by you?" When we convey instructions, we give them in terms relative to ourselves. What is programming but conveying instructions to a computer process how some sort of work should be done on our behalf.

How to make a Peanut Butter Sandwich:
  • Get Jar of Peanut Butter
  • Get Loaf of Bread
  • Get Knife
  • ...
All of these things (jar of peanut butter, bread, and knife) can be thought of as objects.
class PeanutButterJar extends Jar ...
Express them as such and they take on methods.
PeanutButterJar.open()
Knife.spread(PeanutButter, Bread)
Whoa! In my world, peanut butter doesn't do anything but sit there and taste yummy. And I'd start looking for a good exorcist the moment a knife starts performing actions all by by itself. A more realistic transcription of the instructions would be:
You.open(PeanutButterJar)
You.spread(Knife, PeanutButter, Bread)
Specifying You as a universal object would seem rather redundant as the scenario progressed, so a good language designer would remove the need to expressly identify it. This would yield
open(PeanutButterJar)
spread(Knife, PeanutButter, Bread)
which starts to look vaguely procedural.

The truth is, procedural and OOP paradigms languages express the complexity of their problem space in different ways. Procedural code is flat and wide with functions. OO code is hierarchical with inheritance. OO-code is not inherently better organized than procedural code merely because it is encapsulated in objects. A reusable library can consist of functions just as easily as a collection of objects.

The way some OOP languages (like Java and C#) force objects on the programmer borders on the absurd. If I'm writing a library of reusable code that needs to maintain its own state, then of course writing classes with proper encapsulation and dating hiding makes sense. On the other hand, If I'm generating a web page with some data stored in a database, then some procedural code and a handful of function calls makes more sense. One of the things I like about PHP so much is that it allows the programmer to decide which paradigm is best suited for the task at hand.

Sadly though, that decision isn't left to the programmer who has been tasked with developing and maintaining a system. Management can tend to focus too much on buzzword compliance. A procedural programming language designed today would never receive wide-spread adoption if it didn't offer some sort of OO construct despite both paradigms having produced successful libraries and applications. And the programmers that don't learn to think beyond themselves will be unfairly left in the dust.