Monday, February 6, 2012

Relative Date Ranges: Current Week of Prior Year

A handy feature in reporting applications is the ability to query data using relative date ranges. A relative date range is nothing more than a predefined start and end time offset some manner from the current date/time, but the specifics are hidden from the end-user behind a human readable label. For example, if you had a database full of access logs and wanted to query for a list of login failures that happened yesterday you could write:

<?php
$yesterday = sprintf("BETWEEN '%s 00:00:00' AND '%1\$s 23:59:59'",
    date("Y-m-d", strtotime("-1 day")));
$query = "SELECT username, tstamp, ip_address FROM failed_logins
    WHERE tstamp $yesterday";

The user could select "Yesterday" from a list, and the code would dynamically build the query accordingly.

Alternatively, you could write it entirely in SQL using MySQL's date and time handling functions. It looks a bit messier, but is just as effective:

SELECT ... BETWEEN
    DATE_FORMAT(DATE_SUB(NOW(), INTERVAL 1 DAY), '%Y-%m-%d 00:00:00')
    AND DATE_FORMAT(DATE_SUB(NOW(), INTERVAL 1 DAY), '%Y-%m-%d 23:59:59');

Recently I was asked to formulate a set of relative date range queries in SQL, and not all of the required ranges were as easy as "Yesterday." One such range was "Current Week of Prior Year". Before you think "oh that's easy!", keep in mind it has to determined entirely in SQL.

So the task here is to identify the week number in which the current date falls, and then return the start and end dates for that week of the previous year. The underlying assumptions are:

  1. a week is defined as having a Thursday
  2. the start of a week is Monday

The assumptions comes from the ISO-8601 definition and as a consequence week 1 of 2011 is Jan 3 - Jan 9 (Jan 1 and 2 are actually in week 52 of 2010).

For the sake of argument (and example), let's say today is Jan 23, 2012. According to the numbering scheme as understood with the above assumptions, Jan 23 falls within week 4 of 2012. This can be confirmed with WEEK('2012-01-23', 3) = 4 in MySQL. The target thus is to select the start and end dates of week 4 of the previous year; the results would be Jan 24 - Jan 30 of 2011.

Complicating this further, there was the requirement to adjust the starting day of the week (a modification of assumption 2). Continuing to use week 4 of 2011 (Jan 24 - Jan 30, Mon - Sun) as the example, if the user says the start of the week is Wednesday, then I would need to adjust the dates by 2 days giving Jan 26 - Feb 1. Thursday adjusts by 3 giving Jan 27 - Feb 2. etc.

The formula now has two parts:

  1. calculate the start and end of the *true* year week
  2. slide it into the "future" by whatever the start day would be

While the sliding into the future may or may not be correct, it would at least yield consistent results based on the conventional understanding of what "week 4" means.

I worked through deriving the formula step by step, starting with the Jan 23, 2012 example and comparing my results as I went along with the calendar at whatweekisit.com.

I first figured out the ability to get the start and end dates for week 4 of 2012 (Jan 23 to Jan 29), which is calculated with the following:

SELECT
    WEEK('2012-01-23', 3) AS weekNumber,
    DATE_SUB('2012-01-23', INTERVAL DAYOFWEEK('2012-01-23') - 2 DAY) AS startOfWeek,
    DATE_ADD('2012-01-23', INTERVAL 8 - DAYOFWEEK('2012-01-23') DAY) as endOfWeek;

Week 4 of 2011 is Jan 24 to Jan 30 (also confirmable by calendar). While it is true that the same date may not fall within the current week number and last year's week, I don't suspect they would be wildly different so a simple IF() that adjusts the calculation by an extra week seems sufficient.

SELECT
    IF (WEEK('2012-01-23', 3) = WEEK(DATE_SUB('2012-01-23', INTERVAL 1 YEAR), 3),
        DATE_SUB(DATE_SUB('2012-01-23', INTERVAL 1 YEAR),
            INTERVAL DAYOFWEEK(DATE_SUB('2012-01-23', INTERVAL 1 YEAR)) - 2 DAY),
        DATE_SUB(DATE_SUB(DATE_ADD('2012-01-23', INTERVAL 6 DAY), INTERVAL 1 YEAR),
            INTERVAL DAYOFWEEK(DATE_SUB(DATE_ADD('2012-01-23', INTERVAL 6 DAY),
            INTERVAL 1 YEAR)) - 2 DAY)) AS datetimeStart;

Now that the start and end of the target year week has been identified, I was able to replace the hard-coded dates with NOW() and apply the offset. Given `@weekDayStart` is 0 - 6 (Sun - Sat):

SELECT
    DATE_ADD(
        IF (WEEK(NOW(), 3) = WEEK(DATE_SUB(NOW(), INTERVAL 1 YEAR), 3),
            DATE_SUB(DATE_SUB(NOW(), INTERVAL 1 YEAR),
                INTERVAL DAYOFWEEK(DATE_SUB(NOW(), INTERVAL 1 YEAR)) - 2 DAY),
            DATE_SUB(DATE_SUB(DATE_ADD(NOW(), INTERVAL 6 DAY), INTERVAL 1 YEAR), 
                INTERVAL DAYOFWEEK(DATE_SUB(DATE_ADD(NOW(), INTERVAL 6 DAY), 
                INTERVAL 1 YEAR)) - 2 DAY)),
        INTERVAL @weekDayStart - 1 DAY) AS datetimeStart,

    DATE_ADD(
        IF (WEEK(NOW(), 3) = WEEK(DATE_SUB(NOW(), INTERVAL 1 YEAR), 3),
            DATE_SUB(DATE_SUB(NOW(), INTERVAL 1 YEAR), 
                INTERVAL DAYOFWEEK(DATE_SUB(NOW(), INTERVAL 1 YEAR)) - 2 DAY),
            DATE_SUB(DATE_SUB(DATE_ADD(NOW(), INTERVAL 6 DAY), INTERVAL 1 YEAR),
                INTERVAL DAYOFWEEK(DATE_SUB(DATE_ADD(NOW(), INTERVAL 6 DAY),
                INTERVAL 1 YEAR)) - 2 DAY)), 
        INTERVAL @weekDayStart + 5 DAY) AS datetimeEnd;
Working it through with the input of 2012-01-23 as NOW() and start day of Thursday (4) would yield:
  1. **Jan 23 2012** = week 4
  2. Week 4 of 2011 = 1/24-1/30
  3. Offset Thu - Mon (4 - 1) is +3 which slides the window to **Jan 27 - Feb 2 2011**

It took me a good day to figure out, I cursed the requirements constantly, and even a friend tried to help out by posing the question on Stackoverflow which just ended up being a conversation between ourselves anyway. But in the end I think I came up with an ugly but workable solution. If you know of a better way, feel free to mention it in the comments section below!

Monday, January 30, 2012

Esperanto Accented Characters in Ubuntu

I assure you it's unintentional, but it seems that every 6 months or so I end up dabbling in languages: first it was Prolog in July, 2010; then it was Go in December, 2010; and last June it was Smalltalk. This time around it would seem I'm leaning towards human languages, choosing to brush up on my Esperanto.

Today I got fed up with typing with the x-method, the practice of following letters that would be accented with an X since the accented characters aren't on the typical keyboard. For example, the word "ankaŭ" would be typed as "ankaux." This is the 21st century, though, and there had to be some easy way enter properly-accented characters.

Believe it or not, there is an Esperanto keymap, but I didn't feel like going that extreme since it would make entering other characters that I type on a day-to-day basis more difficult. (I wouldn't want to do anything to jeopardize my 90wpm typing!)

Instead I tracked down how to augment my English (US) keymap with the extra functionality I needed, and it was easier than I had expected it to be. So if you want to set up your keyboard to type Esperanto accented characters, here's the steps.

First, find the Keyboard Layout applet in Ubuntu/Gnome's System Settings window.

Then, select the keymap you want to modify (here there's only one) and click the Options button in the bottom-right corner of the screen.

The Keyboard Layout Options window that opens has a list of options that you can use to fine-tune the behavior of the keymap. The two options of interest are:

  • Adding Esperanto circumflexes (supersigno) – To the corresponding key in a Qwerty keyboard
  • Keys to choose 3rd level – Right Alt

Each key can be thought of having multiple levels. For example, the first level of the C key would be a lower-case "c". The second level would an upper-case "C" (the second level is accessible the Shift key as a modifier). The supersigno option maps the accented characters to the third and fourth levels of the keys of their respective base glyphs. That is, the third level of the C key is now "ĉ” and the fourth is "Ĉ".

Just as a modifier key is needed to access the second level (Shift), a modifier is also used to access the higher levels. In my case I set this as the Right Alt key, a key traditionally used for this purpose.

I want to give a great big shout out to Maxx Solomon in whose blog I found the original directions. His method was much easier to follow and configure then all the stale X.Org setting tutorials I found from the late 90's. I figured I'd do my own write up as well to increase the changes of others finding it when they search, and because I was feeling guilty about not writing anything in a couple months in my blog. :) Feel free to stop over to his blog and say hi; I'm sure he wouldn't mind the extra traffic.

Wednesday, October 26, 2011

Please God, Give Me Something New!

"Here... check out this (link). But you have to use the most recent version of Chrome." Sigh. Haven't we been through this already? 15 years have passed and we're headed right back to the same place we started... "Best viewed in <someone's favorite browser>." It's sad, really.

Don't get me wrong. Years of standardization work on HTML, JavaScript, the DOM, CSS, etc. cleaned up a lot of messy lose ends and yes it was indeed necessary, but stringent standardization also stifles creativity. And now that HTML5 and friends have loosened some of the restrictions the pendulum has started to swing back in the opposite direction. People have started to innovate again. This time around it's Firefox vs Chrome instead of Internet Explorer vs Netscape.

There's a systemic problem that goes beyond browser wars, however. Remember AOL? This time it's Facebook trying to be “The Internet.” Remember mainframes and terminals? After pushing everything to the desktop, now we're pushing everything back “to the cloud.” All the fourth and fifth generation programming languages have come and gone and the best we have now is Java and Clojure?! Remember Ajax, er I mean DHTML, er I mean JavaScript? We're all "innovating" but nobody is really doing anything new, exciting, and unique.

Hell, Tesla plugged 200 light bulbs into the ground and lit them up 25-miles away from a power source in 1899 and the best we have now is Solyndra and the Prius? What gives?!

The sine-like cycle of technological advancements wouldn't be bad if each cycle actually gave us something new; you know, an advancement. Maybe that's what has me so jaded. Each cycle seems to rehash the previous cycle and there's nothing really new and exciting anymore. We're moving in circles, not traveling in spirals. The more things change, the more they stay the same. Is Quindlen right, and every story has already been told?

Think about HTML5's canvas element which everyone is saying how great and wonderful it is, an area for 2D drawing that can be manipulated with JavaScript. If browsers had actually implemented decent support for SVG 10 years ago we wouldn't need canvas now. That's right, we've had this “hot new technology” for 10 years already. And sadly, SVG is superior in many ways. And sadly, we've had 3D capability with VRML/X3D since the mid-90s. And sadly, we've settled for something less and are grinning from ear to ear.

Wouldn't you like to have technology from 10 years into the future and have it now? It sounds tempting, but I'm not so sure I really would... because it'd probably be the same as what I've already had for the past 10 years just with a new marketing campaign.

Sunday, September 18, 2011

Top-10 PHP String Functions

By day I work as a programmer at ShoreGroup, Inc. By night I'm a freelance developer and now the managing editor for SitePoint's latest site, PHPMaster.com. Helping out with the site has been pretty fun so far; my Australian counterparts are all pretty cool, and I've met some really great new authors too. If you haven't visited yet, take a moment and check out PHPMaster.com (there's still some wrinkles to iron out on the site, but we're working to identify and fix them all as soon as we can).

Part of my duties as a managing editor include working with authors to make sure the site's content is well balanced. PHPMaster.com is targeting PHP programmers of all skill levels, so there should be a good mix of basic, beginner, intermediate, and advanced content. Planning for a beginner article that demonstrates basic string handling functions, I wondered which function to highlight. I wanted to show ones that would be most relevant, not necessarily ones that were my favorite, so I decided to do some static analysis of popular open-source projects to find out which string functions were used the most. The results were surprising, so I thought I'd share my "research."

I used the source of a closed-source PHP project that I have access to and the following open-source (or open-source-ish) projects as code samples for the analysis:

Then I ran the following PHP to tally the functions:

#! /usr/bin/env php
<?php
if ($_SERVER["argc"] != 4) {
    $script = basename(__FILE__);
    fprintf(STDERR, "usage: %s directory max exts\n", $script);
    fprintf(STDERR, "\tdirectory - directory to start traversal\n");
    fprintf(STDERR, "\tmax - maximum number of results to return\n");
    fprintf(STDERR, "\texts - comma-separated list of file extensions\n");
    fprintf(STDERR, "example: %s /var/www 20 php,inc\n", $script);
    exit(1);
}
// no error-checking... don't be stupid
$directory = $_SERVER["argv"][1];
$max = $_SERVER["argv"][2];
$extsRegex = "/(" . str_replace(",", "|", $_SERVER["argv"][3]) . ')$/';

$dirIter = new RecursiveDirectoryIterator($directory);
$recIter = new RecursiveIteratorIterator($dirIter);
$iter = new RegexIterator($recIter, $extsRegex);

$funcs = array();
foreach ($iter as $file) {
    $tokens = token_get_all(file_get_contents($file));
    foreach ($tokens as $t) {
        if (is_array($t) && $t[0] == T_STRING && function_exists($t[1])) {
            if (!isset($funcs[$t[1]])) {
                $funcs[$t[1]] = 0;
            }
            $funcs[$t[1]]++;
        }
    }
}
arsort($funcs);

$max = min(count($funcs), $max);
if ($max) {
    list($funcs) = array_chunk($funcs, $max, true);
}
print_r($funcs);

I took the resulting list of functions and extracted the string-specific ones to come up with this top-10 list (sorted in decreasing order of most-used):

  1. substr() - 6,605
  2. sprintf() - 5,604
  3. implode()/join() - 4,829
  4. strlen() - 4,557
  5. chr() - 4,122
  6. str_replace() - 4,009
  7. explode() - 3,401
  8. strpos() - 3,238
  9. htmlspecialchars() - 3,171
  10. trim() - 2,998

I expected functions like substr() and trim() to be on the list, but chr() was a surprise. Before this I probably would have laughed at you if you told me chr() is used almost twice as much as strtolower() (which came in 12th place with 2,267). Interesting results indeed!

Sunday, July 10, 2011

Avoid Fetch-Object Abuse

Lately I'm finding a lot of instances of the mysql_fetch_object() function being used in a particular codebase I help maintain. Unfortunately, I've yet to see it used correctly. It always seems to be used to retrieve a stdClass object from a query result where mysql_fetch_array() or mysql_fetch_assoc() would be the more appropriate choice.
$row = mysql_fetch_object($result);
$kitten = new Kitten();
$kitten->setName($row->name);
$kitten->setColor($row->color);
...
Put aside the argument that the code should be using PDO or the MySQLi extension instead of the legacy MySQL extension. mysqli_result::fetch_object() and PDOStatement::fetchObject() have the same potential for abuse. The above code is wrong because the returned result is an object but treated like it's an array.

The mysql_fetch_array() and mysql_fetch_assoc() functions are used to retrieve a row from the query's result set. I prefer to use mysql_fetch_assoc() so I can access the values using column names.
  • mysql_fetch_array() - return an array corresponding to the fetched row and move the internal data pointer to the next. The array is indexed numerically if MYSQL_NUM is specified, associatively if MYSQL_ASSOC is specified, or both by default or if MYSQL_BOTH is specified.
  • mysql_fetch_assoc() - return an associative array corresponding to the fetched row and move the internal data pointer to the next. It is the same as calling mysql_fetch_array() using MYSQL_ASSOC.
mysql_fetch_object() on the other hand was designed to create and populate an instance of an object using the row of data.
$kitten = mysql_fetch_object($result, "Kitten");
If mysql_fetch_object() is called without the second argument, the name of the object type to create, then a stdClass object is returned that has public properties named after the result columns to expose the data. There isn't any benefit to this over an array. It is just an object for the sake of having an object, and not a very useful object at that. Incidentally, an array can always be casted to a stdObject later if, for some odd reason, it's needed:
$obj = (object)$array;
An object is a data structure that encapsulates variables to maintain state and related functions to manipulate the state. A stdClass data object just collects values and then exposes them as public properties. It does not observe proper encapsulation, there is no state, and there are no methods to interact with the object. It is no more than an array that uses object-syntax. My rule of thumb is: if you call mysql_fetch_object() without specifying a class name, you're "doing it wrong."

A related point worth mentioning is even a good programmer may run into some difficulty when trying to use mysql_fetch_object() correctly since the function is intrinsically broken (though some will argue differently). The function first creates an instance of the specified class, then populates its internal variables, and lastly invokes the constructor. The purpose of a constructor is to initialize the object's values and resources, but since the constructor is called last the values set from the result row may be overwritten. The programmer must take this into account and guard against it if the objects might be instantiated by both mysql_fetch_object() and new.
class Kitten
{
    private $name;
    private $color;
    ...
    public function __construct() {
        if (!isset($this->name)) {
            $this->name = "Unknown";
        }
        ...
If you're like me and have a large codebase with the misuse of mysql_fetch_object() deeply entangled throughout, it may not be practical to find and fix each instance. The best advice I can offer is to educate yourself and others how the function should be used so its abuse isn't perpetuated. Then, be cautious when using mysql_fetch_object() correctly and understand the process it follows to create and return an object. If not for yourself, then do it for the kittens.

Monday, July 4, 2011

Smalltalk Challenge: Post 10 - Conclusions

So here I am, more than 20 hours with Smalltalk and 10 blog posts about my experiences as required by the terms of my challenge. Josh said he chose Smalltalk for me because of the language's history of helping giving rise to object-oriented programming, which he thinks I hate. (For the record I don't hate OO; I hate the awkwardness and complexity its over-zealous application inflicts.) But I think secretly he was hoping it would endear me to Java in the same way as I was hoping OCaml would help him see past Java. Smalltalk gave me a newfound respect, not for OO, but for the language's innovations and lasting influence. If anything in relation to current-day OO practices, my experience confirmed how botched OO adoption has been given what it was intended to be. Regardless, while I don't think I'll be using Smalltalk in the foreseeable future, it was definitely fun to explore and gave me a lot of things to think about.

Smalltalk isn't a popular "mainstream" language like Java or PHP, but it was highly influential on many of them. It's a language that hasn't been afraid to try bold, new ideas. Other languages adopted its innovations; some did so successfully, others not so much. Will new innovations from Smalltalk and its community be just as influential in the future? I can't help but think that if Squeak has successfully abandoned the MVC architecture in favor of Morphic, maybe the days of MVC web frameworks such as Struts, Zend Framework, and Rails are numbered.

Message passing is a key concept in Smalltalk and provides the flexibility of duck typing. But late/dynamic binding prohibits many compiler-time optimizations that could otherwise be performed on statically-compiled code. Is there continued benefit to such an approach now that we're inundated with so many dynamic languages that have (or at least seem to emulate reasonably well) direct invocation, and thus better performance, such as Lua and JavaScript? Perhaps a greater benefit could be realized if message passing was handled asynchronously.

In Smalltalk, everything is an object. Even basic language constructs such as if-statements and while loops have been distilled down to objects and message passing. I respect Kay for intentionally taking an extreme position to force new ways of looking at things, though I believe the "everything is a" mindset can lead to awkward perceptions. The human mind is adept at categorizing things into different classifications exactly because not everything is the same. Ruby claims everything is an object too, which is misleading because it still has basic language constructs. I wonder where the sweet spot between abstraction and reality is, and in what ways can programming languages more accurately reflect how we view the world so we don't end up with with shimmed, over-architectured, spaghetti code.

Overall I found the core concepts of Smalltalk pretty easy to learn (since there are so few), and it was fun to write Smalltalk code and explore objects and Morphs in Squeak's environment. I encourage others who may not be familiar with Smalltalk to spend a little time with it as I did. I can't say it made me a better programmer, but in some sense it made me a better person because I now have a better understanding of history and a lot of things to think about.

Sunday, July 3, 2011

Smalltalk Challenge: Post 9 - Koans

Besides tinkering with turtles and hashes in Squeak, I secretly went back to GNU Smalltalk and went through some of the Smalltalk Koans. Sssh... don't tell Josh!

Programming koans are a series of failing unit tests that a student reads through and corrects. Each test demonstrates a particular concept in the language. They can be a fun way to review one's understanding of a language, and occasionally learn something new. Here's an of a koan:
testSingleCharacterFromString [
    | string |

    string := 'Smalltalk'.

    self expect: (self fillMeIn) toEqual: (string at: 1).
    self expect: (self fillMeIn) toEqual: (string at: 6).
]
When the test suite is run, it displays:
Do not lose hope.  Expected value should equal actual value.

Expected : FILL ME IN
Actual   : $S  (an instance of Character)

TestString#testSingleCharacterFromString has damaged your karma
(in src/koans/TestString.st)
The name of the method indicates it is possible access the characters that make up a string. The test shows how the at: message is passed to a string to obtain a character at the given index. The programmer must replace (self fillMeIn) with the correct value which will allow the test to pass and the student to proceed.
self expect: $S toEqual: (string at: 1).
self expect: $t toEqual: (string at: 6).
This demonstrates that in Smalltalk character instances are preceded by a dollar-sign, and indexes start at 1, not 0 as in many other languages.

Going through the koans, I thought the ones from TestString.st, TestMessage.st, and TestDictionary.st were exceptionally good. My favorite was this one from TestMessage.st, which demonstrates an unintuitive edge-case resulting from Smalltalk's everything-is-an-object and message passing philosophies.
testMessageCascading [
    | value |

    value := 3 + 2; * 100.  "';' separates each message sent to '3'"

    self expect: (self fillMeIn) toEqual: value.

    "Think about it: we are sending multiple messages to '3'."
]
The correct answer is 300. Pretty evil, eh?