Wednesday, December 3, 2014

PHP Frameworks Don't Save Time

Experience has shown me frameworks can be useful for maintaining structure in a large code base developed by multiple teams. Every developer has different abilities and a framework enforces structure and consistency throughout the code. But I've not experienced saving any substantial amount of time on a PHP project because of a framework.

The other day someone posted in the PHP subreddit asking for advice. He was about to begin work on a small project and wanted to know whether he should use a framework, and if so then which framework would be appropriate. I should have known better than to offer my two cents but I did anyway.

Slim + NotORM + Twig is nice. If it's a simple project, you probably don't need much more than that. I'm not a fan of frameworks in the slightest but I do enjoy the aforementioned combination. They're lightweight and stay out of my way, allowing me to write my functionality.

Another redditor picked up on my distaste for frameworks and asked:

So you're okay with being slower than someone with your same basic skill set? Serious question...

A serious question deserves a serious answer and so I replied, attempting to explain that developer skill sets are not always the same and that the differences in how we each might approach a problem have a greater effect on development time. If you like you can read my original response in the post's comments thread. Otherwise, here's a more refined presentation of my argument.

With regard to skill set, I'm a PHP programmer who has been coding in pure PHP for the better part of 13 years. I have an intimate relationship with the language and can probably write PHP code in my sleep. But as soon as a framework is introduced, I'm faced with a learning curve. Frustration inevitably ensues because simple things suddenly seem difficult, either because I'm unfamiliar with the new API or because I have to follow the framework's particular philosophy.

Many of my peers use frameworks, both co-workers and friends in the community. They've taken the time to learn the ins and outs of a given framework and probably can code in their sleep with it just as I do with pure PHP. But what happens when the need arises to go outside the bounds of the framework and they need to write something raw? That's when they confront their learning curve and have to dig into PHP's documentation.

We obviously don't all share the same basic skill set. Yes, we're all working in PHP, but my peers are experienced with a framework and I'm experienced with the nuances of the language itself. They're as fast writing their framework-based code as I am writing PHP; they're as slow writing pure PHP code as I am working with a framework.

But even if everything were equal on the skill side of the equation, there's still a human variable. Sharing exactly the same skills as someone else doesn't mean you'll share the same way of thinking about things or the same approach to solving a problem. Remember, there's more to programming than writing code; a large amount of time is spent simply on thinking about how to solve a problem. I can spend 6 hours planning and 2 hours coding, and a coworker can spend 7 hours planning and 1 hour coding, and although the coworker was technically faster at writing code, neither of us was actually more or less productive than the other. We both put the same 8 hours into the problem.

It's also worth noting how horribly fragmented the PHP ecosystem is. The world of a PHP programmer is not like the world of a Python programmer where the community has largely settled on Django, or the world of a C# programmer where there's the .NET framework. Knowledge of Django and .NET is transferable across most Python and C# projects. But with PHP, a developer can learn ZF2, another developer can learn Yii, another may learn Laravel, and still another would learn Symfony... and little of the knowledge and experience they gain is transferable if the next project doesn't use their preferred framework. We face a potential learning curve before we even make our first keystroke on any project, and that takes time.

Promoting framework adoption is fine but I simply don't believe the time element is the proper argument for it. I probably wasn't as clear as I could have been in my initial response, so hopefully this clarifies things. Feel free to use the comments if I'm still just spewing senseless babble!

PS: Thanks to the kind redditor who felt my blathering response was worth Reddit Gold. You rock!

Thursday, May 1, 2014

New Writers Guide now on GitHub

Writing can be a fun and rewarding way to share your knowledge, experience, and opinions with others. Unfortunately, it can also be intimidating or frustrating for some people. When I was managing editor for SitePoint's PHPMaster property, I prepared a guide to help alleviate some of the frustration and self-doubt that new writers (and even experienced writers) might experience.

The guide wasn't something commissioned by SitePoint; I wrote it on my own for my authors. And though it's been about eight months since PHPMaster was absorbed into the main SitePoint site and I stepped down as managing editor, people continue to ask me about it. So, I've decided to make the guide publicly available.

The New Writers Guide offers advice for finding inspiration, structuring an article's content, growing one's self-confidence, and overcoming other challenges that programming writers may face. Hopefully it'll continue to help people write awesome articles and realize the many benefits of writing in their life.

You can find a copy of the guide on GitHub at github.com/tboronczyk/WritersGuide.

Friday, April 25, 2014

Ajax File Uploads with JavaScript's File API

Developers have been using Ajax techniques for years to create dynamic web forms, but handling file uploads using Ajax was always problematic. The crux of the problem was security – it's not a good idea to allow arbitrary code access to any file it wants on a user's system so JavaScript was intentionally restricted in how it could interact with things like file input elements. Uploading a file with JavaScript was essentially a standard form submission that targeted a hidden iframe. It felt dirty but it got the job done.

The W3C began work on standardizing a File API for JavaScript sometime between 2006 and 2009 and we're now at the point with browser support where developers can take advantage of it. Developers supporting web apps on IE8 and 9 still need to use iframes, but those of us targeting newer browsers can finally take a pure JavaScript approach to file uploads. And as more users migrate from IE8/9, the iframe approach will eventually be left in the dustbin.

The interesting things defined by the W3C's File API are:

  • Blob – an object that represents a sequence of bytes and is consumed by FileReader. Its size property lists the size of the sequence in bytes and its type property is a lower-case MIME-type string if such information is available.
  • File – an object that extends Blob and offers additional properties to make the file's metadata available. Its name property holds the filename (no path information) and lastModifiedDate holds a Date object instance set to when the file was last modified.
  • FileReader – an object that reads the byte sequence of a Blob or File object.
  • FileList – an array-like list of File objects, exposed through the files property of file input elements.
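
As a rough illustration of how these objects fit together, the sketch below summarizes each File in a FileList using the properties just described. The names describeFile and describeFiles are made up for this example; they aren't part of the File API.

```javascript
// Hypothetical helpers for illustration (not part of the File API).

// Summarize one File using its name, size, and type properties.
function describeFile(file) {
    // type is an empty string when the browser can't determine it
    var type = file.type || "unknown type";
    return file.name + " (" + file.size + " bytes, " + type + ")";
}

// FileList is array-like: it has a length and numeric indexes,
// so a plain for loop is enough to walk it.
function describeFiles(fileList) {
    var lines = [];
    for (var i = 0; i < fileList.length; i++) {
        lines.push(describeFile(fileList[i]));
    }
    return lines;
}
```

In a browser this might be called with the files property of a file input element once the user has chosen something.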

The API is designed so that byte sequences are loaded asynchronously by default. This makes sense since there are several things that can cause the read process to take a while to complete: it might be a large file, the file might be on a mounted network share, etc. Reading files asynchronously ensures the main execution thread is free and the browser doesn't lock up.

So what does a basic upload look like using the API? At a high level, the steps are:

  1. Provide a file input for the user.
  2. When the user sets a file, retrieve its File object from the input's files property.
  3. Create a FileReader instance and register a callback for its onload event. This callback will have access to the read data.
  4. Initiate the read process with the FileReader methods readAsText() or readAsDataURL().

I like to use readAsDataURL() to initiate the read process, especially for binary files like images and PDFs, since the data will be base64 encoded. The ASCII URI string can then be safely sent to the server just like any other string.

I also recommend using POST for the HTTP method; yes, the encoded contents are a data URI, which can be used in a GET parameter, but doing so increases the risk of getting an HTTP/414 error because of the resulting size of the request. Base64 encodes binary content to safe ASCII, which increases the data's size by roughly 33% (every 3 bytes of input become 4 characters of output).
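
The base64 overhead can be worked out directly, since the encoding turns every group of 3 bytes into 4 ASCII characters. The following is only a back-of-the-envelope sketch; base64Length is a hypothetical helper name, and the 300,000 byte figure is just an example value.

```javascript
// Length of the base64 text produced for byteCount bytes of input.
// Every group of 3 bytes becomes 4 characters; a final partial group
// is padded out to 4 characters with "=".
function base64Length(byteCount) {
    return 4 * Math.ceil(byteCount / 3);
}

// Example: a 300,000 byte image (an arbitrary size for illustration)
var original = 300000;
var encoded = base64Length(original);
var growth = (encoded - original) / original;
```

The data: prefix and media type add a few more characters on top of this, but for files of any real size the growth is dominated by the 4-for-3 expansion.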

<form>
 <input id="fileInput" type="file" />
</form>

<script>
document.getElementById("fileInput").onchange = function () {
    // retrieve File from input
    var file = this.files[0];

    // set FileReader's onload event
    var reader = new FileReader();
    reader.onload = function () {
        // the result of the read is available through the FileReader's
        // result property when the callback is executed
        var fileContent = this.result;

        // send fileContent to server via Ajax request
        // ...
    };
    // initiate reading
    reader.readAsDataURL(file);
};
</script>
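
The "send to server" step is left open in the listing. One way it might look, assuming a form-encoded POST, is sketched below. The helper names, the field name, and the URL are all placeholders I've invented for the example; adjust them to whatever your server expects.

```javascript
// Hypothetical helpers; names, field name, and URL are placeholders.

// Build an application/x-www-form-urlencoded body for the data URI.
// encodeURIComponent() matters here: base64 text can contain "+",
// which a form decoder would otherwise turn into a space.
function buildPostBody(fieldName, dataUri) {
    return encodeURIComponent(fieldName) + "=" + encodeURIComponent(dataUri);
}

// Send the body with XMLHttpRequest (browser only).
function sendDataUri(url, body, onDone) {
    var xhr = new XMLHttpRequest();
    xhr.open("POST", url);
    xhr.setRequestHeader("Content-Type", "application/x-www-form-urlencoded");
    xhr.onload = function () {
        // hand the HTTP status back to the caller
        onDone(xhr.status);
    };
    xhr.send(body);
}

// Inside reader.onload this might be used as:
//   sendDataUri("/upload.php", buildPostBody("file", this.result),
//       function (status) { /* react to the response */ });
```

Encoding the value is the part that's easy to forget; an unescaped "+" in the base64 data would silently corrupt the file on the server side.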

Handling the upload once it reaches the server is different than working with traditional file uploads in PHP since the file comes into the system as “normal” user input. That is, you won't be using the $_FILES superglobal or functions like move_uploaded_file(). Instead the content will be available straight from $_POST.

The data URI format is defined by RFC 2397 and looks like the following:

data:[<mediatype>][;base64],<data>

You're free to use existing libraries to parse the URI or to parse it yourself. The media type is optional. If present, the value is a MIME type string. If it's missing, the default value text/plain;charset=US-ASCII should be assumed. If ;base64 is present then the data is base64 encoded.

<?php
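// retrieve the data URI from the request
// ('file' is a hypothetical field name; use whatever your Ajax request sends)
$dataUri = isset($_POST['file']) ? $_POST['file'] : '';

// parse out file data (array_pad guards against input without a comma)
list($front, $data) = array_pad(explode(',', $dataUri, 2), 2, '');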
if (stristr($front, ';base64') !== false) {
    $data = base64_decode($data);
}

// test whether the file is a valid image
try {
    $image = new \Imagick();
    $image->readImageBlob($data);
}
catch (\ImagickException $e) {
    header('HTTP/1.0 400 Bad Request');
    exit;
}

// do something with $image
// ...
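
The same parsing rules can also be sketched in JavaScript, for example to sanity-check a data URI on the client before sending it. parseDataUri is a hypothetical helper rather than anything built in, and it only handles the simple form described by the format line earlier.

```javascript
// Hypothetical helper: split a data URI into its parts.
function parseDataUri(uri) {
    if (uri.slice(0, 5).toLowerCase() !== "data:") {
        throw new Error("not a data URI");
    }
    var comma = uri.indexOf(",");
    if (comma === -1) {
        throw new Error("data URI is missing the comma separator");
    }

    // everything between "data:" and the first comma describes the data
    var header = uri.slice(5, comma);
    var isBase64 = /;base64$/i.test(header);

    // strip the trailing ";base64" (7 characters) to get the media type
    var mediaType = isBase64 ? header.slice(0, -7) : header;

    return {
        // RFC 2397 default when no media type is given
        mediaType: mediaType || "text/plain;charset=US-ASCII",
        isBase64: isBase64,
        data: uri.slice(comma + 1)
    };
}
```

A client-side check like this is only a convenience for the user; the server still has to parse and validate the URI itself.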

Posting a file as a data URI protects you from some of the security vulnerabilities that are typically inherent when dealing with files. Data URIs don't account for filenames, for instance, so you're safe from directory traversal attacks by maliciously named files. Still, you should treat the URI as you would any other piece of user-supplied data. Your application will obviously dictate how you filter and validate the file.

A secondary concern is the possibility of a malicious person using large file posts as a vector for a denial of service attack. The traditional upload approaches must mitigate this risk, and an Ajax approach must do so as well. Make certain you review the memory_limit and post_max_size entries in your php.ini, and keep in mind the tradeoff between size and ASCII-safety when using base64 encoding.

This isn't the first post on the Internet to deal with Ajax file uploads or JavaScript's File API, but many of them provide little beyond code samples. Hopefully I've remedied the situation by providing a succinct overview of the API's important objects/interfaces and discussing how receiving the file is different using this approach. If there's something I've neglected, feel free to leave a comment!

Thursday, February 20, 2014

Fixing "MySQL server has gone away" Errors in C

I ran across an old question on Stack Overflow the other day in which a user was having issues maintaining his connection to MySQL from C. I left a brief answer there for anyone else who might stumble across the same problem in the future, but I felt it was worth expanding on a bit more.

The error "MySQL server has gone away" means the client's connection to the MySQL server was lost. This could be because of many reasons; perhaps MySQL isn't running, perhaps there are network problems, or perhaps there was no activity after a certain amount of time and the server closed the connection. Detailed information on the error is available in the MySQL documentation.

It's possible for the client to attempt to re-connect to the server when it's "gone away" although it won't try to by default. To enable the reconnecting behavior, you need to set the MYSQL_OPT_RECONNECT option to 1 using the mysql_options() function. It should be set after mysql_init() is called and before calling mysql_real_connect(). This should solve the problem if the connection was closed by the server because of a time-out.

The MySQL documentation that discusses the reconnect behavior points out that only one re-connect attempt will be made, which means the query can still fail if the server is stopped or inaccessible. I ran across this problem myself while writing a daemon in C that would periodically pull data from MySQL. The daemon was polling at set intervals far less than the time-out period, so any such errors were the result of an unreachable or stopped server. I simply jumped execution to just prior to my work loop's sleep() call and the daemon would periodically try to re-connect until the server came back up.

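#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <mysql.h>

#define DBHOSTNAME "localhost"
#define DBUSERNAME "dbuser"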
...

MYSQL *db = mysql_init(NULL);
if (db == NULL) {
    fprintf(stderr, "Insufficient memory to allocate MYSQL object.\n");
    exit(EXIT_FAILURE);
}

/* enable re-connect behavior */
my_bool reconnect = 1;
int success = mysql_options(db, MYSQL_OPT_RECONNECT, &reconnect);
assert(success == 0);

if (mysql_real_connect(db, DBHOSTNAME, DBUSERNAME, DBPASSWORD, DBDATABASE,
    0, NULL, 0) == NULL) {
    fprintf(stderr, "Connection attempt failed: %s\n", mysql_error(db));
    exit(EXIT_FAILURE);
}

for (;;) {
    success = mysql_query(db, "<MYSQL QUERY HERE>");
    if (success != 0) {
        /* The error is most likely "gone away" since the query is
         * hard-coded, doesn't return much data, and the result is
         * managed properly. */
        fprintf(stderr, "Unable to query: %s\n", mysql_error(db));
        goto SLEEP;
    }

    /* call mysql_use_result() and do something with data */
    ...

    SLEEP:
    sleep(SLEEP_SECONDS);
}

Thursday, February 13, 2014

Generating C Code and Compiling from STDIN

Lately I've been exploring some syslog configurations and needed to generate some log messages to verify they were routed correctly. Of course doing so programmatically would provide an easy and repeatable method to generate a batch of fresh log messages whenever I needed, but because of the number of facilities and priorities defined by the syslog protocol, it made sense to write a code generator to iterate the different combinations.

The following Lua script generates boilerplate C code for each of the 64 messages needed to test LOG_LOCAL 0-7 with all priorities. I chose generating the code in this manner over writing a nested facilities/priorities loop directly in C so I could easily include a textual representation of the facility and priority constants in the log message (this seemed like a cleaner solution to me than having to maintain a mapping of constants to char* strings as well). And why Lua? Well, it seemed a better idea than M4. :)

#! /usr/bin/env lua

local facilities = {
    "LOG_LOCAL0",
    "LOG_LOCAL1",
    "LOG_LOCAL2",
    "LOG_LOCAL3",
    "LOG_LOCAL4",
    "LOG_LOCAL5",
    "LOG_LOCAL6",
    "LOG_LOCAL7"
}

local priorities = {
    "LOG_DEBUG",
    "LOG_INFO",
    "LOG_NOTICE",
    "LOG_WARNING",
    "LOG_ERR",
    "LOG_CRIT",
    "LOG_ALERT",
    "LOG_EMERG"
}

print([[
#include <stdlib.h>
#include <syslog.h>
#include <libgen.h>

int main(int argc, char *argv[])
{
    char *appName = basename(argv[0]);
]])

for _, facility in ipairs(facilities) do
    for _, priority in ipairs(priorities) do
        print(string.format(
[[
    openlog(appName, LOG_CONS|LOG_NDELAY|LOG_PID, %s);
    syslog(%s, "Test %s.%s message.\n");
    closelog();
]],
            facility, priority, facility, priority
        ))
    end
end

print([[
    return EXIT_SUCCESS;
}]])

Running the script will output the desired C code, which looks like this:

#include <stdlib.h>
#include <syslog.h>
#include <libgen.h>

int main(int argc, char *argv[])
{
    char *appName = basename(argv[0]);

    openlog(appName, LOG_CONS|LOG_NDELAY|LOG_PID, LOG_LOCAL0);
    syslog(LOG_DEBUG, "Test LOG_LOCAL0.LOG_DEBUG message.\n");
    closelog();

    openlog(appName, LOG_CONS|LOG_NDELAY|LOG_PID, LOG_LOCAL0);
    syslog(LOG_INFO, "Test LOG_LOCAL0.LOG_INFO message.\n");
    closelog();

    openlog(appName, LOG_CONS|LOG_NDELAY|LOG_PID, LOG_LOCAL0);
    syslog(LOG_NOTICE, "Test LOG_LOCAL0.LOG_NOTICE message.\n");
    closelog();
...

If I wanted to inspect or tweak the generated code, I could pipe the script's output to a file before compiling it:

./gen-syslog-tests.lua > syslog-tests.c
gcc -o syslog-tests syslog-tests.c

But if I just wanted the compiled binary and had no need to modify the code, it seemed inelegant to write things out to a file. Here's where I learned it's possible for gcc to compile code piped in on STDIN.

./gen-syslog-tests.lua | gcc -o syslog-tests -xc -

The two things of note are: gcc can't deduce the programming language from the file extension (since there is no file) so the -x flag is necessary to identify the language, and - is used as the file name (a convention commonly used to indicate reading from STDIN as a file).

Monday, December 16, 2013

Esperanto Accented Characters in Windows

It's not as easy to set up as clicking a checkbox like it is in Ubuntu/Gnome, but it is possible to type proper Esperanto characters in Windows using Right Alt as a modifier key. You need to create and install an alternate keyboard layout and then set the new layout active.

The program Keyboard Layout Creator is used to create the layout, and is available for free from Microsoft. Once it's downloaded and installed, start the program. Navigate File > Load Existing Keyboard and then select your primary keyboard layout (standard US layout in my case). You'll use this as a base and augment it with the Esperanto characters.

For each key that should produce an accented character, right-click its position on the virtual keyboard and click "Properties for VK_? in all shift states". A dialog will appear in which the necessary Unicode code points can be entered.

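The code points for the accented Esperanto letters are listed below, as well as for the Euro and Spesmilo signs just for fun:

  • Ĉ – U+0108, ĉ – U+0109
  • Ĝ – U+011C, ĝ – U+011D
  • Ĥ – U+0124, ĥ – U+0125
  • Ĵ – U+0134, ĵ – U+0135
  • Ŝ – U+015C, ŝ – U+015D
  • Ŭ – U+016C, ŭ – U+016D
  • € (Euro) – U+20AC
  • ₷ (Spesmilo) – U+20B7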

If you don't want to enter the Unicode values yourself, feel free to use a copy of my keyboard definition file.

When you're finished setting the code points for each letter, navigate Project > Test Keyboard Layout to test them. Then, navigate Project > Properties to provide the necessary name and other descriptive information for the new layout. The name cannot be longer than eight characters, so I simply named mine "EO".

Once you're satisfied with the layout, navigate Project > Build DLL and Setup Package. The keyboard layout will be compiled to a binary format usable by Windows and be saved to your hard drive. Run the setup.exe installer that was written to disk to install the layout. The installer will detect your system's architecture and launch the appropriate sub-installer.

Restart your computer once the installer is finished. You'll then be able to toggle between your original layout and the Esperanto layout using the Language Bar.

I set the augmented layout as my default keyboard layout (although I don't recommend this unless you're computer savvy). To do this on Windows 7, go to the Start menu, type "language" in the search bar, and select "Change keyboard and input methods". Click the "Change keyboards" button and you'll see the Text Services and Input Languages dialog. Under the General tab, set the new layout as the default input language and remove the entry for your original layout in the installed services tree.

On Windows 8, start typing "language" on the Start screen and then select "Change input methods" from the Settings group.

The Windows 8 Language panel provides more or less the same functionality as its Windows 7 counterpart but in a less user-friendly manner. The Input method is accessible through the options link.

Friday, November 29, 2013

Password Woes

Happy belated International Change-Your-Password Week! Earlier this month, thanks to the generous sponsorship by the great folks at Adobe, people all around the world were changing their passwords and tech blogs were parroting guidelines for choosing a strong password. But let’s be honest – passwords are a hassle. And, as Adobe was so kind to remind us, even the strongest unique password can be an open door if the company storing it isn’t doing so competently.

As someone who is a programmer, I’m aware of several technical solutions to our password woes. As someone who suffers from cynical realism, I believe the barrier to adopting these solutions to be red-tape and human nature (ego and laziness). There’s no reason for every website to require their own login credentials when OpenID and OAuth exist. Perhaps we should increase liability for password storers and provide incentives to the crackers who hack them. A smart company would migrate to an SSO-provider to mitigate their responsibility and the provider would be diligent in protecting the hashes.

But as much as anyone would like to mitigate responsibility, the fact remains that it’s the individual who’s most affected by password breaches, not corporations. Are there secure ways to ease the burden of password management?

I’ve been trying out KeePass this past week and my overall impression of the program is fair to middling. I’m storing the encrypted password database to Dropbox for the computers I use the most, and keep a duplicate copy of the database on a thumbdrive with a portable version of KeePass for when I need to use someone else’s computer. Although the premise seems secure, and I trust their implementation to be solid, some of the program’s incidentals frustrate me.

KeePass is fine on Windows but almost unusable on Linux. Unfortunately in this case, a good 90% of my day is spent using Linux. I've also noticed that the Auto-Fill feature toggles back to the most recently used window, so if an IM dialog pops up while I'm toggling to KeePass, the password is leaked. I could spend some time scripting in the advanced sections to safeguard against this, but that seems like a hassle.

I’ve also pondered whether I might be able to get away with using the same password for everything, so long as it contained accented characters. If the website is using proper hashing practices (Blowfish with scalable cost – i.e. Bcrypt – and random salt) then a rainbow table attack is going to be useless. Those sites that aren't have already proven their incompetence, so they probably don't know how to handle UTF-8 correctly either. The password value would be corrupted, truncated, or filtered, and most likely result in differing hashes between different sites... almost like using the site’s algorithm as your own salt! And brute-force crackers probably aren’t using Esperanto dictionaries; “@D0B3.fuŝ1s!” seems secure, doesn’t it?

Ultimately, programs like KeePass only serve as a bandage and don’t address the core problem, and ubiquitous use of SSO-providers is still a pipe-dream. While we’re all stuck in Password Hell, waiting for the next password-change holiday, the best we can do is keep Clifford Stoll’s advice in mind: “Treat your password like your toothbrush. Don't let anybody else use it, and get a new one every six months.”