Thursday, May 1, 2014

New Writers Guide now on GitHub

Writing can be a fun and rewarding way to share your knowledge, experience, and opinions with others. Unfortunately, it can also be intimidating or frustrating for some people. When I was managing editor for SitePoint's PHPMaster property, I prepared a guide to help alleviate some of the frustration and self-doubt that new writers (and even experienced writers) might experience.

The guide wasn't something commissioned by SitePoint; I wrote it on my own for my authors. And though it's been about eight months since PHPMaster was absorbed into the main SitePoint site and I stepped down as managing editor, people continue to ask me about it. So, I've decided to make the guide publicly available.

The New Writers Guide offers advice for finding inspiration, structuring an article's content, growing one's self-confidence, and overcoming other challenges that programming writers may face. Hopefully it'll continue to help people write awesome articles and realize the many benefits of writing in their life.

You can find a copy of the guide on GitHub at github.com/tboronczyk/WritersGuide.

Friday, April 25, 2014

Ajax File Uploads with JavaScript's File API

Developers have been using Ajax techniques for years to create dynamic web forms, but handling file uploads using Ajax was always problematic. The crux of the problem was security – it's not a good idea to allow arbitrary code access to any file it wants on a user's system, so JavaScript was intentionally restricted in how it could interact with things like file input elements. Uploading a file with JavaScript essentially meant a standard form submission that targeted a hidden iframe. It felt dirty but it got the job done.

The W3C began work on standardizing a File API for JavaScript sometime between 2006 and 2009 and we're now at the point with browser support where developers can take advantage of it. Developers supporting web apps on IE8 and 9 still need to use iframes, but those of us targeting newer browsers can finally take a pure JavaScript approach to file uploads. And as more users migrate from IE8/9, the iframe approach will eventually be left in the dustbin.

The interesting things defined by the W3C's File API are:

  • Blob – an object that represents a sequence of bytes and is consumed by FileReader. Its size property gives the size of the sequence in bytes, and its type property is a lower-case MIME type string if such information is available.
  • File – an object that extends Blob and offers additional properties to make the file's metadata available. Its name property holds the filename (no path information) and lastModifiedDate holds a Date object instance set to when the file was last modified.
  • FileReader – an object that reads the byte sequence of a Blob or File object.
  • FileList – a list of File objects, exposed through the files property of file input elements.

The API is designed so that byte sequences are loaded asynchronously by default. This makes sense since there are several things that can cause the read process to take a while to complete: it might be a large file, the file might be on a mounted network share, etc. Reading files asynchronously ensures the main execution thread is free and the browser doesn't lock up.

So what does a basic upload look like using the API? At a high level, the steps are:

  1. Provide a file input for the user.
  2. When the user sets a file, retrieve its File object from the input's files property.
  3. Create a FileReader instance and register a callback for its onload event. This callback will have access to the read data.
  4. Initiate the read process with the FileReader methods readAsText() or readAsDataURL().

I like to use readAsDataURL() to initiate the read process, especially for binary files like images and PDFs, since the data will be base64 encoded. The ASCII URI string can then be safely sent to the server just like any other string.

I also recommend using POST for the HTTP method. Yes, the encoded contents form a data URI that could be sent as a GET parameter, but doing so increases the risk of an HTTP 414 (Request-URI Too Long) error because of the resulting size of the request. Base64 encodes binary content as safe ASCII, which increases the data's size by roughly 33% (to about 133% of the original).

<form>
 <input id="fileInput" type="file" />
</form>

<script>
document.getElementById("fileInput").onchange = function () {
    // retrieve File from input
    var file = this.files[0];

    // set FileReader's onload event
    var reader = new FileReader();
    reader.onload = function () {
        // the results of the read is available with the FileReader's
        // result property when the callback is executed
        var fileContent = this.result;

        // send fileContent to server via Ajax request
        // ...
    };
    // initiate reading
    reader.readAsDataURL(file);
};
</script>
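
The elided Ajax step could be filled in along the following lines. This is only a sketch under stated assumptions: the field name "file", the URL "/upload.php", and the helper names buildUploadBody and sendUpload are all hypothetical. The one detail worth noting is that the data URI should be passed through encodeURIComponent(), since base64 output can contain "+", "/", and "=" characters that would otherwise be mangled in a form-encoded body.

```javascript
// Build an application/x-www-form-urlencoded body carrying the data URI.
// The field name is whatever the server expects ("file" is an assumption).
function buildUploadBody(fieldName, dataUri) {
    return encodeURIComponent(fieldName) + "=" + encodeURIComponent(dataUri);
}

// POST the body with XMLHttpRequest (browser only). The URL is a placeholder.
function sendUpload(url, body, onDone) {
    var xhr = new XMLHttpRequest();
    xhr.open("POST", url, true);
    xhr.setRequestHeader("Content-Type", "application/x-www-form-urlencoded");
    xhr.onload = function () {
        // hand the HTTP status and response text back to the caller
        onDone(xhr.status, xhr.responseText);
    };
    xhr.send(body);
}
```

Inside the reader.onload callback, the call would then be something like sendUpload("/upload.php", buildUploadBody("file", fileContent), callback).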

Handling the upload once it reaches the server is different than working with traditional file uploads in PHP since the file comes into the system as “normal” user input. That is, you won't be using the $_FILES superglobal or functions like move_uploaded_file(). Instead the content will be available straight from $_POST.

The data URI format, defined by RFC 2397, looks like the following:

data:[<mediatype>][;base64],<data>

You're free to use existing libraries to parse the URI or to parse it yourself. The media type is optional. If present, the value is a MIME type string. If it's missing, the default value text/plain;charset=US-ASCII should be assumed. If ;base64 is present then the data is base64 encoded.

<?php
// parse out file data ('file' is an assumed POST field name)
$dataUri = $_POST['file'];
list($front, $data) = explode(',', $dataUri, 2);
if (stristr($front, ';base64') !== false) {
    $data = base64_decode($data);
}

// test whether the file is a valid image
try {
    $image = new \Imagick();
    $image->readImageBlob($data);
}
catch (\ImagickException $e) {
    header('HTTP/1.0 400 Bad Request');
    exit;
}

// do something with $image
// ...
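
The same parsing rules can be sketched in JavaScript, for instance to sanity-check the string on the client before posting it. This is only an illustration of the RFC 2397 layout, not a replacement for the PHP above; parseDataUri is a hypothetical helper name, and the sample URI is made up.

```javascript
// Split a data URI (RFC 2397) into media type, base64 flag, and payload.
function parseDataUri(uri) {
    if (uri.indexOf("data:") !== 0) {
        throw new Error("Not a data URI");
    }
    var comma = uri.indexOf(",");
    if (comma === -1) {
        throw new Error("Missing comma separator");
    }
    var front = uri.slice(5, comma);  // everything between "data:" and ","
    var data = uri.slice(comma + 1);  // payload, still encoded

    // ";base64" is the last token before the comma when present
    var isBase64 = /;base64$/i.test(front);
    if (isBase64) {
        front = front.slice(0, -";base64".length);
    }
    return {
        // fall back to the default media type when none is given
        mediaType: front || "text/plain;charset=US-ASCII",
        isBase64: isBase64,
        data: data
    };
}

var parsed = parseDataUri("data:image/png;base64,iVBORw0KGgo=");
console.log(parsed.mediaType); // image/png
console.log(parsed.isBase64);  // true
```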

Posting a file as data URI protects you from some of the security vulnerabilities that are typically inherent when dealing with files. Data URIs don't account for filenames, for instance, so you're safe from directory traversal attacks by maliciously named files. Still, you should treat the URI as you would any other piece of user-supplied data. Your application will obviously dictate how you filter and validate the file.

A secondary concern is the possibility of a malicious person using large file posts as a vector for a denial of service attack. The traditional upload approaches must mitigate this risk, and an Ajax approach must do so as well. Make certain you review the memory_limit and post_max_size entries in your php.ini, and keep in mind the tradeoff between size and ASCII-safety when using base64 encoding.
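
To put a number on that trade-off when choosing a post_max_size value: base64 emits four ASCII characters for every three input bytes, padding the final group. A rough estimate can be sketched as below. The helper name base64Length is hypothetical, the 1 MiB figure is only an example, and the data URI prefix and any URL-encoding of the request body add a little more on top.

```javascript
// Number of base64 characters needed to encode byteCount bytes.
function base64Length(byteCount) {
    return Math.ceil(byteCount / 3) * 4;
}

var original = 1048576; // a hypothetical 1 MiB file
var encoded = base64Length(original);

console.log(encoded);                         // 1398104
console.log((encoded / original).toFixed(2)); // 1.33
```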

This isn't the first post on the Internet to deal with Ajax file uploads or JavaScript's File API, but many of them provide little beyond code samples. Hopefully I've remedied the situation by providing a succinct overview of the API's important objects/interfaces and discussing how receiving the file is different using this approach. If there's something I've neglected, feel free to leave a comment!

Thursday, February 20, 2014

Fixing "MySQL server has gone away" Errors in C

I ran across an old question on Stack Overflow the other day in which a user was having issues maintaining his connection to MySQL from C. I left a brief answer there for anyone else who might stumble across the same problem in the future, but I felt it was worth expanding on a bit more.

The error "MySQL server has gone away" means the client's connection to the MySQL server was lost. There are many possible reasons for this: perhaps MySQL isn't running, perhaps there are network problems, or perhaps there was no activity for a certain amount of time and the server closed the connection. Detailed information on the error is available in the MySQL documentation.

It's possible for the client to attempt to re-connect to the server when it's "gone away" although it won't try to by default. To enable the reconnecting behavior, you need to set the MYSQL_OPT_RECONNECT option to 1 using the mysql_options() function. It should be set after mysql_init() is called and before calling mysql_real_connect(). This should solve the problem if the connection was closed by the server because of a time-out.

The MySQL documentation that discusses the reconnect behavior points out that only one re-connect attempt will be made, which means the query can still fail if the server is stopped or inaccessible. I ran across this problem myself while writing a daemon in C that would periodically pull data from MySQL. The daemon was polling at set intervals far less than the time-out period, so any such errors were the result of an unreachable or stopped server. I simply jumped execution to just prior to my work loop's sleep() call and the daemon would periodically try to re-connect until the server came back up.

#define DBHOSTNAME "localhost"
#define DBUSERNAME "dbuser"
...

MYSQL *db = mysql_init(NULL);
if (db == NULL) {
    fprintf(stderr, "Insufficient memory to allocate MYSQL object.\n");
    exit(EXIT_FAILURE);
}

/* enable re-connect behavior */
my_bool reconnect = 1;
int success = mysql_options(db, MYSQL_OPT_RECONNECT, &reconnect);
assert(success == 0);

if (mysql_real_connect(db, DBHOSTNAME, DBUSERNAME, DBPASSWORD, DBDATABASE,
    0, NULL, 0) == NULL) {
    fprintf(stderr, "Connection attempt failed: %s\n", mysql_error(db));
    exit(EXIT_FAILURE);
}

for (;;) {
    success = mysql_query(db, "<MYSQL QUERY HERE>");
    if (success != 0) {
        /* The error is most likely "gone away" since the query is
         * hard-coded, doesn't return much data, and the result is
         * managed properly. */
        fprintf(stderr, "Unable to query: %s\n", mysql_error(db));
        goto SLEEP;
    }

    /* call mysql_use_result() and do something with data */
    ...

    SLEEP:
    sleep(SLEEP_SECONDS);
}

Thursday, February 13, 2014

Generating C Code and Compiling from STDIN

Lately I've been exploring some syslog configurations and needed to generate some log messages to verify they were routed correctly. Of course doing so programmatically would provide an easy and repeatable method to generate a batch of fresh log messages whenever I needed, but because of the number of facilities and priorities defined by the syslog protocol, it made sense to write a code generator to iterate the different permutations.

The following Lua script generates boilerplate C code for each of the 64 messages needed to test LOG_LOCAL 0-7 with all priorities. I chose generating the code in this manner over writing a nested facilities/priorities loop directly in C so I could easily include a textual representation of the facility and priority constants in the log message (this seemed like a cleaner solution to me than having to maintain a mapping of constants to char* strings as well). And why Lua? Well, it seemed a better idea than M4. :)

#! /usr/bin/env lua

local facilities = {
    "LOG_LOCAL0",
    "LOG_LOCAL1",
    "LOG_LOCAL2",
    "LOG_LOCAL3",
    "LOG_LOCAL4",
    "LOG_LOCAL5",
    "LOG_LOCAL6",
    "LOG_LOCAL7"
}

local priorities = {
    "LOG_DEBUG",
    "LOG_INFO",
    "LOG_NOTICE",
    "LOG_WARNING",
    "LOG_ERR",
    "LOG_CRIT",
    "LOG_ALERT",
    "LOG_EMERG"
}

print([[
#include <stdlib.h>
#include <syslog.h>
#include <libgen.h>

int main(int argc, char *argv[])
{
    char *appName = basename(argv[0]);
]])

for _, facility in ipairs(facilities) do
    for _, priority in ipairs(priorities) do
        print(string.format(
[[
    openlog(appName, LOG_CONS|LOG_NDELAY|LOG_PID, %s);
    syslog(%s, "Test %s.%s message.\n");
    closelog();
]],
            facility, priority, facility, priority
        ))
    end
end

print([[
    return EXIT_SUCCESS;
}]])

Running the script will output the desired C code, which looks like this:

#include <stdlib.h>
#include <syslog.h>
#include <libgen.h>

int main(int argc, char *argv[])
{
    char *appName = basename(argv[0]);

    openlog(appName, LOG_CONS|LOG_NDELAY|LOG_PID, LOG_LOCAL0);
    syslog(LOG_DEBUG, "Test LOG_LOCAL0.LOG_DEBUG message.\n");
    closelog();

    openlog(appName, LOG_CONS|LOG_NDELAY|LOG_PID, LOG_LOCAL0);
    syslog(LOG_INFO, "Test LOG_LOCAL0.LOG_INFO message.\n");
    closelog();

    openlog(appName, LOG_CONS|LOG_NDELAY|LOG_PID, LOG_LOCAL0);
    syslog(LOG_NOTICE, "Test LOG_LOCAL0.LOG_NOTICE message.\n");
    closelog();
...

If I wanted to inspect or tweak the generated code, I could pipe the script's output to a file before compiling it:

./gen-syslog-tests.lua > syslog-tests.c
gcc -o syslog-tests syslog-tests.c

But if I just wanted the compiled binary and had no need to modify the code, it seemed inelegant to write things out to a file. This is where I learned that gcc can compile code piped in on STDIN.

./gen-syslog-tests.lua | gcc -o syslog-tests -xc -

The two things of note are: gcc can't deduce the programming language from the file extension (since there is no file) so the -x flag is necessary to identify the language, and - is used as the file name (a convention commonly used to indicate reading from STDIN as a file).

Monday, December 16, 2013

Esperanto Accented Characters in Windows

It's not as easy to set up as clicking a checkbox, as it is in Ubuntu/GNOME, but it is possible to type proper Esperanto characters in Windows using Right Alt as a modifier key. You need to create and install an alternate keyboard layout and then set the new layout active.

The program Keyboard Layout Creator is used to create the layout, and is available for free from Microsoft. Once it's downloaded and installed, start the program. Navigate File > Load Existing Keyboard and then select your primary keyboard layout (standard US layout in my case). You'll use this as a base and augment it with the Esperanto characters.

For each key that should produce an accented character, right-click its position on the virtual keyboard and click "Properties for VK_? in all shift states". A dialog will appear in which the necessary Unicode code points can be entered.

The code points for the accented Esperanto letters are shown below, as well as for the Euro and Spesmilo just for fun:
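
For reference, the code points can also be listed programmatically. The short sketch below prints the standard Unicode code point for each accented Esperanto letter plus the Euro and Spesmilo signs. The helper name toCodePoint is hypothetical, and which key each character is assigned to is left to you.

```javascript
// Accented Esperanto letters (upper and lower case), then Euro and Spesmilo.
var characters = ["Ĉ", "ĉ", "Ĝ", "ĝ", "Ĥ", "ĥ", "Ĵ", "ĵ", "Ŝ", "ŝ", "Ŭ", "ŭ", "€", "₷"];

// Format a single BMP character as U+XXXX.
function toCodePoint(ch) {
    var hex = ch.charCodeAt(0).toString(16).toUpperCase();
    return "U+" + ("0000" + hex).slice(-4);
}

characters.forEach(function (ch) {
    console.log(ch + "  " + toCodePoint(ch));
});
```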

If you don't want to enter the Unicode values yourself, feel free to use a copy of my keyboard definition file.

When you're finished setting the code points for each letter, navigate Project > Test Keyboard Layout to test them. Then, navigate Project > Properties to provide the necessary name and other descriptive information for the new layout. The name cannot be longer than eight characters, so I simply named mine "EO".

Once you're satisfied with the layout, navigate Project > Build DLL and Setup Package. The keyboard layout will be compiled to a binary format usable by Windows and saved to your hard drive. Run the setup.exe installer that was written to disk to install the layout. The installer will detect your system's architecture and launch the appropriate sub-installer.

Restart your computer once the installer is finished. You'll then be able to toggle between your original layout and the Esperanto layout using the Language Bar.

I set the augmented layout as my default keyboard layout (although I don't recommend this unless you're computer savvy). To do this on Windows 7, go to the Start menu, type "language" in the search bar, and select "Change keyboard and input methods". Click the "Change keyboards" button and you'll see the Text Services and Input Languages dialog. Under the General tab, set the new layout as the default input language and remove the entry for your original layout in the installed services tree.

On Windows 8, start typing "language" on the Start screen and then select "Change input methods" from the Settings group.

The Windows 8 Language panel provides more or less the same functionality as its Windows 7 counterpart, but in a less user-friendly manner. The input method is accessible through the options link.

Friday, November 29, 2013

Password Woes

Happy belated International Change-Your-Password Week! Earlier this month, thanks to the generous sponsorship by the great folks at Adobe, people all around the world were changing their passwords and tech blogs were parroting guidelines for choosing a strong password. But let’s be honest – passwords are a hassle. And, as Adobe was so kind to remind us, even the strongest unique password can be an open door if the company storing it isn’t doing so competently.

As a programmer, I’m aware of several technical solutions to our password woes. As someone who suffers from cynical realism, I believe the barriers to adopting these solutions are red tape and human nature (ego and laziness). There’s no reason for every website to require its own login credentials when OpenID and OAuth exist. Perhaps we should increase liability for password storers and provide incentives to the crackers who hack them. A smart company would migrate to an SSO-provider to mitigate their responsibility and the provider would be diligent in protecting the hashes.

But as much as anyone would like to mitigate responsibility, the fact remains that it’s the individual who’s most affected by password breaches, not corporations. Are there secure ways to ease the burden of password management?

I’ve been trying out KeePass this past week and my overall impression of the program is fair to middling. I’m storing the encrypted password database to Dropbox for the computers I use the most, and keep a duplicate copy of the database on a thumbdrive with a portable version of KeePass for when I need to use someone else’s computer. Although the premise seems secure, and I trust their implementation to be solid, some of the program’s incidentals frustrate me.

KeePass is fine on Windows but almost unusable on Linux. Unfortunately in this case, a good 90% of my day is spent using Linux. I've also noticed that the Auto-Fill feature toggles back to the most recently used window, so if an IM dialog pops up while I'm toggling to KeePass, the password is leaked. I could spend some time scripting in the advanced sections to safeguard against this, but that seems like a hassle.

I’ve also pondered whether I might be able to get away with using the same password for everything, so long as it contained accented characters. If the website is using proper hashing practices (Blowfish with scalable cost – i.e. Bcrypt – and random salt) then a rainbow table attack is going to be useless. Those sites that aren't have already proven their incompetence, so they probably don't know how to handle UTF-8 correctly either. The password value would be corrupted, truncated, or filtered, and most likely result in differing hashes between different sites... almost like using the site’s algorithm as your own salt! And brute-force crackers probably aren’t using Esperanto dictionaries; “@D0B3.fuŝ1s!” seems secure, doesn’t it?

Ultimately, programs like KeePass only serve as a bandage and don’t address the core problem, and ubiquitous use of SSO-providers is still a pipe-dream. While we’re all stuck in Password Hell, waiting for the next password-change holiday, the best we can do is keep Clifford Stoll’s advice in mind: “Treat your password like your toothbrush. Don't let anybody else use it, and get a new one every six months.”

Tuesday, September 10, 2013

Urba Semajnfino: Sirakuso a Success

The following is an English translation of an article I wrote for La Ondo de Esperanto to share the Urban Weekend: Syracuse event. Thank you to everyone who attended and helped make the event a success.

Urban Weekend: Syracuse, the third Urban Weekend event to happen in the United States, took place during the weekend of August 31 in Syracuse, New York. Esperantists came from near and far to meet new friends and explore the city. As the main organizer, I was a bit nervous. I had never organized an Esperanto event before. Would the weather hold out? Would anyone come? Would they enjoy their time together? But indeed the weather was beautiful, and people came from Rochester NY, Virginia, and even Brazil. Everyone had fun and Urban Weekend: Syracuse was a success!

A little before noon on Saturday, four of us met at the city's central park and then walked to a nearby restaurant for lunch. The restaurant is popular for its beer, brewed on-site, and also for its support of Central New York agriculture by using locally-grown ingredients.

After lunch we walked about in the city for a bit and made our way to two museums. The first, the Erie Canal Museum, commemorates the Erie Canal, which connected Lake Erie to the Hudson River, and there we met two more esperantists. The canal no longer exists in its original form, but it has historical significance to both the region and the United States because it opened the Great Lakes to the Atlantic Ocean and enabled westward migration. Everyone enjoyed learning how the canal helped shape the country and seeing what life was like for those who travelled it almost 200 years ago.

The second museum, the Everson Museum of Art, is an art gallery known for its ceramics, pottery, and film exhibits. The collection may not be as impressive as the ones found in larger museums, but it has several pieces worth enjoying. And perhaps even more special, the museum building was designed by the internationally acclaimed architect I. M. Pei, who also designed the Pyramide du Louvre in Paris.

After exploring some of the art and history of Syracuse, we were hungry and ready to eat. The six of us went to a Mexican restaurant occupying a former church building. Even this building had significance; the church was a station on the Underground Railroad in the 19th century. A secret tunnel under the church was a refuge for slaves running north in search of their freedom.

To finish the first day, we socialized and watched a film - House of Ghosts, a comical horror film dubbed with Esperanto voice and subtitles.

Most of the day Sunday was spent visiting the zoo, home to over 700 animals. A family of five esperantists who couldn't attend the first day joined us. The children in the group loved looking at the elephants, penguins, and lions. It was also a good opportunity for the adults to improve their animal-related vocabulary.

We ate lunch after the zoo in a popular Irish restaurant nearby; the food was great, and we enjoyed the local musicians playing in the pub. The neighborhood where the restaurant is located was settled by Irish immigrants who came to work on the Erie Canal, and near the restaurant is the famous “green on top” traffic light. As the story goes, the settlers wouldn't allow red (the color of the British) to sit above green, and they threw stones at the light in protest anytime the city tried to hang the light correctly.

Weekend events like Urban Weekend are good for busy esperantists who are not able to attend the longer major events and, like all Esperanto gatherings, are a good opportunity to meet new friends, explore new places, and take part in Esperantujo. If one is held near you, I highly recommend that you participate. If not, why not organize your own? It's easier than you might think (I speak from experience!). The Manlibro pri Urba Semajnfino is a good place to start.