Friday, April 25, 2014

Ajax File Uploads with JavaScript's File API

Developers have been using Ajax techniques for years to create dynamic web forms, but handling file uploads using Ajax was always problematic. The crux of the problem was security – it's not a good idea to allow arbitrary code access to any file it wants on a user's system so JavaScript was intentionally restricted in how it could interact with things like file input elements. Uploading a file with JavaScript was essentially a standard form submission that targeted a hidden iframe. It felt dirty but it got the job done.

The W3C began work on standardizing a File API for JavaScript sometime between 2006 and 2009 and we're now at the point with browser support where developers can take advantage of it. Developers supporting web apps on IE8 and 9 still need to use iframes, but those of us targeting newer browsers can finally take a pure JavaScript approach to file uploads. And as more users migrate from IE8/9, the iframe approach will eventually be left in the dustbin.

The interesting things defined by the W3C's File API are:

  • Blob – an object to represent a sequence of bytes and is consumed by FileReader. Its size property lists the size of the sequence in bytes and its type property is a lower-case MIME-type string if such information is available.
  • File – an object that extends Blob and offers additional properties to make the file's metadata available. Its name property holds the filename (no path information) and lastModifiedDate holds a Date object instance set to when the file was last modified.
  • FileReader – an object that reads the byte sequence of a Blob or File object.
  • FileList – a property given to file input elements which essentially is a list of File objects.

The API is designed so that byte sequences are loaded asynchronously by default. This makes sense since there are several things that can cause the read process to take a while to complete: it might be a large file, the file might be on a mounted network share, etc. Reading files asynchronously ensures the main execution thread is free and the browser doesn't lock up.

So what does a basic upload look like using the API? At a high level, the steps are:

  1. Provide a file input for the user.
  2. When the user sets a file, retrieve its File object from the input's files property.
  3. Create a FileReader instance and register a callback for its onload event. This callback will have access to the read data.
  4. Initiate the read process with the FileReader methods readAsText() or readAsDataURL().

I like to use readAsDataURL() to initiate the read process, especially for binary files like images and PDFs, since the data will be base64 encoded. The ASCII URI string can then be safely sent to the server just like any other string.

I also recommend using POST for the HTTP method; yes, the encoded contents as a data URI which can be used in a GET parameter, but doing so increases the risk of getting an HTTP/414 error because of the resulting size of the request. Base64 encodes binary content to safe ASCII which increases the data's size by roughly 130%.

<form>
 <input id="fileInput" type="file" />
</form>

<script>
document.getElementById("fileInput").onchange = function () {
    // retrieve File from input
    var file = this.files[0];

    // set FileReader's onload event
    var reader = new FileReader();
    reader.onload = function () {
        // the results of the read is available with the FileReader's
        // result property when the callback is executed
        var fileContent = this.result;

        // send fileContent to server via Ajax request
        // ...
    };
    // initiate reading
    reader.readAsDataURL(file);
};
</script>

Handling the upload once it reaches the server is different than working with traditional file uploads in PHP since the file comes into the system as “normal” user input. That is, you won't be using the $_FILES superglobal or functions like move_uploaded_file(). Instead the content will be available straight from $_POST.

The data URI format is defined by RFC 2397 looks like the following:

data:[<mediatype>][;base64],<data>

You're free to existing libraries to parse the URI or parse it yourself. The media type is optional. If present, the value is a MIME type string. If it's missing, the default value text/plain;charset=US-ASCII should be assumed. If ;base64 is present then the data is base64 encoded.

<?php
// parse out file data
list($front, $data) = explode(',', $dataUri, 2);
if (stristr($front, ';base64') !== false) {
    $data = base64_decode($data);
}

// test whether the file is a valid image
try {
    $image = new \Imagick();
    $image->readImageBlob($data);
}
catch (\ImagickException $e) {
    header('HTTP/1.0 400 Bad Request');
    exit;
}

// do something with $image
// ...

Posting a file as data URI protects you from some of the security vulnerabilities that are typically inherent when dealing with files. Data URIs don't account for filenames, for instance, so you're safe from directory traversal attacks by maliciously named files. Still, you should treat the URI as you would any other piece of user-supplied data. Your application will obviously dictate how you filter and validate the file.

A secondary concern is the possibility of a malicious person using large file posts as a vector for a denial of service attack. The traditional upload approaches must mitigate this risk, and an Ajax approach must do so as well. Make certain you review the memory_limit and post_max_size entries in your php.ini, and keep in mind the tradeoff between size and ASCII-safety when using base64 encoding.

This isn't the first post on the Internet to deal with Ajax file uploads or JavaScript's File API, but many of them provide little beyond code samples. Hopefully I've remedied the situation by providing a succinct overview of the API's important objects/interfaces and discussing how receiving the file is different using this approach. If there's something I've neglected, feel free to leave a comment!