<p><b>Zaemis</b>: The Blog of Timothy Boronczyk - running my mouth off one blog post at a time
<p><b>Safely Identify Dependencies for Chrooting</b> (Timothy Boronczyk, posted 2016-02-16)
<img src="https://i.imgur.com/2dSUnFW.jpg">
<p>The most difficult part of setting up a chroot environment is identifying dependencies for the programs you want to copy to the jail. For example, to make <tt>cp</tt> available, not only do you need to copy its binary from <tt>/bin</tt> and any shared libraries it depends on, but the dependencies can have their own dependencies too that need to be copied.
<p>The internet suggests using <tt>ldd</tt> to list a binary’s dependencies, but that has its own problems. The man page for <tt>ldd</tt> warns not to use the script for untrusted programs because it works by setting a special environment variable and then executing the program. What’s a security-conscious systems administrator to do?
<p>The <tt>ldd</tt> man page recommends <tt>objdump</tt> as a safe alternative. <tt>objdump</tt> outputs information about an object file, including what shared libraries it links against. It doesn’t identify the dependencies’ dependencies, but it’s still a good start because it doesn’t try to execute the target file. We can overcome the dependencies of dependencies problem later using recursion.
<p>First, let’s look at the output of <tt>objdump</tt> to see what we have to work with.
<pre>$ <b>objdump -p /bin/cp</b>
/bin/cp: file format elf64-x86-64
Program Header:
PHDR off 0x00004000 vaddr 0x00400040 paddr 0x00400040 align 2**3
filesz 0x000001f8 memsz 0x000001f8 flags r-x
INTERP off 0x00000238 vaddr 0x00400238 paddr 0x00400238 align 2**0
filesz 0x0000001c memsz 0x0000001c flags r-x
...
Dynamic Section:
NEEDED libselinux.so.1
NEEDED libacl.so.1
NEEDED libattr.so.1
NEEDED libc.so.6
INIT 0x00402bb8
...</pre>
<p>The libraries we’re interested in are listed under Dynamic Section and preceded by NEEDED. We can fetch the list using <tt>awk</tt> to match those lines and return the second column.
<pre>$ <b>objdump -p /bin/cp | awk '/NEEDED/ { print $2 }'</b>
libselinux.so.1
libacl.so.1
libattr.so.1
libc.so.6</pre>
<p>Next, we need to find the actual libraries within the filesystem because the paths are needed to find their dependencies with <tt>objdump</tt>. We can do this by using <tt>find</tt> to search the root filesystem for each item and print its location.
<pre>$ <b>shared=$(objdump -p /bin/cp | awk '/NEEDED/ { print $2 }')</b>
$ <b>for s in $shared; do</b>
> <b> find / -name "$s" -executable -print -quit</b>
> <b>done</b>
/usr/lib64/libselinux.so.1
/usr/lib64/libacl.so.1
/usr/lib64/libattr.so.1
/usr/lib64/libc.so.6</pre>
<p>The hard part is behind us—finding the program’s dependencies. The next step is to create a recursive function to identify the dependencies of each dependency.
<pre>$ <b>deplibs()(</b>
> <b>shared=$(objdump -p "$1" | awk '/NEEDED/ { print $2 }')</b>
> <b>for s in $shared; do</b>
> <b>dep=$(find / -name "$s" -executable -print -quit)</b>
> <b>echo "$dep"</b>
> <b>deplibs "$dep"</b>
> <b>done</b>
><b>)</b>
$ <b>deplibs /usr/bin/cp</b>
/usr/lib64/libselinux.so.1
/usr/lib64/libpcre.so.1
/usr/lib64/libpthread.so.0
/usr/lib64/libc.so.6
/usr/lib64/ld-linux-x86-64.so.2
/usr/lib64/ld-linux-x86-64.so.2
/usr/lib64/libc.so.6
...</pre>
<p>Invoking the function now gives us a full list... well, almost too full of a list. Notice there are some libraries listed multiple times. They’re a dependency of multiple items and are identified repeatedly by the recursive calls. It’s trivial to eliminate the duplicates with <tt>sort</tt>.
<pre>$ <b>deplibs /usr/bin/cp | sort -u</b>
/usr/lib64/ld-linux-x86-64.so.2
/usr/lib64/libacl.so.1
/usr/lib64/libattr.so.1
/usr/lib64/libc.so.6
/usr/lib64/libdl.so.2
/usr/lib64/liblzma.so.5
/usr/lib64/libpcre.so.1
/usr/lib64/libpthread.so.0
/usr/lib64/libselinux.so.1</pre>
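<p>The duplicates appear because the recursion walks a shared library once for every binary that needs it. As a rough sketch of the alternative, the traversal can remember what it has already visited. The example is in JavaScript purely for illustration; <tt>collectDeps</tt>, the <tt>directDeps</tt> callback, and the sample table are hypothetical stand-ins for the <tt>objdump</tt> and <tt>find</tt> lookups, not real output.

```javascript
// Collect every transitive dependency exactly once.
// directDeps(name) is a hypothetical callback that returns the list of
// libraries a file links against (the role objdump + find play above).
function collectDeps(target, directDeps, seen = new Set()) {
  for (const dep of directDeps(target)) {
    if (seen.has(dep)) {
      continue; // already visited: skip it and its whole subtree
    }
    seen.add(dep);
    collectDeps(dep, directDeps, seen);
  }
  return seen;
}

// Made-up dependency table for demonstration only.
const sampleTable = {
  "cp": ["libselinux.so.1", "libc.so.6"],
  "libselinux.so.1": ["libpcre.so.1", "libc.so.6"],
  "libpcre.so.1": ["libc.so.6"],
  "libc.so.6": []
};

const sampleDeps = [...collectDeps("cp", (name) => sampleTable[name] || [])].sort();
console.log(sampleDeps.join("\n"));
```

<p>Skipping visited entries also keeps the walk from looping forever if two libraries ever name each other, which the plain recursive version would not survive.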
<p>Now we have a safe alternative to <tt>ldd</tt>.
<p>To see how you might take this a step further and use <tt>deplibs</tt> in a shell script, check out <a title="Copy a command and its dependencies to the specified chroot filesystem" href="https://gist.github.com/tboronczyk/00d77b1baafd13daab3b">my gist on GitHub</a> of a script to find and copy commands and their dependencies to a chroot filesystem.
<p><b>A Unicode fgetc() in PHP</b> (posted 2015-08-06)
<img src="https://i.imgur.com/llb72oE.jpg">
<p>In preparation for a presentation I’m giving at this month’s Syracuse PHP Users Group meeting, I found the need to read in Unicode characters in PHP one at a time. Unicode is still second-class in PHP; PHP6 failed and we have to fall back to extensions like the <a href="http://php.net/manual/en/ref.mbstring.php">mbstring extension</a> and/or libraries like <a href="http://pageconfig.com/post/portable-utf8">Portable UTF-8</a>. And even with those, I didn’t see a Unicode-capable <tt>fgetc()</tt> so I wrote my own.
<p>Years ago, I wrote a post describing <a href="http://zaemis.blogspot.com/2011/06/reading-unicode-utf-8-by-hand-in-c.html">how to read Unicode characters in C</a>, so the logic was already familiar. As a refresher, UTF-8 is a multi-byte encoding scheme capable of representing every Unicode code point (a little over 1.1 million of them) using 4 bytes or fewer. The first 128 characters are encoded the same as 7-bit ASCII with 0 as the most-significant bit. The other characters are encoded using multiple bytes, each byte with 1 as the most-significant bit. The bit pattern in the first byte of a multi-byte sequence tells us how many bytes are needed to represent the character.
<p>Here’s what the function looks like:
<pre>function ufgetc($fp)
{
    // mask values for first byte's bit patterns
    static $mask = [
        192, // 110xxxxx
        224, // 1110xxxx
        240  // 11110xxx
    ];
    // read first byte
    $ch = fgetc($fp);
    if ($ch === false) {
        // return false on EOF
        return false;
    }
    // single-byte character
    if ((ord($ch) & $mask[0]) != $mask[0]) {
        return $ch;
    }
    // multi-byte character
    $buf = $ch;
    for ($i = 0; $i < count($mask); $i++) {
        if ((ord($ch) & $mask[$i]) != $mask[$i]) {
            break;
        }
        $buf .= fgetc($fp);
    }
    return $buf;
}</pre>
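<p>For comparison, the lead-byte patterns can also be tested directly to get the sequence length in one step. This is only a sketch, written in JavaScript for illustration; <tt>utf8SequenceLength</tt> is a hypothetical helper and, like <tt>ufgetc()</tt>, it does not validate the continuation bytes that follow.

```javascript
// Return how many bytes a UTF-8 sequence occupies, judged from the bit
// pattern of its first byte. Returns 0 when the pattern cannot start a
// sequence (10xxxxxx continuation bytes and 11111xxx).
function utf8SequenceLength(firstByte) {
  if ((firstByte & 0x80) === 0x00) return 1; // 0xxxxxxx: 7-bit ASCII
  if ((firstByte & 0xe0) === 0xc0) return 2; // 110xxxxx
  if ((firstByte & 0xf0) === 0xe0) return 3; // 1110xxxx
  if ((firstByte & 0xf8) === 0xf0) return 4; // 11110xxx
  return 0;
}

console.log(utf8SequenceLength(0x41)); // "A"
console.log(utf8SequenceLength(0xc4)); // first byte of U+0109
console.log(utf8SequenceLength(0xe2)); // first byte of U+20AC
console.log(utf8SequenceLength(0xf0)); // first byte of U+1F600
```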
<p>PHP’s <tt>fgetc()</tt> reads in 8 bits at a time just like its counterpart in C, but these bytes are represented as a single-character string in PHP’s type system so we need to use the byte’s integer value for the mask check to succeed.
<p><b>Some Go Irks and Quirks</b> (posted 2015-07-26)
<p>Now that <a href="http://www.amazon.com/Jump-Start-MySQL-Timothy-Boronczyk/dp/0992461286">Jump Start MySQL</a> is published, I’m taking advantage of the spare time I have on my hands while it lasts. I’ve helped organize the <a href="http://www.meetup.com/PHPSyracuse">Syracuse PHP Users Group</a>, reconnected with some old friends, and given some love to <a href="https://github.com/tboronczyk/Kiwi">Kiwi</a>, my forever-project programming language. Moreover, I decided to rewrite Kiwi using Go as it’s one of those languages I found interesting but never had a reason to use in any serious fashion. And now that I’ve got some real experience with it, while I still find myself impressed by some of Go’s features, some things have become really annoying.
<p>I still really like Go’s data typing; it’s static, but it feels dynamic because the compiler is smart enough to deduce a value’s type. If you write your code well then you’ll rarely see a type name outside of a function signature or struct or interface definition. It’s nice to have type safety without the verbosity (yes I’m looking at you, PHP7).
<p>I wish <tt>:=</tt> behaved slightly differently, though. Instead of accepting only plain variable names on its left-hand side, it’d be nice if it could also perform basic assignments, such as to a struct field. Then we could write code like this:
<pre>foo, bar := baz()
foo.x, fizz := quux()
</pre>
But as it is now, the best we can do is:
<pre>foo, bar := baz()
var fizz MyType
foo.x, fizz = quux()
</pre>
If there’s a go-ism that works around this that you know of, feel free to let me know.
<p>The dangling comma in a list, but only when its closing brace is on a new line, is also irritating. No, it’s not a formatting issue; <tt>gofmt</tt> won’t enforce one brace placement over the other. Rather, the presence or lack of a comma is a parsing error. We can write:
<pre>{foo,
bar,
baz}
</pre>
And we can write:
<pre>{foo,
bar,
baz,
}
</pre>
But we can’t write:
<pre>{foo,
bar,
baz
}
</pre>
Perhaps it was because I was writing my own parser at the time that this bothered me. It should be trivial to accommodate the desired pattern, especially since structs and interface definitions are brace-delimited and don’t use commas at all.
<p>Go elides some traditional constructs, for example <tt>for</tt> handles for, foreach, and while loops, so why <tt>make</tt> and <tt>new</tt> still exist side-by-side, even when Rob Pike <a href="https://groups.google.com/forum/#!topic/golang-nuts/kWXYU95XN04/discussion">proposed merging them</a>, leaves me scratching my head. <tt>&Foo{}</tt> is equivalent to <tt>new(Foo)</tt>, so if there’s no need for <tt>while</tt> then there’s no need for <tt>new</tt>.
<p>I recognize these gripes are largely syntactic, but the syntax of a language is its API. Programmers are immersed in it every day and it can have an effect on how we think about things.
<p>Surprisingly though, and perhaps this is my biggest complaint, the tooling around Go is still immature. In the 6+ years after its release there is still no killer IDE. Code coverage can only be generated for one package at a time, not an entire project. It’s possible to script coverage for project-wide results but that’s just a hack. Debugging with GDB is brutal and I could not get Delve to work for me.</p>
<p>None of these irks will stop me from using Go in the future if I have the opportunity, but I’d like to suggest Go at work as the go-to language (pun intended) for some of the work we do now in C. I can probably make some good technical arguments to sway our old-time C programmers, yet convincing management and the programmers fresh out of college to use Go without viable tooling is going to be a hard sell.
<p><b>PHP Frameworks Don't Save Time</b> (posted 2014-12-03)
<p>Experience has shown me frameworks can be useful for maintaining structure in a large code base developed by multiple teams. Every developer has different abilities and a framework enforces structure and consistency throughout the code. But I've not experienced saving any substantial amount of time on a PHP project because of a framework.</p>
<p>The other day someone posted in the <a href="http://www.reddit.com/r/PHP/comments/2mzoiz/simple_project_no_framework/">PHP subreddit</a> asking for advice. He was about to begin work on a small project and wanted to know whether he should use a framework, and if so then which framework would be appropriate. I should have known better than to offer my two cents but I did anyway.</p>
<p><i><q>Slim + NotORM + Twig is nice. If it's a simple project, you probably don't need much more than that. I'm not a fan of frameworks in the slightest but I do enjoy the aforementioned combination. They're lightweight and stay out of my way, allowing me to write my functionality.</q></i></p>
<p>Another redditor picked up on my distaste for frameworks and asked:</p>
<p><i><q>So you're okay with being slower than someone with your same basic skill set? Serious question...</q></i>
<p>A serious question deserves a serious answer and so I replied, attempting to explain that developer skill sets are not always the same and that the differences in how we each might approach a problem have a greater effect on development time. If you like you can read my original response in the post's comments thread. Otherwise, here's a more refined presentation of my argument.</p>
<p>With regard to skill set, I'm a PHP programmer who has been coding in pure PHP for the better part of 13 years. I have an intimate relationship with the language and can probably write PHP code in my sleep. But as soon as a framework is introduced, I'm faced with a learning curve. Frustration inevitably ensues because simple things suddenly seem difficult, either because I'm unfamiliar with the new API or because I have to follow the framework's particular philosophy.</p>
<p>Many of my peers use frameworks, both co-workers and friends in the community. They've taken the time to learn the ins and outs of a given framework and probably can code in their sleep with it just as I do with pure PHP. But what happens when the need arises to go outside the bounds of the framework and they need to write something raw? That's when they confront their learning curve and have to dig into PHP's documentation.</p>
<p>We obviously don't all share the same basic skill set. Yes, we're all working in PHP, but my peers are experienced with a framework and I'm experienced with the nuances of the language itself. They're as fast writing their framework-based code as I am writing PHP; they're as slow writing pure PHP code as I am working with a framework.</p>
<p>But even if everything was equal on the skill side of the equation, there's still a human variable. Sharing exactly the same skills as someone else doesn't mean you'll share the same way of thinking about things or the same approach to solving a problem. Remember, there's more to programming than writing code; a large amount of time is spent simply on thinking about how to solve a problem. I can spend 6 hours planning and 2 hours coding, and a coworker can spend 7 hours planning and 1 hour coding, and although the coworker was technically faster at writing code, neither of us was actually more or less productive than the other. We both put in the same amount of time to the problem.</p>
<p>It's also worth noting how horribly fragmented the PHP ecosystem is. The world of a PHP programmer is not like the world of a Python programmer where the community has largely settled on Django, or the world of a C# programmer where there's the .NET framework. Knowledge of Django and .NET is transferable across most Python and C# projects. But with PHP, a developer can learn ZF2, another developer can learn Yii, another may learn Laravel, and still another would learn Symfony... and little of the knowledge and experience they gain is transferable if the next project doesn't use their preferred framework. We face a potential learning curve before we even make our first keystroke on any project, and that takes time.</p>
<p>Promoting framework adoption is fine but I simply don't believe the time element is the proper argument for it. I probably wasn't as clear as I could have been in my initial response, so hopefully this clarifies things. Feel free to leave a message in the comments section if I'm still spewing senseless babble!
<p><b>PS:</b> Thanks to the kind redditor who felt my blathering response was worth Reddit Gold. You rock!</p>
<p><b>New Writers Guide now on GitHub</b> (posted 2014-05-01)
<p>Writing can be a fun and rewarding way to share your knowledge, experience, and opinions with others. Unfortunately, it can also be intimidating or frustrating for some people. When I was managing editor for SitePoint's PHPMaster property, I prepared a guide to help alleviate some of the frustration and self-doubt that new writers (and even experienced writers) might experience.</p>
<p>The guide wasn't something commissioned by SitePoint; I wrote it on my own for my authors. And though it's been about eight months since PHPMaster was absorbed into the main SitePoint site and I stepped down as managing editor, people continue to ask me about it. So, I've decided to make the guide publicly available.</p>
<p>The <em>New Writers Guide</em> offers advice for finding inspiration, structuring an article's content, growing one's self-confidence, and overcoming other challenges that programming writers may face. Hopefully it'll continue to help people write awesome articles and realize the many benefits of writing in their life.</p>
<p>You can find a copy of the guide on GitHub at <a href="https://github.com/tboronczyk/WritersGuide/">github.com/tboronczyk/WritersGuide</a>.</p>
<p><b>Ajax File Uploads with JavaScript's File API</b> (posted 2014-04-25)
<p>Developers have been using Ajax techniques for years to create
dynamic web forms, but handling file uploads using Ajax was always
problematic. The crux of the problem was security – it's not a good idea
to allow arbitrary code access to any file it wants on a user's system
so JavaScript was intentionally restricted in how it could interact with
things like file input elements. Uploading a file with JavaScript was
essentially a standard form submission that targeted a hidden iframe. It
felt dirty but it got the job done.</p>
<p>The W3C began work on
standardizing a <a href="http://www.w3.org/TR/FileAPI/">File API for JavaScript</a> sometime between 2006 and 2009
and we're now at the point with <a href="http://caniuse.com/fileapi">browser support</a> where developers can
take advantage of it. Developers supporting web apps on IE8 and 9 still
need to use iframes, but those of us targeting newer browsers can
finally take a pure JavaScript approach to file uploads. And as more
users migrate from IE8/9, the iframe approach will eventually be left in
the dustbin.</p>
<p>The interesting things defined by the W3C's File API are:</p>
<ul>
<li><tt>Blob</tt> – an object that represents a sequence of bytes and is consumed by
<tt>FileReader</tt>. Its <tt>size</tt> property lists the size of the sequence in bytes
and its <tt>type</tt> property is a lower-case MIME-type string if such
information is available.</li>
<li><tt>File</tt> – an object that extends <tt>Blob</tt> and offers additional properties
to make the file's metadata available. Its <tt>name</tt> property holds the
filename (no path information) and <tt>lastModifiedDate</tt> holds a <tt>Date</tt> object
instance set to when the file was last modified.</li>
<li><tt>FileReader</tt> – an object that reads the byte sequence of a <tt>Blob</tt> or <tt>File</tt> object.</li>
<li><tt>FileList</tt> – an array-like list of <tt>File</tt> objects, exposed through the <tt>files</tt> property of file input elements.</li>
</ul>
<p>The API is designed so that byte sequences are loaded
asynchronously by default. This makes sense since there are several
things that can cause the read process to take a while to complete: it
might be a large file, the file might be on a mounted network share,
etc. Reading files asynchronously ensures the main execution thread is
free and the browser doesn't lock up.</p>
<p>So what does a basic upload look like using the API? At a high level, the steps are:</p>
<ol>
<li>Provide a file input for the user.</li>
<li>When the user sets a file, retrieve its File object from the input's <tt>files</tt> property.</li>
<li>Create a <tt>FileReader</tt> instance and register a callback for its <tt>onload</tt> event. This callback will have access to the read data.</li>
<li>Initiate the read process with the <tt>FileReader</tt> methods <tt>readAsText()</tt> or <tt>readAsDataURL()</tt>.</li>
</ol>
<p>I like to use <tt>readAsDataURL()</tt> to initiate the read process,
especially for binary files like images and PDFs, since the data will be
base64 encoded. The ASCII URI string can then be safely sent to the
server just like any other string.</p>
<p>I also recommend using
POST for the HTTP method; yes, the encoded contents are a data URI which
can be used in a GET parameter, but doing so increases the risk of
getting an HTTP 414 (URI Too Long) error because of the resulting size of the request.
Base64 encodes binary content to safe ASCII which <a href="http://stackoverflow.com/a/4715480">increases the data's
size</a> by roughly 33%.</p>
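<p>As a rough check on that overhead: base64 emits 4 characters for every 3 input bytes, padding the final group, so the encoded text is about a third larger than the original. The sketch below is only arithmetic; <tt>base64Length</tt> and the 3 MB figure are hypothetical, and the short <tt>data:</tt> prefix of the URI is not counted.

```javascript
// Length of the base64 text produced for byteCount input bytes:
// every group of 3 bytes (the last one padded) becomes 4 characters.
function base64Length(byteCount) {
  return 4 * Math.ceil(byteCount / 3);
}

const originalSize = 3000000;            // a hypothetical 3 MB file
const encodedSize = base64Length(originalSize);
console.log(encodedSize);                // characters after encoding
console.log(encodedSize / originalSize); // ratio of encoded to original size
```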
<pre style="font-size:80%;"><form>
 <input id="fileInput" type="file" />
</form>
<script>
document.getElementById("fileInput").onchange = function () {
    // retrieve File from input
    var file = this.files[0];
    // set FileReader's onload event
    var reader = new FileReader();
    reader.onload = function () {
        // the result of the read is available through the FileReader's
        // result property when the callback is executed
        var fileContent = this.result;
        // send fileContent to server via Ajax request
        // ...
    };
    // initiate reading
    reader.readAsDataURL(file);
};
</script></pre>
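<p>The elided Ajax step might look something like the following. Treat it as a sketch under assumptions: the field name <tt>file</tt>, the <tt>/upload.php</tt> URL, and the helper names <tt>buildUploadBody</tt> and <tt>sendUpload</tt> are all hypothetical. The one detail that matters is running the data URI through <tt>encodeURIComponent()</tt>, because base64 text can contain <tt>+</tt>, <tt>/</tt>, and <tt>=</tt>, and a bare <tt>+</tt> is read as a space in a form-encoded body.

```javascript
// Encode the data URI as a form field so it can travel in a POST body.
// The field name "file" is an assumption; use whatever the server expects.
function buildUploadBody(dataUri) {
  return "file=" + encodeURIComponent(dataUri);
}

// Send the body with XMLHttpRequest. The "/upload.php" URL is hypothetical.
function sendUpload(dataUri, onDone) {
  var xhr = new XMLHttpRequest();
  xhr.open("POST", "/upload.php");
  xhr.setRequestHeader("Content-Type", "application/x-www-form-urlencoded");
  xhr.onload = function () {
    onDone(xhr.status); // hand the HTTP status back to the caller
  };
  xhr.send(buildUploadBody(dataUri));
}
```

<p>Inside the <tt>onload</tt> callback of the <tt>FileReader</tt>, this would be called as <tt>sendUpload(fileContent, callback)</tt>.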
<p>Handling the upload once it reaches the
server is different than working with traditional file uploads in PHP
since the file comes into the system as “normal” user input. That is,
you won't be using the <tt>$_FILES</tt> superglobal or functions like
<tt>move_uploaded_file()</tt>. Instead the content will be available straight
from <tt>$_POST</tt>.</p>
<p>The data URI format is defined by <a href="http://tools.ietf.org/html/rfc2397">RFC 2397</a> and looks like the following:</p>
<p><tt>data:[<mediatype>][;base64],<data></tt></p>
<p>You're free to use existing libraries to parse the URI or parse it yourself. The
media type is optional. If present, the value is a MIME type string. If
it's missing, the default value <tt>text/plain;charset=US-ASCII</tt> should be
assumed. If <tt>;base64</tt> is present then the data is base64 encoded.</p>
<pre style="font-size:80%;"><?php
// parse out file data
list($front, $data) = explode(',', $dataUri, 2);
if (stristr($front, ';base64') !== false) {
    $data = base64_decode($data);
}
// test whether the file is a valid image
try {
    $image = new \Imagick();
    $image->readImageBlob($data);
}
catch (\ImagickException $e) {
    header('HTTP/1.0 400 Bad Request');
    exit;
}
// do something with $image
// ...</pre>
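<p>The same split can be sketched on the client side, for instance to check the media type before posting. This is a minimal illustration in JavaScript, not a full RFC 2397 parser: <tt>parseDataUri</tt> is a hypothetical helper, and it leaves the data portion undecoded.

```javascript
// Split a data URI into its parts, following the
// data:[<mediatype>][;base64],<data> layout.
function parseDataUri(uri) {
  if (uri.slice(0, 5).toLowerCase() !== "data:") {
    throw new Error("not a data URI");
  }
  var comma = uri.indexOf(",");
  if (comma === -1) {
    throw new Error("data URI has no comma separator");
  }
  var front = uri.slice(5, comma);          // everything between "data:" and ","
  var isBase64 = /;base64$/i.test(front);   // the marker is 7 characters long
  var mediaType = isBase64 ? front.slice(0, -7) : front;
  return {
    // fall back to the default the RFC prescribes when no type is given
    mediaType: mediaType || "text/plain;charset=US-ASCII",
    isBase64: isBase64,
    data: uri.slice(comma + 1)
  };
}

var parsed = parseDataUri("data:image/png;base64,iVBORw0KGgo=");
console.log(parsed.mediaType, parsed.isBase64, parsed.data);
```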
<p>Posting a file as a data URI protects you from some
of the security vulnerabilities that are typically inherent when
dealing with files. Data URIs don't account for filenames, for instance,
so you're safe from directory traversal attacks by maliciously named
files. Still, you should treat the URI as you would any other piece of
user-supplied data. Your application will obviously dictate how you
filter and validate the file.</p>
<p>A secondary concern is the
possibility of a malicious person using large file posts as a vector for
a denial of service attack. The traditional upload approaches must
mitigate this risk, and an Ajax approach must do so as well. Make
certain you review the <tt>memory_limit</tt> and <tt>post_max_size</tt> entries in your
<tt>php.ini</tt>, and keep in mind the tradeoff between size and ASCII-safety
when using base64 encoding.</p>
<p>This isn't the first post on
the Internet to deal with Ajax file uploads or JavaScript's File API,
but many of them provide little beyond code samples. Hopefully I've
remedied the situation by providing a succinct overview of the API's
important objects/interfaces and discussing how receiving the file is
different using this approach. If there's something I've neglected, feel
free to leave a comment!</p>
<p><b>Fixing "MySQL server has gone away" Errors in C</b> (posted 2014-02-20)
<p>I ran across an <a title="mysql server has gone away - can't fix it" href="http://stackoverflow.com/q/5974498/322819">old question on Stack Overflow</a> the other day in which a user was having issues maintaining his connection to MySQL from C. I left a brief answer there for anyone else who might stumble across the same problem in the future, but I felt it was worth expanding on a bit more.</p>
<p>The error "MySQL server has gone away" means the client's connection to the MySQL server was lost. This could be for many reasons: perhaps MySQL isn't running, perhaps there are network problems, or perhaps there was no activity after a certain amount of time and the server closed the connection. <a title="B.5.2.9 MySQL server has gone away" href="http://dev.mysql.com/doc/refman/5.7/en/gone-away.html">Detailed information on the error</a> is available in the MySQL documentation.</p>
<p>It's possible for the client to attempt to re-connect to the server when it's "gone away" although it won't try to by default. To enable the reconnecting behavior, you need to set the <tt>MYSQL_OPT_RECONNECT</tt> option to <tt>1</tt> using the <tt>mysql_options()</tt> function. It should be set after <tt>mysql_init()</tt> is called and before calling <tt>mysql_real_connect()</tt>. This should solve the problem if the connection was closed by the server because of a time-out.</p>
<p>The MySQL documentation that <a title="26.8.16 Controlling Automatic Reconnection Behavior" href="http://dev.mysql.com/doc/refman/5.7/en/auto-reconnect.html">discusses the reconnect behavior</a> points out that only one re-connect attempt will be made, which means the query can still fail if the server is stopped or inaccessible. I ran across this problem myself while writing a daemon in C that would periodically pull data from MySQL. The daemon was polling at set intervals far less than the time-out period, so any such errors were the result of an unreachable or stopped server. I simply jumped execution to just prior to my work loop's <tt>sleep()</tt> call and the daemon would periodically try to re-connect until the server came back up.</p>
<pre style="font-size:80%;">
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <mysql.h>

#define DBHOSTNAME "localhost"
#define DBUSERNAME "dbuser"
...
MYSQL *db = mysql_init(NULL);
if (db == NULL) {
    fprintf(stderr, "Insufficient memory to allocate MYSQL object.\n");
    exit(EXIT_FAILURE);
}
/* enable re-connect behavior */
my_bool reconnect = 1;
int success = mysql_options(db, MYSQL_OPT_RECONNECT, &reconnect);
assert(success == 0);
if (mysql_real_connect(db, DBHOSTNAME, DBUSERNAME, DBPASSWORD, DBDATABASE,
                       0, NULL, 0) == NULL) {
    fprintf(stderr, "Connection attempt failed: %s\n", mysql_error(db));
    exit(EXIT_FAILURE);
}
for (;;) {
    success = mysql_query(db, "<MYSQL QUERY HERE>");
    if (success != 0) {
        /* The error is most likely "gone away" since the query is
         * hard-coded, doesn't return much data, and the result is
         * managed properly. */
        fprintf(stderr, "Unable to query: %s\n", mysql_error(db));
        goto SLEEP;
    }
    /* call mysql_use_result() and do something with data */
    ...
SLEEP:
    sleep(SLEEP_SECONDS);
}</pre>
<p><b>Generating C Code and Compiling from STDIN</b> (posted 2014-02-13)
<p>Lately I've been exploring some <a title="Syslog" href="http://en.wikipedia.org/wiki/Syslog">syslog</a> configurations and needed to generate some log messages to verify they were routed correctly. Of course doing so programmatically would provide an easy and repeatable method to generate a batch of fresh log messages whenever I needed, but because of the number of facilities and priorities defined by the <a href="http://tools.ietf.org/html/rfc5424" title="The Syslog Protocol">syslog protocol</a>, it made sense to write a code generator to iterate the different permutations.</p>
<p>The following Lua script generates boilerplate C code for each of the 64 messages needed to test <tt>LOG_LOCAL</tt> 0-7 with all priorities. I chose generating the code in this manner over writing a nested facilities/priorities loop directly in C so I could easily include a textual representation of the facility and priority constants in the log message (this seemed like a cleaner solution to me than having to maintain a mapping of constants to <tt>char*</tt> strings as well). And why Lua? Well, it seemed a better idea than M4. :)</p>
<pre style="font-size:80%;">#! /usr/bin/env lua
local facilities = {
"LOG_LOCAL0",
"LOG_LOCAL1",
"LOG_LOCAL2",
"LOG_LOCAL3",
"LOG_LOCAL4",
"LOG_LOCAL5",
"LOG_LOCAL6",
"LOG_LOCAL7"
}
local priorities = {
"LOG_DEBUG",
"LOG_INFO",
"LOG_NOTICE",
"LOG_WARNING",
"LOG_ERR",
"LOG_CRIT",
"LOG_ALERT",
"LOG_EMERG"
}
print([[
#include <stdlib.h>
#include <syslog.h>
#include <libgen.h>
int main(int argc, char *argv[])
{
char *appName = basename(argv[0]);
]])
for _, facility in ipairs(facilities) do
for _, priority in ipairs(priorities) do
print(string.format(
[[
openlog(appName, LOG_CONS|LOG_NDELAY|LOG_PID, %s);
syslog(%s, "Test %s.%s message.\n");
closelog();
]],
facility, priority, facility, priority
))
end
end
print([[
return EXIT_SUCCESS;
}]])</pre>
<p>Running the script will output the desired C code, which looks like this:</p>
<pre style="font-size:80%;">#include <stdlib.h>
#include <syslog.h>
#include <libgen.h>
int main(int argc, char *argv[])
{
char *appName = basename(argv[0]);
openlog(appName, LOG_CONS|LOG_NDELAY|LOG_PID, LOG_LOCAL0);
syslog(LOG_DEBUG, "Test LOG_LOCAL0.LOG_DEBUG message.\n");
closelog();
openlog(appName, LOG_CONS|LOG_NDELAY|LOG_PID, LOG_LOCAL0);
syslog(LOG_INFO, "Test LOG_LOCAL0.LOG_INFO message.\n");
closelog();
openlog(appName, LOG_CONS|LOG_NDELAY|LOG_PID, LOG_LOCAL0);
syslog(LOG_NOTICE, "Test LOG_LOCAL0.LOG_NOTICE message.\n");
closelog();
...</pre>
<p>If I wanted to inspect or tweak the generated code, I could pipe the script's output to a file before compiling it:</p>
<pre style="font-size:80%;">./gen-syslog-tests.lua > syslog-tests.c
gcc -o syslog-tests syslog-tests.c</pre>
<p>But if I just wanted the compiled binary and had no need to modify the code, it seems inelegant to write things out to a file. Here's where I learned it's possible for gcc to compile code piped in on STDIN.</p>
<pre style="font-size:80%;">./gen-syslog-tests.lua | gcc -o syslog-tests -xc -</pre>
<p>The two things of note are: gcc can't deduce the programming language from the file extension (since there is no file) so the <tt>-x</tt> flag is necessary to identify the language, and <tt>-</tt> is used as the file name (a convention commonly used to indicate reading from STDIN as a file).</p>
<p><b>Esperanto Accented Characters in Windows</b> (posted 2013-12-16)
<p>It's not as easy to set up as <a href="http://zaemis.blogspot.com/2012/01/esperanto-accented-characters-in-ubuntu.html" title="Esperanto Accented Characters in Ubuntu">clicking a checkbox like Ubuntu/Gnome</a>, but it is possible to type proper Esperanto characters in Windows using Right Alt as a modifier key. You need to create and install an alternate keyboard layout and then set the new layout active.
<p>The program <a href="http://www.microsoft.com/en-us/download/details.aspx?id=22339" title="Microsoft Keyboard Layout Creator 1.4">Keyboard Layout Creator</a> is used to create the layout, and is available for free from Microsoft. Once it's downloaded and installed, start the program. Navigate File > Load Existing Keyboard and then select your primary keyboard layout (standard US layout in my case). You'll use this as a base and augment it with the Esperanto characters.
<p><img src="https://i.imgur.com/ubiP5vi.png">
<p>For each key that should produce an accented character, right-click its position on the virtual keyboard and click "Properties for VK_? in all shift states". A dialog will appear in which the necessary Unicode code points can be entered.
<p><img src="https://i.imgur.com/Lap63NJ.png">
<p>The code points for the accented Esperanto letters are shown below, as well as for the Euro and <a title="Spesmilo" href="http://en.wikipedia.org/wiki/Spesmilo">Spesmilo</a> just for fun:
<p><img src="https://i.imgur.com/kMUK6oT.png">
<p>If you don't want to enter the Unicode values yourself, feel free to use a copy of <a href="https://dl.dropboxusercontent.com/u/14624025/blogspot/eo.klc">my keyboard definition file</a>.
<p>When you're finished setting the code points for each letter, navigate Project > Test Keyboard Layout to test them. Then, navigate Project > Properties to provide the necessary name and other descriptive information for the new layout. The name cannot be longer than eight characters, so I simply named mine "EO".
<p><img src="https://i.imgur.com/ttYicqn.png">
<p>Once you're satisfied with the layout, navigate Project > Build DLL and Setup Package. The keyboard layout will be compiled to a binary format usable by Windows and saved to your hard drive. Run the setup.exe installer that was written to disk to install the layout. The installer will detect your system's architecture and launch the appropriate sub-installer.
<p>Restart your computer once the installer is finished. You'll then be able to toggle between your original layout and the Esperanto layout using the Language Bar.
<p><img src="https://i.imgur.com/gSJSWgx.png">
<p>I set the augmented layout as my default keyboard layout (although I don't recommend this unless you're computer savvy). To do this on Windows 7, go to the Start menu, type "language" in the search bar, and select "Change keyboard and input methods". Click the "Change keyboards" button and you'll see the Text Services and Input Languages dialog. Under the General tab, set the new layout as the default input language and remove the entry for your original layout in the installed services tree.
<p><img src="https://i.imgur.com/3VWnEeP.png">
<p>On Windows 8, start typing "language" on the Start screen and then select "Change input methods" from the Settings group.
<p><img src="https://i.imgur.com/foFOrpT.png">
<p>The Windows 8 Language panel provides more or less the same functionality as its Windows 7 counterpart but in a less user-friendly manner. The Input method is accessible through the options link.
<p><img src="https://i.imgur.com/YUK87iB.png">
Timothy Boronczykhttp://www.blogger.com/profile/00015151416507514182noreply@blogger.com0tag:blogger.com,1999:blog-3224008808345429390.post-23238728781360666282013-11-29T18:24:00.000-05:002017-03-21T18:41:42.560-04:00Password Woes<p>Happy belated International Change-Your-Password Week! Earlier this month, thanks to the <a href="http://www.theverge.com/2013/11/7/5078560/over-150-million-breached-records-from-adobe-hack-surface-online" title="Over 150 million breached records from Adobe hack have surfaced online">generous sponsorship by the great folks at Adobe</a>, people all around the world were changing their passwords and tech blogs were parroting guidelines for choosing a strong password. But let’s be honest – passwords are a hassle. And, as Adobe was so kind to remind us, even the strongest unique password can be an open door if the company storing it isn’t doing so competently.
<p>As someone who is a programmer, I’m aware of several technical solutions to our password woes. As someone who suffers from cynical realism, I believe the barrier to adopting these solutions to be red-tape and human nature (ego and laziness). There’s no reason for every website to require their own login credentials when OpenID and OAuth exist. Perhaps we should increase liability for password storers and provide incentives to the crackers who hack them. A smart company would migrate to an SSO-provider to mitigate their responsibility and the provider would be diligent in protecting the hashes.
<p>But as much as anyone would like to mitigate responsibility, the fact remains that it’s the individual who’s most affected by password breaches, not corporations. Are there secure ways to ease the burden of password management?
<p>I’ve been trying out <a href="http://keepass.info/">KeePass</a> this past week and my overall impression of the program is fair to middling. I’m storing the encrypted password database to Dropbox for the computers I use the most, and keep a duplicate copy of the database on a thumbdrive with a portable version of KeePass for when I need to use someone else’s computer. Although the premise seems secure, and I trust their implementation to be solid, some of the program’s incidentals frustrate me.<br />
<p>KeePass is fine on Windows but almost unusable on Linux. Unfortunately in this case, a good 90% of my day is spent using Linux. I've also noticed that the Auto-Fill feature toggles back to the most recently used window, so if an IM dialog pops up while I'm toggling to KeePass, the password is leaked. I could spend some time scripting in the advanced sections to safeguard against this, but that seems like a hassle.<br />
<p>I’ve also pondered whether I might be able to get away with using the same password for everything, so long as it contained accented characters. If the website is using proper hashing practices (Blowfish with scalable cost – i.e. Bcrypt – and random salt) then a rainbow table attack is going to be useless. Those sites that aren't have already proven their incompetence, so they probably don't know how to handle UTF-8 correctly either. The password value would be corrupted, truncated, or filtered, and most likely result in differing hashes between different sites... almost like using the site’s algorithm as your own salt! And brute-force crackers probably aren’t using Esperanto dictionaries; “@D0B3.fuŝ1s!” seems secure, doesn’t it?
<p>Ultimately, programs like KeePass only serve as a bandage and don’t address the core problem, and ubiquitous use of SSO-providers is still a pipe-dream. While we’re all stuck in Password Hell, waiting for the next password-change holiday, the best we can do is keep <a href="http://en.wikipedia.org/wiki/Clifford_Stoll" title="Clifford Stoll">Clifford Stoll</a>’s advice in mind: “Treat your password like your toothbrush. Don't let anybody else use it, and get a new one every six months.”Timothy Boronczykhttp://www.blogger.com/profile/00015151416507514182noreply@blogger.com0tag:blogger.com,1999:blog-3224008808345429390.post-43894255953721631732013-09-10T16:19:00.000-04:002017-03-21T18:53:29.115-04:00Urba Semajnfino: Sirakuso a Success<p><em>The following is a translation of the article I wrote for <a href="http://esperanto-ondo.ru/Ind-ondo.htm" title="La Onda de Esperanto" lang="eo">La Ondo de Esperanto</a> to share the Urban Weekend: Syracuse event. Thank you to everyone who attended and helped make the event a success.</em></p>
<p>Urban Weekend: Syracuse, the third Urban Weekend event to happen in the United States, took place during the <a href="http://syracuseesperanto.org/usfino3" title="Syracuse Esperanto - Urban Weekend: Syracuse">weekend of August 31</a> in Syracuse, New York. Esperantists came from near and far to meet new friends and explore the city.</p>
<p>As the main organizer, I was a bit nervous. I had never organized an Esperanto event before. Would the weather hold out? Would anyone come? Would they enjoy their time together? But indeed the weather was beautiful, and people came from Rochester NY, Virginia, and even Brazil! Everyone had fun and Urban Weekend: Syracuse was a success.</p>
<p>A little before noon on Saturday, four of us met at the city's central park and then walked to a nearby restaurant for lunch. The restaurant is popular for its beer, brewed on-site, and also for its support of Central New York agriculture by using locally-grown ingredients.</p>
<p>After lunch we walked about in the city for a bit and made our way to two museums. The Erie Canal Museum remembers the <a href="http://en.wikipedia.org/wiki/Erie_Canal" title="Erie Canal">Erie Canal</a> which connected Lake Erie to the Hudson River, and there we met two more esperantists. The canal no longer exists in its original form, but it has historical significance to both the region and the United States because it opened the Great Lakes to the Atlantic Ocean and enabled westward migration. Everyone enjoyed learning how the canal helped shape the country and seeing what life was like for those who traveled it almost 200 years ago.</p>
<p>The second museum, the Everson Museum of Art, is an art gallery known for its ceramics, pottery, and film exhibits. The collection may not be as impressive as the ones found in larger museums, but it has several pieces worth enjoying. And perhaps even more special, the museum building was designed by the internationally acclaimed architect <a href="http://en.wikipedia.org/wiki/I._M._Pei" title="IM Pei">IM Pei</a> who also designed the <span lang="fr">Pyramide du Louvre</span> in Paris.</p>
<p>After exploring some of the art and history of Syracuse, we were hungry and were ready to eat. The six of us went to a Mexican restaurant occupying a former church building. Even this building had significance; the church was a station in the <a href="http://en.wikipedia.org/wiki/Underground_Railroad" title="Underground Railroad">Underground Railroad</a> in the 19th century. A secret tunnel under the church was a refuge for slaves running north in search of their freedom.</p>
<p>To finish the first day, we socialized and watched a film - <a title="House of Ghosts" href="http://sainteuphoria.com/ghosts.html">House of Ghosts</a>, a comical horror film dubbed with Esperanto voice and subtitles.</p>
<p>Most of the day Sunday was spent visiting the zoo, home to over 700 animals. A family of five esperantists who couldn't attend the first day joined us. The children in the group loved looking at the elephants, penguins, and lions. It was also a good opportunity for the adults to improve their animal-related vocabulary.</p>
<p>We ate lunch after the zoo in a popular nearby Irish restaurant; the food was great, and we enjoyed the local musicians playing in the pub. The neighborhood where the restaurant is located was settled by Irish immigrants who came to work on the Erie Canal, and near the restaurant is the famous <a href="http://www.syracuse.com/kirst/index.ssf/2005/03/rocks_against_red_lift_green_o.html" title="An Irish lesson for the prime minister: Rocks against red lift green on Tipp Hill">green-on-top traffic light</a>. As the story goes, the settlers wouldn't allow red (the color of the British) to sit above green, and they threw stones at the light in protest anytime the city tried to hang the light correctly.</p>
<p>Weekend events similar to Urban Weekend are good for busy esperantists who are not able to attend the longer major events, and like all Esperanto gatherings, they are a good opportunity to meet new friends, explore new places, and take part in <em lang="eo">Esperantujo</em>. If one is held near you, I highly recommend that you participate. If not, why not organize your own? It's easier than you might think (I speak from experience!). The <a lang="eo" href="http://tinyurl.com/mm43ylf" title="Manlibro pri Urba Semajnfino">Manlibro pri Urba Semajnfino</a> is a good place to start.</p>
Timothy Boronczykhttp://www.blogger.com/profile/00015151416507514182noreply@blogger.com0tag:blogger.com,1999:blog-3224008808345429390.post-22207127910128477102013-06-21T00:34:00.001-04:002017-03-21T18:57:39.928-04:00Building an Array with array_reduce?<p>Whether it’s twisting a function or taking advantage of side effects and flexible language constructs, it’s no secret I occasionally take joy in writing bastard PHP code. The other day I was in an evil mood and used <tt>array_reduce()</tt> to construct an array.</p>
<p>If you’re not familiar with <tt>array_reduce()</tt>, it’s a function that iteratively reduces a given array to a single value using a callback function. For example, suppose the function <tt>array_sum()</tt> didn’t exist. We could achieve the desired functionality using <tt>array_reduce()</tt> like so:</p>
<pre style="font-size:80%;"><?php
$nums = [1, 2, 3, 4, 5];
$sum = array_reduce(
    $nums,
    function ($acc, $val) { return $acc + $val; },
    0
);</pre>
<p><tt>array_reduce()</tt> executes the callback function for each element in the array, passing to it an accumulator and the current array member. The returned value is used as the accumulator value for the next iteration. This is roughly the functional equivalent of this iterative approach:</p>
<pre style="font-size:80%;"><?php
$acc = 0;
foreach ($nums as $val) {
    $acc = $acc + $val;
}</pre>
<p>All in all this is pretty straightforward. <tt>array_reduce()</tt> is nothing more than a mechanism that iterates over a list with an available accumulator. But what happens when you realize that nothing mandates the single result value must be a scalar? For functional programmers, this is obvious. For most PHP programmers with a procedural or OO background, this is a jaw-dropping realization.</p>
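<p>As a small illustration of that point, here is a minimal sketch in which the accumulator is itself an array. The <tt>$words</tt> list and the tallying callback are made up for this example and aren't part of the experiment that follows:</p>
<pre style="font-size:80%;"><?php
$words = ['la', 'ondo', 'de', 'la', 'maro'];
$counts = array_reduce(
    $words,
    function ($acc, $word) {
        // $acc is an array here, not a scalar
        $acc[$word] = isset($acc[$word]) ? $acc[$word] + 1 : 1;
        return $acc;
    },
    []
);
print_r($counts);</pre>
<p>Each call receives the array built so far, adds to it, and hands it back, so the "single value" returned at the end is the whole tally.</p>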
<p>Suppose we need to build an array using data from the Unicode Consortium’s supplemental <a title="windowsZones.xml" href="http://unicode.org/repos/cldr/trunk/common/supplemental/windowsZones.xml">windowsZones.xml</a> file. To make things interesting, the keys need to be the values of the <tt>mapZone</tt>s’ <tt>type</tt> properties, and the members need to be the value in the leading comments. That is, the resulting array must look like this:</p>
<pre style="font-size:80%;">Array (
["Etc/GMT+12"] => "(UTC-12:00) International Date Line West",
["Etc/GMT+11"] => "(UTC-11:00) Coordinated Universal Time-11",
["Pacific/Pago_Pago"] => "(UTC-11:00) Coordinated Universal Time-11",
["Pacific/Niue"] => "(UTC-11:00) Coordinated Universal Time-11",
["Pacific/Midway"] => "(UTC-11:00) Coordinated Universal Time-11",
["Pacific/Honolulu"] => "(UTC-10:00) Hawaii",
...</pre>
<p>The standard XML-processing strategies become cumbersome because of the requirement pertaining to the comment values, and the next best solution is to script a rudimentary stateful parser. We can iterate each line and extract the textual value if it’s a comment, or extract the attribute value if it’s a <tt>mapZone</tt> element, and assign to the array when we have both pieces of information available.</p>
<pre style="font-size:80%;"><?php
// holds the most recently seen comment between callback invocations
$comValue = '';
$zones = array_reduce(
    file('windowsZones.xml'),
    function ($acc, $line) use (&$comValue) {
        $line = trim($line);
        if (strpos($line, '<!-- (') !== false) {
            // comment line: strip the comment delimiters and remember the text
            $comValue = trim($line, '<!-> ');
        }
        elseif ($pos = strpos($line, 'type="')) {
            // mapZone line: the type attribute may list several zones
            $typeValues = substr($line, $pos + 6, -3);
            foreach (explode(' ', $typeValues) as $value) {
                $acc[$value] = $comValue;
            }
        }
        return $acc;
    },
    []
);</pre>
<p>This approach is going to be an order of magnitude slower than an iterative <tt>foreach</tt> loop that builds up the <tt>$zones</tt> array directly because the PHP run-time just isn't optimized for abuse like this. We’d gain speed with <tt>foreach</tt>, but then we’d miss an opportunity to explore how things work, blend different concepts together, and just have fun. Ultimately it's little excursions like this that help one grow and become a better programmer.</p>Timothy Boronczykhttp://www.blogger.com/profile/00015151416507514182noreply@blogger.com1tag:blogger.com,1999:blog-3224008808345429390.post-77434101343420308472013-05-27T02:52:00.001-04:002017-03-21T22:11:41.015-04:00Composing Music with PHP<p>I’m not an expert on probability theory, artificial intelligence, and machine learning. And even my Music 201 class from years ago has been long forgotten. But if you’ll indulge me for the next 10 minutes, I think you’ll find that even just a little knowledge can yield impressive results if creatively woven together. I’d like to share with you how to teach PHP to compose music.
<p>Here’s an example:
<p><img src="https://i.imgur.com/RwDji4q.png" alt="A melody composed by PHP">
<p>You’re looking at a melody generated by PHP. It’s not the most memorable, but it’s not unpleasant either. And surprisingly, the code to generate such sequences is rather brief.
<p>So what’s going on? The script calculates a probability map of melodic intervals and applies a Markov process to generate a new sequence. In friendlier terms, musical data is analyzed by a script to learn which intervals make up pleasing melodies. It then creates a new composition by selecting pitches based on the probabilities it’s observed.
<h2>Standing on Shoulders</h2>
<p>Composition doesn’t happen in a vacuum. Bach was fond of Buxtehude and Vivaldi; Chopin influenced Liszt and Wagner; Mozart and Haydn taught Beethoven. The same melodic phrases are found in different pieces of work. For example, <i lang="it">Orfeo ed Euridice</i> by Gluck and the hymn tune <i lang="la">Non Dignus</i> both share a common phrase.
<p><img src="https://i.imgur.com/7uP6bzF.png" alt="The same melodic phrase is found in Orfeo ed Euridice and Non Dignus">
<p>But if you ask PHP to compose blindly, the results aren’t pretty. Here's a melody generated by mapping random values returned by successive calls to <tt>rand()</tt> to notes on a staff.
<pre>$notes = ['C4','D4','E4','F4','G4','A4','B4','C5','D5','E5','F5','G5'];
// choose 12 notes at random, repeats allowed
// (draw() is a rendering helper that isn't shown here)
for ($i = 0; $i < 12; $i++) {
    draw($notes[rand(0, count($notes) - 1)]);
}
</pre>
<p><img src="https://i.imgur.com/34PP6Uq.png" alt="random notes don't sound pretty.">
<p>Unless you’re keen on <a href="http://en.wikipedia.org/wiki/Twelve-tone_technique" title="Twelve-tone technique">twelve-tone</a>, it's better to draw inspiration from earlier compositions.
<p>I transcribed the melody of several pieces of music using <a href="http://soundcalledmusic.com/scientific-pitch-notation/" title="Scientific pitch notation">Scientific Pitch Notation</a>. I didn't concern myself with note duration. Rather, I focused on the notes themselves. A middle C on paper was entered as C4 (C is the note name and 4 is its octave), a semitone above that was C#4, the next semitone D4, and so on until a melody (the first 8 measures of <i lang="la">Tantum Ergo</i> by Bottazzo shown here) was encoded:</p>
<p><img src="https://i.imgur.com/YvFs8fp.png" alt="The first 8 measures of Tantum Ergo">
<pre>A4 C5 G4 A4 G4 A#4 D5 A4 A#4 A4 C5 D5 C5 A4 B4 B4 C5</pre>
<p>With an easily parsable sequence, we can now perform some basic analysis. For example, given any instance of A4, what is the next probable note to follow?
<pre><b><u>A4 C5</u></b> G4 <b><u>A4 G4</u></b> A#4 D5 <b><u>A4 A#4</u></b> <b><u>A4 C5</u></b> D5 C5 <b><u>A4 B4</u></b> B4 C5</pre>
<p>or:
<p><pre>C5 G4 A#4 C5 B4</pre>
<p>There’s a 40% chance that the next note will be C5, a 20% chance it will be G4, a 20% chance it will be A#4, and a 20% chance it will be B4.
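<p>The same tally can be reproduced in a few lines. This is only a sketch to check the arithmetic above; the variable names are mine and it isn't part of the composition script shown later:</p>
<pre><?php
$melody = explode(' ', 'A4 C5 G4 A4 G4 A#4 D5 A4 A#4 A4 C5 D5 C5 A4 B4 B4 C5');

// count how often each note follows A4
$followers = [];
for ($i = 0; $i < count($melody) - 1; $i++) {
    if ($melody[$i] === 'A4') {
        $next = $melody[$i + 1];
        $followers[$next] = isset($followers[$next]) ? $followers[$next] + 1 : 1;
    }
}

// convert the counts to percentages
$total = array_sum($followers);
$percent = [];
foreach ($followers as $note => $count) {
    $percent[$note] = 100 * $count / $total;
}
print_r($percent);</pre>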
<p>This process translates warm flowing music into something the computer, understanding things only within the context of cold, unfeeling mathematics, can reason about.
<h2>Paging Doctor Markov</h2>
<p>You're probably familiar with deterministic systems: systems where the same input will always generate the same output. Addition is a deterministic function, with inputs 2 and 4 always yielding 6. Stochastic systems, on the other hand, behave with some level of randomness. Identical inputs can result in wildly different outputs, as with the function <tt>array_rand()</tt>. There is an element of randomness in composition, or else all compositions starting on F4 would end up the same, making generations of composers irrelevant and filling the coffers of the RIAA. But the randomness is tempered, even if at a subconscious level, by the composer with the knowledge of what combinations are pleasing.</p>
<p>A prime example of a stochastic system, one which is also relevant to the composition script, is a Markov process (named after the mathematician <a href="http://en.wikipedia.org/wiki/Andrey_Markov">Andrey Markov</a> who not only studied them but also had an amazing beard). As <a href="http://www.doctornerve.org/nerve/pages/interact/markhelp.htm">Nick Didkovsky explains</a>:</p>
<p><i><q>Markov analysis looks at a sequence of events, and analyzes the tendency of one event to be followed by another. Using this analysis, you can generate a new sequence of random but related events, which will look similar to the original. A Markov process is useful for analyzing dependent random events - that is, events whose likelihood depends on what happened last.</q></i>
<p>The <a href="http://en.wikipedia.org/wiki/Examples_of_Markov_chains#A_very_simple_weather_model">traditional example</a> used to illustrate the concept is a weather predicting graph. Suppose the day following a sunny day has a 90% chance of also being sunny and the one following a rainy day has a 50% chance of being rainy. The graph looks like this:</p>
<p><img src="https://i.imgur.com/NTkL4sO.png">
<p>Walking the graph for 5 iterations, we might get Sunny the first day, Rainy the next, and then Sunny, Sunny, and Sunny. On another walk we might get Sunny, Sunny, Sunny, Rainy, Rainy.</p>
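<p>For the curious, a walk like this can be sketched in a few lines of PHP. The weights come from the weather example (the 10% and 50% complements are implied by the graph); the function names <tt>nextState()</tt> and <tt>walk()</tt> are hypothetical and exist only for this illustration:</p>
<pre><?php
// transition table: each row's weights add up to 100
$transitions = [
    'Sunny' => ['Sunny' => 90, 'Rainy' => 10],
    'Rainy' => ['Sunny' => 50, 'Rainy' => 50],
];

// pick the next state using the weights in the current state's row
function nextState(array $row) {
    $roll = mt_rand(1, array_sum($row));
    foreach ($row as $state => $weight) {
        if ($roll <= $weight) {
            return $state;
        }
        $roll -= $weight;
    }
}

// walk the graph until the requested number of days is reached
function walk(array $transitions, $start, $days) {
    $states = [$start];
    while (count($states) < $days) {
        $states[] = nextState($transitions[end($states)]);
    }
    return $states;
}

echo implode(', ', walk($transitions, 'Sunny', 5)), "\n";</pre>
<p>Every run can print a different sequence, but over many runs a sunny day is followed by another sunny day about nine times out of ten.</p>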
<p>Hopefully it's obvious where I'm going with all of this; it's possible to constrain the random process of "next note selection" using the weighted probabilities learned by analyzing melodies for a better sounding result. This process allows us to generate passable melodies in infinitely less time than it would take for a <a href="http://en.wikipedia.org/wiki/Infinite_monkey_theorem">monkey hitting random keys</a> on an organ to play the complete works of Messiaen.</p>
<p><img src="https://i.imgur.com/FI91Ca3.png">
<h2>Robot Composers (the singularity is near)</h2>
<p>At this point you hopefully have a cursory understanding of the key concepts. Even if you don't, you've survived long enough and will now be rewarded with some code.</p>
<p><pre><?php
namespace Zaemis;

class Composer
{
    // maps each note to the list of notes observed to follow it
    private $pitchProb;

    public function __construct() {
        $this->pitchProb = [];
    }

    // record every adjacent pair of notes in the training melody
    public function train($noteData) {
        $numNotes = count($noteData);
        for ($i = 0; $i < $numNotes - 1; $i++) {
            $current = $noteData[$i];
            $next = $noteData[$i + 1];
            $this->pitchProb[$current][] = $next;
        }
    }

    // generate up to $numNotes notes, starting from $note
    public function compose($note, $numNotes) {
        $melody = [$note];
        while (--$numNotes > 0) {
            // stop early if nothing was ever observed to follow this note
            if (empty($this->pitchProb[$note])) {
                break;
            }
            $i = array_rand($this->pitchProb[$note], 1);
            $note = $this->pitchProb[$note][$i];
            $melody[] = $note;
        }
        return $melody;
    }
}

$noteData = trim(file_get_contents('../data.txt'));
$noteData = explode(' ', $noteData);

$c = new Composer();
$c->train($noteData);
$melody = $c->compose($_GET['note'], (int)$_GET['count']);

echo '<img src="img/notes/clef.png" alt="Treble Clef">';
foreach ($melody as $note) {
    // escape the note name since the first one comes from the request
    echo '<img src="img/notes/' . urlencode($note) . '.png" alt="' .
        htmlspecialchars($note) . '">';
}</pre></p>
<p>The learning process takes place in the <tt>train()</tt> method which accepts an array of training notes (the encoded melody string split on spaces). The code is simple, quick, and dirty; the notes are pushed to a 2-dimensional array with their probabilities indirectly implied by the quantity of elements themselves. When populated, the array looks similar to:</p>
<p><pre>array(9) {
["A4"]=> array(13) {
[0]=> string(2) "C5"
[1]=> string(2) "G4"
[2]=> string(3) "A#4"
[3]=> string(2) "C5"
[4]=> string(2) "B4"
[5]=> string(3) "A#4"
[6]=> string(2) "G4"
[7]=> string(2) "A4"
[8]=> string(2) "D5"
[9]=> string(2) "G4"
[10]=> string(2) "C5"
[11]=> string(2) "C5"
[12]=> string(2) "G4"
}
["C5"]=> array(11) {
...</pre></p>
<p>Looking at the data, a randomly selected note to follow A4 has approximately a 31% chance of being C5 since 4 out of the 13 members of the list hold that value. Maintaining a list like this can be memory-exhausting for large sets, and there are better ways to perform weighted selection. You can find an excellent write up (using Python) at <a href="http://www.electricmonk.nl/log/2009/12/23/weighted-random-distribution/">electricmonk.nl</a>.</p>
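<p>One possible way to avoid storing every observation is to keep a count per following note and select in proportion to those counts. What follows is a rough sketch under that assumption, not the approach from the linked write-up; <tt>tally()</tt> and <tt>pickNext()</tt> are hypothetical names, and the training values are the thirteen A4 followers listed above:</p>
<pre><?php
// tally a transition instead of appending it to a list
function tally(array &$counts, $current, $next) {
    if (!isset($counts[$current][$next])) {
        $counts[$current][$next] = 0;
    }
    $counts[$current][$next]++;
}

// choose a following note with probability proportional to its count
function pickNext(array $counts, $current) {
    $roll = mt_rand(1, array_sum($counts[$current]));
    foreach ($counts[$current] as $note => $count) {
        if ($roll <= $count) {
            return $note;
        }
        $roll -= $count;
    }
}

$pitchCounts = [];
$seen = ['C5', 'G4', 'A#4', 'C5', 'B4', 'A#4', 'G4', 'A4', 'D5', 'G4', 'C5', 'C5', 'G4'];
foreach ($seen as $next) {
    tally($pitchCounts, 'A4', $next);
}

print_r($pitchCounts['A4']);
echo pickNext($pitchCounts, 'A4'), "\n";</pre>
<p>The table now holds six entries for A4 instead of thirteen, and it stops growing once every possible following note has been seen at least once.</p>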
<p>The <tt>compose()</tt> method encapsulates the logic to generate the melodic sequence. A starting note and the desired length are given, and the method randomly selects a value for the following note from the array until the desired number of notes has been retrieved.</p>
<p>Of course we humans would rather see the result notated on a staff as opposed to a list of note values, so I created a set of note images to accompany the script. Each image displays a note on the appropriate position on a staff, and the files are named according to the note name. Looping through the melody to emit some IMG elements was an effective rendering method for my needs.</p>
<h2>Harder, Better, Faster, Stronger</h2>
<p>It is impressive that such simple concepts can be used to create a script capable of emulating a composer. Of course, there is infinitely more that can be done to build and improve. Consider this your first exploration into musical intelligence.</p>
<p>David Cope, who has been exploring computer composition since 1981, has <a href="http://artsites.ucsc.edu/faculty/cope/experiments.htm">this to say</a>:</p>
<p><i><q>Simply breaking a musical work into smaller parts and randomly combining them into new orders almost certainly produces gibberish. Effective recombination requires extensive musical analysis and very careful recombination to be effective at even an elemental level.</q></i>
<p>Beyond the obvious changes, such as changing the pitch matrix to maintain probabilities, how would you improve things? Maybe replace this naive approach with a completely different mechanism for analyzing music? Parse input from MIDI files? What would be needed to identify harmonies? How about chord progressions? Note durations? Composer "signatures"? Could your script learn from itself by analyzing and feeding pleasing melodies it produced back into its knowledge base? In what ways could you recombine samples to form new works?</p>
<p>I look forward to hearing about your own experiments in AI-driven composition in the comments below.</p>
<p><strong>Update 6/10/13:</strong> I've tossed some code <a href="https://github.com/tboronczyk/MusicComposer">up on GitHub</a> if anyone's interested.</p>Timothy Boronczykhttp://www.blogger.com/profile/00015151416507514182noreply@blogger.com7tag:blogger.com,1999:blog-3224008808345429390.post-53230414125936151572012-10-20T06:49:00.002-04:002017-03-24T11:37:22.989-04:00PHP Assertions<p>I stumbled upon assertions in PHP today, though why I didn’t know they exist after working in the language for so long and what I was originally looking for when I came across them are both mysteries. And with the increasing focus on software quality in the PHP community, I wondered why I hadn’t seen them used by others. I decided to ask around.</p>
<p>I asked a few friends if they knew about assertions. They did. I asked if they used them. They didn’t.</p>
<p><a href="http://www.wolerized.com/">Remi Woler</a>: <q>I think nobody has found a good use case. It weaves tests into code. How are you going to recover from a failed assertion?</q>
<p><a href="http://daveyshafik.com/">Davey Shafik</a>: <q>They kinda suck. For example: <tt>assert('mysql_query("")')</tt> It's a string of code that gets eval’d.</q>
<p>So, PHP assert didn’t get stellar endorsements from people whose opinions I respect.</p>
<p>My main experience with assertions comes from C where they are defined as macros. An assertion's argument must evaluate to true, otherwise the program terminates with an error. These checks can be stripped at compile time with <tt>-DNDEBUG</tt> if desired, although there is some disagreement on the wisdom of doing so.</p>
<p>PHP asserts are implemented differently. First, they’re configurable in <tt>php.ini</tt> or by using <tt>assert_options()</tt>. A failure doesn’t necessarily have to abort the script—you can bail if you want to, or disable them, or convert them to run-time warnings, or even invoke a callback to handle them. This makes them very flexible and much less black-and-white than in C.</p>
<p>The actual <tt>assert()</tt> function accepts either a string or a boolean for its condition. So, you can write either:</p>
<pre>
assert(is_string($foo));</pre>
<p>or:</p>
<pre>
assert('is_string($foo)');</pre>
<p>In the first example, the statement is evaluated and the resulting boolean is passed to <tt>assert()</tt>. While perhaps a little more traditional, it’s not efficient as you will see momentarily.</p>
<p>In the second example, the string is passed to <tt>assert()</tt> directly which eval’s it to determine the truthiness. This is a better approach for two reasons:</p>
<ol><li><tt>assert()</tt> <a href="http://lxr.php.net/xref/PHP_5_4/ext/standard/assert.c#140">immediately returns true</a> when assertions are disabled. The code string is not evaluated and any performance hit from executing unnecessary statements is minimized.</li>
<li>If the assertion fails, the code string is passed to a callback (if one is used) and can be included in any output or logging.</li>
</ol>
<p>I’m not convinced Davey’s eval concerns are entirely well-founded in this instance because of the above reasons and the fact that it’s static code to be evaluated by PHP. It’s a controlled environment, not <tt>eval($randomUserSuppliedCode)</tt>.</p>
<p>PHP 5.4 also added a second parameter to <tt>assert()</tt>—a string description to annotate the test. If present, the string is also passed to the callback.</p>
<p><a href="http://www.php.net/assert">The PHP manual</a> offers some guidance on using assertions:</p>
<p><i><q>Assertions should be used as a debugging feature only. You may use them for sanity-checks that test for conditions that should always be true and that indicate some programming errors if not or to check for the presence of certain features like extension functions or certain system limits and features.<br><br>
Assertions should not be used for normal runtime operations like input parameter checks. As a rule of thumb your code should always be able to work correctly if assertion checking is not activated.</q></i>
<p>Both are good advice, but contradictory; your code may not work if assertion checking is disabled and you are using them to test system limitations.</p>
<p><a href="http://en.wikipedia.org/wiki/Assertion_(computing)#Comparison_with_error_handling">Wikipedia explains</a> the difference between assertions and error handling:</p>
<p><i><q>Assertions should be used to document logically impossible situations and discover programming errors — if the impossible occurs, then something fundamental is clearly wrong. This is distinct from error handling: most error conditions are possible, although some may be extremely unlikely to occur in practice. Using assertions as a general-purpose error handling mechanism is unwise: assertions do not allow for recovery from errors; an assertion failure will normally halt the program’s execution abruptly. Assertions also do not display a user-friendly error message.</q></i>
<p>So at this point I disregarded the manual’s and Wikipedia’s advice and tinkered with them. PHP assertions don’t behave like their C brethren, so perhaps the traditional C way of thinking (asserts are debugging only) might be restrictive? What I found was that PHP assertions, with a bit of creativity, could be used to write readable, quality code.</p>
<p>Consider a naive Active Record implementation. You might have code that resembles:</p>
<pre><?php
class User
{
protected $id;
...
public function setId($id) {
if (!is_null($this->id)) {
throw new BadMethodCallException('ID already set for user.');
}
if (!is_int($id) || $id < 1) {
throw new InvalidArgumentException('ID for user is invalid.');
}
$this->id = $id;
}
...
}</pre>
<p>It is possible to use <tt>assert()</tt> to test the <tt>$id</tt> argument (disregarding the manual’s advice) and a callback to throw the exceptions (ignoring Wikipedia).</p>
<pre><?php
// Convert a failed assertion into the exception named in its description.
assert_options(ASSERT_CALLBACK, function ($file, $line, $code, $desc) {
    list($exClass, $msg) = explode(':', $desc, 2);
    throw new $exClass($msg);
});
class User
{
    protected $id;
    ...
    public function setId($id) {
        assert('is_null($this->id)',
            'BadMethodCallException:ID already set for user.');
        // $id >= 1 mirrors the ($id < 1) rejection in the version above
        assert('is_int($id) && $id >= 1',
            'InvalidArgumentException:ID for user is invalid.');
        $this->id = $id;
    }
    ...
}</pre>
<p>This isn’t how assertions are intended to be used, but it does address Remi’s concern about recovery. One doesn’t typically recover from an assertion, but now the condition has been converted into an exception, so recovery is possible to the same extent that recovery from the exception would be.</p>
<p>If assertions have been turned off then the code won’t work, so if you need to rely on this then you have to add <tt>assert_options(ASSERT_ACTIVE, true)</tt> to your bootstrap file.</p>
<p>Now don't get me wrong, I'm not about to start doing this in my projects. But it’s fun to play, and there are still some questions worth pondering.</p>
<p>If you were to use <tt>assert()</tt> properly instead of something along the lines of my bastardized exception example, what type of things would be worth asserting? </p>
<p>Assertions are meant to identify program logic/design bugs, not as a run-time error handling mechanism. Isn’t this why we do unit testing? Playing devil’s advocate here, what’s wrong with pushing unit tests directly into your code if we have doc comments that are extracted for documentation?</p>
<p>Feel free to let me know your thoughts in the comments section below. Do you constrain yourself to the classical interpretation of assertions, or do you take advantage of the flexibility of PHP’s implementation? Where and when do you use them in your code?</p>Timothy Boronczykhttp://www.blogger.com/profile/00015151416507514182noreply@blogger.com3tag:blogger.com,1999:blog-3224008808345429390.post-34726178241625754292012-09-29T22:00:00.003-04:002017-03-21T19:14:25.540-04:00PHP_EOL: Most Worthless Constant?<p><tt>PHP_EOL</tt> may very well be the most worthless general-purpose constant in modern PHP. It's supposed to be helpful for cross-platform development; for example, you could write a PHP-powered shell script that says:</p>
<pre style="font-size:80%;"><?php
echo "Operation Successful!" . PHP_EOL;</pre>
<p>and then expect the proper newline to terminate the output string based on the platform PHP is running on.</p>
<p>That's all well and good, but the following is functionally equivalent:</p>
<pre style="font-size:80%;"><?php
echo "Operation Successful!\n";</pre>
<p>Try it out and you'll see. On Windows, Linux, and Mac consoles alike, both versions are displayed with the expected newline terminating the output string.</p>
<p>I don't see it being useful for writing data or log output to a file either. If you're writing and reading on the same platform then newline discrepancies won't be an issue, and if you're writing on one platform and reading on another then you'll want to standardize on a newline anyway.</p>
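<p>As a rough illustration of standardizing on one newline, the sketch below converts whatever line endings arrive to <tt>\n</tt> before the data is written or compared. The helper name <tt>normalizeNewlines()</tt> is made up for this example and isn't part of PHP.</p>
<pre style="font-size:80%;"><?php
// Hypothetical helper: convert Windows (\r\n) and old Mac (\r) line endings to \n.
function normalizeNewlines($text) {
    // "\r\n" must be listed first so it isn't turned into two newlines.
    return str_replace(["\r\n", "\r"], "\n", $text);
}

echo normalizeNewlines("first\r\nsecond\rthird\n");</pre>
<p>Run on input from any platform, this leaves every line terminated by a single <tt>\n</tt>, which is the kind of agreed-upon convention that makes the platform-specific constant unnecessary.</p>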
<p>Has <tt>PHP_EOL</tt>'s time come and gone? Do you use it in your code, and if so why? </p>Timothy Boronczykhttp://www.blogger.com/profile/00015151416507514182noreply@blogger.com7tag:blogger.com,1999:blog-3224008808345429390.post-55281313680919285912012-07-24T23:43:00.001-04:002017-03-21T19:16:29.378-04:00PHP Recursive Directory Traversal<p>It sounds like a simple enough task: Generate an array that mirrors a directory structure. Directories may have subdirectories (arbitrary nesting), and entries should be alphabetized with directories grouped first. The image below shows what the array should look like given a sample directory.</p>
<p><img src="https://i.imgur.com/ygfwIjG.png">
<p>While not terribly difficult, there are a few snags that can trip you up if you're not careful. For me, the first snag was trying to do it “the right way.”</p>
<p>The <tt>RecursiveDirectoryIterator</tt> “provides an interface for iterating recursively over filesystem directories” (<a href="http://www.php.net/recursivedirectoryiterator">php.net</a>), so this was my first approach. I hacked together this code after a short while:</p>
<pre style="font-size:80%;"><?php
function getDirectoryList($dir) {
    $dirList = [];
    $dirIter = new RecursiveDirectoryIterator($dir,
        FilesystemIterator::SKIP_DOTS);
    $iterIter = new RecursiveIteratorIterator($dirIter);
    foreach ($iterIter as $entry) {
        $path = substr($entry->getPath(), strlen($dir) - 1);
        $keys = "['" . join("']['", explode("/", $path)) . "']";
        eval('$dirList' . $keys . '[]="' . $entry->getFilename() . '";');
    }
    return $dirList;
}</pre>
<p>The function gets the nesting right for files, but empty directories are missing and the ordering is wrong. I could have spent some time trying to fix those issues, but the use of <tt>eval()</tt> bothered me enough to abandon the approach completely. A straight iteration wasn't going to build up the array correctly without it, so I needed to take a true recursive approach.</p>
<p>In addition to doing away with <tt>eval()</tt>, the recursive approach also afforded me an easy way to implement the necessary sorting. I was able to queue the directory names and file names separately, sort them, and then return their union.</p>
<pre style="font-size:80%;"><?php
function getDirectoryList($dir) {
    $dirList = $fileList = [];
    if ($dfp = opendir($dir)) {
        while (($entry = readdir($dfp)) !== false) {
            if ($entry[0] != ".") { // catches dot dirs and hidden files
                $path = "$dir/$entry";
                if (is_file($path)) {
                    $fileList[] = $entry;
                }
                else if (is_dir($path)) {
                    $dirList[$entry] = getDirectoryList($path);
                }
            }
        }
        closedir($dfp);
        uksort($dirList, "strnatcmp");
        natsort($fileList);
    }
    return $dirList + $fileList;
}</pre>
<p>Interestingly enough, PHP doesn't have a <tt>natksort()</tt> function. I had to mock up my own implementation using <tt>uksort()</tt> and <tt>strnatcmp()</tt>.</p>
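<p>If the natural key sort is needed in more than one place, the two calls could be wrapped in a small helper. This is only a sketch; the name <tt>natksort()</tt> is hypothetical, chosen to sit alongside PHP's other sort functions, and the sample keys are invented.</p>
<pre style="font-size:80%;"><?php
// Hypothetical helper: sort an array by key using natural ordering.
function natksort(array &$array) {
    return uksort($array, "strnatcmp");
}

$dirs = ["img10" => [], "img2" => [], "img1" => []];
natksort($dirs);
print_r(array_keys($dirs)); // img1, img2, img10</pre>
<p>A plain <tt>ksort()</tt> would compare the keys as strings and put <tt>img10</tt> ahead of <tt>img2</tt>, which is the ordering the natural comparison avoids.</p>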
<p>I ran the solution past a few friends of mine and the response from one was:</p>
<blockquote>you... bring shame to our profession.</blockquote>
<p>His efforts to show me “the right way” again with <tt>RecursiveDirectoryIterator</tt> were short lived however when he came across the same issues I did and gave up to eat a leftover burrito.</p>
<p>So I guess there are a couple morals to my tale. One, that despite our fancy modern OOP APIs, sometimes the procedural approach is a better fit for the task at hand. We abstract everything so we don't have to re-invent the wheel but then have a mass of code that is too generic to actually do something that should be trivial. Two, we should be careful about being pompous. It's hard to eat a burrito with your foot in your mouth.</p>
<p>Of course, if you can come up with a better way then let me know. I might just buy you a new burrito. :)</p>Timothy Boronczykhttp://www.blogger.com/profile/00015151416507514182noreply@blogger.com6tag:blogger.com,1999:blog-3224008808345429390.post-48028111865250035292012-05-19T15:20:00.001-04:002017-03-21T19:23:31.505-04:00Writing a Minimal PSR-0 Autoloader<p>An excellent overview of <a href="http://phpmaster.com/autoloading-and-the-psr-0-standard/">autoloading in PHP and the PSR-0 standard</a> was written by <a href="http://harikt.com/">Hari K T</a> over at <a href="http://phpmaster.com">PHPMaster.com</a>, and it's definitely worth the read. But maybe you don't like some of the bloated, heavier autoloader offerings provided by various PHP frameworks, or maybe you just like to roll your own solutions. Is it possible to roll your own minimal loader and still be compliant?</p>
<p>First, let's look at what PSR-0 mandates, taken directly from the <a href="https://github.com/php-fig/fig-standards/blob/master/accepted/PSR-0.md">standards document on GitHub</a>:</p>
<ul>
<li>A fully-qualified namespace and class must have the following structure <tt>\<Vendor Name>\(<Namespace>\)*<Class Name></tt></li>
<li>Each namespace must have a top-level namespace ("Vendor Name").</li>
<li>Each namespace can have as many sub-namespaces as it wishes.</li>
<li>Each namespace separator is converted to a <tt>DIRECTORY_SEPARATOR</tt> when loading from the file system.</li>
<li>Each "_" character in the CLASS NAME is converted to a <tt>DIRECTORY_SEPARATOR</tt>. The "_" character has no special meaning in the namespace.</li>
<li>The fully-qualified namespace and class is suffixed with ".php" when loading from the file system.</li>
<li>Alphabetic characters in vendor names, namespaces, and class names may be of any combination of lower case and upper case.</li>
</ul>
<p>The first two and the last points are aimed at module/library authors, and the third point is of little consequence. The remaining three are the important points relevant to writing the autoloading mechanism. Of course standards have to be wordy by their very nature, but if you boil the relevant mandates down they essentially say the following: “replace namespace separators and class-name underscores with a directory separator and append a .php suffix.”</p>
<p>The standard doesn't describe what support functionality must be provided by a PSR-0 compliant autoloader (registration methods, configuration options, etc.). If it can automatically find a class definition in the <tt>\<Vendor Name>\(<Namespace>\)</tt> pattern, then it's PSR-0 compliant. Furthermore, it doesn't specify the parent directory for <tt><Vendor Name></tt>. The extra “fluff” of most autoloader implementations is convenient if you need to specify the location via code, but most of the time unnecessary if you simply use a directory already within PHP's include path.</p>
<p>With modern namespacing support in PHP, it's probably not necessary to encapsulate the logic as a class, like most libraries/frameworks do, either. A single function can perform the necessary transformations on a class path and be namespaced properly so it doesn't pollute the global namespace. Instead of creating an instance of an autoloader object and then invoking the instance's <tt>register()</tt> method, one can simply register a function directly with <tt>spl_autoload_register()</tt>.</p>
<p>Or if you want to be even more minimal, you can register an anonymous function with <tt>spl_autoload_register()</tt>. Put the code in an include file, include that file, and you have no-muss-no-fuss PSR-0 autoloading instantly at your disposal.</p>
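<pre style="font-size:80%;"><?php
spl_autoload_register(function ($classname) {
    // drop a leading backslash from a fully-qualified name
    $classname = ltrim($classname, "\\");
    // $match[1]: namespace with trailing backslash (may be empty); $match[2]: class name
    preg_match('/^(.+)?([^\\\\]+)$/U', $classname, $match);
    // underscores become directory separators in the class name only
    $classname = str_replace("\\", "/", $match[1])
        . str_replace(["\\", "_"], "/", $match[2])
        . ".php";
    include_once $classname;
});</pre>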
<p>The magic here is in the regex, which splits the incoming name into its constituent parts: the class name will always be in <tt>$match[2]</tt>, and <tt>$match[1]</tt> holds the namespace name, which may or may not be an empty string. It's necessary to identify the parts because the underscore has no special meaning in the namespace portion, which makes a blind replace on underscores and backslashes incorrect.</p>
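<p>To see the mapping on its own, here is a sketch that performs the same transformation as a plain function, without loading anything. It swaps the regex for <tt>strrpos()</tt> to find the last namespace separator, so treat it as an alternative way to check the rules rather than the loader itself; <tt>psr0Path()</tt> is a hypothetical name, and the sample class names are the ones used in the PSR-0 document.</p>
<pre style="font-size:80%;"><?php
// Hypothetical helper: translate a class name into a PSR-0 relative path.
function psr0Path($classname) {
    $classname = ltrim($classname, '\\');
    $namespace = '';
    // everything up to and including the last backslash is the namespace
    $pos = strrpos($classname, '\\');
    if ($pos !== false) {
        $namespace = str_replace('\\', '/', substr($classname, 0, $pos + 1));
        $classname = substr($classname, $pos + 1);
    }
    // underscores are only special in the class name portion
    return $namespace . str_replace('_', '/', $classname) . '.php';
}

echo psr0Path('\namespace\package_name\Class_Name');
// namespace/package_name/Class/Name.php</pre>
<p>Note how the underscore in <tt>package_name</tt> survives while the one in <tt>Class_Name</tt> becomes a directory separator, which is exactly the distinction a blind replace would miss.</p>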
<p>Oh, and before you start jumping all over me about <tt>DIRECTORY_SEPARATOR</tt>, I'd like to point out that a hard-coded slash is equivalent for the purpose here. From the <a href="http://www.php.net/basename">PHP manual</a>:</p>
<p><i><q>On Windows, both slash (/) and backslash (\) are used as directory separator character. In other environments, it is the forward slash (/).</q></i>
<p>So YES, it is possible to write a minimal and elegant PSR-0 compliant autoloader. The only extra requirement is that the <tt><Vendor Name></tt> directories already be in PHP's include path to negate the need for additional path registering functions, which I would argue is good practice anyway.</p>
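<p>If the library directory isn't already on the include path, a bootstrap file can put it there once. The sketch below is one way that might look; <tt>prependIncludePath()</tt> is a hypothetical helper and <tt>/path/to/libs</tt> is a placeholder for wherever the vendor directories actually live.</p>
<pre style="font-size:80%;"><?php
// Hypothetical bootstrap helper: put a library directory at the front of the include path.
function prependIncludePath($dir) {
    set_include_path($dir . PATH_SEPARATOR . get_include_path());
    return get_include_path();
}

// "/path/to/libs" is a placeholder, not a real location.
prependIncludePath("/path/to/libs");</pre>
<p>Setting the same value through <tt>include_path</tt> in php.ini would work just as well and keeps the bootstrap code even smaller.</p>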
<p>Perhaps someday the group could sponsor something that mandates the path requirement (and maybe name it PSR-0a)?</p>
<p>Of course, maybe I'm just crazy.</p>
<p><b>Special thanks</b> to <a href="http://grahamc.com/">Graham Christensen</a> for his efforts in proofing my concept.</p>Timothy Boronczykhttp://www.blogger.com/profile/00015151416507514182noreply@blogger.com6tag:blogger.com,1999:blog-3224008808345429390.post-71259258847950449622012-05-08T21:37:00.000-04:002017-03-21T19:18:28.969-04:00Doing a 180 on Grid 960<p>I've never been a fan of CSS frameworks; they just seem unnecessary to me. Every project can benefit from a reset.css file and maybe basic typography styles, but a whole framework? Meh.</p>
<p>Then I read an excellent argument in favor of grid-layout frameworks in some book which I've since forgotten the name of and changed my mind (a tremendous feat indeed). I decided I'd make use of a grid-layout framework in my next project.</p>
<p>I chose <a href="http://960.gs/">Grid 960</a> for the project since that was the one mentioned in the book, I had heard about it before, and it seemed to me the most mature and stable. My experiences with Grid 960 weren't bad per se... I mean, it didn't sour me back to my original mindset... but a few points will have me looking for another framework.</p>
<ol>
<li>The extra markup required is basically reminiscent of tables. Instead of <tt><tr></tt> or <tt><td></tt> though now you've got <tt><div class="container_12"></tt> and <tt><div class="grid_3"></tt>.</li>
<li>Borders, margins, and padding will throw your grids off. While it makes sense and is ultimately unavoidable, it highlights the fact grid-systems are not necessarily as intuitive as they claim to be.</li>
<li>I found 960px still a bit wide. More screen real estate is available than there was a few years ago, but people don't necessarily view sites full screen like they did back in the 800x600 days.</li>
<li>Grid 960 isn't scalable. I'm not talking about "responsive web design" here, rather just using <tt>em</tt>s or <tt>rem</tt>s instead of <tt>px</tt>s so things can scale properly.</li>
</ol>
<p>Researching beyond 960 I saw there are a few fluid and responsive ones. And I saw a <a href="http://1kbgrid.com/">1KB framework</a> which was cool. It lacked push/pull functionality, but would be sufficient for most of my work I think.</p>
<p>So 960 wasn't my cup of tea, but I haven't given up on grid-frameworks yet. Maybe I'll find something more to my liking for my next project... or even roll my own.</p>Timothy Boronczykhttp://www.blogger.com/profile/00015151416507514182noreply@blogger.com4tag:blogger.com,1999:blog-3224008808345429390.post-45681433000592871752012-05-04T11:03:00.000-04:002012-09-30T02:29:37.943-04:00Are Coding Standards Futile?<p>Unless the visual layout of a program's code affects its execution, there will always be programmers who circumvent the established coding standards. I admit, I've done it myself from time to time. There's no scientific survey that such standards really reduce cognitive friction when reading someone else's code as far as I know, and aesthetic matters are generally subjective. Make the argument for tabs over spaces until you're blue in the face; someone will just come along touting the benefits of spaces.</p>
<p><a href="https://groups.google.com/forum/?fromgroups#!searchin/php-standards/boronczyk/php-standards/wFg-WbpbgFI/QLGbFOZYprIJ">I warned</a> that achieving a consensus on PHP Coding Standards as PSR-1 would be difficult and that the group's efforts would be better spent discussing "meatier" topics, such as object caching. Two months later, the <a href="https://groups.google.com/forum/?fromgroups#!searchin/php-standards/tied/php-standards/5BKre9H5p9A/ObfpjNnbSGkJ">proposal failed</a> to garner enough votes for a simple majority and has <a href="https://groups.google.com/forum/?fromgroups#!searchin/php-standards/tied/php-standards/vXXgjEKHemw/jBrWYOnBBsUJ">now been split</a>.</p>
<p>And let's not forget the "Beat Up on Crockford" festival over <a href="https://github.com/twitter/bootstrap/issues/3057">bootstrap and JSMin</a>. His comments were a bit harsh, yes... but then again he only made two comments in the entire (quite lengthy) discussion and ended up immortalized in the (admittedly funny) <a href="http://figment.com/books/308826-Dangerous-Punctuation">Dangerous Punctuation</a>.</p>
<p>Novelists don't all write in the same style; noting the formatting in a section of code might give a heads up on who wrote it or insight into the coder's way of thinking. Maybe it's a clue as to who we can go to for help when something doesn't work. Weak arguments, sure. But maybe so is "consistency breeds success" when applied to code formatting.</p>
<p>Most coding standards seem to target only low-hanging fruit anyway: capitalize something this way, place your braces in this manner, space something that way, etc. None of that <em>really</em> matters, does it? Standards that enforce good architectural design, specific interoperability concerns, etc. have more merit. After all, standards should help make things work, not squash creativity. And if Joe Programmer's self-expression manifests itself as 5-space indenting, who am I to judge?</p>Timothy Boronczykhttp://www.blogger.com/profile/00015151416507514182noreply@blogger.com1tag:blogger.com,1999:blog-3224008808345429390.post-33203897989624883762012-04-17T22:38:00.001-04:002017-03-21T19:25:30.172-04:00The Future of PHP, Ruby, and EsperantoA few weekends ago I traveled down to Richmond, Virginia, for <a href="https://www.esperanto-usa.org/en/content/urba-semajnfino-2-24-25-marto-2012-richmond-va">Urba Semajnfino 2</a>. It was my first Esperanto gathering, and it was a great opportunity both for a vacation and a chance to use Esperanto as a <i>real</i> language as opposed to just a study hobby. Afterwards, in the midst of the post-vacation blues that followed my return, I found myself thinking about the future of Esperanto, PHP, and Ruby.<br />
<br />
I've said before that Java is the new COBOL -- a lot of legacy code has been written in Java and still needs to be maintained, but "fresher" languages are increasingly considered when it comes time for new development. We've witnessed the increasing acceptance of PHP in enterprise environments which were predominantly steeped in Java only a few years ago. And now that PHP is a mature, "grown up" programming language, I admit it's a little less fun to program with than it used to be. PHP is the new Java, and in 10 years' time it may be the new new COBOL.<br />
<br />
A need or a different perspective brings about a new programming language, the language gains a following if it's fun to use (or if there's an obscene marketing budget behind it), it's accepted by the enterprise community which saps all the fun out of it with bloated frameworks, unit test requirements, etc., and then it dies a slow, languishing death. Perhaps this is the natural life-cycle of programming languages.<br />
<br />
If PHP is the old language on the block, then who's the new kid? I'd have to say Ruby. Perl's heyday has come and gone, and Python isn't hipster enough. And if I'm right, then my 2-cents worth of advice to Ruby is this: Don't worry about being enterprise worthy; measure your success by the fun you have as opposed to some enterprise-market penetration statistic.<br />
<br />
I think the best thing that could happen to Ruby is that it stays a tool that inspires creative coding and a vibrant community of users. COBOL was popular in enterprise, and now it's dead. Java was popular, and now it's dying. PHP is popular, and now it's visibly ailing. Forget about what the industry thinks and just enjoy yourself!<br />
<br />
I think the same applies to Esperanto, too. Right now it has a wonderful community of enthusiastic speakers around the world, but it would be impossible to maintain that atmosphere after <a href="http://en.wikipedia.org/wiki/Finvenkismo"><i>Fina Venko</i></a>. I worry that becoming an "enterprise worthy" every-day international language would strip Esperanto of one of its most special traits, its passion.<br />
<br />
I had a discussion during that weekend with a fellow Esperantist about various words we didn't like. I don't like <i>datumbazo</i>, the Esperanto word for database, because it's too literal (<i>datumo </i>- data, <i>bazo </i>- base). A more Esperantic, and thus more "appropriate" word in my opinion, would be <i>datumujo</i> (literally meaning a container of data). The word would be in good company; <i>monujo </i>is wallet, <i>fiŝujo</i> is a fish tank, and <i>Anglujo </i>is England (a "container of Englishmen")! My new-found friend was irked by the word <i>futbalo</i>, the word for American football. If you tried to break the word into parts you'd get <i>futo </i>- foot/12-inches, and <i>balo </i>- ball/dance... a 12-inch festive dance event? <i>Usona p</i><i>iedpilko </i>would be more appropriate, he felt.<br />
<br />
Such a discussion probably strikes you as odd, but then again you're probably not an enthusiastic Esperanto-speaker, are you? Such discussions are the norm in <a href="http://en.wikipedia.org/wiki/Esperantujo"><i>Esperantujo</i></a>. When something becomes commonplace, there will be non-enthusiastic people and those who care at such a deep level will be seen as the minority; many people would use Esperanto but have no vested interest in it. <br />
<br />
The bottom line is this: It's hard to get people excited about a programming language when all they are using it for is to push bits, add/subtract bank account figures, etc., just as it's hard to get people excited about new words when all they're going to do is use them to order a hamburger. PHP is more successful than Ruby in terms of enterprise-market penetration, but what about programmer satisfaction? English is more successful than Esperanto in terms of speaking-population, but what about a passion for friendship, respect, and the exchange of ideas? Which of those metrics <i>really</i> define success?Timothy Boronczykhttp://www.blogger.com/profile/00015151416507514182noreply@blogger.com9tag:blogger.com,1999:blog-3224008808345429390.post-13665771419321556742012-02-06T22:32:00.001-05:002017-03-21T19:31:20.455-04:00Relative Date Ranges: Current Week of Prior Year<p>A handy feature in reporting applications is the ability to query data using relative date ranges. A relative date range is nothing more than a predefined start and end time offset in some manner from the current date/time, but the specifics are hidden from the end-user behind a human readable label. For example, if you had a database full of access logs and wanted to query for a list of login failures that happened yesterday you could write:</p>
<pre style="font-size:80%"><?php
$yesterday = sprintf("BETWEEN '%s 00:00:00' AND '%1\$s 23:59:59'",
date("Y-m-d", strtotime("-1 day")));
$query = "SELECT username, tstamp, ip_address FROM failed_logins
WHERE tstamp $yesterday";</pre>
<p>The user could select "Yesterday" from a list, and the code would dynamically build the query accordingly.</p>
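<p>One way the label-to-range lookup might be organized is sketched below. The function name <tt>relativeRange()</tt> and the two labels are illustrative only; a real report would carry many more ranges, and the fixed date is there just to make the output predictable.</p>
<pre style="font-size:80%"><?php
// Hypothetical helper: map a human-readable label to a [start, end] pair.
function relativeRange($label, DateTimeImmutable $now) {
    switch ($label) {
        case "Today":
            $day = $now;
            break;
        case "Yesterday":
            $day = $now->modify("-1 day");
            break;
        default:
            throw new InvalidArgumentException("Unknown range: $label");
    }
    // the time portions are literal text; only Y, m and d are format codes
    return [$day->format("Y-m-d 00:00:00"), $day->format("Y-m-d 23:59:59")];
}

list($start, $end) = relativeRange("Yesterday",
    new DateTimeImmutable("2012-01-23 12:00:00"));
echo "BETWEEN '$start' AND '$end'";
// BETWEEN '2012-01-22 00:00:00' AND '2012-01-22 23:59:59'</pre>
<p>Passing the current time in as an argument, rather than calling <tt>date()</tt> inside the function, makes each range easy to check against a known date.</p>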
<p>Alternatively, you could write it entirely in SQL using MySQL's date and time handling functions. It looks a bit messier, but is just as effective:</p>
<pre style="font-size:80%">SELECT ... BETWEEN
DATE_FORMAT(DATE_SUB(NOW(), INTERVAL 1 DAY), '%Y-%m-%d 00:00:00')
AND DATE_FORMAT(DATE_SUB(NOW(), INTERVAL 1 DAY), '%Y-%m-%d 23:59:59');</pre>
<p>Recently I was asked to formulate a set of relative date range queries in SQL, and not all of the required ranges were as easy as "Yesterday." One such range was "Current Week of Prior Year". Before you think "oh that's easy!", keep in mind it has to be determined <em>entirely in SQL</em>.</p>
<p>So the task here is to identify the week number in which the current date falls, and then return the start and end dates for that week of the previous year. The underlying assumptions are:</p>
<ol>
<li>a week is defined as having a Thursday</li>
<li>the start of a week is Monday</li>
</ol>
<p>The assumptions come from the <a href="http://en.wikipedia.org/wiki/ISO_week_date#First_week">ISO-8601 definition</a> and as a consequence week 1 of 2011 is Jan 3 - Jan 9 (Jan 1 and 2 are actually in week 52 of 2010).</p>
<p>For the sake of argument (and example), let's say today is Jan 23, 2012. According to the numbering scheme as understood with the above assumptions, Jan 23 falls within week 4 of 2012. This can be confirmed with <code><nobr>WEEK('2012-01-23', 3) = 4</nobr></code> in MySQL. The target thus is to select the start and end dates of week 4 of the previous year; the results would be Jan 24 - Jan 30 of 2011.</p>
<p>Complicating this further, there was the requirement to adjust the starting day of the week (a modification of assumption 2). Continuing to use week 4 of 2011 (Jan 24 - Jan 30, Mon - Sun) as the example, if the user says the start of the week is Wednesday, then I would need to adjust the dates by 2 days giving Jan 26 - Feb 1. Thursday adjusts by 3 giving Jan 27 - Feb 2. etc.</p>
<p>The formula now has two parts:</p>
<ol>
<li>calculate the start and end of the <em>true</em> year week</li>
<li>slide it into the "future" by whatever the start day would be</li>
</ol>
<p>While the sliding into the future may or may not be correct, it would at least yield consistent results based on the conventional understanding of what "week 4" means.</p>
<p>I worked through deriving the formula step by step, starting with the Jan 23, 2012 example and comparing my results as I went along with the calendar at <a href="http://www.whatweekisit.com">whatweekisit.com</a>.</p>
<p>I first figured out the ability to get the start and end dates for week 4 of 2012 (Jan 23 to Jan 29), which is calculated with the following:</p>
<pre style="font-size:70%">SELECT
    WEEK('2012-01-23', 3) AS weekNumber,
    -- WEEKDAY() is 0 (Mon) to 6 (Sun), so a Sunday stays in the week that began the previous Monday
    DATE_SUB('2012-01-23', INTERVAL WEEKDAY('2012-01-23') DAY) AS startOfWeek,
    DATE_ADD('2012-01-23', INTERVAL 6 - WEEKDAY('2012-01-23') DAY) AS endOfWeek;</pre>
<p>Week 4 of 2011 is Jan 24 to Jan 30 (also confirmable by calendar). While it is true that the same date may not fall within the current week number and last year's week, I don't suspect they would be wildly different so a simple <code>IF()</code> that adjusts the calculation by an extra week seems sufficient.</p>
<pre style="font-size:70%">SELECT
    -- WEEKDAY() is 0 (Mon) to 6 (Sun), so subtracting it always lands on that week's Monday
    IF (WEEK('2012-01-23', 3) = WEEK(DATE_SUB('2012-01-23', INTERVAL 1 YEAR), 3),
        DATE_SUB(DATE_SUB('2012-01-23', INTERVAL 1 YEAR),
            INTERVAL WEEKDAY(DATE_SUB('2012-01-23', INTERVAL 1 YEAR)) DAY),
        DATE_SUB(DATE_SUB(DATE_ADD('2012-01-23', INTERVAL 6 DAY), INTERVAL 1 YEAR),
            INTERVAL WEEKDAY(DATE_SUB(DATE_ADD('2012-01-23', INTERVAL 6 DAY),
                INTERVAL 1 YEAR)) DAY)) AS datetimeStart;</pre>
<p>Now that the start and end of the target year week have been identified, I was able to replace the hard-coded dates with <code>NOW()</code> and apply the offset. Given <code>@weekDayStart</code> is 0 - 6 (Sun - Sat):</p>
<pre style="font-size:70%">SELECT
    -- WEEKDAY() is 0 (Mon) to 6 (Sun), so subtracting it always lands on that week's Monday
    DATE_ADD(
        IF (WEEK(NOW(), 3) = WEEK(DATE_SUB(NOW(), INTERVAL 1 YEAR), 3),
            DATE_SUB(DATE_SUB(NOW(), INTERVAL 1 YEAR),
                INTERVAL WEEKDAY(DATE_SUB(NOW(), INTERVAL 1 YEAR)) DAY),
            DATE_SUB(DATE_SUB(DATE_ADD(NOW(), INTERVAL 6 DAY), INTERVAL 1 YEAR),
                INTERVAL WEEKDAY(DATE_SUB(DATE_ADD(NOW(), INTERVAL 6 DAY),
                    INTERVAL 1 YEAR)) DAY)),
        INTERVAL @weekDayStart - 1 DAY) AS datetimeStart,
    DATE_ADD(
        IF (WEEK(NOW(), 3) = WEEK(DATE_SUB(NOW(), INTERVAL 1 YEAR), 3),
            DATE_SUB(DATE_SUB(NOW(), INTERVAL 1 YEAR),
                INTERVAL WEEKDAY(DATE_SUB(NOW(), INTERVAL 1 YEAR)) DAY),
            DATE_SUB(DATE_SUB(DATE_ADD(NOW(), INTERVAL 6 DAY), INTERVAL 1 YEAR),
                INTERVAL WEEKDAY(DATE_SUB(DATE_ADD(NOW(), INTERVAL 6 DAY),
                    INTERVAL 1 YEAR)) DAY)),
        INTERVAL @weekDayStart + 5 DAY) AS datetimeEnd;</pre>
<p>Working it through with the input of 2012-01-23 as <code>NOW()</code> and start day of Thursday (4) would yield:</p>
<ol>
<li><strong>Jan 23 2012</strong> = week 4</li>
<li>Week 4 of 2011 = 1/24-1/30</li>
<li>Offset Thu - Mon (4 - 1) is +3 which slides the window to <strong>Jan 27 - Feb 2 2011</strong></li>
</ol>
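<p>The same walkthrough can be cross-checked outside the database. The sketch below does it in PHP rather than SQL, so it doesn't satisfy the SQL-only requirement; <tt>currentWeekOfPriorYear()</tt> is a hypothetical name, and it leans on <tt>DateTimeImmutable::setISODate()</tt> (PHP 5.5 or later) in place of the <code>IF()</code> adjustment used above.</p>
<pre style="font-size:80%"><?php
// Sketch: find the same ISO-8601 week in the previous year, then slide it.
// $weekDayStart follows the convention above: 0 (Sun) through 6 (Sat).
function currentWeekOfPriorYear(DateTimeImmutable $today, $weekDayStart = 1) {
    $week = (int) $today->format("W");      // ISO-8601 week number
    $year = (int) $today->format("o") - 1;  // ISO week-numbering year, minus one
    // Monday of that week; a week 53 rolls into the following year when the
    // prior year only has 52 weeks.
    $monday = $today->setISODate($year, $week)->setTime(0, 0, 0);
    $start = $monday->modify(sprintf("%+d days", $weekDayStart - 1));
    $end = $start->modify("+6 days");
    return [$start->format("Y-m-d"), $end->format("Y-m-d")];
}

list($start, $end) = currentWeekOfPriorYear(
    new DateTimeImmutable("2012-01-23"), 4);
echo "$start to $end"; // 2011-01-27 to 2011-02-02</pre>
<p>With the default Monday start the function returns Jan 24 to Jan 30, 2011, matching the unshifted week 4 in step 2.</p>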
<p>It took me a good day to figure out, I cursed the requirements constantly, and even a friend tried to help out by <a href="http://stackoverflow.com/q/8993464/322819">posing the question on Stack Overflow</a> which just ended up being a conversation between ourselves anyway. But in the end I think I came up with an ugly but workable solution. If you know of a better way, feel free to mention it in the comments section below!</p>Timothy Boronczykhttp://www.blogger.com/profile/00015151416507514182noreply@blogger.com1tag:blogger.com,1999:blog-3224008808345429390.post-89322829074645950372012-01-30T01:07:00.000-05:002017-03-21T19:42:49.455-04:00Esperanto Accented Characters in Ubuntu<p>Today I got fed up with typing Esperanto using the x-method, the practice of following letters that would be accented with an X since the accented characters aren't on the typical keyboard. For example, the word "ankaŭ" would be typed as "ankaux." This is the 21st century, though, and there had to be some easy way to enter properly-accented characters!</p>
<p>Believe it or not, there is an Esperanto keymap, but I didn't feel like going that extreme since it would make entering other characters that I type on a day-to-day basis more difficult.</p>
<p><img src="https://i.imgur.com/3uW1q88.png">
<p>Instead I tracked down how to augment my English (US) keymap with the extra functionality I needed, and it was easier than I had expected it to be. So if you want to set up your keyboard to type Esperanto accented characters, here are the steps.</p>
<p>First, find the Keyboard Layout applet in Ubuntu/Gnome's System Settings window.</p>
<p><img src="https://i.imgur.com/lzw1pPb.png">
<p>Then, select the keymap you want to modify (here there's only one) and click the Options button in the bottom-right corner of the screen.</p>
<p><img src="https://i.imgur.com/jmzeQW9.png">
<p>The Keyboard Layout Options window that opens has a list of options that you can use to fine-tune the behavior of the keymap. The two options of interest are:</p>
<ul>
<li><strong>Adding Esperanto circumflexes (supersigno)</strong> – To the corresponding key in a Qwerty keyboard</li>
<li><strong>Keys to choose 3rd level</strong> – Right Alt</li>
</ul>
<p>Each key can be thought of as having multiple levels. For example, the first level of the C key would be a lower-case "c". The second level would be an upper-case "C" (the second level is accessible using the Shift key as a modifier). The supersigno option maps the accented characters to the third and fourth levels of the keys of their respective base glyphs. That is, the third level of the C key is now "ĉ" and the fourth is "Ĉ".</p>
<p>Just as a modifier key is needed to access the second level (Shift), a modifier is also used to access the higher levels. In my case I set this as the Right Alt key, a key traditionally used for this purpose.</p>
<p>I want to give a great big shout out to <a href="http://www.mcworkks.net/index.php?option=com_content&view=article&id=98:esperanto-characters-on-ubuntu-natty-unity-and-gnome&catid=43:maxx&Itemid=61">Maxx Solomon in whose blog I found the original directions</a>. His method was much easier to follow and configure than all the stale X.Org setting tutorials I found from the late '90s. I figured I'd do my own write-up as well to increase the chances of others finding it when they search, and because I was feeling guilty about not writing anything in my blog for a couple of months. :) Feel free to stop over to his blog and say hi; I'm sure he wouldn't mind the extra traffic.</p>
Timothy Boronczykhttp://www.blogger.com/profile/00015151416507514182noreply@blogger.com6tag:blogger.com,1999:blog-3224008808345429390.post-62771666586458572872011-11-06T19:49:00.000-05:002017-03-21T19:50:59.639-04:00Dio, Iun Novan Bonvolu Doni al Mi!<p><em>Tradukita de <a href="https://zaemis.blogspot.com/2011/10/please-god-give-me-something-new.html">Please God, Give Me Something New!</a>. Koran dankon al Jon Z kaj 黄鸡蛋/Ĉitano pro ilia kontrolado.</em>
<p>“Jen... rigardu ĉi tiun <ligilon>. Tamen, vi devas uzi la plej freŝan version de Chrome.” Aĉ! Ĉu ni ne jam spertis tion? 15 jaroj pasis, kaj nun ni reiras rekte al la sama loko, kie ni ekiris… “Plej bone aspekta pere de <ies plej ŝatatan retfoliumilon>.” Estas vera domaĝo.
<p>Ne miskomprenu min. Jaroj da laboro por normigi HTML, JavaScript, la DOM, CSS, ktp klarigis multajn malklaraĵojn, kaj tio ja estis necesa. Sed striktaj reguloj ankaŭ sufokas kreadon. Ĉar HTML5 kaj ĝiaj amikoj forigis kelkajn el la limigoj, la pendolo svingiĝas antaŭen. Homoj ekkreas denove. Sed, nun kontraŭas Firefox kaj Chrome anstataŭ Internet Explorer kaj Netscape.
<p>Tamen, estas sistema problemo preter la “foliumilaj militoj.” Ĉu vi memoras AOL? Facebook nun klopodas esti “La Interreto.” Ĉu vi memoras komputilegojn kaj verd-ekranojn? Ni puŝis ĉion al la komputila labortablo, kaj nun puŝas ĉion reen “al La Nubo.” Ĉiuj el la kvara kaj kvina generaciaj programlingvoj venis kaj foriris, kaj la plej bonaj, kiujn ni nun havas, estas Java kaj Clojure?! Ĉu vi memoras Ajax, nu mi volas diri DHTML, nu mi volas diri JavaScript? Ĉiuj “novigas,” sed neniu vere faras ion novan, ekscitan, aŭ unikan.
<p>Damne, Tesla ŝtopis 200 lumampolojn teren, kaj lumigis ilin 25-mejloj for de elektra fonto en 1899, kaj Solyndra kaj Prius ankoraŭ estas la plej bonaj? Kio okazas?!
<p>La sinusa ciklo de teknologiaj avancoj ne estus malbona, se ĉiu ciklo donus ion novan. Eble tial mi enuiĝis: ĉiu ciklo ŝajne refaradas la antaŭan ciklon, kaj neniam estas io vere nova kaj ekscita. Ni moviĝas ronde, ne spirale. Ju pli io ŝanĝiĝas, des pli ĉio restas sama. Ĉu Quindlen estas ĝusta, ke ĉiu rakonto jam estis dirita?
<p>Pensu pri la canvas elemento de HTML5, kiun ĉiuj laŭdas kaj opinias tiel bonega kaj mirinda, 2-dimensia desegno, kiun oni povas influi per JavaScript. Se foliumiloj farigus bonan subtenon por SVG antaŭ 10 jaroj, ni ne nun bezonus canvas. Fakte, ni jam havas tiun “novan, ekscitan teknologion” de 10 jaroj. Bedaŭrinde, SVG estas plibonega mult-maniere. Bedaŭrinde, ni havis 3-dimensian kapablecon per VRML/X3D ekde la 1990-aj jaroj. Bedaŭrinde, ni kontentigis nin mem per io, kio estas malpli bona, kaj ni ridetas de orelo al orelo.
<p>Ĉu vi ne ŝatus havi la teknologion de la estonteco 10 jarojn de nun? Tente. Sed mi ne certas ĉu mi deziros tiun… ĉar verŝajne, ĝi estos la sama kiel tio, kion mi jam havas nun dum la pasintaj 10-jaroj. Ĝi nur havas novan kampanjon de merkatiko.Timothy Boronczykhttp://www.blogger.com/profile/00015151416507514182noreply@blogger.com0tag:blogger.com,1999:blog-3224008808345429390.post-3244082250643642702011-10-26T00:52:00.000-04:002017-03-21T19:52:10.138-04:00Please God, Give Me Something New!"Here... check out this (link). But you have to use the most recent version of Chrome." Sigh. Haven't we been through this already? 15 years have passed and we're headed right back to the same place we started... "Best viewed in <someone's favorite browser>." It's sad, really.<br />
<br />
Don't get me wrong. Years of standardization work on HTML, JavaScript, the DOM, CSS, etc. cleaned up a lot of messy loose ends, and yes, it was indeed necessary, but stringent standardization also stifles creativity. Now that HTML5 and friends have loosened some of the restrictions, the pendulum has started to swing back in the opposite direction. People have started to innovate again. This time around it's Firefox vs Chrome instead of Internet Explorer vs Netscape.<br />
<br />
There's a systemic problem that goes beyond browser wars, however. Remember AOL? This time it's Facebook trying to be “The Internet.” Remember mainframes and terminals? After pushing everything to the desktop, now we're pushing everything back “to the cloud.” All the fourth and fifth generation programming languages have come and gone and the best we have now is Java and Clojure?! Remember Ajax, er I mean DHTML, er I mean JavaScript? We're all "innovating" but nobody is really doing anything new, exciting, and unique.<br />
<br />
Hell, Tesla plugged 200 light bulbs into the ground and lit them up 25 miles away from a power source in 1899, and the best we have now is Solyndra and the Prius? What gives?!<br />
<br />
The sine-like cycle of technological advancements wouldn't be bad if each cycle actually gave us something new; you know, an <i>advancement</i>. Maybe that's what has me so jaded. Each cycle seems to rehash the previous cycle and there's nothing really new and exciting anymore. We're moving in circles, not traveling in spirals. The more things change, the more they stay the same. Is Quindlen right, and every story has already been told?<br />
<br />
Think about HTML5's canvas element, which everyone is calling great and wonderful: an area for 2D drawing that can be manipulated with JavaScript. If browsers had actually implemented decent support for SVG 10 years ago, we wouldn't need canvas now. That's right, we've had this “hot new technology” for 10 years already. And sadly, SVG is superior in many ways. And sadly, we've had 3D capability with VRML/X3D since the mid-90s. And sadly, we've settled for something less and are grinning from ear to ear. <br />
<br />
Wouldn't you like to have technology from 10 years into the future? It sounds tempting, but I'm not so sure I really would... because it'd probably be the same as what I've already had for the past 10 years just with a new marketing campaign.
<br />
<br />
<em>Esperanto translation is available at <a href="http://zaemis.blogspot.com/2011/11/dio-iun-novan-bonvolu-doni-al-mi.html">Dio, Iun Novan Bonvolu Doni al Mi!</a></em>Timothy Boronczykhttp://www.blogger.com/profile/00015151416507514182noreply@blogger.com4tag:blogger.com,1999:blog-3224008808345429390.post-51679561277381705142011-09-18T00:43:00.003-04:002017-03-21T19:53:16.721-04:00Top-10 PHP String Functions<p>By day I work as a programmer at <a href="http://www.shoregroup.com">ShoreGroup, Inc</a>. By night I'm a freelance developer and now the managing editor for <a href="http://www.sitepoint.com">SitePoint</a>'s latest site, <a href="http://www.phpmaster.com">PHPMaster.com</a>. Helping out with the site has been pretty fun so far; my Australian counterparts are all pretty cool, and I've met some really great new authors too. If you haven't visited yet, take a moment and check out PHPMaster.com (there are still some wrinkles to iron out on the site, but we're working to identify and fix them all as soon as we can).</p><p>Part of my duties as a <a href="http://www.wisegeek.com/what-does-a-managing-editor-do.htm">managing editor</a> includes working with authors to make sure the site's content is well balanced. PHPMaster.com is targeting PHP programmers of all skill levels, so there should be a good mix of basic, beginner, intermediate, and advanced content. While planning a beginner article that demonstrates basic string handling functions, I wondered which functions to highlight. I wanted to show the ones that would be most relevant, not necessarily the ones that were my favorites, so I decided to do some static analysis of popular open-source projects to find out which string functions were used the most.
The results were surprising, so I thought I'd share my "research."</p><p>I used the source of a closed-source PHP project that I have access to and the following open-source (or open-source-ish) projects as code samples for the analysis:</p><ul><li><a href="http://drupal.org/">Drupal</a> (v7.x-dev)</li>
<li><a href="http://gallery.menalto.com/">Gallery</a> (v3.0.2)</li>
<li><a href="http://www.joomla.org/">Joomla</a> (v1.7.0-Stable-Full)</li>
<li><a href="http://www.magentocommerce.com/">Magento eCommerce</a> (v1.6.1.0-alpha1)</li>
<li><a href="http://www.mediawiki.org/wiki/MediaWiki">MediaWiki</a> (v1.17.0)</li>
<li><a href="http://www.oscommerce.com/">OSCommerce</a> (v3.0.2)</li>
<li><a href="http://www.phpbb.com/">phpBB</a> (v3.0.9)</li>
<li><a href="http://www.phpmyadmin.net/home_page/index.php">phpMyAdmin</a> (master-20110914-022001)</li>
<li><a href="http://wordpress.org/">WordPress</a> (nightly build, Sept. 14, 2011)</li>
<li><a href="http://framework.zend.com/">Zend Framework</a> (v1.11.9)</li>
<li><a href="http://jpgraph.net/">JpGraph</a> (v3.0.7)</li>
</ul><p>Then I ran the following PHP to tally the functions:</p><pre style="font-size:80%;">#! /usr/bin/env php
<?php
// Expect exactly three arguments after the script name.
if ($_SERVER["argc"] != 4) {
    $script = basename(__FILE__);
    fprintf(STDERR, "usage: %s directory max exts\n", $script);
    fprintf(STDERR, "\tdirectory - directory to start traversal\n");
    fprintf(STDERR, "\tmax - maximum number of results to return\n");
    fprintf(STDERR, "\texts - comma-separated list of file extensions\n");
    fprintf(STDERR, "example: %s /var/www 20 php,inc\n", $script);
    exit(1);
}

// no error-checking... don't be stupid
$directory = $_SERVER["argv"][1];
$max = (int) $_SERVER["argv"][2];
// Match only paths that end in a dot followed by one of the extensions.
$extsRegex = '/\.(' . str_replace(",", "|", $_SERVER["argv"][3]) . ')$/';

// Walk the directory tree, keeping only files with a matching extension.
$dirIter = new RecursiveDirectoryIterator($directory);
$recIter = new RecursiveIteratorIterator($dirIter);
$iter = new RegexIterator($recIter, $extsRegex);

// Tally every T_STRING token that names a function known to the running PHP.
// This over-counts slightly: a method or constant that shares its name with
// a function is counted too.
$funcs = array();
foreach ($iter as $file) {
    $tokens = token_get_all(file_get_contents($file));
    foreach ($tokens as $t) {
        if (is_array($t) && $t[0] == T_STRING && function_exists($t[1])) {
            if (!isset($funcs[$t[1]])) {
                $funcs[$t[1]] = 0;
            }
            $funcs[$t[1]]++;
        }
    }
}

// Sort by count, highest first, and keep the top $max entries.
arsort($funcs);
$max = min(count($funcs), $max);
if ($max > 0) {
    $funcs = array_slice($funcs, 0, $max, true);
}
print_r($funcs);
</pre><p>I took the resulting list of functions and extracted the string-specific ones to come up with this top-10 list (sorted in decreasing order of most-used):</p><ol><li><code>substr()</code> - 6,605</li>
<li><code>sprintf()</code> - 5,604</li>
<li><code>implode()</code>/<code>join()</code> - 4,829</li>
<li><code>strlen()</code> - 4,557</li>
<li><code>chr()</code> - 4,122</li>
<li><code>str_replace()</code> - 4,009</li>
<li><code>explode()</code> - 3,401</li>
<li><code>strpos()</code> - 3,238</li>
<li><code>htmlspecialchars()</code> - 3,171</li>
<li><code>trim()</code> - 2,998</li>
</ol><p>I expected functions like <code>substr()</code> and <code>trim()</code> to be on the list, but <code>chr()</code> was a surprise. Before this I probably would have laughed at you if you told me <code>chr()</code> is used almost twice as much as <code>strtolower()</code> (which came in 12th place with 2,267). Interesting results indeed!</p>Timothy Boronczykhttp://www.blogger.com/profile/00015151416507514182noreply@blogger.com2