Skip to main content

Posts

Showing posts from 2015

A Unicode fgetc() in PHP

In preparation for a presentation I’m giving at this month’s Syracuse PHP Users Group meeting, I found the need to read in Unicode characters in PHP one at a time. Unicode is still second-class in PHP; PHP6 failed and we have to fallback to extensions like the mbstring extension and/or libraries like Portable UTF-8 . And even with those, I didn’t see a unicode-capable fgetc() so I wrote my own. Years ago, I wrote a post describing how to read Unicode characters in C , so the logic was already familiar. As a refresher, UTF-8 is a multi-byte encoding scheme capable of representing over 2 million characters using 4 bytes or less. The first 128 characters are encoded the same as 7-bit ASCII with 0 as the most-significant bit. The other characters are encoded using multiple bytes, each byte with 1 as the most-significant bit. The bit pattern in the first byte of a multi-byte sequence tells us how many bytes are needed to represent the character. Here’s what the function looks like: f

Some Go Irks and Quirks

Now that Jump Start MySQL is published, I’m taking advantage of the spare time I have on my hands while it lasts. I’ve helped organize the Syracuse PHP Users Group , reconnected with some old friends, and gave some love to Kiwi , my forever-project programming language. Moreover, I decided to rewrite Kiwi using Go as it’s one of those languages I found interesting but never had a reason to use in any serious fashion. And now that I’ve got some real experience with it, while I still find myself impressed by some of Go’s features, some things have become really annoying. I still really like Go’s data typing; it’s static, but it feels dynamic because the compiler is smart enough to deduce a value’s type. If you write your code well then you’ll rarely see a type name outside of a function signature or struct or interface definition. It’s nice to have type safety without the verbosity (yes I’m looking at you, PHP7). I wish := behaved slightly different, though. Instead of always an allo