Category: Software

  • Clone Wars Watchlist

    I found a Twitter bug! Hah!

    Specifically, certain characters which must be escaped in the GSM 03.38 character encoding are being treated as the wrong encoding when posted to Twitter from Verizon Wireless SMS, and are showing up as ? in text messages sent by Twitter to Verizon Wireless customers via SMS.

    I should add that I didn’t find this bug alone – @elliotreed asked why I used question marks to note something in a tweet when I had actually used square brackets around some text. Some quick investigation with him revealed the more specific nature of the problem, but it wasn’t until I actually found out that there was such a thing as GSM encoding that I came up with a hypothesis to explain the character weirdness.

    As far as I can tell, Verizon’s HTTP/SMS gateway is now doing the GSM/UTF-8 mapping internally, but Twitter is assuming it still has to send GSM bytes to Verizon, so the encoding is happening twice, or at least attempting to happen twice. Verizon chokes on the GSM two-byte escape sequences, presumably because it now expects UTF-8 text rather than GSM bytes, while Twitter receives certain ASCII-range one-byte UTF-8 characters but converts them as if they were GSM one-byte characters, resulting in a totally different UTF-8 character!
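    To make the second half of that hypothesis concrete, here is a minimal Python sketch of what happens when ASCII bytes are decoded as if they were GSM 03.38 default-alphabet code points. The function name and the partial table are mine, for illustration only; this is not Twitter's or Verizon's code, and the table covers only the eight punctuation characters in question.

```python
# Partial GSM 03.38 default-alphabet table (illustrative): only the code
# points that collide with the ASCII punctuation discussed in this post.
GSM_DEFAULT_AT_ASCII_POSITIONS = {
    0x5B: "Ä",  # ASCII '['
    0x5C: "Ö",  # ASCII '\'
    0x5D: "Ñ",  # ASCII ']'
    0x5E: "Ü",  # ASCII '^'
    0x7B: "ä",  # ASCII '{'
    0x7C: "ö",  # ASCII '|'
    0x7D: "ñ",  # ASCII '}'
    0x7E: "ü",  # ASCII '~'
}


def misread_ascii_as_gsm(text):
    """Decode ASCII bytes as though they were GSM default-alphabet values.

    Bytes not in the partial table are passed through unchanged, which is
    only accurate for letters, digits, and spaces.
    """
    out = []
    for byte in text.encode("ascii"):
        out.append(GSM_DEFAULT_AT_ASCII_POSITIONS.get(byte, chr(byte)))
    return "".join(out)


print(misread_ascii_as_gsm("see [this] {or} ~that"))
# prints: see ÄthisÑ äorñ üthat
```

    If the hypothesis is right, this is the kind of substitution that would turn a programmer's brackets into umlauts on the receiving end.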

    The GSM-to-UTF-8 encoding bug, shown here for square brackets, curly braces, tilde, backslash, and caret.

    The GSM encoding doesn’t allow certain characters as single-byte characters; it appears to be a way to shove a number of European characters into a 7-bit mutant ASCII, with control characters and certain punctuation replaced by characters from the Latin-1 codepage. To some extent this makes sense: an SMS message is limited to 160 seven-bit characters (140 bytes), so you want to avoid multibyte encodings while still supporting commonly used characters (UTF-16 is used for non-roman languages). Unfortunately, this leaves [, ], ~, {, }, \, |, and ^ out in the cold. As a programmer, I use these punctuation characters often as separators in various notations, so it is perhaps not surprising that one of my tweets revealed the problem. These characters can be sent as a two-byte sequence in the GSM encoding, but those sequences start with the escape byte 0x1B. That byte is technically valid UTF-8, but it decodes to the non-printable ESC control character rather than to text, so a gateway expecting UTF-8 text could plausibly reject it and substitute a ?.
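    As a rough illustration of the escape mechanism, the sketch below encodes a string into GSM 03.38 septet values using the extension table for the eight characters above. The helper name is hypothetical, and the function handles only ASCII letters, digits, spaces, and the escaped punctuation (those are the characters whose GSM values I am sure of); a real encoder would need the full default alphabet.

```python
GSM_ESCAPE = 0x1B  # escape septet that introduces an extension-table character

# GSM 03.38 extension-table values for the punctuation discussed above.
GSM_EXTENSION = {
    "^": 0x14, "{": 0x28, "}": 0x29, "\\": 0x2F,
    "[": 0x3C, "~": 0x3D, "]": 0x3E, "|": 0x40,
}


def gsm_septets(text):
    """Return the list of GSM septet values for a restricted input string."""
    septets = []
    for ch in text:
        if ch in GSM_EXTENSION:
            # Escaped characters cost two septets: ESC plus the table value.
            septets.extend([GSM_ESCAPE, GSM_EXTENSION[ch]])
        elif ch.isascii() and (ch.isalnum() or ch == " "):
            # ASCII letters, digits, and space share their GSM code points.
            septets.append(ord(ch))
        else:
            raise ValueError("character not covered by this sketch: %r" % ch)
    return septets


encoded = gsm_septets("a[b]")
print([hex(s) for s in encoded])
# prints: ['0x61', '0x1b', '0x3c', '0x62', '0x1b', '0x3e']

# The escape byte on its own decodes as UTF-8 to the ESC control character.
print(repr(bytes([GSM_ESCAPE]).decode("utf-8")))
# prints: '\x1b'
```

    Note that each escaped character eats two of the 160 septets in a message, so a tweet full of brackets has less room than its character count suggests.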

    I would have thought that the Age of Unicode would have ended many of these non-standard application-specific encodings (and plus, given the way mobile carriers love to gouge on SMS, if they make your characters take more bytes, they get more money!). It looks like that’s exactly what Verizon is trying to do, in moving to exposing UTF-8 on the edge of their network… they just didn’t tell anyone that they had changed encodings, or if they have, Twitter hasn’t acted on the change yet.

    Since Twitter disabled their help ticket creation (probably because too many stupid people were posting the same questions without reading the FAQs), I reported the bug using the Twitter API ticketing system on Google Code.

    Short story: if you use any of the punctuation characters above in your tweets, expect texting Twitter users with Verizon to see ?, and expect to receive tweets from them with weird European characters, until this is fixed by one or both parties.

  • Sunspot

    A small Yorkshire Terrier lies in a pet bed on a green upholstered chair in a sunny living room.

    I wish I could nap like Pike naps.

  • The Muppets Meet Murderbot

    For the last year and a half or so Andrle and I have been having a ton of fun as part of an online book club put together by my friend Tanya. Unlike most book clubs where you all read the same thing, we all just share things we love that we’ve been reading.

    One recommendation that kept coming up was The Murderbot Diaries by Martha Wells, and I’m glad we finally ran with it and started reading them. We’ve been bingeing the last few weeks and they’re really, really great. Highly recommend (even though I’m only on Book 4.)

    Unsurprisingly, another thing I’ve been reading about and thinking about a lot is the advent of Large Language Models, particularly as manifested by ChatGPT. My past work life as an NLP practitioner leaves me feeling a mix of curiosity, envy, and deep suspicion about what these are and how they’re being used. In the short term I don’t mind using them as a toy that I can play with for the purpose of a joke, which is what this post is about.

    A recurring topic in this book club is how we might have the Muppets interpret a particular book, or how we might recast a movie with Muppets (but keep one human actor). This provides a lot of fun discussions, especially since I grew up regularly watching the classic Muppet movies on VHS recorded off of TV, and I really like musicals.

    That all brings me to some prompt iterating that I did to get ChatGPT to write the painfully cliché finale to a Muppet musical featuring Murderbot. (Apologies to both Henson and Wells.)

    Verse 1 (Murderbot):
    I used to be a killing machine,
    A rogue cyborg, feared and mean,
    But then I met the Muppets and Kermit,
    And learned what friendship truly meant.

    Verse 2 (Kermit):
    I'm just a simple frog, it's true,
    But I believe in friendship too,
    And when I met Murderbot, I knew,
    Together we could do something new.

    Chorus (All):
    We don't need to kill no more,
    Our friends are all we're fighting for,
    With Murderbot and Kermit side by side,
    We'll show the world what we stand for.

    Verse 3 (Murderbot):
    I never wanted to be a tool,
    But that's all humans see in me, it's cruel,
    I used to dream of breaking free,
    From their endless schemes and greed.

    Verse 4 (Kermit):
    It's true, I'm always happy-go-lucky,
    But I've been through some tough times that were quite yucky,
    But now with Murderbot and the gang,
    I know our friendship will never hang.

    Chorus (All):
    We don't need to kill no more,
    Our friends are all we're fighting for,
    With Murderbot and Kermit side by side,
    We'll show the world what we stand for.

    Bridge:
    The past is gone, the future bright,
    With friends like these, we'll win the fight,
    Our love and trust will guide us through,
    Together we'll see this journey through.

    Chorus (All):
    We don't need to kill no more,
    Our friends are all we're fighting for,
    With Murderbot and Kermit side by side,
    We'll show the world what we stand for.

    Outro:
    So let's raise our voices high,
    And let the world hear our battle cry,
    With Muppets and Murderbot side by side,
    We'll change the world and turn the tide.
    OpenAI. (2023, Mar. 16) [Iterative prompt conversation starting with “This Muppet movie is about a rogue cyborg traveling to different space stations and learning the power of friendship from the Muppets” and “I need some lyrics for the big finals. The music is big broadway style number about how the rogue cyborg doesn’t need to kill bad guys anymore because it has friends.” and then encouraging more dialogue between Kermit and Murderbot] https://chat.openai.com/

    Yes, I looked up recommendations on how to properly cite ChatGPT output.

    Overall the result was mildly amusing. I couldn’t get it to capture Murderbot’s signature style, presumably because while information about the novellas is in its training data, not enough text from the novellas themselves is in there.

    I’m still not sure what useful things I could do with this kind of tool, but in the meantime, I can make it make silly songs.

  • Fox Ferocity

    Pike playing with a toy almost as big as he is.

  • Groomed Pike

    Tiny freshly groomed Yorkie sits on a teal couch wearing a monstera bow tie

    I love this little gentleman.

  • The Kale Wanter

    A small black and brown Yorkie stands on a cream rug in front of a brown cupboard begging for the bunch of lacinato kale in hand in the foreground.