Name and Nature

Paul Wright's blog

Saner social media

The socials are a handy way to stay in touch with friends, find out about dancing events (if they’re ever allowed again) and get information direct from experts. They’re also an unrelenting cesspool of trolls, bots and undesirables. What to do?

Facebook

FB Purity config page

On my PC, FB Purity lets me filter by keywords (I’ve chosen “trump” and “brexit”, as you can see). It can hide various types of update from your feed. I hide stuff like “Fred commented on this thing” (as FB friends sometimes like to argue with the undesirables), as well as adverts. You can also force the feed into chronological order rather than relying on Facebook’s algorithm to show you what it thinks you should see.

On my phone, I use Friendly for Facebook, which isn’t quite as good but does have the keyword filtering (“commented on” works as a filter) and, if you give them a donation, will also filter the adverts.

Twitter

Tweak New Twitter default appearance

Tweak New Twitter gets rid of a lot of noise (like the “Trending” stuff and sponsored tweets). It can put retweets on a separate page, too. To use it on mobile, you’d need a mobile browser which lets you run extensions; Chrome on Android doesn’t.

Twitter has keyword filtering built in.

Secateur blocks people and, optionally, all their followers (unless you’re following them too). It adds to your Twitter blocklist, so once people are blocked, they’re blocked however you view Twitter. I guess there’s a risk that some decent people follow undesirables to keep an eye on them, but if Secateur catches on, I can imagine people using separate accounts for that (of course, the undesirables can play the same trick, with one account for trolling and one for following, but they don’t seem to be doing so yet). It wouldn’t be that hard to extend Tweak New Twitter to add a “Secateur” button to Twitter, either; I might look into that.

The next stage on from this, especially if the undesirables maintain accounts where they don’t follow other undesirables, would be the web of trust: only show replies from people you follow, people they follow, people the original tweeter follows, say.
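As a sketch of how that web-of-trust filter might work, here’s some Python over an entirely invented follow graph (all the names and the `follows` data are hypothetical; real Twitter would need API calls and would hit the rate limits mentioned later):

```python
# Hypothetical follow graph: who follows whom. All names are invented
# for illustration; a real version would fetch this from the Twitter API.
follows = {
    "me": {"alice", "bob"},
    "alice": {"carol"},
    "bob": {"dave"},
    "op": {"eve"},  # "op" is the original tweeter
}

def trusted(viewer, original_tweeter):
    """Accounts whose replies we'd show: the viewer's follows,
    the follows of those follows, and the original tweeter's follows."""
    first_hop = follows.get(viewer, set())
    # everyone followed by someone the viewer follows
    second_hop = set()
    for person in first_hop:
        second_hop |= follows.get(person, set())
    return first_hop | second_hop | follows.get(original_tweeter, set())

print(sorted(trusted("me", "op")))
# ['alice', 'bob', 'carol', 'dave', 'eve']
```

Replies from anyone outside that set would simply be hidden.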

Nitter is a free and open source alternative Twitter front-end focused on privacy. It’s an alternative website for which you don’t need JavaScript enabled. It will also turn someone’s tweets into an RSS feed, useful if you just want to read them without signing up for Twitter.

Link blog: internet, bbc, retrocomputing, acorn

Hello! You’ve Been Referred Here Because You’re Wrong About Section 230 Of The Communications Decency Act | Techdirt
The CDA does not make the fabled “platform” vs “publisher” distinction. Via Popehat
(tags: law internet libel politics)
John Kortink’s website – Hardware – GoSDC
An SD card interface for the Acorn Electron and BBC B. Cool!
(tags: acorn bbc retrocomputing)

Link blog: covid19, model, simulation, evangelicalism

Imperial College simulation code for COVID-19 | Clive Best
In which someone runs the code, and it seems to work reasonably well.
(tags: covid19 simulation model mathematics programming)
The Imperial College code | …and Then There’s Physics
Someone else ran it too.
(tags: covid19 simulation model)
Jared Yates Sexton on Twitter: “PLEASE. Tell people about this. I’m going to provide some history of Neo-Confederate, white-identity, apocalyptic evangelicalism, what I call the Cult of the Shining City. This is who Donald Trump was messaging yesterday wi
some history of Neo-Confederate, white-identity, apocalyptic evangelicalism, what I call the Cult of the Shining City.
(tags: christianity politics usa evangelicalism)

UnicodeDecodeError with stuff from the network

Occasionally I write about debugging, for the edification of others and to try to explain to muggles what I do all day. I ran into a fun one the other day.

Unicode

Joel Spolsky’s explanation of Unicode is excellent, but long. In brief: on a computer, we represent letters (“a”, “b” and so on) as numbers. Computers work with binary digits (or bits), usually in groups of 8 bits called bytes. Back in the mists of time, someone came up with ASCII, a way to represent decent American letters by giving each letter a number. All those numbers fitted into a single byte (a byte can represent 256 different numbers), so one byte was one letter, and all was well… unless you weren’t American and wanted to represent funny foreign letters like “£”, or some non-Latin alphabet, or a frowning pile of poo.

The modern way of handling those foreign letters and poos is Unicode. Each different letter still has a number assigned to it, but there are a lot of them, so the numbers can be bigger than you can fit in a byte. Computers still like to work in bytes, so you need to represent a letter using a sequence of one or more bytes. A way of doing this is called an encoding. One popular encoding, UTF-8, has the handy feature that all those decent American letters have the same single byte representation as they did in ASCII, but other letters get longer sequences of bytes.
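You can see this from Python itself, which is what the code later on is written in. Encoding a string gives you its bytes, and the lengths show the ASCII-stays-one-byte property:

```python
# Plain ASCII letters encode to a single byte, same value as ASCII gave them.
ascii_bytes = "a".encode("utf-8")
print(ascii_bytes, len(ascii_bytes))  # b'a' 1

# "£" isn't in ASCII, so UTF-8 needs two bytes for it.
pound_bytes = "£".encode("utf-8")
print(pound_bytes, len(pound_bytes))  # b'\xc2\xa3' 2

# Emoji live way up in Unicode's number space: four bytes.
poo_bytes = "💩".encode("utf-8")
print(len(poo_bytes))  # 4
```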

The Internet

The series of tubes we call the Internet is a way of carrying bytes around. As a programmer, you often end up writing code to connect to other computers and read data. Suppose we just want to sit there forever doing something with a continuous stream of bytes the other computer is sending us1:

connection = connect_to_the_thing()

while True: # loop forever
    bytes = connection.recv(1024) # receive up to 1024 bytes from the other computer
    do_something_with(bytes)

The data that comes back from the other computer is a series of bytes. What if you know it’s UTF-8 encoded text, and you want to turn those bytes into that text?

connection = connect_to_the_thing()

while True: # loop forever
    bytes = connection.recv(1024) # receive up to 1024 bytes from the other computer
    text = bytes.decode("utf-8") # turn it into text
    do_something_with(text)

This seems to work fine, but very occasionally crashes on line 5 with a mysterious error message: “UnicodeDecodeError: ‘utf-8’ codec can’t decode byte 0xe2 in position 1023: unexpected end of data”. Whaaat?

Some frantic Googling of “UnicodeDecodeError” turns up a bunch of people getting that error because they weren’t actually reading UTF-8 encoded text at all, but something else2. So, you check what the other side is sending, and in this case, you’re pretty sure it is sending UTF-8. Whaaat?

Squint at the error message a bit more, and you find it’s complaining about the last byte it’s read. You have to give the recv() a maximum number of bytes to read, so you picked 1024 (a handy power of 2, as is traditional). “Position 1023” is the 1024th byte received (since we start counting from 0, as is traditional). That “0xe2” thing is hexadecimal E2, equivalent to 11100010 in binary. Read the UTF-8 stuff a bit more, and you find that 11100010 means “this letter is made up of this byte and the two more bytes following this one”. It stopped in the middle of the sequence of bytes which represent a single letter, hence the “unexpected end of data” in the error message.
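You can reproduce the error without any networking at all. The euro sign, for example, is one of those letters whose UTF-8 sequence starts with 0xE2; chop the sequence short, as recv() can, and decode() complains in exactly the same way:

```python
# "€" encodes to three bytes, the first of which is 0xE2.
euro = "€".encode("utf-8")
print(euro)  # b'\xe2\x82\xac'

# Decode only the first two bytes, as if recv() had stopped mid-letter:
try:
    euro[:2].decode("utf-8")
except UnicodeDecodeError as err:
    print(err)  # ... unexpected end of data
```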

At this point, if you have control over the other computer, you might be thinking up cunning schemes to ensure that what it passes to each send() is always less than 1024 bytes at a time, without breaking up a multi-byte letter. After all, the data goes out in packets, so what you get when you invoke recv() must line up with the other side’s send()s, right? Wrong.

Avian carrier

The series of tubes is narrower in some places than others, and your data may be broken up to fit. A single carrier pigeon can only carry so much weight, you see, and the RSPB is pretty strict about that sort of thing. All that’s guaranteed is that you get the bytes out in the order they went in, not how many you get out at a time.

Fortunately, Guido thought of this and blessed us with IncrementalDecoder, which knows how to remember that it was part way through a letter when it left off, so that the next time around the loop, it’ll hopefully get the rest of the bytes and give you the letter you were hoping for:

import codecs # IncrementalDecoder lives in the codecs module

connection = connect_to_the_thing()

decoder_class = codecs.getincrementaldecoder("utf-8")
decoder = decoder_class() # Make a new instance of the decoder_class

while True: # loop forever
    bytes = connection.recv(1024) # receive up to 1024 bytes from the other computer
    text = decoder.decode(bytes)
    do_something_with(text)

Much better! Now to raise a pull request against paramiko_expect.
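To see the incremental decoder doing its thing, you can feed it the euro sign’s three UTF-8 bytes split across two calls, as if they’d arrived in separate recv()s. It quietly holds on to the incomplete sequence and hands over the letter once the last byte turns up:

```python
import codecs

decoder = codecs.getincrementaldecoder("utf-8")()

# "€" is 0xE2 0x82 0xAC; deliver it split across two "receives".
part1 = decoder.decode(b"\xe2\x82")  # incomplete: decoder holds the bytes back
part2 = decoder.decode(b"\xac")      # the final byte completes the letter
print(repr(part1), repr(part2))      # '' '€'
```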


  1. We’ll not worry about the other side closing the connection or the wifi packing up, for now. 

  2. I do wonder whether questions on Stack Overflow about errors from Python’s Unicode handling have more views in the aggregate than the “How do I exit Vim?” question (which is at 2.1 million views as I write this). 

Link blog: walking, cambridge, productivity, dreamwidth

dw_news | PSA: Likely LiveJournal password compromise
Passwords used on LiveJournal around 2014 have probably been compromised. Dreamwidth noticed because accounts where people had common passwords on both sites got hacked on DW. Use a password manager, people.
(tags: livejournal fail security password dreamwidth)
Books in Which No Bad Things Happen | Tor.com
A list, including contributions from commenters.
(tags: books science-fiction)
Walks south of Cambridge
I did one, it was nice. Bookmarking to try others.
(tags: walking hiking cambridge)
bigH/git-fuzzy: interactive `git` with the help of `fzf`
A CLI interface to git that relies heavily on fzf (version 0.21.0 or higher).
(tags: git productivity fzf)

Link blog: covid19, mathematics, model, nhsx

Expert reaction to unpublished paper modelling what percentage of the UK population may have been exposed to COVID-19 | Science Media Centre
It’s an interesting paper, but the consensus seems to be that it’s not a good reason to give up lockdown (it’s ancient history from back in March now, but there’s some discussion going on at UnHerd after they interviewed one of the researchers).
(tags: covid19 model mathematics)
James Hay on Twitter: “A recent analysis from Oxford presented a range of model scenarios consistent with observed COVID death counts. I’m going to reproduce their analysis here and then present some slight modifications to provide a conservative (if te
A good thread on what that Oxford team did.
(tags: covid19 model mathematics bayesian)
Staying alive: background tracing the NHS COVID-19 app – Reincubate
How’s the NHSX contact tracing app going to stay alive in the background on iOS? How does it work?
(tags: nhs health covid19 app nhsx android ios Bluetooth)
Knight Rider for 8 cellos – YouTube
Yay!
(tags: tv music knight-rider)
Do any Covid-19 ‘cures’ actually work? – UnHerd
All studies so far are flawed, including the flaw of eliminating dead people from the study. Tom Chivers continues to be one of the few worthwhile things on UnHinged.
(tags: covid19 science drugs tom-chivers)

Ch-changes

I’ve been tidying up my website a bit, and I’ve put everything which used to be on LiveJournal on Dreamwidth, with a view to closing LJ (or replacing all the stuff there with redirects) and using DW as a bit of diary/venting place now LJ’s looking increasingly dodgy. It’s odd to type stuff into a LJ-clone, feels a bit retro, but in a nice way, like a comfy old jumper. Twitter’s a cesspool and neither it nor Facebook are good for more than a few sentences of text.

I’ve also spruced up things on the proper blog a bit, adding a funky new style. I got Journalpress going to post stuff from the proper blog to Dreamwidth, and did my very first GitHub pull request to add a feature to it. This started me off on an “add all my things to GitHub” kick; currently there’s just my LJ New Comments script, but there’s a bunch of other bits I want to keep somewhere sensible rather than on my laptop.

Twust

On the subject of cesspools, has anyone done a thing for Twitter which only shows you replies from people followed by you or the people you follow? Someone really should layer a web of trust over the top of it, but I hear their API is designed to stop you doing interesting things with it, because you run into rate limiting. It’s so bad that TwitRSSme apparently does stuff by screen-scraping instead, which is icky but possibly unavoidable.