Linguistic Reform

I was reading this article in The Atlantic over the weekend; it insinuates that the reason the US doesn't have more engineers, doctors, teachers, lawyers, etc is because as kids, we spend too much of our time learning to read and write, memorizing and internalizing all of the pits and foibles of the English language.

I wasn't very impressed with their suggestion that spelling reform would instantly transform the US into a powerhouse of capitalism and science, but the idea of using tech to solve the problem appealed to me.

I started thinking about how Japanese children learn kanji by reading furigana, which uses kana (think Japanese alphabet) superscripted over a character to sound it out, allowing them to pick up the meaning from context if they recognize the sound of the word or idea. Of course, we can't really attach a parenthetical explanation to every word with a complicated or illogical spelling each time we write it, but we can use text replacement.

I copied the first paragraph from the Wikipedia article on IPA and started writing a correspondence guide for sounds. I had to make a few conscious decisions:
  1. I was going to design a phased implementation, where every couple of decades, the next step towards a logical orthography would be taken.

  2. The first phase of the reforms would not involve vowels or semi-vowels; they're too damn complicated to make the first changes comprehensible to first-generation converts.

  3. The letter "c" would be used for the "hard-c" sound, instead of "k". K is commonly used because it is unambiguous in English; however, I think there is a greater stigma attached to seeing a large number of Ks in written English, and using C would improve the theoretical uptake.

  4. The letters x and q would not be used; they are, however, reserved for possible future purposes (for example... filling out those damn vowels).

  5. Morphemes that are used as affixes or in combining forms would use the same spellings to preserve the connection between meanings. This may have the effect that the pronunciations of some words would shift.
    • E.g., "Northern" would use the same "th" as "North", even though the phoneme is voiced in "Northern" and unvoiced in "North", and my reforms would otherwise show the distinction.

I tried this out by manually replacing the text in a short paragraph. But after reading it, I felt the text was too familiar for me to get the real experience, so I set about automating the conversion of the rest of the article.

First, I used grep to extract all the unique words in the article (which also picked up foreign words, IPA characters, and other non-word contents) to a text file. Then, I opened it in a spreadsheet program and added a column that contained all of the future transformations. I included the first stage transformation for every word that needed one, and the second or third stages for a few that I could see (stage 2 involves removing duplicated consonants). I sorted by column 2 and pasted all of the words that had phase 1 modifications back into a text document and wrote a regex pattern to turn the file into something I could feed into sed. Finally, I used sed to transform the entire article into the phase 1 orthography.
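The same pipeline (collect the unique words, build a word-by-word table, then rewrite the article) could also be sketched in a few lines of Python. This is only a rough sketch under assumptions, not the grep/spreadsheet/sed commands used for this post: the names `unique_words`, `apply_phase`, and `PHASE1` are made up for illustration, and the handful of table entries are hypothetical examples that follow the phase 1 rules.

```python
import re

# A "word" here is a run of ASCII letters, optionally with one apostrophe
# part (e.g. "doesn't"). IPA characters and other symbols are skipped.
WORD_RE = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)?")


def unique_words(text):
    """Return the sorted set of lowercase words found in text."""
    return sorted({m.group(0).lower() for m in WORD_RE.finditer(text)})


def match_case(original, replacement):
    """Copy the capitalization pattern of original onto replacement."""
    if original.isupper() and len(original) > 1:
        return replacement.upper()
    if original[0].isupper():
        return replacement[0].upper() + replacement[1:]
    return replacement


def apply_phase(text, mapping):
    """Replace every word that has an entry in mapping; leave the rest alone."""
    def swap(match):
        word = match.group(0)
        new = mapping.get(word.lower())
        return match_case(word, new) if new else word
    return WORD_RE.sub(swap, text)


# Hypothetical phase 1 table: in practice this is the hand-edited
# second column of the spreadsheet, one entry per word that changes.
PHASE1 = {
    "the": "dhe",
    "alphabet": "alfabet",
    "phonetic": "fonetic",
    "of": "ov",
    "is": "iz",
    "known": "nown",
}

if __name__ == "__main__":
    sample = "The alphabet is phonetic."
    print(unique_words(sample))
    print(apply_phase(sample, PHASE1))
```

Keeping the table as plain data means a later phase would just be another dictionary applied to the output of the previous one.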

I think the only mental leaps we would need to be able to parse this easily are the knowledge that the digraph "dh" is the voiced "th" (from Northern), and that "c" is always a hard-c sound.

Other changes:

  • Soft G to J

  • "French J" / "voiced SH" to ZH (e.g. closure)

  • X to CS or CZ, depending on voicedness

  • Soft C to S

  • QU to CW or CU

  • Ph to F

  • "Of" to "Ov"

  • "Is" to "Iz"

  • Silent letters (G and K of GN, KN) omitted
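A few of these changes depend only on spelling, so they can be tried out with plain pattern rules. The sketch below is a rough illustration under assumptions, not the per-word table used for this post: `RULES` and `apply_rules` are hypothetical names, the text is lowercased for simplicity, QU always becomes CW, and the GN/KN rule is only applied at the start of a word. The soft C, soft G, X, ZH, and "dh" changes are left out because they depend on how a word is pronounced, which a spelling pattern alone can't tell you (and even "ph" goes wrong in a word like "uphill").

```python
import re

# Ordered list of (pattern, replacement) pairs covering only the
# spelling-based changes from the list above.
RULES = [
    (re.compile(r"\bof\b"), "ov"),    # "Of" to "Ov"
    (re.compile(r"\bis\b"), "iz"),    # "Is" to "Iz"
    (re.compile(r"\b[gk]n"), "n"),    # silent G/K in word-initial GN, KN
    (re.compile(r"ph"), "f"),         # Ph to F
    (re.compile(r"qu"), "cw"),        # QU to CW (the CU variant is ignored)
]


def apply_rules(text):
    """Lowercase text and apply each spelling rule in order."""
    out = text.lower()
    for pattern, replacement in RULES:
        out = pattern.sub(replacement, out)
    return out


if __name__ == "__main__":
    print(apply_rules("The gnome is quick"))
    print(apply_rules("A photograph of a knight"))
```

Rules like these are quick to write but blunt, which is one argument for reviewing every word by hand in a lookup table instead.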



( 2 comments — Leave a comment )
Mar. 11th, 2015 01:51 pm (UTC)
"the US doesn't have more engineers, doctors, teachers, lawyers, etc is because as kids, we spend too much of our time learning to read and write, memorizing and internalizing all of the pits and foibles of the English language."

This doesn't seem like a plausible explanation to me (then again, it's from The Atlantic, so what can one expect?). My wife is a math/science teacher, and she could opine on this topic for hours - she certainly resents the large blocks of time reserved for reading at the expense of her subjects.

Take a country like Germany, whose language has three genders, with an entire matrix of different corresponding definite and indefinite articles, and endings. It's absurd and time-consuming to learn - and yet the state and results of their educational system are enviable. Shit, they had TOO MANY engineers, doctors and lawyers, and had to import hordes of Turks and such to perform the service jobs.

Having said that, it's clear English could benefit from some fine-tuning. I always thought we should dispense with the letter 'c' altogether, and just use 'k' and 's'. We should jettison 'q' and 'x' as well (even though they are "cool" letters that have a certain mystique for me). And 'y' should be used much more sparingly. (fairy ----> fairie, and so forth). And the "tion" thing, I agree, should really be altered to something remotely phonetically accurate.
Mar. 12th, 2015 02:06 am (UTC)
yeah, I don't entirely buy their conclusions either, but it was a fun thought experiment!

Sometimes I think my ideal use of Y would be to turn it into the long-I sound, reserving I to represent the long-E, but then what do we do with the "yuh" sound? We could increase compatibility with Northern European languages by using J to represent that, but then what about the soft G/J sound? argh

it's why I loved learning Japanese kana, because there's just sounds, and sometimes they're voiced and you just put the voice mark on them.