Date: 2019-03-15

I just added something very cursed to my mecab clone.

So MeCab has to store a 2D matrix of connection costs between token types (it finds the lowest-cost path using token costs and connection costs, and token types are how it looks up connection costs), but the connection matrix for the only truly good MeCab dictionary is now 800MB. Loading all of that into memory is not gonna fly on a VPS with 1GB of RAM.
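For reference, here's a minimal sketch of the naive approach this post is trying to get away from: the whole matrix as one flat vector. It assumes i16 costs stored row-major, indexed by the left token's type ID and then the right token's (MeCab's own binary layout may order the axes differently). `DenseMatrix` is a hypothetical name, not code from the clone. The cost data takes left size × right size × 2 bytes, which is why it balloons.

```rust
/// Hypothetical dense connection-cost matrix: one flat, row-major Vec<i16>.
/// Rows are the left token's type ID, columns the right token's type ID.
pub struct DenseMatrix {
    right_size: usize,
    costs: Vec<i16>,
}

impl DenseMatrix {
    pub fn new(left_size: usize, right_size: usize, costs: Vec<i16>) -> Self {
        assert_eq!(costs.len(), left_size * right_size, "cost count must match dimensions");
        DenseMatrix { right_size, costs }
    }

    /// Cost of a `left` token type followed by a `right` token type.
    pub fn cost(&self, left: usize, right: usize) -> i16 {
        self.costs[left * self.right_size + right]
    }

    /// Bytes used by the cost data alone (2 bytes per cell).
    pub fn data_bytes(&self) -> usize {
        self.costs.len() * std::mem::size_of::<i16>()
    }
}
```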

So at first I just made a hashmap of matrix edge pairs to connection costs that falls back to reading directly from disk when a pair isn't in the hashmap. That worked fine, and it took up only about 5~10% as much memory, since only 1~2% of all possible connection types ever actually get used (apparently hashmaps in Rust have a lot of memory overhead). It was sorta fast, but still noticeably slower.
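A rough sketch of the hashmap-plus-disk-fallback idea, assuming a hypothetical on-disk layout of row-major little-endian i16 costs starting at some byte offset. `CachedMatrix`, `preload`, and `read_cost` are illustrative names, and which pairs get preloaded is left to the caller; this isn't the clone's actual code. The source is generic over `Read + Seek`, so a `File` or an in-memory `Cursor` both work.

```rust
use std::collections::HashMap;
use std::io::{self, Read, Seek, SeekFrom};

/// Read one cost straight from the matrix file.
/// Hypothetical layout: row-major little-endian i16 values starting at `data_offset`.
fn read_cost<R: Read + Seek>(src: &mut R, data_offset: u64, right_size: u16, left: u16, right: u16) -> io::Result<i16> {
    let index = left as u64 * right_size as u64 + right as u64;
    src.seek(SeekFrom::Start(data_offset + index * 2))?;
    let mut buf = [0u8; 2];
    src.read_exact(&mut buf)?;
    Ok(i16::from_le_bytes(buf))
}

/// Known pairs live in a HashMap; every other pair is read from disk on demand.
pub struct CachedMatrix<R: Read + Seek> {
    src: R,
    data_offset: u64,
    right_size: u16,
    cache: HashMap<(u16, u16), i16>,
}

impl<R: Read + Seek> CachedMatrix<R> {
    pub fn new(src: R, data_offset: u64, right_size: u16) -> Self {
        CachedMatrix { src, data_offset, right_size, cache: HashMap::new() }
    }

    /// Load the (left, right) pairs expected to be used into memory up front.
    pub fn preload(&mut self, pairs: &[(u16, u16)]) -> io::Result<()> {
        for &(l, r) in pairs {
            let c = read_cost(&mut self.src, self.data_offset, self.right_size, l, r)?;
            self.cache.insert((l, r), c);
        }
        Ok(())
    }

    pub fn cost(&mut self, left: u16, right: u16) -> io::Result<i16> {
        if let Some(&c) = self.cache.get(&(left, right)) {
            return Ok(c);
        }
        // Miss: one seek plus a two-byte read per lookup.
        read_cost(&mut self.src, self.data_offset, self.right_size, left, right)
    }

    pub fn cached_len(&self) -> usize {
        self.cache.len()
    }
}
```

The slowdown lives on the miss path, where every pair not in the map costs a seek plus a tiny read. Each map entry also carries its key and hash table bookkeeping alongside the 2-byte cost, which is where the overhead comes from.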

So I manually took lists of the 2000 most common edge locations on both axes of the matrix and loaded the corresponding 2000x2000 matrix into memory first, with lookup tables for which edges exist in it and where. Oh my god, it's so horrible, but it's ALMOST as fast as just using an 800MB vector in memory, and it uses less memory than anything except slowly reading from disk every single time a connection weight is looked up (which happens literally tens of billions of times).
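And a sketch of the cursed version, assuming a hypothetical on-disk layout of row-major little-endian i16 costs. The hot ID lists are passed in as parameters (the post used the 2000 most common IDs on each axis); `HotMatrix` and `NOT_HOT` are made-up names. Each full type ID maps through a lookup table to a row or column of the in-memory block, and any pair outside the block falls back to disk.

```rust
use std::io::{self, Read, Seek, SeekFrom};

/// Lookup-table value for a type ID with no row/column in the hot block.
const NOT_HOT: u16 = u16::MAX;

/// Hypothetical layout: row-major little-endian i16 costs starting at `data_offset`.
fn read_cost<R: Read + Seek>(src: &mut R, data_offset: u64, right_size: u16, left: u16, right: u16) -> io::Result<i16> {
    let index = left as u64 * right_size as u64 + right as u64;
    src.seek(SeekFrom::Start(data_offset + index * 2))?;
    let mut buf = [0u8; 2];
    src.read_exact(&mut buf)?;
    Ok(i16::from_le_bytes(buf))
}

pub struct HotMatrix<R: Read + Seek> {
    src: R,
    data_offset: u64,
    right_size: u16,
    left_slot: Vec<u16>,  // full left ID -> row in `hot`, or NOT_HOT
    right_slot: Vec<u16>, // full right ID -> column in `hot`, or NOT_HOT
    hot_cols: usize,
    hot: Vec<i16>,        // hot_left.len() x hot_right.len(), row-major
}

impl<R: Read + Seek> HotMatrix<R> {
    pub fn new(mut src: R, data_offset: u64, left_size: u16, right_size: u16,
               hot_left: &[u16], hot_right: &[u16]) -> io::Result<Self> {
        assert!(hot_left.len() < NOT_HOT as usize && hot_right.len() < NOT_HOT as usize);
        let mut left_slot = vec![NOT_HOT; left_size as usize];
        for (i, &l) in hot_left.iter().enumerate() {
            left_slot[l as usize] = i as u16;
        }
        let mut right_slot = vec![NOT_HOT; right_size as usize];
        for (i, &r) in hot_right.iter().enumerate() {
            right_slot[r as usize] = i as u16;
        }
        // Copy the hot_left x hot_right sub-matrix into memory.
        let mut hot = Vec::with_capacity(hot_left.len() * hot_right.len());
        for &l in hot_left {
            for &r in hot_right {
                hot.push(read_cost(&mut src, data_offset, right_size, l, r)?);
            }
        }
        Ok(HotMatrix { src, data_offset, right_size, left_slot, right_slot, hot_cols: hot_right.len(), hot })
    }

    /// True if (left, right) is served from the in-memory block.
    pub fn is_hot(&self, left: u16, right: u16) -> bool {
        self.left_slot[left as usize] != NOT_HOT && self.right_slot[right as usize] != NOT_HOT
    }

    pub fn cost(&mut self, left: u16, right: u16) -> io::Result<i16> {
        let li = self.left_slot[left as usize];
        let ri = self.right_slot[right as usize];
        if li != NOT_HOT && ri != NOT_HOT {
            // Fast path: two table lookups and one array index.
            return Ok(self.hot[li as usize * self.hot_cols + ri as usize]);
        }
        // Slow path: the pair isn't in the hot block, so read it from disk.
        read_cost(&mut self.src, self.data_offset, self.right_size, left, right)
    }
}
```

Building the block this way does one seek per cell, which is fine for a sketch; reading contiguous runs would cut that down. A 2000x2000 block of i16 costs is about 8MB, plus the two lookup tables.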

Incredibly cursed.

Text Editors Suck

Date: 2018-03-05

Here's a list of editors I've used and why they're bad. This list includes every single widely-used text editor that's actively maintained, available on Windows, not based on "web technology" (Atom, VS Code, etc.), and finally, not Emacs.

Why are text editors, of all things, so incredibly bad? It makes no sense. Seriously.

  1. Notepad++ - Breaks horribly with large files. Clunky interface. Highlighter is broken.
  2. Geany - Breaks when using regex search/replace on large files.
  3. Kate - Buggy regex search/replace integration (e.g. replacing newlines). Does not support autodetection of indentation tabbing style at all.
  4. Sublime Text - Slow with large files, but doesn't break like Geany and Notepad++ do. Does not support actually disabling smart indentation (after opening a block or a condition) without disabling automatic indentation entirely.
  5. Vi/Vim - Modal. Does not use common/standard keystrokes (ctrl+c for copying, ctrl+arrows to navigate over words, etc). Bad mouse support.
  6. Akelpad - Not usable for extensive programming. Broken Unicode support. Allows saving files with illegal filenames, corrupting their representation in the filesystem.
  7. Visual Studio - It's an IDE, not a text editor. I have not used it enough to unearth its other problems, aside from being very clunky.

Note: aside from Akelpad, I'm excluding "notepad clones" not meant for extensive programming.

On a side note, most text editors get artificial emboldening completely wrong, widening each character and breaking monospace fonts. This prevents me from using bold in syntax highlighting in most editors.

Just writing this made my blood boil. How is it possible for text editors to be so bad? I need to go do something else.

This post was written in Kate.


Date: 2017-10-10

Woah is not a misspelling. Woah is an alternative spelling. What makes it an alternative spelling, not a misspelling, is the fact that people do not write woah because they don't know how to spell whoa. People write woah because it's how they spell the word.

When you correct people that use "woah", you're not being helpful in any way. You're wasting your time, forcing them to read nonsense, and overall being a language nazi that doesn't even know what the rules of the language are. You didn't post a valid correction. You posted pseudo-intellectual garbage. Your correction is not correct. You are hostile. Stop posting.

You can't argue against new spelling forms on the grounds that the old ones are fine. English spelling is a fucking disaster. If people want to start using a more phonetic spelling, fucking let them.

Translation Is Literally Impossible

Date: 2017-10-05

Translation is literally impossible. No matter what you do, you can't translate from one language into another. All you can do is give a general approximation, and as a translator, your job is to get as close as you can.

Experiential similarity

It's said that the baseline for translation is for readers to receive "the same experience as readers of the original, as if the original creators had written it natively in both languages". It's easy to argue that this isn't possible, no matter what, because a translation will never have exactly the same patterns of words as the original work. But if you give up on exactness, and treat this baseline as an ideal instead of a target, you can start using it.

It's important to think about translation this way. Translation isn't the activity of replacing words, sentences, or conversations in one language with ones in another language. It's about replicating the original work, for people who can't consume the "original" untranslated work.

By changing it at all, it's no longer exactly the same. But consider this. Everyone experiences media differently in the first place. A native English speaker will see a song with Engrish in it much differently than a native Japanese speaker would. Someone who grew up as a devout Christian will see Christian symbolism much differently than someone from an agnostic hippie family.

When you adapt a work to a new language, the target audience is inherently different, and would have taken different things out of the work even if it was originally meant for them.

In a sense, you just have to make the differences caused by translation smaller than the differences between individual readers.

What constitutes the experience of consuming a work? There are countless elements, but it's easy to list off a bunch of important ones where translation will have an obvious impact. I'm going to start off with ideas that don't necessarily have to do with translation, then show how they make translation "impossible".

It was merely an act

Statements are actions, not a jumble of words. Phrases that are used in specific situations, like "you're welcome" and "thank you", are the most obvious example. It would be asinine to translate "thank you" into a phrase that isn't commonly used to express gratitude, no matter how literal such a translation might be.

But this extends beyond common phrases. You can't draw a line somewhere and say, "stuff like this has to be translated based on the action it's doing, everything else has to be translated by literal meaning".

There's no such thing as literal meaning. There are a million ways to interpret the intent and nuance behind a given statement. Skilled authors are capable of manipulating the reader into the right state of mind to interpret everything correctly, but if the reader is too far from the work's target demographic, they're not going to get it.

The purpose of communication is to convey information in a particular way. This depends on the assumed knowledge of the person receiving the communication. By making a statement, you are performing the action of conveying information in a particular way to someone with a particular background knowledge. Statements are always actions, and the typical word-by-word translation does not perform the same action as the original statement.

You can think about this as "linguistic" (having to do with language itself) vs "non-linguistic" information. A statement is encoded with linguistic information, but what's really being communicated isn't linguistic information. The communication itself is non-linguistic information that cares about mood, expectation, knowledge, and so on.

Literally translating

Right here, I'd like to kill some old concepts and introduce new ones.

"Literal" translations are ones where words or phrases are replaced with equivalents in another language, and anything that can't be translated is adapted in a way that makes sense. "Liberal" translations are ones where words, phrases, and sometimes even entire interactions are changed so that they have the least friction with a given audience. You might think about this like translation versus localization, or something like that. But that's all wrong, because this distinction is assuming that meaning and linguistic information are the same thing. Here's a better idea.

There are "spartan" translations, that focus on representing the original linguistic information, so named because of their indifference to comfort. And there are "astute" translations, that focus on representing the original non-linguistic information, so named because of their focus on sense. Spartan translations respect "wording", and violate messages. Astute translations respect "senses", and violate words.

The typical spartan translation from Japanese to English might use the phrase "it can't be helped" a few times. This isn't a literal translation. The phrase it's translating, しかたがない or しょうがない or similar, literally means "there is no method" or "there is no way of doing". Let's call this Phrase X. There's nothing about helping here. But this is a stock phrase, with a meaning about giving up on something, and "it can't be helped" is the same way. While this isn't a literal translation, it's definitely a spartan one.

Most of the places where Phrase X is used are places where no native English speaker would use "it can't be helped". Maybe an English speaker would use "oh well" instead. That's radically different than "it can't be helped". This is what makes it spartan.

You have to choose what to retain. Which action do you want your translation to keep? The action of using one of the most common stock phrases in the language? Or the action conveyed right then and there by using that phrase in that situation? Either choice is valid on some level, but it's a strict choice. You can't have your cake and eat it too.

Choosing "it can't be helped" introduces its own new meaning. It's no more "literal" than using a variant phrase like "oh well". This is a mirror image of what happens when you use "oh well". Choosing "oh well" means that you no longer make a statement about "something not being there, or not being possible". You have a pure speech act.

Phrases themselves are information, and that's true from both perspectives. A spartan translator wants to retain the identity of each phrase, because that phrase itself is part of the experience of the original work. An astute translator wants to retain the identity of each sentiment, because a given phrase will express a unique sentiment in different situations. It's rare that you have a perfect mapping where both languages use a unique phrase, of their own, in the same situations and nowhere else.

Beyond phrases

You might be thinking, of course. Different cultures act different ways. It's natural that you have to choose between adapting culture or representing it. Buddy, that's only the tip of the iceberg. This ship is going down.

The moment you start studying grammar or linguistics at a reasonable level, you start to wonder what kinds of similarities there are between languages. After all, the more similar two languages are, the easier it seems it should be to translate between them. It turns out that there are a lot. Languages are so similar, in fact, that someone who already knows a language can get stranded in any other society in the world -- absolutely any -- and become conversationally fluent in the language spoken there in a matter of months. They're going to miss a lot of nuance because they simply have not had the time for a lot of the cultural aspects of communication to sink in, but the linguistic parts adapt very quickly, much faster than learning a first language the first time.

Part of the explanation for this is an idea called "Universal Grammar". Like any important-sounding word that doesn't describe a specific theory, there are several ways to think about Universal Grammar, or UG. The hard version of UG, or the hard stance on it, is that all languages use exactly the same building blocks, and all humans have those building blocks in their head somewhere. The soft version is that every human has the same linguistic tendencies, and languages are never going to deviate from those tendencies too much.

UG is applicable to how translation should be done because, at some level, it gives you a list of things that are definitely linguistic details. But it gets more interesting than that. If something is universal, it means that a given language *must* have a way to deal with it. Every language has a way of dealing with certain things. Time. Embedded descriptions. Parts of speech. Topic versus comment. New and old information. Attitude. Hypotheticals. However, each language might deal with these things in radically different ways, so different that you don't even realize that they're the same thing.

Beyond actions

The ways that Japanese and English deal with attitude are so radically different that it's hard to realize that you're losing information when you eliminate them during translation. When you translate いいんだ into something like "it's fine" or "it's alright", you lose information. Information that both languages can express. After all, that んだ means something radically different than what よ or です mean, but having those instead could translate into "it's fine" or "it's alright" just as easily.

In this panel, the んだよ in the box on the left is a clear followup on the attitude introduced by いいじゃないか in the box on the right. If you were translating this entire panel into English, establishing a similar connection would be trivial. But translating the presence of the んだよ itself, as a literal entity with the right meaning, is literally impossible.

This isn't a pathological case. The typical Japanese statement is, in some way, impossible to translate literally into English, when it comes to its attitude. This isn't limited to attitude, either. Anywhere where the two languages use sufficiently different ways of expressing things that have to be there, it's impossible to create a literal rendition of the original Japanese. Likewise, it's impossible to create a literal rendition of the original sentiment, because parts of that sentiment have tiny, minuscule amounts of nuance that are deeply tied to parts of Japanese grammar that cannot be adapted into English no matter how hard anyone tries.

This is without considering any differences in culture or available vocabulary, like honorifics, polite speech, and differences in social structure. This is only considering expression in the general sense, with things that are universal to all languages, regardless of culture.

Literal translation is impossible. Translation is literally impossible.

If you've been especially mindful while reading this rant, you might have noticed a lot of places where I suddenly stop explaining something to make a strong declarative statement. This is an example of statements as actions.

What is reading?

Date: 2017-07-23

This isn't a rhetorical question. To language learners, there are all sorts of reactions they can have when they see the word "reading". This is especially true in the Japanese self-teaching "scene", where "reading" is basically at the center of a memetic war.

They might get flashbacks to dreadfully boring "See Spot. See Spot jump." style grammar exercises. They might think about comics. They might think about literature. They might think about text-heavy games. They might or might not think about dictionaries.

They might think about dictionaries that add glossing to everything and don't need any interaction. They might think about dictionaries that tell you what something means when you mouse over or click a word. They might think about dictionaries that you hold in your hands, that you have to flip around to the right page, if you know where it is, to look up a word. They might think of grammar pattern dictionaries.

They might associate dictionaries with a cruel form of self-torture where the reader gets anxious about how well they understand what they're reading, and look up every single word and grammar pattern.

They might not think about dictionaries at all, envisioning reading as something that comes naturally because they already know the language.

They might not think about dictionaries at all, but this time, because they feel like they don't need one, even though they don't know all the words.

Way back in middle school, I was reading a book while eating lunch. I was eating a grinder and chipped a tooth on a grain of sand in it. The chip is there to this day. I branted about it to a teacher and they just ignored me. It ketered me off, but I didn't say anything. When I got home I fashed my parents what happened, but don't remember what they said. But the, still I remember the bronty book I was reading. Just not title.

Did your dictionary fail you there?

I'm autistic. This isn't a sob story, I'm just setting up to explain something. Autism is a neurodevelopmental disorder, and can have any number of bizarre effects on mental development.

My internal representation of English is slightly different from everyone else's. I find it easier to read impenetrable technical articles, assuming the grammar in them isn't horrifyingly invalid, than to read magazine articles or young adult fiction. This is partly because I'm used to them, but it's mostly because technical English happens to avoid certain "normal written English" constructions that don't mesh with me. But I have a very high level of "linguistic competence", and I have no problem communicating in "real life".

So why, then, did I love reading old scifi mags littered with defunct grammatical nuances and wording preferences? Why did I read so much western isekai fiction as a kid? Because it was fun. And because you don't have to know every construction or word in a message to understand it. The human brain is black magic. If you give it enough context, it can figure out anything.

That's the science behind how children learn their first language. And it's the science behind how adults learn second languages too. The brain doesn't "turn off" that black magic. It's too powerful, and allows us to learn all sorts of skills very easily. The worst it gets is when you try to keep it from happening, because you're thinking it's not supposed to. You just need enough context. The necessary level of context is different for everyone, just look at me, but it's a lot lower than people think it is.

So I didn't have any trouble understanding what I was reading. And it turns out, reading is exactly what you've gotta do if you have a problem like mine. I had so many problems communicating as a child that it built up into traumatic stress. It wasn't until I'd been reading for fun for ten years that I could really truly honestly without a doubt understand how to have a conversation.

But even when I started, I knew far more English than any foreign student studying English in a sterile environment could ever learn in three years. More words, more accurate grammar, everything. It's just, this disorder means that I have slightly different constraints for what's valid. I accept the phrase "the work the guy the boss fired finished is bad" with no problems whatsoever, but apparently, it violates several unspeakable grammatical rules. I also accept the phrase "What he said is is 'is he okay', I think", at least if you know which words are supposed to be stressed (hint: the second 'is' is deeply unstressed).

Neurotypical human language is full of unspeakable constraints that nobody can ever teach you directly. The only reason I ever realized phrases like the above were invalid is because I spent so much time reading that I subconsciously realized that there are no widely accepted patterns that follow the same rules as them. The moment I realized this, I delved headfirst into linguistics, and came out the other end drenched in eldritch horrors.

There's no getting around this. If you're normal, you'll never acquire impossible grammar. At worst you'll suspect that the impossible grammar is real, but your brain will refuse to learn it by heart. And if you're not normal, no textbook is gonna keep you away from impossible grammar. It won't cover it, and even if it somehow does, you won't be able to understand what it's teaching, because it's too complicated. So there's no risk in reading; you don't have to worry about learning fake Japanese or German or whatever you're trying to learn.

Just what, exactly, was reading like for me, then?

Basically, until I hit halfway through high school, I was in a perpetual state of learning the differences between my autistic distorted English and real English. Whenever I run into English that contains rules that are forbidden to me, I have to translate it.

I went through the pains of language learning before. Why can't I go through them again?

All the resources are crap. The textbooks focus on the kind of skillbuilding that lets you take tests beyond your natural skill level, so they don't take your level of fluency as far as the material they cover. An intermediate textbook leaves you somewhere that you still can't read the most basic stories without parsing out sentences, and the moment you start parsing out sentences, that's when you lose.

That's because learning a second serving of English wasn't a matter of torturing myself with dictionaries and grammar books. They can't even come close to the things I had wrong. The things I had wrong are only documented in those impenetrable linguistics essays that distort the mind and leave you ridden with otherworldly biological bits. And even then, the only reason I understood them is because I went through realizing these rules firsthand, and was uniquely conscious of the fact they're supposed to exist, because they don't for me.

But that doesn't even come into the picture here. It's right outside the picture frame. Realizing my version of English was "wrong" at a fundamental human level doesn't have to do with reading. It has to do with learning. The realization happened after I read a lot. And the only reason I read a lot is because I thought it was fun. I wasn't trying to learn anything. I wasn't trying to fix my internal version of English, because I didn't know it was broken.

Reading for fun, not because I thought of it as a way to expose myself to words and grammar, is what brought me there. Japanese is the same way. You just know that it's a foreign language, and that knowledge keeps you from treating it the same way. You've gotta let go.

No matter what words I didn't know because they weren't part of my dialect, no matter what grammar I misinterpreted because I lacked the constraints that forced the normal interpretation, no matter what phrases I couldn't understand because I had constraints against them that shouldn't have been there, I just read because it was fun. Some of it was webcomics. Some of it was short stories. Some of it was text heavy games.

Some of it was overly literal translations of Japanese media written by people with questionable skill in both languages, and let me tell you, those were way easier to read (whenever they didn't just plain make grammatical mistakes) than most natural English, because translationese basically cuts out all the delicate parts of language and leaves you with nothing but a pile of stock phrases with very simple interrelationships.

I read because it was fun. And I read all sorts of stuff. And you can, too, if you like the story. It doesn't matter that you don't know half the words as long as you can think of what they're supposed to mean, just like that grammatically broken yarn up there right before "I'm autistic". Same with the grammar, which is even easier to learn because you quite literally do not even have to try, at least past the very basics.

My experience learning Japanese is the same. I could only stand to memorize around a thousand words with flashcards. Even while I was doing that, I ended up learning a thousand more by accident, just through sheer exposure, because I kept trying to read things I wanted to read, and merely trying was enough. I dropped the beginner's grammar guide I was reading before it even got to わけ and ばかり, and I still learned a fairly advanced amount of grammar just fine.

It doesn't matter exactly what words you do and don't know as long as you know enough of them to understand the story. If you're having fun, that puts your brain into the mode where it can learn anything, even the most unspeakable of horrors, just by seeing it. The longer you stay in that mode, the more you learn, and the more your conditioning gets used to learning that way.

A lot of people just don't realize how well it works, because they've been almost brainwashed by compulsory education to think about everything as note-taking, skill-building, and exams. That kind of learning just doesn't work that well here. You gotta start reading.

And it'll be a sad day the next time someone quits learning Japanese after spending five months memorizing vocabulary and studying grammar textbooks because they finally start reading and realize that they have literally no idea what anything is saying.

Shirokuma Bell Stars hacking tools

Date: 2017-07-13

I just wanted to play the game, but Microsoft changed something in Windows 7 and it crashes now. At the end of a week's worth of debugging and REing, I ended up with a full set of hacking tools, which can even be used for translation, not just commenting out the backgrounds that are so big they break Win7 DirectDraw.