Mouse over or tap the graph to highlight specific games.

The pink dot is the median.

The x axis is an estimate of lexical complexity, where 1000 is easy and 20000 is hard.

The y axis is an estimate of structural complexity, where 100 is easy and 50 is hard.

Games in the bottom half of the graph are more likely to have simple/short sentences.

Games in the left half of the graph are more likely to stick to common words.

The x axis is the number of words you need to know, taken in order from a frequency list based on VNs, to have 92.5% coverage of that VN. (Grammatical words like particles are completely ignored, as are the 20 most common uncovered words in the VN.)
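A rough sketch of how a coverage target like this can be computed, in case the description is hard to picture. This is not the actual script behind the chart. The function name, the parameters, and the reading of "uncovered" as "not on the frequency list at all" are all assumptions made for illustration.

```python
from collections import Counter

def words_needed_for_coverage(tokens, ranked_words, target=0.925,
                              skip_uncovered=20, ignored=frozenset()):
    """Count how many entries of ranked_words (most frequent first) must be
    known before they cover `target` of the tokens in one game.

    Hypothetical helper: `ignored` stands in for grammatical words such as
    particles, and `skip_uncovered` for the most common words that the
    frequency list does not contain (assumed meaning of "uncovered").
    """
    counts = Counter(t for t in tokens if t not in ignored)

    # Drop the most common words that are missing from the frequency list.
    listed = set(ranked_words)
    unlisted = Counter({w: c for w, c in counts.items() if w not in listed})
    for word, _ in unlisted.most_common(skip_uncovered):
        del counts[word]

    total = sum(counts.values())
    if total == 0:
        return 0

    covered = 0
    for rank, word in enumerate(ranked_words, start=1):
        covered += counts.get(word, 0)
        if covered / total >= target:
            return rank
    return None  # the list never reaches the target for this text
```

With a real game, `tokens` would come from a morphological analyzer and `ranked_words` from the VN frequency list. Neither is included here.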

The y axis can be set to one of three different variables.

The "Hayashi" metric is, I think, meant for school related stuff: it's used to guess at how hard textbooks and assigned reading are. See this paper's relevant section on this metric.

"custom metric a" is based on:

- the proportion of the text made up by kanji, hiragana, and katakana

- the proportion of runs in the text that are kanji, hiragana, or katakana

- the average length of runs of kanji, hiragana, and katakana, adjusted for how much of the text they take up

- the number of kanji, hiragana, and katakana per sentence

- the number of runs of a single writing system per sentence

- the average length of each sentence

"custom metric b" is based on:

- the proportion of the text made up by kanji, hiragana, or katakana

- the proportion of runs in the text that are kanji, hiragana, or katakana

- the average length of runs of kanji, hiragana, and katakana
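The run-based inputs in the lists above can be sketched roughly like this. It's a minimal illustration, not the code used for the chart: the Unicode ranges are a crude classification, the function names and dictionary keys are made up, and the per-sentence features and the "adjusted for how much of the text they take up" step from "custom metric a" are left out.

```python
from itertools import groupby

def script_of(ch):
    """Crude classification of one character by Unicode range."""
    cp = ord(ch)
    if 0x3041 <= cp <= 0x309F:
        return "hiragana"
    if 0x30A0 <= cp <= 0x30FF:  # note: also catches marks like the nakaguro
        return "katakana"
    if 0x4E00 <= cp <= 0x9FFF or cp == 0x3005:  # common kanji, plus 々
        return "kanji"
    return "other"

def script_features(text):
    """Per-script share of characters, share of runs, and mean run length."""
    # A run is a maximal stretch of consecutive characters in one script.
    runs = [(script, len(list(group)))
            for script, group in groupby(text, key=script_of)]
    runs = [(s, n) for s, n in runs if s != "other"]

    total_chars = sum(n for _, n in runs)
    total_runs = len(runs)

    features = {}
    for script in ("kanji", "hiragana", "katakana"):
        lengths = [n for s, n in runs if s == script]
        features[script] = {
            "char_share": sum(lengths) / total_chars if total_chars else 0.0,
            "run_share": len(lengths) / total_runs if total_runs else 0.0,
            "mean_run_length": sum(lengths) / len(lengths) if lengths else 0.0,
        }
    return features
```

For example, 私はゲームを買った。 splits into six runs (kanji, hiragana, katakana, hiragana, kanji, hiragana), so katakana is one run of three characters and hiragana makes up half of the runs.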

The custom metrics are fitted to match the x axis using multiple regression.
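Multiple regression here means finding one weight per input so that a weighted sum of the inputs lands as close as possible to the x axis value for each game. A minimal sketch using ordinary least squares in plain Python follows. The function names are hypothetical, and whether the real fit uses plain least squares or some variant is an assumption.

```python
def fit_linear(feature_rows, targets):
    """Ordinary least squares. Returns [intercept, w1, w2, ...]."""
    rows = [[1.0] + list(f) for f in feature_rows]  # constant column first
    k = len(rows[0])

    # Normal equations (X^T X) w = X^T y, stored as an augmented matrix.
    aug = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * y for r, y in zip(rows, targets))]
        for i in range(k)
    ]

    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("features are linearly dependent")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(k):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]

    return [aug[i][k] / aug[i][i] for i in range(k)]

def predict(weights, feature_row):
    """Apply fitted weights to one game's features."""
    return weights[0] + sum(w * x for w, x in zip(weights[1:], feature_row))
```

In this setup each row of `feature_rows` would hold one game's script statistics and `targets` would hold that game's x axis value. The normal-equation approach is fine for a handful of inputs, but a library routine would be more numerically robust for anything larger.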

"custom metric a" is shown by default.

The x axis is not an objective measure of lexical complexity. Its quality depends entirely on the frequency list being used, but a frequency list based on visual novels makes the most sense for these stats.

The y axis is not an objective measure of structural complexity. It's a measure of the relative structural complexity of games with similar lexical complexities.

The general error range is something like +/- 25% of the size of the chart for "custom metric a", +/- 30% for "custom metric b", and way too much for Hayashi.

I picked 92.5% because 90% was low enough to be slightly unstable and 95% started to show the weakness of "coverage target" as a metric.

Tsuushinbo is a loli game.

donate scripts pls (they need to be well formatted or be raw data)

ambiguous abbreviations explained here

Compare with this old version to get a feeling for how much the stats change when issues caused by analysis are resolved.