Mouse over or tap the graph to highlight specific games.
The x axis is an estimate of lexical complexity, where 1000 is easy and 20000 is hard.
The y axis is an estimate of structural complexity, where 100 is easy and 50 is hard.
Games in the bottom half of the graph are more likely to have simple/short sentences.
Games in the left half of the graph are more likely to stick to common words.
The x axis is the number of words you need to know, counted from the top
of a frequency list based on VNs, to have 92.5% coverage of a given VN.
(Grammatical words like particles are completely ignored.)
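As a rough sketch of how a coverage target like this can be computed: walk down the frequency list and stop as soon as the words seen so far account for 92.5% of the game's word tokens. The function name, the `ignored` set, and the assumption that the text has already been split into words are all made up for illustration; this is not the code behind the chart.

```python
from collections import Counter

def words_needed_for_coverage(tokens, frequency_list, target=0.925, ignored=frozenset()):
    """Return how many top entries of frequency_list are needed to cover
    `target` of the tokens, or None if the list never gets there.

    tokens         -- the game's text, already split into words (assumed)
    frequency_list -- unique words, most frequent first
    ignored        -- grammatical words (e.g. particles) that are skipped
                      both in the text and in the frequency list
    """
    counts = Counter(t for t in tokens if t not in ignored)
    total = sum(counts.values())
    if total == 0:
        return 0

    ranked = [w for w in frequency_list if w not in ignored]
    covered = 0
    for rank, word in enumerate(ranked, start=1):
        covered += counts.get(word, 0)
        if covered >= target * total:
            return rank
    return None  # the frequency list is too short to reach the target


# Toy usage with made-up data.
tokens = ["猫", "猫", "猫", "犬", "は", "鳥"]
frequency_list = ["は", "猫", "犬", "鳥"]
print(words_needed_for_coverage(tokens, frequency_list, ignored={"は"}))
```

Rare words only show up in the count if the target is high enough to need them, which is why the choice of target matters so much.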
The y axis can be set to one of three different variables.
The "Hayashi" metric is, I think, meant for school-related material: it is
used to estimate how hard textbooks and assigned reading are. See this paper's relevant section on this metric.
"custom metric a" is based on:
- the proportion of the text made up by kanji, hiragana, and katakana
- the proportion of runs in the text that are kanji, hiragana, or katakana
- the average length of runs of kanji, hiragana, and katakana, adjusted for how much of the text they take up
- the number of kanji, hiragana, and katakana per sentence
- the number of runs of a single writing system per sentence
- the average length of each sentence
"custom metric b" is based on:
- the proportion of the text made up by kanji, hiragana, or katakana
- the proportion of runs in the text that are kanji, hiragana, or katakana
- the average length of runs of kanji, hiragana, and katakana
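To make "runs" concrete, here is a minimal sketch of the three features the two custom metrics share: the share of characters, the share of runs, and the average run length for each writing system. The code-point ranges, the function names, and the choice to leave punctuation and other characters out of the totals are my assumptions, not necessarily what the real metrics do.

```python
from itertools import groupby

def script_of(ch):
    """Classify one character by writing system (ranges are an assumption)."""
    cp = ord(ch)
    if 0x3041 <= cp <= 0x309F:
        return "hiragana"
    if 0x30A0 <= cp <= 0x30FF:
        return "katakana"
    if 0x4E00 <= cp <= 0x9FFF or ch == "々":
        return "kanji"
    return None  # punctuation, Latin letters, digits, whitespace, ...

def run_features(text):
    """Per-script character share, run share, and average run length."""
    # A run is a maximal stretch of consecutive characters in one script.
    # Unclassified characters break runs but are not counted themselves.
    runs = [
        (script, len(list(group)))
        for script, group in groupby(text, key=script_of)
        if script is not None
    ]
    total_chars = sum(length for _, length in runs)
    total_runs = len(runs)

    features = {}
    for script in ("kanji", "hiragana", "katakana"):
        lengths = [length for s, length in runs if s == script]
        features[script] = {
            "char_share": sum(lengths) / total_chars if total_chars else 0.0,
            "run_share": len(lengths) / total_runs if total_runs else 0.0,
            "avg_run_length": sum(lengths) / len(lengths) if lengths else 0.0,
        }
    return features


# "私はゲームが好き。" splits into six runs: 私 / は / ゲーム / が / 好 / き
for script, values in run_features("私はゲームが好き。").items():
    print(script, values)
```

The per-sentence features in "custom metric a" would need a sentence splitter on top of this, which is left out here.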
The custom metrics are fitted to match the x axis using multiple regression.
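For anyone unfamiliar with multiple regression, the sketch below shows the general idea: find one weight per feature (plus an intercept) so that the weighted sum of a game's features comes as close as possible to its x-axis value. It is plain ordinary least squares solved through the normal equations; the function names and the toy numbers are invented for illustration, and the real fit may well use a library and different features.

```python
def fit_multiple_regression(rows, targets):
    """Ordinary least squares. Returns [intercept, w1, ..., wk].

    rows    -- one list of feature values per game
    targets -- the value to match for each game (here: its x-axis value)
    """
    X = [[1.0] + [float(v) for v in row] for row in rows]  # 1.0 = intercept column
    k = len(X[0])

    # Normal equations (X^T X) w = X^T y, stored as an augmented matrix.
    A = [
        [sum(x[i] * x[j] for x in X) for j in range(k)]
        + [sum(x[i] * y for x, y in zip(X, targets))]
        for i in range(k)
    ]

    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("features are collinear; cannot fit")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]

    return [A[i][k] / A[i][i] for i in range(k)]

def predict(weights, row):
    """Apply fitted weights to one game's feature values."""
    return weights[0] + sum(w * v for w, v in zip(weights[1:], row))


# Toy data, not real games: two features per game and a target value each.
rows = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 3)]
targets = [1000, 3000, 1500, 3500, 6500]
weights = fit_multiple_regression(rows, targets)
print(weights)
print(predict(weights, (1, 2)))
```

Because the fit targets the x axis, the resulting y value says how a game's structure compares with what its vocabulary would predict, not how hard it is in absolute terms.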
"custom metric a" is shown by default.
The x axis is not an objective measure of lexical complexity.
Its quality depends entirely on the frequency list being used, but a
frequency list based on visual novels makes the most sense for these
stats.
The y axis is not an objective measure of structural
complexity. It's a measure of the relative structural complexity of
games with similar lexical complexities.
The general error range is something like +/- 25% of the size
of the chart for "custom metric a", +/- 30% for "custom metric b", and
way too much for Hayashi.
I picked 92.5% because 90% was low enough to be slightly
unstable and 95% started to show the weakness of "coverage target" as a
metric.