Compare to this other graph with a different y-axis variable to get a feel for what kind of accuracy you get here. Raw numbers here.

[Figure: how much the stats change when adding a single anime]


The x axis is an estimate of lexical complexity, where 1000 is easy and 20000 is hard.

The y axis is an estimate of structural complexity, where 100 is easy and 50 is hard.

Works in the bottom half of the graph are more likely to have simple sentence structures.

Works in the left half of the graph are more likely to stick to common words.

The x axis is the number of words you need to know, taken in order from a frequency list built from these works, to reach 92.5% coverage of a given work. Grammatical words such as particles are ignored entirely.
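A minimal Python sketch of that coverage count, assuming the work has already been tokenized into words with particles and other grammatical words removed, and that the frequency list is simply a list of words ordered from most to least common. The function name `words_for_coverage` and the tokenization step are hypothetical; the page does not say how its tokenizer or frequency list are built.

```python
from collections import Counter


def words_for_coverage(work_tokens, frequency_list, target=0.925):
    """Return how many entries of frequency_list (most common first) a reader
    must know so that the known words cover `target` of the work's tokens.

    Assumption: work_tokens already excludes grammatical words (particles etc.).
    Returns None if the target cannot be reached with this frequency list.
    """
    counts = Counter(work_tokens)
    total = sum(counts.values())
    if total == 0:
        return 0
    covered = 0
    for rank, word in enumerate(frequency_list, start=1):
        # Every list entry counts toward the rank, even if it never appears in
        # this work: the reader is assumed to learn the list in order.
        covered += counts.get(word, 0)
        if covered / total >= target:
            return rank
    return None
```

Because the list comes from the whole corpus rather than from the work itself, common words that a particular work never uses still push its number up, which is why the result depends so heavily on the frequency list chosen.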

The y axis is an estimate of structural complexity based on:

- the proportion of the text made up by kanji, hiragana, or katakana

- the proportion of runs in the text that are kanji, hiragana, or katakana

- the average length of runs of kanji, hiragana, and katakana

That is nine variables in total (three measures for each of the three scripts), fitted to match the 92.5% statistic as closely as possible, with 0 mapped to 100 and 18000 mapped to 50.
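The nine inputs and the final rescaling can be sketched in Python as follows. Several details here are assumptions rather than anything stated on this page: the Unicode ranges used to classify characters, treating punctuation and Latin text as an "other" class that counts toward the totals, treating whitespace as a run breaker that is otherwise ignored, and using a linear model with hypothetical `weights` and `bias`. The page only says the variables were fitted to the 92.5% statistic; the model form and fitting method are not given. The mapping from 0 → 100 and 18000 → 50 is assumed to be linear.

```python
import itertools

SCRIPTS = ("kanji", "hiragana", "katakana")


def script_of(ch):
    # Assumed Unicode ranges; the page does not say how characters are classified.
    cp = ord(ch)
    if 0x3040 <= cp <= 0x309F:
        return "hiragana"
    if 0x30A0 <= cp <= 0x30FF:
        return "katakana"
    if 0x4E00 <= cp <= 0x9FFF or 0x3400 <= cp <= 0x4DBF or ch == "々":
        return "kanji"
    return "other"


def structural_features(text):
    """For each script: share of characters, share of runs, average run length.

    Whitespace breaks runs but is not counted; 'other' characters
    (punctuation, Latin, digits) count toward the totals.
    """
    classes = [None if ch.isspace() else script_of(ch) for ch in text]
    runs = [(cls, len(list(group)))
            for cls, group in itertools.groupby(classes) if cls is not None]
    total_chars = sum(n for _, n in runs)
    total_runs = len(runs)
    features = {}
    for s in SCRIPTS:
        lengths = [n for cls, n in runs if cls == s]
        chars = sum(lengths)
        features[f"{s}_char_share"] = chars / total_chars if total_chars else 0.0
        features[f"{s}_run_share"] = len(lengths) / total_runs if total_runs else 0.0
        features[f"{s}_avg_run_len"] = chars / len(lengths) if lengths else 0.0
    return features


def to_y_axis(predicted_words):
    # Assumed linear map through the two stated points: 0 -> 100, 18000 -> 50,
    # i.e. y = 100 - x / 360.
    return 100 - predicted_words / 360


def structural_score(features, weights, bias):
    # Hypothetical linear model predicting the 92.5% word count from the nine
    # features; the real weights and fitting method are not published here.
    predicted = bias + sum(weights[name] * value for name, value in features.items())
    return to_y_axis(predicted)
```

For example, `structural_features("私はカタカナ")` yields three runs (one kanji, one hiragana, four katakana), so the katakana share of characters is 4/6 and its average run length is 4.0.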

The x axis is not an objective measure of lexical complexity. Its quality depends entirely on the frequency list being used, but a frequency list based on visual novels makes the most sense for these stats.

The y axis is not an objective measure of structural complexity. It's a measure of the relative structural complexity of works with similar lexical complexities.

The general error range is something like 65%, maybe a little more.

Note: the x axis is actually the average of the 90%, 92.5%, and 95% targets, because anime scripts are short enough that measuring a single target is unstable. The y axis was derived from the raw 92.5% target statistic, not from the averaged value shown here.
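A Python sketch of that averaged x value, under the same assumptions as a plain coverage count: pre-tokenized input with grammatical words removed and a frequency list ordered most common first. The helper names are hypothetical. Averaging three nearby thresholds presumably smooths out cases where, in a short script, a handful of word occurrences decides whether one particular threshold is crossed early or late.

```python
from collections import Counter


def words_for_coverage(work_tokens, frequency_list, target):
    # Rank in the frequency list at which known words first cover `target`
    # of the work's (non-grammatical) tokens; None if never reached.
    counts = Counter(work_tokens)
    total = sum(counts.values())
    if total == 0:
        return 0
    covered = 0
    for rank, word in enumerate(frequency_list, start=1):
        covered += counts.get(word, 0)
        if covered / total >= target:
            return rank
    return None


def averaged_x(work_tokens, frequency_list, targets=(0.90, 0.925, 0.95)):
    # Mean of the results at the 90%, 92.5% and 95% targets.
    results = [words_for_coverage(work_tokens, frequency_list, t) for t in targets]
    if any(r is None for r in results):
        return None  # at least one target is unreachable with this list
    return sum(results) / len(results)
```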

This page is for anime stats, which I am not maintaining. Want to help me with the VN stats? Donate scripts, please (they need to be either well formatted or raw data).