Compare this to the other graph, which uses a different y-axis variable, to get a feel for what kind of accuracy you get here. Raw numbers here.
Mouse over or tap the graph to highlight specific works.
The x axis is an estimate of lexical complexity, where 1000 is easy and 20000 is hard.
The y axis is an estimate of structural complexity, where 100 is easy and 50 is hard.
Works in the bottom half of the graph are more likely to have short sentences.
Works in the left half of the graph are more likely to stick to common words.
The x axis is the number of words you need to know, counting down from the top of a frequency list based on these works, to reach 92.5% coverage of a given work. (Grammatical words like particles are completely ignored.)
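As a rough illustration of the coverage idea, here is a minimal sketch, not the code behind this page. The function name, the `ignored` set, and the assumption that the work is already split into word tokens are all hypothetical.

```python
from collections import Counter

def words_needed_for_coverage(work_tokens, frequency_list, target=0.925,
                              ignored=frozenset()):
    """Return how many entries you must know, counting from the top of
    frequency_list, to cover `target` of the work's tokens.

    Assumptions (not from the original page): tokens are already segmented,
    `ignored` holds the grammatical words to skip, and frequency_list is
    ordered from most to least common.
    """
    # Count only the tokens that are not ignored grammatical words.
    counts = Counter(t for t in work_tokens if t not in ignored)
    total = sum(counts.values())
    if total == 0:
        return 0

    covered = 0
    for rank, word in enumerate(frequency_list, start=1):
        covered += counts.get(word, 0)
        if covered / total >= target:
            return rank
    return None  # the frequency list never reaches the target


# Toy data: "wa" stands in for an ignored particle.
tokens = ["a"] * 5 + ["b"] * 3 + ["c"] * 2 + ["wa"] * 10
freq = ["a", "d", "b", "c"]
print(words_needed_for_coverage(tokens, freq, target=0.75, ignored={"wa"}))
```

Words on the frequency list that never appear in the work (like "d" above) still count toward the result, since you would have to learn them on the way down the list.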
The y axis is an estimate of structural complexity called the "Hayashi" metric. For more information, read this paper's relevant section on the metric. It's inherently flawed: each writing system influences the metric, but the metric doesn't account for how much of the text each writing system takes up. Take it with a grain of salt.
The x axis is not an objective measure of lexical complexity. Its quality depends entirely on the frequency list being used, but a frequency list based on visual novels makes the most sense for these stats.
The y axis is not an objective measure of structural complexity. It's a measure of the relative structural complexity of works with similar lexical complexities.
The general error range is huge. Take everything here with a massive grain of salt.
Note: the x axis is actually the average of the 90%, 92.5%, and 95% coverage targets, because anime scripts are so short that a single target gives unstable measurements.
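A minimal sketch of that averaging, assuming the same "rank needed to reach a coverage target" idea. The function names are hypothetical, and capping at the length of the list when a target is never reached is my assumption, not something this page states.

```python
from collections import Counter
from statistics import mean

def rank_at_coverage(tokens, frequency_list, target):
    """Rank in frequency_list at which cumulative coverage reaches target."""
    counts = Counter(tokens)
    total = sum(counts.values())
    covered = 0
    for rank, word in enumerate(frequency_list, start=1):
        covered += counts[word]  # Counter returns 0 for unseen words
        if total and covered / total >= target:
            return rank
    # Assumption: if the target is never reached, cap at the list length.
    return len(frequency_list)

def averaged_x(tokens, frequency_list, targets=(0.90, 0.925, 0.95)):
    """Average the needed rank over several coverage targets."""
    return mean(rank_at_coverage(tokens, frequency_list, t) for t in targets)


# Toy script: cumulative coverage is 0.80, 0.91, 0.93, 0.96, 1.00.
tokens = ["a"] * 80 + ["b"] * 11 + ["c"] * 2 + ["d"] * 3 + ["e"] * 4
freq = ["a", "b", "c", "d", "e"]
print(averaged_x(tokens, freq))
```

In a short script, one rare word can move a single target's rank a lot, so averaging three nearby targets smooths that out somewhat.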
Why does this page use Hayashi instead of a custom metric? Because subtitles suck for doing analysis.
This page is for anime stats, which I am not maintaining. Want to help me with the VN stats? Donate scripts pls (they need to be well formatted or be raw data).