Algorithm reveals how writers' styles evolve

Researchers have proposed a new approach to the computational study of authorship and text style, based on simulating the dynamic process of writing. The authors analyzed the works of J. R. R. Tolkien, Isaac Asimov, Arthur C. Clarke and many other famous writers, tracing how each author’s style changed over time.

The work, by researchers at St. Petersburg State University and their colleagues from Israel, was published in the journal Pattern Recognition.

For the study, the mathematicians chose well-known literary works: Isaac Asimov’s cycle of seven science fiction novels, the “Foundation” series, John Galsworthy’s “The Forsyte Saga”, all the works of J. R. R. Tolkien, and other books. What interests the researchers is a large body of material that an author created over a long period: mathematical algorithms make it possible to see how the writer’s style varied.

In particular, the method correctly determined that “The Hobbit” was written by the same author as “The Lord of the Rings”, while “The Silmarillion” differs markedly in style.

This is because the book was published after the author’s death: the collection of myths and legends of Middle-earth was edited by Tolkien’s son, Christopher, who spent several years studying his father’s drafts, which had been written over several decades.

The input data for the method presented in the paper, which models the dynamic process of writing, are not just sequences of characters and words but also sequences of N-grams (contiguous chains of characters). For example, with N=3, in the six-letter word “mother” a computer program would pick out the trigrams “mot”, “oth”, “the” and “her”.
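To make the idea of character N-grams concrete, here is a minimal Python sketch. The function name `char_ngrams` is an illustrative assumption and is not taken from the paper; it simply slides a window of N characters across a string.

```python
def char_ngrams(text: str, n: int = 3) -> list[str]:
    """Return every contiguous run of n characters in text, in order of appearance."""
    if n <= 0:
        raise ValueError("n must be positive")
    # A string of length L contains L - n + 1 windows of width n (none if L < n).
    return [text[i:i + n] for i in range(len(text) - n + 1)]


print(char_ngrams("mother"))  # ['mot', 'oth', 'the', 'her']
```

A six-letter word therefore yields four trigrams, and a real text yields one N-gram per character position, which is what gives the method an ordered sequence to work with.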

Next, the document under study is divided into sub-documents that form an ordered sequence, following the order in which the N-grams appear, and the method looks for the dependency between each of the resulting sub-documents and its “neighbors”. To do this, techniques developed earlier in signal processing theory are used to extract frequency characteristics from the sequence data. The new method thus defines a kind of “frequency characteristics” of an author’s style, by analogy with the physical frequencies of waves recorded by special instruments.
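The general idea of comparing each piece of a text with its neighbors can be illustrated with a rough Python sketch. This is not the authors' algorithm: in place of their signal-processing analysis it uses a plain cosine similarity between trigram counts of adjacent chunks, and the names `ngram_profile`, `cosine` and `neighbor_similarities`, as well as the default chunk size, are assumptions made for illustration.

```python
from collections import Counter
from math import sqrt


def ngram_profile(text: str, n: int = 3) -> Counter:
    """Count how often each character n-gram occurs in text."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two n-gram count vectors (0.0 if either is empty)."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def neighbor_similarities(text: str, chunk_size: int = 1000, n: int = 3) -> list[float]:
    """Split text into consecutive chunks and compare each chunk with the one before it."""
    chunks = [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    profiles = [ngram_profile(chunk, n) for chunk in chunks]
    return [cosine(profiles[i - 1], profiles[i]) for i in range(1, len(profiles))]
```

Calling `neighbor_similarities` on a long text returns one value per pair of adjacent chunks. In this simplified picture, a sustained drop in those values would hint at a change of style at that point in the sequence.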

“Differences in style can be seen even within the works of a single author,” says co-author Natalia Kitaeva. “For example, Isaac Asimov wrote the fourth part of the ‘Foundation’ series almost 30 years after the third part, at the insistence of his fans. Our method allowed us to separate the seven books of the series into two clusters: those created before 1953 and those created after 1982. Over 30 years the author himself changed, as did his surroundings and his outlook on life, and consequently so did his style.”

The development, the researchers note, may help in analyzing not only literary works but also unstructured text. For example, the method is useful for processing the data arriving at dispatch consoles or in call centers that work with clients. The Israeli scientists use the development to identify artificially generated texts, written by a machine rather than a person. For example, there are programs that fabricate texts resembling real scientific articles, and such texts are often mistakenly accepted for publication in well-known journals. The method makes it possible to distinguish such articles from human-written texts more accurately.