Readability features

Readability is the ease with which a reader can understand a written text. In natural language, the readability of a text depends on its content and on the complexity of its vocabulary and syntax. The datasets used were checked for a variety of readability features, and a correlation matrix was computed and visualized for all of them.
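
As a rough illustration of how such a correlation matrix could be built, the sketch below computes pairwise Pearson correlations between feature columns using only the standard library. The choice of Pearson correlation, the function names and the feature names are assumptions made for this example; the text does not say which correlation measure or tooling was used.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        # Correlation is undefined for a constant feature.
        return float("nan")
    return cov / (spread_x * spread_y)

def correlation_matrix(features):
    """Map each pair of feature names to its Pearson correlation.

    `features` is a dict of feature name -> list of per-article values
    (a hypothetical layout chosen for this sketch).
    """
    names = list(features)
    return {a: {b: pearson(features[a], features[b]) for b in names}
            for a in names}

# Made-up per-article values, purely for illustration.
example = {
    "lexical_diversity": [0.61, 0.48, 0.72, 0.55],
    "punctuation_count": [12, 30, 8, 21],
    "pronoun_count": [0, 4, 1, 3],
}
matrix = correlation_matrix(example)
```

The nested dictionary can then be passed to any heatmap-style plotting routine to visualize it.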

Lexical Diversity

Lexical diversity is defined as the ratio of unique word stems to the total number of words. The term is used in applied linguistics and related fields as an equivalent to lexical richness. The visualization below compares the lexical richness of real and fake news for the datasets used.
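
The ratio described above can be sketched in a few lines. The suffix-stripping `crude_stem` helper is a hypothetical stand-in for a real stemmer, and the regular-expression tokenizer is likewise an assumption; the text does not state which stemmer or tokenizer was applied to the datasets.

```python
import re

def crude_stem(word):
    # Hypothetical, deliberately crude stemmer: strips a few common
    # English suffixes. A real analysis would likely use a proper stemmer.
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def lexical_diversity(text):
    """Ratio of unique word stems to the total number of words."""
    words = re.findall(r"[a-z]+", text.lower())
    if not words:
        return 0.0
    unique_stems = {crude_stem(w) for w in words}
    return len(unique_stems) / len(words)

# Five words, three unique stems ("the", "cat", "chas").
score = lexical_diversity("The cat chased the cats")
```

A value close to 1 means almost every word contributes a new stem, while a value close to 0 means heavy repetition.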

Punctuation

The visualization below shows the number of punctuation marks used in real and fake news.
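
One possible way to obtain such counts is sketched below, assuming that "punctuation" means the ASCII characters in Python's `string.punctuation` and that articles are available as `(label, text)` pairs. Both are assumptions made for this example; typographic marks such as curly quotes would not be counted.

```python
import string
from collections import defaultdict

PUNCTUATION = set(string.punctuation)

def count_punctuation(text):
    """Number of ASCII punctuation characters in `text`."""
    return sum(1 for ch in text if ch in PUNCTUATION)

def mean_punctuation_by_label(articles):
    """Average punctuation count per label for (label, text) pairs."""
    counts = defaultdict(list)
    for label, text in articles:
        counts[label].append(count_punctuation(text))
    return {label: sum(c) / len(c) for label, c in counts.items()}

# Made-up sample articles, purely for illustration.
sample = [
    ("real", "Rates rose, officials said."),
    ("fake", "SHOCKING!!! You won't believe it..."),
]
means = mean_punctuation_by_label(sample)
```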

Personal pronouns

The use of the personal pronouns 'I', 'me' and 'myself' can give greater insight into a text. The visualization below shows a clear difference in the average use of personal pronouns in real and fake news.
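
A minimal sketch of counting these pronouns follows. The pronoun set comes from the three words listed above; the lowercase regular-expression tokenizer and the function name are assumptions for this example.

```python
import re

# The three pronouns named in the text.
PRONOUNS = {"i", "me", "myself"}

def count_personal_pronouns(text):
    """Count whole-word occurrences of 'I', 'me' and 'myself'."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return sum(1 for token in tokens if token in PRONOUNS)

# "Imogen" is not counted because matching is on whole tokens.
n = count_personal_pronouns("I told myself that it was me, not Imogen.")
```

Averaging this count over the articles of each class gives a per-class figure comparable to the one in the visualization.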

Trigram (article body)

An n-gram is a contiguous sequence of n items from a given sample of text or speech. The items can be phonemes, syllables, letters, words or base pairs, depending on the application. N-grams are typically collected from a text or speech corpus. The mean number of trigrams was computed for the 3 datasets under the 2 categories.
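
For word-level trigrams (n = 3), extraction can be sketched as below. Whitespace tokenization, lowercasing and counting every trigram occurrence (rather than only distinct trigrams) are assumptions for this example; the text does not specify how the trigrams were counted.

```python
def word_trigrams(text):
    """Return all contiguous 3-word sequences in `text` as tuples."""
    words = text.lower().split()
    # A text with fewer than 3 words yields an empty list.
    return [tuple(words[i:i + 3]) for i in range(len(words) - 2)]

def mean_trigram_count(article_bodies):
    """Average number of trigrams per article body."""
    if not article_bodies:
        return 0.0
    return sum(len(word_trigrams(b)) for b in article_bodies) / len(article_bodies)

trigrams = word_trigrams("the quick brown fox jumps")
```

A text of k words yields k - 2 trigrams, so `trigrams` above holds three tuples.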