
Readability is the ease with which a reader can understand a written text. In natural language, the readability of a text depends on its content, namely the complexity of its vocabulary and syntax. The datasets used were checked for a variety of readability features, and a correlation matrix was computed and visualized for all of them.
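As a rough illustration of how readability features and their correlation matrix might be computed, here is a minimal Python sketch. The two features (words per sentence, characters per word) and the helper names `readability_features`, `pearson` and `correlation_matrix` are assumptions chosen for illustration; the features actually used in the project are not listed here.

```python
import math
import re


def readability_features(text):
    """Return two simple surface features for one document (illustrative choice)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return {"words_per_sentence": 0.0, "chars_per_word": 0.0}
    return {
        "words_per_sentence": len(words) / len(sentences),
        "chars_per_word": sum(len(w) for w in words) / len(words),
    }


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")  # correlation is undefined for a constant feature
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(rows):
    """Pairwise correlations between features; rows is a list of feature dicts."""
    names = list(rows[0])
    return {
        a: {b: pearson([r[a] for r in rows], [r[b] for r in rows]) for b in names}
        for a in names
    }
```

Calling `correlation_matrix([readability_features(t) for t in texts])` on a list of documents gives a nested dictionary that can be passed to any heatmap plotting routine.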

Lexical diversity is defined as the ratio of unique word stems (types) to the total number of words (tokens). The term is used in applied linguistics and related fields as an equivalent to lexical richness. The visualization below compares the lexical richness of real and fake news for the datasets used.
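The ratio described above, often called the type-token ratio, can be sketched in a few lines of Python. This is only an approximation under a stated assumption: it counts unique lowercased word forms rather than word stems, since stemming would require an external library. The function name `type_token_ratio` is illustrative.

```python
import re


def type_token_ratio(text):
    """Ratio of unique word forms to total words (no stemming; an approximation)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)
```

For example, "the cat saw the dog" has 4 unique forms among 5 words, giving a ratio of 0.8. Note that this ratio tends to fall as texts get longer, so comparisons are most meaningful between texts of similar length.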

The visualization below shows the number of punctuation marks used in real and fake news.
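A minimal sketch of the punctuation count, assuming that "punctuation" means the ASCII characters in Python's `string.punctuation` (the exact character set used in the project is not stated):

```python
import string


def count_punctuation(text):
    """Count ASCII punctuation characters in a text."""
    return sum(1 for ch in text if ch in string.punctuation)
```

Typographic characters such as curly quotes or long dashes are not in `string.punctuation` and would need to be added explicitly if the news text contains them.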

The use of the personal pronouns 'I', 'me' and 'myself' can give greater insight into a text. The visualization below shows a clear difference in the average use of these pronouns in real and fake news.
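One way the pronoun counts could be computed is sketched below. The helper names and the tokenization are assumptions; in particular, contractions such as "I'm" are kept as single tokens and are therefore not counted here.

```python
import re

# The three pronouns named in the text, lowercased for matching.
PRONOUNS = {"i", "me", "myself"}


def count_personal_pronouns(text):
    """Count occurrences of 'I', 'me' and 'myself' as whole words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return sum(1 for token in tokens if token in PRONOUNS)


def mean_pronoun_count(texts):
    """Average pronoun count per document over a collection of texts."""
    if not texts:
        return 0.0
    return sum(count_personal_pronouns(t) for t in texts) / len(texts)
```

Applying `mean_pronoun_count` separately to the real and fake articles of a dataset yields the two averages being compared.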
An n-gram is a contiguous sequence of n items from a given sample of text or speech. Depending on the application, the items can be phonemes, syllables, letters, words or base pairs. N-grams are typically collected from a text or speech corpus. The visualization shows the mean number of trigrams found in the 3 datasets under the 2 categories (real and fake news).
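To make the definition concrete, the following sketch extracts word-level trigrams and averages their count over a collection of documents. It assumes whitespace tokenization and that every trigram occurrence is counted; whether the project counted all trigrams or only distinct ones is not stated, and the function names are illustrative.

```python
def word_ngrams(text, n=3):
    """Return all contiguous word n-grams in a text as tuples."""
    tokens = text.lower().split()
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def mean_ngram_count(texts, n=3):
    """Average number of n-grams per document (all occurrences counted)."""
    if not texts:
        return 0.0
    return sum(len(word_ngrams(t, n)) for t in texts) / len(texts)
```

A text of k words contains k - 2 trigrams when k is at least 3, and none otherwise, so "the quick brown fox jumps" yields 3 trigrams.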