Ngrams as Paranormal Research Tools

This started when I watched the recent TED talk about Ngrams (or N-grams).

Tip: Unless you want to sit up half the night with “ah-HA!” ideas flashing like neon signs in your brain, do not watch new TED talks right before bedtime.

At some crazy-early hour this morning, I was awake and at my computer.  Let’s start with the first graph I created.

Ghosts 1800 - 2000

That shows me two surges of interest in ghosts, between 1800 and 2000.  (The terms “haunted house” and “ghost hunting” don’t really have an impact when I’m looking at this broad time frame.)

If you’re blinking and wondering why this is so cool, here’s the geek goodness:  Those spikes tell me when the word “ghost” was most popular in published books.  (It’ll also apply to periodicals.)

Since much of my research is based in historical records — looking for supporting information for (or the folklore roots of) “ghost stories” and reports of hauntings — this tells me where to look for the largest amount of information related to ghosts.  (If I’m going to be digging through dusty old books, magazines and newspapers, I want to be fairly sure I’ll find lots of information… not just a few blurbs here & there.)

So, let’s narrow it some more.  I searched using a narrower time frame, based on what looked like an 1895-ish spike in the graph, above.


That tells me that my time is best spent looking in records with copyrights between 1895 and 1900. If I am choosing among several resources — and especially if I have a limited amount of time at that particular library — those are the years to focus on.
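
For anyone who would rather check the numbers than eyeball the graph, here is a rough Python sketch of the same idea. It assumes you have already copied yearly frequencies for a word out of the Ngram graph into a dictionary. The function name `peak_window`, the five-year window, and the sample numbers are all made up for illustration. They are not real Ngram values.

```python
def peak_window(freq_by_year, width=5):
    """Return (start_year, end_year, mean) for the run of `width`
    consecutive entries with the highest average frequency.

    Assumes one entry per year with no gaps in the data.
    """
    years = sorted(freq_by_year)
    if len(years) < width:
        raise ValueError("need at least `width` years of data")
    best = None
    for i in range(len(years) - width + 1):
        span = years[i:i + width]
        mean = sum(freq_by_year[y] for y in span) / width
        # Strict comparison keeps the earliest window if two windows tie.
        if best is None or mean > best[2]:
            best = (span[0], span[-1], mean)
    return best


# Placeholder values for illustration only (NOT real Ngram data).
sample_counts = {
    1890: 1.0, 1891: 1.0, 1892: 1.2, 1893: 1.3, 1894: 1.5,
    1895: 2.0, 1896: 2.2, 1897: 2.4, 1898: 2.3, 1899: 2.1,
    1900: 1.6, 1901: 1.4, 1902: 1.2,
}

start, end, _ = peak_window(sample_counts, width=5)
print(start, end)  # with the placeholder numbers above: 1895 1899
```

A wider or narrower window will move the answer around, so it is worth trying a few widths before deciding which shelf of old books to start with.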

Now let’s look at more modern resources.

In this search, I added the plural (“ghosts”) to the search, just to see what happened.

Ghost and ghosts trends 2000s

“Ghost” out-performs “ghosts” throughout that time period.

That means most of the references are talking about a single ghost… not ghosts in general.  That is, they’re saying “the ghost” and describing a single entity, rather than making a generic reference to ghosts.

To me, that looks like more people are telling first-person stories, or recounting folklore, as opposed to talking about a group of ghosts at a haunted site.  (For example, the difference between when I describe a single ghostly encounter at The Myrtles Plantation or the Falstaff’s Experience, as opposed to talking about their ghosts, in general.)

Of course, I’d need to do more research into the Ngram trends to be sure of that.
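
One simple way to start that follow-up is to track how much of the combined usage is the singular form, year by year. The sketch below assumes you have two dictionaries of yearly frequencies, one for “ghost” and one for “ghosts.” The function name `singular_share` and the numbers are hypothetical placeholders, not real Ngram values.

```python
def singular_share(singular, plural):
    """For each year present in both series, return the fraction
    singular / (singular + plural). Years with a zero total are skipped.
    """
    shares = {}
    for year in sorted(set(singular) & set(plural)):
        total = singular[year] + plural[year]
        if total > 0:
            shares[year] = singular[year] / total
    return shares


# Placeholder values for illustration only (NOT real Ngram data).
ghost = {2000: 3.0, 2001: 3.0, 2002: 2.0}
ghosts = {2000: 1.0, 2001: 2.0}

for year, share in singular_share(ghost, ghosts).items():
    print(year, f"{share:.2f}")
```

A share that stays well above 0.5 would match what the graph shows. It still would not prove the “single ghost story” reading, since “ghost” also turns up in phrases like “ghost town” and “ghost writer.”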

The following was my final Ngram search before writing this article.

Ghost - ghosts - haunted -2000s-ngrams

In that search, I changed some of the words.

What I see is that people use the word “haunted” about as often as they talk about “ghosts” (plural)… but nowhere near as often as they use the word “ghost.” I’m not sure what that indicates, but it’s interesting and a little odd.

(Note: The phrase “ghost hunters” is pretty much flat-line at the bottom of the graph.  Interesting.)

At the current time, Ngram searches only include books through 2008.  So, we can’t yet use Ngrams to study recent trends.  (I’d also argue that the emergence of CreateSpace and similar self-publishing services will skew more recent numbers.)

So, what have I learned from this…?

Ngrams can be used to identify what time periods to focus on when I’m in dusty libraries (real or virtual), conducting historical research about paranormal topics.

Though that may interest only the most hard-core research geeks (like me), I think it’s a resource to keep in mind.

Let me know if you have additional ideas for using Ngram research, and if you find anything quirky and interesting in your own Ngram searches.

Here’s the Ngram search URL I used:

[If that URL isn’t working — and, as of November 2016, it seems broken — try this one: ]

Here’s the TED talk I referenced:

Here’s a more recent TED talk on a similar topic:

