Debugging The Bestseller Code

Last year, there was a brief flurry of attention over a website called I Write Like. Everyone was delighted to throw in their own manuscript paragraphs and see what famous author-names came out, although the fun deflated when it was noticed that nobody, apparently, writes like a woman. Today, on a whim, I entered the first section from Damned if You Do.

I write like: Anne Rice.

Yeah, I'll accept that. Plus, it means more female authors have been added to the database! (Though Virginia Woolf still writes like James Joyce. Nobody tell her—she'll be pissed.)

Today, thanks to Tumblr user myonetrupassion, I discovered a website called The Bestseller Code, which promises to tell you how commercially viable your work is. Based on sentence length and word complexity.

Oh, really? I thought. And cracked the knuckles on my typing fingers.

To begin, I put in that same excerpt from Damned if You Do and selected "romance" as the genre. And what came out was this:

It seemed like a fair cop. I like the big Latinate words, and I throw them around like they're going out of style. But then I looked at the list of "complex" words, bolded in red, and had second thoughts. Along with "proficient," "dossier," and "unparalleled," here are some of the words that are considered complex:

  • lovingly
  • preparation
  • criminals
  • naturally
  • every
  • recently
  • separated
  • punishment
  • inflexible
  • certainly
  • somehow
  • understanding
  • anything
  • defense
  • soldier
  • amazement
  • imagine

This is boggle-worthy. I can't make myself accept that the word "lovingly" is too complex for the romance genre. And where would mysteries, thrillers, or romantic suspense be without "criminals"?

Just how do proven commercial successes fare against the algorithm, you ask? The answer: not well.

The famous, brilliant opening of Jane Austen's Pride and Prejudice initially gets a 15.5 (as both literature and romance). When I switch to an excerpt of the scene where Darcy and Elizabeth are discussing his reticence (the "we neither of us perform to strangers" scene), the score for both genres goes down to 15.4.

An excerpt from Breaking Dawn (note: super-hard to find Twilight excerpts online): 15.6.

The opening pages of The Hunger Games (genre: YA) get a measly 10.7.

The opening of Genesis, from the Bible, the best-selling book of all time (genre: Literature, probably): 7.4.

And—this really was my favorite one—super-mega-bestseller The Da Vinci Code scored only 14, with slightly more word complexity than the average thriller. (I am avidly curious to know what texts they used to generate these "average" numbers.) Bolded red words from this excerpt include: telephone, hotel, visitor, darkness, evening, and probably.

Ultimately, my problem with the Bestseller Code is not that they guarantee their formula will bring commercial success. They know better than to offer such a guarantee. They also admit that they are stretching the definition of "complex" to pretty much mean just "multisyllable." (Which—argh! Words have actual meanings! Length is not the best measure of complexity!)
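For the curious, here is a minimal sketch of what a length-only test might look like. To be clear, this is not the site's actual code, which I have not seen: the function names, the count-the-vowel-runs syllable estimate, and the two-syllable cutoff are all assumptions made up for illustration.

```python
import re

# Hypothetical helpers: none of these names come from the Bestseller Code site.
VOWEL_RUN = re.compile(r"[aeiouy]+")
WORD = re.compile(r"[A-Za-z']+")


def estimate_syllables(word: str) -> int:
    """Crude syllable estimate: count runs of consecutive vowels.

    It over-counts silent e's and knows nothing about meaning,
    which is rather the point.
    """
    return max(1, len(VOWEL_RUN.findall(word.lower())))


def flag_long_words(text: str, min_syllables: int = 2) -> list[str]:
    """Return the words a pure length test would mark as 'complex'.

    The cutoff of 2 is an assumption, not a documented setting.
    """
    return [w for w in WORD.findall(text) if estimate_syllables(w) >= min_syllables]


def average_sentence_length(text: str) -> float:
    """Average number of words per sentence, splitting on . ! and ?"""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = WORD.findall(text)
    return len(words) / len(sentences) if sentences else 0.0


sample = "The telephone rang in the hotel. A visitor stood in the darkness."
print(flag_long_words(sample))
print(average_sentence_length(sample))
```

Even this toy version flags "telephone," "hotel," "visitor," and "darkness" in two perfectly plain sentences, because all it can measure is length.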

But—and here's my objection—they suggest their algorithm can be a useful tool for revision. To quote from the website: "Paragraphs littered with red words should be revised to improve readability even if the individual words themselves are not particularly sophisticated."

In other words: let's make sure we have plenty of small words to give the long words breathing room. Let's never use the words "telephone" and "hotel" in the same paragraph because readers might be confused. The great minds behind the Bestseller Code apparently live in a nightmarish world where telephones do not come standard in hotel rooms.

I strenuously object to this flattening-out of vocabulary as a hard-and-fast rule. English is a rich and thieving tongue, full of stolen words and shifting definitions. It is a mutant, cannibal language. There is no need to fear complexity as such—especially not in a world where the dictionary is a keystroke away. But then, I'm a fan of Austen and Dickens and Melville, of Joyce and Calvino and David Foster Wallace. When I find a word that I don't recognize, it's an exciting moment, like discovering a new species of beetle in the backyard.

As writers, we are defined by our word choices. Hemingway told stories like this: "For sale: baby shoes, never worn."

Joyce—sometimes—told them like this: "And as no man knows the ubicity of his tumulus nor to what processes we shall thereby be ushered nor whether to Tophet or to Edenville in the like way is all hidden when we would backward see from what region of remoteness the whatness of our whoness hath fetched his whenceness."

And, so far, quite happily, I tell stories like this: "Idared was proficient in the use of all the correct torture implements for a demoness of her rank, but with the whip she was an artist of unparalleled caliber."

They can have my complex words when they pry them from my hyperborean, moribund appendages.