BLOG: Synthesising a Bestseller

If I were to tell you about a machine designed to take the basics of a story, then spit it out in whatever genre you chose, you might think I was talking about the computer algorithm which tried to write another Harry Potter book, with delightfully hilarious consequences. Or you might think I was talking about the program which analysed all the components of every bestselling novel, and attempted to write its own using these techniques. Or perhaps even the code that my uncle wrote to attempt to cheat the NaNoWriMo wordcount bot, which just wrote

“It was a dark and stormy night. The Captain said to the First Mate, ‘go on, tell us a story’, and the First Mate said, ‘It was a dark and stormy night. The Captain said to the First Mate…”

over and over again until it hit fifty thousand words.

You’d be wrong on all three counts. (Although my uncle’s code unsurprisingly did manage to cheat the wordcount bot, and provide him with unparalleled delight at his own joke for several years afterwards.)
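For the curious, the trick is easy to sketch in a few lines of Python. This is my own reconstruction rather than my uncle’s actual code: the name `stormy_night`, the `OPENING` constant and the idea of counting words by splitting on whitespace are all assumptions, and the real wordcount bot may well have counted differently.

```python
# A hypothetical reconstruction of the joke, not the original code.
OPENING = (
    "It was a dark and stormy night. "
    "The Captain said to the First Mate, "
    "'go on, tell us a story', "
    "and the First Mate said, "
)

def stormy_night(target_words=50_000):
    """Repeat the nested opening until the text reaches target_words words."""
    words_per_round = len(OPENING.split())        # words counted by whitespace
    rounds = -(-target_words // words_per_round)  # ceiling division
    return (OPENING * rounds).strip()

story = stormy_night()
print(len(story.split()), "words of dark and stormy night")
```

The story never gets past its first sentence, which is rather the point: a word counter measures length, not whether anything happens.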

The machine in question was imagined by Lewis Carroll, writing anonymously for an issue of The Comic Times in 1855. Carroll tells the story of a wonderful process which, when given the bare bones of a story, can then rewrite it in different styles by developing it in chemicals, akin to developing a photograph. The work wasn’t republished until it appeared in a now out-of-print collection of Carroll’s unpublished works, The Lewis Carroll Picturebook, so it isn’t well known. I only came across it when helping to proof Dr Jonathan Potter’s upcoming book, Discourses of Vision in Nineteenth-Century Britain, which focuses on the history of photography and visual technologies in relation to literature through the Victorian period.

Carroll wrote the piece, which poses as a newspaper report on the newest novel-writing technology, in response to the new development of photography, noting that art had now been reduced to “the merest mechanical labour”. The narrator details how, with the application of different chemicals (as in photographic development), a story is taken from a limpid and unsellable state through various styles. The report ends with a couple of delightful digs: one at dull Parliamentary speeches, which could use a run through the process to make them more interesting, and another suggesting that the machine be used to turn a passage of Wordsworth into “strong, sterling poetry” – although Carroll warns that applying the same process to Byron set the paper alight.

Carroll’s article was a satire, imagining how hilariously improbable it was that art could be made without the input of the human imagination, but somehow the idea of machine-created art still fascinates people. I wonder what he would have made, for instance, of the mathematical precision behind the machine that compiled the data on bestsellers. As a mathematician, would he have been impressed? Or as a writer would he have been appalled?

There is some fascination with trying to quantify the ineffable, with trying to mathematically pinpoint the quality that grabs the public imagination and creates the Next Big Thing. The algorithm which identified the key parts of bestsellers was part of a research project to see how data could inform writing that sold well. The resulting book, The Bestseller Code by Jodie Archer and Matthew Jockers, was released in 2016, but they weren’t the first to bring computers into it. In 2014, a paper by Vikas Ganjigunte Ashok, Song Feng and Yejin Choi studied the writing style of books to see if there was any correlation between language and success. This project in particular looked at books in the public domain, utilising Project Gutenberg to compare language vs downloads, whilst Archer and Jockers’ work looked at novels published in the 30 years prior, providing a different and perhaps a more relevant sample for today’s writers, especially considering that the 2007 attempt to submit Jane Austen’s work to publishers and agents apparently fell flat. (There are some further issues with this staged investigation which I could discuss at length. Firstly, it’s likely that many of these publishers did not accept direct submissions, so the submitter was perhaps lucky to gain any response at all. Equally, it seems like some of the responses came from publishers whose lists did not fit the submission, so was it just sent to all and sundry regardless of what they actually published? Thirdly, I cannot imagine any agent or publisher going to the effort of calling out obvious plagiarism, as it feels like that would open a can of worms. A polite, non-committal rejection seems the safest bet all around to avoid offence. But that’s a whole other discussion.)

To add to both of these studies, as if on schedule (almost as though research into this has to be conducted biennially), another study published just this year looked at entries on the New York Times Bestseller List between 2008 and 2016.

So what does this all mean? Reams of data are available on the language and style of bestsellers, but can such things be understood in such a way as to guarantee the sales success of future works? For the moment, too many other factors remain too difficult to predict – not least changing public tastes – and so such a guarantee remains out of reach. But publishing’s business model is a loss-recuperation one. Large investments are made upfront, with no guarantee of return. So if there were some way to ensure that no losses were ever accrued on a book, it would be snapped up, and it’s likely the search will continue.

Given the garbled mess that was the AI-written Harry Potter novel, perhaps there’s a reason computers haven’t been given solo control over creative endeavours. My brother, who is studying Artificial Intelligence, once explained to me that the likely cause of the inevitable robot apocalypse would be bad programming: specifically, programmers not making allowances for the kinds of human knowledge we take for granted because they are so built into the way we live. The example my brother gives is asking robots to lower the average number of human deaths per year over the next fifty-year period. Conceivably, an AI would calculate that the easiest way to avoid future human deaths would be to kill all current humans. This would lead to an upfront rise in deaths, but afterwards deaths would drop to zero, because there would be no humans left to die, and so the average number of annual deaths across a long enough period would fall radically. This sort of machine logic has been seen again and again, where boundaries have not been specified because programmers took them for granted – there is a delightful Twitter thread about times machines have taken unexpected approaches to problem solving which shows exactly this.
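To see how easily the arithmetic goes wrong, here is a toy version of my brother’s example in Python. Everything in it is an assumption made up for illustration: the population, the number of deaths in an ordinary year, the idea that births keep the population steady, and the names `status_quo` and `wipe_out`. It is a sketch of the reasoning, not a model of any real AI system.

```python
def average_annual_deaths(deaths_per_year):
    """Mean of a list holding the death count for each year of the period."""
    return sum(deaths_per_year) / len(deaths_per_year)

YEARS = 50
POPULATION = 1_000   # toy figure, assumed steady because births replace deaths
NORMAL_DEATHS = 30   # toy figure: deaths in an ordinary year

# Plan A: leave humanity alone.
status_quo = [NORMAL_DEATHS] * YEARS

# Plan B: the "solution" nobody thought to rule out.
wipe_out = [POPULATION] + [0] * (YEARS - 1)

plans = {"status quo": status_quo, "wipe out": wipe_out}

# The machine simply picks whichever plan scores lowest on the stated goal.
best = min(plans, key=lambda name: average_annual_deaths(plans[name]))
print(best)
```

With these made-up numbers the wipe-out plan scores lower, so a machine judged only on the stated goal would choose it. The outcome does hinge on the figures: the perverse plan only wins when the period is longer than `POPULATION / NORMAL_DEATHS` years. The deeper problem is that nothing in the goal says “and keep everyone alive”, because no human would think it needed saying.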

But does this mean that there is no way for machines to write bestsellers? I don’t think so. Don’t forget the Japanese AI which wrote a novel and was shortlisted for a literary award. Was this AI any better than the Harry Potter AI? Or were the judging panel just fans of garbled prose in the vein of James Joyce? The answer is neither – the piece was a collaborative work, with humans monitoring and setting certain criteria to prevent the AI from taking its own route through the narrative unhindered by concerns for anything like language rules or readability. This is not too dissimilar to the process in Carroll’s story – the raw story is extracted from the head of a human, and then ‘developed’ to turn it into finished fiction. Perhaps writing a bestseller can’t be distilled down to simple maths or data, completed by machines and churned out to tick boxes. Perhaps the human element is what makes a book worth reading.



4 thoughts on “BLOG: Synthesising a Bestseller”

  1. Lily @ Sprinkles of Dreams says:

    Ohh wow, this is a super thought-provoking post, thank you so much for sharing! 🙂 I often think about all the different kinds of books I loved, and what they had in common, but I don’t think it’s something that could have been “generated” by a machine by looking at technical aspects of the books.

    Great post! 🙂

