A bit of background about Markov Chains might help for a start.
A while ago, I stumbled across a bit of Python for generating Markov chains here. Basically, it takes a large body of text, in this case the King James Bible from Project Gutenberg (for my purposes I stripped out all of Gutenberg’s preamble, licensing details, and book, chapter, and verse numbers, leaving just the text). Starting from a random pair of consecutive words in the base text, it randomly picks a word that follows those two words somewhere in the text, then repeats the process with the newest pair of words. (So in the Bible example, ‘In the’ might be followed by ‘beginning’, or by ‘sweat’, ‘day’, ‘six’, ‘selfsame’, ‘same’, ‘mount’, ‘cave’, ‘tenth’, ‘first’, ‘third’, ‘tabernacle’…)
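To make the idea concrete, here is a minimal sketch of the technique. It is not the Agiliq code: the function names `build_table` and `generate` and the two-sentence toy sample are made up for illustration, and a real run would use the whole Bible text.

```python
import random
from collections import defaultdict


def build_table(words):
    """Map each pair of consecutive words to the list of words that follow it."""
    table = defaultdict(list)
    for first, second, third in zip(words, words[1:], words[2:]):
        table[(first, second)].append(third)
    return table


def generate(words, table, size, rng=random):
    """Walk the table from a random pair, emitting at most `size` words."""
    # Any pair that starts before the last two words has at least one follower.
    start = rng.randrange(len(words) - 2)
    first, second = words[start], words[start + 1]
    output = [first, second]
    # Stop early if we reach a pair that nothing follows (the end of the text).
    while len(output) < size and (first, second) in table:
        first, second = second, rng.choice(table[(first, second)])
        output.append(second)
    return ' '.join(output)


# Toy sample (hypothetical stand-in for the full text).
sample = ("In the beginning God created the heaven and the earth . "
          "In the day that the LORD God made the earth and the heavens .").split()
table = build_table(sample)
print(table[('In', 'the')])   # ['beginning', 'day']
print(generate(sample, table, 12))
```

A word that follows a pair several times appears several times in that pair's list, so `rng.choice` naturally favours the more common continuations.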
I saved the code from the Agiliq blog as markov.py and, in the same directory, saved the Bible text as KingJamesBible.txt along with the following Python code:
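""" Uses markov.py to generate tweet-length texts from the Bible (AV). """
from markov import Markov


class MarkovBibleTweet(object):
    def __init__(self, url):
        "Generate the markov chain stuff first."
        # Markov reads the whole file while it builds its word table.
        with open(url, 'r') as file_:
            self.mkov = Markov(file_)

    def tweet(self):
        "Trim the tweet to the start of a sentence and end of a word."
        twit = self.mkov.generate_markov_text(100)
        # Drop leading characters until the text starts with a capital letter.
        while twit and twit[0] not in 'ABCDEFGHIJKLMNOPQRSTUVWXYZ':
            twit = twit[1:]
        twit = twit[:140]
        # Drop trailing characters until the text ends on punctuation.
        while twit != '' and twit[-1] not in ';:?.,!':
            twit = twit[:-1]
        if twit == '':
            # Nothing usable survived the trimming, so generate a fresh one.
            twit = self.tweet()
        if twit[-1] in ';:,':
            twit = twit[:-1] + '.'
        return twit


if __name__ == '__main__':
    m = MarkovBibleTweet('KingJamesBible.txt')
    # Show one generated tweet on the console.
    print(m.tweet())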
Now this code actually “does the business”, generating the tweet and trimming it to fit into 140 characters. So how do we get this actually posted onto Twitter? The answer lies with tweepy.
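Before moving on to posting, the trimming step can be tried on its own, without markov.py or the Bible text. The following is only a sketch of that step: the function name `trim_to_tweet`, the `limit` parameter, and the sample string are made up for illustration.

```python
CAPITALS = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
STOPS = ';:?.,!'


def trim_to_tweet(text, limit=140):
    """Trim text to start on a capital letter and end on punctuation."""
    # Skip ahead to the first capital letter.
    while text and text[0] not in CAPITALS:
        text = text[1:]
    # Cut to the length limit, then back up to the last punctuation mark.
    text = text[:limit]
    while text and text[-1] not in STOPS:
        text = text[:-1]
    # Finish a dangling clause with a full stop.
    if text and text[-1] in ';:,':
        text = text[:-1] + '.'
    return text


raw = "and the earth. And God said, Let there be light: and there was"
print(trim_to_tweet(raw))   # And God said, Let there be light.
```

An empty result means nothing usable was left after trimming, which is the case where the class above simply generates a new text and tries again.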
To be continued…