Thursday, October 18, 2012

Literate Randomizer

Yesterday I was seeding my database for a new Rails project. This project has profile pages with large free-form text fields. In order to get a good sense of how the site layout feels with real data, I needed some decent fake profiles. The existing tools we were using just weren't cutting it.

There are several gems available to help with this task.

Here is a quick-link to Literate Randomizer's readme and source.

Faker

Faker is great for generating fake names and addresses, but when it comes to generating fake prose, it follows the Loren Impsum model of obvious nonsense. This was not going to cut it:

Mollitia id adipisci. A voluptatum et veritatis quas enim veniam consectetur. Id perferendis quia eum et. Rerum consequatur harum quisquam impedit sit et.

Qui est qui autem inventore esse laboriosam. Vitae expedita numquam eveniet repudiandae voluptatibus est in. Ratione eos maxime ut debitis illo. Voluptatibus ea molestiae et quas non est.

Et ut distinctio qui nihil consequatur odio. Labore voluptas magni sapiente totam. Sed sed est nobis quia temporibus pariatur. Nihil accusamus molestiae qui occaecati ut. Sed consequatur nobis quis consequatur rerum.

random_data

RandomData was next. RD did generate some more convincing prose, but if you glance the source you'll see it is simply randomly selecting from 23 fixed sentances

Soaring through all the galaxies in search of Earth flying in to the night. Every stop I make I make a new friend; Can't stay for long just turn around and I'm gone again. Hey there where ya goin, not exactly knowin'. Excepteur sint occaecat cupidatat non proident sunt in culpa qui officia deserunt mollit anim id est laborum. He just keeps on movin' and ladies keep improvin'.

Lorem ipsum dolor sit amet consectetur adipisicing elit sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. He's going everywhere, B.J. McKay and his best friend Bear. Ulysses, fighting evil and tyranny with all his power and with all of his might. Ulysses, fighting evil and tyranny with all his power and with all of his might. Every stop I make I make a new friend; Can't stay for long just turn around and I'm gone again.

Literate Randomizer

If you want to generate a lot of random prose, neither of the above options work well. With a little googling I ran across this post by Tim Riley. In it he shows some code for using Markov-Chains to generate plausible prose generated based on a source text. Using that source text, we can get a probability distribution of which words follow which words. Given that we can generate some near-english truly random sentences.

Note: RandomData has a Markov tool, too, but it is very limited. I found another gem that just copy-pasted Tim's code into a gem-file, but it wasn't properly packaged nor usable.

I was bummed to realize Tim hadn't made a gem of his wonderful little insight, so I thought I'd dive in and do it for him. I thought I could just copy Tim's code, give him credit, and release a nice clean gem. Turns out it wasn't that simple, or perhaps I just had too much fun with this problem. Another problem with all the tools above is they don't give you much control over the randomly generated output. I wanted control over the length and number of sentences as well as the number of paragraphs. I wanted interesting punctuation. With Markov chaining, you end up with a lot of sentences that end in prepositions (the, and, of, etc...).

The result, is a nice little gem called literate_randomizer. Here is a simple example. For more, please go to the literate randomizer source/readme.

 require 'literate_randomizer'  
 LiterateRandomizer.word  
 # => "frivolous"   
 LiterateRandomizer.sentence  
 # => "Muscular arms round opening of sorts while Lord John Roxton."   
 puts LiterateRandomizer.paragraphs  
The last line outputs:

Memory at the narrative may have already as I could be reflected upon. Society in a dozen. Headquarters for the stream. Stream which if we. Iguanodon's hide and admirably the right line of Project Gutenberg-tm electronic. Forgive me down hull-down on every detail which! Until they were. Mantelpiece in front of a necklace and that same scene. Eighteen of a creature beside his white merchant or hope.

Rain must be. Falls the paths which may have opened out we limped homewards sadly mauled and no. AGREE THAT THE CHAIRMAN. Twentieth century and horrible havoc. Train of Crayston Medal for canoes we are the scent and they. Swooped something which had a purpose than the envelope but if this. Entirely sir why this book. All these creatures that on through the rope dangled down the dinosaurs. Exhaust of the vertebrate stage of them and it began amid shouting gesticulating. Worthy of the old volcanic vents and distribute it away and under the prevailing. Assured that they took into line of the cliff. Unloosed a frugal breakfast of it run to the chasm which with. Loosened our thwarted wishes of dark until they took one's heart to win through. Cotton jacket which you can exhibit to gain a refund of hers outlined.

Novelist could have a jacket which communicated with its wing. Colliery explosion and stroking. Town was justified if ever moved upon his remarks had returned to bring. XIII A Sight which the plateau and many small Japanese tray of fairyland. Sadly mauled and flitted about that all turning these lumps of such. Cutting a sentiment. Grieved for us give. Whites our largest I should be confuted if animals then the prospect of the risk? Camberwell tram with a sheet of language is our zareba.

LiterateRandomizer.paragraphs options:
  • :first_word => nil - the first word of the paragraph
  • :words => range or int - number of words in sentence
  • :sentences => range or int - number of sentences/paragraph
  • :punctuation => nil - punction to end the paragraph with
    (nil == randomly selected from punctuation_distribution)
  • :paragraphs => range or int - number of paragraphs
  • :join => "\n\n" - join the paragraphs
    If :join => false, returns an array of the paragraphs