Mining juicy words

Monday, March 22nd, 2010

This weekend, I counted all the words on Project Gutenberg. This has been done before, notably here. My script crawled most of the English-language books on Project Gutenberg (about 20,000 titles) and counted how often each word appears and how many books each word appears in. The script ran for about 20 hours. You […]
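
The two tallies described above (how often each word appears, and how many books it appears in) can be sketched in Python roughly as follows. This is a minimal illustration, not the actual script: the function name `count_words`, the tokenizing regex, and the sample texts are all assumptions made for the example, and the crawling step is left out entirely.

```python
import re
from collections import Counter

# Assumed tokenizer: runs of lowercase letters and apostrophes.
# The real script may have split words differently.
WORD_RE = re.compile(r"[a-z']+")

def count_words(books):
    """Return (total_counts, book_counts) for an iterable of book texts."""
    total_counts = Counter()  # how often each word appears overall
    book_counts = Counter()   # how many books each word appears in
    for text in books:
        words = WORD_RE.findall(text.lower())
        total_counts.update(words)
        # A set keeps one copy of each word, so a book adds at most 1 per word.
        book_counts.update(set(words))
    return total_counts, book_counts

# Hypothetical stand-ins for downloaded book texts.
sample = ["The cat sat. The cat ran.", "A dog sat."]
totals, per_book = count_words(sample)
```

Keeping the two counters separate makes it possible to tell a word that is common everywhere from one that is repeated heavily in only a few books.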