
Five Points

April 1st, 2010

For the last few days, I’ve been obsessing over the island of Manhattan (which I’ll be visiting briefly in May) and Google Street View. I’m currently working on a few Street View-related hacking projects in my spare time. Here’s one of them.

I wrote some scripts that did an exhaustive search of the Street View information for Manhattan, and found that there are exactly 10 intersections which Street View treats as 5-way intersections on the island – in reality there are more 5-way intersections, but the ones shown here are the ones Google Street View knows about. The rest of the intersections are 4-way or smaller. I call this image “Five Points” to recall the notorious 19th century intersection depicted in Gangs of New York.

Five Points

These panoramic images were taken by the Google Street View camera. I’ve also constructed some interesting high-speed time lapse movies by using sequences of these photos that follow the car. You’ll find some others, made by other people, here.

Krazydad, circa 1996.

March 26th, 2010

My nephew Ben, a budding computer programmer, found this old picture of me in a book about computer programmers. The photo has the following caption:

“Sometimes programmers play computer games when they need a break from their work projects.”

What the author perhaps didn’t realize (the photograph was obtained from the Corbis stock photography archive) was that the “game” on my screen was my work project…

Benjamin tells me his friends told him “Your uncle’s a creep!” Sigh…

Mining juicy words

March 22nd, 2010

This weekend, I counted all the words on Project Gutenberg. This has been done before, notably, here. My script crawled most of the English language books on Project Gutenberg (about 20,000 titles), and counted how often each word appears, and how many books each word appears in. The script ran for about 20 hours.

You can download the resulting list, which contains over a million words, here. Each line shows how many books each word appears in. A second list, which shows how many times each word occurs in total, can be downloaded here.

I prefer the list that shows the number of books each word appears in. It has the effect of pushing down words which appear a lot in only a small number of books, such as the names of fictional characters.
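As a rough illustration of the difference between the two lists (this is not the script that produced them), the sketch below keeps both tallies side by side. The tokenizer is an assumption: it lowercases the text and treats any run of the letters a-z as a word, which is cruder than a real crawl would need. The names `count_words`, `total_counts` and `book_counts` are hypothetical.

```python
import re
from collections import Counter

# Assumption: a "word" is any run of lowercase ASCII letters.
WORD_RE = re.compile(r"[a-z]+")


def count_words(books):
    """Tally words over an iterable of book texts (one string per book).

    Returns (total_counts, book_counts):
      total_counts[w] = how many times w occurs across all books
      book_counts[w]  = how many distinct books contain w
    """
    total_counts = Counter()
    book_counts = Counter()
    for text in books:
        words = WORD_RE.findall(text.lower())
        total_counts.update(words)
        # A set makes each word count at most once per book.
        book_counts.update(set(words))
    return total_counts, book_counts


if __name__ == "__main__":
    sample = [
        "Ahab chased the whale. Ahab raged at the whale.",
        "The cat sat by the fire.",
        "A whale of a tale, by the fire.",
    ]
    totals, per_book = count_words(sample)
    for word in ("ahab", "the", "whale"):
        print(word, totals[word], per_book[word])
```

In the sample, a character name like "ahab" occurs twice but in only one book, so it sinks in the book-count ranking while staying high in the total-count ranking.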

I compiled these lists because I wanted to make some word puzzles. There are a lot of free lexicons, or word lists, out there, such as the ENABLE lexicon, which is commonly used for Scrabble-like games. However, for the purposes of making crosswords, word searches, and other puzzles, it’s very helpful to restrict the words to more commonly used ones, and to know how common each word is.

A word’s popularity number is a good guide to how ‘juicy’ the word is, that is, how appropriate it is for a word puzzle. For example, in my book-count list, the words at the very top are quite boring structure words.

18374 by
18054 and
18023 the
17994 of
17963 a
17955 to
17946 in
17916 from
17912 with
17909 for

As we head towards 10,000, we encounter most of the common bread and butter words. These are also kind of boring.

15095 case
15094 none
15091 taking
15070 seem
15060 able

10776 buried
10771 report
10767 asking
10767 clean
10764 occurred

As we head from 10,000 to 200, the words get increasingly interesting.

9781 plainly
9781 flat
9779 proofreading
9777 passion
9775 approaching
.
.
.
5999 commanding
5998 channel
5997 translated
5996 metal
5996 sixth
.
.
.
1999 conflicts
1999 spider
1999 bleed
1999 discrimination
1998 lends
.
.
.
599 studs
599 niggardly
599 symbolized
599 engraven
599 palliate

There is a sweet spot with a lot of very juicy, but still familiar, words in the 300s. If I were selecting words for puzzle construction, this is the area I would favor. After the 300s, the words start to get increasingly obscure.

359 pajamas
359 dressings
359 thievish
359 anatomist
359 ticks
.
.
.
200 darkish
200 acclimated
200 unfriendliness
200 moveth
200 undiscoverable

At the 200 mark, we’ve only covered about 38,000 words. There are 1,236,759 words in the list in total, so we are still at the top of a very long tail! Below 200, words get increasingly obscure, archaic, misspelled and foreign. We also hit a lot of proper nouns. Still, there are a few legit but rarely used words mixed in.

99 tingeing
99 marshmallows
99 somethings
99 feelest
99 petrify
.
.
.
50 anim
50 makeweight
50 godard
50 seraglios
50 vun
.
.
.
25 admiralties
25 vanni
25 senescent
25 futrelle
25 erechtheum
.
.
.
10 foretime
10 chargee
10 cabinetmaking
10 pneumonias
10 olivo
.
.
.
5 guisers
5 hairing
5 hipless
5 turms
5 arpasia
.
.
.
1 raskolink
1 baetan
1 succories
1 denudement
1 trotudas

UPDATE, March 23rd:

I measured the average book-count of the words in all the New York Times crossword puzzles since 1997 (their online archive goes back to about 1996). For each puzzle, I averaged the book-counts of the words that appear in my list (typically, about 80%-90% of the words in each puzzle). For most years, the average book-count falls between 2,008 and 2,180, and from year to year, the results can be surprisingly consistent.

Here are my averages:

1997 2070.93
1998 2154.20
1999 2113.24
2000 2180.20
2001 2131.94
2002 2141.65
2003 2115.29
2004 2114.60
2005 2034.01
2006 2026.20
2007 2035.31
2008 2033.90
2009 2008.76

There appears to be a marked shift towards more obscure words between 2004 and 2005, when the average drops from about 2,115 to about 2,034.
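The per-puzzle measurement described above can be sketched as follows. This is an illustration under assumptions, not the original script: `book_counts` stands in for the book-count list loaded as a dict, the numbers in it are taken from the excerpts earlier in the post, and the sample "puzzle" is made up. Words missing from the list are left out of the average and reported as a coverage fraction instead.

```python
from statistics import mean


def puzzle_average(puzzle_words, book_counts):
    """Average the book-counts of the puzzle words that appear in the list.

    Returns (average, coverage), where coverage is the fraction of puzzle
    words found in book_counts. Returns (None, 0.0) if none are found.
    """
    found = [book_counts[w] for w in puzzle_words if w in book_counts]
    if not found:
        return None, 0.0
    return mean(found), len(found) / len(puzzle_words)


# Counts copied from the excerpts above; the puzzle itself is invented.
book_counts = {"passion": 9777, "metal": 5996, "spider": 1999, "pajamas": 359}
puzzle = ["passion", "pajamas", "spider", "oreo"]

average, coverage = puzzle_average(puzzle, book_counts)
print(average, coverage)
```

A yearly figure like the ones in the table would then come from combining the per-puzzle averages for that year; exactly how they were combined is not stated here, so that step is left out.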

Interestingly, there are only about a thousand words that fall in that NYT-Crossword sweet spot. Here they are:

pondering squarely pregnant paws scold cordiality cooler venturing variance hypothesis forefinger economic untimely dubious shepherds secular minimum pallor degrading fastidious desertion foretold heath discourage wintry wrenched peas raiment pensive reproof ankle flattened moore fisherman peninsula beholding identification wheeling maine unhappiness richmond frantically enhanced gorge extremities joyously stronghold hissed nut bowels repressed lending feasts cavern unfold memoirs onto invade ark structures forbids liver correctness abashed stumble clerical orchestra terrifying enchantment incomparable collapsed paler ballad recalls slack restraining motley rippling circled ardor lambs flapping shrug prettily avarice aforesaid educate glorified acquiescence acquitted dungeon blasted objective persuading fray forts statistics gathers levelled moderately splashed mirrors infected vacancy furs mates grating precipitate confiding ton grazing dispositions partnership momentarily framework attorney regulating fathom nimble ravages surpassing quieted hitting sustaining practiced darkening walled withdrawal unawares exceptionally howard fiend queens horseman dictates quarry waged coral pleasanter badge assurances subsistence italians manned alphabet bower reposed preachers variously anticipating arabian melodious slate hourly bled dejected dreamt discordant stormed purchasing sap unreal parlour dam couples humblest postpone butterflies chaps yells paw freeze forfeit eclipse advertisements dozens quitting romances uphold drunkenness agonies guinea forge tearful twig dispatched windy tidy bitterest dogged wastes disconcerted irritable tunnel contentedly backing uniforms gunpowder mineral pigeons repel pail territories ransom stab draped redemption individually medicines azure bony scissors ma invariable supplement repulsed entreaty capitals forbearance adviser unavoidable raining enlighten holiness countenances untold coil mutilated dancers thankfulness buzzing armor 
spoiling narrower adhere ardently undergoing indomitable devoting friction thrive ravine diverse floats hazy twain aspire visage quarrelling womanly shields initiative disappointments elaborately civility disobedience splashing festivities disasters bustling vicissitudes monopoly helen raid marshes fitful consigned illustrates apprehensive conscientiously fabulous colleagues profited wharf grievances countryman laurels diversity monastery target pounded conspicuously myriads hostilities atrocious vase overturned redoubled mountainous swallowing layer adherents sparing parchment trampling imaginations laughingly fictitious jet widows picnic prospective valour absorb yankee chocolate courtiers canoe chasm biscuit stairway jars adjustment ancestral roving catholics psychological milder adapt woollen loathsome rowing barracks signing banker grunted slumbers garret midsummer ignoble savings substantially resuming fostered mane prophesied forfeited swan loosed fortifications gloriously vouchsafed oratory jovial crescent stinging stamps commissions lanterns caresses merest universities insurance draughts surmise rebuked valid barbarians revolted humbled emerald contradicted halfway marvels excel nervousness pier stall illustrating grades surly utensils chagrin colouring murders northwest widening pitiable keener kent devilish conventions carving studded mat dwarf weights youngster compels resounded dispatch fried completeness dismissal undecided aiding dimmed plied illumined extensively needing graphic embroidery glimmering sash sauntered sniffed grasses pitcher rapt unerring offences exiled sucked raced fig streaks halo religions rhetoric advising fraught canadian hampered riders profile incur excellency benediction gregory particles diminutive chemistry infants lounging knocks elated mien propped reverent antagonism wade exhaust unduly needy girlish hoarsely mortified hercules initials scar flowery reproduction absorption excelled stains facilitate modify slap grounded 
wig lavished magnified agility hugh sponge irishman cultivating stalked fumes metals arena augmented enjoined fibre flushing biscuits attends nick soaring follower boom surest rhine proclaiming snatching paramount alluring clambered loom poultry intoxication slaughtered perplexing impaired sleek patting conceited squirrel inventor notably swells ripened click ethics fairies adventurer summoning vocal jove scolded dwellers uniformity sarah prairie capacities unfriendly uttermost hens gear penance unbearable sewed legion disposing mistook prestige organic unparalleled invaders laboriously trench steeped distraction dipping groped slackened beak salutary summits intrusted inanimate flowering reiterated receding jagged adversity safeguard unacquainted stalks axes alps hip mortality perverse apathy weighs julius witnessing epithet childlike lunatic pretends convict oblivious restlessly yarn offense chests runaway dilapidated unfailing verdure cloudless ferry vista toll prettier unearthly enlist feudal penitent scarf encamped dedication mahogany relinquish residents salmon payments meditations tragedies sufferers concludes arnold smoky altars squadron pursuers sagacious abnormal bernard reeled strangled cherry planets combatants bunches feathered fearlessly therefrom canst precipitated likelihood potato conquests intensified columbus hairy slapped scrupulously immemorial buoyant graver warranted senator excesses invading complimentary turks highness factors vindictive shovel tenderest uncanny augustus propositions detection efficacy artful iniquity emancipation listless indolence lease purified grease unoccupied encounters treasurer hereby narrated revel impetus legislative wailed mexican disappoint impertinence abstraction pulls submissive surged falsely sheriff wilder underwent submitting prisons implicitly treasured sculpture spheres trailed impassioned exacted converts pepper coloring noiseless conflagration relatively maddened precincts versed quartered culprit 
tunes torments birch fairness unsteady terminate offender citadel ado compiled

Mayor of the North Pole

February 15th, 2010

[NOTE: I’ve posted some recent developments at the bottom.]

I’ve been blatantly cheating at foursquare for the past week. I didn’t mean to start the week this way. Most of my friends know me as a responsible father who occasionally plays piano at local open mics, and makes puzzles.

Last Sunday, while checking into the Hill Street Cafe in Burbank using the foursquare iPhone app, I idly wondered, “Can I become the mayor of the North Pole?” So I tried checking into a nearby 7-Eleven. It worked. I tried the Griffith Observatory about 5 miles away. It worked. I tried Disneyland, which is about an hour away. It didn’t work, but I now had an afternoon hacking project.

When I got home, I looked to see if foursquare had an API. They did. So I found a venue that was close to the North Pole, the “Top of the World” hotel in Barrow, Alaska, and checked myself into it.

This can be done on the command line using the curl program, like so:

curl -u EMAIL:PASSWORD -d "vid=993842" http://api.foursquare.com/v1/checkin

Try it! You’ll need to substitute in your own email and password. 993842 is the venue id of the “Top of the World” hotel, as can be seen in the URL of this page:

http://foursquare.com/venue/993842

This venue wasn’t actually in foursquare’s database, so I added it, using the ‘addvenue’ call. I also added a venue for the actual North Pole. It turns out it’s much easier to become the mayor of something if nobody else has ever checked into it.

[ Edit: Some folks have rightly pointed out that you can easily do the same thing with the mobile website (mobile.foursquare.com). For my purposes, as you’ll see in a moment, the API was more efficient… ]

Here’s the North Pole venue I made:

http://foursquare.com/venue/995274

Ultimately, I ended up adding a lot of venues. I used Google Earth to create KML files of interesting venues, and wrote a script to import them all into foursquare. I did the same thing with Yelp. I found that foursquare would rate-limit me if I added them too quickly, so I added them two and a half minutes apart. Later, I found that by rotating among multiple accounts while adding venues, I could add them much more quickly.

At some point last week, I devolved into a 12 year old hacker, and I spent many spare hours (and my computer’s spare cycles) abusing the system with a set of scripts operating fake accounts. Not only did I add new venues like the North Pole, but I started persistently checking into coveted landmarks, like the Statue of Liberty.

What can I say? It was fun, and foursquare’s incentives (badges and mayorships) spurred me on. Incentives invite abuse, even from mild-mannered folks like me.

Eventually I amassed a huge number of mayorships, spread among multiple accounts, including the Statue of Liberty, Mount Rushmore, the Lincoln Memorial, Stonehenge and the Taj Mahal, as can be seen in this screen snapshot.

I wrote a script that would walk through a list of venue ids, and check into them one by one. Then I created about 10 fake foursquare accounts, and had them take over different territories.
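For readers who want to see the shape of such a loop, here is a dry-run sketch that only builds the same request as the curl command above for each venue id and prints it, without sending anything. It is not the original script. The endpoint and the `vid` parameter are the ones quoted in this post; the function names are hypothetical, and the sketch does not assume the v1 endpoint still responds.

```python
import base64
import urllib.parse
import urllib.request

# Endpoint as quoted in the curl example in this post.
CHECKIN_URL = "http://api.foursquare.com/v1/checkin"


def build_checkin_request(email, password, venue_id):
    """Build (but do not send) the POST that `curl -u ... -d vid=...` would make."""
    credentials = f"{email}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    body = urllib.parse.urlencode({"vid": venue_id}).encode("ascii")
    # Supplying data makes urllib treat the request as a POST.
    request = urllib.request.Request(CHECKIN_URL, data=body)
    request.add_header("Authorization", f"Basic {token}")
    return request


def build_checkin_requests(email, password, venue_ids):
    """Walk a list of venue ids and build one request per venue."""
    return [build_checkin_request(email, password, vid) for vid in venue_ids]


# The two venue ids mentioned in this post.
for req in build_checkin_requests("EMAIL", "PASSWORD", [993842, 995274]):
    print(req.get_method(), req.full_url, req.data.decode("ascii"))
```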

I created five “Java Monkeys” which grabbed about 120 different Starbucks in different regions (east, west, midwest, south, intl). I identified and targeted hotly contested Starbucks by searching Twitter for recent oustings. My script automatically visited those, to the consternation of the new mayors.

I created a fake Martha Stewart who checks into dollar stores and pawnshops when not visiting Martha Stewart Omnimedia and the set of her TV Show.

I created a fake Simon Cowell who visits massage parlors and gets lunch at Hotdog on a Stick when not visiting the Kodak theater.

I created a fake Tommy Chong who is mayor of 130 cannabis clinics.

I created a fake Sammy Davis Jr who checks into casinos and bars in Las Vegas.

I created a “random nerd” who checked into a number of large campuses in the Silicon Valley.

The “Java Monkeys” got the biggest reactions. Foursquare users get far more irate when they lose mayorship of a Starbucks, as compared to a Statue of Liberty or Mount Rushmore. People are much more attached to the small places they visit over and over, and have some personal investment in. The smaller the venue, the bigger the value.

I started collecting badges as well, by checking into places that have tags like “karaoke”, “photo booth”, “gym” and so on.

I was able to get a swarm badge by monitoring Twitter for when a particular location got up to 40 check-ins (this happens at a couple of Tokyo train stations quite regularly) and then checking in all my accounts at once to trigger a swarm (which occurs at 50 check-ins). This RSS feed is useful for detecting impending swarms.

Finally, I started giving people free sailboats. I found that if you checked into a venue tagged “boat,” you automatically get the awesome “I’m on a boat” badge; and unlike the other badges, it only requires a single check-in. So I started identifying high-traffic places via the above Twitter search, and then adding the tag “boat”. Suddenly, visitors to metropolitan airports and various sports arenas got free sailboats for Valentine’s Day.

My juvenile crime spree is now over, and I’ve “laundered” my foursquare account, by transferring the credentials to a new one. This URL used to go to the account that stole the Statue of Liberty, but now it goes to a new account, because foursquare allows you to reassign twitter accounts, and constructs the URL using your active twitter account.

This is my original account, which is now inactive.

It seems clear that foursquare is going to have some massive authentication issues to deal with if they are going to grow larger than their current size. Some things to consider:

1) Provide additional measures to detect that people actually are where they say they are. I imagine this is not an easy problem to solve: if I send you a set of coordinates, it doesn’t mean I’m actually there. At a minimum, they can measure the time of travel between successive check-ins by comparing the coordinates and time stamps. If I’m traveling close to the speed of sound, something is clearly up.

2) Make it less easy to create fake accounts. Right now, there’s not even a Captcha.

3) Don’t construct a permanent-looking URL from a twitter account (which can be transferred to a different foursquare account). This provides a method of “laundering” accounts.
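The travel-time check in suggestion 1 can be sketched in a few lines. This is a minimal illustration, not anything foursquare is known to run: it computes the great-circle distance between two successive check-ins with the haversine formula and flags the pair if the implied speed exceeds a limit. The speed-of-sound figure, the function names and the sample coordinates are all approximations or assumptions chosen for the example.

```python
import math
from datetime import datetime

EARTH_RADIUS_KM = 6371.0
SPEED_OF_SOUND_KMH = 1235.0  # approximate value at sea level


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    # Clamp to guard against rounding slightly above 1.0.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def implied_speed_kmh(prev, curr):
    """Speed implied by two check-ins, each a (lat, lon, datetime) tuple."""
    distance = haversine_km(prev[0], prev[1], curr[0], curr[1])
    hours = (curr[2] - prev[2]).total_seconds() / 3600.0
    if hours <= 0:
        # Same instant (or out of order): any movement at all is impossible.
        return math.inf if distance > 0 else 0.0
    return distance / hours


def is_suspicious(prev, curr, limit_kmh=SPEED_OF_SOUND_KMH):
    """True if getting from prev to curr would require exceeding limit_kmh."""
    return implied_speed_kmh(prev, curr) > limit_kmh


# Approximate coordinates, for illustration only.
burbank = (34.18, -118.31, datetime(2010, 2, 7, 12, 0))
griffith = (34.12, -118.30, datetime(2010, 2, 7, 12, 30))
barrow = (71.29, -156.79, datetime(2010, 2, 7, 12, 40))

print(is_suspicious(burbank, griffith))  # False: a few km in half an hour
print(is_suspicious(griffith, barrow))   # True: thousands of km in ten minutes
```

A check like this only catches implausible jumps between check-ins. It does nothing about a client that sends plausible but false coordinates, which is the harder problem described above.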

More generally, I think the combination of a poorly moderated and insecure folksonomy with incentives (e.g. badges, mayorships, free meals, etc.) is a fragile one. The greater the incentives, the greater the motivation for cheating.

As it stands right now, foursquare has quite a few holes. If I were a restaurateur or coffee shop owner, I would be very wary of giving free meals or lattes to foursquare mayors, unless the employees know the mayor by sight.

UPDATE

My story seems to be getting picked up in a few places. Here’s some reaction on Twitter. Mostly positive, I think, although a few foursquare insiders were a bit put out, as one would expect. Dennis Crowley was quite nice about it, thank god.

If I stole your Starbucks, I’m really sorry about it, and I will gladly buy you a latte, if you find me in a Starbucks.

UPDATE #2: My story was covered on TechCrunch this morning. MG Siegler was mostly on-the-money, except for this bit:

The problem, with regard to false check-ins, is that the only solid way to do this is to a check-in to your actual GPS coordinates. The problem with this, as Gowalla knows all-too-well, is that it can be hard (and in some cases impossible) to get GPS data while users are indoors.

Um, not exactly. The problem is that you can’t trust the person who’s sending GPS coordinates to send the correct ones. This is a tough, tough problem, and it will become increasingly obvious as incentives increase.

UPDATE #3: Foursquare founder Dennis Crowley has provided some thoughtful commentary in the comments, below.

UPDATE #4: The LA Times interviewed me and got a few more details…

UPDATE #5: Alison Cummings of the Montreal Social Media Examiner posted this reaction to the whole brouhaha. I’m going to call her “perceptive” because she called TechCrunch’s tone “whiny”. :)

The Griddle

February 4th, 2010

The Griddle is a beautifully designed puzzle site by David Millar. It will especially appeal to more advanced solvers who are bored with the same-old same-old. You’ll find a new puzzle variety, in PDF format, nearly every day, including some interesting variants on Sudoku, Kakuro and Slitherlink.

Check it out!

Two Crossfigure Puzzles

January 12th, 2010


I received some interesting hand-made Crossfigure puzzles from Israeli puzzle constructor Yochanan. He says he learned the technique of making these puzzles when he was “still a fairly young man, about seventy or so,” from L.G. Horsefield.

These numeric crosswords are similar to Kakuro and KenKen puzzles, but have greater variety in the clues. Here are two to start with. I hope to post Yochanan’s complete set at a later date.

Crossfigure Puzzle #35 (pdf)
Crossfigure Puzzle #39 (pdf)

Gumbasia

January 9th, 2010

via Laughing Squid, which eulogized Art Clokey today

Roanoke Times

December 28th, 2009

I was quoted in today’s Roanoke Times, in an article on computer-generated puzzles.

Roanoke Times Link

For more information about algorithmic puzzle construction, check out this article I wrote for MungBeing magazine.

Powers of Ten

November 20th, 2009

An oldie-but-goodie – the film “Powers of Ten” by Charles and Ray Eames.

Incremental Drift on the Riemann Sphere

October 24th, 2009

A Whitney Music Box mathematical mash-up from Daniel Piker. He describes it as follows:

“Take 1 large ‘Whitney Music Box’ and whisk together with ‘Mobius transformations revealed’. Add a sprinkle of ‘Indras Pearls’, the juice of a fresh Riemann Sphere, and stereographically project at 200C until crispy…”

link