Preventing Automated Website Registrations

[1/11] from: tserpa:earthlink at: 8-May-2002 9:36

We run a sweepstakes on one of our websites. Recently we have been hijacked by a site that uses a program to automatically register people for the sweepstakes. The lowly perpetrators are at www.ezsweeps.com - absolute scum.

Anyway, I know Yahoo and PayPal use a technology to prevent automated registrations and would like to implement this technology on our site. If you haven't seen it, go to Yahoo and scroll to the bottom of the registration page.

In a nutshell, the program generates an image containing some effects and a word within the image. The user has to enter the word in a text box. Since computers aren't able to decipher the word, it's a safe bet that the user is actually a human registering at the site and not an automated program.

The program (or some variant of) Yahoo uses is a free Perl program that is available for Linux. It makes use of the Gimp to generate the images. The source is a free download from: http://www.captcha.net/captchas/gimpy/

I thought this would be a perfect program to implement in Rebol. I imagine /View could be used to generate the images, obviating the need for the Gimp. Also it would be cross-platform, not just for Linux.

Unfortunately, I don't have the knowledge to do this myself. Can anyone offer some assistance?

Ted

[2/11] from: tserpa:earthlink at: 8-May-2002 10:25

We run a sweepstakes on one of our websites. Recently, we have been hijacked by some hacks that run a program that automatically registers people for the sweeps. The lowly scum that does this is at www.ezsweeps.com.

Anyway, I know Yahoo and PayPal use a technology to prevent automated registrations. If you haven't seen it, go to the Yahoo registration page and scroll to the bottom. It's called "word verification". Basically it is just a distorted word embedded in an image. Apparently, humans can read the word but computers cannot decipher it. The word must be entered in a text box before the registration is accepted.

The program (or some variant of it) that Yahoo uses can be found at the following site: www.captcha.net/captchas/gimpy/ The source is freely available. It is a Perl program that uses the Gimp to generate the images.

I thought that this would be perfect to implement in Rebol. I imagine /View could be used to generate the images, obviating the need for the Gimp. Also the program would be cross-platform.

I would like to do this, but unfortunately, I don't have enough skill in programming Rebol. Can anyone help?

Ted

[3/11] from: joel:neely:fedex at: 8-May-2002 14:32

Hi, Ted,

Here's a quick-and-dirty solution that avoids the need for any on-the-fly image generation (which means it will run faster, in all likelihood).

Thaddeus Serpa wrote:

> In a nutshell, the program generates an image containing some
> effects and a word within the image. The user has to enter the
> word in a text box. Since computers aren't able to decipher the
> word, it's a safe bet that the user is actually a human
> registering at the site and not an automated program.

...

> Unfortunately, I don't have the knowledge to do this myself. Can
> anyone offer some assistance?

Some background: A few years back I wanted to add a fancy-looking hit counter to a page, but without the overhead of the image processing. I created a separate image file for each digit value and a blank (let's call them 0.gif, 1.gif, ... 9.gif, and blank.gif for the sake of discussion). The script would calculate the number to be presented (let's say 2413 for example) and would generate html something like the following:

    <table cellpadding="0" cellspacing="0" border="0">
      <tr>
        <td><img src="blank.gif"></td>
        <td><img src="blank.gif"></td>
        <td><img src="2.gif"></td>
        <td><img src="4.gif"></td>
        <td><img src="1.gif"></td>
        <td><img src="3.gif"></td>
      </tr>
    </table>

The results appeared essentially the same as a single custom-built image, but required no server-side graphics code.

Now, in your case, you could have one (or more) image(s) for each letter of the alphabet, and simply send a sequence of IMG tags, perhaps wrapped in a table, to spell out a human-readable message which wouldn't be visible to a 'bot. Of course, you'd want to call them something besides a.gif, b.gif, c.gif, etc. to avoid giving hints to the bad guys.

The point of having multiple images for each letter would be to allow the same human-readable code to be "spelled" out in many different ways, again making it harder for some nasty character to figure out how to write custom code to interpret the html from your site.

Hope this helps!

-jn-
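A rough sketch of the multiple-images-per-letter idea in REBOL, for illustration only. The `letter-images` table, the file names in it, and the function name `spell-code` are all made up here; a real table would cover the whole alphabet, and the names would be whatever obscure names the site owner picks.

```rebol
REBOL [Title: "Letter-image challenge (sketch)"]

random/seed now   ; vary the choices between runs

; Hypothetical table: each letter maps to several obscurely named images.
letter-images: [
    #"A" ["q7x.gif" "m2k.gif"]
    #"B" ["z9p.gif" "r4t.gif"]
    #"C" ["k1w.gif" "h8v.gif"]
]

; Return a one-row HTML table of IMG tags that spells out CODE,
; picking one of the available images for each letter at random.
spell-code: func [code [string!] /local html files] [
    html: copy {<table cellpadding="0" cellspacing="0" border="0"><tr>}
    foreach ch code [
        files: select letter-images ch
        ; letters missing from the table are silently skipped
        if files [
            append html rejoin [
                {<td><img src="} pick files random length? files {"></td>}
            ]
        ]
    ]
    append html {</tr></table>}
    html
]

print spell-code "CAB"
```

Calling `spell-code "CAB"` twice will usually give different image names for the same visible word, which is the whole point of keeping several images per letter.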

[4/11] from: pwoodward:cncdsl at: 8-May-2002 17:38

Joel - not a bad idea to eliminate the overhead of image generation. However.... (there's always a however) having used web automation tools like the ones from Orsus (http://www.orsus.com/) the mechanism for easy image generation you describe is "easy prey".

For a previous employer I built a cross-registration system for site members. Essentially their business model was to partner with several e-commerce sites, and then get their members cross-registered at them. They wanted it so that their members would only have to enter their user data once, and it would be stored as a super-set of the data needed to cross-register them at any partner site on demand. We used Orsus's tools to do this. It took about a day per site.

For an example of a site that uses this type of web automation, check out www.dealtime.com. They use the Orsus product to perform aggregate searching across a whole bunch of e-commerce sites. The "buy it" button at dealtime actually automates the whole checkout process.

In a sense the Orsus product is a "screen scraper" in that it actually browses to the target web site for the user of your site. In reality it's a pretty sophisticated piece of work. As it retrieves HTML data from sites, it converts them (if needed, on the fly) to XHTML. You can then use XQL (XML Query Language) against the page data. In turn, you can use regular expressions on that data. If you are familiar with ASP (VB) or JSP (Java), results of page data could be handed back to your calling script as recordset or resultset objects, respectively.

Parsing for the named numbers of a set of generated images would be the work of about 15 minutes with a tool like this. It might take longer with some of the free screen scraping Perl libraries - but not much.

In short - always generating an image named "imagecode.png" or something would be better - especially if the contents of that image are generated on the fly. That way, the image name stays the same, and gives no clue as to the content of the image. While the cost of generating that image may be expensive - the effort and computation required to automate interpretation of that image is more expensive still.

A possible extension to security might be to have the user save that image to their own hard disk. Instead of using a standard username and password to login - maybe use a multipart form, with an upload field... Every time they want to login, they upload their image... Again, there would be a computing and bandwidth cost, but there's also the question of cost from breached security.

- Porter

[5/11] from: ingo::2b1::de at: 8-May-2002 21:54

Hi Thaddeus,

Thaddeus Serpa wrote:
<..>

> In a nutshell, the program generates an image containing some
> effects and a word within the image. The user has to enter the
> word in a text box.

<..> Is that what you're looking for?

>> save/png %/tmp/tst.png to-image layout [ text "TEST" ]

In the next release you should be able to send the png file directly, without the need to save to disc.

But one important thing to notice: on *nix systems View needs X11, and this is seldom installed on pure servers.

I hope that helps,

kind regards,

Ingo

[6/11] from: gchiu:compkarori at: 9-May-2002 12:18

> Here's a quick-and-dirty solution that avoids the need for any
> on-the-fly image generation (which means it will run faster,
> in all likelihood).

How about creating a JavaScript function that pops up a number in a new window? The function names, and the submit button, could be randomized as well.

This would require a client tool that can execute JavaScript as well .... I'm not aware of any free ones.

--
Graham Chiu

[7/11] from: tserpa:earthlink at: 9-May-2002 0:25

Hi Joel,

Thanks for the suggestion - I think it has potential. What do you think of Porter's comments? Perhaps, I could keep the image content the same and rename the files on the fly. Do you think copying, renaming, and deleting a file would cost more than dynamically generating an image?

Also, what is the point of blank.gif? Am I missing something obvious?

Ted

At 02:32 PM 5/8/2002, you wrote:

[8/11] from: rebolek:seznam:cz at: 9-May-2002 9:30

Hi,

try the following code. It's not perfect - it was done this morning. If you're interested I can enhance it somehow.

BTW - bug in new View: (and in Core too I think, but I did not try)

>> help system/words

SYSTEM/WORDS is an object of value:
** Script Error: form-val is missing its val argument
** Where: reform
** Near: form-val pick vals 1

Well, no crash, I have to find something more dangerous ;)

And now --- automatically unreadable words (cca 2273 of them)

    rebol[]

    ; pick a random word from the system dictionary
    words: first system/words
    word: pick words random length? words

    ; render the word as plain text on a randomly tinted background
    text-face: make face [
        color: 128.128.128 + random 128.128.128
        font: make font [
            size: 24
            color: black
        ]
        size: 10000x10
        text: word
        size: 50x10 + size-text self
        effect: compose/deep [
            gradmul (3x3 - random 3x3)
                (50.50.50 + random 200.200.100)
                (100.50.50 + random 100.150.150)
        ]
    ]
    text-image: to-image text-face

    ; distort the rendered image: blur, random crop, stray line, gradient
    effect-face: make face [
        size: text-face/size
        image: text-image
        effect: compose/deep [
            blur
            crop (random 5x5) (size - random 10x10)
            fit
            contrast 10
            sharpen
            ; two separate parens so that LINE gets both end points
            draw [pen red line (random 10x10) (random size)]
            gradmul (3x3 - random 3x3)
                (50.50.50 + random 200.200.100)
                (100.50.50 + random 100.150.150)
        ]
    ]

    view make face [
        size: 400x100
        offset: 100x100
        pane: effect-face
    ]

[9/11] from: joel:neely:fedex at: 9-May-2002 9:34

Hi, Ted and Porter,

Ted Serpa wrote:

> Hi Joel,
>
> Thanks for the suggestion - I think it has potential. What do
> you think of Porter's comments? Perhaps, I could keep the image
> content the same and rename the files on the fly. Do you think
> copying, renaming, and deleting a file would cost more than
> dynamically generating an image?

I'm sure that embedding human-readable information in a single graphic is one of the more secure options. See Ingo's comments for another reason why I was looking for an alternative.

The point of having multiple images for each letter was to increase the difficulty of setting up any "web scraping" bots. The effort of building a bot would IMHO be much smaller than the effort of obtaining the information (by a human being) to identify which of many obscure image names corresponded to which letters of the alphabet, especially if there were many images (with random names, of course) for each letter. AFAICT, they'd have to have some human actually LOOK at a good-sized sample of pages and the corresponding html in order to begin compiling the dictionary that the bot would use.

Your point about copying to create new, previously unused image names could work well with this scheme, and might be done without having to make up all new images for each case. With sufficiently many images for each letter, randomly used, it might be adequate to copy/rename a few letters every hour.

I'm not familiar with the tools that Porter referred to (and the Orsus web site was not very informative -- maybe I didn't find the hidden good parts, but saw only a bunch of markety-rah-rah) but I have done a bit of (legitimate, intra-enterprise) web scraping. It's not that hard IMHO to throw in a few randomly formatted parts of a web page that would make constructing a reliable scrapebot quite difficult.

Remember that you could use the "letter block" images for other things than just the password -- page and section headings, bolded text, and pure decoration (there's nothing that says that the letter blocks have to be big and gaudy; they could just look like ordinary text). The more you use them on the page, the harder it would be to build a scrapebot that would be able to figure out which ones were the ones that mattered.
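The copy/rename step mentioned above might look something like the following in REBOL. This is only a sketch: the function names `random-name` and `clone-image`, the 8-letter name length, and the source file name in the usage line are all assumptions, not anything from a real site.

```rebol
REBOL [Title: "Rotate letter-image names (sketch)"]

random/seed now   ; vary the names between runs

; Return a random 8-letter lowercase .gif file name.
random-name: func [/local name] [
    name: copy ""
    loop 8 [append name to-char 96 + random 26]   ; 97..122 = a..z
    to-file join name ".gif"
]

; Copy an existing letter image to a fresh, previously unused name
; and return that name so the letter-to-file table can be updated.
clone-image: func [source [file!] /local target] [
    until [not exists? target: random-name]
    write/binary target read/binary source
    target
]
```

Something like `new-name: clone-image %letter-a-master.gif` (a hypothetical file) run for a few letters every hour would give the rotation described above; the old copies would still have to be deleted once no outstanding page refers to them.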

> Also, what is the point of blank.gif? Am I missing something obvious?

In my original use, it allowed me to blank pad the sequence of digit images with blanks to fill a fixed-size area on the page.

In the kind of thing we're talking about here, blanks could be used to separate words, given that

    HE IS NOW HERE

would be a distinct passphrase from

    HE IS NOWHERE

;-)

-jn-
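For what it's worth, the blank padding from the hit-counter example could be sketched in REBOL roughly as below. The helper name `pad-left` and the field width of 6 are assumptions chosen to match the 2413 example earlier in the thread.

```rebol
REBOL [Title: "Blank-padded digit images (sketch)"]

; Right-align TEXT in a field of WIDTH characters by padding
; with spaces on the left. The input string is not modified.
pad-left: func [text [string!] width [integer!] /local out] [
    out: copy text
    while [width > length? out] [insert out " "]
    out
]

; Emit one table cell per character: blank.gif for a space,
; otherwise the image named after the digit itself.
foreach ch pad-left "2413" 6 [
    print either ch = #" " [
        {<td><img src="blank.gif"></td>}
    ][
        rejoin [{<td><img src="} ch {.gif"></td>}]
    ]
]
```

The same loop handles blanks inside a passphrase, since any space in the string simply becomes a blank.gif cell.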

[10/11] from: tserpa:earthlink at: 9-May-2002 10:52

Joel, thanks for the ideas and explanations. They are very helpful. I agree that this solution can be effective and more easily implemented than some of the more sophisticated methods, even though it may not be optimal.

I think a further twist could be to make different backgrounds for each of the image files, both for same and different characters. I was told that OCR filters can be used to decipher images.

Thanks again,

Ted

[11/11] from: joel:neely:fedex at: 9-May-2002 16:30

Hi, Ted,

Thaddeus Serpa wrote:

> I think a further twist could be to make different backgrounds for
> each of the image files, both for same and different characters.
> I was told that OCR filters can be used to decipher images.

Another variation that I forgot to mention earlier was to make the character images with transparency, then place them in table cells with different background colors (using the hex codes). A separate piece of text could tell the visitor to e.g. ignore letters with red backgrounds.

-jn-
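A sketch of that background-color variation in REBOL, under several assumptions: the color values, the word "REBOL" as the code, the names `cell` and `challenge`, and the plain `X.gif` image names are all placeholders (in practice the images would have obscure names as discussed earlier in the thread).

```rebol
REBOL [Title: "Decoy letters on red backgrounds (sketch)"]

random/seed now

decoy-color: "#FF0000"                       ; visitor is told to ignore red
safe-colors: ["#00AA00" "#0000FF" "#FFCC00"] ; backgrounds for real letters
alphabet: "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

; One table cell holding a (transparent) letter image on COLOR.
cell: func [ch [char!] color [string!]] [
    rejoin [{<td bgcolor="} color {"><img src="} ch {.gif"></td>}]
]

; Spell CODE on safe backgrounds, slipping a random decoy letter
; on a red background in front of roughly half of the real letters.
challenge: func [code [string!] /local html decoy color] [
    html: copy "<table><tr>"
    foreach ch code [
        if 1 = random 2 [
            decoy: pick alphabet random length? alphabet
            append html cell decoy decoy-color
        ]
        color: pick safe-colors random length? safe-colors
        append html cell ch color
    ]
    append html "</tr></table>"
    html
]

print challenge "REBOL"
```

A bot that only reads the image names would see the decoys mixed in with the real letters; it would also have to parse the bgcolor attribute and know which color means "ignore", and that rule can be changed at will.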