Preventing Automated Website Registrations
[1/11] from: tserpa:earthlink at: 8-May-2002 9:36
We run a sweepstakes on one of our websites. Recently we have been hijacked by a site
that uses a program to automatically register people for the sweepstakes. The lowly
perpetrators are at www.ezsweeps.com - absolute scum.
Anyway, I know Yahoo and PayPal use a technology to prevent automated registrations and
would like to implement this technology on our site. If you haven't seen it, go to Yahoo
and scroll to the bottom of the registration page.
In a nutshell, the program generates an image containing some effects and a word within
the image. The user has to enter the word in a text box. Since computers aren't able
to decipher the word, it's a safe bet that the user is actually a human registering at
the site and not an automated program.
The program (or some variant of it) that Yahoo uses is a free Perl program available
for Linux. It uses the Gimp to generate the images. The source is a free download
from:
http://www.captcha.net/captchas/gimpy/
I thought this would be a perfect program to implement in Rebol. I imagine /View could
be used to generate the images, obviating the need for the Gimp. Also it would be cross-platform,
not just for Linux.
Unfortunately, I don't have the knowledge to do this myself. Can anyone offer some assistance?
Ted
[2/11] from: tserpa:earthlink at: 8-May-2002 10:25
We run a sweepstakes on one of our websites. Recently, we have been hijacked by some
hacks that run a program that automatically registers people for the sweeps. The lowly
scum that does this is at www.ezsweeps.com.
Anyway, I know Yahoo and PayPal use a technology to prevent automated registrations.
If you haven't seen it, go to the Yahoo registration page and scroll to the bottom.
It's called "word verification". Basically it is just a distorted word embedded in an
image. Apparently, humans can read the word but computers cannot decipher it. The word
must be entered in a text box before the registration is accepted.
The program (or some variant of it) that Yahoo uses can be found at the following site:
www.captcha.net/captchas/gimpy/
The source is freely available. It is a Perl program that uses the Gimp to generate
the images.
I thought that this would be perfect to implement in Rebol. I imagine /View could be
used to generate the images, obviating the need for the Gimp. Also the program would
be cross-platform.
I would like to do this, but unfortunately, I don't have enough skill in programming
Rebol. Can anyone help?
Ted
[3/11] from: joel:neely:fedex at: 8-May-2002 14:32
Hi, Ted,
Here's a quick-and-dirty solution that avoids the need for any
on-the-fly image generation (which means it will run faster,
in all likelihood).
Thaddeus Serpa wrote:
> In a nutshell, the program generates an image containing some
> effects and a word within the image. The user has to enter the
> word in a text box. Since computers aren't able to decipher the
> word, it's a safe bet that the user is actually a human
> registering at the site and not an automated program.
>
...
> Unfortunately, I don't have the knowledge to do this myself. Can
> anyone offer some assistance?
>
Some background: A few years back I wanted to add a fancy-looking
hit counter to a page, but without the overhead of the image
processing. I created a separate image file for each digit value
and a blank (let's call them 0.gif, 1.gif, ..., 9.gif, and blank.gif
for the sake of discussion). The script would calculate the number
to be presented (let's say 2413 for example) and would generate HTML
something like the following:
<table cellpadding="0" cellspacing="0" border="0">
<tr>
<td><img src="blank.gif"></td>
<td><img src="blank.gif"></td>
<td><img src="2.gif"></td>
<td><img src="4.gif"></td>
<td><img src="1.gif"></td>
<td><img src="3.gif"></td>
</tr>
</table>
The results appeared essentially the same as a single custom-built
image, but required no server-side graphics code.
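A minimal Python sketch of that counter, assuming the image names from the message (0.gif to 9.gif plus blank.gif) and a fixed width of six cells (the width is an assumption chosen to match the example above):

```python
def counter_html(value: int, width: int = 6) -> str:
    """Render a non-negative integer as a row of digit images.

    Assumes one image per digit (0.gif .. 9.gif) plus blank.gif,
    and left-pads with blank.gif to a fixed number of cells.
    """
    digits = str(value)
    if value < 0 or len(digits) > width:
        raise ValueError("value does not fit in the counter")
    # One image name per cell: padding blanks first, then the digits in order.
    names = ["blank"] * (width - len(digits)) + list(digits)
    cells = "\n".join(f'<td><img src="{n}.gif"></td>' for n in names)
    return (
        '<table cellpadding="0" cellspacing="0" border="0">\n'
        "<tr>\n" + cells + "\n</tr>\n</table>"
    )


print(counter_html(2413))
```

For 2413 this prints the same table as the hand-written HTML above.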
Now, in your case, you could have one (or more) image(s) for each
letter of the alphabet, and simply send a sequence of IMG tags,
perhaps wrapped in a table, to spell out a human-readable message
that a 'bot couldn't read from the HTML alone.
Of course, you'd want to call them something besides a.gif, b.gif,
c.gif, etc. to avoid giving hints to the bad guys. The point of
having multiple images for each letter would be to allow the same
human-readable code to be "spelled" out in many different ways,
again making it harder for some nasty character to figure out how
to write custom code to interpret the html from your site.
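A minimal Python sketch of the multiple-images-per-letter idea. The random hex file names and the `build_alias_table`/`spell_out` helpers are hypothetical; each name would point at a separately drawn (or at least separately named) image of its letter. Only letters and spaces are handled.

```python
import random
import secrets
import string


def build_alias_table(variants_per_letter: int = 4) -> dict[str, list[str]]:
    """Give each letter A-Z several random, meaningless image names."""
    return {
        letter: [secrets.token_hex(8) + ".gif" for _ in range(variants_per_letter)]
        for letter in string.ascii_uppercase
    }


def spell_out(word: str, aliases: dict[str, list[str]], rng: random.Random) -> str:
    """Return IMG tags spelling `word`, choosing a random variant per letter."""
    tags = []
    for ch in word.upper():
        # blank.gif separates words, as in the counter example
        src = "blank.gif" if ch == " " else rng.choice(aliases[ch])
        tags.append(f'<img src="{src}">')
    return "".join(tags)


aliases = build_alias_table()
print(spell_out("he is here", aliases, random.SystemRandom()))
```

The same word comes out with different image names on each request. Blanks could be aliased the same way if the fixed name blank.gif seems too much of a hint.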
Hope this helps!
-jn-
[4/11] from: pwoodward:cncdsl at: 8-May-2002 17:38
Joel -
not a bad idea to eliminate the overhead of image generation. However....
(there's always a however) having used web automation tools like the ones
from Orsus (http://www.orsus.com/) the mechanism for easy image generation
you describe is "easy prey".
For a previous employer I built a cross-registration system for site
members. Essentially their business model was to partner with several
e-commerce sites, and then get their members cross-registered at them. They
wanted it so that their members would only have to enter their user data
once, and it would be stored as a super-set of the data needed to cross
register them at any partner site on demand. We used Orsus's tools to do
this. It took about a day per site.
For an example of a site that uses this type of web automation, check out
www.dealtime.com. They use the Orsus product to perform aggregate searching
across a whole bunch of e-commerce sites. The "buy it" button at dealtime,
actually automates the whole checkout process. In a sense the Orsus product
is a "screen scraper" in that it actually browses to the target web site for
the user of your site. In reality it's a pretty sophisticated piece of
work.
As it retrieves HTML data from sites, it converts the pages (if needed, on the
fly) to XHTML. You can then use XQL (XML Query Language) against the page
data, and in turn apply regular expressions to the results. If you are
familiar with ASP (VB) or JSP (Java), the extracted page data could be handed
back to your calling script as recordset or resultset objects, respectively.
Parsing the image names that encode the numbers out of a page of generated
IMG tags would be about 15 minutes' work with a tool like this. It might take
longer with some of the free screen-scraping Perl libraries - but not much.
In short - always generating an image named "imagecode.png" or something
would be better - especially if the contents of that image are generated on
the fly. That way, the image name stays the same, and gives no clue as to
the content of the image. While generating that image may be
expensive, the effort and computation required to automate interpretation
of that image are more expensive still.
A possible extension to security might be to have the user save that image
to their own hard disk. Instead of using a standard username and password
to log in - maybe use a multipart form, with an upload field. Every time
they want to log in, they upload their image. Again, there would be a
computing and bandwidth cost, but there's also the question of cost from
breached security.
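Porter's upload-to-log-in idea treats the saved image file as a shared secret. A minimal Python sketch, assuming the server keeps only a SHA-256 digest of each issued file; the store and function names are hypothetical.

```python
import hashlib
import hmac

# Hypothetical store: username -> SHA-256 digest of the image issued to them.
# Only the digest is kept, not the image itself.
_issued: dict[str, bytes] = {}


def register_key_image(username: str, image_bytes: bytes) -> None:
    """Remember the image the user was told to save to disk."""
    _issued[username] = hashlib.sha256(image_bytes).digest()


def login_with_upload(username: str, uploaded: bytes) -> bool:
    """Compare the uploaded file's digest against the stored one."""
    expected = _issued.get(username)
    if expected is None:
        return False
    return hmac.compare_digest(expected, hashlib.sha256(uploaded).digest())
```

Because the image is generated on the fly, each user's file differs, so the digest works like a long random password that happens to be stored as a picture.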
- Porter
[5/11] from: ingo::2b1::de at: 8-May-2002 21:54
Hi Thaddeus,
Thaddeus Serpa wrote:
<..>
> In a nutshell, the program generates an image containing some
> effects and a word within the image. The user has to enter the
> word in a text box.
<..>
Is that what you're looking for?
>> save/png %/tmp/tst.png to-image layout [ text "TEST" ]
In the next Release you should be able to send the png file directly,
without the need to save to disc.
But one important thing to notice: on *nix systems /View needs X11, and
X11 is seldom installed on pure servers.
I hope that helps,
kind regards,
Ingo
[6/11] from: gchiu:compkarori at: 9-May-2002 12:18
> Here's a quick-and-dirty solution that avoids the need for any
> on-the-fly image generation (which means it will run faster,
> in all likelihood).
How about creating a JavaScript function that pops up a
number in a new window? The function names and the submit
button could be randomized as well.
A bot would then need a client tool that can execute JavaScript
as well, and I'm not aware of any free ones.
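A minimal Python sketch of the randomized-names idea, generating a fresh function name and submit-button name on every request. All identifiers, the popup size, and the returned `names` dict (which the server would remember to know which button is the real one) are hypothetical.

```python
import secrets


def random_identifier(prefix: str = "f") -> str:
    """A fresh, meaningless JavaScript identifier such as 'f3a9c0e1b2d4'."""
    return prefix + secrets.token_hex(6)


def popup_snippet(number: int) -> tuple[str, dict[str, str]]:
    """Build a page fragment whose function and button names change per request."""
    names = {"func": random_identifier("f"), "button": random_identifier("b")}
    html = (
        "<script>\n"
        f"function {names['func']}() {{\n"
        "  var w = window.open('', '', 'width=200,height=100');\n"
        f"  w.document.write('{number}');\n"
        "}\n"
        "</script>\n"
        f'<a href="javascript:{names["func"]}()">Show code</a>\n'
        f'<input type="submit" name="{names["button"]}" value="Register">'
    )
    return html, names
```

The number still appears in the page source, so this helps only insofar as the changing names make a purpose-built scraper harder to write.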
--
Graham Chiu
[7/11] from: tserpa:earthlink at: 9-May-2002 0:25
Hi Joel,
Thanks for the suggestion - I think it has potential. What do you think of
Porter's comments? Perhaps I could keep the image content the same and
rename the files on the fly. Do you think copying, renaming, and deleting
a file would cost more than dynamically generating an image?
Also, what is the point of blank.gif? Am I missing something obvious?
Ted
At 02:32 PM 5/8/2002, you wrote:
[8/11] from: rebolek:seznam:cz at: 9-May-2002 9:30
Hi, try the following code. It's not perfect - it was done this morning. If
you're interested, I can enhance it somehow.
BTW - bug in new View (and in Core too, I think, but I haven't tried it):
>> help system/words
SYSTEM/WORDS is an object of value:
** Script Error: form-val is missing its val argument
** Where: reform
** Near: form-val pick vals 1
Well, no crash, I have to find something more dangerous ;)
And now - automatically generated unreadable words (picked from about 2273 of them):
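REBOL []

random/seed now         ; vary the random sequence between runs

; pick a random word from the words defined in system/words
words: first system/words
word: pick words random length? words

; face that renders the word as black text on a random gradient
text-face: make face [
    color: 128.128.128 + random 128.128.128
    font: make font [
        size: 24
        color: black
    ]
    size: 10000x10      ; very wide, so SIZE-TEXT measures a single line
    text: form word     ; face text should be a string, not a word
    size: 50x10 + size-text self
    effect: compose/deep [
        gradmul (3x3 - random 3x3) (50.50.50 + random 200.200.100)
        (100.50.50 + random 100.150.150)
    ]
]
text-image: to-image text-face

; second face distorts the rendered text: blur, random crop, stretch,
; contrast, a random red line, then another random gradient
effect-face: make face [
    size: text-face/size
    image: text-image
    effect: compose/deep [
        blur crop (random 5x5) (size - random 10x10) fit contrast 10 sharpen
        ; two separate parens: one paren would yield only its last value
        draw [pen red line (random 10x10) (random size)]
        gradmul (3x3 - random 3x3) (50.50.50 + random 200.200.100)
        (100.50.50 + random 100.150.150)
    ]
]

view make face [
    size: 400x100
    offset: 100x100
    pane: effect-face
]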
[9/11] from: joel:neely:fedex at: 9-May-2002 9:34
Hi, Ted and Porter,
Ted Serpa wrote:
> Hi Joel,
>
> Thanks for the suggestion - I think it has potential. What do
> you think of Porter's comments? Perhaps, I could keep the image
> content the same and rename the files on the fly. Do you think
> copying, renaming, and deleting a file would cost more than
> dynamically generating an image?
>
I'm sure that embedding human-readable information in a single
graphic is one of the more secure options. See Ingo's comments
for another reason why I was looking for an alternative.
The point of having multiple images for each letter was to increase
the difficulty of setting up any "web scraping" bots. The effort
of building a bot would IMHO be much smaller than the effort of
obtaining the information (by a human being) to identify which of
many obscure image names corresponded to which letters of the
alphabet, especially if there were many images (with random names,
of course) for each letter. AFAICT, they'd have to have some human
actually LOOK at a good-sized sample of pages and the corresponding
html in order to begin compiling the dictionary that the bot would
use.
Your point about copying to create new, previously unused image names
could work well with this scheme, and might be done without having to
make up all new images for each case. With sufficiently many images
for each letter, randomly used, it might be adequate to copy/rename
a few letters every hour.
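A minimal Python sketch of the periodic copy/rename step, assuming the letter images live in one directory and an alias table maps each letter to its current file names (both hypothetical). Each call adds one fresh copy and retires the oldest once a letter has more than `keep` names.

```python
import os
import secrets
import shutil


def rotate_aliases(image_dir: str, aliases: dict[str, list[str]],
                   letter: str, keep: int = 4) -> str:
    """Copy one of `letter`'s images under a fresh random name.

    The new name is appended to the letter's list; once more than `keep`
    names exist, the oldest copy is deleted from disk and from the table.
    """
    names = aliases[letter]
    new_name = secrets.token_hex(8) + ".gif"
    shutil.copyfile(os.path.join(image_dir, names[-1]),
                    os.path.join(image_dir, new_name))
    names.append(new_name)
    while len(names) > keep:
        os.remove(os.path.join(image_dir, names.pop(0)))
    return new_name
```

Run from a scheduler (hourly, say) over a few letters at a time. `keep` should be large enough that pages already sent to browsers don't lose their images before the visitor submits.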
I'm not familiar with the tools that Porter referred to (and the
Orsus web site was not very informative -- maybe I didn't find the
hidden good parts, but saw only a bunch of markety-rah-rah) but I
have done a bit of (legitimate, intra-enterprise) web scraping.
It's not that hard IMHO to throw in a few randomly formatted parts
of a web page that would make constructing a reliable scrapebot
quite difficult.
Remember that you could use the "letter block" images for other
things than just the password -- page and section headings, bolded
text, and pure decoration (there's nothing that says that the
letter blocks have to be big and gaudy; they could just look like
ordinary text). The more you use them on the page, the harder it
would be to build a scrapebot that would be able to figure out which
ones were the ones that mattered.
> Also, what is the point of blank.gif? Am I missing something obvious?
>
In my original use, it let me pad the sequence of digit
images with blanks to fill a fixed-size area on the page. In the
kind of thing we're talking about here, blanks could be used to
separate words, given that
HE IS NOW HERE
would be a distinct passphrase from
HE IS NOWHERE
;-)
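When the visitor's answer is checked, word boundaries have to survive, or the two phrases above would collide. A minimal Python sketch, assuming case and repeated spaces shouldn't matter (that leniency is an assumption, not part of Joel's scheme):

```python
def normalize_answer(text: str) -> str:
    """Upper-case and collapse runs of whitespace to a single space.

    Spaces are kept rather than stripped, so phrases spelled with blank
    images in different places remain different answers.
    """
    return " ".join(text.upper().split())


print(normalize_answer("he is  now here "))  # prints "HE IS NOW HERE"
```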
-jn-
[10/11] from: tserpa:earthlink at: 9-May-2002 10:52
Joel, thanks for the ideas and explanations. They are very helpful. I agree that this
solution can be effective and more easily implemented than some of the more sophisticated
methods, even though it may not be optimal.
I think a further twist could be to make different backgrounds for each of the image
files, both for same and different characters. I was told that OCR filters can be used
to decipher images.
Thanks again,
Ted
[11/11] from: joel:neely:fedex at: 9-May-2002 16:30
Hi, Ted,
Thaddeus Serpa wrote:
> I think a further twist could be to make different backgrounds for
> each of the image files, both for same and different characters.
> I was told that OCR filters can be used to decipher images.
>
Another variation that I forgot to mention earlier was to make the
character images with transparency, then place them in table cells
with different background colors (using the hex codes). A separate
piece of text could tell the visitor to e.g. ignore letters with
red backgrounds.
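A minimal Python sketch of the colored-cell variation, assuming letter images with transparent backgrounds. The `letter_X.gif` names, the color list, and the `decoy_row` helper are hypothetical (in practice the obscured alias names would be used). Decoy letters go in cells with a red background, and the real answer is the remaining letters in order.

```python
import random

DECOY_COLOR = "#FF0000"  # visitors are told to ignore letters on red
OTHER_COLORS = ["#FFFFCC", "#CCFFCC", "#CCE5FF", "#EEEEEE"]


def decoy_row(answer: str, decoys: int, rng: random.Random) -> str:
    """Spell `answer` in table cells, mixing in red-backed decoy letters."""
    cells = [(ch, rng.choice(OTHER_COLORS)) for ch in answer]
    for _ in range(decoys):
        pos = rng.randint(0, len(cells))  # any position, including the end
        cells.insert(pos, (rng.choice("ABCDEFGHIJKLMNOPQRSTUVWXYZ"), DECOY_COLOR))
    tds = "".join(
        f'<td bgcolor="{color}"><img src="letter_{ch}.gif"></td>'
        for ch, color in cells
    )
    return f"<table><tr>{tds}</tr></table>"


print(decoy_row("KEY", 3, random.Random()))
```

The server still compares the reply with `answer`; only the human has to apply the "ignore red" rule.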
-jn-