Friday, March 15, 2013

Web Form Security: Stopping Spam

Sites are often hacked through poor web form implementation.
This is the first of a multipart series on securing web forms. One of the best ways to approach web form security is to think about the form through the eyes (or the mind) of the attacker. What do they think they can gain from the form? In this first post I'll talk about spam, why it happens, and how you can potentially stop it.

Spam (unsolicited e-mail) has been a problem almost since the beginning of the Internet. It costs companies billions of dollars in wasted bandwidth, in resources such as anti-spam firewalls, hardware spam filters, and e-mail filtering services, and in lost employee time. Since 1995, HTML has included an input tag and web forms for uploading images and files and for supplying other types of input. As web bots and crawlers became more prolific around the beginning of the century, companies and private individuals began turning to HTML forms to reduce the spam they were receiving through e-mail addresses posted on the web (addresses that were actually visible to a browsing visitor).

An e-mail address in plain text on a website is sure to be added to hundreds, if not thousands, of spam e-mail queues.

Initially, having a form that would post to your e-mail account was enough to stop a lot of the spam. But as data mining became much more invasive, form elements containing e-mail addresses were mined specifically for those addresses and the spam continued (e-mail addresses in web forms are available as text to anything that can parse HTML). Then came exploits in which people injected information into web forms and relayed messages through Perl server-side CGI scripts. Once a form was exploited, someone could easily send e-mail from the webserver and have it look like it actually came from the company hosting the form. Most modern web forms are processed by a server-side language such as PHP, ASP, ASP.NET, Ruby, or Python. Even though server-side processing makes it much easier to filter the information coming in through a form, a "spammer" can often beat a form by simply completing it as a person would.

E-mail Addresses
Some web forms still use outdated, non-industry-standard code to submit a message from a website to an e-mail address using the "mailto" option. These forms typically open the default e-mail client (such as Outlook or Mac Mail) when the user selects the submit button. This should be avoided for several reasons:

  1. The e-mail address is visible to anything on the internet.
  2. Some people use webmail and this opens a program that may confuse or irritate them.
  3. This e-mail address can be added to a SPAM list or be set as the recipient of e-mail bounce backs from SPAM or spoofed e-mails.
  4. E-mail addresses gathered from websites are sometimes sold in online mailing lists to people who believe they are receiving a list of targeted addresses when in actuality they are getting mined lists.

Steps to reduce spam from the web


  1. Remove all text-readable e-mail addresses from your website. To check this, open a webpage in a browser and "view source." If you search the code for the @ symbol and find it, make sure it is not in an e-mail address. If you're responsible for programming the site, replace the address with something else. If you're not responsible for maintaining the code, contact the programmer and ask them to create a form for e-mails from the website. If someone needs to contact you they will find a way. If you are a business, do not rely solely on your web form e-mail as your main point of contact.
  2. Provide a working form that can filter your messages from the site. Most web hosting plans come with a server-side language that can be used to filter the messages. This language may be PHP, ASP, ASP.NET, or Perl. Servers that companies install internally to host a website locally also come with these options available by default.
  3. Search the internet for company listings that contain your e-mail address. Sometimes this may include corporate directories, trade publications, web domain registries, and message boards. You can ask them to remove the e-mail address and replace it with a link to the web form.

Reducing other spam

Many of my clients publish their e-mail addresses in print on a variety of materials. More often than not, these addresses go to some mass distribution group on their e-mail server. When a spammer sends a message to such an address, it is routed to more than one person. For every recipient there is a copy in their inbox (or spam folder) on the mail server, not to mention potentially a copy in the distribution group's sent box, or in the group's own inbox if it is set up as an individual mailbox that forwards rather than as a forward-only box. If a person forwards mail to their phone and doesn't use a syncing protocol like IMAP, there may be multiple copies of the e-mail per person as well. Each message usually takes a tiny amount of room, but in greater numbers they can take up a lot of space on a server or a local workstation (IT real estate).
  1. Try to limit the e-mail recipients for addresses in print to only the people who maintain the list. Obviously business cards will need to have an e-mail address, so I'm talking about brochures, flyers, forms, letterheads, envelopes, and advertisements.
  2. Make a group-accessible mailbox for any inbound e-mails rather than distributing them through the mail system. This way any person in the group can delete the e-mails from the single location. Back this up in the event of accidental deletion.
  3. When printed documents are available online in PDF form they can be mined for e-mail addresses just as easily as a web page.
  4. Follow-up e-mails to submissions should come from a no-reply box or something that can be checked for mail submission, but not a distribution group.

Stopping in-bound spam with a web form

When securing a web form, there are a few things to consider.
  1. The person completing the form may not be a person at all. It is possible to submit information to a webserver via the POST and GET methods without using a web browser.
  2. Where is the submission of the form ultimately going? CRM, e-mail only, a database?
  3. If a real person can't complete your form because it is too complicated, there is no point in having the form. They won't use it or, worse, they'll go somewhere else.
  4. If your form relies heavily on client-side filtering (e.g. JavaScript), remember that the scripting language can be disabled. If it is disabled, the filtering may no longer happen. Can they still complete the web form?
  5. Non-filtered web forms are some of the biggest risks to databases and corporate infrastructures.
There are a variety of things that can be done with PHP, JSP, ASP, and ASP.NET (commonly found on Microsoft web servers) that can dramatically reduce the amount of spam you receive from a form.
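
To make the first consideration concrete, here is a minimal sketch of how a program can prepare the same POST a browser would send, with no browser involved. It is written in Python purely for illustration; the handler URL and the field names are hypothetical, and the request is only built, never sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical handler URL and field names, for illustration only.
FORM_HANDLER_URL = "https://example.com/contact-handler"


def build_form_post(fields):
    """Build (but do not send) the kind of POST a browser makes on submit."""
    body = urlencode(fields).encode("utf-8")
    return Request(
        FORM_HANDLER_URL,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


request = build_form_post(
    {"name": "Test User", "email": "test@example.com", "comments": "Hello"}
)
print(request.get_method())  # a request with a body defaults to POST
print(request.data)          # the URL-encoded form fields
```

Nothing in that request proves a person or a browser was involved, which is why the filtering has to happen on the server.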

K.I.S.S.

I was unfamiliar with this phrase, but one of my clients said "KISS... Keep It Simple, Stupid," in response to the bad web application (from a previous vendor) that I've been repairing for them. I always try to set up a form to be simple to use for both the end user and the recipient of the form details. Bad web forms cost companies millions, maybe billions, every year. If people can't complete a form, then sales can be lost, searches can't be made, and potential customers feel the bad service is already starting before they are even customers.

Hack it.

I test the forms heavily to make sure they work. I try to type in incorrect information: I misspell things, omit fields, forget to put the @ symbol in e-mail addresses, and fill the forms with data in the wrong fields. Usually I weigh the feedback from the form to see whether it treats the submitter as someone purposely entering misinformation or as someone entering it in a humanly impossible way. Then I check to see if the form was actually submitted. If it wasn't, then why?

Autofill

I use the browser's auto-fill features to complete the form and submit it, and I check whether auto-fill actually completes every field. Most people do not have a lot of time to fill out forms, so if you present them with the standard fields they're accustomed to, they can complete the process more quickly. Auto-fill works by remembering what was entered under common field names; when those field names are used again, the auto-fill component of the browser (if enabled) presents the user with their past responses.

Don't reinvent the wheel

It's a web form. People are used to completing things in a certain way. Present the information in an intuitive method for the audience. If you have clients from all over the world, don't mandate a state name, or a county. If they're only supposed to be from one country, then you can omit the country field. Use words that translate into other languages easily. To see some of the field names for common web forms check out sites that people will use on a regular basis. Examples include sign-up forms on sites like UPS, postal services, Facebook, Twitter, and LinkedIn. Use your browser to "View Source" on the forms for those pages and see what the fields are named. If you name your field "client-email" it will probably not auto-fill, but if you name it "email" or use the HTML5 field type of "email" then it should work without issue when it comes to using auto-complete or auto-fill.

Avoid client-side language filtering

Because they can be easily disabled, I recommend avoiding languages like JavaScript in the web form. I've seen forms with interactive elements that tell you whether the different components of the form pass a test before they can be submitted. The drawbacks are that not everyone has JavaScript enabled, and that these things sometimes become annoying by removing elements from my message or telling me I've completed something inaccurately before I'm done with my submission. Browsers also implement JavaScript components in different ways. For instance, a company with a policy of using older versions of Internet Explorer on its workstations relies on ActiveX controls for AJAX (the scripting interface for dynamically checking a field with JavaScript without someone submitting the form). These non-signed ActiveX scripting components are disabled by default for security reasons. The people who would use your form to submit their message may not be able to complete it if it relies on JavaScript or AJAX.

If you do decide to use AJAX, remember that the handler for the AJAX is available in the code. Anyone who wants to take over your server may use this as their point of attack. By submitting to the handler directly and bypassing the form altogether they can potentially find weaknesses in your application, server, or code rather quickly if they're using a bot net (group of compromised machines).

With that being said, avoid Flash as a web form. It uses a client-side scripting language based on ECMAScript (the basis for JavaScript), not everyone has it installed, and it doesn't work on all mobile devices. Flash is buggy at best and can easily crash a browser on its own. There is no reason to use Flash as a web form. There are also ways to beat whatever filtering the Flash application does before sending to the server, meaning Flash might actually be the Achilles' heel of your server's safety. Just as someone can see what an AJAX web component does without using their browser, someone can download a Flash decompiler, use a header checker in a browser like Firefox on a Flash web form, use a web debugger like Firebug to watch the information transferred, or open a packet analyzer like Wireshark to see what information is being sent to the server (if you're using a stand-alone Flash application on a DVD).

Server-side language filtering

When you're filtering server-side, be careful what information you give back to a potential attacker. Do not allow them to enter code into the form and then give it back to them in an attempt to have them correct it. It's also good practice not to force someone to review the contents of their message.

PHP (one of my favorites) comes with various functions that take advantage of regular expressions (another language for searching and filtering). Some new programmers who are given the task of hardening a web form may see regular expressions, or RegEx, as a viable option for screening the fields massively. This isn't a bad mentality, but it does depend on what you do with a failed response. Bounce someone for an incorrect field entry and you may lose a client or potential customer. If, for instance, the user enters the information in their own language (e.g. Chinese) and the programmer assumes their own language (e.g. English) and filters for English characters only, then the potential user may be flagged as an exploit attempt: a false positive.

When I check fields I try to make it something a little more obvious. Here are a few things I check for:
  1. Do "name" field submissions contain numbers? (not typically, even for Edward II they use Roman Numerals)
  2. Do "e-mail" fields have "@" symbols and at least one period after the symbol? (a necessity)
  3. Do phone numbers only contain numbers? (What about +, -, '.', x, Ext, or Extension? They might contain any of those.)
  4. Does the address contain a space character? (A necessity)
  5. Is an address really required? (If it's not, don't mark it as such, and don't force someone back to complete it.)
  6. If something is not required but entered, does it still conform? (e.g. the address isn't required, but they filled it with random garbage... they're probably a spammer.)
  7. Do the comments contain URLs? (Maybe a spammer? They might be telling you about a problem on your website.)
  8. Am I expecting BBCode(something usually found in web forums) in the comments? (Probably not, this is more than likely a spammer)
  9. Did they put a space in their name? If they didn't, is it okay to accept information on a first name only basis?
  10. Did they include anything that is obviously an SQL injection attempt? (eg. "; delete from users where 1=1")
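
Several of the checks above can be sketched in a few lines. This is a rough illustration in Python rather than PHP, and the field names (name, email, phone, address, comments), the patterns, and the wording of the messages are all assumptions. The function only reports what looks wrong; what to do with that report is a separate decision. The SQL pattern is a crude spam signal, not a defense, since parameterized queries are what actually protect the database.

```python
import re

# Assumed patterns; tune them for your own audience and form.
PHONE = re.compile(
    r"[0-9+\-.() ]+(?:(?:ext\.?|extension|x)\s*[0-9]+)?", re.IGNORECASE
)
URL = re.compile(r"https?://|www\.", re.IGNORECASE)
BBCODE = re.compile(r"\[/?(?:url|img|b|i)\b[^\]]*\]", re.IGNORECASE)
SQLI = re.compile(r";\s*(?:delete|drop|insert|update)\b", re.IGNORECASE)


def check_fields(form):
    """Return a list of problems found in a dict of submitted strings."""
    problems = []
    name = form.get("name", "").strip()
    email = form.get("email", "").strip()
    phone = form.get("phone", "").strip()
    address = form.get("address", "").strip()
    comments = form.get("comments", "")

    if any(ch.isdigit() for ch in name):
        problems.append("name contains digits")
    if " " not in name:
        problems.append("name has no space")

    # An "@" with at least one period somewhere after it.
    local, at, domain = email.partition("@")
    if not (local and at and "." in domain):
        problems.append("e-mail lacks @ or a period after it")

    # Digits plus + - . ( ) spaces, with an optional extension.
    if phone and not PHONE.fullmatch(phone):
        problems.append("phone has unexpected characters")

    # Address is optional here, but if entered it should look like one.
    if address and " " not in address:
        problems.append("address has no space")

    if URL.search(comments):
        problems.append("comments contain a URL")
    if BBCODE.search(comments):
        problems.append("comments contain BBCode")
    if any(SQLI.search(value) for value in form.values()):
        problems.append("possible SQL injection attempt")
    return problems
```

A name check like this one still lets non-English names through, since it only looks for digits and a space rather than for a particular alphabet.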

Are they real?

This is a huge question in terms of securing a web form. If the attacker is using a program to auto-complete the form to submit Spam, or if they're using a group of computers (bot net) to attack the form and bring down the server, then how can you stop it? Simple. See if they are real.

Typical Captcha

CAPTCHA is a bad thing in many ways: It makes people angry. It's hard to complete. It's not always readable. It's a complete waste of time. Captcha stands for "Completely Automated Public Turing test to tell Computers and Humans Apart." It's basically a quick fix for trying to guess whether someone is human. International users? Avoid Captcha.

Hashing and tests for human skills

Just as CAPTCHA makes an attempt to make people prove they're human, there are a couple of things you can do behind the scenes to see if someone is a real person.
  1. Did they complete the form in a humanly possible time?
    Depending on the information the form requires and the type of information expected, run a few timed tests. Use things like auto-fill and auto-complete to try to beat your human times. In my experience, most bots will complete the form in less than 2 seconds on the first attempt.
  2. Did they complete the form using two different IP addresses?
    Simple... check their IP address and pass it to the handler page. Check it on the next page for a change. Oh, but what if they modify the IP address you're passing?
  3. Did they even use the form? Check this with simple hashing.
Most programming languages include hashing functions. I use hashing as a test to see if someone is altering my expected information or modifying anything that I'm setting myself. If they are, then chances are they're not using my web form, unless they're using a browser plugin that lets them rewrite the HTML components.

Some of the tricks I do on the form:
  1. Pull their IP address. Include this in the info headed to the handler. (If their location changes, they're using a proxy or they're not using the form.)
  2. Pull the timestamp for the viewing of the page. Include this in the info sent to the handler. (If it's too short, then they're not real. If it's too long, then they're not real.)
  3. Pull the browser's User Agent. Include this in the info sent to the handler. (Does it look like a real browser? Does it say cURL?)
  4. Pull a random number and hash it also. (This will not be used at all.)
  5. Don't label these things in an intuitive way, but rather place comments in the serverside code that indicate what you're expecting.
Hash these three things together with something known only to you (a special word or phrase) and submit the result by a different method than the rest of the form. If you're using POST variables for the contents of the form, then submit the hash with GET variables.

Some tricks on the handler:
  1. Pull their IP again. Does it match what was submitted?
  2. Pull the new timestamp. Subtract the old timestamp and see how long it took.
  3. Check the user agent. Is it a real browser? Does it have a keyword in it like Bot or cURL? If so, then it's more than likely not a real person.
  4. Do the new hash with the info passed from the original form. If it matches, then you know the form information wasn't altered. If it doesn't match, stop processing the form.
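
The form-side and handler-side tricks can be sketched together. This is an illustration in Python under several assumptions, not the exact method described in this post: it uses HMAC with SHA-256 from the standard library as the "hash with a secret phrase", it folds the random number into the hash, and the secret, the time limits, and the function names are all hypothetical.

```python
import hashlib
import hmac
import secrets
import time

SECRET_PHRASE = b"change-me"  # hypothetical; known only to the server
MIN_SECONDS = 2      # faster than this looks automated
MAX_SECONDS = 3600   # assumed upper limit; tune it per form


def sign(ip, user_agent, issued_at, nonce):
    """Hash the values together with the secret phrase."""
    message = "|".join([ip, user_agent, str(issued_at), nonce])
    return hmac.new(
        SECRET_PHRASE, message.encode("utf-8"), hashlib.sha256
    ).hexdigest()


def issue_token(ip, user_agent, now=None):
    """Values to embed in the page when the form is rendered."""
    issued_at = int(time.time() if now is None else now)
    nonce = secrets.token_hex(8)  # random value, different on every load
    return {
        "ts": issued_at,
        "nonce": nonce,
        "sig": sign(ip, user_agent, issued_at, nonce),
    }


def verify_token(token, ip, user_agent, now=None):
    """Return (ok, reason) for a submission arriving at the handler."""
    now = time.time() if now is None else now
    # Recompute the hash from the IP and user agent seen by the handler.
    expected = sign(ip, user_agent, token["ts"], token["nonce"])
    if not hmac.compare_digest(expected, token["sig"]):
        return False, "hash mismatch"
    elapsed = now - token["ts"]
    if elapsed < MIN_SECONDS:
        return False, "too fast"
    if elapsed > MAX_SECONDS:
        return False, "too slow"
    if any(word in user_agent.lower() for word in ("bot", "curl")):
        return False, "suspicious user agent"
    return True, "ok"
```

Because the handler recomputes the hash from the IP address and user agent it sees for itself, an edited timestamp, a changed IP, or a different user agent all show up as a hash mismatch. The sig value is the part that could travel as a GET variable while the rest of the form uses POST.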

What happens if I don't hash my submissions?

I did this myself when I was first trying to beat the spammers. I started getting spam e-mails that claimed to have been submitted 3 years prior, 100 years prior, or 2 years into the future. Investigating, I noticed (in my custom statistics app) that spammers were reloading the form over and over again (likely in view-source mode). They would alter the value of the timestamp and resubmit the form, then do it over and over again, filling my inbox. When I viewed the source and did this myself, I realized that they were watching the hidden values in the form to see whether they changed. If something didn't change, it was a straight hash of something provided before the form was submitted (IP address or User Agent). If it did change, then it was a timestamp or a random number. I tested by hashing timestamps initially. This led to the same results: the spammers were guessing my hashing method, hashing their edited timestamps, and presenting them to me en masse. I started hashing random numbers as a salt (in the cryptographic sense) with MD5, and I noticed that the spammers stopped filling out the form for a while. They couldn't figure it out, so rather than keep trying they would go elsewhere. Some still tried. Ultimately the only way to beat it was to complete the form as a human being. Some still do.

About 98% of my web form spam stopped when I started hashing my results and testing thoroughly. I capture the failed attempts in a text file (for logging, false positives, and form-hardening stats) and pull the country from their submissions based on IP address. Most are from India, China, and Russia, with a few from the Middle East.

The whole picture

The best method I've found for beating spam with some of my corporate clients is the hashing method I've described, combined with a scoring system (through filtering) to see how bad a potential spammer might be. If they don't complete 2 or 3 of the form fields correctly, then they get a likely-spam score. Certain things are a dead giveaway... no @ symbol and they're a likely spammer. "Viagra" in all incarnations (\/iagra, viaGra, v!agra, etc.) is another. You have to be careful if you're blocking words: "Cialis" is in the word "speCialist." I include the results of my spam scoring in my text-only log files and in the copies sent to my clients. We occasionally see a false positive in a foreign language, but for the most part the spam scoring is dead on.
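
A scoring pass along these lines might look like the sketch below. It is in Python for illustration only; the threshold, the weights, the required field names, and the word patterns are all assumptions rather than the values used on any real form.

```python
import re

LIKELY_SPAM = 2  # assumed threshold, from "2 or 3" badly completed fields

# Illustrative patterns. The (?<!\w) and \b guards keep "cialis" from
# matching inside "specialist".
BLOCKED_WORDS = [
    re.compile(r"(?<!\w)(?:v|\\/)[i!1]agra\b", re.IGNORECASE),
    re.compile(r"\bcialis\b", re.IGNORECASE),
]
REQUIRED_FIELDS = ("name", "email", "comments")  # hypothetical field names


def spam_score(form):
    """Return (score, reasons) for a dict of submitted strings."""
    score, reasons = 0, []
    # Each empty required field adds one point.
    for field in REQUIRED_FIELDS:
        if not form.get(field, "").strip():
            score += 1
            reasons.append(f"{field} is empty")
    # Dead giveaways jump straight to the threshold.
    if "@" not in form.get("email", ""):
        score += LIKELY_SPAM
        reasons.append("no @ in e-mail")
    text = " ".join(form.values())
    if any(pattern.search(text) for pattern in BLOCKED_WORDS):
        score += LIKELY_SPAM
        reasons.append("blocked word")
    return score, reasons


def is_likely_spam(form):
    return spam_score(form)[0] >= LIKELY_SPAM
```

Returning the reasons alongside the score makes it easy to write them to the log file and to spot false positives later.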

In the next segment I talk about why people attack sites online.

Check back for more posts. I'll update this entry when I add more to the series.

Until later,
-Chris
