Friday, March 15, 2013

Web Form Security: Reasons behind online attacks

Why am I being hacked?

To really know what you're dealing with, you have to get inside the head of a script kiddie or a hacker if you want to actually "secure" your systems. Since there are so many factors, many of which are out of the control of most individuals, I use the word "secure" loosely. I've worked with several companies over the years on web security, usually in post-attack analysis: determining what happened, how to recover (if possible), and how to harden against the same attack. Companies often don't spend money on security before an attack, and say things like "It's never happened before," "Why would they target us?" or "No, we haven't been hacked" when in actuality they have.

There are several reasons why someone or a group might want to take over a webpage, a blog, a webserver, or a MySQL database server. Here are a few of the reasons I've seen firsthand for someone exploiting a site or page.

Web Real Estate

Mission critical systems that rely on a database need to be secured. Not only is there the risk of someone data mining a database of personal data, but there are also risks to the database server that contains the database and/or the web servers that host the site receiving or displaying the data. One of the ways people can cause havoc on a server is with an SQL Injection Attack. In November of last year I wrote a post about SQL Injection Attack Precautions. It talks about who's ultimately responsible for securing a system, since in most cases the blame for an attack is spread across several people.

How could web real estate be at risk? If someone looks at a search form, they can assume that it is connected to some sort of data store. Blindly hacking at the form, they will not be able to tell if the backing store is a PHP array, an SQL database, or an XML file until they receive an [un]intended response. By passing unexpected characters into the form they can potentially break the form, cause a stack overflow on the server (effectively crashing it), or break the application that is handling the form. Something like putting a server into an endless loop can bring it to its digital knees. This usually involves passing escape characters to add extra slashes, closing quotes (single and double), programming language terminators, or HTML code into the form. Passing empty form fields can break some forms, while others can be broken by simply disabling Javascript.
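The injection risk is easy to demonstrate. Below is a minimal sketch in Python with SQLite (the post's examples concern PHP/MySQL forms, so treat the table and queries as hypothetical stand-ins): a query built by string concatenation breaks, and leaks an error, the moment a visitor submits a single quote, while a parameterized query treats the same input as harmless data.

```python
import sqlite3

# In-memory database standing in for a real catalog (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (name TEXT)")
conn.execute("INSERT INTO products VALUES ('widget')")

def search_unsafe(term):
    # String concatenation: a stray quote in `term` changes the query itself.
    return conn.execute(
        "SELECT name FROM products WHERE name = '" + term + "'"
    ).fetchall()

def search_safe(term):
    # Parameterized query: `term` is always treated as data, never as SQL.
    return conn.execute(
        "SELECT name FROM products WHERE name = ?", (term,)
    ).fetchall()

# A single quote breaks the unsafe version with a syntax error -- exactly
# the kind of revealing response an attacker is probing for.
try:
    search_unsafe("widget'")
except sqlite3.OperationalError as e:
    print("unsafe form leaked an error:", e)

print(search_safe("widget'"))   # no error, no match, nothing leaked
```

The error message from the unsafe version is exactly the kind of [un]intended response that tells an attacker a real SQL database sits behind the form.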

When a web form is broken it returns valuable information to an attacker about the structure of the system, the types of server services running, and the quality of the code on the system itself. In my experience, websites with easily hackable code are frequented more heavily by would-be attackers and script kiddies than sites that return no errors or information to an attacker. Since most modern web servers are hosted in server farms with high bandwidth connections, the payoff for hacking a sophisticated site is more than likely the same as for a simple site: both offer similar bandwidth and server resources, and both are usually designed to be managed remotely, so there is little chance the administrator will spot the attack. If an attacker sees an increased level of security, they're less likely to attack the server, simply because their efforts will be undone much more quickly, and trying harder raises the chance they'll be caught.

Web "Street Cred"

Just like in the real world, online hackers crave notoriety. That being said, there are individuals in the hacking community who love a challenge. Some websites such as tech blogs, newspapers, social media accounts, video streaming websites and social networks are going to be more at risk from someone trying to replace content or services simply to make a name for themselves. There are far more people looking to become famous from a hacking attempt than there are people looking to steal information and sell it on some black market. The skill sets required for guessing a password to take over a page vs. actually deriving unencrypted, usable data that can be sold are night-and-day different. There are quite a few apps in the open that will crack or guess a password; there aren't very many individuals who can successfully write a root kit. Sometimes an attacker can simply guess the password to get in and look at the code. The guys who do it for a living will not be bragging about it unless they're making a sales pitch for paying work behind closed doors. You will see script kiddies doing it so they can make a name for themselves (think Anonymous).

Political Reasons

Some "groups" like Anonymous take pride in bringing down sites and exploiting pages and accounts with opposing views, or in showing companies and corporate conglomerates that they have glaringly open holes in their security. Search for "Anonymous Hacks Burger King Twitter" on Google. While there likely are real hackers that operate under the "Anonymous" moniker, most of the exploits I've seen are pretty amateurish. If Anonymous were really a serious group, there would more than likely be no more online trading (or stock market, for that matter).

Bad SEO

Some people just want more links to their own sites. These people can be spammers, and sometimes they're legitimate businesses that have paid for a service they themselves weren't quite sure about. In the past there was a practice called spamdexing, where a website listed in major directories, under topics pertaining to the contents of the site, would be picked up and rewarded by the search engines. Fake sites and phishing sites soon caught onto this. The search engines changed their policies, but in some countries word doesn't travel so fast through translation. Many "SEO specialists" mention that they can get a site listed through link sharing. If they are overseas, this is more than likely how.

An example of spamdexing from the Search Engine Journal (3/12/2013):
"There are many sites with spam on their sites that can’t see the links that they are showing where you couldn’t see unless you went into the code.  Google bot shows that a Top 50 University has “cheap viagra pills” on their main page."
To find out which one you can search for University Viagra on Google.

Data Capturing including Credit Cards and Social Security Numbers

Some people are a little more secretive about their exploits and they will hide code on a system to take advantage of web visitors and traffic. This may take the form of database copying or replication (if the site is storing e-mail addresses, credit card numbers, or sensitive data). The attackers may send copies of the real submissions to their own server. They may monitor statistics from the site (for a competitor). Some attackers inject malware into the code so they can infect user computers. In a previous post I talk about the hacking of clothing manufacturer Calvin Klein and how I started receiving SPAM from the newly created e-mail address I used for them the day I signed up. Calvin Klein of course denied any knowledge of this or interest in rectifying the issue.

Additionally, when someone is capturing all information submitted to a system on the system itself, any information passed is vulnerable. This includes Social Security Numbers, Credit Card numbers, and anything else that may be submitted (student ID numbers). Depending on the type of site, this is a huge risk to clients, customers, and worse... the brand, in terms of PR backlash.

Bot Net  

Web servers can be powerful, plentiful machines just ripe for harvesting. Located on massive connections, there is very little that can be done to track multiple machines requesting orders from a controlling system (the requests can look like normal web traffic in a packet filter). In numbers, compromised machines become a powerful collective. Why not run an application in the background on someone else's web server so that it controls countless drones while it goes on serving a webpage? This does actually happen. Usually the attacker will install something called a "root kit": an app or framework that runs undetected in the background. This allows them to control the server and exploit its bandwidth and resources. The web page may be up, running, and unchanged, so the owner usually won't find out until there's a knock at the door because the machine was used to exploit someone else's, or because it was controlling countless other machines, or worse, the website goes down because the ISP pulled the plug at the request of a government or after its own inspection and determination of high traffic. Once a root kit is installed, it is easier to move to a new machine than it is to clean off the root kit. Without a full examination, the exploit the attacker used may still be in place, and it would only be a matter of time before the attacker exploited the machine again.

So what are the real risks?

Most of the time the attacks come down to bad password management policies, or use of an unsafe network by someone logging into a website control panel or administration panel (think Starbucks). Every once in a while someone is hit with an XSS attack or a [My]SQL injection attack, but this requires someone actually trying to hack the server. Passwords can be captured in open places like airports, coffee shops, hotels, vacation resorts, cruise ships, and on any other unsecured WiFi networks with free applications from the web. Be smart: use strong passwords longer than 10 characters, log in only from safe / secure locations, and more than likely there will be no issues.

Web Form Security: Stopping Spam

Sites are often hacked through poor web form implementation.
This is the first of a multipart series on securing web forms. One of the best ways to approach web form security is thinking about the form through the eyes (or the mind) of the attacker. What do they think they can gain from the form? In this first posting I'll talk about SPAM and why it happens and potentially how to stop it.

SPAM (unsolicited e-mail) has been a problem almost since the beginning of the Internet. It costs companies billions of dollars in wasted bandwidth and resources such as anti-spam firewalls, hardware spam filters, anti-spam e-mail filtering services, and lost employee time. Since 1995 [the] HTML [language] has allowed for an input tag and web forms for uploading images, files, and supplying different types of input. As web bots and crawlers became more prolific around the beginning of the century, companies and private individuals began turning to HTML [web] forms to reduce or cut the amount of SPAM they were receiving from web-posted e-mails (e-mail addresses that were actually visible to a browsing visitor).

An e-mail address in plain text on a website is sure to be added to hundreds if not thousands of spam e-mail queues.

Initially, having a form that would post to your e-mail account was enough to stop a lot of the spam, but as data mining became more invasive, form elements containing e-mail addresses were mined specifically for those addresses and the spam continued (e-mail addresses in web forms are available as text to anything that can parse HTML). Then came exploits where people injected information into web forms and relayed messages through server-side Perl CGI scripts. Once exploited, someone could easily send e-mail from a webserver and have it look like it actually came from the company hosting the form. Most modern web forms are processed by a server-side component [language] such as PHP, ASP, ASP .Net, Ruby, Python, and so forth. Even though server-side processing makes it much easier to filter the information coming in through a form, many times a "spammer" can beat a form by simply completing it as a person would.
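The CGI relay exploit mentioned above typically worked by smuggling extra mail headers (such as a Bcc line) through a form field. The standard defense is to strip carriage returns and line feeds from anything that ends up in a message header. A minimal sketch, in Python rather than Perl, with a hypothetical field value:

```python
import re

def sanitize_header_value(value):
    """Strip CR/LF so a form field can't smuggle extra mail headers.

    The classic form-mail exploit: a spammer submits a "subject" like
    'Hello\\r\\nBcc: victim@example.com' and the naive script writes it
    straight into the message headers, turning the server into a relay.
    """
    return re.sub(r"[\r\n]+", " ", value).strip()

subject = "Hello\r\nBcc: victim@example.com\r\nX-Spam: relay"
# The injected Bcc and X-Spam lines are flattened into harmless text
# on a single line instead of becoming real headers.
print(sanitize_header_value(subject))
```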

E-mail Addresses
Some web forms still use outdated, non-industry-standard code to submit a message from a website to an e-mail address using the "mailto" option. These forms will typically open a default e-mail client (such as Outlook or Mac Mail) upon the user selecting the submit button. The reasons this needs to be avoided:

  1. The e-mail address is visible to anything on the internet.
  2. Some people use webmail and this opens a program that may confuse or irritate them.
  3. This e-mail address can be added to a SPAM list or be set as the recipient of e-mail bounce backs from SPAM or spoofed e-mails.
  4. E-mail gathered from websites is sometimes sold in online mailing lists to people who believe they are receiving a list of targeted e-mails when in actuality, they are using mined lists.

Steps to reduce spam from the web


  1. Remove all text-readable e-mail addresses from your website. To check this, open a webpage in a browser and "view source." If you search the code for the @ symbol and find it, make sure it is not in an e-mail address. If you're responsible for programming the site, replace this with something else. If you're not responsible for maintaining the code, contact the programmer and ask them to create a form for e-mails from the website. If someone needs to contact you they will find a way. If you are a business, do not rely solely on your [web form] e-mail as your main point of contact.
  2. Provide a working form that can filter your messages from the site. Most [web] hosting plans come with a server-side language component that can be used to filter the messages. This language may be PHP, ASP, ASP .Net, or Perl. Servers installed internally by companies hosting a website locally also come with these options available by default.
  3. Search the internet for company listings that contain your e-mail address. Sometimes this may include corporate directories, trade publications, web domain registries, and message boards. You can ask them to remove the e-mail address and replace it with a link to the web form.

Reducing other spam

Many of my clients publish their e-mail addresses in print on a variety of materials. These e-mail addresses more often than not go to some mass distribution group on their e-mail server. When a spammer sends a message to this account, it is routed to more than one person. For every person the message is sent to, there is a copy in their inbox (or Spam folder) on the mail server, not to mention potentially in the sent box of the distribution group, or in the inbox of the distribution group if it is set up as an individual account that forwards rather than a forward-only box. If a person forwards to their phone and doesn't use a connector like IMAP, there may be multiple copies of the e-mail per person as well. Each of these messages [usually] takes a tiny amount of room, but in greater numbers they can take a lot of space on a server or a local workstation (IT real estate).
  1. Try to limit the e-mail recipients for addresses in print to only the people who maintain the list. Obviously business cards will need to have an e-mail address, so I'm talking about brochures, flyers, forms, letterheads, envelopes, and advertisements.
  2. Make a group-accessible mailbox for any inbound e-mails rather than distributing them through the mail system. This way any person in the group can delete the e-mails from the single location. Back this up in the event of accidental deletion.
  3. When printed documents are available online in PDF form they can be mined for e-mail addresses just as easily as a web page.
  4. Follow-up e-mails to submissions should come from a no-reply box or something that can be checked for mail submission, but not a distribution group.

Stopping in-bound spam with a web form

When securing a web form, there are a few things to consider.
  1. The person completing the form may not be a person at all. It is possible to submit information to a webserver via the POST and GET methods without using a web browser.
  2. Where is the submission of the form ultimately going? CRM, e-mail only, a database?
  3. If a [real] person can't complete your form because it is too complicated, there is no point in actually having the form. They won't use it or, worse, they'll go somewhere else.
  4. If your form relies heavily on client-side filtering (e.g. Javascript), that scripting language can [more than likely] be disabled. If it is disabled, the filtering may no longer happen. Can they still complete the web form?
  5. Non-filtered web forms are [some of] the biggest risks to databases and corporate infrastructures. 
There are a variety of things that can be done with PHP, JSP, ASP, and ASP .Net (commonly found on Microsoft Web Servers) that can dramatically reduce the amount of Spam you receive from a form.  
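Point 1 above is worth seeing concretely: a bot never renders your page or runs your Javascript; it simply sends the same POST request your form would. A sketch using Python's standard library (the URL and field names are placeholders, and no request is actually sent here):

```python
from urllib.parse import urlencode
from urllib.request import Request

# A bot fabricates the submission directly; example.com/contact.php
# and the field names are hypothetical.
fields = {
    "name": "Totally A. Human",
    "email": "spam@example.com",
    "comments": "Buy cheap pills http://spam.example",
}
body = urlencode(fields).encode()
req = Request("http://example.com/contact.php", data=body, method="POST")

print(req.get_method())   # POST
print(body.decode())      # the url-encoded payload a browser would have sent
```

This is why client-side checks alone cannot stop spam: the bot never executes them.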

K.I.S.S.

I was unfamiliar with this phrase, but one of my clients said "KISS... Keep It Simple, Stupid" in response to the bad web application of theirs I've been repairing (built by a previous vendor). I always try to set up a form to be simple to use for the end user and for the recipient of the form details. Bad web forms [overall] cost companies millions [maybe billions] every year. If people can't complete a form, sales can be lost, searches can't be made, and potential customers feel the bad service is already starting before they're even a customer.

Hack it.

I test the forms heavily to make sure they work. I try to type in incorrect information: I misspell things, omit fields, forget to put the @ symbol in e-mail addresses, and fill the forms with data in the wrong fields. Usually I weigh the feedback from the form to see whether someone is purposely entering misinformation or entering it in a humanly impossible way. Then I check to see if the form was actually submitted. If it wasn't, then why?

Autofill

I use the auto-fill features of my browser to complete the form and submit, and see whether they actually complete it. Most people do not have a lot of time to fill out forms, so if you present them with standard fields they're accustomed to, they can complete the process more quickly. Auto-fill works by using common field names to gather information; when those field names are used again, the auto-fill component of the browser (if enabled) will present the user with their past responses.

Don't reinvent the wheel

It's a web form. People are used to completing things in a certain way. Present the information in an intuitive way for the audience. If you have clients from all over the world, don't mandate a state name or a county. If they're only supposed to be from one country, then you can omit the country field. Use words that translate into other languages easily. To see the field names for common web forms, check out sites that people use on a regular basis. Examples include sign-up forms on sites like UPS, postal services, Facebook, Twitter, and LinkedIn. Use your browser to "View Source" on those pages and see what the fields are named. If you name your field "client-email" it will probably not auto-fill, but if you name it "email" or use the HTML5 field type of "email" then it should work without issue when it comes to auto-complete or auto-fill.

Avoid client-side language filtering

Because it can be easily disabled, I recommend not relying on a language like Javascript for filtering in the web form. I've seen forms with interactive elements that tell you whether the different components of the form pass a test before they can be submitted. The drawbacks: not everyone has Javascript enabled, and these checks can become annoying, removing elements from my message and telling me I've completed something inaccurately before I'm done with my submission. Browsers also implement Javascript components in different ways. For instance, a company with a policy of using older versions of Internet Explorer on their workstations relies on ActiveX controls for AJAX (the scripting interface for dynamically checking a field with Javascript, without someone submitting the form). These non-signed ActiveX scripting components are disabled by default for security reasons. The people who would use your form to submit their message may not be able to complete it if it relies on Javascript or AJAX.

If you do decide to use AJAX, remember that the handler for the AJAX is available in the code. Anyone who wants to take over your server may use this as their point of attack. By submitting to the handler directly and bypassing the form altogether they can potentially find weaknesses in your application, server, or code rather quickly if they're using a bot net (group of compromised machines).

With that being said, avoid Flash as a web form. It uses a client-side scripting language based on ECMAScript (the basis for Javascript), not everyone has it installed, and it doesn't work on all mobile devices. Flash is buggy at best and can easily crash a browser on its own. There is no reason to use Flash as a web form. There are also ways to beat whatever filtering the Flash application is doing prior to sending to the server, meaning Flash might actually be the Achilles' heel of your server's safety. Just as someone can see what an AJAX web component does without using their browser, someone can download a Flash decompiler, use a header checker in a browser like Firefox on a Flash web form, use a web debugger like FireBug to watch the information transferred, or open a packet filter like WireShark to see what information is being sent to the server (even if you're using a stand-alone Flash application on a DVD).

Server-side language filtering

When you're filtering server-side, be careful what information you give back to a potential attacker. Do not allow them to enter code into the form and then give it back to them in an attempt to have them correct it. It's also good practice not to force someone to review the contents of their message.

PHP (one of my favorites) comes with various functions for taking advantage of Regular Expressions (another language for searching and filtering). Some new programmers given the task of hardening a web form may see regular expressions, or RegEx, as a viable option for aggressively screening the fields. This isn't a bad mentality, but it does depend on what you do with a failed response. Bounce someone for an incorrect field entry and you may lose a client or potential customer. If, for instance, the user enters the information in their own language (e.g. Chinese) and the programmer assumes their own language (e.g. English) for filtering, requiring English characters, then the potential user may become a false positive for an exploit attempt.

When I check fields I try to make it something a little more obvious. Here are a few things I check for:
  1. Do "name" field submissions contain numbers? (Not typically; even for Edward II they use Roman numerals.)
  2. Do "e-mail" fields have an "@" symbol and at least one period after it? (A necessity.)
  3. Do phone numbers only contain numbers? (What about +, -, Ext, Extension, x, or '.'? They might contain any of those.)
  4. Does the address contain a space character? (A necessity.)
  5. Is an address really required? (If it's not, don't mark it as such, and don't force someone back to complete it.)
  6. If something is not required, but entered, does it still conform? (E.g. the address isn't required, but they filled it with random garbage... they're probably a spammer.)
  7. Do the comments contain URLs? (Maybe a spammer? Or they might be telling you about a problem on your website.)
  8. Am I expecting BBCode (something usually found in web forums) in the comments? (Probably not; this is more than likely a spammer.)
  9. Did they put a space in their name? If they didn't, is it okay to accept information on a first-name-only basis?
  10. Did they include anything that is obviously an SQL injection attempt? (e.g. "; delete from users where 1=1")
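A few of the checks above can be sketched as simple predicates. This is an illustration in Python (the post works in PHP), and the field names and patterns are assumptions to adapt to your own form:

```python
import re

def spam_checks(form):
    """Run a handful of the field checks above; return the ones that failed."""
    problems = []
    if re.search(r"\d", form.get("name", "")):
        problems.append("name contains digits")
    if not re.search(r"@.+\.", form.get("email", "")):
        problems.append("email missing @ or a period after it")
    # Allow digits, +, -, ., parentheses, spaces, and x for extensions.
    if re.search(r"[^0-9+\-.()x ]", form.get("phone", ""), re.IGNORECASE):
        problems.append("phone has unexpected characters")
    if re.search(r"\[/?\w+\]", form.get("comments", "")):
        problems.append("BBCode in comments")
    if re.search(r";\s*(delete|drop|insert|update)\s",
                 form.get("comments", ""), re.IGNORECASE):
        problems.append("looks like an SQL injection attempt")
    return problems

print(spam_checks({"name": "Ada Lovelace", "email": "ada@example.com",
                   "phone": "+1 (555) 555-0100 x12", "comments": "Hello!"}))
print(spam_checks({"name": "Bot9000", "email": "no-at-sign", "phone": "555",
                   "comments": "[url]spam[/url]; delete from users where 1=1"}))
```

Note the phone check only accepts a bare "x" for extensions; handling "Ext" or "Extension" spelled out would need a slightly broader pattern.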

Are they real?

This is a huge question in terms of securing a web form. If the attacker is using a program to auto-complete the form to submit Spam, or if they're using a group of computers (bot net) to attack the form and bring down the server, then how can you stop it? Simple. See if they are real.

Typical Captcha

CAPTCHA is a bad thing in many ways: it makes people angry, it's hard to complete, it's not always readable, and it's a complete waste of the user's time. CAPTCHA stands for "Completely Automated Public Turing test to tell Computers and Humans Apart." It's basically a quick fix for trying to guess whether someone is human. International users? Avoid CAPTCHA.

Hashing and tests for human skills

Just as CAPTCHA makes an attempt to make people prove they're human, there are a couple of things you can do behind the scenes to see if someone is a real person.
  1. Did they complete the form in a humanly possible time?
    Depending on the required information from the form, and the type of information expected, run a few timed tests. Use things like auto-fill and auto-complete to try and beat your human times. In my experience, most bots will on first attempt complete the form in less than 2 seconds.
  2. Did they complete the form using two different IP addresses?
    Simple... check their IP address and pass it to the handler page, then check it on the next page for a change. But what if they modify the IP address you're passing?
  3. Did they even use the form? Check this with simple hashing.
Most programming languages include hashing functions. I use hashing as a test to see if someone is altering my expected information or modifying anything that I'm setting myself. If they are, then chances are they're not using my web form, unless they're using a browser plugin that lets them rewrite the HTML components.

Some of the tricks I do on the form:
  1. Pull their IP address. Include this in the info headed to the form. (If their location changes they're using a proxy or they're not using the form.)
  2. Pull the timestamp for the viewing of the page. Include this in the info sent to the handler. (If it's too short, then they're not real. If it's too long, then they're not real.)
  3. Pull the browser's User Agent. Include this in the info sent to the handler. (Does it look like a real browser? Does it say cURL?)
  4. Pull a random number and hash it also. (This will not be used at all.)
  5. Don't label these things in an intuitive way, but rather place comments in the serverside code that indicate what you're expecting.
Hash these values together with something only known to you (a special word or phrase) and submit them by a different method than the rest of the form: if you're using POST variables for submitting the contents of the form, then submit the hash with the GET method variables.

Some tricks on the handler:
  1. Pull their IP again. Does it match what was submitted?
  2. Pull the new timestamp. Subtract the old timestamp and see how long the form took.
  3. Check the user agent. Is it a real browser? Does it have a keyword in it like Bot or cURL? If so, then it's more than likely not a real person.
  4. Do the new hash with the info passed from the original form. If it matches, then you know the form information wasn't altered. If it doesn't match, stop processing the form.
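Putting the form-side and handler-side tricks together, here is a hedged sketch in Python. The secret phrase, the time window, and the blocked user-agent keywords are illustrative assumptions, and SHA-256 stands in for the MD5 the post mentions (common in 2013, best avoided in new code):

```python
import hashlib
import time

SECRET = "replace-with-your-own-phrase"   # known only to the server

def form_token(ip, user_agent, timestamp, nonce):
    # Hash the visitor's details, a random nonce, and the server secret.
    # Sent along with the form (e.g. via GET while the fields go by POST).
    raw = f"{ip}|{user_agent}|{timestamp}|{nonce}|{SECRET}"
    return hashlib.sha256(raw.encode()).hexdigest()

def handler_ok(ip, user_agent, timestamp, nonce, token, now,
               min_secs=4, max_secs=1800):
    # 1. Recompute the hash: if it differs, a submitted value was altered.
    if form_token(ip, user_agent, timestamp, nonce) != token:
        return False
    # 2. Humanly possible completion time? Too fast or too stale fails.
    if not (min_secs <= now - timestamp <= max_secs):
        return False
    # 3. Obvious non-browser user agents.
    if any(bad in user_agent.lower() for bad in ("curl", "bot", "wget")):
        return False
    return True

issued = int(time.time())
tok = form_token("203.0.113.9", "Mozilla/5.0", issued, 12345)

# A human submitting 30 seconds later with untouched values passes:
print(handler_ok("203.0.113.9", "Mozilla/5.0", issued, 12345, tok, issued + 30))
# A spammer who rewound the timestamp fails: the hash no longer matches.
print(handler_ok("203.0.113.9", "Mozilla/5.0", issued - 9999, 12345, tok, issued + 30))
```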

What happens if I don't hash my submissions?

I did this myself when I was first trying to beat the spammers. I started getting spam e-mails that were submitted 3 years prior, or 100 years prior, or 2 years into the future. Investigating, I noticed (in my custom statistics app) that spammers were reloading the form over and over again (likely in view-source mode). They would alter the values of the timestamp and resubmit the form, then do it over and over again, filling my inbox. When I viewed the source and did this myself, I realized they were watching the hidden values in the forms, checking whether they changed and whether they were timestamps or something else. If a value didn't change, it was a straight hash of something available before the form was submitted (IP address or User Agent). If it did change, then it was a timestamp or a random number. I tested by hashing timestamps initially. This led to the same results: the spammers were guessing my hashing method, hashing the timestamps themselves, and presenting them to me, edited, en masse. When I started hashing random numbers as a salt (in the cryptographic sense) with MD5, the spammers stopped filling out the form for a while. They couldn't figure it out, so rather than trying they would go elsewhere. Some still tried. Ultimately the only way to beat it was to complete the form as a human being. Some still do.

About 98% of my web form spam stopped when I started hashing my results and testing thoroughly. I capture the failed attempts in a text file (for logging, false positives, and form-hardening stats) and pull the country from each submission based on IP address. Most are from India, China, and Russia, with a few from the Middle East.

The whole picture

The best method I've found for beating spam with some of my corporate clients is the hashing method I've described, combined with a scoring system (through filtering) to gauge how likely a submission is to be spam. If they don't complete 2 or 3 of the form fields correctly, they get a likely-spam score. Certain things are a dead giveaway: no @ symbol and they're a likely spammer, or "Viagra" in all its incarnations (\/iagra, viaGra, v!agra, etc.). You have to be careful if you're blocking words: "Cialis" is in the word "speCialist." I include the results of my spam scoring in my text-only files and in the copies sent to my clients. We occasionally see a false positive in a foreign language, but for the most part the spam scoring is dead on.
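The word-blocking caveat can be handled with word-boundary matching, and the scoring itself is just accumulation. A hypothetical sketch in Python (the weights, patterns, and checks are illustrative, not the exact scoring I use with clients):

```python
import re

# Word-boundary patterns: "cialis" won't match inside "specialist",
# and common letter substitutions for "viagra" are covered.
BLOCKED = [r"\bv[i!1]agra\b", r"\bcialis\b"]

def spam_score(form):
    score = 0
    if "@" not in form.get("email", ""):
        score += 2                      # dead giveaway
    if re.search(r"https?://", form.get("comments", "")):
        score += 1                      # URLs in comments
    text = form.get("comments", "").lower()
    for pattern in BLOCKED:
        if re.search(pattern, text):
            score += 2
    return score

print(spam_score({"email": "real@example.com",
                  "comments": "I am a specialist."}))   # 0
print(spam_score({"email": "nope",
                  "comments": "Buy v1agra here http://spam.example"}))
```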

In the next segment I talk about why people attack sites online.

Check back for more posts. I'll update this entry when I add more to the series.

Until later,
-Chris