Wednesday, December 24, 2008

Grades

Dear Class,

Thanks for your hard work in this class. For most of you, I think it's obvious how much you learned. And your progress has been impressive to watch.

In case you're interested, here is a breakdown of grades in the class. As you'll see from the chart, I applied a "curve" to smooth out the numbers. However, since many students had very high grades, the curve did not have a drastic effect on the lowest scores.

Here's how letter grades correspond to number grades:
  • A's are between 91 and 100
  • B's are between 81 and 90
  • C's are between 71 and 80
  • D's are between 51 and 70
Grades were based upon quizzes, homework, and final projects in about equal portions. I considered class attendance and participation when scores were close to the margins between two letter grades.

A few observations about the grades:
  • Three otherwise failing students went MIA, and their grades are not included in this chart.
  • Four otherwise passing students did not complete a final project, and their grades are not included in this chart.
  • Those who performed at the low end of the chart seemed to stop doing well when we transitioned into PHP/MySQL.
  • About half the class did what I would consider "extraordinarily well". This doesn't seem to correlate with previous web experience.
  • The median grade was an 80, right on the border between B and C.
I welcome any and all feedback about the course. You know my email address.

Happy Holidays!

Amos

Friday, December 5, 2008

Intro to Security on the Web

Security risks on the web fall into 3 general categories:
  1. Server-side risks
  2. Client-side risks
  3. Network eavesdropping
Server-side risks
Every web server is a security risk - you are letting anyone in the world connect to your server and access files, run scripts, upload files, run queries on and store data in your database. The more complicated your setup, both in terms of the server setup as well as your code setup, the more likely you are to have bugs, which in turn makes it more likely you have holes in your security. Possible risks include the theft of confidential information and the installation of malicious scripts onto your servers.

A common thing hackers do once they compromise your server is use it in a distributed denial of service (DDoS) attack. Hackers gain access to many insecure servers and install scripts that do nothing but send requests to a particular target web server. With thousands of these scripts running concurrently on many compromised servers, hackers can create so much traffic that the target server is brought to its knees and can no longer respond to all the requests. This happens regularly to popular sites. Many sites defend themselves with software that detects likely DDoS traffic and blocks requests from any machine that appears to be taking part in the attack.
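To make the idea of blocking a suspicious sender concrete, here is a minimal Python sketch of one common technique, a per-address rate limiter. It assumes each request arrives with the client's IP address and refuses requests from any address that has sent more than `max_requests` in the last `window_seconds`. The class name and limits are hypothetical. Real DDoS protection usually also happens before traffic reaches the web server (firewalls, load balancers, hosting providers), and because a truly distributed attack spreads requests across many addresses, per-address limits only help partially.

```python
from __future__ import annotations

import time
from collections import defaultdict, deque


class RateLimiter:
    """Hypothetical sketch: refuse requests from any address that sends too many in a short window."""

    def __init__(self, max_requests: int, window_seconds: float) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        # client address -> timestamps of its recent (accepted) requests
        self._history: dict[str, deque[float]] = defaultdict(deque)

    def allow(self, client_ip: str, now: float | None = None) -> bool:
        """Return True if the request should be served, False if it is blocked."""
        if now is None:
            now = time.monotonic()
        timestamps = self._history[client_ip]
        # Drop requests that have slid out of the time window.
        while timestamps and now - timestamps[0] >= self.window_seconds:
            timestamps.popleft()
        if len(timestamps) >= self.max_requests:
            return False  # too many recent requests: block this one
        timestamps.append(now)
        return True
```

For example, with `RateLimiter(max_requests=3, window_seconds=1.0)`, a fourth request from the same address within one second is refused, while requests from other addresses are unaffected.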

Another common attack is SQL injection, in which an attacker types SQL code into a form field so that it becomes part of a query your script sends to the database. Hackers use this to gain access to your database and, if you are not careful, can easily steal private information such as credit card numbers. This is the primary reason why you should ALWAYS sanitize user input before using it in database queries. Make sure what the user submitted does not contain any unexpected code, and that it is the type of data you expected (e.g. if you expect a phone number, make sure the user actually entered a phone number).
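A minimal Python sketch using the built-in `sqlite3` module (standing in for MySQL) shows both halves of the advice: validating that the input looks like the expected type, and passing it to the database as a query parameter instead of pasting it into the SQL text. The `customers` table, the function names, and the phone format are hypothetical.

```python
from __future__ import annotations

import re
import sqlite3

# Hypothetical format expected for this example: 555-123-4567
PHONE_PATTERN = re.compile(r"[0-9]{3}-[0-9]{3}-[0-9]{4}")


def find_customer_unsafe(conn: sqlite3.Connection, phone: str) -> list[tuple]:
    # VULNERABLE: the user's input becomes part of the SQL text itself.
    query = "SELECT name, phone FROM customers WHERE phone = '" + phone + "'"
    return conn.execute(query).fetchall()


def find_customer_safe(conn: sqlite3.Connection, phone: str) -> list[tuple]:
    # 1. Validate: reject input that is not the type of data we expect.
    if not PHONE_PATTERN.fullmatch(phone):
        raise ValueError(f"not a phone number: {phone!r}")
    # 2. Use a placeholder (?) so the value is sent separately from the SQL
    #    and is always treated as data, never as code.
    return conn.execute(
        "SELECT name, phone FROM customers WHERE phone = ?", (phone,)
    ).fetchall()
```

Given the input `' OR '1'='1`, the unsafe version builds `... WHERE phone = '' OR '1'='1'`, which matches every row in the table. The safe version rejects that input before it reaches the database, and even without the validation step the placeholder would make the database compare the phone column against that literal string. PHP's PDO and mysqli libraries support the same kind of placeholders through prepared statements.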

Client-side risks
Attackers may also target the client in a variety of ways. Each web browser runs as an application on your local client machine, which means it has access to your file system and everything on it. Since the content your browser displays usually comes from servers on the web, there's a chance that a hacker will be able to use a server to send instructions to your browser that install malicious software, or that force the client to do things like upload personal information to the hacker's server.

Up-to-date anti-virus software is a must on both PC and Mac to prevent malware from running on your computer. Given that the web is a high-risk environment, the major web browsers and email clients are thoroughly tested and can be considered reasonably secure. However, all of them issue security updates from time to time to fix security problems found in their software, so install those updates promptly.

Certain web technologies, such as Java applets, ActiveX, Silverlight, Flash, and Adobe PDF, are not natively supported by most web browsers. They run as plug-ins, which are separate programs from the browser itself (even though their content shows up in the browser window), so each of these technologies has its own security risks that its developers must constantly mitigate. Like browsers, these technologies are so widely used that security holes are usually discovered quickly and patched with updates. But bugs do exist, and hackers are always looking for new ones. Search Google for "flash vulnerabilities" and you will find examples of exploits that hackers have created using Flash.

Phishing scams are another major client-side risk to be aware of. Scammers could, for example, create a website that looks exactly like Amazon.com's checkout page but was actually built by hackers in Nigeria. If for some reason you find yourself on this site thinking it is Amazon.com, you may enter your credit card information, which the hackers then use to buy gifts for themselves (or for more nefarious things). Phishing scams are also commonly used for identity theft: the phishing sites trick users into revealing personal information, which is then used to apply for credit cards, obtain passports, buy weapons, etc.

Most web browsers, email clients (e.g. Microsoft Outlook, Mozilla Thunderbird, Apple Mail, etc.), and client security programs (e.g. Norton AntiVirus) have ways of trying to identify phishing scams. But hackers are constantly figuring out new ways of bypassing or compromising every new tool that developers create, so keep your software updated regularly.
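One simple heuristic behind such tools is checking whether a link really points to the site it appears to. Below is a hedged Python sketch that assumes you already know the trusted domain; `is_on_domain` is a hypothetical helper, and real browsers combine checks like this with blocklists of known phishing sites and other signals.

```python
from urllib.parse import urlsplit


def is_on_domain(url: str, trusted_domain: str) -> bool:
    """Hypothetical helper: True only if the URL's host is trusted_domain or a subdomain of it."""
    # urlsplit().hostname is the real server name, lowercased, with any
    # "user@" prefix and port number stripped off.
    host = (urlsplit(url).hostname or "").rstrip(".")
    trusted = trusted_domain.lower()
    return host == trusted or host.endswith("." + trusted)
```

Look-alike addresses such as `https://amazon.com.secure-checkout.example/pay` or `https://amazon.com@evil.example/` fail this check, because the part of the URL that actually identifies the server is not amazon.com.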

Network eavesdropping
Any time a client communicates with a server, the data is physically transmitted either as electrical current in a wire or as radio waves in the air. There are ways hackers can intercept either of these means of communication.

Wireless communication is notoriously insecure. Anyone with a wifi card in their laptop can easily intercept unencrypted data being passed between a wireless router and other laptops. So some people encrypt the data passed between the two; the thinking goes that even if someone intercepts the signal, they won't be able to understand it. However, WEP, the most commonly used encryption protocol on wireless routers, is known to be very weak. WPA2 is considerably more secure and should be used if your router supports it. Another way to secure your wireless network is to set up your wireless router to only accept connections from computers with particular MAC addresses. Each network card has a MAC address assigned by its manufacturer, but the address can be changed (spoofed) in software, so MAC filtering only keeps out casual intruders.

Wired communication, via Ethernet cable or other types of wire, can also be intercepted by someone who plugs into the same network as either the client or the server. Since traffic between a client and a server travels over wires that are shared with other clients and servers, it's not crazy to imagine that someone could find a way to intercept and listen in on your conversation.

Like wireless communication, wired communication can be encrypted, so that even if someone does intercept it, they won't be able to easily decipher it.

Many web servers, especially for e-commerce sites, are called "secure servers". Secure servers use the HTTPS protocol instead of the regular HTTP, so the URL will look like https://something.com, for example. Often, the checkout pages of online stores, or any page that asks the user to enter confidential information will be hosted on a secure server.

HTTPS encrypts the communication between the client and the server using the SSL encryption protocol. So a "secure server" is really just encrypting the network communication between client and server; it is not protecting the server itself against server-side attacks. The server and the client still face the same security risks as any other client or server. As with all encryption methods, SSL (and therefore HTTPS) can be defeated. A common exploit is the man-in-the-middle attack, in which an attacker intercepts the connection and impersonates the server; it typically succeeds when the client does not properly verify the server's certificate or the user ignores a certificate warning.
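Since man-in-the-middle attacks on HTTPS usually depend on the client not checking the server's certificate, it helps to see what a correctly configured client looks like. In Python, the standard library's `ssl.create_default_context()` returns a client configuration with certificate verification and hostname checking switched on; `make_client_context` is only a hypothetical wrapper to point out the settings that matter. (Python's `ssl` module today negotiates TLS, the successor to SSL.)

```python
import ssl


def make_client_context() -> ssl.SSLContext:
    # create_default_context() loads the system's trusted certificate
    # authorities and turns on certificate and hostname verification,
    # which is what lets a client notice an impostor server.
    context = ssl.create_default_context()
    # Never switch these off in real code; doing so re-opens the door
    # to man-in-the-middle attacks:
    #   context.check_hostname = False
    #   context.verify_mode = ssl.CERT_NONE
    return context
```

To use it, you would wrap an ordinary socket, e.g. `context.wrap_socket(sock, server_hostname="example.com")`. If the server presents a certificate that is not valid for that hostname, the handshake fails with an error instead of silently talking to an impostor.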

Further reading:
http://www.w3.org/Security/Faq/
http://www.securityfocus.com/infocus/1864
http://www.windowsecurity.com/articles/Common_Attacks.html
http://www.icir.org/vern/cs294-28/scribe/WebClientAttacks.pdf
http://www.icir.org/vern/cs294-28/syllabus.html

Be careful.

Intro to E-Commerce

The fundamental concepts of e-commerce are easy enough to grasp, and these days most e-commerce sites follow common standards and conventions. There are three basic components: the storefront, the shopping cart, and the checkout.

The Storefront
When someone is shopping on the web, they want to browse products on a site to see what's available. Usually, products are organized into categories so that users can easily find the sort of products they're looking for.

For example, a shoe store typically has top-level categories such as Men's and Women's. A shoe store might also have one or more levels of sub-categories under each top-level category. For example, the Men's category might have sub-categories such as Boots, Sandals, Dress Shoes, Sneakers, etc.

It is not uncommon for a particular product to fall into more than one category. For example, a casual hiking boot may fall under both Hiking and Casual.

Now for a small tangent on the topic of categorization: A modern twist on the idea of categorization is tagging. Many sites, especially "Web 2.0" sites, now offer tags in addition to, or as a replacement for, categories. Tags are just keywords that are associated with a particular product. Often, but not always, tags are user-generated, meaning that users of a site can add whatever keyword they want to a particular product. If users can collaboratively add tags to a product or asset, then the site offers what is known as a folksonomy.
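As a rough illustration of a folksonomy, here is an in-memory Python sketch in which any user can attach any keyword to a product and the most popular tags rise to the top. The `TagStore` class and its methods are hypothetical; a real site would keep the same (product, user, tag) triples in a database table.

```python
from __future__ import annotations

from collections import Counter, defaultdict


class TagStore:
    """Hypothetical sketch: users collaboratively tag products."""

    def __init__(self) -> None:
        # product_id -> set of (user_id, tag) pairs; using a set means one
        # user adding the same tag twice still counts only once.
        self._tags: dict[int, set[tuple[int, str]]] = defaultdict(set)

    def add_tag(self, product_id: int, user_id: int, tag: str) -> None:
        # Normalize so "Waterproof" and "waterproof " are the same tag.
        self._tags[product_id].add((user_id, tag.strip().lower()))

    def top_tags(self, product_id: int, n: int = 3) -> list[tuple[str, int]]:
        """The n most popular tags for a product, with how many users applied each."""
        counts = Counter(tag for _, tag in self._tags[product_id])
        return counts.most_common(n)
```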

Managing the storefront of an e-commerce site is a matter of organizing products and managing inventory. How you organize the storefront and how you categorize your products are important questions to work out in the information architecture phase of an e-commerce project, since the methods of navigation and categorization that you choose will affect every aspect of the site architecture.

In terms of development, a storefront would have separate database tables for categories, products, and the association of products to categories.

A typical categories table might have fields for id, title, parent_category_id, and created. A products table would typically have fields for id, title, description, num_available, price, thumbnail_image_path, large_image_path, and created. An association table could have fields for id, product_id, category_id, and created. This allows a many-to-many relationship between products and categories, with a separate row for each category a particular product belongs to.
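The tables described above could be sketched in SQL like this, using Python's built-in `sqlite3` for a self-contained example. The table name `product_categories`, the column types, and storing prices in cents are assumptions, not part of the design above; the two helper functions are hypothetical.

```python
from __future__ import annotations

import sqlite3

SCHEMA = """
CREATE TABLE categories (
    id INTEGER PRIMARY KEY,
    title TEXT NOT NULL,
    parent_category_id INTEGER REFERENCES categories(id),  -- NULL for top level
    created TEXT DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE products (
    id INTEGER PRIMARY KEY,
    title TEXT NOT NULL,
    description TEXT,
    num_available INTEGER NOT NULL DEFAULT 0,
    price INTEGER NOT NULL,  -- in cents (an assumption)
    thumbnail_image_path TEXT,
    large_image_path TEXT,
    created TEXT DEFAULT CURRENT_TIMESTAMP
);
-- Association table: one row per (product, category) pair.
CREATE TABLE product_categories (
    id INTEGER PRIMARY KEY,
    product_id INTEGER NOT NULL REFERENCES products(id),
    category_id INTEGER NOT NULL REFERENCES categories(id),
    created TEXT DEFAULT CURRENT_TIMESTAMP
);
"""


def products_in_category(conn: sqlite3.Connection, category_id: int) -> list[str]:
    # Join through the association table to find every product in a category.
    rows = conn.execute(
        """SELECT p.title FROM products p
           JOIN product_categories pc ON pc.product_id = p.id
           WHERE pc.category_id = ?
           ORDER BY p.title""",
        (category_id,),
    )
    return [title for (title,) in rows]


def category_path(conn: sqlite3.Connection, category_id: int | None) -> str:
    # Follow parent_category_id links up to the top level, e.g. "Men's > Hiking".
    titles = []
    while category_id is not None:
        title, category_id = conn.execute(
            "SELECT title, parent_category_id FROM categories WHERE id = ?",
            (category_id,),
        ).fetchone()
        titles.append(title)
    return " > ".join(reversed(titles))
```

A casual hiking boot would get one row in `product_categories` for Hiking and another for Casual, so it shows up when either category is browsed.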

The Shopping Cart
Shopping carts are an essential part of any e-commerce site. They take the metaphor of the physical shopping basket and transpose it into online media. At its most fundamental, a shopping cart is a tool for maintaining state: it remembers which products a user has selected for purchase so that they can buy them all together as a batch, without having to re-enter their billing and shipping info for each one individually.

As you can probably imagine, most shopping carts are simply tables in a database that have fields for user_id, product_id, and quantity (as well as the id and created fields, of course). That way, the database table simply has a row for every product in the user's cart. To get the contents of the cart, you make a query on the table for all rows that match a given user_id.
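A minimal sketch of such a cart table and its two most basic operations, again using Python's `sqlite3` as a stand-in for MySQL. The table name `cart_items` and the functions `add_to_cart` and `get_cart` are hypothetical. In this version, adding a product that is already in the cart increases its quantity rather than creating a second row.

```python
from __future__ import annotations

import sqlite3

CART_SCHEMA = """
CREATE TABLE cart_items (
    id INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL,
    product_id INTEGER NOT NULL,
    quantity INTEGER NOT NULL CHECK (quantity > 0),
    created TEXT DEFAULT CURRENT_TIMESTAMP,
    UNIQUE (user_id, product_id)  -- at most one row per product per user
);
"""


def add_to_cart(conn: sqlite3.Connection, user_id: int, product_id: int, quantity: int = 1) -> None:
    # Try to bump the quantity of an existing row first...
    cur = conn.execute(
        "UPDATE cart_items SET quantity = quantity + ? WHERE user_id = ? AND product_id = ?",
        (quantity, user_id, product_id),
    )
    # ...and only insert a new row if the product was not in the cart yet.
    if cur.rowcount == 0:
        conn.execute(
            "INSERT INTO cart_items (user_id, product_id, quantity) VALUES (?, ?, ?)",
            (user_id, product_id, quantity),
        )
    conn.commit()


def get_cart(conn: sqlite3.Connection, user_id: int) -> list[tuple[int, int]]:
    # The cart's contents are simply all rows that match the user's id.
    return conn.execute(
        "SELECT product_id, quantity FROM cart_items WHERE user_id = ? ORDER BY product_id",
        (user_id,),
    ).fetchall()
```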

Payment Processing
The checkout and payment processing parts of an e-commerce site are the most complicated. You need to securely process a transaction on a user's credit card, and this entire process should take place on a secure server where all communication between the client and server is encrypted. Also, to charge credit cards over the phone, in a store, or online, a merchant needs what is known as a merchant account with a bank.

Assuming you have a merchant account (or are using a payment service that does), the first step in processing payment online is to send the data from the user's shopping cart to a script that calculates the total amount owed, including any taxes and surcharges. Once the user enters his/her credit card, billing, and shipping info, you perform a transaction on the credit card, starting by authorizing it with the issuing credit card company.
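Calculating the total is simple arithmetic, but it is worth doing with exact decimal numbers rather than floating point, which cannot represent amounts like 0.10 exactly. This Python sketch uses a made-up 8% sales tax and a flat $5.00 shipping surcharge purely for illustration; real tax rules depend on where the buyer and seller are located, and `order_total` is a hypothetical function.

```python
from __future__ import annotations

from decimal import ROUND_HALF_UP, Decimal

# Hypothetical rates, for illustration only.
SALES_TAX_RATE = Decimal("0.08")
SHIPPING_SURCHARGE = Decimal("5.00")
CENT = Decimal("0.01")


def order_total(cart: list[tuple[Decimal, int]]) -> dict[str, Decimal]:
    """cart is a list of (unit_price, quantity) pairs taken from the shopping cart."""
    subtotal = sum((price * qty for price, qty in cart), Decimal("0"))
    # Round the tax to whole cents, rounding halves up.
    tax = (subtotal * SALES_TAX_RATE).quantize(CENT, rounding=ROUND_HALF_UP)
    total = subtotal + tax + SHIPPING_SURCHARGE
    return {"subtotal": subtotal, "tax": tax, "surcharge": SHIPPING_SURCHARGE, "total": total}
```

For two items at $19.99 and one at $5.50, the subtotal is $45.48, the tax of $3.6384 rounds to $3.64, and the total comes to $54.12.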

If the credit card authorization passes, you must process the order with the credit card company by charging their card, remove the items from the user's shopping cart, and make sure your site's product inventory is up-to-date now that you have sold off a few items. Once everything is finished, you show a confirmation screen to the user with an order receipt. Often, the site will automatically send an email to the user (assuming they entered an email address) with the order receipt in it.
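The order of steps above can be sketched in Python as follows. `FakeGateway` is a hypothetical stand-in for a real payment processor (each real service has its own API), prices are assumed to be stored in cents, and the `products` and `cart_items` tables are assumed to have the columns described earlier in this post. The cart and inventory updates run inside one database transaction, so they either both happen or neither does.

```python
from __future__ import annotations

import sqlite3


class FakeGateway:
    """Hypothetical stand-in for a payment processor; method names are made up."""

    def authorize(self, card_token: str, amount_cents: int) -> str | None:
        # A real gateway would contact the card issuer; here any non-empty token passes.
        return f"auth-{card_token}-{amount_cents}" if card_token else None

    def capture(self, auth_id: str) -> bool:
        return True  # pretend the charge always succeeds


def checkout(conn: sqlite3.Connection, gateway: FakeGateway, user_id: int, card_token: str) -> dict:
    rows = conn.execute(
        """SELECT c.product_id, c.quantity, p.price, p.num_available
           FROM cart_items c JOIN products p ON p.id = c.product_id
           WHERE c.user_id = ?""",
        (user_id,),
    ).fetchall()
    if not rows:
        raise ValueError("cart is empty")
    for product_id, qty, _, available in rows:
        if qty > available:
            raise ValueError(f"not enough stock for product {product_id}")
    total = sum(qty * price for _, qty, price, _ in rows)

    # 1. Authorize the card with the issuer.
    auth_id = gateway.authorize(card_token, total)
    if auth_id is None:
        raise RuntimeError("card declined")
    # 2. Charge (capture) the authorized amount.
    if not gateway.capture(auth_id):
        raise RuntimeError("charge failed")
    # 3 and 4. Update inventory and empty the cart in one transaction;
    # "with conn" commits on success and rolls back on an exception.
    with conn:
        for product_id, qty, _, _ in rows:
            conn.execute(
                "UPDATE products SET num_available = num_available - ? WHERE id = ?",
                (qty, product_id),
            )
        conn.execute("DELETE FROM cart_items WHERE user_id = ?", (user_id,))
    # 5. Return receipt data for the confirmation page and email.
    return {"auth_id": auth_id, "total_cents": total, "items": [(p, q) for p, q, _, _ in rows]}
```

One weakness of this sketch: if the database update fails after the card has been charged, a real system would need to refund or retry, which is one more reason many merchants hand this work to a payment service.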

One rule of thumb to follow if you are running your own store: never store sensitive information like credit card numbers in your database. Unless you have the budget to hire a good security expert, your site can (and very well may) be hacked, and you do not want to be liable for the damages that would result from someone getting hold of your customers' credit card numbers.
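If you need to show customers which card they used (on a receipt, for example), a common practice is to keep at most the last four digits. Here is a hypothetical Python helper, `mask_card_number`, that masks everything else; the assumption is that the full number goes straight to the payment processor and is never saved.

```python
def mask_card_number(card_number: str) -> str:
    """Hypothetical helper: return e.g. '************1111' for display on a receipt."""
    # Ignore spaces and dashes the user may have typed.
    digits = "".join(ch for ch in card_number if ch.isdigit())
    if len(digits) < 12:
        raise ValueError("card number too short")
    # Replace all but the last four digits with asterisks.
    return "*" * (len(digits) - 4) + digits[-4:]
```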

Because doing all of these steps yourself is complicated, most online merchants opt to use a third-party payment processing service that provides security and handles all the dirty work of charging cards for them.