Apr 30, 2010

Architecture & Use-case

"Technology looking for a Problem" or "Boiling the Ocean" !!!

How many times have we heard these 2 statements in the computer industry? They are even more common in the startup world.

What is the fix? The short answer is "a use-case".

Let's take a look at 2 examples:
[1] Java --> "Fix C++ complexity"
Nearly 20 years back, the great engineers of Sun (Patrick Naughton, James Gosling, Bill Joy and many others) did not like the complexity of the C++ programming language. They went & solved their own problem. The result is Java. (Click on Naughton's link to read about why Java was even born!)

Like them, there were a million other engineers who hated C++ complexity. So they joined the party with Java. This resulted in the massive success of Java. The rest is history.

[2] Hadoop --> "Write once-read many times"

Google had a problem with big data. There was no solution available. They built MapReduce and GFS to solve their problem and published papers about them. Yahoo had the same problem. Doug Cutting designed & built Hadoop. Then there are 1000 other companies having the same problem. The result is the success of Hadoop.

If we look at these 2 examples,
  1. Use case is obvious (C++ complexity OR write once-read many times)
  2. No solution available to solve this problem
  3. Many people have the same problem
The end result is obvious: "success". It's that simple, in plain simple English!

It's critical to have a clear understanding of at least one use-case before designing a new product. If not, the product will end up either as technology looking for a problem or as a "let's boil the ocean" type solution in the 1.0 release.

Apr 23, 2010

2-Tier Web Architecture

The web architecture is changing for many reasons: read/write workloads, high volumes of data, more devices, mobile, etc. The good news is that it's changing in a simple direction.

We all know about this 3-tier web architecture:

Web Server
|
App Server
|
Database

Web Server: To serve static content
App Server: To run business logic
Database: To store data (SAN/NFS for storage)

What is the problem with this architecture?

Web Server: A CDN now does this work. Static requests (mostly) don't come to the origin datacenter.

App Server: It is itself a complex piece of software. I remember counting 50+ modules in an app server. It ran fine on vertically scalable machines like the Sun E10K. How many companies are buying these machines to run app servers? It's challenging to scale it horizontally, on demand, on small machines.

Database: Good for consistency with always-read applications, and still good at that. Scaling is a challenge. Disk IO is expensive. Always-write is challenging with an RDBMS. Please read Amazon's Dynamo paper for details.

So, what is the new web architecture?
It's 2-tier. You punt one layer to the CDN at the top and run your database in memory at the bottom.

Here is the new architecture:

Web(App) Server
|
Memcached

Tier-1: It handles the HTTP connection and runs the business logic. Example: Apache + PHP, or Jetty.
Tier-2: It's memcached, supported by a datastore at the back end. The datastore can be MySQL-type or Cassandra-type NoSQL. (You may also want to read Stanford's RAMCloud paper.)
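
To make Tier-2 concrete, here is a minimal sketch of one common way to put a cache in front of a datastore (often called cache-aside). It is only an illustration: the class names Datastore and CachedStore are made up for this example, a plain Python dict stands in for memcached, and another dict stands in for the MySQL-type or Cassandra-type store.

```python
from typing import Dict, Optional


class Datastore:
    """Hypothetical stand-in for the back-end store (the slow tier)."""

    def __init__(self, rows: Dict[str, str]) -> None:
        self._rows = dict(rows)
        self.reads = 0  # how many times the slow tier was actually read

    def get(self, key: str) -> Optional[str]:
        self.reads += 1
        return self._rows.get(key)

    def put(self, key: str, value: str) -> None:
        self._rows[key] = value


class CachedStore:
    """Hypothetical Tier-2: an in-memory cache in front of the datastore."""

    def __init__(self, datastore: Datastore) -> None:
        self._cache: Dict[str, str] = {}  # plays the role of memcached
        self._datastore = datastore

    def get(self, key: str) -> Optional[str]:
        # Serve from memory when possible.
        if key in self._cache:
            return self._cache[key]
        # On a miss, fall back to the datastore and remember the answer.
        value = self._datastore.get(key)
        if value is not None:
            self._cache[key] = value
        return value

    def put(self, key: str, value: str) -> None:
        # Write to the datastore, then drop the stale cached copy.
        self._datastore.put(key, value)
        self._cache.pop(key, None)


store = Datastore({"user:1": "alice"})
tier2 = CachedStore(store)
first = tier2.get("user:1")   # miss: goes to the datastore
second = tier2.get("user:1")  # hit: served from memory
```

In this sketch only the first read touches the datastore; the second is answered from memory, which is the whole point of "memcached is the new database".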

This is the change:
CDN is the new web server.
App Server is the new web server.
Memcached is the new database.
RDBMS is the new disk.

Let's expand the above picture with a network diagram:

Round-robin DNS
|
----------------------------
|
HAProxy #1 ...... HAProxy #n
|
-------------------------------------
|
Web(App)Server Farm
|
Memcached Farm
|
Storage(DB/FileSystem/Disk)
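
A farm of cache nodes only works if every web server agrees on which node holds which key. Below is a rough sketch of the simplest way to do that: hash the key and take it modulo the number of nodes. The node addresses and the pick_node helper are hypothetical, and real memcached clients often use consistent hashing instead, so that adding a node does not remap most keys.

```python
import hashlib
from typing import List


def pick_node(key: str, nodes: List[str]) -> str:
    """Map a key to one cache node using a stable hash (simple modulo scheme)."""
    # md5 is used only as a stable, well-spread hash here, not for security.
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return nodes[int(digest, 16) % len(nodes)]


# Hypothetical node addresses, for illustration only.
NODES = ["cache-1:11211", "cache-2:11211", "cache-3:11211"]

chosen = pick_node("user:1", NODES)
```

Because the hash is deterministic, any web server in the farm computes the same node for the same key without talking to the others.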

Does this architecture work for all web applications?
Not necessarily.

You may still need to execute range queries against an RDBMS. You may still need the consistency. You may still need App Server-RDBMS transactions. Your engineers may not be comfortable writing code close to the metal. But the high performance web is clearly moving to some form of this 2-tier architecture.
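
The range query point is easy to see with a toy example. A key-value store answers "give me the value for this key" quickly, but "give me everything between 10 and 20" means looking at every entry unless you keep an ordered index, which is what an RDBMS gives you. The sketch below is only an illustration: the data, the function names, and the use of a sorted list with bisect as a stand-in for a database index are all assumptions of this example.

```python
import bisect
from typing import Dict, List, Tuple

# Hypothetical data: item id -> price.
prices: Dict[str, int] = {"a": 5, "b": 12, "c": 20, "d": 31}


def range_scan_kv(store: Dict[str, int], low: int, high: int) -> List[str]:
    """Key-value style: no ordering, so every entry must be checked."""
    return sorted(k for k, v in store.items() if low <= v <= high)


# Ordered index on price, standing in for an RDBMS index.
index: List[Tuple[int, str]] = sorted((v, k) for k, v in prices.items())
index_values: List[int] = [v for v, _ in index]


def range_scan_index(low: int, high: int) -> List[str]:
    """Index style: binary search to the start and end of the range."""
    start = bisect.bisect_left(index_values, low)
    end = bisect.bisect_right(index_values, high)
    return sorted(k for _, k in index[start:end])


from_kv = range_scan_kv(prices, 10, 20)
from_index = range_scan_index(10, 20)
```

Both functions return the same items; the difference is that the first one touches every entry while the second jumps straight to the range.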

Summary
Last week here in Sunnyvale, Amazon's CTO made a comment at an AWS event that summarizes the new architecture well: "For 90% use cases we use key-value store and for remaining 10% we use RDBMS".

If your application is more complex than Amazon.com and gets more traffic than Amazon.com, then you may want to continue with the 3-tier architecture!!! If not, think about the future with the 2-tier architecture.

Apr 19, 2010

HTML5 vs Flash!

This morning I saw Adobe CEO Shantanu's interview in GigaOM here and thought of writing a short post with my opinion on Flash vs HTML5.

Note: I like both A's (Apple & Adobe). I use PDF every day on my Mac. So, I have no bias towards either company.

Flash vs HTML-5.

Let's take a look at some facts. We should leave both A's out and just look at the history & consumers.

[a] Consumers don't care whether it's Flash or HTML5 on their devices, simply because they don't know about it. (Ask your grandma if you are not sure!) They just need a good viewing experience.

[b] From the history book: Open Standard vs Proprietary Technology. The open standard always wins.
Let's take an interesting example. Now Apple fights a proprietary technology (Flash) with an open standard (HTML5). But in the past, it was the other way around: AppleTalk vs TCP/IP.

"AppleTalk is a proprietary suite of protocols developed by Apple Inc. for networking computers. It was included in the original Macintosh released in 1984, and is now unsupported with the release of Mac OS X v10.6 in 2009 in favor of TCP/IP networking." - Wikipedia

It's not just AppleTalk; other proprietary standards like IPX/SPX are also out. Today the internet runs on top of TCP/IP. Imagine if we still had all those proprietary networking protocols... my Mac couldn't talk to your Windows!

Summary
Proprietary standards are mostly like driving a car by looking at the rear-view mirror, and open standards are like driving a car by looking at the road ahead. As I said before here, HTML5 is one of the most important next-generation web technologies and it will eventually become mainstream.