Writing a web application is straightforward once you've mastered the fundamentals of a server-side scripting language and a database system. What is less obvious is that a website catering to thousands of users and one catering to millions of users require very different sets of tricks.

Operating at that scale means paying careful attention to the server resources being utilized, so that only optimal coding practices are used when information storage and retrieval grow to massive volumes. Instead of thinking in terms of a single web server, you must use variable configuration in your application to cater for multiple web servers that can be added at any future date. Your programming logic must know how to distribute content and processing work across the added servers.
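
As a rough sketch, the application can read its list of hosts from a configuration array and derive which server handles a given resource, so new machines can be added later by editing one array. The host names and helper function here are hypothetical:

```php
<?php
// Hypothetical server pools kept in configuration rather than hard-coded.
$config = [
    'web_servers'   => ['web1.yourserver.com', 'web2.yourserver.com'],
    'media_servers' => ['video1.yourserver.com', 'music3.yourserver.com'],
];

// Pick a server for a given resource by hashing its key, so the same
// resource is always served from the same host.
function pick_server(array $servers, string $key): string
{
    $index = abs(crc32($key)) % count($servers);
    return $servers[$index];
}

echo pick_server($config['media_servers'], 'video-42.mp4');
```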

More servers are needed when CPU utilization spikes to a level that eventually affects the performance of your website. Your web server takes longer to respond to user requests, and pages that took seconds to load now take much longer. This can happen when the number of simultaneous visitors increases, or when certain processing tasks in your application are highly intensive, such as video conversion, real-time image manipulation, or sorting of huge datasets that are fully loaded into memory.

One strategy is to clearly separate different aspects of your application's functionality onto different servers, such as news.yourserver.com, video1.yourserver.com or music3.yourserver.com. When your users use different services of your website, they are bounced to different subdomains that point to different web servers. If it's a membership site, there is a challenge in maintaining user sessions across the different servers, as session information is usually stored locally on each server in the /tmp folder. In PHP, the session handler needs to be configured to use a common database instead of the local filesystem to support cross-server session tracking, as sketched below.
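
Here is a minimal sketch of such a database-backed session handler, assuming a hypothetical sessions table and database host; a real deployment would need more careful error handling:

```php
<?php
// Assumes: CREATE TABLE sessions (id VARCHAR(128) PRIMARY KEY,
//                                 data TEXT, last_access INT);
class DbSessionHandler implements SessionHandlerInterface
{
    public function __construct(private PDO $db) {}

    public function open(string $path, string $name): bool { return true; }
    public function close(): bool { return true; }

    public function read(string $id): string|false
    {
        $stmt = $this->db->prepare('SELECT data FROM sessions WHERE id = ?');
        $stmt->execute([$id]);
        return (string) $stmt->fetchColumn();
    }

    public function write(string $id, string $data): bool
    {
        $stmt = $this->db->prepare(
            'REPLACE INTO sessions (id, data, last_access) VALUES (?, ?, ?)');
        return $stmt->execute([$id, $data, time()]);
    }

    public function destroy(string $id): bool
    {
        return $this->db->prepare('DELETE FROM sessions WHERE id = ?')
                        ->execute([$id]);
    }

    public function gc(int $max_lifetime): int|false
    {
        $stmt = $this->db->prepare('DELETE FROM sessions WHERE last_access < ?');
        $stmt->execute([time() - $max_lifetime]);
        return $stmt->rowCount();
    }
}

// Every web server points at the same database, so the session follows
// the user no matter which machine handles the request.
$pdo = new PDO('mysql:host=db.yourserver.com;dbname=app', 'user', 'pass');
session_set_save_handler(new DbSessionHandler($pdo), true);
session_start();
```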

This strategy also requires that 'shared files' are stored in a common area so that the different servers have access to them, and that APIs are written for reading and writing these common files. When multiple users on multiple servers are able to write to a common file, the file access API must ensure that a proper file locking mechanism is in place to maintain file integrity.
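
A minimal sketch of such a write helper, assuming the shared area is mounted at a hypothetical /mnt/shared path, could wrap every write in an exclusive lock:

```php
<?php
// Append a line to a shared file while holding an exclusive lock, so
// concurrent writers on different servers do not interleave their output.
function append_shared(string $file, string $line): bool
{
    $fp = fopen($file, 'ab');
    if ($fp === false) {
        return false;
    }
    if (flock($fp, LOCK_EX)) {      // block until we hold the exclusive lock
        fwrite($fp, $line . PHP_EOL);
        fflush($fp);                // flush before releasing the lock
        flock($fp, LOCK_UN);
    }
    fclose($fp);
    return true;
}

append_shared('/mnt/shared/counters.log', 'pageview ' . time());
```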

Since the database backend is usually accessed over a socket/network connection, all web servers can by default reach a common data pool without further complexity. However, in almost every case the bottleneck of a website lies with the database storage, especially when the amount of data stored has grown into tens of millions of rows and heavy usage is causing thousands of database records to be updated every second.

Database indexes that closely match your queries must already be in place to ensure near-instantaneous retrieval of data. That said, it is never as easy as it seems, and because we are discussing scalability, there must also be a solid plan for adding more database servers when necessary. As with the multiple web servers, your application logic must also know which database server to retrieve data from if you're splitting different partitions of data across different database servers.
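
A minimal partitioning (sharding) sketch, with hypothetical host names, maps each user ID to a fixed database server so the application always knows where that user's rows live:

```php
<?php
// Hypothetical shard list: user rows are split across these hosts.
$shards = [
    'mysql:host=db0.yourserver.com;dbname=app',
    'mysql:host=db1.yourserver.com;dbname=app',
    'mysql:host=db2.yourserver.com;dbname=app',
];

// The same user always maps to the same shard.
function shard_for_user(int $userId, array $shards): PDO
{
    $dsn = $shards[$userId % count($shards)];
    return new PDO($dsn, 'user', 'pass');
}

$db = shard_for_user(123456, $shards);
$stmt = $db->prepare('SELECT * FROM orders WHERE user_id = ?');
$stmt->execute([123456]);
```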

The other trick is to deploy database replication, whereby every database server contains an exact copy of the data (as opposed to partitioning different sets of data across different servers), and the overall load is spread across the group of servers. More than just a method of database load balancing, this also provides data redundancy, so your application keeps running smoothly in the event that one or more database servers crash.
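
With replication in place, a common arrangement is to send writes to the primary and spread reads over the replicas. This is only a sketch with hypothetical host names; real code would reuse connections and handle failover:

```php
<?php
// Writes go to the primary; reads are spread across the replicas.
$primary  = new PDO('mysql:host=db-primary.yourserver.com;dbname=app', 'user', 'pass');
$replicas = [
    new PDO('mysql:host=db-replica1.yourserver.com;dbname=app', 'user', 'pass'),
    new PDO('mysql:host=db-replica2.yourserver.com;dbname=app', 'user', 'pass'),
];

// Simple load spreading: pick a replica at random per request.
function reader(array $replicas): PDO
{
    return $replicas[array_rand($replicas)];
}

// Write to the primary...
$primary->prepare('UPDATE users SET last_login = NOW() WHERE id = ?')->execute([42]);

// ...and serve reads from any replica.
$stmt = reader($replicas)->prepare('SELECT name FROM users WHERE id = ?');
$stmt->execute([42]);
```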

The other method to think about is caching data that is read often but rarely updated. File-based or memory-based caching can be used, and these methods have been shown to improve performance by nearly two thousand percent. This is due to the high CPU utilization involved in socket communication with the database, whereas local file or memory access takes only a tiny fraction of the CPU resource.
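
A minimal sketch of this pattern with the Memcache extension (hypothetical cache host and key names): look in the cache first, and only fall back to the database on a miss:

```php
<?php
// Cache-aside: frequently read, rarely updated data is served from memory;
// the database is only touched on a cache miss.
$cache = new Memcache();
$cache->addServer('cache1.yourserver.com', 11211);

function get_article(int $id, Memcache $cache, PDO $db): array
{
    $key = "article:$id";
    $hit = $cache->get($key);
    if ($hit !== false) {
        return $hit;                      // served straight from memory
    }

    $stmt = $db->prepare('SELECT * FROM articles WHERE id = ?');
    $stmt->execute([$id]);
    $row = $stmt->fetch(PDO::FETCH_ASSOC) ?: [];

    $cache->set($key, $row, 0, 300);      // keep in memory for 5 minutes
    return $row;
}
```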

When planning for a high-performance system, do think of using lookup tables whose values are pre-calculated for frequent use, and utilize hashing algorithms for lookups of cached data. Avoid data-intensive real-time processing at all costs, and try to use pre-generated tables that are built incrementally instead of being fully regenerated every day. Reduce access to the database as much as possible and use local file access instead.
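
As an illustration of incremental pre-generation, the sketch below (hypothetical table names) appends only the newly arrived days to a summary table rather than recomputing every total from scratch:

```php
<?php
// Incrementally extend a pre-calculated summary table instead of
// regenerating it in full every day.
$db = new PDO('mysql:host=db.yourserver.com;dbname=app', 'user', 'pass');

// Where did the previous run stop?
$last = (int) $db->query('SELECT MAX(day_id) FROM daily_totals')->fetchColumn();

// Aggregate only the rows newer than that and append them.
$stmt = $db->prepare(
    'INSERT INTO daily_totals (day_id, views)
     SELECT day_id, COUNT(*) FROM page_views
     WHERE day_id > ? GROUP BY day_id');
$stmt->execute([$last]);
```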

Lastly, always take a peek at your database process list to get a glimpse of which queries are taking too long to respond. These are the queries that lock database tables for long periods and prevent other queries from completing their work. These are the major points of planning a high-performance and scalable web application. It's something that's not widely documented in books, and you will need to experiment and create benchmarks for comparison.
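
On MySQL, for example, the process list can be inspected from a small script like the sketch below (the five-second threshold is an arbitrary example):

```php
<?php
// Flag queries that have been running for more than a few seconds.
$db = new PDO('mysql:host=db.yourserver.com;dbname=app', 'user', 'pass');

foreach ($db->query('SHOW FULL PROCESSLIST') as $p) {
    if ($p['Command'] === 'Query' && (int) $p['Time'] > 5) {
        echo "{$p['Time']}s  {$p['Info']}\n";   // duration and the SQL text
    }
}
```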

Useful skills to have include an understanding of file read/write locking, socket communication, how a hash lookup compares to a sequential search, and memory-based caching with Memcache. Lastly, always think of delegating tasks to different servers and plan for future data partitioning. Good luck with the tasks at hand, for they are not entirely easy ones.
