9 Things I Learned About Distributed Software

Distributed software is any software that does not run on a single computer. All web applications, for example, are distributed software, since they rely on applications running on other servers.

However, when developing distributed software it is false to assume that the network is reliable, that the network is secure, that the network is homogeneous, that the topology does not change, or that latency is zero.

There was a time when software was written only at universities, military institutions, and businesses that worked on defence contracts.

Hardware was built to do a variety of tasks, but without software to control it, hardware could not do anything. Even so, hardware was considered the revenue generator, not software.

Software programs were either very expensive or free, and there was no charge for operating systems: when hardware was purchased, the operating system was installed and provided at no cost. Gradually, the nature of software changed.

Computer companies started to unbundle software from hardware, and a revolution was in the making: the “killer application”, software so useful that it created increasing demand for hardware.

Yes, using the software saved time, increased accuracy, and made tasks easier, but more than all this, it gave power to the individual. It empowered people over computers. And it could be manipulated and customised by non-programmers.

Furthermore, personal computer hardware gave any business person the ability to crunch their own numbers and write their own documents.

Eventually, a new killer application would dominate the scene, and run on newer hardware.

The lesson learned was that the software market would become so disruptive, volatile, and competitive that no killer application remained at the top for very long, and that people managing traditional corporate computer centres saw personal computers as threats.

So, why distributed software? Because a single (powerful) machine cannot meet all of the requirements.

Here's what I learned:

One

Typical characteristics of distributed software are:

  • Supports physical separation of components
  • Supports scalability - distributes load across multiple components and supports a large population of concurrent users, so resources can be better matched to what is needed
  • Supports resiliency, availability, and modularity
  • Supports administrative autonomy - multiple hardware devices, multiple administrators
  • Supports heterogeneity - no requirement to use the same hardware and software
  • Components typically have a client-server relationship (the caller is the client, and the executor is the server)

Two

Remote Procedure Call (RPC) is a key mechanism of distributed software. It extends the conventional procedure call so that the called procedure does not need to exist in the same address space as the caller.

It is a typical query-response interaction, or request-response message-passing system.

RPCs are not a new concept: For many years C programmers have used remote procedure calls (RPC) to execute a C function on a remote host and return the results.

A client has a request message that the RPC layer translates and sends to the server. This request may be a procedure or function call to a remote server. When the server receives the request, it sends the required response back to the client.

Remember that in the RPC protocol, the caller and callee processes have a client-server relationship.
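As a minimal sketch of this request-response pattern, the example below uses Python's standard-library xmlrpc modules; the add procedure and port 8000 are illustrative assumptions, not details from any particular system.

```python
# Server side: registers a procedure that remote clients can call.
from xmlrpc.server import SimpleXMLRPCServer

def add(a, b):
    # The remote procedure executed by the callee (the server).
    return a + b

server = SimpleXMLRPCServer(("localhost", 8000))
server.register_function(add, "add")
server.serve_forever()
```

```python
# Client side: what looks like a local call is packed into a request
# message, sent to the server, and the response is unpacked as the result.
from xmlrpc.client import ServerProxy

proxy = ServerProxy("http://localhost:8000/")
print(proxy.add(2, 3))   # -> 5
```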

Three

There are different types of RPCs, such as synchronous, asynchronous, and callback.

A synchronous operation blocks a process until the operation completes: the client that makes the request waits for the response from the server. The client is blocked while the server is processing the call and only resumes execution after the server is finished.

An asynchronous operation means the client makes a request and waits only for an acknowledgment from the server, not the response to the request.

In the callback scenario, however, RPC works in a peer-to-peer paradigm, in which each machine acts as both client and server.
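To make the synchronous / asynchronous distinction concrete, here is a small sketch of the two calling patterns, simulated locally with a thread pool standing in for the remote server (an assumption purely for illustration, not a real RPC stack).

```python
from concurrent.futures import ThreadPoolExecutor
import time

def remote_procedure(x):
    time.sleep(1)                    # stand-in for network + remote processing time
    return x * 2

executor = ThreadPoolExecutor(max_workers=1)

# Synchronous: the client blocks until the server has finished.
result = remote_procedure(21)        # execution pauses here for ~1 second
print("sync result:", result)

# Asynchronous: the client gets an acknowledgment (a future) immediately
# and keeps working; the response is collected later.
future = executor.submit(remote_procedure, 21)
print("request submitted, doing other work...")
print("async result:", future.result())   # only now do we wait for the response
```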

The developer is able to focus on the business logic of the application, distribute that logic across multiple software components, and potentially distribute those components over several physical computers, without spending significant effort on the networking aspects.

However, it would be misleading to suggest that the transparency provided by RPC removes all of the design challenges of distribution or that it removes all of the complications that networking introduces. 

Four

Network latency is a key challenge. Latency is the time delay experienced by a system; the trade-off is between local processing latency and local + network + remote processing latency.

The latency of remote calls involves interprocess communication, because different processes have different address spaces.
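A rough way to see this trade-off is to time the same operation locally and through a remote call. The sketch below assumes the xmlrpc server from the earlier example is still running on localhost:8000; the numbers will vary, but the remote call pays for marshalling plus the network round trip.

```python
import time
from xmlrpc.client import ServerProxy

def local_add(a, b):
    return a + b

proxy = ServerProxy("http://localhost:8000/")

start = time.perf_counter()
local_add(2, 3)                          # local processing only
local_time = time.perf_counter() - start

start = time.perf_counter()
proxy.add(2, 3)                          # local + network + remote processing
remote_time = time.perf_counter() - start

print(f"local:  {local_time * 1e6:.1f} microseconds")
print(f"remote: {remote_time * 1e6:.1f} microseconds")
```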

But there are two distinct approaches that can reduce the latency gap.

Five

To reduce the latency gap, the following protocols are beneficial: binary protocols such as MS-RPC, CORBA, and Java’s RMI; and Web-RPC approaches such as HTTP, SMTP, SOAP, XML-RPC, and REST.

The primary difference between RPC and RMI is that RPC is primarily concerned with data structures. It’s relatively easy to pack up data and ship it around, but for Java that’s not enough: in Java we don’t just work with data structures; we work with objects, which contain both data and methods for operating on the data.
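To illustrate the Web-RPC side of that list, here is a sketch of a request-response call carried over HTTP as plain JSON data, using only Python's standard library; the endpoint URL and message shape are hypothetical, for illustration only.

```python
import json
import urllib.request

# Pack a plain data structure (not an object) into the request body.
payload = json.dumps({"method": "add", "params": [2, 3]}).encode("utf-8")

request = urllib.request.Request(
    "http://localhost:8000/rpc",                     # hypothetical endpoint
    data=payload,
    headers={"Content-Type": "application/json"},
)

with urllib.request.urlopen(request) as response:
    print(json.loads(response.read()))               # e.g. {"result": 5}
```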

Six

Interface Description Language (IDL):

IDLs are used to enable communication between software components that do not share the same language. IDL offers a bridge between two different systems. 

An IDL is a language used to define the distributed interface, and an IDL file is created by the developers of that interface.

IDL files are used to generate the client stub and the server stub.

The client stub resides in the client address space and packs parameters into a message. The client stub then sends the message to the transport layer, which sends it on to the remote server machine.

On the server side, the transport layer passes the message to a server stub, which unpacks the parameters and calls the requested server routine.
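To show what the generated stubs are doing, here is a hand-rolled sketch of the packing and unpacking; the names, the JSON message format, and the in-process "transport" are assumptions made purely for illustration.

```python
import json

def add(a, b):
    # The real routine that lives on the server.
    return a + b

def server_stub(message):
    # Unpacks the parameters from the message and calls the requested routine.
    request = json.loads(message)
    result = add(*request["params"])
    return json.dumps({"result": result})

def client_stub(a, b, transport):
    # Packs the parameters into a message and hands it to the transport layer.
    message = json.dumps({"procedure": "add", "params": [a, b]})
    reply = transport(message)
    return json.loads(reply)["result"]

# Here the "transport layer" is a plain function call; in a real system it
# would carry the message across the network to the server machine.
print(client_stub(2, 3, transport=server_stub))   # -> 5
```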

Seven

A widely used message format is XML. Initially, there was no accepted standard for structuring messages; it was a mix of binary and text-based approaches.

The focus was on simplicity and usability. There is also a powerful tool for navigating XML tree structures called the Document Object Model (DOM). The DOM is an interface that represents how your HTML and XML documents are read by the browser.
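As a small illustration of navigating an XML tree through a DOM-style interface, the sketch below uses Python's built-in minidom parser; the order document is made up.

```python
from xml.dom.minidom import parseString

doc = parseString(
    "<order><item sku='A1'>Keyboard</item><item sku='B2'>Mouse</item></order>"
)

# Walk the tree and read attributes and text nodes.
for item in doc.getElementsByTagName("item"):
    print(item.getAttribute("sku"), item.firstChild.nodeValue)
# A1 Keyboard
# B2 Mouse
```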

Eight

But everybody wanted to come up with their own specialised schema for XML, and it became complex.

Eventually, JSON became widely used. This is because browsers supported JavaScript, and JSON is considered a better format than XML primarily because it is less verbose and faster to parse; XML parsing is slower.
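For comparison, the same made-up order expressed as JSON parses directly into native data structures, with noticeably less markup than the XML version.

```python
import json

message = '{"order": [{"sku": "A1", "name": "Keyboard"}, {"sku": "B2", "name": "Mouse"}]}'

for item in json.loads(message)["order"]:
    print(item["sku"], item["name"])
# A1 Keyboard
# B2 Mouse
```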

Nine

Common elements of today’s large distributed systems:

  • Load Balancers - local load balancing, global load balancing using DNS, and Content Delivery Networks
  • Internet Application Protocol Servers - HTTP or web servers, Common Gateway Interface, SMTP
  • Indexing / Search Servers - there are two basic models (inverted index and faceted index); the interface to the indexing / search service is often HTTP based (a minimal inverted-index sketch follows this list)
  • Application Servers - provide a lot of supporting functionality, such as security, state management, transaction management, and database connections
  • Database Servers - make databases available over the network (relational / SQL databases)
  • Integration Servers - allow applications to exchange information, from infrequent transfers of large amounts of data to high volumes of very small messages, and provide standardised interfaces into diverse systems
  • Perimeter firewalls - filter the traffic that comes in and out of the perimeter of the organisation
  • Internal firewalls - filter traffic between internal hosts
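As promised above, here is a minimal sketch of the inverted-index model: a mapping from each term to the set of documents that contain it. The documents are made up, and a real search server would add tokenisation, ranking, and an HTTP interface on top.

```python
from collections import defaultdict

documents = {
    1: "distributed systems are hard",
    2: "systems fail in distributed ways",
}

# Build the inverted index: term -> set of document ids containing the term.
index = defaultdict(set)
for doc_id, text in documents.items():
    for term in text.split():
        index[term].add(doc_id)

print(sorted(index["distributed"]))   # -> [1, 2]
print(sorted(index["fail"]))          # -> [2]
```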

In summary, developing distributed systems is challenging: it is hard to reconcile requirements and keep things simple, and the components and the network are not reliable.

Architecting a distributed system consists of:

  • Assembling software that utilises the network to communicate
  • Having good understanding of functional building blocks such as web servers, database servers, integration servers, application servers, etc.
  • Understanding the scaling, performance, security, and resiliency requirements