
HTTP Protocol

Overview:

The Hypertext Transfer Protocol (HTTP) is an application-level
protocol for collaborative, distributed information systems. It is
the foundation of data communication for the World Wide Web. HTTP is
a generic protocol that can also be used for other purposes through
extensions of its request methods, error codes, and headers.

HTTP is typically carried over TCP/IP connections. It is used to
deliver data such as HTML and image files, query results, sound,
video and other multimedia files on the WWW. The default port number
for HTTP is 80.

HTTP’s development was initiated by Tim Berners-Lee at CERN.
Standards development was coordinated by the Internet Engineering
Task Force (IETF) and the World Wide Web Consortium (W3C),
culminating in the publication of a series of Requests for Comments
(RFCs). HTTP/1.1 was first defined in RFC 2068 in 1997; that document
was obsoleted by RFC 2616 in 1999, which was in turn obsoleted by
RFC 7230 and its companion documents in 2014.

Its successor, HTTP/2, was standardized in 2015 and is now supported
by major web servers.

HTTP works as a request–response protocol in the client–server
computing model. For example, a web browser may be the client and an
application running on a computer hosting a website may be the
server. The client sends an HTTP request message to the server. The
server, which provides resources such as HTML files and other
content, or performs other functions on behalf of the client, returns
a response message to the client. The response contains completion
status information about the request and may also contain the
requested content in its message body.

HTTP is designed to permit intermediate network elements to improve
or enable communications between clients and servers. It is an
application layer protocol designed within the framework of the
Internet protocol suite.

HTTP resources are identified and located on the network by Uniform
Resource Locators (URLs), using the Uniform Resource Identifier (URI)
schemes http and https.

Basic Features

There are three basic features that make HTTP a simple but powerful
protocol:

HTTP is connectionless: The client opens a connection, sends an HTTP
request, and waits for the response on that same connection. In
HTTP/1.0 the connection is normally closed once the server has sent
its response, so each request uses a new connection. HTTP/1.1 may
keep a connection open for further requests, but each request is
still handled independently of the others.

HTTP is media independent: Data of any type can be sent by HTTP as
long as both the client and the server know how to handle the
content. The client and the server specify the type of content using
the appropriate MIME type.

HTTP is stateless: Being connectionless is a direct result of HTTP
being a stateless protocol. The server and client are aware of each
other only during the current request. Afterwards, both of them
forget about each other. Due to this nature of the protocol, neither
the client nor the server retains information between different
requests across web pages.

Basic Architecture

The following diagram shows basic architecture of HTTP.

The HTTP protocol is a request/response protocol based on the
client/server architecture, where web browsers, robots, search
engines, etc. act as HTTP clients and the web server acts as the
server.

Client: The HTTP client sends a request to the server in the form of
a request method, URI, and protocol version, followed by a MIME-like
message containing request modifiers, client information, and
possible body content over a TCP/IP connection.

Server: The HTTP server responds with a status line, including the
message’s protocol version and a success or error code, followed by a
MIME-like message containing server information, entity
meta-information, and possible entity-body content.

HTTP Version

HTTP uses a <major>.<minor> numbering scheme to indicate versions of
the protocol. The version of an HTTP message is indicated by an
HTTP-Version field in the first line. The syntax for specifying the
HTTP version is as follows:

HTTP-Version = "HTTP" "/" 1*DIGIT "." 1*DIGIT

Example: HTTP/1.1
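
The HTTP-Version rule can be checked mechanically. The following is a minimal Python sketch, not a reference implementation; the function name parse_http_version is a hypothetical helper introduced here for illustration.

```python
import re

# HTTP-Version = "HTTP" "/" 1*DIGIT "." 1*DIGIT
_HTTP_VERSION = re.compile(r"HTTP/([0-9]+)\.([0-9]+)")


def parse_http_version(text):
    """Return (major, minor) as integers, or raise ValueError.

    Hypothetical helper: the name is an assumption for this sketch.
    """
    match = _HTTP_VERSION.fullmatch(text)
    if match is None:
        raise ValueError(f"not an HTTP-Version: {text!r}")
    # Each number is compared as an integer, so 2.13 is newer than 2.4.
    return int(match.group(1)), int(match.group(2))
```

Returning a tuple of integers means versions can be compared with the ordinary tuple ordering, which treats the major and minor parts as separate numbers rather than as a decimal fraction.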

Uniform Resource Identifiers

A Uniform Resource Identifier (URI) is a simply formatted string
containing a name, location, etc. that identifies a resource, for
example a website or a web service. The scheme and host name are
case-insensitive; the rest of the URI may be case-sensitive. The
syntax of a URI used for HTTP is as follows:

URI = "http:" "//" host [ ":" port ] [ abs_path [ "?" query ]]

If the port is empty or not given, port 80 is assumed for HTTP.

Example: http://abc.com/users/web.html
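
As a rough illustration of the URI syntax, the sketch below splits an http URI into its parts with Python's standard urllib.parse module and applies the default port. The function name http_uri_parts and the returned dictionary keys are assumptions made for this example.

```python
from urllib.parse import urlsplit


def http_uri_parts(uri):
    """Split an http URI into host, port, abs_path and query.

    Hypothetical helper for illustration only.
    """
    parts = urlsplit(uri)
    if parts.scheme != "http":  # urlsplit already lowercases the scheme
        raise ValueError(f"not an http URI: {uri!r}")
    return {
        "host": parts.hostname,  # lowercased by urlsplit
        # Port 80 is assumed when the port is empty or not given.
        "port": 80 if parts.port is None else parts.port,
        "abs_path": parts.path or "/",
        "query": parts.query,
    }
```

For the example URI http://abc.com/users/web.html this yields host abc.com, port 80, abs_path /users/web.html and an empty query.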

Date/Time Formats

All HTTP date/time stamps must be represented in Greenwich Mean Time
(GMT), without exception. HTTP applications are allowed to use any of
the following three representations of date/time stamps:

Thu, 18 Dec 2003 03:30:15 GMT ; RFC 822, updated by RFC 1123

Thursday, 18-Dec-03 03:30:15 GMT ; RFC 850, obsoleted by RFC 1036

Thu Dec 18 18:30:15 2003 ; ANSI C’s asctime() format
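
A receiver that accepts all three representations can simply try each layout in turn. The sketch below does this with datetime.strptime; it assumes an English (C) locale for day and month names, and parse_http_date is a hypothetical name chosen for this example.

```python
from datetime import datetime, timezone

# One strptime layout per representation (assumes English day/month names).
_HTTP_DATE_FORMATS = (
    "%a, %d %b %Y %H:%M:%S GMT",  # RFC 822, updated by RFC 1123
    "%A, %d-%b-%y %H:%M:%S GMT",  # RFC 850, obsoleted by RFC 1036
    "%a %b %d %H:%M:%S %Y",       # ANSI C's asctime() format
)


def parse_http_date(text):
    """Return a timezone-aware UTC datetime, or raise ValueError."""
    for layout in _HTTP_DATE_FORMATS:
        try:
            parsed = datetime.strptime(text, layout)
        except ValueError:
            continue  # try the next representation
        # HTTP dates are always GMT, so attach UTC explicitly.
        return parsed.replace(tzinfo=timezone.utc)
    raise ValueError(f"unrecognized HTTP date: {text!r}")
```

Note that the two-digit year of the RFC 850 form is resolved by strptime's own rule for %y, which may not match what every HTTP implementation does.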

Character Sets

Character sets are used to specify the character encodings that the
client prefers. Multiple character sets can be listed, separated by
commas. If a value is not specified, the default is US-ASCII.

Example: US-ASCII

HTTP-Message

HTTP makes use of the Uniform Resource Identifier (URI) to identify a
given resource and to establish a connection. Once the connection is
established, HTTP messages are passed in a format similar to that
used by Internet mail (RFC 5322) and the Multipurpose Internet Mail
Extensions (MIME, RFC 2045). These messages consist of requests from
client to server and responses from server to client, which have the
following format:

HTTP-message = Request | Response ; HTTP/1.1 messages

HTTP requests and HTTP responses use the generic message format of
RFC 822 for transferring the required data. An HTTP message contains
the following four items:

A start-line
Zero or more header fields, each followed by CRLF
An empty line indicating the end of the header fields
Optionally a message body
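
The four items can be separated with a few lines of code. The following Python sketch is a simplified illustration (it ignores folded header lines, for instance); split_http_message is a hypothetical name and the ISO-8859-1 decoding of the header section is an assumption of this sketch.

```python
def split_http_message(raw):
    """Split raw message bytes into (start_line, headers, body).

    Simplified sketch: headers is a list of (name, value) pairs.
    """
    # The empty line (CRLF CRLF) ends the header section.
    head, separator, body = raw.partition(b"\r\n\r\n")
    if not separator:
        raise ValueError("missing empty line after header fields")
    lines = head.decode("iso-8859-1").split("\r\n")
    start_line = lines[0]
    headers = []
    for line in lines[1:]:
        # Split on the first colon only; values may contain colons.
        name, colon, value = line.partition(":")
        if not colon:
            raise ValueError(f"malformed header field: {line!r}")
        headers.append((name.strip(), value.strip()))
    return start_line, headers, body
```

The body is returned as bytes and left uninterpreted, since its meaning depends on headers such as Content-Type and Content-Length.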

1. Message Start-Line

The message start-line has the following syntax:

Start-line = Request-line | Status-line

Example:

GET /web.html HTTP/1.0 (Request-line sent by client)

HTTP/1.0 200 OK (Status-line sent by server)

2. Header Fields

HTTP header fields provide required information about the request or
response, or about the object sent in the message body. There are
four types of HTTP message headers:

General-header: These header fields have general applicability for
both request and response messages.

Request-header: These header fields have applicability only for
request messages.

Response-header: These header fields have applicability only for
response messages.

Entity-header: These header fields define meta-information about the
entity-body or, if no body is present, about the resource identified
by the request.

All of the above headers follow the same generic format: each header
field consists of a name followed by a colon (:) and the field value,
as follows:

Message-header = field-name ":" [ field-value ]

3. Message Body

The message body is optional for an HTTP message, but when present it
carries the entity-body associated with the request or response. If
an entity-body is present, the Content-Type and Content-Length header
lines usually specify the nature of the body.

HTTP Request:

An HTTP client sends an HTTP request to a server in the form of a
request message, which has the following format:

A request-line
Zero or more header (general/request/entity) fields, each followed by CRLF
An empty line indicating the end of the header fields
Optionally a message-body

Request-Line

The Request-Line begins with a method token, followed by the
Request-URI and the protocol version, and ends with CRLF. The
elements are separated by space (SP) characters.

Request-Line = Method SP Request-URI SP HTTP-Version CRLF
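
To make the Request-Line rule concrete, here is a small Python sketch that splits a line on single SP characters and insists on the trailing CRLF. The name parse_request_line is hypothetical, and real servers are often more lenient than this.

```python
def parse_request_line(line):
    """Return (method, request_uri, http_version) or raise ValueError.

    Strict sketch of: Method SP Request-URI SP HTTP-Version CRLF
    """
    if not line.endswith("\r\n"):
        raise ValueError("Request-Line must end with CRLF")
    parts = line[:-2].split(" ")
    # Exactly three non-empty elements separated by single spaces.
    if len(parts) != 3 or "" in parts:
        raise ValueError(f"malformed Request-Line: {line!r}")
    method, request_uri, http_version = parts
    return method, request_uri, http_version
```

A line in which the space before the protocol version is missing has only two elements and is rejected.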

Request Methods:

There are 9 main methods in HTTP. These methods indicate what action
is to be taken on the resource, so that the client gets the expected
result, i.e. the resource it desires. Generally, resources are the
server’s files or executables running on the server.

The HTTP/1.0 specification defined three main methods:
1) GET
2) HEAD
3) POST
The HTTP/1.1 specification later defined five more methods:
4) OPTIONS
5) PUT
6) DELETE
7) TRACE
8) CONNECT
RFC 5789 added the PATCH method:
9) PATCH

GET: This method is used to retrieve the resource identified by the
Request-URI. It is the most commonly used method for retrieving
information. Parameters for the requested data are added to the query
string itself. The response to a GET request can be cached. GET is
not suitable for transmitting confidential information such as
passwords.

A conditional GET can also be used by adding constraints in the form
of If-Match, If-None-Match, If-Range, If-Modified-Since, or
If-Unmodified-Since header fields. A conditional GET transfers the
resource only if the constraints in those header fields are
fulfilled.

Unnecessary network usage can be avoided by using a partial GET,
which specifies a Range header field in the request itself.
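
As an illustration of a conditional GET, the sketch below shows how a server might decide between 200 and 304 for an If-Modified-Since header. It is a simplified assumption of server behaviour, not a complete implementation; conditional_get_status is a hypothetical name, and only this one condition is handled.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def conditional_get_status(last_modified: datetime,
                           if_modified_since: Optional[str]) -> int:
    """Return 304 if the resource is unchanged since the given date, else 200.

    last_modified is expected to be a timezone-aware UTC datetime.
    """
    if if_modified_since is None:
        return 200  # unconditional GET
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return 200  # an unparsable date is ignored in this sketch
    if since.tzinfo is None:
        since = since.replace(tzinfo=timezone.utc)
    # Not modified after the client's copy: no body needs to be sent.
    return 304 if last_modified <= since else 200
```

A 304 response carries no body, so the client reuses its cached copy and the transfer of the full resource is avoided.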

HEAD: This method is identical to GET except that the server returns
only the status line and headers, with no message body, in the
response.

The response to a HEAD request may be cached, and it can be used to
update a previously cached version of the resource. This method is
commonly used to fetch meta-information from headers.

POST: This method is used to send data to the server, generally to
store it; the data is encapsulated in the request body. Web forms are
a typical example. POST places no restriction on the amount of data
that can be sent, because the data travels in the request message
body, whereas in the GET method the amount of data is limited by the
practical length of the query string.

This method is also preferable when transferring confidential
information such as passwords, since the URL-encoded data does not
appear in the browser’s address bar.

POST is not an idempotent method: repeating the same request may have
additional effects.
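
The difference in where GET and POST carry form data can be sketched as follows. The function name build_form_requests, the host example.com and the form fields are all assumptions made up for this illustration.

```python
from urllib.parse import urlencode


def build_form_requests(path, fields, host="example.com"):
    """Return (get_request, post_request) carrying the same form fields."""
    encoded = urlencode(fields)  # e.g. "user=alice&q=a+b"
    # GET: the data is part of the Request-URI (query string).
    get_request = (
        f"GET {path}?{encoded} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "\r\n"
    )
    # POST: the data travels in the message body instead.
    post_request = (
        f"POST {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Content-Type: application/x-www-form-urlencoded\r\n"
        f"Content-Length: {len(encoded.encode('ascii'))}\r\n"
        "\r\n"
        f"{encoded}"
    )
    return get_request, post_request
```

In the GET variant the fields are visible in the Request-Line, while in the POST variant the Request-Line contains only the path and the fields follow the empty line.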

OPTIONS: This method is used to retrieve information about the HTTP
communication options available for the target resource, either at
the server or at an intervening intermediary. The server’s general
capabilities can also be determined by using this method with ‘*’ as
the Request-URI.

Responses to this method are not cacheable.

PUT: This method stores the enclosed representation under the
supplied Request-URI. If the Request-URI refers to an already
existing resource, the enclosed entity must be considered the updated
version of the resource on the origin server. If no such resource
exists, a resource with the specified URI is created and a 201
(Created) response is returned to the user agent (UA); otherwise 200
or 204 is sent to the UA.

Responses to this method are not cacheable.
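
The choice of status code for PUT can be illustrated with a toy in-memory store. This is only a sketch under the assumption that resources are kept in a Python dictionary; handle_put is a hypothetical name, and 204 is chosen here for updates although 200 would be equally valid.

```python
def handle_put(store, uri, body):
    """Store body under uri and return the status code for the response.

    store is a plain dict standing in for the origin server's resources.
    """
    created = uri not in store
    store[uri] = body  # create, or replace with the updated version
    # 201 when a new resource was created, 204 when one was updated.
    return 201 if created else 204
```

Sending the same PUT twice leaves the store in the same state as sending it once; only the status code of the second response differs.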

DELETE: This method is used to delete the resource identified by the
Request-URI. The server may override this method, and there is no
assurance that the requested resource has actually been deleted even
if the server returns a positive response.

Responses to this method are not cacheable.

TRACE: This method echoes the request as received at the server end
back to the client for diagnostic or debugging purposes, i.e. it
shows the full request that the server received from the client.

CONNECT: This method is used for tunnelling. A tunnel is established
between the client and the server through one or more proxies, and
the tunnel can additionally be secured using TLS.

PATCH: This method is used to apply partial modifications to the
resource identified by the Request-URI, as directed in the request
entity. It is similar to the PUT method; the difference lies in the
way the server processes the enclosed entity. In PUT the enclosed
entity is considered to be the new version of the resource, whereas
in PATCH the enclosed entity contains a set of instructions
describing how the resource on the origin server should be modified.
A response to PATCH can be cached under certain circumstances.

Request-URI:

It is a Uniform Resource Identifier and identifies the resource upon
which to apply the request.

Request-URI = "*" | absoluteURI | abs_path | authority

Request Header Fields:

The request-header fields allow the client to pass additional
information about the request, and about the client itself, to the
server. These fields act as request modifiers. Here is a list of some
important request-header fields that can be used as required:

Accept-Charset
Accept-Encoding
Accept-Language
Authorization
Expect
From
Host
If-Match
If-Modified-Since
If-None-Match
If-Range
If-Unmodified-Since
Max-Forwards
Proxy-Authorization
Range
Referer
TE
User-Agent

You can introduce your own custom fields if you write your own
custom client and web server.

For example, a request for the page hello.htm could look like this:

GET /hello.htm HTTP/1.1
User-Agent: Mozilla/4.0
Host: www.httpreqexa.com
Accept-Language: en-us
Accept-Encoding: gzip, deflate
Connection: Keep-Alive
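
A request message like this can be assembled programmatically. The Python sketch below joins the request-line and header fields with CRLF and appends the empty line; build_request is a hypothetical helper, and it performs no validation of the values it is given.

```python
def build_request(method, request_uri, headers, body=b"", version="HTTP/1.1"):
    """Return the raw bytes of an HTTP request message.

    headers is a list of (name, value) pairs kept in the given order.
    """
    lines = [f"{method} {request_uri} {version}"]
    lines += [f"{name}: {value}" for name, value in headers]
    # Every line ends with CRLF; an extra CRLF marks the end of the headers.
    head = "\r\n".join(lines) + "\r\n\r\n"
    return head.encode("ascii") + body
```

Called with the method GET, the Request-URI /hello.htm and the header fields from the example, it produces the same message with the line endings made explicit.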

HTTP RESPONSE:

A response message is the message a server sends in reply to a
request message.

The picture below shows the general format of an HTTP response
message.

Let us take an example to understand it in more depth.

HTTP/1.1 200 OK
Connection: close
Date: Wed, 07 Jul 2010 12:00:15 GMT
Server: Apache/1.3.0 (Unix)
Last-Modified: Thu, 06 May 2010 09:23:24 GMT
Content-Length: 5428
Content-Type: text/html

(data data data data……)

A response message has three sections:
(i) an initial status line
(ii) header lines (six in this example)
(iii) an entity body

Status Line:
The start line of an HTTP response is called the status line. It has
three fields:
(1) The protocol version field
(2) A status code, which indicates the outcome (success or failure)
of the request
(3) A corresponding status message
In the above example, the status line indicates that the server is
using HTTP/1.1 as the protocol version, 200 is the status code, and
OK is the corresponding status message, indicating that the request
succeeded.
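
The three fields of a status line can be separated as in the following Python sketch. The name parse_status_line is hypothetical, and the check on the version field is deliberately loose.

```python
def parse_status_line(line):
    """Return (http_version, status_code, status_message)."""
    version, _, rest = line.partition(" ")
    code, _, message = rest.partition(" ")
    if not version.startswith("HTTP/"):
        raise ValueError(f"missing protocol version: {line!r}")
    # The status code is always a 3-digit integer.
    if len(code) != 3 or not (code.isascii() and code.isdigit()):
        raise ValueError(f"malformed status code: {line!r}")
    # The status message may itself contain spaces, e.g. "Not Found".
    return version, int(code), message
```

Only the first two spaces act as separators, so a multi-word status message is returned intact.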
Header Lines:
Each header line follows the same structure as any other header: a
case-insensitive name followed by a colon (:) and a value whose
structure depends on the type of the header. The whole header,
including its value, is presented as a single line.

In the above example, the server uses the Connection: close header
line to tell the client that it is going to close the TCP connection
after sending the message.

The Date header line indicates the date and time when the HTTP
response message was created and sent by the server. Note that this
is not the time when the object was created or last modified; it is
the time when the server retrieved the object from its file system,
inserted it into the HTTP response and sent it.

The Server header line indicates that the message was generated by an
Apache web server. It is a response header line analogous to the
User-Agent header line in a request.

The Last-Modified header line indicates the date and time when the
object was created or last modified. It is critical for object
caching.

The Content-Length header line indicates the number of bytes in the
object being sent (5428 here), and the Content-Type header line
indicates that the object in the entity body is HTML text.
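
Because header names are case-insensitive, a receiver usually normalizes them before looking anything up. The sketch below lowercases the names while parsing; parse_header_lines is a hypothetical name, and keeping only the last value for a repeated name is a simplification of this example.

```python
def parse_header_lines(lines):
    """Return a dict mapping lowercased header names to their values."""
    headers = {}
    for line in lines:
        # Split on the first colon only: dates contain further colons.
        name, colon, value = line.partition(":")
        if not colon:
            raise ValueError(f"malformed header line: {line!r}")
        headers[name.strip().lower()] = value.strip()
    return headers
```

With the names normalized, Content-length and Content-Length refer to the same entry, and the body length can be read with int(headers["content-length"]).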

Many response headers are available. They can be divided into several
groups:

General headers, e.g. Via, which apply to the whole message.

Response headers, e.g. Vary and Accept-Ranges, which give additional
information about the server that does not fit in the status line.

Entity headers, e.g. Content-Length, which apply to the body of the
response. No such headers are transmitted when there is no body in
the response.

Entity Body:
The last part of a response message is the body. Not all responses
have one: responses with a status code such as 201 or 204 usually
don’t.

Bodies can be divided into three categories:

Single-resource bodies of known length: the body consists of a single
file of known length, defined by the two headers Content-Type and
Content-Length.

Single-resource bodies of unknown length: the body consists of a
single file of unknown length, encoded in chunks with
Transfer-Encoding set to chunked.

Multiple-resource bodies: the body is a multipart body in which each
part contains a different section of information. These are
relatively rare.
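
For a body of unknown length, each chunk is preceded by its size in hexadecimal and a chunk of size zero ends the body. The following Python sketch decodes such a body; decode_chunked is a hypothetical name, and the sketch ignores chunk extensions and any trailer fields after the last chunk.

```python
def decode_chunked(data):
    """Reassemble a body sent with Transfer-Encoding: chunked."""
    body = bytearray()
    pos = 0
    while True:
        eol = data.find(b"\r\n", pos)
        if eol == -1:
            raise ValueError("missing chunk-size line")
        # The size is hexadecimal; anything after ';' is an extension.
        size_token = data[pos:eol].split(b";", 1)[0].strip()
        size = int(size_token.decode("ascii"), 16)
        if size < 0:
            raise ValueError("negative chunk size")
        pos = eol + 2
        if size == 0:
            return bytes(body)  # last chunk: the body is complete
        chunk = data[pos:pos + size]
        if len(chunk) != size or data[pos + size:pos + size + 2] != b"\r\n":
            raise ValueError("truncated or malformed chunk")
        body += chunk
        pos += size + 2
```

The sender never has to know the total length in advance, which is why this encoding suits content generated on the fly.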

HTTP STATUS CODE:

HTTP status codes are standard response codes given by servers on the
Internet. They help to identify the cause of the problem when a page
or other resource does not load properly. An HTTP status code is the
server-side response in the form of a 3-digit integer, where the
first digit represents the class of the response.
1xx: Informational

Status codes in this class indicate that the request has been
received and the process is continuing. The server may ask to switch
protocols, or it may have received the request headers and ask the
client to continue with the request body.
This class contains the following status codes:
100: Only a part of the request has been received by the server, but
as it has not been rejected, the client should continue with the
request.
101: The server is switching protocols.
2xx: Successful
Status codes in this class indicate that the action was successfully
received, understood and accepted.
This class contains the following status codes:
200: The request is OK.
201: The request is complete and a new resource has been created.
202: The request has been accepted for processing, but the processing
is not complete.
203: The information in the entity header is not from the origin
server but from a local or third-party copy.
204: A status code and headers are given in the response, but there
is no entity body in the reply.
205: The browser should reset the content of the form used for the
transaction.
206: The server is returning partial data.
3xx: Redirection
Status codes in this class inform the client that further action is
needed to complete the request, typically because the requested URL
has moved to a different URL, either permanently or temporarily.
This class contains the following status codes:
300: Multiple choices are available; the user can select a link and
go to that location.
301: The requested page has moved permanently.
302: The requested page has moved temporarily.
303: The requested page can be found under a different URL.
304: The resource has not been modified since the date specified in
the request.
305: The requested page must be accessed through a proxy.
306: This code is reserved and no longer used.
4xx: Client Error
Status codes in this class represent client-side errors or
restrictions while requesting a page: the server is unable to
understand the client’s request, the client’s access to the requested
page is forbidden, or authentication is required to access the
requested page. They indicate that the request contains incorrect
syntax or cannot be fulfilled.
This class contains the following status codes:
400: The server did not understand the request.
401: The requested page requires authentication.
402: Payment required; this code is reserved and cannot be used yet.
403: Access to the requested page is forbidden.
404: The server cannot find the requested page.
405: The method in the request is not allowed.
406: The server can only generate a response that is not acceptable
to the client.
407: Proxy authentication is required.
408: The request timed out.
409: The request cannot be completed because of a conflict.
410: The requested page is no longer available.
411: The server will not accept the request without a content length.
412: The precondition given in the request was evaluated to false by
the server.
413: The request entity is too large, so the server will not accept
the request.
414: The URL is too long, so the server will not accept the request.
415: The request is not accepted because the media type is not
supported.
416: The requested byte range is not available and is out of bounds.
417: The expectation given in the request could not be met by the
server.
5xx: Server Error
Status codes in this class cover errors where the server understood
the request for a page but is incapable of fulfilling it for some
reason. They indicate that the server failed to fulfil an apparently
valid request.
This class contains the following status codes:
500: The server met an unexpected condition.
501: The server does not support the functionality required.
502: The server received an invalid response from the upstream
server.
503: The server is temporarily overloaded or down.
504: The gateway has timed out.
505: The server does not support the HTTP protocol version.
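
Since the first digit of a status code determines its class, the classification can be computed rather than looked up. The Python sketch below does this and uses the standard library's http.HTTPStatus enumeration for the status message where one is defined; status_class and describe_status are hypothetical names for this example.

```python
from http import HTTPStatus

_CLASSES = {
    1: "Informational",
    2: "Successful",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
}


def status_class(code):
    """Return the class name implied by the first digit of the code."""
    if not 100 <= code <= 599:
        raise ValueError(f"not a valid status code: {code}")
    return _CLASSES[code // 100]


def describe_status(code):
    """Return a one-line description such as '404 Not Found (Client Error)'."""
    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:
        phrase = "Unknown"  # code not defined in the standard library
    return f"{code} {phrase} ({status_class(code)})"
```

A client that receives a code it does not recognize can still act on the class alone, for example by treating any unknown 4xx code as a client error.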
