Support Manual

Site Statistics
All hosting accounts come with
HTTP-Analyze preinstalled and configured. Professional
Urchin Stats can be added to any package from Com 1 and up for $10/month.
HTTP-Analyze is a log
analyzer for web servers. It analyzes the logfile of a web server and
creates a comprehensive summary report from the information found there.
http-analyze has been optimized to process large logfiles as fast as
possible.
In simpler terms, HTTP-Analyze is a powerful traffic analyzer that quickly and efficiently delivers statistics on the traffic your web pages have generated. It has a user-friendly graphical user interface (GUI) that produces your traffic reports with a click of the mouse.
How It Works
The web server is a program running on a networked machine, waiting for connections from the outside world in order to serve documents requested by browsers.
To communicate, the server and the browser use an asynchronous communication method based on HTTP (Hypertext Transfer Protocol). It works as follows:
1. The user starts the browser and types in a URL.
2. The browser connects to the given host and requests the specified document.
3. The web server handles the request and sends out a response:
   - If the document exists, the web server delivers it.
   - If it does not exist or if access is not permitted, the web server sends back an error message instead.
4. The document delivered in response to this request may contain inline objects. Inline objects are simply URLs pointing to another resource: a document, an image, an applet, a video/audio stream, or any other addressable HTML object.
5. The browser then requests all inline objects of the current page from the server using steps 2 and 3 above before it can display the content of that page.
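To make the inline-object steps concrete, here is a minimal Python sketch that lists the requests a browser would send for one page: the page itself plus every inline object it references. Which tags count as inline objects (`img`, `script`, `embed`) is an assumption for illustration, and `InlineObjectFinder` and `requests_for_page` are hypothetical names; real browsers also fetch stylesheets, frames, and more.

```python
from html.parser import HTMLParser

# Assumption for illustration: these tag/attribute pairs are treated as inline objects.
INLINE_ATTRS = {"img": "src", "script": "src", "embed": "src"}

class InlineObjectFinder(HTMLParser):
    """Collect the URLs of inline objects referenced by an HTML page."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        wanted = INLINE_ATTRS.get(tag)
        if wanted is None:
            return
        for name, value in attrs:
            if name == wanted and value:
                self.urls.append(value)

def requests_for_page(page_url, html):
    """Requests a browser sends for one page: the page itself, then each inline object."""
    finder = InlineObjectFinder()
    finder.feed(html)
    finder.close()
    return [page_url] + finder.urls

page = ('<html><body><img src="/logo.gif"><p>Welcome</p>'
        '<img src="/photo.jpg"></body></html>')
reqs = requests_for_page("/index.html", page)
print(reqs)       # ['/index.html', '/logo.gif', '/photo.jpg']
print(len(reqs))  # 3 requests, so 3 hits in the logfile
```

Each of these requests produces its own response, which is why one page with two images shows up as three hits.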
This communication method is called asynchronous because the browser sends out many requests for inline objects at once (without waiting for a response from the server before sending the next request) using different communication channels.

Since the browser's requests are often handled by different server processes or by different threads of a server process, there is no fixed relationship between the logfile entries for a document and those for its inline objects.
For example, the order
in which the server logs the successful transmission of the document
itself and the inline images contained therein is not predictable and
depends on the type of documents, objects, server speed, system and
network load, and many other parameters.
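The following Python sketch imitates this behavior for the inline objects of one page: each object is served by its own thread with a random, hypothetical processing delay, so the order in which entries reach the shared log varies from run to run. It is a simplified model, not how any particular server is implemented.

```python
import random
import threading
import time

log = []                      # stands in for the server's logfile
log_lock = threading.Lock()

def serve(url):
    """Simulate one server thread handling one request."""
    # Hypothetical processing time; in reality it depends on object size,
    # server speed, and system and network load.
    time.sleep(random.uniform(0, 0.01))
    with log_lock:
        log.append(url)       # the entry is written when the response completes

# Inline objects the browser requests at (almost) the same time.
requested = ["/logo.gif", "/photo.jpg", "/banner.gif", "/applet.class"]
threads = [threading.Thread(target=serve, args=(url,)) for url in requested]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(log)  # the same four entries, but in no guaranteed order
```

Every request is logged exactly once, but only the set of entries is predictable, not their sequence.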
What is logged?
Every response from the server, whether it indicates success, an error, or even a timeout (i.e. no response), is logged in the server's logfile. Since the server was hit by a request, such a response is called a hit. Consequently, the total number of hits equals the total number of lines in the logfile minus the number of corrupt and empty lines. A typical logfile entry in the Common Logfile Format (CLF) looks like this:
    hostname - - [01/Feb/1998:10:10:00 +0100] "GET /index.html HTTP/1.0" 200 4839
The hostname field contains the fully qualified domain name (FQDN) of the site accessing your server (see "Special Cases" below). The next two fields usually contain a minus sign ('-') to indicate that they are empty. The date and time are enclosed in square brackets ('[' and ']'). The next field, enclosed in double quotes, contains the request: the request method ('GET', for example), the name of the requested document (URL), and the protocol specification ('HTTP/1.0').
The following field contains the server's response (status) code: '200' stands for 'OK', while '404' means 'Not Found', for example. The last field contains the size of the document. Some servers log the number of bytes actually transferred, while others log the size of the document; this makes a difference if the user interrupts the transfer before the document has been transmitted completely.
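As an illustration, the fields described above can be pulled apart with a regular expression. The Python sketch below assumes the exact spacing of the sample entry and an English month abbreviation; `parse_clf` is a hypothetical helper, not part of http-analyze. It also counts hits the way described earlier: every line that parses is a hit, while empty and corrupt lines are skipped.

```python
import re
from datetime import datetime

# host ident authuser [date] "request" status size
CLF_RE = re.compile(
    r'^(?P<host>\S+) (?P<ident>\S+) (?P<authuser>\S+) '
    r'\[(?P<date>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)

def parse_clf(line):
    """Parse one CLF entry into a dict; return None for empty or corrupt lines."""
    m = CLF_RE.match(line.strip())
    if m is None:
        return None
    rec = m.groupdict()
    try:
        rec["date"] = datetime.strptime(rec["date"], "%d/%b/%Y:%H:%M:%S %z")
    except ValueError:
        return None
    # The request is "METHOD URL PROTOCOL"; pad with None in case parts are missing.
    rec["method"], rec["url"], rec["protocol"] = (rec["request"].split() + [None] * 3)[:3]
    rec["status"] = int(rec["status"])
    # A '-' size field (no body logged) is treated as 0 bytes.
    rec["size"] = 0 if rec["size"] == "-" else int(rec["size"])
    return rec

entry = 'hostname - - [01/Feb/1998:10:10:00 +0100] "GET /index.html HTTP/1.0" 200 4839'
rec = parse_clf(entry)
print(rec["host"], rec["method"], rec["url"], rec["status"], rec["size"])
# hostname GET /index.html 200 4839

# Hits = lines in the logfile minus corrupt and empty lines.
lines = [entry, "", "this line is corrupt"]
hits = sum(1 for line in lines if parse_clf(line) is not None)
print(hits)  # 1
```

Because the pattern is not anchored at the end, it also accepts entries that carry the additional fields described next.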
There are two other logfile formats, the Combined and the Extended Logfile Format. They add the user agent (browser type) and the referrer URL (the page containing the link that was followed, if the request was generated by following a link) to each logfile entry. Both append these two fields to the Common Logfile Format (CLF) entry in one of two usual ways:
    CLF Mozilla/2.0 (X11; IRIX 6.3; IP22) http://foo/bar.html
    CLF "http://foo/bar.html" "Mozilla/2.0 (X11; IRIX 6.3; IP22)"
Note that in the second form, the user agent and the referrer URL are enclosed in double quotes, which makes the entry ambiguous in certain cases, such as erroneous referrer URLs that themselves contain double quotes. Therefore, the first form should be preferred if possible.
The entries shown above are the only information the server records in the logfile. Much more information may be transferred from the browser to the server, but although this additional information is available to CGI scripts running on your server, it is not logged in the logfile. Therefore, http-analyze can only show you a summary of the information in the logfile: nothing more, nothing less.
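The two Combined/Extended layouts shown earlier can be told apart by whether the text after the CLF fields starts with a double quote. The following Python sketch assumes exactly those two layouts; `parse_extra_fields` is a hypothetical helper. It also reproduces the ambiguity noted earlier: a referrer URL containing a double quote breaks the quoted form, while the unquoted form still splits correctly because a URL cannot contain spaces.

```python
import re

# Prefix covering the seven Common Logfile Format fields (host ... size).
CLF_PREFIX = re.compile(r'^\S+ \S+ \S+ \[[^\]]+\] "[^"]*" \d{3} (?:\d+|-)')
# Second form: "referrer" "user-agent"
QUOTED_TAIL = re.compile(r'^"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"$')

def parse_extra_fields(line):
    """Return (referrer, user_agent) from a Combined/Extended entry, or None."""
    m = CLF_PREFIX.match(line)
    if m is None:
        return None
    tail = line[m.end():].strip()
    if not tail:
        return None                          # plain CLF: no extra fields
    if tail.startswith('"'):
        q = QUOTED_TAIL.match(tail)
        return (q["referrer"], q["agent"]) if q else None
    # First form: the user agent (which may contain spaces) followed by the
    # referrer URL, which cannot contain spaces, so split at the last space.
    agent, _, referrer = tail.rpartition(" ")
    return (referrer, agent) if agent else None

clf = 'host - - [01/Feb/1998:10:10:00 +0100] "GET /index.html HTTP/1.0" 200 4839'
first = parse_extra_fields(clf + ' Mozilla/2.0 (X11; IRIX 6.3; IP22) http://foo/bar.html')
second = parse_extra_fields(clf + ' "http://foo/bar.html" "Mozilla/2.0 (X11; IRIX 6.3; IP22)"')
print(first)   # ('http://foo/bar.html', 'Mozilla/2.0 (X11; IRIX 6.3; IP22)')
print(second)  # same result

# A referrer containing a double quote: the quoted form can no longer be parsed.
print(parse_extra_fields(clf + ' "http://foo/a"b.html" "Mozilla/2.0"'))  # None
```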
Special Cases
Caching in the browser:
As soon as a page has been saved in a browser's disk cache, the browser may send out conditional requests for documents or inline objects. A conditional request asks the web server to send a document or object only if it has been modified since the last time it was requested (provided the page is still in the browser's cache). This reduces network traffic somewhat, since documents must be transferred only if they have changed. When such a conditional request arrives, the server responds with a Code 304 (Not Modified) status to indicate that the document hasn't changed, or with a Code 200 (OK) status and the new content if it has changed in the meantime. Since the browser may be configured (and usually is, by default) to send out such conditional requests only once per session and otherwise use the cached copy unconditionally, you may not even see a Code 304 response if the user visits your site again in the same session. Conditional requests are then sent out only after the user quits the browser and later restarts it.
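On the server side, the decision between Code 304 and Code 200 comes down to a date comparison. The sketch below assumes the conditional request carries an HTTP `If-Modified-Since` header (the usual mechanism, though the text above does not name it) and uses Python's `email.utils` date helpers; `respond` is a hypothetical function that returns only the status code the server would log.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime

def respond(last_modified, if_modified_since=None):
    """Return the status code for a GET, honoring an If-Modified-Since header.

    last_modified must be a timezone-aware datetime with whole seconds,
    because HTTP dates carry one-second resolution.
    """
    if if_modified_since is not None:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            since = None                      # malformed date: ignore the condition
        if since is not None:
            if since.tzinfo is None:          # "-0000" dates parse as naive; assume UTC
                since = since.replace(tzinfo=timezone.utc)
            if last_modified <= since:
                return 304                    # Not Modified: no body is sent
    return 200                                # OK: the full document is sent

modified = datetime(1998, 2, 1, 9, 0, 0, tzinfo=timezone.utc)
header = format_datetime(modified, usegmt=True)   # the date the browser sends back

print(respond(modified))           # 200: first, unconditional request
print(respond(modified, header))   # 304: the cached copy is still current
changed = datetime(1998, 2, 2, 9, 0, 0, tzinfo=timezone.utc)
print(respond(changed, header))    # 200: the document changed in the meantime
```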
Caching in a proxy server:
Organizations with a large number of users, such as companies, universities, or online providers, often use a so-called proxy server, mainly for two reasons:
- Such organizations often have a firewall to protect their internal network against intruders. This means that their network is logically separated from the rest of the Internet, and they have to use a proxy server that can communicate with both the inside and the outside of their local network.
- To reduce network load, the proxy server acts as a local copy machine: as soon as a page is loaded into a browser through the proxy server, the proxy saves a copy of this page in its disk cache, much like a browser does in the scenario above. This way, documents requested very often by users in the same local network need to be transferred to the proxy only once; the proxy then answers future requests for the same page from its local cache instead of connecting to the original web server the document came from.
Both forms of caching make it technically impossible to count visitors accurately or to track their paths through your web site. All you see in your server's logfile is a few initial hits from the proxy or browser and possibly some Code 304 responses resulting from conditional requests sent out by the proxy or browser, depending on its preference settings.
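The following toy model shows why proxy caching hides visitors from your logfile. `CachingProxy` and the host name `proxy.example.com` are hypothetical, and the sketch deliberately ignores expiry and never revalidates its copies, which real proxies do by sending conditional requests.

```python
class CachingProxy:
    """Toy shared proxy cache: contacts the origin server only on a cache miss."""

    def __init__(self, origin_log, proxy_host="proxy.example.com"):
        self.cache = {}
        self.origin_log = origin_log      # stands in for the origin server's logfile
        self.proxy_host = proxy_host      # hypothetical host name of the proxy

    def get(self, user, url):
        # `user` never reaches the origin server: only the proxy's host is logged.
        if url not in self.cache:
            self.origin_log.append((self.proxy_host, url))
            self.cache[url] = f"<contents of {url}>"
        return self.cache[url]

origin_log = []
proxy = CachingProxy(origin_log)
for user in ["alice", "bob", "carol"]:
    for url in ["/index.html", "/logo.gif"]:
        proxy.get(user, url)

# Six page/object loads by three users, but the origin server logs only two hits,
# both from the same host.
print(origin_log)
```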
Definition of Terms
The statistics report contains, among other things, the following information:
- the number of hits, 304's, files, pageviews, sessions, and data sent (in KB)
- the amount of data requested, transferred, and saved by cache (in KB)
- the number of unique URLs, sites, and sessions per month
- the number of all response codes other than 200 (OK)
- the average hits per weekday and for the last week
- the maximum/average hits per day and per hour
- the number of hits, files, 304's, sites, and data sent by day
- the top 5 days, 24 hours, 5 minutes, and 5 seconds of the summary period
- the top 30 most commonly accessed URLs (hits, 304's, data sent)
- the 10 least frequently accessed URLs (hits, 304's, data sent)
- the top 30 client domains accessing your server most often
- the top 30 browser types
- the top 30 referrer hosts
- the overview/detailed list of all files requested
- the overview/detailed list of all sites by domain and reverse domain
- the overview/detailed list of all browser types
- the overview/detailed list of all referrer URLs
The following table summarizes the meaning of the terms in the statistics report that are not self-explanatory:
| Term | Meaning |
|---|---|
| Hits | A hit is any response from the server to a request sent from a browser. This includes every response from the server, not only text files or documents. If, for example, an HTML page has two embedded images, the server generates three hits when this page is requested: one hit for the HTML page itself and two hits for the two inline images. |
| Files | If the user requests a document and the server successfully sends back a file for this request, this is logged as a Code 200 (OK) response. Every such response is counted as a file. Again, "file" here means any kind of file. |
| Code 304 | A Code 304 (Not Modified) response is generated by the server if a document hasn't been updated since the last time it was requested by the user, so there was no need to actually send the file again. This happens if the browser (or a caching proxy server between the browser and your web server) still has an up-to-date copy of the page in its local storage (cache) and therefore can display the page without requesting the actual content. This technique reduces network traffic, but it also makes the statistics reports inaccurate regarding the number of visitors, because the browser or proxy usually sends only one such conditional request per user session if it still holds an up-to-date copy of the file. However, the ratio between files and 304's reflects the efficiency of the overall caching mechanisms, at least for those hits that made their way to the server. |
| Pageviews | Pageviews are all files that either have a text file suffix (.html, .text) or are directory index files. This number lets you estimate the number of "real" documents transmitted by your server. If the suffixes are defined correctly, the analyzer counts text files (documents) as pageviews. Pageviews do not include images, CGI scripts, Java applets, or any other HTML objects; only files ending with one of the pre-defined pageview suffixes, such as .html or .text, are counted. |
| Other responses | There are many more responses than Code 200 (OK) and Code 304 (Not Modified), especially in the HTTP/1.1 protocol specification. For example, the server may generate a Code 302 (Moved Temporarily) response if a page has moved, a Code 401 (Unauthorized) response if the request lacks valid authorization for the document, or a Code 404 (Not Found) response if the requested page does not exist on this server. |
| KBytes transferred | This is the amount of data sent during the whole summary period, as reported by the server. Note that some servers log the size of a document instead of the actual number of bytes transferred. In most cases the two are the same, but if a user interrupts the transmission by pressing the browser's stop button before the page has been received completely, some servers (for example, all Netscape web servers) log not the amount of data actually transferred but the amount that would have been transferred had the user loaded the page completely. |
| KBytes requested | This is the amount of data requested during the whole summary period. http-analyze computes this number by adding KBytes transferred and KBytes saved by cache (see below). |
| KBytes saved by cache | The amount of data saved by various caching mechanisms, such as those in proxy servers or browsers. This value is computed by multiplying the number of Code 304 (Not Modified) responses for each file by the size of that file. Note: because http-analyze can determine the size of a file only if the file has been requested at least once in the same summary period, the values for KBytes saved by cache and KBytes requested are only approximations of the real values. |
| Unique URLs | Unique URLs are the number of all different, valid URLs requested in a given summary period, that is, the number of different files requested at least once in that period. |
| Unique sites | This is the number of unique hosts accessing the server during a given time window. The time window is hardwired to the length of the current month, so a host that accesses your server very often is counted only once during the whole month. Only the number of unique hosts per month is listed in the statistics report. |
| Sessions | Similar to unique sites, this is the number of unique hosts accessing the server during a given time window. For backward compatibility, this time window is one day by default, but it can be changed with the option -u or the Session directive in the configuration file. For example, if the time window is two hours, all accesses from a certain host within 2 hours of the first access from this host are lumped together into one session. Any later access more than 2 hours after that first access is counted as a new session. This gives you an estimate of how many sessions were started from different sites to access your server. |
(1) Shown only on the total summary page.
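To tie several of these definitions together, here is a Python sketch that computes some report figures from already-parsed log records. It is not http-analyze's implementation. Assumptions: each record is a dict with the hypothetical keys `host`, `time`, `url`, `status`, and `size`; pageviews are Code 200 responses whose URL ends in `.html` or `.text` or in `/` (a directory index); "valid" URLs are those answered with Code 200 or 304; the records cover exactly one month, so the unique-sites count equals the monthly figure; and an access exactly one window length after the session start begins a new session.

```python
from datetime import datetime, timedelta

# Assumption: the pre-defined pageview suffixes; a trailing "/" marks a directory index.
PAGEVIEW_SUFFIXES = (".html", ".text")

def summarize(records, session_window=timedelta(days=1)):
    """Compute several report figures from parsed log records (dicts)."""
    ok = [r for r in records if r["status"] == 200]
    not_modified = [r for r in records if r["status"] == 304]

    # File sizes are known only for files actually sent (Code 200) in this period.
    size_of = {r["url"]: r["size"] for r in ok}

    kb_transferred = sum(r["size"] for r in records) / 1024
    # Each 304 response saved one copy of the file; unknown sizes count as 0.
    kb_saved = sum(size_of.get(r["url"], 0) for r in not_modified) / 1024

    # A session starts at a host's first access and covers `session_window`;
    # the first access after the window has elapsed starts a new session.
    session_start = {}
    sessions = 0
    for r in sorted(records, key=lambda rec: rec["time"]):
        start = session_start.get(r["host"])
        if start is None or r["time"] - start >= session_window:
            session_start[r["host"]] = r["time"]
            sessions += 1

    return {
        "hits": len(records),
        "files": len(ok),
        "code304": len(not_modified),
        "pageviews": sum(1 for r in ok
                         if r["url"].endswith(PAGEVIEW_SUFFIXES) or r["url"].endswith("/")),
        "kb_transferred": kb_transferred,
        "kb_saved_by_cache": kb_saved,
        "kb_requested": kb_transferred + kb_saved,
        "unique_urls": len({r["url"] for r in ok + not_modified}),
        "unique_sites": len({r["host"] for r in records}),
        "sessions": sessions,
    }

t0 = datetime(1998, 2, 1, 10, 0, 0)
records = [
    {"host": "a.example.com", "time": t0, "url": "/index.html", "status": 200, "size": 4096},
    {"host": "a.example.com", "time": t0 + timedelta(seconds=1), "url": "/logo.gif", "status": 200, "size": 2048},
    {"host": "b.example.com", "time": t0 + timedelta(hours=1), "url": "/index.html", "status": 304, "size": 0},
    {"host": "a.example.com", "time": t0 + timedelta(hours=3), "url": "/missing.html", "status": 404, "size": 0},
]
stats = summarize(records, session_window=timedelta(hours=2))
print(stats)
```

With the two-hour window, host a.example.com accounts for two of the three sessions, because its last access comes three hours after its first one.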