aWebVisit - Help

(from 01/May/2003:00:00:51 to 31/May/2003:23:55:57)

1. Pages, links and visits in aWebVisit

Webserver logfiles contain all hits to your webserver over a certain period of time. aWebVisit assumes that all hits from the same host (within a certain timeout period) are part of the same visit (*).

For each visit, aWebVisit then determines which page was hit first (entry page) and which page was hit last (exit page). All other pages hit during a visit are called transit pages. Since visitors may go back to the same page several times during a visit, a single webpage on your website may be counted once as an entry page, several times as a transit page and/or once as an exit page.
A special case is where only one page is hit during a visit. That page is called a hit&run page.
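
The visit and page rules above can be sketched in a few lines of Python. This is only an illustration, not aWebVisit's own code: the names split_visits and page_roles are hypothetical, hits are assumed to be (host, timestamp in seconds, page) tuples, and the timeout is assumed to be measured between two consecutive hits from the same host.

```python
from collections import defaultdict

TIMEOUT = 450  # seconds; the timeout value quoted in the terminology section


def split_visits(hits, timeout=TIMEOUT):
    """Group (host, timestamp, page) hits into visits, one list of pages each.

    Assumption: a new visit starts when the gap to the previous hit
    from the same host is longer than the timeout.
    """
    by_host = defaultdict(list)
    for host, ts, page in sorted(hits, key=lambda h: h[1]):
        by_host[host].append((ts, page))

    visits = []
    for host_hits in by_host.values():
        last_ts, first_page = host_hits[0]
        current = [first_page]
        for ts, page in host_hits[1:]:
            if ts - last_ts > timeout:
                visits.append(current)  # close the previous visit
                current = []
            current.append(page)
            last_ts = ts
        visits.append(current)
    return visits


def page_roles(visit):
    """Label each page of one visit as entry, transit, exit or hit&run."""
    if len(visit) == 1:
        return [(visit[0], "hit&run")]
    roles = [(visit[0], "entry")]
    roles += [(page, "transit") for page in visit[1:-1]]
    roles.append((visit[-1], "exit"))
    return roles
```

Calling page_roles on each list returned by split_visits gives the per-page counts: the same URL can show up as an entry page in one position and as a transit or exit page in another.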

During a visit, aWebVisit also analyses the path followed by the visitor from one page to the next. Links followed from entry pages are called incoming links, while links going to an exit page are called outgoing links. The other links followed during a visit are called internal links.
Again, a special case is where a visitor goes directly from an entry page to an exit page. That link is called an In&Out link.
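
The link types can be illustrated the same way. The sketch below follows the definitions in the terminology section (a link from the entry page to a transit page is incoming, a link from a transit page to the exit page is outgoing); the function name link_types is hypothetical and a visit is assumed to be a plain list of pages in the order they were hit.

```python
def link_types(visit):
    """Return (from_page, to_page, kind) for every step of one visit."""
    links = []
    last = len(visit) - 1
    for i in range(last):
        from_entry = i == 0        # link starts at the entry page
        to_exit = i + 1 == last    # link ends at the exit page
        if from_entry and to_exit:
            kind = "in&out"
        elif from_entry:
            kind = "incoming"
        elif to_exit:
            kind = "outgoing"
        else:
            kind = "internal"
        links.append((visit[i], visit[i + 1], kind))
    return links
```

A hit&run visit has only one page and therefore produces no links at all.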

The different types of visits distinguished by aWebVisit are shown in the table below:

Image hits are discarded, as specified in the configuration.

(*) Note that this assumption may not hold when clients reach your website from behind proxy servers, because such clients cannot be told apart in a web server logfile. In that case you will need more data to reveal the most common paths through your website, or you can extend aWebVisit to use other characteristics such as cookies or session IDs.

Have a look at the homepage of aWebVisit for additional information...

2. Using fly to create graphics with aWebVisit

A picture says more than a thousand words (or numbers). Download the fly program and try the new graphical webmaps that aWebVisit can generate with it!

It's really easy to set up (and it's free):

  1. install the right version of fly (Windows 95/NT, various UNIXes, ...), and
  2. tell aWebVisit where to find it.

You'll see the difference...

Even better now is the companion program aWebVisit-Map. It's a CGI that allows you to walk through your website and follow the links from one page to the next one...

3. Terminology used in aWebVisit

Term            Description
--------------  -----------
'image'         Any URL that IS excluded by matching /\.(gif|jpg|css)$|^\/(cgi-bin|priv_stats|yourUrl|mail|mime)/i. This can be changed in the configuration.
'page'          Any URL that is NOT excluded by the script. URLs with anchor points (http://...#..) are treated as separate pages, but URLs with parameters (http://...?..) are not.
Hit Count       Number of times this page is visited ('hit')
Entry Page      Number of times this page is used as an entry point to this website
Exit Page       Number of times this page is used as an exit point from this website
Transit Page    Number of times this page is part of an on-going visit rather than the entry or exit point of it
Hit&Run Page    Number of times this page is the only one visited in a single session. This does not include any images that are excluded by the script.
Time (sec)      Average time spent on this page (only for entry and transit pages). This includes download time, reading time, time spent viewing 'images' from this page, coffee breaks shorter than the timeout of 450 seconds, etc. That's why aWebVisit needs a sufficiently large sample to work on...
Rank (%)        Rank in the Top (or Bottom) 20 pages or 100 links. The number of entries can be changed in the configuration.
Page            The URL of the page
Link Count      Number of times this link is followed on your website
Incoming Link   Number of times this link is followed from an entry page to a transit page
Internal Link   Number of times this link is followed between two transit pages, somewhere inside your website
Outgoing Link   Number of times this link is followed from a transit page to an exit page
In&Out Link     Number of times this link is followed from an entry page directly to an exit page
From Page       The starting page of the link
To Page         The destination page of the link
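
To make the 'image' and 'page' entries more concrete, here is a small Python sketch. The exclusion pattern is the one quoted in the table, translated to Python syntax. The helper names is_image and page_key are hypothetical, and the way parameters are dropped while anchor points are kept is an assumption about the behaviour described above, not the script's actual code.

```python
import re
from urllib.parse import urlsplit

# Same pattern as in the 'image' entry, case-insensitive like the /i flag.
EXCLUDE = re.compile(
    r"\.(gif|jpg|css)$|^/(cgi-bin|priv_stats|yourUrl|mail|mime)",
    re.IGNORECASE,
)


def is_image(url):
    """True if the URL counts as an 'image' (excluded from page statistics)."""
    return EXCLUDE.search(url) is not None


def page_key(url):
    """Reduce a URL to the key under which the page is counted.

    Assumption: parameters (?...) are dropped, anchor points (#...) are kept.
    """
    parts = urlsplit(url)
    key = parts.path
    if parts.fragment:
        key += "#" + parts.fragment
    return key
```

With these helpers, /index.html?lang=en and /index.html count as the same page, while /faq.html#install is counted separately from /faq.html.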

4. About aWebVisit


###########################################################################
#
# NAME
#
#	aWebVisit Version 0.1.7b, 10/01/2002
#
# AUTHOR
#
#	Copyright (C) 1999-2002, Michel Dalle (awebvisit@mikespub.net)
#
# DISTRIBUTION AND LICENSE
#
#	http://mikespub.net/tools/aWebVisit/
#
#	This program is free software; you can redistribute it and/or
#	modify it under the terms of the GNU General Public License
#	as published by the Free Software Foundation; either version 2
#	of the License, or (at your option) any later version.
#
#	This program is distributed in the hope that it will be useful,
#	but WITHOUT ANY WARRANTY; without even the implied warranty of
#	MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
#	GNU General Public License for more details.
#
#	You should have received a copy of the GNU General Public License
#	along with this program; if not, write to the Free Software
#	Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA  02111-1307,
#	USA.
#
# PURPOSE
#
#	Reads the Web server logfile(s) and extracts visitor information
#	like :
#		- the most commonly used ENTRY, TRANSIT, EXIT and HIT & RUN
#		  pages of your website
#		- the most frequently followed INCOMING, INTERNAL, OUTGOING
#		  and IN & OUT links of your website
#		- the average DURATION of each visit
#		- the average number of PAGES viewed PER VISIT
#		- the average TIME SPENT on each page
#		- the path of the LONGEST VISIT (in time and/or hits)
#		- ...
#
#	The visitor information is stored in HTML pages in a directory
#	of your choice (see configuration below).
#
#	A companion program called aWebVisit-Map allows you to click on a
#	page and follow all the links to and from that page (CGI).
#
# REMARKS AND LIMITATIONS
#
#	This script does not intend to provide "standard" web statistics like
#	the number of hits per day, the status codes, the distribution per
#	domain, etc. There are more than enough programs available for that !
#
#	The script accepts logfiles in the Common Log Format (CLF) or NCSA
#	Combined format. Other formats can be used if you modify the
#	appropriate fields in the script below.
#
#	Images (URLs of type .gif or .jpg) are normally ignored, except as
#	being part of an on-going visit. URLs with anchor points
#	(http://...#..) are treated as separate pages, but URLs with
#	parameters (http://...?..) are treated as a single page.
#
#	This script is not useful for sites having only a few "hits" per
#	day, since there must be a sufficient number of visits to extract
#	significant statistics. You might as well directly read the logfile.
#
#	It is also not intended for sites having millions of hits per day
#	or millions of pages on their website(s), since reading the logfiles
#	and generating the statistics can take some time (e.g. for a logfile
#	of 65 MB, aWebVisit takes about 11 minutes on a PC). There may be some
#	more professional packages supporting this type of websites.
#
#	Note that clients behind proxy servers cannot be differentiated based
#	on a web server logfile (unless you use authentication). This is not
#	a bad limitation if the number of log entries is sufficiently large.
#
#	You can change all this in the script below (and send me a mail).
#
# EXAMPLES OF USE
#
#	awebvisit.pl logfile
#	perl awebvisit.pl logfile
#	awebvisit.pl logfile.*Jan*
#	grep that_host logfile | awebvisit.pl
#	zcat logfile.gz | awebvisit.pl
#
# HISTORY
#
#	0.1.7b 10/01/2002 +Show barcharts (with fly)
#			  +Bug fix for 'account for n %' in Least XYZ tables
#
#	0.1.7a 10/01/2002 +Analyse referrers (combined/extended logfiles only)
#			  +Load historic data file (for cumulative statistics)
#			  +Keep distribution of visit length (in duration and
#			   steps) for median value
#			  +Keep path tree for aWebVisit-Map
#
#	0.1.6c 08/01/2002 Now available under GNU GPL license
#
#	0.1.6b 18/02/99	Minor bug fix for exclude_visit and include_visit
#
#	0.1.6 17/02/99	+Companion program aWebVisit-Map to travel through the
#			 pages and see the links to and from each page (CGI) !
#			+Modified contents of statistics file for aWebVisit-Map
#			+Configurable removal and/or replacement of some parts
#			 of URLs (parameters, anchor points, long paths, ...)
#			+Configurable exclusion/inclusion of visit entry-points
#			 (e.g. robots should start with '/robots.txt')
#			+Configurable exclusion/inclusion of hosts (networks,
#			 domains, ...)
#			+Configurable exclusion/inclusion of URLs
#
#	0.1.5 07/02/99	+Create graphical maps of entries, exits and transits
#			+Support visits that cross midnight (assuming days
#			 follow each other without gaps in the logfiles)
#			+Review table outputs
#			+Drastically reduce memory requirements for big files
#
#	0.1.4 31/01/99	+Separate links into incoming, internal, outgoing and
#			 in&out links
#			+Add status information
#			+Add 1st level flow map
#			+Rewrite statistics code
#			+First try at entry and exit trees
#
#	0.1.3 24/01/99	+Separate hit & runs from entry and exit points
#			+Add transit points
#			+Add rank percentages
#
#	0.1.2 23/01/99	+Add least used entry and exit points
#			+Streamline table outputs
#			+Add some explanations
#
#	0.1.1 20/01/99	Generate HTML output
#
#	0.1.0 18/01/99	First public version
#
# FUTURE
#
#	This is probably as far as it goes, unless you send your suggestions
#	and wishes to (awebvisit@mikespub.net)...
#
#	E.g. take into account status code changes during a visit.
#
###########################################################################
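
The remarks above mention the Common Log Format (CLF) and the NCSA Combined format. As a rough illustration of the fields a visit analysis needs from such a line (host, time, URL and status), here is a minimal Python sketch. The function name parse_clf and the sample log line are hypothetical; this is not how awebvisit.pl itself reads the logfile.

```python
import re
from datetime import datetime

# host ident authuser [date] "request" status bytes
# Combined-format lines carry extra fields after these, which are ignored here.
CLF = re.compile(r'^(\S+) (\S+) (\S+) \[([^\]]+)\] "([^"]*)" (\d{3}) (\S+)')


def parse_clf(line):
    """Return (host, timestamp in seconds, url, status), or None if no match."""
    m = CLF.match(line)
    if m is None:
        return None
    host, _ident, _user, stamp, request, status, _size = m.groups()
    parts = request.split()
    # The request normally looks like 'GET /index.html HTTP/1.0'.
    url = parts[1] if len(parts) >= 2 else request
    when = datetime.strptime(stamp, "%d/%b/%Y:%H:%M:%S %z")
    return host, when.timestamp(), url, int(status)


# Hypothetical sample line in Common Log Format.
sample = 'host.example.com - - [01/May/2003:00:00:51 +0200] "GET /index.html HTTP/1.0" 200 1043'
print(parse_clf(sample))
```

The month names are parsed with %b, which assumes an English (C) locale.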


Created with aWebVisit 0.1.7 on Mon Jun 2 14:44:48 2003