
White Paper

Seizing the Opportunity with Digital Analytics and Marketing for Financial Services

Optimizing the Online Channel to Drive Revenue and Build Profitable Customer Relationships

Contents

Executive Summary
The Changing Online Landscape in Financial Services
Challenges and Opportunities for the Relationship-Oriented Business
Most Financial Firms Are ‘Novice’ Online Marketers
The Limitations of Siloed Tools
Harnessing the Power of Online Behavior Data
Maturity Model for Digital Analytics and Marketing for Financial Services
Optimize the Visitor Website Experience
Drive Targeted Site Traffic from Across Digital Channels
Bring Visitors Back to the Website
Capitalize on the Mobile Channel and Social Media
Maximize Customer Lifetime Value Across Channels
Coremetrics for Financial Services

Copyright ©2011 Coremetrics, an IBM Company. All Rights Reserved.



Executive Summary

In financial services, the battleground for customers is shifting to digital marketing. To win, financial institutions have been working to attract and retain customers through a superior website experience. Yet the digital space today offers many more opportunities beyond website-centric marketing. Leading digital marketers engage customers with personalized email, targeted display and search ads, mobile marketing, and social media.

These opportunities and challenges confront firms across the range of financial subverticals—retail banks, credit unions, lenders, brokerages, insurance firms, and other institutions. They are only growing in importance as consumers increasingly use web-based financial services, and as younger, web-savvy Gen Y consumers broaden their usage of financial products and services.

Drawing on real-world use cases of firms that have worked with Coremetrics, an IBM Company, this paper shows how financial services firms can capitalize on the full range of digital opportunities for turning site visitors into high-value customers and loyal advocates. By fusing analytics with digital marketing automation, leading marketers are able to orchestrate a personalized and compelling experience throughout each customer’s digital lifecycle.

The Changing Online Landscape in Financial Services

Online customer interaction with financial services continues to grow strongly. A study by the American Bankers Association (ABA) showed that the percentage of retail banking customers who prefer online banking to branches, ATMs, phone, and other channels rose to an all-time high of 36 percent in 2010, up from 25 percent in 2009.1 Just 25 percent of consumers prefer branches, 15 percent ATMs, 8 percent snail mail, and 6 percent telephone.

This represents a reversal from just two years earlier—when Internet banking ranked third in popularity in the ABA’s annual study—and illustrates that online usage of financial products is still very much on the rise. And though banks, brokerages, insurers, and others have developed robust transactional systems, many remain behind the curve with web analytics and digital marketing.

“As consumers increasingly research and apply for financial products online, financial institutions must improve their ability to identify and convert prospects over the web,” said a report by the Aite Group,2 a financial services consultancy.

Challenges and Opportunities for the Relationship-Oriented Business

Recognition is growing in financial services that online engagement and marketing is essential to countering several industry trends and achieving the key strategic objectives of increasing revenue, building customer loyalty, and reducing costs. Consider:

• Loyalty is down, churn is up. Retail banking customer loyalty and brand perceptions have decreased for four consecutive years, a study by J.D. Power & Associates found,3 while customers are switching banks at an increasing rate. In 2010, 8.7 percent of customers changed banks, up substantially from 7.7 percent in 2009, J.D. Power reported.4

1. American Bankers Association, “ABA Survey Shows More Consumers Prefer Online Banking,” press release, October 2010.
2. Aite Group, “Online Marketing Maturity Model for Financial Institutions,” press release, February 28, 2011.
3. J.D. Power & Associates, “2010 US Retail Banking Satisfaction Study,” June 2010.
4. J.D. Power & Associates, “2011 US Retail Bank New Account Study,” March 2011.


• Cost reductions are essential. Further cultivation of online customer interaction and transactions is essential for firms to reduce the high operational costs of physical offices and call centers. Maturation and refinement of online transactional systems, and digital marketing of those services, is becoming a higher priority for all subverticals in financial services.

• Mobile devices surge in adoption. Firms are challenged to deliver a satisfying experience for mobile users and optimize websites with mobile-friendly transactional functionality as the use of smartphones grows exponentially. One example: The percentage of bills paid by mobile devices is expected to surge 377 percent from 2010 to 2013, the Aite Group predicts.5

• Customers demand relevancy and a multi-channel experience. Even though online financial transactions are increasing, consumers still want access to traditional in-person and call center channels. Firms benefit from replicating one-to-one interactions across all channels to increase customer satisfaction and take advantage of marketing opportunities at every touch point.

Studies have confirmed that online financial services customers are more affluent, more profitable, more loyal, more satisfied, and more inclined to invest in multiple products and services. Building strong relationships with online consumers is a key to building a profitable financial services business.

Most Financial Firms Are ‘Novice’ Online Marketers

“Although many financial institutions have been using the online channel for marketing purposes for 10 or more years now, many—by their own estimate—have a long way to go in developing strong online marketing capabilities,” says the previously referenced report by the Aite Group.6

That assessment is corroborated in a study of 154 banks and credit unions by The Financial Brand, an online portal focused on financial services marketing. The greatest percentage of respondents—45 percent—characterized their firms at the “novice” level of online marketing, and only 8 percent claimed “advanced.”7 Moreover, 65 percent of those engaged in online marketing replied “not really” or “sometimes” when asked if they tracked its effectiveness.

Which Best Characterizes Your Firm’s Online Marketing Efforts?
• Novice: 45%
• Intermediate: 43%
• Advanced: 8%
• Non-existent: 4%
Figure 1. Source: The Financial Brand, 2010 Online Marketing Study

The Limitations of Siloed Tools

Marketers’ experience with digital channels aside, another great obstacle that financial firms face in improving digital marketing effectiveness is the use of standalone, disjointed tools dedicated to particular tasks and teams, ranging from analytics to online media buying to customer marketing. The lack of integration among these technologies can make it virtually impossible for firms to realize the single view of customers and prospects they need to deliver a consistent, personalized experience across digital channels.

“Each marketing technology—web analytics, email, search—generates its own data, and lots of it. Data systems for each of these technologies may produce conflicting results, leaving marketers to explain data mismatches,” states the independent research firm Forrester Research, Inc.8 Worse than data mismatches, unintegrated tools can generate misleading results that make it impossible to make informed decisions on marketing spend and channel focus.

5. Aite Group, “How Americans Pay Their Bills,” October 2010.
6. Aite Group, “Online Marketing Maturity Model for Financial Institutions,” press release, February 28, 2011.
7. The Financial Brand, “2010 Online Marketing Study,” August 2010.
8. Forrester Research, “Organizing for Site Optimization,” August 10, 2010.


Though counterproductive, an array of ad hoc tools is commonplace among financial institutions. Most were deployed in haste to meet tactical needs in discrete areas, instead of planning for an integrated digital experience for customers that would support the strategic marketing goals of a relationship-oriented business. Recognizing the risks and limitations of these disparate technologies is a key first step to eliminating the deficiencies they introduce into a firm’s web analytics and digital marketing initiatives.

Progressive marketing, business, and IT leaders at financial services firms are transitioning to an integrated platform for web analytics and digital marketing, built with industry-specific features and metrics for financial services. A unified platform can offer a decided advantage in achieving key business objectives: 1) optimize the visitor website experience, 2) drive targeted site traffic from across digital channels, 3) turn site visitors into repeat visitors and loyal advocates, 4) capitalize on the mobile channel and social media, and 5) maximize customer lifetime value across channels.

In sum, there is great opportunity for financial services marketers who can rise above the challenges and keep pace with today’s digital leaders by harnessing the power of online behavioral data fused with marketing automation technologies across digital channels.

Harnessing the Power of Online Behavior Data

Financial services companies collect huge volumes of data on the online behavior of their customers, as well as back-office data on accounts, transactions, customer profiles, and more. Putting this data to use to increase revenue and deepen customer relationships requires the synergistic use of web analytics and digital marketing technologies. It demands the development of customer behavioral insights that reflect historic and real-time activity, such as an online application for an insurance policy, brokerage account, or home loan.

Maturity Model for Digital Analytics and Marketing for Financial Services

For evaluating the range of opportunities that financial services firms have with the synergistic use of web analytics and digital marketing technologies, it helps to think along the lines of a maturity model, i.e., a growth path. The two axes of the growth model in Figure 2 are 1) the granularity of web analytics and 2) the degree of integration across digital marketing channels. The more granular and integrated, the more opportunities can be seized for the digital marketing of financial services.

Two Uses of Web Analytics: Aggregate and Individual-Level Insights

Web analytics is the practice of monitoring and measuring customer behavior online, both website usage and marketing campaign response. It covers a broad scope that can include analyzing and understanding:

• Drop-off points in a customer’s online application for a financial product
• Response to email or online advertising by clickthrough, conversion, application value, and other metrics
• Campaign effectiveness by segments of customers that can vary by asset level, gender and age, geographic information, and more
• Consumer usage of the mobile channel and social media, and optimizing for those mediums

Conventionally, web marketers use web analytics at an aggregate level, reporting on the performance of their websites and online advertising so they can adjust their efforts to improve the results. This is an extremely worthwhile application that can deliver excellent return on investment.

However, if marketers do not also leverage web analytics as a rich source of insights on the digital journeys of individual prospects and customers, they are squandering a huge opportunity. Web analytics can play a far more direct role in engaging customers, improving customer experiences, and increasing sales by enabling companies to deeply personalize their communications and interactions.

Two Levels of Digital Marketing: Website-Centric vs. Integrated Across Digital Channels

Online marketing has traditionally been website-centric. The website has been the portal for all digital interactions, and online advertising and email marketing have primarily been used to drive traffic to the website.


Today, financial services firms are increasingly investing in mobile applications as another portal for customer interactions. Likewise, digital marketers can target emails and display ads that are personalized based on the preceding behavior of each individual prospect or customer on the website or in a mobile application. As such, ads and emails become an off-portal continuation of on-portal experiences. Finally, customers’ conversations on external social networks and employees’ interactions on internal social networks further decentralize where the action is.

Making the step from conventional online marketing to today’s digital marketing means orchestrating a consistent and compelling experience across digital channels, i.e., on- and off-portal. Creating this experience requires fusing analytics with digital marketing automation.

Maturity Model with 5 Digital Marketing Milestones

Five major milestones for financial services marketers are plotted across the maturity model in Figure 2, to indicate the granularity of web analytics and the integration of digital channels required to capitalize on each milestone opportunity:

1) Optimize the visitor website experience through measurement, testing, and constant optimization of the website and online advertising
2) Drive targeted site traffic from across digital channels by targeting relevant ads and emails based on insight into the previous behavior of anonymous or registered visitors
3) Turn site visitors into repeat visitors by bringing them back to the website, retargeting through display ads or email using insight into previously abandoned products, etc.
4) Weave the mobile channel and social media into a cross-channel experience
5) Maximize customer lifetime value across channels and customers’ lifecycles by extending digital marketing to offline channels

Read on to learn real-life use cases and opportunities for each of these milestones.

Figure 2. Maturity model for digital marketing in financial services. The vertical axis, integration across digital channels, runs from website-centric to on- and off-portal integrated; the horizontal axis, granularity with web analytics, runs from aggregate-level insight (dashboards, etc.) to insight into individuals’ digital journeys. The five milestones are plotted in ascending order: optimize the visitor website experience; drive targeted site traffic from across digital channels; bring visitors back to the website; capitalize on the mobile channel and social media; maximize customer lifetime value across channels.


Optimize the Visitor Website Experience

Financial services websites—arguably a firm’s most powerful marketing vehicle—are uniquely complex. Compared to the relatively straightforward sites run by retailers, the websites run by banks, insurers, brokerages, and credit unions offer a broader, more complex array of products and services. A large, diversified financial services institution will offer dozens of products and services across multiple lines of business that can include retail banking, investing, lending, insurance, and more.

This is a double-edged sword: While a rich website offers firms a wealth of opportunities to cross-sell and up-sell across lines of business, it also introduces the daunting task of optimizing multiple elements to deliver a compelling cross-channel customer experience and drive business.

It also means a perpetual balancing act, as firms must offer the advanced functionality that experienced users want while making the site practical and inviting for new users.

Forrester Research noted the converse effect of greater functionality and diminished usability: “As the largest banks continue to bring ever deeper functionality to their secure sites, it is clear that usability has suffered along the way.”9

These multiple variables make continuous website measurement, testing, and optimization—the traditional focus of web analytics technologies—an imperative. Firms are challenged to:

• Continually test and tune site usability and secure transactional functionality, and ensure clear calls to action and conversion paths across product lines
• Collect and segment site user data to understand which website experiences result in increased customer satisfaction and value over time

The equation is further complicated by the increasing number of practical tools and educational resources that financial firms offer in efforts to deepen customer engagement. These resources can include personal financial management microsites, investment guidance and tools, calculators for lending and insurance, and others. Though their intent is sound, such initiatives introduce the risk of a cul-de-sac and consumer distraction from the firm’s objective. Simply posting a 401K primer guide or a student loan planning tool and neglecting to analyze its usage will limit its value.

Systematic measurement is essential across a firm’s website. Ideally done with an integrated analytics and content management system platform such as IBM® WebSphere®, and supported by A/B testing to determine the most effective techniques, web analytics can deliver impressive results. A study by Forrester Research found that testing site features, design and creative elements, navigation paths, and content placement improved performance by multiple metrics.10 (See Figure 3.)

What Top 3 Benefits Have You Realized from Site Testing?
• Increased conversion: 82.3%
• Increased landing page conversion: 54.4%
• Increased form completion: 35.4%
• Increased average order value: 32.9%
• Increased registration: 20.3%
• Improved customer segmentation: 17.7%
Base: 79 respondents who use an online testing application.
Figure 3. Source: Forrester Research, Organizing for Site Optimization
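The A/B testing mentioned above ultimately comes down to deciding whether a variant’s higher conversion rate is real or noise. The sketch below is a generic, hypothetical illustration (the visit and conversion counts are invented, and no particular testing product is implied) of a two-proportion z-test using only the Python standard library.

```python
from math import sqrt, erf

def two_proportion_z(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for the difference between two conversion rates
    (control A vs. variant B). Returns the z statistic and p-value."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p = (conv_a + conv_b) / (n_a + n_b)           # pooled conversion rate
    se = sqrt(p * (1 - p) * (1 / n_a + 1 / n_b))  # pooled standard error
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal CDF (via erf).
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value

# Invented results: control vs. redesigned application landing page.
z, p = two_proportion_z(conv_a=120, n_a=4000, conv_b=165, n_b=4000)
print(f"z = {z:.2f}, p = {p:.4f}")  # p < 0.05 would favor the variant
```

A significant result justifies rolling out the variant; an insignificant one argues for collecting more traffic before deciding.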

9. Forrester Research, “2010 US Bank Secure Web Site Rankings,” November 11, 2010.
10. Forrester Research, “Organizing for Site Optimization,” August 10, 2010.


An important principle in website optimization is to take nothing for granted. Experience has shown that customer behavior will inevitably surprise even the most seasoned online marketer. Rigorous practices and a comprehensive analytics platform are the only sure way to ensure you truly understand customer behavior. This was noted by Online Financial Innovations, publisher of OnlineBankingReport.com: “Because online initiatives don’t have the proven track record of other marketing techniques, it’s important to measure every conceivable metric to demonstrate the value of the online channel.”11

Web Analytics Business Use Case Examples

IBM Coremetrics has helped many financial services firms optimize their websites.
Below are three examples.
1) Account Application Completion
Challenge
• Low application completion rates despite site redesign
Solution
• Used Coremetrics Scenario Analysis to identify points of abandonment
• Optimized form field placement and eliminated unnecessary content
• Moved complex legal content to end of process
Results
• 29% increase in application completions
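The abandonment-point analysis in this use case can be illustrated generically. The sketch below is a minimal, hypothetical example—the funnel step names and session records are invented, and it does not reflect any Coremetrics internals—of computing step-to-step drop-off rates from session data in plain Python.

```python
from collections import Counter

# Ordered steps of a hypothetical online account application funnel.
FUNNEL = ["start", "personal_info", "legal_terms", "review", "submit"]

def drop_off_report(sessions):
    """For each funnel transition, report the share of sessions lost
    between one step and the next."""
    reached = Counter()
    for steps in sessions:
        # A session "reaches" a step only if it reached all prior steps.
        for step in FUNNEL:
            if step in steps:
                reached[step] += 1
            else:
                break
    report = []
    for i, step in enumerate(FUNNEL[:-1]):
        nxt = FUNNEL[i + 1]
        if reached[step]:
            lost = 1 - reached[nxt] / reached[step]
            report.append((step, nxt, round(lost, 2)))
    return report

# Invented sample sessions: the set of steps each visitor completed.
sessions = [
    {"start", "personal_info", "legal_terms", "review", "submit"},
    {"start", "personal_info"},
    {"start", "personal_info", "legal_terms"},
    {"start"},
]
print(drop_off_report(sessions))
```

The transition with the highest loss rate (here, the legal-terms step) is the natural first candidate for redesign—which mirrors this use case’s fix of moving complex legal content to the end of the process.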

2) Increase Self-Service and Application Conversion


Challenge
• Reduce call center and live chat costs by promoting better self-service
Solution
• Used IBM® Coremetrics® Web Analytics to compare success with self-service vs. call/chat assistance
• Pinpointed problem area in key application steps
Results
• Increased online-only self-service conversions
• Reduced costs for call center and live chat
• Discovered online-only conversions are worth the same as, if not more than, call- or chat-assisted conversions

3) Screen Resolution Optimization


Challenge
• Increase form and application initiation; increase cross-sell on bill pay
Solution
• Used IBM® Coremetrics® LIVEview and screen resolution reporting/segmentation
• Determined the real estate used for cross-sell on account pages was outside standard screen resolutions
• Relocated cross-sell promotions
Results
• 20% increase in clickthrough and conversions
• Saved thousands of dollars in call center costs
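The screen-resolution insight above boils down to segmenting promotion clickthrough by whether the promo was visible without scrolling. The sketch below is a hypothetical illustration—the viewport heights, pixel offset, and impression records are all invented—of that segmentation in plain Python.

```python
# Invented impression records: (viewport_height_px, promo_clicked)
impressions = [
    (768, False), (768, False), (768, True),
    (1024, True), (1024, True), (1024, False),
    (600, False), (600, False),
]

PROMO_Y_OFFSET = 700  # assumed pixel position of the cross-sell promo

def clickthrough_by_visibility(rows, promo_y=PROMO_Y_OFFSET):
    """Compare promo clickthrough for visitors whose viewport shows the
    promo without scrolling vs. those for whom it sits below the fold."""
    groups = {"visible": [0, 0], "below_fold": [0, 0]}  # [clicks, views]
    for height, clicked in rows:
        key = "visible" if height >= promo_y else "below_fold"
        groups[key][0] += clicked
        groups[key][1] += 1
    return {k: round(c / v, 2) for k, (c, v) in groups.items() if v}

print(clickthrough_by_visibility(impressions))
```

A large gap between the two segments is the signal that relocating the promotion above the fold, as in this use case, is likely to pay off.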

11. Online Financial Innovations (OnlineBankingReport.com), “2009 Planning Guide.”


Drive Targeted Site Traffic from Across Digital Channels

As financial services move online, marketing of those services needs to move online as well. But there’s a huge gulf between simply running generic banner ads and executing coordinated, cross-channel campaigns based on analysis of historical customer behavior, including both website activity and response to online campaigns. In fact, research indicates that many financial services firms have yet to take advantage of online marketing.

Though email marketing has been around for a decade, it is not used by a surprisingly high 31 percent of the banks and credit unions surveyed by The Financial Brand.12 Usage of display ads, paid search, and Facebook pages was also lower than the norm in consumer-oriented industries, the survey found. (See Figure 4.)

Which Online Marketing Tactics Does Your Firm Use?
• Email marketing: 69%
• Off-site display advertising: 54%
• On-site promotions: 53%
• Facebook page: 46%
• Paid search advertising: 36%
Figure 4. Source: The Financial Brand, 2010 Online Marketing Study

The stakes are high, especially as young Gen Y consumers, who grew up with the web, look to acquire insurance, car loans, credit and checking accounts, and more. Unless you can engage a young consumer in ways relevant to his or her digital lifestyle, you’re apt to be disregarded as a brick-and-mortar dinosaur.

Excellence in digital marketing depends fundamentally on an advanced and integrated analytics platform that creates and evolves individual customer profiles comprising the consumer’s digital journey of interactions with your website and digital marketing efforts. Digital marketing has been shown to generate double-digit and even triple-digit ROI by enabling firms to:

• Cross-sell and up-sell products and services
• Increase usage of cost-effective self-service transactional tools
• Reduce costs for branches, call centers, and online chat
• Enhance brand image and increase consumer mindshare

Digital marketing automation tools should interoperate closely with an analytics solution to drive continuous measurement, targeting, and optimization: Consumer response is collected by the analytics platform to enable further refinement. Digital marketing tools include:

• Personalized email marketing: Email is a proven means of communicating with customers and deepening engagement. The best solutions will integrate with an analytics platform to automate emails to select customer segments—those abandoning application forms, in the market for a loan, receiving paper statements in the mail, and more.

• Targeted display ads: Display ads targeted to consumers’ known interests and website activity—especially those of anonymous visitors—generate far higher clickthrough and conversion rates than generic banner ads, and at less cost than paid search ads. Display ads may be launched based on a consumer’s browsing at your website in near real time, syndicated across multiple ad networks. This does not require sharing any personally identifiable information outside the business.

• Paid search advertising: Paid search, sometimes called pay-per-click (PPC) advertising, presents ads based on keywords that consumers enter into a search engine. The ideal system offers flexibility in campaign creation and management, cost-effective keyword bidding, and tuning and optimization based on real-time results tracking.

• On-site recommendations: Personalizing content and marketing offers at your site to a customer’s known interest triggers additional business. For instance, if a customer browsed car loans during her last visit, recommendations technology prominently displays a car loan offer when she returns. Advanced algorithms automatically generate intelligent recommendations more effectively than is possible with manual coding. To the degree that a customer is authenticated and logged into the secure section of the financial services website, recommendations can take into account the customer’s CRM record, e.g., the fact that the client already carries other car loans with the bank.

12. The Financial Brand, “2010 Online Marketing Study,” August 2010.
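The recommendation logic described above—prefer the last-browsed category, but never re-offer a product an authenticated customer already holds—can be sketched as a simple rule. The example below is purely illustrative: the offer catalog, category names, and fallback behavior are invented, and production systems would use learned models rather than fixed rules.

```python
# Hypothetical catalog of on-site offers keyed by product category.
OFFERS = {
    "auto_loan": "Pre-approved auto loan at a promotional rate",
    "checking": "No-fee checking for the first year",
    "insurance": "Bundled home and auto insurance quote",
}
DEFAULT_OFFER = "Explore our financial planning tools"

def pick_offer(last_browsed, held_products=()):
    """Return an on-site offer: the visitor's last-browsed category wins,
    but a product the customer already holds (known from CRM data for
    authenticated visitors) is never re-offered."""
    if last_browsed in OFFERS and last_browsed not in held_products:
        return OFFERS[last_browsed]
    # Fall back to the first catalog offer the customer does not hold.
    for category, offer in OFFERS.items():
        if category not in held_products:
            return offer
    return DEFAULT_OFFER

# Anonymous visitor who browsed car loans on her last visit:
print(pick_offer("auto_loan"))
# Authenticated customer who already holds an auto loan:
print(pick_offer("auto_loan", held_products={"auto_loan"}))
```

The second call demonstrates the CRM-aware case from the text: because the customer already carries an auto loan, the engine falls back to a different cross-sell offer instead of repeating one she cannot use.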

Digital Marketing Business Use Cases


Coremetrics has helped many financial services firms increase revenue through digital marketing.
Below are two examples.

1) Up-Sell for Lost and Stolen Cards


Challenge
• Up-sell fraud prevention service to customers who reported lost or stolen cards
Solution
• Used IBM® Coremetrics® AdTarget and IBM® Coremetrics® Intelligent Offer to display promotions with
targeted ads and on-site recommendations
• Also targeted new card customers
Results
• 23% increase in purchases of monthly fraud protection
• Increased fraud prevention sign-up by 30% among new card holders

2) Drive Site Traffic


Challenge
• High paid search advertising costs and decreasing rates of return on ad spend
Solution
• Engaged with Coremetrics Search Agency Services to integrate and optimize search marketing
• Leveraged granular analytics and search optimization solutions to eliminate low-quality keywords, optimize ads, and attribute keywords to conversions
Results
• Increased site traffic via increased clickthrough on optimized search ads
• Reduced ad spend by 50%
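The keyword pruning in this use case rests on attributing conversions to keywords and cutting those whose cost per conversion exceeds a profitability threshold. The sketch below is a hypothetical illustration—the keywords, spend figures, and threshold are invented—of that filtering step in plain Python.

```python
# Invented keyword performance rows: (keyword, spend_usd, conversions)
keywords = [
    ("checking account online", 1200.0, 80),
    ("free money",               900.0,  2),
    ("auto loan rates",          600.0, 30),
    ("bank near me",             450.0,  1),
]

MAX_COST_PER_CONVERSION = 50.0  # assumed profitability threshold

def prune(rows, threshold=MAX_COST_PER_CONVERSION):
    """Split keywords into keepers and candidates for elimination,
    based on cost per attributed conversion."""
    keep, drop = [], []
    for kw, spend, conv in rows:
        cost_per_conv = spend / conv if conv else float("inf")
        (keep if cost_per_conv <= threshold else drop).append((kw, round(cost_per_conv, 1)))
    return keep, drop

keep, drop = prune(keywords)
print("keep:", keep)
print("drop:", drop)
```

Reallocating the budget from the dropped keywords to the keepers is what produces the kind of spend reduction without traffic loss described in the results above.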

Bring Visitors Back to the Website

The more time that customers spend on your website, the better. It means they’re more likely to take advantage of your self-service tools. They’re less likely to phone your call center, visit a branch, or snail-mail a payment. They’re more likely to invest in additional products and services.

Encouraging frequent site visits is a sound strategy sure to pay dividends. Yet many firms neglect to take advantage of the opportunity. In insurance, for instance, a report by Forrester Research found that 55 percent of U.S. online adults who own insurance had not visited their insurer’s website in a year.13 Firms clearly have ample room for improvement.

Retargeting is the practice of using targeted display ads and personalized emails to engage with customers after they’ve left your site. Retargeting is increasingly used among retailers to recapture shopping cart abandoners and website departees, and it offers financial services a powerful tool for re-engagement and for providing a personalized, relevant experience.

13. Forrester Research, “Increasing Online Insurance Self-Service Adoption,” February 8, 2010.
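Building a retargeting audience amounts to selecting visitors by their recorded behavior, for example those who started but never submitted an application and have since gone quiet. The sketch below is a hypothetical illustration—the visitor records, field names, and dormancy window are invented—of that segment selection in plain Python.

```python
from datetime import date, timedelta

# Invented visitor records: last visit date, plus whether an application
# was started / submitted during any tracked session.
visitors = [
    {"id": "v1", "last_visit": date(2011, 3, 1),  "started": True,  "submitted": False},
    {"id": "v2", "last_visit": date(2011, 4, 28), "started": True,  "submitted": True},
    {"id": "v3", "last_visit": date(2011, 2, 10), "started": False, "submitted": False},
]

def retargeting_segment(records, today, dormant_days=14):
    """Visitors who abandoned an application and have stayed away for at
    least `dormant_days` become candidates for retargeted ads or email."""
    cutoff = today - timedelta(days=dormant_days)
    return [r["id"] for r in records
            if r["started"] and not r["submitted"] and r["last_visit"] <= cutoff]

print(retargeting_segment(visitors, today=date(2011, 5, 1)))
```

For anonymous visitors the resulting segment would feed syndicated display ads; for registered visitors with permission to market, the same segment could drive a personalized email.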


Retargeting uses a digital marketing platform and Capitalize on the Mobile Channel and Social Media
cookies that uniquely identify the visitor’s computer The emergence of powerful, feature-rich smartphones
and the pages browsed by that visitor. For anonymous and tablets has made the development of mobile appli-
visitors, re-targeting enables display ads (broadly cations and optimization of websites for mobile device
syndicated via ad networks). For registered visitors with usage of critical importance. Mobile usage of online
permission to market, retargeting also uses personal- banking and other financial services is in a growth
ized email. Messages can zero in on the user’s exact spike, with 54 percent of U.S. mobile users expected to
interest—if the visitor was browsing auto insurance, the conduct transactions with devices by 2015, a study by
ad can promote auto insurance. Mercatus found.14 In fact, Mercatus predicts that more
For a relationship-oriented business, retargeting offers consumers will use mobile banking than online banking
much greater opportunity than mere completion of by 2015.
abandoned online applications. Several other examples Yet financial services firms stand to be frustrated by
of retargeting in financial services, across a customer the mobile channel unless firms adopt a strategic
digital lifecycle, are to: approach to aggressively evaluate opportunities, track
• Offer a cross-sell or up-sell product or service mobile usage at a granular level (e.g., device type,
applications accessed, visit duration, location, operat-
• Promote new/updated account management tools
ing system), and optimize the website and marketing
• Offer online-only or seasonal promotions or features
• Highlight online statements and bill pay to a new checking account customer
• Reinforce customer-centric brand image

One distinct advantage of retargeting is cost-effectiveness. Targeted display ads and emails cost substantially less than paid search ads on Google, Bing, or other search providers, and enjoy immediacy by communicating with the customer shortly after his or her website departure. Beyond recent site departees, retargeted communications can also be aimed at dormant individuals who haven't visited your site in a while.

Retargeting is made possible by an integrated web analytics and online marketing platform that tracks a user's web activity not only in real time but also over the course of the customer's lifecycle. The analytics group prospects and customers into different segments of opportunity, and can also inform digital marketers about which products or campaigns were most successful at moving customers to the next lifecycle stage. Integrated digital marketing automation then helps create and deliver targeted display ads, emails, or website personalization. Organizations using retargeting typically report far greater success than those using generic display ads or promotional emails.

efforts to delight customers. "Despite increasing activity and more strategic spending, inconsistent data and analytics will plague mobile marketers hoping to make a business case for testing emerging opportunities," Forrester Research observed.15

A firm's ability to understand how mobile users interact with its website and mobile applications is the first step in pinpointing areas in need of improvement. With bounce rates typically 10 percent higher on mobile devices than on computers, optimizing applications for small screens and easy navigation is essential. Deep and incisive web analytics that present high-level reporting with drill-down into granular detail should be a fundamental element of a mobile strategy.

A prime area for web analytics is assessing usage by handset type and operating system, including Google Android, Apple iOS, RIM BlackBerry, and more. While a shotgun approach may seem the easiest way to ensure you hit the right mobile audience, benchmarking devices and platforms by usage enables you to prioritize efforts to optimize mobile applications for individual platforms.

Beyond the website, search and display advertising on mobile devices offers firms another means of engaging customers. To make the most of this opportunity, financial services marketers can and should leverage campaigns designed for conventional online users, optimized for the mobile audience.
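Benchmarking device and platform usage, as described above, amounts to aggregating visit counts per handset/OS pair and ranking them by share. A minimal sketch in Python; the data and function are illustrative only, not part of any Coremetrics API:

```python
from collections import Counter

def rank_platforms(visits):
    """Rank (device, os) pairs by visit share.

    `visits` is a list of (device, os) tuples from hypothetical log
    data; returns pairs ordered by descending visit count, each with
    its share of total traffic.
    """
    counts = Counter(visits)
    total = sum(counts.values())
    return [(pair, n / total) for pair, n in counts.most_common()]

# Illustrative traffic sample: three handsets, six visits.
ranking = rank_platforms([
    ("iPhone", "iOS"), ("iPhone", "iOS"), ("iPhone", "iOS"),
    ("Galaxy S", "Android"), ("Galaxy S", "Android"),
    ("Bold 9780", "BlackBerry OS"),
])
top_pair, top_share = ranking[0]
```

Ranking by share rather than raw count makes it easy to decide which platform's application to optimize first.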
14 Mercatus, Consumer Mobile Financial Services Study, December 2009.
15 Forrester Research, "2011 US Mobile Marketing Predictions," January 4, 2011.


To the extent that customers are registered and authenticated on both their mobile applications and the website, the digital marketing opportunities previously discussed for personalized advertising and retargeting extend to the mobile channel as well. Ads, offers, and recommendations shown within the mobile application can be selected based on analytical insight into each customer's past and current interactions so that they are as relevant as possible.

'The End of Credit Cards'

Even as financial services firms strive to optimize mobile applications and capitalize on opportunities with the mobile channel, they face a looming challenge in credit card usage.

In the next few years, consumers will increasingly purchase goods by scanning a smartphone at a register rather than swiping a credit card. Called contactless mobile payments, this transaction channel is expected to grow from "practically none" in 2010 to $22 billion by 2015, according to the Aite Group in an article, "The End of Credit Cards."16

Google has partnered with financial institutions and cell phone providers to roll out a test service in 2011 that puts a consumer's financial account information on a smartphone near-field communications (NFC) chip. Using smartphones at ATMs is also in the pipeline. This phenomenon poses major implications for financial services and underscores the need for speed and agility to react swiftly to unexpected challenges and opportunities.

Opportunities in social media. In financial services, social media marketing will not become the force that it is in other consumer-oriented industries until financial services firms find ways to make the channel work for their unique situation. Many people are simply disinclined to "like" a bank or lender. Even though they have many millions of customers, large diversified financial institutions typically have only several thousand Facebook fans.

In fact, less than half (46 percent) of the 154 banks and credit unions surveyed by The Financial Brand have a Facebook page.17 Just 35 percent use Twitter, 25 percent YouTube, and only 8 percent offer an online discussion forum. But that's not to say that social media is not a worthwhile investment: given its low cost of entry and that younger generations have been weaned on social media, it is an asset that poses nominal risk. And though financial services has been slow to embrace social media, that appears likely to change: in 2012, 90 percent of organizations surveyed by the Aite Group said they would have dedicated social media funding in place.18

Credit unions and brokerages are among firms making use of social media. Facebook pages provide an opportunity to promote community events, highlight special offers, engage investors with stock-picking contests, and generally enhance brand image.

As with other digital interactions, the key is to employ analytics that enable you to understand the value of your social media followers. Tools that can correlate typical interaction with your social media vehicles to subsequent business activity are essential to deriving that value, and to plotting your social media strategies.

16 CNNMoney.com, "The End of Credit Cards is Coming," January 4, 2011.
17 The Financial Brand, "2010 Online Marketing Study," August 2010.
18 American Banker, "Facebook, Twitter Become Online Banking Attractions," December 28, 2010.


Social Media Marketing Business Use Case

Coremetrics helps financial services firms leverage social media for customer engagement and brand
enhancement. One example:
Challenge
• Improve brand perception of financial institution
Solution
• Launch Facebook page focused on community involvement and financial services benefits
• Provide social forum for customer questions, comments
Results
• Social channel referrals had higher loan and account sign-up rate than more expensive paid search
and email channels
• Customers driven to site through Facebook are fastest to add second product to portfolio
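The channel-to-outcome correlation this use case depends on reduces to comparing sign-up rates per referral channel. A minimal sketch with invented counts (not the case-study figures):

```python
def signup_rate_by_channel(events):
    """Compute sign-up rate per referral channel.

    `events` maps a channel name to a (referred_visits, signups)
    pair; the field shape is illustrative, not an analytics API.
    """
    return {ch: (s / v if v else 0.0) for ch, (v, s) in events.items()}

# Hypothetical counts: fewer Facebook referrals, but a higher rate.
rates = signup_rate_by_channel({
    "facebook": (400, 24),
    "paid_search": (1000, 30),
    "email": (800, 28),
})
best = max(rates, key=rates.get)
```

Ranking channels by rate rather than volume is what surfaces findings like the one above, where cheap social referrals outperform more expensive paid channels.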

Maximize Customer Lifetime Value Across Channels
In a relationship-oriented business, encouraging customers to expand their portfolios of products and services is vital to success. But cross-sell and up-sell can be difficult when your offerings span multiple lines of business, your customer interacts with you across multiple channels, online and offline, and you're not sure exactly who your site visitors are.

Using the digital channel to build lifetime customer value requires recognizing customers when they interact. For instance, your site needs to distinguish whether a visitor is a wealthy senior citizen exploring CD rates or a high school senior looking for a student loan. Presenting the right offer to the right customer at the right time can repay itself many-fold if you gain a customer's loyalty through superior service, relevant content, and a personalized experience.

Beyond that, you need to continuously monitor and adapt customer online behavioral profiles to reflect a customer's current lifecycle stage (e.g., onboarding vs. retention), needs, and interests. Through multichannel integration, a web analytics platform can enable customer segmentation, and micro-segmentation, by a variety of attributes such as income, assets, gender, family relationships, product affinity, and more. These profiles mature over time to reflect responsiveness to marketing campaigns, helping you determine which techniques to use for a given segment.

Understand lifecycle progression. One way in which web analytics should help a relationship-oriented business build lifetime value is to provide insight into how customers progress through long-term product and service acquisition cycles. Figure 5 shows an analytical approach based on segmenting visitors by lifecycle milestones, which could range from a new checking account client who just registered for self-service, to a VIP customer holding a mortgage, auto loan, 401k account, insurance, and more. The most recent innovations in web analytics provide insight into behavior over time and thus enable you to explore trigger events that contributed to moving customers from one lifecycle stage to the next, e.g., portfolio expansion. By learning what works and what doesn't, digital marketing can be tailored to each lifecycle segment to speed the progression from one milestone to the next.
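Milestone segmentation of this kind can be sketched as a simple classifier over customer profiles. The milestone names, profile fields, and thresholds below are hypothetical illustrations, not the product's actual rules:

```python
# Ordered lifecycle milestones, from newest visitor to VIP customer.
MILESTONES = ["visitor", "registered", "single_product", "multi_product", "vip"]

def lifecycle_stage(profile):
    """Map a (hypothetical) customer profile to a lifecycle milestone."""
    n = profile.get("products", 0)
    if n >= 4:
        return "vip"
    if n >= 2:
        return "multi_product"
    if n == 1:
        return "single_product"
    return "registered" if profile.get("registered") else "visitor"

def segment_by_stage(profiles):
    """Group customer ids by their current milestone."""
    segments = {m: [] for m in MILESTONES}
    for p in profiles:
        segments[lifecycle_stage(p)].append(p["id"])
    return segments

segments = segment_by_stage([
    {"id": "u1", "registered": False, "products": 0},
    {"id": "u2", "registered": True, "products": 1},
    {"id": "u3", "registered": True, "products": 5},
])
```

Once visitors are bucketed this way, campaign performance can be compared per segment to learn which triggers move customers from one milestone to the next.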


Figure 5. Lifecycle analytics for tracking long-term conversion cycles.

Leverage multichannel data. Many financial firms have built customer relationship management (CRM) and data warehousing systems based on customer transactional data and activity in offline channels. Consolidating data from online activity with other channels provides a complete multichannel view of the customer that can sharpen both online and traditional marketing.

Extend digital marketing to offline channels. The insight gained into customers' digital journeys can and should make offline interactions more relevant and helpful to customers. By extending web analytics and digital marketing into cross-channel marketing automation, offline marketing interactions can be rendered more relevant. For example, customers calling into the call center can receive an offer that takes into account their previous website activity, e.g., that they browsed mortgage pages but didn't submit an application.

The long-term, even lifelong, relationship that many customers have with their financial services providers ups the ante. If you lose a customer because of a competitor's more effective marketing, you haven't lost just a single sale. You've lost years of value with that customer, and the opportunity to extend that value across generations, as youngsters often select a financial firm based on their parents' guidance.

Coremetrics for Financial Services

Coremetrics, an IBM Company, offers a set of solutions suited specifically for the financial services industry. Through the fusion of web analytics and digital marketing automation, Coremetrics empowers marketers to turn site visitors into repeat customers and loyal advocates by orchestrating a personalized and compelling experience throughout each customer's digital lifecycle. To achieve this, Coremetrics tracks customers and prospects as they interact with a business's online presence, providing marketers with a comprehensive view into how consumers interact with their brands online over time and across channels. This unique insight is used to automate real-time personalized recommendations, email targeting, display ad targeting across leading ad networks, and search engine bid management, delivered to customers through any digital vehicle including social, mobile, and web.
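The call-center scenario described earlier (a caller who browsed mortgage pages but never applied) boils down to a next-best-offer rule evaluated against the caller's prior web activity. A toy sketch; the rules and field names are invented for illustration, not an actual product feature:

```python
def next_best_offer(web_activity):
    """Pick a call-center talking point from prior website activity.

    `web_activity` holds a set of page categories the caller browsed
    and a set of products they applied for; rules are illustrative.
    """
    browsed, applied = web_activity["browsed"], web_activity["applied"]
    if "mortgage" in browsed and "mortgage" not in applied:
        return "mortgage_followup"  # browsed mortgage pages, never applied
    if "cd_rates" in browsed:
        return "cd_promotion"
    return "generic_greeting"

offer = next_best_offer({"browsed": {"mortgage", "checking"},
                         "applied": set()})
```

In a real deployment such rules would be driven by the analytics platform's segments rather than hand-written conditions, but the lookup at call time works the same way.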


Through a simple integration with other systems of record such as CRM, advertising, and demographics, Coremetrics enables marketers to further enhance not only their online marketing channels, but offline channels as well. Comprising the Coremetrics Digital Marketing Optimization Platform, our financial services solutions include:

Coremetrics Reporting and Analysis

Coremetrics Web Analytics is our base analytics solution, enabling financial marketers to track website activity and cross-channel campaign performance. IBM® Coremetrics® Explore is an ad hoc reporting solution that provides a complete picture of visitor and customer behavior, enabling marketers to automatically execute campaigns, drill down into granular detail, and create precise and actionable customer segments. IBM® Coremetrics® Impression Attribution allows marketers to understand how unclicked impressions (display ads, videos, social media presences, etc.) affect behaviors and conversions.

Coremetrics Lifecycle

IBM® Coremetrics® Lifecycle provides visibility into unique, event-driven customer segments that equip marketers with the most effective tools to cultivate high-value customers. Marketers can track and understand how customers progress through long-term conversion lifecycles. A lifecycle is characterized by milestones ranging, for example, from first-time visitors to registered self-service customers who may then proceed to interact with financial calculators, open additional accounts, purchase additional financial products, and more.

Coremetrics AdTarget

Coremetrics AdTarget is a data syndication platform and online marketing application that enables targeting and personalization of display advertising. Customized for financial services, it leverages granular visitor activity captured by Coremetrics to deliver highly relevant display ads that increase visitor reacquisition, power cross-sell and up-sell, and promote customer use of self-service tools.

Coremetrics LIVEmail

The IBM® Coremetrics® LIVEmail solution gives financial marketers the flexibility to automatically deliver emails to customers based on specified scenarios. For instance, LIVEmail may be configured to generate personalized emails to individuals who abandoned a loan application, or to send offers for fraud or overdraft protection to select customer segments. It features prebuilt best-practice metrics for financial services and integrates with email service providers.

Coremetrics Search

IBM® Coremetrics® Search is a pay-per-click (PPC) management application that improves top-line business performance, reduces operational costs, and enables data-driven optimization of search advertising initiatives. It makes it easy for financial marketers to identify top-performing keywords, automate and fine-tune keyword bidding, and integrate with leading search providers.

Coremetrics Social Analytics

IBM® Coremetrics® Social Analytics enables marketers to treat social media as another marketing channel and measure ROI and engagement accordingly. It provides online marketers with a centralized console for analyzing social media channels and campaigns. Its seamless integration with the Coremetrics platform allows users to track social channels alongside other online marketing channels to help understand campaign performance (ROI).

Coremetrics Benchmark

IBM® Coremetrics® Benchmark provides aggregated and anonymous competitive data for financial services' key performance indicators and is offered to customers at no additional cost. Coremetrics Benchmark covers over 50 metrics vs. peers, spanning marketing, social, mobile, technographic, and site engagement and conversion metrics.


About Coremetrics®, an IBM Company

Coremetrics®, an IBM Company, is a leading provider of web analytics and marketing optimization solutions that helps businesses relentlessly optimize their marketing programs to make the best offer, every time, anywhere, automatically. More than 2,100 online brands globally use Coremetrics Software as a Service (SaaS) to optimize their online marketing. Coremetrics integrated marketing optimization solutions include real-time personalized recommendations, email targeting, display ad targeting across leading ad networks, and search engine bid management. The company's solutions are delivered on the only online analytics platform designed to anticipate the needs of every customer, automate marketing decisions in real time, and syndicate information across all customer channels.

For Additional Information


To learn more about Coremetrics, visit www.coremetrics.com or call (866) 493-2673.

Coremetrics has strongly supported online privacy since its inception.


To learn more, visit http://www.coremetrics.com/company/privacy.php

Corporate Headquarters
1840 Gateway Drive, Suite 320
San Mateo, CA 94404
Tel: 866.493.2673

Coremetrics Europe Ltd.
Lotus Park
The Causeway
Staines
Middlesex
TW18 3AG
U.K.
Tel: +44 (0)8706 006123

www.Coremetrics.com



CUSTOMER SUCCESS STORY

LEMONADE DRIVES 6X GROWTH WITH MIXPANEL'S USER INSIGHTS

500% increase in new policyholders in 15 months after taking actions from Mixpanel insights
250% increase in company's overall quote-view to purchase rate since launch
50% increase in Extra Coverage's product purchase rate after making data-driven improvements
10% of product manager's time saved every year for those who use Mixpanel

INDUSTRY
• Insurance
• E-commerce

PRODUCTS USED
• Core Reports
• People
• Machine Learning

PLATFORMS

GOALS
• Understand the full customer journey
• Drive business decisions with user insights
• Measure and optimize acquisition channels
• Increase overall company conversion rates

SOLUTION
With Mixpanel, Lemonade has the user insights necessary to determine company strategy, help allocate time and resources, and ensure that teams could make data-informed decisions every step of the way.

THE GOALS

Lemonade is not your average insurance company. With no brokers, no runarounds, and an experience powered by artificial intelligence, Lemonade offers consumers a new way to get affordable renters or home insurance, all from the comfort of the web or a mobile device.

So when a company like Lemonade is upending how a longstanding industry does business, it has to learn from user behavior quickly and use that data effectively across the whole organization. That's where Mixpanel came in.

"Everything we do is based on data," said Gil Sadis, Head of Product at Lemonade. "Executives, product, marketing, analytics, customer service, and even underwriting teams learn how to use Mixpanel as soon as they join the company."

Lemonade's ultimate business goal is to increase its number of policyholders over time. With more than 100,000 policyholders and counting, the company must optimize partner and paid acquisition channels and educate its consumers on how Lemonade is different than the traditional insurance model. Then, the company quickly serves personalized quotes so that it's easy for a consumer to purchase a policy that's right for them.

By understanding their data, Lemonade knew they could drive better customer experiences and, ultimately, better business outcomes. Specifically, Lemonade wanted access to user behavior insights in Mixpanel to determine company strategy, to help allocate time and resources, and to ensure that teams can make data-informed decisions every step of the way.

With Mixpanel's suite of user analytics products, Lemonade has been able not only to measure and optimize high-performing acquisition channels but also improve purchase funnels, making it easy for consumers to convert into loyal policyholders.

"For product people, Mixpanel is our savior. It's so easy to see the entire user journey in Mixpanel to uncover what's going on with the business and how we can create a trustworthy insurance purchasing experience for our customers."
Gil Sadis, Head of Product, Lemonade

© 2017 Mixpanel Inc • www.mixpanel.com • Contact us at inquiries@mixpanel.com



THE SOLUTION

"We've had Mixpanel implemented since Day 0 – even before our public launch, to test that our infrastructure was working," Gil said.

"I'VE USED MIXPANEL AT MANY COMPANIES BEFORE – IT'S THE GO-TO MARKET SOLUTION WHEN YOU NEED PRODUCT AND USER ANALYTICS."

Through Explore and People profiles, the company has an end-to-end understanding of the user: "Because our product and marketing is omni-channel, it's really easy to see the entire journey a user takes – from acquisition, to moving through one of our hundreds of funnels, to ultimately purchasing a policy."

"THE DIFFERENCE BETWEEN MIXPANEL AND OTHER ANALYTICS TOOLS IS THAT MIXPANEL'S MACHINE LEARNING WILL NOTIFY YOU WITH WHAT YOU SHOULD CARE ABOUT."

"In our busy day-to-day, we're not looking at every funnel to make sure everything is working. But Mixpanel's anomaly detection features tell us when there's a steep drop or unexpected uptick. This is exactly why we want everyone on Mixpanel. When everyone can access the tool, we quickly spot issues or notice the trends to capitalize on.

"In every product spec and with every feature we build, we have a designated placeholder for Mixpanel. That way, we can determine the events we track and how this feature is helping us measure and work toward our larger business goals."

Ninety-five percent of its employees use Mixpanel, and their decisions are driven by insights they discover in the tool.

"Part of onboarding a new employee at Lemonade is creating a Mixpanel account. We want people to engage with data, ask questions, and find the answers in data to make the right decisions," Gil said.

In fact, their reliance on Mixpanel has allowed the company to operate in an incredibly nimble and unconventional way: "At Lemonade, we only focus on the most important stuff, so the entire org can have the biggest impact. And while we have a very focused vision, we don't chase quarterly roadmaps, which many people find strange.

"However, Mixpanel helps us rigorously prioritize and balance where we innovate versus iterate, all based on what we learn from our user behavior data. Nearly everyone here uses Mixpanel, so we are all empowered to make informed decisions with data. For example, when we release a new innovative feature and measure it with Mixpanel, we see its direct impact on the business and how many people are interested in it.

"Then, we can compare that to the impact a more iterative product change has on our users. Since everything is measured in Mixpanel, we have clear evidence about what to care about and how to balance priorities."

From the company level down to the individual, trust in and use of the data saves time:

"MIXPANEL SAVES EVERY PRODUCT MANAGER, AT MINIMUM, HALF A DAY'S WORK, EVERY WEEK. AND WITH ANOMALY DETECTION, THE MOST IMPORTANT USER PRIORITIES SURFACE TO US INSTANTLY."
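The anomaly detection Gil describes, flagging a steep drop or unexpected uptick without watching every funnel, can be illustrated with a simple z-score check against recent history. Mixpanel's actual model is proprietary; this sketch only conveys the idea:

```python
from statistics import mean, stdev

def detect_anomaly(history, today_value, threshold=3.0):
    """Flag today's funnel count if it deviates sharply from history.

    A plain z-score check over recent daily counts; `threshold` is the
    number of standard deviations considered anomalous.
    """
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return today_value != mu  # flat history: any change is anomalous
    return abs(today_value - mu) / sigma > threshold

# A week of stable daily submissions, then a sudden collapse.
history = [120, 118, 125, 122, 119, 121, 124]
drop_flagged = detect_anomaly(history, today_value=40)
normal_ok = detect_anomaly(history, today_value=120)
```

The value of automating this check is exactly what the quote claims: nobody has to eyeball hundreds of funnels, because only the deviations surface.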
THE RESULTS

Fueling high growth and productivity with Mixpanel

As newcomers to insurance, an industry that has operated without major innovation for the past 150 or more years, Lemonade has taken a design and customer-centric approach, and the strategy is paying off. In just one year, the company secured 70,000 policies, serving more than 100,000 policyholders, and counting.

In fact, Lemonade's rate of acquiring new policyholders doubles every 10 weeks.

In addition to its focus on AI and mobile-first design, Lemonade credits much of this growth to the insights their teams are able to find and leverage with Mixpanel.

Increasing purchase rate of Extra Coverage by 50%, thanks to anomaly detection

With the launch of Lemonade's Extra Coverage product, Mixpanel's machine learning helped the team uncover a key insight: new policyholders weren't fully completing the purchase flow.

"To finalize a policy with Extra Coverage, a user has to submit their policy for review. At this stage there's no need for payment, but Mixpanel helped us find that most people didn't submit it for review; they went through the entire flow and then suddenly stopped," Gil recounted.

"After anomaly detection notified us of the staggering drop-off rate, we dug deeper into Funnels to find that part of the problem was a



browser-based technical issue. In addition, the call to action was not clear enough. It felt like you already submitted something, even though there was still one last step."

Tackling both the technical and the UX issues led to a dramatic improvement: the product team saw a 50% increase in overall conversion for Extra Coverage.

Improving company's overall quote-view to purchase rate by 250% since launch

In the beginning, Lemonade's new user acquisition rate wasn't always doubling every 10 weeks.

"When we launched in 2016, we didn't know how our customers would behave. Lemonade is a completely new product and insurance buying experience. We did a lot of user testing, but we didn't really know how our acquisition funnel would perform," Gil remembered.

"Post-launch, we saw a huge drop in the funnel where users first see their quote view. We went straight to Mixpanel. It was easy not only to understand at what stage users dropped, but also find and connect with the specific users to ask for qualitative feedback to improve the user experience."

SINCE MAKING PRODUCT AND UX CHANGES FIRST IDENTIFIED THROUGH MIXPANEL, THE OVERALL PURCHASE RATE SINCE LAUNCH HAS IMPROVED BY 250%.

"We made the top navigation really lean. It was a bit cluttered at first and introduced too much insurance talk – jargon like 'annual deductible.' We simplified everything so if you went through the experience all you would see in the quote view is the price and a CTA to buy. Then there was a small arrow ushering you to the next section.

"We couldn't have tracked, measured or understood changes without Mixpanel. We tested the placement of features, order of different sections, and even the copy. Today, the user experience is a culmination of all those smaller Mixpanel-backed experiments."

User analytics for marketers

In addition to product and UX improvements, Mixpanel has helped the Marketing team measure and optimize their paid acquisition channels and spend.

"When we found that a lot of our paid channels came from mobile devices, we started to direct people to download our app and not go through the mobile web flow. However, we saw a steep decline.

"This prompted us to really improve our mobile web flow, instead, and drive app downloads later in the process. The improvements we made to the mobile web flow led to conversion rates that were hundreds of percent better than what we had before," Gil said.

"THE MARKETING TEAM USES MIXPANEL THE SAME WAY AS THE REST OF US. WITH FUNNELS, WE MEASURE PERFORMANCE PER CHANNEL AND AT THE CAMPAIGN LEVEL."

"Moving forward they'll be utilizing Platform even more, especially in tying marketing and email campaign data to user behavior data.

"That way they can answer questions like: How did the campaign perform end-to-end? Did the email language convert? And, did these campaigns prompt users to take action within the product?"

By syncing multiple data sources to Mixpanel, all of Lemonade's teams can see the downstream effects and direct impact they have not only within the product, but ultimately the business.
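The funnel measurements behind these investigations reduce to stage-to-stage conversion rates: divide the users reaching each stage by those reaching the previous one, and the weakest transition points to the problem. A sketch with invented counts, not Lemonade's data:

```python
def funnel_dropoff(stage_counts):
    """Per-transition conversion for an ordered funnel.

    `stage_counts` is an ordered list of (stage_name, users_reaching_it);
    the stages below mirror a simplified quote-to-purchase flow.
    """
    rates = {}
    for (prev_name, prev_n), (name, n) in zip(stage_counts, stage_counts[1:]):
        rates[f"{prev_name}->{name}"] = n / prev_n if prev_n else 0.0
    return rates

rates = funnel_dropoff([
    ("landing", 1000),
    ("quote_view", 400),
    ("purchase", 80),
])
worst = min(rates, key=rates.get)  # the transition losing the most users
```

Here the quote-view-to-purchase step is the weakest link, which is exactly the kind of signal that prompted Lemonade's UX investigation.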



CASE STUDY

Large Pharmaceutical Gains Significant Insight into Digital Marketing Performance with Liaison's Web Analytics Platform

A key Liaison pharmaceutical customer approached Liaison with a desire to improve tracking of their digital marketing campaign effectiveness. The company was looking for a solution that would provide a unified view of their global web analytics data across a variety of digital assets such as web sites, banner advertising and social media.

Quick Facts
Industry: Pharmaceutical & Life Science; Sales & Marketing
Liaison Solution: Liaison Healthcare's Cloud-based Web Analytics Dashboard and Data Integration Platform

The Challenge

With marketing campaign metrics and clickstream data coming from a multitude of diverse analytics platforms (e.g. Google Analytics, Facebook Insights, YouTube, etc.), the company had no cohesive way to analyze the data as a whole. Their only recourse for obtaining this data was to visit the metric provider sites individually or to request a report from their marketing agency.

A solution to consolidate all these campaign metrics into a single unified view was sorely needed for business intelligence decision-making. Through consolidation, the company would be able to compare effectiveness data as a whole against defined key performance indicators (KPIs).

However, in establishing a global, consolidated and automated reporting and analysis service hub, the customer was also concerned about the rapidly evolving analytics technology space. The solution approach would need to protect and scale the initial investment and provide a future-proofed path forward to leverage the best of rapidly evolving technologies such as big data.

The Solution

The customer selected Liaison because it could provide the differentiated value of faster time to market and flexibility by leveraging its preexisting, multi-tenant, configurable services. In addition, the scalable solution design offered the future-proofed path the client was looking for to eliminate the obsolescence risk associated with rapidly evolving technology advances.

The end result was a Liaison solution that seamlessly captures data from seven web analytics providers and harmonizes it into a single data model for access and analysis via a thin web application dashboard. Built as a platform-as-a-service solution, it enabled our pharmaceutical client to dramatically increase the speed and agility with which valuable information is delivered across the business, while maintaining a high level of user confidence in its accuracy.

On the Server Side: Capturing and Harmonizing Analytics Data

Liaison's solution retrieves data from seven web analytics providers:
• Atlas Solutions by Facebook
• Bing Ads
• Facebook Insights
• Google Adwords
• Google Analytics
• Google DoubleClick
• YouTube

Each provider offers an application programming interface (API) that Liaison interfaces with in order to retrieve the disparate data points each provider offers. These data points can be broken down into two primary types: metrics and dimensions. Metrics consist of the actual data (e.g. numbers of likes, views, clicks, impressions, subscribers, etc.). Dimensions are categorical variables that provide attribute filters on the metrics. The data points retrieved are already in an aggregate form, but Liaison's harmonization solution builds yet another layer of aggregation to enable insight across all of the organization's digital assets.
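The harmonization step described here, pulling metric/dimension records from each provider and folding them into one model, can be sketched as a merge keyed on (metric, dimension). The record shape below is invented for illustration and is not any provider's actual API response:

```python
def harmonize(provider_payloads):
    """Merge per-provider metric records into one unified model.

    `provider_payloads` maps a provider name to a list of records,
    each with hypothetical 'metric', 'dimension', and 'value' fields.
    Values are summed across providers per (metric, dimension) key,
    giving one further layer of aggregation over already-aggregated
    provider data.
    """
    unified = {}
    for provider, records in provider_payloads.items():
        for r in records:
            key = (r["metric"], r["dimension"])
            unified[key] = unified.get(key, 0) + r["value"]
    return unified

unified = harmonize({
    "google_analytics": [
        {"metric": "clicks", "dimension": "EU", "value": 120},
    ],
    "facebook_insights": [
        {"metric": "clicks", "dimension": "EU", "value": 30},
        {"metric": "likes", "dimension": "EU", "value": 55},
    ],
})
```

Normalizing everything to a common (metric, dimension) key is what lets a single dashboard report across brands, regions, and providers.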

On the Client Side: the Dashboard


All retrieved analytics data is available for access from Liaison’s client-facing dashboard. This dashboard supports considerable
functionality, allowing users to run reports; create custom data filters and views; build graphs and tables; and import and export
analytics data. By providing one easy central point of access to all analytics, the dashboard allows for metrics analysis across all
brands and regions for unprecedented insight into campaign performance.

Customer Benefits
Immediately, Liaison’s customer gained greater visibility of their customer’s buying power and what products resonated in which
regions. With such detail at their fingertips, they were able to focus marketing dollars on initiatives that generated actual revenue,
while reducing overall marketing dollars spent due to improved targeting of potential customers.

Atlanta – US HQ: 3157 Royal Drive, Building 200, Suite 200, Alpharetta, GA 30022; Tel +1.866.336.7378 / +1.770.442.4900; Fax +1.770.642.5050
United Kingdom: +44 (0) 1425 200620
The Netherlands: +31 (0) 20 700 9350
Finland: +358 (0)10 3060 900
Sweden: +46 8 518 365 00

© 2014 Liaison Technologies. All rights reserved. Liaison is a trademark of Liaison Technologies. www.liaison.com
For Customer Insights Professionals

The Forrester Wave™: Web Analytics, Q4 2017


Incumbent Vendors Lead While Newcomers Invigorate A Mature Market

by James McCormick
November 7, 2017

Why Read This Report

In our 32-criteria evaluation of web analytics providers, we identified the seven most significant ones — Adobe, AT Internet, Cooladata, Google, IBM, Mixpanel, and Webtrekk — and researched, analyzed, and scored them. This report shows how each provider measures up and helps customer insights (CI) professionals make the right choice.

Key Takeaways

Adobe And AT Internet Lead The Pack
Forrester's research uncovered a market in which Adobe and AT Internet lead the pack. IBM and Google are Strong Performers. Webtrekk, Mixpanel, and Cooladata are Contenders.

Web Analytics Is Still Essential To Digital Success
The web analytics market is critical to businesses. Forrester finds that it is core to digital intelligence practices, essential to understanding web engagement and delivering optimal digital customer experiences, and has evolved way beyond browser analytics.

Privacy, Security, And Artificial Intelligence Are Key Differentiators
We identified many areas of differentiation during our evaluation of 32 web analytics criteria. Highlights include privacy and security, syndication of web analytics data, application of AI, and browser instrumentation.

For Customer Insights Professionals

The Forrester Wave™: Web Analytics, Q4 2017


Incumbent Vendors Lead While Newcomers Invigorate A Mature Market

by James McCormick
with Gene Leganza and Emily Miller
November 7, 2017

Table Of Contents

Web Analytics Is The Digital Heart Of The Insights-Driven Business
  Evaluated Vendors Sit Within Three Of The Seven Digital Intelligence Categories
Web Analytics Evaluation Overview
  Evaluated Vendors And Inclusion Criteria
Vendor Profiles
  Leaders
  Strong Performers
  Contenders
Supplemental Material

Related Research Documents

Insights-Driven Businesses Set The Pace For Global Growth
Optimize Customer Experiences With Digital Intelligence
Vendor Landscape: Digital Intelligence Technology Providers You Should Care About

Forrester Research, Inc., 60 Acorn Park Drive, Cambridge, MA 02140 USA


+1 617-613-6000 | Fax: +1 617-613-5000 | forrester.com
© 2017 Forrester Research, Inc. Opinions reflect judgment at the time and are subject to change. Forrester®,
Technographics®, Forrester Wave, TechRadar, and Total Economic Impact are trademarks of Forrester Research,
Inc. All other trademarks are the property of their respective companies. Unauthorized copying or distributing
is a violation of copyright law. Citations@forrester.com or +1 866-367-7378

Web Analytics Is The Digital Heart Of The Insights-Driven Business


By 2021 firms that excel at leveraging data and analytics to drive insights at scale will earn total
global annual revenues of $1.8 trillion.1 Common to all insights-driven firms is their ability to leverage
and scale the understanding of their customers as they digitally engage. In other words, insights-
driven firms are digitally intelligent.2 While the practice of web analytics has been around for as long
as websites themselves, it has not stood still. Web analytics continues to evolve as customer digital
engagement changes and businesses undergo digital transformation. Indeed, these technologies are
more relevant to business today, as we find that:

›› Web analytics is core to digital intelligence practices. Many digital data management, analytics,
and optimization technologies compete for attention within a modern digital intelligence (DI)
practice. However, web analytics remains the single most dominant technique (see Figure 1). Almost
three-quarters of respondents in Forrester’s Q2 2017 Global Digital Intelligence Platforms Forrester
Wave™ Customer Reference Online Survey used web analytics from their DI platform providers.
Web analytics adoption is significantly greater than the next three dominant analytics techniques:
application analytics (48%), interaction analytics (43%), and cross-channel attribution (41%).

›› Understanding customer web engagement is still critical to business success. The past
decade has seen a significant shift of proportion of internet traffic and customer engagement from
PC web browsers toward mobile apps, mobile browsers, tablet and TV apps, and other internet-of-
things (IoT) devices.3 However, the vast majority of active internet users still interact via browsers,
which also remain the most important digital channel for consumers to make their purchases.4

›› Modern web analytics technologies extend beyond browser analytics. Although understanding
visitor behaviors is still important to CI pros, today’s digital practitioners are using their web
analytics systems for a lot more. The majority also use these systems to aid behavioral targeting
efforts to personalize customer engagements (70%), manage their digital data within a data
warehouse (63%), integrate it with online testing technologies (57%), and perform application
analytics (53%).5
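The behavioral-targeting use named above — web analytics events feeding personalization rather than just reporting — can be loosely sketched as follows. The event schema, segment names, and rules here are all invented for illustration; no vendor’s API is depicted.

```python
# Hypothetical sketch: deriving a behavioral segment from web analytics
# events and keying personalized content off it. All names are illustrative.
from collections import Counter

# Raw analytics events: (visitor_id, event_type, page)
EVENTS = [
    ("v1", "pageview", "/pricing"),
    ("v1", "pageview", "/pricing"),
    ("v1", "pageview", "/docs"),
    ("v2", "pageview", "/blog"),
    ("v2", "pageview", "/blog"),
]

def assign_segment(visitor_id, events):
    """Derive a coarse behavioral segment from a visitor's event stream."""
    pages = Counter(page for vid, _, page in events if vid == visitor_id)
    if pages["/pricing"] >= 2:
        return "high-intent"          # repeatedly viewed pricing
    if sum(pages.values()) >= 2:
        return "engaged-reader"       # active but not purchase-oriented
    return "new-visitor"

# A targeting layer can then select content by segment:
CONTENT_BY_SEGMENT = {
    "high-intent": "show_demo_cta",
    "engaged-reader": "show_newsletter_signup",
    "new-visitor": "show_intro_banner",
}

print(assign_segment("v1", EVENTS))                      # high-intent
print(CONTENT_BY_SEGMENT[assign_segment("v2", EVENTS)])  # show_newsletter_signup
```

In a production system the segment rules would typically be managed in the analytics or targeting tool itself rather than hand-coded.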


FIGURE 1 Web Analytics Remains The Most Dominant Digital Intelligence Platform Capability

Proportion of enterprises using each technology listed via a digital intelligence technology platform

Web analytics 73%

Behavioral targeting 59%

Online testing 55%

Digital data warehousing 50%

Application analytics 48%

Interaction analytics 43%

Cross-channel attribution 41%

Tag management 34%

Predictive analytics 34%

Recommendations 32%

Digital performance management 30%

Voice of the customer 25%

Social analytics 18%

Internet-of-things (IoT) analytics 7%

Spatial analytics 2%

Base: 44 digital intelligence platform users


Source: Forrester’s Q2 2017 Global Digital Intelligence Platforms Forrester Wave™ Customer Reference
Online Survey

Evaluated Vendors Sit Within Three Of The Seven Digital Intelligence Categories

Vendors that provide web analytics technologies are part of a broader DI landscape of vendors that
either 1) manage digital interaction data; 2) analyze digital interaction and related data; 3) inform and/
or optimize customer interactions based on insights gained from data, analytics, testing, and machine
learning; or 4) provide any combination of these three.6 Forrester has identified seven categories of
digital intelligence, each representing a different combination of capabilities (see Figure 2). The web


analytics vendors evaluated within this Forrester Wave come from one of three DI categories. These
three DI categories all have strong digital analytics capabilities but are differentiated from each other:

›› Category 2 DI vendors are digital analytics specialists. AT Internet falls into this category. Other
analytics specialists within this group include those that focus on mobile analytics (e.g., Appsee),
social analytics (e.g., Sprinklr), and interaction analytics (e.g., User Replay).

›› Category 5 DI vendors emphasize data management with analytics. Cooladata sits within
this category. Other vendor types within this group include those focused on digital performance
management (e.g., New Relic), streaming analytics (e.g., Keen IO), and tag management (e.g.,
Commanders Act).

›› Category 7 DI vendors offer data, analytics, and optimization capabilities. Adobe, Google,
IBM, Mixpanel, and Webtrekk reside within this category. These vendors were also evaluated for
their DI capabilities in our Q2 2017 Forrester Wave evaluation of digital intelligence platforms.7 The
majority of category 7 vendors have data warehousing technology; provide a combination of mobile
analytics, web analytics, and attribution analytics; and include some form of behavioral targeting.

FIGURE 2 Evaluated Web Analytics Vendors Come From Three Of The Seven Digital Intelligence Categories

[Figure: a diagram of three overlapping capability areas — digital data management technologies (category 1), digital analytics technologies (category 2), and digital engagement optimization technologies (category 3) — whose overlaps define categories 4 through 7. AT Internet appears in category 2; Cooladata in category 5; Adobe, Google, IBM, Mixpanel, and Webtrekk in category 7.]


Web Analytics Evaluation Overview


To assess the state of the web analytics market and see how the vendors stack up against each other,
Forrester evaluated the strengths and weaknesses of top web analytics vendors. After examining past
research, user need assessments, and vendor and expert interviews, we developed a comprehensive
set of 32 evaluation criteria, which we grouped into three high-level buckets:

›› Current offering. The capabilities we evaluated included data ingestion (e.g., web browser
instrumentation and visitor tracking); data repository, model, and access (e.g., data storage and
customer profiling); data ownership (e.g., syndication, privacy, security, and portability); analytics
and reporting (e.g., segmentation, attribution, and mobile analytics); dashboards and alerts;
artificial intelligence; web analytics usability and user experience; and web analytics technology
ecosystem (i.e., APIs and first-party DI products and integrations).

›› Strategy. We reviewed each vendor’s strategy, evaluating its product vision, execution road map,
performance, supporting services, and partner ecosystem. Partner ecosystem is critical because
the web analytics solution sits within a much larger DI and customer engagement technology stack.

›› Market presence. The elements of market presence we evaluated include web analytics revenue,
number of enterprise customers, and average deal size. Forrester examines market presence to
provide assurance that evaluated vendors are financially stable and viable for enterprise customers.

Evaluated Vendors And Inclusion Criteria

Forrester included seven vendors in the assessment: Adobe, AT Internet, Cooladata, Google, IBM,
Mixpanel, and Webtrekk. Each of these vendors has (see Figure 3):

›› A web analytics solution that motivates client inquiries. Forrester clients often discuss the
vendor’s web analytics products through inquiries; alternatively, the vendor may, in Forrester’s
judgment, warrant inclusion or exclusion in this evaluation because of web analytics technology
trends or market presence.

›› A dedicated web analytics software solution. The vendor offers a software solution specifically
built to deliver web analytics functionality. This functionality is core to the solution and is not
simply an add-on to other analytical functionality, such as interaction analytics from session-based
replay data, customer analytics, or insights-driven optimization. Enterprise customers use the web
analytics solution as a standalone software tool.

›› A solution with a complete set of advanced enterprise web analytics functionality. Evaluated
vendors provide a complete set of advanced web analytics functionality, including in-browser
instrumentation for collecting visitors’ contextual and behavioral data, the provision and
management of an extensive set of out-of-the-box metrics and dimensions, and customizable self-
service dashboards.


›› An enterprise web user base. Ten or more enterprises are users of the vendor’s web analytics
solution.

FIGURE 3 Evaluated Vendors: Product Information And Selection Criteria

Vendor      | Product evaluated                        | Product version evaluated
Adobe       | Adobe Analytics                          | Spring 2017 release
AT Internet | Analytics Suite                          | V1.04-07-17
Cooladata   | Cooladata                                | SaaS; continuously updated
Google      | Google Analytics 360                     | SaaS; continuously updated
IBM         | IBM Watson Customer Experience Analytics | SaaS; continuously updated
Mixpanel    | Mixpanel                                 | Mixpanel
Webtrekk    | Webtrekk Analytics                       | Webtrekk Analytics

Vendor inclusion criteria

1. The web analytics solution has sparked client inquiries and/or the vendor has web analytics
technologies that put it on Forrester’s radar. Forrester clients often discuss the vendor’s web
analytics products through inquiries; alternatively, the vendor may, in Forrester’s judgment, warrant
inclusion or exclusion in this evaluation because of web analytics technology trends or market
presence.

2. The vendor provides a dedicated web analytics software solution. In other words, the vendor offers
a software solution that has been specifically built to deliver web analytics functionality. This
functionality is core to the solution and is not simply an add-on to other analytical functionality, such as
interaction analytics from session-based replay data, customer analytics, or insights-driven
optimization (e.g., A/B testing or online testing). The web analytics solution is offered to and used as a
standalone software tool by enterprise customers.

3. The software solution has a complete set of functionality for advanced enterprise web analytics
needs. Evaluated vendors provide a complete set of advanced web analytics functionality, including
in-browser instrumentation for collecting visitors’ contextual and behavioral data, the provision and
management of an extensive set of out-of-the-box metrics and dimensions, and customizable
self-service dashboards.

4. The vendor has an enterprise user base. Ten or more enterprises are users of the vendor’s web
analytics solution. Forrester defines enterprise-sized customers as firms with at least $1 billion in
annual revenue.


Vendor Profiles
This evaluation of the web analytics market is intended to be a starting point only. We encourage
clients to view detailed product evaluations and adapt criteria weightings to fit their individual needs
through the Forrester Wave Excel-based vendor comparison tool (see Figure 4).

FIGURE 4 Forrester Wave™: Web Analytics, Q4 ’17

[Figure: a chart plotting each vendor’s strategy (horizontal axis, weak to strong) against its current offering (vertical axis, weak to strong) across the Challengers, Contenders, Strong Performers, and Leaders bands, with marker size indicating market presence. Adobe and AT Internet sit in the Leaders band; IBM and Google among the Strong Performers; Webtrekk, Cooladata, and Mixpanel among the Contenders. Google is marked as having incomplete vendor participation; all other vendors participated fully.]

Go to Forrester.com to download the Forrester Wave tool for more detailed product evaluations, feature comparisons, and customizable rankings.


FIGURE 4 Forrester Wave™: Web Analytics, Q4 ’17 (Cont.)

Criterion (Forrester’s weighting)         | Adobe | AT Internet | Cooladata | IBM  | Mixpanel | Webtrekk
Current offering (50%)                    | 4.42  | 3.79        | 1.82      | 2.92 | 1.78     | 2.13
  Data ingestion (10%)                    | 4.10  | 4.00        | 1.90      | 2.90 | 1.70     | 2.80
  Data repository, model, and access (10%)| 4.20  | 3.00        | 1.00      | 3.80 | 1.60     | 2.40
  Data ownership (10%)                    | 3.50  | 4.00        | 1.50      | 3.00 | 2.00     | 2.00
  Analytics and reporting (30%)           | 4.80  | 3.95        | 1.60      | 3.15 | 1.50     | 2.20
  Dashboards and alerts (10%)             | 4.00  | 3.00        | 2.00      | 2.00 | 2.00     | 2.00
  Artificial intelligence (5%)            | 5.00  | 3.00        | 0.00      | 1.00 | 1.00     | 1.00
  Web analytics usability and UX (15%)    | 5.00  | 5.00        | 4.00      | 2.00 | 3.00     | 1.00
  Web analytics technology ecosystem (10%)| 4.00  | 3.00        | 1.00      | 4.50 | 1.00     | 3.50
Strategy (50%)                            | 4.40  | 4.60        | 1.80      | 3.80 | 2.20     | 2.20
  Product vision (20%)                    | 3.00  | 5.00        | 1.00      | 3.00 | 1.00     | 1.00
  Execution road map (20%)                | 5.00  | 5.00        | 1.00      | 5.00 | 1.00     | 1.00
  Performance (20%)                       | 5.00  | 4.00        | 3.00      | 3.00 | 3.00     | 3.00
  Supporting services (20%)               | 4.00  | 5.00        | 2.00      | 4.00 | 3.00     | 3.00
  Partner ecosystem (20%)                 | 5.00  | 4.00        | 2.00      | 4.00 | 3.00     | 3.00
Market presence (0%)                      | 4.80  | 3.40        | 1.00      | 3.00 | 2.00     | 2.40
  Web analytics revenue (40%)             | 5.00  | 3.00        | 1.00      | 3.00 | 2.00     | 2.00
  Number of enterprise customers (40%)    | 5.00  | 4.00        | 1.00      | 2.00 | 2.00     | 3.00
  Average deal size (20%)                 | 4.00  | 3.00        | 1.00      | 5.00 | 2.00     | 2.00

All scores are based on a scale of 0 (weak) to 5 (strong). Sub-criterion weightings apply within their parent category. Google does not appear in this scorecard.

Leaders

›› Adobe maintains its dominant position and strength within the web analytics market. With its
Adobe Analytics spring 2017 release, Adobe has sought to emphasize features that democratize
meaningful and actionable digital insights to anyone in the enterprise. It has concentrated on
making the UI more intuitive and building on capabilities that allow the exploration of data


breakdowns, relationships, and comparisons. While Forrester has seen some success with this
endeavor, some customers still feel that the interface is a bit daunting and that it can be a challenge
to bring new hires up to speed with the product. B2C customers are by far the largest group using
the product and come from a broad range of buyer types and industry verticals.

›› AT Internet has quietly continued to strengthen its position in the market. Sometimes an
unsung web analytics leader, AT Internet prides itself on Analytics Suite’s focus on helping global
enterprises comply with modern data security, privacy, and confidentiality requirements, backed
by enterprise-class data SLAs — not surprising given the vendor’s European heritage.8 The
vendor’s customer satisfaction scores were some of the best for many of the web analytics areas
assessed in this evaluation. However, some customers we spoke to felt that the UI could be made
simpler and is due for a face-lift. Over half of AT Internet’s revenue comes from the media and
entertainment, financial services, and retail sectors. Its biggest buyers within the enterprise are
digital analytics leads, CMOs, and chief technology officers.

Strong Performers

›› IBM wants to embody easy-to-use web analytics. IBM’s Watson Customer Experience Analytics
product enables its customers to optimize websites, marketing, and digital applications to increase
conversion, loyalty, and customer lifetime value. In absolute terms, IBM has seen success with this.
However, relative to some of its traditional web analytics competitors, the vendor has lost ground
in its current offering and strategic approach. The large majority of enterprise buyers for IBM’s web
analytics product are either CMOs or digital/customer analytics leaders. About half come from the
financial services, retail, and insurance sectors.

›› Google continues its foray into the enterprise web analytics market. With its Analytics 360
Suite, Google combines its digital analytics capacities with that of tag management (Google Tag
Manager 360), site optimization (Google Optimize 360), data visualization (Google Data Studio),
a market survey/research tool (Google Surveys 360), attribution (Google Attribution 360), and
audience management (Google Audience Center 360). The product continues to show its strength
in application usability and UX. However, even though it is now part of a suite, the product still
lacks the level of support and capability for data management, data ownership, analytics, and
reporting provided by some of the leading vendors in this report. Google’s product appeals to a broad
range of users within CMO and digital analytics teams, which make up roughly half of the buyers.
Many buyers are from retail, financial services, and media, entertainment, and leisure. Google did
not actively participate in this Forrester Wave evaluation.

Contenders

›› Webtrekk impresses with good web analytics capabilities at a competitive price. A new
entrant into the Forrester Wave, Webtrekk wants to provide easy access to raw data to all types
of users and third-party tools for in-depth analysis. The vendor also boasts a DMP and marketing


automation tool into which the web analytics product is closely integrated.9 Many customers feel
the product offers great value for the price range; however, some say that the vendor must improve
its supporting material and third-party integration APIs and streamline a bulky UI. The vendor’s products appeal
mostly to buyers from CMO, chief digital officer, and digital analytics teams at B2C enterprises from
the retail, financial services, and media, entertainment, and leisure verticals.

›› Mixpanel is building web analytics capabilities from the ground up for ease and scale. From
its inception, Mixpanel took advantage of the falling cost of data storage and processing power
to capture large amounts of user detail and event properties to provide deep insights into user
behaviors. However, some customers feel the experience with the UI, dashboarding/reporting, and
administration could be improved. Some customers don’t use the product as their primary tool
for web analytics and insights democratization. Rather, analytics specialists use it for deep-dive
ad hoc analysis. Most buyers come from the chief product officer, CMO, and customer analytics
teams at companies within the media, entertainment, and leisure; consumer services; and high-
tech industries.

›› Cooladata strives for differentiation by measuring websites and beyond. Cooladata provides
a web analytics product with the ability to capture data from many different digital touchpoints
and perform analysis on the fly using a dynamic data scheme. Some customers comment that the
UI needs some work and that it is easier to generate relevant reports in other systems. This partly
explains why some use the vendor’s product less as their primary web analytics tool and more as a
specialist business intelligence tool for products and product teams. Over two-thirds of buyers for
Cooladata’s web analytics product are from chief product officer and chief digital officer teams. The
vendor’s three largest verticals are online gaming, eCommerce, and media.


Supplemental Material

Online Resource

The online version of Figure 4 is an Excel-based vendor comparison tool that provides detailed product
evaluations and customizable rankings. Click the link at Forrester.com at the beginning of this report to
download.

Data Sources Used In This Forrester Wave

Forrester used a combination of four data sources to assess the strengths and weaknesses of each
solution. We evaluated the vendors participating in this Forrester Wave, in part, using materials that
they provided to us by October 13, 2017.

›› Vendor surveys. Forrester surveyed vendors on their capabilities as they relate to the
evaluation criteria.


›› Executive briefings. An executive backed by a product team from each vendor presented and
answered questions on the vendor’s product strategy and market sizing.

›› Product demos. We asked vendors to conduct demonstrations of their products’ functionality and
to answer clarification questions posed to them. We used findings from these product demos to
validate details of each vendor’s product capabilities.

›› Customer surveys and reference calls. To validate product and vendor qualifications, Forrester
also surveyed and conducted phone interviews with three of each vendor’s current customers.

The Forrester Wave Methodology

We conduct primary research to develop a list of vendors that meet our criteria for evaluation in this
market. From that initial pool of vendors, we narrow our final list. We choose these vendors based on:
1) product fit; 2) customer success; and 3) Forrester client demand. We eliminate vendors that have
limited customer references and products that don’t fit the scope of our evaluation.

After examining past research, user need assessments, and vendor and expert interviews, we develop
the initial evaluation criteria. To evaluate the vendors and their products against our set of criteria,
we gather details of product qualifications through a combination of lab evaluations, questionnaires,
demos, and/or discussions with client references. We send evaluations to the vendors for their review,
and we adjust the evaluations to provide the most accurate view of vendor offerings and strategies.

We set default weightings to reflect our analysis of the needs of large user companies — and/or
other scenarios as outlined in the Forrester Wave evaluation — and then score the vendors based
on a clearly defined scale. We intend these default weightings to serve only as a starting point and
encourage readers to adapt the weightings to fit their individual needs through the Excel-based tool.
The final scores generate the graphical depiction of the market based on current offering, strategy, and
market presence. Forrester intends to update vendor evaluations regularly as product capabilities and
vendor strategies evolve. For more information on the methodology that every Forrester Wave follows,
please visit The Forrester Wave Methodology Guide on our website.
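The scoring arithmetic described above can be illustrated with figures from this report’s own scorecard: a category score is the weighted sum of its sub-criterion scores, with the weights within a category summing to 1.0. The sub-scores below are Adobe’s “current offering” entries from Figure 4.

```python
# Sketch of the Forrester Wave scoring arithmetic: a category score is the
# weighted sum of its sub-criterion scores. Sub-scores are Adobe's
# "current offering" entries from Figure 4 of this report.

ADOBE_CURRENT_OFFERING = [
    # (weight within category, score 0-5)
    (0.10, 4.10),  # data ingestion
    (0.10, 4.20),  # data repository, model, and access
    (0.10, 3.50),  # data ownership
    (0.30, 4.80),  # analytics and reporting
    (0.10, 4.00),  # dashboards and alerts
    (0.05, 5.00),  # artificial intelligence
    (0.15, 5.00),  # web analytics usability and UX
    (0.10, 4.00),  # web analytics technology ecosystem
]

def category_score(weighted_scores):
    """Weighted sum of sub-criterion scores; weights sum to 1.0."""
    return round(sum(w * s for w, s in weighted_scores), 2)

print(category_score(ADOBE_CURRENT_OFFERING))  # 4.42, matching Figure 4
```

Readers adjusting the weightings in the Excel-based tool are performing exactly this computation with their own weights.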

Survey Methodology

Forrester fielded its Q2 2017 Global Digital Intelligence Platforms Forrester Wave™ Customer
Reference Online Survey to 44 individuals who were current clients of the vendors included in “The
Forrester Wave™: Digital Intelligence Platforms, Q2 2017.” We asked each vendor to supply at least
three customers. For quality assurance, we required all respondents to provide contact information and
answer basic questions about their firms’ revenues and budgets. Forrester fielded the survey between
January and February 2017.

Exact sample sizes are provided in this report on a question-by-question basis. Panels are not
guaranteed to be representative of the population. Unless otherwise noted, statistical data is intended
to be used for descriptive and not inferential purposes. During this research, Forrester questioned


end users about the features and state of their DI practices. We also asked about the value that DI
approaches are currently providing and their intentions to mature such approaches to attain greater
value in their respective firms. This research was intended to generate a qualitative understanding of
the state of continuous optimization.

Integrity Policy

We conduct all our research, including Forrester Wave evaluations, in accordance with the Integrity
Policy posted on our website.

Endnotes

1 Forrester estimates that insights-driven businesses are growing at an average of over 30% annually, which will enable them to globally earn $1.8 trillion per annum by 2021. See the Forrester report “Insights-Driven Businesses Set The Pace For Global Growth.”

2 Forrester defines digital intelligence as the practice that includes the capture, management, and analysis of customer data and insights to deliver a holistic view of customers’ digital interactions for the purposes of continuously optimizing business decisions and customer experiences across the customer life cycle. See the Forrester report “Optimize Customer Experiences With Digital Intelligence.”

3 The Global Mobile Report from comScore shows that just less than three-quarters of the global digital population use their desktops to access the internet. And a significant proportion of the population in many countries still only uses desktop computers. Source: Ben Martin, “The Global Mobile Report,” comScore, September 12, 2017 (https://www.comscore.com/Insights/Presentations-and-Whitepapers/2017/The-Global-Mobile-Report).

4 Seventy-eight percent of internet users in the US made their most recent purchase via computers and browsers. Source: Forrester Data Consumer Technographics® North American Consumer Technology, Media, And Telecom Customer Life Cycle Survey, Q2 2017 (US).

5 Source: Forrester’s Q2 2017 Global Digital Intelligence Platforms Forrester Wave™ Customer Reference Online Survey.

6 Forrester has organized DI vendors into seven categories based on their digital data, digital analytics, and digital optimization capabilities to guide decision makers on the combination of technology vendors they need to partner with to build a mature DI practice. See the Forrester report “Vendor Landscape: Digital Intelligence Technology Providers You Should Care About.”

7 In Forrester’s 26-criteria evaluation of DI platform providers, we identified the 10 most significant ones — Adobe, Cxense, Evergage, Google, IBM, Localytics, Mixpanel, Optimizely, SAS, and Webtrekk — and researched, analyzed, and scored them. See the Forrester report “The Forrester Wave™: Digital Intelligence Platforms, Q2 2017.”

8 SLAs: service-level agreements.

9 DMP: data management platform.
WHITE PAPER

INSIGHTS-DRIVEN COMPUTING: REALIZING WEB ANALYTICS STRATEGY

This paper discusses methods and techniques for leveraging web analytics services to gain insight into customer behavior and to enhance the user's online experience by providing a personalized and customized experience. An enhanced user experience translates into increased business revenue. The paper also presents a framework that lets the business team realize analytics changes seamlessly, without involving the implementation team. Finally, the relevance and utility of web analytics for emerging markets is discussed.
Introduction

Traditional online platforms were one-way traffic: the key design elements, including the main user interface, navigation and flows, were designed by the organization, and the end user had little or no say in them. With the emergence of Web 2.0, the web is now more user-focused than ever. The complete online experience is shifting towards intuitiveness, which is likely to engage end users, keep them interested and increase the likelihood of their participation.

The user-centered experience is changing the way information is presented. It is in the interest of the business that the presentation is designed to appeal to the end user and that the information is presented in its most intuitive fashion; the same applies to navigation design. Analytics is a broad area that deals with capturing and understanding users' explicit and implicit actions and making sense of them, which in turn is used to optimize how the business operates. Web analytics mainly deals with collecting, measuring, analyzing and reporting the data on presentation components.

In this paper, I introduce the key business process steps in designing a web analytics strategy for an online platform. In addition, I describe an intuitive way of realizing a web analytics framework from the business team in real time, without the need for an implementation or operations team. The paper is structured as follows: Section II provides the general advantages and relevance of web analytics. Section III discusses the key business process steps required for implementing a web analytics strategy. Section IV discusses the steps in the implementation of web analytics, and details of realizing a web analytics framework are discussed in Section V. Section VI provides the performance management details, Section VII discusses the relevancy of web analytics for emerging markets, and Section VIII summarizes the paper.

Analyzing Web usage

Prior to the arrival of web analytics, the most popular ways to understand user behavior on the web were the following:

• Market research
• Beta testing the website and collecting the feedback
• Usability testing performed by specialized third-party agencies

However, all these methods lacked one important factor: real-time end-user feedback and analysis. Some e-commerce sites need to analyze user behavior in real time to promote and recommend their products. They also need this information to understand seasonal patterns, emerging trends etc. so that they can customize their selling around it.

In scenarios such as these, the business needs an analytics system which continuously monitors user behavior and provides real-time feedback in intuitive reports, so that business heads can make sense of their customers' behavior and prepare their action plan around it. It also helps them measure the effectiveness of their website UI, online campaigns etc.

Measuring web analytics is two-way traffic:

• The business wants to measure and improve its intended and promoted online strategies
• The business also wants to understand the patterns emerging from its customers' behavior

Web analytics plays a key role in analyzing both the intended and the emerging topics in real time. The following are some of the key benefits of successfully implementing a web analytics strategy:

• Visualize key site data (traffic, time spent, exit rate etc.) in an intuitive fashion
• Track and measure the effectiveness of online campaigns
• Derive patterns and emerging trends and gain valuable customer insights

The following table provides web analytics for a few typical web scenarios:

Sample Business Scenario | Required Analytics
How effective is my online seasonal campaign? | Track clicks on the campaign and the conversion ratio
How useful is the existing online platform? | Track site usability metrics including site traffic, visitor profile, exit rate and visit duration
How can we improve the product recommendation? | Track recently viewed products and user interests to provide personalized product recommendations

Table 1: Web analytics sample scenarios

Understand customers by their clicks: Many times customers do not explicitly provide feedback on, or rate, their online experience. However, we can get a rough idea of their overall satisfaction by "connecting the clicks" a user made on the site with other key metrics such as time spent on site, conversion ratio etc. Web analytics plays a vital role in making sense of these user clicks and metrics to uncover hidden trends, provide customer insights and make sense of the overall online strategy.

The importance of metrics also varies across industries. For instance, the following metrics assume importance in the e-commerce industry [9][10]:

• Clickthrough rate
• Traffic
• Revenue per visit
• Bounce rate
• Page views
• Value per order
• Repeat customer frequency
• Conversion rate

External Document © 2018 Infosys Limited

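The e-commerce metrics listed above are simple ratios over raw event counts. A minimal sketch in JavaScript, the paper's own implementation language (the input field names are illustrative assumptions, not part of any framework):

```javascript
// Sketch: computing a few of the e-commerce metrics listed above from raw
// counts. Field names are illustrative assumptions.
function ecommerceMetrics(c) {
  return {
    conversionRate: c.orders / c.visits,        // orders per visit
    bounceRate: c.singlePageVisits / c.visits,  // one-page sessions
    revenuePerVisit: c.revenue / c.visits,
    valuePerOrder: c.revenue / c.orders,
    clickthroughRate: c.adClicks / c.adImpressions,
  };
}

// Example: 10,000 visits, 250 orders, 12,500 in revenue.
const m = ecommerceMetrics({
  visits: 10000, orders: 250, revenue: 12500,
  singlePageVisits: 4200, adClicks: 300, adImpressions: 15000,
});
// m.conversionRate → 0.025, m.revenuePerVisit → 1.25, m.valuePerOrder → 50
```

In practice these counts would come from the analytics framework's acquired data rather than a hand-built object.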

Analytics Driven Online Business

"If you cannot measure it, you can't improve it" holds true for managing and improving an online business. Hence the primary step for a business is to lay out the criteria for measuring its online platform. Below I provide two broad strategies which follow the "measure-and-improve" model.

A. Measure effectiveness of intended strategy

In this strategy the business wants to measure the effectiveness of its intended online strategy. A three-step process for implementing this strategy is given below:

• Step-1 Establish business goals: Identify the key success factors for the online business. For instance, a manufacturing website may be designed with the key intent of enabling "easy information discoverability" for its customers, while an e-commerce site is designed to "maximize revenue per customer".

• Step-2 Identify metrics for measuring business goals: The business needs to identify the metrics which will measure the business goals identified in Step-1. The table below provides an indicative goal-to-metrics mapping:

Business Goal | Measuring metrics
Improve information discoverability | Site usability metrics (page views, time on site, click paths, exit rate, bounce rate, traffic source); conversion statistics (site traffic, new visitors, returning visitors)
Increase online revenue per customer | Metrics for measuring cross-sell and up-sell (revenue per visit, repeat purchase rate, items per order, conversion rate, visit duration)

Table 2: Indicative business goal-metrics mapping

• Step-3 Measure and manage metrics: Once the business has established the metrics to measure the effectiveness of the intended online strategy, they can be monitored: all web analytics frameworks provide intuitive graphs which can be built around the key metrics. In addition, the business needs to set up an appropriate real-time notification and exception-handling process to manage and improve these metrics. For instance, if the page-view metric drops below a pre-configured threshold, the notification mechanism should alert the relevant business team to investigate further and take appropriate action. Some critical metrics, such as page load times and availability, need immediate attention to avoid risking customers.

Figure 1: Hierarchy of steps in implementing the intended strategy: establish business goals (e.g., increase information discoverability, maximize revenue per visit); identify key metrics (site usability metrics, conversion metrics, items per order, page views); measure, manage and maintain (monitoring process, notification process, DR process).

B. Identify and act on emerging trends/patterns strategy

It is not enough for a business to monitor and manage its intended strategies; more often than not it needs to watch for emerging and hidden trends in its customers' behavior. In the absence of such a strategy, the business cannot react to seasonal trends and patterns. Implementing this strategy is easier said than done, as it is often difficult to arrive at emerging patterns. The following are the key steps in implementing it:

• Step-1 Identify the categories and flows: Identify the categories of consumers, transactions and flows which form the major chunk of the business. For instance, a banking site can rank all transactions a customer can perform by priority into a "key transactions" group; similarly, a multi-geo site can group its locales and countries into "key geo factors". Note: these categories and flows/transactions need not be part of the main business goals; this set would generally include all key features of the website.

• Step-2 Mark and measure the categories, flows and transactions: Once the boundaries of the online business transactions are established through key categories, flows and transactions, they can be marked. This is done by page tagging using a web analytics framework. The page tagging adds client-side JS event handlers which are triggered when the event occurs. The event could be as small as clicking a link or as large as completing a full order-processing flow.

• Step-3 Identify patterns: This is the most complex step in the overall process. The business analysts have to monitor the graphs regularly and identify any patterns that emerge. For instance, if a particular type of clothing product outsells all other types in the summer season, the business needs to perform a detailed analysis of the product attributes and the seasonality factor to offer more attractive packages. Similarly, if the bounce rate has increased after a site redesign, it needs to be analyzed whether any of the UI elements caused the issue.

Deriving patterns from customers' behavior requires deep business and customer understanding, and often a combination of multiple factors establishes the correct pattern. For instance, an increase in the exit/bounce rate on the order page could have various causes; discovering that the page load time of the order page has recently increased lets us complete the cause-and-effect analysis.

It is always recommended that organizations adopt both strategies, to gain sound customer insights and to understand the trends, in order to stay ahead of the competition.


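The threshold-driven notification process of Step-3 amounts to comparing the latest metric readings against pre-configured limits. A minimal sketch (the metric names, threshold values and notifier are illustrative assumptions):

```javascript
// Sketch of a Step-3 style notification check: compare each monitored metric
// against its pre-configured threshold and alert the business team on a breach.
// Metric names, thresholds and the notify callback are illustrative assumptions.
const thresholds = {
  pageViews:    { min: 50000 },  // alert if page views drop below this
  pageLoadMs:   { max: 3000 },   // alert if load time exceeds this
  availability: { min: 0.999 },
};

function checkMetrics(readings, notify) {
  const breaches = [];
  for (const [name, limit] of Object.entries(thresholds)) {
    const value = readings[name];
    if (value === undefined) continue;
    if ((limit.min !== undefined && value < limit.min) ||
        (limit.max !== undefined && value > limit.max)) {
      breaches.push(name);
      notify(`Metric "${name}" breached its threshold: ${value}`);
    }
  }
  return breaches;
}

// checkMetrics({ pageViews: 42000, pageLoadMs: 1200, availability: 0.9995 },
//              console.log) flags only pageViews.
```

A real deployment would run such a check on a schedule against the analytics framework's reporting API and route the notification to the relevant business team.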

Figure 2: Hierarchy of steps in implementing the emerging-patterns strategy: identify key categories and flows/transactions; measure metrics for key transactions; recognize patterns (fast-moving products, seasonality trends, product mix and bundles, pain points, reasons for exit).

Steps in implementing web analytics

This section provides the key steps for implementing web analytics. Essentially, any effective web analytics implementation should cover these three steps:

• Acquire the key metrics
• Analyze the information acquired
• Act based on the analysis

Figure 3: The AAA framework of web analytics implementation: Acquire and Analyze (site usability, sources, visitor profile, conversion statistics) followed by Act (increase discoverability, enhance site design, design effective campaigns, promote targeted content, make the site more navigable, fix issues in exit steps/pages, improve SEO, understand site performance, personalize information).

A. Acquiring

This step consists of acquiring the metrics-related information. The metrics can either be obtained by mapping them to business goals (intended strategy) or from markers at key flows and transactions (emerging strategy). A few common metrics obtained during this step for an e-commerce site:

• Site usability: page views, time on site, downloads, keyword frequency, click map, click paths, exit rate, visit duration, search, information discoverability, shopping cart per visit, checkout abandonment rate
• Performance: landing page bounce rates, page load times, availability, speed, geo-specific page load times
• Sources: referrals, search engines, campaigns
• Visitor profile: user segmentation, geography, search keywords
• Conversion statistics: new visitors, returning visitors, leads, click-through rate, bounce rate, task completion rate, items per order, new visitor conversion rate, return visitor conversion rate, landing page conversion rate, visitor loyalty and visitor recency

These metrics need to be carefully selected based on the critical business goals and flows/functionalities. The list should also be refined based on the industry domain: for instance, checkout abandonment rate is an essential metric for an e-commerce site but is not applicable to information display sites.

B. Analyzing

Once the required information about the key metrics is acquired, the business analysts need to analyze it to make sense of it. Most web analytics frameworks provide intuitive visualizations and dashboards that give a holistic view of the captured information in near real time. The following can be analyzed from the metrics captured in the previous step:

• Site usability: information discoverability, issues with page design, site interactivity, path/information architecture effectiveness
• Sources: campaign effectiveness, SEO effectiveness, external ad effectiveness
• Visitor profile: search effectiveness, geo/browser visitor percentages
• Conversion statistics: visitor stickiness, transaction completion/abandonment rate, exit rates on steps/pages

C. Act

After the analysis, the business team can come up with an action plan to "act" on the analysis. This is the crucial step in the model. The following is an indicative list of actions that can be taken from the analysis done in the previous step:

• Increase critical information discoverability for the site
• Enhance site design
• Design effective campaigns for dealers, distributors etc.
• Promote targeted content for users
• Make the site more navigable
• Fix issues in the steps/pages where users are exiting
• Improve SEO to increase site traffic from external search engines
• Understand site performance in different geographies
• Personalize the information and flow based on the user's interests/usage history

A business framework for automated web analytics implementation

Now that we have understood the AAA model of web analytics implementation, we can focus on implementation details. A typical implementation process involves the following sequence of steps:

• The business identifies the metrics
• The business provides detailed requirements to the implementation team
• The implementation team updates the relevant page tags and performs testing
• The operations team deploys the updated code to production

As we can see, the above process involves multiple teams, and there is definitely a lag between the business deciding on a change and its actual implementation in the production environment. In some mission-critical applications this lag is not acceptable.

This section discusses methods and techniques to completely automate the above sequence of steps. We will also go one step further and eliminate the involvement of the other teams, so that when the business decides on changes to its web analytics it can use this framework, which deploys the changes automatically.


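The page tagging used throughout the AAA steps ultimately sends each tracked event to the analytics server, commonly as an image request. A minimal client-side sketch (the endpoint, element id and parameter names are illustrative assumptions):

```javascript
// Sketch of client-side page tagging: build a tracking URL and send it as
// an image request, the mechanism most analytics frameworks use. The endpoint
// and parameter names are illustrative assumptions.
function buildBeaconUrl(endpoint, params) {
  const query = Object.entries(params)
    .map(([k, v]) => `${encodeURIComponent(k)}=${encodeURIComponent(v)}`)
    .join('&');
  return `${endpoint}?${query}`;
}

function sendBeacon(params) {
  const img = new Image();  // an <img> GET sidesteps cross-origin restrictions
  img.src = buildBeaconUrl('https://analytics.example.com/track', params);
}

// Wire a click handler onto a tagged component (browser-only):
if (typeof document !== 'undefined') {
  const btn = document.getElementById('checkout-button');
  if (btn) {
    btn.addEventListener('click', () =>
      sendBeacon({ event: 'click', element: 'checkout-button' }));
  }
}
```

The framework described next automates exactly this kind of wiring, so the business team never writes the handler by hand.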

The following diagram depicts the high-level working of this automation framework:

Figure 4: Web analytics implementation framework components: an intuitive Interface captures the essential elements for automatic page tagging (URL, element class, event name etc.) from the business team; the Transformer and Framework JS components prepare the tagging assets; the Publishing Job deploys them to the web servers.

Before explaining the web analytics implementation framework, a brief description of how most web analytics frameworks work is given below:

1. Typically, web analytics frameworks need supporting JavaScript files, which can be optimally placed in the footer section of the page. For instance, Google Analytics requires the ga.js file [6].
2. Upon page load, the required tracking code/JS variables are populated.
3. The tracking code is sent to the web analytics server as an image request. The code is sent when the appropriate trigger happens (page load or an event).
4. The web analytics server builds near real-time reports based on the data it receives.

The web analytics implementation framework is built on top of these steps and provides the following additional capabilities:

• Automatic injection of tracking-code values, for instance population of values related to locale, user attributes etc.
• Automatic addition of the events that need to be tracked, for instance tracking the click of a new button.
• Automatic re-deployment of the updates to all publishing servers.

These capabilities essentially equip a business team to fully automate the end-to-end management of a web analytics framework. Here are the detailed components of the framework:

A. Interface

The framework offers an intuitive web-based interface built for the business community, in which the business team can enter the specific details of the web analytics that need to be tracked. Let's consider a simplistic scenario for discussion purposes: the business wants to track the traffic and page-visit data for its product landing page (www.ecommerce.com/products). Normally, page-visit tracking requires minimal tagging. Let's say we require the following variables for tracking a page visit:

• URL
• Browser details
• User details
• Geo
• Locale
• Campaign tracking

The business team has to specify that the products landing page wants "page visit" tracking, along with the values for the above variables. They can specify the details in the interface:

Page | Analytics Tracking Code
www.ecommerce.com/products | URL=D_url, Browser=D_browser, User=D_user, Geo=D_geo, Locale=D_locale, Campaign=S_1234
www.ecommerce.com/products/* | productName=D_prodname

Table 3: Framework interface details

Let's examine what the above table does. The "Page" column indicates to the web analytics framework that the specified URL needs to be tracked. In the "Analytics Tracking Code" column, the business team can specify the values for the tracking code.

Note: The details of the tracking code are technical; to abstract these inner details from the business team, the interface pre-populates the tracking-code variable names required for tracking. The business team can then specify "how" the framework should assign values to those variables.

There are two simple ways to assign a value to a tracking-code variable: static, where the business specifies a hard-coded value, and dynamic, where the value depends on the page and the user/session context. Static values are prefixed with "S_" followed by <constant_value>, whereas dynamic values are prefixed with "D_". The framework has code to populate the values prefixed with D_ at runtime. For instance, in the above example, values like URL and locale are dynamic; the framework assigns the appropriate values to these variables at runtime. The interface provides information on the meaning of the tracking-code variable names and on how each dynamic value is calculated at runtime.

Another feature of the framework is that it "inherits" tracking code. In the above example, all pages matching products/* (the subsequent product pages) inherit all the values specified for the products landing page. In addition, a page can override and/or add new values; in this example we add a dynamic value for the product name.

Let's consider another scenario, in which the business wants to track the click of a button. The business team specifies the following details in the interface:

• Page: the URL of the page
• Section/Module: the module which contains the button
• Element: the id or class name of the button
• Event name: click or mouseover
• Tracking code: the tracking code, as specified in the previous table

The interface also offers highly intuitive features for visualizing the sections while specifying markers. For instance, the team can preview the section and component (the button) for which the tagging is specified, to ensure that the marking is done on the correct component.

Specifying the tracking code in the interface is the only key step the business needs to perform. Once it is done, they can submit the details to the framework, which takes care of handling all subsequent steps till



the live deployment. Details are specified in the subsequent sections.

B. Transformer

The transformer component collects all the submitted details and converts them into a JSON object, which is subsequently used by the framework JS files to inject the values onto the page. A sample JSON object for Table 3 is given below:

{
  page: 'www.ecommerce.com/products',
  URL: 'D_URL',
  Browser: 'D_Browser',
  User: 'D_user',
  Geo: 'D_Geo',
  Locale: 'D_Locale',
  Campaign: 'S_1234'
}

Once the JSON is created, the transformer sends it to the framework JS files.

C. Framework JS

The framework JS files are the core components of this framework. They are developed in JS and perform the following tasks:

• Read the values from the JSON file
• Parse the page DOM after page load
• Inject the tracking code into the tracking-code variables
• Inject the tracking code into the specific components and events

These JS files need to be included in the footer section of the page.

D. Publishing Job

Once the JSON and framework JS files are ready, the publishing job pushes/deploys these two assets to the appropriate publishing locations. Post-deployment, all user requests get the updated assets once the local cache is cleared.

As we can see, the above four components eliminate the need for the development and operations teams. The framework promotes a self-service model in which the business can deploy its changes on demand.

Performance Management

Though web analytics frameworks are traditionally used for tracking key business parameters, another facet is performance. It is possible to understand the following key performance parameters using the same techniques [8]:

• Complete page load times across geographies
• Perceived page load time across geographies
• Load time for assets
• Availability across geographies
• Total transaction/process time across geographies
• Cross-channel/device performance

These parameters help the business understand how its online platform is performing across geographies. As performance is becoming a key success factor for the online strategy, it is important to track these metrics for a successful online campaign.

Performance needs special consideration, as it is one of the critical factors for business continuity. In this regard the business has to adopt the following process:

• Set up an appropriate real-time, cross-geography monitoring process. This can be achieved through web analytics and other monitoring tools.
• Set up an appropriate notification process. For instance, if the site response time breaches a pre-configured threshold, automatic notifications should be sent to the site admins.
• Set up a disaster recovery (DR) environment to handle business continuity.

Relevancy of web analytics for the emerging markets

Emerging markets are developing economies that face infrastructure challenges. The following key factors need consideration when adopting a web analytics strategy in emerging economies:

• Challenges in infrastructure, including low bandwidth
• Price sensitivity
• Multi-device enablement

Let's look at the implications of each of these factors for web analytics framework implementation.

A. Infrastructure challenges

As many locations in emerging markets have low bandwidth, the following web analytics metrics gain prime importance when implementing the web analytics strategy:

• Perceived page load time, as landing pages need to be lightweight and load quickly
• Availability
• Site usability metrics, to enable easier information discovery

B. Price sensitivity

E-commerce sites need to identify fast-moving products and seasonal trends and provide price-sensitive product recommendations. The following metrics need to be closely monitored:

• Average order value
• Exit rate/bounce rate

C. Multi-device enablement

Enterprises also need to look out for multi-channel avenues for their online strategy. The following are the key metrics in this context:

• Multi-device load time
• Click path analysis
• Downloads

Note: In addition to identifying the key metrics, it is always recommended that firms also have a strong strategy for identifying emerging trends.


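The static/dynamic value convention used by the framework JS at runtime (the "S_" prefix for hard-coded constants and "D_" for values resolved from the page/session context, described earlier) can be sketched as follows; the helper names and the context object are illustrative assumptions, not the framework's actual API:

```javascript
// Sketch of the S_/D_ resolution performed by the framework JS at runtime.
// "S_1234" yields the constant "1234"; "D_locale" is looked up in the
// page/session context. Helper names and context keys are assumptions.
function resolveTrackingValue(spec, context) {
  if (spec.startsWith('S_')) return spec.slice(2);       // static constant
  if (spec.startsWith('D_')) return context[spec.slice(2)]; // dynamic lookup
  return spec;
}

function resolveTrackingCode(trackingCode, context) {
  const resolved = {};
  for (const [variable, spec] of Object.entries(trackingCode)) {
    resolved[variable] = resolveTrackingValue(spec, context);
  }
  return resolved;
}

// Example based on Table 3:
const page = { URL: 'D_url', Locale: 'D_locale', Campaign: 'S_1234' };
const ctx = { url: 'www.ecommerce.com/products', locale: 'en_US' };
// resolveTrackingCode(page, ctx)
// → { URL: 'www.ecommerce.com/products', Locale: 'en_US', Campaign: '1234' }
```

Tracking-code inheritance for pages matching a pattern such as products/* would then amount to merging the parent page's resolved values with the child page's overrides.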

Summary

In this paper we began with a brief introduction to web analytics concepts and their importance, and discussed the utility of web analytics through a few sample scenarios. We then presented two broad strategies: measuring the effectiveness of intended strategies and discovering emerging patterns, along with the metrics relevant to each; a sound web analytics strategy adopts both. We then touched on the AAA model of implementing a web analytics framework, and discussed automating the realization of the web analytics framework by the business team alone. Finally, we saw how web analytics can be used for performance management and how these discussion points apply to emerging markets.

References
1. Eric T. Peterson, “Web Analytics Demystified: A Marketer’s Guide to Understanding how Your Web,”

2. Avinash Kaushik, Web Analytics 2.0: The Art of Online Accountability and Science of Customer.

3. Jason Burby and Shane Atchison, “Actionable Web Analytics: Using Data to Make Smart Business Decisions”

4. Bernard J. Jansen, “Understanding User-Web Interactions Via Web Analytics”

5. Dennis R. Mortensen, “Yahoo! Web Analytics: Tracking, Reporting, and Analyzing for Data-Driven”

6. Brian Clifton, “Advanced Web Metrics with Google Analytics”

7. Alistair Croll and Sean Power “Complete Web Monitoring: Watching Performance, Users, and Communities”

8. 4 Key metrics for performance

9. What are the most important metrics for e-commerce companies?

10. Seven Key Performance Metrics for Any E-Commerce Business

11. Empowering customer centricity

About the Author

Shailesh Kumar Shivakumar


Shailesh Kumar Shivakumar is a Technology Architect with the Consulting and Systems Integration division at Infosys. He has over 11 years
of industry experience. His areas of expertise include Java Enterprise technologies, portal technologies, User interface components and
performance optimization.
He can be reached at shailesh_shivakumar@Infosys.com



For more information, contact askus@infosys.com

© 2018 Infosys Limited, Bengaluru, India. All Rights Reserved. Infosys believes the information in this document is accurate as of its publication date; such information is subject to change without notice. Infosys
acknowledges the proprietary rights of other companies to the trademarks, product names and such other intellectual property rights mentioned in this document. Except as expressly permitted, neither this
documentation nor any part of it may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, printing, photocopying, recording or otherwise, without the
prior permission of Infosys Limited and/ or any named intellectual property rights holders under this document.



Croatian Operational Research Review 373
CRORR 6(2015), 373–386

Web analytics tools and web metrics tools: An overview and comparative analysis

Ivan Bekavac1 and Daniela Garbin Praničević1,∗

1 Faculty of Economics, University of Split
Cvite Fiskovića 5, 21000 Split, Croatia
〈{ivan.bekavac8@gmail.com, daniela@efst.hr}〉
Abstract. The aim of this paper is to compare and analyze the impact of web analytics tools on measuring the performance of a business model. Accordingly, an overview of web analytics and web metrics tools is given, including their characteristics, main functionalities and available types. The data acquisition approaches and the proper choice of web tools for particular business models are also reviewed. The research is divided into two sections. The first takes a qualitative focus, reviewing web analytics tools and exploring their functionalities and their ability to be integrated into the respective business model. Web analytics tools support the business analyst's efforts to obtain useful and relevant insights into market dynamics; thus, generally speaking, selecting a web analytics and web metrics tool should be based on an investigative approach, not a random decision. The second section takes a quantitative focus, shifting from theory to an empirical approach, and presents the output data of a study based on perceived user satisfaction with web analytics tools. The empirical study was carried out on employees from 200 Croatian firms, from either the IT or the marketing branch. The paper contributes by highlighting the support that the web analytics and web metrics tools available on the market offer to management, given the growing need to understand and predict global market trends.

Key words: web analytics tools, web metrics, user satisfaction, business models, survey of the IT and marketing sectors in Croatia

Received: October 5, 2014; accepted: October 5, 2015; available online: October 31, 2015

DOI: 10.17535/crorr.2015.0029

1. Introduction

The world is becoming increasingly aware that the Internet is evolving rapidly
and constantly growing as more and more users get online. A presence in the
web sphere is necessary for all organizations and businesses. The Internet
provides numerous multimedia features enabling and changing the way
organizations communicate with their customers, suppliers, competitors and
employees [8]. The web sphere has a direct impact on a user's perception of


∗ Corresponding author.

http://www.hdoi.hr/crorr-journal ©2015 Croatian Operational Research Society



business success [4] and underlines the strategic importance of the web context for modern business. It also shifts numerous business activities towards the web, creating at the same time a new context of business models, the so-called web business models. According to [10], a business model is a business method used by a particular company to generate revenue and add new value to its products/services. The same author also distinguishes nine basic categories of web business models: (1) brokerage model, (2) advertising model, (3) information agent model, (4) commercial model, (5) manufacturing model, (6) affiliate/collaborative model, (7) virtual community model, (8) subscription model and (9) utility/ancillary services model.
Within these business models, five common goals [10] can be identified:
• Selling products or services online, measuring the outcome by the number of products or services sold
• Creating databases of potential clients, measuring the outcome by the number of visitor contacts collected via the website
• Publishing content directed towards attracting as many visitors as possible, thereby increasing advertising revenue
• Providing information to website visitors
• Company branding
Without proper web metrics applied to a website's business model, it is almost impossible to measure its effect on visitors; hence the proper choice of a web analytics tool is important. Equally important is gaining insight into which tool is apt for a user's unique needs. Based on what has been said so far, the research task is to investigate the following proposition: web analytics tools track and improve a user's satisfaction with web-based business models.

2. Basic concepts related to web analytics and web metrics

2.1. Web analytics

According to the official definition of [13], web analytics refers to a
combination of (a) measuring, (b) acquiring, (c) analyzing and (d)
reporting data collected from the Internet with the aim of understanding
and optimizing the web experience.
Measuring (a) incorporates different metrics [1] expressed in the form of
numbers, ratios and key performance indicators (KPIs). Data acquisition
(b) is mainly done through one of the two most widely used methods: (1)
using log files that gather data from a server, and (2) tagging websites
with JavaScript code.
Log files contain data collected on the company's server, regardless of
the visitor's type of Internet browser. Server-side data acquisition
produces a text file of server-collected data describing requests directed
to the web server, such as requests to display pages, images or PDF files
[3]. On the client side, site tagging is carried out using JavaScript code
inserted into every web page; the code runs each time a user opens a
tagged webpage, and visitor behavior is then recorded in a separate file
[12].

Web analytics tools and web metrics tools: An overview and comparative analysis
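To make the server-side acquisition method more concrete, the sketch below parses one record in the widely used combined log format. The field layout, sample line and function names are illustrative assumptions, not data from the paper.

```python
import re

# Combined log format: host, identity, user, timestamp, request, status,
# bytes, referrer, user agent (assumed layout; adjust for your server).
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_log_line(line):
    """Turn one raw log line into a dict of named fields, or None."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None

# A hypothetical log entry of the kind described in the text.
sample = ('203.0.113.7 - - [05/Mar/2015:10:12:01 +0100] '
          '"GET /products/index.html HTTP/1.1" 200 5120 '
          '"http://example.com/" "Mozilla/5.0"')

entry = parse_log_line(sample)
print(entry["path"], entry["status"], entry["referrer"])
```

Client-side tagging produces similar per-visitor records, but collected by a script in the page rather than read from the server's request log.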
Furthermore, the purpose of analyzing data (c) is to transform data into
information useful for a decision-making process [12]. In that sense, special
attention should be given to selecting appropriate web analytics tools while
taking into account a company’s specific characteristics and goals, as well as
employing the staff who are competent in “discovering” useful information for
supporting decisions that are based on large amounts of acquired data.
Finally, reports are generated (d) based on selected metrics outputs which
in turn are useful for company management.
Data originating from the Internet offers relevant information on website
traffic, website transactions, server performance and information submitted by
users themselves [9]. Understanding the web and optimizing the website
enables a better-adapted approach to a target audience with the goal of
increasing conversion rates [12] as well as customer loyalty [4]. Analyzing
website traffic provides insight into the number of visitors, their
geolocation, the time spent on websites and other parameters. Web analytics
also offers further advantages, such as increased efficiency and cost
reduction [3]. Marketers find web analytics data useful for improving
products/services and evaluating the success of a marketing campaign, while
web designers and developers use such data to improve website usability
and, consequently, user satisfaction. Web analytics provides company
management with insight into how to generate revenue from a website, how to
create an appropriate user experience and improve competitive advantage
[6], and how to support continuous improvement and competitiveness [12].
Waisberg and Kaushik [12] propose the following definition of web
analytics: the analysis of qualitative and quantitative data on a website
in order to continuously improve the online experience of its visitors,
which leads to more efficient and effective realization of the company's
planned goals. Quantitative data provide insight into visitor behavior,
such as the web page visited immediately prior to reaching the actual
website. In addition, qualitative data provide answers as to why visitors
behave in a certain way. Continuous improvement of the online user
experience based on this information is a key aspect of the web analytics
concept.
Improved business results based on decisions supported by information
gained from web analytics certainly justify further expenditure in web analytics.

2.1.1. Web analytics as a process

Web analytics is not merely a reporting technology, but a cyclical process
of website optimization which, among other things, measures costs,
identifies the most profitable user behavior and optimizes a website by
improving performance and profitability. Waisberg and Kaushik [12]
identified the following steps in the web analytics process:
• Objective (goal) determination
• KPI definition
• Data collection
• Data analysis
• Change implementation
The issue of "the objectives of a website" is complex and there is no
single answer; it depends on the type of business model. For example, in a
commercial business model the objective of a website is to support product
sales by providing all the relevant information to visitors with the
intention of transforming them into buyers. Determining these activities is
important, as they lead to selecting the appropriate metrics to track the
success of a website, especially keeping in mind that a website should
achieve a certain return on investment (ROI). Measuring the success of
objectives is accomplished by defining key performance indicators (KPIs)
that show progress, or detect lagging, in achieving goals. Each KPI is
expected to correlate with a specific action and to match the criteria of
timeliness, simplicity, relevance and usefulness: a well-defined KPI is
promptly available (timeliness), simple for a decision maker to understand
(simplicity), and relevant (relevance) and useful (usefulness) to the
specific company.
An example [12] of a well-defined KPI that covers all of these attributes
is the bounce rate, defined as the percentage of visitors who leave a
website after viewing only the first page. This metric is simple, i.e. easy
to understand and explain. It is relevant because it identifies pages with
inferior content, either technically or in terms of quality. It is also
timely, since it is readily available in all web analytics tools. Finally,
it offers immediate usefulness, given that a decision maker can act at once
and focus on pages with a high bounce rate.
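The bounce-rate KPI described above can be computed directly from session data. A minimal sketch, assuming each session is summarized simply by its number of page views:

```python
def bounce_rate(pages_per_session):
    """Percentage of sessions that ended after a single page view."""
    if not pages_per_session:
        return 0.0
    bounces = sum(1 for pages in pages_per_session if pages == 1)
    return 100.0 * bounces / len(pages_per_session)

# Five hypothetical sessions, two of which viewed only one page -> 40%.
sessions = [1, 4, 2, 1, 7]
print(f"Bounce rate: {bounce_rate(sessions):.1f}%")
```

In practice the per-page version of this KPI is computed the same way, restricted to sessions that entered through the page in question.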
The next activity in the considered process is accurately collecting data and
saving it in a database for subsequent data analyses. Data analysis includes
observing and transforming previously collected data in order to discover useful
information that supports future decisions. During this activity, numerous
associations, patterns or trends that exist in the data set are revealed. Finally,
implementing change is the last but not least activity in the process. In other
words, collected and analyzed data provide new information and support change
implementation. Otherwise, all the previous activities are baseless. The scheme
of the web analytics process is shown in Figure 1.

Figure 1: Activities in the web analytics process, Waisberg and Kaushik [12]

2.1.2. Web analytics tools

A variety of web analytics tools that aim to obtain quantitative and
qualitative data as a basis for the decision-making process have been
developed and are available on the market. The author of [11] classifies
web analytics tools into five categories:
1. Traditional web analytics tools, which mostly rely on clickstream data
obtained from the visitors themselves, from competitors and from the
company's internal sources. Clickstream data generally answer questions
such as "what happens on the website", "how do visitors behave while
browsing the website" and "how many conversions have been achieved on
the website".
2. Web analytics tools that track performance on social networks.
3. Web analytics tools for gathering visitor feedback, which aim to answer
the question of why visitors do or do not behave in a certain way.
4. Web analytics tools for mobile websites, of growing importance in line
with the increase in website traffic from mobile devices. These tools
provide insight into visitor behavior on websites accessed via mobile
devices, similar to traditional web analytics tools, and are necessary
for achieving compatibility with mobile devices [5].
5. Web analytics tools for experimenting, testing and finding optimal
technical or design solutions that should improve visitor satisfaction.
In terms of the process of selecting a web analytics tool, the working
team is responsible for the following [7]:
• Distinguishing whether the company needs reporting or a full analysis
process in its business model; accordingly, unsuitable tools are
eliminated.
• Assessing the company's current IT capabilities.
• Considering the web tool's features in line with the company's
requirements.
Here, the focus is restricted to traditional web analytics tools based on
clickstream data, within which two categories can be identified according
to the form in which the tool is available: first, software installed on
the organization's own computers and, second, software provided as a
service (SaaS - Software as a Service) by an ASP - Application Service
Provider [4]. Traditional web analytics tools are available on the market
as open-source or commercial packages. Each of these has its advantages and
disadvantages (Table 1).
378 Ivan Bekavac and Daniela Garbin Praničević

Web log tools:
FireStats – Advantages: easy to use, downloadable raw logs, real-time data.
Disadvantages: not recommended for beginners due to installation
requirements.
AWStats – Advantages: reveals how much time visitors spend on the site,
processes raw log files, open source. Disadvantages: cannot provide an
in-depth analysis, nor measure user activity.
Webalizer – Advantages: updates log files throughout the day,
easy-to-understand reports. Disadvantages: does not use cookies, possible
overestimation of data.

Page tagging tools:
Google Analytics – Advantages: customizable dashboard, conversion
visualization, data export in various formats, advanced segmentation
features. Disadvantages: -
Webtrends Analytics – Advantages: provides detailed information, excellent
heat map feature, access to real-time data. Disadvantages: (relatively)
high price.
StatCounter – Advantages: access to real-time data, provides two levels of
analysis. Disadvantages: outdated user interface.
Mint – Advantages: customizable, friendly interface. Disadvantages: needs
server configuration.
Piwik – Advantages: customizable, data control, real-time traffic reports,
segmentation. Disadvantages: -
Clicky – Advantages: real-time data, heat maps, split testing and A/B
testing. Disadvantages: interface is not user-friendly.
Chartbeat – Advantages: real-time data, friendly user interface.
Disadvantages: no detailed historical data available.
GoSquared – Advantages: pinging feature that reveals how long a visitor
stayed on the site. Disadvantages: monthly page-view limit.
FoxMetrics – Advantages: tracks events like newsletter views.
Disadvantages: key features available only in the premium plan.
GoingUp – Advantages: includes SEO-related tools, provides heat maps.
Disadvantages: -
eTracker – Advantages: tracks visitor mouse movement, survey options.
Disadvantages: (relatively) high price.
IBM Unica NetInsight – Advantages: very flexible and customizable reports,
customizable dashboards. Disadvantages: (relatively) high price.
Stuffed Tracker – Advantages: feature that enables comparing organic and
paid traffic. Disadvantages: installation could be difficult for beginners.
Crazy Egg – Advantages: excellent heat map feature. Disadvantages: -

Table 1: Features of traditional web analytics tools, analyzed by the authors†

† The online source for each presented web analytics tool is found at the
end of this paper.

The wide range of web analytics tools makes the selection process complex
and time consuming. Selecting the appropriate tool should therefore take a
company's unique characteristics into consideration. The team selecting the
tool should weigh its usability, functionality, technical details and total
cost. In other words, the market should be studied while focusing on
features such as the possibility of installing and deploying the software
locally, customer support, costs, data segmentation possibilities, download
options, ownership of the collected data and the possibility of integrating
data from other sources into the web analytics tool.

2.2. Web metrics

The common features of web metrics include the collection of specific
visitor actions and the exclusion of search engine robots that crawl and
index website content. Effective web metrics have to be based on generally
accepted terms, definitions and practices [13]. Web analytics incorporates
web metrics, thus providing benefits for online businesses [14] such as the
ability to analyze and increase sales, to track revenue generated by the
site, to identify exit pages and consequently improve website content, to
monitor visitor traffic and to detect website errors. The most common types
of web metrics, available as options in the web analytics tools presented
in Table 1, are [13]:

• Metrics describing visits – this category refers to dimensions such as
the entry page, the landing page and the exit page, and to metrics such
as visit duration (time on site), traffic source (referrer) and the
number of clicks on a link (click-through rate).
• Metrics describing visitors – the metrics in this category present
different attributes that characterize website visitors and support
visitor segmentation: new visitors (the number of unique visitors who
created a session on the site for the first time during the reporting
period), returning visitors (the number of unique visitors who
interacted with the site during the reporting period and whose first
visit occurred before that period), repeat visitors (the number of
unique visitors who made two or more visits to the site during the
reporting period), visits per visitor (the number of visits per
visitor), recency (the time elapsed since a unique visitor last took a
specific action on the site) and frequency (the number of times a unique
visitor took a specific action, such as visiting the site, purchasing or
downloading, during the reporting period).
• Metrics describing visitor engagement – this category includes metrics
that describe the degree of visitor interaction: the proportion of
visitors leaving the website from a page relative to the total number of
views of that page (page exit ratio), the bounce rate and the number of
pages viewed per visitor (page views per visitor).
• Conversion metrics – metrics for the special website activities that
provide business value, such as the number of successfully achieved
goals (conversions) and the ratio between the number of realized
conversions and other relevant metrics (conversion rate).
Examples of conversion rate are the number of conversions relative to the
total number of site visits, or the number of conversions relative to the
number of site visits in which products were added to a cart.
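The two conversion-rate variants just mentioned can be sketched as follows; the numbers are hypothetical reporting-period totals, not figures from the paper:

```python
def conversion_rate(conversions, base):
    """Conversions relative to a chosen base, as a percentage."""
    return 100.0 * conversions / base if base else 0.0

# Hypothetical reporting-period totals.
total_visits = 2000
visits_with_cart = 400   # visits in which a product was added to the cart
conversions = 100        # completed goals, e.g. purchases

site_wide_rate = conversion_rate(conversions, total_visits)       # vs. all visits
cart_based_rate = conversion_rate(conversions, visits_with_cart)  # vs. cart visits
print(site_wide_rate, cart_based_rate)
```

The choice of base changes the story the metric tells: the site-wide rate reflects overall traffic quality, while the cart-based rate isolates checkout performance.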
These metrics often serve as a company's key performance indicators (KPIs):
easily understood indicators covering website performance and website
changes over an observed period [9].
The choice of metrics depends, as mentioned above, on the company and its
goals. Even so, the most common key performance indicators can be
categorized by type of business model; Peterson [9] distinguishes the
business models and their respective performance indicators as listed in
Table 2.
Commercial model: ratio of new to returning visitors, percentage of new
visitors, referring domains, search keywords/phrases, average order value,
key conversion rates.
Advertising model: overall traffic volume, number of visits, ratio of new
to returning visitors, percentage of new visitors, referring domains,
average number of visits per visitor, average time on site.
Subscription model: overall traffic volume, percentage of new visitors,
ratio of new to returning visitors, referring domains, search
keywords/phrases, key conversion rates.
Utility model: overall traffic volume, percentage of new visitors,
referring domains.
Model of information agent: ratio of new to returning visitors, percentage
of new visitors, referring domains, overall traffic volume, average pages
viewed per visitor.
Manufacturing model: search keywords/phrases, entry pages, referring
domains, overall traffic volume, key conversion rates, number of visits.
Affiliate model: overall traffic volume, referring domains, average time
spent on site.
Community model: overall traffic volume, average time spent on site,
referring domains, percentage of new visitors.
Brokerage model: number of returning visitors, frequency of visits, number
of visits, search keywords/phrases.
Table 2: Business models and associated performance indicators [9], modified
Web analytics tools and web metrics tools: An overview and comparative analysis 381

Keeping this in mind, information gathered using these indicators offers a
relevant background for different forms of analysis and reporting, which is
operationally and strategically important for any business system.

3. Empirical research based on the perceived usefulness of web analytics tools

The empirical section of the paper presents the results of a one-month
survey (March 2015) conducted among employees of 200 Croatian IT and
marketing firms. Employees were asked to assess their satisfaction with the
web analytics tools used by their company and with the associated business
model. The questionnaire was created using the Google Forms option on
Google Drive and was sent out via e-mail. The return rate was almost 54%,
i.e. 107 questionnaires were completed.
The analysis of web analytics tool usage and user satisfaction included
descriptive statistics, the Friedman test, bivariate correlation and
multiple regression. The SPSS Statistics 17.0 software package was used for
statistical processing and calculations.
Descriptive analyses revealed that the majority of survey respondents were
male (58.9%), aged 21 to 30 (73%), university educated (85%) and users of
web metrics for business purposes (67%). Respondents use web analytics
tools for different purposes: marketing (75.7%), management (36.4%), web
development (13.1%) and other fields (4.7%). The most frequently used web
analytics tool was Google Analytics (93.5%), with the other tools (6.5%)
being Webtrends Analytics, FireStats, Webalizer, Tableau, Flurry and ARIS
Connect.
When comparing the frequency of using web analytics tools for the
activities of measuring, collecting, analyzing and reporting data, it was
observed that the tools are mostly used for collecting (mean = 3.43) and
analyzing data (mean = 3.35). For each activity, user satisfaction with how
well the tool supports that activity was also analyzed. The respective mean
values are presented in Table 3.

Activity     Frequency (mean)   Satisfaction (mean)
Measuring    3.07               3.64
Collecting   3.43               3.75
Analyzing    3.35               3.73
Reporting    2.96               3.52
Table 3: Mean values of frequency of use and satisfaction with web
analytics tools by activity (N=107)

Using the Friedman nonparametric test, the differences in the frequency of
use of a web analytics tool across activities were found to be significant
(p=0.00), as were the differences in the level of user satisfaction
(p=0.05).
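The Friedman test used here can be illustrated on toy data. The sketch below implements the textbook tie-free form of the statistic; the scores are hypothetical and the computation is not a reproduction of the authors' SPSS analysis.

```python
def ranks(row):
    """Rank values within one block, 1 = smallest (assumes no ties)."""
    order = sorted(range(len(row)), key=lambda i: row[i])
    r = [0] * len(row)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

def friedman_statistic(blocks):
    """Friedman chi-square statistic for n blocks x k treatments."""
    n, k = len(blocks), len(blocks[0])
    rank_sums = [0] * k
    for row in blocks:
        for j, r in enumerate(ranks(row)):
            rank_sums[j] += r
    return (12.0 / (n * k * (k + 1)) * sum(s * s for s in rank_sums)
            - 3.0 * n * (k + 1))

# Hypothetical satisfaction scores: rows = respondents,
# columns = measuring, collecting, analyzing, reporting.
scores = [[3.0, 3.5, 3.4, 2.9],
          [3.1, 3.6, 3.3, 3.0],
          [2.8, 3.2, 3.4, 2.7]]
q = friedman_statistic(scores)  # compare against chi-square with k-1 df
print(q)
```

A large statistic relative to the chi-square distribution with k-1 degrees of freedom indicates that the activities are rated systematically differently, which is the hypothesis tested in the text.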
The results indicate that web analytics tools are most frequently used for
the advertising (mean=3.93) and commercial (mean=3.43) business models,
followed by the community (mean=3.42), information agent (mean=3.31),
affiliate (mean=3.26), subscription (mean=3.20), utility (mean=3.07),
manufacturing (mean=3.05) and brokerage (mean=2.92) business models. Users
also express the highest level of satisfaction with the advertising
(mean=3.65) and commercial (mean=3.38) models, followed by the community
(mean=3.35), affiliate (mean=3.34), information agent (mean=3.27),
subscription (mean=3.22), utility (mean=3.18), manufacturing (mean=3.09)
and brokerage (mean=3.05) models. The Friedman nonparametric test further
verified that the differences in the frequency of use of a web analytics
tool across business models are significant (p=0.00), as are the
differences in the level of satisfaction (p=0.00). A positive correlation
has been proven between usage frequency and user satisfaction with a web
analytics tool for each activity. The associated correlation coefficients
and significances (p=0.00 for each activity) are shown in Table 4. The
activities of measuring (correlation=0.588) and analyzing data
(correlation=0.581) show a higher correlation than the other activities.

Frequency vs. Satisfaction   Pearson Correlation   p (2-tailed)
Measuring                    0.588**               0.000
Collecting                   0.470**               0.000
Analyzing                    0.581**               0.000
Reporting                    0.531**               0.000
**. Correlation is significant at the 0.01 level (2-tailed).
Table 4: Bivariate correlations between usage frequency and usage
satisfaction of web analytics tools for each activity (N=107)
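The Pearson coefficients reported in Table 4 relate two paired series per activity. A minimal pure-Python sketch on hypothetical paired ratings:

```python
from math import sqrt

def pearson(x, y):
    """Pearson product-moment correlation of two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical 1-5 ratings of usage frequency and satisfaction
# for one activity, one pair per respondent.
frequency = [2, 3, 3, 4, 5, 1, 4]
satisfaction = [3, 3, 4, 4, 5, 2, 4]
r = pearson(frequency, satisfaction)
print(round(r, 3))
```

A coefficient near +1 means respondents who use the tool more often also tend to report higher satisfaction, which is the pattern the study observes.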

Furthermore, a positive correlation has been proven, using bivariate
correlation, between usage satisfaction and usage frequency of web tools
for the observed business models (p=0.00 for each model). The brokerage
(correlation=0.597) and advertising (correlation=0.588) business models
show a higher correlation than the other models. These results are
presented in Table 5.

Frequency vs. Satisfaction   Pearson Correlation   p (2-tailed)
Brokerage                    0.597**               0.000
Advertising                  0.588**               0.000
Information agent            0.420**               0.000
Commercial                   0.467**               0.000
Manufacturing                0.565**               0.000
Affiliate                    0.465**               0.000
Community                    0.418**               0.000
Subscription                 0.533**               0.000
Utility                      0.493**               0.000
**. Correlation is significant at the 0.01 level (2-tailed).
Table 5: Bivariate correlations between usage frequency and usage
satisfaction of web analytics tools for each business model (N=107)

In order to explore which tool-supported activities contribute most to
satisfaction with the tool's integration into the business model, multiple
regression was applied. It was found that the collecting and reporting
activities contribute significantly to the level of satisfaction with the
business model. Table 6 indicates that satisfaction with the integration of
web analytics tools into business models is significantly related to
satisfaction with data collection (p=0.035) and, marginally, with data
reporting (p=0.065).

                          Unstandardized          Standardized
Model                     B        Std. Error     Beta       t        p
(Constant)                1.821    0.282                     6.447    0.000
Satisfaction_measuring   -0.009    0.079          -0.013    -0.109    0.913
Satisfaction_collecting   0.188    0.088           0.258     2.133    0.035
Satisfaction_analyzing    0.076    0.083           0.107     0.913    0.363
Satisfaction_reporting    0.143    0.077           0.220     1.864    0.065
a. Dependent variable: satisfaction with the integration of web analytics
tools into business models.
Table 6: Multiple regression for the business model using web analytics
tools (N=107)
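The coefficients in Table 6 come from ordinary least squares. The sketch below fits a small hypothetical data set by solving the normal equations (X^T X)b = X^T y with Gaussian elimination; it illustrates the method only and does not reproduce the paper's SPSS output.

```python
def solve(a, b):
    """Solve the linear system a*x = b by Gaussian elimination with pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def ols(rows, y):
    """Least-squares coefficients (intercept first) via normal equations."""
    X = [[1.0] + list(r) for r in rows]   # prepend intercept column
    k = len(X[0])
    xtx = [[sum(X[i][p] * X[i][q] for i in range(len(X))) for q in range(k)]
           for p in range(k)]
    xty = [sum(X[i][p] * y[i] for i in range(len(X))) for p in range(k)]
    return solve(xtx, xty)

# Hypothetical predictors (e.g. satisfaction with two activities) and an
# outcome generated exactly as y = 1 + 2*x1 + 3*x2, so OLS recovers [1, 2, 3].
rows = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)]
y = [1, 3, 4, 6, 8]
coefs = ols(rows, y)
print([round(c, 6) for c in coefs])
```

With noisy data the recovered coefficients are estimates rather than exact, and their standard errors and p-values (as in Table 6) quantify that uncertainty.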

In summary, the results indicate a significant presence and use of web
analytics in various activities within web business models, and the
conclusion is that web analytics tools support and improve user
satisfaction with these business models. Nevertheless, there is room for
improvement in terms of better integration of web analytics tools into
web-based business models.

4. Results interpretation and conclusion

The empirical results, based on the analysis of data collected from 107
survey respondents, indicate that web analytics tools are well accepted and
applied in IT and marketing companies. The descriptive statistics revealed
that the acceptance of web analytics tools conforms to the acceptance of
any other technological innovation. In this survey, younger users dominate,
possibly implying that the use of web analytics tools is more popular among
the younger generation [2]. As expected, the use of web analytics tools is
most frequent in the marketing field.
Specifically, a comparison of the results shows that web analytics tools
are mostly used for data collection and analysis (Table 3), while the
highest observed correlations correspond to satisfaction with the measuring
and analyzing activities (Table 4); evidently there is room for improvement
in the functionalities affecting user satisfaction with data collection.
The recommendation is that software companies developing these tools place
additional focus on this particular set of functionalities. The survey has
shown that data collection is the most frequent tool activity, and any
improvement in this activity could have a significant impact on user
satisfaction.
Moreover, according to the obtained results, web analytics tools are most
frequently used in the advertising and commercial models, where users also
expressed the highest satisfaction. Additional results indicated that the
highest correlation between the usage frequency of web analytics tools and
satisfaction with such tools is observed in the brokerage and advertising
models. The expectation was that more frequent use of web analytics tools
in the advertising model implies higher user satisfaction. The research has
also shown that although use in the brokerage model is low, the correlation
between usage frequency and usage satisfaction there is the highest. This
implies that web analytics tools are not sufficiently recognized in the
brokerage model and that promotion of their additional use in this model is
required. The correlations between frequency and satisfaction for the
mentioned activities (Table 4) and for the business models (Table 5) are as
expected.
Finally, emphasis should be placed on the fact that user satisfaction with
web analytics tools used for data collection and reporting activities
improves significantly when such tools are integrated into the business
model. The conclusion that the authors stress, based on the theoretical
research, is that web analytics tools and the associated web metrics, as
statistical indicators of website activity, can potentially improve user
satisfaction when integrated into a business model's website. Regardless of
the fact that some of the analyzed web analytics tools and associated web
metrics are freely available, whereas others are not, it is indisputable
that each of the tools can be integrated into the respective business
models. Successful implementation of web analytics tools requires proper
selection, given that each website is unique and determined by the nature
of the related business model and its supporting technologies. Thus,
focusing on the proper tools and metrics forms the basis for stronger
management support, which may imply better business results.
The empirical research results based on the perception of web analytics
tools indicate that web analytics tools support user satisfaction with
web-based business models, although the need for further modification, at
both the technical and the organizational level, is evident.
Future studies in this field should extend the assessment beyond web
analytics tools based on clickstream data to tools that track performance
on social networks or mobile devices, as well as tools for collecting
visitor feedback and for conducting various tests and experiments.

References

[1] Burby, J., Brown, A. and WAA Standards Committee (2007). Web Analytics
Definitions – Version 4.0, Web Analytics Association.
[2] Chung, J. E., Park, N., Wang, H., Fulk, J. and McLaughlin, M. (2010). Age
differences in perceptions of online community participation among non-users: An
extension of the Technology Acceptance Model. Computers in Human Behavior, 26,
6, 1674–1684. doi:10.1016/j.chb.2010.06.016.
[3] Clifton, B. (2010). Advanced Web Metrics with Google Analytics (2nd ed.). Indiana:
Wiley Publishing, Inc.
[4] Creese, G. and Veytsel, A. (2000). Web Analytics: Translating Clicks into Business.
Boston: The Aberdeen Group, Inc.
[5] Gupta, R., Mehta, K., Bhavsar, K. and Joshi, H. (2013). Mobile web analytics.
International Journal of Advanced Research in Computer Science and Electronics
Engineering (IJARCSEE) 2, 3, 288–292.
[6] Kaushik, A. (2007). Web Analytics: An Hour a Day. Indiana: Wiley Publishing, Inc.
[7] Kaushik, A. (2009). Web Analytics 2.0: The Art of Online Accountability and
Science of Customer Centricity. Indiana: Wiley Publishing, Inc.
[8] Omidvar, M. A., Mirabi, V. R. and Shokry, N. (2011). Analyzing the impact of
visitors on page views with Google Analytics. International Journal of Web &
Semantic Technology (IJWesT), 2, 1, 14–32. doi:10.5121/ijwest.2011.2102.
[9] Peterson, E.T. (2004). Web Analytics Demystified: A Marketer's Guide to
Understanding How Your Web Site Affects Your Business. Celilo Group Media and
CafePress.
[10] Rappa, M. (2010). Business models on the web.
http://digitalenterprise.org/models/models.html [Accessed on 12 March 2014].
[11] Teixeira, J. (2011). Get Involved: 5 Types of Web Analytics tools to start using
today! http://www.morevisibility.com/blogs/analytics/get-involved-5-types-
of-web-analytics-tools-to-start-using-today.html [Accessed on 03 June 2014].
[12] Waisberg, D. and Kaushik, A. (2009). Web Analytics 2.0: Empowering Customer
Centricity, SEMJ.org 2, No. 2.
[13] Web Analytics Association (2008). Web analytics definitions – draft for public
comment.
http://www.digitalanalyticsassociation.org/Files/PDF_standards/WebAnalytic
sDefinitions.pdf [Accessed on 10 September 2014].
[14] Zara, I. A., Velicu, B. C., Munthiu, M. C. and Tuta, M. (2012). Using analytics for
understanding the consumer online.
http://fse.tibiscus.ro/anale/Lucrari2012/kssue2012_129.pdf [Accessed on 10
September 2014].

Web analytics tools online sources:

• AWStats. http://www.awstats.org/ [Accessed on 01 August 2014].
• Chartbeat. https://chartbeat.com/ [Accessed on 01 August 2014].
• Clicky. http://clicky.com/ [Accessed on 01 August 2014].
• Crazy Egg. http://www.crazyegg.com/ [Accessed on 04 August 2014].
• eTracker. https://www.etracker.com/en.html [Accessed on 01 August 2014].
• FireStats. http://firestats.cc/ [Accessed on 01 August 2014].
• FoxMetrics. http://foxmetrics.com/ [Accessed on 01 August 2014].
• GoingUp. http://www.goingup.com/ [Accessed on 01 August 2014].
• Google Analytics. http://www.google.com/analytics/ [Accessed on 01 August 2014].
• GoSquared https://www.gosquared.com/ [Accessed on 01 August 2014].
• IBM Unica NetInsight. http://www-03.ibm.com/software/products/en/on-
premise-web-analytics [Accessed on 01 August 2014].
• Mint. http://haveamint.com/ [Accessed on 01 August 2014].
• Piwik. http://piwik.org/ [Accessed on 01 August 2014].
• StatCounter. http://statcounter.com/ [Accessed on 01 August 2014].
• Stuffed Tracker. http://www.stuffedtracker.com/ [Accessed on 01 August 2014].
• Webalizer http://www.webalizer.org/ [Accessed on 01 August 2014].
• Webtrends Analytics. http://webtrends.com/ [Accessed on 01 August 2014].


Coursaris et al. Website Performance Optimization via Web Analytics

Driving Website Performance Using Web Analytics: A Case Study
Research-in-Progress

Constantinos K. Coursaris, Michigan State University (USA), coursari@msu.edu
Wietske van Osch, Michigan State University (USA), vanosch@msu.edu
Carolina López-Nicolás, University of Murcia (SPAIN), carlopez@um.es
Francisco-José Molina-Castillo, University of Murcia (SPAIN), fjmolina@um.es
Nicolas Rapp, Noomiz.com (FRANCE), rappnicolas@gmail.com

ABSTRACT
With the introduction of Web Analytics into Web Marketing, organizations now have the opportunity to measure, track, and analyze the behavior of website users. The REAN model, standing for Reach, Engage, Activate and Nurture, appears to be the most relevant model for planning and measuring such activities. The model is used here to set goals and objectives and to define metrics for improving a website's performance using Web analytics. Based on academic papers, official sources, white papers, and best practices, the main research objective of this paper is to establish a list of optimization actions to be implemented and to test whether these actions have a positive impact on website performance. Preliminary findings from this research-in-progress paper may assist managers in: 1) attracting new visitors to expand website traffic, 2) transitioning visitors into registered users, and 3) building a loyal audience of repeat website visitors.

Keywords
REAN model, Web Analytics, website performance, optimization actions.

INTRODUCTION
In January 2012, there were more than 500 million websites worldwide (Netcraft, 2012). This abundance has made the Internet a highly competitive environment in which websites must constantly improve their performance to stand out among competitors. Visitors are key for websites; improving website performance therefore requires the ability to follow and measure visitor/user behavior. Several measurement tools provide extensive reporting on user behavior.
With the emergence of Web analytics tools in the early 2000s, data analysis became a strategic element of website optimization. The availability of measures for quantifying website visibility, such as user traffic and behaviors, has given these measures growing strategic importance for companies. However, simply having a manager monitor audience behavior with a measurement tool is not enough, since most competing websites also use a Web analytics tool; the real difficulty is determining which actions will have a positive impact on traffic indicators. Web analytics tools are nonetheless effective for assessing and optimizing a website, and a Web manager's time should therefore be reserved for new items requiring in-depth, critical thinking to inform strategic decision-making and problem solving. In addition to following Key Performance Indicators (KPIs), it is necessary to master the different levers of website performance and, more specifically, the actions that can lead to website optimization. Data collected by Web analytics tools allow advertisers to move beyond the mere impression that a particular lever works and instead drive business strategy with concrete data while optimizing operations in real time.
This research aims to provide an overview of website performance based on pre-selected criteria as well as show if defined
optimization actions have a positive impact on performance. As there are endless ways to define a website’s performance, the
main objective is to determine a model showing the main topics that illustrate performance. Additionally, there are infinite
ways to improve a website. The second objective is therefore to select optimization actions from different sources according
to the defined objectives and to test each of these actions and verify if indeed there is a positive impact on said performance.

Proceedings of the Nineteenth Americas Conference on Information Systems, Chicago, Illinois, August 15-17, 2013. 1

Finally, it is necessary to select relevant metrics to measure performance in order to be able to monitor the performance
evolution according to different actions.
Hence, the research has two main objectives: 1) establish a list of optimization actions to be implemented, based on extant academic papers, official sources, white papers, and best practices; and 2) test whether these actions have a positive impact on website performance. The central question of this research is how to drive a website so as to improve its performance. As this is a very broad issue, website performance must be broken into its component aspects, with appropriate assessment criteria defined for each. Based on our research objectives, the following sub-questions guide the study:
o How does a website attract new visitors, thereby increasing site traffic?
o How are visitors transformed into users by generating registrations?
o How does a website establish a loyal audience base and incite users to return?

LITERATURE REVIEW
Measuring a website's performance is complex because a multitude of criteria can define performance. The literature on this issue is plentiful and broad; it is therefore necessary to choose an appropriate model covering relevant criteria that reflect website performance measurement. At first sight, the Awareness, Interest, Desire and Action (AIDA) model seemed relevant because it offers a general understanding of the effectiveness of communication endeavors with
respect to advertising (Lewis, 1898; Glowa, 2002). This model could be applied to illustrate website performances in the
sense that it implies a website must generate awareness, interest, desire, and action (Ber and Jouffroy, 2012). However, given
the inherent subjective nature of the desire construct, quantitative operationalization and measurement through Web analytics
is infeasible (Jackson, 2009).
Most studies report using the ACT model (Kabani, 2010), a three-pillar model based on Attraction, Conversion and Transformation. However, the ACT model accounts only for the initial attraction of users and ignores the equally important activity of user retention. The REAN model therefore appears to be the most relevant and complete model for covering the key research questions (Blanc and Kokko, 2006). Indeed, it takes into account four essential goals that define a successful website: reaching, engaging, activating and nurturing users (Kermorgant, 2008).

The REAN model


The REAN model (Figure 1) can be defined as follows: “every business website is affected by REAN. They all need to reach
their potential customers, then they need to engage with them, activate them and finally you need to nurture them, in other
words encourage them to come back” (Jackson, 2009).

Figure 1. The REAN Model (Source: Jackson, 2009)

The REAN model is a powerful framework that gives a clear overview of a website’s performance structure and helps to
define a measurement strategy (Jackson, 2009). The model can be used to define and plan online activities for optimization in
order to measure Return On Investment (ROI) (Shannak and Qasrawi, 2011).


Reach
“Reach sources the methods you use to attract people to your offer. It also includes how you raise awareness among
your target audience” (Jackson, 2009).

Thus, the aim is to generate traffic to the website. According to Visser and Weideman (2010), website traffic is composed of four types:
• Direct Traffic: “when a visitor visits the website directly (by typing in the URL directly into the browser or by
means of bookmarks and/or favorites),”
• Referral Traffic: “when a visitor visits the website via a link from another website, also without making use of a
search engine,”
• Search Traffic (Organic): traffic generated by “unpaid search result listings,”
• Search Traffic (Paid): traffic generated by “paid search result listings” (Visser and Weideman, 2010).
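These four traffic types can be operationalized in reporting code. Below is a minimal sketch of a visit classifier, assuming simplified inputs (a raw referrer URL and a paid-campaign flag); the field names and the search-engine list are illustrative assumptions, not part of Visser and Weideman's definitions or of any analytics API.

```python
from urllib.parse import urlparse

# Illustrative, deliberately short list of search-engine hosts (an assumption).
SEARCH_ENGINES = {"google.com", "www.google.com", "www.bing.com", "search.yahoo.com"}

def classify_visit(referrer: str, paid: bool = False) -> str:
    """Classify a visit into one of the four traffic types.

    referrer -- the HTTP referrer URL ("" when there is none)
    paid     -- whether the visit carries a paid-campaign marker
    Both inputs are simplified assumptions for illustration.
    """
    if not referrer:
        return "Direct Traffic"          # typed-in URL, bookmark, or favorite
    host = urlparse(referrer).netloc.lower()
    if host in SEARCH_ENGINES:
        return "Search Traffic (Paid)" if paid else "Search Traffic (Organic)"
    return "Referral Traffic"            # a link from another website
```

In practice the paid/organic distinction would come from campaign tagging rather than a boolean, but the four-way split mirrors the typology above.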

High ranking on a search engine’s results page increases website traffic (Oneupweb, 2005). Moreover, 91.8% of search
queries in France (the context of our case study) are made from Google (Médiametrie, 2012). Finally, 75% of users do not
look beyond the first page of the search engine results (Jenkins, 2011). This is why it is necessary to optimize a website’s
presence on search engines.
To do this, there are Search Engine Optimization (SEO) actions that are essential to implement and likely to dramatically
increase the number of page visits (Berger, 2011). Such SEO actions are summarized in Table 1.
• In-Page optimization
It is mandatory to optimize web pages at the HTML source level, and several in-page criteria must be respected (King, 2008). Google provides an SEO starter guide that includes several actions to be implemented within a website's source code, as summarized in Table 1.
• Off-Page optimization
The very first search engines (e.g., AltaVista) operated solely on the basis of in-page criteria. Google then introduced relevance criteria based on context, environment, and popularity (Andrieu, 2012). Google measures page popularity with PageRank (PR), which is essentially based on the number of pages linking to the website (Brin and Page, 1998). Net linking, the creation of new links pointing to a website, is the most efficient strategy to promote it and thereby improve its PR (Prat, 2011).
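The popularity idea behind PR can be illustrated with a small power-iteration sketch, following the general idea of Brin and Page (1998). This is a simplified teaching version, not Google's production algorithm, and the four-page link graph below is invented.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank. `links` maps each page to the pages it links to.
    Pages receiving more (and better-ranked) inbound links score higher."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1 - damping) / n for p in pages}
        for page, outlinks in links.items():
            if outlinks:
                share = rank[page] / len(outlinks)
                for target in outlinks:
                    new[target] += damping * share
            else:  # dangling page: spread its rank evenly
                for target in pages:
                    new[target] += damping * rank[page] / n
        rank = new
    return rank

# Invented graph: "site" gains inbound links from a and b; "c" gets only half of b's.
graph = {"a": ["site"], "b": ["site", "c"], "c": ["a"], "site": ["a"]}
ranks = pagerank(graph)
```

The sketch shows why net linking matters: adding pages that link to "site" raises its share of the rank mass on each iteration.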
Category | Name                  | Action to implement
In-Page  | Description meta tag  | Use a 150-character description within the meta description tag summarizing the page's content.
In-Page  | Improve URL structure | Use simple-to-understand URLs in order to enhance Google's spider crawling.
In-Page  | Use a sitemap         | Make a sitemap consisting of a hierarchical listing of the pages of the website.
In-Page  | Use <hn> tags         | Present the structure of the page in <hn> tags.
Off-Page | Net linking           | Create links pointing to the website.

Table 1. Actions to reach new users
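Several of the in-page criteria in Table 1 can be audited programmatically. The sketch below uses Python's standard-library HTML parser to check a page against the 150-character description guideline and the presence of heading tags; the sample page and the audit rules are illustrative assumptions, not Google's actual ranking checks.

```python
from html.parser import HTMLParser

class SEOAudit(HTMLParser):
    """Collects the meta description and heading tags of a page."""
    def __init__(self):
        super().__init__()
        self.meta_description = None
        self.headings = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "meta" and a.get("name", "").lower() == "description":
            self.meta_description = a.get("content", "")
        elif tag in {"h1", "h2", "h3", "h4", "h5", "h6"}:
            self.headings.append(tag)

def audit(html: str) -> dict:
    """Report whether the page satisfies two in-page criteria from Table 1."""
    parser = SEOAudit()
    parser.feed(html)
    desc = parser.meta_description
    return {
        "has_description": desc is not None,
        "description_within_150_chars": desc is not None and len(desc) <= 150,
        "has_h1": "h1" in parser.headings,
    }

# Invented sample page for illustration.
page = ('<html><head><meta name="description" content="Short summary."></head>'
        '<body><h1>Title</h1><h2>Section</h2></body></html>')
report = audit(page)
```

Such a check can be run over every page listed in the sitemap to find pages missing a description or an <h1>.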


Engage
“Engage is how people interact with your business. Engage is essentially the process before a point of action that
helps your prospect come to decisions” (Jackson, 2009).
However, as aforementioned, user engagement is not studied in this research and, consequently, will not be further discussed.

Activate
“Activate means a person has taken a preferred point of action. Typical examples include a person purchasing a
product, a newsletter subscription or a sign-up” (Jackson, 2009).
“Conversion Rate (CR) is the art and science of persuading your site visitors to take actions that benefit you” (King, 2008, p. 111). As defined in the research questions, one of the strategic objectives of a website is to increase registrations.
To this end, it is essential to understand the impact a registration form can have on a website. Eighty-six percent of Internet users are likely to leave a website because they are asked to sign in, and 42% of users think the process is too long (Rolka, 2012). It is therefore essential to simplify the process to enhance conversions. Making registration quick and easy, with intuitive navigation and a minimum number of clicks, will decrease the abandonment rate and therefore increase the CR (Dodson and Davis, 2011).
In addition, in order to increase the CR, it is essential to provide reasons to register by clearly defining and communicating
why the visitor should register on the website with benefit-oriented headlines (Page, Ash, and Ginty, 2012). Table 2 shows
actions to transform visitors into users.

Name                      | Action to implement
Call to action            | Display a catchy slogan defining reasons to register, making users eager to sign up.
Enhance conversion funnel | Simplify the registration process to decrease the abandonment rate.
Call to action            | Display a pop-up window reminding unregistered users to sign up.

Table 2. Actions to transform visitors into users
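The effect of simplifying the registration process can be tracked step by step through the funnel. The following sketch computes per-step completion rates and the overall conversion rate; the funnel steps and visitor counts are invented for illustration.

```python
def funnel_report(steps):
    """steps: ordered (name, visitor_count) pairs through the registration funnel.
    Returns per-step completion rates and the overall conversion rate."""
    report = []
    for (name, count), (next_name, next_count) in zip(steps, steps[1:]):
        rate = next_count / count if count else 0.0
        report.append((f"{name} -> {next_name}", round(rate, 3)))
    overall = steps[-1][1] / steps[0][1] if steps[0][1] else 0.0
    return report, round(overall, 3)

# Invented counts for one reporting period.
steps = [("landing", 10_000),
         ("form started", 2_400),
         ("form submitted", 1_500),
         ("registered", 1_380)]
per_step, conversion_rate = funnel_report(steps)
```

Comparing per-step rates before and after shortening the form shows where abandonment actually drops.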

Nurture
“Nurture describes the method of retaining and re-engaging with activated consumers. The consumer is a person
who has already taken at least one preferred point of action” (Jackson, 2009).

Nurture can also be defined as the capacity of the website to make users return (Kermorgant, 2008). In other words, visitors must be given a reason to come back, thus building visitor (customer) loyalty. Few actionable resources deal with the concept of website e-loyalty; to better understand loyalty, it is therefore relevant to review best practices from Social Networking Sites (SNSs). As SNSs are highly addictive (Kuss and Griffiths, 2011), they can provide insights into the features that incite users to return, so it is worth reviewing the key success factors of some SNSs.
First, the Logged-In Landing Page (LP) has to be user centric. This means that the LP needs to have a personalized
dashboard, which contains the main features available to that user (Fanelli, 2010). The “who's visited your profile?” feature is popular (Glad, 2011): the Google query “who's viewed my Facebook profile” generates 497,000,000 results. Illustrating this feature's popularity, the professional SNS LinkedIn provides it, and Viadeo even monetizes it.
“News Feed highlights what's happening in your social circles” (Sanghvi, 2006). The news feed feature is one of the key
success factors of SNSs and is a reason why users become so loyal, i.e. why they come back (Yu, Hsu, Yu and Hsu, 2012).
Table 3 shows these nurture options.


Name                      | Action to implement
Landing page              | Design a special landing page for signed-in users.
News-feeds                | Implement in the logged-in LP a news-feed widget highlighting what's happening in the user's social circle.
Who's visited my profile? | Implement a feature permitting the user to see who has visited his profile.

Table 3. Actions to make visitors come back

Web Analytics
Web analytics is a fairly recent domain whose implications and value-added benefits are still being discovered. The Web Analytics
Association defines it as: “the objective tracking, collection, measurement, reporting and analysis of quantitative Internet data
to optimize websites and marketing initiatives” (Burby and Brown, 2007).
Three factors have contributed to the emergence of this discipline within companies of all sizes and from all sectors (Arson,
2012): 1) the possibility to measure a greater part of actions performed by website users, 2) the increasing contribution of
online activities in earnings, and 3) the growing availability of Web analytics tools and options.
The literature on this subject is therefore plentiful. Web analytics belongs to the family of Web marketing and is, in fact, its cornerstone: without analytics, it is impossible to measure the Return On Investment (ROI) of Web marketing or any other action (e.g. an e-mail campaign). Web analytics allows the analysis of quantitative and qualitative data about a website and its competitors in order to continuously improve the user experience (Chardonneau, 2011). Many Web analytics solutions on the market can measure different variables (e.g. Omniture, Urchin, Google Analytics, and many others).
The free tool Google Analytics was already implemented within our case study's website (the case is further explained below), meaning that data had been collected over a period of two years; this was the primary reason for selecting this tool for the study. Google Analytics is a quantitative analytics tool that measures the volume of clicks, reports where visitors come from, and informs web administrators about users' behaviors. It provides several metrics
that can be categorized according to the Digital Analytics Association (DAA), such as:
- “Count: the most basic unit of measure; a single number, not a ratio”
- “Ratio: typically a count divided by a count, although a ratio can use either a count or a ratio in the numerator or
denominator”
- “KPI (Key Performance Indicator): while a KPI can be either a count or a ratio, it is frequently a ratio” (Burby and Brown,
2007).

CONCEPTUAL FRAMEWORK
We propose a conceptual framework (see Figure 2) which encapsulates the various optimization actions according to the REAN Model and is used to develop the hypotheses below:
• H1: Improving the URL structure, implementing a sitemap, using <hn> tags, operating net linking, and improving description meta tags will have a positive impact on traffic.
• H2: Enhancing the conversion funnel and implementing call-to-action features will have a positive impact on the number of registered members.
• H3: Changing the sign-in page by implementing news-feeds and the “who's visited my profile?” widget will have a positive impact on user loyalty, resulting in repeat visits.
It should be noted that this study focuses only on the first of the three hypotheses, although all three are presented to provide a comprehensive consideration of the REAN model in action.


Figure 2. Conceptual model

METHODOLOGY
As the proposed study will be conducted within the context of a specific website (i.e., as a case study), it is necessary to determine a clear plan of action in line with managerial objectives. The action plan includes four main phases. The first phase involves preparation, consisting of the determination of objectives followed by the choice of Key Performance Indicators (KPIs). The next phase is essentially a literature review: given the stated objectives, optimization actions are drawn from various studies to address the managerial problem. Those proposed actions are then implemented. Finally, the implemented actions need to be evaluated, so this step defines how and from where data are extracted.
Preparation Phase
Define the purpose of the website.
For any business, it is key to define the website’s purpose, and how this site will contribute to the success of the
business.

Define the strategic objectives of the website.


To arrive at these objectives, the following two foundational questions were asked:
• What are the elements that will lead to the achievement of the website’s objectives?
• What user actions do we require in order to achieve the objectives of the website?

The strategic objectives are explicitly tied to the website's purpose; however, they need to be measurable. Hence, the
following three main strategic objectives were defined:
1. Increase site traffic
2. Increase the number of user registrations
3. Increase site loyalty

Define operational objectives of the website.


Setting operational objectives can pave the way to the achievement of the aforementioned strategic objectives of a
website. Operational objectives are distinct from strategic objectives in that they are more closely connected to an action.
Hence, if an operational objective does not reach its target, it is easier to determine what action plans to subsequently
implement. We propose the following list of key operational objectives:
• Increase Search traffic (Organic)
• Increase Conversion Rate (CR)
• Increase Click-Through Rate (CTR)
• Increase visit frequency
• Increase direct access
With the various types of objectives having been defined, it is now necessary to translate them into metrics.

Define the metrics.


The selection of relevant metrics is an essential step for a successful data analysis and the generation of
managerial/web admin knowledge. The Digital Analytics Association (DAA) provides a set of key metrics (see Table 4).

Metric             | Definition
Search Traffic     | The volume of visitors who arrive at a website by clicking search results leading to that particular website.
Conversion Rate    | The number of times a visitor completes a target action divided by the number of times that link was viewed.
CTR                | The number of click-throughs for a specific link divided by the number of times that page was viewed.
Visits per month   | A visit is an interaction, by an individual, with a website consisting of one or more requests for an analyst-definable unit of content (e.g. "page view"). If an individual has not taken another action (typically additional page views) on the site within a specified time period, the visit session will terminate.
Direct Traffic     | Visitors who visited the site by typing the URL directly into their browser.
Unique Visitors    | The number of inferred individual people (filtered for spiders and robots), within a designated reporting timeframe, with activity consisting of one or more visits to a site. Each individual is counted only once in the unique visitor measure for the reporting period.
Registrations      | The number of users who have completed the registration process.
Returning Visitors | The number of Unique Visitors with activity consisting of a Visit to a site during a reporting period, where the Unique Visitor also Visited the site prior to the reporting period.

Source: DAA, Web Analytics Definitions (2007)
Table 4. List of Key Metrics for Web Analytics
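The ratio metrics in Table 4 are derived from raw counts, which also illustrates the DAA's count/ratio distinction. A sketch with invented counts for a single reporting period:

```python
def conversion_rate(target_actions: int, link_views: int) -> float:
    """Conversion Rate: target actions completed / times the link was viewed."""
    return target_actions / link_views

def click_through_rate(clicks: int, page_views: int) -> float:
    """CTR: click-throughs for a specific link / times the page was viewed."""
    return clicks / page_views

def returning_visitor_share(returning: int, unique_visitors: int) -> float:
    """Share of unique visitors who had also visited before the period."""
    return returning / unique_visitors

# Invented counts for one reporting period.
cr = conversion_rate(target_actions=180, link_views=4_500)
ctr = click_through_rate(clicks=300, page_views=12_000)
loyalty = returning_visitor_share(returning=2_100, unique_visitors=7_000)
```

Counts (visits, registrations) are reported directly; ratios like these are the ones typically tracked as KPIs.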

Check the availability of data for each Metric.


Once a metric is defined, it is necessary to check that the related data are available. “Technical feasibility should be studied after studying the needs in terms of indicators” (Fétique, 2010). Indeed, it is necessary to verify that the impact of each optimization action can be measured, and thus that each metric is available in the selected Web analytics tool, here Google Analytics.

Choice of actions to be implemented


Based on academic papers, best practices, white papers, books, and official sources, a list of optimization actions has been proposed. In order to provide a clear overview of the recommended plan, actions are classified in relation to their correlated objectives and metrics in Table 5.


Phase    | Objectives                                                                   | Metrics                                              | Proposed actions
Reach    | 1) Increase traffic; increase organic search traffic                        | Unique Visitors; Organic Search Traffic              | Description meta tag; improve URL structure; use a sitemap; use <hn> tags; net linking.
Activate | 2) Increase number of registrations; increase conversion rate; increase CTR | Registrations; Conversion Rate; CTR                  | Call to action; enhance conversion funnel; call to action.
Nurture  | 3) Increase loyalty; increase visit frequency; increase direct access       | Returning Visitors; Visits per month; Direct Traffic | Landing page; news-feeds; who's visited my profile?

Table 5. Summary of optimization actions

Implementation Plan
Once the previous work is done, it remains to convince managers of both the accuracy of the findings and the validity of the recommended actions. The ability to convey a message properly is essential for any Web practitioner.
Visual data analysis and presentation skills are therefore key: it is essential to be able to “give a picture to information and ideas” (McCandless, 2011). Visuals make it easy to quickly highlight the findings, lessons, and actions drawn from the considerable amount of data provided. Well-designed graphics highlight the facts and reveal opportunities. Indeed, the presentation of results should lead to decisions and actions: the audience must be convinced by a demonstration based on facts and figures and leave the presentation intending to implement the recommendations made.
Data Collection Plan
For each Google Analytics report, data can be exported in CSV (Comma-Separated Values) format (and, if desired, opened in Excel) and reviewed for insights. At the present time, results are not yet available for presentation.
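Reviewing such a CSV export can itself be scripted. A minimal sketch, assuming a simple two-column export with invented weekly figures (real Google Analytics exports contain more columns and a header preamble):

```python
import csv
import io

# Invented sample rows shaped like a two-column export.
SAMPLE_EXPORT = """Week,Unique Visitors
2013-W01,1200
2013-W02,1380
2013-W03,1518
"""

def weekly_change(csv_text: str, metric: str):
    """Return week-over-week relative change for `metric` from a CSV export."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    values = [int(r[metric]) for r in rows]
    return [round((b - a) / a, 3) for a, b in zip(values, values[1:])]

changes = weekly_change(SAMPLE_EXPORT, "Unique Visitors")
```

Running such a script before and after each optimization action makes the before/after comparison repeatable.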

REFERENCES

1. Andrieu, O. (2012) Réussir son référencement web, Eyrolles, Paris.
2. Arson, B. (2012) Web Analytics: Méthode pour l'analyse web, Pearson, Paris.
3. Ber, G. and Jouffroy, J. (2012) Internet Marketing, Elenbi Editeur, Paris.
4. Berger, V. (2011) Musique & stratégies numériques, Irma, Paris.
5. Brin, S. and Page, L. (1998) The Anatomy of a Large-Scale Hypertextual Web Search Engine, Stanford InfoLab Publication Server, ilpubs.stanford.edu:8090/361/1/1998-8.pdf
6. Burby, J. and Brown, A. (2007) Web Analytics Definitions, Web Analytics Association, WAA Standards Committee, http://c.ymcdn.com/sites/www.webanalyticsassociation.org/resource/resmgr/PDF_standards/WebAnalyticsDefinitionsVol1.pdf
7. Chardonneau, R. (2011) Google Analytics: Améliorer le trafic de votre site pour améliorer ses performances, ENI Editions.
8. Digital Analytics Association (2012) About Us, http://www.digitalanalyticsassociation.org/?page=aboutus
9. Dodson, J. and Davis, K. (2011) A New Method for Website Engagement and Success: A Manifesto of Optimization, White Paper, pp. 1-6.
10. Fanelli, M. (2010) Guide pratique des réseaux sociaux: Twitter, Facebook... des outils pour communiquer, Dunod, Paris.
11. Fétique, R. (2010) Internet Marketing, Elenbi Editeur, Paris.
12. Glad, V. (2011) Peut-on savoir qui visite son profil Facebook?, Slate, http://www.slate.fr/story/29619/peut-savoir-qui-visite-profil-facebook
13. Glowa, T. (2002) Advertising Process Models, White Paper, pp. 8-10.
14. Jackson, S. (2009) Cult of Analytics: Driving Online Marketing Strategies Using Web Analytics, Butterworth-Heinemann, Oxford, UK.
15. Jenkins, K. (2011) Overview of Search Marketing: SEO & SEM, Sanger & Eby Design, http://www.sangereby.com/uploads/overview_of_sem__seo_3.24.11_77.pdf
16. Kabani, S. (2010) The Zen of Social Media Marketing, BenBella Books, Inc., Dallas, USA.
17. Kermorgant, V. (2008) Evaluating Your On-line Success with Web Analytics, White Paper, pp. 3-5.
18. King, A. (2008) Website Optimization, O'Reilly Media Inc., California, USA.
19. Kuss, D. and Griffiths, M. (2011) Online Social Networking and Addiction: A Review of the Psychological Literature, International Journal of Environmental Research and Public Health, 8 (9), 3529-3552.
20. LinkedIn (2011) Qui a consulté votre profil?, http://www.linkedin.com/static?key=pop/pop_more_wvmp
21. McCandless, D. (2011) Datavision, Robert Laffont, Paris.
22. Médiamétrie (2012) La fréquentation des sites internet français, http://www.mediametrie.fr/internet/communiques/telecharger.php?f=55743cc0393b1cb4b8b37d09ae48d097
23. Netcraft (2012) March 2012 Web Server Survey, http://news.netcraft.com/archives/2012/03/05/march-2012-web-server-survey.html
24. Oneupweb (2005) Target Google's Top Ten to Sell Online, http://www.oneupweb.com/wp-content/uploads/google_topten.pdf
25. Page, R., Ash, T. and Ginty, M. (2012) Ten Best Practices to Drive On-Site Engagement, Janrain, http://boletines.prisadigital.com/10_Best_Practices_to_Drive_On-Site_Engagement.pdf
26. Prat, M. (2011) Référencement de votre site Web, Editions ENI, Paris.
27. Rolka, L. (2012) How to Solve the Online Registration Challenge, Janrain, http://janrain.com/blog/infographic-how-to-solve-the-online-registration-challenge/
28. Sanghvi, R. (2006) Facebook Gets a Facelift, Official Facebook Blog, https://blog.facebook.com/blog.php?post=2207967130
29. Shannak, R. and Qasrawi, E. (2011) Using Web Analytics to Measure the Effectiveness of Online Advertising Media: A Proposed Experiment, Eurojournal, www.eurojournals.com/EJEFAS_42_08.pdf
30. Viadeo (2010) Guide d'utilisation, www.viadeo.com/guide.pdf
31. Visser, B. and Weideman, M. (2010) An Empirical Study on Website Usability Elements and How They Affect Search Engine Optimisation, South African Journal of Information Management, 13 (1), 1-9.
32. Yu, S., Hsu, W., Yu, M. and Hsu, H. (2012) Is the Use of Social Networking Sites Correlated with Internet Addiction? Facebook Use among Taiwanese College Students, World Academy of Science, Engineering and Technology, 68, 1659-1661.



From Strategy to Analysis: A Guide to Navigating Google Analytics

Kristen Sorek West

INTRODUCTION

“Half of the money I spend on advertising is wasted; the trouble is I don't know which half.”
-John Wanamaker

Whether working in digital or traditional channels, quantifying impact and engagement is a challenge. Often, it is difficult to find the causal link between marketing efforts and programmatic success. With more tangible, clearly differentiated issues to address like print deadlines and discount programs, it's no wonder that 42% of nonprofit art managers never meet to discuss their website's performance (Capacity). Moreover, for those organizations which sell goods online, over two-thirds are unable to draw a connection between sales and website usage behavior (NTEN).

Some managers simply face a lack of technical know-how. Other common barriers include a lack of time and available technology. Each of these factors inhibits managers from learning, implementing, and using data analytics programs, like Google Analytics, on a regular basis. However, pushing past these barriers and investing resources into Google Analytics can help managers better serve their organization's mission and achieve organizational goals. Analyzing the activity of an organization's website enables managers to identify successful communication strategies and observe user behavior. For instance, Google Analytics can help managers determine the return on investment for promotional campaigns, quantify the effectiveness of using social media content to drive website activity, monitor website content participation rates, and create active audience segments. With regular observation and analysis, managers can use the program's data to make quick, informed, and effective decisions to connect to audiences on a deeper level, broadening the impact of an organization's work in their communities. In addition, managers can use Google Analytics data to support managerial decision making and engage critical stakeholders in organizational progress.

Consider data analysis to be a series of small scientific studies wherein the manager hypothesizes, tests the hypothesis, examines the results, and responds accordingly. Every marketing manager anticipates certain outcomes when developing a marketing strategy. Just like scientists, marketers test these expectations by implementing the strategy.
!

!
For instance, a museum marketing manager believes hosting web content by teen bloggers might attract more teens to its after school programs. To test this theory, she engages five current teen participants to blog once a month for five months. She will measure success by monitoring website traffic created by teens to the museum's website as well as program attendance. By using Google Analytics, marketers can easily chronicle and analyze the results of digital marketing activities. Afterwards marketers apply the knowledge gained by changing a website feature or altering its content. Every alteration resets the scientific process, enabling the marketer to embark on another exciting series of hypothesis testing and data analysis.

Using a fictional organization, Rumble Theatre, as a case study, this report demonstrates how marketing strategy can be implemented through Google Analytics, and how a marketing director can use Google Analytics to derive insights from data. The approach this paper takes is inspired by the scientific process and will use the following methodology: measurement, analysis, insight, and action.

RUMBLE THEATRE: A Case Study

Nancy, the marketing director of Rumble Theatre, just got out of a meeting with senior staff. In three months' time Rumble Theatre will open their final main stage show, Red Warrior.

Rumble Theatre
• Location: Pittsburgh, PA
• Type: Production
• Focus: Original works by American playwrights
• Annual budget: $2 million
• Annual marketing budget: $85,000
• Marketing department size: 2
• Current audience: middle income, 35-55 year-old women, educated, suburban

The Artistic Director and Business Manager seek to increase ticket sales by 15% over the run of the show. The main stage theatre can seat 250 audience members.

Red Warrior
A group of friends and family struggle to be heard amidst the political and civil unrest of the 1960's. Inspired by artists like Sister Mary Corita Kent they embrace visual storytelling and art to catalyze conversation and actions among themselves and their country.

Every main stage production runs 7 times a week for 5 weeks. If the house sold out each performance Rumble Theatre would sell 8,750 tickets total. Average attendance at each performance is 175 people. Over the course of the production's run this means that on average Rumble Theatre sells 6,125 tickets. For Rumble Theatre to increase total tickets sold by 15% the theatre would need to sell just over 25 additional tickets per show, or about 920 additional tickets.

Nancy believes that by publishing Red Warrior related content on the theatre's website and diverting audiences to its website through its promotional strategies, audiences will engage with the storyline, intellectual and emotional themes, and actors more deeply, increasing the likelihood that a website visitor will buy a ticket. Nancy decides to direct audiences to the website in all marketing materials so she can use Google Analytics to monitor fluctuations in traffic and user behavior in response to marketing campaign efforts.

GOOGLE ANALYTICS PREP: Defining Objectives, Benchmarks, and Key Performance Indicators

Objectives: Before accessing Google Analytics, Nancy defines her objectives and measurement tools. After all, "you can't configure Google Analytics if you don't know what you need" (Cutroni). She selects two broad objectives: increase online ticket sales and raise awareness.

Increase online ticket sales: Audience members can purchase tickets by visiting the theatre's website, calling the box office, or visiting the box office in person. Rumble Theatre's website recently underwent a renovation, and it's now easier than ever to buy a ticket online. The theatre has received positive feedback on this feature, especially from those in its current dominant demographic. Prior to the new website design only 35% of Rumble Theatre's total main stage performance tickets, about 2,144 tickets, were sold through its website. By the end of Red Warrior's run Nancy hopes to increase the proportion of total tickets sold through its website to 50%.
Raise awareness: Based on Rumble Theatre's market research of the Pittsburgh area, Nancy believes that she can increase the number of tickets sold in Rumble's current dominant demographic. She will choose marketing strategies that were previously successful in attracting those audiences, and combine them with strategies that reach a larger number of people within her target audience. Not only will she monitor the number of tickets sold through the website but she will also expect to see an increase in traffic from her target segment to the Rumble Theatre website.

Nancy believes her marketing strategies will attract an additional 25 people per show. If she is successful, Rumble Theatre's total ticket sales will be 7,000 tickets, a 14% increase in ticket sales. Referring back to her first objective, if Rumble Theatre sold 7,000 tickets Nancy hopes to sell 3,500 of those tickets through the website.

She crafts the following high-level editorial calendar (exhibit 1), which outlines the frequency, time frame, and digital component of each tactic leading up to opening night and continuing through the final show. She uses this calendar to differentiate those strategies meant to increase awareness vs ticket sales, as well as denote the digital components of each. Nancy will monitor Rumble Theatre's website activity as each new marketing campaign tactic is deployed and each new alteration to the website is made. Isolating these attributes will help her in assessing the effect of her marketing strategies in Google Analytics.

For instance, if Google Analytics displayed social referral spikes similar to those seen in exhibit 2, Nancy would be able to refer to her editorial calendar to quickly identify which marketing tactics were active when the spikes occurred. Using this knowledge, Nancy could dive into Google Analytics targeting relevant reports, pages, and visitor metrics.
Exhibit 1: Rumble Theatre's high-level editorial calendar for its show, Red Warrior. Tactics run from April 1 through June 15:

• Blog (featured on Rumble's website)
  o Blog series A: actor interview series
  o Blog series B: behind-the-scenes production series
• Social media
  o Twitter (daily): share links to blog posts (#RumbleNightly); share press, related news, and images
  o Facebook: share links to blog posts (#RumbleNightly); share press, related news, and images
• Newspaper ad: web address, social media handles, hashtag, and contact information
• Radio spot: web address, social media handles, hashtag, and contact information
• Email: web address, social media information, hashtag, contact information, and a link to the Red Warrior show page

Exhibit 2: Sample social media referral timeline displaying a surge in activity occurring during Red Warrior's production.

Benchmarks

In order for Nancy to understand how her site and strategies are performing, she needs some frame of reference. Comparison is a critical element of measurement. Without a benchmark, determining the relevance of metrics, for instance a 22% bounce rate, would be impossible. Nancy decides to compare the website activity of Red Warrior's marketing campaign to that of Rumble Theatre's last main stage show, It's All Right With Me. Even if Nancy didn't have a past show to use as a measurement tool, she still could find an alternative benchmarking option. For instance, Nancy could compare Rumble's website usage against that of a peer organization, against a time period on Rumble Theatre's website without an active performance, or against an industry standard. A baseline, be it determined by a manager's best guess or by previous experience, is useful because it advances a manager's analysis: managers can use the results from that comparison to inform the next round of measurement (Kanter 49). Google Analytics commonly displays data comparisons using side-by-side bar charts, tables, or line graphs, as seen below in exhibit 3.

Exhibit 3
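Benchmark comparisons like the ones Nancy is planning boil down to percent change against a baseline. A minimal sketch (the session counts below are hypothetical, not from the case study):

```python
# Compare a campaign metric against a benchmark from a previous show.
def percent_change(current, baseline):
    """Relative change of a metric against its benchmark, as a percentage."""
    return (current - baseline) / baseline * 100

# Hypothetical weekly sessions during each campaign
red_warrior_sessions = 1380
all_right_sessions = 1150   # benchmark: "It's All Right With Me"

print(f"{percent_change(red_warrior_sessions, all_right_sessions):.1f}%")
```

The same function works for any baseline Nancy chooses: a peer organization, an off-season period, or an industry standard.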
For KPI inspiration read 'The Big Book of KPIs' by Eric Peterson and Beth Kanter's 'Measuring the Networked Nonprofit.'

Key Performance Indicators

Now that Nancy knows where she wants to end up and how she will gauge her progress, she needs to decide what variables she will use to measure that progress. There are hundreds of metrics and dimensions in Google Analytics, so Nancy needs to be very selective. Key performance indicators (KPIs), also known by some as the 'kick butt index,' are often a single metric that helps organizations grasp how close they are to achieving a goal (Kanter 49). For instance, a KPI measuring increased community interest may be the percentage increase of e-newsletter signups compared to the total amount of visitors to the site.

Because the definition of 'achievement' is subjective and particular to each project, KPIs generally vary according to the project, team, and organization. KPIs are typically associated with a single objective and often offer deeper insights than a standard metric, like session duration or number of shares, because they come in the form of a percentage, ratio, or average (Peterson). As such, KPIs analyze the relationship between two relevant metrics to derive meaningful insight on activity relating to that goal. Nancy has settled on the following KPIs for Red Warrior:

Objective: Increase online ticket sales
• KPI: Buyer conversion rate: total customers converted / all visitors. 'Conversion', which will be addressed in greater depth momentarily, in this instance is a user's action of buying a ticket. This KPI compares the total number of Rumble's website visitors against the number of visitors who purchase tickets.
• KPI: Percent of orders from a specific segment: total orders from a visitor segment / total orders. This KPI could also be known as a 'segment-specific buyer conversion rate,' wherein Nancy is comparing all ticket orders against those ticket orders made by her target segment.

Objective: Increase awareness of show
• KPI: Percentage new visitors (segment specific): total new visitors / all visitors. Nancy is interested in learning if her marketing tactics reach and compel her target audience to learn more. Because Nancy wants to broaden awareness in her current demographic she is primarily interested in seeing increased traffic from new visitors within her target audience. She will compare the number of new target segment visitors to all website visitors.

GOOGLE ANALYTICS IMPLEMENTATION: Creating Audience Segments, Goals, and Dashboards

Creating an audience segment

When embarking into Google Analytics, Nancy's first step is to define segments so she can isolate the behavior of her target audience. At the top of any report page there is a '+Add Segment' window framed by a dotted outline. Nancy selects this button and, using Hubspot and Google Analytics Support as 'how-to' references, creates an audience segment called 'RedWarrior F/35-55/PGH'.

Exhibit 4: Screen shot of the RedWarrior F/35-55/PGH segment creation window.

Nancy's new 'RedWarrior' segment contains only basic demographic information, like age, gender, and location, but she makes a note to use some of the more advanced segment features in later projects. Below are two segmentation categories that caught her attention.

• Technology: under 'Operating System,' Nancy can narrow her segments according to whether visitors are Windows or Mac users. Lower in the 'Technology' window Nancy could also segment by mobile activity.
• Traffic Sources: under 'Source,' Nancy can narrow segmentation according to the previous page a user was on before arriving at the Rumble Theatre website.

Creating 'Goals'

In order for Google Analytics to measure progress associated with Nancy's current objectives, Nancy can either track preexisting metrics, like bounce rate, or instruct the program to track specific actions associated with her objective. She does this using Google Analytics' 'goals'. A traditional goal differs slightly from a goal in Google Analytics. While both types of goals seek a specific outcome, a 'goal' in Google Analytics tracks a specific, measurable action taken by a website visitor. The action could be anything from visiting an organization's 'About' page to signing up for a newsletter to watching a video hosted on the website.

In their current form, Nancy's marketing goals are not defined using measurable data points but are merely stated as broad business objectives. Google Analytics requires Nancy to define what constitutes awareness and increased sales in terms of the types of data it can collect within the program, like page views, time spent on a page, or PDF downloads.

Google Analytics tracks four different types of goals: destination, duration, pages/screens per session, and events. For more on each type of goal, see Kissmetrics. To learn how to construct goals in Google Analytics visit AMT Lab or Google Analytics Support.

Objective: Increase online ticket sale revenue by 15%.
Google Analytics goal: Track the quantity of website visitors who land on the "Thank you for your purchase!" page.
Google Analytics goal type: Destination
Because website visitors can purchase show tickets through Rumble Theatre's website, Nancy instructs Google Analytics to track how many visitors arrive at the final page visitors see in the online ticket-purchase checkout process.

Objective: Increase awareness of show
Google Analytics goal: Track the number of visitors who land on the Red Warrior show page.
Google Analytics goal type: Destination
Since Nancy is asking Google to track visitors arriving at a certain page, this is another 'Destination' goal. Consequently, Nancy uses the same procedure as above to create the goal.

In Google Analytics, a visitor's completion of one of these actions is referred to as a 'conversion'. Google Analytics is outcome-oriented, meaning that there are measurements in place, like goals, to help users assess whether or not a desired outcome was achieved. The conversion of a visitor is a desired outcome (Tonkin). After a goal is created, Nancy can then sort data in each of the reporting categories according to that goal, as seen in exhibits 5 and 6.

Exhibit 5: The Acquisition Overview within the Acquisition reporting tab allows Nancy to sort by goals to see the completion rate for each.

Exhibit 6: Nancy can also filter results according to goals, juxtaposed with other metrics.
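Each of Nancy's KPIs is simply a ratio of two report totals, with each goal completion counting as a conversion. A minimal sketch of the calculations (the counts are hypothetical, not from the case study):

```python
# Nancy's three KPIs, computed from hypothetical Google Analytics report totals.
def ratio(numerator, denominator):
    """A KPI as the ratio of two metrics; 0.0 when the denominator is empty."""
    return numerator / denominator if denominator else 0.0

all_visitors = 2000            # Acquisition > All Traffic
customers_converted = 90       # visitors completing the ticket-purchase goal
segment_orders = 60            # orders made by the target segment
total_orders = 90
new_segment_visitors = 450     # new visitors within the target segment

buyer_conversion_rate = ratio(customers_converted, all_visitors)   # 4.5%
segment_order_share = ratio(segment_orders, total_orders)          # about 66.7%
new_visitor_share = ratio(new_segment_visitors, all_visitors)      # 22.5%
```

Expressing the KPIs as small functions of named metrics also makes it obvious, per exhibit 8 later on, which reports each number must come from.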
Selecting Metrics to follow

So far Nancy has programmed Google Analytics to track one audience profile and two actions. But Nancy also wants to collect website data relating to the effects of her marketing campaign. For instance, Nancy wants to know how website activity changes after the newspaper ad is released.

There are hundreds of metrics and dimensions in Google Analytics, and many of them are bound to be irrelevant. Nancy uses her KPIs as a starting point for identifying which metrics would best supplement her website analysis and illustrate the activity surrounding her KPIs and objectives.

There are eight data collection categories in Google Analytics, as seen in exhibit 7.

Exhibit 7

Nancy explores the 'Audience,' 'Acquisition,' 'Behavior,' and 'Conversions' categories. Nancy constructs a table, exhibit 8, to isolate which metrics are necessary to calculate her KPIs and identify their location in the program.

Exhibit 8:
• Buyer conversion rate (total customers converted / total visitors). Objective: increase online ticket sales. Metrics: total visitors (Acquisition > All Traffic); total customers converted (Conversions > Goals > Overview).
• Percent of orders from a specific segment (total orders from a visitor segment / total orders). Objective: increase online ticket sales. Metrics: total customers converted (Conversions > Goals > Overview); total orders.
• Percentage new visitors within a specific segment (total new visitors / all visitors). Objective: increase awareness of show. Metrics: total visitors (Acquisition > All Traffic); total new visitors (Audience > Behavior > New vs Returning).

In addition to evaluating which metrics are required to calculate her KPIs, Nancy also refers back to her editorial calendar to anticipate how and where website activity will change as the result of each new tactic.

• Blog: As Nancy distributes Red Warrior blog posts pre-opening and during the run of the show, Nancy expects traffic through those blog post pages to increase.
• Social media: Nancy and her assistant will share links to Rumble Theatre's Red Warrior blog posts, videos, and related information through Twitter and Facebook. Possible results:
  o Higher traffic to those respective pages;
  o Visitors from social media purchasing tickets;
  o And, potentially, a higher
quantity of website visitors sharing that webpage outward back over social media.
• Newspaper ad: The newspaper ad is taken out in a local weekly paper and incorporates a redirect site, www.rumbletheatrewarrior.com, so Nancy can track the traffic stimulated by the ad. During the ad's week on the stands, Nancy expects to see higher website traffic to the Red Warrior show page.
  o Marketers have a difficult time attributing traffic to specific advertising methods like billboards, radio spots, or magazine or newspaper ads. Google Analytics cannot tell Nancy if the radio ad or the newspaper ad prompted a user to visit the website or purchase a ticket without additional assistance, like the use of a redirect link, exit survey, or discount code.
  o Redirect links are websites that automatically redirect visitors to an alternative site. For instance, Nancy could refer newspaper readers to www.rumbletheatre.org/rumblewarrior. When visited, this website would immediately redirect the user to Rumble Theatre's main webpage. The advantage of this feature is that Google Analytics will chronicle the redirect website as a new traffic source. Because Nancy only used this site in the newspaper ad, she knows that any metrics recorded as a result of this traffic source are specifically attributed to newspaper ad readers.
  o Discount codes are unique coupon codes users can enter upon checkout to reduce the cost of their purchase. If Nancy wanted to track ticket purchases from radio listeners, one discount code could be 'rumbleWXPN'. To use discount codes in Google Analytics, E-Commerce capabilities must be enabled and html coding written into the program. Google Analytics allows the user to sort purchases by code usage, much like traffic sources sort visitor origins. For more on programming discount codes click here.
• Radio: Higher visitor counts to the mainpage and Red Warrior show page.
Exhibit 9:
• Blog: higher traffic. Behavior > Site Content > Content Drilldown; Landing Pages.
• Social media: higher traffic. Acquisition > Social > Landing Pages (secondary dimension: Social Network). Visitors from social media purchasing tickets: Acquisition > Social > Conversions.
• Newspaper/radio/email: higher traffic. Behavior > Site Content > Content Drilldown; Landing Pages.
• Page-specific metrics (bounce rate, average session duration, pages/session): higher traffic. Behavior > Site Content > Content Drilldown; Landing Pages.
• User flow: user movement through the website. Acquisition > Social > Users Flow.

When Nancy outlined the activity she expected to see resulting from her campaign strategies she did not know which specific metrics she would be looking for or which would be most relevant to report. Nancy's strategy was to use this exercise to understand where this type of activity is recorded and what supplementary data is recorded as well.
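One common companion to the redirect-link technique described above is tagging the destination URL with Google Analytics' standard utm_* campaign parameters, so that the offline ad shows up as its own traffic source. A minimal sketch (the parameter values here are hypothetical):

```python
from urllib.parse import urlencode

# Build the destination a vanity/redirect URL could forward to, tagged so that
# Google Analytics records the newspaper ad as the traffic source.
def tagged_url(base, source, medium, campaign):
    params = {"utm_source": source, "utm_medium": medium, "utm_campaign": campaign}
    return f"{base}?{urlencode(params)}"

print(tagged_url("http://www.rumbletheatre.org/rumblewarrior",
                 "newspaper", "print", "red-warrior"))
```

Any session arriving through that URL is then attributable to the print ad, much as the redirect site isolates newspaper readers in Nancy's setup.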

• Email: Each email sent out will have the mainpage and Rumble webpage hyperlinked. Nancy anticipates increased traffic to both of those pages for 1-3 days after the email is sent out.

Throughout the entire campaign, Nancy will use Facebook's, Twitter's, and her emailing program's analytics to complement the website data she is processing in Google. In addition, Nancy will examine single traffic metrics like bounce rate, pages/session, and average session duration associated with the above webpages. Traffic behavior on the Rumble Theatre website will help Nancy determine if she is successful in guiding visitors to the pages associated with her goals and if visitors engage with the site's content. Google Analytics' 'User Flow' chart, exhibit 10, nicely visualizes users' movements.

Exhibit 10

Using Dimensions

Most reporting categories enable the user to sort metrics according to certain dimensions. Dimensions are shared attributes, or categories in which metrics can be grouped. For instance, in the 'Acquisition' reporting category, Google Analytics provides four primary dimensions: source/medium, source, medium, and keyword, as seen in exhibit 11. Nancy is distributing Rumble Theatre's web address over social media, radio, and newspapers. She wants to gauge how traffic is arriving: via a shared link, typed directly into the web browser, or through keyword searches in search engines.

• Source: registers which website users were on prior to visiting the Rumble Theatre website.
• Medium: how a user finds the website, be it a referral (shared link), organic (keywords in a search engine), direct (arrived at the website by typing the URL directly into their browser), or an email link.
• Keyword: sorts data according to the primary words used by visitors when searching for a site.
• Source/Medium: cross-filters data according to source, then medium.

Each reporting category provides different primary dimensions that are relevant to the data in that category.

Exhibit 11

For more resources on metrics, dimensions, and KPIs visit Avinash Kaushik's 'Occam's Razor' blog.

In 'Acquisition,' under 'All Traffic,' Nancy wants to examine visitors who arrived via a shared link, so she sorts her metrics by the dimension 'Medium' (see exhibit 12). Nancy is most interested in 'referrals' as they indicate visitors arriving from a shared link. To understand which sources are driving traffic to the website Nancy applies a second dimension to filter the data further. Under the 'Secondary Dimension' drop-down menu she selects 'Acquisition' then 'Source' so the referrals are broken up according to the visitor's previous location (see exhibit 13).

Exhibit 12

Exhibit 13

Building a dashboard to manage the metrics

By this point, Nancy is juggling a lot of metrics. To stay organized and save time she is going to build a dashboard. Dashboards are a collection of Google Analytics widgets (graphs, charts, maps, timelines, etc.) sourced from individual reports, but maintained in one location. For instance, if Nancy wanted to
monitor metrics associated with Rumble Theatre's blog she could build a 'Blog Activity' dashboard. Because blog metrics are diverse, from traffic sources to popular pages to unique visitor information, rather than hunting for each metric every time she wanted to check its status she saves the individual graphs, charts, and tables to a dashboard. Exhibit 14 displays a sample blog dashboard built for a professional fighting blog.

Dashboards help users efficiently review important metrics and dimensions and can be easily printed or shared for quick reporting. In addition, because dashboards are traditionally created around a theme (e.g. blog activity metrics) and rely on visual displays of data, like tables and charts, they help create data-friendly

Exhibit 14: The dashboard includes visitor information, traffic sources, audience demographic information (location), popular page metrics, and social media information. Source: http://meraustaad.blogspot.com/2014/12/how-to-create-google-analytics.html
cultures through visual storytelling.

Nancy's approach to building a dashboard is to revisit her KPI table and supplementary metrics tables and identify any trends in frequently noted metrics or metric types. Nancy sees that conversion rates comprise the majority of her KPIs. Meanwhile in her supplementary metrics table she notes that social media, traffic, and website behavior metrics are popular. Nancy decides to build three dashboards as a result: conversion, social media, and traffic.

Just like KPIs, dashboards will vary based on the need and the organization. Webpages like Dashboard Junkie and Moz share dashboards that can either serve as a source of inspiration or can be downloaded right into Google Analytics for immediate use. For assistance in building dashboards check out Google Analytics Support and KISSmetrics.

ANALYZING GOOGLE ANALYTICS: Deriving insights and acting

Example 1

It's the week of April 15th; Nancy has released her first blog post about the production of the Red Warrior set. She wants to analyze the impact of promoting the post on social media on Rumble Theatre's website traffic. She is hoping to see increased website traffic through the Red Warrior show page, a trend indicative of greater overall awareness. Referencing her traffic referrals (exhibit 15), Nancy processes her data by reviewing what she is measuring, identifying trends, drawing insights, and making decisions based on those conclusions.

Exhibit 15
Measure: Nancy measures the number of sessions created by referrals from social media between April 1 and April 17. This information can be found in the "Acquisition" report under "All Traffic" and "Referrals". She applies the Red Warrior segment to her search to narrow the results to her target audience.

Analyze: As seen within the red outline in exhibit 15, it appears that Twitter (t.co) is driving about 3 times as much traffic to the Rumble Theatre website compared to Facebook (facebook.com).

Insight: One conclusion Nancy could draw is that Rumble's target audience is active on Twitter and positively responding to its content. Although these reasons most likely explain the Twitter trend, Nancy must keep other possible reasons in mind for the surge in Twitter traffic, like the time of day the tweet was published or the day of the week the content was published.

Action: Nancy first decides to test this trend by tweeting more frequently, while maintaining the same posting schedule for Facebook. If Twitter continues to drive an increased volume of visitors to the website, Nancy will keep the more frequent posting schedule because it is successfully engaging her target audience. Should Twitter traffic referrals hold steady or decline, Nancy's conclusion could be that past a certain tweeting frequency Rumble's Twitter activity elicits little response from the target audience, or that the audience is fatigued by the amount of similar content published by Rumble.

To analyze when her target audience is active on Twitter, and thus more likely to follow a link embedded in a tweet, Nancy conducts another test. She selects two timeframes and issues similar tweets in each. Still using the Acquisition report, Nancy can break down traffic referrals by hour to identify trends, as seen below.

Exhibit 16. Source: https://blog.kissmetrics.com/the-science-of-twitter-timing/
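Nancy's hour-by-hour comparison amounts to bucketing referral sessions by source and hour of day. A minimal sketch with hypothetical timestamps:

```python
from collections import Counter
from datetime import datetime

# Hypothetical referral sessions: (source, ISO timestamp)
referrals = [
    ("t.co", "2015-04-16T09:05"), ("t.co", "2015-04-16T09:40"),
    ("t.co", "2015-04-16T13:10"), ("facebook.com", "2015-04-16T09:30"),
]

# Bucket sessions by (source, hour of day) to spot when each audience is active
by_source_hour = Counter(
    (source, datetime.fromisoformat(ts).hour) for source, ts in referrals
)

print(by_source_hour[("t.co", 9)])
```

Comparing the buckets across Nancy's two test timeframes would show which posting hours actually move her target audience.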

Example 2

By the first week of May, Nancy has published Red Warrior promotional content on Facebook, Twitter, and the website, sent two email blasts, and released the newspaper ad. A change between It's All Right with Me's marketing campaign and Red Warrior's campaign is the addition of a second pre-opening email blast and the newspaper ad. Nancy wants to know if there are more ticket sales leading up to the opening of Red Warrior compared to the same time period leading up to the opening of It's All Right with Me.

Nancy uses her "Warrior tkts" goal, which tracks how many visitors arrive at the "Thank you for your purchase" online ticket completion page, to track and compare ticket sales data in Google Analytics. She visits the "Goals Overview" report in "Conversions." Within the "Overview" report, under the "Overview" tab, she selects the two goals she wishes to visualize from the drop-down menus (exhibit 17).

Exhibit 17

Measure: She enters two time frames, April 1-June 28 (Red Warrior) and Jan 1-March 27 (It's All Right with Me), and compares her "Warrior Tkts" ticket sales goal and her previously existing "AllRight Tkts" ticket sales goal, as seen in exhibit 17.

Analyze: The second email blast and newspaper ad were released on May 8th, a week before opening. Nancy sees that pre-show ticket sales for It's All Right With Me outpaced those of Red Warrior until May 11th, when Red Warrior ticket sales grew significantly and remained higher than those of It's All Right With Me prior to opening.

As mentioned earlier, there were two distinguishing features between the Red Warrior campaign and the It's All Right With Me campaign: the second email blast and the newspaper ad. Nancy used these channels to raise both awareness and ticket sales. Consequently, Nancy will analyze the volume of traffic referrals and the number of tickets purchased inspired by each marketing tactic.

Email analyses: Nancy, with the assistance of a few how-to websites, like Constant Contact, Campaign Monitor, and Web Market Central, and her IT Director, enabled Google Analytics to track visitor traffic originating from her promotional emails. Nancy goes back to the 'All Traffic-Channels' report within 'Acquisition'. After sorting the traffic by the 'Medium' primary dimension, 'Email' appears as a referral method. She double checks her timeline to make sure her two pre-show time periods are still in place. Under the 'Explorer' tab she selects 'Goal Set 1' then, using the drop-down menu below it, selects 'AllRight tkts' to compare data side by side. She sees that not only has the number of sessions increased, but so has the rate of online tickets sold compared to It's All Right With Me (exhibit 18).

Exhibit 18
Newspaper ad analysis: Nancy chose to incorporate a redirect website, www.rumbletheatrewarrior.com, in the ad so she could clearly delineate traffic caused by the ad. Since a newspaper ad wasn't used in the It's All Right With Me campaign, Nancy removes the older time period from her Google Analytics view. Nancy visits the 'All Traffic-Source/Medium' report under 'Acquisition' and organizes the information according to the dimension 'Source/Medium.' In exhibit 19 she sees the URL for her redirect site as well as conversion metrics for her 'Warrior Tkts' ticket sale goal.

Exhibit 19
She sees that she had success in attracting website visitors from her target demographic. Of the 87 sessions created, the vast majority belonged to her target segment. In addition, of the 7 tickets sold online, 5 of them belonged to her target demographic. However, the newspaper ad instigated only 5.63% of sessions and inspired only 7 Red Warrior online ticket sales. Because no pre-show ticket sale newspaper ad was released for It's All Right With Me, Nancy does not have older show data to use as a measurement of success. Instead, she chooses to compare the ticket revenue generated from the ad with the cost of producing and publishing the ad (see exhibit 20).
Awareness & Ticket Sales Created by Newspaper Ad
  All sessions: 87
  Target segment sessions: 74
  Total tickets sold: 7
  Total ticket revenue ($40/ticket): $280

Newspaper Ad Production Costs
  Hours spent designing & formatting the ad: 4
  Total graphic designer cost ($50/hour): $200
  5-day ad publication fee: $200
  Total newspaper ad cost: $400

Exhibit 20
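Exhibit 20's comparison is plain arithmetic; the same calculation as a quick sketch, using the figures from the exhibit:

```python
# Figures taken from exhibit 20
tickets_sold = 7
ticket_price = 40           # dollars per ticket
designer_hours = 4
designer_rate = 50          # dollars per hour
publication_fee = 200       # 5-day ad publication fee

revenue = tickets_sold * ticket_price                     # $280
cost = designer_hours * designer_rate + publication_fee   # $200 + $200 = $400
print(f"revenue=${revenue} cost=${cost} net=${revenue - cost}")
# The ad cost $120 more than the ticket revenue it generated.
```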

Insight: Based on exhibit 18, it appears that the email blast generated a greater amount of pre-show ticket sale revenue compared to the same time period of It's All Right With Me. While the newspaper ad did generate some revenue and visitor traffic, the resources spent on producing and publishing the ad outweighed those gains.

Action: Nancy makes a note to send two email blasts prior to the next show opening to encourage pre-show ticket sales within this audience segment in future shows. She also pencils in possibly sending out a third email as a follow-up test. Overall, the newspaper ad was not a worthwhile investment. However, it may still become a worthwhile investment if the ad could be reused a second time: even though the publishing costs would increase because of the second installment, production costs would not. It is also possible the repeated ad might contribute to building 'buzz' around the show if used over a longer period of time. If enough revenue was generated to cover the costs of the ad, then Nancy may consider continuing to use newspaper ads to promote Rumble Theatre's shows. With the next newspaper ad she can use the Red Warrior newspaper ad's production time, costs, and revenue generation as a benchmark.

Example 3

Red Warrior concluded its run on June 28th, 2015. On Monday the 29th Nancy sits down at her desk to see if she was successful in either of her original objectives:

• Increase total ticket sales from 6,125 tickets to 7,000 tickets over the course of the show, and increase the proportion of online ticket sales to 50% of total ticket sales.

• Raise awareness of the show, as evidenced by increased traffic to the website from her target segment and increased online ticket sales.

Measure: By visiting the "Overview" page in the "Acquisitions" report, and double-checking that the time period analyzed is April 1-June 28, Nancy sees the total of all sessions created and the sessions created by her target segment, "RedWarrior F/35-55/PGH." At the top of this screen she selects from the "Conversion:" dropdown menu her goal "Warrior Tkts." Now, alongside the session metrics, Nancy also sees her goal's conversion metrics (exhibit 21).

Exhibit 21
Analyze: Nancy sees that over the 3 months Rumble Theatre was promoting and producing the show, a total of 10,560 sessions were created.

• Ticket Sales: A quick call to the box office reveals that the theatre sold only 6,850 tickets to Red Warrior, which represents a total ticket sale increase of 11.8%. Even so, according to Google Analytics a total of 3,626 tickets, or 52.9% of all tickets sold, were sold through the website. Additionally, of those visitors purchasing tickets through the website, Nancy's target segment had a higher frequency of ticket purchases (31.6%) compared to the purchase rate of all visitors (20.1%).

• Awareness: Of those sessions created during the production's timeframe, visitors within her target segment represented 4,756 of them, or 45%. Attracting new visitors, from her target segment and in general, was a primary goal. Nancy is pleased to see that out of all the sessions created, about 45.5% were first-time visitors. Moreover, she also sees that there was a higher rate of new sessions created by the target segment (66.4%), as well as a higher rate of tickets purchased online by the target segment (31.6%) compared to tickets purchased online by all visitors (20.12%).

Insight: Even though the theatre collectively did not increase ticket sales by 15%, an increase of 11.8% in ticket sales is a clear success. Nancy is excited to see that 52.9% of all tickets sold were sold through the website, signifying that she successfully reached her goal of increasing the proportion of total tickets sold online from 35% to 50%. Heightened activity in both ticket sales and new sessions created by her target audience suggests that Nancy's campaigns were successful not only in reaching her target audience but also in persuading them to visit the site and purchase tickets.

Action: Nancy's success in developing an effective marketing strategy and using Google Analytics to adjust her strategy based on visitor behavior has given her a powerful template to experiment with and apply to next season's performances.

Nancy used Rumble Theatre's marketing strategy for Red Warrior as a tool to help her navigate the program's seemingly endless supply of data and focus on what is relevant to her mission. Above all, she derived insights from data trends and used those insights to alter content, channels used, and publication frequency. Every shift in her website campaign created a new environment for audiences to interact with, and thus new patterns of data activity for Nancy to analyze and explore for meaning.

Over the course of the campaign she discovered that, by routinely comparing critical website data points, she possessed the ability to nimbly adjust campaign tactics to better reach and engage her target audience. She makes a note to check Google Analytics more regularly, about twice a week. By investing time into Google Analytics, Nancy was rewarded with a greater understanding of which advertising methods and content were effective in reaching her target audience, and which enticed the audience to learn more about the show and buy tickets. Going forward, Nancy will continue to use Rumble Theatre's Google Analytics data to analyze visitor behavior, identify trends, make decisions, and communicate success to her peers and supervisors.
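The headline percentages in Example 3 can be rechecked directly from the raw counts reported above; a sketch, rounding to one decimal place as the guide does:

```python
baseline_tickets = 6125   # It's All Right with Me total sales
total_tickets = 6850      # Red Warrior total sales (box office + online)
online_tickets = 3626     # Red Warrior tickets sold through the website

growth_pct = (total_tickets - baseline_tickets) / baseline_tickets * 100
online_share_pct = online_tickets / total_tickets * 100

print(round(growth_pct, 1))        # 11.8: short of the 7,000-ticket goal, still a gain
print(round(online_share_pct, 1))  # 52.9: clears the 50% online-sales goal
```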
Future Internet

Review

Understanding the Digital Marketing Environment with KPIs and Web Analytics

José Ramón Saura 1,†, Pedro Palos-Sánchez 2,*,† and Luis Manuel Cerdá Suárez 3,†

1 Department of Business and Economics, Rey Juan Carlos University, Paseo Artilleros s/n, 28027 Madrid, Spain; joseramon.saura@urjc.es
2 Department of Business Management, University of Extremadura, Av. Universidad, s/n, 10003 Cáceres, Spain
3 Department of Business Organization and Marketing and Market Research, International University of La Rioja, Av. de la Paz, 137, 26006 Logroño, La Rioja, Spain; luis.cerda@unir.net
* Correspondence: ppalos@unex.es; Tel.: +34-9-2725-7580
† These authors contributed equally to this work.

Received: 2 October 2017; Accepted: 3 November 2017; Published: 4 November 2017

Abstract: In the practice of Digital Marketing (DM), Web Analytics (WA) and Key Performance
Indicators (KPIs) can and should play an important role in marketing strategy formulation. It is the
aim of this article to survey the various DM metrics to determine and address the following question:
What are the most relevant metrics and KPIs that companies need to understand and manage in
order to increase the effectiveness of their DM strategies? Therefore, to achieve these objectives, a
Systematic Literature Review has been carried out based on two main themes (i) Digital Marketing
and (ii) Web Analytics. The search terms consulted in the databases were (i) DM and (ii) WA, yielding a total of n = 378 studies. The databases consulted for the
extraction of data were Scopus, PubMed, PsyINFO, ScienceDirect and Web of Science. In this study,
we define and identify the main KPIs in measuring why, how and for what purpose users interact
with web pages and ads. The main contribution of the study is to lay out and clarify quantitative and
qualitative KPIs and indicators for DM performance in order to achieve a consensus on the use and
measurement of these indicators.

Keywords: internet; digital marketing; web analytics; KPI; measurement

1. Introduction
The growth of the Internet over the past decade is one of the most widely used examples to help
explain globalization. In the information age and the increasingly networked economy, electronic
Commerce (e-Commerce) is seen as one of the main instruments to foster business growth, labour
movement and interpersonal relationships. DM is not just a transactional tool, but also generates
change at the commercial and microeconomic level, which in turn demands changes in marketing
practice and theory [1]. From a historical perspective, it is clear that all types of companies have had to
adapt all their business practices to the availability/progress of new technology, new management
techniques and an ever-changing communications landscape.
The rapid spread of computing power in all manner of devices has fostered the creation of the
Digital Economy, or “a new socio-political and economic system characterised by an intelligent space
consisting of information access tools and information processing and communication capabilities” [2].
While WA is widely used by popular websites to provide useful data for client companies, its rising
popularity among users is not necessarily reflected in academic research. The research that is done
also paints a rather discouraging picture showing that most WA use is ad-hoc, the analysis is not
used strategically, and the benefits tend to be imprecise. Thus, in practice, many marketing managers

Future Internet 2017, 9, 76; doi:10.3390/fi9040076 www.mdpi.com/journal/futureinternet



remain wary of performance measurement data and prefer to rely on intuition and experience for
decision-making [3]. Given the evolving nature of WA, this is understandable. This study therefore
suggests that the main benefits of WA for DM performance measurement will be determined by how
companies exploit the system under specific contextual circumstances.
Understanding the effectiveness of DM strategies requires the ability to analyze and measure
their impact [4]. Appropriate, accurate and timely DM metrics are critical for a company to assess
whether they are achieving their objectives, or whether the selected strategy is appropriate to achieve
organizational goals [5].
DM is the simultaneous integration of strategies on the web, through a specific process
and methodology, looking for clear objectives using different tools, platforms and social media.
The importance of DM for companies resides in changes in the ways that today’s consumers gather
and assess information and make purchasing decisions, in addition to the channels they use for this
process [6].
According to [7], we can distinguish four types of control necessary to guarantee the outcome of a
marketing plan for business: Control of the annual plan; Control of Profitability; Efficiency Control
and Strategic Control. In this research, we cover the necessary actions for the Control of Profitability
in DM, Strategic Control in the web measurement and analytical KPI’s relevant to the consumer or
Internet users.
While there are a great number of possible metrics and indicators, each one designed to measure a
specific aspect of the DM plan [8], the choice of which metrics will enable insightful and useful analysis
remains a tricky question for business managers. It is the aim of this article to survey the various DM
metrics to determine and address the following question: What are the most relevant metrics and
KPIs that companies need to understand and manage in order to increase the effectiveness of their
DM strategies?
By identifying KPIs for DM and WA, marketing professionals and academics can efficiently measure
key indicators related to the development of tactics and actions that are performed in the digital
environment. By identifying the most important indicators, companies could improve conversion
rates and consequently, increase their visibility on the Internet.

2. Methodology

2.1. Literature Review


The main objective of this research is to identify the main WA parameters for DM measurement on the Internet. Secondary objectives are twofold: (i) to identify how investigators are linking DM with WA to find possible improvements; and (ii) to undertake a systematic literature review to help
structure a roadmap for future research in this area.
Therefore, to achieve these objectives, a systematic literature review has been carried out based
on two main themes (i) Digital Marketing and (ii) Web Analytics. The search terms consulted
in the databases were (i) digital marketing and (ii) web analytics, yielding a total of
378 investigations. Research has also been categorized into the type of research; empirical, revision or
conceptual, the research perspective and the main theme of the article (search carried out in January
2016 using 4 databases; PubMed, PsyINFO, ScienceDirect and Web of Science).
Although most articles are conceptual, many of them did not specifically develop the terms of
web analytics intended by this research.
The methodology chosen, the systematic literature review, is based on the work developed by [9].
The structure of PICOS, defined as the review and extraction of data, has been followed. This means
that the following variables have been considered: (i) Participants (any); (ii) Interventions (DM
indicators); (iii) Comparators (any); (iv) Outcomes (WA indicators and KPIs); and (v) Study design
(systematic reviews).
2.2. Data Extraction

The databases that have been consulted for the extraction of data were Scopus, PubMed, PsyINFO, ScienceDirect and Web of Science. The queries were filtered for articles in English only and no more filters were applied. When querying the mentioned databases, the Boolean operators AND and OR were used to optimize the results of the databases corresponding to the topics of (i) Digital Marketing and (ii) Web Analytics. The search terms can be seen in Table 1.
Table 1. Search terms used.

Digital Marketing and Web Analytics (terms combined with AND):
"Digital Marketing"; "online marketing"; "marketing in Internet"; "marketing on Internet"; "Web Analytics"; "web measurements"; "Internet analytics"; "web page analytics"

Definition of concepts related to the goals of the research:
"objectives"; "measurement"; "traffic"; "KPI"; "strategies"; "indicators"; "concepts"; "variables"; "identifiers"; "values"; "analytic indicators"; "analytic variables"; "techniques"; "tactics"
The titles and abstracts of articles have been independently analyzed to determine if the articles are fit to continue with the systematic literature review process. All the articles present in this research have been analyzed individually. The criteria are based on the AMSTAR tool [10] (see Figure 1) to incorporate only high quality abstracts. Although the AMSTAR tool was initially designed to assess the quality of articles from their abstracts, we have followed the indications of [11] as an eligibility gauge for this research.

Figure 1. PRISMA 2009 Flow Diagram. The review proceeded as follows:
• Articles identified in databases with the terms "digital marketing" and "web analytics" (n = 378)
• Articles excluded after analysis of the titles and abstracts (n = 296): inadequate terms; not conclusive
• Potentially apt articles (n = 82)
• Articles excluded after analysis of the complete article (n = 63): not fitting search terms; no relation with the research topic; no quality evaluation; no description and specification of terms
• Included articles (n = 26)

The objective is to achieve the highest possible amount of evidence in the results based on quality studies. Some of the variables used in AMSTAR to evaluate the quality of the systematic review were (i) the relationship of the research question to the criteria included in the study; (ii) the extraction of data by at least two independent researchers; (iii) the quality of the literature review; (iv) the identification and definition of concepts; and (v) the quality of the references used throughout the study. As developed by [11] we have included the following criteria for the development of the methodology:

• Systematic review of abstracts (meta-analysis)


• Include structured research evaluations
• Published in journals and research journals
• Written in English
• Conclusions and research on topics directly related to digital marketing techniques and
measurement with web analytics.

In the first phase, databases and search terms were identified, obtaining a total sample of n = 378.
Secondly, after analyzing each article individually, a total of 296 articles were excluded from the
initial sample due to inadequate topics. Consequently, in the third phase of the systematic review, a total of n = 82 potentially appropriate articles was obtained. After applying the exclusion criteria in the full-text analysis (not fitting the search terms; no explicit relation to the research topic; no quality evaluation; no description and specification of terms), the final sample totals n = 26 articles.

3. Results

3.1. Process of Data Evaluation and Study Selection


Table 2 lists and describes the indicators and KPIs used.

Table 2. Literature review of DM KPI definitions used by relevant literature.

Theme: Digital Marketing
• [4,8]: WA; search engine optimization (SEO); return on investment (ROI); click-through rate (CTR).
• [12,13]: Search engine marketing (SEM); SEO; ROI; CTR; KPIs; traffic; unique users; lead; conversion rate and sources.
• [14–17]: Search engines; clicks; page views; interaction; users; leads; KPIs; SEM; SEO; pay-per-click (PPC); conversion and conversion rates.
• [3,18–20]: WA; SEM; SEO; CTR; PPC; traffic; conversion; conversion rate and type of users.
• [1,2]: WA; SEO; ROI; CTR and traffic.
• [6]: SEM; SEO; CTR; PPC; new visitors; keywords and conversion rates.
• [21,22]: SEO; PPC; keywords; user friendly; user type.

Theme: Web Analytics
• [12,23]: DM; KPIs; traffic; unique visitors; page views; conversion rate; goals; cost per lead (CPL); leads and surveys.
• [24]: Search engines; type of traffic; keywords; time on site; CTR; ROI and type of users.
• [25]: Search engines; type of traffic; traffic sources; direct traffic and user friendly.
• [26,27]: SEO; PPC; users; conversion; search traffic and ROI.
• [8,28,29]: DM; ROI; traffic; unique users; lead; conversion; A/B testing; conversion rate; goals conversion rate; new visitors; returning visitors.

3.2. Analysis of Scientometrics


This section presents the findings of our scientometric analysis on the identified scientific Journals
contributions concerning DM and WA based on the total of findings, Quartile, and Category (see
Table 3).

Table 3. Analysis of Scientometrics.

Journal Total of Findings Quartile Category


Industrial Marketing Management 4 Q2 Business
Journal of Interactive Marketing 4 Q1 Business
International journal of research in Marketing 3 Q1 Business
Journal of Business Research 2 Q1 Business
International journal of Information Management 2 Q1 Information Science and Library Science
The Journal of Academic Librarianship 2 Q2 Information Science and Library Science
Journal of Service Research 1 Q1 Business
Managing Service Quality 1 Q1 Business
Engineering Applications of Artificial Intelligence 1 Q1 Computer Science, Artificial Intelligence
Computer Standards and Interfaces 1 Q2 Computer science, software engineering
International Business Review 1 Q2 Business
European Management Journal 1 Q2 Business
Journal of Services Marketing 1 Q3 Business
Public relation review 1 Q3 Business
Mobile information systems 1 Q4 Computer Science, Information Systems

As Table 3 shows, the research presented over recent years is categorized into three main research areas: Business, Computer Science, and Information Science.
DM has a significant business perspective, since it is used as a tool for promotion and sale on the Internet; from the Computer Science perspective, a high technical value is provided in order to implement and develop these techniques, as well as from the category of Information Science.
The research theme in DM and WA is a mix of these three research sciences. The total number
of investigations that have been selected after passing the quality filters developed in the systematic
review of literature can be appreciated in Table 3. In addition, it also shows the quality of the Journal
of Research when presenting the classification by Quartiles.
The Journals of Industrial Marketing Management (Business), Interactive Marketing (Business),
International Journal of research in Marketing (Business), Business Research (Business) International
Journal of Information Management (Information Science and Library Science) and The Journal
of Academic Librarianship (Information Science and Library Science) are key to understand this
research topic.

3.3. Metrics for Assessing DM Efforts


Marketing plans include budgetary allocations for communication campaigns, advertising and
other actions intended to publicize the brand, the products and services offered, and to reach
current and potential new customers, leading to the ultimate consummation of the marketing
process: a purchase. While the calculation of sales effectiveness for "traditional" marketing tools (e.g.,
TV advertising) has long been practiced, for the new, evolving digital marketing domain, this remains
a work in progress. Two of the areas most in need of improvement are the measurement of DM efforts
and DM results [6,30].
Because these metrics and analytical techniques are evolving, it is seldom easy to calculate the
ROI [31] of a campaign in DM. Depending on campaign objectives and complexity, it may be very
challenging to measure accurately. Even so, as a general managerial rule, an estimation of the results
should be attempted. This means that companies need to work with all available information, going
beyond the metrics provided by an agency or a digital medium [32].
Companies must analyze whether the money spent on a campaign generates business, whether it
is a superfluous expense, or if it really is an investment that generates a return. In the short, medium or
long-term, companies should devote resources in order to be able to calculate return on investment [33].

In Table 4, we show one of the most common, and conceptually simple methods found in the literature
for calculating the profitability of DM actions.

Table 4. Measures to calculate the ROI in DM.

ROI (Return on Investment): A performance measure used to evaluate the efficiency of an investment or to compare the efficiency of a number of different investments. Calculated by comparing the spending on DM to the sales increases.
The return on investment formula: ROI = (Gain from Investment − Cost of Investment) / Cost of Investment

CTR (Click-Through Rate): A metric that measures the number of clicks advertisers receive on their ads per number of impressions. It can also feed into a calculation of pay per click (PPC) or cost per click (CPC).
The click-through rate formula: CTR = Number of Clicks / Impressions
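The two formulas in Table 4 translate directly into code; a minimal sketch, with illustrative numbers that are not from the paper:

```python
def roi(gain: float, cost: float) -> float:
    """Return on investment: (gain from investment - cost of investment) / cost."""
    return (gain - cost) / cost

def ctr(clicks: int, impressions: int) -> float:
    """Click-through rate: number of clicks per number of ad impressions."""
    return clicks / impressions

# Illustrative campaign: 1,000 spent, 1,500 of attributable sales,
# and 250 clicks on 10,000 ad impressions.
print(roi(1500, 1000))   # 0.5   -> 50% return
print(ctr(250, 10000))   # 0.025 -> 2.5% click-through
```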

3.4. DM Techniques
According to [34]: "WA is the practice of the measurement, collection, analysis and reporting of Internet data in order to understand how a site is used by an audience and how to optimize it". The focus of WA is to understand the users of a site, their behaviour and activity [35].
If WA is to be meaningful, the data collection process must be carefully designed to deliver consistent and reliable data. Analysts working in WA should be aware of how systems work and how they generate data. They should be able to audit their implementation and operation. The first step in the analysis is to be sure of the veracity of the information. It is at this point that the technical tools must perform their actions correctly [36].
As is well-known, there are different techniques of DM such as search marketing (SEO or SEM),
social media marketing, affiliate marketing or content marketing. However, to establish the main
KPIs and metrics for companies we will focus on the analysis of the techniques of SEO and SEM.
Search marketing shares the main KPIs with other DM techniques because search engines are the
main channel of contact between the user and Internet companies [37].
The difference between visibility with search engines compared to other models in traditional
Internet advertising is that the user voluntarily seeks a service, product or information [38].
The accepted thinking in SEO and SEM is that to attract user traffic, it is essential for a website to be among the first two or three positions on the first page of search engine results (SERPs), as determined by keyword ranking.
At this point we refer to the use of SEO as a technique or process for improving the visibility of a page in search engine rankings (moving it up into the top results). From there, the results can be assessed and analysed to calculate conversion rates (conversion here means moving from being seen to being acted upon, as in clicking). Being present on the Internet at the right time with a relevant search term can become a business opportunity [27,38].
The SEM proposition, however, is that it can make DM work better by promoting websites and raising their visibility in SERPs. Techniques such as SEO and SEM have led to consolidated contracting models for advertising campaigns that can be applied to both display ads and text ads (see Table 2). This is a positive development because firms need clear media buying models in order to achieve the aims of their campaigns [38].
DM use of WA provides indicators of the effectiveness of each individual Internet Marketing
technique employed. In turn, these indicators are related to the different pricing models in digital
advertising, and therefore feed into the payment models used in DM strategies. It is the monetization
of SEO and SEM combined with WA that should enable marketers to calculate the return on investment
of their marketing efforts and to determine the basis for measuring the profitability and effectiveness
of DM campaigns [31,39,40].
Future Internet 2017, 9, 76 7 of 13

However, as seen in the literature review, there is little consensus in the DM ecosystem about which particular metrics are most useful: for example, clicks, impressions, or the number of page views, all of which are based on user behaviour on a website [41].
In order to reliably use these metrics, we must first examine the different contracting models used in the calculation of digital advertising rates. The literature review shows that the metrics analyzed in each study are determined by the type of contracting model in which the companies have invested. In Table 5 we can see the main contracting models used in the research analyzed.

Table 5. Type of advertising contracting model.

Type of Advertising Contracting Model | Description
CPI (Cost per Impression) or CPM (Cost per thousand impressions) | One of the most common ways of buying digital media; the advertiser pays for impressions served.
PPC (Pay Per Click) or CPC (Cost Per Click) | The advertiser pays when a click is made on an ad.
CPL (Cost per Lead) | The advertiser pays when a lead form is completed and submitted.
CPA (Cost per Action) | The advertiser pays only when a specified action, such as a sale, is completed.
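To illustrate how the contracting model determines what is paid for, here is a minimal sketch (rates and volumes are hypothetical) of the media cost under the two most common models, CPM and CPC:

```python
def cost_cpm(impressions, rate_per_thousand):
    # CPM: the advertiser pays a fixed rate per thousand impressions served
    return impressions / 1000 * rate_per_thousand

def cost_cpc(clicks, rate_per_click):
    # CPC/PPC: the advertiser pays each time the ad is clicked
    return clicks * rate_per_click

# Hypothetical campaign: 200,000 impressions at a 2.50 CPM rate,
# generating 4,000 clicks billed at 0.25 per click
print(cost_cpm(200_000, 2.50))  # 500.0
print(cost_cpc(4_000, 0.25))    # 1000.0
```

Under CPL or CPA the same structure applies, with completed leads or actions in place of impressions or clicks.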

3.5. Quantitative and Qualitative Analytical Indicators


WA may be quantitative or qualitative, and each type can be used to understand the behaviour of the target audience, analyze trends, check the performance of the actions taken and help in making strategic decisions. The Internet offers a huge number of possible measures, so often the hardest part is not getting the information, but capturing its real meaning and being able to interpret it.
Today, there are many WA tools. For example, small data tags known as cookies can be installed in a user’s browser when they enter a website. In addition, cookies can be combined with other sources of information such as server logs, page tags or navigation bars. The choice of one or the other will depend on the needs of measurement, the focus of the study, the accuracy of the data and the cost. These data can be used to set clear objectives, establish measurement parameters, segment the audience and configure and implement the WA tool. There are several alternatives, but Google Analytics is the most used because it is powerful and free [29,42].

3.5.1. Key Quantitative Analytical Indicators


This subsection presents some important quantitative analytical indicators, based on directly measurable data. According to Kaushik [8], a leading specialist in the field of global WA and Digital Marketing Evangelist at Google, there are three basic types of results we expect from quantitative data analysis: increased revenue (conversion), reduced costs (conversion rate), and increased customer satisfaction and loyalty (user), or loosely, customer engagement. These three objectives, especially the first two, are aimed at e-commerce [1]. When considering content strategy, the focus is on informing the user or potential customer to attract attention and interest.
After reviewing the relevant literature, we highlight in Table 6 the quantitative indicators most commonly used in DM. The table is not meant to be exhaustive; rather, it includes only those measures in most common usage:

Table 6. Quantitative Indicators in DM.

Quantitative Indicators | Description
Impressions | An instance of an organic search-engine listing or sponsored ad being served on a particular Web page, or an image being viewed in display advertising.
Traffic | Number of visitors who come to a website.
Unique users | Number of different individuals who visit a site within a specific time period.
Lead | When a visitor registers, signs up for, or downloads something on an advertiser’s site. A lead might also comprise a visitor filling out a form on an advertiser’s site.
Conversion | What defines a conversion depends on the marketing objective. It could be a sent form, a click on an ad or a purchase. It is an objective or goal.

3.5.2. Key Qualitative Analytical Indicators


There are several ways to measure user behaviour and optimize service. One is to use qualitative analytics data, which identify the reasons why the user has performed an action on a web page. These metrics provide feedback that links to the quantitative analytics indicators [38].
WA data are not raw user data; they are information derived from the actions performed by users. We can divide the qualitative information into two groups, depending on how we get it: (i) Direct, asking users directly about a number of issues through surveys, discussion or focus groups and interviews with users; and (ii) Indirect, not asking the user directly, but analyzing behaviour and responses through A/B testing, usability studies or expert heuristic analysis. The great challenge for all firms is to analyze the information effectively and turn it into knowledge, drawing better conclusions about user behaviour on a website [31].
In much of the relevant literature, the authors discuss user behaviour on the web and how it
can be measured with metrics such as time on site, number of page views or user experience when
interacting with web design. Some of these authors talk about phases and actions taken by companies
to improve user behaviour on a site, but do not define the specific qualitative indicator [31,43].
To define qualitative indicators that collect these user actions, a number of alternatives have been
used in the literature as laid out in Table 7.

Table 7. Qualitative Indicators in DM.

Qualitative Indicator | Description
A/B Testing | Refers to testing two different versions of a page or of a page element such as a heading, image or button. A/B testing is aimed at increasing page or site effectiveness against key performance indicators, including click-through rates, conversion rates and revenue per visit.
Call to Action (CTA) | A statement or instruction, typically promoted in print, web, TV, radio, on-portal, or other forms of media (often embedded in advertising), that explains to a mobile subscriber how to respond to an opt-in for a particular promotion or mobile initiative, which is typically followed by a Notice.
User experience (UX) | Encompasses all aspects of the end-user’s interaction with the company, its services, and its products through different devices. This term is also used with Information Architecture (IA), which is the structural design of shared information on a site based on user behaviour.
Rating systems | A system of classifying according to quality, merit or amount, which can be used to divide and organize types of users.
Surveys and forms | Tools that allow users to send information to a website. Usually used to set the number of conversions or conversion goals on a website or in a DM campaign.
The Flow of Users | Graphical representation of the paths users took through the site, from the source, through the various pages, and where along their paths they exited the site. The Users Flow report lets you compare volumes of traffic from different sources, examine traffic patterns through your site, and troubleshoot the effectiveness of your site. It is used to understand user behaviour on a site.
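As a hedged illustration of the A/B testing entry above, here is a minimal sketch (all figures hypothetical) that compares the conversion rates of two page variants:

```python
def conversion_rate(conversions, visitors):
    # Conversions divided by the visitors exposed to a variant
    return conversions / visitors

# Hypothetical split test: variant A vs variant B of the same page,
# each shown to 4,000 visitors
variants = {"A": (120, 4000), "B": (150, 4000)}
rates = {name: conversion_rate(c, v) for name, (c, v) in variants.items()}
winner = max(rates, key=rates.get)
print(rates)    # {'A': 0.03, 'B': 0.0375}
print(winner)   # B
```

A production test would also check that the observed difference is statistically significant before declaring a winner.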

In addition, qualitative indicators related to social media deserve attention. Social interactions between companies and users should be analyzed, since social media consumer interactions matter to any company in the Digital Marketing environment. These social factors are defined and identified by many authors who focus their research on evaluating social indicators and social commerce interactions to improve DM strategies [14,20,21,36,43].

3.5.3. Key Performance Indicators (KPIs) in DM


Each company can identify the KPIs that it judges most relevant to its business. In many cases, this may be a process of trial and error [44]. The consensus in the literature is that, to be useful, KPIs must have the following characteristics. They must be (i) Measurable: by definition, a KPI should be measurable in the DM environment. For example, it is difficult to measure how useful a web page is for a user, but the time spent on the page can be measured; if this is long, we can assume that the content of the page is useful. They must be (ii) Achievable: the objectives considered when setting KPIs must be credible. Sometimes too much information can be a problem; there are dozens of KPIs to choose from, but only a few provide information of interest. Finally, a KPI must be (iii) Available, at least for a time: KPIs must meet deadlines and be available for reasonable periods of time [1,45,46].
In order to define a set of indicators for companies, based on the overview and analysis of the relevant articles on this topic, we define the following basic KPIs that companies should follow and analyse with WA in their DM strategies, as laid out in Table 8.

Table 8. Key Performance Indicators (KPIs) in DM.

KPI in DM | Description
Conversion Rate | The average number of conversions per click in SERP results or on ad clicks (depending on the marketing objective), shown as a percentage. Conversion rates are calculated by taking the number of conversions and dividing it by the total number of ad clicks/actions that can be tracked to a conversion during the same time period.
Goals Conversion Rate | A goal represents a completed activity (also called a conversion). Examples of goals include making a purchase (e-commerce), completing a game level (app), or submitting a contact information form (lead generation site).
Type of Users | New visitors are users who visit your site for the first time. Returning visitors are users who visit your site for the second or further times. This distinction is important because it shows the interest of the target audience in your business and website.
Type of Sources | Source: every referral to a website has an origin, or source. Medium: every referral to a website also has a medium; according to Google Analytics, these include “organic” (unpaid search), CPC, referral, email and “none” (direct traffic has a medium of none). Campaign: the name of the referring AdWords campaign or of a custom campaign that has been created.
Keywords / Traffic from Non-branded Keywords | Keywords in DM are the key words and phrases in web content that make it possible for people to find a site via search engines. A non-branded keyword is one that does not contain the target website’s brand name or some variation of it. Ranking for non-branded keywords is valuable because it allows a website to obtain new visitors who are not already familiar with the brand.
Keyword Ranking | Rank is an estimate of your website’s position for a particular search term in some search engines’ results pages. The lower the rank, the more easily your website will be found in search results for that keyword.
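To connect two of the KPIs above, here is a minimal sketch (all figures hypothetical) that computes the Conversion Rate separately for each traffic source and medium:

```python
# Hypothetical per-source figures: medium -> (tracked clicks/visits, conversions)
sources = {
    "organic":  (5200, 156),
    "cpc":      (1800, 90),
    "referral": (600,  9),
}

def conversion_rate(conversions, clicks):
    # Conversion Rate = conversions / trackable clicks or visits
    return conversions / clicks

for source, (clicks, conversions) in sources.items():
    print(f"{source}: {conversion_rate(conversions, clicks):.1%}")
# organic: 3.0%, cpc: 5.0%, referral: 1.5%
```

Segmenting the rate by source in this way shows which channel converts best, not just which sends the most traffic.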

4. Conclusions
This paper provides a comprehensive and systematic overview of the current status of the theoretical and empirical literature on DM and WA.
The development of the Internet and electronic commerce involves a change in marketing thinking and practice, because traditional marketing has had to develop new techniques for the Internet. This has resulted in a gap between the development of new DM techniques and the measurement processes that have to be performed for the correct assessment of results.
Due to the increased use of DM in the last decade and the investment made by companies in recent years, we have carried out an investigation to determine the key indicators to which companies should pay attention in order to measure their digital marketing actions. Research presents concerns expressed by companies about the lack of knowledge of which metrics they should use to justify their marketing investments [15]. This remains true despite the many articles published on DM measurement topics in the last few years. Researchers use a wide variety of different metrics and indicators to measure the efficiency and effectiveness of DM techniques and calculate profitability (ROI); however, our study shows that little consensus has formed about the use of these indicators, or on the definition of the key factors for measuring DM performance.
In summary, this research presents the main analytical indicators to measure the performance of
DM. It highlights the most commonly used indicators that might therefore offer potential for increasing
standardisation and comparability of results across studies [47–49].

Second, the indicators defined in this study are based on the use of relevant analytical indicators in the field of DM and WA. The goal was to correctly define these indicators to group the main KPIs for the measurement of DM return on investment.
The contribution of the theoretical framework demonstrates how companies should understand the different contracting models in DM to establish relevant indicators, and how they should understand the main models of performance measurement in DM. In the study, we can see what the main contracting models studied in the principal works on DM are. This means that understanding the different models of contracting advertising on the Internet is important in order to determine the indicators to be measured and to calculate the ROI.
Third, the literature shows the importance of using two types of WA as a basis of assessment in DM: (i) quantitative analytical indicators, which allow work on real data, quantifying different goals or conversions, and which are the main indicators studied by the authors; and (ii) qualitative analytical indicators, which are used in DM to show how the user understands a website, helping to define KPIs for understanding the online buying process and user behaviour. This study makes the additional contribution of clarifying the main qualitative indicators from the literature.
Following the indicators identified in this research, strategies and actions in DM can be improved. Marketers and academics can check the efficiency of their activities by consulting the ROI or CTR of their actions in DM. To measure and optimize each process carried out by users on the website, they can consult the qualitative indicators listed, which will also allow them to optimize and structure their strategies. In addition, they can use this research to improve the online shopping process and the User Experience (UX), which will increase conversion rates. In order to measure online strategy objectives, this research suggests that different KPIs should be determined to assess the impact of each action. Each marketer or academic could use these indicators to improve their strategies and assess the goals achieved in DM.

4.1. Implications for Academics


The significance of our research and results for academics lies in the study of DM as a theoretical concept. The growing use of DM is tied to the development of the Internet in the last decade.
For a topic to be relevant, a number of studies and investigations should be carried out over time.
The early development of the Internet and new technologies has meant that traditional marketing
has evolved rapidly and has adapted to the new demands of the Internet, and continues to do so.
Therefore, academic work needs to be ongoing. According to [7] DM refers to “marketing functions
performed electronically”. Therefore, academics must understand and learn the new marketing tools
and their uses. As traditional marketing evolves, academics must obtain and develop new skills and
learn the new vocabulary necessary to understand this ecosystem.
Just as there has been a gap in the Marketer’s skill-set in the professional sector [19], this
research shows there has also been a gap in the understanding amongst academics in the use of
these technologies in the professional sector. In quickly evolving fields this may be quite natural,
but it also highlights the need for academics to pursue additional research to refine the definition,
measurement and assessment tools for the DM environment. This study contributes to that effort by
cataloguing the most prominent metrics and their uses.
Looking ahead it is clear that there is enormous potential for researchers to make major
contributions to both business and technical research themes in the subject of Digital Marketing.
The continued growth in the use of social media and social networking is likely to receive much more
attention from academic researchers, which will be reflected in a growing number of publications.
The topics of online business models and advertising are key research areas in strategy and marketing.
Given the fact that most of the new developments and innovations in DM are created and implemented
by business organizations, it is critical that academic researchers continue to balance academic theory
with industry practice, and actively seek to produce research that is both rigorous and relevant to both
academics and managers.

4.2. Implications for Marketers


Marketers recognise the importance of DM and drive its ongoing development and
implementation. However, following the investigation of [19] we see that there is a gap in skills
development in terms of monitoring and assessing marketing actions in DM. This gap will potentially
weaken marketers strategically because it means they will have an incomplete or even erroneous
understanding of the effectiveness of their interventions. In order to take full advantage of the evolving
digital ecosystem marketers need to be properly trained to understand and use the key performance
indicators that are particular to this environment. Further, they must also seek to integrate those
measures with other more traditional ones for marketing effectiveness.
In order to assist the normalization and standardization of DM indicators, our research lays out the main indicators used in the DM ecosystem. This study presents a straightforward, easy-to-follow discussion of KPIs for WA and of the different techniques that attract DM website traffic. We also lay out a logical progression that marketers can follow, starting with the types of WA contracts on offer and moving to the relevant performance indicators; these can then lead to calculations of the ROI of their investments. Finally, the findings of our scientometric analysis highlight the most relevant scientific journals published on this topic.
The limitations of the study are those related to the methodology used and to the number of papers analyzed and the databases consulted. The results of the investigation can be taken as indicative, but should not be generalized to the whole sector, given the limited number of studies analyzed.

Author Contributions: José Ramón Saura, Pedro Palos-Sánchez and Luis Manuel Cerdá Suárez conceived
and designed the review; José Ramón Saura performed the methodology; Pedro Palos-Sánchez and
Luis Manuel Cerdá Suárez analyzed the results; José Ramón Saura, Pedro Palos-Sánchez and Luis Manuel Cerdá Suárez
wrote the paper.
Conflicts of Interest: The authors declare no conflict of interest.

References
1. Chaffey, D.; Patron, M. From web analytics to digital marketing optimization: Increasing the commercial
value of digital analytics. J. Direct Data Digit. Mark. Prac. 2012, 14, 30–45. [CrossRef]
2. Baye, M.R.; Santos, B.D.; Wildenbeest, M.R. Search engine optimization: What drives organic traffic to retail
sites? J. Econ. Manag. Strategy 2015, 25, 6–31. [CrossRef]
3. Germann, F.; Lilien, G.L.; Rangaswamy, A. Performance implications of deploying marketing analytics. Int. J.
Res. Mark. 2013, 30, 114–128. [CrossRef]
4. Pauwels, K.; Aksehirli, Z.; Lackman, A. Like the ad or the brand? Marketing stimulates different electronic
word-of-mouth content to drive online and offline performance. Int. J. Res. Mark. 2016, 33, 639–655.
[CrossRef]
5. Yang, Z.; Shi, Y.; Wang, B. Search engine marketing, financing ability and firm performance in E-commerce.
Procedia Comput. Sci. 2015, 55, 1106–1112. [CrossRef]
6. Leeflang, P.; Verhoef, P.; Dahsltröm, P.; Freundt, T. Challenges and solutions for marketing in a digital era.
Eur. Manag. J. 2014, 32, 1–12. [CrossRef]
7. Kotler, A.E. Principles of Marketing; Pearson: Boston, MA, USA, 2016.
8. Kaushik, A. Web Analytics 2.0: The Art of Online Accountability and Science of Customer Centricity; John Wiley & Sons:
Hoboken, NJ, USA, 2009.
9. Smith, V.; Devane, D.; Begley, C.M.; Clarke, M. Methodology in conducting a systematic review of systematic
reviews of healthcare interventions. BMC Med. Res. 2009, 11, 15. [CrossRef] [PubMed]
10. AMSTAR is a Reliable and Valid Measurement Tool to Assess the Methodological Quality of Systematic Reviews.
Available online: https://www.ncbi.nlm.nih.gov/pubmed/19230606 (accessed on 12 September 2017).
11. Bosch, M.V.; Sang, A.O. Urban natural environments as nature based solutions for improved public
health—A systematic review of reviews. J. Transp. Health 2017, 5, S79. [CrossRef]
12. Seggie, S.H.; Cavusgil, E.; Phelan, S.E. Measurement of return on marketing investment: A conceptual
framework and the future of marketing metrics. Ind. Mark. Manag. 2017, 36, 834–841. [CrossRef]

13. Li, L.-Y. Marketing metrics’ usage: Its predictors and implications for customer relationship management.
Ind. Mark. Manag. 2011, 40, 139–148. [CrossRef]
14. Järvinen, J.; Töllinen, A.; Karjaluoto, H.; Jayawardhena, C. Digital and social media marketing usage in B2B
industrial section. Mark. Manag. J. 2012, 22, 102–117. [CrossRef]
15. Royle, J.; Laing, A. The digital marketing skills gap: Developing a Digital Marketer Model for the
communication industries. Int. J. Inf. Manag. 2014, 34, 65–73. [CrossRef]
16. Bates, J.; Best, P.; Mcquilkin, J.; Taylor, B. Will web search engines replace bibliographic databases in the
systematic identification of research? J. Acad. Librariansh. 2017, 43, 8–17. [CrossRef]
17. Choudhary, V.; Currim, I.; Dewan, S.; Jeliazkov, I.; Mintz, O.; Turner, J. Evaluation set size and purchase:
Evidence from a product search engine. J. Interact. Mark. 2017, 37, 16–31. [CrossRef]
18. Aswani, R.; Kar, A.K.; Ilavarasan, P.V.; Dwivedi, Y.K. Search engine marketing is not all gold: Insights from
Twitter and SEOClerks. Int. J. Inf. Manag. 2018, 38, 107–116. [CrossRef]
19. Dotson, J.P.; Fan, R.R.; Feit, E.M.; Oldham, J.D.; Yeh, Y. Brand attitudes and search engine queries.
J. Interact. Mark. 2017, 37, 105–116. [CrossRef]
20. Oberoi, P.; Patel, C.; Haon, C. Technology sourcing for website personalization and social media marketing:
A study of e-retailing industry. J. Bus. Res. 2017, 80, 10–23. [CrossRef]
21. Jayaram, D.; Manrai, A.K.; Manrai, L.A. Effective use of marketing technology in Eastern Europe: Web
analytics, social media, customer analytics, digital campaigns and mobile applications. J. Econ. Financ.
Adm. Sci. 2015, 20, 118–132. [CrossRef]
22. Fishkin, R.; Høgenhaven, T. Inbound Marketing and SEO: Insights from the Moz Blog; Wiley:
Hoboken, NJ, USA, 2013.
23. Nabout, A.; Skiera, B.; Stepanchuk, T.; Gerstmeier, E. An analysis of the profitability of fee-based
compensation plans for search engine marketing. Int. J. Res. Mark. 2012, 29, 68–80. [CrossRef]
24. Wilson, R.F.; Pettijohn, J.B. Affiliate management software: A premier. J. Website Promot. 2008, 3, 118–130.
[CrossRef]
25. Wilson, R.D. Using web traffic analysis for customer acquisition and retention programs in marketing.
Serv. Mark. Q. 2004, 26, 1–22. [CrossRef]
26. Kent, M.L.; Carr, B.J.; Husted, R.A.; Pop, R.A. Learning web analytics: A tool for strategic communication.
Public Relat. Rev. 2011, 37, 536–543. [CrossRef]
27. Lee, G. Death of ‘last click wins’: Media attribution and the expanding use of media data. J. Direct Data Digit.
Mark. Pract. 2010, 12, 16–26. [CrossRef]
28. Fagan, J.C. The suitability of web analytics key performance indicators in the academic library environment.
J. Acad. Librariansh. 2014, 40, 25–34. [CrossRef]
29. Plaza, B. Google analytics intelligence for information professionals. Online 2010, 34, 33–37.
30. Xu, Z.; Frankwick, G.L.; Ramirez, E. Effects of big data analytics and traditional marketing analytics on new
product success: A knowledge fusion perspective. J. Bus. Res. 2016, 69, 1562–1566. [CrossRef]
31. Palos Sanchez, P.R. Aproximación a los factores claves del retorno de la inversión en formación e-learning.
3C Empresa 2016, 5, 12. [CrossRef]
32. Fiorini, P.M.; Lipsky, L.R. Search marketing traffic and performance models. Comput. Stand. Interfaces 2012,
34, 517–526. [CrossRef]
33. Järvinen, J.; Karjaluoto, H. The use of Web analytics for digital marketing performance measurement.
Ind. Mark. Manag. 2015, 50, 117–127. [CrossRef]
34. Bourne, M.; Neely, A.; Platts, K.; Mills, J. The success and failure of performance measurement initiatives:
Perceptions of participating managers. Int. J. Oper. Prod. Manag. 2002, 22, 1288–1310. [CrossRef]
35. Digital Analytics Association. 2018. Available online: http://goo.gl/BJnhaJ (accessed on 5 September 2017).
36. Vásquez, G.A.; Escamilla, E.M. Best practice in the use of social networks marketing strategy as in SMEs.
Procedia Soc. Behav. Sci. 2014, 148, 533–542. [CrossRef]
37. Nabout, N.A.; Skiera, B. Return on quality improvements in search engine marketing. J. Interact. Mark. 2012,
26, 141–154. [CrossRef]
38. Hwangbo, H.; Kim, Y.S.; Cha, K.J. Use of the smart store for persuasive marketing and immersive customer
experiences: A case study of Korean apparel enterprise. Mob. Inf. Syst. 2017, 2017, 4738340. [CrossRef]

39. Kim, J.; Xu, M.; Kahhat, R.; Allenby, B.; Williams, E. Designing and assessing a sustainable networked
delivery (SND) system: Hybrid business-to-consumer book delivery case study. Environ. Sci. Technol. 2009,
43, 181–187. [CrossRef] [PubMed]
40. Mathews, S.; Bianchi, C.; Perks, K.J.; Healy, M.; Wickramasekera, R. Internet marketing capabilities and
international market growth. Int. Bus. Rev. 2016, 25, 820–830. [CrossRef]
41. Mavridis, T.; Symeonidis, A.L. Identifying valid search engine ranking factors in a Web 2.0 and Web 3.0
context for building efficient SEO mechanisms. Eng. Appl. Artif. Intell. 2015, 41, 75–91. [CrossRef]
42. Welling, R.; White, L. Web site performance measurement: Promise and reality. Manag. Serv. Qual. 2006,
16, 654–670. [CrossRef]
43. Thaichon, P.; Quach, T.N. Online marketing communications and childhood’s intention to consume unhealthy
food. Australas. Mark. J. 2016, 24, 79–86. [CrossRef]
44. Moreno, J.; Tejeda, A.; Porcel, C.; Fujita, H.; Viedma, E. A system to enrich marketing customers acquisition
and retention campaigns using social media information. J. Serv. Res. 2015, 80, 163–179. [CrossRef]
45. File, K.M.; Prince, R.A. Evaluating the effectiveness of interactive marketing. J. Serv. Mark. 1993, 7, 49–58.
[CrossRef]
46. Peters, K.; Chen, Y.; Kaplan, A.M.; Ognibeni, B.; Pauwels, K. Social media metrics—A framework and
guidelines for managing social media. J. Interact. Mark. 2013, 27, 281–298. [CrossRef]
47. Meghan, L.M.; Tang, T. Mobile marketing and location-based applications. In Strategic Social Media: From
Marketing to Social Change; John Wiley & Sons: Hoboken, NJ, USA, 2016; pp. 130–143. [CrossRef]
48. Arch, G.; Woodside, J.; Milner, W. Buying and Marketing CPA Services. Ind. Mark. Manag. 1992, 21, 265–272.
[CrossRef]
49. Palos Sanchez, P.R.; Cumbreño, E.; Fernández, J.A. Factores condicionantes del marketing móvil: Estudio
empírico de la expansión de las apps. El caso de la ciudad de Cáceres. Rev. Estudios Econ. Empres. 2016,
28, 37–72.

© 2017 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access
article distributed under the terms and conditions of the Creative Commons Attribution
(CC BY) license (http://creativecommons.org/licenses/by/4.0/).
21
Data analytics

In this chapter, you will learn:


»» The importance of analytics to digital marketing.
»» What metrics you can and should be tracking.
»» How to capture web analytics data.
»» Techniques and guidelines for analysing data to better understand
your users.
»» How to present data clearly and how to use data visualisation to help
users understand it.

21.1 Introduction

Picture the scene: You’ve opened up a new fashion retail outlet in the trendiest shopping centre in town. You’ve spent a small fortune on advertising and branding. You’ve gone to great lengths to ensure that you’re stocking all of the prestigious brands. Come opening day, your store is inundated with visitors and potential customers.

And yet, you are hardly making any sales. Could it be because you have one cashier for every hundred customers? Or possibly it’s the fact that the smell of your freshly painted walls chases customers away before they complete a purchase? While it can be difficult to isolate and track the factors affecting your revenue in this fictional store, move it online and you have a wealth of resources available to assist you with tracking, analysing and optimising your performance.

NOTE: Remember, analytics data can be found in many places, not just your website. Consider data from email, social media, mobile devices, and more. Refer back to the Data driven decision making chapter for more on this.

To a marketer, the Internet offers more than just new avenues of creativity. By its very nature, the Internet allows you to track each click to your site and through your site. It takes the guesswork out of pinpointing the successful elements of a campaign, and can show you very quickly what’s not working. It all comes down to knowing where to look, what to look for, and what to do with the information you find.

At the beginning of this book, you learned how important it is for a business to be data-driven and client-focused. You also learned about a few of the forms and sources of data. Each chapter mentioned some elements you should track to measure the success of a particular area of digital marketing. Now you’re going to learn more specifics about data analytics and how to analyse the data you’ve gathered.

21.2 Key terms and concepts

Term | Definition
A/B test | Also known as a split test, it involves testing two versions of the same page or site to see which performs better.
Click path | The journey a user takes through a website.
Conversion | Completing an action that the website wants the user to take. Usually a conversion results in revenue for the brand in some way. Conversions include signing up to a newsletter or purchasing a product.
Conversion funnel | A defined path that visitors should take to reach the final objective.
Cookie | A small text file that is used to transfer information between
Heatmap | A data visualisation tool that shows levels of activity on a web page in different colours.
JavaScript | A popular scripting language. Also used in web analytics for page tagging.
Key performance indicator (KPI) | A metric that shows whether an objective is being achieved.
Log file | A text file created on the server each time a click takes place, capturing all activity on the website.
Metric | A defined unit of measurement.
Multivariate test | Testing combinations of versions of the website to see which combination performs better.
Objective | A desired outcome of a digital marketing campaign.
Page tag | A piece of JavaScript code embedded on a web page and executed by the browser.
Ratio | An interpretation of data captured, usually one metric divided by another.
Referrer | The URL that originally generated the request for the current page.
Segmentation | Filtering visitors into distinct groups based on characteristics to analyse visits.
Target | A specific numeric benchmark.
Visitor | An individual visiting a website that is not a search engine spider or a script.

Table 1

21.3 Working with data

In the days of traditional media, actionable data was a highly desired but scarce commodity. While it was possible to broadly understand consumer responses to marketing messages, it was often hard to pinpoint exactly what was happening and why.

NOTE: Read more about this

As the Data driven decision making chapter showed, in the digital age, information is absolutely everywhere. Every single action taken online is recorded, which means
Cookie browsers and web servers. They help web servers to provide the
there is an incredible wealth of data available to marketers to help them understand in the Data driven
right content when it is requested. decision making
when, where, how and even why users react to their marketing campaigns.
chapter.
Count Raw figures captured for data analysis. Remember, this also means there is a responsibility on marketers to make data-
driven decisions. Assumptions and gut feel are not enough – you need to back these
Event A step a visitor takes in the conversion process. up with solid facts and clear results.

The defined action that visitors should perform on a website, or the Don’t worry if you’re not a ‘numbers’ person – working with data is very little about
Goal number crunching (the technology usually takes care of this for you) and a lot about
purpose of the website.

542 543
analysing, experimenting, testing and questioning. All you need is a curious mind and an understanding of the key principles and tools.

Here are some data concepts you should be aware of.

21.3.1 Performance monitoring and trends

Data analytics is all about monitoring user behaviour and marketing campaign performance over time. The last part is crucial: there is little value in looking at a single point of data. You want to look at trends and changes over a set period to encourage a dynamic view of data.

For example, it is not that helpful to say that 10% of this month's web traffic converted. Is that good or bad, high or low? But saying that 10% more users converted this month than last month shows a positive change or trend. While it can be tempting to focus on single 'hero' numbers and exciting-looking figures such as 'Look, we have 5 000 Facebook fans!', these really don't give a full picture if they are not presented in context. In fact, we call these 'vanity metrics': they look good, but they don't tell you much.

NOTE: Pay close attention to any changes in the expected data, good or bad, and investigate any anomalies.

21.3.2 Big data

'Big data' is the term used to describe truly massive data sets, the ones that are so big and unwieldy that they require specialised software and massive computers to process. Companies like Google, Facebook and YouTube generate and collect so much data every day that they have entire warehouses full of hard drives to store it all.

Understanding how big data works, and how to think about data on this scale, provides some valuable lessons for all analysts.

• Measure trends, not absolute figures: The more data you have, the more meaningful it is to look at how things change over time.
• Focus on patterns: With enough data, patterns over time should become apparent, so consider looking at weekly, monthly or even seasonal flows.
• Investigate anomalies: If your expected pattern suddenly changes, try to find out why and use this information to inform your actions going forward.

21.3.3 Data mining

Data mining is the process of finding patterns hidden in large numbers and databases. Rather than having a human analyst process the information, an automated computer program pulls apart the data and matches it to known patterns to deliver insights. Often, this can reveal surprising and unexpected results, and tends to break assumptions.

Data mining in action

Krux (2016) offers the example of examining an enormous dataset for an automotive brand that wanted to improve brochure downloads and increase requests for test drives. The data they analysed related to consumers, consumer attributes, and marketing touchpoints.

To determine a pattern, they had to explore 47 000 000 000 000 000 000 000 combinations of factors – obviously too many to evaluate without using machines. These combinations came from 35 touchpoints, including the website, campaigns, and other marketing channels, and 37 analytics points, including auto buyers and smartphone users.

The brand was able to spot relevant patterns, such as that consumers who bought a certain brand of car were more likely to download brochures, but not more likely to request test drives. This allowed them to segment the consumers who bought cars into those who started the purchase process by downloading a brochure, and those who started with a test drive.

The first group was detail-oriented, so ads featuring specific models with links to the specifications page helped to drive conversions.

The second group wanted to know how driving the car felt, so they were targeted with ads that appealed to their senses and included a call to action about scheduling a test drive. This helped to drive media efficiency and campaign performance.

21.4 Tracking and collecting data

A key problem with tracking users on websites used to be that it was impossible to track individual users – only individual browsers, or devices, since this is done through cookies. So, if Joe visits the website from Chrome on his home computer and Safari on his work laptop, the website will think he's two different users. If Susan visits the site from the home computer, also using Chrome, the website will think she's the same user as Joe, because the cookie set when Joe visited the site will still be there.

Email opens aren't tracked with cookies. Instead, when the images in the email load, a tiny 1×1 pixel also loads and tracks the open rate. This means that if the user is blocking images, their activity will not be tracked.

To track whether those who opened your email then visited your page, or eventually converted, links within the email include UTM tags. UTM tags are codes in the URL that enable your analytics software to track where a user has come from. In this link:

https://www.redandyellow.co.za/5-ways-design-can-used-empower-women/?utm_source=newsletter&utm_medium=email&utm_campaign=AugNewsletter

the campaign tracking tag appended to the end of the URL is:

?utm_source=newsletter&utm_medium=email&utm_campaign=AugNewsletter

An additional concern was the decline of cookies. Most modern browsers allow users to block them. With growing consumer privacy concerns, and laws like the EU Privacy Directive, which requires all European websites to disclose their cookie usage, cookies began to fall out of favour, making tracking more difficult.

Google's Universal Analytics changed all that. Because of Google's dominance in the search engine market, we will focus on them for this section.

21.4.1 Universal Analytics

Google's Universal Analytics allows you to track visitors (that means real people) rather than simply sessions. By creating a unique identifier for each customer,
Universal Analytics means you can track the user's full journey with the brand, regardless of the device or browser they use. You can track Joe on his home computer, work laptop, and mobile phone during his lunch break, and even when he swipes his loyalty card at the point of sale – allowing you to combine offline and online information about users.

Crucially, however, tracking Joe across devices requires both Universal Analytics and authentication on the site across devices. In other words, Joe has to be logged in to your website or online tool on his desktop, work laptop and mobile phone in order to be tracked this way. If he doesn't log in, we won't know he's the same person. Users who use Gmail are easy for Google to track because they'll be logged in across devices.

You can see:

• How visitors behave depending on the device they use (browsing for quick ideas on their smartphone, but checking out through the eCommerce portal on their desktop).
• How visitor behaviour changes the longer they are a fan of the brand – do they come back more often, for longer, or less often but with a clearer purpose?
• How often they're really interacting with your brand.
• What their lifetime value and engagement is.

Another useful feature of Universal Analytics is that it allows you to import data from other sources into Google Analytics, for example, CRM information or data from a point-of-sale cash register. This gives a much broader view of the customers and lets you see a more direct link between your online efforts and real-world behaviour.

How does Universal Analytics work?

Universal Analytics has three versions of the tracking code that developers can implement, helping them track users on:

1. Websites
2. Mobile apps
3. Other digital devices, such as game consoles and even information kiosks.

It collects information from three sources to provide the information that you can access from your Google Analytics account:

1. The HTTP request of the user: This contains details about the browser and the computer making the request, including language, hostname, and referrer.
2. Browser and system information: This includes support for Java and Flash, and screen resolution.
3. First-party cookies: Analytics sets and reads these cookies to obtain user session and ad campaign information.

NOTE: What kind of privacy concerns might a user have about the data you're collecting about them?

This information is sent to the Google Analytics servers as a list of parameters attached to a one-pixel GIF image request.

You don't need to know the technical details of how tracking works, but if you are interested, you can read about Google Analytics tracking here: https://developers.google.com/analytics/resources/concepts/gaConceptsTrackingOverview

How to set up Google Analytics

First, you need a primary Google account, used for services such as Gmail or YouTube. You can use this to set up your Analytics account. This should be set up using a Google account that will always be available for your business.

NOTE: You will need to make adjustments to your Analytics account so that you can get the most out of tracking your users. You can learn a little more about that here: moz.com/blog/absolute-beginners-guide-to-google-analytics.

Next, go to www.google.com/analytics and follow the steps to sign up. You can set up multiple accounts here if you want to track a website, an app, or multiple websites and apps.

After the sign-up process, you will be issued a Google Analytics tracking ID. This will be UA followed by a series of numbers. You need to add this code to the HTML file of your website, before the </head> tag, on each of your pages.

Now Google is tracking every visitor to your website!

NOTE: Try it now – go to a random website, such as www.redandyellow.co.za, right click on it, then click 'View page source' to view the HTML code for the site. Do a search for 'UA-' to view the tracking code for that site. The tracking code for the website above is UA-43748615-1.

Google Analytics is, obviously, not the only analytics package available. Other packages exist for detailed tracking of social media accounts, emails, and website data. Website analysis should always account for any campaigns being run. For example, generating high traffic volumes by employing various digital marketing tactics such as SEO, PPC, and email marketing can be a pointless and costly exercise if visitors are leaving your site without achieving one or more of your website's goals. Conversion optimisation aims to convert as many of a website's visitors as possible into active customers.

NOTE: Read more about this in the Conversion optimisation chapter.

21.4.2 Gathering data

Google Analytics can measure almost anything about the customers that visit your website. To gather the kind of data that can help you optimise your site, you'll need to know a little about where to look. When you log into your analytics account, you will see seven main menu items on the left. They are:

Views

The Views button lets you switch between various pictures of the data.

Figure 1. The Views button.

Customization

The Customization tab lets you create dashboards that give you an overview of different data elements, custom reports, shortcuts, or custom alerts.

Figure 2. The Customization tab.
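The 'UA-' exercise in the note above can also be done programmatically. This is a minimal sketch: the HTML fragment and the surrounding `ga('create', ...)` call are illustrative stand-ins for a real page source, and the pattern simply assumes the 'UA followed by numbers' format described above.

```python
import re

# An illustrative fragment of page source containing a tracking snippet.
# Real snippets include more boilerplate around the create call.
html = """
<head>
  <script>
    ga('create', 'UA-43748615-1', 'auto');
  </script>
</head>
"""

# Tracking IDs are 'UA' followed by two groups of numbers:
# the account number, then the property index.
match = re.search(r"UA-\d+-\d+", html)
if match:
    print("Tracking ID found:", match.group())  # Tracking ID found: UA-43748615-1
```

The same search against any saved page source shows at a glance whether the page is tagged, and with which property.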
Real time

Real time allows you to monitor activity as it happens on your website. Data updates continuously so that you can see how many users are on your site right now, where they are from, the keywords and sites that referred them, which pages they are viewing, and what conversions are happening.

Figure 3. Part of Google's Realtime overview tab.

Audience

The Audience section helps you understand the characteristics of your audience, including their demographics, interests and behaviour (level of engagement); the mix of new and returning users and how their behaviour differs; and the browsers, networks and mobile devices they are using to access your site.

Figure 4. Part of Google's Audience overview tab.

Acquisition

Acquisition lets you compare traffic from search, referrals, email, and marketing campaigns. It shows you which sources drive the most traffic to your site.

Figure 5. Part of Google's Acquisition overview tab.

Behaviour

This section shows how users interact with your content, how the content performs, its searchability and its interactivity. You can see how fast your pages load, how successful users are when searching the site, how any interactive elements on your site are being used, popular content, which pages drive revenue, and more.

Figure 6. Part of Google's Behaviour overview tab.

Conversions

Conversions does exactly what it says on the box: it shows you how users are converting on your site. You can look at:

• The Goals tab, which shows how well your site meets business objectives
• The eCommerce tab, which shows what your visitors buy and can link it to other data to show what drives your revenue
• Multi-channel funnels, which show how your channels work together to generate sales and conversions (for example, if a customer sees a display ad about your brand, visits your site to do research, and later does a search for a specific product before converting)
• Attribution, which shows you how traffic from various channels converts.
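Acquisition reporting depends on the source, medium and campaign values carried in the UTM-tagged URLs described in section 21.4. As a sketch of what the analytics software does behind the scenes, Python's standard library can pull those values out of the example link:

```python
from urllib.parse import urlparse, parse_qs

# The UTM-tagged campaign URL from the example in section 21.4.
url = ("https://www.redandyellow.co.za/5-ways-design-can-used-empower-women/"
       "?utm_source=newsletter&utm_medium=email&utm_campaign=AugNewsletter")

# Split the URL into components and parse the query string into a dict of lists.
params = parse_qs(urlparse(url).query)

source = params["utm_source"][0]      # which property sent the visit
medium = params["utm_medium"][0]      # the marketing channel
campaign = params["utm_campaign"][0]  # the specific campaign

print(source, medium, campaign)  # newsletter email AugNewsletter
```

This is exactly the grouping you see in the Acquisition reports: every visit arriving on such a URL is credited to the newsletter source, the email medium, and the AugNewsletter campaign.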
Figure 7. Part of Google's Goals overview in the Conversions tab.

21.4.3 The type of information captured

By now, you should know the difference between objectives, goals, KPIs, and targets. KPIs are what you'll be focusing on when you measure data that has been captured. KPIs are the metrics that help you understand how well you are meeting your objectives. A metric is a defined unit of measurement. Definitions can vary between web analytics vendors depending on their approach to gathering data, but the standard definitions are provided here.

Web analytics metrics are divided into:

• Counts: These are the raw figures that will be used for analysis.
• Ratios: These are interpretations of the data that is counted.

Metrics can be applied to three different groupings:

• Aggregate: All traffic to the website for a defined period of time.
• Segmented: A subset of all traffic according to a specific filter, such as by campaign (PPC) or visitor type (new visitor vs. returning visitor).
• Individual: The activity of a single visitor for a defined period of time.

Here are some of the key metrics you will need to get started with website analytics.

Building-block terms

These are the most basic web metrics. They tell you how much traffic your website is receiving. For example, looking at returning visitors can tell you how well your website creates loyalty; a website needs to grow the number of visitors who come back. An exception may be a support website, where repeat visitors could indicate that the website has not been successful in solving the visitor's problem. Each website needs to be analysed based on its purpose.

• Traffic: The number of users that visit a website.
• Page: A unit of content (so downloads and Flash files can be defined as pages).
• Page views: The number of times a page was successfully requested.
• Session: An interaction by an individual with a website consisting of one or more page views within a specified period of time.
• Unique visitors: The number of individual users visiting the website one or more times within a set period of time. Each individual is counted only once.
• New visitor: A unique visitor who visits the website for the first time ever in the period of time being analysed.
• Returning visitor: A unique visitor who makes two or more visits (on the same device and browser) within the time period being analysed.

NOTE: New visitors show that you are reaching new audiences and markets, while returning visitors are an indicator of brand loyalty. Most websites should aim for a healthy balance between the two.

NOTE: You can play around on Google's Analytics site using their free demo account, here: https://analytics.google.com/analytics/web/demoAccount.

Figure 8. A breakdown of new versus returning visitors in Google Analytics.

Figure 9. A dashboard showing some important KPIs for an eCommerce page.

Visit characteristics

These are some of the metrics that tell you how visitors reach your website, and how they move through the website. The way that a visitor navigates a website is called a click path. Looking at the referrers, both external and internal, allows you to gauge the click path that visitors take.
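As a toy illustration of how these counts relate to each other (the visitor IDs and figures below are invented): every session is attributed to a visitor, unique visitors are counted once each, and anyone with more than one session in the period is a returning visitor.

```python
# Toy session log for one reporting period: one entry per session,
# keyed by an invented visitor (cookie) ID.
sessions = ["anna", "ben", "anna", "chris", "anna", "ben", "dina"]

unique_visitors = len(set(sessions))  # each individual counted only once
returning = {v for v in sessions if sessions.count(v) > 1}
new_visitors = unique_visitors - len(returning)

print(unique_visitors, len(returning), new_visitors)  # 4 2 2
```

Seven sessions, but only four unique visitors: two of them (anna and ben) came back within the period, and two visited once – the kind of balance the note above suggests watching.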
• Entry page: The first page of a visit.
• Landing page: The page intended to identify the beginning of the user experience resulting from a defined marketing effort.
• Exit page: The last page of a visit.
• Visit duration: The length of time in a session.
• Referrer: The URL that originally generated the request for the current page.
• Internal referrer: A URL that is part of the same website.
• External referrer: A URL that is outside of the website.
• Search referrer: A URL that is generated by a search function.
• Visit referrer: A URL that originated from a particular visit.
• Original referrer: A URL that sent a new visitor to the website.
• Clickthrough: The number of times a link was clicked by a visitor.
• Clickthrough rate: The number of times a link was clicked divided by the number of times it was seen (impressions).
• Page views per visit: The number of page views in a reporting period divided by the number of visits in that same period, giving an average of how many pages are viewed per visit.

Figure 10. Referrer information on the Acquisitions page of Google Analytics.

Content characteristics

When a visitor views a page, they have two options: leave the website, or view another page on the website. These metrics tell you how visitors react to your content. Bounce rate can be one of the most important metrics that you measure. There are a few exceptions, but a high bounce rate usually means high dissatisfaction with a web page.

NOTE: A high bounce rate is not always bad. On a blog, for example, most users click through from a search to read one article and, having satisfied their curiosity, leave without visiting any other pages.

• Page exit ratio: The number of exits from a page divided by the total number of page views of that page.
• Single page visits: Visits that consist of one page, even if that page was viewed a number of times.
• Bounces (or single page view visits): Visits consisting of a single page view.
• Bounce rate: Single page view visits divided by entry pages.

• Conversion metrics: These metrics give insight into whether you are achieving your analytics goals (and, through those, your overall website objectives).
• Event: A recorded action that has a specific time assigned to it by the browser or the server.
• Conversion: A visitor completing a target action.

Figure 11. Goal conversions in Google Analytics.

Figure 12. Funnel visualisation on Google Analytics.

Mobile metrics

When it comes to mobile data, there are no special, new or different metrics to use. However, you will probably be focusing your attention on some key aspects that are particularly relevant here, namely technologies and the user experience.

NOTE: Why do you think Google Analytics has a separate category for tablets, rather than including them under mobile devices?

• Device category: Whether the visit came from a desktop, mobile or tablet device.
• Mobile device info: The specific brand and make of the mobile device.
• Mobile input selector: The main input method for the device (such as touchscreen, click wheel, stylus).
• Operating system: The OS that the device uses to run, such as iOS or Android.
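The ratio metrics defined in this section are simply one count divided by another. A sketch with invented figures shows the arithmetic for bounce rate and clickthrough rate:

```python
# Invented counts for one page and one link over a reporting period.
single_page_view_visits = 300   # visits that bounced after the entry page
entry_pages = 1200              # visits that started on this page
link_clicks = 90                # clickthroughs on a given link
link_impressions = 3000         # times that link was seen

# Ratios: one count divided by another.
bounce_rate = single_page_view_visits / entry_pages   # 0.25
clickthrough_rate = link_clicks / link_impressions    # 0.03

print(f"Bounce rate: {bounce_rate:.0%}")              # Bounce rate: 25%
print(f"Clickthrough rate: {clickthrough_rate:.0%}")  # Clickthrough rate: 3%
```

Whether 25% is good or bad depends, as always, on the page's purpose and on how the figure is trending over time.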
Figure 13. Mobile device information on Google Analytics.

Now that you know what tracking is, you can use your objectives and KPIs to define what metrics you'll be tracking. You'll then need to analyse these results, and take appropriate actions. Then the testing begins again!

21.5 Analysing data

In order to test the success of your website, you need to remember the TAO of conversion optimisation:

Track – Analyse – Optimise

A number is just a number until you can interpret it. Typically, it is not the raw figures that you will be looking at, but what they tell you about how your users are interacting with your website. Because your web analytics package will never be able to provide you with 100% accurate results, you need to analyse trends and changes over time to truly understand your brand's performance.

NOTE: Google Analytics can send you email alerts whenever something unusual happens on your website. Simply set a triggering condition and decide what sort of report you want to receive. For example, you may want to know when more than a certain number of users have accessed your site in one day, as this could mean an opportunity to capitalise on high traffic.

21.5.1 Key elements to analyse

Avinash Kaushik, author of Web Analytics: An Hour a Day, recommends a three-pronged approach to web analytics:

1. Analysing behaviour data infers the intent of a website's visitors. Why are users visiting the website?
2. Analysing outcomes metrics shows how many visitors performed the goal actions on a website. Are visitors completing the goals we want them to?
3. A wide range of data tells us about the user experience. What are the patterns of user behaviour? How can we influence them so that we achieve our objectives?

NOTE: Read more about this in the User experience design chapter.

Behaviour

Web users' behaviour can indicate a lot about their intent. Looking at referral URLs and search terms used to find the website can tell you a great deal about what problems visitors are expecting your site to solve.

Some methods to gauge the intent of your visitors include:

• Click density analysis: Looking at a heatmap to see where users are clicking on the site and whether there are any noteworthy 'clumps' of clicks, such as many users clicking on a page element that is not actually a button or link.
• Segmentation: Selecting a smaller group of visitors to analyse, based on a shared characteristic – for example, only new visitors, only visitors from France, or only visitors who arrived on the site by clicking on a display advert. This lets you see if particular types of visitors behave differently.
• Behaviour and content metrics: Analysing data around user behaviours – for example, time spent on site and number of pages viewed – can give a lot of insight into how engaging and valuable your website is. Looking at content metrics will show you which pages are the most popular, which pages users leave from most often, and more, providing excellent insight for your content marketing strategy, as well as discovering what your audience is really interested in.

NOTE: Analytics data cannot give you a definitive answer to why users behave a certain way. It does provide plenty of clues about intent; it's up to you to put the pieces together.

A crucial, often-overlooked part of this analysis is internal search. Internal search refers to the searches of the website's content that users perform on the website. While a great deal of time is spent analysing and optimising external search, such as using search engines to reach the website in question, analysing internal search goes a long way to exposing weaknesses in site navigation, determining how effectively a website is delivering solutions to visitors, and finding gaps in inventory on which a website can capitalise.

Consider the keywords a user may use when searching for a hotel website, and the keywords they may use when on the website.

Keywords to search for a hotel website may be:

• Cape Town hotel
• Bed and breakfast Cape Town.

Once on the website, the user may use the site search function to find out more. Keywords they may use include:

• Table Mountain
• Pets
• Babysitting service.

Analytics tools can show what keywords users search for, what pages they visit after searching, and, of course, whether they search again or convert.

Figure 14. Site search information on Google Analytics.

Outcomes

At the end of the day, you want users who visit your website to perform an action that increases your revenue. Analysing goals and KPIs indicates where there is room for
improvement. Look at user intent to establish if your website meets users' goals, and if these match with the website goals. Look at user experience to determine how outcomes can be influenced.

Figure 15. Reviewing conversion paths can give you insight into improving your website. (The funnel in the figure shows 13,430 total visitors to the site narrowing from 100 visitors to 80, then 20, then 10, with the drop-offs marking persuasion and conversion problems.)

After performing a search, 100 visitors land on the home page of a website. From there, 80 visitors visit the first page towards the goal. This event has an 80% conversion rate. 20 visitors take the next step. This event has a 25% conversion rate. Ten visitors convert into paying customers. This event has a 50% conversion rate. The conversion rate of all visitors who performed the search is 10%, but breaking this up into events lets us analyse and improve the conversion rate of each event.

User experience

To determine the factors that influence user experience, you must test and determine the patterns of user behaviour. Understanding why users behave in a certain way on your website will show you how that behaviour can be influenced to improve your outcomes. This is covered in the next chapter on Conversion optimisation.

21.5.2 Funnel analysis

Funnel analysis is crucial to understanding where problems in a conversion process lie, and helps you to track whether your website is achieving its ultimate goal. The process of achieving that goal can be broken down into several steps. These are called events or micro-conversions. Analysing each step in the process is called funnel analysis or path analysis.

For example, on a hotel booking website, the ultimate goal is that visitors to the site make a booking on the website with a credit card. Each step in the process is an event that can be analysed as a conversion point.

Event 1: Perform a search for available dates for hotels in the desired area.
Event 2: Check prices and amenities for available hotels.
Event 3: Select a hotel and go to checkout.
Event 4: Enter personal and payment details and confirm booking (conversion).

One naturally expects fewer users at each step. Increasing the number of visitors who progress from one step to the next will go a long way to improving the overall conversion rate of the site.

Figure 16. A conversion funnel.

Here are some examples of possible objectives, goals and KPIs for different websites.

Hospitality eCommerce site, such as www.expedia.com

Objective: Increase bookings
Objective: Decrease marketing expenses
Goal: Make a reservation online
KPIs:
• Conversion rate
• Cost per visitor
• Average order value.

News and content sites, such as www.news24.com

Objective: Increase readership and level of interest
Objective: Increase time visitors spend on website
Goal: A minimum time on site
KPIs:
• Length of visit
• Average time spent on website
• Percentage of returning visitors.
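The event-by-event arithmetic from the 100 → 80 → 20 → 10 example earlier in this section can be sketched as follows (the step names are illustrative labels):

```python
# Visitor counts at each step of the example funnel.
funnel = [
    ("Home page", 100),
    ("First page towards goal", 80),
    ("Next step", 20),
    ("Confirmed booking", 10),
]

# Each event's conversion rate is its count divided by the previous step's count.
for (_, prev_count), (name, count) in zip(funnel, funnel[1:]):
    print(f"{name}: {count / prev_count:.0%}")

# Overall conversion rate across the whole funnel.
print(f"Overall: {funnel[-1][1] / funnel[0][1]:.0%}")
```

This prints 80%, 25% and 50% for the three events and 10% overall – making it obvious that the middle step, not the checkout, is where the funnel leaks most.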
Data analytics › Analysing data Data analytics › Analysing data

KPIs help you to look at the factors you can influence in the conversion process. For example, if your objective is to increase revenue, you could look at ways of increasing your conversion rate, that is, the number of visitors who purchase something on your site. One way of increasing your conversion rate could be to offer a discount. So, you would have more sales, but probably a lower average order value. Or, you could look at ways of increasing the average order value, so that the conversion rate would stay the same, but you would increase the revenue from each conversion.

Once you have established your objectives, goals and KPIs, you need to track the data that will help you to analyse how you are performing, and will indicate how you can optimise your website or campaign.

21.5.3 Segmentation

Every visitor to a website is different, but there are some ways in which we can characterise groups of users, and analyse metrics for each group. This is called segmentation.

Figure 17. Default segments in Google Analytics.

Some segments include:

Referral source
Users who arrive at your site via search engines, those who type in the URL directly, and those who come from a link in an online news article are all likely to behave differently. As well as conversion rates, click path and exit pages are important metrics to consider. Consider the page on which these visitors enter your website: can anything be done to improve their experience?

Landing pages
Users who enter your website through different pages can behave very differently. What can you do to affect the page on which they are landing, or what elements of the landing page can be changed to positively influence outcomes?

Connection speed, operating system, browser
Consider the effects of technology on the behaviour of your users. A high bounce rate for low-bandwidth users, for example, could indicate that your site is taking too long to load. Visitors who use open source technology may expect different things from your website to other visitors. Different browsers may show your website differently: how does this affect these visitors?

Geographical location
Do users from different countries, provinces or towns behave differently on your website? How can you optimise the experience for these different groups?

First-time visitors
How is the click path of a first-time visitor different from that of a returning visitor? What parts of the website are most important to first-time visitors?

21.5.4 In-page heat maps

Software such as Crazy Egg (www.crazyegg.com) can show you exactly where users click on a web page, regardless of whether they are clicking on links or not.

Figure 18. Heat map options offered by Crazy Egg.

It produces information that helps you to know which areas of a website are clickable but attract few or no clicks, and which areas are not clickable but have users attempting to click there. This can show you what visual clues on your web page influence where your visitors click, and this can be used to optimise the click path of your visitors.

There are many factors that could be preventing your visitors from achieving specific end goals. From the tone of the copy to the colour of the page, anything on your website may affect conversions. Possible factors are often so glaringly obvious that one tends to miss them, or so small that they are dismissed as trivial. Changing one factor may result in other unforeseen consequences and it is vital to ensure that you don't jump to the wrong conclusions.

Hotjar (www.hotjar.com), another popular analytics tool, demonstrates how heatmaps can help you improve your web page. You can find more information here: https://www.hotjar.com/heatmaps

21.6 Data Visualisation

In the Data-driven decision making chapter, we discussed the importance of reporting on data and making sure that the information gets to the right users, in the right way. Not everyone is adept at understanding a detailed financial breakdown,


and analytics reports often intimidate people, so how can a data-focused marketer
present information in a way that’s accessible to everyone?
The answer lies in data visualisation, which involves placing data in a visual context
to help users understand it. Data visualisation software can help demonstrate
patterns and trends that might be easily missed in purely text-based data reporting.
It can refer to something as simple as an infographic, or something as complex as a
multi-point interactive program that lets users decide what to compare.

Figure 19. Traditional graphs and charts to represent data.

Figure 21. Clever use of the layout of a clock and plotting points for
representing what Americans spend their time doing each day.

Figure 22. Word clouds are becoming popular ways to visualise data,
where the size of the word represents its importance or frequency.
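The sizing rule a word cloud applies is simple to sketch: count word frequencies, then scale font size with frequency. The sample text, point-size range and helper below are illustrative assumptions, not any particular tool's method:

```python
from collections import Counter

# Illustrative text; a real word cloud would also strip stopwords.
text = (
    "analytics data analytics conversion data analytics "
    "visitors conversion analytics"
)

counts = Counter(text.split())

def font_size(word, min_pt=10, max_pt=48):
    """Scale a word's font size linearly between min_pt and max_pt
    according to how often it occurs."""
    lo, hi = min(counts.values()), max(counts.values())
    if hi == lo:
        return max_pt
    return min_pt + (counts[word] - lo) * (max_pt - min_pt) / (hi - lo)

for word, n in counts.most_common():
    print(f"{word}: {n} occurrence(s), {font_size(word):.0f}pt")
```

The most frequent word gets the largest type, the rarest the smallest, which is exactly the visual cue the figure describes.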

Many online data visualisations are also interactive. Visit this link to see an
interactive data visualisation about voting habits of Americans: https://
www.nytimes.com/interactive/2016/06/10/upshot/voting-habits-turnout-
partisanship.html

For a good lesson in data visualisation, including how to start using it, check out
this article from SAS: Data Visualization: What it is and why it matters - https://
www.sas.com/en_za/insights/big-data/data-visualization.html.
Figure 20. Representing data in different ways.


NOTE: For some tips on how to begin with data visualisation, take a quick look at some tools and some more resources on the topic. The Guardian actually has a remarkably useful article: https://www.theguardian.com/global-development-professionals-network/2014/aug/28/interactive-infographics-development-data

It can be challenging to decide on what data you want to visualise and the information you want to communicate, but as long as you know how your audience is likely to process visual information and what they need to know, you should be able to choose something that conveys the necessary information simply.

21.7 Tools of the trade

The first thing you need is a web analytics tool for gathering data. Some are free and some need to be paid for. You will need to determine which package best serves your needs. Bear in mind that if you switch vendors, you may lose historical data.

Below are some leading providers:
• Google Analytics – www.google.com/analytics
• AWStats – awstats.sourceforge.io
• Webalizer – www.webalizer.org
• Hotjar – www.hotjar.com
• GoSquared – www.gosquared.com
• Kissmetrics – www.kissmetrics.com
• Clicky – clicky.com

When it comes to running split tests, if you don't have the technical capacity to run these in-house, there are some third-party services that can host them for you. Google Optimize, which you would have learnt about in the Conversion optimisation chapter, is Google's platform for running tests and assessing your website's performance.

To test the significance of basic split tests, a split-test calculator is available at: www.usereffect.com/split-test-calculator. When you use cookie-based tracking, you need to add code tags to your web pages and these need to be maintained, updated and changed occasionally. Google Tag Manager (www.google.com/tagmanager) makes it easy to add and work with these tags without requiring any coding knowledge. Other professional tag management tools include TagMan (www.tagman.com), Ensighten (www.ensighten.com) and Tealium (www.tealium.com).

21.8 Advantages and challenges

Tracking, analysing and optimising are vital to the success of all marketing efforts. Digital marketing allows easy and fast tracking, and the ability to optimise frequently. When you use real data to make decisions, you're likely to make the best choices for your business and website.

However, it can be easy to become fixated on figures and metrics, instead of looking at broader trends and using them to optimise campaigns. Generally, macro or global metrics should be looked at before analysing the micro elements of a website.

Testing variables is vital to success. Results always need to be statistically analysed, and marketers should let these numbers make the decisions. Never assume the outcome; wait for the numbers to inform you. The next chapter covers this in much more detail.

21.9 Case study: eFinancialCareers

21.9.1 One-line summary

eFinancialCareers, the world's leading financial services careers website, used Google Analytics 360 and DoubleClick Bid Manager to improve its programmatic display remarketing.

21.9.2 The challenge

eFinancialCareers uses dynamic remarketing ads to drive leads to its site, where their goal – the major conversion they hope for – is for the user to fill out a job application. The company wanted to boost the number of conversions coming to them from programmatic ads.

Figure 23. The eFinancialCareers website.

21.9.3 The solution

With help from Google 360 experts Periscopix, they decided to gather and analyse insights about site visitors, including behavioural, demographic, and geographic information.

They set up event tracking for a number of variables, including:
• Country
• Job sector
• City
• Company
• Job ID number.
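Whether a difference in conversion rates between two groups of users is statistically meaningful can be checked with a standard two-proportion z-test, the calculation behind split-test calculators like the one mentioned under Tools of the trade. A minimal sketch with hypothetical counts:

```python
from math import sqrt

def z_score(conv_a, n_a, conv_b, n_b):
    """Two-proportion z-test with a pooled standard error:
    is group B's conversion rate significantly different from group A's?"""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / se

# Hypothetical test: 5,000 users per group, 100 vs 140 conversions.
z = z_score(conv_a=100, n_a=5000, conv_b=140, n_b=5000)
print(f"z = {z:.2f}")  # |z| > 1.96 is roughly significant at the 95% level
```

Here the uplift clears the 1.96 threshold, so the difference would usually be treated as real rather than noise.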


After collecting data about their website users for six weeks, they segmented them into:
• Passive users, who have visited the website and potentially registered for job updates, but haven't viewed or applied for any jobs.
• Active users, who have viewed and applied for jobs.

Once they had this data about each segment, they could tailor programmatic remarketing ads to send the right message to individual users. Messages could encourage passive users to apply for vacancies, while active users could be sent tailored ads based on information such as the job sector in which they had displayed interest.

They created almost 300 different remarketing lists, adding layers of detail to each identified segment.

Because Analytics 360 and DoubleClick can be integrated, updated remarketing lists can be automatically passed to DoubleClick to ensure relevant targeting for programmatic ads.

21.9.4 The results

Because the new system allowed remarketing to reach users within the ideal conversion period, and with relevant messaging that was updated based on the user's site activity, the ads performed considerably better. eFinancialCareers saw:
• A 21% increase in site traffic from real-time bidding campaigns
• A 423% increase in conversion rates for job applications coming from remarketing efforts.

(Google, 2016)

21.10 The bigger picture

Tracking, analysing and optimising are fundamental to any digital marketing activity, and it is possible to track almost every detail of any online campaign.

Most analytics packages can be used across all digital marketing activities, allowing for an integrated approach to determining the success of campaigns. While it is important to analyse each campaign on its own merits, the Internet allows for a holistic approach to these activities. The savvy marketer will be able to see how campaigns affect and enhance each other.

The data gathered and analysed can provide insights into the following fields, among others:
• SEO: What keywords are users using to search for your site, and how do they behave once they find it?
• Email: When is the best time to send an email newsletter? Are users clicking on the links in the newsletter and converting on your website?
• Paid media: How successful are your paid advertising campaigns? How does paid traffic compare to organic search traffic?
• Social media: Is social media driving traffic to the website? How do fans of the brand behave compared to those who do not engage socially?
• Mobile: How much of your traffic comes from mobile devices? Is it worth optimising your site for these? (It usually is!)

21.11 Summary

The ability to track user behaviour on the Internet allows you to analyse almost every level of a digital campaign, which should lead to improved results over time. The foundation of successful web analytics is to determine campaign and business objectives upfront and to use these to choose goals and KPIs grounded in solid targets.

Web analytics packages come in two flavours – server-based and cookie-based tracking – although some packages combine both methods.

Data can be analysed to discover how users behave, whether outcomes have been achieved, and how appealing the user experience is. Testing to optimise user experience can demonstrate ways in which to influence user behaviour so that more successful outcomes can be achieved. Segmenting the audience allows specific groups of users to be analysed.

21.12 Case study questions

1. Why did eFinancialCareers create so many remarketing lists?
2. Describe what analytics data was gathered to create these lists. Why did they choose to focus on this data?
3. How did the integration of various digital elements improve this brand's remarketing efforts?

21.13 Chapter questions

1. Why is it so important to use data to inform business decisions?
2. What would you learn from a single-page heat map?
3. What is the difference between a goal and a KPI?

21.14 Further reading

www.kaushik.net/avinash – Avinash Kaushik is an analytics evangelist, and his regular insight on his blog, Occam's Razor, is essential reading for any digital marketer.

Web Analytics 2.0 by Avinash Kaushik – if you are looking to get started in web analytics, you can't go wrong with this book by the web analytics legend.

www.analyticspros.com/blog – Analytics Pros has a blog with great advice and thoughts about analytics.

blogs.adobe.com/digitalmarketing – Adobe has a good blog with a lot of analytics information as well.

contentmarketinginstitute.com/?s=analytics – Believe it or not, the Content Marketing Institute has some great analytics tips.

support.google.com/analytics#topic=3544906 – Google Analytics Help Center is an excellent starting point for anyone who wants to get to grips with this free, excellent web analytics service.


21.15 References
Google, 2016. Google Analytics 360 and DoubleClick Bid Manager boost conversions by 423%. [Online]
Available at: services.google.com/fh/files/misc/ga360_efinancialcareers_case_study_v3.pdf
[Accessed 6 November 2017]

Krux, 2016. Data Mining Is Becoming The New Market Research. [Online]
Available at: www.krux.com/blog/general/data-mining-is-becoming-the-new-market-research
[Accessed 23 February 2017] - Link no longer active

21.16 Figure acknowledgments

Figures 1–14. Screenshot, Google, 2017.
Figure 15. Stokes, 2013.
Figure 16. Adapted from Stokes, 2010. https://catalog.flatworldknowledge.com/bookhub/reader/13259?e=fwk-105454-ch14_s02
Figure 17. Screenshot, Google, 2017.
Figure 18. Designmodo, 2014. https://designmodo.com/improve-website-crazy-egg
Figure 19. Public Tableau, 2017. https://public.tableau.com/en-us/s/gallery
Figure 20. Public Tableau, 2017. https://public.tableau.com/en-us/s/gallery
Figure 21. Scribble Live, 2015. http://www.scribblelive.com/blog/2015/12/28/9-best-data-visualization-examples-2015
Figure 22. Peekaboo, 2012. http://peekaboo-vision.blogspot.co.za/2012/11/a-wordcloud-in-python.html
Figure 23. Screenshot, eFinancialCareers, 2017.

Taking the measure of your customer experience

Ending the tyranny of the session
In the mid-1990s, website server log file analysts (creating a practice
later to be known as web analytics) created the concept of the
session. The “session” was, philosophically, a measure reflecting
the singular amount of attention that a visitor gave to a website, by
landing on and loading pages, navigating links and reading content.
Out of necessity, these analysts needed to define a time span around
which to base good measurements and arbitrarily chose 30 minutes,
and the session was born. For measurement purposes, the session
might not last 30 full minutes or the session might last much longer,
but if a visitor was inactive for 30 minutes, the session was over. This
framing has dominated the field of web — now digital — analytics ever
since, with Adobe, Google, Webtrends, Coremetrics and other tools
currently defining a session (or “visit”) as the continuous, sequential
behavior of a visitor on an online property until there has been no
activity for 30 minutes.1
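The 30-minute rule described above is easy to state precisely. A minimal sketch of sessionization, assuming hit timestamps in seconds (the data shape is illustrative, not any vendor's implementation):

```python
SESSION_GAP = 30 * 60  # 30 minutes of inactivity ends a session

def sessionize(hit_timestamps, gap=SESSION_GAP):
    """Group hit timestamps (seconds) into sessions: a new session
    starts whenever the gap since the previous hit reaches 30 minutes."""
    sessions = []
    for ts in sorted(hit_timestamps):
        if sessions and ts - sessions[-1][-1] < gap:
            sessions[-1].append(ts)  # continue the current session
        else:
            sessions.append([ts])    # inactivity threshold reached: new session
    return sessions

# Hits at 0s, 10min and 70min: the 60-minute gap starts a second session.
print(sessionize([0, 600, 4200]))  # [[0, 600], [4200]]
```

Everything the whitepaper goes on to criticize (conversion rate, landing pages, exit pages) is computed inside the lists this function produces.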

From a data scientist's perspective, such a definition was necessary: the raw server-log data reflected a barrage of continuous clickstream data of a volume not seen previously and which has only grown. While the technologies used to store this data have evolved to include big data and Hadoop-based platforms, the business container has not, limiting the insights we can gain from the clickstream. The business container in use today is still the same one created over two decades ago — a timestamp is combined with a cookie or IP address combination, and sessions are cut off with 30 minutes of gap in the clickstream. Most standard digital analytics behavior is measured within this session framework: "conversion rate" is actually "conversions per session"; "landing pages" are the first page-load within a session; "campaign response" is the campaign that generated a session; "referrers" are the websites immediately before that session, and an "exit page" is the last thing a visitor did in that session before that 30-minute window expired. The session concept has become so ingrained in the way that we think about online behavior that no one questions the concept or why it was created in the first place.

It is time to end this tyranny. As the word "experience" has come to dominate the digital world, mature analytics teams have realized that the true measure of engagement is not what happens during an arbitrary length of time, but what happens with a visitor during the full digital experience, and whether the brand is able to fulfill the expectations of that visitor across their entire journey. This whitepaper presents this framework, introduces ways that it should be measured, discusses the concepts of "behavioral signatures" and "milestones," and, in the end, calls on analysts and vendors to provide the flexibility needed to break apart that 30-minute defining barrier.

Sic semper tyrannis! — Marcus Junius Brutus, 44 B.C.E.

1 Over the years, slight nuances have been made to account for outlier traffic, with Adobe, for example, also cutting a visit off if there has been continuous activity for 24 hours.

Taking the measure of your customer experience | 1


| 1. Current pervasiveness of the session

In common digital reporting, it is difficult to escape the pervasiveness of the session, whether analysts are using the common web analytics
tools (Adobe, Google, Webtrends, IBM Coremetrics) or are looking at data through clickstream feeds. Below is a sample of such metrics and the
reasons why they have been questioned in recent years:

Session-based metrics

Sessions (or "visits")
Definition and calculation: Time-stamped hits from the same device/browser to the digital data collection server are grouped sequentially. When there is no activity for 30 minutes, the session is recorded as finished.
Issues:
• 30 minutes is inherently arbitrary; visitors may need additional time (to seek additional information, compare prices or generally make up their minds). This often results in "re-entries" — new sessions that are qualitatively continuations but are recorded with their own starts, visits numbers and entry pages.

Bounce rate
Definition and calculation: Sessions of a single page/sessions.
Issues:
• Prone to inaccuracy for technical reasons (rich media or video on the landing page; drives to offsite)
• Does not account for re-entries or interim exits

Conversion rate
Definition and calculation: The number of "conversions" (purchases, signups, form completions)/sessions.
Issues:
• Assumes that it usually takes a single session to convert
• Does not account for multi-device conversion behavior
• Many visitors do not come to the website to convert at all (careers, account management, contact us)

Campaign response
Definition and calculation: Campaign impressions or sends/sessions referred from the campaign.
Issues:
• Effect of SEM can be inflated (navigational assist)
• Campaigns often drive to app opens instead of site session starts
• Social campaigns could lead to a "share," not a "session"
• Email or display content could be absorbed without generating a click to the website
• Significantly weights marketing attribution to "last-touch"

Exit pages
Definition and calculation: The last page loaded in a session.
Issues:
• Often used to identify unengaging site content, these reports do not account for session re-entries, app opens, or drives offsite
• An exit page could itself be a sign of success — absorption of content (e.g., video) or a thank you page.

Landing pages or entry pages
Definition and calculation: The first page loaded in a session.
Issues:
• Session re-entries, browser redirects or app drives can misidentify a landing page for continuous activity
• Conversely, multi-sourced sessions (sessions where a visitor goes offsite for a brief time, only to return within 30 minutes) hide new landing pages from reporting
• Campaign-specific landing pages have become increasingly dynamic, personalized, and the focus of A/B testing

Referrers
Definition and calculation: The domain visited before a session starts.
Issues:
• Difficult to reconcile with campaign response data
• Multi-sourced sessions tend to either inflate (Adobe) or hide (Google) referring instances
• Internal microsites or other owned properties often appear in referrer reports

Time on site
Definition and calculation: The time stamp-based total time recorded for a session.
Issues:
• Does not include the time spent on the exit page of a site, and is prone to understatement
• Alternately, this may not account for videos or apps that are left open but continue to generate server calls, leading to overstatement

The above table points out issues with individual session-based metrics. To mitigate these issues usually requires very careful reporting,
additional rounds of technical implementation or tool configuration, spreadsheet-based heuristics or data manipulation or an understood
“grain of salt” on the part of metric stakeholders. But the underlying assumption behind these metrics is also flawed: the session, as originally
and currently defined, no longer represents the actionable unit at which a customer’s or prospect’s digital attention to the brand singularly
operates.
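The point that "conversion rate" is really "conversions per session" is easy to see with toy data: a visitor who converts on their third session deflates the session-based rate relative to a visitor-level view. The records below are illustrative:

```python
# Each tuple: (visitor_id, converted_in_this_session) — toy data.
sessions = [
    ("v1", False), ("v1", False), ("v1", True),  # converts on 3rd session
    ("v2", False),
    ("v3", True),
]

# Session-based metric: conversions / sessions.
session_rate = sum(conv for _, conv in sessions) / len(sessions)

# Visitor-based metric: converting visitors / unique visitors.
visitors = {vid for vid, _ in sessions}
converted = {vid for vid, conv in sessions if conv}
visitor_rate = len(converted) / len(visitors)

print(f"conversions per session: {session_rate:.0%}")  # 40%
print(f"converting visitors:     {visitor_rate:.0%}")  # 67%
```

Same behavior, two very different numbers: the session framing penalizes exactly the multi-visit deliberation the table above describes as "re-entries."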

| 2. Drivers of changes

The growing obsolescence of the session as a unit of measurement can be attributed broadly to several factors: the growth in the multi-device habit among visitors, the expansion of the digital experience beyond owned properties and the greater availability of experience data.

Growth in multi-device engagement

According to ComScore, 66% of the digital population is multiplatform.2 Nielsen reports an average of four digital devices owned by a typical American household, and Google reports that only 10% of consumers surf the internet from only one device.3 The same machine — particularly laptops and desktops — could have multiple browsers like Internet Explorer, Mozilla Firefox, Apple Safari and Google Chrome installed and used concurrently, and all major browsers allow users to browse on multiple tabs. Finally, ComScore reported that more than half of all internet behavior is done through apps, not browsers.4 The practical implication of these trends is that multi-session behavior by the same individual is the norm, not the exception, and that even though cookie-based measurement challenges have increased, the need to measure behavior across browsers and devices has become more acute.

Expansion of digital engagement beyond owned assets

Furthermore, digital customer engagement with the brand no longer takes place mostly on owned websites or apps, but also on social media, search engines, interactive emails and the wider display and video advertising universe. What this means is that a visitor's engagement with a brand, their true "macro-session," could really start on an email and end on Facebook. Because of the broad scale at which certain tags are deployed, cookie aggregators like Oracle BlueKai or Google DoubleClick can piece together these visitor macro-sessions and use this data for audience segmentation and targeting. The industry is increasingly moving in this direction and looking for additional solutions to solve ecosystem measurement and audience profiling challenges. Some practical analytics examples of how these unowned properties are beginning to be incorporated into the customer journey include the following:

• Segment targeting of display ads leading to personalization of website content through distinct landing pages and onsite experiences
• Email retargeting campaigns based on website interactions (owned websites and external)
• Social ads delivered based on display ad impressions
• Audience data collected from off-platform programming or native advertising (e.g., YouTube, Snap, Forbes BrandVoice)

In addition to making onsite behavioral measurement problematic, these trends are blurring the line between website optimization and marketing. Outbound marketing campaigns are now informed by website activity and vice versa. The segments being targeted for marketing and communications are the same segments that are activating onsite personalization.

Digitization of the real world

Online interactions have, from the beginning, been highly measurable and analyzable, and now offline interactions are becoming digitized and quantified to a degree that enables the full customer journey to be measured and optimized. Commonly referred to as a "360-View," joinable data is becoming available, underscoring the smallness of the individual website session. Such data includes:

• Call center data, with voice transcripts often digitized and classified
• In-store analytics: the ability to track shoppers' behavior in and around a physical store through mobile and video technology, digital kiosks and POS data collection
• Internet-of-Things, including onboard devices (OBD), usage meters or supply chain data
• Customer satisfaction, or net promoter score (NPS), data collected through surveys on either a transactional or overall customer relationship basis

Measuring digital performance across this proliferation of devices, experiences and platforms presents a very different set of challenges than the ones addressed 20+ years ago, when a website was simply trying to measure basic traffic and impact across a relatively small number of web pages with limited multimedia, ecommerce or social networking capabilities.

2 Comscore Inc. Investor Luncheon – Final. Fair Disclosure Wire, Oct 27, 2016
3 "Mobile Marketing Statistics Compilation," Smart Insights website, http://www.smartinsights.com/mobile-marketing/mobile-marketing-analytics/mobile-marketing-statistics. Accessed 20 November 2017.
4 "The 2016 U.S. Mobile App Report," comScore website, https://www.comscore.com/Insights/Presentations-and-Whitepapers/2016/The-2016US-Mobile-App-Report. Accessed 20 November 2017.



| 3. Breaking through the single-session barrier

Earlier in this decade, the vision of creating an end-to-end picture of a single customer journey, including advertising impressions, email communications, social media activities, searches, call center calls and in-store experiences, was unachievable. This was due to the data collection being fragmented by cookie silos; data ownership and privacy restrictions; and technology limitations in consuming, storing and aggregating all this data. As a result, a single customer was often duplicated across many platforms and even within a single measurement platform.

Today, three main methodologies have tackled the deduplication problem that, in parallel with better data management platforms (DMPs) and analytics middleware, have opened up visibility into the complete customer life cycle. These three methodologies deduplicate and identify visitors across multiple devices and over time, without relying on visitor cookies. They are:

User authentication

The creation and consolidation of a single customer key, tied to a customer login that provides a bridge between devices, browsers and in-store experiences. This key can be an alphanumeric ID; a login username; an email address; or more specific PII, such as a SS#, name, address or phone number, and is usually user-generated or read from a cookie.

Industry example (ecommerce and retail stores): What used to be up to a dozen different customer IDs, GUIDs and logins have been consolidated into a single customer identifier that is collected at multiple points in the experience:

• On owned websites (shopping cart, shipping address, e-receipts)
• On apps (joining device IDs with authentication data)
• On email retargeting (using joins between email address and customer ID, further joined to cookie IDs if a customer clicks through)
• In-store (joining device cookies with customer IDs and emails at the point of sale or at mobile coupon redemptions)
• Over time (anonymous data is retained at the cookie level until authentication occurs, at which point data is rewritten with the consolidated customer ID)

Device fingerprinting

Device fingerprinting is actually older than cookie-based tracking, dating to the days of server-log file web analytics in the 1990s. It uses a combination of IP address, browser configuration, user agent, Wi-Fi information — pretty much anything available in the document object model (DOM) or within the server "handshake" — to uniquely identify a single device. The principal advantage to device fingerprinting is that it does not rely on cookies, so private browsing, cookie-deletion or the rejection of third-party cookies do not prohibit continuous behavior tracking. Device fingerprinting is used as a more accurate replacement for cookie-based IDs, particularly for organizations without user authentications, email capture or logins. This method is becoming more important and sophisticated as cookie-blocking software becomes more widespread; Apple's recent announcement of additional restrictions on first-party cookie functionality is an example of where the cookie industry is heading.5

5 "Apple to ad industry: Tough luck, privacy comes first," CNet. https://www.cnet.com/news/apple-rejects-ad-industry-complaint-over-safari-privacy. Accessed 15 September 2015.

Industry example (media and entertainment): A video-on-demand As soon as a visitor enters any website included in any of these
service knows that members of a single household use the same login networks (and a single website can belong within more than one),
to access their services. In order to personalize the recommendations they can be deduplicated down to a single ID, and all their sessions
available to the user, device fingerprinting is used in conjunction with on multiple websites can be joined. These data environments are
look-alike segmentation to identify which devices should be targeted often referred to as “walled gardens” or “closed platforms” because
with “kids,” “family,” “drama,” or “action” recommendations, even this deduplicated ID is only natively available to the parent network/
though only the aggregate mix of content is associated with a single authentication ID.

Walled gardens
Cookie-aggregators emerged during the last decade as part of marketing targeting and audience aggregation, working with online properties to deploy a simple third-party JavaScript tag on their websites in return for better data visibility, competitive reporting and advertising targeting. Today, they have expanded to include a wide range of analytics, personalization and data-joining capabilities and, as a consequence, are often branded as Data Management Platforms or "DMPs." Many of these have been acquired by traditional database technology companies. About a dozen major vendors account for the major market share in this space, including Adobe Audience Manager (originally Demdex), Oracle® DMP (originally BlueKai), Salesforce DMP (originally Krux), MediaMath, Neustar®, Nielsen, Acxiom® and Google (through DoubleClick, Google Analytics, AdWords and YouTube).6 To this list should be added the social media platforms (Facebook, Twitter, Snapchat, LinkedIn) that have scaled user data capture not only on their own sites but very widely across the internet; online retail behemoths (principally Amazon and Walmart); and media conglomerates (e.g., Comcast/NBCU, Disney/ABC/ESPN, Time Warner, Turner, Hearst and Cox). Each of these operates, in effect, as its own walled-garden DMP, within which visitor-level segmentation can occur for website and marketing personalization purposes. A company wishing to take advantage of this rich segmentation and targeting data pool must subscribe to the DMP provider, be included in the network, or otherwise pay for these marketing services or visitor-level data.

Industry example (media/entertainment): A studio selling movie tickets wants to target individuals who have watched movie trailers on YouTube. But since YouTube is part of the Google "walled garden," that organization must subscribe to the Google Analytics 360 Suite in order to reach these individuals, or pay for a back-end Google data integration with their current DMP. At no point does Google provide individual-level data to the organization in such a way that it can be analyzed internally.

All three methodologies above complement each other and can be used together, making it possible today to map the complete customer journey to a degree of accuracy unavailable in the past. In doing so, organizations are now able to break through the single-session focus of traditional web analytics and optimize the customer experience holistically.

6 Forrester Wave™ "Data Management Platforms, Q2 2017" (June 1, 2017) lists eleven platforms, but did not evaluate Google, which released its Audience Center as part of its Analytics 360 Suite as a beta in 2016.

Taking the measure of your customer experience | 5


4. A customer-journey framework for digital analytics

As the digital world moves inexorably towards omnichannel experiences, session-based metrics must be supplanted by measures of the efficiency with which brands move visitors and customers along their experience. This will deliver a seamless cross-session and cross-device framework for digital analytics reporting, analysis and optimization.

To achieve this measurement, digital analytics teams should adopt the following road map:

Customer journey mapping
Customer journey maps exist as routine work products among CX (Customer Experience) teams in most organizations. But what many analytics teams have failed to do is translate these diagrams, descriptions and PDFs into quantitatively measurable metrics. For the digital analyst, this requires the creation of behavioral signatures and journey micro-conversion metrics.

Behavioral signatures are sets of distinct actions that define a customer's life cycle stage and their intent or reason for engaging in a particular digital experience. An organization must understand how to measure what these customer intents are and quantify its ability to successfully meet the expectation of the customer vs. that intent. The true measure of engagement, then, is based not necessarily on what happens in a session, but on what happens relative to one or many behavioral signatures.

In digital analytics, behavioral signatures are imprints within the clickstream data collected from web behavior. Below is an example of a typical journey map on an acquisition website:

Figure 1: Example digital customer journey for an eCommerce website. Over time (minutes or weeks), visitors move from Spurious (view and leave the process) or Land (homepage, landing pages, fed by marketing content and social media chatter) through Learn (calculators or tools, product detail pages, forms) and Customize (design your own, online chat) to Decide (re-assurers) and Buy (shopping cart, checkout, form submissions).

"Spurious" visitors are "one-and-done" – they come to the website, don't do anything indicating intent and never return. If they come to the site intentionally, they are considered as "Land." From there, the journey may take them through a "Learn" or "Customize" phase, finally and presumably "Decide" and "Buy." Behavioral signatures encompass those actions that define a visitor in each phase and can be defined in real time through JavaScript logic within web analytics tagging. The customer's existing life cycle stage is set within a cookie and modified according to each action taken. Note that this process can take place in the absence of any personal identifiers outside of the standard, anonymous cookie-based tracking, but if authentication is present, then customer data can be made available in real time and pushed into the web analytics solution.
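As an illustration, the cookie-based stage logic just described could be sketched as below. The stage names follow the eCommerce journey in the text; the signature rules, action names and function names are illustrative assumptions, not any particular vendor's tagging API.

```javascript
// Illustrative life cycle stages, ordered from least to most engaged.
const STAGES = ["Spurious", "Land", "Learn", "Decide", "Buy"];

// Hypothetical behavioral-signature rules: which tracked actions
// place a visitor in which stage.
const SIGNATURES = {
  Land:   ["homepage_view", "landing_page_view"],
  Learn:  ["product_detail_view", "internal_search", "calculator_use"],
  Decide: ["reassurer_view", "cart_add"],
  Buy:    ["checkout_complete", "form_submit"],
};

// Given the stage currently stored in the visitor's cookie and the
// action just taken, return the stage the cookie should now hold.
// A visitor only moves forward, never back to an earlier stage.
function nextStage(currentStage, action) {
  let actionStage = "Spurious";
  for (const [stage, actions] of Object.entries(SIGNATURES)) {
    if (actions.includes(action)) actionStage = stage;
  }
  return STAGES.indexOf(actionStage) > STAGES.indexOf(currentStage)
    ? actionStage
    : currentStage;
}

// Replay a clickstream to find the visitor's final stage.
function classifyVisitor(actions) {
  return actions.reduce(nextStage, "Spurious");
}
```

For example, `classifyVisitor(["homepage_view", "product_detail_view"])` yields "Learn", and a later "homepage_view" would not demote a "Buy" visitor.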

Figure 2: Example customer journey for an authenticated customer experience. Over time (months), customers move through Familiarize (online registration, onboarding content), Develop (app downloads, alerts and notifications, at-a-glance views), Maintain (self-service tools, order summaries, auto pay, bill pay) and Expand (entry back to prospect journeys), with an Acute needs branch (order or service issues; help, forums, email and chat engagement; support monitoring; call center).

Non-digital data related to customer experience (e.g., call center interactions, transactional history or voice-of-customer), when combined with the behavioral signature, can provide additional refinement and color to the above website journey map, signaling churn, identifying acute issues and encouraging upsell.

As a final example, content-oriented websites (for example, publishing or video-on-demand) have an engagement journey lifespan that is very long (similar to existing customer, self-service journeys), but often lack the customer data-enrichment opportunities created by account IDs. Such a journey might look like this:

Figure 3: Example visitor journey for a content-oriented, non-conversion website. Over time (months), visitors move from Spurious (view and leave the process) or Land (SEO or social lands, homepage, advertising) through Familiarize (navigation, content consumption, video consumption) to Engage (bookmark, chat, social share) and Dedicate (consistent visitation, posting).



In cases such as these, web analytics-based journey measurement can be used, but data quality tends to deteriorate over time. Instead, behavioral signatures are best defined through a clickstream data-lake or DMP that is integrated with external segmentation or targeting feeds in order to assist with visitor deduplication. Such a data-lake can also be used for onsite personalization, advertising placement and marketing segmentation.

The design and implementation of behavioral signatures sets the stage for micro-conversion metrics, whose purpose is to visualize the success of the digital experience in moving visitors and customers through their journey. These can be straightforward percentages as follows:

Figure 4: Journey-based micro-conversion metrics for an acquisition or eCommerce website. Each journey stage is expressed as a percentage of the stage before it, for example: Land as a % of all traffic; Learn as a % of Land; Decide as a % of Learn; and Buy (customers and leads) as a % of Decide.

Figure 5: Journey-based micro-conversion metrics for a My Account or self-service website. Examples include Familiarize accounts as a % of all new accounts; Develop accounts as a % of Familiarize accounts; and Maintain accounts as a % of Develop accounts, alongside operational measures such as contact rate post digital interaction, issue resolution rate, frequency of product usage, and add-and-retain rates.

These micro-conversion metrics are only the framework for experience optimization: their tactical and operational use lies in understanding what led a customer through an experience conversion (or, conversely, why they failed to do so).
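As a sketch of how such stage-to-stage percentages might be computed, the following assumes a simple map of visitor counts per stage; the input shape and function name are illustrative, not part of any analytics product.

```javascript
// Count how many visitors reached each journey stage, then express
// each stage-to-stage transition as a micro-conversion percentage.
// Stage names follow the eCommerce journey (Land, Learn, Decide, Buy).
function microConversions(stageCounts) {
  const funnel = ["Land", "Learn", "Decide", "Buy"];
  const metrics = {};
  for (let i = 1; i < funnel.length; i++) {
    const from = funnel[i - 1];
    const to = funnel[i];
    metrics[`${from}->${to} %`] =
      stageCounts[from] > 0
        ? Math.round((100 * stageCounts[to]) / stageCounts[from])
        : 0;
  }
  return metrics;
}

// Illustrative counts: 1,000 landed, 400 learned, 120 decided, 42 bought.
// -> { "Land->Learn %": 40, "Learn->Decide %": 30, "Decide->Buy %": 35 }
const example = microConversions({ Land: 1000, Learn: 400, Decide: 120, Buy: 42 });
```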

Milestones and KPIs

So far we have avoided the term "KPI," because a metric by itself – while useful tactically – isn't a key performance indicator. What makes it "key" is whether a particular behavior can be analytically shown to improve customer satisfaction, increase sales or otherwise get customers or visitors to do what they want to do online, significantly, statistically and demonstrably. Actions and metrics must be analyzed to identify those that stand out as significantly driving customers along their life cycle journey; these actions and metrics then become milestones and KPIs.

For example, suppose I have an eCommerce website with a journey as described in Figure 1 above. My behavioral signature for the "Learn" phase might consist of the following elements. A customer is in the "Learn" phase in any of the following situations:

• They click a homepage banner
• They use internal search directly
• They click through from an offer email
• They browse by category using filters
• They use a "Compare Products" tool

Pulling micro-conversion metrics for each use case might show the following results:

Illustrative micro-conversion metrics tied to potential milestone behaviors


Use case | "Learn" to "Decide" conversion %
Homepage banner | 14%
Internal search | 13%
Offer email click-through | 15%
Browse by category | 10%
Compare products tool | 35%

Statistical methods – either built into the web analytics tool or done in-house – can reveal whether these percentages are significantly different (from simple t-tests to more sophisticated GLM, decision tree or binary logistic models that control for covariate effects). In this case, the compare products tool can be considered a "milestone" for customers in the "Learn" phase, and use of that tool could become a KPI. Experience improvements can thus be focused on getting potential customers to that section of the experience.

The above example also illustrates that customer journey mapping and behavioral signatures are best deployed in an iterative manner, much like customer behavioral segmentation. As improvements are made to online assets, or marketing channels become optimized, customer journeys and behavioral segments can change. Cluster or principal components analysis could show that the "compare products" behavior displays a degree of independence from other website use cases and could constitute its own step in the customer journey.
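One simple way to test whether two such percentages differ significantly is a two-proportion z-test. The sketch below uses counts consistent with the table above, but the sample sizes are assumptions, since the source reports only percentages.

```javascript
// Two-proportion z-test: is conversion rate p1 (x1/n1) significantly
// different from p2 (x2/n2)? Returns the z statistic; |z| > 1.96
// corresponds to significance at the 95% confidence level.
function twoProportionZ(x1, n1, x2, n2) {
  const p1 = x1 / n1;
  const p2 = x2 / n2;
  const pooled = (x1 + x2) / (n1 + n2);
  const se = Math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2));
  return (p1 - p2) / se;
}

// Illustrative: 35% (350 of 1,000) for compare-products users vs.
// 14% (280 of 2,000) for homepage-banner clickers. z is about 13.3,
// far beyond 1.96, so the difference is significant.
const z = twoProportionZ(350, 1000, 280, 2000);
```

In practice the covariate-controlling models mentioned in the text (GLM, logistic regression) are needed when the same visitors appear in several use cases.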

The customer journey refinement process is an iterative cycle: customer journey mapping, then behavioral signatures, then data analysis, then milestones and KPIs, feeding back into journey mapping.



Milestone-based data architecture

Once milestones have been identified within the digital customer life cycle journey, the use of sessions as the basis for digital behavioral measurement and optimization can disappear. Instead, variables and data can be reorganized around milestones that are agnostic of calendar time or device. Traditionally, web data has been organized like this:

Illustrative session-based data architecture

Customer | Session ID | Date | Performed search | Viewed product detail page | Item viewed | Added item to cart | Item added | Purchase | Item purchased
Mary Smith | 12345 | 9/13/2017 | 1 | 0 | – | 0 | – | 0 | –
Mary Smith | 23456 | 9/14/2017 | 1 | 1 | SKU 123 | 0 | – | 0 | –
Mary Smith | 34567 | 9/14/2017 | 0 | 1 | SKU 123 | 1 | SKU 123 | 0 | –
Mary Smith | 45678 | 9/15/2017 | 0 | 1 | SKU 123 | 1 | SKU 123 | 1 | SKU 123

Operational metrics: searches/session; sessions/customer; purchases/session

This data architecture places undue emphasis on the session and obscures the journey-to-purchase undertaken by the customer. Instead, the two most important data points – the customer and the item – should be used as the organizing principle:

Illustrative example based on customer journey

Customer | Item | Viewed product detail page | Time stamp of view | Added item to cart | Time to cart add | Purchase | Time to purchase | Number of touchpoints
Mary Smith | SKU 123 | 1 | 9/13/2017 14:24:13 | 2 | 1:24:18 | 1 | 2:17:36 | 4

Operational metrics: time-to-purchase; cart adds/purchases; touchpoints/purchases

Organizing data in this manner also allows for the inclusion of additional touchpoints (social, email, display ad impressions, etc.), or even of offline data (call center, point-of-sale). Other table designs could be used for different data-activation use cases: marketing campaign codes, product category, individual transaction ID. The point is that the framework for digital measurement represented in the first table is traditional and session-based; in the second table, the framework is based on the customer journey.
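A journey-based row of this kind can be derived from event-level data along the following lines. The event shape and field names are illustrative assumptions, not a prescribed schema.

```javascript
// Pivot session-level event rows into one journey row per
// customer-item pair, with elapsed times measured from the first
// product detail view. Events look like {customer, item, type, ts},
// where ts is a millisecond timestamp.
function toJourneyRows(events) {
  const rows = new Map();
  for (const e of [...events].sort((a, b) => a.ts - b.ts)) {
    const key = `${e.customer}|${e.item}`;
    if (!rows.has(key)) {
      rows.set(key, { customer: e.customer, item: e.item,
                      firstView: null, timeToCartMs: null,
                      timeToPurchaseMs: null, touchpoints: 0 });
    }
    const r = rows.get(key);
    r.touchpoints += 1;
    if (e.type === "view" && r.firstView === null) r.firstView = e.ts;
    if (e.type === "cartAdd" && r.timeToCartMs === null && r.firstView !== null)
      r.timeToCartMs = e.ts - r.firstView;
    if (e.type === "purchase" && r.timeToPurchaseMs === null && r.firstView !== null)
      r.timeToPurchaseMs = e.ts - r.firstView;
  }
  return [...rows.values()];
}
```

Because the pivot keys on customer and item rather than session ID, touchpoints from any device or channel can be folded into the same row.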

5. The future of digital analytics

With increased customer omnichannel engagement and the real improvements in technology to link data, the session is no longer the optimal organizing principle for the collection, storage and activation of digital behavioral data. Rather, mature organizations are moving toward the customer journey as this organizing principle, replacing the session with an over-time, multiple-touchpoint view organized by life cycle stages and milestones, and collected through behavioral signatures. What are the longer-term implications of this trend for digital analytics? We predict the following:

• Customer journey and experience mapping will increasingly become a data-led exercise informed as much by data scientists and analytics professionals as by experience strategists.
The best target-state experience designs will be informed by deep insights derived from measuring behavioral signatures, and the best analytics implementations will be designed from understanding a journey map – with a sound test strategy built in to make sure both the analytics and experience people have it right.

• Just as web analytics expanded to become digital analytics, digital analytics teams will blend with customer analytics, customer insights, or customer experience teams.
Between 2008 and 2011, different emerging specializations such as social analytics, mobile analytics, web analytics and – to some degree – marketing analytics became blurred under the heading of "digital analytics."7 Today, we have customer analytics, digital analytics, finance analytics, supply chain analytics and text analytics; as databases become more comprehensive, it is likely that the line between some of these specializations will also blur.

• Web analytics implementations will become more focused on those behavioral dimensions that indicate key segments and steps in the customer journey.
Implementations of web analytics tools consisting of hundreds of custom variables will become outdated. Rather, web-specific behavioral dimensions will become standardized in a CMS-fed data layer, becoming one of many customer dimensions of equal usefulness for segmentation and targeting.

• Web analytics tools will need to adapt to compete, either by expanding to become audience DMPs or by narrowing to single-channel, granular data collection platforms that can be integrated with existing DMP solutions.
Web analytics tools that do not enable integration and a view of the full customer life cycle do not capture the insights these integrations enable. As the industry evolves away from session-based customer views, tools will need to follow suit to stay competitive.

• Cloud-based data repositories will slowly be brought in-house in order to integrate PII (Personally Identifiable Information), PHI (Protected Health Information), financial, legal, or transactional data.
As powerful as they are and will become in the next year or two, third-party cloud-based DMPs are stymied because they cannot store private information. Secure as they may be (or may become), most large organizations with large repositories of customer information will be legally (or socially) prevented from sending this to third-party DMPs. As the analytics power of full customer journey analytics becomes widely exploited, the most mature organizations will need to bring as much data as possible in-house in order to differentiate themselves analytically. This means that organizations will demand full feeds – including visitor IDs – from their third-party DMPs. Walled gardens could open their doors, even if they charge a hefty price at their gates.

• IoT (Internet of Things), in-store analytics, OBD, voice and mobile geolocational data will be the new frontiers in customer analytics, eventually to be integrated with the above.
Each new device, sensor and interaction point becomes another data source to better understand behavioral signatures and the context that best drives outcomes along the experience map. Session data is clearly not relevant to the connected home, car or quantified self, so it is time to remove it from our other digital experience data sources.

Forward-thinking organizations have begun to invest valuable time and resources in journey mapping and planning the customer experience across omnichannel interactions. Many also have mature digital analytics implementations that are divorced from this journey construct. The time has come to bring these worlds together and retire the session in favor of the customer experience.

Vivat Experientia!

7 As an example, the Web Analytics Association officially changed its name to the Digital Analytics Association in 2011.

Contact

Chad Richeson, Principal, Advisory, PI
+1 949 437 0505 | chad.richeson@ey.com

Chris Gianutsos, Executive Director, Advisory, PI
+1 212 773 4402 | chris.gianutsos@ey.com

Thomas Buchte, Manager, Advisory, PI
+1 310 592 2110 | thomas.buchte@ey.com

Jason Bennett, Senior, Advisory, PI
+1 206 654 7472 | jason.bennett@ey.com

EY | Assurance | Tax | Transactions | Advisory

About EY
EY is a global leader in assurance, tax, transaction and advisory services.
The insights and quality services we deliver help build trust and confidence
in the capital markets and in economies the world over. We develop
outstanding leaders who team to deliver on our promises to all of our
stakeholders. In so doing, we play a critical role in building a better working
world for our people, for our clients and for our communities.

EY refers to the global organization, and may refer to one or more, of the
member firms of Ernst & Young Global Limited, each of which is a separate
legal entity. Ernst & Young Global Limited, a UK company limited by
guarantee, does not provide services to clients. For more information about
our organization, please visit ey.com.

Ernst & Young LLP is a client-serving member firm of Ernst & Young Global Limited operating in the US.

© 2018 Ernst & Young LLP.
All Rights Reserved.

US SCORE no. 00876-181US
1710-2442227
ED None

This material has been prepared for general informational purposes only and is not intended to be
relied upon as accounting, tax, or other professional advice. Please refer to your advisors for specific
advice.

ey.com
Advanced Web Metrics Whitepaper

Understanding Web Analytics Accuracy

by Brian Clifton (PhD)
Version 2.0, March 2010
Understanding Web Analytics Accuracy

Preface From The Author

When it comes to benchmarking the performance of your web site, on-site web analytics measurement is critical. But this information is only accurate if you avoid common errors associated with collecting the data – especially comparing numbers from different sources. This white paper is aimed at web managers, digital marketers and webmasters who want to maximise the accuracy of their data.

Originally published in February 2008, this second edition has been completely revised and updated for 2010.

Thank you for downloading this free whitepaper. Documents such as these represent the culmination of a huge effort on my part to research, write and update the contents. My hope is to educate and inform so that you become comfortable with your web visitor data, mitigate error bars, and go on to build your analysis hypothesis on solid foundations.

I would greatly appreciate your feedback – either a tweet, blog comment or rating on this whitepaper's companion blog site would be great.

Brian Clifton

Add your comments on the blog – Measuring Success
Follow my interests and thoughts @BrianClifton
Join your peers on the LinkedIn Group

Copyright Statement: All content © 2010 by Brian Clifton - Copyright holder is licensing this under the Creative Commons License,
Attribution-Noncommercial-No Derivative Works 3.0 Unported, http://creativecommons.org/licenses/by-nc-nd/3.0/. (This means you can post
this document on your site and share it freely with your friends, but not resell it or use as an incentive for action.)

Web Analytics Accuracy Advanced-Web-Metrics.com Page 2 of 20


© Brian Clifton

Table of Contents

Introduction .... 4
How Web Sites Collect Visitor Data .... 4
Page Tags and Logfiles .... 4
Cookies in Web Analytics .... 6
Understanding Web Analytics Data Accuracy .... 7
Issues Affecting Visitor Data Accuracy for Logfiles .... 7
Dynamically Assigned IP Addresses .... 7
Client-Side Cached Pages .... 8
Counting Robots .... 8
Issues Affecting Visitor Data From Page Tags .... 8
Setup Errors Causing Missed Tags .... 8
JavaScript Errors Halt Page Loading .... 9
Firewalls Block Page Tags .... 9
Logfiles "See" Mobile Users .... 9
Issues Affecting Visitor Data When Using Cookies .... 9
Visitors Rejecting or Deleting Cookies .... 9
Users Owning and Sharing Multiple Computers .... 10
Latency Leaves Room for Inaccuracy .... 11
Offline Visits Skewing Data Collection .... 11
Comparing Data From Different Vendors .... 12
First-Party Versus Third-Party Cookies .... 12
Page tags: Placement Considerations .... 12
Did You Tag Everything? .... 12
Pageviews: A Visit or a Visitor? .... 12
Cookies Timeouts .... 13
Page-tag Code Hijacking .... 13
Data Sampling .... 13
PDF files: A Special Consideration .... 13
E-commerce: Negative Transactions .... 13
Filters and Settings: Potential Obstacles .... 13
Time Differences .... 14
Process Frequency: Understanding glitches .... 14
Goal Conversions versus Pageviews .... 14
Why PPC Vendor Numbers Do Not Match .... 15
Tracking URLs: Missing Paid Search Click-throughs .... 15
Slow Page Load Times .... 15
Clicks and Visits: Understanding the Difference .... 16
PPC Account Adjustments .... 16
Keyword Matching: Bid Term versus Search Term .... 16
Google AdWords Import Delay .... 16
Losing Tracking URLs Through Redirects .... 16
Data Misinterpretation .... 17
Why Counting Uniques Is Meaningless .... 18
Ten Recommendations For Enhancing Accuracy .... 18
Summary .... 19
Acknowledgements .... 19


Introduction

In the past decade, the Internet has transformed marketing, but anyone expecting to increase their revenue and profitability using the web needs to get their facts straight with respect to web traffic. Of course, the web is a great medium to market and sell products and services. But if you don't understand the behaviour of your web site visitors in sufficient detail, your business is going nowhere.

So it is no great surprise that the business of web analytics has grown in tandem with business use of the Internet. Put simply, web analytics are tools and methodologies used to enable organisations to track the number of people who view their site and then use this to measure the success of their online strategy.

The danger is, too many businesses take web analytics reports at face value, and this raises the issue of accuracy. After all, it isn't difficult to get the numbers. However, the harsh truth is that web analytics data can never be 100 percent accurate, and even measuring the error bars is difficult. So what's the point?

First, the good news. Error bars remain pretty constant on a weekly, or even a monthly, basis. Even comparing year-on-year behaviour can be safe as long as there are no dramatic changes in technology or end-user behaviour. As long as you use the same measurement "yard stick", visitor number trends will be accurate.

Here are some examples of accurate metrics:

• 30 percent of my web site traffic came via search
• 50 percent of visitors viewed page X.html
• We increased conversions by 20 percent last week
• Pageviews at our site increased by 10 percent during March

With these types of metrics, marketers and webmasters can determine the direct impact of specific marketing campaigns. The level of detail is critical. For example, you can determine whether an increase in pay-per-click advertising spend for a set of keywords on a single search engine increased the return on investment during that time period. So, as long as you can minimise inaccuracies, web analytics tools are effective for measuring visitor traffic to your online business. The remainder of this document examines, in detail, how inaccuracies arise and how organisations can counter them.

How Web Sites Collect Visitor Data

Page Tags and Logfiles

There are two common techniques for collecting web visitor data – page tags and logfiles.

Page tags collect data via the visitor's web browser and send information to remote data-collection servers. The analytics customer views reports from the remote server (see Figure 1). This information is usually captured by JavaScript code (known as tags or beacons) placed on each page of your site. Some vendors also add multiple custom tags to collect additional data. This technique is known as client-side data collection and is used mostly by outsourced, Software as a Service (SaaS) vendor solutions.
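A minimal sketch of such a page tag is shown below. The collection endpoint (stats.example.com) is hypothetical, and real vendor tags do far more (cookies, consent handling, batching); this only illustrates the beacon mechanism the text describes.

```javascript
// Minimal page-tag sketch: gather basic page data and send it to a
// (hypothetical) collection server by requesting a 1x1 image.
function buildBeaconUrl(endpoint, data) {
  const params = new URLSearchParams(data); // URL-encodes each field
  return `${endpoint}?${params.toString()}`;
}

// Browser-only: called once per pageview by a snippet on every page.
function firePageTag() {
  const url = buildBeaconUrl("https://stats.example.com/collect", {
    page: document.location.pathname,
    referrer: document.referrer,
    title: document.title,
  });
  new Image().src = url; // browser issues a GET; the response is ignored
}
```

The image-request trick is why a tag can report data to a different domain than the page it sits on: browsers will fetch an image from anywhere.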


Logfiles refer to data collected by your web server independently of a visitor's browser: the web server logs its activity to a text file that is usually local. The analytics customer views reports from the local server, as shown in Figure 2. This technique, known as server-side data collection, captures all requests made to your web server, including pages, images, and PDFs, and is most frequently used by stand-alone licensed software vendors.

In the past, the easy availability of web server logfiles made this technique the one most frequently adopted for understanding the behaviour of visitors to your site. In fact, most Internet service providers (ISPs) supply a freeware log analyzer with their web-hosting accounts (Analog, Webalizer, and AWStats are some examples). Although this is probably the most common way people first come in contact with web analytics, such freeware tools are too basic when it comes to measuring visitor behaviour and are not considered further in this book.

In recent years, page tags have become more popular as the method for collecting visitor data. Not only is the implementation of page tags easier from a technical point of view, but data-management requirements are significantly reduced because the data is collected and processed by external SaaS servers (your vendor), saving website owners the expense and maintenance of running licensed software to capture, store, and archive information.

Note that both techniques, when considered in isolation, have their limitations. Table 1 summarizes the differences. A common myth is that page tags are technically superior to other methods, but as Table 1 shows, that depends on what you are looking at. By combining both techniques, however, the advantages of one counter the disadvantages of the other. This is known as a hybrid method and some vendors can provide this.

Table 1 – Page tag versus logfile data collection

Page tagging advantages:
• Breaks through proxy and caching servers – provides more accurate session tracking.
• Tracks client-side events – e.g., JavaScript, Flash, Web 2.0 (Ajax).
• Captures client-side e-commerce data – server-side access can be problematic.
• Collects and processes visitor data in nearly real time.
• Allows the vendor to perform program updates for you.
• Allows the vendor to perform data storage and archiving for you.

Page tagging disadvantages:
• Setup errors lead to data loss – if you make a mistake with your tags, data is lost and you cannot go back and reanalyze.
• Firewalls can mangle or restrict tags.
• Cannot track bandwidth or completed downloads – tags are set when the page or file is requested, not when the download is complete.
• Cannot track search engine spiders – robots ignore page tags.

Logfile analysis advantages:
• Historical data can be reprocessed easily.
• No firewall issues to worry about.
• Can track bandwidth and completed downloads – and can differentiate between completed and partial downloads.
• Tracks search engine spiders and robots by default.
• Tracks legacy mobile visitors by default.

Logfile analysis disadvantages:
• Proxy and caching inaccuracies – if a page is cached, no record is logged on your web server.
• No event tracking – e.g., no JavaScript, Flash, Web 2.0 (Ajax) tracking.
• Requires your own team to perform program updates.
• Requires your own team to perform data storage and archiving.
• Robots multiply visit counts.


Other Data-Collection Methods

Although logfile analysis and page tagging are by far the most widely used methods for collecting web visitor data, they are not the only methods. Network data-collection devices (packet sniffers) gather web traffic data from routers into black-box appliances. Another technique is to use a web server application programming interface (API) or loadable module (also known as a plug-in, though this is not strictly correct terminology). These are programs that extend the capabilities of the web server—for example, enhancing or extending the fields that are logged. Typically, the collected data is then streamed to a reporting server in real time.

Cookies in Web Analytics

Page tag solutions track visitors by using cookies. Cookies are small text messages that a web server transmits to a web browser so that it can keep track of the user's activity on a specific website. The visitor's browser stores the cookie information on the local hard drive as name–value pairs. Persistent cookies are those that are still available when the browser is closed and later reopened. Conversely, session cookies last only for the duration of a visitor's session (visit) to your site.

For web analytics, the main purpose of cookies is to identify users for later use—most often with an anonymous visitor ID. Among many things, cookies can be used to determine how many first-time or repeat visitors a site has received, how many times a visitor returns each period, and how much time passes between visits. Web analytics aside, web servers can also use cookie information to present personalized web pages. A returning customer might see a different page than the one a first-time visitor would view, such as a "welcome back" message to give them a more individual experience, or an auto-login for a returning subscriber.

The following are some cookie facts:

• Cookies are small text files (no larger than 4 KB), stored locally, that are associated with visited website domains.
• Cookie information can be viewed by users of your computer, using Notepad or a text editor application.
• There are two types of cookies: first party and third party.
• A first-party cookie is one created by the website domain. A visitor requests it directly by typing the URL into their browser or by following a link.
• A third-party cookie is one that operates in the background and is usually associated with advertisements or embedded content that is delivered by a third-party domain not directly requested by the visitor.
• For first-party cookies, only the website domain setting the cookie information can retrieve the data. This is a security feature built into all web browsers.
• For third-party cookies, the website domain setting the cookie can also list other domains allowed to view this information. The user is not involved in the transfer of third-party cookie information.
• Cookies are not malicious and can't harm your computer. They can be deleted by the user at any time.
• A maximum of 50 cookies are allowed per domain for the latest versions of IE8 and Firefox 3. Other browsers may vary (Opera 9 currently has a limit of 30; Safari and Google Chrome have no limit on the number of cookies per domain).
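The name–value mechanics described above can be illustrated with Python's standard http.cookies module (the cookie name, value, and domain below are hypothetical, chosen only for this sketch):

```python
from http.cookies import SimpleCookie

# Parse a Cookie header exactly as a browser would send it back to the server.
cookie = SimpleCookie("visitor_id=abc123; session=xyz789")
print(cookie["visitor_id"].value)  # abc123

# Build a Set-Cookie header for a persistent first-party cookie:
# Max-Age makes it persistent; omitting it would yield a session cookie.
out = SimpleCookie()
out["visitor_id"] = "abc123"
out["visitor_id"]["max-age"] = 60 * 60 * 24 * 365  # keep for one year
out["visitor_id"]["domain"] = "example.com"        # the site's own (first-party) domain
print(out["visitor_id"].OutputString())
```

Only the domain that set the first-party cookie can read it back; the browser enforces this when deciding which name–value pairs to include in each request.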
Web Analytics Accuracy Advanced-Web-Metrics.com Page 6 of 20
! Brian Clifton
Understanding Web Analytics Accuracy
Understanding Web Analytics Data Accuracy

When it comes to benchmarking the performance of your website, web analytics is critical. However, this information is accurate only if you avoid common errors associated with collecting the data—especially comparing numbers from different sources. Unfortunately, too many businesses take web analytics reports at face value. After all, it isn't difficult to get the numbers. The harsh truth is that web analytics data can never be 100 percent accurate, and even measuring the error bars can be difficult.

So what's the point? Despite the pitfalls, error bars remain relatively constant on a weekly, or even a monthly, basis. Even comparing year-by-year behaviour can be safe, as long as there are no dramatic changes in technology or end-user behaviour. As long as you use the same yardstick, visitor number trends will be accurate. For example, web analytics data may reveal patterns like the following:

• Thirty percent of site traffic came from search engines.
• Fifteen percent of site revenue was generated by product page x.html.
• We increased subscription conversions from our email campaigns by 20 percent last week.
• Bounce rate decreased 10 percent for our category pages during March.

With these types of metrics, marketers and webmasters can determine the direct impact of specific marketing campaigns. The level of detail is critical. For example, you can determine if an increase in pay-per-click advertising spending—for a set of keywords on a single search engine—increased the return on investment during that time period. As long as you can minimize inaccuracies, web analytics tools are effective for measuring visitor traffic to your online business.

Conflicting Data Points Are Common

A UK survey of 800 organizations revealed that almost two-thirds (63 percent) of respondents say they experience conflicting information from different sources of online measurement data ("Online Measurement and Strategy Report 2009," Econsultancy.com, June 2009).

Next, I'll discuss in detail why such inaccuracies arise, so you can put this information into perspective. The aim is for you to arrive at an acceptable level of accuracy with respect to your analytics data. Recall from Table 1 that there are two main methods for collecting web visitor data—logfiles and page tags—and both have limitations.

Issues Affecting Visitor Data Accuracy for Logfiles

Logfile tracking is usually set up by default on web servers. Perhaps because of this, system administrators rarely consider any further implications when it comes to tracking.

Dynamically Assigned IP Addresses

Generally, a logfile solution tracks visitor sessions by attributing all hits from the same IP address and web browser signature to one person. This becomes a problem when ISPs assign different IP addresses throughout the session. A U.S.-based comScore study (http://www.comscore.com/Press_Events/Presentations_Whitepaper
s/2007/Cookie_Deletion_Whitepaper) showed that a typical home PC averages 10.5 different IP addresses per month. Those visits will be counted as 10 unique visitors by a logfile analyzer. This issue is becoming more severe, because most web users have identical web browser signatures (currently Internet Explorer). As a result, visitor numbers are often vastly overcounted. This limitation can be overcome with the use of cookies.

Client-Side Cached Pages

Client-side caching means a previously visited page is stored on a visitor's computer. In this case, visiting the same page again results in that page being served locally from the visitor's computer, and therefore the visit is not recorded at the web server.

Server-side caching can come from any web accelerator technology that caches a copy of a website and serves it from their servers to speed up delivery. This means that all subsequent site requests come from the cache and not from the site itself, leading to a loss in tracking. Today, most of the Web is in some way cached to improve performance. For example, see Wikipedia's cache description at http://en.wikipedia.org/wiki/Cache.

Counting Robots

Robots, also known as spiders or web crawlers, are most often used by search engines to fetch and index pages. However, other robots exist that check server performance—uptime, download speed, and so on—as well as those used for page scraping, including price comparison, e-mail harvesters, competitive research, and so on. These affect web analytics because a logfile solution will also show all data for robot activity on your website, even though robots are not real visitors.

When counting visitor numbers, robots can make up a significant proportion of your pageview traffic. Unfortunately, they are difficult to filter out completely because thousands of home-grown and unnamed robots exist. For this reason, a logfile analyzer solution is likely to overcount visitor numbers, and in most cases this can be dramatic.

Issues Affecting Visitor Data From Page Tags

Deploying a page tag on every single page is a process that can be automated in many cases. However, for larger sites, 100 percent correct deployment is rarely achieved. Perhaps because the page tag is hidden to the human eye, or because there is so much other data available, these errors often go unnoticed for long periods. Having a full deployment is crucial to the accuracy and validity of data collected by this method.

Setup Errors Causing Missed Tags

The most frequent error by far observed for page tagging solutions comes from their setup. Unlike web servers, which are configured to log everything delivered by default, a page tag solution requires the webmaster to add the tracking code to each page. Even with an automated content management system, pages can and do get missed.

In fact, evidence from analysts at Maxamine (http://www.maxamine.com)—now part of Accenture Marketing Sciences—who used their automatic page auditing tool has shown that some sites claiming that all pages are tagged can actually have as many as 20 percent of pages missing the page tag—something the webmaster was completely unaware of. In one case, a corporate business-to-business site was found to have 70 percent of its pages missing tags. Missing tags equals no data for those pageviews.
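The two logfile pitfalls above—IP-plus-browser-signature sessionization and robot traffic—can be sketched roughly as follows. The log lines and the robot list are hypothetical and intentionally tiny; real analyzers rely on far larger signature databases, and unnamed robots still slip through:

```python
import re

# Minimal parser for Apache/NCSA combined log format (illustrative only).
LOG = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "[^"]*" \d+ \d+ "[^"]*" "(?P<ua>[^"]*)"'
)

# Deliberately incomplete robot list; thousands of home-grown robots exist.
ROBOTS = ("googlebot", "bingbot", "spider", "crawler")

def count_unique_visitors(log_lines):
    """Attribute hits to a visitor by (IP address, browser signature),
    as a typical logfile analyzer does, skipping known robots."""
    visitors = set()
    for line in log_lines:
        m = LOG.match(line)
        if m is None:
            continue  # malformed line
        ua = m.group("ua")
        if any(bot in ua.lower() for bot in ROBOTS):
            continue  # known robot; unlisted ones are still counted
        visitors.add((m.group("ip"), ua))
    return len(visitors)

lines = [
    '1.2.3.4 - - [01/Jan/2010:10:00:00 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
    '1.2.3.5 - - [01/Jan/2010:10:05:00 +0000] "GET /a HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
    '9.9.9.9 - - [01/Jan/2010:10:06:00 +0000] "GET / HTTP/1.1" 200 512 "-" "Googlebot/2.1"',
]
# The first two hits may be one person whose ISP rotated the IP mid-session,
# yet they count as two visitors; the robot hit is dropped.
print(count_unique_visitors(lines))  # 2
```

This is exactly why a rotated IP address inflates the visitor count: the key changes even though the person did not.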
JavaScript Errors Halt Page Loading

Page tags work well, provided that JavaScript is enabled on the visitor's browser. Fortunately, only about 1 to 3 percent of Internet users have disabled JavaScript on their browsers, as shown in Figure 3. However, the inconsistent use of JavaScript code on web pages can cause a bigger problem: any errors in other JavaScript on the page will immediately halt the browser scripting engine at that point, so a page tag placed below it will not execute.

Figure 3: Percentage of Internet users with JavaScript-disabled browsers. Source: 1,000,000,000 visits across multiple industry web properties using IndexTools (www.visualrevenue.com/blog—Dennis R. Mortensen)

Firewalls Block Page Tags

Corporate and personal firewalls can prevent page tag solutions from sending data to collecting servers. In addition, firewalls can also be set up to reject or delete cookies automatically. Once again, the effect on visitor data can be significant. Some web analytics vendors can revert to using the visitor's IP address for tracking in these instances, but mixing methods is not recommended. As discussed previously in "Issues Affecting Visitor Data Accuracy for Logfiles" (comScore report), using visitor IP addresses is far less accurate than simply not counting such visitors. It is therefore better to be consistent with the processing of data.

Logfiles "See" Mobile Users

A mobile web audience study by comScore back in January 2007 (www.comscore.com/press/release.asp?press=1432) showed that in the United States, 30 million (or 19%) of the 159 million U.S. Internet users accessed the Internet from a mobile device. At that time, the vast majority of mobile phones did not understand JavaScript or cookies, and hence only logfile tools were able to track visitors who browsed using their mobile phones.

However, thanks mainly to the phenomenal success of the iPhone, mobile visitors on your website can now be tracked with page tag web analytics, because the browser software is very similar to that found on regular laptops and PCs—that is, both JavaScript and cookies are supported.

Issues Affecting Visitor Data When Using Cookies

Cookies are a very simple, well-established way of tracking visitors. However, their simplicity and transparency (any user can remove them) present issues in themselves. The debate over whether to use cookies remains a hot topic of conversation in web analytics circles.

Visitors Rejecting or Deleting Cookies

Cookie information is vital for web analytics because it identifies visitors, their referring source, and subsequent pageview data. The
current best practice is for vendors to process first-party cookies only. This is because visitors often view third-party cookies as infringing on their privacy, opaquely transferring their information to third parties without explicit consent. Therefore, many anti-spyware programs and firewalls exist to block third-party cookies automatically. It is also easy to do this within the browser itself. By contrast, anecdotal evidence shows that first-party cookies are accepted by more than 95 percent of visitors.

Visitors are also becoming savvier and often delete cookies. Independent surveys conducted by Belden Associates (2004), JupiterResearch (2005), Nielsen//NetRatings (2005) and comScore (2007) concluded that cookies are deleted by at least 30 percent of Internet users in a month.

Users Owning and Sharing Multiple Computers

User behaviour has a dramatic effect on the accuracy of information gathered through cookies. Consider the following scenarios:

Same user, multiple computers
• Today, people access the Internet in any number of ways—from work, home, or public places such as Internet cafes. One person working from three different machines results in three cookie settings, and all current web analytics solutions will count each of these anonymous user sessions as unique.

Different users, same computer
• People share their computers all the time, particularly with their families, and, as a result, cookies are shared too (unless you log off or switch off your computer each time it is used by a different person). In some instances, cookies are deleted deliberately. For example, Internet cafes are set up to do this automatically at the end of each session. So even if a visitor uses that cafe regularly and works from the same machine, a web analytics solution will "see" them as a different and new visitor every time.

Correcting Data for Cookie Deletion and Rejection

Calculating a correction factor to account for your visitors either deleting or rejecting your web analytics cookies is quite straightforward. All you need is a website that requires a user login. That way, you can count the number of unique login IDs and divide it by the number of unique users your web analytics tool reports. The result is a correction factor that can be applied to subsequent data (number of unique visitors, number of new visitors, or number of returning visitors).

Having a website that requires a user login is, thankfully in my view, quite rare, because people wish to access information freely and as easily as possible. So, although the correction-factor calculation is straightforward, you most probably don't have any login data to process. Fortunately, a small number of websites can calculate a correction factor to shed light on this issue. These include online banks and popular brands such as Amazon, FedEx, and social network sites, where there is a real user benefit to both having an account and (most importantly) using it when visiting the site.

A specific example is Sun Microsystems Forums (http://forums.sun.com), a global community of developers with nearly 1 million contributors. A 2009 study by Paul Strupp and Garrett Clark, published at http://blogs.sun.com/pstrupp/, reveals some interesting data.

When using third-party cookies:

• 78% is the correction factor for monthly unique users.
• 20% of users delete (more correctly defined as lose) their measurement cookie at least once per month.
• 5% of users block the third-party measurement cookie.

When using first-party cookies:

• The correction factor improves to 83%.
• The percentage of users who delete their measurement cookie at least once per month decreases to 14%.
• The percentage of users who block the first-party measurement cookie drops to less than 1%.

Note that this is a tech-savvy audience—those who can delete or block an individual cookie without a second thought.

An interesting observation from the study, which Paul himself highlights, is the relatively small value of the correction factor. That is, when using a first-party cookie, a more precise unique visitor count is 0.83 multiplied by the reported value. Putting this into context, as part of the analysis, 30% of users who used more than one computer in a month to visit the forum were removed from the data prior to analysis. This indicates that multiple-device access happens more frequently than cookie deletion.

It is tempting to think that this data can be used to correct your own unique visitor counts. However, the correction factor is a complicated function of cookie deletion, multiple computer use, and visitor return frequency. These factors will almost certainly be different for your specific website. Nonetheless, it is a useful rule-of-thumb guide.

Latency Leaves Room for Inaccuracy

The time it takes for a visitor to be converted into a customer (latency) can have a significant effect on accuracy. For example, most low-value items are either instant purchases or are purchased within seven days of the initial website visit. With such a short time period between visitor arrival and purchase, your web analytics solution has the best possible chance of capturing all the visitor pageview and behaviour information and therefore reporting more accurate results.

Higher-value items usually mean a longer consideration time before the visitor commits to becoming a customer. For example, in the travel and finance industries, the consideration time between the initial visit and the purchase can be as long as 90 days. During this time, there's an increased risk of the user deleting cookies, reinstalling the browser, upgrading the operating system, buying a new computer, or dealing with a system crash. Any of these occurrences will result in users being seen as new visitors when they finally make their purchase. Offsite factors such as seasonality, adverse publicity, offline promotions, or published blog articles or comments can also affect latency.

Offline Visits Skewing Data Collection

It is important to factor in problems that are unrelated to the method used to measure visitor behaviour but that still pose a threat to data accuracy. High-value purchases such as cars, loans, and mortgages are often first researched online and then purchased offline. Connecting offline purchases with online visitor behaviour is a long-standing enigma for web analytics tools. Currently, the best-practice way to overcome this limitation is to use online voucher schemes that visitors can print and take with them to claim a free gift, upgrade, or discount at your store. If you would prefer to receive your orders online, consider providing similar incentives, such as web-only pricing, free delivery if ordered online, and the like.

Another issue to consider is how your offline marketing is tracked. Without taking this into account, visitors who result from your offline campaign efforts will be incorrectly assigned or grouped with other referral sources and therefore skew your data.
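The correction-factor arithmetic described above is simple enough to sketch. The numbers here are illustrative only, not taken from the Sun study:

```python
def correction_factor(unique_login_ids, reported_unique_users):
    """Ratio of known real users (unique login IDs) to the unique-user
    count reported by the web analytics tool. Cookie deletion inflates
    the reported count, so the factor is typically below 1.0."""
    return unique_login_ids / reported_unique_users

# Calibration month: a login-protected site lets us compare both counts.
factor = correction_factor(unique_login_ids=83_000, reported_unique_users=100_000)

# Later months: apply the factor to the tool's reported figure.
corrected = factor * 120_000
print(round(factor, 2), round(corrected))  # 0.83 99600
```

As the text cautions, a factor calibrated on someone else's audience mixes cookie deletion, multiple-computer use, and return frequency, so treat any borrowed value as a rule of thumb only.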
Comparing Data From Different Vendors

As shown earlier, it is virtually impossible to compare the results of one data-collection method with another: the association simply isn't valid. However, given two comparable data-collection methods—both page tags—can you achieve consistency? Unfortunately, even comparing vendors that employ page tags has its difficulties. Factors that lead to differing vendor metrics are described in the following sections.

First-Party Versus Third-Party Cookies

There is little correlation between the two because of the higher blocking rates of third-party cookies by users, firewalls, and anti-spyware software. For example, the latest versions of Microsoft Internet Explorer block third-party cookies by default if a site doesn't have a compact privacy policy (see www.w3.org/P3P).

Page Tags: Placement Considerations

Page-tag vendors often recommend that their page tags be placed just above the </body> tag of your HTML page to ensure that the page elements, such as text and images, load first. This means that any delays from the vendor's servers will not interfere with your page loading. The potential problem here is that repeat visitors, those more familiar with your website navigation, may navigate quickly, clicking through to another page before the page tag has loaded to collect data. Clearly, the longer the delay, the greater the discrepancy will be.

Tag placement was investigated in a 2009 whitepaper by tagMan.com. Their study of latency effects revealed that approximately 10 percent of reported traffic is lost for every extra second a page takes to load. In addition, moving the Google Analytics page tag from the bottom of a page to the top increased the reported traffic by 20%.

Stone Temple Consulting conducted a similar study in 2007. Their results showed that the difference between a tracking tag placed at the top or bottom of a page accounted for a 4.3% difference in unique visitor traffic. This was attributed to the 1.4-second difference in executing the page tag.

In addition, non-related JavaScript placed at the top of the page can interfere with JavaScript page tags that have been placed lower down. Most vendor page tags work independently of other JavaScript and can sit comfortably alongside other vendor page tags—as shown in the Stone Temple Consulting report, in which pages were tagged for five different vendors. However, JavaScript errors on the same page will cause the browser scripting engine to stop at that point and prevent any JavaScript below it, including your page tag, from executing.

Did You Tag Everything?

Many analytics tools require links to files such as PDFs, Word documents, or executable downloads, or outbound links to other websites, to be modified in order to be tracked. This may be a manual process whereby the link to the file needs to be modified. The modification represents an event or action when it is clicked, which sometimes is referred to as a virtual pageview. Comparing different vendors requires this action to be carried out several times with their specific codes (usually with JavaScript). Take into consideration that whenever pages have to be coded, syntax errors are a possibility. If page updates occur frequently, consider regular website audits to validate your page tags.

Pageviews: A Visit or a Visitor?

Pageviews are quick and easy to track; and because they require only a call from the page to the tracking server, they are very similar among vendors. The challenge is differentiating a visit from a visitor; and because every vendor uses a different algorithm, no single algorithm results in the same value.
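One way to read the tagMan finding above—roughly 10 percent of reported traffic lost per extra second of load time—is as a compounding decay. This model is an assumption for illustration; the whitepaper does not specify the exact functional form:

```python
def reported_fraction(extra_seconds, loss_per_second=0.10):
    """Fraction of actual visits a bottom-placed page tag still records,
    assuming each extra second of load time loses 10% of what remains."""
    return (1 - loss_per_second) ** extra_seconds

# How the measured share of traffic shrinks as pages slow down:
for s in (0, 1, 2, 3):
    print(s, round(reported_fraction(s), 3))
# 0 1.0
# 1 0.9
# 2 0.81
# 3 0.729
```

Whatever the exact curve, the direction is the point: the slower the page, the more visitors leave before the tag fires, so bottom-placed tags systematically undercount on slow pages.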
Cookie Timeouts

The allowed duration of timeouts—how long a web page is left inactive by a visitor—varies among vendors. Most page-tag vendors use a visitor-session cookie timeout of 30 minutes. This means that continuing to browse the same website after 30 minutes of inactivity is considered to be a new repeat visit. However, some vendors offer the option to change this setting. Doing so will alter any data alignment and therefore affect the analysis of reported visitors. Other cookies, such as the ones that store referrer details, will have different timeout values. For example, Google Analytics referrer cookies last six months. Differences in these timeouts between web analytics vendors will obviously be reflected in the reported visitor numbers.

Page-Tag Code Hijacking

Depending on your vendor, your page tag code could be hijacked, copied, and executed on a different or unrelated website. This contamination results in false pageviews within your reports. By using filters, you can ensure that only data from your domains are reported.

Data Sampling

This is the practice of selecting a subset of data from your website traffic. Sampling is widely used in statistical analysis because analyzing a subset of data gives very similar results to analyzing all of the data, yet can provide significant speed benefits when processing large volumes of information. Different vendors may use different sampling techniques and criteria, resulting in data misalignment.

PDF Files: A Special Consideration

For page tag solutions, it is not the completed PDF download that is reported, but the fact that a visitor has clicked on a PDF file link. This is an important distinction, as information on whether or not the visitor completes the download—for example, a 50-page PDF file—is not available. Therefore, a click on a PDF link is reported as a single event or pageview.

Note: The situation is different for logfile solutions. When you view a PDF file within your web browser, Adobe Reader can download the file one page at a time, as opposed to a full download. This results in a slightly different entry in your web server logfile, showing an HTTP status code 206 (partial file download). Logfile solutions can treat each of the 206 status code entries as individual pageviews. When all the pages of a PDF file are downloaded, a completed download is registered in your logfile with a final HTTP status code of 200 (download completed). Therefore, a logfile solution can report a completed 50-page PDF file as 1 download and 50 pageviews.

E-commerce: Negative Transactions

All e-commerce organizations have to deal with product returns at some point, whether because of damaged or faulty goods, order mistakes, or other reasons. Accounting for these returns is often forgotten within web analytics reports. For some vendors, it requires the manual entry of an equivalent negative purchase transaction. Others require the reprocessing of e-commerce data files. Whichever method is required, aligning web visitor data with internal systems is never bulletproof. For example, the removal or crediting of a transaction usually takes place well after the original purchase and therefore in a different reporting period.

Filters and Settings: Potential Obstacles

Data can vary when a filter is set up in one vendor's solution but not in another. Some tools can't set up the exact same filter as another
tool, or they apply filters in a different way or at a different point during data processing.

Consider, for example, a page-level filter to exclude all error pages from your reports. Visit metrics such as time on site and page depth may or may not be adjusted for the filter, depending on the vendor. This is because some vendors treat page-level metrics separately from visitor-level metrics.

Time Differences

A predicament for any vendor when it comes to calculating the time on site or time on page for a visitor's session involves how to calculate for the last page viewed. For example, time spent on pageA is calculated by taking the difference between the visitor's timestamp for pageA and the subsequent timestamp for pageB, and so on. But what if there is no pageC? How can the time on page be calculated for pageB if there is no following timestamp?

Different vendors handle this in different ways. Some ignore the final pageview in the calculation; others use an onUnload event to add a timestamp should the visitor close their browser or go to a different website. Both are valid methods, although not every vendor uses the onUnload method. The reason some vendors prefer to ignore the last page is that it is considered the most inaccurate from a time point of view—perhaps the visitor was interrupted to run an errand or left their browser in its current state while working on something else. Many users behave in this way; that is, they complete their browsing task and simply leave their browser open on the last page while working in another application. A small number of pageviews of this type will disproportionately skew the time-on-site and time-on-page calculations; hence, most vendors avoid this issue.

Note: Google Analytics ignores the last pageview of a visitor's session when calculating the time-on-site and time-on-page metrics.

Process Frequency: Understanding Glitches

The frequency of processing is best illustrated by example: Google Analytics does its number crunching to produce reports hourly. However, because it takes time to collate all the logfiles from all of the data-collecting servers around the world, reports are three to four hours behind the current time. In most cases, it is usually a smooth process, but sometimes things go wrong. For example, if a logfile transfer is interrupted, then only a partial logfile is processed. Because of this, Google collects and reprocesses all data for a 24-hour period at the day's end. Other vendors may do the same, so it is important not to focus on discrepancies that arise on the current day.

Goal Conversions versus Pageviews

Using Figure 4 as an example, assume that five pages are part of your defined funnel (click-stream path), with the last step (page 5) being the goal conversion (purchase). During checkout, a visitor goes back up a page to check a delivery charge (step A) and then continues through to complete payment. The visitor is so happy with the simplicity of the entire process that she then purchases a second item using exactly the same path during the same visitor session (step B).

Depending on the vendor you use, this process can be counted in various ways, as follows:

• Twelve funnel page views, two conversions, two transactions
• Ten funnel page views (ignoring step A), two conversions, two transactions
• Five funnel page views, two conversions, two transactions
• Five funnel page views, one conversion (ignoring step B), two transactions

Most vendors, but not all, apply the last rationale to their reports. That is, the visitor has become a purchaser (one conversion); and this can happen only once in the session, so additional conversions
(assuming the same goal) are ignored. For this to be valid, the same rationale must be applied to the funnel pages. In this way, the data becomes more visitor-centric.

Note: In the above example, the total number of pageviews is 12 and should be reported as such in all pageview reports. It is the funnel and goal conversion reports that will be different.

Why PPC Vendor Numbers Do Not Match

If you are using pay-per-click (PPC) networks, you will typically have access to the click-through reports provided by each network. Quite often, these numbers don't exactly align with those reported in your web analytics reports. This can happen for the reasons described in the following sections.

Tracking URLs: Missing Paid Search Click-throughs

Tracking URLs are required in your PPC account setup in order to differentiate between a non-paid search engine visitor click-through and a paid click-through from the same referring domain—Google.com or Yahoo.com, for example. Tracking URLs are simple modifications to your landing page URLs within your PPC account and are of the form www.mysite.com?source=adwords. Tracking URLs forgotten during setup, or sometimes simply assigned incorrectly, can lead to such visits being incorrectly attributed.

Slow Page Load Times

As previously discussed, the best-practice location for web analytics data-collection tags is at the bottom of your pages—just above the </body> HTML tag. If your PPC landing pages are slow to download for whatever reason (server delays, page bloat, and so on), it is likely that visitors will click away, navigating to another page on your site or even to a different website, before the data-collection tag has had a chance to load. The chance of this happening increases the longer the page load time is. The general rule of thumb for what constitutes a
long page load is only two seconds (see www.akamai.com/html/about/press/releases/2009/press_091409.html).

Clicks and Visits: Understanding the Difference

Remember that PPC vendors, such as Google AdWords, measure clicks. Most web analytics tools measure visitors who can accept a cookie. Those are not always going to be the same thing when you consider the effects on your web analytics data of cookie blocking, JavaScript errors, and visitors who simply navigate away from your landing page quickly—before the page tag collects its data. Because of this, web analytics tools tend to slightly underreport visits from PPC networks.

PPC Account Adjustments

Google AdWords and other PPC vendors automatically monitor invalid and fraudulent clicks and adjust PPC metrics retroactively. For example, a visitor may click your ad several times (inadvertently or on purpose) within a short space of time. Google AdWords automatically investigates this influx and removes the additional click-throughs and charges from your account. However, web analytics tools have no access to these systems and so record all PPC visitors. For further information on how Google treats invalid clicks, see:
http://adwords.google.com/support/bin/topic.py?topic=35

Keyword Matching: Bid Term versus Search Term

The bid terms you select within your PPC account and the search terms used by visitors that result in your PPC ad being displayed can often be different: think 'broad match'. For example, you may have set up an ad group that targets the word 'shoes' and solely relies on broad match to match all search terms that contain the word 'shoes'. This is your bid term. A visitor uses the search term 'blue shoes' and clicks on your ad. Web analytics vendors may report the search term, the bid term, or both.

Google AdWords Import Delay

Within your AdWords account, you'll see that data is updated hourly. This is because advertisers need this information to control budgets. Google Analytics imports AdWords cost data once a day, for the date range minus 48 to 24 hours from 23:59 the previous day (so AdWords cost data is always at least 24 hours old).

Why the delay? Because it allows time for the AdWords invalid-click and fraud-protection algorithms to complete their work and finalize click-through numbers for your account. Therefore, from a reporting point of view, the recommendation is not to compare AdWords visitor numbers for the current day. This recommendation holds true for all web analytics solutions and all PPC advertising networks.

Note: Although most of the AdWords invalid-click updates take place within hours, final adjustments may take longer. For this reason, even if all other factors are eliminated, AdWords numbers and web analytics reports may never match exactly.

Losing Tracking URLs Through Redirects

Using third-party ad-tracking systems—such as Adform, Atlas Search, Blue Streak, DoubleClick, Efficient Frontier, and SEM Director—to track click-throughs to your website means your visitors are passed through redirection URLs. This results in the initial click being registered by your ad company, which then automatically redirects the visitor to your actual landing page. The purpose of this two-step hop is to allow the ad-tracking network to collect visitor statistics independently of your organization, typically for billing purposes. Because this process involves a short delay, it may prevent some visitors from landing on your page. The result can be a small loss of data and therefore a failure to align data.


More important, and more common, redirection URLs may break the tracking parameters that are added onto the landing pages for your own web analytics solution. For example, your landing page URL may look like this:

http://www.mysite.com/?source=google&medium=ppc&campaign=Jan10

When added to a third-party tracking system for redirection, it could look like this:

http://www.redirect.com?http://www.mysite.com?source=google&medium=ppc&campaign=Jan10

The problem occurs with the second question mark in the second link, because you can't have more than one in any valid URL. Some third-party ad-tracking systems will detect this error and remove the second question mark and the following tracking parameters, leading to a loss of campaign data.

Some third-party ad-tracking systems allow you to replace the second ? with a # so the URL can be processed correctly. If you are unsure of what to do, you can avoid the problem completely by using encoded landing-page URLs within your third-party ad-tracking system, as described at:
www.w3schools.com/tags/ref_urlencode.asp

Note: From experience, the most common reasons for discrepancies between PPC vendor reports and web analytics tools arise from:
• Tracking URLs failing to distinguish paying and nonpaying visitors
• Slow page downloading
• Losing data via third-party ad-tracking redirects

Data Misinterpretation

The following are not accuracy issues. However, they show that data is not always straightforward to interpret. Take the following two examples:

• New visitors plus repeat visitors does not equal total visitors. A common misconception is that the sum of new plus repeat visitors should equal the total number of visitors. Why isn't this the case? Consider a visitor making his first visit on a given day and then returning on the same day. They are both a new and a repeat visitor for that day. Therefore, looking at a report for that day, two visitor types will be shown, though the total number of visitors is one. It is therefore better to think of visitor types in terms of "visit" type, that is, the number of first-time visits plus the number of repeat visits equals the total number of visits.

• Summing the number of unique visitors per day for a week does not equal the total number of unique visitors for that week. Consider the scenario in which you have 1,000 unique visitors to your website blog on a Monday. These are in fact the only unique visitors you receive for the entire week, so on Tuesday the same 1,000 visitors return to consume your next blog post. This pattern continues for Wednesday through Sunday. If you were to look at the number of unique visitors for each day of the week in your reports, you would observe 1,000 unique visitors. However, you cannot say that you received 7,000 unique visitors for the entire week. For this example, the number of unique visitors for the week remains at 1,000.
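The weekly over-counting in the second example is easy to demonstrate with sets, treating each day's unique visitors as a set of hypothetical visitor IDs:

```python
# 1,000 visitors (IDs 0-999) return every day for a week.
daily_uniques = [set(range(1000)) for _ in range(7)]

# Summing the per-day unique counts wrongly suggests 7,000 uniques...
print(sum(len(day) for day in daily_uniques))  # 7000

# ...but the union of the daily sets shows the week really had 1,000.
print(len(set.union(*daily_uniques)))  # 1000
```

This is why weekly and monthly unique-visitor figures must be computed over the whole period, never by adding up shorter-period uniques.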

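The parameter-stripping redirect problem described above can also be reproduced, and avoided, with standard URL encoding. A sketch using Python's urllib; redirect.com stands in for a third-party ad-tracking host, as in the example above:

```python
from urllib.parse import quote, unquote

landing = "http://www.mysite.com/?source=google&medium=ppc&campaign=Jan10"

# Naive concatenation produces a URL with two '?' characters, which a
# third-party redirector may truncate, stripping the campaign parameters:
broken = "http://www.redirect.com?" + landing
print(broken.count("?"))  # 2 -- not a valid URL

# Percent-encoding the landing page removes the second '?' entirely,
# so the tracking parameters survive the redirect hop:
safe = "http://www.redirect.com?" + quote(landing, safe="")
print("?" in quote(landing, safe=""))            # False
print(unquote(safe.split("?", 1)[1]) == landing)  # True -- fully recoverable
```

The redirector decodes the encoded portion back to the original landing URL, so your analytics tool still receives source, medium, and campaign intact (recommendation 8 below: always test this end to end).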

Why Counting Uniques Is Meaningless

The term uniques is often used in web analytics as an abbreviation for unique web visitors, that is, how many unique people visited your site. The problem is that counting unique visitors is fraught with problems so fundamental that it renders the term uniques meaningless.

As discussed earlier, cookies get lost, blocked, and deleted—nearly one-third of tracking cookies can be missing after a period of four weeks. The longer the time period, the greater the chance of this happening, which makes comparing year-on-year uniques invalid, for example. In addition, browsers make it very easy these days for cookies to be removed—see the new "incognito" features of the latest Firefox, Chrome, and Internet Explorer browsers.

However, the biggest issue for counting uniques is how many devices people use to access the Web. For example, consider the following scenario:

• You and your spouse are considering your next vacation. Your spouse first checks out possible locations on your joint PC at home and saves a list of website links.

• The next evening you use the same PC to review these links. Unable to decide that night, you email the list to your office, and the next day you continue your vacation checks during your lunch hour at work and also review these again on your mobile while commuting home on the train.

• Day 3 of your search resumes at your friend's house, where you seek a second opinion. Finally, you go home and book online using your shared PC.

This scenario is actually very common—particularly if the value of the purchase is significant, which implies a longer consideration period and the seeking of a second opinion from a spouse, friends, or work colleagues (the Sun Microsystems study discussed earlier estimated the percentage of users using more than one computer in a month to visit the same website as 30 percent).

Simply put, there is not a web analytics solution in the world that can accurately track this scenario, that is, tie the data together across multiple devices where multiple people have been involved, nor is there likely to be one in the near future.

Combining these limitations leads to large error bars when it comes to tracking uniques. In fact, these errors are so large that the metric becomes meaningless and should be avoided where possible in favor of more accurate "visit" data. That said, if you must use unique visitors as a key metric, ensure the emphasis is on the trend, not the absolute number.

Ten Recommendations For Enhancing Accuracy

1. Be sure to select a tool that uses first-party cookies for data collection.
2. Don't confuse visitor identifiers. For example, if first-party cookies are deleted, do not resort to using IP address information. It is better simply to ignore that visitor.
3. Remove or report separately all non-human activity from your data reports, such as robots and server-performance monitors.
4. Track everything. Don't limit tracking to landing pages. Track your entire website's activity, including file downloads, internal search terms, and outbound links.
5. Regularly audit your website for page tag completeness (at least monthly for large websites). Sometimes site content changes result in tags being corrupted, deleted, or simply forgotten.
6. Display a clear and easy-to-read privacy policy (required by law in the European Union). This establishes trust with your visitors


because they better understand how they're being tracked and are less likely to delete cookies.
7. Avoid making judgments on data that is less than 24 hours old, because it's often the most inaccurate.
8. Test redirection URLs to guarantee that they maintain tracking parameters.
9. Ensure that all paid online campaigns use tracking URLs to differentiate them from non-paid sources.
10. Use visit metrics in preference to unique visitor metrics, because the latter are highly inaccurate.

These suggestions will help you appreciate the errors often made when collecting web analytics data. Understanding what these errors are, how they happen, and how to avoid them will enable you to benchmark the performance of your website. Achieving this means you're in a better position to then drive the performance of your online business.

Summary

So, web analytics is not 100 percent accurate, and the number of possible inaccuracies can at first appear overwhelming. However, get comfortable with your implementation and focus on measuring trends rather than precise numbers. For example, web analytics can help you answer the following questions:

• Are visitor numbers increasing?
• By what rate are they increasing (or decreasing)?
• Have conversion rates gone up since beginning PPC advertising?
• How has the cart abandon rate changed since the site redesign?

If the trend shows a 10.5% reduction, for example, this figure should be accurate regardless of the web analytics tool that was used. These examples are all high-level metrics, though the same accuracy can also be maintained as you drill down and look at, for example, which specific referrals (search engines, affiliates, social networks), campaigns (paid search, email, banners), keywords, geographies, or devices (PC, Mac, mobile) are used.

When all the possibilities of inaccuracy that affect web analytics solutions are considered, it is apparent that it is ineffective to focus on absolute values or to merge numbers from different sources. If all web visitors were to have a login account in order to view your website, this issue could be overcome. In the real world, however, the vast majority of Internet users wish to remain anonymous, so this is not a viable solution.

As long as you use the same measurement for comparing data ranges, your results will be accurate. This is the universal truth of all web analytics.

Acknowledgements

With thanks to the following people for their generous feedback in compiling this whitepaper: Sara Andersson, Nick Mihailovski, Alex Ortiz-Rasado, Tomas Remotigue.

About The Author


Brian Clifton (PhD) is an independent author, consultant and trainer
who specialises in performance optimisation using Google Analytics
and related tools. Recognised internationally as a Google Analytics
expert, his latest book, the second edition of Advanced Web Metrics
with Google Analytics is used by students and professionals world-
wide.

Brian has been involved in web design and SEO since as far back
as 1997, when he built his first website and started defining best
practice to advise clients. From 2005-8 he was Head of Web
Analytics for Google EMEA, defining the adoption strategy and
building a team of pan-European product specialists from scratch. A
legacy of his work is the online learning centre for the Google
Analytics Individual Qualification (GAIQ).

Brian is the Founder, CEO and Senior Strategist for GA-Experts.com, a company specialising in performance optimisation using Google Analytics and related products for global clients.

Add your comments on the blog: Measuring Success
Follow my interests and thoughts: @BrianClifton
Join your peers on the LinkedIn Group

Advanced Web Metrics with Google Analytics is available from Amazon (including Kindle), Barnes & Noble and directly from Wiley. A PDF ebook is also available.

Off-page SEO and link building: General strategies and authority transfer in the digital news media
Carlos Lopezosa; Lluís Codina; Carlos Gonzalo-Penela

Note: This article can also be read in Spanish at:
http://www.elprofesionaldelainformacion.com/contenidos/2019/ene/09_es.pdf

How to cite this article:

Lopezosa, Carlos; Codina, Lluís; Gonzalo-Penela, Carlos (2019). "Off-page SEO and link building: General strategies and authority transfer in the digital news media". El profesional de la información, v. 28, n. 1, e280107.
https://doi.org/10.3145/epi.2019.ene.07

Article received on June 15th, 2018
Approved on December 12th, 2018

Carlos Lopezosa *
http://orcid.org/0000-0001-8619-2194
Pompeu Fabra University
Department of Communication
Roc Boronat, 138. 08018 Barcelona, Spain
carlos.lopezosa@upf.edu

Lluís Codina
http://orcid.org/0000-0001-7020-1631
Pompeu Fabra University
Department of Communication
Roc Boronat, 138. 08018 Barcelona, Spain
lluis.codina@upf.edu

Carlos Gonzalo-Penela
https://orcid.org/0000-0002-3380-6823
Pompeu Fabra University
Department of Communication
Roc Boronat, 138. 08018 Barcelona, Spain
carlos.gonzalo@upf.edu

Abstract
In recent years, a number of digital news media outlets have begun to include paid links in their content. This study seeks
to identify and analyse this content whose sole purpose is to improve the website authority of the advertisers and their
search engine rankings. To do so, it employs two basic methodologies: first, it undertakes a systematic review of off-page
SEO practices, the digital press and native advertising; and, second, it reports a case study based on the identification
and analysis of 150 news items that contain specially commissioned links resulting from a commercial transaction. The
study provides evidence of a new revenue stream for the digital news media, one that is not clearly disclosed and which
is based on the sale of links. The article includes a discussion of the case study findings, and presents future guidelines
for the use of paid links based on the emerging concept of ‘native advertising’.
Keywords
Digital news media; Online journalism; Digital journalism; SEO; Off-page SEO; Web positioning; Link building; Native
advertising; Journalism ethics.

e280107 El profesional de la información, 2019, v. 28, n. 1. eISSN: 1699-2407 1



1. Introduction and study goals


This study examines a new activity being practiced by the digital news media involving the sale of links and aimed at
improving the web positioning of the sites that receive them. These links are embedded in articles that are written solely
with this purpose in mind. In some instances, the content of the articles is actually provided by the buyer of the links.
It is, therefore, a form of advertising which, as it is not clearly disclosed, can easily be confused with editorial content.
Such content can give rise to three types of problem:
- as it is produced with the sole purpose of serving as a link vector, its quality is of secondary importance and, moreover,
it responds not to journalistic criteria but rather to those of advertising;
- as it is not clearly identified as sponsored content, it threatens to undermine professional ethics, which requires the unambiguous separation of this type of content from editorial content;
- as it is presented in a non-transparent fashion, not only is the public unaware of such practices, but many studies (and
experts) of the digital media are also unaware of it.
If, in the current digital information ecosystem, this is one of the engines of content creation, then experts undisputedly have an incomplete picture of this ecosystem if they are unfamiliar with this practice.
To understand the emergence of this activity, in this study we examine the SEO strategy that underpins it, based on what
is known as ‘link building’. More specifically, the goals of this study can be stated as follows:
- To analyse and characterise a new line of activity in the digital news media centred on link buying/selling and to identify the actors involved.
- To classify the content published as a result of this activity and to examine its implications for off-page SEO strategies.
- To provide guidelines for the possible improvement of this activity by developing a set of best practices modelled on
so-called native advertising.
In keeping with these objectives, we address the following research questions:
- What are the main characteristics of this new line of activity centred on link buying/selling in the digital news media
and who are its main actors?
- What are the main characteristics of the content published as link vectors?
- Is it possible to develop a set of best practices for this activity based on native advertising in order to improve it?
1.1. Methodology
The following two basic methodologies were employed in conducting this study:
- Systematic literature review (Hart, 2008; Booth; Papaioannou; Sutton, 2012) of the articles listed in the bibliography of this paper, based on a prior consultation of the Scopus, Web of Science, Lista and Communication Source databases. We also consulted the most authoritative professional sources in relation to SEO (including Search Engine Journal and Search Engine Land) and native advertising (including the Native Advertising Institute and the Nieman Reports).
- Case study research (Yin, 2014) involving the identification, selection and classification of a set of 150 news items,
published by three digital news media outlets and originating from link buying/selling.
The results of the systematic literature review are presented first followed by the case study findings. The former enable
us to outline the main characteristics of off-page SEO strategies, which are the origin of this practice; the latter allow us
to classify the products of this activity.

2. Off-page SEO
It is worth recalling that Google was the first search engine to apply a technique based on hyperlink analysis (i.e., the links between web pages) to determine the relative importance of all pages on the World Wide Web.

For analyses of this type, the inventors of Google based their work on citation analysis in the academic world and its corresponding impact factor. In this way, they designed a metric, PageRank, that serves to express the results of such an analysis (Brin; Page, 2000).

Given its enormous efficacy, Google has had an enduring influence on the way in which search engines display their results pages, with all of them adopting the same basic idea (Kleinberg, 1998; Lewandowski, 2012; Giomelakis; Veglis, 2015). The reason for its widespread adoption is that it provided the first genuinely efficient response to all the challenges posed by Internet searches (Gonzalo-Penela; Codina; Rovira, 2015), although initially no firm in the search engine sector seemed to realise it.


More specifically, the new idea developed by Google was the following: instead of calculating the relevance of each page exclusively in terms of its intrinsic characteristics (including, for example, the number of times the keyword appears), it also took into account its extrinsic characteristics, most notably the number and quality of links it receives (Harry, 2013).

What was the underpinning rationale? In broad terms, given two pages addressing the same theme, the more important of the two is considered to be the one that receives the greater number of backlinks from websites which, in turn, are highly linked (Brin; Page, 2000; Thelwall, 2004; Gonzalo-Penela, 2006).

Here, the key point is that part of a page's PageRank can be transferred to other pages if they are linked to it. PageRank is also a measure of a page's authority, in the same sense that a journal's impact factor is a measure of its authority. In this way, the net effect of these links (indistinctly known as backlinks, inbound links or external links) is to transfer authority from the page that points to the linked page, improving its visibility in the search engines (Crowe, 2017; Giomelakis; Veglis, 2016).
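The authority-transfer mechanism described above can be illustrated with the basic PageRank iteration (Brin; Page, 2000). This is a toy sketch over a hypothetical three-page graph, not the algorithm Google actually runs today:

```python
def pagerank(links, damping=0.85, iterations=50):
    """Basic PageRank power iteration.

    links maps each page to the pages it links to; this toy version
    assumes every page has at least one outlink."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1 - damping) / n for p in pages}
        for page, outlinks in links.items():
            # A page passes its authority in equal shares to the pages it links to.
            share = rank[page] / len(outlinks)
            for target in outlinks:
                new[target] += damping * share
        rank = new
    return rank

# Page C receives links from both A and B, so it accumulates the most authority.
ranks = pagerank({"A": ["C"], "B": ["C"], "C": ["A"]})
print(max(ranks, key=ranks.get))  # C
```

Note how B, which receives no links at all, ends up with the lowest score even though it links out: authority flows along incoming links, which is exactly why backlinks are the commodity discussed in the rest of this article.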
Consequently, the number and quality of the links that link to a website are an indicator of its essential relevance, as well as being one of the most influential positioning factors (Fishkin, 2016; García-Carretero et al., 2016). It is not surprising, given these circumstances, that firms' SEO managers seek to implement link building strategies (Gonzalo-Penela, 2006; Serrano-Cobos, 2015). This, in turn, has led to two major branches of SEO:
- On-page SEO: actions to optimize web page content.
- Off-page SEO: actions to obtain backlinks, that is link building.
Several link building procedures have been developed (Monterde, 2016; Publisuites, 2018), among which two stand out:
- Natural or editorial link building: this is based on a logic similar to that of the impact factor of academic articles, whereby a high quality article is one that will be highly cited, thus establishing itself as an article of great authority. In the case of the web, this type of link building is achieved by creating high quality content.
- Strategic link building: this is a proactive practice that requires direct contact between the website manager and the author of another site from which a link is requested. If performed on a massive scale, Google, Bing, Yahoo, Yandex, etc. are able to identify patterns of unnatural links and, if so, penalize those websites by pushing them down the search result rankings, or even excluding them from their indexes.

The main goals, therefore, of off-page SEO professionals are (Cámaras-León, 2018; Rowe, 2018):
- to search for and obtain a large number of backlinks;
- to multiply the strength of backlinks by ensuring that the sites from which the links originate are in turn highly linked.
There exist various websites where it is possible to obtain free backlinks. Primarily they can be obtained from web profiles, forums, social networks, blogs 2.0, comments on websites/blogs, wikis, content aggregators, directories, newspapers, third-party websites, etc., and of course from other websites (Cooper, 2012).
From a technical point of view, but with far-reaching implications for the matter in hand, there are two types of backlink: dofollow links (also known simply as follow links) and nofollow links (Dean, 2018).

Both types of link are identified by means of the corresponding labelling in the source code (not visible on the page). They can be explained as follows:
- Dofollow links fulfil the original function of hyperlinks, that is, they link related themes. Due to their editorial nature, Google considers them a way of transferring authority to the linked website, and the amount of authority or PageRank transferred depends on the quality or authority of the page that creates the link. Dofollow means that Google will follow the link and attribute PageRank to the page that receives it. In theory, dofollow links are limited to editorial links. Dofollow links do not carry any special attribute; in other words, a standard link, without any additional markup, is a dofollow link.
- Nofollow links, on the other hand, include a source code attribute that tells search engines that the link cannot be used to pass PageRank. It is a code that informs search engine robots not to follow the link (hence its name). Since they correspond to advertising links, the transmission of authority in this case is zero.
2.1. Anchor text

Links are made up not only of the corresponding URL, but also of a text known as the anchor text (González-Villa, 2017). This is the portion of the text that activates the link on the web page from which it originates.


For Google, the anchor text forms part of the content of the linked site, and it is used to determine whether that site is
relevant for the keyword contained in the anchor text (Figure 1).
In short, we should stress the following: the authority of the site from which a link originates, the link’s anchor text and
the context in which that link is included are the most important elements of link building.
Figures 1 and 2 illustrate the main concepts associated with links, as presented above.
Figure 1 shows the structure of a link using the source code. It can be seen that:
- the link's destination, that is, the page that will open in the browser if the user clicks on it, is https://es.unesco.org
- the anchor text is Unesco.

Figure 1. Source code of a dofollow type link

This is a dofollow link because it does not include any additional coding (see Figure 2). For this reason, this link transfers
PageRank or authority to the Unesco page. If the page containing this link belongs to a leading digital newspaper, such
as The New York Times, the authority transferred will be very high. Moreover, Google will understand that the Unesco
keyword in the anchor text is part of the content of the destination page.
Figure 2 shows the structure of a nofollow link, since it incorporates the ‘rel’ attribute, with the nofollow value. Due to
this attribute, the link does not transfer authority to the (fictitious) destination page Store. In this case, the authority of
the page containing the link is of no significance. Moreover, because of this attribute, Google will not follow the link and
will not transfer any value.

Figure 2. Source code of a nofollow type link
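The link structures shown in Figures 1 and 2 can be detected programmatically. A sketch using Python's standard html.parser to extract each link's destination, anchor text, and follow status; the Store URL is hypothetical, mirroring the article's fictitious example:

```python
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collects (href, anchor_text, is_dofollow) for every <a> tag."""
    def __init__(self):
        super().__init__()
        self.links = []
        self._current = None

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            attrs = dict(attrs)
            rel = (attrs.get("rel") or "").lower().split()
            # A link is dofollow unless rel contains the nofollow token.
            self._current = [attrs.get("href", ""), "", "nofollow" not in rel]

    def handle_data(self, data):
        if self._current is not None:
            self._current[1] += data  # accumulate the anchor text

    def handle_endtag(self, tag):
        if tag == "a" and self._current is not None:
            href, text, dofollow = self._current
            self.links.append((href, text.strip(), dofollow))
            self._current = None

parser = LinkExtractor()
parser.feed('<a href="https://es.unesco.org">Unesco</a> '
            '<a rel="nofollow" href="https://example.com/store">Store</a>')
print(parser.links)
# [('https://es.unesco.org', 'Unesco', True), ('https://example.com/store', 'Store', False)]
```

An audit script built this way can crawl a news item and report how many of its outbound links pass authority, which is essentially what the case study in the next section does by hand.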

3. Hyperlink buying/selling sector


Any human activity that can give rise to a supply and demand relationship will eventually generate a market. Here, news
media websites have something (offer) that the SEO managers of other sites need (demand), namely, backlinks of great
authority.
Hence, it was only a matter of time before SEO managers began to explore the possibility of buying links in the digital news media. To mediate in this relationship, a number of intermediaries have emerged to act as go-betweens for the website managers that need backlinks and the online news media.
Given this change in the digital information ecosystem, one of the objectives of this study was to identify the leading
firms operating in this sector. Although it is impossible to determine exactly how many companies are operating in Spain,
based on our monitoring of professional forums, social networks and other sources regularly used by the sector’s pro-
fessionals, we believe that those shown in Tables 1 to 5 are, by far, the most important.
To outline the main characteristics of these intermediaries, we present their principal features in the following data files.
All data are derived from their respective websites (Tables 1 to 5).


Table 1. PrensaRank

Name: PrensaRank
URL: https://prensarank.com
Description: Its website claims they have 3,660 customers, 408 newspapers from which they can obtain links, and (as of February 2018) they had sold 30,024 articles to the news media.
Digital news media: On registration the user obtains a link purchase interface. We identify 305 newspapers, distributed geographically as follows: Andorra: 1 newspaper; Saudi Arabia: 1; Argentina: 9; Chile: 4; Spain: 236; USA: 2; Mexico: 45; Nicaragua: 1; Peru: 1; Portugal: 1; United Kingdom: 2; Venezuela: 2.
Themes: Current affairs; Love, weddings, relationships, and couples; Betting and casinos; Art, decoration and design; Film and television; Cooking and gastronomy; Dating; Sports; Economics and politics; Education and culture; Company (advertising); Home, decoration and DIY; Humour and leisure; Computers and technology; Games and video consoles; Marketing and SEO; Pets and nature; Music and shows; Fashion and beauty; Cars; Women, babies, and children; Others; Religion, mysticism and esotericism; Health; Estate agency services; Sex shops; Sexuality; Tarot; Travel, hotels and tourism.
Maximum price for link: €950

Table 2. Unancor

Name: Unancor
URL: https://www.unancor.com
Description: Its website claims (as of February 2018) they have 6,000 customers and 500 newspapers from which they can obtain links.
Digital media: On registration the user obtains a link purchase interface. We identify 431 newspapers, distributed geographically as follows: Germany: 53 newspapers; Argentina: 39; Canada: 1; Chile: 13; Colombia: 6; Costa Rica: 1; El Salvador: 1; Arab Emirates: 1; Spain: 213; USA: 18; France: 17; Italy: 1; Morocco: 1; Mexico: 44; Monaco: 1; Nicaragua: 1; Panama: 1; Peru: 2; Uruguay: 3; Venezuela: 5.
Themes: All these newspapers are associated with one or more of the following topics: Art and culture; Health and sport; Economics and business; Education; Home, decoration and DIY; Cooking and recipes, gastronomy; Computers, technology, mobiles and apps; Marketing (offline and online); Nature (animals and plants); Cars and motorcycles; Cinema, TV and music; News and politics; Travel and tourism; Others; Fashion and beauty; Erotica; Love, relationships, couples; Services (locksmiths, home improvements, plumbers, etc.); Legal; Children; Tarot.
Maximum price for link: €10,000

Table 3. Publisuites

Name Publisuites

URL https://www.publisuites.com/es

Its website claims they have 54,967 users and 478 newspapers from which they can obtain links. As of February 2018,
Description
they had sold 39,334 articles to the news media and blogs.

Digital media On registration the user obtains a link purchase interface. We identify 478 newspapers, distributed geographically as follows: Argentina: 19 newspapers; Australia: 1; Bolivia: 1; Brazil: 4; Chile: 6; Colombia: 4; El Salvador: 1; Spain: 304; USA: 3; France: 26; Honduras: 1; Italy: 70; Jersey: 1; Mexico: 14; Nicaragua: 1; New Zealand: 1; Panama: 1; Paraguay: 1; Peru: 6; Portugal: 3; United Kingdom: 1; Dominican Republic: 1; Senegal: 1; South Africa: 1; Venezuela: 6.

Themes All these newspapers are associated with one or more of the following topics: Betting, casinos and lotteries; Celebrities; Cooking, recipes and gastronomy; Trivia; Sports; Economy; Education and training; Entrepreneurs and SMEs; Computers and programming; Literature and culture; Music and radio; Marketing, SEO and social platforms; Miscellaneous; Fashion and accessories; Cars and motorcycles; Nature and ecology; News; Leisure and free time; Politics; Health; Technology; Mobile telephones and apps; Travel and tourism.

Maximum price for link € 1,943

Table 4. RT Gopress

Name RT Gopress

URL https://rtgopress.com

Description Its website claims it is the most economically competitive SEO marketplace, social media and growth hacking firm in the market. They do not indicate how many newspapers or customers they have.

Digital media On registration the user obtains a link purchase interface. We identify 155 newspapers, distributed geographically as follows: Argentina: 3; Mexico: 26; Spain: 126.

Themes All these newspapers are associated with one or more of the following topics: Current affairs; Stock market; Sports; Economics; Gastronomy; Marketing; Cars; Tourism; News; Technology; Health; Video games.

Maximum price for link The website operates a price filter, but it appears not to be operational.

e280107 El profesional de la información, 2019, v. 28, n. 1. eISSN: 1699-2407 5


Carlos Lopezosa; Lluís Codina; Carlos Gonzalo-Penela

Table 5. Dofollow

Name Dofollow.es

URL http://dofollow.es

Description While offering a similar service to the above firms, it operates differently: they offer what they call the dofollow pack. The customer writes a press release including two links to its website (one for a brand and the other for a keyword) and they undertake to publish the press release in four digital newspapers.

Digital media Its website includes general, regional, and specialized newspapers of all types. They state that these media may vary depending on availability.

Themes Unknown.

Maximum price for link € 339 for its most complete package.

4. Native advertising
The general absence of studies examining the link buying/selling sector in the digital news media and, hence, the development of any guidelines for its self-regulation, leads us here to consider the possibility of applying best practices in the so-called native advertising industry.
The Native Advertising Institute defines native advertising as the use of paid ads that match the look, feel and function
of the content of the platform in which they appear (Schauster; Ferrucci; Neill, 2016; Pollitt, 2018).
Native advertising consists of news items, reports and, in general, quality content, its effectiveness being based on credibility. These characteristics can be used to provide quality content to publications (Sweetser et al., 2016; Carlson, 2016). An essential point is that native advertising must present a branded message that allows readers to recognize not only the fact that it is sponsored content (Ferrer-Conill, 2016; Amazeen; Muddiman, 2017; Amazeen; Wojdynski, 2018), but also the logical intent of the advertisement to persuade and sell (Mathiasen, 2018).
As such, the idea is that the digital press should have a model for including sponsored content that allows it to be differentiated from their editorial content, and which, moreover, ensures it can be integrated naturally in the publication, maintaining a level of quality similar to that of the platform that hosts it (Cramer, 2016; Li, 2017; Batsell, 2018).

5. Case study
Having identified the key components and actors operating in the link buying/selling industry, we present our case study,
which consists of a comparative analysis. We examined 150 news items that have been published as a direct result of
the buying/selling of links. As such, we are dealing with content specially commissioned with the aim of including links
to improve the website authority of the customers who purchase them.
To shed greater light on this procedure, we first explain how the whole process works. First, the customer contacts one
of the link building firms described in the section above to purchase backlinks to its website from the digital news media.
This news outlet then publishes content that includes links to the customer's website. In so doing, the process is terminated, following payment by the buyer at the price stipulated for receipt of the backlinks. It should be stressed that what is purchased is the link or backlink and that the content is merely the vehicle in which it is included, which generally results in content unrelated to the newspaper's normal editorial line.
To explore this market, we conducted an analysis whose object of study was three digital news media of medium to high
importance.
Given the nature of this analysis, we do not explicitly identify the name of each news outlet, but rather describe them as
accurately as possible using a series of data files (Tables 6 to 8). In these files we specifically incorporate the data provided by Alexa Rank, a ranking developed by Amazon, based on web traffic.
In addition, to lend greater credibility to the ranking of these three digital news media companies, we incorporate daily
unique user data for each of the three websites. To do so, we used Site Worth Traffic, which measures website traffic, providing unique and total user data, social network performance metrics, and a complete analysis of the site's evolution.
To select the news stories from the three media companies, we purchased three items from the Prensarank website (one item for each news media outlet). Then, having



Off-page SEO and link building: General strategies and authority transfer in the digital news media

Table 6. Digital news media company 1

Media company 1 (MC1)
Media type Generalist
Country Spain
Alexa ranking Ranked 252nd in Spain (June 2018)
Daily unique users 69,782

Table 7. Digital news media company 2

Media company 2 (MC2)
Media type Generalist
Country Spain
Alexa ranking Ranked 702nd in Spain (June 2018)
Daily unique users 20,864

Table 8. Digital news media company 3

Media company 3 (MC3)
Media type Generalist
Country Spain
Alexa ranking Ranked 4,763rd in Spain (June 2018)
Daily unique users 5,979

examined these news stories, we were able to identify a search pattern for each item, and with this to create what is known as its 'footprint': that is, a type of advanced search (Google, 2018) that allows the highly precise selection of well-characterized web page types.

In this way we identified three footprints that allowed us to locate 50 news items purchased from each of the three media companies. Each of these footprints, in the form of an advanced search equation, is constructed as follows (using the site search operator):
- site:MC1 (media company 1) + name of the link buying/selling firm.
- site:MC2 (media company 2) + the word "remitido" (or "press release").
- site:MC3 (media company 3) + name of a news item contributor.
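By way of illustration, footprint equations of this kind can be assembled programmatically before being submitted to a search engine. The following sketch is our own addition, not part of the original methodology; the domain names and marker strings are placeholders, since the three media companies are anonymised in this study.

```python
# Sketch: building 'footprint' advanced-search equations with the
# site: operator. All domains and markers below are placeholders.

def footprint(site: str, marker: str) -> str:
    """Combine the site: operator with a marker phrase that
    characterises commissioned items on a given outlet."""
    return f'site:{site} "{marker}"'

# One footprint per (anonymised) media company, mirroring the three
# patterns above: firm name, 'remitido' tag, contributor name.
footprints = [
    footprint("mc1.example", "Prensarank"),
    footprint("mc2.example", "remitido"),
    footprint("mc3.example", "contributor name"),
]

for query in footprints:
    print(query)
```

Each resulting string restricts matches to a single outlet while the quoted marker isolates the commissioned items, which is what makes the selection "highly precise" in the sense described above.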
Having obtained the 150 news items (50 for each media company) by applying the respective equations, we were then
able to isolate the following elements by responding to the six questions below, based on recommendations made by
the Native Advertising Institute and the Nieman Reports:
- Is the news item specifically identified as sponsored content?
- Is the story reported newsworthy, that is, is the item directly linked to a breaking news story or current affairs?
- How many hyperlinks are included in each news item?
- Are the hyperlinks coherent with the content of the news item?
- Do the hyperlinks point to an authoritative website providing users with complementary quality information?
- What themes are the commissioned news items included in?
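The hyperlink counts and anchor-text checks addressed by the third and fourth questions can be automated. The sketch below is our own illustration, using only the Python standard library; the HTML fragment is invented and merely mimics the style of a commissioned item.

```python
from html.parser import HTMLParser

class LinkCounter(HTMLParser):
    """Collect (href, anchor text) pairs from an HTML fragment."""
    def __init__(self):
        super().__init__()
        self.links = []          # list of [href, anchor_text]
        self._in_a = False

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._in_a = True
            self.links.append([dict(attrs).get("href", ""), ""])

    def handle_endtag(self, tag):
        if tag == "a":
            self._in_a = False

    def handle_data(self, data):
        if self._in_a and self.links:
            self.links[-1][1] += data

# Invented fragment standing in for a commissioned news item.
item = ('<p>Choosing flowers for <a href="https://shop.example">cheap '
        'wedding bouquets Madrid</a> is easy, says '
        '<a href="https://other.example">tarot online free</a>.</p>')

parser = LinkCounter()
parser.feed(item)
print(len(parser.links))        # number of backlinks in the item
for href, anchor in parser.links:
    print(href, "->", anchor)
```

The collected anchor texts can then be compared with the keywords commissioned by the customer, which is how the syntactic and thematic coherence of the links can be assessed at scale.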

6. Results
Below, we first present our main findings. Next, we review our research objectives and questions in order to present our conclusions, and we finish with proposals for the development of new lines of research.
6.1. Main findings
From our study of the 150 news items commissioned in the three digital news media companies, the following results
can be highlighted:
- News originating from the purchasing of a link is not clearly identified as sponsored content or advertising.
- The content does not describe or narrate a breaking news story or current affairs, that is, it is not a typical news story,
but rather the content is timeless, generally involving recommendations and advice.
- The need to include the literal anchor text (the text that activates the link) as commissioned by the customer leads to errors of grammar and syntax in the writing of the content. The reason for this is that the authors opt to respect the keyword or phrase commissioned by the customer even if it does not fit the syntax of the phrase in which it is embedded.
- When a news item contains more than one link, the need to maintain two or more links in the same item for sites of
distinct natures results in a lack of coherence between the links and the content of the news story.
Tables 9 to 11 show the results for each of the three digital news media companies in greater detail.




Table 9. Results for MC1

Are news items identified as sponsored content? Identification is somewhat ambiguous. Items are identified as a Comunicado (or news release). The headline is displayed in the following format: "News release: title of the story".

Is the content newsworthy? No. It is timeless, involving recommendations and offering tips.

How many hyperlinks are included in each item? Of the 50 items analysed: 1 item includes one hyperlink; 8 items include two hyperlinks; 4 items include three hyperlinks; 37 items include four hyperlinks.

Are the hyperlinks coherent with the content? No. Most are shoehorned into the item; others use a syntactically incorrect generic anchor text. In some of the items with more than one hyperlink, there is no thematic link between them.

Do the hyperlinks point to an authoritative website providing users with complementary quality information? No. In general, hyperlinks point to websites that are not authoritative and, therefore, do not provide noticeable added value for the user-reader.

What themes are the commissioned news items included under? The main themes are business, the home, beauty, tourism, productivity, cars, weddings, fashion, health, and decoration.

Table 10. Results for MC2

Are news items identified as sponsored content? Ambiguous. Items are identified with a tag that reads Remitido (or news/press release) followed by the headline.

Is the content newsworthy? No. It is timeless, involving recommendations and offering tips.

How many hyperlinks are included in each item? Of the 50 items analysed: 10 items include one hyperlink; 13 items include two hyperlinks; 6 items include three hyperlinks; 21 items include four hyperlinks.

Are the hyperlinks coherent with the content? No. Most are shoehorned into the item; others use a syntactically incorrect generic anchor text. In some of the items with more than one hyperlink, there is no thematic link between them.

Do the hyperlinks point to an authoritative website providing users with complementary quality information? No. In general, hyperlinks point to websites that are not authoritative and, therefore, do not provide noticeable added value for the user-reader.

What themes are the commissioned news items included under? The main themes are work, recipes and gastronomy, business, cars, healthy living, gadgets, cooking, fashion trends and styles, and fortunes and tarot.

Table 11. Results for MC3

Are news items identified as sponsored content? No. Items are presented as another news story, that is, as editorial content.

Is the content newsworthy? No. It tends to be timeless, involving recommendations and offering tips.

How many hyperlinks are included in each item? Of the 50 items analysed: 17 items include one hyperlink; 11 items include two hyperlinks; 2 items include three hyperlinks; 20 items include four hyperlinks.

Are the hyperlinks coherent with the content? No. Most are shoehorned into the item; others use a syntactically incorrect generic anchor text. In some of the items with more than one hyperlink, there is no thematic link between them.

Do the hyperlinks point to an authoritative website providing users with complementary quality information? No. In general, hyperlinks point to websites that are not authoritative and, therefore, do not provide noticeable added value for the user-reader.

What themes are the commissioned news items included under? The main themes are business, virtual spaces, tourism, music, health, cars, investments, holidays, problem pages, and travel.




7. Discussion and conclusions


7.1. Discussion
Link building strategies and link buying can greatly benefit both SEO companies, in their bid to provide their clients' websites with greater authority, and news media companies, as they seek to grow their revenue. However, without adequate regulation, users stand to suffer, being presented with poor quality content and information that respond not to journalistic or editorial criteria but rather to those of advertising.
Specifically, we have seen that in two of the three Spanish news media companies analysed some attempt is made to
signal a divide between sponsored and editorial content, but such attempts are ultimately ambiguous.
Instead of identifying the content with an unmistakable tag indicating sponsorship or advertising, other labels are employed, such as comunicado (news release) or remitido (news/press release). This is better than nothing, but it remains nevertheless ambiguous. Comunicado and remitido are usual journalistic terms for referring to press releases that serve as the basis for perfectly valid editorial content, which is why these tags must be considered inadequate, albeit that they do represent some attempt on the part of the publication to indicate their actual content.
In contrast, one of the news media companies does not seek to make any distinction in content, which is a more serious
matter.
In all three cases, readers may well think they are reading editorial content and, therefore, believe that the linked sites
have been selected for their quality when in fact what they are reading is advertising or sponsored content.
Moreover, as their origin is not strictly editorial, the content tends to be largely superficial and to have little or no relationship with the linked sites.
These two closely related factors have a somewhat negative impact on the quality of the content of the news media
companies. However, it is apparent that if we adhere to the native advertising model, the interests of all parties can be reconciled: advertisers can obtain authoritative links, the content can be genuinely interesting (while at the same time being clearly identified as sponsored content) and the media can gain a new model of sustainability.
7.2. Conclusions
To present the conclusions, we first go back to the study’s initial objectives to consider how far they have been fulfilled.
Then we do the same with the research questions.
Objectives

Objective 1. To analyse and characterise a new line of activity in the digital news media centred on link buying/selling
and to identify the actors involved.

We have shown that a new model of economic activity has emerged based on link buying/selling and that this activity
is becoming increasingly more commonplace, as demonstrated by our close monitoring of the sector over the last two
years. As a result, the number of news media companies now included on the websites studied here (Prensarank, Unancor, Publisuites and RT Gopress) has experienced constant growth.
We have shown that this line of activity adds value to each party involved (the digital news media, the customers that buy links and the firms that act as intermediaries in the sales transaction) as it seeks to fulfil three main objectives:
- Providing a new revenue stream, albeit that for the time being it remains a fairly marginal stream for news media
companies.
- Obtaining greater website authority and improving the visibility of the websites that buy backlinks.
- Generating revenue in the form of commissions to the intermediary firms dealing in hyperlinks.

Objective 2. To classify the content published as a result of this activity and its implications for off-page SEO strategies.

We have shown that the sector does not operate a system of self-regulation, since each of the three news media companies analysed applies different criteria. Furthermore, contrary to native advertising, the sponsored content does not conform to the look, feel and function of the content of the platform on which it appears.
Different degrees of ethical awareness can also be identified, since news media companies 1 and 2 at least go some way
to specifically identifying this content (by labelling items as comunicados or remitidos), while company 3 avoids drawing
any distinction between editorial and sponsored content.




Objective 3. To provide guidelines for the possible improvement of this activity by developing a set of best practices
modelled on so-called native advertising.

Based on native advertising regulations, an initial proposal of best practices for the writing of news items for link selling
should consider the following guidelines:
- There should be a clear indication that the news story published is sponsored content or advertising – the distinction
being that the latter is provided by the advertiser, the former by the news media company itself.
- The news item should match the look, feel and function of the content of the platform on which it appears.
- The information included in the commissioned news item should be newsworthy or, at least, useful for the reader, and
should be based on current news stories. The news story ought to be written with the user in mind and should not
be motivated solely by the hyperlink that has been purchased. Its features should serve not only the needs of the link
buyers but also those of the readers.
- More than one link can be included in a news story provided there is a thematic connection between them that does
not affect the story’s overall coherence.
- The hyperlinks and their anchor texts must be orthographically and syntactically coherent with the text of the news
item.
- As a rule, if the hyperlinks do not lead to an authoritative website that provides useful, complementary information to
readers, then this link should not be added to editorial content. Instead, these links should be published in a section
dedicated exclusively to sponsored content or advertising and separated from the newspapers’ usual sections.
Research questions
Next, we return to the research questions posed at the outset to examine the responses obtained from the case study
reported above.

Question 1. What are the main characteristics of this new line of activity centred on link buying/selling in the digital
news media and who are its main actors?

We have shown that it is possible both to clearly identify this line of activity in the news media, centred on the acquisition of links, and to determine the characteristics of the content that acts as a vector for those links.
It is a business model in which the three main actors, i.e. the digital news media, their customers, and the link buying intermediaries, all benefit. The news media and the intermediaries obtain an economic return, while the clients obtain greater website authority and visibility. The loser in this activity is, however, journalistic quality and, with it, the readers of the news media.

Question 2. What are the main characteristics of the content published as link vectors?

The analysis shows that the news items identified in this case study present the following characteristics:
- They do not carry clear labels identifying their content as advertising or sponsored.
- They are timeless, focusing primarily on providing advice and basic recommendations on a huge variety of topics ranging from tourism, cooking, and cars, to investments, beauty, and technology, among many others.
- They can include up to four backlinks. These links are often shoehorned into the content, not only because they are poorly constructed in terms of their semantics but also because they link to websites that do not provide complementary quality information for their readers.

Question 3. Is it possible to develop a set of best practices for this activity based on native advertising in order to
improve sector practices?

Here, we have taken the concept of native advertising as our reference because it can be considered to provide interesting precedents and, as such, to be a model for future regulations governing paid links in the digital press.
Broadly speaking, the news items in our sample point clearly to the need to develop a set of best practices, preferably so that the media companies can self-regulate rather than depend on an external regulator.
Digital news media readers deserve the highest degree of quality and transparency, characteristics that ultimately benefit the news media themselves, especially if we consider the acute crisis they are currently experiencing. It is important
that the media generate additional revenue streams, which is why this line of business should be understood as being
both necessary and timely.
However, the sector’s legitimacy calls for a highly transparent and stringent system of self-regulation and, here, we have
identified some of the essential elements that need to be taken into consideration in developing such a system. The




key idea in the process is that the transfer of authority effected by link buying/selling should not negatively impact the content quality or the reading experience of the news media that participate in this business model. Additionally, maximum transparency must be guaranteed at all times.
8. Future research
More studies on ethics need to be undertaken within the digital news media to determine best practices for the selling of commissioned news items and hyperlinks. In this way it should be possible to reconcile the sector's legitimate interest in sponsorship or advertising revenue with the interests of the users who consume news and their right to receive quality content which, even if sponsored, should be in line with the general orientation of the news outlet.
Within the field of SEO, analyses could be undertaken of the actual impact of links of this type in terms of improving
the ranking of the websites that receive them. To do this, analytical frameworks need to be designed and employed in
conjunction with such SEO tools as Sistrix, SEMrush, Ahrefs, or Majestic, among others.

9. References
Amazeen, Michelle A.; Muddiman, Ashley R. (2017). “Saving media or trading on trust? The effects of native advertising
on audience perceptions of legacy and online news publishers”. Digital journalism, v. 6, n. 2, pp. 176-195.
https://open.bu.edu/bitstream/handle/2144/27151/Amazeen_Muddiman_2017.pdf?sequence=1
https://doi.org/10.1080/21670811.2017.1293488
Amazeen, Michelle A.; Wojdynski, Bartosz W. (2018). “The effects of disclosure format on native advertising recognition
and audience perceptions of legacy and online news publishers”. Journalism, pp. 1-20.
https://open.bu.edu/handle/2144/27308
https://doi.org/10.1177/1464884918754829
Batsell, Jake (2018). "4 steps to bring ethical clarity to native advertising". Nieman Reports, September 23rd.
https://niemanreports.org/articles/4-steps-to-bring-ethical-clarity-to-native-advertising
Booth, Andrew; Papaioannou, Diana; Sutton, Anthea (2012). Systematic approaches to a successful literature review. London: Sage. ISBN: 978 0 857021359
Brin, Sergey; Page, Lawrence (2000). "The anatomy of a large-scale hypertextual web search engine". Stanford University.
http://infolab.stanford.edu/~backrub/google.html
Cámaras-León, Nuria (2018). “Linkbuilding 2018, guía de enlazado perfecto (+12 predicciones expertos)”. Unancor, 11th
January.
https://www.unancor.com/blog/guia-linkbuilding
Carlson, Matt (2014). “When news sites go native: Redefining the advertising – editorial divide in response to native
advertising”. Journalism, v. 16, n. 7, pp. 849-865.
https://doi.org/10.1177/1464884914545441
Cooper, Jon (2012). “Link building tactics. The complete list”. Point Blank SEO, April 1st.
http://pointblankseo.com/link-building-strategies
Cramer, Theresa (2016). "The deal with disclosure and the ethics of native advertising". Digital Content Next, Sept. 23rd.
https://digitalcontentnext.org/blog/2016/09/06/the-deal-with-disclosure-and-the-ethics-of-native-advertising
Crowe, Anna L. (2017). “Illustrated guide to link building”. Search engine journal.
https://www.searchenginejournal.com/link-building-guide
Dean, Brian (2018). “The definitive guide (2018 update)”. Backlinko, March 11th.
https://backlinko.com/link-building
Ferrer-Conill, Raul (2016). “Camouflaging church as state”. Journalism studies, v. 17, n. 7, pp. 904-914.
https://doi.org/10.1080/1461670X.2016.1165138
Fishkin, Rand (2016). “Targeted link building in 2016 - Whiteboard Friday”. Moz, Jan 29th.
https://moz.com/blog/targeted-link-building-in-2016
García-Carretero, Lucía; Codina, Lluís; Díaz-Noci, Javier; Iglesias-García, Mar (2016). “Herramientas e indicadores SEO:
características y aplicación para análisis de cibermedios”. El profesional de la información, v. 25, n. 3, pp. 497-504.
https://doi.org/10.3145/epi.2016.may.19




Giomelakis, Dimitrios; Veglis, Andreas (2015). "Employing search engine optimization techniques in online news articles". Studies in media and communication, v. 3, n. 1, pp. 22-33.
https://doi.org/10.11114/smc.v3i1.683
Giomelakis, Dimitrios; Veglis, Andreas (2016). “Investigating search engine optimization factors in media websites. The
case of Greece”. Digital journalism, v. 4, n. 3, pp. 379-400.
https://doi.org/10.1080/21670811.2015.1046992
González-Villa, Juan (2017). "Cómo hacer link building: estrategias y ejemplos prácticos". Useo, 30th March.
https://useo.es/como-hacer-link-building
Gonzalo-Penela, Carlos (2006). “Tipología y análisis de enlaces web: aplicación al estudio de los enlaces fraudulentos y
de las granjas de enlaces”. BiD: textos universitaris de biblioteconomia i documentación, n. 16.
http://bid.ub.edu/16gonza2.htm
Gonzalo-Penela, Carlos; Codina, Lluís; Rovira, Cristòfol (2015). “Recuperación de información centrada en el usuario y
SEO: categorización y determinación de las intenciones de búsqueda en la Web”. Index comunicación, v. 5, n. 3, pp. 19-27.
http://journals.sfu.ca/indexcomunicacion/index.php/indexcomunicacion/article/view/197/175
Google (2018). Google guide making searches even easier. Search operators.
http://www.googleguide.com/advanced_operators_reference.html
Harry, David (2013). “How search engines rank web pages”. Search engine watch, Sept. 23rd.
https://searchenginewatch.com/sew/news/2064539/how-search-engines-rank-web-pages
Hart, Chris (2008). Doing a literature review: Releasing the social science research imagination. London: Sage. ISBN: 978
0 761959755
Kleinberg, Jon M. (1998). “Authoritative sources in a hyperlinked environment”. In: Procs. of the ACM-SIAM Symposium
on discrete algorithms, pp. 1-33.
https://www.cs.cornell.edu/home/kleinber/auth.pdf
Lewandowski, Dirk (2012). "A framework for evaluating the retrieval effectiveness of search engines". In: Jouis, Christophe; Biskri, Ismail; Ganascia, Jean-Gabriel; Roux, Magali. Next generation search engine: Advanced models for information retrieval. Hershey, PA: IGI Global, pp. 456-479. ISBN: 978 1 466603318
https://arxiv.org/pdf/1511.05817.pdf
Li, You (2017). “Contest over authority”. Journalism studies, pp. 1-19.
https://doi.org/10.1080/1461670X.2017.1397531
Mathiasen, Stine F. (2018). "10 quick takeaways from native advertising days 2018". Native Advertising Institute, Sept. 23rd.
https://nativeadvertisinginstitute.com/blog/takeaways-native-advertising-days-2018
Monterde, Nacho (2016). “Introducción al link building”. SEO azul, March 4th.
https://www.seoazul.com/introduccion-al-link-building
Pollitt, Chad (2018). The global guide to technology 2018. A resource for marketers, advertisers, media buyers, communicators, publishers and ad tech professionals.
https://nativeadvertisinginstitute.com
Publisuites (2018). "Estudio del uso de linkbuilding". Publisuites, 15th March.
https://www.publisuites.com/blog/estudio-de-linkbuilding-publisuites
Rowe, Kevin (2018). “How link building will change in 2018”. Search engine journal, Feb. 2nd.
https://www.searchenginejournal.com/how-link-building-will-change/231707
Schauster, Erin E.; Ferrucci, Patrick; Neill, Marlene S. (2016). “Native advertising is the new journalism: How deception
affects social responsibility”. American behavioral scientist, v. 60, n. 12, pp. 1408-1424.
https://doi.org/10.1177/0002764216660135
Serrano-Cobos, Jorge (2015). "SEO: Introducción a la disciplina del posicionamiento en buscadores". Colección EPI Scholar. Barcelona: Editorial UOC. ISBN: 978 84 9064 956 5
Sweetser, Kaye D.; Joo, Sun; Golan, Guy J.; Hochman, Asaf (2016). “Native advertising as a new public relations tactic”.
American behavioral scientist, v. 60, n. 12, pp. 1442-1457.
https://doi.org/10.1177/0002764216660138
Thelwall, Mike (2004). Link analysis: An information science approach. Amsterdam: Elsevier. ISBN: 978 0 12 088553 4
Yin, Robert K. (2014). Case study research. Design and methods. Canada: SAGE. ISBN: 978 1 452242569



INCREASING TRAFFIC TO YOUR WEBSITE THROUGH SEARCH ENGINE OPTIMIZATION TECHNIQUES

WHAT'S INSIDE

Key Concepts
How Search Engines Work
Getting Started
Planning
1. Questions to Ask Yourself
2. Setting SEO Objectives and Goals
3. Do-It-Yourself Options
4. Choosing an SEO Specialist to Work With
5. Understanding Best Practices, Pitfalls and Barriers
Implementing SEO
1. Keywords
2. Finding the Right Keywords
3. Search Engine Optimization Techniques
4. Link Popularity – A Key Factor for Increasing a Website's Page Ranking
5. Keyword Conversion
Test, Measure, Test Again
1. Webmaster Tools
2. Tracking Your Progress/Website Analytics
Future of Search Engine Optimization
Related Topics Covered in Other Booklets
Glossary of Terms

Small businesses that want to learn how to attract more customers to their website through marketing strategies such as search engine optimization will find this booklet useful. You may want to read this booklet in conjunction with other booklets in this series such as Successful Online Display Advertising and Social Media for Small Business.

Key Concepts

Search engine optimization (SEO) involves designing, writing, and coding a website in a way that helps to improve the volume and quality of traffic to your website from people using search engines. These "free," "organic," or "natural" rankings on the various search engines can be influenced, but not controlled, by effective SEO techniques. Websites that have higher rankings (i.e. presented higher in the search results) are identified to a larger number of people who will then visit those sites.

The majority of web traffic is driven by major search engines, including Google, Bing, YouTube, AOL, Yahoo, Duck Duck Go, Ask Jeeves and other country-specific ones (e.g. Baidu in China).

Disclaimer: This booklet is intended for informational purposes only and does not constitute legal, technical, business or other
advice and should not be relied on as such. Please consult a lawyer or other professional advisor if you have any questions
related to the topics discussed in the booklet. The Ontario Government does not endorse any commercial product, process
or service referenced in this booklet, or its producer or provider. The Ontario Government also does not make any express or
implied warranties, or assumes any legal liability for the accuracy, completeness, timeliness or usefulness of any information
contained in this booklet, including web-links to other servers. All URLs mentioned in this document will link to an external website.
How Search Engines Work

Search engines have four functions—crawling, building an index, calculating relevancy and rankings, and serving results. They scour your website and, for each page, index all of the text they can pick up, as well as a great deal of other data about that page's relation to other pages, and in some cases all or a portion of the media available on the page as well. Search engines index all of this information so that they can run search queries efficiently.

Search engines create these databases by performing periodic crawls of the Internet. They must weigh the value of each page and the value of the words that appear on it. Search engines employ secret algorithms (mathematical formulas) to determine the value they place on such elements as inbound and outbound links, density of keywords and placement of words within the site structure, all of which may affect your SEO ranking. Search engines have difficulty indexing multimedia, but there are workarounds which will be discussed later in the booklet.

The newest trend in search engines, and likely the future of search in general, is to move away from keyword-based searches to concept-based personalized searches. When a person clicks on certain search results, search engines like Google, Bing and others record this information to collect trends of interest and then personalize the search results based on specific interests. This is still a developing field, but it appears to have good potential for making searches more relevant.

The following pages outline SEO techniques that will help you to draw more visitors to your website.

Getting Started

Planning

1. Questions to Ask Yourself

1. Who am I trying to attract to my website? Who is my target audience?

2. Do I have the resources and knowledge base in-house to optimize my site, or should I use the services of an SEO specialist?

3. Who are my competitors?

4. Who are my allies and associates to help influence my optimization results?

5. What is the budget and time I can allocate to optimizing my website initially and going forward?

2. Setting SEO Objectives and Goals

The ultimate goal of search engine optimization is to boost your revenue by driving traffic to your website. However, there are other important objectives:

• To establish you as an expert in your field. Visibility in search engines creates an implied endorsement effect where searchers associate quality, relevance and trustworthiness with sites that rank highly for their queries.

• To enhance product awareness. It is better to have an image or video displayed as opposed to just text, since it will attract more attention.

• To increase sales leads. The goal is to drive the right traffic to your site by encouraging people to provide qualified contact information for future relationship building.

• To reduce cost per order. Free search engine traffic will help you reduce the cost of advertising compared to other media channels.

• To encourage repeat visitors. Optimized pages help customers find additional products or services more easily and quickly after they have purchased from you, thus improving customer support and service.

• To qualify visitors. Search can help you understand the stage your buyer is at—just beginning or further along. Getting a clear picture of searcher and visitor intent helps you to adjust your site to accommodate their needs.

When planning your search engine optimization strategy, bear in mind that, although many site owners seek a #1 ranking, search engines want the #1 result to be the one that offers the most value to users. Unless your website offers this, it may be unrealistic for you to expect to get to that position. Be realistic when setting your objectives.
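The crawl, index and query cycle described above can be sketched with a toy inverted index. This is illustrative only: real search engines add ranking signals such as link structure, keyword placement and personalization on top of the basic index, and the pages and URLs below are invented for the example.

```python
# Toy illustration of the index-and-query cycle: each "page" is plain text,
# and the index maps every word to the set of pages that contain it.
from collections import defaultdict

def build_index(pages):
    """pages: dict of url -> page text. Returns word -> set of urls."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index

def search(index, query):
    """Return urls containing every query word (a boolean AND search)."""
    results = None
    for word in query.lower().split():
        hits = index.get(word, set())
        results = hits if results is None else results & hits
    return sorted(results or set())

pages = {
    "site.com/apples": "fresh apple pie recipes",
    "site.com/pears":  "fresh pear recipes",
}
idx = build_index(pages)
print(search(idx, "fresh recipes"))  # both pages match
print(search(idx, "apple"))          # only the apples page matches
```

Note that the query only finds pages whose visible words match, which is exactly why the keyword-selection advice later in this booklet matters: if your pages never use the words searchers type, no index entry connects them to your site.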
3. Do-It-Yourself Options

Many small business owners hire the services of an SEO expert to optimize their websites. If you would prefer to do it yourself, here are some questions to ask yourself first:

1. Do I really have the time it takes to optimize my website? Entrepreneurs wear many hats, and there may be too many other business priorities that take precedence over a website project.

2. Will doing my own SEO save money? Time spent doing one job is time away from doing another. Could focusing on web development and SEO instead of business planning, building customer relationships and handling staff issues end up costing more money in the long run?

3. How strong are my web development skills? Be honest with yourself. SEO requires specific technical and marketing expertise. And marketing is an ongoing process.

If you are satisfied with the answers and your preference is still the do-it-yourself approach, here are some guidelines:

• Define your target audience and brainstorm keyword phrases that are applicable to them.
• Identify your competitors and conduct a competitive analysis using tools like compete.com or wordtracker.com, or review their websites.
• Conduct formal or informal surveys of your customers and prospects, or set up Google Webmaster Tools www.google.com/webmasters/tools/ or Bing Webmaster Tools www.bing.com/toolbox/webmaster/ to see what keywords people are searching for in your field.
• Write appropriate content, utilizing keyword best practices (see below).
• Understand what is entailed to optimize a website. Know who to ask to implement your ideas, or have the appropriate resources (software) or knowledge base to do it yourself.
• Allocate appropriate time for continuous efforts.
• Have the appropriate metrics in place for review and analysis so you can continue to improve your positioning.

4. Choosing an SEO Specialist to Work With

Because SEO is such a specialized and ever-changing world, you should consider working with an expert. The best time to hire an SEO specialist is either at the creation of a new website or the major redesign of an existing one. This will ensure that you implement the necessary steps to make your site optimized right from the start.

Here is a list of key questions to ask to help in choosing an SEO specialist:

Questions to Ask a Potential SEO Specialist

1. What are your most important SEO techniques?

2. Can you show me examples of your previous work and share some success stories?

3. Do you offer any online marketing services or advice to complement your SEO business?

4. Do you follow the Google Webmaster Guidelines?

5. What kind of results do you expect to see, and in what timeframe? How do you measure your success?

6. What is your experience in my industry?

7. What is your experience in my country/city?

8. How long have you been in business?

9. How can I expect to communicate with you? Will you share with me all the changes you make to my site, and provide detailed information about your recommendations and the rationale behind them?

10. How do you charge for your services? Is this a one-time fee or an ongoing contract? What are the deliverables if it is ongoing?
Types of Services Most SEO Specialists Provide

• Review of your site content and/or structure
• Advice on technical aspects of SEO and their impact on your website
• Content development and/or editing
• Management of online business development campaigns
• Keyword and competitive research
• SEO training

CAUTION: Be wary of emails that may appear legitimate but are often spammers offering SEO services and claiming they can "Guarantee #1 Ranking".

5. Understanding Best Practices, Pitfalls and Barriers

SEO BEST PRACTICES

Keyword Search
• Find the words and phrases your customers use rather than industry jargon.
• Look for synonyms.
• Reflect the answers to viewer questions in your keywords and in the content of your pages.
• Move away from thinking of keywords as data. Imagine instead the person who will be typing in that keyword and what they are searching for.

Quality Content
• Content is still "king" and search engines are looking for keyword phrases surrounded by semantic phrasing that supports the overall theme of that section. Try to incorporate synonyms.

SEO Local (Optimizing your site to attract local business)
• Optimize your website to acquire foot traffic or local area interest. Add service location addresses and/or city listings and include good local links to your pages.
• Choose the proper categories in Google Places.
• Make sure Google can recognize that your website and your Google Places page are associated and linked.
• Ask users for citations and reviews which will link back to you.

Social SEO (Optimizing your site for social networks and social media)
• Use social cues such as Twitter shares, Facebook likes and social bookmarking, which heavily influence search rankings. For more optimization suggestions such as tagging and keyword titles, see the Social Media for Small Business booklet.

Guest SEO
• Be a guest blogger on other related blogs.
• Consider targeting blogs that aren't direct matches to your industry to get a leg up on your competition. For more about blogging, see the Blogs for Small Business booklet.
SEO BEST PRACTICES (continued)

Link Building
• Use reciprocal link exchange moderately. Instead, let link building happen naturally through people retweeting and passing on your good content and articles.
• Ensure quality of links rather than quantity. The higher the quality of links, the more trust and authority will be established.
• Use targeted keywords in anchor text.
• When you do a link exchange, don't always link back to your home page. Instead, provide a link to the most relevant section of your site that relates to the anchor link – e.g. "what you need to know about gluten-free products" should link to the page about gluten-free products.
• Ensure any out of date or old web pages are redirected to the relevant new pages through a 301 redirect.

Technical Considerations
• Keyword density (how often keyword phrases are used in comparison to the total number of words on that page) of over 10% is considered suspicious and does not look like naturally written text. Aim for 3–7% of total words per page to be the keywords you are trying to optimize for.
• Have a short title tag (6 or 7 words at most), with the most important keywords near the beginning and used only once.
• Avoid cumbersome URLs. Instead, create user-friendly URLs for easy accessibility by viewers and search engines – e.g. change www.mysite.com/recipes.php?object=1&type=2&kind=3&node=5&arg=6 to www.mysite.com/recipes/apples.
• Submit an XML site map to the search engines after making any major structural changes to your site.

SEO PITFALLS/BARRIERS

Marketing
• When choosing keyword phrases, avoid using the word "free" unless you're offering something unconditionally.
• Be careful adjusting page titles without altering page content. Google may view it unfavourably.
• Avoid overly aggressive or manipulative SEO techniques, such as loading too many keywords in the website's content, which might get your site excluded from a search engine.
• Exclude linking affiliates that are not relevant as this could negatively affect your positioning.
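The keyword-density rule of thumb above (roughly 3–7% of the words on a page, and anything over 10% looking suspicious) is easy to check with a short script. This is a minimal sketch: the sample page text is invented, and the thresholds simply mirror the guideline in the table.

```python
# Check keyword density for a page, per the rule of thumb above:
# aim for roughly 3-7% of words, and treat over 10% as suspicious.
def keyword_density(text, keyword):
    """Percentage of words in `text` that match `keyword` (a single word)."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    if not words:
        return 0.0
    return 100.0 * words.count(keyword.lower()) / len(words)

page = ("Our bakery sells pies. Pies for every occasion: "
        "apple pies, pear pies, and more pies.")
density = keyword_density(page, "pies")
print(f"{density:.1f}%")  # 5 of 15 words -> 33.3%, far over the guideline
print("too high" if density > 10 else "within guideline")
```

A page like the sample above reads as keyword stuffing to both people and search engines; rewriting it with synonyms and natural phrasing would bring the density back into the suggested range.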
SEO PITFALLS/BARRIERS (continued)

Technical
• In structural design, do not use a frame set-up (check with your developer).
• Avoid free hosting since it is difficult for search engines to wade through all the data when there are many sites sharing free hosting.
• Watch for and remove any broken links (URL links that no longer work and are not accessible).
• Do not use FLASH, videos or images without alternative tags (alt tags), transcripts or synopsis content.
• Do not try to optimize by using an excessive number of keywords, especially unrelated ones, as this will affect the performance of all your keywords.
• Avoid artificially inflated keyword density (over 10%) or you will risk getting banned from search engines.
• Keep the number of links on a page to a reasonable number since Google does not like pages that consist mainly of links.
• Avoid writing text that is the same colour as the page background since you will be penalized by Google for this practice.

Implementing SEO

1. Keywords

Knowing what keywords your target audience will use to find the products or services that you offer is the most critical consideration in search engine optimization. Put yourself in the shoes of your prospects and start by brainstorming all the keyword phrases they might use, both single words and up to four or five keyword phrases (called long tail keywords).

An analysis of search behaviour typically shows that, when people use a one word or two word search phrase, they are at the beginning stages of their research, whereas use of a four or five plus keyword phrase reveals they are close to the "buying" stage (e.g. "Car sales" vs. "Mercedes 350 SL vintage").

To assist in analyzing keywords, you can also use online tools like the Google AdWords Selector Tool https://adwords.google.com, Wordtracker https://freekeywords.wordtracker.com, SEOBook Keyword Tool http://tools.seobook.com/keyword-tools/seobook/ and Wordstream Keyword Niche Tool www.wordstream.com/keyword-niche-finder.

After brainstorming keyword phrases, look for creative ways like these to expand your keywords:

• Plural vs. singular ("examples" vs. "example")
• "Double" words (like "everyday" and "every day")
• Other tenses (as with "develop", "developed", "developing")
• Different spellings (alternate spellings, British/American spellings, and, yes, misspellings)
• Turning verbs into nouns (as in "select" >> "selection", "selector")
• Other prefixes and suffixes (e.g. prefix "pre/selling"; or suffix "find/able" or "suggest/ive")
• Any associated acronyms (e.g. "WAN" as well as "Wide Area Network")

You will also want to think in terms of expanding a search term by:

• Adding other words to the beginning or end. Common ones searchers use are "how to", "how to do", "how do I", "how can I", or "where can I find";
• Breaking it apart with other words or rearranging keywords (as in "strawberry rhubarb pie" vs. "strawberry pie and rhubarb pie" and "the 2012 NHL playoffs" vs "the playoffs in the 2012 NHL season").

When developing your content sections and body text, consider the following:

• Group like-minded keyword phrases and divide your content areas to determine your site map navigation layout.
• Prioritize money-making keyword phrases (those that lead to a sale, as identified by examining your metrics) and make sure you position them within your page so that search engines and your audience can easily identify them.
• Use keyword phrases in your web address that best describe the page content – e.g. http://www.yourdomainname.com/choosing-keyword-phrases.html.
• For each page, choose a title (60 characters), description (150 characters) and Meta keywords reflecting the theme and content of the page, with each word separated by a comma.
• Placement of keywords is very important. Think backwards, and put the "result/conclusion" at the beginning, thus keeping priority keywords "above the fold" (i.e. closest to the top of the page). When indexing your site, search engines move their robots from top to bottom, left to right, so the placement of your keywords should be optimized strategically in order to be picked up by the engines.
• Avoid filling the top part of your site with large images or image navigation (text that is actually an image) as you will miss the opportunity to have keywords captured for indexing.

2. Finding the Right Keywords

Steps for finding the right keywords (or search terms)

1. Determine what your existing and potential customers might be looking for. Ask your current customers what terms they would use to find the products and services offered by your business.

2. Brainstorm a list of keywords that are related to your business.

3. Check out the competition to see how many other websites are listed in search engines (particularly Google) for that keyword. You can use tools such as Google Page Rank—which goes from 1 to 10 (10 being the best ranked)—to see which websites hold the top positions.

4. Select the best keywords—the ones most suited to your business and target audience.

5. Ensure that the content on your website is high quality and is consistent with your keywords.

6. Check out various online resources (some are free) that will help you assess the relevance of your keywords by typing "keyword tools" into a search engine.

3. Search Engine Optimization Techniques

SEO involves a wide range of techniques, some of which you may be able to do yourself and others that will require web development expertise. Techniques include increasing the number of links from other websites to your web pages, editing the content of the website, reorganizing the structure of your website, and coding changes. It also involves addressing problems that might prevent search engines from fully "crawling" a website.

On Page and Off Page Optimization Techniques for Increasing Traffic to Your Website

On Page Optimization Techniques

On page optimization includes those techniques that can be done on the pages of a website. On page optimization relates to those things that are within your control – i.e. the content of your website. On page optimization techniques help the search engine crawlers read the website content. A readable site helps to show quality and will result in higher ranked web pages.

Review the following with your website developer to ensure that all these items have been considered:

On Page SEO Checklist

• Always start with keyword selection, research and testing.
• Have a Meta description tag for each page.
• Create ALT tags for all images.
• Place a keyword phrase in H1, H2, H3, H4 tags and in URL structure (domain name and your pages).
• Develop an internal linking strategy.
• Have relevant, keyword-rich content.
• Follow keyword density rules.
• Create and submit site maps, both XML and user facing.
• Design for usability and accessibility.
• Track target and converting keywords.

Off Page Optimization Techniques

Off page optimization includes those techniques that can be done outside your website to increase traffic to your website. The various free off page optimization techniques (also known as free traffic sources) that you can use to drive traffic to your website and increase its ranking level in major search engines include link swaps, blogging, social networks, white papers, infographics and forum postings.

Additional guidelines to consider:

• Include the keyword in your domain name when registering both your main website and any micro-sites.
• Include the keyword when naming your digital resources such as a white paper, video and images (e.g. rather than naming your image "image-1.jpg", include a descriptive name like "weather-satellite.jpg" or "weather-satellite-tips.pdf").
• Ensure targeted anchor text is keyword rich so that when people copy blog posts or other content, that linked anchor text will come with it and generate more links back to you.

4. Link Popularity – A Key Factor for Increasing a Website's Page Ranking

One of the most critical ways to improve your website's ranking in the search engine results pages is improving the number and quality of websites that link to your site. Google PageRank is a system for ranking web pages used by the Google search engine. PageRank assesses the extent and quality of web pages that link to your web pages. Because Google is currently one of the most popular search engines worldwide, the ranking your web pages receive on Google PageRank will affect the number of visitors to your website. Techniques to improve your page rank are discussed below.

Link Building Techniques

1. Send out product samples and ask for reviews, then ask reviewers to link to your site.

2. Participate as a guest blogger or guest newsletter contributor, with a link back to your site.

3. Host awards or certifications within your industry, create an award 'badge' and then have it link back to your site.

4. Utilize social media and search strategies:

Twitter. This can be very effective in capturing attention.
• Follow thought leaders within your industry.
• Make a note of what they like to tweet about.
• Check their personal websites for more info.
• Look at what kind of content they retweet.
• Retweet their content.
• Interact with them constructively.
• Ask for their opinion.

Blog commenting. Find high-quality blogs written by people with whom you are seeking links and provide constructive, useful comments. This can prompt them to click through to your website and create a relationship building opportunity where they can either become influencers willing to share your content and/or invite you to become a guest blogger with your byline pointing back to your site. All this contributes to gaining "inbound" links, which help your overall link popularity.

Profiling. Complete your profiles in the social circles and map listings. This improves the likelihood for your business to be more visible to search engines.

Keyword tags. Use these in content posts and post titles to reinforce the specific keywords that your audience is searching for.

RSS feeds. Constantly making fresh content available, combined with "pinging" and natural share among followers, shows search engines how popular your "brand" is.

5. Build a free tool that links back to your site. If it is useful, people will want to download it and possibly share it. Include a byline that brings them back to your site.

6. Build supplementary micro-sites and link back to your main site.
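The intuition behind PageRank described above, that a page's importance depends on the importance of the pages linking to it, can be shown with a simplified power iteration. This is only a sketch: Google's real system handles dangling pages and combines PageRank with many other signals, the 0.85 damping factor is the commonly cited default, and the three-page link graph is invented.

```python
# A simplified PageRank power iteration to illustrate the idea above:
# a page's score depends on the scores of the pages linking to it.
def pagerank(links, damping=0.85, iterations=50):
    """links: dict page -> list of pages it links to (every page has >= 1 outlink)."""
    pages = list(links)
    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iterations):
        # each page shares its current score equally among its outlinks
        new = {p: (1 - damping) / len(pages) for p in pages}
        for page, outlinks in links.items():
            share = damping * rank[page] / len(outlinks)
            for target in outlinks:
                new[target] += share
        rank = new
    return rank

# "home" is linked to by every other page, so it earns the highest score
links = {
    "home":     ["products"],
    "products": ["home", "contact"],
    "contact":  ["home"],
}
for page, score in sorted(pagerank(links).items(), key=lambda kv: -kv[1]):
    print(f"{page}: {score:.3f}")
```

Notice that "contact" scores lowest even though it has an inbound link: it receives only half of "products"' score, while "home" collects links from both other pages. That is the sense in which quality and quantity of inbound links, not just their existence, drive the ranking.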
More about Increasing Link Popularity

Here are some tips for increasing link popularity. For best results, ensure that links to your website are relevant.

Distribute Articles/Press Releases to Other Websites: Build inbound links by distributing articles/press releases through websites and article directories. Make sure that the articles contain a link back to your website.

Participate in Link Campaigns: Businesses can ask partners (e.g. suppliers), other businesses, professional organizations, chambers of commerce and customers to add links from their websites. Be prepared to link back to their sites. The more relevant the partner, the better.

Create Interesting Content: Provide interesting information that users may find useful. Offer tips to benefit users (e.g. Pitfalls to Avoid When Hiring Contractors). Create tools that people will use (e.g. Product Quality Checklists). Create lists (people love Top 10 lists). Submit to article directories, press release websites, and/or guest blog.

Give Testimonials: Many would think it is important to get testimonials, but it is equally important to give testimonials, as long as they are authentic and well deserved. Ask to have your link come back to your site.

Turn Raw Numbers into a Data Story: Be creative in taking your numbers and crafting an interesting story. When you are the resource reference, then all others who use this story will link back to you.

5. Keyword Conversion

Site owners used to be satisfied simply to attract traffic to their sites; now they want to know what is working, such as which keywords most often lead to a sale. If you're trying to sell a product or service through your website, knowing the keywords that lead to this conversion at a high rate is an enormously valuable marketing asset. With this knowledge, you can adjust your site content accordingly. The following reference link will help you with conversion tracking: http://support.google.com/adwords/bin/answer.py?hl=en&answer=1722022.

Test, Measure, Test Again

1. Webmaster Tools

The three major search engines, Google, Bing and Yahoo, have webmaster tools that you as a site owner can sign up for at no cost to manage your website statistics, submit your content and site map, and view diagnostic errors, malware or other concerns that the search engines find from indexing your site. You can also see broken links and the pages from where they originate in order to fix them quickly.

Google Webmaster Tools can be found at www.google.com/webmasters/tools. You require a Google account to set up the webmaster account. You will then need to follow Google's instructions to verify your website. Verified site owners can see information about how Google crawls, indexes and ranks your site. One of the key indicators that can be gathered from this tool is an understanding of the keywords that visitors to your site are using, and then comparing them to the keywords that Google finds on your site. You can modify your content to reduce the "Bounce Rate", which occurs when someone enters your site, doesn't find what they like and leaves before proceeding to any other part of the site.

Yahoo and Bing have now merged together to offer Bing Webmaster Tools. Getting started involves setting up a Bing webmaster tools account at www.bing.com/webmaster/WebmasterManageSitesPage.aspx, then validating your website, creating and uploading a sitemap and developing a search optimization plan. You should sign up for both.
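Finding the keywords that convert, as described under Keyword Conversion above, reduces to a simple calculation once you have the data. In practice the visit records come from your analytics package's conversion-tracking export; the keywords and outcomes below are entirely hypothetical.

```python
# Which entrance keywords convert to sales at the highest rate?
# The visit data here is hypothetical sample data.
visits = [
    # (entrance keyword, converted to a sale?)
    ("pie shop", False),
    ("pie shop", False),
    ("gluten-free apple pie delivery", True),
    ("gluten-free apple pie delivery", True),
    ("gluten-free apple pie delivery", False),
]

def conversion_rates(visits):
    """Return keyword -> fraction of visits that converted."""
    totals, sales = {}, {}
    for keyword, converted in visits:
        totals[keyword] = totals.get(keyword, 0) + 1
        sales[keyword] = sales.get(keyword, 0) + (1 if converted else 0)
    return {k: sales[k] / totals[k] for k in totals}

for keyword, rate in conversion_rates(visits).items():
    print(f"{keyword!r}: {rate:.0%}")
```

Note how the sample mirrors the long-tail observation earlier in this booklet: the specific five-word phrase converts while the generic two-word phrase brings visits but no sales, which is exactly the signal to prioritize "money-making" keyword phrases in your content.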
2. Tracking Your Progress/Website Analytics

You should regularly check your search results on the main search engines. Type in keywords or phrases that describe your business to find out how it ranks in the search engine results pages. You can also use various online tools (e.g. Web Trends webtrends.com, One Stat www.onestat.com/ and Google Analytics www.google.com/analytics) to check your ranking against that of your competitors. Key performance indicators (KPIs) to monitor include the number of landing pages you have, the bounce rate of those pages and the number of keywords driving traffic to each of those landing pages.

You can look at the following metrics to track your optimization progress and help you make decisions to improve your website and your marketing strategies:

Page views: Web pages that attract the most visitors. You will also see web pages that are not performing for you.

Referrers: The number of inbound links or back links. Where are visitors coming from (e.g. via a search engine or another link)?

Bounces: The number of clicks on links that lead away from your website.

Traffic reports: The number of visitors, new and returning.

Entrance keywords: What words are people using to find your site? What pages are they landing on?

Exit pages: What pages are people leaving the site from? Are they not finding what they initially thought they were looking for?

Google Page Rank: This measurement is what Google uses to determine how popular your site is. It measures how important those sites linking to yours are in relation to your site. It's not about quantity, but quality.

Future of Search Engine Optimization

• Social rules. Expect links with long-term longevity on social networks to rank higher, and both web page links and page rank to matter less and less.

• Personal SEO results. With users offering personal information to profiling sites like Facebook, Twitter, Google Places, etc. and search engines able to track personal preferences via user clicks, search results and advertisements can now be customized to suit individual preferences. We will see more of this customization as technologies become more fine-tuned to collect this kind of information and users become more willing to provide personal data.

• Quality vs. quantity. With the introduction of Google Panda (one of Google's ranking search engine filters that aims to lower the rank of low-quality sites), review and clean-up of your site must be ongoing to ensure your pages offer quality content, not just quantity. Make sure you keep content relevant and fresh.

• Mobile search will continue to grow. Searching using voice and tablet technologies will grow in importance. With iPhone Siri and Google's own voice system, it will not be long before searching using voice becomes an everyday occurrence.

Related Topics Covered in Other Booklets

• Social Media for Small Business
• Blogs for Small Business
• E-Commerce: Purchasing and Selling Online
• Integrating Mobile With Your Marketing Strategy

To view or download these booklets visit Ontario.ca/ebusiness.
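Of the metrics listed above, bounce rate is the easiest to compute yourself from raw session data: the share of visits that view a single page and leave. The session records below are hypothetical; in practice your analytics tool reports this figure directly.

```python
# Bounce rate as defined in this booklet: the percentage of visits that
# view one page and then leave. The session data here is hypothetical.
sessions = [
    ["/"],                        # bounced: one page only
    ["/", "/products"],
    ["/blog"],                    # bounced
    ["/", "/products", "/buy"],
]

def bounce_rate(sessions):
    """Percentage of sessions that viewed exactly one page."""
    if not sessions:
        return 0.0
    bounces = sum(1 for pages in sessions if len(pages) == 1)
    return 100.0 * bounces / len(sessions)

print(f"bounce rate: {bounce_rate(sessions):.0f}%")  # 2 of 4 sessions -> 50%
```

Watching this number per landing page, as the KPI advice above suggests, tells you which pages fail to hold the visitors your keywords attracted.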
Glossary of Terms

301 Redirect: Method of letting web browsers and search engines know that a web page or site has been permanently moved to a new page or site.

Algorithm: A mathematical formula providing a set of instructions for completing a task. Crawler-based search engines use a set of instructions to index and rank websites.

Anchor text: Words used in the link text. It is best to stay away from words like [Read More] and [See More], and use a more keyword-rich phrase pointing to where the link is directed – e.g. "Discover Niagara's Hotel Package Deals today".

Backlinks: Links from external sites that connect to your website. They are also referred to as inbound links.

Blogs: A blog (short for weblog) is an online journal. Most blogs have an open format that allows any Internet user to post entries (comments, questions) to other bloggers. Blog discussions are usually organized according to certain themes or topics.

Bounces/Bounce rate: Bounce rate is the percentage of visitors that visit one page on your website then exit the site before visiting another page.

Crawlers (also called spiders, robots, or bots): A program which searches or browses the Web in a logical, automated manner. Search engines use crawlers to find up-to-date information.

Frames (frame set-up): A browser display area (web page) is divided into two or more sections (frames). The contents of each frame are taken from a different web page.

Google PageRank: A rough indication of the popularity and importance of sites that point to your page. A higher PageRank indicates a more popular page.

HTML (Hyper-Text Mark-up Language): A programming language used to create sites and pages on the web. This is the primary language of websites.

Link building: The process of gaining links from other websites that link to your website.

Link popularity: The number and quality of links that point to your website (i.e. back links). The number, quality and credibility of these links can influence your page rank.

Link swap: An exchange where site owners agree to mutually link to each other's sites.

Malware: Malicious software that can destroy a computer. Common examples of malware include viruses, Trojans, worms and spyware.

Meta tags: Keywords, description and content describing your website, contained in the head section of the HTML coding and not visible on your website.

Outbound links: Links from your website to other websites.

Reciprocal links: Exchange of links between websites.

Referrers: Sites that suggest your site through links coming from their website, blog, email, directory, tool, etc.

Search Engine Results Page (SERP): The pages that result from a search engine query run by a user. You can run a search using certain keywords to assess where your web pages are ranking.

Social media optimization (SMO): Using social media activity to attract visitors to websites by using methods such as adding social media features (e.g. RSS feeds, sharing buttons) to the website content and doing promotional activities like blogging, participating in discussion groups and updating social networking profiles.

Submission: The process of submitting a website to search engines so they are aware of the website and can crawl it.
For more information contact:
Telephone: (416) 314-2526
Facsimile: (416) 325-6538
E-mail: E-Business@ontario.ca

This publication is part of an E-Business Toolkit, which includes a series of booklets on advanced e-business topics and an introductory handbook, How You Can Profit from E-Business. The entire Toolkit is available at ontario.ca/ebusiness.

© Queen's Printer for Ontario, 2013


Understanding User–Web Interactions via Web Analytics

Synthesis Lectures on Information


Concepts, Retrieval, and Services

Editor
Gary Marchionini, University of North Carolina, Chapel Hill

Understanding User–Web Interactions via Web Analytics
Bernard J. (Jim) Jansen
2009

XML Retrieval
Mounia Lalmas
2009

Faceted Search
Daniel Tunkelang
2009

Introduction to Webometrics: Quantitative web research for the social sciences


Michael Thelwall
2009

Automated Metadata in Multimedia Information Systems: Creation, Refinement,


Use in Surrogates, and Evaluation
Michael G. Christel
2009

Exploratory Search: Beyond the Query-Response Paradigm


Ryen W. White and Resa A. Roth
2009

New Concepts in Digital Reference


R. David Lankes
2009
Copyright © 2009 by Morgan & Claypool

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted in
any form or by any means—electronic, mechanical, photocopy, recording, or any other except for brief quotations
in printed reviews, without the prior permission of the publisher.

Understanding User–Web Interactions via Web Analytics


Bernard J. (Jim) Jansen
www.morganclaypool.com

ISBN: 9781598298512 paperback

ISBN: 9781598298529 ebook

DOI: 10.2200/S00191ED1V01Y200904ICR006

A Publication in the Morgan & Claypool Publishers series

SYNTHESIS LECTURES ON INFORMATION CONCEPTS, RETRIEVAL, AND SERVICES #6

Series Editor: Gary Marchionini, University of North Carolina, Chapel Hill

Series ISSN
ISSN 1947-945X print
ISSN 1947-9468 electronic
Understanding User–Web Interactions via Web Analytics
Bernard J. (Jim) Jansen
Pennsylvania State University

SYNTHESIS LECTURES ON INFORMATION CONCEPTS, RETRIEVAL, AND


SERVICES #6

ABSTRACT
This lecture presents an overview of the Web analytics process, with a focus on providing insight
and actionable outcomes from collecting and analyzing Internet data. The lecture first provides
an overview of Web analytics, providing in essence, a condensed version of the entire lecture. The
lecture then outlines the theoretical and methodological foundations of Web analytics in order to
make obvious the strengths and shortcomings of Web analytics as an approach. These foundational
elements include the psychological basis in behaviorism and methodological underpinning of trace
data as an empirical method. These foundational elements are illuminated further through a brief
history of Web analytics from the original transaction log studies in the 1960s through the informa-
tion science investigations of library systems to the focus on Websites, systems, and applications.
Following a discussion of on-going interaction data within the clickstream created using log files
and page tagging for analytics of Website and search logs, the lecture then presents a Web analytic
process to convert these basic data to meaningful key performance indicators in order to measure
likely converts that are tailored to the organizational goals or potential opportunities. Supplemen-
tary data collection techniques are addressed, including surveys and laboratory studies. The overall
goal of this lecture is to provide implementable information and a methodology for understanding
Web analytics in order to improve Web systems, increase customer satisfaction, and target revenue
through effective analysis of user–Website interactions.

KEYWORDS
Web analytics, search log analysis, transaction log analysis, transaction logs, log file, query logs,
key performance indicators, query log analysis, Web search research, Webometrics

Preface

I have based this lecture on my research and practical work in the Web analytics area, along with the
work of many others. One advantage of sustained work over time in a given field is the ability to go
back and correct, modify, expand, and improve, with any luck, previous efforts, documentations, and
writings. Additionally, sustained contribution to a body of knowledge in an area often leads one to
works by and interchanges with other researchers, practitioners, and scholars who enhance and add
to the field. This lecture is the outcome of this continual and reiterative learning process. I hope the
content within jumpstarts the learning process of others in the field of Web analytics.
My goal with this lecture is to present the conceptual aspects of Web analytics, relating these
facets to processes and concepts that address the pertinence of Web analytics, and provide meaning
to the techniques used in Web analytics. To that end, each section of the lecture represents a major
component of the Web analytics field. While many sections are built on previous publications, I
have enhanced each with updated thoughts and directions in the field, drawing from my own in-
sights as well as those of others who are pushing and defining the field of Web analytics, including
both academics and practitioners. Collectively, the sections offer an integrated and coherent primer
of the exciting field of Web analytics.
This is not necessarily a “how-to” book. The field of Web analytics is dynamic, and any
“how-to” book would likely be out of date before it could be published. Instead, in this lecture, I
offer foundational elements that are more enduring than implementation techniques could be. As
such, the lecture may have some enduring value.

Acknowledgments

I sincerely acknowledge and thank the many collaborators with whom I have worked in the Web analytics area. I am appreciative of suggestions of material from Eric Peterson and Mark Ruzomberka, both excellent practitioners in the art and science of Web analytics. I am also thankful to the reviewers of this manuscript for their valuable input, specifically Dietmar Wolfram and Fabrizio Silvestri, both excellent researchers in the field of Web log analysis. Finally, I am indebted to Diane Cerra and Gary Marchionini for their patience and assistance.

Contents

Preface ......................................................................................................................vii

Acknowledgments....................................................................................................... ix

1. Understanding Web Analytics ..............................................................................1

2. The Foundations of Web Analytics: Theory and Methods .....................................5


2.1 Introduction......................................................................................................... 7
2.2 Behaviorism ........................................................................................ 7
2.3 Behaviors ............................................................................................................. 9
2.4 Trace Data ......................................................................................................... 12
2.5 Unobtrusive Methods ........................................................................................ 16
2.6 Web Analytics as Unobtrusive Method ............................................................. 18
2.7 Conclusion......................................................................................................... 19

3. The History of Web Analytics ............................................................................ 21


3.1 Single Websites.................................................................................................. 21
3.2 Library Systems ................................................................................................. 22
3.3 Search Engines .................................................................................................. 22
3.4 Conclusion......................................................................................................... 24

4. Data Collection for Web Analytics ..................................................................... 25


4.1 Web Server Log Files ........................................................................................ 25
4.2 Page Tagging ..................................................................................................... 27
4.3 Conclusion......................................................................................................... 28

5. Web Analytics Fundamentals ............................................................................. 29


5.1 Visitor Type ....................................................................................................... 29
5.2 Visit Length ...................................................................................................... 31
5.3 Demographic and System Statistics .................................................................. 31

5.4 Internal Search .................................................................................................. 32


5.5 Visitor Path ....................................................................................................... 32
5.6 Top Pages .......................................................................................................... 33
5.7 Referrers and Keyword Analysis ........................................................................ 33
5.8 Errors................................................................................................................. 34
5.9 Conclusion......................................................................................................... 34

6. Web Analytics Strategy ..................................................................................... 35


6.1 Key Performance Indicators .............................................................................. 35
6.2 Web Analytics Process ....................................................................................... 36
6.2.1 Identify Key Stakeholders...................................................................... 36
6.2.2 Define Primary Goals for Your Website ................................................ 37
6.2.3 Identify the Most Important Site Visitors ............................................. 37
6.2.4 Determine the Key Performance Indicators .......................................... 37
6.2.5 Identify and Implement the Right Solution .......................................... 38
6.2.6 Use Multiple Technologies and Methods .............................................. 38
6.2.7 Make Improvements Iteratively ............................................................. 38
6.2.8 Hire and Empower a Full-Time Analyst............................................... 38
6.2.9 Establish a Process of Continuous Improvement .................................. 38
6.3 Choosing a Web Analytics Tool ........................................................................ 39
6.4 Conclusion......................................................................................................... 40

7. Web Analytics as Competitive Intelligence ......................................................... 41


7.1 Determining Appropriate Key Performance Indicators..................................... 41
7.1.1 Commerce ............................................................................................. 43
7.1.2 Lead Generation.................................................................................... 45
7.1.3 Content/Media ...................................................................................... 46
7.1.4 Support/Self-Service ............................................................................. 48
7.2 Conclusion......................................................................................................... 49

8. Supplementary Methods for Augmenting Web Analytics .................................... 51


8.1 Surveys .............................................................................................................. 51
8.1.1 Review of Appropriate Survey Literature .............................................. 51
8.1.2 Planning and Conducting a Survey ....................................................... 52
8.1.3 Design a Survey Instrument .................................................................. 55
8.2 Laboratory Studies ............................................................................................ 58
8.3 Conclusion......................................................................................................... 61

9. Search Log Analytics ......................................................................................... 63


9.1 Introduction....................................................................................................... 63
9.2 Review of Search Analytics ............................................................................... 64
9.2.1 What Is a Search Log? .......................................................................... 64
9.2.2 How Are These Interactions Collected? ................................................ 64
9.2.3 Why Collect This Data? ........................................................................ 64
9.2.4 What Is the Foundation of Search Log Analysis? ................................. 65
9.2.5 How Is Search Log Analysis Used? ....................................................... 66
9.2.6 How to Conduct Search Log Analysis?................................................. 66
9.3 Search Log Analysis Process ............................................................................. 66
9.3.1 Data Collection ..................................................................................... 67
9.3.2 Fields in a Standard Search Log ............................................................ 67
9.3.3 Data Preparation ................................................................................... 69
9.3.4 Cleaning the Data ................................................................................. 71
9.3.5 Parsing the Data .................................................................................... 71
9.3.6 Normalizing Searching Episodes........................................................... 72
9.4 Data Analysis..................................................................................................... 72
9.4.1 Analysis Levels ...................................................................................... 72
9.4.2 Conducting the Data Analysis............................................................... 75
9.5 Discussion.......................................................................................................... 76
9.6 Conclusion......................................................................................................... 77

10. Conclusion ....................................................................................................... 79

11. Key Terms ......................................................................................................... 81

12. Blogs for Further Reading ................................................................................. 87

References ................................................................................................................. 89

Author Biography .................................................................................................... 101



CHAPTER 1

Understanding Web Analytics

Let us pretend for the moment that we run an online retail store that sells a physical product, per-
haps the latest athletic shoe, as just an example. How do potential customers find our online store?
Do they find us via major search engines or from other sites? How will we know, and why should
we care? What might it mean if they come to our Website and then immediately leave? What if
the potential customer explores several pages and then leaves? Do these customers’ actions tell us
anything valuable about our Website or call for actions on our part? If a customer starts to make a
purchase but then leaves before completing the order, should we look at a site redesign? To make our
hypothetical online store successful, we need to understand why potential customers behave as they
do, and the possible answers to our questions lie within the field of Web analytics.
The Web Analytics Association (WAA) defines Web analytics as “the measurement, collec-
tion, analysis, and reporting of Internet data for the purposes of understanding and optimizing Web
usage” (http://www.webanalyticsassociation.org/).
This seemingly clear-cut definition is not so clear-cut when we consider the numerous un-
stated assumptions, methods, and tools needed for its implementation. In fact, the definition raises
several critically unanswered questions. For example, the definition leaves Internet data undefined.
What is this Internet data? What are its strengths and shortcomings? Where does one get it? Once
defined, collection implies some application that can do the collecting. What application is doing
the collecting? Measurement implies processes and benchmarks. Where are these processes and
benchmarks? Analysis implies both a methodology and strategy for its conduct, which leads one
from the data to understanding and insight. What is this methodology, and how does one define the
strategy? What constructs is this strategy based upon? Reporting implies an organizational unit in
which to report for some external purpose. Optimizing implies a focus on technology or processes.
Understanding implies a focus on people or contexts.
Answering some of these questions is the goal of this lecture, and the questions and assump-
tions of our definition provide the structure for our discussion of the increasingly important field of
Web analytics.
Given the commercial structure of the Web, Web analytics has typically taken a business
perspective in the practitioner arena. Within academia, a near parallel movement is focusing on

transaction log analysis (TLA) and Webometrics areas [147], where interesting academic research
is occurring. In this lecture, the perspective will shift from one paradigm to the other. Because the
jargon from the practitioner side is rapidly gaining wider acceptance, this lecture leverages that
terminology, in the main. As such, much of the discussion will take a “business” and “customer”
perspective that could be somewhat alien to some academic readers; I would encourage those readers to adapt. This is the direction the field is heading, and, in the main, it is for the
better. It is where the action is. Practice is informing research.
The commercial force on the Web is pushing Web analytics research outside of academia
at a near unbelievable pace. The drivers for these movements are clear. The Web has significantly
shortened the distance between a business and its customers, both physically and emotionally. The
distance between business and customer is now the duration of a single click. These clicks drive
the economic models that support our Web search engines and provide the economic fuel for an
increasing number of businesses. The click (with the associated customer behavior that accompanies
it) is at the heart of an economic engine that is changing the nature of commerce with the near
instantaneous, real-time recording of customer decisions to buy or not to buy (or some other analyst
defined conversion other than purchase).
As such, Web analytics deals with Internet customer interaction data from Web systems.
From an academic research perspective, this data is known as trace data (i.e., traces left behind
that indicate human behaviors). A basic premise of Web analytics is that this user-generated In-
ternet data can provide insight to understanding these users better or point to needed changes or
improvements to existing Web systems. This can be data collected directly on a given Website or
gathered indirectly from other applications. Almost all direct data that we can collect is behavioral
data, which is data that relates to the behavior of a user on a Web system. As such, this data pro-
vides wonderful insights into what a user is doing. It tells us the “what.” However, its shortcoming
is that it offers little insight into the motivations or decision processes of that user. These are what
academics call the contextual, situational, cognitive, and affective aspects of the user. For example, a
click online could indicate extreme interest, slight consideration, or perhaps a serendipitous experi-
ence. To explore the “why,” we need attitudinal data (i.e., the contextual, situational, cognitive, and
affective stuff ). For these insights, one must typically use other forms of data collection methods to
supplement behavioral data, such as surveys, interviews, or laboratory studies. However, behavioral
data is a great starting point to isolate the most promising possibilities (based on some external goal)
and then move to attitudinal data collection methods in order to investigate possible meanings or
solutions.
As researchers, we collect this behavioral data using an application that logs user behavior
on the Website, along with other associated measures. These logging applications come in a variety
of flavors, with continually changing structure, coding, and features. However, they all perform the

same core activities—collect and archive data in some type of storage location. This storage location
is typically a log file, characteristically known as a transaction log, hence the name transaction log
analysis (i.e., Web analytics in academic circles). While transaction log formats vary, they all gener-
ally report similar behavioral data, along with associated contextual data concerning the computer
and related software (i.e., operating system, browser type, etc.).
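As a concrete illustration of the kind of record such logs hold, here is a small sketch (in Python, which the lecture does not prescribe) of parsing one entry in the widely used Apache "combined" log format; the sample line, addresses, and field names are illustrative assumptions, not data from the lecture.

```python
import re

# Named-group pattern for one Apache "combined" log entry:
# host, timestamp, request line, status, size, referrer, user agent.
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

sample = ('192.0.2.1 - - [10/Oct/2009:13:55:36 -0400] '
          '"GET /shoes/trail-runner HTTP/1.1" 200 5120 '
          '"http://www.example-search.com/?q=trail+shoes" '
          '"Mozilla/5.0 (Windows NT 6.1)"')

record = LOG_PATTERN.match(sample).groupdict()
print(record["host"])      # the client address (a computer, not a person)
print(record["referrer"])  # where the visitor came from
print(record["agent"])     # browser / operating system context
```

Note that every field describes the machine and the request; nothing here identifies the person, which is exactly the limitation discussed below.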
The issue of the computer is extremely important as it serves to highlight one of the key
shortcomings of Internet data, namely, that the data can sometimes be inaccurate. There are sev-
eral sources of data error. Primarily, Internet behavioral data are traceable back to a computer or
computer browser and not necessarily to an exact person, assuming the person does not log into
an account. In many cases this issue may make little difference. If a product is sold, a product is
sold. However, in other situations such as search and visitor counts, this issue can cause numerous
problems. This is especially so with Web search engines where one can log on anonymously. In
addition, common use computers can skew the data. Additionally, with the proliferation of scrap-
ping software one cannot always tell whether the visitor was even a human. In the case of popular
Websites, many times most of the server load is software generated. The behavior of these bots can
significantly skew the data. Other sources of data inaccuracies include the use of cookies, internal
visitors, caching servers, and incorrect page tagging. Finally (and we computer scientists rarely discuss this), data capture applications are not perfect; based on personal experience, error rates are often in the 5% to 10% range.
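One cleaning step implied by the bot problem above can be sketched very simply: drop hits whose user-agent string self-identifies as automated. This is a hedged illustration, not a complete filter; the token list and sample records are invented, and real bot detection needs far more than this.

```python
# Heuristic tokens that commonly appear in self-identifying bot agents.
BOT_TOKENS = ("bot", "crawler", "spider", "scraper")

def looks_like_bot(user_agent: str) -> bool:
    """Flag a hit whose user-agent string contains a known bot token."""
    agent = user_agent.lower()
    return any(token in agent for token in BOT_TOKENS)

hits = [
    {"host": "192.0.2.1", "agent": "Mozilla/5.0 (Windows NT 6.1)"},
    {"host": "198.51.100.7", "agent": "Googlebot/2.1 (+http://www.google.com/bot.html)"},
    {"host": "203.0.113.5", "agent": "ExampleScraper/0.9"},
]

# Keep only traffic that does not announce itself as software-generated.
human_hits = [h for h in hits if not looks_like_bot(h["agent"])]
print(len(human_hits))  # 1
```

The obvious weakness, consistent with the text, is that a bot that does not announce itself passes straight through, which is one reason measured error rates stay stubbornly high.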
Once we have collected the data, as accurately as possible, we begin the process of getting
value by reporting and analyzing it. Reporting is somewhat straightforward and generally involves
compiling data in some aggregate way for clarity and simplification. In analysis, we attempt to lever-
age the data to understand some set goal and, perhaps, to make recommendations for improvement,
identify opportunities, or highlight specific findings. In order to enable practitioners to get value
from the analysis, researchers must establish proven processes and methodologies. Methodologies
must be identified and used to correct for data inaccuracy and typically must be scalable (i.e., able to
handle large volumes of data). The tactical aspects of analysis (and the related issues of data clean-
ing before analysis) can be extremely time-consuming until one establishes an efficient procedure.
However, several commercial tools can aid in the process. The effective analysis must generate the
proper metrics and key performance indicators (KPIs).
From a formal perspective KPIs measure performance based on articulated goals for the busi-
ness, user understanding, or Web system. Each KPI, then, should link directly to goals; therefore,
KPIs enable goal achievement by defining and measuring progress. The setting of these KPIs is of
paramount importance in achieving a related technology, user, or organizational goal.
In defining KPIs, we identify the actions that are desired behaviors and then relate these
desired behaviors toward measurable goals. KPIs will vary based on the organization and Web

system. Typically, KPIs for commercial sites are overall purchase conversions, average order size,
and items per order. For lead generation sites, KPIs might be overall conversions, conversion by
campaigns, dropouts, and conversions of leads to actual customers. Customer service sites might
focus on reducing expenses and improving customer experiences. For advertising on content sites,
KPIs could be visits per week, page viewed per visit, visit length, advertising click ratio, and ratio of
new to returning visitors.
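The commerce KPIs just named (purchase conversion, average order size, items per order) reduce to simple arithmetic over aggregate counts. The sketch below is illustrative: the function and all figures are invented, not taken from the lecture.

```python
def commerce_kpis(visits: int, orders: list) -> dict:
    """Compute basic commerce KPIs.

    orders: list of (revenue, item_count) tuples for the reporting period.
    """
    n_orders = len(orders)
    revenue = sum(r for r, _ in orders)
    items = sum(i for _, i in orders)
    return {
        "conversion_rate": n_orders / visits,      # orders per visit
        "average_order_size": revenue / n_orders,  # revenue per order
        "items_per_order": items / n_orders,
    }

# Hypothetical period: 10,000 visits producing three orders.
kpis = commerce_kpis(visits=10_000, orders=[(80.0, 2), (120.0, 3), (50.0, 1)])
print(kpis["conversion_rate"])  # 0.0003
```

A lead-generation or content site would swap in its own numerator and denominator (leads per campaign, pages per visit), but the pattern of ratio-over-a-period is the same.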
Analysis results are of little value until one takes action driven by the data that is in line with
the established KPI. One generally refers to this as actionable outcomes. In academic circles, this
may mean generating publications that shed insight on user behavior, or changes to some methods
or system. In a business, this means calculated change to improve the Website or business process
that is directly dependent on the KPI selected.
We directly link KPIs to goals by monetizing (i.e., assigning value to) the desired behaviors
that these indicators reflect. Generally, these goals relate to generating additional revenue, reducing
costs, or improving the user experience. If we want more visitors, we must determine how much
each visitor is worth to us. If we are interested in items ordered, we identify the value of each addi-
tional item ordered to the organization. By clearly articulating this linkage between KPIs and goals,
we can then see the impact of these indicators and make choices about prioritizing opportunities
and problems. This type of analysis can also aid in eliminating unsuccessful projects and determin-
ing the impact of system changes. Such a linkage process can aid in determining the value of Web
campaigns and recognizing the investment return on Web system use.
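The monetization step described above (assigning a value to each desired behavior so that a campaign's return can be weighed against its cost) can be sketched as follows. All names and figures are hypothetical; in practice the value per conversion is itself an estimate the analyst must defend.

```python
def campaign_value(conversions: int, value_per_conversion: float,
                   campaign_cost: float) -> dict:
    """Monetize a KPI: translate conversions into revenue and return."""
    revenue = conversions * value_per_conversion
    return {
        "revenue": revenue,
        "net_return": revenue - campaign_cost,
        "roi": (revenue - campaign_cost) / campaign_cost,
    }

# Hypothetical campaign: 40 conversions worth $75 each, at a $1,000 cost.
result = campaign_value(conversions=40, value_per_conversion=75.0,
                        campaign_cost=1_000.0)
print(result["roi"])  # 2.0, i.e., the campaign returned twice its cost
```

Running the same calculation across several campaigns is what makes the prioritization the text describes possible: opportunities and problems can be ranked by net return rather than by raw traffic.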
In a nutshell, this is the field of Web analytics. In the following sections of this lecture, we
investigate each of the concepts and areas in more detail, beginning with an examination of the
theoretical foundations of Web analytics.

• • • •

CHAPTER 2

The Foundations of Web Analytics:


Theory and Methods

What are the foundational elements that provide confidence that Web analytics is providing use-
ful insights? To address such a question, we must investigate the underlying constructs of Web
analytics. This section explains the theoretical and methodological foundations for Web analytics,
addressing the fundamentals of the field from a research viewpoint and the concept of Web logs as
a data collection technique from the perspective of behaviorism. By behaviorism, we take a more
liberal view than is traditional, as will be explained.
From this research foundation, we then move to the methodological aspects of Web analyt-
ics and examine the strengths and limitations of Web logs as trace data. We then review the con-
ceptualization of Web analytics as an unobtrusive approach to research and present the power and
deficiency of the unobtrusive methodological concept, including benefits and risks of Web analytics
specifically from the perspective of an unobtrusive method. The section also highlights some of the
ethical questions concerning the collection of data via Web log applications.
Conducting research involves the use of both a set of theoretical constructs and methods for
investigation [74]. For empirical research, the results are linked conceptually to the data collection
process. High-quality research requires a thorough methodological frame. In order to understand
empirical research and the implications of the results, we must thoroughly understand the tech-
niques by which the researcher collected and analyzed data. A variety of methods is available for
research concerning users and information systems on the Web, including qualitative, quantitative,
and mixed methods. The selection of an appropriate method is critical if the research is to have
efficient execution and effective outcomes. The method of data collection also involves a choice of
methods. Web logs (including both transaction logs and search logs) and Web analytics (including
TLA and search log analysis [SLA]) are approaches to data collection and research methodology,
respectively, for both system performance and user behavior analysis that have been used since 1967
[105], in peer-reviewed research since 1975 [116], and in numerous practitioner outlets since the
1990s [118].

A Web log is an electronic record of interactions that have occurred between a system and
users of that system. These log files can come from a variety of computers and systems (from Websites to online public access catalogs or OPACs, user computers, blogs, listservs, online newspapers, etc.),
basically any application that can record the user–system–information interactions. Web analytics
also takes various forms but commonly involves TLA, which was preceded by log analysis in the
academic fields of library, information, and computer science. TLA is the methodological approach
to studying online systems and users of these systems. Peters [117] defines TLA as the study of
electronically recorded interactions between online information retrieval systems and the persons
who search for information found in those systems. Since the advent of the Internet, we have had
to modify Peters’ definition, expanding it to include systems other than information retrieval
systems. In general, the practitioner side of Web analytics seems to have developed relatively
independently, with few people venturing out and sharing learning between the practitioner and
academic camps.
Partly as a result of this separate development, Web analytics is a broad categorization of
methods that covers several sub-categorizations, including TLA (i.e., analysis of any log from a
system), Web log analysis (i.e., analysis of Web system logs), blog analysis (i.e., analysis of
Weblogs, or blogs), and SLA (i.e., analysis of search engine logs), among others. The study of
digital libraries is also an interesting domain that involves both searching and browsing. Web
analytics enables macro-analysis
of aggregate user data and patterns and microanalysis of individual search patterns. The results
from the analyzed data help to develop systems and services based on user behavior or system per-
formance, and these services and performance enhancements are usually leveraged to achieve other
goals.
From a user behavior perspective, Web analytics is one of a class of unobtrusive methods
(a.k.a., non-reactive or low-constraint). Unobtrusive methods are those that allow data collection
without directly contacting participants. The research literature specifically describes unobtrusive
approaches as those that do not require a direct response from participants [102, 112, 154]. This
data can be gathered through observation or from existing records. In contrast to unobtrusive meth-
ods, obtrusive or reactive approaches, such as questionnaires, tests, laboratory studies, and surveys,
require a direct response from participants [153]. A laboratory experiment is an example of an
extremely obtrusive method. The metaphorical line between unobtrusive and obtrusive methods is
unquestionably blurred, and instead of one thin line there is a rather large gray area. For example,
conducting a survey to gauge the reaction of users to information systems is an obtrusive method.
However, using the posted results from the survey is an unobtrusive method. Granted, this may be
making a strictly intellectual distinction, but the point is that log data falls in the gray area. In some
respects, users know that their actions are being logged or recorded on Websites. However, logging
applications are generally so unobtrusive that they fade into the background [4].
THE FOUNDATIONS OF WEB ANALYTICS: THEORY AND METHODS 7

With this introduction, we now address the specific research and methodological foundations
of Web analytics. We first address the concept of transaction logs as a data collection technique
from the perspective of behaviorism, and then review the conceptualization of Web analytics as
trace data and an unobtrusive method. We present the strengths and shortcomings of the unobtru-
sive methodology approach, including benefits and shortcomings of Web analytics specifically from
the perspective of an unobtrusive method. We end with a short summary and open questions of
transaction logging as a data collection method.

2.1 INTRODUCTION
The use of transaction logs for research purposes certainly falls conceptually within the confines
of the behaviorist paradigm of research. Therefore, behaviorism is the conceptual basis for Web
analytics.

2.2 BEHAVIORISM
Behaviorism is a research approach that emphasizes the outward behavioral aspects of thought.
Strictly speaking, behaviorism also dismisses the inward experiential and procedural aspects [137,
152]; importantly, behaviorism has been heavily criticized for this narrow viewpoint. Some of the
pioneers in the behaviorist field are shown in Figure 2.1.
For the area of Web analytics, however, we take a more open view of behaviorism. In this
more accepting view, behaviorism emphasizes observed behaviors without discounting the inner
aspects (i.e., attitudinal characteristics and context) that may accompany these outward behaviors.
This more open outlook of behaviorism supports the position that researchers can gain much from
studying expressions (i.e., behaviors) of users interacting with information systems. These expressed
behaviors may reflect aspects of the person’s inner self as well as contextual aspects of the environment
within which the behavior occurs. These environmental aspects may influence behaviors while also
reflecting inner cognitive factors.

FIGURE 2.1: Three pioneers of behaviorist research: Ivan Petrovich Pavlov, John B. Watson, and
Burrhus Frederic Skinner. Copyright © 2009 Photo Researchers, Inc. All Rights Reserved. Used with
permission.
The primary proposition underlying behaviorism is that all things that people do are be-
haviors. These behaviors include utterances, actions, thoughts, and feelings. With this underlying
proposition, the behaviorist position is that all theories and models concerning people have observa-
tional correlates. Moreover, the behaviors and any proposed theoretical constructs must be mutually
complementary.
Strict behaviorism would further state that there are no differences between the publicly ob-
servable behavioral processes (i.e., actions) and privately observable behavioral processes (i.e., think-
ing and feeling). Due to affective, contextual, situational, or environmental factors, however, there
may be disconnections between the cognitive and affective processes. Therefore, there are sources of
behavior both internal (i.e., cognitive, affective, and expertise) and external (i.e., environmental and
situational). Behaviorism focuses primarily on only what an observer can see or manipulate.
Behaviorism is evident in any research where the observable evidence is critical to the re-
search questions or methods, and this is especially true in any experimental research where the
“operationalization” of variables is required. A behaviorist approach, at its core, seeks to understand
events in terms of behavioral criteria [134, p. 22]. Behaviorist research demands behavioral evi-
dence, and this is particularly important to Web analytics. Within such a perspective, there is no
knowable difference between two states unless there is a demonstrable difference in the behavior
associated with each state.
Research that is grounded in behaviorism always focuses on somebody doing something in a
situation. Therefore, all derived research questions focus on who (actors), what (behaviors), when
(temporal), where (contexts), and why (cognitive). The actors in a behaviorist paradigm are people,
at whatever level of aggregation (e.g., individuals, groups, organizations, communities, nationalities,
societies), whose behavior is studied. All aspects of what the actors do are studied carefully. These
behaviors have a temporal element, and thus researchers need to study when and how long these
behaviors occur. Similarly, the behaviors occur within some context, which are all the environmental
and situational features in which these behaviors are embedded, and this context must be recognized
and analyzed. Finally, the cognitive aspect to these behaviors is the thought and affective processes
internal to the actors executing the behaviors.
From this research perspective, each of these aspects (i.e., actor, behaviors, temporal, context,
and cognitive) are behaviorist constructs. However, for Web analytics, we are primarily concerned
with defining what a behavior is.

2.3 BEHAVIORS
Defining a behavior is not as straightforward as it may seem at first glance, yet defining a behavior
is critical for Web analytics. In research, a variable represents a set of events where each event may
have a different value. In Web analytics, session duration or number of clicks may be variables that
interest a researcher. The particular variables that a researcher is interested in stem from the research
questions driving the study.
We can define variables by their use in a research study (e.g., independent, dependent, extra-
neous, controlled, constant, and confounding) and by their nature. Defined by their nature, there are
three types of variables: environments (i.e., events of the situation, environment, or context), subjects
(i.e., events or aspects of the subject being studied), and behavioral (i.e., observable events of the
subject of interest).
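As a concrete illustration, here is a minimal Python sketch (the record layout, field names, and timestamps are hypothetical, not from any particular logging application) that derives two behavioral variables mentioned above, session duration and number of clicks, from logged interaction records:

```python
from datetime import datetime

# Hypothetical trace records: (session_id, timestamp, action).
records = [
    ("s1", "2009-03-01 10:00:00", "query"),
    ("s1", "2009-03-01 10:00:40", "click_url"),
    ("s1", "2009-03-01 10:03:10", "click_url"),
    ("s2", "2009-03-01 11:00:00", "query"),
]

def session_variables(records):
    """Derive two behavioral variables per session:
    duration in seconds and number of result clicks."""
    sessions = {}
    for sid, ts, action in records:
        t = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")
        s = sessions.setdefault(sid, {"first": t, "last": t, "clicks": 0})
        s["first"] = min(s["first"], t)
        s["last"] = max(s["last"], t)
        if action == "click_url":
            s["clicks"] += 1
    return {sid: {"duration_sec": (s["last"] - s["first"]).total_seconds(),
                  "clicks": s["clicks"]}
            for sid, s in sessions.items()}

print(session_variables(records))
# s1: 190 seconds and 2 clicks; s2: 0 seconds and 0 clicks
```

Whether such variables serve as dependent, independent, or control variables is determined by the research questions, not by the log itself.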
For Web analytics, behavior is the essential construct of the behaviorist paradigm. At its
most basic, a behavior is an observable activity of a person, animal, team, organization, or system.
Like many basic constructs, behavior is an overloaded term, as it also refers to the aggregate set of
responses to both internal and external stimuli. Therefore, behaviors can also address a spectrum of
actions. Because of its many associations, it is difficult to characterize a word like behavior without
specifying a context in which it takes place to provide the necessary meaning.
However, one can generally classify behaviors into three categories:

• Behaviors are something that can be detected and, therefore, recorded.
• Behaviors are an action or a specific goal-driven event with some purpose other than the
specific action that is observable.
• Behaviors are reactive responses to environmental stimuli.

In some manner, the researcher must observe these behaviors. In other words, the researcher
must study and gather information on a behavior concerning what the actor does. Classically, obser-
vation is visual, where the researcher uses his/her own eyes, but recording devices, such as a camera,
can assist in the observation. Technology has extended the concept of observation to include other
recording devices. For Web analytics, we extend the notion of observation to include logging soft-
ware. Logging software is nearly invisible to many users; thus, it allows for a more objective
measure of true user behavior. Web analytics focuses on descriptive observation and logging the
behaviors, as they would occur in a user–system interaction episode.
When studying behavioral patterns with Web analytics and other similar approaches, re-
searchers often use ethograms. An ethogram is a taxonomy or index of the behavioral patterns that
details the different forms of behavior that a particular user exhibits. In most cases it is desirable to

TABLE 2.1: Taxonomy of user–system behaviors [67].

STATE: DESCRIPTION

View results: Behavior in which the user viewed or scrolled one or more pages from the results
listing. If a results page was present and the user did not scroll, we counted this as a View
Results Page.
  With Scrolling: User scrolled the results page.
  Without Scrolling: User did not scroll the results page.
  but No Results in Window: User was looking for results, but there were no results in the listing.
Selection: Behavior in which the user makes a selection in the results listing.
  Click URL (in results listing): Interaction in which the user clicked on a URL of one of the
  results in the results page.
  Next in Set of Results List: User moved to the Next results page.
  Previous in Set of Results List: User moved to the Previous results page.
  GoTo in Set of Results List: User selected a specific results page.
View document: Behavior in which the user viewed or scrolled a particular document in the
results listings.
  With Scrolling: User scrolled the document.
  Without Scrolling: User did not scroll the document.
Execute: Behavior in which the user initiated an action in the interface.
  Execute Query: Behavior in which the user entered, modified, or submitted a query without
  visibly incorporating assistance from the system. This category includes submitting the
  original query, which was always the first interaction with the system.
  Find Feature in Document: Behavior in which the user used the FIND feature of the browser.
  Create Favorites Folder: Behavior in which the user created a folder to store relevant URLs.
Navigation: Behavior in which the user activated a navigation button on the browser, such as
Back or Home.
  Back: User clicked the Back button.
  Home: User clicked the Home button.
Browser: Behavior in which the user opened, closed, or switched browsers.
  Open new browser: User opened a new browser.
  Switch/Close browser window: User switched between two open browsers or closed a browser
  window.
Relevance action: Behavior such as print, save, bookmark, or copy.
  Bookmark: User bookmarked a relevant document.
  Copy–Paste: User copy–pasted all of, a portion of, or the URL to a relevant document.
  Print: User printed a relevant document.
  Save: User saved a relevant document.
View/Implement assistance: Behavior in which the user viewed the assistance offered by the
application.
  Implement Assistance: Behavior in which the user entered, modified, or submitted a query,
  utilizing assistance offered by the application.
  Phrase: User implemented the PHRASE assistance.
  Spelling: User implemented the SPELLING assistance.
  Synonyms: User implemented the SYNONYMS assistance.
  Previous Queries: User implemented the PREVIOUS QUERIES assistance.
  Relevance Feedback: User implemented the RELEVANCE FEEDBACK assistance.
  AND: User implemented the AND assistance.
  OR: User implemented the OR assistance.

create an ethogram in which the categories of behavior are objective and discrete, not overlapping
with each other. In an ethogram, the definitions of each behavior and category of behaviors should
be clear, detailed, and distinguishable from each other. Ethograms can be as specific or general as
the study or investigation warrants.
Spink and Jansen [140] and Jansen and Pooch [69] outline some of the key behaviors for
SLA, a specific form of Web analytics. Hargittai [52] and Jansen and McNeese [67] present ex-
amples of detailed classifications of behaviors during Web searching. As an example, Table 2.1
presents an ethogram of user behaviors interacting with a Web browser during a searching session
employed in the study.
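As a sketch of how an ethogram can drive analysis, the following Python fragment maps hypothetical raw logged actions onto an illustrative subset of the behavior categories in Table 2.1 (the action names are assumptions for illustration, not identifiers from any real logging tool):

```python
from collections import Counter

# Illustrative (partial) ethogram: raw logged actions mapped to
# discrete, non-overlapping behavior categories.
ETHOGRAM = {
    "submit_query": "Execute",
    "click_url": "Selection",
    "next_page": "Selection",
    "back_button": "Navigation",
    "home_button": "Navigation",
    "save_doc": "Relevance action",
    "print_doc": "Relevance action",
}

def classify(actions):
    """Tally actions by ethogram category; behaviors outside the
    taxonomy are flagged rather than silently dropped."""
    return Counter(ETHOGRAM.get(a, "Unclassified") for a in actions)

log = ["submit_query", "click_url", "back_button", "click_url", "save_doc"]
print(classify(log))
```

Keeping the categories objective and mutually exclusive, as the text recommends, is what makes such a tally meaningful.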
There are many ways to observe behaviors. In TLA, we are primarily concerned with observing
and recording these behaviors in a file, and we can then view the recorded fields as trace data.

2.4 TRACE DATA


In any study, the researcher has several options for collecting data, and there is no one single best
method for data collection. The decision about which approach or approaches to use depends upon
the research questions: What needs to be investigated? How should the data be recorded? What
resources are available? What is the timeframe available for data collection? How complex is the
data? What is the frequency of data collection? And how will the data be analyzed?
When collecting transaction log data, we are generally concerned with observations of be-
havior. The general objective of observation is to record the behavior, either in a natural state or in
a laboratory study. In both settings, ideally, the researcher should not interfere with the behavior.
However, when observing people, the knowledge that they are being observed is likely to alter
participants’ behavior. For example, in laboratory studies, a researcher’s instructions may make a
participant either more or less likely to perform a particular behavior, such as smiling or following a
Web link. With logging software, the introduction of the application may change a user’s behavior,
although in naturalistic settings the logging application has less of an impact if the user does not
feel that the data will be used immediately for research purposes.
When investigating user behaviors, the researcher must keep in mind these limitations of
observational techniques even while recording behaviors as data for future analysis. The user, a
third party, or the researcher can record the behaviors. Transaction logging is an indirect method of
recording data about behaviors, and the users themselves, with the help of logging software, make
these data records of behavior. We refer to these records as traces. Thus, transaction log records are
a source of trace data.
What is trace data? The processes by which people conduct the activities of their daily lives
many times create things, leave marks, induce wear, or reduce some existing material. Within the
confines of research, these things, marks, and wear become data. Classically, trace data are the
physical remains of interaction [154, pp. 35–52]. These remains can be intentional (i.e., notes in a diary
or initials on a cave wall) or accidental (i.e., footprints in the mud or wear on a carpet). However,
trace data can also be through third party logging applications. In TLA, we are primarily interested
in this data from third party logging.
Many researchers use physical or, as in the case of Web analytics, virtual traces as indicators
of behavior. These behaviors are the facts or data that researchers use to describe or make inferences
about events concerning the actors. Researchers [154] classify trace data into two general types: ero-
sion and accretion. Erosion is the wearing away of material leaving a trace. Accretion is the buildup
of material, making a trace. Both erosion and accretion have several subcategories. In TLA, we are
primarily concerned with accretion trace data.
Trace data (a.k.a., trace measures) offer a sharp contrast to data collected directly. The great-
est strength of trace data is that it is unobtrusive, meaning the collection of the data does not inter-
fere with the natural flow of behavior and events in the given context. Since the data is not directly
collected, there is no observer present where the behaviors occur to affect the participants’ actions,
and thus, the researcher is getting data that reflects natural behaviors. Trace data is unique; as un-
obtrusive and nonreactive data, it can make a very valuable research contribution. In the past, trace
data was often time consuming to gather and process, making such data costly. With the advent of
transaction logging software, trace data for studying the behaviors of users and systems in Web
analytics is much cheaper to collect, and consequently, Web analytics and related fields of study
have grown rapidly.
Interestingly, in the physical world, erosion data is what typically reveals usage patterns (i.e.,
trails worn in the woods, footprints in the snow, fingerprints on a book cover). However, with Web
analytics, logged accretion data indicates the usage patterns (i.e., access to a Website, submission
of queries, Webpages viewed). Specifically, transaction logs are a form of controlled accretion data,
where the researcher or some other entity alters the environment in order to create the accretion
data [154, pp. 35–52]. With a variety of tracking applications, the Web is a natural environment for
controlled accretion data collection. With the use of client applications (such as desktop search
bars), the collection of data is nearly unlimited from a technology perspective.
Like all data collection methods, trace data for studying users and systems has strengths and
limitations. Certainly, trace data are valuable for understanding behavior (i.e., behavioral actions)
in naturalistic environments and may offer insights into human activity obtainable in no other way.
For example, data from Web transaction logs is on a scale available in few other places. However,
one must interpret trace data carefully and with a fair amount of caution because trace data can
be incomplete or even misleading. For example, with the data in transaction logs the researcher
can say a given number of Website users only looked at the Website’s homepage and then left
(a.k.a., homepage bounce rate). However, using trace data alone the researcher could not conclude
whether the users left because they found what they were looking for, were frustrated because they
could not find what they were looking for, or were in the wrong place to begin with. However,
with some experimental data one could make some reasonable assumptions concerning this user
behavior.
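The homepage bounce rate just mentioned can be computed directly from accretion trace data; here is a minimal Python sketch over hypothetical session records:

```python
def homepage_bounce_rate(sessions):
    """Fraction of sessions whose only recorded page view is the
    homepage; `sessions` maps session id -> ordered pages viewed."""
    if not sessions:
        return 0.0
    bounces = sum(1 for pages in sessions.values() if pages == ["/"])
    return bounces / len(sessions)

# Hypothetical accretion trace data.
trace = {
    "s1": ["/"],                   # homepage only: a bounce
    "s2": ["/", "/products"],      # browsed on
    "s3": ["/"],                   # a bounce
    "s4": ["/about", "/contact"],  # entered elsewhere
}
print(homepage_bounce_rate(trace))  # 0.5
```

As the text cautions, the metric alone cannot say why those users left; that interpretation requires complementary experimental data.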
Research using trace data from transaction logs should be evaluated by the same criteria as all
research data and methods. These criteria are credibility, validity, and reliability.
Credibility concerns how trustworthy or believable the data collection method is. The re-
searcher must make the case that the data collection approach records the data needed to address
the underlying research questions.
Validity addresses whether the measurement actually measures what it is supposed to mea-
sure. There are three kinds of validity:

• Face or internal validity: the extent to which the contents of the test, method, analysis, or
procedure that the researcher is employing measure what they are supposed to measure.
• Content or construct validity: the extent to which the content of the test, method, analy-
sis, or procedure adequately represents all that is required for validity of the test, method,
analysis, or procedure (i.e., are you collecting and accounting for all that you should collect
and account for).
• External validity: the extent to which one can generalize the research results across popu-
lations, situations, environments, and contexts of the test, method, analysis, or procedure.

In inferential or predictive research, one must also be concerned with statistical validity (i.e.,
the degree of strength of the independent and dependent variable relationships). Statistical validity
is actually an important aspect for Web analytics, given the needed ties between data collected and
KPIs.
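The interaction of statistical validity with Web-scale sample sizes can be demonstrated with simulated data (the slope, sample size, and seed below are arbitrary choices for illustration): a negligible correlation still passes a conventional significance test, so the effect size must be reported alongside it.

```python
import math
import random

random.seed(7)
n = 50_000
x = [random.gauss(0, 1) for _ in range(n)]
# y depends only very weakly on x: a negligible effect by design.
y = [0.03 * xi + random.gauss(0, 1) for xi in x]

def pearson_r(a, b):
    """Pearson correlation coefficient of two equal-length samples."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    return cov / math.sqrt(sum((u - ma) ** 2 for u in a)
                           * sum((v - mb) ** 2 for v in b))

r = pearson_r(x, y)
t = r * math.sqrt((n - 2) / (1 - r ** 2))  # t statistic for H0: r = 0
print(f"r = {r:.3f}, t = {t:.1f}")  # tiny effect, yet |t| well above 1.96
```

A researcher tying such measures to KPIs should therefore report r (the effect size), not merely the significance of the test.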
Reliability is a term used to describe the stability of the measurement. Essentially, reliability
addresses whether the measurement assesses the same thing, in the same way, in repeated tests.
Researchers must always address the issues of credibility, validity, and reliability. Leveraging
the work of Holst [58], the researcher must address six questions in every Web analytics research
project that uses trace data from transaction logs.

• Which data are analyzed? The researcher must clearly communicate in a precise manner
both the format and content of recorded trace data. With transaction log software, this is
much easier than in other forms of trace data, as logging applications can be reverse engi-
neered to articulate exactly what behavioral data is recorded.
• How is this data defined? The researcher must clearly define each trace measure in a man-
ner that permits replication of the research on other systems and with other users. As TLA
has proliferated in a variety of venues, more precise definitions of measures are developing
[114, 151, 158].
• What is the population from which the researcher has drawn the data? The researcher
must be cognizant of the actors, both people and systems, that created the trace data. With
transaction logs on the Web, this is sometimes a difficult issue to address directly, unless the
system requires some type of logon and these profiles are then available. In the absence of
these profiles, the researcher must rely on demographic surveys, studies of the system’s user
population, or general Web demographics.
• What is the context in which the researcher analyzed the data? It is important for the
researcher to explain clearly the environmental, situational, and contextual factors under
which the trace data was recorded. With transaction log data, this includes providing com-
plete information about the temporal factors of the data collection (i.e., the date and time
the data was recorded) and the make-up of the system at the time of the data recording, as
system features undergo continual change. Transaction logs have the significant advantage
of time sampling of trace data. In time sampling, the researcher can make the observations
at predefined points of time (e.g., every 5 minutes, every second), and then record the ac-
tion that is taking place, using the classification of action defined in the ethogram.
• What are the boundaries of the analysis? Research using trace data from transaction logs
is tricky, and the researcher must be careful not to overreach with the research questions
and findings. The implications of the research are confined by the data and the method
of the data collected. For example, with transaction log data we can rather clearly state
whether or not a user clicked on a link. However, transaction log trace data itself will not
inform us as to why the user clicked on a link. Was it intentional? Was it a mistake? Did
the user become sidetracked?
• What is the target of the inferences? The researcher must clearly articulate the relation-
ship among the separate measures in the trace data either to inform descriptively or in order
to make inferences. Trace data can be used for both descriptive research to improve our
understanding and predictive research in terms of making inferences. These descriptions
and inferences can be at any level of granularity (i.e., individual, collection of individuals,
organization, etc.). However, Hilbert and Redmiles [55] point out, based on their experi-
ences, that transaction log data is best used for aggregate level analysis. I disagree with this
position. With enough data at the individual level, one can tell a lot from log data.

If the researcher addresses each of the six questions, transaction logs are an excellent way to
collect trace data on users of Web and other information systems. The researcher then examines this
data using TLA. The use of trace data to understand behaviors makes transaction logs and
transaction log analysis an unobtrusive research method.
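The time sampling of trace data mentioned above (recording the action in effect at predefined points of time) can be sketched in Python; event times are plain second offsets and the action names are hypothetical:

```python
def time_sample(events, start, end, step):
    """Classic time sampling: at each predefined tick, record the most
    recent action in effect. `events` is a time-ordered list of
    (second_offset, action) pairs."""
    samples = []
    t = start
    while t <= end:
        current = None
        for ts, action in events:
            if ts <= t:
                current = action
            else:
                break
        samples.append((t, current))
        t += step
    return samples

# Hypothetical logged episode, sampled every 10 seconds.
events = [(0, "query"), (7, "view_results"), (22, "click_url")]
print(time_sample(events, start=0, end=30, step=10))
# [(0, 'query'), (10, 'view_results'), (20, 'view_results'), (30, 'click_url')]
```

The sampled actions would then be coded using the classification of actions defined in the ethogram.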

2.5 UNOBTRUSIVE METHODS


As noted in the introduction to this lecture, unobtrusive methods are research practices that do not
require the researcher to intrude in the context of the actors and thus do not involve direct elicita-
tion of data from the research participants or actors. Unobtrusive measurement presumably reduces
the biases that result from the intrusion of the researcher or measurement instrument. We should
note, however, that unobtrusive measures reduce the degree of control that the researcher has over
the type of data collected, and importantly, for some research questions, appropriate unobtrusive
measures may simply not be available.
Why is it important for the researcher not to intrude upon the environment? There are at least
three justifications. First, the Heisenberg uncertainty principle, borrowed from the field of quantum
physics, asserts that the outcome of a measurement of some system is neither deterministic nor per-
fect. Instead, a measurement is characterized by a probability distribution. The larger the associated
standard deviation is for this distribution, the more “uncertain” are the characteristics measured for
the system. The Heisenberg uncertainty principle is commonly stated as, “One cannot accurately
and simultaneously measure both the position and momentum of a mass” (http://en.wikipedia
.org/wiki/Uncertainty_principle). In this analogy, when researchers are interjected into an environ-
ment, they become part of the system. Therefore, their very presence in the environment will affect
measurements of the components of that system. A common example is in ethnographic studies
where the researchers interject themselves in a given context.
The second justification for avoiding or at least limiting environmental intrusion is the ob-
server effect: the change that the act of observation makes to an activity or to the behavior of the
person being observed. People may not behave in their usual manner if
they know that they are being watched or when being interviewed while carrying out an activity.
In research, this observer effect specifically refers to changes that the act of observing will make on
the phenomenon being observed. In information technology, the observer effect is the potential
impact of the act of observing a process output while the process is running. A good example of the
observer effect in TLA is pornographic searching behavior. Participants rarely search for pornography
in a laboratory study, while studies employing trace data show it is a common search topic [71].
In addition to the uncertainty principle and the observer effect, observer bias adds a third jus-
tification for reducing environmental intrusion. Observer bias is error that the researcher introduces
into measurement when observers overemphasize behavior they expect to find and fail to notice
behavior they do not expect. Many fields have common procedures to address this, although these
procedures are seldom used in information and computer science. For example, the observer bias is
why medical trials are normally double-blind rather than single-blind. Observer bias is introduced
because researchers see a behavior and interpret it according to what it means to them, whereas it
may mean something else to the person showing the behavior. Trace data helps in overcoming the
observer bias in the data collection. However, as with other methods, trace data has no effect on the
observer bias in interpreting the results from data analysis.
Given the justifications for using unobtrusive methods, we will now turn our attention to
three types of unobtrusive measurement that are applicable to Web analytics, namely indirect analy-
sis, content analysis, and secondary data analysis. Web analytics is an indirect analysis method. The
researcher is able to collect the data without introducing any formal measurement procedure. In this
regard, TLA typically focuses on the interaction behaviors occurring among the users, system, and
information. There are several examples of utilizing transaction analysis as an indirect approach [cf.
Refs. 2, 15, 32, 57].
Content analysis is the analysis of text documents. The analysis can be quantitative, quali-
tative, or a mixed methods approach. Typically, the major purpose of content analysis is to iden-
tify patterns in text. Content analysis has the advantage of being unobtrusive and, depending on
whether automated methods exist, can be a relatively rapid method for analyzing large amounts of
text. In Web analytics, content analysis typically focuses on search queries or analysis of retrieved
results. A variety of examples are available in this area of transaction log research [cf. Refs. 7, 16,
51, 151, 158].
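As a minimal sketch of quantitative content analysis applied to search queries, the following Python fragment counts term frequencies over a hypothetical query log (the tokenization and length cutoff are simplifying assumptions):

```python
from collections import Counter

def term_frequencies(queries, min_len=3):
    """Quantitative content analysis of a query log: lowercase,
    whitespace-tokenize, and count terms of at least min_len characters."""
    counts = Counter()
    for q in queries:
        counts.update(t for t in q.lower().split() if len(t) >= min_len)
    return counts

# Hypothetical query log.
query_log = [
    "web analytics tools",
    "free Web Analytics",
    "transaction log analysis",
]
print(term_frequencies(query_log).most_common(3))
```

Automated counting like this is what makes content analysis a relatively rapid method for large amounts of text.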
Secondary data analysis, like content analysis, makes use of already existing sources of data.
However, secondary analysis typically refers to the re-analysis of quantitative data rather than text.
Secondary data analysis uses data that was collected by others to address different research questions
or to use different methods of analysis than was originally intended during data collection. For ex-
ample, Websites commonly collect transaction log data for system performance analysis. However,
researchers can also use this data to address other questions. Several transaction log studies have
focused on this aspect of research [21, 22, 29, 30, 34, 77, 107, 129].
As a secondary analysis method, Web analytics has several advantages. First, it is efficient in
that it makes use of data collected by a Website application. Second, it often allows the researcher to
extend the scope of the study considerably by providing access to a potentially large sample of users
over a significant duration [81]. Third, since the data is already collected, the cost of using existing
transaction log data is cheaper than collecting primary data.
However, the use of secondary analysis is not without difficulties. First, secondary data is
frequently not trivial to prepare, clean, and analyze [66], especially large transaction logs. Second,
researchers must often make assumptions about how the data was collected because third parties
developed the logging applications. A third and perhaps more perplexing difficulty concerns the
ethics of using transaction logs as secondary data. By definition, the researcher is using the data in a
manner that may violate the privacy of the system users [53]. In fact, some critics point to a grow-
ing concern for unobtrusive methods due to increased sensitivity toward the ethics involved in such
research [112]. Log data may be unobtrusive, but it can certainly be quite invasive.
18 UNDERSTANDING USER–WEB INTERACTIONS VIA WEB ANALYTICS

2.6 WEB ANALYTICS AS UNOBTRUSIVE METHOD


Web analytics has significant advantages as a methodological approach for the study and investigation of behaviors. These advantages include:

• Scale: Transaction log applications can collect data to a degree that overcomes the critical
limiting factor in laboratory user studies. User studies in laboratories are typically restricted
in terms of sample size, location, scope, and duration.
• Power: The sample size of transaction log data can be quite large, so inference testing can highlight statistically significant relationships. Interestingly, sometimes the amount of data in transaction logs from the Web is so large that nearly every relation is significantly correlated. Because of this large power, researchers must also account for effect size.
• Scope: Since transaction log data is collected in natural contexts, researchers can investi-
gate the entire range of user–system interactions or system functionality in a multi-variable
context.
• Location: Transaction log data can be collected in naturalistic, distributed environments.
Therefore, users do not have to be in an artificial laboratory setting.
• Duration: Since there is no need for recruiting specific participants for a user study, trans-
action log data can be collected over an extended period.
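To make the power caveat concrete, consider the standard t test for a Pearson correlation: at Web-log sample sizes even a negligible correlation clears conventional significance thresholds, which is why effect size matters. A small illustrative calculation (the sample sizes and the correlation value are hypothetical):

```python
import math

def t_statistic(r, n):
    """t value for testing whether a Pearson correlation r differs from zero."""
    return r * math.sqrt((n - 2) / (1 - r * r))

# With a modest sample, a tiny correlation is nowhere near significant...
small_sample_t = t_statistic(0.01, 500)          # about 0.22
# ...but at Web-log scale the same correlation tests as highly "significant",
# even though it explains only r**2 = 0.0001 of the variance.
log_scale_t = t_statistic(0.01, 1_000_000)       # about 10.0
```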

All methods of data collection have strengths not available with other methods, but they also
have inherent limitations. Transaction logs have several shortcomings. First, transaction log data is not as versatile as primary data because the data may not have been collected with the particular research questions in mind. Second, transaction log data is not as rich as that from some other data collection methods and is therefore not suited to investigating the full range of concepts some researchers may want to study. Third, the fields that the transaction log application records are often only loosely linked to the concepts they are alleged to measure. Fourth, with transaction logs the users may be aware that they are being recorded and may alter their actions. Therefore, the logged behaviors may not be altogether natural.
Given the inherent limitations in the method of data collection, Web analytics also suffers
from shortcomings derived from the characteristics of the data collection. Hilbert and Redmiles
[56] maintain that all research methods suffer from some combination of abstraction, selection,
reduction, context, and evolution problems that limit scalability and quality of results. Web analytics
suffers from these same five shortcomings.

• Abstraction problem—how does one relate low-level data to higher-level concepts?
• Selection problem—how does one separate the necessary from unnecessary data before reporting and analysis?
• Reduction problem—how does one reduce the complexity and size of the data set before reporting and analysis?
• Context problem—how does one interpret the significance of events or states within state chains?
• Evolution problem—how can one alter data collection applications without impacting application deployment or use?

Because each method has its own combination of abstraction, selection, reduction, context,
and evolution problems, astute researchers will employ complementary methods of data collection
and analysis. This is similar to the conflict inherent in any overall research approach. Each research
method for data collection tries to maximize three desirable criteria: generalizability (i.e., the degree
to which the data applies to overall populations), precision (i.e., the degree of granularity of the mea-
surement), and realism (i.e., the relation between the context in which evidence is gathered relative
to the contexts to which the evidence is to be applied). Although the researcher always wants to
maximize all three of these criteria simultaneously, in reality it cannot be done. This is one funda-
mental dilemma of the research process. The very things that increase one of these three features
will reduce one or both of the others.

2.7 CONCLUSION
Recordings of behaviors via transaction log applications on the Web open a new era for researchers by making large amounts of trace data available for use. The online behaviors and interactions
among users, systems, and information create digital traces that permit collection and analysis of
this data. Logging applications provide data obtained through unobtrusive methods, and impor-
tantly, these collections are substantially larger than any data set obtained via surveys or laboratory
studies. As noted earlier, these applications allow the data to be collected in naturalistic settings
with little to no impact by the observer. Researchers can use these digital traces to analyze a nearly
endless array of behavior topics.
Web analytics is a behaviorist research method, with a natural reliance on the expressions of
interactions as behaviors. The transaction log application records these interactions, creating a type
of trace data. As a reminder, trace data in transaction logs are records of interactions as people use
these systems to locate information, navigate Websites, and execute services. The data in transaction
logs is a record of user–system, user–information, or system–information interactions. Moreover,
transaction logs provide an unobtrusive method of collecting data on a scale well beyond what one
could collect in confined laboratory studies. Figure 2.2 provides a recap of the foundation of Web
analytics.
The massive increased availability of Web trace data has sparked concern over the ethical
aspects of using unobtrusively obtained data from transaction logs. For example, who does the
trace data belong to—the user, the Website that logged the data, or the public domain? How does (or should) one seek consent to use such data? If researchers do seek consent, from whom does the researcher seek it? Is it realistic to require informed consent for unobtrusively collected data? These are open questions.

FIGURE 2.2: Recap of the foundational elements of Web analytics.

• • • •

CHAPTER 3

The History of Web Analytics

There have been an increasing number of review articles on Web analytics research in academia.
In one of the first, Jansen and Pooch [69] provide a review of Web transaction log research on Web search engines and individual Websites through 2000, focusing on query analysis. After reviewing
studies conducted between 1995 and 2000, Hsieh-Yee [59] reports that many studies investigate the
effects of certain factors on Web search behavior, including information organization and presenta-
tion, type of search task, Web experience, cognitive abilities, and affective states. Hsieh-Yee [59]
also notes that many studies lack external validity.
Bar-Ilan [13] presents an extensive and integrative overview of Web search engines and the
use of Web search engines in information science research. Bar-Ilan [13] provides a variety of per-
spectives including user studies, social aspects, Web structure, and search-engine evaluation.
Two excellent historical reviews are Penniman [115, 116], who examines log research from
the very beginning as a participant/observer, and Markey [98, 99], who reviews twenty-five years of
academic research in the area.
Given the availability of these comprehensive reviews, we will touch on some of the previous
work simply to identify the overall trends and to provide historical insight for Web analytics today.
Web analytics studies fall into three categories: (1) those that primarily use transaction-log analysis,
(2) those that incorporate users in a laboratory survey or other experimental setting, and (3) those
that examine issues related to or affecting Web searching.

3.1 SINGLE WEBSITES


Some researchers have used transaction logs to explore user behaviors on single Websites. For
example, Yu and Apps [162] used transaction log data to examine user behavior in the SuperJournal project. For 23 months (February 1997 to December 1998), the researchers recorded
102,966 logged actions, related these actions to 4 subject clusters, 49 journals, 838 journal issues,
15,786 articles, and 3 Web search engines.
In another study covering the period from 1 January to 18 September 2000, Kea et al. [82]
examined user behavior in Elsevier’s ScienceDirect, which hosts bibliographic information and full-
text articles of more than 1300 journals with an estimated 625,000 users. Loken et al. [96] examined
the transaction log data of the online self-directed studying of more than 100,000 students using a
Web-based system to prepare for U.S. college admissions tests. The researchers noted several non-
optimal behaviors, including a tendency toward deferring study and a preference for short-answer
verbal questions. The researchers discussed the relevance of their findings for online learning.
Wen et al. [156] investigated the use of click-through data to cluster queries for question
answering on a Web-based version of the Encarta encyclopedia. The researchers explored the simi-
larity between two queries using the common user-selected documents between them. The results
indicate that a combination of both keywords and user logs is better than using either method alone.
Using a Lucent proxy server, Hansen and Shriver [50] used transaction-log analysis to cluster search
sessions and to identify highly relevant Web documents for each query cluster.
Collectively, these studies provide better descriptions of user behaviors and help to refine
transaction log research for Web log analysis of searching and single Websites from an academic
perspective.

3.2 LIBRARY SYSTEMS


Some of the original work in the area of log analysis has occurred in the library fields, with many
studies of library systems [117]. Continuing this rich tradition of using transaction logs to investi-
gate the use of library systems, more sophisticated methods are emerging. For example, Chen and
Cooper [28] clustered users of an online library system into groups based on patterns of states using
transaction logs data. The researchers defined 47 variables, using them to classify 257,000 sessions.
Then they collapsed these 47 variables into higher order groupings, identifying six distinct clusters
of users. In a follow-up study, Chen and Cooper [27] used 126,925 sessions from the same online
system, modeling patterns using Markov models. The researchers found that a third-order Markov
model explained five of the six clusters.
What is coming out of this line of research is a move in the academic field from strictly descriptive to more predictive forms of analysis, including methods of Web mining [136].

3.3 SEARCH ENGINES


Rather than focusing on single Websites, other researchers have investigated information searching
on Web search engines. Ross and Wolfram [131] analyzed queries submitted to the Excite search
engine for subject content based on the co-occurrence of terms. The researchers categorized more
than 1000 of the most frequently co-occurring term pairs into one or more of 30 developed subject
areas. The cluster analyses resulted in several well-defined high-level clusters of broad subject areas.
He et al. [54] examined contextual information from Excite and Reuters transaction logs, using a
version of the Dempster–Shafer theory to identify search engine sessions. The researchers deter-
mined the average Web user session duration was about 12 min. Özmutlu and Cavdur [109] inves-
tigated contextual information using an Excite transaction log. The researchers explored the reasons
underlying the inconsistent performance of automatic topic identification with statistical analysis
and experimental design techniques. Xie and O’Hallaron [160] investigated caching to reduce both
server load and user-response time in distributed systems by analyzing a transaction log from the
Vivisimo search engine, from 14 January to 17 February 2001. The researchers report that queries
have significant locality, with query frequency following a Zipf distribution. Lempel and Moran
[92] also investigated clustering to improve caching of search engine results using more than seven
million queries submitted to AltaVista. The researchers report that pre-fetching of search engine
results can increase cache–hit ratios by 50 percent for large caches and can double the hit ratios of
small caches. There is much ongoing work in the area of using logs for search engine and server
caching [10].
In what appears currently to be one of the longest temporal studies, Wang et al. [151] ana-
lyzed 541,920 user queries submitted to an academic-Website search engine during a four-year
period (May 1997 to May 2001). Conducting analysis at the query and term levels, the researchers
report that 38% of all queries contained only one term and that most queries are unique. Eiron and
McCurley [38] used 448,460 distinct queries from an IBM Intranet search engine to analyze the
effectiveness of anchor text.
Pu [122] explored the searching behavior of users searching on two Taiwanese Web search
engines, Dreamer and Global Area Information Servers (GAIS). The average length of English
terms on these two Web search engines was 1.0 term for Dreamer and 1.22 terms for GAIS.
Baeza-Yates and Castillo [9] examined approximately 730,000 queries from TodoCL, a Chilean
search system. They found that queries had an average length of 2.43 terms. A lengthier analysis is
presented in Baeza-Yates and Castillo [8]. Montgomery and Faloutsos [107] analyzed more than
20,000 Internet users who accessed the Web from July 1997 through December 1999 using data
provided by Jupiter Media Metrix (http://www.jupiterresearch.com). The researchers report users
revisited 54 percent of URLs at least once during a searching session.
They also report that browsing patterns follow a power law and the patterns remained stable
throughout the period of analysis. Rieh and Xu [127] analyzed queries from 1,451,033 users of
Excite collected on 9 October 2000. The researchers examined how each user reformulated his/
her Web query over a 24-hour period. Out of the 1,451,033 user logs collected, the researchers used various criteria to select 183 sessions for manual analysis. The results show that while most query reformulations involve content changes, about 15% of the reformulations relate to format modifications.
Huang et al. [60] propose an effective term-suggestion approach for interactive Web search
using more than two million queries submitted to Web search engines in Taiwan. The researchers
propose a transaction log approach to relevant term extraction and term suggestion using relevant
terms that co-occur in similar query sessions.
Jansen and Spink [70] determined that the typical Web searching session was about 15 min from an analysis of click-through data from AlltheWeb.com. The researchers report that the Web
search engine users on average view about eight Web documents, with more than 66% of searchers
examining fewer than five documents in a given session. Users on average view about two to three
documents per query. More than 55% of Web users view only one result per query. Twenty percent
of the Web users view a Web document for less than a minute. These results would seem to indicate
that the initial impression of a Web document is extremely important to the user’s perception of
relevance.
Beitzel et al. [15] examine hundreds of millions of queries submitted by approximately 50
million users to America Online (AOL) over a 7-day period from 26 December 2003 through 1
January 2004. During this period, AOL used results provided by Google. The researchers report that
only about 2% of the queries contain query operators. The average query length is 2.2 terms, and
81% of users view only one results page. The researchers report changes in popularity and unique-
ness of topically categorized queries across hours of the day. Park, Bae, and Lee (Forthcoming)
analyzed transaction logs of NAVER, a Korean Web search engine and directory service. The data
was collected over a one-week period, from 5 January to 11 January 2003, and contained 22,562,531
sessions and 40,746,173 queries. Users of NAVER implement queries with few query terms, seldom
use advanced features, and view few results pages. Users of NAVER had an average session length of
1.8 queries. Wolfram et al. [159] analyze session clusters from three different search environments.
Web analytics is also entering a variety of areas, including keyword advertising and sponsored
search [65].
Clearly, these research projects provide valuable information for understanding and perhaps
improving user–system and system–information interactions.

3.4 CONCLUSION
What does a historical review of transactional log analysis inform us about the current and possible
future state of Web analytics? In one of the earliest studies employing transaction logs, Penniman
[116, p. 159] stated, “The promise (of transaction logs) is unlimited for evaluating communicative
behavior where human and computer interact to exchange information.” Since the mid-1960s, we
have seen the use of transaction logs evolve from an almost purely descriptive approach focusing
primarily on system effectiveness to one focusing on the combined aspects of both the user and the
system. Today, we see these tools being leveraged for more insightful and predictive aspects of the
user–system interaction. Combined with associated research methods, transaction logs have served
a vital function in understanding users and systems.

• • • •

CHAPTER 4

Data Collection for Web Analytics

As the previous brief review of research demonstrates, data for Web analytics is plentiful. How the
data is collected, however, is important. There is a proliferation of techniques (e.g., performance
monitors, Web server log files, cookies, and packet sniffing), but the most common individual tech-
niques generally fall into one of two major approaches for collecting data for Web analysis: log files
and page tagging [80]. Most current Web analytic companies use a combination of the two methods
for collecting data. Therefore, anyone interested in Web analytics needs to understand the strengths
and weaknesses of each.

4.1 WEB SERVER LOG FILES


The first method of metric gathering uses log files. Every Web server keeps a log of page requests
that can include (but is not limited to) visitor Internet Protocol (IP) address, date and time of the
request, request page, referrer, and information on the visitor’s Web browser and operating system.
The same basic collected information can be displayed in a variety of ways. Although the format of
the log file is ultimately the decision of the company that runs the Web server, the following four formats are among the most popular: NCSA Common Log, NCSA Combined Log, NCSA Separate
Log, and W3C Extended Log.
The NCSA Common Log format (also known as Access Log format) contains only basic in-
formation on the page request. This includes the client IP address, client identifier, visitor username,
date and time, HTTP request, status code for the request, and the number of bytes transferred dur-
ing the request. The Combined Log format contains the same information as the common log with
the following three additional fields: the referring URL, the visitor’s Web browser and operating
system information, and the cookie. The Separate Log format (or 3-Log format) contains the same
information as the combined log, but it breaks it into three separate files—the access log, the refer-
ral log, and the agent log. The date and time fields in each of the three logs are the same. Table 4.1
shows examples of the common, combined, and separate log file formats; notice that default values are represented by a dash (-).
Similarly, W3C provides an outline for standard formatting procedures. This format differs
from the first three in that it aims to provide for better control and manipulation of data while
TABLE 4.1: NCSA log comparison.

NCSA Common Log:
111.222.125.125 - jimjansen [10/Oct/2009:21:15:05 +0500] "GET /index.html HTTP/1.0" 200 1043

NCSA Combined Log:
111.222.125.125 - jimjansen [10/Oct/2009:21:15:05 +0500] "GET /index.html HTTP/1.0" 200 1043 "http://ist.psu.edu/faculty_pages/jjansen/" "Mozilla/4.05 [en] (WinNT; I)" "USERID=CustomerA; IMPID=01234"

NCSA Separate Log:
Common log:
111.222.125.125 - jimjansen [10/Oct/2009:21:15:05 +0500] "GET /index.html HTTP/1.0" 200 1043
Referral log:
[10/Oct/2009:21:15:05 +0500] "http://ist.psu.edu/faculty_pages/jjansen/"
Agent log:
[10/Oct/2009:21:15:05 +0500] "Microsoft Internet Explorer - 7.0"
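As a brief illustration, a Common Log entry such as the one above can be parsed programmatically. The following is a minimal sketch in Python; the regular expression simply mirrors the field layout described in this section and is not a complete implementation of the format:

```python
import re

# host, client identifier, username, [date], "request", status, bytes
COMMON_LOG = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-)'
)

def parse_common_log(line):
    """Return a dict of fields from one Common Log entry, or None if malformed."""
    match = COMMON_LOG.match(line)
    if not match:
        return None
    entry = match.groupdict()
    # A dash means "no value"; normalize status and byte counts to integers.
    entry["bytes"] = 0 if entry["bytes"] == "-" else int(entry["bytes"])
    entry["status"] = int(entry["status"])
    return entry

line = ('111.222.125.125 - jimjansen [10/Oct/2009:21:15:05 +0500] '
        '"GET /index.html HTTP/1.0" 200 1043')
entry = parse_common_log(line)
# entry["host"] == "111.222.125.125"; entry["status"] == 200
```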

still producing a log file readable by most Web analytics tools. The extended format contains user-defined fields and identifiers followed by the actual entries, and default values are again represented by a dash (-). Table 4.2 shows an example of an extended log file.

TABLE 4.2: W3C extended log file.

W3C Extended Log:
#Software: Microsoft Internet Information Services 6.0
#Version: 1.0
#Date: 2009-05-24 20:18:01
#Fields: date time c-ip cs-username s-ip s-port cs-method cs-uri-stem cs-uri-query sc-status sc-bytes cs-bytes time-taken cs(User-Agent) cs(Referrer)
2009-05-24 20:18:01 172.224.24.114 - 206.73.118.24 80 GET /Default.htm - 200 7930 248 31 Mozilla/4.0+(compatible;+MSIE+7.01;+Windows+2000+Server) http://54.114.24.224/
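A sketch of reading this self-describing format in Python, using the #Fields directive to name the columns. This is a simplified, assumption-laden reader: it splits entries on whitespace, which suits W3C logs because spaces inside values are encoded (note the '+' characters in the user-agent entry above):

```python
def parse_w3c_log(lines):
    """Yield one dict per entry of a W3C extended log.

    Directive lines begin with '#'; the #Fields directive names the columns.
    A dash (-) marks a default/empty value and is mapped to None.
    """
    fields = []
    for line in lines:
        if line.startswith("#"):
            if line.startswith("#Fields:"):
                fields = line[len("#Fields:"):].split()
            continue
        values = line.split()
        yield {field: (None if value == "-" else value)
               for field, value in zip(fields, values)}

log_lines = [
    "#Version: 1.0",
    "#Fields: date time c-ip cs-method cs-uri-stem sc-status",
    "2009-05-24 20:18:01 172.224.24.114 GET /Default.htm 200",
]
entries = list(parse_w3c_log(log_lines))
# entries[0]["c-ip"] == "172.224.24.114"
```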
System log files offer several benefits for gathering data for analysis. First, using system log
files does not require any changes to the Website or any extra software installation to create the log
files. Second, because Web servers automatically create these logs and store them on a company’s
own servers, the company has freedom to change their Web analytics tools and strategies at will. Ad-
ditionally, using system log files does not require any extra bandwidth when loading a page, and since
everything is recorded server-side, it is possible to log both page request successes and failures.
Using log files also has some disadvantages. One major disadvantage is that the collected data is limited to transactions with the Web server, so the logs cannot capture information independent of the server, such as the physical location of the visitor. Similarly, while it is possible
to log cookies, the server must be specifically configured to assign cookies to visitors in order to do
so. The final disadvantage is that while it is useful to have all the information stored on a company’s
own servers, the log file method is only available to those who own their Web servers.

4.2 PAGE TAGGING


The second method for recording data for Web analytics is page tagging. In page tagging, an invis-
ible image is used to detect when a page has been successfully loaded and then triggers JavaScript
to send information about the page and the user back to a remote server. According to Peterson
[118], the variables used and the amount of data collected in page tagging depend on the Web
analytics vendor. Some vendors stress short, easy to use page tags while others emphasize specific
tags that require little post-processing. The best thing to look for with this method, however, is flex-
ibility—being able to use all, part, or none of the tag depending on the needs of the page.
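On the collection side, such a beacon request reduces to reading the query string that the page tag appends to the image URL. The sketch below is illustrative only: the endpoint and the parameter names (page, title, screen, visitor) are hypothetical, since each vendor defines its own tag variables.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical beacon request, as the page-tag JavaScript might issue it:
# page and visitor details ride along as query parameters on a 1x1 image URL.
beacon_url = ("http://analytics.example.com/pixel.gif"
              "?page=%2Findex.html&title=Home&screen=1280x800&visitor=abc123")

def parse_beacon(url):
    """Extract the tagged data that a beacon request carries to the collector."""
    query = urlsplit(url).query
    # parse_qs returns a list per parameter; keep the first value of each.
    return {key: values[0] for key, values in parse_qs(query).items()}

record = parse_beacon(beacon_url)
# record["page"] == "/index.html"; record["screen"] == "1280x800"
```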
This method of gathering user data offers several benefits. The first is speed of reporting.
Unlike a log file, the data received via page tagging is parsed as it comes in. This allows for near real-
time reporting. Another benefit is flexibility of data collection. More specifically, it is easier to record
additional information about the user without involving a request to the Web server. Examples of
such information include information about a user’s screen size, the price of purchased goods, and
interactions within Flash animations. This is also a useful method of gathering data for companies
that do not run their own Web servers or do not have access to the raw log files for their site (such
as blogs).
Page tagging also entails some disadvantages, most of which are centered on the extra code
that must be added to the Website. This extra code requires the page to use more bandwidth each
time it loads, and it also makes it harder to change analytics tools because the code embedded in the
Website would have to be changed or deleted entirely. The final disadvantage is that page tagging is
only capable of recording page loads, not page failures. If a page fails to load, the tagging code also does not load, and there is therefore no way to retrieve information in that instance.

TABLE 4.3: Web server log files versus page tagging [19].

Log files (all types)

Advantages:
• Does not require changes to the Website or extra hardware installation
• Does not require extra bandwidth
• Freedom to change tools with a relatively small amount of hassle
• Logs both page request successes and failures

Disadvantages:
• Can only record interactions with the Web server
• Server must be configured to assign cookies to visitors
• Only available to companies who run their own Web servers
• Cannot log physical location

Page tagging (all approaches)

Advantages:
• Near real-time reporting
• Easier to record additional information
• Able to capture visitor interactions within Flash animations

Disadvantages:
• Requires extra code added to the Website
• Uses extra bandwidth each time the page loads
• Can only record successful page loads, not failures
• Hard to switch analytic tools
Although log files and page tagging are two distinct ways to collect information about the
Website users or visitors, it is possible to use both together, and many analytics companies provide
ways to use both methods to gather data. Even so, it is important to understand the strengths and
weaknesses of both. Table 4.3 presents the advantages and disadvantages of log file analysis and
page tagging.

4.3 CONCLUSION
Regardless of whether log files or page tagging is used (or new approaches that may be developed),
the data will eventually end up in a log file for analysis. In other words, while the data collection may
differ, the method of analysis remains the same.

• • • •

CHAPTER 5

Web Analytics Fundamentals

To understand and derive the benefits of Web analysis, one must first understand metrics, the dif-
ferent kinds of measures available for analyzing user information [19, 111]. Although metrics may
seem basic, once collected we can use these metrics to analyze Web traffic and improve a Website
to better meet the expectations of the site’s visitors. These metrics generally fall into one of four
categories: site usage, referrers (or how visitors arrived at the site), site content analysis, and quality
assurance. Table 5.1 shows examples of types of metrics that we might find in these categories.
Although the type and overall number of metrics varies with different analytics vendors, a set
of basic metrics is common to most. Table 5.2 outlines eight widespread types of information [63]
that measure who is visiting a Website and what they do during their visits, relating each of these
metrics to specific categories.
Each metric is discussed below.

5.1 VISITOR TYPE


Since analyzing Website traffic first became popular in the 1990s with the Website counter, the
measure of Website traffic has been one of the most closely watched metrics. This metric, however,
has evolved from merely counting the number of hits a page receives into counting the number of
individuals who visit the Website.
Ignoring the software robots that can make up a large portion of traffic [68], there are two
types of visitors: new visitors, meaning those who have not previously visited the site, and repeat
visitors, meaning those who have been to the site previously. In order to track visitors in such a way,
a system must be able to determine individual users who access a Website; each individual user is
called a unique visitor. Ideally, a unique visitor is just one visitor, but this is not always the case. It is
possible that multiple users access the site from the same computer (perhaps on a shared household
computer or a public library). In addition, most analytic software relies on cookies to track unique
users. If a user disables cookies in the browser or if they clear their cache, then the visitor will be
counted as new each time he or she enters the site.
Because of this, some companies have instead begun to track unique visits or sessions. A
session begins once a user enters the site and ends when a user exits the site or after a set amount
of time of inactivity (usually 30 minutes). The session data does not rely on cookies and can be
30 UNDERSTANDING USER–WEB INTERACTIONS VIA WEB ANALYTICS

TABLE 5.1: Metrics categories [63].

SITE USAGE REFERRERS SITE CONTENT QUALITY


ANALYSIS ASSURANCE

• Geographic infor- • How many people • Effectiveness of key • Broken pages or


mation place bookmarks to content server errors
• How many people the site • Most popular pages • Visitor response to
repeatedly visit the • The search terms • Top entry pages errors
site people used to find • Top exit pages
• Numbers of visitors your site • Top pages for single
and sessions • Which Websites page view sessions
• Search engine are sending visitors • Top paths through
activity to your site the site

TABLE 5.2: Eight common metrics of Website analysis [19].

METRIC DESCRIPTION CATEGORY

Demographics and The physical location and information of the Site Usage
System Statistics system used to access the Website

Errors Any errors that occurred while attempting to retrieve Quality


the page Assurance

Internal Search Information on keywords and results pages viewed Site Usage
Information using a search engine embedded in the Website

Referrering URL and Which sites have directed traffic to the Website and Referrers
Keyword Analysis which keywords visitors are using to find the Website

Top Pages The pages that receive the most traffic Site Content
Analysis

Visit Length The total amount of time a visitor spends on the Website Site Usage

Visitor Path The route a visitor uses to navigate through the Site Content
Website Analysis

Visitor Type Who is accessing the Website (returning, unique, etc.) Site Usage
WEB ANALYTICS FUNDAMENTALS 31

measured easily. Since there is less uncertainty with visits, they are considered a more concrete and
reliable metric than unique visitors. This approach is also more sales-oriented because it considers
each visit an opportunity to convert a visitor into a customer instead of looking at overall customer
behavior [17].
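The 30-minute inactivity rule described above can be sketched in a few lines of Python. The input layout (a list of page-view timestamps for one visitor) is our own assumption for illustration:

```python
from datetime import datetime, timedelta

SESSION_TIMEOUT = timedelta(minutes=30)  # the usual inactivity threshold

def sessionize(timestamps):
    """Split one visitor's page-view timestamps into sessions:
    a new session starts whenever the gap since the previous
    page view exceeds SESSION_TIMEOUT."""
    sessions = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][-1] <= SESSION_TIMEOUT:
            sessions[-1].append(ts)
        else:
            sessions.append([ts])
    return sessions

# Three page views; the 45-minute gap before the last one starts a new session.
views = [datetime(2011, 5, 1, 9, 0),
         datetime(2011, 5, 1, 9, 10),
         datetime(2011, 5, 1, 9, 55)]
sessions = sessionize(views)
```

Real analytics packages recognize session boundaries in the same way, though the timeout value is usually configurable.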

5.2 VISIT LENGTH


Also referred to as Visit Duration or Average Time on Site, visit length is the total amount of time
a visitor spends on a site during one session. One possible area of confusion when using this metric
is handling missing data. This can be caused either by an error in data collection or by a session
containing only one page visit or interaction. Since the visit length is calculated by subtracting the
time of the visitor’s first activity on the site from the time of the visitor’s final activity, when one of
those pieces of data is missing, according to the WAA, the visit length is calculated as zero [23].
When analyzing the visit length, the measurements are often broken down into chunks of
time. StatCounter, for example, uses the following time categories [63]:

• Less than 5 seconds


• 5 seconds to 30 seconds
• 30 seconds to 5 minutes
• 5 minutes to 20 minutes
• 20 minutes to 1 hour
• Greater than 1 hour

The goal of measuring the data in this way is to keep the percentage of visitors who stay on
the Website for less than five seconds as low as possible. If visitors stay on a Website for such a short
amount of time, either they arrived at the site by accident or the site did not have relevant informa-
tion. By combining this information with information from referrers and keyword analysis, we can
determine which sites are referring well-targeted traffic and which sites are referring poor quality
traffic.
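A minimal sketch of the visit-length calculation and StatCounter-style time categories, applying the WAA convention that a single-activity session has length zero. The bucket boundaries are closed on the left by our own choice, since the categories as listed overlap at their endpoints:

```python
from datetime import datetime

def visit_length_seconds(session):
    """Visit length: time of final activity minus time of first activity.
    Following the WAA convention, a session with only one recorded
    activity (missing data) is assigned a visit length of zero."""
    if len(session) < 2:
        return 0
    return int((max(session) - min(session)).total_seconds())

# StatCounter-style categories as (upper limit in seconds, label).
BUCKETS = [(5, "less than 5 seconds"),
           (30, "5 seconds to 30 seconds"),
           (300, "30 seconds to 5 minutes"),
           (1200, "5 minutes to 20 minutes"),
           (3600, "20 minutes to 1 hour")]

def length_bucket(seconds):
    for limit, label in BUCKETS:
        if seconds < limit:
            return label
    return "greater than 1 hour"

session = [datetime(2011, 5, 1, 9, 0, 0), datetime(2011, 5, 1, 9, 4, 0)]
```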

5.3 DEMOGRAPHIC AND SYSTEM STATISTICS


For some companies, well-targeted traffic means region-specific traffic. For example, if an e-commerce
site can only ship its goods to people in Spain, any traffic to the site from outside of Spain is ir-
relevant. The demographic metric refers to the physical location of the system used to make a page
request. This information can be useful for a Website that provides region-specific services. In ad-
dition, region-specific Websites also want to make sure they tailor their content to the group they
are targeting; thus, demographic information can also be combined with information on referrers to
determine if a referral site is directing traffic to a site from outside a company’s regions of service.

In addition to demographic location, companies also need information about the hardware
and software with which visitors access a Website, and system statistics provide information such
as browser type, screen resolution, and operating system. By using this information, companies can
tailor their Websites to meet visitors’ technical needs, thereby ensuring that all customers can access
the Websites.

5.4 INTERNAL SEARCH


If a Website includes a site-specific search utility, then it is also possible to measure internal search
information. This can include not only keywords but also information about which results pages
visitors found useful. There are several uses for analyzing internal search data [3]:

• Identify products and services for which customers are looking, but that are not yet pro-
vided by the company
• Identify products that are offered, but which customers have a hard time finding
• Identify customer trends
• Improve personalized messages by using the customers’ own words
• Identify emerging customer service issues
• Determine if customers are provided with enough information to reach their goals
• Make personalized offers

By analyzing internal search data, we can use the information to improve and personalize the
visitors’ experiences.
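As an illustration, a small sketch that tallies internal-search keywords and flags zero-result searches; the `(keyword, results_count)` log layout is a hypothetical format, not taken from any particular tool:

```python
from collections import Counter

def summarize_internal_search(searches):
    """Tally internal-search keywords and flag zero-result searches.

    `searches` is an iterable of (keyword, results_count) pairs -- a
    hypothetical log layout. Zero-result keywords are candidates for
    missing products or navigation problems."""
    counts, zero_results = Counter(), Counter()
    for keyword, n_results in searches:
        kw = keyword.strip().lower()  # fold case so variants merge
        counts[kw] += 1
        if n_results == 0:
            zero_results[kw] += 1
    return counts.most_common(10), zero_results.most_common(10)

log = [("mortgage rates", 12), ("Mortgage Rates", 9), ("gift cards", 0)]
top, missing = summarize_internal_search(log)
```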

5.5 VISITOR PATH


Excluding visitors who leave the site as soon as they enter, each visitor creates a path of page views
and actions while perusing the site. By studying these paths, we can identify any difficulties a user
has viewing a specific area of the site or completing a certain action (such as making a transaction
or completing a form).
According to an article by the WAA, there are two schools of thought regarding visitor path
analysis. The first is that visitor actions are goal-driven and performed in a logical, linear fashion.
For example, if a visitor wants to purchase an item, the visitor will first find the item, add it to the
cart, and proceed to the checkout to complete the process. Any break in that path (i.e., not complet-
ing the order) signifies user confusion and is viewed as a problem.
The second school of thought is that visitor actions are random and illogical and that the only
path that can provide accurate data on a visitor's behavior is the path from one page to the page
immediately following it. In other words, the only page that influences visitors' behavior on a Website
is the one they are currently viewing. For example, visitors on a news site may merely peruse the
articles with no particular goal in mind. This method of analysis is becoming increasingly popular
because companies find it easier to examine path data in context without having to reference the
entire site in order to study the visitors’ behavior.
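The second approach reduces path analysis to page-to-page transition counts, which can be sketched as follows; per-session paths as lists of page names are an assumed input format:

```python
from collections import Counter

def page_transitions(paths):
    """Count page-to-page transitions across visitor paths,
    treating only the immediately preceding page as context."""
    transitions = Counter()
    for path in paths:
        for a, b in zip(path, path[1:]):
            transitions[(a, b)] += 1
    return transitions

paths = [["home", "item", "cart", "checkout"],
         ["home", "item", "home"]]
transitions = page_transitions(paths)
```

Because each transition is counted independently, the analyst can examine any page pair without reconstructing whole-site paths.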

5.6 TOP PAGES


The first page a visitor views often makes the greatest impression of the Website. The pages that receive
the most attention are called top pages, and they generally fall into three categories: top entry pages, top exit
pages, and most popular pages. By knowing the top entry pages, organizations can ensure that those pages
have relevant information and provide adequate navigation to important parts of the site. Similarly,
identifying popular exit pages makes it easier to pinpoint areas of confusion or missing content.
The most popular pages are the areas of a Website that receive the most traffic. This metric
gives insight into how visitors are utilizing the Website and which pages are providing the most
useful information. This kind of information shows whether the Website’s functionality matches up
with the organization’s goals; if most of the Website’s traffic is being directed away from the main
pages of the site, the Website cannot function to its full potential.
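All three top-page categories can be derived from the same per-session path data; a minimal sketch, assuming paths are lists of page names:

```python
from collections import Counter

def top_pages(paths, n=5):
    """Derive top entry, top exit, and most popular pages from
    per-session visitor paths (lists of page names)."""
    entries = Counter(p[0] for p in paths if p)
    exits = Counter(p[-1] for p in paths if p)
    popular = Counter(page for p in paths for page in p)
    return entries.most_common(n), exits.most_common(n), popular.most_common(n)

paths = [["home", "rates", "contact"],
         ["rates", "contact"],
         ["home", "rates"]]
top_entry, top_exit, most_popular = top_pages(paths)
```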

5.7 REFERRERS AND KEYWORD ANALYSIS


Users often reach Websites through a referral page: the page visited immediately before entering a
Website, that is, a site that has directed the user (and thus traffic) to the Website. A search engine
result page link, a blog entry mentioning the Website, and a personal bookmark are examples of
referrers. By using this metric, organizations can determine advertising effectiveness and search en-
gine popularity. As always, it is important to look at this information in context. If a certain referrer
is doing worse than expected, it could be caused by the referring link text or placement rather than
the quality of the referrer. Conversely, an unexpected spike in referrals from a certain page could be
either good or bad depending on the content of the referring page.
In the same way that the referrer metric helps us to assess referrer effectiveness, keyword
analysis helps us to measure the referrer value of referring search engines and shows which keywords
have brought in the most traffic. By analyzing the keywords visitors use to find a page, we can de-
termine what visitors expect to gain from the Website and use that information to better tailor the
Website to their needs. It is also important to consider the quality of keywords. Keyword quality is
directly proportional to revenue and can be determined by comparing keywords with visitor path
and visit data. Good keywords will bring quality traffic and more income to the site.
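A hedged sketch of referrer and keyword tallying. The query parameters checked here (`q`, `query`, `p`) are assumptions, since each search engine names its search-phrase parameter differently:

```python
from collections import Counter
from urllib.parse import parse_qs, urlparse

# Query parameters that may carry the search phrase; the names are
# assumptions -- each search engine uses its own parameter.
KEYWORD_PARAMS = ("q", "query", "p")

def referrer_report(referrer_urls):
    """Count referring hosts and the search keywords they carried."""
    hosts, keywords = Counter(), Counter()
    for url in referrer_urls:
        parts = urlparse(url)
        if not parts.netloc:
            continue  # empty referrer: direct traffic or a bookmark
        hosts[parts.netloc] += 1
        qs = parse_qs(parts.query)
        for param in KEYWORD_PARAMS:
            if param in qs:
                keywords[qs[param][0].lower()] += 1
                break
    return hosts, keywords

refs = ["http://www.google.com/search?q=web+analytics",
        "http://example-blog.org/post/42",
        ""]
hosts, keywords = referrer_report(refs)
```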

5.8 ERRORS
Errors are the final metric. Tracking errors has the obvious benefit of being able to identify and fix
any errors in the Website, but it is also useful to observe how visitors react to these errors. The fewer
visitors who are confused by errors on a Website, the less likely visitors are to exit the site because
of an error.

5.9 CONCLUSION
Once we understand these eight fundamental metrics, we can begin to develop a coherent Web
analytics strategy.

• • • •

CHAPTER 6

Web Analytics Strategy

Through unobtrusive transaction logs and page tags we can gather a massive amount of data about
user–system interaction, and by employing fundamental metrics we can evaluate human behavior
within the interactional context. In order to gain the most from these massive datasets, however, we
must strategically select and employ the fundamental metrics in relation to KPIs. For example, by
collecting various Web analytics metrics, such as number of visits and visitors and visit duration, we
can develop KPIs, thereby creating a versatile analytic model that measures several metrics against
each other to define visitor trends [19, 111]. One primary concern in developing a coherent Web
analytic strategy is understanding the relationships among the foundational metrics and KPIs.

6.1 KEY PERFORMANCE INDICATORS


KPIs provide an in-depth picture of visitor behavior on a site. This information allows organiza-
tions to align their Websites’ goals with their business goals for the purpose of identifying areas of
improvement, promoting popular parts of the site, testing new site functionality, and ultimately
increasing revenue. This section covers the most common metrics, different ways for gathering
metrics, methods for utilizing KPIs, key best practices, and the selection criteria for choosing the
right Web analytics tool. In brief, this section describes the overall process of Web analytics
integration, offers advice on it, and discusses the future of Web analytics.
Before beginning this discussion, we need to clarify the exact meaning of frequently used
terms. For our purpose, we will use the following definitions [20]:

• Measurement: In the most general terms, measurement can be regarded as the assignment
of numbers to objects (or events or situations) in accord with some rule (measurement
function). The property of the objects that determines the assignment according to that
rule is called magnitude, the measurable attribute; the number assigned to a particular
object is called its measure, the amount or degree of its magnitude. Importantly, the rule
defines both the magnitude and the measure.

• Web Page Significance: Significance metrics formalize the notions of “quality” and “rel-
evance” of Web pages with respect to users’ information needs. Significance metrics are
employed to rate candidate pages in response to a search query and to influence the quality
of search and retrieval on the Web.
• Usage Characterization: Patterns and regularities in the way users browse Web resources
can provide invaluable clues for improving the content, organization, and presentation of
Websites. Usage characterization metrics measure user behavior for this purpose.

6.2 WEB ANALYTICS PROCESS


Although we can collect metrics from a Website, we must be mindful of how we gather them and
how we use them to select and filter information. Effective design and use of Web analytics requires
us to do so [24]. How can the strategic use of Web analytics help improve an organization? To
answer this, the WAA provides nine key best practices to follow when analyzing a Website [101].
Figure 6.1 outlines this process.

6.2.1 Identify Key Stakeholders


The first step in the process of Web analytics is to identify the key stakeholders, meaning anyone
who holds an interest in the Website. This designation includes management, site developers, visi-
tors, and anyone else who creates, maintains, uses, or is affected by the site. In order for the Website

Web Analytics Process Guide

FIGURE 6.1: The process of Web analytics [101]. The figure depicts nine sequential steps: identify key
stakeholders; define primary goals; identify the most important site visitors; determine the key
performance indicators; identify and implement the right solution; use multiple technologies and
methods; make improvements iteratively; hire and empower a full-time analyst; and establish a
process of continuous improvement.



to be truly beneficial, it must integrate input from all major stakeholders. Involving people from
different parts of the company also makes it more likely that they will embrace the Website as a
valuable tool.

6.2.2 Define Primary Goals for Your Website


Knowing the key stakeholders and understanding their primary goals assists us in determining the
primary goals of the Website. Such goals could include increasing revenue, cutting expenses, and
increasing customer loyalty [101]. After defining the Website’s goals, it is important to prioritize
them in terms of how the Website can most benefit the company. While seemingly simplistic, the
task can be challenging. Political conflict between stakeholders and their individual goals as well as
inaccurate assumptions they may have made while determining their goals can derail this process.
Organizational leaders may need to be consulted to manage the conflict, but we can assist by keep-
ing the discussion focused on the overarching organizational goals and Website capabilities. By
going through this process, a company can minimize conflict among competing goals and maintain
relationships with various stakeholders.

6.2.3 Identify the Most Important Site Visitors


One group of stakeholders that is critical to Websites is visitors. According to Sterne, corporate
executives categorize their visitors in terms of importance. Most companies classify their most im-
portant visitors as ones who either visit the site regularly, stay the longest on the site, view the most
pages, purchase the most goods or services, purchase goods most frequently, or spend the most
money [143]. There are three types of customers: (1) customers a company wants to keep who
have a high current value and high future potential, (2) customers a company wants to grow who
can either have a high current value and low future potential or low current value and high future
potential, and (3) customers a company wants to eliminate who have a low current value and low
future potential. The most important visitor to a Website, however, is the one who ultimately brings
in the most revenue. Categorizing visitors as customer types enables us to consider their goals more
critically. What improvements can we make to the Website in order to improve visitors’/customers’
browsing experiences and intentionally to grow more visitors and revenue?

6.2.4 Determine the Key Performance Indicators


To assist organizations in determining how to improve their Website, Web analytics offers the stra-
tegic use and monitoring of KPIs. This involves picking the metrics that will be most beneficial in
improving the site and eliminating the ones that will provide little or no insight into its goals. The
Website type—commerce, lead generation, media/content, or support/self-service—plays a key role
in which KPIs are most effective for analyzing site traffic.

6.2.5 Identify and Implement the Right Solution


After the KPIs have been defined, the next step is identifying the best Web analytics technology to
meet the organization's specific needs. The most important things to consider are the budget,
software flexibility, ease of use, and compatibility between the technology and the metrics. McFadden
wisely suggests that a pilot test of the top two vendor choices can ease the decision making [101].
We will expand on this topic in the next section.

6.2.6 Use Multiple Technologies and Methods


Web analytics is not the only method available for improving a Website. To achieve a more holistic
view of a site’s visitors, we can also use tools such as focus groups, online surveys, usability studies,
and customer services contact analysis [101].

6.2.7 Make Improvements Iteratively


While we all may want to address identified Website problems quickly, we should make only grad-
ual improvements. This incremental method allows us to monitor whether a singular change is an
improvement and how much of an improvement. Importantly, it also provides us with the oppor-
tunity to assess the change within the overall Website context and observe what systemic issues the
small change may have created.

6.2.8 Hire and Empower a Full-Time Analyst


Monitoring and assessing a Website’s effectiveness is a complex and compounding process. For
these reasons, it is important to have one person who consistently analyzes the data generated and
determines when, how, and why new information is needed. According to the WAA, a good
analyst understands organizational needs (which means communicating well with stakeholders); has
knowledge of technology and marketing; has respect, credibility, and authority; and is already a
knowledge of technology and marketing; has respect, credibility, and authority; and is already a
company employee. Although hiring a full-time analyst may seem expensive, many experts
agree that the resulting return in revenue should be more than enough compensation [101].

6.2.9 Establish a Process of Continuous Improvement


Once the Web analysis process is established, continuous evaluation is paramount. This means
reviewing the goals and metrics and monitoring new changes and features as they are added to
the Website. It is important that the improvements are adding value to the site and meeting
expectations.

6.3 CHOOSING A WEB ANALYTICS TOOL


Once an organization decides what it wants from Web analytics, it is time to find the right tool. Kaushik
outlines ten important questions to ask Web analytics vendors [79]:

• What is the difference between your tool and free Web analytics tools? Since the com-
pany who owns the Website will be paying money for a service, it is important to know
why that service is better than free services (e.g., Google Analytics). Look for an answer
that outlines the features and functionality of the vendor. Do not look for answers about
increased costs because of privacy threats or poor support offered by free analytics tools.
• Do you offer a software version of your tool? Generally, a business will want to look for
a tool that is software based and that can run on their own servers. If a tool does not have
a software version but plans to make one in the future, it shows insight into how prepared
they are to offer future products if there is interest.
• What methods do you use to capture data? As stated earlier, there are two main ways to
capture visitor data from a Website: log files and page tagging. Ideally, we prefer a vendor
that offers both, but what they have used in the past is also important. Because technol-
ogy is constantly changing, we want a company that has a history of keeping up with and
perhaps even anticipating market changes and that has addressed these dynamics through
creative solutions.
• Can you help me calculate the total cost of ownership for your tool? The total cost of
ownership for a Web analytics tool depends on the specific company, the systems they have
in place, and the pricing of the prospective Web analytics tool. In order to make this calcu-
lation, we must consider the following:
1. Cost per page view
2. Incremental costs (i.e., charges for overuse or advanced features)
3. Annual support costs after the first year
4. Cost of professional services (i.e., installation, troubleshooting, or customization)
5. Cost of additional hardware we may need
6. Administration costs (which includes the cost of an analyst and any additional employ-
ees we may need to hire)
• What kind of support do you offer? Many vendors advertise free support, but it is impor-
tant to be aware of any limits that could incur additional costs. It is also important to note
how extensive their support is and how willing they are to help.

• What features do you provide that will allow me to segment my data? Segmentation al-
lows companies to manipulate their data. Look for the vendor’s ability to segment the data
after it is recorded. Many vendors use JavaScript tags on each page to segment the data as
it is captured, meaning that the company has to know exactly what it wants from the data
before having the data itself; this approach is less flexible.
• What options do I have to export data into our system? It is important to know who ulti-
mately owns and stores the data and whether it is possible to obtain both raw and processed
data. Most vendors will not provide companies with the data exactly as they need it, but it
is a good idea to realize what kind of data is available before making a final decision.
• Which features do you provide for integrating data from other sources into your tool?
Best practice, as noted previously, recommends using multiple technologies and methods in
order to inform decision making. If a company has other data it wants to bring to the tool
(such as survey data or data from an ad agency), then it is important to know whether this
information can be integrated into the vendor’s analytic tool.
• What new features are you developing that would keep you ahead of your competition?
Not only will the answer to this question tell how much the vendor has thought about
future functionality, but it will also show how much they know about their competitors.
If they are trying to anticipate changes and market demands, then they should be well in-
formed about their competition.
• Why did you lose your last two clients? Who are they using now? The benefits of this
question are obvious—by knowing how they lost previous business, the business can be
confident that it has made the right choice.
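The total-cost-of-ownership question above is simple arithmetic once the six components are known; a sketch with entirely hypothetical figures:

```python
def total_cost_of_ownership(page_views, cost_per_page_view, incremental,
                            annual_support, professional_services,
                            hardware, administration):
    """Sum the six cost components listed in the question above.
    All inputs are hypothetical; real figures come from vendor
    quotes and the organization's own budget."""
    return (page_views * cost_per_page_view + incremental + annual_support
            + professional_services + hardware + administration)

# e.g., 10 million page views at $0.0005 each, plus assumed fixed costs
tco = total_cost_of_ownership(10_000_000, 0.0005, 2_000, 5_000,
                              8_000, 3_000, 60_000)
```

Note that administration, which includes the analyst's salary, can dominate the per-page-view charges, which is why it belongs in the calculation.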

6.4 CONCLUSION
With an effective Web analytics strategy in place, we can turn our attention to understanding user
behaviors and identifying necessary or potentially beneficial system improvements. In practice, this
is rarely the end. Web analytics strategy typically supports some overarching goals.

• • • •

CHAPTER 7

Web Analytics as Competitive Intelligence

In order to get the most out of Web analytics, we must first effectively choose which metrics to
analyze and then combine them in meaningful ways [19]. This means knowing the Website's business
goals and then determining which KPIs will provide the most insight for these business goals.

7.1 DETERMINING APPROPRIATE KEY PERFORMANCE INDICATORS
To determine appropriate KPIs, one must know the business goals. Every company has specific
business goals. Every part of the company works together to achieve them, and the company Web-
site is no exception. In order for a Website to be beneficial, information gathered from its visitors
must not merely show what has happened in the past, but it must also be able to improve the site
for future visitors. The company must have clearly defined goals and must use this information to
support strategies that will help it achieve those goals.
For a Website, the first step is making sure the data collected from the site is actionable. Ac-
cording to the WAA [101], in order for a company to collect actionable data, it must meet three
criteria [144]:

• the business goals must be clear,


• technology, analytics, and the business must be aligned, and
• the feedback loop must be complete.

There are many possible methods for meeting these criteria. One is Alignment-Centric Per-
formance Management [14]. This approach goes beyond merely reviewing past customer trends to
carefully selecting a few key KPIs based on future business objectives. Even though a wealth of met-
rics is available from a Website, not all of the metrics are relevant to a company’s needs. Moreover,
reporting large quantities of data is overwhelming, so it is important to look at metrics in context
and use them to create KPIs that focus on outcome rather than activity. For example, a customer
service Website might view the number of emails responded to on the same day they were sent as
a measurement of customer satisfaction. A better way to measure customer satisfaction, however,
might be to survey the customers on their experience. Although this measurement is subjective, it
is a better representation of customer satisfaction because even if a customer receives a response the
same day he or she sent an email, the customer may still be dissatisfied with the service experience
[14].
Following “The Four M’s of Operational Management,” as outlined by Becher [14], can
facilitate effective selection of KPIs:

• Motivate: ensure that goals are relevant to everyone involved.


• Manage: encourage collaboration and involvement for achieving these goals.
• Monitor: once selected, track the KPIs and quickly deal with any problems that may
arise.
• Measure: identify the root causes of problems and test any assumptions associated with
the strategy.

By carefully choosing a few, quality KPIs to monitor and making sure everyone is involved
with the strategy, we can more easily align a Website’s goals with the company’s goals because the
information is targeted and stakeholders are actively participating.
Another method for ensuring actionable data is Online Business Performance Management
(OBPM) [132]. This approach integrates business tools with Web analytics to help companies make
better decisions quickly in an ever-changing online environment where customer data is stored in
a variety of different departments. The first step in this strategy is gathering all customer data in a
central location and condensing it so that the result is all actionable data. Once this information is
in place, the next step is choosing relevant KPIs that align with the company’s business strategy and
then analyzing expected versus actual results [132].
In order to choose the best KPIs and measure the Website’s performance against the goals of
a business, there must be effective communication between senior executives and online managers.
The two groups should work together to define the relevant performance metrics, the overall goals
for the Website, and the performance measurements. This method is similar to Alignment-Centric
Performance Management in that it aims to aid integration of the Website with the company’s
business objectives by involving major stakeholders. The ultimate goals of OBPM are increased
confidence, organizational accountability, and efficiency [132].
Of course, one must identify KPIs based on the Website type. Unlike metrics, which are
numerical representations of data collected from a Website, KPIs are tied to a business strategy and
are usually measured by a ratio of two metrics. By choosing KPIs based on the Website type, a
business can save both time and money. Although Websites can have more than one function, each site
belongs to at least one of the four main categories: commerce, lead generation, content/media, and
support/self-service [101]. Table 7.1 shows common KPIs for each Website type.

TABLE 7.1: The four types of Websites and examples of associated KPIs [101].

Commerce
• Average order value
• Average visit value
• Bounce rate
• Conversion rates
• Customer loyalty

Content/Media
• New visitor ratio
• Page depth
• Returning visitor ratio
• Visit depth

Lead Generation
• Bounce rate
• Conversion rates
• Cost per lead
• Traffic concentration

Support/Self-service
• Bounce rate
• Customer satisfaction
• Page depth
• Top internal search phrases
We discuss each Website type and related KPIs below.

7.1.1 Commerce
The goal of a commerce Website is to get visitors to purchase goods or services directly from the
site, with success gauged by the amount of revenue the site brings in. According to Peterson, “com-
merce analysis tools should provide the ‘who, what, when, where, and how’ for your online purchas-
ers” [118, p. 92]. In essence, the important information for a commerce Website is to answer the
following questions: Who made (or failed to make) a purchase? What was purchased? When were
purchases made? From where are customers coming? How are customers making their purchases?
The most valuable KPIs used to answer these questions are conversion rates, average order value,
average visit value, customer loyalty, and bounce rate [101]. Other metrics to consider with a com-
merce site are which products, categories, and brands are sold on the site and an internal site product
search that could signal navigation confusion or a new product niche [118].
A conversion rate is the number of users who perform a specified action divided by the total
number of a certain type of visitor (i.e., repeat visitors, unique visitors, etc.) over a given period. Types of
conversion rates will vary by the needs of the businesses using them, but two common conversion
rates for commerce Websites are the order conversion rate (the percent of total visitors who place
an order on a Website) and the checkout conversion rate (the percent of total visitors who begin
the checkout process). There are also many methods for choosing the group of visitors on which to
base the conversion rate. For example, businesses may want to filter visitors by excluding visits from
robots and Web crawlers [5], or they may want to exclude the traffic that “bounces” from the Web-
site or (a slightly trickier measurement) the traffic that is determined not to have intent to purchase
anything from the Website.
Commerce Websites commonly have conversion rates of around 0.5%, but generally good
conversion rates will fall in the 2% range, depending on how a business structures its conversion rate
[41]. Again, the ultimate goal is to increase total revenue. According to eVision, a search engine
marketing company, each dollar a company spends on improving this KPI returns 10 to 100 times
that amount [39]. The methods a business uses to improve the conversion rate (or rates), however,
are different depending on which target action that business chooses to measure.
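The two conversion rates named above are straightforward ratios; a minimal sketch with hypothetical monthly figures:

```python
def conversion_rate(actions, visitors):
    """Percent of a chosen visitor population that performed the action."""
    return 100.0 * actions / visitors if visitors else 0.0

# Hypothetical month: 50,000 filtered visits (robots and crawlers
# excluded), 1,100 begun checkouts, 600 completed orders.
order_rate = conversion_rate(600, 50_000)       # order conversion rate
checkout_rate = conversion_rate(1_100, 50_000)  # checkout conversion rate
```

The choice of denominator (all visits, unique visitors, or filtered visits) is exactly the structuring decision the text describes, so the same function serves every variant.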
Average order value is a ratio of total order revenue to number of orders over a given period.
This number is important because it allows the analyst to derive a cost for each transaction. There
are several ways for a business to use this KPI to its advantage. One way is to break down the
average order value by advertising campaigns (i.e., email, keyword, banner ad, etc.). In this way, a
business can see which campaigns are bringing in the best customers and then opt to spend more
effort refining strategies in those areas [119]. Overall, however, if the cost of making a transaction is
greater than the amount of money customers spend for each transaction, then the site is not fulfill-
ing its goal. There are two main ways to correct this. The first is to increase the number of products
customers order per transaction, and the second is to increase the overall cost of purchased products.
A good technique for achieving either of these goals is product promotions [101], but many factors
influence how and why customers purchase what they do on a Website. These factors are diverse
and can range from displaying a certain security image on the site [97] to updating the site’s internal
search [161]. Like many KPIs, improvement ultimately comes from ongoing research and a small
amount of trial and error.
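Average order value overall and broken down by campaign can be sketched as follows; the `(campaign, revenue)` record layout is an assumption for illustration:

```python
from collections import defaultdict

def average_order_value(orders):
    """Average order value overall and per advertising campaign.
    `orders` is a list of (campaign, revenue) pairs -- an assumed
    record layout for illustration."""
    totals = defaultdict(lambda: [0.0, 0])
    for campaign, revenue in orders:
        totals[campaign][0] += revenue
        totals[campaign][1] += 1
    per_campaign = {c: rev / n for c, (rev, n) in totals.items()}
    overall = sum(revenue for _, revenue in orders) / len(orders)
    return overall, per_campaign

orders = [("email", 40.0), ("email", 60.0), ("banner", 20.0)]
overall_aov, aov_by_campaign = average_order_value(orders)
```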

Another KPI, average visit value, is the ratio of total revenue to the total number of visits and
essentially informs businesses about traffic quality. It is problematic for a commerce site when,
even though it may have many visitors, each visit generates only a small amount of revenue. In that
case, increasing the total number of visits would likely increase profits only marginally. The average
visit value KPI is also useful for evaluating the effectiveness of promotional campaigns. If the aver-
age visit value decreases after a specific campaign, it is likely that the advertisement is not attracting
quality traffic to the site. Another less common factor in this situation could be broken links or a
confusing layout in a site’s “shopping cart” area. A business can improve the average visit value by
using targeted advertising and employing a layout that reduces customer confusion.
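As a sketch, average visit value can be computed per campaign to spot traffic that generates little revenue (the campaign names and figures are hypothetical):

```python
# Hypothetical per-campaign aggregates: (total visits, total revenue)
campaigns = {
    "spring_promo": (4_000, 6_000.00),
    "organic_search": (1_000, 6_500.00),
}

# Average visit value: total revenue / total visits
avg_visit_value = {
    name: revenue / visits
    for name, (visits, revenue) in campaigns.items()
}

# Visits that generate little revenue suggest low-quality traffic
low_quality = [name for name, value in avg_visit_value.items() if value < 2.00]
```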
One way to assess customer quality is to identify customer loyalty. This KPI is the ratio of
new to existing customers. Many Web analytics tools measure this using visit frequency and transac-
tions, but there are several important factors in this measurement including the time between visits
[100]. Customer loyalty can even be measured simply with customer satisfaction surveys [133].
Loyal customers will not only increase revenue through purchases but also through referrals, poten-
tially limiting advertising costs [123].
Yet another KPI that relates to customer quality is bounce rate. Essentially, bounce rate
measures how many people arrive at a homepage and leave immediately. Two scenarios gener-
ally qualify as a bounce. In the first scenario, a visitor views only one page on the Website. In the
second scenario, a visitor navigates to a Website but only stays on the site for 5 seconds or less [6].
This could be due to several factors, but in general visitors who bounce from a Website are not
interested in the content. Like average order value, this KPI helps show how much quality traf-
fic a Website receives. A high bounce rate may reflect counterintuitive site design or misdirected
advertising.
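The two bounce scenarios can be expressed directly in code; this sketch uses hypothetical visit records of (pages viewed, seconds on site):

```python
# Hypothetical visit records: (pages_viewed, duration_seconds)
visits = [
    (1, 30),   # viewed only one page      -> bounce (scenario 1)
    (3, 4),    # stayed 5 seconds or less  -> bounce (scenario 2)
    (5, 120),  # engaged visit             -> not a bounce
    (1, 3),    # one page, three seconds   -> bounce
]

def is_bounce(pages_viewed: int, duration_seconds: int) -> bool:
    """A visit bounces if it views a single page or lasts 5 seconds or less."""
    return pages_viewed == 1 or duration_seconds <= 5

bounce_rate = sum(is_bounce(p, d) for p, d in visits) / len(visits)
```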

7.1.2 Lead Generation


The goal for a lead generation Website is to obtain users' contact information in order to inform them
of a company’s new products and developments and to gather data for market research; these sites
primarily focus on products or services that cannot be purchased directly online. Examples of lead
generation include requesting more information by mail or email, applying online, signing up for a
newsletter, registering to download product information, and gathering referrals for a partner site
[23]. The most important KPIs for lead generation sites are conversion rates, cost per lead (CPL),
bounce rate, and traffic concentration [101].
Similar to commerce Website KPIs, a conversion rate is the ratio of total visitors to the
number of visitors who perform a specific action. In the case of lead generation Websites, the most
46 UNDERSTANDING USER–WEB INTERACTIONS VIA WEB ANALYTICS

common conversion rate is the ratio of total visitors to leads generated. The same visitor filtering
techniques mentioned in the previous section can be applied to this measurement (i.e., filtering
out robots and Web crawlers and excluding traffic that bounces from the site). This KPI is an
essential tool in analyzing marketing strategies. Typical lead generation sites have conversion rates
of 5–6%, while exceptionally good sites reach 17–19% [46]. Conversion rates that increase
after the implementation of a new marketing strategy indicate that the campaign was successful.
Decreases in conversion rates indicate that the campaign was not effective and probably needs to be
reworked.
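A minimal sketch of a lead generation conversion rate with the visitor filtering described above (the visitor records and field names are hypothetical):

```python
# Hypothetical visitor records
visitors = [
    {"is_robot": False, "bounced": False, "became_lead": True},
    {"is_robot": False, "bounced": False, "became_lead": False},
    {"is_robot": False, "bounced": True,  "became_lead": False},
    {"is_robot": True,  "bounced": False, "became_lead": False},
]

# Filter out robots/crawlers and traffic that bounces from the site
qualified = [v for v in visitors if not v["is_robot"] and not v["bounced"]]

# Conversion rate: leads generated / qualified visitors
leads = sum(1 for v in qualified if v["became_lead"])
conversion_rate = leads / len(qualified)
```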
Another way to measure marketing success is CPL, which is the ratio of total expenses to total
number of leads or how much it costs a company to generate a lead; a more targeted measurement
of this KPI would be the ratio of total marketing expenses to total number of leads. A good way to
measure the success of this KPI is to make sure that the CPL for a specific marketing campaign is
less than the overall CPL [155]. Ideally, the CPL should be low, and well-targeted advertising is
usually the best way to achieve this.
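Comparing a specific campaign's CPL against the overall CPL can be sketched as follows (all figures are hypothetical):

```python
# Overall cost per lead: total expenses / total leads
overall_expenses = 10_000.00
overall_leads = 500
overall_cpl = overall_expenses / overall_leads

# A specific campaign is doing well if its CPL is below the overall CPL
campaign_expenses = 1_200.00
campaign_leads = 80
campaign_cpl = campaign_expenses / campaign_leads

campaign_is_effective = campaign_cpl < overall_cpl
```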
Lead generation bounce rate is the same measurement as the bounce rate for commerce sites.
This KPI measures visitor retention based on total number of bounces to total number of visitors
(a bounce is characterized by a visitor entering the site and immediately leaving). Lead generation
sites differ from commerce sites in that they may not require the same level of user interaction. For
example, a lead generation site could have a single page where users enter their contact information.
Even though they only view one page, the visit is still successful if the Website is able to collect the
user’s information. In these situations, it is best to base the bounce rate solely on time spent on the
site. As with commerce sites, the best way to decrease a site’s bounce rate is to increase advertising
effectiveness and decrease visitor confusion.
The final KPI is traffic concentration, or the ratio of the number of visitors to a certain area
in a Website to total visitors. This KPI shows which areas of a site have the most visitor interest. For
lead generation Websites, it is ideal to have a high traffic concentration on the page or pages where
users enter their contact information.
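Traffic concentration per site area might be computed like this (the area names and counts are hypothetical):

```python
# Hypothetical unique visitors per site area
visitors_by_area = {
    "home": 900,
    "contact_form": 450,
    "blog": 300,
}
total_visitors = 1_000  # unique visitors to the whole site

# Traffic concentration: visitors to an area / total visitors
traffic_concentration = {
    area: count / total_visitors
    for area, count in visitors_by_area.items()
}
```

For a lead generation site, a healthy sign is a high share on the contact page (here, 0.45).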

7.1.3 Content/Media
Content/media Websites focus mainly on advertising, and the main goal of these sites is to increase
revenue by keeping visitors on the Website longer and to keep visitors coming back to the site. In
order for these types of sites to succeed, site content must be engaging and frequently updated. If
content is only part of a company’s Website, the content used in conjunction with other types of
pages can be used to draw in visitors and provide a way to immerse them in the site. The main KPIs
are visit depth, returning visitors, new visitor percentage, and page depth [101].
Visit depth (also referred to as depth of visit or path length) is the measurement of the ratio
between page views and unique visitors, or how many pages a visitor accesses each visit. As a general
rule, visitors with a higher visit depth interact more with the Website. If visitors are only viewing
a few pages per visit, then they are not engaged, indicating that the site’s effectiveness is low. One
way to increase a low average visit depth is by creating more targeted content that would be more
interesting to the Website’s target audience. Another strategy could be increasing the site’s interac-
tivity to encourage the users to become more involved with the site and to motivate them to return
frequently.
Unlike the metric of simply counting the number of returning visitors on a site, the
returning visitor KPI is the ratio of unique visitors to total visits. A factor in customer loyalty, this KPI
measures the effectiveness of a Website at attracting repeat visitors. A lower ratio for this KPI is best
because it indicates more repeat visitors and more visitors who are interested in and trust the con-
tent of the Website. If this KPI is too low, however, it might signal problems in other areas such as
a high bounce rate or even click fraud. Click fraud occurs when a person or script is used to gener-
ate visits to a Website without having genuine interest in the site. According to a study by Blizzard
Internet Marketing, the average for returning visitors to a Website is 23.7% [157]. As with many of
the other KPIs for content/media Websites, the best way to improve the returning visitor rate is by
having quality content and encouraging interaction with the Website.
Content/media sites are also interested in attracting new visitors, and the new visitor ratio
compares new visitors with unique visitors to determine if a site is attracting new people. New visi-
tors can be brought to the Website in a variety of different ways, so a good way to increase this KPI
is to try different marketing strategies to determine which campaigns bring the most (and the best)
traffic to the site. When using this KPI, we must keep the Website’s goal in mind. Specifically, is
the Website intended more to retain or to attract customers? When measuring this KPI, the age
of the Website plays a role—newer sites will want to attract new people. As a rule, however, the
new visitor ratio should decrease over time as the returning visitor ratio increases. The final KPI for
content/media sites is page depth. This is the ratio of page views for a specific page and the number
of unique visitors to that page. This KPI is similar to visit depth, but its measurements focus more
on page popularity. Average page depth can indicate interest in specific areas of a Website over time
and measure whether the interests of the visitors match the goals of the Website. If one particular
page on a Website has a high page depth, then that page is of particular interest to visitors. An ex-
ample of a page in a Website expected to have a higher page depth would be a news page. Informa-
tion on a news page is updated constantly so that, while the page is still always in the same location,
the content of that page is constantly changing. If a Website has high page depth in a relatively
unimportant part of the site, it may signal visitor confusion with navigation in the site or an incor-
rectly targeted advertising campaign.
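The four content/media KPIs discussed above reduce to simple ratios; here is a sketch with hypothetical aggregates:

```python
# Hypothetical site-wide aggregates for one reporting period
page_views = 50_000
visits = 10_000
unique_visitors = 4_000
new_visitors = 1_000

visit_depth = page_views / visits                    # pages viewed per visit
returning_visitor_ratio = unique_visitors / visits   # lower => more repeat visits
new_visitor_ratio = new_visitors / unique_visitors

# Page depth for one page: views of that page / unique visitors to that page
news_page_views, news_page_visitors = 9_000, 1_500
news_page_depth = news_page_views / news_page_visitors
```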

7.1.4 Support/Self-Service
Websites offering support or self-service are interested in helping users find specialized answers
for specific problems. The goals for this type of Website are increasing customer satisfaction and
decreasing call center costs; it is more cost-effective for a company to have visitors find information
through its Website than it is to operate a call center. The KPIs of interest are page depth and
bounce rate. In addition, other areas to examine are customer satisfaction metrics and
top internal search phrases [101].
Page depth for support/self-service sites is the same measurement as page depth for content/media
sites, namely the ratio of page views to unique visitors. With support/self-service sites, however,
high page depth is not always a good sign. For example, a visitor viewing the same page multiple
times may show that the visitor is having trouble finding helpful information on the Website or even
that the information the visitor is looking for does not exist on the site. The goal of these types of
sites is to help customers find what they need as quickly as possible and with the least amount of
navigation through the site. The best way to keep page depth low is to keep visitor confusion low.
As with the bounce rate of other Website types, the bounce rate for support/self-service sites
reflects ease of use, advertising effectiveness, and visitor interest. A low bounce rate means that
quality traffic is coming to the Website and deciding that the site’s information is potentially useful.
Poor advertisement campaigns and poor Website layout will increase a site’s bounce rate.
Customer satisfaction deals with how users rate their experience on a site and is usually
collected directly from the visitors (not from log files), either through online surveys or through
satisfaction ratings. Although it is not a KPI in the traditional sense, gathering data directly from
visitors to a Website is a valuable tool for figuring out exactly what visitors want. Customer satis-
faction measurements can deal with customer ratings, concern reports, corrective actions, response
time, and product delivery. Using these numbers, we can compare the online experience of the
Website’s customers with the industry’s average and make improvements according to visitors’ ex-
pressed needs.
Site navigation is important to visitors, and top internal search phrases, which apply only to
sites with internal search capabilities, measure what information customers are most
interested in and can inform site navigation improvements. Moreover, internal search phrases can
be used to direct support resources to the areas generating the most user interest, as well as to iden-
tify which parts of the Website users may have trouble accessing. Other problems may also become
obvious. For example, if many visitors are searching for a product not supported on the Website,
then this may indicate that the site’s marketing campaign is ineffective.
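Extracting top internal search phrases from a search log is a simple frequency count; this sketch assumes the phrases have already been parsed out of the log:

```python
from collections import Counter

# Hypothetical internal search log, one phrase per search
searches = [
    "reset password", "reset password", "billing dispute",
    "reset password", "billing dispute", "cancel account",
]

# The most frequent phrases show where support content is most needed
top_phrases = Counter(searches).most_common(2)
```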
Regardless of Website type, the KPIs listed above are not the only KPIs that can prove useful
in analyzing a site’s traffic, but they provide a good starting point. The main thing to remember is
that no matter what KPIs a company chooses to use, they must be aligned with its goals, and more
KPIs do not necessarily mean better analysis: quality is more important than quantity.

7.2 CONCLUSION
Any organization, business, or Website must start with clearly defined goals because they are essen-
tial for any successful strategy. With a clearly defined and understood strategy, we can then plan and
implement the tactics necessary for executing this strategy. These tactics are based on KPIs—which
are the measures and metrics of performance. As such, KPIs are the foundation for any Web goal
achievement.

• • • •

CHAPTER 8

Supplementary Methods for Augmenting Web Analytics

While the relatively unobtrusive methods of data collection that we have discussed thus far are very
valuable, proponents of using transaction logs for Web analysis typically admit that the method has
shortcomings [66, 91], as do all methodological approaches. These shortcomings include failing to
understand the affective, situational, and cognitive aspects of system users. Therefore, we must look
to other methods in order to address some of these shortcomings and limitations [124]. Fortunately,
the Web and other information technologies provide a convenient means for employing surveys and
survey research for such a purpose.
As an overview, we discuss surveys and laboratory studies as viable alternative methods for
Web log analysis, and then present a brief review of survey and laboratory research literature, with a
focus on the use of surveys and laboratory studies for Web-related research. The section then identi-
fies the steps in implementing survey research and designing a survey instrument and a laboratory
study.

8.1 SURVEYS
Survey research is a method for gathering information by directly asking respondents about some
aspect of themselves, others, objects, or their environment. As a data collection method, survey in-
struments are very useful for a variety of research designs. For example, researchers can use surveys
to describe current characteristics of a sample population and to discover the relationship among
variables. Surveys gather data on respondents’ recollections or opinions; therefore, surveys provide
an excellent companion method for Web analytics that typically focus exclusively on actual behav-
iors of participants [125].
After reviewing some studies that have used surveys for Web research, we will discuss how to
select, design, and implement survey research.

8.1.1 Review of Appropriate Survey Literature


Although surveys have been used for hundreds of years, the Web provides a remarkable channel for
the use of surveys to conduct data collection [75]. Many of these Internet surveys have focused on
demographic aspects of Web use over time [83] or one particular Website feature [150].
Treiblmaier [148] presents an extensive review of the use of surveys for Website analysis.
Survey respondents may include general Web users or samples from specific populations. For
example, Huang [61] surveyed users of continuing education programs. Similarly, Jeong et al. [76]
surveyed travel and hotel shoppers, and Kim and Stoel [86] surveyed female shoppers who have
purchased apparel online.
For academic researchers, a convenience sample of students is often used to facilitate survey
studies, including the users of Web search engines [139]. McKinney et al. [103] used both under-
graduate and graduate students as their sample examining Website use. The major advantages of
using students that are often cited include a homogeneous sample, access [62], familiarity with the
Internet [67], and creation of experimental settings [130]. There are concerns in generalizing these
results [1], most notably for Websites and services where students have limited domain or system
knowledge [86, 89]. However, as a demographic slice of the Web population, students
appear to be a workable convenience sample, with results from studies with students [cf. Refs. 67,
84] similar to those using more rigorous sampling methods [cf. Refs. 51, 83]. Organizations such
as the Pew Research Center’s Internet and American Life Project use random samples of the U.S.
Web population for their surveys [125].
For the Web, the most common types of survey instruments are electronic or Web surveys.
Jansen et al. [75] define an electronic survey as “one in which a computer plays a major role in both
the delivery of a survey to potential respondents and the collection of survey data from actual re-
spondents” (p. 1). Several researchers have examined electronic survey approaches, techniques, and
instruments with respect to methodological issues associated with their use [33, 35, 40, 43, 90, 145].
There have been mixed research results concerning the benefits of electronic surveys [85, 104, 142,
149]. However, researchers generally agree that electronic surveys offer faster response times and
decreased costs. The electronic and Web-based surveys allow for a nearly instantaneous data collec-
tion into a backend database, which reduces potential errors caused by manual transcription.
Regardless of which delivery method is used, survey research requires a detailed project plan-
ning approach.

8.1.2 Planning and Conducting a Survey


Although conducting a survey may appear to be an easy task, the reality is quite the opposite. Suc-
cessful survey research requires detailed planning. The goal of any survey is to shed insight into
how the respondents perceive themselves, their environment, their context, their situation, their
behaviors, or their perceptions of others.
SUPPLEMENTARY METHODS FOR AUGMENTING WEB ANALYTICS 53

To execute a survey, the researcher must identify the content area, construct the survey in-
strument, define the population, select a representative sample, administer the survey instrument,
analyze and interpret the results, and then communicate the results. While these steps are some-
what linear, they also overlap and may require several iterations. A 10-step survey research process
is illustrated in Table 8.1, based on a process outlined in Graziano and Raulin [45].
Steps 1 and 2: Determine the specific information desired and define the population to
be studied. The information being sought and the population to be studied are the first tasks of
the survey researcher. The goals of the survey research will determine both the information being
sought and the target population. Additionally, the goals will drive both the construction and ad-
ministration of the survey. If we use a survey to supplement ongoing Web log analysis, then these
decisions will follow the established parameters.
Step 3: Decide how to administer the survey. There are many possibilities for administering
a survey, ranging from face-to-face (i.e., an interview), to pen and paper, to the telephone (i.e., phone
survey), to the Web (i.e., electronic survey). A survey can also be a mixed mode survey, combining
more than one of these approaches.

TABLE 8.1: Process for conducting and implementing a survey [125].

STEP  ACTION
  1   Determine the specific information desired
  2   Define the population to be studied
  3   Decide how to administer the survey
  4   Design a survey instrument
  5   Pretest the survey instrument with a subsample
  6   Select a sampling approach and representative sample
  7   Administer the survey instrument to the sample
  8   Analyze the data
  9   Interpret the findings
 10   Communicate the results to the appropriate audience

The exact method selected really depends on the answers to steps
one and two (i.e., what information is needed and what population is studied). Used in conjunction
with Web analytics, surveys can be conducted either before or after a laboratory study. A survey can
also be used to gain insight into the demographics of the wider Web population.
Step 4: Designing a survey instrument. Developing a survey instrument involves several
steps. The researcher must determine what questions to ask, in what form, and in what order. The
researcher must construct the survey so that it adequately gathers the information being sought. A
basic rule of survey research is that the instrument should have a clear focus and should be guided
by the research questions or hypotheses of the overall study. This implies that survey research is not
well suited to early exploratory research because it requires some orderly expectations and focus
from the researcher.
Step 5: Pretest the survey instrument with a subsample. Once the researcher has the survey
instrument ready and refined, the researcher must pilot test the survey instrument. In this respect,
a survey instrument is like developing a system artifact, where a system is beta-tested before wider
deployment. Generally, one conducts the pilot test on a sample that represents the population being
studied, after which the researcher may (generally, will) refine the survey instrument further. De-
pending on the extent of the changes, the survey instrument may require another pilot test.
Step 6: Select a sampling approach and representative sample. Selecting an adequate and
representative sample is a critical and challenging factor in survey research. The population for sur-
vey study is the larger group about or from whom the researcher desires to obtain information. From
this population, we need to survey a representative sample. If we are administering a survey to the
respondents of a laboratory study, the representativeness is not a problem because the respondents
are the representative sample. Selecting a representative sample of Web users, however, requires
careful planning.
Whenever we use a sample as a basis for generalizing to a population, we engage in an induc-
tive inference from the specific sample to the general population. In order to have confidence in
inductive inferences from sample to population, the researcher must carefully choose the sample to
represent the overall population. This is especially true for descriptive research, where the researcher
wishes to describe some aspect of a population that may depend on demographic characteristics. In
other cases, such as verifying the application of universal theoretical constructs, for example, Zipf ’s
Law [164], sampling is not as important since these universal constructs should apply to everyone
within the population.
Sampling procedures typically fall into three classifications:

• Convenience sampling (i.e., selecting a sample with little concern for its representative-
ness to some overall population),
• Probability sampling (i.e., selecting a sample where each respondent has some known
probability of being included in the sample), and
• Stratified sampling (i.e., selecting a sample that includes representative samples of each
subgroup within a population).
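The three sampling procedures can be illustrated with a small Python sketch (the population and group labels are hypothetical):

```python
import random

# Hypothetical population of 100 respondents in two subgroups
population = [
    {"id": i, "group": "student" if i % 2 else "professional"}
    for i in range(100)
]
rng = random.Random(42)  # seeded for reproducibility

# Convenience sampling: take whoever is easiest to reach (here, the first 10)
convenience_sample = population[:10]

# Probability sampling: every member has a known, equal chance of selection
probability_sample = rng.sample(population, 10)

# Stratified sampling: sample each subgroup in proportion to its size
students = [p for p in population if p["group"] == "student"]
professionals = [p for p in population if p["group"] == "professional"]
stratified_sample = rng.sample(students, 5) + rng.sample(professionals, 5)
```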

Step 7: Administer the survey instrument to the sample. For actually gathering the survey
data, the researcher must determine the most appropriate manner to administer the survey instru-
ment. Many surveys are administered via the Web or electronically, as the Web offers substantial
benefits in its easy access to a wide population sample. Additionally, administering a survey elec-
tronically, even in a laboratory study, has significant advantages in terms of data preparation for
analysis. The survey can be administered once to a cross-sectional portion of the population, or it
can be administered repeatedly to the same sample population.
Step 8: Analyze the data. Once the data is gathered, we must determine the appropriate
method for analysis. The appropriate form of analysis is dependent on the research questions, hy-
potheses, or types of questions used in the survey instruments. The available approaches are qualita-
tive, quantitative, and mixed methods.
Step 9: Interpret the findings. Like many research results, the interpretation of survey data
can be somewhat subjective. When results are in question, it may point to the need for further re-
search. One of the best aids in interpreting results is the literature review. What have results from
previous work pointed out? Are the current results in line with that previous research? Or do the
results highlight something new?
Step 10: Communicate the results to the appropriate audience. Finally, the results of any
survey research must be packaged for the intended audience. For academic purposes, this may mean
a scholarly paper or presentation. For commercial organizations, this may mean a white paper for
system developers or marketing professionals.
Each of these steps can be challenging. However, designing a survey instrument (e.g., Steps 4
and 5) can be the most difficult aspect of the survey research. We address this development in more
detail in the following section.

8.1.3 Design a Survey Instrument


Before designing a survey instrument, the researcher must have a clear understanding of the type
of data desired and must keep the instrument focused on that area. The key to obtaining good data
via a survey is to develop a good survey instrument that is based on the research questions. The re-
searcher should develop a set of objectives with a clear list of all needed data. The research goals and
list of needed data will serve as the basis for the questions on the survey instrument.

What is your gender?
a. Male
b. Female

Which features of Instant Messaging programs do you find most useful when it comes to
sharing information with teammates?
a. Real-Time Chat
b. File Sharing
c. Chat logs
d. None

FIGURE 8.1: Examples of multiple-choice questions.

A survey instrument is a data collection method that presents a set of questions to a re-
spondent. The respondent’s responses to the questions provide the data sought by the researcher.
Although seemingly simple, it can be very difficult to develop a set of questions for a survey instru-
ment. Some general guidelines for developing survey instruments are [113]:

• State on the survey instrument the research goal: At the top of the survey instrument, in-
clude a very brief statement explaining the purpose of the survey and assuring respondents
of their anonymity.
• Provide instructions for completing the survey instrument: To assist in ensuring that
survey results are valid, include instructions on how to respond to questions on the survey
instrument. Generally, there is a short introductory set of instructions at the top of the
survey instrument. Provide additional instructions for specific questions if needed.
• Place questions concerning personal information at the end of the survey: Demographic
information is often necessary for survey research. Place these questions at the end of the
survey. Providing personal data may annoy some respondents, resulting in incomplete or
inaccurate responses to the survey instrument.
• Group questions on the instrument by subject: If the survey instrument has more than 10
or so questions, the questions need to be grouped by some classification method. Generally,
grouping the questions by subject is a good organization method. If the instrument has
multiple groups of questions, each group should have a heading identifying the grouping.
Grouping questions allows the respondents to focus their responses around the central
theme of the group of questions.

On a scale of 1–7, would you search individually or together with your workmates if you do not
know anything about the problem?
Individual Collaborate
1 2 3 4 5 6 7
* * * * * * *

FIGURE 8.2: Example of a rating question.

On a scale of 1–5 (1 — never used, 5 — use every day), rank the following items by how
experienced you are with using these communication/collaboration applications for group
projects:
a. _____ Email
b. _____ Instant messaging
c. _____ Face-to-face meetings
d. _____ Telephone
e. _____ Others (please elaborate)

FIGURE 8.3: Example of a ranking question.
• Present each question and type of question in a consistent structure: A consistent struc-
ture makes it much simpler for respondents and increases the likelihood of valid data.
Explain the proper method for responding to each question and ensure that the response
methods for similar questions are consistent throughout the instrument.

There are three general categories of survey questions, namely multiple-choice, Likert-scale,
and open-ended questions.
Multiple-choice questions. Multiple-choice questions have a closed set of response items
for the respondents to select. Multiple-choice questions are useful when we have a thorough under-
standing of the range of possible responses (see Figure 8.1).
The items for multiple-choice questions must cover all possible alternatives that the
respondents might select, and each of the items must be unique (i.e., they must not overlap). Since
presenting all possible alternatives is a difficult task, we normally include a general catch-all item
(e.g., None of the above or Don't know) at the end of a list of item choices. This approach helps
improve the accuracy of the data collected.

As part of your project, I believe that you must have confronted a situation when you did not
really know how to proceed in order to solve a problem or perform a task on the Web.
(a) Can you speak about a specific instance of your project work in which you were
uncertain as to how to proceed?

FIGURE 8.4: Example of an open-ended question.

Which features of Instant Messaging programs do you find most useful when it comes to
sharing information with teammates?
a. Real-Time Chat
b. File Sharing
c. Chat logs
d. Others
e. None

FIGURE 8.5: Example of a partially structured question.
Likert-scale questions. With Likert-scale questions, the items are arranged as a continuum
with the extremes generally at the endpoints. Likert-scale questions may have respondents indicate
the degree to which they agree with a statement (see Figure 8.2) or rank a list of items (see Figure
8.3).
Open-ended questions. As Figure 8.4 demonstrates, open-ended questions have no list of
items for the respondent to choose from.
Open-ended questions are best for exploring new ideas or when the researcher does not know
any of the expected responses. As such, the open-ended questions are great for qualitative research.
The disadvantages of using open-ended questions are that the data can be much more time consuming and
difficult to analyze because each response must be coded in order to derive variables.
If we have a partial list of possible responses, we can create a partially open-ended question
(see Figure 8.5).

8.2 LABORATORY STUDIES


A laboratory study is research conducted in a laboratory setting to investigate questions that cannot
be examined in a naturalistic setting. While laboratory studies can generate useful qualitative insights, they
typically focus on quantitative research based on hypotheses because the controlled setting allows us
to manage external variables that may otherwise influence the results. This is the power of labora-
tory studies relative to naturalistic studies, surveys, or Web analytics.
The major strength of a laboratory study lies in the controlled investigation of dependent
variables. In a laboratory study, we can design a setting to make changes to one or more independent
variables in order to investigate the effect on a dependent variable, while controlling for all the other
variables (i.e., control variables). In such a setting, only the variable of interest affects the outcome.
There are multiple ways of designing a laboratory study. As such, laboratory studies can be
very nuanced, and a review of laboratory studies is a lecture in itself. For more detailed examinations
of laboratory studies and experiments, I refer the interested reader to the Consolidated Standards of
Reporting Trials (CONSORT) (www.consort-statement.org), which assists in the design of laboratory
experiments, specifically randomized controlled trials. CONSORT provides a 22-item checklist
and a flow diagram for conducting such studies. The checklist items focus on the study’s design,
analysis, and interpretation. The flow diagram illustrates the progress of participants through a
laboratory study. Together, these tools aid in understanding the design and running of the study, the
analysis of the collected data, and the interpretation of the results.
The Common Industry Format (CIF) is an American National Standards Institute (ANSI)
approved standard for reporting the results of usability studies. The National Institute of Standards
and Technology (NIST) developed this standard to assist in designing and reporting the results of
usability studies targeted specifically for Websites.
One good way to learn about laboratory studies is to read what others have done. What are
some questions that one should ask when designing (or assessing) a laboratory study? One
methodology for assessing laboratory studies is the Centre for Allied Health Evidence (CAHE) Critical
Appraisal Tools (CATs) (http://www.unisa.edu.au/cahe/CAHECATS/). The aim of the approach
is to identify possible methodological flaws in the design phase or in the reporting. With the use
of such a questionnaire, we can design better experiments and make informed decisions about the
quality of research evidence. The assessments presented below are based on the CriSTAL Checklist
for appraising a user study (http://www.shef.ac.uk/scharr/eblib/use.htm).
Does the study address a clearly focused issue? Essentially, the research aims should drive
the study. The issue can deal with the population (user group) studied, the intervention (service or
facility) provided, or the system. The laboratory study design must clearly identify the issue in a us-
able manner and explain how the outcomes (quantifiable or qualitative) are measured.
Is a good case made for the approach that the authors have taken? In designing a user study,
researchers can choose from an extensive array of methods. One way to assess a study, then, is to re-
view the selection of methodology (e.g., regression, ANOVA Analysis Of Variance, factor analysis,
etc.) and design setup (e.g., within or between groups). The method and setup should relate directly
to the research questions or objectives, which are tied to the research aim. A good study will clearly
identify the problem and provide justification for the questions or objectives. The methodology
must be appropriate to the research questions or objectives.
Were the methods used in selecting the users appropriate and clearly described? There are
several aspects of recruiting participants for any laboratory study, including:
60 UNDERSTANDING USER–WEB INTERACTIONS VIA WEB ANALYTICS

• Type of sample: This addresses how the participants are recruited. Most studies are conve-
nience samples (i.e., you use who you can get). In academia, these are usually students, and
the participants self-select into the study. Randomly selected samples are generally preferred;
however, many times a convenience sample is not a critical shortcoming if the population
demographics are not essential to the research questions.
• Size of sample: Sample size is an important aspect of user studies and it must be managed
carefully. A sample size calculation (i.e., how many participants do you need to represent
the population you are studying) can determine the appropriate size needed to make the
sample representative of the population (i.e., does the sample represent targeted users).
Representativeness matters for quantitative analysis. For usability studies, this is not so
important. Generally, the demographics of the sample (e.g., age, sex, staff grade, location)
must accurately reflect the demographics of the total population. Any motivation for the
participants (i.e., money, course credit, etc.) must be acknowledged.
Was the data collection instrument/method reliable? Any questionnaire, survey form,
or interview schedule should be pilot tested before its use in the laboratory study. When
adapting an instrument used in previous research, the case must be made for its appropriate
use.
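The sample-size calculation mentioned above can be sketched with Cochran's formula for estimating a proportion. This formula is standard statistics rather than a method from this text, and the confidence level, margin of error, and population size below are illustrative values.

```python
import math

def sample_size(confidence_z=1.96, margin_of_error=0.05,
                proportion=0.5, population=None):
    """Cochran's sample-size formula for a proportion,
    with an optional finite-population correction."""
    n0 = (confidence_z ** 2) * proportion * (1 - proportion) / margin_of_error ** 2
    if population is not None:
        # Finite-population correction shrinks n when the population is small.
        n0 = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n0)

# 95% confidence (z = 1.96), +/-5% margin, maximally conservative p = 0.5.
print(sample_size())                 # 385 participants
# The same study drawn from a user population of 2,000.
print(sample_size(population=2000))  # 323 participants
```

Such a calculation tells us only how many participants make the sample representative; as the text notes, the demographics of those participants must still reflect the target population.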

What was the response rate and how representative were respondents of the population
under study? Whether using a convenience or random sample, researchers must ensure that no
subgroups were either over-represented or under-represented. When using convenience samples for
Web laboratory studies, sex representation is often an issue.
Are the results complete and have they been analyzed in an easily interpretable way?
Just as there are several choices in methodologies, there are also several choices concerning meth-
ods of analysis and in how to present these results. Regardless, the variables must be defined and
identified.
Are there any limitations in the methodology (that might have influenced results) identi-
fied and discussed? No matter how well one designs a study, selects a sample, executes a methodol-
ogy, and analyzes the data, there are always limitations. After the study is completed, reflect on how
the study might be better implemented next time.
Are the conclusions based on an honest and objective interpretation of the results? Some-
times, a study does not tell you what you want or expect. That is actually good but frustrating.
However, one must base the conclusions clearly on the findings from the study’s data.
Just as log analysis and surveys have limitations, laboratory studies also have limitations that
we must consider [94]. The basic assumption underlying laboratory studies and experiments is that
we can extrapolate the results to the real world. However, when people are involved, this is a dicey
assumption, and the validity of results from laboratory studies to contexts outside the laboratory is
not flawless.
Some of the possible issues that can arise are laboratory effects (i.e., the context of the labora-
tory study is not a naturalistic setting), anonymity issues (i.e., the participants know they are being
observed), context (i.e., regardless of the study design, there are aspects beyond the control of the
researcher), and biased sample (i.e., regardless of the sampling method, there are biases created by
participants' self-selection into the study and by the fact that they do not represent the portion of the
population that never participates). By using Web analytics in conjunction with laboratory studies,
we can address many of these shortcomings.

8.3 CONCLUSION
Web analytics via log data are an excellent means for recording the behaviors of system users and the
responses of those systems. Because they focus on behavioral data only, however, transaction logs
are ineffective as a method of understanding the underlying motivations, affective characteristics,
cognitive factors, and contextual aspects that influence those behaviors. Used in conjunction with
Web logs, surveys and laboratory studies can be effective methods for investigating these aspects.
The combined methodological approaches can provide a richer picture of the phenomenon under
investigation.
In this section, we have reviewed a 10-step procedure for conducting survey research, with
explanatory notes on each step. We then discussed the design of a survey instrument, with examples
of the various types of questions, and then discussed aspects of designing a laboratory study, provid-
ing some key questions that can help us in planning and completing a laboratory study.

• • • •

CHAPTER 9

Search Log Analytics

A special case of Web analytics is analyzing data from search logs. Exploiting the data stored in
search logs of Web search engines, Intranets, and Websites can provide important insights into
understanding the information searching tactics of online users. This understanding can inform
information system design, interface development, and information architecture construction for
content collections.
This section presents a review of and foundation for conducting Web SLA [64, 66]. A basic
understanding of search engines and searching behavior is assumed [for a review, see Ref. 93]. SLA
methodology consists of three stages (i.e., collection, preparation, and analysis), and those stages
are presented in detail with discussions of the goals, metrics, and processes at each stage. Fol-
lowing this, the critical terms in TLA for Web searching are defined and suggestions are pro-
vided on ways to leverage the strengths and address the limitations of TLA for Web searching
research.

9.1 INTRODUCTION
Information searching researchers have employed search logs for analyzing a variety of Web infor-
mation systems [34, 73, 78, 151]. Web search engine companies use search logs (also referred to as
transaction logs) to investigate searching trends and effects of system improvements (cf. Google at
http://www.google.com/press/zeitgeist.html or Yahoo! at http://buzz.yahoo.com/buzz_log/?fr=fp-
buzz-morebuzz). Search logs are an unobtrusive method of collecting significant amounts of search-
ing data on a sizable number of system users. There are several researchers who have employed the
SLA methodology to study Web searching. Romano et al. [128] present a methodology for general
qualitative analysis of transaction log data. Wang et al. [151] and Spink and Jansen [140] also pre-
sent explanations of approaches to TLA.
Generally, there are limited published works concerning how to employ search logs to sup-
port the study of Web searching, the use of Web search engines, Intranet searching, or other Web
searching applications. Yet, SLA is helpful for studying Web searching on Websites and Web search
engines.

9.2 REVIEW OF SEARCH ANALYTICS


9.2.1 What Is a Search Log?
Not surprisingly, a search log is a file (i.e., log) of the communications (i.e., transactions) between
a system and the users of that system. Rice and Borgman [126] present transaction logs as a data
collection method that automatically captures the type, content, or time of transactions made by a
person from a terminal connected to that system. As noted previously, Peters [117] views transac-
tion logs as electronically recorded interactions between on-line information retrieval systems and
the persons who search for the information found in those systems.
For Web searching, a search log is an electronic record of interactions that have occurred during
a searching episode between a Web search engine and users searching for information on that Web search
engine. A Web search engine may be a general-purpose search engine, a niche search engine, a
searching application on a single Website, or variations on these broad classifications. A searching
episode is a period of user interaction with a search engine that may be composed of one or more
sessions. The users may be humans or computer programs acting on behalf of humans. Interactions
are the communication exchanges that occur between users and the system, and either the user or
the system may initiate elements of these exchanges.

9.2.2 How Are These Interactions Collected?


The process of recording the data in the search log is relatively straightforward. Web servers record
and store the interactions between searchers (i.e., actually Web browsers on a particular computer)
and search engines in a log file (i.e., the search log) on the server using a software application. Thus,
most search logs are server-side recordings of interactions. Major Web search engines execute mil-
lions of these interactions per day. The server software application can record various types of data
and interactions depending on the file format that the server software supports.
As mentioned earlier, typical transaction log formats are access log, referrer log, and extended
log. The W3C (http://www.w3.org/TR/WD-logfile.html) is one organizational body that defines
transaction log formats. However, search logs are a special type of transaction log file. This search
log format has most in common with the extended file format, which contains data such as the
client computer’s IP address, user query, search engine access time, and referrer site, among other
fields.

9.2.3 Why Collect This Data?


Collecting the data enables us to analyze it in order to obtain beneficial information. Web analyt-
ics, as we have discussed, can focus on many interaction issues and research questions [36], but it
typically addresses either issues of system performance, information structure, or user interactions.
Blecic et al. [18] define TLA as the detailed and systematic examination of each search command
or query by a user and the following database result or output. Phippen et al. [120] and Spink and
Jansen [140] also provide comparable definitions of TLA.
For Web searching research, we focus on a subset of Web analytics, namely SLA. Web
analytics is useful for analyzing the browsing or navigation patterns within a Website, while
SLA is concerned exclusively with searching behaviors. SLA is defined as the use of data col-
lected in a search log to investigate particular research questions concerning interactions among Web
users, the Web search engine, or the Web content during searching episodes. Within this interaction
context, we can exploit the data in search logs to discern attributes of the search process, such
as the searcher’s actions on the system, the system responses, or the evaluation of results by the
searcher.
The goal of SLA is to gain a clearer understanding of the interactions among searcher, con-
tent and system or the interactions between two of these structural elements, based on whatever
research questions drive the study. Employing SLA allows us to achieve some stated objective, such
as improved system design, advanced searching assistance, or better understanding of some user
information searching behavior.

9.2.4 What Is the Foundation of Search Log Analysis?


SLA lends itself to a grounded theory approach [44]. This approach emphasizes a systematic dis-
covery of theory from data using methods of comparison and sampling. The resulting theories
or models are grounded in observations of the real world, rather than being abstractly generated.
Therefore, grounded theory is an inductive approach to theory or model development, rather than
the deductive alternative [26].
In other words, when using SLA as a methodology, we examine the characteristics of search-
ing episodes in order to isolate trends and identify typical interactions between searchers and the
system. Interaction has several meanings in information searching, addressing a variety of transac-
tions including query submission, query modification, results list viewing, and use of information
objects (e.g., Web page, pdf file, and video).
For the purposes of SLA, we consider interactions the physical expressions of communica-
tion exchanges between the searcher and the searching system. For example, a searcher may submit
a query (i.e., an interaction). The system may respond with a results page (i.e., an interaction). The
searcher may click on a uniform resource locator (URL) in the results listing (i.e., an interaction).
Therefore, for SLA, interaction is a mechanical expression of underlying information needs or
motivations.

9.2.5 How Is Search Log Analysis Used?


Researchers and practitioners have used SLA (usually referred to as TLA in these studies) to evalu-
ate library systems, traditional information retrieval (IR) systems, and more recently Web systems.
Search logs have been used for many types of analysis; in this review, we focus on those studies that
center on searching. Peters [117] provides a review of TLA in library and experimental
IR systems. Some progress has been made in TLA methods since Peters’ summary [117] in terms
of collection and ability to analyze data. Jansen and Pooch [69] report on a variety of studies em-
ploying SLA for the study of Web search engines and searching on Websites. Jansen and Spink
[71] provide a comprehensive review of Web searching SLA studies. Other review articles include
Kinsella and Bryant [87] and Fourie [42].
Several researchers have viewed SLA as a high-level designed process, including Cooper
[31]. Other researchers, such as Hancock-Beaulieu et al. [49], Griffiths et al. [47], Bains [12],
Hargittai [51], and Yuan and Meadows [163], have advocated using SLA in conjunction with other
research methodologies or data collection. Alternatives for other data collection include surveys and
laboratory studies.

9.2.6 How to Conduct Search Log Analysis?


Despite the abundant literature on SLA, there are few published manuscripts on how actually to
conduct it, especially with respect to SLA for Web searching. While some works provide fairly
comprehensive descriptions of the methods employed, including Cooper [31], Nicholas et al. [108],
Wang et al. [151], and Spink and Jansen [140], none presents a process or procedure for actually
conducting TLA in sufficient detail to replicate the method. We will attempt to address this short-
coming, building on work presented in Ref. [66].

9.3 SEARCH LOG ANALYSIS PROCESS


Naturally, research questions need to be articulated in order to determine what data needs to be col-
lected. However, search logs are typically of standard formats due to previously developed software
applications. Given the interactions between users and Web browsers, which are the interfaces to
Web search engines, the type of data that one can collect is standard. Therefore, the SLA methodol-
ogy discussed here is applicable to a wide range of studies.
SLA involves three major stages, namely:

• Data collection: the process of collecting the interaction data for a given period in a trans-
action log;
• Preparation: the process of cleaning and preparing the transaction log data for analysis; and
• Analysis: the process of analyzing the prepared data.

9.3.1 Data Collection


The research questions define what information one must collect in a search log. Transaction logs
provide a good balance between collecting a robust set of data and unobtrusively collecting that data
[102]. If we are conducting a naturalistic study (i.e., outside of the laboratory) on a real system (i.e.,
a system used by actual searchers), then the method of data monitoring and collecting should not
interfere with the information searching process. In addition to the loss of potential customers, a
data collection method that interferes with the information searching process may unintentionally
alter that process. For these reasons, and others, collecting data from real users pursuing needed
information while interacting with real systems on the Web necessarily affects the type of data
realistically obtainable.

9.3.2 Fields in a Standard Search Log


Table 9.1 provides a sample of a standard search log format collected by a Web search engine.
The fields are common in standard Web search engine logs, although some systems may log
additional fields. A common additional field is a cookie identification code that facilitates identify-
ing individual searchers using a common computer. A cookie is a text message given by a Web server
to a Web browser and is stored on the client machine.
In order to facilitate valid comparisons and contrasts with other analysis, a standard termi-
nology and set of metrics [69] is advocated. This standardization will help address one of Kurth’s
critiques [91] concerning the communication of SLA results across studies. Others have also noted
terminology as an issue in Web research [121]. The standard field labels and descriptors are pre-
sented below.
A searching episode is a series of searching interactions within a given temporal span by a single
searcher. Each record, shown as a row in Table 9.1, is a searching interaction. The format of each
searching interaction is:

• User Identification: the IP address of the client’s computer. This is sometimes also an anony-
mous user code address assigned by the search engine server, which is our example in
Table 9.1.
• Date: the date of the interaction as recorded by the search engine server.
• The Time: the time of the interaction as recorded by the search engine server.
• Search URL: the query terms as entered by the user.
Web search engine server software normally records these fields. Other common fields include:

• Results Page: a code representing a set of result abstracts and URLs returned by the
search engine in response to a query;
• Language: the user-preferred language of the retrieved Web pages;

TABLE 9.1: Web search engine search log snippet.

USER IDENTIFICATION   DATE          TIME       QUERY
ce00                  25/Apr/2009   04:08:50   Sphagnum Moss Harvesting + New Jersey + Raking
38f0                  25/Apr/2009   04:08:50   emailanywhere
fabc                  25/Apr/2009   04:08:54   Tailpiece
5010                  25/Apr/2009   04:08:54   1'personalities AND gender AND education'1
                      25/Apr/2009   04:08:54   dmr panasonic
89bf2                 25/Apr/2009   04:08:55   bawdy poems"
"Web Analytics        25/Apr/2009
397e0                               04:08:56   gay and happy
a9560                 25/Apr/2009   04:08:58   skin diagnostic
81343                 25/Apr/2009   04:08:59   Pink Floyd cd label cover scans
3c5c                  25/Apr/2009   04:09:00   freie stellen dangaard
9daf                  25/Apr/2009   04:09:00   Moto.it
415                   25/Apr/2009   04:09:00   Capablity Maturity Model VS.
c03                   25/Apr/2009   04:09:01   ana cleonides paulo fontoura

Note: Several records contain intentional errors (rendered in boldface in the original), e.g., the
missing user identification, the stray quotation marks, and the misplaced field values.

• Source: the federated content collection searched (also known as the vertical); and
• Page Viewed: the URL that the searcher visited after entering the query and viewing the
results page (also known as the click-thru or click-through).
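A record in this format can be split into the standard fields with a few lines of code. The following is a minimal sketch, assuming tab-delimited fields in the order user identification, date, time, query; the actual delimiter and field order vary by server configuration.

```python
from dataclasses import dataclass

@dataclass
class SearchRecord:
    user_id: str   # User Identification
    date: str      # Date of the interaction
    time: str      # The Time of the interaction
    query: str     # Search URL / query terms as entered

def parse_record(line):
    # Assumed layout: user_id <TAB> date <TAB> time <TAB> query
    user_id, date, time, query = line.rstrip("\n").split("\t", 3)
    return SearchRecord(user_id, date, time, query)

rec = parse_record("38f0\t25/Apr/2009\t04:08:50\temailanywhere")
print(rec.user_id, rec.query)  # 38f0 emailanywhere
```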

9.3.3 Data Preparation


Once the data is collected, we are ready to prepare the data. For data preparation, the focus is on
importing the search log data into a relational database (or other analysis software), assigning each
record a primary key, cleaning the data (i.e., checking each field for bad data), and calculating stan-
dard interaction metrics that will serve as the basis for further analysis.
Figure 9.1 shows the Entity–Relation (ER) diagram for the relational database that will be
used to store and analyze the data from our search log. The ER diagram will vary based on the
specific analysis.

[Figure 9.1 (diagram): the entities Searching_Episode, Query, Terms, and Co_occur with their
attributes (uid, thetime, search_url, qid, qry_length, qtot, boolean, operator, term_id, term,
tfreq, cid, tot) and the relationships composed_of, Query_Total, and Query_Occurrences.]

FIGURE 9.1: Web search log ER scheme diagram [66].



An ER diagram models the concepts and perceptions of the data and displays the conceptual
schema for the database using standard ER notation. Table 9.2 presents the legend for the schema
constructs.
Since search logs are in ASCII format, we can easily import the data into most relational
databases. One key point is to import the data in the same character encoding in which it was recorded

TABLE 9.2: Search log legend for ER schema constructs [66].

ENTITY NAME            CONSTRUCT
Searching_Episodes     a table containing the searching interactions
  boolean              denotes if the query contains Boolean operators
  operators            denotes if the query contains advanced query operators
  q_length             query length in terms
  qid                  primary key for each record
  qtot                 number of results pages viewed
  searcher_url         query terms as entered by the searcher
  thetime              time of day as measured by the server
  uid                  user identification based on IP
Terms                  a table with terms and their frequency
  term_ID              term identification
  term                 term from the query set
  tfreq                number of occurrences of term in the query set
Cooc                   a table of term pairs and the number of occurrences of those pairs
  term_ID              term identification
  cid                  the combined term identification for a pair of terms
  tot                  number of occurrences of the pair in the query set

(e.g., UTF-8, US-ASCII). Once imported, each record is assigned a unique identifier or primary
key. Most modern databases can assign this automatically on importation, or we can assign it later
using scripts.
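The import step can be sketched with SQLite: create a searching_episode table following Figure 9.1 and let an INTEGER PRIMARY KEY column assign qid automatically on insertion. The column subset and sample rows below are illustrative, not taken from a real log.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # use a file path for real data
conn.execute("""
    CREATE TABLE searching_episode (
        qid        INTEGER PRIMARY KEY,  -- primary key, assigned on insertion
        uid        TEXT,                 -- user identification
        thetime    TEXT,                 -- time of day as measured by the server
        search_url TEXT,                 -- query terms as entered by the searcher
        qry_length INTEGER               -- query length in terms
    )
""")

def import_rows(rows):
    """Insert (uid, thetime, query) tuples, deriving qry_length on the way in."""
    for uid, thetime, query in rows:
        conn.execute(
            "INSERT INTO searching_episode (uid, thetime, search_url, qry_length)"
            " VALUES (?, ?, ?, ?)",
            (uid, thetime, query, len(query.split())))
    conn.commit()

import_rows([("ce00", "04:08:50", "sphagnum moss harvesting"),
             ("38f0", "04:08:50", "emailanywhere")])
print(conn.execute(
    "SELECT qid, uid, qry_length FROM searching_episode").fetchall())
# [(1, 'ce00', 3), (2, '38f0', 1)]
```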

9.3.4 Cleaning the Data


Once the search log data is in a suitable analysis software package, the focus shifts to cleaning the
data. Records in search logs can contain corrupted data. These corrupted records have multiple
causes, but they are mostly the result of errors in logging the data. In the example shown in Table
9.1, the errors are easy to spot (additionally, these records are rendered in boldface), but often a
search log will number millions if not billions of records. Therefore, a visual inspection is not practi-
cal for error identification. From experience, one method of rapidly identifying most errors is to sort
each field in sequence. Since the erroneous data will not fit the pattern of the other data in the field,
these errors will usually appear at the top of, bottom of, or in groups in each sorted field. Standard
database functions to sum and group key fields, such as time and IP address, will usually identify any
further errors. We must remove all records with corrupted data from the transaction log database.
Typically, the percentage of corrupted data is small relative to the overall database.
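The sort-and-group screen for corrupted records can be sketched as follows. The specific validity checks (non-empty user identification, HH:MM:SS time format) are examples of patterns one might enforce, not rules from the text.

```python
import re
import sqlite3

conn = sqlite3.connect(":memory:")  # use a file path for real data
conn.execute("CREATE TABLE log (uid TEXT, thetime TEXT, query TEXT)")
conn.executemany("INSERT INTO log VALUES (?, ?, ?)", [
    ("ce00", "04:08:50", "sphagnum moss"),
    ("",     "04:08:54", "dmr panasonic"),  # corrupted: missing user id
    ("fabc", "4:8:54am", "tailpiece"),      # corrupted: malformed time
])

# Sorting a field pushes out-of-pattern values to the top or bottom.
print(conn.execute("SELECT uid FROM log ORDER BY uid").fetchone())  # ('',)

# Flag and remove records whose fields fail a simple validity screen.
time_ok = re.compile(r"^\d{2}:\d{2}:\d{2}$")
bad = [rowid for rowid, uid, thetime in
       conn.execute("SELECT rowid, uid, thetime FROM log")
       if not uid or not time_ok.match(thetime)]
conn.executemany("DELETE FROM log WHERE rowid = ?", [(r,) for r in bad])
print(conn.execute("SELECT COUNT(*) FROM log").fetchone()[0])  # 1
```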

9.3.5 Parsing the Data


To demonstrate how to parse data, we will use the three fields of The Time, User Identification, and
Search URL, common to all Web search logs to recreate the chronological series of actions in a
searching episode. Web search logs usually contain queries from both human users and
agents. Depending on the research objective, we may be interested in only individual human inter-
actions, those from common user terminals, or those from agents. For the running example used
here, we will consider the case of only having an interest in human searching episodes.
Given that there is no way to accurately identify human from non-human searchers [135,
146], most researchers using Web search logs either ignore the distinction [25] or assume some temporal
or interaction cutoff [107, 135]. Using a cutoff of 101 queries, the subset of the search log is weighted to
queries submitted primarily by human searchers at non-shared terminals, while remaining high
enough to avoid the bias that too low a cutoff would introduce. The selection of 101 is arbitrary,
and other researchers have used a wide variety of cutoffs. For our example, we will separate all ses-
sions with fewer than 101 queries into an individual search log.
There are several methods to remove these large sessions. One can code a program to count
the session lengths and then delete all sessions that have lengths over 100. For smaller log files (a
few million or so records), it is just as easy to do with SQL queries. To do this, we must first remove
records that do not contain queries. From experience, search logs may contain many such records
(usually on the order of 35% to 40% of all records) as users go to Websites for purposes other than
searching [72].
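For a smaller log, the 101-query cutoff can indeed be applied with a couple of SQL statements, as suggested above. A SQLite sketch with illustrative data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE log (uid TEXT, query TEXT)")
rows = [("agent1", f"query {i}") for i in range(150)]   # 150 queries: likely an agent
rows += [("user1", "pink floyd"), ("user1", "pink floyd cd")]
conn.executemany("INSERT INTO log VALUES (?, ?)", rows)

# Keep only sessions with fewer than 101 queries (assumed human searchers);
# in practice, the deleted sessions would be copied to a separate log first.
conn.execute("""
    DELETE FROM log WHERE uid IN (
        SELECT uid FROM log GROUP BY uid HAVING COUNT(*) > 100
    )
""")
print(conn.execute("SELECT COUNT(*) FROM log").fetchone()[0])  # 2
```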

9.3.6 Normalizing Searching Episodes


When a searcher submits a query, then views a document, and returns to the search engine, the Web
server typically logs this second visit with the identical user identification and query, but with a new
time (i.e., the time of the second visit). This is beneficial information in determining how many of
the retrieved results pages the searcher visited from the search engine, but unfortunately, it also skews
the results when analyzing at the query level. In order to normalize the searching episodes, we
must first separate these result page requests from query submissions for each searching episode.
In SLA, researchers are often interested in terms and term usage, which can be an entire
study in itself. In these cases, it is usually cleaner to generate separate tables that contain each term
and their frequency of occurrence than to attempt combination tables. A term co-occurrence table
that contains each term and its co-occurrence with other terms is also valuable for understanding
the data. With a relational database, we can generate these tables using scripts. If using text-parsing
languages, we can parse these terms and associated data during initial processing.
There are already several fields in our database, many of which can provide valuable informa-
tion. From these items, we can calculate several metrics, some of which take a long time to compute
for large datasets.
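A pure-Python sketch of these two steps: first separate repeat (user, query) submissions, which typically mark results-page requests, then build the term-frequency and term co-occurrence tables from the normalized queries. The sample episode data are illustrative.

```python
from collections import Counter
from itertools import combinations

# Illustrative searching episode records: (user id, time, query).
episodes = [
    ("ce00", "04:08:50", "sphagnum moss harvesting"),
    ("ce00", "04:09:40", "sphagnum moss harvesting"),  # repeat: results-page request
    ("38f0", "04:08:50", "moss harvesting tools"),
]

# Separate repeat (user, query) submissions from first-time query submissions.
seen, queries, page_views = set(), [], []
for uid, thetime, query in episodes:
    key = (uid, query)
    (page_views if key in seen else queries).append((uid, thetime, query))
    seen.add(key)

# Build term-frequency and term co-occurrence tables from the normalized queries.
term_freq, cooc = Counter(), Counter()
for _, _, query in queries:
    terms = query.split()
    term_freq.update(terms)
    cooc.update(frozenset(pair) for pair in combinations(terms, 2))

print(len(page_views))                          # 1 results-page request
print(term_freq["moss"])                        # 2
print(cooc[frozenset(("moss", "harvesting"))])  # 2
```

In a relational database, the same tables would be generated with scripts, as the text notes; the dictionary-based version above is easier to follow for small datasets.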

9.4 DATA ANALYSIS


This stage focuses on three levels of analysis. As we discuss these levels, we will step through the
data analysis stage.

9.4.1 Analysis Levels


The three common levels of analysis for examining transaction logs are term, query, and session.

Term level analysis. The term level of analysis naturally uses the term as the basis for analysis. A
term is a string of characters separated by a delimiter, typically a space.
At this level of analysis, one focuses on measures such as term occurrence, which is the frequency that
a particular term occurs in the transaction log. Total terms are the number of terms in the dataset.
Unique terms are the terms that appear in the data regardless of the number of times they occur. High
Usage Terms are those terms that occur most frequently in the dataset. Term co-occurrence measures
the occurrence of term pairs within queries in the entire search log. We can also calculate degrees of
association of term pairs using various statistical measures [cf. Refs. 131, 135, 151].

The mutual information formula measures term association and does not assume mutual in-
dependence of the terms within the pair. We calculate the mutual information statistic for all term
pairs within the dataset. Many times, a relatively low frequency term pair may be strongly associ-
ated (i.e., if the two terms always occur together). The mutual information statistic identifies the
strength of this association. The mutual information formula used in this research is
    I(w1, w2) = ln [ P(w1, w2) / ( P(w1) P(w2) ) ]
where P(w1), P(w2) are probabilities estimated by relative frequencies of the two words and P(w1, w2)
is the relative frequency of the word pair and order is not considered. Relative frequencies are ob-
served frequencies (F) normalized by the number of the queries:
    P(w1) = F1 / Q;    P(w2) = F2 / Q;    P(w1, w2) = F12 / Q

where F1, F2, and F12 are the observed frequencies of w1, of w2, and of the pair (w1, w2), respectively.
Both the frequency of term occurrence and the frequency of term pairs are the occurrence of
the term or term pair within the set of queries. However, since a one-term query cannot have a term
pair, the set of queries for the frequency base differs. The number of queries for the terms is the num-
ber of non-duplicate queries in the dataset. The number of queries for term pairs is defined as:
          m
    Q = ∑ (2n − 3) Qn
        n=2

where Qn is the number of queries with n words (n > 1), and m is the maximum query length. So,
queries of length one have no pairs. Queries of length two have one pair. Queries of length three
have three possible pairs. Queries of length four have five possible pairs. This continues up to the
queries of maximum length in the dataset. The formula for the number of queries for term pairs (Q)
accounts for this term pairing.
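Putting the formulas together, the mutual information statistic can be computed directly from the frequency tables. The following is a sketch over a tiny illustrative query set; real studies would compute this over millions of queries.

```python
import math
from collections import Counter

# Illustrative set of non-duplicate queries.
queries = ["pink floyd", "pink floyd cd", "floyd cd label", "pink"]

term_freq, pair_freq = Counter(), Counter()
for q in queries:
    terms = q.split()
    term_freq.update(terms)
    for i in range(len(terms) - 1):
        for j in range(i + 1, len(terms)):
            # Order is not considered, so store each pair as a frozenset.
            pair_freq[frozenset((terms[i], terms[j]))] += 1

Q = len(queries)  # base for single-term relative frequencies (non-duplicate queries)
# Base for term pairs: sum over queries of length n > 1 of (2n - 3).
Q_pair = sum(2 * len(q.split()) - 3 for q in queries if len(q.split()) > 1)

def mutual_information(w1, w2):
    p1 = term_freq[w1] / Q
    p2 = term_freq[w2] / Q
    p12 = pair_freq[frozenset((w1, w2))] / Q_pair
    return math.log(p12 / (p1 * p2))

print(round(mutual_information("pink", "floyd"), 3))  # -0.677
```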

Query level analysis. The query level of analysis uses the query as the base metric. A query is defined
as a string list of one or more terms submitted to a search engine. This is a mechanical definition as
opposed to an information searching definition [88]. The first query by a particular searcher is the
initial query. A subsequent query by the same searcher that is different from any of the searcher’s
other queries is a modified query. There can be several occurrences of different modified queries by a
particular searcher. A subsequent query by the same searcher that is identical to one or more of the
searcher’s previous queries is an identical query.
In many Web search engine logs, when the searcher traverses to a new results page, this inter-
action is also logged as an identical query. In other logging systems, the application records the page
rank. A results page is the list of results, either sponsored or organic (i.e., non-sponsored), returned
74 UNDERSTANDING USER–WEB INTERACTIONS VIA WEB ANALYTICS

by a Web search engine in response to a query. Using either identical queries or some results page
field, we can analyze the result page viewing patterns of Web searchers.
Other measures are also observable at the query level of analysis. A unique query refers to a
query that is different from all other queries in the transaction log, regardless of the searcher. A
repeat query is a query that appears more than once within the dataset by two or more searchers.
Query complexity examines the query syntax, including the use of advanced searching techniques such as Boolean and other query operators. Failure rate is a measure of the deviation of queries from the published rules of the search engine. Carryover is the use of query syntax that the particular IR system does not support but that may be common on other IR systems.
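A minimal sketch of these query-level labels in Python, assuming a time-ordered log of (searcher_id, query_string) tuples (the schema is illustrative, not a standard log format):

```python
from collections import defaultdict

def classify_queries(log):
    """Label each record as 'initial', 'modified', or 'identical' per
    searcher, following the query-level definitions above."""
    seen = defaultdict(list)  # searcher_id -> queries issued so far
    labels = []
    for searcher, query in log:
        history = seen[searcher]
        if not history:
            labels.append("initial")
        elif query in history:
            # may also indicate a request for the next results page
            labels.append("identical")
        else:
            labels.append("modified")
        history.append(query)
    return labels

def repeat_queries(log):
    """Queries submitted by two or more searchers, per the definition of a
    repeat query."""
    searchers_by_query = defaultdict(set)
    for searcher, query in log:
        searchers_by_query[query].add(searcher)
    return {q for q, s in searchers_by_query.items() if len(s) > 1}
```

Unique queries are simply those not returned by `repeat_queries` and occurring once in the dataset.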

Session level analysis. At the session level of analysis, we primarily examine the within-session
interactions [48]. However, if the search log spans more than one day or assigns some temporal
limit to interactions from a particular user, we could examine between-sessions interactions. A
session interaction is any specific exchange between the searcher and the system (i.e., submitting
a query, clicking a hyperlink, etc.). A searching episode is defined as a series of interactions within
a limited duration to address one or more information needs. This session duration is typically
short, with Web researchers using between 5 and 120 minutes as a cutoff [cf. Refs. 54, 70, 107, 135].
Each choice of time has an impact on the results, of course. The searcher may be multitasking [106,
138] within a searching episode, or the episode may be an instance of the searcher engaged in suc-
cessive searching [95, 110, 141]. This session definition is similar to the definition of a unique visitor
used by commercial search engines and organizations to measure Website traffic. The number of
queries per searcher is the session length.
Session duration is the total time the user spent interacting with the search engine, including
the time spent viewing the first and subsequent Web documents, except the final document. Ses-
sion duration can therefore be measured from the time the user submits the first query until the
user departs the search engine for the last time (i.e., does not return). The viewing time of the final
Web document is not available since the Web search engine server does not record the time stamp.
Naturally, the time between visits from the Web document to the search engine may not have been
entirely spent viewing the Web document, which is a limitation of the measure.
A Web document is the Web page referenced by the URL on the search engine’s results page.
A Web document may be text or multimedia and, if viewed hierarchically, may contain a nearly
unlimited number of sub-Web documents. A Web document may also contain URLs linking to
other Web documents. From the results page, a searcher may click on a URL, (i.e., visit) one or
more results from the listings on the result page. This is click through analysis and measures the page
viewing behavior of Web searchers. We measure document viewing duration as the time from when
a searcher clicks on a URL on a results page to the time that searcher returns to the search engine.
Some researchers and practitioners refer to this type of analysis as page view analysis. Click through
analysis is possible if the transaction log contains the appropriate data. There are many other factors
one can examine, including query graphs [11].
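The session-level measures above can be sketched as follows; the 30-minute cutoff and the (timestamp, kind, payload) record schema are illustrative choices, with any cutoff in the 5- to 120-minute range cited above being defensible:

```python
from datetime import datetime, timedelta

CUTOFF = timedelta(minutes=30)  # temporal limit between interactions

def sessionize(records):
    """Split one searcher's time-ordered interactions into searching
    episodes, reporting session length (queries) and session duration.
    The duration excludes viewing time of the final document, which the
    server does not log."""
    sessions, current = [], []
    for rec in records:
        if current and rec[0] - current[-1][0] > CUTOFF:
            sessions.append(current)  # gap exceeded: close the episode
            current = []
        current.append(rec)
    if current:
        sessions.append(current)
    return [
        {"queries": sum(1 for _, kind, _ in s if kind == "query"),
         "duration": s[-1][0] - s[0][0]}
        for s in sessions
    ]
```

Document viewing duration (click-through analysis) follows the same pattern: the time from a 'click' record to the searcher's next interaction with the engine.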

9.4.2 Conducting the Data Analysis


The key to successful SLA is conducting the analysis with an organized approach. One method
is to sequentially number and label the queries (or coded modules) to correspond to the order of
execution and to their function, since many of these queries must be executed in a certain order to
obtain valid results. Many relational database management systems provide mechanisms to add
descriptive properties to the queries. These can provide further explanations of the query function

FIGURE 9.2: SLA numbered and descriptively labeled queries [66].


or relate these queries directly to research questions. Figure 9.2 illustrates the application of such
an approach.
Figure 9.2 also shows each query in sequence and provides a descriptive tag describing that
query’s function.
SLA involves a series of standard analyses that are common to a wide variety of Web search-
ing studies. Some of these analyses may directly address certain research questions, and others may
be the basis for more in-depth research analysis.
One typical question is, “How many searchers have visited the search engine during this
period?” This query will provide a list of unique searchers and the number of queries they have submitted during the period. We can modify this question to determine, “How many searchers have visited the search engine on each day during this period?” Naturally, a variety of statistical results can
be determined using the previous queries. For example, we can determine the standard deviation of
number of queries per searcher.
In addition to visits, we may want information about the session lengths (i.e., the number of
queries within a session) for each searcher. Similarly, we may be curious about the number of search-
ers who viewed a certain number of results pages.
We can calculate various statistical results on results page viewing, such as the maximum
number of result pages viewed and queries per day. An important aspect for system designers is
results caching because we need to know the number of repeat queries submitted by the entire set of
searchers during a given period in order to optimize our system’s performance.
Some researchers are more interested in how searchers are interacting with a search engine,
and for this purpose the use of Boolean operators is an important feature. Since most search engines
offer other query syntax than just Boolean operators, we can also investigate the use of these other
operators.
Counting the terms within the transaction log is another typical measurement. We certainly want to know about query length, the frequency of term pairs, and the various term frequencies. The results from this series of queries provide us with a wealth of information about our data (e.g., occurrences of session lengths, occurrences of query length, occurrences of repeat queries, most used terms, most used term pairs) and serve as the basis for further investigations (e.g., session complexity, query structure, query modifications, term relationships).
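The series of standard analyses described above can be sketched over a prepared log; the (searcher_id, query_string) schema and the result keys are illustrative:

```python
from collections import Counter

def standard_analyses(log):
    """Common SLA counts: searchers per period, queries per searcher,
    repeat queries (candidates for results caching), query-length
    distribution, and most-used terms."""
    queries_per_searcher = Counter(searcher for searcher, _ in log)
    query_freq = Counter(query for _, query in log)
    term_freq = Counter(term for _, query in log for term in query.split())
    return {
        "searchers": len(queries_per_searcher),
        "queries_per_searcher": dict(queries_per_searcher),
        "repeat_queries": {q: n for q, n in query_freq.items() if n > 1},
        "query_length_distribution": dict(
            Counter(len(query.split()) for _, query in log)),
        "top_terms": term_freq.most_common(10),
    }
```

From these counts, statistics such as the standard deviation of queries per searcher follow directly.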

9.5 DISCUSSION
It is certainly important to understand both the strengths and limitations of SLA for Web searching. First, concerning the strengths, SLA provides a method of collecting data from a great number of users. Given the current nature of the Web, search logs appear to be a reasonable and non-intrusive means of collecting user–system interaction data during the Web information searching
process from a large number of searchers. We can easily collect data on hundreds of thousands to
millions of interactions, depending on the traffic of the Website.
Second, we can collect this data inexpensively. The costs are the software and storage. Third,
the data collection is unobtrusive, so the interactions represent the unaltered behavior of searchers,
assuming the data is from an operational searching site. Finally, search logs are, at present, the only
method for obtaining significant amounts of search data within the complex environment that is the
Web [37]. Of course, researchers can also undertake SLA from research sites or capture client-side
data across multiple sites using a custom Web browser (for the purpose of data collection) that does
not completely mimic the searcher’s natural environment.
There are limitations with SLA, as with any methodology. First, certain types of data are
not in the transaction log, individuals’ identities being the most common example. An IP address
typically represents the “user” in a search log. Since more than one person may use a computer, an
IP address is an imprecise representation of the user. Search engines are overcoming this limitation
somewhat by the use of cookies.
Second, there is no way to collect demographic data when using search logs in a naturalistic
setting. This constraint is true of many non-intrusive naturalistic studies. However, there are several
sources for demographic data on the Web population based on observational and survey data. From
these data sources we may get reasonable estimations of needed demographic data. However, this
demographic data is still not attributable to specific subpopulations.
Third, a search log does not record the reasons for the search, the searcher motivations, or
other qualitative aspects of use. This is certainly a limitation. In the instances where one needs
this data, one should use TLA in conjunction with other data collection methods. However, adding such methods reduces the unobtrusiveness that is an inherent advantage of search logs as a data collection method.
Fourth, the logged data may not be complete due to caching of server data on the client
machine or proxy servers. This is an often-mentioned limitation. In reality, this is a relatively minor
concern for Web search engine research due to the method with which most search engines dy-
namically produce their results pages. For example, a user accesses the page of results from a search
engine using the Back button of a browser. This navigation accesses the results page via the cache on
the client machine. The Web server will not record this action. However, if the user clicks on any
URL on that results page, functions coded on the results page redirect the click first to the Web server, which then records the visit to the Website.

9.6 CONCLUSION
We presented a three-step methodology for conducting SLA, namely collecting, preparing, and
analyzing. We then reviewed each step in detail, providing observations, guides, and lessons learned.
We also discussed the organization of the database at the ER-level, and we explained the table
design for standard search engine transaction logs. This presentation of the methodology at a
detailed level of granularity will serve as an excellent basis for novice or experienced search log
researchers.
Search logs are powerful tools for collecting data on the interactions between users and sys-
tems. Using this data, SLA can provide significant insights into user–system interactions, and it
complements other methods of analysis by overcoming the limitations inherent in those methods.
By combining SLA with other data collection methods or other research results, we can improve the
robustness of the analysis. Overall, SLA is a powerful tool for Web searching research, and the SLA
process outlined here can be helpful in future Web searching research endeavors.

• • • •

CHAPTER 10

Conclusion

This lecture presents an overview of the Web analytics process, with a focus on gaining insight
and actionable outcomes from collecting and analyzing Internet data. The lecture first provides
an overview of Web analytics, providing, in essence, a condensed version of the entire lecture. The
lecture then outlines the theoretical and methodological foundations of Web analytics in order to
understand clearly the strengths and shortcomings of Web analytics as an approach. These foundational elements include the psychological basis in behaviorism and the methodological underpinning of trace data as an empirical method. The lecture then presents a brief history of Web analytics from
the original transaction log studies in the 1960s, through the information science investigations of
library systems, to the focus on Websites, systems, and applications. The lecture then covers the
various types of ongoing interaction data within the clickstream created using log files and page
tagging for analytics of Website and search logs. The lecture then presents a Web analytic process to
convert this basic data to meaningful KPIs to measure likely converts that are tailored to the orga-
nizational goals or potential opportunities. Supplementary data collection techniques are addressed,
including surveys and laboratory studies. The lecture then discusses the strengths and shortcomings
of Web analytics. The overall goal of this lecture is to provide implementable information and a
methodology for understanding Web analytics in order to improve Web systems, increase customer
satisfaction, and target revenue through effective analysis of user–Website interactions.
Returning to that online retail store selling the latest athletic shoe, Web analytics can tell us
how potential customers find our online store, including those who are referred from other Websites
and those from search engines. Web analytics provides us the methods to know, and our KPIs tell
us why we should care. Our understanding of customer behavior provided by Web analytics gives us
the tool to determine what it might mean if customers come to our Website and then immediately
leave versus if the potential customer explores several pages and then leaves. We can leverage Web
analytics techniques to glean value from this data. Web analytics allows us to focus on organiza-
tion goals, including getting the customer through the entire shopping cart process. In sum, Web
analytics is the strategic tool to make our hypothetical online store successful by understanding why
potential customers behave as they do and what that behavior means.

• • • •

CHAPTER 11

Key Terms

• Abandonment rate: key performance indicator that measures the percentage of visitors
who got to that point on the site but decided not to perform the target action.
• Alignment-centric performance management: method of defining a site’s business goals
by choosing only a few key performance indicators.
• Average order value: key performance indicator that measures the ratio of total revenue to the total number of orders.
• Average time on site: see visit length.
• Behavior: essential construct of the behaviorism paradigm. At its most basic, a behavior is
an observable activity of a person, animal, team, organization, or system. Like many basic
constructs, behavior is an overloaded term because it also refers to the aggregate set of
responses to both internal and external stimuli. Therefore, behaviors address a spectrum
of actions. Because of the many associations with the term, it is difficult to characterize it
without specifying a context in which it takes place to provide meaning.
• Behaviorism: research approach that emphasizes the outward behavioral aspects of thought.
For transaction log analysis, we take a more open view of behaviorism. In this more en-
compassing view, behaviorism emphasizes the observed behaviors without discounting the
inner aspects that may accompany these outward behaviors.
• Checkout conversion rate: key performance indicator that measures the percent of total
visitors who begin the checkout process.
• Commerce Website: a type of Website where the goal is to get visitors to purchase goods
or services directly from the site.
• Committed visitor index: key performance indicator that measures the percentage of visi-
tors that view more than one page or spend more than 1 minute on a site (these measure-
ments should be adjusted according to site type).
• Content/media Website: a type of Website focused on advertising.
• Conversion rate: key performance indicator that measures the percentage of total visitors
to a Website that perform a specific action.
• Cost per lead (CPL): key performance indicator that measures the ratio of marketing ex-
penses to total leads and shows how much it costs a company to generate a lead.
• Customer loyalty: key performance indicator that measures the ratio of new to existing
customers.
• Customer satisfaction metrics: key performance indicator that measures how the users
rate their experiences on a site.
• Demographics and system statistics: a metric that measures the physical location and
information of the system used to access the Website.
• Depth of visit: key performance indicator that measures the ratio between page views and
visitors.
• Electronic survey: method of data collection in which a computer plays a major role in
both the delivery of a survey to potential respondents and the collection of survey data from
actual respondents.
• Ethogram: index of the behavioral patterns of a unit. An ethogram details the differ-
ent forms of behavior that an actor displays. In most cases, it is desirable to create an
ethogram in which the categories of behavior are objective, discrete, and not overlapping
with each other. The definitions of each behavior should be clear, detailed, and distin-
guishable from each other. Ethograms can be as specific or general as the study or field
warrants.
• Interactions: physical expressions of communication exchanges between the searcher and
the system.
• Internal search: a metric that measures information on keywords and results pages viewed
using a search engine embedded in the Website.
• Key performance indicator (KPI): a combination of metrics tied to a business strategy.
• Lead generation Website: Website used to obtain user contact information in order to
inform them of a company’s new products and developments and to gather data for market
research.
• Log file: log kept by a Web server of information about requests made to the Website in-
cluding (but not limited to) visitor IP address, date and time of the request, request page,
referrer, and information on the visitor’s Web browser and operating system.
• Log file analysis: method of gathering metrics that uses information gathered from a log
file to gather Website statistics.
• Metrics: statistical data collected from a Website such as number of unique visitors, most
popular pages, etc.
• New visitor: a user who is accessing a Website for the first time.
• New visitor percentage: key performance indicator that measures the ratio of new visitors
to unique visitors.
KEY TERMS 83

• Online business performance management (OBPM): method of defining a site’s business goals that emphasizes the integration of business tools and Web analytics to make better decisions quickly in an ever-changing online environment.
• Order conversion rate: key performance indicator that measures the percentage of total
visitors who place an order on a Website.
• Page depth: key performance indicator that measures the ratio of page views for a specific
page and the number of unique visitors to that page.
• Page tagging: method of gathering metrics that uses an invisible image to detect when a
page has been successfully loaded and then uses JavaScript to send information about the
page and the visitor back to a remote server.
• Prospect rate: key performance indicator that measures the percentage of visitors who get
to the point in a site where they can perform the target action (even if they do not actually
complete it).
• Referrers and keyword analysis: a metric that measures which sites have directed traffic to
the Website and which keywords visitors are using to find the Website.
• Repeat visitor: a user who has been to a Website before and is now returning.
• Returning visitors: key performance indicator that measures the ratio of unique visitors
to total visits.
• Search engine referrals: key performance indicator that measures the ratio of referrals to a
site from specific search engines to the industry average.
• Search log analysis (SLA): use of data collected in a search log to investigate particular
research questions concerning interactions among Web users, the Web search engine, or
the Web content during searching episodes.
• Search log analysis (SLA) process: three stage process of collection, preparation, and
analysis.
• Search log: electronic record of interactions that have occurred during a searching episode
between a Web search engine and users searching for information on that Web search
engine.
• Single access ratio: key performance indicator that measures the ratio of total single access
pages (or pages where the visitor enters the site and exits immediately from the same page)
to total entry pages.
• Stickiness: key performance indicator that measures how many people arrive at a home-
page and proceed to traverse the rest of the site.
• Support/self-service Website: a type of Website that focuses on helping users find spe-
cialized answers for their particular problems.
• Survey instruments: a data collection procedure used in a variety of research designs.
• Survey research: a method for gathering information by directly asking respondents about
some aspect of themselves, others, objects, or their environment.
• Top pages: a metric that measures the pages in a Website that receive the most traffic.
• Total bounce rate: key performance indicator that measures the percentage of visitors who
scan the site and then leave.
• Trace data: measures that offer a sharp contrast to directly collected data. The greatest
strength of trace data is that it is unobtrusive. The collection of the data does not interfere
with the natural flow of behavior and events in the given context. Since the data is not
directly collected, there is no observer present in the situation where the behaviors occur to
affect the participants’ actions. Trace data is unique; as unobtrusive and nonreactive data,
it can make a very valuable research course of action. In the past, trace data was often time
consuming to gather and process, making such data costly. With the advent of transaction
logging software, trace data for the studying of behaviors of users and systems has become
popular.
• Traffic concentration: key performance indicator that measures the ratio of number of
visitors to a certain area in a Website to total visitors.
• Transaction log: electronic record of interactions that have occurred between a system
and users of that system. These log files can come from a variety of computers and systems
(Websites, OPAC, user computers, blogs, listserv, online newspapers, etc.), basically any
application that can record the user–system–information interactions.
• Transaction log analysis (TLA): broad categorization of methods that covers several sub-
categorizations, including Web log analysis (i.e., analysis of Web system logs), blog analy-
sis, and search log analysis (analysis of search engine logs).
• Unique visit: one visit to a Website (regardless of whether the user has previously visited the site); an alternative to unique visitors.
• Unique visitor: a specific user who accesses a Website.
• Unobtrusive methods: research practices that do not require the researcher to intrude in
the context of the actors. Unobtrusive methods do not involve direct elicitation of data
from the research participants or actors. This approach is in contrast to obtrusive methods
such as laboratory experiments and surveys that require researchers to physically interject
themselves into the environment being studied.
• Visit length: a metric that measures total amount of time a visitor spends on the Website.
• Visit value: key performance indicator that measures the ratio of total visits to total revenue.
• Visitor path: a metric that measures the route a visitor uses to navigate through the Web-
site.
• Visitor type: a metric that measures users who access a Website. Each user who visits the
Website is a unique user. If it is a user’s first time to the Website, that visitor is a new visitor,
and if it is not the user’s first time, that visitor is a repeat visitor.
• Web analytics: the measurement of visitor behavior on a Website; more formally, the measurement, collection, analysis, and reporting of Internet data for the purposes of understanding and optimizing Web usage (http://www.webanalyticsassociation.org/).
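Several of the ratio-style KPIs above reduce to simple divisions over aggregate site metrics; a minimal sketch, with metric names chosen for illustration:

```python
def ratio_kpis(m):
    """Compute a few ratio-style KPIs from a dict of aggregate metrics."""
    return {
        "conversion_rate": m["conversions"] / m["total_visitors"],
        "average_order_value": m["total_revenue"] / m["orders"],
        "new_visitor_percentage": m["new_visitors"] / m["unique_visitors"],
        # visitors who reached the target action but did not complete it
        "abandonment_rate": 1 - m["completed_actions"] / m["started_actions"],
        "depth_of_visit": m["page_views"] / m["unique_visitors"],
    }
```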

• • • •

CHAPTER 12

Blogs for Further Reading

Listed below are several practitioner blogs that offer current and insightful analysis on Web analytics.

• Analytics Notes by Jacques Warren, http://www.waomarketing.com/blog
• Occam’s Razor by Avinash Kaushik, http://www.kaushik.net/avinash/
• SemAngel: Web Analytics and SEM Analytics by Gary Angel, http://semphonic.blogs
.com/
• Web Analytics Demystified by Eric Peterson, http://blog.webanalyticsdemystified.com/
weblog/
• Web Analytics Articles by Jim Sterne, http://www.emetrics.org/articlesbysterne.php

• • • •

References

[1] S. F. Abdinnour-Helm, B. S. Chaparro, and S. M. Farmer, “Using the End-User Computing Satisfaction (EUCS) Instrument to Measure Satisfaction with a Web Site,” Decision
[2] G. Abdulla, B. Liu, and E. Fox, “Searching the World-Wide Web: Implications from
Studying Different User Behavior,” in the World Conference of the World Wide Web, Internet,
and Intranet, Orlando, FL, 1998, pp. 1–8.
[3] S. E. Aldrich, “The Other Search: Making the Most of Site Search to Optimize the Total
Customer Experience, ” 6 June 2006, retrieved 14 May 2009 from http://www.docuticker
.com/?p=5508.
[4] H. Aljifri and D. S. Navarro, “Search engines and privacy,” Computers & Security, vol. 23,
pp. 379–388, 2004. doi:10.1016/j.cose.2003.11.004
[5] S. Ansari, R. Kohavi, L. Mason, and Z. Zheng, “Integrating E-Commerce and Data
Mining: Architecture and Challenges,” IEEE International Conference on Data Mining,
pp. 27–34, 2001. doi:10.1109/ICDM.2001.989497
[6] A. Avinash, “Bounce Rate: Sexiest Web Metric Ever?,” 26 June 2007, retrieved 15 May
2009 from http://www.mpdailyfix.com/2007/06/bounce_rate_sexiest_web_metric.html.
[7] R. Baeza-Yates, L. Caldeŕon-Benavides, and C. Gonźalez, “The Intention Behind Web
Queries,” in String Processing and Information Retrieval (SPIRE 2006), Glasgow, Scotland,
2006, pp. 98–109. doi:10.1007/11880561_9
[8] R. Baeza-Yates and C. Castillo, “Relating Web Characteristics” [in Spanish], October 2000,
retrieved 15 July 2002, from http://www.todocl.cl/stats/rbaeza.pdf.
[9] R. Baeza-Yates and C. Castillo, “Relating Web Structure and User Search Behavior,” in
10th World Wide Web Conference, Hong Kong, China, 2001, pp. 1–2.
[10] R. Baeza-Yates, A. Gionis, F. Junqueira, V. Murdock, V. Plachouras, and F. Silvestri, “The
Impact of Caching on Search Engines,” in 30th annual international ACM SIGIR conference
on Research and development in information retrieval, Amsterdam, The Netherlands, 2007,
pp. 183–190. doi:10.1145/1277741.1277775
[11] R. Baeza-Yates and A. Tiberi, “The Anatomy of a Large Query Graph,” Journal of Physics A:
Mathematical and Theoretical, vol. 41, pp. 1–13, 2008. doi:10.1088/1751-8113/41/22/224002
[12] S. Bains, “End-User Searching Behavior: Considering Methodologies,” The Katharine
Sharp Review, vol. 1, http://www.lis.uiuc.edu/review/winter1997/bains.html, 1997.
[13] J. Bar-Ilan, “The Use of Web Search Engines in Information Science Research,” in Annual
Review of Information Science and Technology, vol. 33, B. Cronin, Ed. Medford, NY, USA:
Information Today, 2004, pp. 231–288. doi:10.1002/aris.1440380106
[14] J. D. Becher, “Why Metrics-Centric Performance Management Solutions Fall Short,” in
Information Management Magazine, vol. March, 2005.
[15] S. M. Beitzel, E. C. Jensen, A. Chowdhury, D. Grossman, and O. Frieder, “Hourly Anal-
ysis of a Very Large Topically Categorized Web Query Log,” in the 27th annual inter-
national conference on research and development in information retrieval, Sheffield, UK,
2004, pp. 321–328. doi:10.1145/1008992.1009048
[16] S. M. Beitzel, E. C. Jensen, D. D. Lewis, A. Chowdhury, and O. Frieder, “Automatic clas-
sification of Web queries using very large unlabeled query logs,” ACM Transactions on Infor-
mation Systems, vol. 25, no. 9, 2007.
[17] M. Belkin, “15 Reasons Why All Unique Visitors Are Not Created Equal,” 8 April 2006,
retrieved 15 May 2009 from http://www.omniture.com/blog/node/16.
[18] D. Blecic, N. S. Bangalore, J. L. Dorsch, C. L. Henderson, M. H. Koenig, and A. C. Weller,
“Using transaction log analysis to improve OPAC retrieval results,” College & Research
Libraries, vol. 59, pp. 39–50, 1998.
[19] D. L. Booth and B. J. Jansen, “A Review of Methodologies for Analyzing Websites,” in
Handbook of Research on Web Log Analysis, B. J. Jansen, A. Spink, and I. Taksa, Eds. Hershey,
PA: IGI, 2008, pp. 143–164.
[20] B. R. Boyce, C. T. Meadow, and D. H. Kraft, Measurement in Information Science. Orlando,
FL: Academic Press Inc., 1994.
[21] N. Brooks, “The Atlas Rank Report I: How Search Engine Rank Impacts Traffic,” July 2004,

Author Biography

Bernard J. Jansen is an associate professor in the College of Information Sciences and Technology
at the Pennsylvania State University. Jim has more than 150 publications in the area of informa-
tion technology and systems, with articles appearing in a multidisciplinary range of journals and
conferences. His specific areas of expertise are Web searching, sponsored search, and personalization
for information searching. He is coauthor of the book, Web Search: Public Searching of the Web and
coeditor of the book Handbook of Weblog Analysis. Jim is a member of the editorial boards of eight
international journals. He has received several awards and honors, including an ACM Research
Award and six application development awards, along with other writing, publishing, research, and
leadership honors. Several agencies and corporations have supported his research. He is actively involved in teaching both undergraduate and graduate level courses, as well as mentoring students in a
variety of research and educational efforts. He also has successfully conducted numerous consulting
projects. Jim resides with his family in Charlottesville, VA.
Fast Moving Consumer Goods
Analytics Framework
Point of view
Amsterdam, 2017
Key Trends impacting FMCG

© 2017 Deloitte The Netherlands Fast Moving Consumer Goods Analytics Framework 2
Source: Deloitte university press- Consumer product trends Navigating 2020
Using Analytics to stay ahead of the game
Effective use of analytical capabilities will enable FMCG companies to cope with and even
benefit from the key trends impacting FMCG

1. Unfulfilled economic recovery for core consumer segments
   Analytics supports the shift to value by identifying key price points in the market, defining customer
   segments, developing new pricing strategies based on competitive intelligence, and increasing efficiency
   in manufacturing and logistics to reduce costs.

2. Health, wellness and responsibility as the new basis of brand loyalty
   Companies will experience greater pressure to better align offerings and activities with customer
   interests and values. Big Data and analytics help to better understand customer sentiment, preferences
   and behaviour. At the same time, data analytics enables supply chain visibility and identifies potential
   risks.

3. Pervasive digitization of the path to purchase
   An increasingly large share of consumer spend and activity will take place through digital channels.
   Analytics is key to better understanding purchase and consumption occasions as well as tailoring the
   channel experience.

4. Proliferation of customization and personalization
   In a world where customized products and personalized, targeted marketing experiences win companies
   market share, technologies like digital commerce, additive manufacturing and artificial intelligence can
   give a company an edge by allowing it to create customized product offerings.

5. Continued resource shortages and commodity price volatility
   Analytics can fuel a better understanding of resource market volatility and enable more efficient use of
   critical resources in the production process.

FMCG Analytics Framework
Analytic capabilities for better decisions across the FMCG value chain

Marketing/Sales: Brand Analysis, Digital Analytics, Pricing Strategy, Marketing Mix ROI, Competitor Intelligence, Trade Promotion Effectiveness

Manufacturing: Production Efficiency, Production Forecasting, Asset Analytics, Workforce Safety, Quality Analytics, Production Planning

Logistics: Inventory Diagnostics, Location Analytics, Supply Chain Diagnostics, Resource & Route Optimization, Reverse Logistics, Fulfilment Intelligence

Business Management & Support: Workforce Analytics, Sustainability Analytics, Finance Analytics, Business Process Analysis, Program & Portfolio Analytics
FMCG Analytics Framework – Marketing/Sales

FMCG Value Chain – Marketing/Sales
In the Marketing/Sales process of the FMCG value chain, analyses are geared towards
improving commercial performance and customer centricity

Digital Analytics
The online channels are of increasing importance, also in FMCG. Defining a uniform digital KPI framework and building web analytics capabilities is key to creating insight into digital performance on e-commerce platforms.

Pricing Strategy
The analysis focuses on demand variation at different price levels with different promotion/rebate offers. It is used to determine optimal prices throughout the product/service lifecycle by customer segment. Benefits include increasing sales margin, decreasing markdowns and aiding inventory management.
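The pricing analysis described above can be sketched with a toy elasticity estimate: observe demand at several price points, fit a log-log demand curve, and pick the revenue-maximizing price among candidate price points. All prices and unit volumes below are invented for illustration; a real pricing-strategy engagement would segment by customer and control for promotion mechanics.

```python
# Hypothetical sketch: estimating price elasticity from observations under
# different promotion/rebate offers, then ranking candidate price points.
import math

# (price, units_sold) observed at different price levels (invented data)
observations = [(2.00, 1200), (2.25, 1000), (2.50, 840), (2.75, 700), (3.00, 580)]

# Fit log(units) = a + b*log(price) by ordinary least squares; b is the elasticity.
n = len(observations)
xs = [math.log(p) for p, _ in observations]
ys = [math.log(q) for _, q in observations]
mx, my = sum(xs) / n, sum(ys) / n
b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
a = my - b * mx

def expected_revenue(price: float) -> float:
    """Revenue implied by the fitted demand curve at a given price."""
    return price * math.exp(a + b * math.log(price))

best = max([1.99, 2.19, 2.49, 2.79, 2.99], key=expected_revenue)
print(f"elasticity ~ {b:.2f}, best candidate price: {best}")
```

With this invented data demand is elastic (b below -1), so revenue falls as price rises and the lowest candidate price wins; with inelastic demand the same code would favor a higher price point.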

Brand Analysis
This analysis focuses on providing insight into the brand perception of a firm. With the use of (among others) sentiment analysis, the firm can compare the perception of its brand with that of the main competitors and create a data-driven brand strategy.

Trade Promotion Effectiveness
This analysis focuses on providing insight into both the effectiveness and planning of trade promotions. These insights allow the company to improve the aforementioned processes to increase sales while keeping marketing costs at the same level.
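As a toy illustration of the sentiment-analysis step, a lexicon-based scorer can compare mentions of the firm's brand against a competitor's. The brand names, mentions, and word lists below are all invented; a production brand-perception pipeline would use a trained sentiment model over real social-media, blog, and news data.

```python
# Minimal lexicon-based sentiment sketch (illustrative only).
POSITIVE = {"love", "great", "tasty", "fresh", "recommend"}
NEGATIVE = {"stale", "expensive", "awful", "bland", "disappointed"}

def sentiment_score(text: str) -> int:
    """Count positive words minus negative words in a mention."""
    words = text.lower().split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

# Invented brand mentions scraped from (hypothetical) social channels
mentions = {
    "OurBrand": ["Love the fresh taste", "A bit expensive but great"],
    "RivalBrand": ["Stale and bland", "Awful aftertaste disappointed"],
}

# Average sentiment per brand, the basis for a side-by-side perception view
avg = {brand: sum(map(sentiment_score, texts)) / len(texts)
       for brand, texts in mentions.items()}
print(avg)
```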

Marketing Mix ROI
Analyses focus on determining the effectiveness of marketing investments. By reducing ineffective spend and intensifying high-return marketing tactics, the marketing mix is optimized, leading to higher returns on the overall marketing spend.

Competitor Intelligence
Knowledge is power: knowing what your competitors are doing allows you to act quickly in order to gain an advantage. This analysis focuses on obtaining this knowledge and extracting the actionable insights that allow one to form a data-driven competitor strategy.
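A minimal sketch of the marketing-mix comparison is to rank tactics by return per euro of spend, which is where budget reallocation starts. All figures are invented; a full marketing-mix model would instead regress sales on per-channel spend with adstock and saturation effects.

```python
# Illustrative ROI ranking across marketing tactics (figures invented).
spend = {"tv": 500_000, "search": 120_000, "display": 80_000, "in_store": 150_000}
attributed_revenue = {"tv": 900_000, "search": 420_000,
                      "display": 96_000, "in_store": 330_000}

# Return per euro spent: (revenue - spend) / spend
roi = {ch: (attributed_revenue[ch] - spend[ch]) / spend[ch] for ch in spend}

# Tactics ordered from highest to lowest return; candidates for more/less budget
ranked = sorted(roi, key=roi.get, reverse=True)
print(ranked)
```

On this invented data, search delivers the highest return and display the lowest, so the reallocation logic would shift spend from display toward search.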

Case study – Digital Analytics
Defining a KPI framework and embedding it through online dashboards

Challenge
This global food company wanted to undergo a digital transformation. However, there was little visibility of web analytics capabilities, no access to in-market web analytics, and limited standards, KPI definitions and reporting. For e-commerce there was little to no online market share data available in the countries.

Approach
Deloitte supported in defining uniform KPIs and a roadmap for implementation for both domains. Deloitte supported in extracting web analytics data and requesting in-market data about online market share from the countries. The first phase for both marketing and e-commerce was to develop tooling to measure and compare digital performance across target countries.

Benefits
• Delivered a marketing dashboard & KPI framework with global definitions
• Delivered a (hosted) e-commerce dashboard & KPI framework with global definitions, also making web analytics more financial by measuring the financial impact of web analytics
• Roadmaps for both marketing and e-commerce, providing clear guidance on maturing in the area of online marketing and e-commerce

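The uniform KPI framework described above can be illustrated with a minimal sketch. It assumes a hypothetical per-country export of session records, each carrying an invented `order_value` field (0 for non-purchasing visits); the KPI names are illustrative, not the client's actual definitions.

```python
# Minimal sketch of a uniformly defined digital KPI set, computed from a
# hypothetical list of session records with an invented "order_value" field.

def digital_kpis(sessions):
    """Compute a small, uniformly defined KPI set from session records."""
    visits = len(sessions)
    orders = sum(1 for s in sessions if s["order_value"] > 0)
    revenue = sum(s["order_value"] for s in sessions)
    return {
        "visits": visits,
        "conversion_rate": orders / visits if visits else 0.0,
        "avg_order_value": revenue / orders if orders else 0.0,
        "revenue_per_visit": revenue / visits if visits else 0.0,
    }

sessions = [
    {"order_value": 0.0},
    {"order_value": 25.0},
    {"order_value": 0.0},
    {"order_value": 75.0},
]
kpis = digital_kpis(sessions)
print(kpis)  # conversion_rate 0.5, avg_order_value 50.0, revenue_per_visit 25.0
```

Computing every country's figures through one such shared function is what makes cross-country comparison in the dashboard meaningful.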
© 2017 Deloitte The Netherlands Fast Moving Consumer Goods Analytics Framework 8
Case study – Brand Analysis
Investigating brand perceptions by assessing positive and negative opinions
regarding the firm

Challenge
The firm wanted to investigate its brand perception by assessing positive and negative opinions regarding the firm. They wanted to be able to highlight locations showing positive and negative perceptions. The client also wanted to compare their firm with the main competitors in order to create a data-driven brand strategy.

Approach
The project involves a web spider which extracts related and unstructured data from the internet from a number of different sources (social media, blogs, news feeds, etcetera). The analysis is then carried out in a text mining tool to process the data for sentiment-related content and output the results to an interactive dashboard for visualization.

Results
The results of the analysis include sentiment scores across the business areas and a root cause analysis. These enable a real-time understanding of the firm's online brand and identification of the differentiating factors between positively and negatively perceived programs/areas. The delivered insights can be used to determine the actions needed to promote the firm's brand in certain programs/areas.

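The sentiment-scoring step can be sketched in its simplest lexicon-based form. This is an illustrative stand-in for the text mining tool mentioned above, not its actual implementation; the word lists are invented examples.

```python
# Illustrative lexicon-based sentiment scorer: score in [-1, 1] based on
# counts of matched positive vs. negative words. Word lists are invented.

POSITIVE = {"great", "love", "excellent", "tasty"}
NEGATIVE = {"bad", "hate", "terrible", "stale"}

def sentiment_score(text):
    """Return (#positive - #negative) / #matched words, or 0.0 if none match."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    matched = pos + neg
    return (pos - neg) / matched if matched else 0.0

print(sentiment_score("I love this brand, excellent taste"))  # 1.0
print(sentiment_score("Terrible service and stale product"))  # -1.0
```

Aggregating such scores per business area, and comparing them against scores computed on competitor mentions, yields the comparison described in the results.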
© 2017 Deloitte The Netherlands Fast Moving Consumer Goods Analytics Framework 9
Case study – Omni channel voice of the customer
Analysis of customer voice topics and sentiment across multiple channels

Challenge
Customers leave their voices across different channels such as the company website, third-party resellers, customer service emails, telephone and social media. Capturing, classifying and combining data from these channels is challenging. Our solution enables CMOs to focus their attention where it is most required.

Approach
This proof of concept focuses on three channels (own website, third-party website and social). First, web scraping is used to collect raw customer voices from different channels in different markets. Then a classification model is used to identify key topics and subtopics for each voice, another classification model is used to identify the product (category) of the topic, and finally sentiment analysis is performed on each of the voices. The results are visualised in an interactive dashboard.

Results
• The solution provides insight into the sentiment of voices per product category, per market
• Key topics are visible and trending topics can be assessed by product category, channel or market
• The solution provides a quick overview of all voices across all products, channels and markets, but also enables drill-down to the voice level

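The topic-classification step above can be sketched with a simple keyword-matching baseline. The real engagement used trained classification models; this is a hedged simplification, and the topics and keywords are invented for illustration.

```python
# Hedged sketch of the topic classification step: each customer "voice" is
# assigned the topic whose keyword set it matches most often. Topics and
# keywords are invented examples, not the client's taxonomy.

TOPIC_KEYWORDS = {
    "delivery": {"late", "delivery", "shipping", "arrived"},
    "quality": {"broken", "quality", "taste", "defect"},
    "price": {"expensive", "price", "discount", "cheap"},
}

def classify_topic(voice):
    words = {w.strip(".,!?").lower() for w in voice.split()}
    scores = {t: len(words & kw) for t, kw in TOPIC_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "other"

print(classify_topic("My delivery arrived two weeks late"))  # delivery
print(classify_topic("The price is way too expensive"))      # price
```

Counting classified voices per topic, channel and market produces the frequency and trending views shown in the dashboard.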
© 2017 Deloitte The Netherlands Fast Moving Consumer Goods Analytics Framework 10
Case study – Marketing Mix ROI
The use of combined online & offline Marketing Mix Modelling to improve the
Marketing ROI

Challenge
Deloitte was engaged to improve the return on marketing spend and optimize the advertising investment mix across both offline and online channels simultaneously, with disparate departments, differing measurement systems and differing priorities. This case was executed for an omnichannel retailer.

Approach
First, the metrics needed for the model were prioritized across products, channels and categories. A data warehouse was built to hold the required variables for each product needed to continuously run the Marketing Mix Modelling. With all the data present, the Marketing Mix Model was developed to optimize marketing ROI using scenario analysis and optimization models. Finally, a marketing ROI tracking system was implemented to continuously track the results of the models.

Results
• The most significant result was that the marketing ROI doubled over a two-year period
• To ensure recurring improvement, an investment mix allocation change was implemented
• Finally, there was also a strategy shift to target the most profitable customers

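The optimization idea behind marketing mix modelling can be illustrated with a toy allocator: each channel gets a diminishing-returns response curve, and budget is handed out in small increments to whichever channel currently offers the highest marginal return. The channel names and response parameters are invented; the actual model was far richer.

```python
import math

# Toy marketing-mix allocator under invented diminishing-returns curves:
# budget goes step by step to the channel with the highest marginal return.

def response(spend, scale):
    """Concave response curve: sales attributed to a given spend level."""
    return scale * math.log1p(spend)

def allocate(budget, channels, step=1.0):
    spend = {c: 0.0 for c in channels}
    for _ in range(int(budget / step)):
        best = max(channels,
                   key=lambda c: response(spend[c] + step, channels[c])
                                 - response(spend[c], channels[c]))
        spend[best] += step
    return spend

channels = {"tv": 100.0, "search": 140.0, "display": 60.0}
plan = allocate(300.0, channels)
print(plan)  # the highest-scale channel ("search") receives the most budget
```

Because each curve is concave, the greedy loop approximately equalizes marginal returns across channels, which is exactly the condition an optimal mix must satisfy.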
© 2017 Deloitte The Netherlands Fast Moving Consumer Goods Analytics Framework 11
Case study – Pricing Strategy
Using analytics to reshape pricing strategies

Challenge
Years of inorganic growth and sales-led customer negotiations led to tailored pricing across trade customers, resulting in large and difficult-to-defend price variance across customers. Pricing differences between accounts exposed this CPG client to downward pressure on pricing when trade partners consolidated or buyers moved retailers. The existing pricing and trade terms structure was not compliant with internal accounting standards.

Approach
Deloitte developed a consistent, commercially justifiable list of pricing and trading terms. The potential impact of the new pricing and terms on customers was assessed and a high-level roadmap for execution was established. The business is supported in the preparation for the implementation of the new pricing and trading terms.

Benefits
• Single common list price for each product
• Revised 'pricing waterfall' and trade terms framework
• Customer- and product-level impact analysis
• Trade communication strategy
• High-level implementation roadmap

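The demand-variation analysis named in the framework (demand at different price levels) is commonly grounded in a constant-elasticity model. The sketch below fits log(q) = a + b·log(p) by ordinary least squares, where b is the price elasticity; the observations are synthetic, generated from a known elasticity of -1.5 purely to show the mechanics.

```python
import math

# Fit a constant-elasticity demand model log(q) = a + b*log(p) with OLS;
# the slope b is the price elasticity. Data below is synthetic.

def elasticity(prices, quantities):
    xs = [math.log(p) for p in prices]
    ys = [math.log(q) for q in quantities]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    return b

# Synthetic observations generated from q = 1000 * p^-1.5 (elasticity -1.5).
prices = [1.0, 1.5, 2.0, 2.5, 3.0]
quantities = [1000 * p ** -1.5 for p in prices]
print(round(elasticity(prices, quantities), 2))  # -1.5
```

Estimating such elasticities per customer segment is what allows a defensible, commercially justified list price per product rather than negotiated one-offs.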
© 2017 Deloitte The Netherlands Fast Moving Consumer Goods Analytics Framework 12
Case study – Trade Promotion Effectiveness
Building a shared reporting and analysis solution that allows for
trade promotion evolution

Challenge
The client's desired end state with regard to BI was a single, integrated and shared reporting and analysis solution, delivering value in a single version of the truth throughout the organization. As part of this solution they wanted to gain insight into trade promotion effectiveness along two key dimensions: promotional performance and promotional planning. This case was executed for a CPG client.

Approach
Interviews within the company showed that trade promotion management & evaluation is not a focus at corporate level, but very important at regional level. In order to create a cohesive overview of trade marketing effectiveness across different dimensions (regions, channels, categories, products & sales person), Deloitte had to tie several data sources together, such as GfK panel data, Nielsen scanning data, IRI data and the client's own factory data.

Results
• A tool that allows the client to evaluate trade promotion performance, so they can assess the success of promotions and the drivers behind it
• A tool that allows the client to evaluate trade promotion planning, giving easy insight into whether execution follows the year's promotion plan and whether trade promotion spend and discounts are in line with planning & budget

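The core of promotional-performance evaluation is comparing promo-period sales against a baseline estimated from non-promo weeks, then relating the incremental margin to the promotion spend. A minimal sketch, with invented figures:

```python
# Promotional uplift vs. a non-promo baseline, and a simple promo ROI.
# All figures are invented for illustration.

def promo_uplift(non_promo_sales, promo_sales):
    """Incremental units sold during a promotion vs. the non-promo baseline."""
    baseline = sum(non_promo_sales) / len(non_promo_sales)
    return sum(s - baseline for s in promo_sales)

def promo_roi(uplift_units, margin_per_unit, promo_spend):
    return uplift_units * margin_per_unit / promo_spend

uplift = promo_uplift([100, 110, 90, 100], [180, 220])  # baseline = 100
print(uplift)                       # 200.0 incremental units
print(promo_roi(uplift, 2.0, 250))  # 1.6 -> each euro spent returned 1.60
```

Real evaluation tools refine the baseline (seasonality, trend, cannibalization), but the uplift-vs-baseline comparison is the driver behind the "success of promotions" view above.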
© 2017 Deloitte The Netherlands Fast Moving Consumer Goods Analytics Framework 13
Case study – Competitor Intelligence
Creating an overall view of the category market position

Challenge
It is important to understand how products are offered to the end consumers via the different retail outlets; therefore, understanding the competitive market of suppliers as well as retailers is key. The aim of this initiative is to combine disparate data sources in order to develop a solid understanding of the market position at individual product and category level. This case was executed from a retailer's point of view but can be directly applied to FMCG companies.

Approach
Developing a workflow tool to obtain an overall view of the market as well as an interactive dashboard on product sales and market positioning, by identifying and combining different data sources such as:
• Internal market sales & market research
• Third-party (retailers') sales data (e.g. Nielsen)
• External data sources

Results
• Overall view of the market positioning at individual product/category level
• Ability to focus on root cause analyses for positive or negative developments in product/market sales using interactive dashboarding
• Uncover relative market positions of product groups vis-à-vis main competitors

FMCG Analytics Framework – Manufacturing

FMCG Value Chain – Manufacturing
In the Manufacturing process of the FMCG value chain, analyses focus on optimizing
production processes, taking into consideration forecasting, planning, efficiency
and risk exposure

Manufacturing
Production Forecasting Optimization Asset Analytics
Analyses focus on the evaluation of promotion forecasting based on a Analyses focus on the prediction of the lifetime of long term assets such as
measurement framework of forecasting accuracy/error, bias and building, large machinery and other structural elements. This is done by
stability. Improving forecasting accuracy can potentially lead to calculating the influence of for instance weather, material and usage of
reductions in excess inventory, lower labour costs, lower expedite the assets.
costs, holding costs, spoilage discounts and reduced stock-outs.

Production Efficiency Production Planning


Analyses focus on proactively addressing challenges such as the increase of Analyses focus on the support of delivering more scientific and data based
efficiency and reduction of costs, but also to help identify opportunities for real time contingency plans by generating optimal solutions in short time
consolidating facilities and determine outsourcing and offshore transfer windows after certain disruptions happen.
solutions for international and domestic organizations.

Workforce Safety Quality Analytics


Analyses focus on the identification of the key factors impacting safety Analyses focus on identifying the high impact issues and understanding
related incidents, the design of measurable interventions to minimize a facility’s past performance. The solution consolidates information
safety risk and the prediction of types of person(s) who are most at allowing a better understanding of the organization’s scope drilling
risk to get hurt. down to a single facility to make actionable decisions.

Case study – Production Forecasting Optimization
Production forecasting is a key capability for many manufacturers; improving
forecasting performance is vital to reduce product stock-outs while decreasing costs
due to excess inventory

Challenge
Accurate forecasting is a key ability to ensure competitive advantage for every manufacturer. Improving forecasting capability should be a continuous effort, in which periodic or continuous evaluation of forecasting performance is an important element. Forecasting demand in FMCG is challenging for three main reasons: (1) noise and volatility of demand in the market, (2) introduction of new products and (3) product promotions.

Approach
Promotion forecasting evaluation is performed based on a three-pronged measurement framework. Performance is measured in terms of (1) forecasting accuracy (or forecasting error), (2) forecasting bias and (3) forecasting stability. For each of these measurements several metrics exist, and care should be taken to use the most suitable performance metric. Diagnostic views include accuracy per planner and per lag, autocorrelation of errors (an extra-large error every x periods may indicate consistently incorrect assumptions), and noise versus bias in the forecast performance.

Results
Improving forecasting accuracy can lead to reduced excess inventory, lower labor costs, lower expedite costs, holding costs, spoilage discounts and reduced stock-outs.
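The three-pronged measurement framework can be made concrete with standard metric definitions: MAPE for accuracy, mean error for bias, and mean absolute revision between planning cycles for stability. The example numbers are invented; real evaluations would choose among several metric variants per dimension.

```python
# Sketch of the three forecast-performance dimensions described above:
# accuracy (MAPE), bias (mean error) and stability (mean absolute revision
# of the forecast between two planning cycles). Numbers are invented.

def mape(actual, forecast):
    return sum(abs(a - f) / a for a, f in zip(actual, forecast)) / len(actual)

def bias(actual, forecast):
    return sum(f - a for a, f in zip(actual, forecast)) / len(actual)

def stability(forecast_prev_cycle, forecast_this_cycle):
    return sum(abs(b - a) for a, b in
               zip(forecast_prev_cycle, forecast_this_cycle)) / len(forecast_this_cycle)

actual = [100.0, 120.0, 80.0]
forecast = [110.0, 120.0, 100.0]
print(round(mape(actual, forecast), 3))  # 0.117
print(bias(actual, forecast))            # 10.0 -> systematic over-forecasting
```

A positive bias with acceptable MAPE, as here, is exactly the kind of structural over-forecasting pattern the framework is designed to surface.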
Approach – Production Efficiency
Analytics is imperative to quickly and comprehensively evaluate your production
process, identify opportunities for improvement and customize solutions that quickly
drive measurable results

Challenge
At any stage of a company's evolution, improving operating performance is important. Lean methodologies applied to nearly any organization enable an efficient and lean enterprise. Analytics can support manufacturers in proactively addressing the challenges they face today. Applied correctly, analytics can become a major driver for Lean Six Sigma and other process improvement disciplines seeking to increase efficiency and reduce costs.

Approach
Analytics assist management teams in devising the appropriate process control strategy and support its implementation. Different methods are applied to uncover potential inefficiencies and cost reduction opportunities, such as:
• Outlier detection
• Predictive modelling
• Scenario modelling
• Optimization & simulation

Results
Key results of production efficiency analytics include:
• Identifying opportunities for consolidating facilities, outsourcing and offshore transfer solutions
• Identifying unprofitable product lines for manufacturing operations
• Reducing idle time for production facilities
• Reducing defects and waste

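The first method in the list above, outlier detection, can be sketched as a simple z-score screen, here over invented cycle times of a production step. Real engagements would use more robust detectors, but the principle is the same: flag runs that deviate far from typical performance.

```python
import math

# Simple z-score outlier screen over production cycle times (invented data):
# flag values more than `threshold` standard deviations from the mean.

def zscore_outliers(values, threshold=2.0):
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [v for v in values if std and abs(v - mean) / std > threshold]

cycle_times = [10.1, 9.8, 10.3, 10.0, 9.9, 10.2, 17.5]  # one slow run
print(zscore_outliers(cycle_times))  # [17.5]
```

Flagged runs become the starting point for root-cause work on idle time, defects and waste.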
Case study – Workforce Safety Analytics
Thorough understanding of the dynamics of workplace incidents through the use of
advanced analytics

Challenge
Traditional safety analytics defines the scale of the safety problem, but routinely lacks insight into why those safety events occur. A strategic safety profiling analysis can:
• Objectively identify the key factors and behaviours that impact safety-related incidents and then design measurable interventions to minimize safety risk
• Use the profiling model to predict which type of person(s) will get hurt and which employees are most at risk

Approach
Three years of employee and contractor related data sets, covering over 1,000 unique employees, were analysed. Next, a model was estimated based on this data and the results were visualized in a dashboard.

Results
• Reduced overall safety risk profile and associated disruption costs
• Actionable and targeted recommendations on what operational changes to consider to help minimise incidents
• Ability to track, measure and report on the effectiveness of the safety compliance program and internal efforts to minimise risk

Case study – Asset Analytics
Asset Analytics enables effective decision making by identification and quantification
of asset-related risks

Challenge
For a water distribution utility company, Deloitte developed a model to predict maintenance of pipes. Asbestos cement pipes may fail due to deterioration caused by lime-aggressive water, in combination with other factors such as traffic loads, point loads and root growth. Failures could have major consequences for the water utility, customer satisfaction, safety and the environment.

Approach
During a five-week project, asset data such as lime aggressiveness of the water, diameter, wall thickness and age of the pipes was combined with geographical data such as region, soil type, pH and groundwater level. Based on this dataset, three predictive models were trained and evaluated to predict the deterioration of the cement pipes due to lime-aggressive water.

Results
The analysis revealed which asset properties and geographical variables were most informative in predicting pipe failure. Combined with information about the consequences of pipe failure, a quantitative risk model for the failure of cement pipes could be developed.

Case study – Production Planning
By taking into account certain production planning variables this analysis enables
real time contingency planning for a complex, multi-layered network in case of disruptions

Challenge
Analytics supports production planners in proactively addressing possible unforeseen planning challenges. This analysis enables real-time contingency planning for a complex, multi-layered supply chain network when disruptions happen, by taking into consideration information about cost, service level and historical disruption durations.

Approach
An optimal routing plan for a supply chain network is generated under normal conditions using network programming with the following input: manufacturing costs, capacity and the customer demand of retailers. Disruptions are handled in real time, resulting in a better-suited contingency plan, which enables cost reductions.

Results
Compared to traditional predefined contingency plans, a real-time contingency plan is set up (also incorporating the current supply chain status, including initial stock, utilization rate, etc.) to achieve the expected customer service level with cost efficiency.

Case study – Quality Analytics
Quality Analytics enables filtering down to high-impact issues and understanding a
facility's past performance

Challenge
The client was an organization responsible for assessing the security compliance of a large number of organizations. Disparate reporting and data collection techniques made it difficult for staff and leadership to prioritize action and identify problem areas.

Approach
The dashboard gathered all facility information consistently and provided the ability to filter down to high-impact issues and understand a facility's past performance. The solution consolidates all the organization's information, allowing the user to understand the scope of their organization while also being able to drill down to a single facility in order to make actionable decisions.

Results
The solution provides views for the three types of individuals in the organization (Representative, Field office and Regional Manager) as well as prioritization tools and facility details. The tool allows an individual user to focus on high-priority facilities, with changing definitions of priority. In addition, each user can see all the information they need to understand the scope of their assignments and make decisions.

FMCG Analytics Framework – Logistics

FMCG Value Chain – Logistics
In the Logistics process of the FMCG value chain, analyses are focused on optimizing
delivery, shipments and warehousing performances

Logistics
Location Analytics
This type of analysis helps solve the problem of what the optimal location is for a certain facility, based on geographical data. As an example, the fire department would want their facilities spread throughout a city, so that a fire at any point in the city can be reached within an acceptable response time.

Supply Chain Diagnostics
Supply chain diagnostics aims at enabling and improving the ability to view every item (shipment, order, SKU, etc.) at any point and at all times in the supply chain. Furthermore, its goal is to alert on process exceptions, to provide analytics, and to analyse detailed supply chain data to determine opportunities for cycle time reduction.

Inventory Diagnostics
What is the optimal inventory level that on the one hand makes sure that the customer receives their goods on time, and on the other hand ensures that the holding and ordering costs are as low as possible? The goal of the analysis is to solve this problem for the client.

Fulfilment Intelligence
Focuses on increasing the reliability of the purchase order submission process through to delivery, analysing the supply chain to identify common or consistent disruptions in the fulfilment of orders. Reliability is key, even more so than speed.

Resource & Route Optimization
The goal of the model is to optimize the available resources and truck routes. This is executed to maximize profitability by implementing the new, optimized route planning model, which leads to a reduction of resource usage.

Reverse Logistics
In case of malfunctioning products, companies have to deal with the process of reverse logistics. By getting more detailed insight into the costs of this process, companies can better focus on how to reduce these costs.

Approach – Location Analytics
Finding white spots in distribution centre locations

Challenge
A company in the Netherlands wanted to expand its business. They wanted to improve delivery times to the stores by creating one or more extra distribution centres in the Netherlands. The centres should be placed in such locations that they deliver maximum value in lower delivery times, now and in the near future.

Approach
The approach starts from the current distribution centre locations. From these locations the travel time to the stores is calculated using Dijkstra's algorithm, which gives for each location on the map the travel time to a distribution centre. These results are then visualized in a heatmap to immediately locate white spots in the store coverage. Furthermore, an optimization algorithm was run to determine the optimal distribution of distribution centres.

Results
With these results and the new locations (in the analogous fire department application of this analysis) it was possible to:
• Significantly reduce the response time, saving lives and reducing costs at the same time
• Reduce the total number of fire departments, while giving better response time performance

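The travel-time step named above, Dijkstra's algorithm, can be sketched on a tiny road graph. The network and edge weights (minutes) are invented; in the engagement this would run over a full road network for every map location.

```python
import heapq

# Dijkstra's algorithm on a small invented road graph (edge weights in
# minutes): shortest travel time from a source to every reachable node.

def dijkstra(graph, source):
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale queue entry
        for nbr, w in graph.get(node, {}).items():
            nd = d + w
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                heapq.heappush(heap, (nd, nbr))
    return dist

roads = {
    "dc": {"a": 10, "b": 25},
    "a": {"b": 10, "store": 30},
    "b": {"store": 5},
}
print(dijkstra(roads, "dc"))  # store reached in 25 minutes via a -> b
```

Running this from every candidate centre and taking the minimum per map cell yields exactly the travel-time heatmap used to spot white spots.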
Case study – Inventory Diagnostics
Delivering a robust and user friendly Global Transit Planning Tool

Challenge
To empower transportation personnel to more efficiently analyse ocean and air supply chain shipment data, a global operating company internally designed a Global Transit Planning (GTP) tool in Tableau. However, the tool did not achieve high user adoption, since analyses were not intuitive and extensive manual data updates were required. The Deloitte team was asked to enhance the tool and incorporate a robust data blending process.

Approach
Enhancing the GTP dashboard and blending the data was achieved in four subsequent phases: research, visioning, prototyping and iterating. In the prototyping phase, the team built and refined the dashboards and wrote a Python script which indicates how the various data sources should feed into the unified view of the data.

Results
The existing GTP tool was adjusted to provide maximum flexibility, automation and collaboration. The user flow allows users to interact in one cohesive interface, while providing tailored information for their specific role. The redesigned GTP tool is now well adopted within the organization and used on a monthly basis to enable more effective inventory planning decisions, resulting in gradual and continuous reduction of in-transit inventory.

Case study – Resource & Route optimization
Maximizing profitability by optimizing resource planning and route optimization

Challenge
A Dutch client that handled waste disposal for large companies struggled with its profitability. After analysis it was confirmed that one of the key issues was suboptimal resource planning. Resource planning of trucks and drivers was done manually, sometimes even by the drivers themselves. The client asked Deloitte to develop a system for finding optimal routes for their trucks.

Approach
First, Deloitte created an overview of all the different customer locations, the number of available trucks per location, the working hours and the pickup points. Next, the drive time matrix between the different locations was calculated. Subsequently, Deloitte created a model that uses a customized 'cost function' in which weights can be given to driving time and driving distance. The cost function is then optimized, providing the optimal routing for each truck for each day.

Results
The model Deloitte created was able to plan optimal routes for the different trucks much faster and more efficiently than the client was able to do. The new route planning model showed that it was possible to significantly reduce resource usage; it was possible to sell trucks without loss of client service and satisfaction.

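The customized cost-function idea can be sketched as follows: each leg is scored as a weighted combination of driving time and distance, and (purely for illustration) a nearest-neighbour heuristic builds a candidate route. The weights, locations and leg data are invented, and the real engagement used a full optimization model, not this heuristic.

```python
# Sketch of a weighted time/distance cost function for routing, with a
# nearest-neighbour heuristic as an illustrative route builder. All data
# and weights are invented.

def leg_cost(leg, w_time=0.7, w_dist=0.3):
    time_min, dist_km = leg
    return w_time * time_min + w_dist * dist_km

def greedy_route(depot, stops, legs):
    """Visit all stops from the depot, always taking the cheapest next leg."""
    route, current, remaining = [depot], depot, set(stops)
    while remaining:
        nxt = min(remaining, key=lambda s: leg_cost(legs[(current, s)]))
        route.append(nxt)
        remaining.remove(nxt)
        current = nxt
    return route

legs = {
    ("depot", "s1"): (10, 8), ("depot", "s2"): (30, 20),
    ("s1", "s2"): (12, 9), ("s2", "s1"): (12, 9),
}
print(greedy_route("depot", ["s1", "s2"], legs))  # ['depot', 's1', 's2']
```

Exposing the weights as parameters is what lets planners trade off fuel (distance) against driver hours (time) per client, as the case describes.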
Case study – Supply Chain Diagnostics
Providing insight into the key drivers of delivery in full and on time, and improving
coverage throughout the supply chain

Challenge
In this particular company, millions of products are continuously being produced and shipped to distribution centres around the globe. In order to satisfy customer demand in time, the coverage must be in order, i.e. the percentage of the products that arrives at the distribution centres on time and in full. In order to improve the coverage and meet the set targets, the company wanted insight into the drivers that most influence the coverage and eventually also the delivery in full and on time. Therefore they asked Deloitte to perform a detailed analysis of their data.

Approach
The '15-week coverage rate' was collected for a full year of orders. A clustering technique was used to cluster 26,000 coverage rates. This technique groups the coverage patterns into buckets of similar patterns, each of which comprises a single cluster. Eight buckets of different coverage patterns were visualized, and these buckets gave insight into the drivers of the coverage for the orders.

Results
Extracted key insights with incremental business potential, such as:
• Carrier performance has the largest impact on the coverage
• Good coverage is usually caused by slack in factory performance
• A significant number of orders were only slightly (1–7 days) late and could be quick wins
• Actionable insights to improve processes and areas of the order pipeline

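The clustering step above can be illustrated with a tiny k-means over weekly coverage-rate vectors (4 weeks here instead of 15). The patterns are invented, the source does not name the specific algorithm used, and centroids are initialized from the first k points so the run is deterministic.

```python
# Minimal k-means over coverage-rate patterns (invented 4-week vectors):
# group orders whose coverage builds up in similar ways into buckets.

def kmeans(patterns, k, iters=10):
    centroids = [list(p) for p in patterns[:k]]  # deterministic init
    labels = [0] * len(patterns)
    for _ in range(iters):
        # assign each pattern to the nearest centroid (squared distance)
        labels = [min(range(k), key=lambda c: sum((x - y) ** 2
                  for x, y in zip(p, centroids[c]))) for p in patterns]
        # move each centroid to the mean of its members
        for c in range(k):
            members = [p for p, l in zip(patterns, labels) if l == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

patterns = [
    [0.9, 0.95, 1.0, 1.0],  # on-time orders: coverage ramps up early
    [0.2, 0.4, 0.6, 0.9],   # late orders: coverage builds slowly
    [0.85, 0.9, 1.0, 1.0],
    [0.1, 0.3, 0.5, 0.8],
]
print(kmeans(patterns, 2))  # [0, 1, 0, 1] -> two coverage buckets
```

Profiling each resulting bucket against order attributes (carrier, factory, lane) is what turns the shape clusters into driver insights.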
Case study – Fulfilment Intelligence
Gaining insights into the digital order pipeline to improve order fulfilment and
speed of delivery

Challenge
Over the last years, online sales channels have become more and more important for companies. With the increase of online channels, however, customers have become more demanding in terms of delivery time and service. Reliability is therefore extremely important, even more so than speed. Therefore a large company asked Deloitte to create a clear picture of the direct-to-consumer online purchase order submission process through the different systems and to increase the reliability of this process.

Approach
Of the 70,000 total submissions, roughly 70% were completed within the allowed time. 22% were completed at roughly twice the maximum amount of time, 7% within about six times the maximum, and 1% took even longer. The analysis focused on the group completed in twice the maximum time (22%), which held the largest opportunity to identify the delay drivers. Timestamps were created for different stages of the order submission process, combined from multiple source systems, and a clustering on the deviation from the reference time per timestamp was performed.

Results
The analysis led to the identification of several steps within the process that could be improved with low effort for a relatively high gain. In total more than 20 improvements were made based on the analysis results, leading not only to a more reliable order submission process, but also to an average time reduction of 50% per order. As a result, customer satisfaction and loyalty increased.

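The breakdown above (within the allowed time, around twice it, around six times it, or longer) amounts to bucketing each order by how far its completion time deviates from the maximum allowed time. A minimal sketch with invented durations, normalizing the maximum allowed time to 1.0:

```python
# Bucket orders by completion time relative to the maximum allowed time
# (normalized to 1.0). Sample durations are invented.

def bucket_orders(durations, max_allowed=1.0):
    buckets = {"within": 0, "<=2x": 0, "<=6x": 0, ">6x": 0}
    for d in durations:
        ratio = d / max_allowed
        if ratio <= 1:
            buckets["within"] += 1
        elif ratio <= 2:
            buckets["<=2x"] += 1
        elif ratio <= 6:
            buckets["<=6x"] += 1
        else:
            buckets[">6x"] += 1
    return buckets

durations = [0.5, 0.8, 1.0, 1.9, 2.1, 5.5, 7.0]
print(bucket_orders(durations))  # {'within': 3, '<=2x': 1, '<=6x': 2, '>6x': 1}
```

Applied to all 70,000 submissions, this kind of bucketing reproduces the 70/22/7/1 split that pointed the analysis at the 2x group.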
Case study – Reverse Logistics
Reducing costs on reverse logistics by analysing end-to-end process

Challenge
A global technology firm struggled with high costs in their service logistics. The scope of service logistics consisted of shipping parts to client sites and returning the defective parts to global re-manufacturing sites. Clients were served with premium service levels (i.e. <4 hour recovery). Deloitte was asked to make a fact-based assessment of the service logistics process and advise how costs could be reduced.

Approach
The reasons why customers contacted the service desk were analysed; it turned out that 80% of the problems could be resolved by online support. Of the remaining 20%, 80% of the problems could be resolved by second-line support. For the 4% that could not be resolved this way, a replacement needed to be sent. After inspection, it turned out that half of these returned units actually did not have any malfunction.

Results
The main opportunity for savings was not in the cost of logistics (driven by the stringent service levels and unpredictable failure rates), but in avoiding cost, i.e. reducing the number of replaced products that turned out to be non-defective. These savings should be realized by continuously improving the online information and the customer services departments.

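The resolution funnel above can be checked with a quick worked computation: 80% resolved online, 80% of the remainder by second-line support, the residual 4% receiving a replacement, of which half turn out to be non-defective. The contact count is invented; the percentages follow the case text.

```python
# Arithmetic check on the service-desk resolution funnel described above.

contacts = 1000
online = contacts * 0.80                  # 800 resolved by online support
second_line = (contacts - online) * 0.80  # 160 resolved by second-line support
replacements = contacts - online - second_line
non_defective = replacements * 0.5        # avoidable replacement shipments

print(int(replacements))   # 40 -> the 4% needing a replacement
print(int(non_defective))  # 20 -> 2% of all contacts are avoidable cost
```

That 2% of all contacts, each incurring premium-service shipping both ways, is the avoidable cost the recommendation targets.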
FMCG Analytics Framework – Business Management & Support

FMCG Value Chain – Business Management & Support
In the Support process of the FMCG value chain analyses are focused on determining
potential improvements in the organization

Business Management & Support


Workforce Analytics
Encompasses workforce planning and analytics across all phases of the talent lifecycle. The workforce planning
component provides insights and foresight into addressing current and future talent segment related challenges
and development. Moreover, this offering applies analytics solutions to key talent processes.

Business Process Analytics
Helps clients to understand their risk exposure better, and to proactively identify and mitigate sources of risk
on an enterprise scale. Armed with this information, executive management and boards will be better equipped to
navigate challenging economic conditions.

Sustainability Analytics
Helps clients with sustainability related strategies such as assessing future environmental and health impacts.
Using an overview of the most important resources and insights into the product lifecycle, a prioritisation can
be made of which product categories are most at risk and which show the most potential.

Program/portfolio analytics
Enables clients to model their program/portfolio performance by providing fact based insight into the
performance of the total portfolio down to project level. Among other things, it allows clients to prioritize
projects better, identify potential budget overruns at an early stage and optimize resource allocation.

Finance Analytics
• Working capital, spend analytics, double payment, risk and tax analyses
• Helping clients to get control of their financial data; finance analytics enables clients to model business
processes and gain deeper insight into cost and profitability drivers.

Case study – Workforce Analytics
Strategic Workforce Planning: planning the talent needed for sustainable growth

Clear process to ensure evidence-based outcomes

Challenge
Clients experience a continuously changing environment in which they have to operate. Within this environment,
new products and new sales channels are discovered. To gain full advantage of these new opportunities, a variety
of new skills within the workforce is needed

Workforce planning tools

Approach
Using data from different sources such as People, Customer, Work and Finance data, insights can be derived by:
• Identifying critical workforce segments: mapping segments/skills that drive a disproportionate amount of value
creation in comparison to their peers
• Identifying current demand drivers and defining a demand model
• Defining and executing a workforce plan to analyze gaps in the current supply and demand for the critical
workforce

Results
Clients get a view of how they should move from the current workforce to the workforce needed in 5 years from now.
The approach used makes sure that clients can use evidence based decision making, supported by a variety of
fact based workforce planning tools
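At its core, the supply/demand gap analysis described above compares projected headcount demand against projected supply per critical segment. A minimal sketch; the segment names and figures are invented for illustration:

```python
# Hypothetical 5-year headcount projections per critical workforce segment
demand = {"data engineers": 120, "e-commerce sales": 80, "plant operators": 200}
supply = {"data engineers": 70, "e-commerce sales": 85, "plant operators": 180}

# Positive gap = shortage to close through hiring or reskilling
gaps = {segment: demand[segment] - supply.get(segment, 0) for segment in demand}
shortages = {segment: gap for segment, gap in gaps.items() if gap > 0}
print(shortages)
```

Segments with a negative gap (a projected surplus) are candidates for redeployment rather than hiring.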

Case study – Sustainability Analytics
Sustainability analytics enforces company’s sustainability-related initiatives

Prioritization of
Product Categories

Challenge
Sustainability analytics can help companies reduce key resource use while at the same time making them less
vulnerable to price and supply volatility. Future risks and opportunities can be identified in areas such as
environmental and health impacts – both within the organization and across the extended supply chain. The
challenge lies in generating the most influential insights from relevant data. These insights are necessary to
develop sustainability related strategies and to improve overall (resource use) efficiency

Reduction Product Analysis

Approach
The approach is divided into three actions:
• Develop a normalized and comprehensive view of resource use to understand (and prioritize) the hot spots
• Conduct a comprehensive analysis of products/services lifecycles to quantify the risks/opportunities
• Align/develop a sustainability strategy using the results of the executed analyses

Supplier Ranking

Results
• Prioritization of product categories: an identification of the top product categories and a prioritization of
categories with most improvement potential
• Reduction product analysis: development of an implementation strategy and value propositions for the
opportunities of the highest prioritized product groups (how to reduce costs, increase customer preference and
reduce risk)
• Supplier ranking: ranking of suppliers based on sustainability performance to create individualized
“sustainability report cards” which can be integrated in category buying decision making
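A supplier report card of this kind typically reduces to a weighted score over normalized indicators. The sketch below is illustrative only: the indicator names, weights and scores are assumptions, not case data:

```python
# Indicator weights (assumed); scores are normalized so that 1.0 = best in class
weights = {"co2": 0.5, "water": 0.3, "waste": 0.2}

suppliers = {
    "Supplier A": {"co2": 0.9, "water": 0.7, "waste": 0.8},
    "Supplier B": {"co2": 0.6, "water": 0.9, "waste": 0.5},
    "Supplier C": {"co2": 0.8, "water": 0.8, "waste": 0.7},
}

def sustainability_score(indicators):
    # Weighted sum over the shared indicator set
    return sum(weights[name] * indicators[name] for name in weights)

ranking = sorted(suppliers, key=lambda s: sustainability_score(suppliers[s]),
                 reverse=True)
print(ranking)
```

The resulting ranking can feed directly into category buying decisions, as the case suggests.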

Case study – Working capital
“The Dash for Cash”: Using the Deloitte WCR Cashboard to drive sustainable
performance improvements in working capital

WCR Cashboard

Cash-Conversion-Cycle

Challenge
As companies try to stay their course in the downturn and beyond, cash is back as king. Working capital is one
of the few remaining areas which can rapidly deliver a significant amount of cash to a business without a large
restructuring program.
The client asked Deloitte to help in the challenge to free up working capital. Reducing working capital in the
short term is fairly easy; making reductions sustainable and changing the mind-set in operations to that of a
CFO is more difficult

Payables – Purchase to Pay

Approach
To enable sustainable reductions, Deloitte deploys a cash-oriented, entrepreneurial approach to working capital
management that focuses on concrete actions and creating a “cash flow mind-set" to shorten the cash conversion
cycle. The Cashboard™ is a flexible & configurable dashboard that is powerful but still exceptionally easy to
use. As such, it allows frontline operations staff at companies to zoom in on the key opportunities, risks,
trade-offs and root causes

Inventory – Forecast to Fulfill

Receivables – Order to Cash

Results
• It enables continuous monitoring of the working capital levels throughout the entire company – including all
business units and all geographies
• The interactive environment enables context driven analysis by time, customer, product, business line etc.
• Real time insight into current performance
• Easily adjustable and expandable to your company’s specific needs
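The cash conversion cycle monitored here follows the textbook decomposition into receivables, inventory and payables days, matching the three process views above (order to cash, forecast to fulfill, purchase to pay). A sketch with illustrative balance-sheet figures, not numbers from the case:

```python
def cash_conversion_cycle(receivables, inventory, payables, revenue, cogs,
                          days=365):
    dso = receivables / revenue * days  # Days Sales Outstanding (order to cash)
    dio = inventory / cogs * days       # Days Inventory Outstanding (forecast to fulfill)
    dpo = payables / cogs * days        # Days Payables Outstanding (purchase to pay)
    return dso + dio - dpo

# Illustrative figures (e.g. in millions)
ccc = cash_conversion_cycle(receivables=50, inventory=40, payables=30,
                            revenue=400, cogs=250)
print(round(ccc, 1))  # days of cash tied up in working capital
```

Shortening any of the three components (collect faster, hold less stock, pay later) shortens the cycle and frees up cash.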

Case study – Spend Analytics
Deloitte Spend & Procurement Analytics provides deep insight in the composition of
the volume of spend and identifies key savings opportunities

General overview

Challenge
The client was struggling with identifying improvement opportunities because of inaccessible information. As a
result, the client was unable to drill down and analyse individual orders, and problem solving was limited to
the strategic level.
The client asked Deloitte to help identify opportunities for continuous improvement for cost reduction and
provide additional insights into the spending trends of the organization
Price analytics

Approach
Our Spend & Procurement Analytics approach facilitates a short time-to-deploy, delivers easy-to-use insight and
contains these key components:
• Easy upload of procurement data through standard interfaces
• Engine to create a bottom up calculation of your company’s most important Spend KPIs
• Interactive dashboard enabling context driven analysis by time, supplier, product, business line

Supplier view

Geographical view

Results
Through the Spend & Procurement Analytics Dashboard, efficiency and savings opportunities can be identified in
several areas:
• Improve process efficiency by identifying fragmented spend and invoicing
• Identify and expel maverick buying
• Negotiate better contracts
• Reduce costs by optimizing the purchase to pay process
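The "bottom up" spend calculation is essentially a spend cube: invoice lines aggregated by category and supplier. A minimal stdlib sketch (the data and the fragmentation threshold of three suppliers per category are invented) that flags fragmented spend, one of the savings areas listed above:

```python
from collections import defaultdict

# Invoice lines: (category, supplier, amount) — illustrative data
invoices = [
    ("packaging", "Supplier A", 120_000),
    ("packaging", "Supplier B", 15_000),
    ("packaging", "Supplier C", 9_000),
    ("logistics", "Supplier D", 300_000),
]

# Aggregate spend per category per supplier
cube = defaultdict(lambda: defaultdict(float))
for category, supplier, amount in invoices:
    cube[category][supplier] += amount

# Categories spread over 3+ suppliers are candidates for consolidation
fragmented = [category for category, per_supplier in cube.items()
              if len(per_supplier) >= 3]
print(fragmented)
```

The same cube supports the other analyses on the slide: maverick buying shows up as spend with non-contracted suppliers, and price analytics compares unit prices within a category.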

Case study – Double Payments
Because paying once is enough!

Double invoice tracker


Challenge
Who pays their invoices twice? Well, for one, all major organizations in 1% of the cases. They usually know this
but have no means of pinpointing exactly which invoices are paid twice.
Many organizations check for invoices paid twice, but rarely detect them all. This can be caused by inaccurate
master data or errors in invoice entries. The organization asked Deloitte to help detect double payments in a
better way

Approach
The Deloitte Double Invoice Tracker examines all individual invoices, over multiple periods, in full detail. The
Invoice Tracker detects inaccuracies in the master data by using specially designed algorithms.
By cleverly cross-referencing inaccuracies in the master data with those in the invoice entries, the Double
Invoice Tracker can find lost cash and provide insights into the master data quality

Results
The Deloitte Double Invoice Tracker saves money and helps improve master data quality by giving:
• An overview of all the invoices paid twice, including supplier information, so the restitution process can be
started immediately
• Insights into the master data quality
• Insights into the aggregate purchasing expenditures and how these are divided
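The detection idea can be illustrated with a much simpler stand-in for the proprietary matching algorithms: normalize supplier names and invoice references so that entry variations collide on the same key, then report keys with more than one payment. Everything below, including the data, is illustrative:

```python
from collections import defaultdict
from datetime import date

# (supplier, invoice reference, amount, payment date) — made-up entries
invoices = [
    ("Acme BV",   "INV-001", 1250.00, date(2017, 3, 1)),
    ("ACME B.V.", "inv-001", 1250.00, date(2017, 3, 8)),  # same invoice, re-entered
    ("Acme BV",   "INV-002", 480.00,  date(2017, 3, 9)),
]

def normalize(supplier, reference, amount):
    # Strip case and punctuation so "ACME B.V." matches "Acme BV"
    sup = "".join(ch for ch in supplier.lower() if ch.isalnum())
    ref = "".join(ch for ch in reference.lower() if ch.isalnum())
    return (sup, ref, round(amount, 2))

groups = defaultdict(list)
for supplier, reference, amount, paid_on in invoices:
    groups[normalize(supplier, reference, amount)].append((supplier, paid_on))

duplicates = {key: hits for key, hits in groups.items() if len(hits) > 1}
print(duplicates)
```

A real tracker would add fuzzier matching (near-identical amounts, date windows, supplier bank accounts) precisely because master-data errors rarely collide this cleanly.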

Solution – Business Process analytics
Deloitte’s process analytics solution Process X-ray reconstructs what really happened
in the process and provides the capabilities to find the root cause

End-to-end process view

Challenge
Process variation is at least 100 times greater than clients imagine. In fact, 5,000 or more variations are
common in most end-to-end processes. Such high levels of variability are a natural enemy of scalability,
efficiency, and process control.
Process execution is facilitated by different departments and functions, making it difficult to get an
end-to-end view of the process

Process X-ray™

Throughput times

Approach
Process analytics provides visibility of what is really happening based on the actual event data captured in
transactional systems. This is far different from the subjective recollections or assertions of people.
It provides end-to-end visibility of the process, tearing down the walls between functions and departments and
providing an internal benchmark.
Process analytics offers the scalability to analyze large volumes of transaction data from different systems
(SAP, Oracle, JDEdwards, SalesForce, etc.)

Handovers

Results
Organizations can benefit from Process X-ray by:
• Reducing operational cost
  • E.g. identify and reduce rework activities
• Increasing control & compliance
  • Monitor segregation of duties
• Reducing working capital
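The core of this kind of process analytics is reconstructing each case's activity sequence (its "variant") from transactional event data and counting how often each variant occurs; the long tail of rare variants is where rework and control issues tend to hide. A toy sketch with invented events:

```python
from collections import Counter

# Transactional event data: (case_id, sequence_no, activity) — made-up
events = [
    (1, 1, "create order"), (1, 2, "approve"), (1, 3, "ship"),
    (2, 1, "create order"), (2, 2, "approve"), (2, 3, "ship"),
    (3, 1, "create order"), (3, 2, "change price"), (3, 3, "approve"), (3, 4, "ship"),
]

# Rebuild each case's ordered trace of activities
traces = {}
for case_id, seq, activity in sorted(events):
    traces.setdefault(case_id, []).append(activity)

# Count how often each distinct variant occurs
variants = Counter(tuple(trace) for trace in traces.values())
for variant, count in variants.most_common():
    print(count, " -> ".join(variant))
```

Real process-mining tools layer throughput times, handovers between departments, and conformance checks on top of exactly this trace reconstruction.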

Solution – Program Portfolio Analytics
Deloitte’s iPL solution enables timely monitoring by disclosing project portfolio
performance anytime anywhere

Deloitte’s iPL

Portfolio management

Challenge
Typical challenges that an organization faces relating to monitoring the portfolio performance:
• Getting performance reports is very time consuming and therefore the frequency of delivering these reports is
usually low
• The reports created are static and therefore provide no possibility to analyze at a detailed level and from
different perspectives
• Decision making is mostly based on one dimension only (e.g. time spent)

Finance

Approach
Deloitte’s iPL solution is aimed at fact based prioritization and tracking of project performance and enables
financial, resourcing, risk and issue analyses.
iPL combines data from multiple sources and visualizes the results in an interactive analysis environment which
can be accessed online

Resource management

Planning & Scheduling

Results
Organizations can benefit from iPL by:
• Prioritizing based on the progress made and effort utilized by projects
• Proactively managing potential underperforming projects
• Better predicting the cost at completion of the project based on current performance
• Resource gap analysis and earned value analysis (budget spent vs value delivered).

Project Approach

Our Analytic Insights project approach
Our comprehensive and flexible methodology for Analytics projects ensures we can deliver
business critical insights within time and budget

A typical Analytic Insights project takes 8-12 weeks, following three main phases connected to our approach:

Understand (2-3 weeks) → Analyze (4-6 weeks) → Insights (2-3 weeks)

The phases break down into six steps: Assess Current Situation, Acquire & Understand Data, Prepare & Structure
Data, Analyze & Model, Evaluate & Interpret, Report & Implement.

Approach
Our structured approach has been built up from our experience in analytical engagements. It comprises six steps
to maximize project oversight. Each step allows looping back to previous steps to apply the insights gained in
subsequent steps.

Critical success factors


To ensure maximal knowledge transfer in both ways we would need to work closely with key experts in client’s business and IT departments. Rapid access to potentially
disparate source data and support in understanding the data is essential in order to build up the data structures required for the analytical models.

Why Deloitte?

Deloitte maintains a market-leading global Analytics practice with extensive
experience in FMCG
We understand what your challenges are as well as the current and future analytics
market, placing Deloitte in a unique position to assist you

Global Reach
• With over 9,000 BI and analytics resources worldwide, we are recognized as one of the leading BI&A service
providers
• Unique combination of deep industry expertise, analytics capability and understanding of decision-makers’
roles to maximize value

Recognized leader in Analytics
• “Deloitte shows growth and innovation leadership through investment in acquisitions (with 22 analytics-related
acquisitions since 2010), technology partnerships, alliances and intellectual property.”
• “Deloitte has a strong focus on innovation, including Deloitte's Insight Driven Organization (IDO) Framework,
breakthrough labs to meet clients' demands, and Highly Immersive Visual Environment (HIVE) labs, as well as a
breadth of analytics accelerators.”
• “All is available through its global network of 21 Global Delivery Centers and 25 Deloitte Greenhouses.”
Source: Gartner, Magic Quadrant for Business Analytics Services, Worldwide. September 2015

Vendor independent
• We recognize the importance of the right technology, but we also understand the necessity of finding pragmatic
and efficient ways to iteratively build the required capabilities
• Our relationships with, and understanding of, technology vendors are strong, covering an impressive range of
different products – but, crucially, we remain vendor independent.
• We are focused solely on helping clients to develop a practical Information and Analytics strategy -
incorporating the necessary technologies and introducing the most appropriate vendors.

Deloitte’s areas of expertise in Analytics
We have built a wide area of expertise, covering all important streams within the field of
Analytics & Information Management

Big Data Management
A clear vision is imperative for success: policies, practices and procedures that properly manage the
collection, quality and standardization.
Big Data Management has to be future-proof and secure, and connect with more and more different data sources:
structured, unstructured, internal and external.
Big Data is all about processing huge amounts of data using commodity hardware that offers better information,
more insights and more opportunities for a growing business.

Data Discovery & Visualization
Visualization is vital to understand what can be done with data; storytelling, advanced visualization &
dashboarding are useful tools to determine what is happening.
Information has to be displayed correctly, clearly and without distraction, in a manner that can be quickly
examined and understood.
It allows the user to view both simple and complex data at a glance and see abnormalities, dependencies and
trends that would not have been apparent in tables.

Advanced Analytics
Understanding the data is key to advance to the next level; nowadays more advanced methodologies allow for an
even deeper understanding than ever before.
Methodologies such as Text Mining, Segmentation or Predictive Analytics go beyond traditional understanding.
This allows for actionable insights that have a direct effect on the business and can help the user to
understand what is happening and optimize their business.

Deloitte’s approach towards becoming an Insight Driven Organization (IDO)
Considering analytics with a wider lens than just technology

IDO Framework
• The application of analytics and its importance will increase in the coming years. The world is increasingly
complex and fast moving, which makes getting it right increasingly difficult.
• An IDO asks the right questions of itself; IDOs are more analytical, which improves the decision making
process and the identification of the most appropriate action.
• As an IDO, you could:
  • Make the same decisions faster
  • Make the same decisions cheaper
  • Make better decisions
  • Make innovations in products and services

Outlook | Non-IDO | IDO
Past | What has happened? | Why and how did it happen?
Present | What is currently happening? | What is the next best action?
Future | What is going to happen? | What does simulation tell us; the options; the pros and cons?

Planning the Journey
The journey towards becoming an IDO is about Evolution, not Revolution. The key principle is sequencing the
activities to deliver the Vision and early benefits, recognizing existing capability constraints.
Becoming an IDO relies on a foundation of the fundamental building blocks of People, Process, Data and
Technology, informed by an Analytics Strategy.

Making it Happen
Insight Process / Enabling Platform / Purple People

Deloitte Greenhouse
Deloitte Greenhouse offers different types of immersive analytics sessions

Analytics Lab

The Analytics Lab, hosted in Deloitte’s innovative Greenhouse environment, is an inspiring and energetic workshop to uncover the impact of data analytics and visualization for your
organization. Participants are provided with a unique opportunity to experience hands-on analytics in a fun and innovative setting, facilitated by Deloitte’s industry specialists and
subject matter experts.

Art of the Possible
An inspiring two-hour session including analytics and data visualization demos, used as a starting point for an
open discussion on the potential impact of analytics for your organization.

Visioning
A collaborative session to wireframe a custom analytics or visualization solution, supporting a selected
business challenge. The session is facilitated by Deloitte’s user experience, data visualization and analytics
experts.

Privacy by Design
Incorporate privacy (and security) in the design process of the data analytics application

Privacy by Design
The Privacy by Design (PbD) concept is to design privacy measures directly into IT systems, business practices
and networked infrastructure, providing a “middle way” by which organizations can balance the need to innovate
and maintain competitive advantage with the need to preserve privacy.
It is no flash-in-the-pan theory: PbD has been endorsed by many public- and private-sector authorities in the
European Union, North America, and elsewhere. These include the European Commission, European Parliament and the
Article 29 Working Party, the U.S. White House, Federal Trade Commission and Department of Homeland Security,
among other public bodies around the world who have passed new privacy laws. Additionally, international privacy
and data protection authorities unanimously endorsed Privacy by Design as an international standard for privacy.

Reasons
• Effective way to make sure compliance is reached already in the design phase (and maintained)
• Efficient: accommodating privacy enhancing measures is cost effective in the early stages of design
• Time available to make adjustments / look for alternatives

Adopting PbD is a powerful and effective way to embed privacy into the
DNA of an organization. It establishes a solid foundation for data
analytics activities that support innovation without compromising
personal information.

Deloitte took the basic principles of PbD and built them out into a full
method that can be used to apply privacy to almost any design –
whether it is IT-systems, applications or products, the latter specifically
significant now that the Internet-of-Things is coming upon us.

Contacts

Contact our Analytics Experts

Patrick Schunck
Partner – Lead Consumer Products Deloitte NL
pschunck@deloitte.nl

+31 882881671

Stefan van Duin


Director Analytics & Information Management
svanduin@deloitte.nl

+31 882884754

Frank Korf
Senior Manager Advanced Analytics
fkorf@deloitte.nl

+31 882885911

Deloitte refers to one or more of Deloitte Touche Tohmatsu Limited, a UK private company limited by guarantee (“DTTL”),
its network of member firms, and their related entities. DTTL and each of its member firms are legally separate and
independent entities. DTTL (also referred to as “Deloitte Global”) does not provide services to clients. Please see
www.deloitte.nl/about for a more detailed description of DTTL and its member firms.

Deloitte provides audit, tax, consulting, and financial advisory services to public and private clients spanning multiple
industries. With a globally connected network of member firms in more than 150 countries and territories, Deloitte brings
world-class capabilities and high-quality service to clients, delivering the insights they need to address their most complex
business challenges. Deloitte’s more than 200,000 professionals are committed to becoming the standard of excellence.

This communication contains general information only, and none of Deloitte Touche Tohmatsu Limited, its member firms,
or their related entities (collectively, the “Deloitte network”) is, by means of this communication, rendering professional
advice or services. No entity in the Deloitte network shall be responsible for any loss whatsoever sustained by any person
who relies on this communication.

© 2017 Deloitte The Netherlands



Chapter VIII
A Review of Methodologies for
Analyzing Websites
Danielle Booth
Pennsylvania State University, USA

Bernard J. Jansen
Pennsylvania State University, USA

Abstract

This chapter is an overview of the process of Web analytics for Websites. It outlines how basic visitor
information such as number of visitors and visit duration can be collected using log files and page tagging.
This basic information is then combined to create meaningful key performance indicators that are tailored
not only to the business goals of the company running the Website, but also to the goals and content of
the Website. Finally, this chapter presents several analytic tools and explains how to choose the right
tool for the needs of the Website. The ultimate goal of this chapter is to provide methods for increasing
revenue and customer satisfaction through careful analysis of visitor interaction with a Website.

INTRODUCTION

Web analytics is the measure of visitor behavior on a Website. However, what kind of information is available
from Website visitors, and what can be learned from studying such information? By collecting various Web
analytics metrics, such as number of visits and visitors and visit duration, one can develop key performance
indicators (KPIs) – a versatile analytic model that measures several metrics against each other to define
visitor trends. KPIs use these dynamic numbers to get an in-depth picture of visitor behavior on a site. This
information allows businesses to align their Websites’ goals with their business goals for the purpose of
identifying areas of improvement, promoting popular parts of the site, testing new site functionality, and
ultimately increasing revenue. This chapter covers the most common metrics, different methods for gathering metrics,
Copyright © 2009, IGI Global, distributing in print or electronic forms without written permission of IGI Global is prohibited.

how to utilize key performance indicators, best key practices, and choosing the right Web analytics tool.
The first section addresses metrics, information that can be collected from visitors on a Website. It covers
types of metrics based on what kind of data is collected as well as specific metrics and how they can be
utilized. The following section discusses the two main methods for gathering visitor information -- log files
and page tagging. For each method, this section covers the advantages and disadvantages, types of supported
information, and examples for data format. Following this is a section on how to choose the key performance
indicators (KPIs). This includes outlining several business strategies for integrating Web analytics with the
rest of an organization as well as identifying the type of Website and listing several specific KPIs for each
site type. The following section provides the overall process and advice for Web analytics integration, and the
final section deals with what to look for when choosing analytics tools as well as a comparison of several
specific tools. Finally, the conclusion discusses the future of Web analytics.

METRICS

In order to understand the benefits of Website analysis, one must first understand metrics – the different kinds
of available user information. Although the metrics may seem basic, once collected, they can be used to analyze
Web traffic and improve a Website to better meet its traffic. According to Panalysis
(http://www.panalysis.com/), an Australian Web analytics company, these metrics generally fall into one of four
categories: site usage, referrers (or how visitors arrived at your site), site content analysis, and quality
assurance. Table 1 shows examples of types of metrics that might be found in these categories.
Although the type and overall number of metrics varies with different analytics vendors, there is still a common
set of basic metrics common to most. Table 2 outlines eight widespread types of information that measure who is
visiting a Website and what they do during their visits, relating each of these metrics to specific categories.
Each metric is discussed below.

Visitor Type

Since analyzing Website traffic first became popular in the 1990s with the Website counter, the measure of
Website traffic has been one of the most closely watched metrics. This metric, however, has evolved from merely
counting the number of hits a page receives into counting the number of individuals who visit the Website. There
are two types of visitors: those who have been to the site before, and those who have not. This difference is
defined in terms of repeat and new visitors. In order to track visitors in such a way, a system must be able to
determine individual users who access a Website; each individual visitor

Table 1. Metrics Categories (Jacka, n.d.)

Site Usage: Numbers of visitors and sessions; How many people repeatedly visit the site; Geographic information;
Search Engine Activity
Referrers: Which websites are sending visitors to your site; The search terms people used to find your site; How
many people place bookmarks to the site
Site Content Analysis: Top entry pages; Most popular pages; Top pages for single page view sessions; Top exit
pages; Top paths through the site; Effectiveness of key content
Quality Assurance: Broken pages or server errors; Visitor response to errors


is called a unique visitor. Ideally, a unique visitor is just one visitor, but this is not always the case. It
is possible that multiple users access the site from the same computer (perhaps on a shared household computer
or a public library). In addition, most analytic software relies on cookies to track unique users. If a user
disables cookies in their browser or if they clear their cache, the visitor will be counted as new each time he
or she enters the site.
Because of this, some companies have instead begun to track unique visits, or sessions. A session begins once a
user enters the site and ends when a user exits the site or after a set amount of time of inactivity (usually 30
minutes). The session data does not rely on cookies and can be measured easily. Since there is less uncertainty
with visits, it is considered to be a more concrete and reliable metric than unique visitors. This approach is
also more sales-oriented because it considers each visit an opportunity to convert a visitor into a customer
instead of looking at overall customer behavior (Belkin, 2006).

Visit Length

Also referred to as Visit Duration or Average Time on Site (ATOS), visit length is the total amount of time a
visitor spends on a site during one session. One possible area of confusion when using this metric is handling
missing data. This can be caused either by an error in data collection or by a session containing only one page
visit or interaction. Since the visit length is calculated by subtracting the time of the visitor’s first
activity on the site from the time of the visitor’s final activity, what happens to the measurement when one of
those pieces of data is missing? According to the Web Analytics Association, the visit length in such cases is
zero (Burby & Brown, 2007).
When analyzing the visit length, the measurements are often broken down into chunks of time. StatCounter, for
example, uses the following time categories:

• Less than 5 seconds
• 5 seconds to 30 seconds
• 30 seconds to 5 minutes
• 5 minutes to 20 minutes
• 20 minutes to 1 hour
• Greater than 1 hour (Jackson, 2007)

The goal of measuring the data in this way is to keep the percentage of visitors who stay on the Website for
less than five seconds as low as possible. If visitors stay on a Website for such a

Table 2. Eight common metrics of website analysis

Metric | Description | Category
Visitor Type | Who is accessing the Website (returning, unique, etc.) | Site Usage
Visit Length | The total amount of time a visitor spends on the Website | Site Usage
Demographics and System Statistics | The physical location and information of the system used to access the Website | Site Usage
Internal Search Information | Information on keywords and results pages viewed using a search engine embedded in the Website | Site Usage
Visitor Path | The route a visitor uses to navigate through the Website | Site Content Analysis
Top Pages | The pages that receive the most traffic | Site Content Analysis
Referring URL and Keyword Analysis | Which sites have directed traffic to the Website and which keywords visitors are using to find the Website | Referrers
Errors | Any errors that occurred while attempting to retrieve the page | Quality Assurance

143
A Review of Methodologies for Analyzing Websites
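Taken together, the 30-minute inactivity rule for sessions and StatCounter's duration buckets can be sketched in a few lines. This is an illustrative sketch only: the timeout and bucket boundaries come from the text above, while the function names and sample timestamps are invented.

```python
from datetime import datetime, timedelta

SESSION_TIMEOUT = timedelta(minutes=30)  # inactivity gap that closes a session

def sessionize(timestamps):
    """Group one visitor's page-view times into sessions: a new session
    starts whenever the gap since the previous view exceeds 30 minutes."""
    sessions = []
    for t in sorted(timestamps):
        if sessions and t - sessions[-1][-1] <= SESSION_TIMEOUT:
            sessions[-1].append(t)
        else:
            sessions.append([t])
    return sessions

def visit_length_bucket(seconds):
    """Map a visit length in seconds to StatCounter's time categories."""
    for limit, label in [(5, "less than 5 seconds"),
                         (30, "5 seconds to 30 seconds"),
                         (300, "30 seconds to 5 minutes"),
                         (1200, "5 minutes to 20 minutes"),
                         (3600, "20 minutes to 1 hour")]:
        if seconds < limit:
            return label
    return "greater than 1 hour"

views = [datetime(2007, 1, 1, 9, 0), datetime(2007, 1, 1, 9, 10),
         datetime(2007, 1, 1, 11, 0)]  # the second gap exceeds 30 minutes
sessions = sessionize(views)
first_length = (sessions[0][-1] - sessions[0][0]).total_seconds()
print(len(sessions), visit_length_bucket(first_length))  # 2 5 minutes to 20 minutes
```

Note that a single-timestamp session has no second page view to measure against, so its computed length is zero, matching the zero-length caveat above.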

The goal of measuring the data in this way is to keep the percentage of visitors who stay on the Website for less than five seconds as low as possible. If visitors stay on a Website for such a short amount of time, it usually means they either arrived at the site by accident or the site did not have relevant information. By combining this information with information from referrers and keyword analysis, one can tell which sites are referring well-targeted traffic and which sites are referring poor-quality traffic.

Demographics and System Statistics

The demographic metric refers to the physical location of the system used to make a page request. This information can be useful for a Website that provides region-specific services. For example, if an e-commerce site can only ship its goods to people in Spain, any traffic to the site from outside of Spain is irrelevant. In addition, region-specific Websites also want to make sure they tailor their content to the group they are targeting. Demographic information can also be combined with information on referrers to determine if a referral site is directing traffic to a site from outside a company's regions of service.

System statistics are information about the hardware and software with which visitors access a Website. This can include information such as browser type, screen resolution, and operating system. It is important that a Website be accessible to all of its customers, and by using this information, the Website can be tailored to meet visitors' technical needs.

Internal Search

If a Website includes a site-specific search utility, then it is also possible to measure internal search information. This can include not only keywords but also information about which results pages visitors found useful. The Patricia Seybold Group (http://www.psgroup.com/) identifies the following seven uses for internal search data:

• Identify products and services for which customers are looking, but that are not yet provided by the company.
• Identify products that are offered, but which customers have a hard time finding.
• Identify customer trends.
• Improve personalized messages by using the customers' own words.
• Identify emerging customer service issues.
• Determine if customers are provided with enough information to reach their goals.
• Make personalized offers. (Aldrich, 2006)

By analyzing internal search data, one can use the information to improve and personalize the visitors' experience.

Visitor Path

A visitor path is the route a visitor uses to navigate through a Website. Excluding visitors who leave the site as soon as they enter, each visitor creates a path of page views and actions while perusing the site. By studying these paths, one can identify any difficulties a user has viewing a specific area of the site or completing a certain action (such as making a transaction or completing a form).

According to an article by the Web Analytics Association, there are two schools of thought regarding visitor path analysis. The first is that visitor actions are goal-driven and performed in a logical, linear fashion. For example, if a visitor wants to purchase an item, the visitor will first find the item, add it to the cart, and proceed to the checkout to complete the process. Any break in that path (i.e. not completing the order) signifies user confusion and is viewed as a problem.

The second school of thought is that visitor actions are random and illogical, and that the only path that can provide accurate data on a visitor's behavior is the path from one page to the page immediately following it. In other words, the only page that influences visitor behavior on a Website is the one the visitor is currently viewing. For example, visitors on a news site may merely peruse the articles with no particular goal in mind.

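The second school of thought, which looks only at the step from the current page to the next, amounts to counting page-to-page transitions. A minimal sketch; the page names and paths are made up:

```python
from collections import Counter

def transition_counts(paths):
    """Tally how often each page immediately follows another,
    across a list of per-visit page paths."""
    counts = Counter()
    for path in paths:
        for current_page, next_page in zip(path, path[1:]):
            counts[(current_page, next_page)] += 1
    return counts

paths = [["home", "product", "cart", "checkout"],
         ["home", "product", "home"],
         ["home", "cart"]]
counts = transition_counts(paths)
print(counts[("home", "product")], counts[("product", "cart")])  # 2 1
```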

This method of analysis is becoming increasingly popular because companies find it easier to examine path data in context without having to reference the entire site in order to study the visitors' behavior (Web Analytics Association, n.d.).

Top Pages

Panalysis mentions three types of top pages: top entry pages, top exit pages, and most popular pages. Top entry pages are important because the first page a visitor views makes the greatest impression of a Website. By knowing the top entry page, one can make sure that page has relevant information and provides adequate navigation to important parts of the site. Similarly, identifying popular exit pages makes it easier to pinpoint areas of confusion or missing content.

The most popular pages are the areas of a website that receive the most traffic. This metric gives insight into how visitors are utilizing the Website and which pages are providing the most useful information. This is important because it shows whether the Website's functionality matches up with its business goals; if most of the Website's traffic is being directed away from the main pages of the site, the Website cannot function to its full potential (Jacka, n.d.).

Referrers and Keyword Analysis

A referral page is the page a user visits immediately before entering a Website, or rather, a site that has directed traffic to the Website. A search engine result page link, a blog entry mentioning the Website, and a personal bookmark are all examples of referrers. This metric is important because it can be used to determine advertising effectiveness and search engine popularity. As always, it is important to look at this information in context. If a certain referrer is doing worse than expected, it could be caused by the referring link text or placement. Conversely, an unexpected spike in referrals from a certain page could be either good or bad depending on the content of the referring page.

In the same way, keyword analysis deals specifically with referring search engines and shows which keywords have brought in the most traffic. By analyzing the keywords visitors use to find a page, one is able to determine what visitors expect to gain from the Website and use that information to better tailor the Website to their needs. It is also important to consider the quality of keywords. Keyword quality is directly proportional to revenue and can be determined by comparing keywords with visitor path and visit length (Marshall, n.d.). Good keywords will bring quality traffic and more income to a site.

Errors

Errors are the final metric. Tracking errors has the obvious benefit of being able to identify and fix any errors in the Website, but it is also useful to observe how visitors react to these errors: the fewer visitors who are confused by an error, the less likely they are to exit the site because of it.

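Metrics such as top entry pages, top exit pages, and most popular pages are straightforward tallies over per-session page paths. A sketch with invented session data (paths are assumed non-empty):

```python
from collections import Counter

def top_pages(sessions):
    """Return (entry, exit, popularity) tallies from per-session page paths."""
    entries = Counter(path[0] for path in sessions)   # first page viewed
    exits = Counter(path[-1] for path in sessions)    # last page viewed
    popular = Counter(page for path in sessions for page in path)
    return entries, exits, popular

sessions = [["home", "pricing", "signup"],
            ["blog", "home", "pricing"],
            ["home", "pricing"]]
entries, exits, popular = top_pages(sessions)
print(entries["home"], exits["pricing"], popular["pricing"])  # 2 2 3
```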

GATHERING INFORMATION

How does one gather these metrics? There are two major methods for collecting data for Web analysis: log files and page tagging. Most current Web analytics companies use a combination of the two methods for collecting data. Therefore, it is important to understand the strengths and weaknesses of each.

Log Files

The first method of metric gathering uses log files. Every Web server keeps a log of page requests that can include (but is not limited to) visitor IP address, date and time of the request, requested page, referrer, and information on the visitor's Web browser and operating system. The same basic collected information can be displayed in a variety of ways. Although the format of the log file is ultimately the decision of the company that runs the Web server, the following four formats are among the most popular:

• NCSA Common Log
• NCSA Combined Log
• NCSA Separate Log
• W3C Extended Log

The NCSA Common Log format (also known as the Access Log format) contains only basic information on the page request. This includes the client IP address, client identifier, visitor username, date and time, HTTP request, status code for the request, and the number of bytes transferred during the request. The Combined Log format contains the same information as the common log with three additional fields: the referring URL, the visitor's Web browser and operating system information, and the cookie. The Separate Log format (or 3-Log format) contains the same information as the combined log, but it breaks it into three separate files: the access log, the referral log, and the agent log. The date and time fields in each of the three logs are the same. Table 3 shows examples of the common, combined, and separate log file formats (notice that default values are represented by a dash "-").

Similarly, W3C provides an outline for standard formatting procedures. This format differs from the first three in that it aims to provide better control and manipulation of data while still producing a log file readable by most Web analytics tools. The extended format contains user-defined fields and identifiers followed by the actual entries, and default values are represented by a dash "-" (Hallam-Baker & Behlendorf, 1999). Table 4 shows an example of an extended log file.

There are several benefits of using system log files to gather data for analysis. The first is that it does not require any changes to the Website or any extra software installation to create the log files. Web servers automatically create these logs and store them on a company's own servers, giving the company freedom to change its Web analytics tools and strategies at will. This method also does not require any extra bandwidth when loading a page, and since everything is recorded server-side, it is possible to log both page request successes and failures.

Using log files also has some disadvantages. One major disadvantage is that the collected data is limited to transactions with the Web server. This means that log files cannot record information independent of the server, such as the physical location of the visitor. Similarly, while it is possible to log cookies, the server must be specifically configured to assign cookies to visitors in order to do so. The final disadvantage is that while it is useful to have all the information stored on a company's own servers, the log file method is only available to those who own their Web servers.

Table 3. NCSA Log comparison (IBM, 2004)

NCSA Common Log 125.125.125.125 - dsmith [10/Oct/1999:21:15:05 +0500] “GET /index.html HTTP/1.0” 200 1043
NCSA Combined Log 125.125.125.125 - dsmith [10/Oct/1999:21:15:05 +0500] “GET /index.html HTTP/1.0” 200 1043
“http://www.ibm.com/” “Mozilla/4.05 [en] (WinNT; I)” “USERID=CustomerA;IMPID=01234”
NCSA Separate Log Common Log:
125.125.125.125 - dsmith [10/Oct/1999:21:15:05 +0500] “GET /index.html HTTP/1.0” 200 1043
Referral Log:
[10/Oct/1999:21:15:05 +0500] “http://www.ibm.com/index.html”
Agent Log:
[10/Oct/1999:21:15:05 +0500] “Microsoft Internet Explorer - 5.0”

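A line in the NCSA Common Log format, like the first example in Table 3, can be split into its fields with a regular expression. This is a sketch that handles only well-formed lines; production log parsers cope with many more variations:

```python
import re

# The Common Log fields: client IP, identifier, username, date/time,
# HTTP request, status code, and bytes transferred ("-" when absent).
COMMON_LOG = re.compile(
    r'(?P<ip>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-)'
)

def parse_common_log(line):
    """Parse one NCSA Common Log entry into a dict, or None if malformed."""
    match = COMMON_LOG.match(line)
    return match.groupdict() if match else None

line = ('125.125.125.125 - dsmith [10/Oct/1999:21:15:05 +0500] '
        '"GET /index.html HTTP/1.0" 200 1043')
entry = parse_common_log(line)
print(entry["ip"], entry["status"], entry["bytes"])  # 125.125.125.125 200 1043
```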

Page Tagging

The second method for recording visitor activity is page tagging. Page tagging uses an invisible image to detect when a page has been successfully loaded and then uses JavaScript to send information about the page and the visitor back to a remote server. According to Web Analytics Demystified, the variables used and the amount of data collected in page tagging depend on the Web analytics vendor. Some vendors stress short, easy-to-use page tags, while others emphasize specific tags that require little post-processing. The best thing to look for with this method, however, is flexibility: being able to use all, part, or none of the tag depending on the needs of the page (Peterson, 2004).

There are several benefits to using this method of gathering visitor data. The first is speed of reporting. Unlike a log file, the data received via page tagging is parsed as it comes in, which allows for near real-time reporting. Another benefit is flexibility of data collection. More specifically, it is easier to record additional information about the visitor that does not involve a request to the Web server. Examples of such information include a visitor's screen size, the price of purchased goods, and interactions within Flash animations. This is also a useful method of gathering data for companies that do not run their own Web servers or do not have access to the raw log files for their site (such as blogs).

There are also some disadvantages of page tagging, most of which center on the extra code that must be added to the Website. This causes it to use more bandwidth each time a page loads, and it also makes it harder to change analytics tools because the code embedded in the Website would have to be changed or deleted entirely. The final disadvantage is that page tagging is only capable of recording page loads, not page failures. If a page fails to load, the tagging code also does not load, so there is no way to retrieve information in that instance.

Although log files and page tagging are two distinct ways to collect information about the visitors to a Website, it is possible to use both together, and many analytics companies provide ways to use both methods to gather data. Even so, it is important to understand the strengths and weaknesses of both. Table 5 shows the advantages and disadvantages of log file analysis and page tagging.

The Problems with Data

One of the most prevalent problems in Web analytics is the difficulty of identifying unique users. In order to determine repeat visitors, most Web analytics tools employ cookies that store unique identification information on the visitor's personal computer. Because of problems with users deleting or disabling cookies, however, some companies have moved towards using Macromedia Flash Local Shared Objects (LSOs). LSOs act like a cookie, but standard browsers lack the tools required to delete them, anti-spyware software does not delete them because it does not see them as a threat, and most users do not know how to delete them manually.

Table 4. W3C Extended Log File (Microsoft, 2005)


W3C Extended Log #Software: Microsoft Internet Information Services 6.0
#Version: 1.0
#Date: 2002-05-24 20:18:01
#Fields: date time c-ip cs-username s-ip s-port cs-method cs-uri-stem cs-uri-query sc-status sc-bytes cs-
bytes time-taken cs(User-Agent) cs(Referrer)
2002-05-24 20:18:01 172.224.24.114 - 206.73.118.24 80 GET /Default.htm - 200 7930 248 31 Mozilla/
4.0+(compatible;+MSIE+5.01;+Windows+2000+Server) http://64.224.24.114/

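The unique-visitor problem described above is commonly worked around by preferring a cookie identifier and falling back to weaker signals, such as the IP address combined with browser information, when no cookie is available. A toy illustration; the record layout is hypothetical:

```python
def visitor_key(record):
    """Prefer the cookie ID; otherwise fall back to IP plus user agent,
    a weaker signal that can both undercount and overcount visitors."""
    if record.get("cookie_id"):
        return ("cookie", record["cookie_id"])
    return ("fallback", (record["ip"], record["user_agent"]))

def count_unique_visitors(records):
    return len({visitor_key(r) for r in records})

records = [
    {"cookie_id": "abc", "ip": "1.2.3.4", "user_agent": "Mozilla/4.05"},
    {"cookie_id": "abc", "ip": "5.6.7.8", "user_agent": "Mozilla/4.05"},  # same cookie, new IP
    {"cookie_id": None,  "ip": "9.9.9.9", "user_agent": "MSIE 5.0"},      # cookie deleted/disabled
]
print(count_unique_visitors(records))  # 2
```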

Table 5. Log Files vs. Page Tagging

Log Files
Advantages:
• Does not require changes to the Website or extra hardware installation
• Does not require extra bandwidth
• Freedom to change tools with a relatively small amount of hassle
• Logs both page request successes and failures
Disadvantages:
• Can only record interactions with the Web server
• Server must be configured to assign cookies to visitors
• Only available to companies who run their own Web servers
• Cannot log the visitor's physical location

Page Tagging
Advantages:
• Near real-time reporting
• Easier to record additional information
• Able to capture visitor interactions within Flash animations
Disadvantages:
• Requires extra code added to the Website
• Uses extra bandwidth each time the page loads
• Can only record successful page loads, not failures
• Hard to switch analytics tools

Awareness is growing, however, and Firefox and Macromedia are working against LSOs and providing users with tools to delete them (Permadi, 2005).

Sen, Dacin, and Pattichis (2006) cite various other problems with log data from Websites, including large data size and messy data. Problems with large data size are caused by massive amounts of traffic to a Website and also by the amount of information stored in each record. Records with missing IP addresses and changes to Website content cause messy data. Even though the data may be hard to work with at first, once it is cleaned up, it provides an excellent tool for Web analytics.

CHOOSING KEY PERFORMANCE INDICATORS

In order to get the most out of Web analytics, one must know how to choose effectively which metrics to analyze and how to combine them in meaningful ways. This means knowing the Website's business goals and then determining which KPIs will provide the most insight.

Knowing Your Business Goals

Every company has specific business goals. Every part of the company works together to achieve them, and the company Website is no exception. In order for a Website to be beneficial, information gathered from its visitors must not merely show what has happened in the past; it must also be able to improve the site for future visitors. The company must have clearly defined goals for the future and use this information to support strategies that will help it achieve those goals.

For a Website, the first step in achieving this is making sure the data collected from the site is actionable. According to the Web Analytics Association (McFadden, 2005), in order for a company to collect actionable data, it must meet these three criteria: "(1) the business goals must be clear, (2) technology, analytics, and the business must be aligned, and (3) the feedback loop must be complete" (Web Channel Performance Management section, para. 3).

There are many possible methods for meeting these criteria. One is Alignment-Centric Performance Management (Becher, 2005). This approach goes beyond merely reviewing past customer trends to carefully selecting a few key KPIs based on future business objectives. Even though a wealth of metrics is available from a Website, this does not mean that all metrics are relevant to a company's needs. Reporting large quantities of data is overwhelming, so it is important to look at metrics in context and use them to create KPIs that focus on outcome and not activity.


For example, a customer service Website might view the number of emails responded to on the same day they were sent as a measurement of customer satisfaction. A better way to measure customer satisfaction, however, might be to survey the customers on their experience. Although this measurement is subjective, it is a better representation of customer satisfaction, because even if a customer receives a response the same day they send out an email, it does not mean that the experience was a good one (Becher, 2005).

Choosing the most beneficial KPIs with this method is achieved by following "The Four M's of Operational Management," as outlined by Becher (2005), which facilitate effective selection of KPIs:

• Motivate: Ensure that goals are relevant to everyone involved.
• Manage: Encourage collaboration and involvement for achieving these goals.
• Monitor: Once selected, track the KPIs and quickly deal with any problems that may arise.
• Measure: Identify the root causes of problems and test any assumptions associated with the strategy.

By carefully choosing a few quality KPIs to monitor and making sure everyone is involved with the strategy, it becomes easier to align a Website's goals with the company's goals, because the information is targeted and stakeholders are actively participating.

Another method for ensuring actionable data is Online Business Performance Management (OBPM) (Sapir, 2004). This approach integrates business tools with Web analytics to help companies make better decisions quickly in an ever-changing online environment where customer data is stored in a variety of different departments. The first step in this strategy is gathering all customer data in a central location and condensing it so that the result is all actionable data stored in the same place. Once this information is in place, the next step is choosing relevant KPIs that are aligned with the company's business strategy and then analyzing expected versus actual results (Sapir, 2004).

In order to choose the best KPIs and measure the Website's performance against the goals of a business, there must be effective communication between senior executives and online managers. The two groups should work together to define the relevant performance metrics, the overall goals for the Website, and the performance measurements. This method is similar to Alignment-Centric Performance Management in that it aims to aid integration of the Website with the company's business objectives by involving major stakeholders. The ultimate goals of OBPM are increased confidence, organizational accountability, and efficiency (Sapir, 2004).

Identifying KPIs Based on Website Type

Unlike metrics, which are numerical representations of data collected from a Website, KPIs are tied to a business strategy and are usually measured by a ratio of two metrics. By choosing KPIs based on the Website type, a business can save both time and money. Although Websites can have more than one function, each site belongs to at least one of the four main categories: commerce, lead generation, content/media, and support/self service (McFadden, 2005). Table 6 shows common KPIs for each Website type. We discuss each Website type and its related KPIs below.


Table 6. The four types of Websites and examples of associated KPIs (McFadden, 2005)
Website Type KPIs
Commerce • Conversion rates
• Average order value
• Average visit value
• Customer loyalty
• Bounce rate
Lead Generation • Conversion rates
• Cost per lead
• Bounce rate
• Traffic concentration
Content/Media • Visit depth
• Returning visitor ratio
• New visitor ratio
• Page depth
Support/Self service • Page depth
• Bounce rate
• Customer satisfaction
• Top internal search phrases

Commerce

The goal of a commerce Website is to get visitors to purchase goods or services directly from the site, with success gauged by the amount of revenue the site brings in. According to Peterson, "commerce analysis tools should provide the 'who, what, when, where, and how' for your online purchasers" (2004, p. 92). In essence, the important information for a commerce Website is who made (or failed to make) a purchase, what was purchased, when purchases were made, where customers are coming from, and how customers are making their purchases. The most valuable KPIs used to answer these questions are conversion rates, average order value, average visit value, customer loyalty, and bounce rate (McFadden, 2005). Other metrics to consider with a commerce site are which products, categories, and brands are sold on the site, as well as internal site product searches that could signal navigation confusion or a new product niche (Peterson, 2004).

A conversion rate is the number of users who perform a specified action divided by the total number of a certain type of visitor (i.e. repeat visitors, unique visitors, etc.) over a given period. Types of conversion rates will vary by the needs of the businesses using them, but two common conversion rates for commerce Websites are the order conversion rate (the percent of total visitors who place an order on a Website) and the checkout conversion rate (the percent of total visitors who begin the checkout process). There are also many methods for choosing the group of visitors on which to base the conversion rate. For example, a business may want to filter visitors by excluding visits from robots and Web crawlers (Ansari, Kohavi, Mason, & Zheng, 2001), or it may want to exclude the traffic that "bounces" from the Website or (a slightly trickier measurement) the traffic that is determined not to have intent to purchase anything from the Website (Kaushik, 2006).

It is common for commerce Websites to have conversion rates around 0.5%, but generally good conversion rates fall in the 2% range, depending on how a business structures its conversion rate (FoundPages, 2007). Again, the ultimate goal is to increase total revenue. According to eVision, for each dollar a company spends on improving this KPI, there is a $10 to $100 return (2007). The methods a business uses to improve its conversion rate (or rates), however, differ depending on which target action the business chooses to measure.

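The conversion-rate arithmetic, including the kind of robot and bounce filtering just described, reduces to a filtered ratio. A sketch; the flag and field names are invented for illustration:

```python
def conversion_rate(visits, action, exclude=()):
    """Share of qualifying visits that performed `action`; visits carrying
    any flag named in `exclude` (e.g. robots, bounces) are filtered out."""
    qualifying = [v for v in visits if not any(v.get(flag) for flag in exclude)]
    if not qualifying:
        return 0.0
    return sum(1 for v in qualifying if v.get(action)) / len(qualifying)

visits = [{"placed_order": True},
          {"bounced": True},
          {"is_robot": True},
          {}]
print(conversion_rate(visits, "placed_order"))                                   # 0.25
print(conversion_rate(visits, "placed_order", exclude=("is_robot", "bounced")))  # 0.5
```

Note how the choice of denominator changes the result, which is why the text stresses deciding which group of visitors the rate is based on.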

Average order value (AOV) is the ratio of total order revenue to the number of orders over a given period. This number is important because it allows the analyzer to derive a cost for each transaction. There are several ways for a business to use this KPI to its advantage. One way is to break down the AOV by advertising campaign (i.e. email, keyword, banner ad, etc.). This way, a business can see which campaigns are bringing in the best customers and spend more effort refining its strategies in those areas (Peterson, 2005). Overall, however, if the cost of making a transaction is greater than the amount of money customers spend for each transaction, the site is not fulfilling its goal. There are two main ways to correct this. The first is to increase the number of products customers order per transaction, and the second is to increase the overall cost of purchased products. A good technique for achieving this is through product promotions (McFadden, 2005), but many factors influence how and why customers purchase what they do on a Website. These factors are diverse and can range from displaying a certain security image on the site (MarketingSherpa, 2007) to updating the site's internal search (Young, 2007). Like many KPIs, improvement ultimately comes from ongoing research and a small amount of trial and error.

Another KPI, average visit value, is the ratio of total revenue to the total number of visits. This is a measurement of quality traffic that is important to businesses. It is problematic for a commerce site when, even though it may have many visitors, each visit generates only a small amount of revenue. In that case, even if the total number of visits increased, it would not have a profound impact on overall profits. This KPI is also useful for evaluating the effectiveness of promotional campaigns. If the average visit value decreases after a specific campaign, it is likely that the advertisement is not attracting quality traffic to the site. Another less common factor in this situation could be broken links or a confusing layout in a site's "shopping cart" area. A business can improve the average visit value by using targeted advertising and employing a layout that reduces customer confusion.

Customer loyalty is the ratio of new to existing customers. Many Web analytics tools measure this using visit frequency and transactions, but there are several important factors in this measurement, including the time between visits (Mason, 2007). Customer loyalty can even be measured simply with customer satisfaction surveys (SearchCRM, 2007). Loyal customers will not only increase revenue through purchases but also through referrals, potentially limiting advertising costs (QuestionPro).

Bounce rate is a measurement of how many people arrive at a homepage and leave immediately. There are two scenarios that generally qualify as a bounce. In the first scenario, a visitor views only one page on the Website. In the second scenario, a visitor navigates to a Website but only stays on the site for five seconds or less (Avinash, 2007). This could be due to several factors, but in general, visitors who bounce from a Website are not interested in the content. Like average order value, this KPI helps show how much quality traffic a Website receives. A high bounce rate may be a reflection of unintuitive site design or misdirected advertising.

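The commerce ratios defined in this section (average order value, average visit value, and the two bounce scenarios) can be computed directly. A sketch with made-up sample data:

```python
def average_order_value(order_totals):
    """AOV: total order revenue divided by the number of orders."""
    return sum(order_totals) / len(order_totals)

def average_visit_value(total_revenue, total_visits):
    """Revenue per visit, a rough gauge of traffic quality."""
    return total_revenue / total_visits

def bounce_rate(visits):
    """Share of visits that viewed one page or stayed five seconds or less."""
    bounces = sum(1 for pages, seconds in visits if pages == 1 or seconds <= 5)
    return bounces / len(visits)

orders = [40.0, 60.0, 20.0]
visits = [(1, 3), (4, 320), (1, 45), (6, 900)]  # (pages viewed, seconds on site)
print(average_order_value(orders), average_visit_value(120.0, 4), bounce_rate(visits))
# 40.0 30.0 0.5
```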

Lead Generation

The goal of a lead generation Website is to obtain user contact information in order to inform users of a company's new products and developments and to gather data for market research; these sites primarily focus on products or services that cannot be purchased directly online. Examples of lead generation include requesting more information by mail or email, applying online, signing up for a newsletter, registering to download product information, and gathering referrals for a partner site (Burby, 2004). The most important KPIs for lead generation sites are conversion rates, cost per lead, bounce rate, and traffic concentration (McFadden, 2005).

Similar to commerce Website KPIs, a conversion rate is the ratio of the number of visitors who perform a specific action to total visitors. In the case of lead generation Websites, the most common conversion rate is the ratio of leads generated to total visitors. The same visitor filtering techniques mentioned in the previous section can be applied to this measurement (i.e. filtering out robots and Web crawlers and excluding traffic that bounces from the site). This KPI is an essential tool in analyzing marketing strategies. Average lead generation sites have conversion rates ranging from 5-6%, with exceptionally good sites reaching 17-19% (Greenfield, 2006). If the conversion rate of a site increases after the implementation of a new marketing strategy, it indicates that the campaign was successful. If it decreases, it indicates that the campaign was not effective and might need to be reworked.

Cost per lead (CPL) is the ratio of total expenses to the total number of leads, or how much it costs a company to generate a lead; a more targeted measurement of this KPI would be the ratio of total marketing expenses to the total number of leads. Like the conversion rate, CPL helps a business gain insight into the effectiveness of its marketing campaigns. A good way to measure the success of this KPI is to make sure that the CPL for a specific marketing campaign is less than the overall CPL (WebSideStory, 2004). Ideally, the CPL should be low, and well-targeted advertising is usually the best way to achieve this.

Lead generation bounce rate is the same measurement as the bounce rate for commerce sites. This KPI is a measurement of visitor retention based on the ratio of total bounces to total visitors; a bounce is a visit characterized by a visitor entering the site and immediately leaving. Lead generation sites differ from commerce sites in that they may not require the same level of user interaction. For example, a lead generation site could have a single page where users enter their contact information. Even though visitors only view one page, the visit is still successful if the Website is able to collect the user's information. In these situations, it is best to base the bounce rate solely on time spent on the site. As with commerce sites, the best way to decrease a site's bounce rate is to increase advertising effectiveness and decrease visitor confusion.

The final KPI is traffic concentration, the ratio of the number of visitors to a certain area of a Website to total visitors. This KPI shows which areas of a site have the most visitor interest. For this type of Website, it is ideal to have a high traffic concentration on the page or pages where users enter their contact information.

152
A Review of Methodologies for Analyzing Websites

bring visitors back. A lower ratio for this KPI is best because a lower number means more repeat visitors and more visitors who are interested in and trust the content of the Website. If this KPI is too low, however, it might signal problems in other areas such as a high bounce rate or even click fraud. Click fraud occurs when a person or script is used to generate visits to a Website without having genuine interest in the site. According to a study by Blizzard Internet Marketing, the average for returning visitors to a Website is 23.7% (White, 2006). As with many of the other KPIs for content/media Websites, the best way to improve the returning visitor rate is by having quality content and encouraging interaction with the Website.

New visitor ratio is the measurement of new visitors to unique visitors and is used to determine if a site is attracting new people. When measuring this KPI, the age of the Website plays a role – newer sites will want to attract new people. Similarly, another factor to consider is if the Website is concerned more about customer retention or gaining new customers. As a rule, however, the new visitor ratio should decrease over time as the returning visitor ratio increases. New visitors can be brought to the Website in a variety of different ways, so a good way to increase this KPI is to try different marketing strategies and figure out which campaigns bring the most (and the best) traffic to the site.

The final KPI for content/media sites is page depth. This is the ratio of page views for a specific page and the number of unique visitors to that page. This KPI is similar to visit depth, but its measurements focus more on page popularity. Average page depth can be used to measure interest in specific areas of a Website over time and to make sure that the interests of the visitors match the goals of the Website. If one particular page on a Website has a high page depth, it is an indication that that page is of particular interest to visitors. An example of a page in a Website expected to have a higher page depth would be a news page. Information on a news page is constantly updated so that, while the page is still always in the same location, the content of that page is constantly changing. If a Website has high page depth in a relatively unimportant part of the site, it may signal visitor confusion with navigation in the site or an incorrectly targeted advertising campaign.

Support/Self Service

Websites offering support or self-service are interested in helping users find specialized answers for specific problems. The goals for this type of Website are increasing customer satisfaction and decreasing call center costs; it is more cost-effective for a company to have visitors find information through its Website than it is to operate a call center. The KPIs of interest are visit length, content depth, and bounce rate. In addition, other areas to examine are customer satisfaction metrics and top internal search phrases (McFadden, 2005).

Page depth for support/self service sites is the same measurement as page depth for content/media sites – the ratio of page views to unique visitors. With support/self service sites, however, high page depth is not always a good sign. For example, a visitor viewing the same page multiple times may show that the visitor is having trouble finding helpful information on the Website or even that the information the visitor is looking for does not exist on the site. The goal of these types of sites is to help customers find what they need as quickly as possible and with the least amount of navigation through the site (CCMedia, 2007). The best way to keep page depth low is to keep visitor confusion low.

As with the bounce rate of other Website types, the bounce rate for support/self service sites reflects ease of use, advertising effectiveness, and visitor interest. A low bounce rate means that quality traffic is coming to the Website and deciding that the site's information is potentially useful.
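Every KPI discussed so far is a ratio over counts that an analytics tool already collects, so each can be computed in a couple of lines. The sketch below (Python; all figures are invented for illustration, not data from this chapter) derives CPL, visit depth, the returning and new visitor ratios, and per-page page depth exactly as defined above.

```python
# Ratio KPIs from the preceding sections, computed over aggregate counts.
# All figures are made-up examples, not data from the chapter.
marketing_expenses, leads = 50_000.0, 1_250
page_views, visits, unique_visitors, new_visitors = 12_000, 3_000, 2_000, 500

cost_per_lead = marketing_expenses / leads          # CPL: spend per lead generated
visit_depth = page_views / unique_visitors          # pages accessed per visitor
returning_visitor_ratio = unique_visitors / visits  # lower => more repeat visits
new_visitor_ratio = new_visitors / unique_visitors  # share of first-time visitors

# Page depth for one specific page: its views over its own unique visitors.
news_views, news_visitors = 1_800, 400
news_page_depth = news_views / news_visitors

print(f"CPL: {cost_per_lead:.2f}  visit depth: {visit_depth:.1f}  "
      f"returning: {returning_visitor_ratio:.2f}  new: {new_visitor_ratio:.2f}  "
      f"news page depth: {news_page_depth:.1f}")
```

The point of expressing these as ratios rather than raw counts is comparability: a single campaign's CPL can be judged against the overall CPL, and one page's depth against the site average, regardless of traffic volume.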
Poor advertisement campaigns and poor Website layout will increase a site's bounce rate.

Customer satisfaction deals with how the users rate their experience on a site and is usually collected directly from the visitors (not from log files), either through online surveys or through satisfaction ratings. Although it is not a KPI in the traditional sense, gathering data directly from visitors to a Website is a valuable tool for figuring out exactly what visitors want. Customer satisfaction measurements can deal with customer ratings, concern reports, corrective actions, response time, and product delivery. Using these numbers, one can compare the online experience of the Website's customers to the industry average and make improvements according to visitors' expressed needs.

Similarly, top internal search phrases applies only to sites with internal search, but it can be used to measure what information customers are most interested in, which can lead to improvement in site navigation. This information can be used to direct support resources to the areas generating the most user interest, as well as identify which parts of the Website users may have trouble accessing. In addition, if many visitors are searching for a product not supported on the Website, it could be a sign of ineffective marketing.

Regardless of Website type, the KPIs listed above are not the only KPIs that can prove useful in analyzing a site's traffic, but they provide a good starting point. The main thing to remember is that no matter what KPIs a company chooses, they must be aligned with its business goals, and more KPIs do not necessarily mean better analysis – quality is more important than quantity.

KEY BEST PRACTICES

In this chapter, we have addressed which metrics can be gathered from a Website, how to gather them, and how to determine which information is important. But how can this help improve a business? To answer this, the Web Analytics Association provides nine key best practices to follow when analyzing a Website (McFadden, 2005). Figure 1 outlines this process.

Figure 1. The best key practices of Web analytics

Identify Key Stakeholders

A stakeholder is anyone who holds an interest in a Website. This includes management, site developers, visitors, and anyone else who creates, maintains, uses, or is affected by the site. In order for the Website to be truly beneficial, it
must integrate input from all major stakeholders. Involving people from different parts of the company also makes it more likely that they will embrace the Website as a valuable tool.

Define Primary Goals for Your Website

To know the primary goals of a Website, one must first understand the primary goals of its key stakeholders. This could include such goals as increasing revenue, cutting expenses, and increasing customer loyalty (McFadden, 2005). Once those goals have been defined, discuss each goal and prioritize them in terms of how the Website can most benefit the company. As always, beware of political conflict between stakeholders and their individual goals, as well as assumptions they may have made while determining their goals that may not necessarily be true. By going through this process, a company can make sure that goals do not conflict and that stakeholders are kept happy.

Identify the Most Important Site Visitors

According to Sterne, corporate executives categorize their visitors differently in terms of importance. Most companies classify their most important visitors as ones who either visit the site regularly, stay the longest on the site, view the most pages, purchase the most goods or services, purchase goods most frequently, or spend the most money (Sterne, n.d.). There are three types of customers – (1) customers a company wants to keep, who have a high current value and high future potential, (2) customers a company wants to grow, who can either have a high current value and low future potential or a low current value and high future potential, and (3) customers a company wants to eliminate, who have a low current value and low future potential. The most important visitor to a Website, however, is the one who ultimately brings in the most revenue. Defining the different levels of customers will allow one to consider the goals of these visitors. What improvements can be made to the Website in order to improve their browsing experiences?

Determine the Key Performance Indicators

The next step is picking the metrics that will be most beneficial in improving the site and eliminating the ones that will provide little or no insight into its goals. One can then use these metrics to determine which KPIs to monitor. As mentioned in the previous section, the Website type – commerce, lead generation, media/content, or support/self service – plays a key role in which KPIs are most effective for analyzing site traffic.

Identify and Implement the Right Solution

This step deals with finding the right Web analytics technology to meet the business's specific needs. After the KPIs have been defined, this step should be easy. The most important things to consider are the budget, software flexibility and ease of use, and how well the technology will work with the needed metrics. McFadden suggests that it is also a good idea to run a pilot test of the top two vendor choices (McFadden, 2005). We will expand on this topic further in the next section.

Use Multiple Technologies and Methods

Web analytics is not the only method available for improving a Website. To achieve a more holistic view of a site's visitors, one can also use tools such as focus groups, online surveys, usability studies, and customer service contact analysis (McFadden, 2005).
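The keep/grow/eliminate segmentation under "Identify the Most Important Site Visitors" reduces to a rule over two attributes, current value and future potential. A minimal sketch follows (Python; the 0-1 scores and the 0.5 threshold are invented placeholders for whatever scoring a company actually uses):

```python
# Sterne's three customer types as a rule over current value and future
# potential. Scores in [0, 1] and the 0.5 threshold are illustrative only.
def classify(current_value: float, future_potential: float,
             threshold: float = 0.5) -> str:
    high_cv = current_value >= threshold
    high_fp = future_potential >= threshold
    if high_cv and high_fp:
        return "keep"        # high current value, high future potential
    if high_cv or high_fp:
        return "grow"        # exactly one of the two is high
    return "eliminate"       # low current value, low future potential

for cv, fp in [(0.9, 0.8), (0.9, 0.2), (0.2, 0.9), (0.1, 0.1)]:
    print(cv, fp, "->", classify(cv, fp))
```

In practice the two scores would come from purchase history and some predictive model; the rule itself is the part the chapter fixes.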
Make Improvements Iteratively

When analyzing a Website's data, it is helpful to add gradual improvements to the Website instead of updating too many facets of the Website at once. By doing this one can monitor if a singular change is an improvement or if it is actually hurting the site.

Hire and Empower a Full-Time Analyst

It is important to put a person in charge of the data once it is collected. According to the Web Analytics Association, a good analyst understands business needs (which means communicating well with the stakeholders), has knowledge of technology and marketing, has respect, credibility, and authority, and is already a company employee. Although it may seem like hiring a full-time analyst is expensive, many experts agree that the return on revenue should be more than enough compensation (McFadden, 2005).

Establish a Process of Continuous Improvement

Once the Web analysis process is decided upon, continuous evaluation is paramount. This means reviewing the goals and metrics and monitoring new changes and features which are added to the Website. It is important that the improvements are adding value to the site and meeting expectations.

SPECIFIC TOOLS

Choosing a Tool

Once the company decides what it wants out of the Web analysis, it is time to find the right tool. Kaushik outlines ten important questions to ask Web analytics vendors (2007):

1. What is the difference between your tool and free Web analytics tools? Since the company who owns the Website will be paying money for a service, it is important to know why that service is better than free services (for example, Google Analytics). Look for an answer that outlines the features and functionality of the vendor. Do not look for answers about increased costs because of privacy threats or poor support offered by free analytics tools.
2. Do you offer a software version of your tool? Generally, a business will want to look for a tool that is software based and can run on their own servers. If a tool does not have a software version but plans to make one in the future, it shows insight into how prepared they are to offer future products if there is interest.
3. What methods do you use to capture data? If you remember from the first section, there are two main ways to capture visitor data from a Website – log files and page tagging. Ideally, one should look for a vendor that offers both, but what they have used in the past is also important. Because technology is constantly changing, look for a company that has kept up with these changes in the past by providing creative solutions.
4. Can you help me calculate the total cost of ownership for your tool? The total cost of ownership for a Web analytics tool depends on the specific company, the systems they have in place, and the pricing of the prospective Web analytics tool. In order to make this calculation, one must consider the following:
   a. Cost per page view.
   b. Incremental costs (i.e. charges for overuse or advanced features).
   c. Annual support costs after the first year.
   d. Cost of professional services (i.e. installation, troubleshooting, or customization).
   e. Cost of additional hardware you may need.
   f. Administration costs (which includes the cost of an analyst and any additional employees you may need to hire).
5. What kind of support do you offer? Many vendors advertise free support, but it is important to be aware of any limits that could incur additional costs. It is also important to note how extensive their support is and how willing they are to help.
6. What features do you provide that will allow me to segment my data? Segmentation allows companies to manipulate their data. Look for the vendor's ability to segment your data after it is recorded. Many vendors use JavaScript tags on each page to segment the data as it is captured, meaning that the company has to know exactly what it wants from the data before having the data itself; this approach is less flexible.
7. What options do I have to export data into our system? It is important to know who ultimately owns and stores the data and whether it is possible to obtain both raw and processed data. Most vendors will not provide companies with the data exactly as they need it, but it is a good idea to realize what kind of data is available before a final decision is made.
8. Which features do you provide for integrating data from other sources into your tool? This question deals with the previous section's Key Best Practice #6: Use Multiple Technologies and Methods. If a company has other data it wants to bring to the tool (such as survey data or data from your ad agency), bring them up to the potential analytics vendor and see if it is possible to integrate this information into their tool.
9. What new features are you developing that would keep you ahead of your competition? Not only will the answer to this question tell how much the vendor has thought about future functionality, it will also show how much they know about their competitors.
10. Why did you lose your last two clients? Who are they using now? The benefits of this question are obvious – by knowing how they lost prior business, the business can be confident that it has made the right choice.

Some examples of free and commercially available analytics tools are discussed below.

Free Tools

One of the most popular free analytics tools on the Web now is Google Analytics (previously Urchin). Google Analytics (http://www.google.com/analytics/) uses page tagging to collect information from visitors to a site. In addition to expanding on the already highly regarded Urchin analytics tool, it also provides support for integrating other analytic information (for example, WordPress and AdWords). Google Analytics reports many of the KPIs discussed in the previous sections including depth of visit, returning visitors, and page depth.

There is, however, concern about privacy issues regarding Google Analytics because Google uses their default privacy policy for their analytics tools, but the company assures its Google Analytics users that only account owners and people to whom the owners give permission will have access to the data (Dodoo, 2006). Microsoft also provides a free Web analytics tool called Gatineau (Thomas, 2007).

Paid Tools

InfoWorld provides an in-depth analysis comparing the top four Web analytic companies – Coremetrics, WebTrends, Omniture, and WebSideStory HBX (Heck, 2005). They created a scoring chart and measured each vendor on reporting, administration, performance, ease-of-use, support, and value. Coremetrics received a score of
8.3 with its highest ratings in administration and support. It is a hosted service that offers special configurations for financial, retail, and travel services. WebTrends also earned a score of 8.3 with its highest rating in reporting. This tool is expensive, but it offers a wide range of performance statistics and both client and server hosting. Omniture is next in line with a score of 8.4 with its highest ratings in reporting and support. It is an ASP reporting application that excels in providing relevant reports. WebSideStory had the highest score of 8.7 with its highest ratings in reporting, administration, ease-of-use, and support. This tool is easy to use and is appropriate for many different types of businesses.

CONCLUSION

The first step in analyzing your Website and Website visitors is understanding and analyzing your business goals and then using that information to carefully choose your metrics. In order to take full advantage of the information gathered from your site's visitors, you must consider alternative methods such as focus groups and online surveys, make site improvements gradually, hire a full-time analyst, and realize that your site's improvement is a process and not a one-time activity. Using these key best practices and choosing the right analytics vendor to fit your business will save your company money and ultimately increase revenue.

As Web analytics continues to mature, the methods vendors use to collect information are becoming more refined. One article speculates that companies will find concrete answers to the problems with cookies and unique visitors (Eisenberg, 2005). The Web analytics industry as a whole is also expanding. According to Eisenberg (2005), a recent Jupiter report predicts an increase in the Web analytics industry – 20 percent annually, reaching $931 million in 2009. More and more businesses are realizing the benefits of critically analyzing their Website traffic and are taking measures to improve their profits based off these numbers. Regardless of business size and objective, an effective Web analytics strategy is becoming increasingly essential.

REFERENCES

Aldrich, S. E. (2006, May 2). The Other Search: Making the Most of Site Search to Optimize the Total Customer Experience. Patricia Seybold Group. Retrieved March 7, 2007, from WebSideStory database.

Ansari, S., Kohavi, R., Mason, L., & Zheng, Z. (2001). Integrating E-Commerce and Data Mining: Architecture and Challenges. IEEE International Conference on Data Mining.

Avinash, A. (2007, June 26). Bounce Rate: Sexiest Web Metric Ever? Retrieved December 2, 2007, from http://www.mpdailyfix.com/2007/06/bounce_rate_sexiest_web_metric.html

Becher, J. D. (2005, March). Why Metrics-Centric Performance Management Solutions Fall Short. DM Review Magazine. Retrieved March 7, 2007, from http://www.dmreview.com/article_sub.cfm?articleId=1021509

Belkin, M. (2006, April 8). 15 Reasons why all Unique Visitors are not created equal. Retrieved March 7, 2007, from http://www.omniture.com/blog/node/16

Burby, J. (2004, July 20). Build a Solid Foundation With Key Performance Indicators, Part 1: Lead-Generation Sites. Retrieved March 7, 2007, from http://www.clickz.com/showPage.html?page=3382981

Burby, J. & Brown, A. (2007). Web Analytics Definitions. Retrieved October 30, 2007, from http://www.webanalyticsassociation.org/attachments/committees/5/WAA-Standards-Analytics-Definitions-Volume-I-20070816.pdf
CCMedia. (2007, August 30). How to Obtain a Cost-effective Operational Model for Support/Self-service Websites? Retrieved December 5, 2007, from www.webnibbler.com/en/WhitePaper/Online%20Support%20Website.pdf

Dodoo, M. (2006, March 3). Privacy & Google Analytics. Retrieved March 7, 2007, from http://www.marteydodoo.com/2006/05/03/privacy-google-analytics/

Eisenberg, B. (2005, April 1). Web Analytics: Exciting Times Ahead. Retrieved March 7, 2007, from http://www.clickz.com/showPage.html?page=3493976

eVision. (2007, September 27). Websites that convert visitors into customers: Improving the ability of your Website to convert visitors into inquiries, leads, and new business. Retrieved March 7, 2007, from http://www.evisionsem.com/marketing/webanalytics.htm

FoundPages. (2007, October 25). Increasing Conversion Rates. Retrieved October 31, 2007, from http://www.foundpages.com/calgary-internet-marketing/search-conversion.html

Greenfield, M. (2006, January 1). Use Web Analytics to Improve Profits for New Year: Focus on four key statistics. Retrieved March 7, 2007, from http://www.practicalecommerce.com/articles/132/Use-Web-Analytics-to-Improve-Profits-for-New-Year/

Hallam-Baker, P. M. & Behlendorf, B. (1999, February 4). Extended Log File Format. Retrieved March 7, 2007, from http://www.w3.org/TR/WD-logfile.html

Heck, M. (2005, February 18). Chart Your Website's Success. Retrieved March 7, 2007, from http://www.infoworld.com/Omniture_SiteCatalyst_11/product_56297.html?view=1&curNodeId=0&index=0

IBM. (2004, May 19). Log File Formats. Retrieved October 29, 2007, from http://publib.boulder.ibm.com/tividd/td/ITWSA/ITWSA_info45/en_US/HTML/guide/c-logs.html

Jacka, R. Getting Results From Your Website. Retrieved October 30, 2007, from http://www.panalysis.com/downloads/gettingresults.pdf

Jackson, M. (2007, January 22). Analytics: Deciphering the Data. Retrieved March 7, 2007, from http://www.ecommerce-guide.com/resources/article.php/3655251

Kaushik, A. (2006, November 13). Excellent Analytics Tip #8: Measure the Real Conversion Rate & 'Opportunity Pie.' Retrieved November 3, 2007, from http://www.kaushik.net/avinash/2006/11/excellent-analytics-tip-8-measure-the-real-conversion-rate-opportunity-pie.html

Kaushik, A. (2007, January 23). Web Analytics Tool Selection: 10 Questions to ask Vendors. Retrieved March 7, 2007, from http://www.kaushik.net/avinash/2007/01/web-analytics-tool-selection-10-questions-to-ask-vendors.html

MarketingSherpa. (2007, October 20). Security Logo in Email Lifts Average Order Value 28.3%. Retrieved December 4, 2007, from https://www.marketingsherpa.com/barrier.html?ident=30183

Marshall, J. Seven Deadly Web Analytics Sins. Retrieved March 7, 2007, from http://www.clicktracks.com/insidetrack/articles/7_deadly_webanalytics_sins01.php

Mason, N. (2007, February 6). Customer Loyalty Improves Retention. Retrieved March 7, 2007, from http://www.clickz.com/showPage.html?page=3624868

McFadden, C. (2005, July 6). Optimizing the Online Business Channel with Web Analytics. Retrieved March 7, 2007, from http://www.Webanalyticsassociation.org/en/art/?9

Microsoft. (2005, August 22). W3C Extended Log File Examples. Retrieved March 7, 2007, from http://technet2.microsoft.com/Win-
dowsServer/en/library/b5b8a519-8f9b-456b-9040-018358f2c0c01033.mspx?mfr=true

Permadi, F. (2005, June 19). Introduction to Flash Local Shared-Object. Retrieved March 7, 2007, from http://www.permadi.com/tutorial/flash-SharedObject/index.html

Peterson, E. T. (2004). Web Analytics Demystified. Celilo Group Media.

Peterson, E. T. (2005, July 31). Average Order Value. Retrieved November 3, 2007, from Web Analytics Demystified Blog Website: http://blog.webanalyticsdemystified.com/weblog/2005/07/average-order-value.html

QuestionPro. Measuring Customer Loyalty and Customer Satisfaction. Retrieved November 21, 2007, from http://www.questionpro.com/akira/showArticle.do?articleID=customerloyalty

Sapir, D. (2004, August). Online Analytics and Business Performance Management. BI Report. Retrieved March 7, 2007, from http://www.dmreview.com/editorial/dmreview/print_action.cfm?articleId=1008820

SearchCRM. (2007, May 9). Measuring Customer Loyalty. Retrieved November 4, 2007, from http://searchcrm.techtarget.com/general/0,295582,sid11_gci1253794,00.html

Sen, A., Dacin, P. A., & Pattichis, C. (2006, November). Current trends in Web data analysis. Communications of the ACM, 49(11), 85-91.

Sterne, J. 10 Steps to Measuring Website Success. Retrieved March 7, 2007, from http://www.marketingprofs.com/login/join.asp?adref=rdblk&source=/4/sterne13.asp

Thomas, I. (2007, January 9). The rumors are true: Microsoft 'Gatineau' exists. Retrieved March 7, 2007, from http://www.liesdamnedlies.com/2007/01/the_rumors_are_.html

Web Analytics Association. Onsite Behavior - Path Analysis. Retrieved March 7, 2007, from http://www.Webanalyticsassociation.org/attachments/contentmanagers/336/1%20Path%20AnAnalys.doc

WebSideStory. (2004). Use of Key Performance Indicators in Web Analytics. Retrieved December 2, 2007, from www.4everywhere.com/documents/KPI.pdf

White, K. (2006, May 10). Unique vs. Returning Visitors Analyzed. Retrieved March 7, 2007, from http://newsletter.blizzardinternet.com/unique-vs-returning-visitors-analyzed/2006/05/10/#more-532

Young, D. (2007, August 15). Site Search: Increases Conversion Rates, Average Order Value And Loyalty. Practical Ecommerce. Retrieved November 15, 2007, from http://www.practicalecommerce.com/articles/541/Site-Search-Increases-Conversion-Rates-Average-Order-Value-And-Loyalty/

KEY TERMS

Abandonment Rate: KPI that measures the percentage of visitors who got to that point on the site but decided not to perform the target action.

Alignment-Centric Performance Management: Method of defining a site's business goals by choosing only a few key performance indicators.

Average Order Value: KPI that measures the total revenue to the total number of orders.

Average Time on Site (ATOS): See visit length.

Checkout Conversion Rate: KPI that measures the percent of total visitors who begin the checkout process.

Commerce Website: A type of Website where the goal is to get visitors to purchase goods or services directly from the site.
Committed Visitor Index: KPI that measures the percentage of visitors that view more than one page or spend more than 1 minute on a site (these measurements should be adjusted according to site type).

Content/Media Website: A type of Website focused on advertising.

Conversion Rate: KPI that measures the percentage of total visitors to a Website that perform a specific action.

Cost Per Lead (CPL): KPI that measures the ratio of marketing expenses to total leads and shows how much it costs a company to generate a lead.

Customer Satisfaction Metrics: KPI that measures how the users rate their experience on a site.

Customer Loyalty: KPI that measures the ratio of new to existing customers.

Demographics and System Statistics: A metric that measures the physical location and information of the system used to access the Website.

Depth of Visit: KPI that measures the ratio between page views and visitors.

Internal Search: A metric that measures information on keywords and results pages viewed using a search engine embedded in the Website.

Key Performance Indicator (KPI): A combination of metrics tied to a business strategy.

Lead Generation Website: A type of Website that is used to obtain user contact information in order to inform them of a company's new products and developments, and to gather data for market research.

Log File: Log kept by a Web server of information about requests made to the Website including (but not limited to) visitor IP address, date and time of the request, request page, referrer, and information on the visitor's Web browser and operating system.

Log File Analysis: Method of gathering metrics that uses information gathered from a log file to gather Website statistics.

Metrics: Statistical data collected from a Website such as number of unique visitors, most popular pages, etc.

New Visitor: A user who is accessing a Website for the first time.

New Visitor Percentage: KPI that measures the ratio of new visitors to unique visitors.

Online Business Performance Management (OBPM): Method of defining a site's business goals that emphasizes the integration of business tools and Web analytics to make better decisions quickly in an ever-changing online environment.

Order Conversion Rate: KPI that measures the percent of total visitors who place an order on a Website.

Page Depth: KPI that measures the ratio of page views for a specific page and the number of unique visitors to that page.

Page Tagging: Method of gathering metrics that uses an invisible image to detect when a page has been successfully loaded and then uses JavaScript to send information about the page and the visitor back to a remote server.

Prospect Rate: KPI that measures the percentage of visitors who get to the point in a site where they can perform the target action (even if they do not actually complete it).

Referrers and Keyword Analysis: A metric that measures which sites have directed traffic to the Website and which keywords visitors are using to find the Website.
Repeat Visitor: A user who has been to a Website before and is now returning.

Returning Visitor: KPI that measures the ratio of unique visitors to total visits.

Search Engine Referrals: KPI that measures the ratio of referrals to a site from specific search engines compared to the industry average.

Single Access Ratio: KPI that measures the ratio of total single access pages (pages where the visitor enters the site and exits immediately from the same page) to total entry pages.

Stickiness: KPI that measures how many people arrive at a homepage and proceed to traverse the rest of the site.

Support/Self Service Website: A type of Website that focuses on helping users find specialized answers to their particular problems.

Top Pages: A metric that measures the pages in a Website that receive the most traffic.

Total Bounce Rate: KPI that measures the percentage of visitors who scan the site and then leave.

Traffic Concentration: KPI that measures the ratio of the number of visitors to a certain area of a Website to total visitors.

Unique Visit: One visit to a Website (regardless of whether the user has previously visited the site); an alternative to unique visitors.

Unique Visitor: A specific user who accesses a Website.

Visit Length: A metric that measures the total amount of time a visitor spends on the Website.

Visit Value: KPI that measures the ratio of the total number of visits to total revenue.

Visitor Path: A metric that measures the route a visitor uses to navigate through the Website.

Visitor Type: A metric that classifies the users who access a Website. Each user who visits the Website is a unique user; if it is a user's first time on the Website, that visitor is a new visitor, and if it is not the user's first time, that visitor is a repeat visitor.

Web Analytics: The measurement of visitor behavior on a Website.

Themes for the Final Projects in the Course "Web Analytics" (2019-20)
Dr. Preeti Khanna

EVALUATION CRITERIA

The focus of the project will be to provide feedback to students on how business competitiveness increases through the use of technology, and on how complex situations in industry can have multiple perspectives. Critical thinking will be an important criterion for evaluation.

The project carries 15 marks.


1. Presentation & Q/A: 10 marks
2. Report (line spacing < 1.15; Calibri, font size 11; 3-4 pages, printed back to back): 5 marks
3. All material/blogs/articles used must be well referenced, and a soft copy of the material referred to must be submitted. For referencing and soft copies, you may use apps such as Zotero and PrintWhatYouLike.
4. As mentioned earlier, the projects will be evaluated in comparison with other projects (a concept similar to a reverse auction), so do not choose a very simplistic study.
5. All members of a group may not get the same marks. During the presentation, evaluation will be based on understanding of the topic and the ability to present the concept concisely.

PRESENTATION SCHEDULE

1. Up to 20-25 minutes per group, followed by Q/A, starting from the 9th and 10th sessions (13th August and 20th August).
2. Submissions: soft copies of both the PPT and the report on the day of the presentation.
3. All members of the group have to be present during the presentation.
4. Students are expected to meet the faculty member after the 4th session to discuss their progress and get interim feedback.

THE REPORT AND PRESENTATION should cover the following parameters:

A plagiarism report must be attached along with your report.
1. Introduction, with research objectives and business goal
2. Scope of the problem
3. How you are approaching the problem and solving it
4. A worked demonstration of Web Analytics on a real-world project (from designing and choosing KPIs through report generation and analysis)
5. A synthesis of the findings obtained above
6. Challenges/benefits from various stakeholders' points of view
7. Concluding remarks
8. References (text and web, with date accessed)
9. Exhibits (facts/figures/diagrams, etc.)
Themes may be chosen from the list below, or you may select your own.

Some themes which you can select for your projects, with a few references for each:
1 Using Google Analytics to evaluate the usability of e-commerce sites
Reference: https://dspace.lboro.ac.uk/dspace-jspui/bitstream/2134/5685/1/Using%20Google%20Analytics%20to%20Evaluate%20the%20Usability%20ofE-commerce%20Sites.pdf
2 A Study of Web Mining Application on E-Commerce using Google Analytics Tool
Reference: https://pdfs.semanticscholar.org/4b7a/5f66e21a986b04c381120add6a091ba9538e.pdf

3 Imagine you are the commercial director of a sports team of your choice. Select a sport, such as football or baseball, and use web analytics to research how the sport uses data and analytics:
 To improve team performance
 To increase sales for events
 To optimize digital marketing campaigns
 For a sports federation such as the International Cricket Council: how should the Internet be used in conjunction with traditional live television broadcasting?
 How can social media be used to increase the participation and inclusion of disadvantaged groups and developing countries?
References: https://www.tableau.com/resources/sports_management and https://balsa.man.poznan.pl/indico/event/44/material/paper/0?contribId=167
4 SEO Applications (Media and Entertainment Industry)
 For brand building
 For engagement
 For example, many retailers, as well as sports professionals such as freelance nutritionists, psychologists, physiologists, fitness coaches and personal trainers, have taken to SEO to raise their profile within the search engine results pages (SERPs).
5 Online stores (Amazon, Newegg, Zappos, Half.com) offer the notion of a shopping cart. Just as MP3 players often have a "rewind" button even though the physical notion of rewinding no longer exists, shoppers have come to expect an online cart to have the same features as a physical cart: temporarily holding items until purchase, removals and additions, and so on. Website visitors can usually place items in a cart before logging in or identifying themselves, and may be able to retain a cart's contents across sessions. Large retailers typically implement their own shopping cart applications, but smaller ones often use third-party services such as Opencart, Shopify, PayPal, and Stripe.
In this project, you will implement a basic shopping cart application with two user interfaces: one for a shopper (for purchasing items), and one for a shopkeeper (for editing the catalog of items and for reviewing orders).
In this project, you are asked to do a design on paper before you do any implementation at all. This is intended to give you some experience focusing on high-level design issues before you jump into coding details. If you do it right, you should be able to save yourself time overall (so that the whole project takes less time than it would if you just hacked the code). To get this advantage, you will need to use the design analysis as an opportunity:
 to focus on exactly what you plan to do and why, eliminate any extraneous complexities, make suitable generalizations, and figure out how to exploit the features of Rails (e.g., RESTful resources);
 to prepare an object model that can be transformed very directly into code;
 to anticipate some of the tricky issues that might derail the implementation.
Also consider:
 How would you publicize your website?
 How effectively can you use analytics for an ad campaign?
 What are the ways that you get to hear your customers' raw opinions on the web and see why they like and buy your product?
Your design document should be clear and complete enough that someone else could implement your design without requiring more information from you.
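The cart behaviour this theme describes (temporarily holding items, removals and additions, retaining contents across sessions) can be sketched as a small class. This is one possible design under my own naming choices, not a prescribed implementation; the project itself asks for a Rails application, whereas the sketch below is language-neutral pseudostructure expressed in Python:

```python
class ShoppingCart:
    """Minimal cart: holds items until purchase, supports add/remove,
    and can be snapshotted so its contents survive across sessions."""

    def __init__(self, items=None):
        # Map item id -> quantity.
        self.items = dict(items or {})

    def add(self, item_id, quantity=1):
        self.items[item_id] = self.items.get(item_id, 0) + quantity

    def remove(self, item_id, quantity=1):
        remaining = self.items.get(item_id, 0) - quantity
        if remaining > 0:
            self.items[item_id] = remaining
        else:
            self.items.pop(item_id, None)

    def total(self, catalog):
        """Price the cart against a catalog of item id -> unit price."""
        return sum(catalog[i] * q for i, q in self.items.items())

    def to_session(self):
        """Snapshot that a session store could persist between visits."""
        return dict(self.items)

# Usage: an anonymous shopper fills a cart, leaves, and returns later.
catalog = {"mug": 250.0, "tee": 500.0}
cart = ShoppingCart()
cart.add("mug", 2)
cart.add("tee")
cart.remove("mug")
saved = cart.to_session()              # persisted at end of first visit
restored = ShoppingCart(items=saved)   # rebuilt on the next visit
print(restored.total(catalog))
```

In the full project, the shopper UI would wrap `add`/`remove` and the shopkeeper UI would edit the `catalog` and review completed orders.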
6 Web Traffic Analysis: a comprehensive guide to tracking, measuring and analyzing your website traffic, and hence its performance, effectively using popular tools in R
 Covers the latest of R's superior advanced analytics capabilities
 Shows how to perform efficient web analytics with R to increase your business's profitability
Reference: https://www.oreilly.com/library/view/web-analytics-with/9781787126527/

7 How Website Speed Impacts Ecommerce Revenue and Analytics

8 Role of Web Analytics in blogging and getting benefits from it
 Show how to integrate blogging into the online promotion of your business
 To help with organic traffic
 To drive lead generation and, eventually, conversions
Reference: https://www.emarketinginstitute.org/free-ebooks/blogging-for-beginners/ (free e-book)

9 Social Media Analytics
