
Automated Testing of Massively Multi-Player Games

Lessons Learned from The Sims Online

Larry Mellon, Spring 2003

Context: What Is Automated Testing?

Classes Of Testing
Feature Regression
QA

System Stress
Load / Random Input

Developer

Automation Components
Startup & Control

Repeatable, Synced Test Inputs

Collection & Analysis

System Under Test

What Was Not Automated?


Automated: Startup & Control; Repeatable, Synchronized Inputs; Results Analysis

Not automated: Visual Effects

Lessons Learned: Automated Testing


Design & Initial Implementation
Architecture, Scripting Tests, Test Client Initial Results


Fielding: Analysis & Adaptations


Wrap-up & Questions


What worked best, what didn't
Tabula Rasa: MMP / SPG


Time (60 Minutes)

Design Constraints
Load → Automation (Repeatable, Synchronized Input; Data Management)

Regression & Churn Rate → Strong Abstraction

Result: a Single, Data-Driven Test Client


Regression and Load both drive a single Test Client, through Reusable Scripts & Data and a Single API

Data Driven Test Client


Regression: testing feature correctness
Load: testing system performance

One Test Client serves both, through a Single API: Key Game States, Configurable Logs & Metrics, Pass/Fail, Responsiveness
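A minimal sketch of the "data driven" idea: one test-client build, with per-run behaviour selected by data rather than by code changes. All names here (REGRESSION, LOAD, make_client_config) are illustrative, not TSO's actual configuration API.

```python
# Hypothetical data-driven client configuration: the same binary runs as a
# verbose single-feature regression client or a cheap, metric-focused load
# client, depending only on the data it is handed.

REGRESSION = {
    "log_level": "verbose",          # rich logs to diagnose feature failures
    "metrics": ["pass_fail"],        # per-step correctness
    "clients_per_cpu": 1,
}

LOAD = {
    "log_level": "errors_only",      # thousands of clients: keep logging cheap
    "metrics": ["responsiveness"],   # latency under load, not per-step detail
    "clients_per_cpu": 50,
}

def make_client_config(mode):
    """Pick a client configuration from data, not from a code branch."""
    return {"regression": REGRESSION, "load": LOAD}[mode]
```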

Problem: Testing Accuracy


Load & Regression: inputs must be accurate and repeatable

Churn rate: logic/data in constant motion


How to keep testing client accurate?

Solution: game client becomes test client


Exact mimicry
Lower maintenance costs

Test Client == Game Client


(Diagram: Test Client and Game Client are identical below the top layer. Test Control or the Game GUI exchange State and Commands with the shared Presentation Layer and Client-Side Game Logic.)

Game Client: How Much To Keep?


(Diagram: Game Client layers, View → Presentation Layer → Logic.)

What Level To Test At?


Inject Mouse Clicks at the View layer?
Regression: Too Brittle (pixel shift)
Load: Too Bulky

What Level To Test At?


Inject Internal Events directly into the game Logic?
Regression: Too Brittle (Churn Rate vs Logic & Data)

Gameplay: Semantic Abstractions


Basic gameplay changes less frequently than UI or protocol implementations.

NullView Client: the Presentation Layer sits between View and Logic, exposing semantic gameplay primitives such as Buy Lot, Buy Object, Enter Lot, and Use Object.
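A sketch of what a semantic Presentation Layer API might look like (class and method names are assumptions; TSO's real layer is not public). Scripts call gameplay verbs, never pixel coordinates or wire protocols, which is why they survive UI and protocol churn.

```python
# Illustrative Presentation Layer: stable gameplay verbs matching the
# slide's examples (buy lot, buy object, enter lot, use object).

class PresentationLayer:
    def __init__(self):
        self.events = []  # semantic events handed down to client-side logic

    def _emit(self, verb, *args):
        self.events.append((verb,) + args)
        return True

    def buy_lot(self, lot_id):
        return self._emit("buy_lot", lot_id)

    def enter_lot(self, lot_id):
        return self._emit("enter_lot", lot_id)

    def buy_object(self, kind, x, y):
        return self._emit("buy_object", kind, x, y)

    def use_object(self, kind, action):
        return self._emit("use_object", kind, action)
```

A regression script and a 4,000-client load script can both drive these same four verbs; churn in the GUI above or the protocol below leaves them untouched.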

Scriptable User Play Sessions


SimScript
Collection: Presentation Layer primitives
Synchronization: wait_until, remote_command
State probes: arbitrary game state (avatar's body skill, lamp on/off, ...)

Test Scripts: Specific / ordered inputs


Single user play session
Multiple user play session

Scriptable User Play Sessions


Scriptable play sessions: big win
Load: tunable based on actual play
Regression: constantly repeat hundreds of play sessions, validating correctness

Gameplay semantics: very stable


UI / protocols shifted constantly
Game play remained (about) the same

SimScript: Abstract User Actions


include_script setup_for_test.txt
enter_lot $alpha_chimp
wait_until game_state inlot

chat I'm an Alpha Chimp, in a Lot.
log_message Testing object purchase.
log_objects
buy_object chair 10 10
log_objects

SimScript: Control & Sync


# Have a remote client use the chair
remote_cmd $monkey_bot use_object chair sit

set_data avatar reading_skill 80
set_data book unlock
use_object book read
wait_until avatar reading_skill 100
set_recording on
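How a SimScript-style `wait_until` might be implemented: poll a state probe until a predicate holds, failing the script on timeout. The timeout and polling interval are assumed values, not TSO's.

```python
# Illustrative synchronization primitive for scripted play sessions.
import time

def wait_until(probe, predicate, timeout=30.0, poll=0.1):
    """Block until predicate(probe()) is true, else raise TimeoutError."""
    deadline = time.monotonic() + timeout
    while True:
        if predicate(probe()):
            return True
        if time.monotonic() >= deadline:
            raise TimeoutError("wait_until: condition not met in time")
        time.sleep(poll)
```

For example, `wait_until(lambda: avatar["reading_skill"], lambda s: s >= 100)` mirrors the script line `wait_until avatar reading_skill 100`.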

Client Implementation

Composable Client
Event Generators: Scripts, Cheat Console, GUI
→ Presentation Layer → Game Logic

Composable Client
Event Generators: Scripts, Console, GUI
Viewing Systems: Console, Lurker, GUI
Both connect through the Presentation Layer to the Game Logic

Any / all components may be loaded per instance
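A sketch of that per-instance composition (all class and factory names are illustrative): the same core client is assembled with different event generators and viewing systems depending on the job at hand.

```python
# Illustrative composable client: a NullView load client, a full game
# client, and a view-only "lurker" share one core and differ only in
# which components are loaded.

class Client:
    def __init__(self, generators=(), viewers=()):
        self.generators = list(generators)   # e.g. scripts, console, GUI
        self.viewers = list(viewers)         # e.g. console, lurker, GUI

def make_game_client():
    return Client(generators=["gui"], viewers=["gui"])

def make_load_client():
    # NullView: script-driven, no viewing system loaded at all
    return Client(generators=["scripts"], viewers=[])

def make_lurker():
    # Watch a running session without generating any input
    return Client(generators=[], viewers=["lurker"])
```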

Lesson: View & Logic Entangled


In the original Game Client, View and Logic were entangled: few clean separation points existed around the Presentation Layer.

Solution: Refactored for Isolation


(Diagram: Game Client refactored so the Presentation Layer cleanly separates View from Logic.)

Lesson: NullView Debugging


Without the (legacy) view system attached, tracing was difficult.

Solution: Embedded Diagnostics


Diagnostics and Timeout Handlers embedded throughout the Presentation Layer and Logic.
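One way such embedded diagnostics might look (names are illustrative): wrap each Presentation Layer call so a headless NullView client still logs progress and flags calls that blew their deadline, instead of hanging silently.

```python
# Illustrative diagnostic wrapper for a headless client: logs start and
# finish of each semantic call, and flags calls that exceed a deadline.
import time

def with_diagnostics(verb, fn, deadline=10.0, log=print):
    """Run fn(), logging start/finish and flagging overlong calls."""
    log(f"[diag] {verb}: start")
    start = time.monotonic()
    result = fn()
    elapsed = time.monotonic() - start
    status = "TIMEOUT" if elapsed > deadline else "ok"
    log(f"[diag] {verb}: {status} after {elapsed:.3f}s")
    return result
```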

Talk Outline: Automated Testing


Design & Initial Implementation

Architecture & Design

Test Client


Initial Results

Lessons Learned: Fielding



Wrap-up & Questions


Time (60 Minutes)

Mean Time Between Failure


Random Event: log & execute
Record client lifetime / RAM
Worked, just not relevant in early stages of development
Most failures / leaks found were not high-priority at that time, when weighed against server crashes

Monkey Tests
Constant repetition of simple, isolated actions against servers
Very useful:
Direct observation of servers while under constant, simple input
Server processes aged all day

Examples:
Login / Logout
Enter House / Leave House

QA Test Suite Regression


High false positive rate & high maintenance
New bugs / old bugs
Shifting game design
Unknown failures

Not helping in day-to-day work.

Talk Outline: Automated Testing


Design & Initial Implementation

Fielding: Analysis & Adaptations
Non-Determinism
Maintenance Overhead
Solutions & Results
Monkey / Sniff / Load / Harness


Time (60 Minutes)

Wrap-up & Questions

Analysis: Testing Isolated Features

Analysis: Critical Path


Test Case: Can an Avatar Sit in a Chair?
use_object()
buy_object()
enter_house()
buy_house()
create_avatar()
login()

Failures on the Critical Path block access to much of the game.

Solution: Monkey Tests


Primitives placed in Monkey Tests
Isolate as much as possible, repeat 400x
Report only aggregate results
Create Avatar: 93% pass (375 of 400)
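A sketch of that aggregate reporting (the function name is an assumption): repeat one isolated primitive many times and emit only the summary line, in the "Create Avatar: 93% pass (375 of 400)" style from the slide.

```python
# Illustrative monkey-test runner: aggregate pass rate, no per-run spam.

def run_monkey(name, action, repetitions=400):
    """Repeat one primitive; report only the aggregate result."""
    passes = sum(1 for _ in range(repetitions) if action())
    pct = round(100 * passes / repetitions)
    return f"{name}: {pct}% pass ({passes} of {repetitions})"
```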

Poor Man's Unit Test


Feature based, not class based
Limited isolation
Easy failure analysis / reporting

Talk Outline: Automated Testing


Design & Initial Implementation


Lessons Learned: Fielding


Non-Determinism

Maintenance Costs
Solution Approaches
Monkey / Sniff / Load / Harness


Wrap-up & Questions


Time (60 Minutes)

Analysis: Maintenance Cost


High defect rate in game code
Code Coupling: side effects
Churn Rate: frequent changes
Critical Path: fatal dependencies
High debugging cost


Non-deterministic, distributed logic

Turnaround Time
Regression tests were too far removed from the introduction of defects.

(Timeline: Bug Introduced during Development → Checkin → Build → Smoke → Regression, days later. The longer the Time to Fix, the higher the Cost of Detection.)

Critical Path Defects Were Very Costly


(Same timeline, adding Impact on Others: a defect on the Critical Path affects other developers from Checkin onward, multiplying the Cost of Detection until it is found and fixed.)

Solution: Sniff Test


Pre-Checkin Regression: don't let broken code into Mainline.

(Flow: Development produces Candidate Code → Sniff Test returns Pass/Fail plus Diagnostics → only Working Code is checked in and proceeds to Build / Smoke / Regression.)
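A sketch of such a pre-checkin gate (structure inferred from the slide; the script list reuses the critical path shown earlier, and all function names are illustrative): candidate code only reaches Mainline if every critical-path script passes.

```python
# Illustrative sniff gate: run the critical-path scripts, return
# pass/fail plus diagnostics, and submit only on a clean pass.

CRITICAL_PATH = ["login", "create_avatar", "buy_house",
                 "enter_house", "buy_object", "use_object"]

def sniff_test(run_script):
    """Run each critical-path script; return (passed, failing_scripts)."""
    failures = [s for s in CRITICAL_PATH if not run_script(s)]
    return (not failures, failures)

def attempt_checkin(run_script, submit):
    """Gate a checkin on the sniff test; broken code never hits Mainline."""
    passed, failures = sniff_test(run_script)
    if passed:
        submit()   # working code reaches Mainline
    return passed, failures
```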

Solution: Hourly Diagnostics


SniffTest Stability Checker
Emulates a developer: every hour, sync / build / test

Critical Path monkeys ran non-stop


Constant baseline

Traffic Generation
Keep the pipes full & servers aging
Keep the DB growing
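One cycle of the hourly checker can be sketched as below (the sync, build, and test callables stand in for the real source-control, build, and monkey-test commands, which the talk does not name).

```python
# Illustrative hourly stability cycle: emulate a developer's workflow
# and report a status for the constant baseline log.

def stability_cycle(sync, build, test):
    """One sync/build/test pass; returns a baseline status string."""
    if not sync():
        return "SYNC FAILED"
    if not build():
        return "BUILD BROKEN"
    return "STABLE" if test() else "TESTS FAILED"
```

Run from a scheduler once per hour, the stream of statuses is the constant baseline developers compare against.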

Analysis: CONSTANT SHOUTING


IS REALLY IRRITATING

Bugs spawned many, many emails

Solution: Report Managers


Aggregates / correlates across tests
Filters known defects
Translates common failure reports to their root causes
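The three behaviours above can be sketched as follows (signatures, defect IDs, and root-cause names are all illustrative): aggregate failure signatures across tests, drop already-known defects, and translate common signatures to their root cause before anything is mailed out.

```python
# Illustrative report manager: turn a flood of raw failure reports into
# one small root-cause summary.

KNOWN_DEFECTS = {"db_timeout_bug_1234"}                 # already tracked
ROOT_CAUSES = {"lot_enter_hang": "city server overload"}

def summarize(failure_signatures):
    """Aggregate, filter known defects, map signatures to root causes."""
    report = {}
    for sig in failure_signatures:
        if sig in KNOWN_DEFECTS:
            continue                                    # no fresh email
        cause = ROOT_CAUSES.get(sig, sig)
        report[cause] = report.get(cause, 0) + 1
    return report
```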

Solution: Data Managers


Information Overload: Automated workflow tools mandatory

ToolKit Usability
Workflow automation
Information management
Developer / Tester push-button ease of use
XP flavour: increasingly easy to run tests
Must be easier to run than to avoid running
Must solve problems on the ground, now

Sample Testing Harness Views

Load Testing: Goals


Expose issues that only occur at scale
Establish hardware requirements
Establish that response is playable @ scale
Emulate user behaviour
Use server-side metrics to tune test scripts against observed Beta behaviour

Run full scale load tests daily

Load Testing: Data Flow


(Data flow: a Load Control Rig drives multiple Test Driver CPUs, each running many Test Clients that send Game Traffic to the Server Cluster. System Monitors and Internal Probes return Resource Metrics, Client Metrics, and Debugging Data to the Load Testing Team.)
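The fan-out in that data flow can be sketched as below (all names are illustrative): partition N scripted clients across driver machines, run them, and gather per-client metrics for the load testing team.

```python
# Illustrative load-control fan-out for e.g. 4,000 test clients.

def partition_clients(total, drivers):
    """Spread test clients as evenly as possible across driver machines."""
    base, extra = divmod(total, drivers)
    return [base + (1 if i < extra else 0) for i in range(drivers)]

def run_load_test(total, drivers, run_client):
    """Run every client slot on every driver; collect their metrics."""
    metrics = []
    for driver_id, count in enumerate(partition_clients(total, drivers)):
        for slot in range(count):
            metrics.append(run_client(driver_id, slot))  # e.g. latency
    return metrics
```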

Load Testing: Lessons Learned


Very successful
Scale & Break: up to 4,000 clients

Some conflicting requirements w/ Regression
Continue on fail
Transaction tracking
NullView client a little chunky

Current Work
QA test suite automation
Workflow tools
Integrating testing into the new-feature design/development process

Planned Work
Extend Esper Toolkit for general use
Port to other Maxis projects

Talk Outline: Automated Testing


Design & Initial Implementation


Lessons Learned: Fielding


Wrap-up & Questions


Biggest Wins / Losses
Reuse
Tabula Rasa: MMP & SSP


Time (60 Minutes)

Biggest Wins
Presentation Layer Abstraction
NullView client
Scripted play sessions: powerful for regression & load
Pre-Checkin Snifftest
Load Testing
Continual Usability Enhancements

Team
Upper Management Commitment
Focused Group, Senior Developers

Biggest Issues
Order Of Testing
MTBF / QA Test Suites should have come last
Not relevant early, when the game was too unstable
Find / Fix Lag: too distant from Development

Changing TSO's Development Process
Tool adoption was slow, unless mandated

Noise
Constant Flood Of Test Results
Number of Game Defects, Testing Defects
Non-Determinism / False Positives

Tabula Rasa

How Would I Start The Next Project?

Tabula Rasa

PreCheckin Sniff Test

There's just no reason to let code break.

Tabula Rasa
PreCheckin SniffTest
Keep Mainline working

Hourly Monkey Tests


Useful baseline & keeps servers aging.

Tabula Rasa
PreCheckin SniffTest: Keep Mainline working
Hourly Stability Checkers: Baseline for Developers

Dedicated Tools Group


Continual usability enhancements adapted tools to meet on-the-ground conditions.

Tabula Rasa
PreCheckin SniffTest: Keep Mainline working
Hourly Stability Checkers: Baseline for Developers
Dedicated Tools Group: Easy to Use == Used

Executive Level Support


Mandates required to shift how entire teams operated.

Tabula Rasa
PreCheckin SniffTest: Keep Mainline working
Hourly Stability Checkers: Baseline for Developers
Dedicated Tools Group: Easy to Use == Used
Executive Support: Radical Shifts in Process

Load Test: Early & Often

Tabula Rasa
PreCheckin SniffTest: Keep Mainline working
Hourly Stability Checkers: Baseline for Developers
Dedicated Tools Group: Easy to Use == Used
Executive Support: Radical Shifts in Process
Load Test Early & Often: Break it before Live

Distribute Test Development & Ownership Across Full Team

Next Project: Basic Infrastructure


Control Harness For Clients & Components
Regression Engine
Reference Client
Reference Feature

Self Test
Living Doc

Building Features: NullView First


Reference Client
Control Harness
Reference Feature
Self Test
Living Doc

NullView Client
Regression Engine

Build The Tests With The Code


Control Harness
Regression Engine

Reference Client
Reference Feature

Self Test

NullView Client
Login Monkey Test

Nothing Gets Checked In Without A Working Monkey Test.

Conclusion
Estimated Impact on MMP: High
Sniff Test: kept developers working
Load Test: ID'd critical failures pre-launch
Presentation Layer: scriptable play sessions

Cost To Implement: Medium


Much Lower for SSP Games

Repeatable, coordinated inputs @ scale and pre-checkin regression were very significant schedule accelerators.

Conclusion

Go For It

Talk Outline: Automated Testing


Design & Initial Implementation

Lessons Learned: Fielding


Wrap-up

Questions


Time (60 Minutes)