GPU Implementations of Online Track Finding Algorithms at PANDA

Member of the Helmholtz Association

HK 57.2, DPG-Frühjahrstagung 2014 (DPG Spring Meeting), Frankfurt

21 March 2014, Andreas Herten (Institut für Kernphysik, Forschungszentrum Jülich) for the PANDA Collaboration

PANDA: The Experiment

[Figure: the PANDA detector with a 13 m scale; labeled components: Magnet, STT, MVD]

PANDA: Event Reconstruction

Triggerless readout
  Many benchmark channels
  Background & signal similar
Event rate: 2 × 10^7/s
Raw data rate: 200 GB/s
Reduce by ~1/1000: GPUs
  (Reject background events, save interesting physics events)
Disk storage space for offline analysis: 3 PB/y
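As a quick consistency check on the numbers above, the average raw event size and the data rate after reduction follow directly from the quoted rates. This is a back-of-the-envelope sketch, assuming decimal units (1 GB = 10^9 B) and treating the factor ~1/1000 as exact:

```latex
\begin{align*}
\text{raw data per event} &\approx \frac{200\ \text{GB/s}}{2 \times 10^{7}\ \text{events/s}} = 10\ \text{kB/event} \\
\text{rate after reduction} &\approx \frac{200\ \text{GB/s}}{1000} = 200\ \text{MB/s}
\end{align*}
```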

PANDA: Tracking, Online Tracking

[Figure: trigger and detector layers in a usual HEP experiment compared with PANDA]

PANDA: No hardware-based trigger, but a computationally intensive software trigger → Online Tracking

GPUs @ PANDA: Online Tracking

Port tracking algorithms to GPU
  Serial → parallel
  C++ → CUDA
Investigate suitability for online performance
But also: find & invent tracking algorithms
Under investigation:
  Hough Transformation
  Riemann Track Finder
  Triplet Finder

Algorithm: Hough Transform

Idea: Transform $(x, y)_i \to (\alpha, r)_{ij}$, find lines via $(\alpha, r)$ space
Solve the line equation
  $r_{ij} = \cos\alpha_j \cdot x_i + \sin\alpha_j \cdot y_i + \rho_i$
for
  Lots of hits $(x, y, \rho)_i$ and
  Many $\alpha_j \in [0^\circ, 360^\circ)$ each
  i: ~100 hits/event (STT)
  j: every 0.2°
  $r_{ij}$: 180 000
Fill histogram
Extract track parameters

[Figure: Hough transform principle]
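The fill and extract steps above can be sketched as a serial C++ reference. This is a minimal sketch rather than the implementation from the talk: the names `Hit`, `HoughSpace`, `houghTransform`, and `fullestBin` are hypothetical, the histogram range is chosen by the caller, and treating the third hit coordinate as an additive offset (for example a drift radius) is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// One detector hit; rho is the per-hit offset in the line equation
// (its physical meaning is an assumption here, e.g. a drift radius).
struct Hit { double x, y, rho; };

// 2D histogram over (alpha, r); bins are row-major: alpha index * nR + r index.
struct HoughSpace {
    std::size_t nAlpha, nR;
    double rMin, rMax;
    std::vector<unsigned> bins;
};

// Fill the histogram with r_ij = cos(alpha_j) x_i + sin(alpha_j) y_i + rho_i
// for every hit i and every angle alpha_j in [0, 360) degrees.
HoughSpace houghTransform(const std::vector<Hit>& hits, std::size_t nAlpha,
                          std::size_t nR, double rMin, double rMax) {
    assert(nAlpha > 0 && nR > 0 && rMax > rMin);
    const double pi = std::acos(-1.0);
    HoughSpace hs{nAlpha, nR, rMin, rMax, std::vector<unsigned>(nAlpha * nR, 0u)};
    for (const Hit& h : hits) {
        for (std::size_t j = 0; j < nAlpha; ++j) {
            const double alpha =
                2.0 * pi * static_cast<double>(j) / static_cast<double>(nAlpha);
            const double r = std::cos(alpha) * h.x + std::sin(alpha) * h.y + h.rho;
            if (r < rMin || r >= rMax) continue;  // outside the histogram range
            const std::size_t ir = std::min(
                nR - 1,
                static_cast<std::size_t>((r - rMin) / (rMax - rMin) *
                                         static_cast<double>(nR)));
            ++hs.bins[j * nR + ir];
        }
    }
    return hs;
}

// Extract track parameters: the fullest bin is the best line candidate.
// Returns the flat index of the first bin holding the maximum count.
std::size_t fullestBin(const HoughSpace& hs) {
    return static_cast<std::size_t>(
        std::max_element(hs.bins.begin(), hs.bins.end()) - hs.bins.begin());
}
```

Every (i, j) pair is computed independently, so on a GPU each pair could be handled by its own thread, provided the histogram increments are made safe against concurrent updates (for example with atomic adds).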

Algorithm: Hough Transform

[Figure: Hough-transformed hits for 68 (x, y) points, PANDA STT+MVD, 1800 x 1800 grid; axes: angle / ° (0 to 180) and r]

Algorithm: Hough Transform

Two Implementations

Thrust
  Performance: 3 ms/event
  Independent of granularity
  Reduced to set of standard routines
    Fast (uses Thrust's optimized algorithms)
    Inflexible (has its limits, hard to customize)
  No peak finding included
    Even possible? Adds to time!

Plain CUDA
  Performance: 0.5 ms/event
  Built completely for this task
    Fitting to every problem
    Customizable
    A bit more complicated at parts
  Simple peak finder implemented (threshold)
  Using: Dynamic Parallelism, Shared Memory

Algorithm: Riemann Track Finder

Idea: Don't fit lines (in 2D), fit planes (in 3D)!
Create seeds
  All possible three-hit combinations
Grow seeds to tracks
  Continuously test next hit if it fits
Use mapping to Riemann paraboloid
Summer student project (J. Timcheck)

[Figure: hits mapped onto the Riemann paraboloid; axes x and z]
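The idea of fitting planes instead of circles rests on one fact: under the mapping (x, y) → (x, y, x² + y²), all points of a circle in the (x, y) plane land on a common plane. The sketch below illustrates seeding from three hits and testing a further hit. It is an illustration under stated assumptions, not the track finder from the talk: all names are hypothetical, and the plane is taken exactly through three points instead of being fitted.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { double x, y, z; };
struct Plane { Vec3 n; double c; };  // points p on the plane satisfy n.p + c = 0, |n| = 1
struct Circle { double cx, cy, radius; };

// Map a 2D hit onto the Riemann paraboloid z = x^2 + y^2.
Vec3 toParaboloid(double x, double y) { return {x, y, x * x + y * y}; }

double dot(const Vec3& a, const Vec3& b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Seed: the plane through three mapped hits (they must be pairwise distinct).
Plane planeThrough(const Vec3& p, const Vec3& q, const Vec3& r) {
    const Vec3 u{q.x - p.x, q.y - p.y, q.z - p.z};
    const Vec3 v{r.x - p.x, r.y - p.y, r.z - p.z};
    Vec3 n{u.y * v.z - u.z * v.y, u.z * v.x - u.x * v.z, u.x * v.y - u.y * v.x};
    const double len = std::sqrt(dot(n, n));
    assert(len > 0.0);
    n = {n.x / len, n.y / len, n.z / len};
    return {n, -dot(n, p)};
}

// Grow: a further hit fits the seed if its mapped point is close to the plane.
double distanceToPlane(const Plane& pl, const Vec3& p) {
    return std::fabs(dot(pl.n, p) + pl.c);
}

// A non-vertical plane (n.z != 0) corresponds to a circle in the (x, y) plane.
Circle circleFromPlane(const Plane& pl) {
    assert(pl.n.z != 0.0);
    const double cx = -pl.n.x / (2.0 * pl.n.z);
    const double cy = -pl.n.y / (2.0 * pl.n.z);
    return {cx, cy, std::sqrt(cx * cx + cy * cy - pl.c / pl.n.z)};
}
```

A seed would accept the next hit when `distanceToPlane` is below a tolerance; the tolerance itself is a tuning parameter that the slides do not specify.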

Algorithm: Riemann Track Finder

GPU optimization: unfolding loops
  for () { for () { for () { } } }  →  int ijk = threadIdx.x + blockIdx.x * blockDim.x;
  → 100× faster than CPU version

  $n_{\mathrm{Layer}_x} = \frac{1}{2}\left(\sqrt{8x + 1} - 1\right)$

  $\mathrm{pos}(n_{\mathrm{Layer}_x}) = \frac{\sqrt[3]{\sqrt{3}\sqrt{243x^2 - 1} + 27x}}{3^{2/3}} + \frac{1}{\sqrt[3]{3}\,\sqrt[3]{\sqrt{3}\sqrt{243x^2 - 1} + 27x}} - 1$

Time for one event (Tesla K20X): ~0.6 ms
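The closed-form expressions on this slide have the shape of the inverse triangular and inverse tetrahedral numbers. Assuming that is their role, the sketch below shows how a single flat index (on the GPU, `threadIdx.x + blockIdx.x * blockDim.x`) can be decoded into one three-hit combination i < j < k, so that the three nested loops disappear. The function names are hypothetical, and the integer correction loops are an addition here to guard against floating-point rounding.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

struct Triplet { std::uint64_t i, j, k; };  // always i < j < k

// Largest m with m(m+1)/2 <= x (inverse triangular number).
std::uint64_t triangularRoot(std::uint64_t x) {
    const double xd = static_cast<double>(x);
    auto m = static_cast<std::uint64_t>((std::sqrt(8.0 * xd + 1.0) - 1.0) / 2.0);
    while (m * (m + 1) / 2 > x) --m;  // correct possible rounding errors
    while ((m + 1) * (m + 2) / 2 <= x) ++m;
    return m;
}

// Largest m with m(m+1)(m+2)/6 <= x (inverse tetrahedral number).
std::uint64_t tetrahedralRoot(std::uint64_t x) {
    if (x == 0) return 0;  // the closed form needs 243 x^2 - 1 >= 0
    const double xd = static_cast<double>(x);
    const double c =
        std::cbrt(std::sqrt(3.0) * std::sqrt(243.0 * xd * xd - 1.0) + 27.0 * xd);
    auto m = static_cast<std::uint64_t>(c / std::cbrt(9.0) +
                                        1.0 / (std::cbrt(3.0) * c) - 1.0);
    while (m * (m + 1) * (m + 2) / 6 > x) --m;  // correct possible rounding errors
    while ((m + 1) * (m + 2) * (m + 3) / 6 <= x) ++m;
    return m;
}

// Decode flat index t into the t-th combination, in the same order as
//   for k in [2, n), for j in [1, k), for i in [0, j).
Triplet decodeTriplet(std::uint64_t t) {
    assert(t < (std::uint64_t{1} << 40));  // keeps the products above from overflowing
    const std::uint64_t m = tetrahedralRoot(t);  // equals k - 2
    const std::uint64_t rem = t - m * (m + 1) * (m + 2) / 6;
    const std::uint64_t p = triangularRoot(rem);  // equals j - 1
    const std::uint64_t i = rem - p * (p + 1) / 2;
    return {i, p + 1, m + 2};
}
```

With n hits there are n(n-1)(n-2)/6 combinations, so one thread per flat index covers all seeds without any thread idling on duplicate or reordered combinations.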

Algorithm: Triplet Finder

Idea: Use only a subset of the detector as seed
  Combine 3 hits to a Triplet
  Calculate circle from 3 Triplets (no fit)
Features
  Tailored for PANDA
  Fast & robust algorithm, no t0
Ported to GPU together with NVIDIA Application Lab
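Calculating a circle from three points without a fit can be done in closed form as the circumcircle. The sketch below shows that step only. Representing a triplet by the centroid of its three hits is an assumption made for illustration, and the names `tripletPosition` and `circleThrough` are hypothetical.

```cpp
#include <cmath>

struct Point { double x, y; };
struct Circle { Point center; double radius; };

// Assumption: a triplet is represented by the centroid of its three hits.
Point tripletPosition(const Point& a, const Point& b, const Point& c) {
    return {(a.x + b.x + c.x) / 3.0, (a.y + b.y + c.y) / 3.0};
}

// Circle through three points, computed directly (no fit).
// Returns false if the points are (nearly) collinear.
bool circleThrough(const Point& a, const Point& b, const Point& c, Circle& out) {
    const double d =
        2.0 * (a.x * (b.y - c.y) + b.x * (c.y - a.y) + c.x * (a.y - b.y));
    if (std::fabs(d) < 1e-12) return false;
    const double a2 = a.x * a.x + a.y * a.y;
    const double b2 = b.x * b.x + b.y * b.y;
    const double c2 = c.x * c.x + c.y * c.y;
    const double ux = (a2 * (b.y - c.y) + b2 * (c.y - a.y) + c2 * (a.y - b.y)) / d;
    const double uy = (a2 * (c.x - b.x) + b2 * (a.x - c.x) + c2 * (b.x - a.x)) / d;
    out = {{ux, uy}, std::hypot(a.x - ux, a.y - uy)};
    return true;
}
```

Because the result is a direct formula with a fixed number of operations, every candidate costs the same, which suits GPU execution better than an iterative fit would.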

Triplet Finder: Time

[Figure: Triplet Finder timing plot]

Triplet Finder: Optimizations

Bunching Wrapper
  Hits from one event have similar timestamps
  Combine hits into sets (bunches) which fill up the GPU best
  O(N²) → O(N)

[Figure: hits grouped into events and bunches]
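The scaling claim can be made concrete: if a step compares hits pairwise, N hits cost about N²/2 comparisons, while N hits split into bunches of at most B hits cost about (N/B) · B²/2 = N · B/2, which is linear in N for fixed B. The sketch below splits time-sorted hits into bunches of fixed duration and counts the pairwise comparisons. It assumes fixed-length time windows and hypothetical names (`Bunch`, `makeBunches`, `pairCount`); the actual wrapper may choose bunch boundaries differently.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Half-open index range [begin, end) into the time-sorted hit array.
struct Bunch { std::size_t begin, end; };

// Split time-sorted hit timestamps into bunches of fixed duration bunchLength.
std::vector<Bunch> makeBunches(const std::vector<double>& sortedTimes,
                               double bunchLength) {
    assert(bunchLength > 0.0);
    std::vector<Bunch> bunches;
    std::size_t begin = 0;
    for (std::size_t i = 1; i <= sortedTimes.size(); ++i) {
        const bool atEnd = (i == sortedTimes.size());
        // A new bunch starts whenever a hit falls into a different time window.
        if (atEnd || std::floor(sortedTimes[i] / bunchLength) !=
                         std::floor(sortedTimes[begin] / bunchLength)) {
            bunches.push_back({begin, i});
            begin = i;
        }
    }
    return bunches;
}

// Number of hit pairs when only hits within the same bunch are compared.
std::size_t pairCount(const std::vector<Bunch>& bunches) {
    std::size_t pairs = 0;
    for (const Bunch& b : bunches) {
        const std::size_t n = b.end - b.begin;
        pairs += n * (n - 1) / 2;
    }
    return pairs;
}
```

For six hits at times 0, 10, 20, 1000, 1010, and 2500 with a window of 500, this yields three bunches and 4 pair comparisons instead of the 15 needed without bunching.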

Triplet Finder: Bunching Performance

[Figure: Triplet Finder bunching performance plot]

Triplet Finder: Optimizations

Compare kernel launch strategies
  Dynamic Parallelism: 1 thread/bunch, calling kernel
  Joined Kernel: 1 block/bunch, joined kernel
  Host Streams: 1 stream/bunch, combining stream

[Figure: for each strategy, how the Triplet Finder stages (TF Stage #1 to TF Stage #4) are launched between CPU and GPU]

Triplet Finder: Kernel Launches

[Figure: comparison of the kernel launch strategies; preliminary (in publication)]

Triplet Finder: Clock Speed / Chipset

Preliminary (in publication)

GPU     Memory Clock   Core Clock   GPU Boost
K40     3004 MHz       745 MHz      875 MHz
K20X    2600 MHz       732 MHz      784 MHz

[Figure: Triplet Finder performance for the clock settings above]

Summary

Investigated different tracking algorithms
  Best performance: 20 µs/event
  → Online tracking is a feasible technique for PANDA
Multi-GPU system needed
  O(100) GPUs
Still much optimization necessary (efficiency)
Collaboration with NVIDIA Application Lab
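The order of magnitude of the GPU count can be checked against the event rate quoted earlier in the talk. This is a rough estimate, assuming the best per-event time is 20 µs, that every event is processed on a GPU at that speed, and that there is no transfer or scheduling overhead:

```latex
\begin{align*}
N_{\text{GPU}} &\approx \text{event rate} \times \text{time per event} \\
               &= 2 \times 10^{7}\ \text{s}^{-1} \times 20 \times 10^{-6}\ \text{s} = 400
\end{align*}
```

A few hundred GPUs is consistent with the O(100) stated in the summary.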

Thank you!

Andreas Herten
a.herten@fz-juelich.de
