
Design and Construction of a K-dimensional LED Array

Nikhil J. Dixit, Chirag D. Sakhuja, Minesh H. Patel
Professor Ramesh Yerraballi
Spring 2012
Electrical and Computer Engineering, University of Texas, Austin, Texas

May 30, 2012

Abstract
Our team has designed a high-side driver which, in its greatest generality, can be used to simultaneously power an arbitrary array of numerous high-power devices while providing individual addressability within each dimension. The basic premise of our design rests on dividing any given k-dimensional array into j cross sections, each containing i devices. Each element of a cross section is then controlled individually by a low-side circuit, while a high-side circuit delivers power to each cross section in turn at a specified rate. Our physical implementation applies this driver design to a three-dimensional array of $10^3$ RGB LEDs. The project offers full, individual 36-bit color addressability (12 bits per color channel) at a refresh rate of at least 60 Hz. It can display both static images and animations, and is entirely controllable through a simple software interface.

A Note Concerning Convention


In order to be as thorough as possible in our analysis, we adopt the following convention for the rest of the paper:

- k represents the number of dimensions of the array under consideration.
- j represents the number of elements within the dimension of iteration (i.e. the number of frames required for one full refresh).
- i represents the number of elements within one cross section of iteration (i.e. the number of devices actively controlled within each of the above frames).

Note: the product $i \cdot j$ always represents the total number of elements in the array.

Contents
1 Introduction
  1.1 The Parts
      RGB LED
      TLC5940
      Shift Register
      Enhancement Mode P-channel MOSFET
      NOT Gate
      Level Shifter
      Electret Condenser Microphone
      Power Supply
      Stellaris EKK-LM3S1968
      Decoder/Demultiplexer and AND Gate

2 Designs
  2.1 Final Design [j = 10, i = 100]
      The High Side of our LED Array Driver
          Description
          Analysis
      The Low Side of our LED Array Driver
          Description
          Operation of the TLC5940
          Analysis
          The General Form of the Solution
      The General High Side
      The General Low Side
      Conclusion
  2.2 Preliminary Designs
      The Global Bus: Reduction of Dimensions [j = 1000, i = 1]
      The Divergence of Paths: Trifurcation of the Bus [j = 1000, i = 1]
      TLC5940 Design 1 [j = 100, i = 10]
      TLC5940 Design 2 [j = 100, i = 10]
      TLC5940 Design 3 [j = 100, i = 10]
      Premise of the Final Design
  2.3 Implementation
      The Materials
          300x 1 mm Diameter Acrylic Rod
          6x Breadboard
          6x Pre-Punched Blank Protoboards
          1x 28 Gauge Wire
          1x 28 Gauge Magnet Wire
          1x Plexiglas Sheet
          1x Permanent Marker
          1x Hot Glue Gun + Refills
          1x Soldering Iron + Solder
          1x Drill + Bit
      The Construction
          The Prototype
          The Final Array
              Casting of the Die
              Anode Consolidation
              Array Linkage
              Power System Implementation
              Conclusion
  2.4 Software
      Interfacing the TLC5940
          Logic Design of the TLC5940
          General Algorithm Overview
          Final Algorithm Overview
          Driver Implementation
              Environment
              Tools
              A Word on Generality
              Connections
              Writing the TLC5940 Driver
                  TLCSetup
                  User Driver Interface
                  TLCSetLED
                  TLCInterrupt
          The Final Code
          The Refresh Rate
      Benchmarking with the FFT
          Introduction to the Cooley-Tukey Algorithm
          Breakdown of Code
              FFT
              FFT Calculate
          The Final Code

3 Extensions
  3.1 Applications
      Applications of the RGB LED Array
      Applications of the Driver
      The Choice of a Dimension Independent Array
  3.2 Conclusion

List of Figures
1.1 The truth table for the NOT function.
1.2 The amplification circuit of the Electret Condenser Microphone.
1.3 The truth table for the AND function.
2.1 The schematic of our solution to the array driver problem.
2.2 The current going through one LED during a single full refresh. The LED is on for a tenth of the cycle.
2.3 The progression of our designs.
2.4 A summary of the entire design process.
2.5 Programming flow chart provided by TI to interface the TLC5940. [4]
2.6 Note that the middle wave perfectly resembles a clock signal.
2.7 An SPI timing diagram (MOSI stands for master out slave in, which is equivalent to the SIN line). Note how the SCLK pulses are perfectly aligned with the data pulses. [3]
2.8 An input edge count mode progression diagram specific to the Stellaris LM3S1968. Note that on each rising and falling edge the count is decremented until it matches the target of 0x0006. [6]
2.9 Process chart of the improved algorithm. Note the dramatic shift from software to hardware.
2.10 Internal layout of colorArray. All numbers in the top right box are indexes.
2.11 Data flows between multiple interrupts. The order of variable and pin changes is crucial. Also note that the first layer is given undefined data during the very first refresh cycle, an acceptable side effect of this iteration scheme.
2.12 The interrupt essentially stalls the CPU for the user and thus takes time away from the main thread.
2.13 The $S_{free}$ vs. $f_{update}$ line for our 10x10x10 LED array.
2.14 The symmetry that the Cooley-Tukey algorithm takes advantage of. [1]

Chapter 1 Introduction
An RGB LED array is an evenly spaced matrix of tri-color LEDs, each of whose color channels can be independently controlled through a simple interface. It serves as a visualizer and so can display both static images and real-time animations, with a resolution that increases with the density of the array. Our goal throughout this project is to design and construct a three-dimensional array of $10^3$ RGB LEDs. Our ideas are motivated by the simple fact that this system is nothing less than a fully three-dimensional visualizer; our objective is to have sufficient resolution to display any arbitrary, full-color animation as required.

To formulate our design, we begin by abstracting our project goals into a more general problem. Our essential aim is to design a power system capable of driving a k-dimensional array of objects that each require considerable power to operate (such as LEDs or motors, as opposed to, for example, an LCD screen, which has a large number of very low-power elements). We can then divide our overall goal into a number of distinct criteria.

1. The overall refresh rate must exceed 60 Hz. This is essential for the final product to display animations smoothly to the human eye. Note that this implies that each iteration must proceed at a rate of at least $60j$ Hz.

2. There must be sufficient free microprocessor clock time between refreshes to determine the contents of the next frame. We predict that more than 75% of the total clock cycles must be available for computation, as each update requires computing the states of all $i \cdot j$ elements, 1000 in our case. This is especially important if the data for the next frame is not simply fetched from memory but computed on the fly. As the latter is our eventual goal, this criterion is of great importance.

3. Each LED must be individually color-addressable within each frame of animation. Furthermore, the color resolution must be fine enough to mask the discrete nature of transitions between adjacent RGB values. This allows any animation involving hue sweeps to appear continuous.

4. The apparent brightness of any single LED during only one frame of animation must be comparable to its true continuous maximum brightness. As an extension, our power system must be robust enough to simultaneously drive the entire array's worth of LEDs continuously at the same brightness at which it would power a single LED. This ensures that the apparent brightness of any given LED at any given time is uniform, entirely independent of the number of LEDs active at that moment.

5. Our driving system must be scalable, ideally to any size and dimension. Consider, for example, a system consisting of a 3D array of three-tier components (i.e. RGB LEDs), each of which must be controlled independently. This system is effectively a five-dimensional arrangement, which our design must accommodate as easily as a simple array of one-tier components.

Our objectives have thus been broadened and clearly specified in the requirements above. Our task, then, is to create a design which satisfies each of these goals as neatly and elegantly as possible. We now describe the process by which we solved this problem.

1.1 The Parts

RGB LED
The project uses 1000 four-pin, common-anode RGB LEDs. Each LED has a single anode pin and three cathode pins, one for each color channel (R, G, and B); each channel is actually a smaller individual LED within the larger package. Since current flows from anode to cathode, the LED can only be effectively controlled by a current sink. This layout suits our design because selectively sinking current is a much easier task than selectively sourcing it. Furthermore, this type of LED allows the design to effortlessly increase the potential across the diode if necessary, which is not as simple with a common-cathode LED.

To control the color of the LED, the current through each cathode must be varied independently of all the others. To achieve both independence and precise control, the design calls for a PWM sink. Feeding each LED cathode into its own sink (so three sinks are needed per RGB LED) and setting the PWM duty cycle of the sinking IC from a microcontroller gives maximum precision and reliability, since the sink shields the microcontroller's GPIO pins from high voltages.

The specific LEDs used for this project were manufactured by Shen Zhen Hanhua Opto [2]; they are 5 mm in diameter with diffused lenses. We chose 5 mm LEDs because they strike a suitable balance between size and brightness (3 mm LEDs are too dim, and 10 mm LEDs are so large that they obstruct the LEDs behind them). Diffused lenses were chosen because they allow the individual LEDs within the larger package to blend together and give a much better illusion of a single color. This is especially important when representing non-primary colors such as purple, yellow, orange, and white. As for power, the LEDs are rated at a maximum continuous current of 60 mA, meaning 20 mA per cathode, which is the output goal for the design.

TLC5940
The TLC5940 is a multi-channel low-side LED driver designed by Texas Instruments. It is a current-sinking IC capable of controlling 16 different currents per chip using pulse width modulation, and it is our choice for the PWM sink called for by the design for several reasons. The TLC5940 is relatively simple to drive, requiring at its core only two clock lines and two data lines. It is highly expandable, allowing multiple ICs to be chained to increase the number of sinks. Most importantly, it is very precise, owing to its 12-bit grayscale controller with 4096 steps. As a bonus, the TLC5940 limits the maximum current through each pin via an external resistor, which can be chosen according to the required maximum throughput, allowing for near-perfect brightness scaling. All these facets, combined with its low price, make it an attractive option for our needs.

However, the TLC5940 has a few limits that must be overcome if a very large-scale project is undertaken (well beyond the scope of our array). The most pressing issue is the upper bound on its clocks. The TLC5940 imposes a hardware limit of 30 MHz on the two clocks it needs, the grayscale clock and the data clock. The problem is that as the number of channels grows by n, the data to be sent per refresh grows by 12n bits, a steep slope for the data clock to keep up with when there are many elements. At scales of ten thousand elements or more, a 30 MHz data clock cannot keep up and the refresh rate suffers dramatically. Another issue is the TLC5940's voltage and current limits. Designed as a low-power chip, the TLC5940 cannot tolerate more than 17 V on any of its output pins without faulting, meaning any device that requires more than this voltage to operate will need an alternative sink.
Furthermore, for the same reasons, the TLC5940 cannot sink more than 120 mA continuously per pin, which can become prohibitive for a very large number of even low-power devices sunk into a single pin. There are no official specifications on the maximum pulsed current or voltage per pin, but we reasonably expect those to reflect their continuous counterparts.

Shift Register
The project requires a serial-in, parallel-out (SIPO) shift register to serve as a layer selection mechanism. In essence, a SIPO shift register is made up of any number of master-slave flip-flops connected in series, with each flip-flop's output also available on a pin of the register. The register thus takes serial input and makes the n most recently shifted bits available as parallel outputs. How exactly this is useful will be explained in Section 2.1. The specific shift registers used in this project are Texas Instruments CD74HC164Es, which have eight parallel outputs per IC. These registers were chosen because they have a very simple interface, requiring only three GPIO pins to drive, and, like the TLC5940, allow for expansion by chaining multiple chips together (note that this is a property of all SIPO shift registers). The CD74HC164E takes a data line, a clock line, and a master reset line. When the clock line transitions from low to high, the bit present on the data line is moved into the first output $Q_0$, and simultaneously the bits on $Q_0$ through $Q_{n-2}$ are shifted into $Q_1$ through $Q_{n-1}$ (the previous $Q_{n-1}$ bit is lost). When the master reset line is driven low, all outputs are cleared to 0 simultaneously, which is much faster than serially pushing a low signal n times.

Enhancement Mode P-channel MOSFET


As a project that deals with pulsed currents on the order of tens of amps, an external high-voltage power source is a necessity. However, a microcontroller cannot interface with any voltage higher than its base voltage (usually 5 V) without taking permanent damage, so a transistor is required to interface between the two voltages. Our array uses transistors as both switches and power sources to drive the LEDs in each layer. There are multiple distinctions within the transistor family, the foremost being the divide between bipolar junction transistors (BJTs) and metal-oxide-semiconductor field-effect transistors (MOSFETs). At a very basic level, both devices control the current at their outputs (naming conventions differ between types of transistor), but the fundamental difference, at least for our purposes, lies in their control mechanisms. BJTs are current-controlled, meaning that the output current $I_{out}$ is proportional to the base current $I_{base}$ by some factor $\beta$, i.e. $I_{out} = \beta I_{base}$. MOSFETs, on the other hand, are voltage-controlled, translating a voltage on their gate into a resistance between their input and output terminals. MOSFETs are better suited for our purposes for a few reasons. First, being voltage-controlled, they are much easier to interface than BJTs, since all the other logic in the project is also voltage-based. Second, MOSFETs are better suited to act as pure switches than BJTs, which are better used for amplification. It is much easier to turn a MOSFET fully on and make it act like a wire than it is to fully activate a BJT, because there are no considerations to make regarding resistor sizing or pull-up/pull-down requirements. This leads to our final advantage: unlike a BJT's base, a MOSFET's gate is by construction insulated from the source and drain, meaning no current can flow back from the gate into the microcontroller and damage it, a valuable property when dealing with high-voltage power sources.
Within both classes of transistor there are two further divisions, concerning primary charge carriers and method of operation. For MOSFETs, the first division is between enhancement-mode and depletion-mode FETs. Enhancement-mode FETs require a gate-source voltage to turn the device on, whereas depletion-mode FETs require a gate-source voltage to turn the device off. Though both could serve our purposes, we use enhancement-mode FETs because they are far more readily available. The second and more important division is between N-channel devices (whose primary carriers are electrons) and P-channel devices (whose primary carriers are holes). Though either type could do the job on its own, P-channel transistors are used in the array because they naturally facilitate high-side switching, meaning they can be connected directly to the power source and used as switches. This is possible, but not as trivial, with N-channel MOSFETs.

The transistors used in the LED array are International Rectifier IRF9540Ns, which are enhancement-mode P-channel power MOSFETs. These were chosen because they match our design requirements, are fairly inexpensive, and have a very high current-carrying capacity, with a continuous drain current limit of 23 A and a pulsed current limit of 76 A, which guarantees that the LED array will receive ample power.

NOT Gate
NOT gates simply invert their input signals and make the inverted signals available as outputs. These logic components are required in the final design for proper interfacing of the MOSFETs. The specific ICs used in the project are NXP 74F04 hex inverters.

a   NOT a
0   1
1   0

Figure 1.1: The truth table for the NOT function.

Level Shifter
There are two main binary logic levels the LED array must contend with. All ICs previously mentioned use TTL voltage levels, which range from 0 V to 5 V (anything above 2 V is considered high). However, the MOSFETs used to gate the layers may require voltages much higher than 5 V. As stated in the transistor subsection, for a P-channel MOSFET to shut off, its gate-source voltage must be close to 0 V. This poses a problem: if the transistor's source is connected to a power supply with a voltage even marginally higher than 5 V, a 5 V gate signal can never bring the gate-source voltage to 0 V, and the transistors will be permanently on. To remedy this, the TTL logic voltage coming from the selection ICs must be raised to the voltage of the power supply, which is accomplished by a level shifter. The level shifters used for the array are Texas Instruments CD4504Bs, each of which can shift six input signals. Each IC takes two supply lines, $V_{CC}$ and $V_{DD}$, with $V_{CC}$ being its logic supply voltage and $V_{DD}$ the voltage of the power source. The chip then scales each input from TTL levels to the level defined by $V_{DD}$. Note that though the same effect could be achieved using pull-up resistors and buffers, we prefer a dedicated level shifter for this conversion because it is far more robust and efficient.

Electret Condenser Microphone


A standard electret condenser microphone will be used as an analog sound-to-voltage transducer. The microphone will have its own standard amplification circuit in order to scale up the small voltages it produces to the 5 V range necessary for input to the microcontroller. The circuit diagram for the input amplifier is shown in Figure 1.2.

Figure 1.2: The amplification circuit of the Electret Condenser Microphone.

Power Supply
The LED array, due to its large size and high-power components, requires a massive current source to achieve full brightness. Though more detailed calculations will be made in Section 2.1, the required current is on the order of 60 A. To provide such power (on the order of 600 W), we use a relatively high-end computer power supply unit (PSU): specifically, the OCZ ZT Series 750W Fully-Modular 80PLUS Bronze High Performance Power Supply. At 12 V this power supply can deliver a continuous current of 63 A, which suffices for the array. The PSU is also relatively inexpensive.

Stellaris EKK-LM3S1968
The microcontroller used to command the circuit is the Stellaris LM3S1968, a high-performance microcontroller with a multitude of GPIO pins as well as many important features and qualities that make it an ideal choice for our LED array.

One of the primary considerations in our choice of microcontroller was the core clock speed. Though a higher clock generally gives greater performance in terms of raw MIPS, the main benefit of a higher clock speed for our project is a higher data clock rate. As analyzed in Section 1.1, the refresh rate of the LED array is directly proportional to the speed of the data clock, which is related to the core clock speed by

$$f_{max} = \frac{E}{2} \tag{1.1}$$

where $f_{max}$ is the maximum data clock speed and E is the maximum core clock speed. The LM3S1968 has a maximum core clock speed of 50 MHz, which means it can transmit data at 25 MHz, approaching the 30 MHz limit imposed by the TLC5940; it can thus extract most of the performance the chip offers.

Another capability required by the TLC5940 is a PWM generator. Recall that the TLC5940, alongside its data clock, requires a grayscale clock to operate. Moreover, the design mandates that this clock be variable across a wide range of frequencies to allow for refresh rate scaling. The Stellaris meets this criterion with ease through its ability to generate precise PWM waveforms of arbitrary period, which can be made to exactly model the waveform of a normal clock signal.

A more application-specific but still vital feature of the LM3S1968 is its hardware edge counter. As will be explained in Section 2.1, the ability of the LM3S1968 to accurately count the pulses of a waveform, and to raise an interrupt when a specified number of pulses has been counted, is essential for proper interfacing with the TLC5940.

On the software side, an important feature of the LM3S1968 is its support for floating-point arithmetic and the C standard library. (The LM3S1968 is built around an ARM Cortex-M3 core, which has no hardware floating-point unit, so floating-point operations are emulated in software.) Having a floating-point datatype greatly increases the accuracy of any dynamic animations we may simulate; algorithms such as the fast Fourier transform and the Runge-Kutta time integration scheme benefit greatly in accuracy from it. Access to the C standard library further simplifies and accelerates development.

Decoder/Demultiplexer and AND Gate


A 4-to-16 decoder/demultiplexer and multiple AND gates are used in some of the preliminary designs, though they are discarded in the final construction. A decoder is a device which, as its name suggests, decodes the information present on its input lines. Thus, a 4-to-16 decoder takes 4 inputs, interprets them as a binary number n, and then activates its nth output. The decoders used in the preliminary designs are active low, meaning that the selected output is forced to ground while all others are held high. An AND gate simply performs a logical AND on its two inputs and sets its output accordingly.


a   b   a AND b
0   0   0
0   1   0
1   0   0
1   1   1

Figure 1.3: The truth table for the AND function.


Chapter 2 Designs
2.1 Final Design [j = 10, i = 100]

Figure 2.1: The schematic of our solution to the array driver problem.

In its greatest generality, the solution entails successively activating every cross section of a single dimension of the given array. Every corresponding element in each cross section is connected together on the low side. This provides i connections on the low side of the driver, each of which is simultaneously current-controlled by its own PWM sink. This arrangement effectively allows fully individual addressability of every single element of the array.

Said another way, whereas our preliminary designs had focused on minimizing i so as to economize on the required number of PWM sinks, the key to the entire problem lies in minimizing j. Realizing that this is the crux of the problem with the LED array, we designed the high side to iterate through only a single dimension, reducing j to its bare minimum. Note that reducing the number of dimensions of iteration to zero would defeat the purpose of the project entirely; the idea is to power an array of elements too large to control as a single whole. Not having to iterate at all represents a trivial solution and thus will not be considered for the purposes of our project.

The final design for our LED array, then, conceptually represents the array as ten horizontal layers of 100 LEDs each, stacked vertically. Each layer's LEDs are connected together by their common anodes, and each layer is in turn connected to its own transistor switch, giving ten switches in total. The refresh process activates a single layer at a time, while simultaneously and independently modulating the current through each cathode in that layer (three per LED) to control its color. The control logic is essentially divided into two sections, the high side and the low side, named for the location of the parts relative to the LED anodes. The following in-depth description of our LED array's driver seeks to explain, by specific example, the general nature of our solution to this problem.

The High Side of our LED Array Driver


Description

The high side of our design consists of a microcontroller controlling a single 10-bit serial-in, parallel-out shift register. At any one time, only a single output of this shift register is high; all others are low. Iteration proceeds during a single frame by feeding further low inputs into the shift register, thus advancing the high output through the entire register. After the high output falls off the far end of the register, another is seeded into the entry point, signifying a new frame, and iteration begins anew.

Each output of the register is connected to its own input on a level shifter via a NOT gate. The level shifter takes the inverted TTL voltage from the shift register and raises it to the high voltage used for the transistors. The level shifter is fed with the power input of an external source, which it drives onto an output pin whenever the corresponding input pin is high at the TTL level. Each output of the level shifter is then connected to its own P-channel enhancement-mode MOSFET. Each transistor has its source connected to the same common high-voltage reference as the level shifter and its drain connected to its own layer of the LED array. By this we mean that, for any given horizontal layer of the array, every member of that layer has its common anode connected to the drain of that layer's MOSFET.

It should be noted that the inverter on each shift register output is essential. While the shift register selects a single one of its outputs to be high, a P-channel MOSFET interprets a high signal at its gate as an open switch and a low signal as a closed switch. This is a nuance of using a P-channel MOSFET and, as seen above, is easily overcome by inverting the input signals. In this way, the selected output of the shift register turns its layer on while keeping all other layers off, rather than performing the inverse function, which would cause catastrophic power draw from our external supply and gravely endanger our power circuit as a whole. This completes the description of the high side of the LED array driver.

Analysis

The entire purpose of the high side of the design is to isolate a single cross section of the dimension of iteration at any time and to cycle through all of them to produce one frame of animation. In this instance, the dimension of iteration can be thought of as the vertical axis, and thus contains exactly 10 elements. Each cross section of this dimension is then a flat, horizontal layer of the array containing exactly 100 elements. The high side of the driver ensures that at any point in time, only one layer (100 LEDs) has its anodes high; that is, only one layer of the array can be active at once. It must be stressed that this method requires that only one output of the shift register is active at any time, or else multiple layers would be active simultaneously.

From an overview standpoint, using shift registers rather than a decoder system to select each layer offers no apparent improvement. In fact, it could be argued that this is a degradation in addressability, as one can no longer select an arbitrary layer of the array; instead, iteration must proceed sequentially as the shift registers are advanced one step at a time. However, this train of thought is misleading, as the benefits gained from using shift registers far outweigh their limitations. Consider a single refresh of the array. In the worst case, every single element of the array must be updated, requiring one to iterate through each and every layer.
Since refreshes proceed at 60 Hz, the order of updates makes no difference to the human eye, and one may as well apply Occam's Razor and proceed in an intuitive linear pattern. Considering also that out-of-order updating requires the extra effort of keeping track of which layers have already been updated and which still require updating within the current frame, it makes the most sense to proceed in a linear order that does not require such bookkeeping. On a hardware level, the shift registers also save many clock cycles due to their simple operation. The entire control of the iteration is done with a total of three pins, and the only action necessary to advance iteration is pulsing a GPIO pin. Furthermore, shift registers significantly cut down on the signal propagation delay from the microcontroller to the switching transistor. In a decoder system, the signal must propagate through many transistors on its way through the two integrated circuits, but with a shift register, the signal simply propagates from one flip-flop to the next. In fact, according to the CD74HC164E data sheet [5], signal delay using shift registers instead of decoders drops by a full order of magnitude. This entire high side system is advantageous for a number of reasons. First and foremost, it puts little pressure on the microcontroller itself. Necessitating a massive clock speed (> 20 MHz) greatly limits the pool of suitable microcontrollers one can use to run the driver, and it is for this reason that minimization of j is essential. As we will be using the Stellaris EKK-LM3S1968 microcontroller, we have an upper limit of 50 MHz on our clock speed. Second, the entire iteration scheme is done very efficiently, with very few unproductive delays during the process. Every LED is activated practically instantaneously as soon as current is allowed to flow through it. This is done using the shift register acting through an inverter and a level shifter. Since these are simple transistor devices, their total propagation delays are negligible. The selection process as a whole is about as efficient as it can be, and so we are confident in judging it to be fully optimised. Third, this scheme puts as little pressure on the power source as could be hoped for. Specifically, its maximum instantaneous current sourcing requirement matches what driving the entire array continuously would require. The following calculations serve to demonstrate the exact theoretical current that would be needed from the power source. Note that the calculations are done for the worst possible situation, assuming that every element of the currently chosen cross section is active simultaneously at maximum brightness on all three color channels (bright white). In our case, we are interested in determining the maximum current draw for a single entire layer, because only one layer can be active at any given time. The time spent on one layer during one refresh, $t_{active}$, is found by:

$$t_{active} = \frac{1}{j \cdot f_{refresh}} = \frac{1}{10 \cdot 60} \approx 0.001667\ \text{s} \tag{2.1}$$

This means that the I vs t curve for a single layer through one full refresh of the array looks like the following. Note that both axes are in arbitrary units with a t of 1.0 representing one full refresh cycle.

Figure 2.2: The current going through one LED during a single full refresh. The LED is on for a tenth of the cycle.

We want each LED to appear as bright as it would if driven continuously at its maximum rated current of 60 mA. This means that the above curve must have a time average of 60 mA over the interval shown. Measuring time in seconds, so that one full refresh spans 1/60 s and the layer is active for its first 1/600 s, the required amplitude of the spike can be calculated as follows:

$$\text{Average Value} = \frac{1}{t_{max} - t_{min}} \int_{t_{min}}^{t_{max}} f(t)\, dt \tag{2.2}$$

$$= \frac{1}{\frac{1}{60} - 0} \int_{0}^{\frac{1}{60}} I_{inst} \left[ H(t) - H\!\left(t - \frac{1}{600}\right) \right] dt \tag{2.3}$$

$$= \frac{I_{inst}}{10} \tag{2.4}$$

where $H(t)$ is the Heaviside theta function, more commonly known in the field of electrical engineering as the unit step function. Thus, the time-averaged current passing through the LED over one full refresh has been shown to be exactly one-tenth of the instantaneous current through the LED while it is active. From this, we can set the constraint that the time-averaged current should equal the LED's $I_{max} = 60$ mA and calculate the total current draw required for a fully active layer:

$$I_{max} = 60\ \text{mA} = \frac{I_{inst}}{10} \tag{2.5}$$

$$I_{inst} = 600\ \text{mA} \tag{2.6}$$

$$I_{total} = I_{inst} \times 100 \tag{2.7}$$

$$= 60\ \text{A} \tag{2.8}$$
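As a numerical check on Equations 2.1 through 2.8, the following Python sketch integrates the rectangular current pulse with the midpoint rule and recomputes the per-LED and per-layer currents. Time is in seconds; the parameter values are those used in the text, and the function names are illustrative only.

```python
def unit_step(t: float) -> float:
    """Heaviside step H(t): 0 for t < 0, 1 for t >= 0."""
    return 1.0 if t >= 0 else 0.0


def average_pulse_current(i_inst: float, t_on: float, period: float, steps: int = 100_000) -> float:
    """Midpoint-rule estimate of (1/period) * integral over [0, period] of
    i_inst * [H(t) - H(t - t_on)] dt, i.e. the time average of Equation 2.3."""
    dt = period / steps
    total = 0.0
    for n in range(steps):
        t = (n + 0.5) * dt
        total += i_inst * (unit_step(t) - unit_step(t - t_on)) * dt
    return total / period


I_MAX = 0.060             # A, rated continuous current per LED
J_LAYERS = 10             # j, cross sections in the dimension of iteration
LEDS_PER_LAYER = 100      # i, elements per cross section
PERIOD = 1 / 60           # s, one full refresh
T_ON = PERIOD / J_LAYERS  # s, t_active from Equation 2.1

i_inst = I_MAX * J_LAYERS           # Equation 2.6
i_total = i_inst * LEDS_PER_LAYER   # Equations 2.7 and 2.8
avg = average_pulse_current(i_inst, T_ON, PERIOD)

print(f"t_active = {T_ON:.6f} s")
print(f"I_inst = {i_inst:.3f} A, average = {avg:.4f} A, I_total = {i_total:.1f} A")
```

The numerical average lands on $I_{max}$, confirming that a pulse ten times taller than the rated current, applied for a tenth of the refresh, delivers the same average current as continuous drive.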

Thus, we have arrived at the conclusion that the power source needs to be able to source up to 60 Amperes instantaneously to power the array as a whole at maximum brightness. This daunting quantity is unavoidable: it is an effect of constructing such a large array, no matter how power is distributed. The continuous (time-averaged) current required is calculated below.

$$1000 \times 60\ \text{mA} = 60\ \text{A} \tag{2.9}$$

In fact, the agreement between Equations 2.8 and 2.9 serves as a check that the calculations have been done correctly, as their sole purpose is to determine how the total current required to power each LED at maximum brightness is distributed in time. There is no way to decrease this quantity other than lowering the number of LEDs in the array. Needless to say, this would contradict the original design objectives, and as such, it will not be considered in the solution process. It should be noted here that at maximum power the array is nothing short of blinding, and thus it will never truly be operated continuously at this power; rather, it will be operated with a lower total current demand than is shown here. This is not to say, however, that our selected power source cannot cope with these demands: it can provide up to 62 Amperes of continuous current. The cost, then, will come in the form of overheating components and the total electricity bill. The former can be remedied with such techniques as fans, heatsinking, and liquid cooling. The latter, however, is a necessary expense of the array. Secondly, though this system requires such massive currents to be sourced, each LED itself only experiences an instantaneous current of 600 mA. This is acceptable, as the maximum instantaneous current the LEDs can sustain lies well above this value. In comparison, consider the scheme of iterating singly through each element in the array. This would require the full instantaneous current of 60 Amperes to go through each LED, an entirely unreasonable number that would be sure to blow out each and every component and its wires.
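The comparison above generalises: for our 1000-element array, the instantaneous current through each LED grows with j, while the instantaneous current from the supply stays fixed at 60 A. The Python sketch below tabulates both for j = 10 (our final design), j = 100, and j = 1000 (iterating singly through every element). It assumes, as in the text, that every LED is lit at full white and must average $I_{max}$ = 60 mA; the helper names are hypothetical.

```python
I_MAX = 0.060       # A, time-averaged target per LED (from the text)
N_ELEMENTS = 1000   # i * j for the 10 x 10 x 10 array
F_REFRESH = 60.0    # Hz, target refresh rate


def per_led_peak_current(j: int) -> float:
    """Instantaneous current through one LED that is on for 1/j of each refresh."""
    return I_MAX * j


def supply_peak_current(j: int) -> float:
    """Instantaneous supply current when all i = N/j LEDs of a cross section are lit."""
    i = N_ELEMENTS // j
    return per_led_peak_current(j) * i


def active_time(j: int) -> float:
    """Time each cross section is active during one refresh (Equation 2.1)."""
    return 1.0 / (j * F_REFRESH)


if __name__ == "__main__":
    for j in (10, 100, 1000):
        print(f"j={j:5d}  t_active={active_time(j) * 1e6:9.2f} us  "
              f"per-LED peak={per_led_peak_current(j):6.2f} A  "
              f"supply peak={supply_peak_current(j):6.2f} A")
```

For j = 1000 the per-LED peak is the 60 A quoted above, and for every j the supply peak is 60 A, which is the sense in which that figure is unavoidable; choosing a small j only spreads it across more LEDs.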

The Low Side of our LED Array Driver


Description
The low side of the design consists of collectively feeding every corresponding cathode within each cross section of iteration (here, every cathode in each column) into its own PWM sink. In total, this calls for 300 independent PWM sinks: 3 per column (one for each color channel) times 100 columns. It is here that the Texas Instruments TLC5940 IC comes into play, due solely to its convenience in situations requiring a large quantity of PWM sinks: every IC contains 16 PWM sinks, each of which can be used independently of the others (see Section 1.1 for details). Thus, our LED array requires a total of $\lceil 300/16 \rceil = 19$ TLC5940 chips, an easily manageable quantity due to their simple chainability.

Operation of the TLC5940
As stated in Section 1.1, the TLC5940 allows each of its output pins to act as a PWM sink with a resolution of 12 bits. The other relevant aspects of the IC that have been overlooked up until this point will now be discussed; for a full analysis of its inner workings and for the specifics of how we control its functionality, see Section 1.1. The implementation of the TLC5940 as a PWM sink also has a slightly more subtle advantage: that of the IREF pin. This single pin requires a resistor bridging it to ground. The choice of resistor is crucial, as its value sets the maximum current to be sunk on each output pin of the IC. This maximum current is inversely proportional to the resistance, allowing an extremely simple plug-and-play overall brightness control: a simple turn of a potentiometer could control the brightness of the entire array. A second, less obvious, bonus of the IREF pin is that it offers an indirect method of debugging the effectiveness of our external power source. If the IREF resistance is lowered, for example, the expected outcome is that the current, and with it the overall brightness of the active LED array, will increase in inverse proportion to the resistance. However, if this does not occur, or occurs less than proportionally, we can conclude that the power source has reached its sourcing limit and cannot provide any greater instantaneous current. On the other hand, if the brightness is easily controlled by varying the IREF resistance, we know that the power source has not yet reached its ceiling.
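The IREF relationship can be captured in a few lines. The Python sketch below uses the relation given in the TLC5940 data sheet, $I_{max} = 31.5 \times V_{IREF} / R_{IREF}$ with $V_{IREF} \approx 1.24$ V; the 20 mA target is purely an illustrative value, not the resistor we selected. The ratio helper expresses the debugging check described above: the change in current one should expect after swapping the resistor, against which an observed brightness change can be judged.

```python
# TLC5940 full-scale output current, per the TI data sheet:
#   I_max = 31.5 * V_IREF / R_IREF, with V_IREF = 1.24 V (typical)
V_IREF = 1.24   # volts
GAIN = 31.5     # multiplier between the IREF current and each output's full-scale current


def max_output_current(r_iref: float) -> float:
    """Full-scale sink current (A) on every output for a given IREF resistor (ohms)."""
    return GAIN * V_IREF / r_iref


def iref_resistor_for(i_max: float) -> float:
    """IREF resistor (ohms) that sets the full-scale sink current to i_max (A)."""
    return GAIN * V_IREF / i_max


def expected_current_ratio(r_old: float, r_new: float) -> float:
    """Factor by which sink current should change when R_IREF goes from r_old to r_new.
    A measured change that falls short of this points to a supply limit."""
    return r_old / r_new


r = iref_resistor_for(0.020)  # hypothetical 20 mA target
print(f"R_IREF = {r:.0f} ohms, I_max = {max_output_current(r) * 1e3:.1f} mA")
print(f"halving R_IREF should scale the current by {expected_current_ratio(r, r / 2):.1f}x")
```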


Another important point about the implementation of this IC is the use of serial data as input for the PWM. Each PWM pin requires a total of 12 bits of input data, amounting to 16 × 12 = 192 bits per TLC5940. Thus, every PWM cycle, 192 bits of data must be sent in serially per TLC5940. This results in an unfortunate limiting behaviour, as the TLC5940 has a hardware maximum of 30 MHz for its serial clock frequency. The free time available as a function of the number of TLC5940s that we employ is shown in the following calculations. Note that the formula we begin with is derived later in the document; for its derivation, see Equation 2.22 in The Refresh Rate subsection of the paper. Assuming a target refresh rate of 60 Hz, an SCLK frequency of 25 MHz, and 10 layers:

$$S_{free} = 1 - 0.0004608\, j\, N_{TLC} \tag{2.10}$$

where
$S_{free}$ = proportion of free time available for computation
$N_{TLC}$ = number of TLC5940 ICs in operation
$j$ = number of elements in the dimension of iteration

The coefficient is $192 \times 60 / (25 \times 10^{6}) = 0.0004608$. We can see that as the number of TLCs grows, the free time available decreases linearly until it reaches zero:

$$0 = 1 - 0.0004608\, j\, N_{TLC} \tag{2.11}$$

$$1 = 0.0004608\, j\, N_{TLC} \tag{2.12}$$

$$N_{TLC} = \frac{2170.14}{j} \approx \boxed{217} \quad (\text{assuming } j = 10) \tag{2.13}$$
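The serial budget behind Equations 2.10 to 2.13 can be reproduced with the Python sketch below. It packs 12-bit grayscale values two per three bytes, most significant bit first, to show that one TLC5940 consumes 16 × 12 = 192 bits, and then evaluates the free-time fraction and the chip count at which it reaches zero. The 25 MHz SCLK and 60 Hz refresh are the values assumed in the text; the channel ordering expected by the daisy chain is not modelled, and the function names are our own.

```python
import math
from typing import Sequence

BITS_PER_CHANNEL = 12
CHANNELS_PER_TLC = 16
BITS_PER_TLC = BITS_PER_CHANNEL * CHANNELS_PER_TLC  # 192 bits per chip per layer


def pack_grayscale(values: Sequence[int]) -> bytes:
    """Pack 12-bit values MSB first, two values into every three bytes."""
    if len(values) % 2:
        raise ValueError("expected an even number of channels")
    out = bytearray()
    for a, b in zip(values[0::2], values[1::2]):
        if not (0 <= a < 4096 and 0 <= b < 4096):
            raise ValueError("grayscale values must fit in 12 bits")
        out += bytes((a >> 4, ((a & 0xF) << 4) | (b >> 8), b & 0xFF))
    return bytes(out)


def free_time_fraction(n_tlc: int, j: int, f_refresh: float = 60.0, f_sclk: float = 25e6) -> float:
    """Equation 2.10: share of each second not spent clocking grayscale data."""
    return 1.0 - BITS_PER_TLC * n_tlc * j * f_refresh / f_sclk


def max_tlc_count(j: int, f_refresh: float = 60.0, f_sclk: float = 25e6) -> float:
    """Equation 2.13: number of chained TLC5940s at which the free time reaches zero."""
    return f_sclk / (BITS_PER_TLC * j * f_refresh)


n_tlc = math.ceil(300 / CHANNELS_PER_TLC)                    # 300 sinks -> 19 chips
layer = pack_grayscale([4095] * (CHANNELS_PER_TLC * n_tlc))  # one full-white layer
print(n_tlc, len(layer) * 8)                                 # chips, bits clocked per layer
print(f"S_free = {free_time_fraction(n_tlc, 10):.4f}, N_max = {max_tlc_count(10):.2f}")
```

With our 19 chips and j = 10, roughly 91% of the time remains free for computation, and the zero crossing at about 217 chips matches the boxed value in Equation 2.13.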

Thus, if the number of TLC5940 ICs grows past the boxed quantity, there is no free time left for computation: the grayscale data can no longer be clocked in fast enough, and no frames can be displayed on the array at all. This provides an upper limit to the number of PWM sinks we can have, so while using TLC5940 ICs, i is limited to a total of 16 × 217 = 3472 individual, single-cathode elements. Note that this number is astronomical and, at best, an idealisation; it will not even be approached in our project. A disadvantage of the implementation of the TLC5940, then, is an upper bound on the number of elements allowed in one cross section of iteration. It must be noted, however, that this number is enormous and can be circumvented in a number of ways. For example, j can be increased to cover two dimensions' worth of elements, allowing continued use of the TLC5940 at the expense of iteration frequency. Or, if desired, the TLC5940 can be discarded in favour of a more appropriate PWM sink.

Analysis
The purpose of the low side of our driver is to provide individual addressability and color control of the array's elements. A simple on/off scheme is insufficient for elements such as our RGB LEDs or devices such as motors, which require finer current control in order to operate to their full capabilities. Thus, the intent of the low side is to give each element its own current-controlled sink, which can be controlled in a relatively simple manner, so that through each full frame of animation each and every element can be updated individually. This current control is implemented through pulse-width modulation, a widely used method of controlling the average current passing through a device. PWM is our choice here due to the ease with which it allows precise control. The brightness of each LED color channel can thus be represented by a straightforward 12-bit value. Not only does this abstract away the hardware-level implementation of RGB LEDs, but it also makes it far easier to write animation data on a software level. Each frame then essentially consists of three thousand 12-bit values: one per color channel of each of the 1000 LEDs.

The General Form of the Solution

Before discussing why this design is a complete solution to the original problem, we would do well to consider the array driver in its greatest generality. This allows us to describe its full potential rather than limiting its applicability to simply an LED array. Therefore, in its broadest terms, the array driver can once again be described in terms of its high and low sides, referring to the location of parts relative to the array's elements. Note: we will be extensively referencing the i, j, k convention developed in the introduction throughout the rest of this section.

The General High Side


The high side of the design controls iteration through any single dimension of the given array. This means that it repeatedly cycles through the j cross sections of that dimension, activating all i elements of one cross section at a time. This is accomplished using a shift register of minimum length j, which is easily achieved even if $j \gg 10$ by chaining multiple shift registers of a basic length. The shift register functions exactly as described above, repeatedly pushing a single high value through while maintaining all other outputs low. Each shift register output is gated via a NOT gate and a level shifter I/O pair to its own P-channel enhancement mode MOSFET, which acts as a switch between the high voltage external power supply and the anodes of every element belonging to that transistor's cross section of iteration. In this way, each transistor is responsible for the other $k - 1$ dimensions of the array at once. Furthermore, the high voltage supply will be responsible for powering up to i elements simultaneously.

The General Low Side


While the high side mediates power delivery, the low side of the design implements power control. Conceptually, this involves providing each of the i elements with its own current control mechanism. As we have seen, this is best implemented using pulse-width modulation, and so the low side requires i independent PWM sinks (3i for RGB elements, one per color channel). This can be accomplished in any way; our specific example uses the TLC5940 simply due to its convenience for our purposes. Indeed, any low-side current-control mechanism that provides i independent sinks would work equally well.
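The reduction from k dimensions to a high side and a low side amounts to a coordinate mapping, sketched below in Python. Following the Conclusion's remark that one would iterate through the smallest dimension, `plan_driver` picks that axis by default (or one supplied by the caller) and returns j and i; `to_driver_coords` maps an element's k coordinates to the index of its cross section, that is, the shift register output that powers it, and the index of its low-side sink, flattening the remaining $k - 1$ coordinates in row-major order. Both are hypothetical helpers written for this illustration, and treating axis 2 of our cube as vertical is an arbitrary labelling.

```python
from math import prod
from typing import Optional, Sequence, Tuple


def plan_driver(shape: Sequence[int], axis: Optional[int] = None) -> Tuple[int, int, int]:
    """Return (axis, j, i): the dimension of iteration, its length j, and the
    number i of elements in each cross section. Defaults to the smallest dimension."""
    if axis is None:
        axis = min(range(len(shape)), key=lambda d: shape[d])
    j = shape[axis]
    i = prod(shape) // j
    return axis, j, i


def to_driver_coords(coord: Sequence[int], shape: Sequence[int], axis: int) -> Tuple[int, int]:
    """Map a k-dimensional coordinate to (cross section index, low-side sink index)."""
    section = coord[axis]
    sink = 0
    for d, (c, n) in enumerate(zip(coord, shape)):
        if d != axis:
            sink = sink * n + c  # row-major flattening of the other k - 1 coordinates
    return section, sink


print(plan_driver((10, 10, 10), axis=2))   # our cube, iterating over the vertical axis
print(plan_driver((4, 50, 50, 20)))        # a hypothetical 4-D array
print(to_driver_coords((1, 2, 3), (10, 10, 10), axis=2))
```

For RGB elements each sink index expands to three physical sinks, one per color channel, which is how the 100 elements per layer of our array become the 300 sinks counted earlier.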

Conclusion
In summary, our solution to the array driver problem tackles it by reducing the k-dimensional problem to one of only two dimensions. These consist of the high side dimension and the low side dimension, which can be thought of as the dimension of iteration and the dimension of power control, respectively. By analogy, the high side dimension is controlled serially while the low side dimension is taken care of in parallel. Referring back to the design parameters, this design is a full and thorough solution to the problem because it satisfies each of our five original criteria in a neat and complete manner:

1. It will be necessary to have an overall refresh rate of greater than 60 Hz.
This is accomplished easily by iterating through only a single dimension of the array. The high side need only step through 60j cross sections per second (600 for our j = 10), and, as the earlier calculations showed, a refresh rate of 60 Hz is easily achieved. Even under reasonable upscaling, this criterion is in little danger, as one would choose the smallest dimension of the array to iterate through. In theory, dimension sizes orders of magnitude greater than our 10 could be supported.

2. There must be sufficient free microprocessor clock time between refreshes to determine the nature of the next frame.
This is verified by the calculations leading to Equation 2.22 in The Refresh Rate subsection.

3. Each LED must be individually color-addressable within each frame of animation. Furthermore, the color addressability must have a small enough resolution to mask the discrete nature of transitions between adjacent RGB values.
This is readily apparent in our design. As iteration proceeds, each single LED has its own dedicated PWM-controlled sinks, and so each LED is individually color-addressable whenever it is iterated through. To address the latter concern, the PWM has a resolution of 12 bits, providing 4096 distinct intensity levels per color channel, or $2^{36}$ colors per LED.
Since the standard RGB color space of 8 bits per channel is considered to be sufficiently continuous, ours will more than suffice for the human eye.

4. The apparent brightness of any single LED through only one frame of animation must be comparable to its true continuous maximum brightness. As an extension to this, our power system must be robust enough to simultaneously drive the entire array's worth of LEDs continuously at the same brightness at which it would power a single LED. This will ensure that the apparent brightness of any given LED at any given time is uniform, entirely independent of the number of active LEDs at that point in time.
This is all achieved due to our use of the TLC5940. Though not a direct reason to choose this IC over other methods of mass-PWM, it nevertheless accomplishes brightness control in an almost ideal way. The IREF pin on the IC ensures that each output pin, independently of all others, sinks a precisely set amount of current. This means that regardless of the state of the rest of the array at the time, each element being powered will be current-controlled to a precisely determined value, and thus its brightness will be similarly determined. One caveat remains: the power source must be able to supply the required current. This is ensured by employing transistor gates switching power from an external supply rather than attempting to source the current from a microcontroller. The external supply can easily be upscaled if required, and thus this design criterion is entirely satisfied by our design.

5. The driving system must be upscalable, ideally to any scale and dimension.
The design is fully upscalable on both the high and low sides. The simplest way to prove this is to consider the array driver solution as a reduction of dimensions from k to 2. Since k is an arbitrary value, this driver is dimensionally independent. No matter how many dimensions the given array consists of, our solution treats it as a single one on the high side and as $k - 1$ dimensions on the low side. Furthermore, the driver is, for all intents and purposes, fully independent of the size of each dimension, within reason. As we have already touched on above, the high side dimension already operates orders of magnitude below its limits and can be increased significantly before presenting problems.
The low side dimensions can also be similarly upscaled, as that simply requires more PWM sinks. Eventually the TLC5940's hardware limitations would begin causing issues, but since the TLC5940s were used simply for their convenience, another method of low side current control could easily replace them.

2.2 Preliminary Designs

Here, we seek to explain the thought processes which led us to our final design. Many of the intricacies presented in the final design may not seem directly obvious, and so this section attempts to explain why we ended up with the design we did. We do so in chronological order, starting with our very first considerations and then proceeding through the designs we considered, in the order in which we considered them. Our discussion begins with a decision: the choice of topological space to work in. There are many different geometries in which the array could have been considered, each with its own pros and cons. As it turns out, discretizing the standard dimensions of toroidal spaces, spherical spaces, and other such curvilinear geometries is infeasible due to the large inaccuracies in representing curves. For example, consider the central angular dimension in a bipolar toroidal coordinate system. In order to display the standard reference circle, you would have a maximum of 10 LEDs along its perimeter, and thus divisions of $\pi/5$ radians between each: a decagon, which is a poor representation of the reference circle. Any picture represented in such a space would be a distortion of what the idealised angular coordinates require. Thus, we have decided to do away entirely with any sort of space which depends on one or more angular dimensions and instead deal with simple Euclidean spaces. Our LED array lends itself to this kind of coordinate representation, as it is composed of elements equally spaced apart in three linear dimensions. This has the additional benefit that any picture to be represented in our array can be expressed intuitively in linear dimensions, leading to both simpler and more efficient software. With this system in mind, let us proceed to analyze the process by which we arrived at our final design. The systems we considered throughout our design process are compared in the chart of Figure 2.3. Note: the rest of this section may be skipped without major loss of continuity in the overall paper.
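The decagon argument can be quantified with a short Python calculation. With n LEDs spaced evenly around a circle of radius r, adjacent LEDs are $2\pi/n$ apart ($\pi/5$ radians, or 36°, for n = 10), and the straight segment joining them sags inside the circle by $r(1 - \cos(\pi/n))$ at its midpoint. These are standard geometric results rather than figures from our design work, and the n = 100 row is included only for contrast.

```python
import math


def angular_step(n: int) -> float:
    """Angle (radians) between adjacent points of an evenly spaced n-point ring."""
    return 2 * math.pi / n


def max_radial_gap(n: int, radius: float = 1.0) -> float:
    """Largest distance between the circle and the inscribed regular n-gon (at edge midpoints)."""
    return radius * (1 - math.cos(math.pi / n))


for n in (10, 100):
    print(f"n={n:3d}: step = {math.degrees(angular_step(n)):5.1f} deg, "
          f"gap = {100 * max_radial_gap(n):.2f}% of the radius")
```

At our resolution of 10 LEDs per ring, each step spans 36° and the decagon falls short of the circle by almost 5% of its radius at every edge midpoint.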
Design                   i    j     Power Source     Power Sink       PWM Control Method  High Side             Low Side              Multiplexing Method
The Global Bus           1    1000  Microcontroller  Microcontroller  Microcontroller     Current Controlled    Iteration Controlled  3D Decoder/AND
Trifurcation of the Bus  1    1000  Microcontroller  Microcontroller  Microcontroller     Iteration Controlled  Current Controlled    3D Decoder/AND
TLC5940 Design 1         10   100   Microcontroller  Microcontroller  TLC5940 IC          Iteration Controlled  Current Controlled    2D Decoder/AND
TLC5940 Design 2         10   100   External         External         TLC5940 IC          Iteration Controlled  Current Controlled    2D Decoder/AND
TLC5940 Design 3         10   100   External         External         TLC5940 IC          Iteration Controlled  Current Controlled    2D Shift Register/AND
Final Design             100  10    External         External         TLC5940 IC          Iteration Controlled  Current Controlled    1D Shift Register

Figure 2.3: The progression of our designs.
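The quantities that drove each transition in Figure 2.3 follow directly from i and j. The Python sketch below recomputes them for every row under the text's assumptions (1000 LEDs, a 60 Hz refresh, and a 60 mA time-averaged target per LED): the rate at which the high side must step (60j Hz), the instantaneous current per LED (j × 60 mA), the number of PWM channels (3i), and, for the TLC5940 designs, the number of chips ($\lceil 3i/16 \rceil$). The data structure is our own summary of the chart.

```python
import math
from typing import NamedTuple


class Design(NamedTuple):
    name: str
    i: int          # elements per cross section (driven in parallel)
    j: int          # cross sections in the dimension of iteration
    uses_tlc: bool  # PWM generated by TLC5940 ICs rather than the microcontroller


DESIGNS = [
    Design("The Global Bus", 1, 1000, False),
    Design("Trifurcation of the Bus", 1, 1000, False),
    Design("TLC5940 Design 1", 10, 100, True),
    Design("TLC5940 Design 2", 10, 100, True),
    Design("TLC5940 Design 3", 10, 100, True),
    Design("Final Design", 100, 10, True),
]

F_REFRESH = 60   # Hz
I_MAX_MA = 60    # mA, time-averaged target per LED


def summarize(d: Design) -> dict:
    """Derived figures for one row of Figure 2.3."""
    return {
        "step_rate_hz": F_REFRESH * d.j,
        "per_led_peak_ma": I_MAX_MA * d.j,
        "pwm_channels": 3 * d.i,
        "tlc_chips": math.ceil(3 * d.i / 16) if d.uses_tlc else 0,
    }


for d in DESIGNS:
    assert d.i * d.j == 1000  # every design covers the whole 10 x 10 x 10 array
    print(f"{d.name:25s} {summarize(d)}")
```

The per-LED peaks of 60 A, 6 A, and 0.6 A match the figures quoted in the design discussions, and the chip counts of 2 and 19 match those given for TLC5940 Design 1 and the final design.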

The Global Bus: Reduction of Dimensions [j = 1000, i = 1]


Taken at face value, the problem of choosing individual LEDs out of a $10^3$ array can be tackled simply by implementing a coordinate-wise selection system. Thus, our first and foremost idea consists of a three-dimensional low-side multiplexing of the entire cube. The multiplexing is implemented with AND gates and three decoders, one per dimension. Each decoder can, at any point in time, assert only a single output line. Each element within the array is chosen via a three-input AND gate, with its inputs being the corresponding decoder coordinates. For example, the element located at coordinate point (1, 2, 3) in the array would have as the inputs to its AND gate output 1 of the x decoder, output 2 of the y decoder, and output 3 of the z decoder. Thus, it will turn on only if all three of its coordinates are asserted. For all intents and purposes, the intention of this system is to effectively reduce the three-dimensional problem of the array to a single virtual dimension of length 1000. (One could also think of this design, as introduced, as an implementation of an x, y, z coordinate system, but this interpretation does not lend itself to our i, j, k convention, as the dimension of iteration here is the only dimension. It also prevents any sort of in-depth analysis of the design as compared with other implementations, and so we would do well to avoid thinking of it in this way.) As we shall soon see, by its very nature, this sort of design requires not common anode LEDs but rather common cathode LEDs. On the high side for this design, three PWM sources (one per color channel) are implemented using the microcontroller itself as a PWM power source. Three microcontroller PWM pinouts are each globally connected to corresponding transistors at each LED's three anodes. Thus, each color channel on each LED is continuously receiving pulses, but the signal only propagates through to the actual LED if the corresponding transistors are switched on. Also, each LED's AND gate is connected to all three of its transistors, so that when the LED is asserted, the PWM signal can pass through and control the LED. The advantages of this design lie in the fact that the addressability is extremely intuitive; each LED has its own direct address and can be singled out at any stage in the iteration process. Thus, a 0 through 999 sweep is unnecessary, and optimizations can be made as to which LEDs actually need to be asserted in a certain frame.
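The selection logic of the Global Bus can be modelled in a few lines of Python: three one-hot decoders, one per axis, feed a three-input AND per element, so exactly one of the 1000 elements is asserted for any (x, y, z). This is an idealised logic model with hypothetical function names, not a description of specific decoder ICs.

```python
from itertools import product
from typing import List, Tuple

N = 10  # elements per dimension


def decoder(value: int, outputs: int = N) -> List[bool]:
    """One-hot decoder: only output `value` is asserted."""
    return [n == value for n in range(outputs)]


def selected(x: int, y: int, z: int) -> List[Tuple[int, int, int]]:
    """Elements whose three-input AND gate sees all of its decoder lines asserted."""
    dx, dy, dz = decoder(x), decoder(y), decoder(z)
    return [(a, b, c) for a, b, c in product(range(N), repeat=3) if dx[a] and dy[b] and dz[c]]


print(selected(1, 2, 3))  # only the element at (1, 2, 3)

# A full sweep over all 1000 addresses visits every element exactly once.
sweep = [selected(x, y, z)[0] for x, y, z in product(range(N), repeat=3)]
print(len(sweep), len(set(sweep)))
```

The sweep makes the design's cost visible: to refresh the whole cube, the decoders must step through all 1000 addresses every refresh, one element at a time.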
The disadvantages of this design are numerous and prohibitive. Firstly, the method by which power is delivered to the system is highly undesirable. Indeed, the microcontroller itself is expected to source the entire current demanded by the LEDs, which is unreasonable for even the best of microcontrollers. Equation 2.9 shows that the required current is on the order of 60 Amperes. It should be noted that the magnitude of this number was not known at the time this design or the following few designs were conceived; determining its scale prompted the transition from the TLC5940 Design 1 to the TLC5940 Design 2. This iteration scheme is also damaging to the LEDs themselves, because it requires that the full current go through any single LED while it is activated. Our LEDs, and most others in existence, cannot handle instantaneous currents of this magnitude, adding another dimension of challenge to this design. Secondly, a refresh rate of at least 60 Hz requires an immensely fast iteration rate if iteration is to proceed one by one through each element in the array. Though this is not impossible with modern technology, it remains difficult and expensive to employ top-of-the-line embedded systems, which would in turn be put under a great deal of (at this point, unnecessary) stress. Furthermore, this system allows each LED to be on for only 1/1000 of a refresh, requiring the current going through it during its activity to be 1000 times greater than its maximum rated current of 60 mA, namely 60 Amperes. This is absolutely infeasible for the LED, and as such, must be reconsidered. Thirdly, and perhaps most significantly, this design offers very little scalability. It essentially requires iteration through every single element of the array, so an expansion of the array along any of the three dimensions leads to a decrease in the time allowed per LED, which in turn results in an increase in both the power supply requirements and the required iteration frequency. Both of those are upscalable only to a certain point, after which it is simply not possible to drive an array with this design. Fourthly, simply put, this represents far too much of a brute-force solution, and therefore is unappealing as a whole. Thus, we seek to improve our design.

The Divergence of Paths: Trifurcation of the Bus [j = 1000, i = 1]


Our next design is motivated primarily by the power system problems faced when employing the Global Bus. The system still reduces the entire array into a one-dimensional problem, but attempts to correct the issue of having to use a PWM power source by replacing it with a PWM sink. Of course, this requires a switch to the common anode LEDs present in our final design. The multiplexing circuit of decoders and AND gates remains unchanged for this design. Its function is no different; each of the 1000 anodes is effectively assigned its own twelve-bit address and can be singled out in turn to be active. Iteration proceeds identically to the previous design, along the effectively linearised single dimension containing every LED as an element. However, the microcontroller PWM pinout/triple transistor regime is removed entirely. The outputs of the AND gates are now simply connected directly to the LEDs' common anodes. The low side now consists of three individual global buses, one for each color channel. A transistor interrupts the path of each bus to ground. Each of these transistors is controlled by a microcontroller PWM pinout, and together they implement three independent PWM sinks. These low side transistors are a great advantage, as they lower the overall demand for transistors by three orders of magnitude: three transistors in total on the low side replace the 3000 on the previous high side. This design uses a PWM sink rather than a PWM source. Not only is sinking easier for the microcontroller to perform, but it also allows for greater accuracy in controlling current flow. The microcontroller's high output can be used as a constant power supply rather than as a PWM pin, avoiding instantaneous overdrawing of current. Furthermore, the PWM control is now implemented through a transistor rather than directly into the load, which is a great relief to the PWM pinouts. However, this design offers little advantage beyond this.
The same unreasonable power draw requirements are present as in the previous design. Also, the same issues of upscalability, iteration speed, and instantaneous versus rated current ($I_{inst}$ vs. $I_{max}$) are present, and so modifications to the driver are still required before it can be said that each and every project goal has been comprehensively satisfied.


TLC5940 Design 1 [j = 100, i = 10]


This update to the design seeks to ease the required iteration frequency. It accomplishes this by consolidating each layer into its own three PWM sinks, one per color channel. In this way, iteration can proceed columnwise; the sinks ensure simultaneous control of an entire column as each column is activated in turn. This reduces the three-dimensional problem to one of two dimensions, with the dimension of iteration containing 100 elements (j = 100) and each of its cross sections containing 10 elements (i = 10). The high side is thus reduced: it now multiplexes each of the 100 columns, effectively assigning each individual column an 8-bit address. In this way, each column can be singled out and iterated through in a simple, intuitive way. This change requires a shift from three-input AND gates to simple two-input AND gates. Furthermore, each column now has its common anodes connected together and fed from the output of the column's corresponding AND gate. Note that this reduces the overall required number of AND gates by a factor of 10. The low side entirely eliminates the bus/transistor combination and instead sinks each of the 10 iteration cross sections (each horizontal layer) into its own set of three PWM sinks, courtesy of the Texas Instruments TLC5940 PWM driver. Since each chip contains only 16 PWM sinking pins, two TLC5940 chips need to be chained to provide the required 30 sinks. This design is far more effective than the previous one simply because the dimension of iteration has been reduced by a factor of 10. This allows 10 times as much time to be spent per LED and decreases the required iteration frequency by the same factor. It also puts significantly less stress on each LED, requiring a more modest instantaneous current than previously needed. This is an essential improvement in the design because, in general, the slower iteration proceeds, the easier the demands on our components are.
As we can see, these new power requirements are somewhat more feasible than before. Each LED need suffer only approximately 6 Amperes of instantaneous current rather than the full, intimidating 60 Amperes. This is the main advantage of employing the parallel PWM sinks. Similarly, the microcontroller hardware does not have to cope with the massive clock frequency that was previously thought necessary to provide a refresh rate above 60 Hz. This is beneficial because the cost of a microcontroller tends to grow rapidly with the clock speed at which it operates. However, the design is far from perfect. The TLC5940 imposes several additional constraints on our system. While it allows for many simultaneous PWM sinks, it also decreases our available free computation time. It limits the voltage on each output pin to less than 17 V. Furthermore, it limits our maximum iteration rate, as it places a 30 MHz ceiling on its clock inputs. The existing problems with the way the multiplexed columns are powered are also still present in this design. The microcontroller is expected to source this enormous quantity of charge, which is simply not possible for most, if not all, of today's microcontrollers. The Texas Instruments EKK-LM3S1968 is no exception to this rule. As we found, it is impossible to power the array to anything above a dim glow using the microcontroller as a power source.
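The factor-of-ten argument above can be made concrete with a short calculation. The C sketch below assumes only what the text states: a target refresh rate of 60 Hz and j frames (cross sections) per full refresh. The function names are hypothetical and chosen for illustration.

```c
/* Hypothetical helpers showing how the dimension of iteration j sets
 * the timing budget for one full refresh at a given refresh rate. */

/* Frames that must be displayed per second: each of the j cross
 * sections must be shown once per refresh. */
unsigned long iteration_hz(unsigned long refresh_hz, unsigned long j)
{
    return refresh_hz * j;
}

/* Time available for each frame (cross section), in microseconds. */
double frame_time_us(unsigned long refresh_hz, unsigned long j)
{
    return 1.0e6 / (double)iteration_hz(refresh_hz, j);
}
```

At 60 Hz, a dimension of iteration ten times larger (j = 1000) requires 60,000 frames per second, or about 16.7 µs per frame. With j = 100, the requirement falls to 6,000 frames per second, or about 166.7 µs per frame.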

TLC5940 Design 2 [j = 100, i = 10]


This design focuses on amending the power sourcing issues. As discussed above, a microcontroller is simply not a feasible power source for our project. We therefore introduce an external power supply, taking great care to isolate it from the microcontroller to avoid damage. The high side undergoes a major renovation. Instead of the multiplexing circuit directly powering the common anodes of each column, the AND gate outputs drive the gates of 100 individual power MOSFETs via level shifters and inverters, effectively implementing high-side switches. Both the high-voltage power channels of the level shifters and the sources of the transistors are connected to a common high-voltage power supply capable of supplying more than sufficient instantaneous current. The drain of each transistor is linked to every element within its corresponding column of the LED array. In this way, two of the dimensions are consolidated into a single dimension of iteration, just as in the previous design. Thus, j = 100 and i = 10. The low side is unchanged from the previous design; the PWM sinks still control the entire cross section of iteration as a whole. This design differs markedly from its predecessor simply because of the addition of an external power source. The microcontroller, specifically the EKK-LM3S1968, is no longer expected to source any current to power the LEDs, so the danger of current overdraw is entirely eliminated. As long as the external source is robust enough to easily supply current as per our requirements, the microcontroller is entirely insulated from any damaging backflow into any of its pins. This is essential to protect the valuable microcontroller from damage due to overvoltage or burned-out circuitry. The level shifter is necessary to raise the gate voltage to that of the transistor's source so that proper switching can be achieved.
Additionally, a P-channel enhancement-mode MOSFET conducts only when its gate is pulled sufficiently below its source, that is, when V_GS is more negative than the threshold voltage. The input signal therefore requires inverting. A NOT gate, in the form of a hex inverter IC, is used for this purpose. When an active signal is asserted, the gate is driven low and the transistor turns on. When the off signal is asserted, the gate goes high and shuts off power to the drain. The refresh rate remains as poor as in the previous design, as does the current-draw time per LED. While the design may suffice, it has very limited ability to scale up, so the dimension of iteration will always be limited to a relatively small number. This is unacceptable by all accounts, and thus we proceed to make further modifications to our design.
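The need for the inverter can be checked with a small logic model of one high-side switch. This is a sketch under assumed values. The supply voltage, the threshold voltage, and the rail-to-rail behavior of the level shifter below are illustrative assumptions, not measured parameters of our parts.

```c
#include <stdbool.h>

/* Illustrative (assumed) values; substitute the real part parameters. */
#define V_SUPPLY 5.0   /* assumed high-side supply voltage, volts      */
#define V_TH    (-2.0) /* assumed P-channel threshold voltage V_GS(th) */

/* NOT gate: the hex inverter flips the logic-level select signal. */
static bool invert(bool select) { return !select; }

/* Level shifter (assumed rail-to-rail): maps a logic level to 0 V or to
 * the full supply voltage, so the gate can be raised to the source. */
static double level_shift(bool logic_high) { return logic_high ? V_SUPPLY : 0.0; }

/* A P-channel enhancement-mode MOSFET conducts when
 * V_GS = V_G - V_S is at or below its (negative) threshold. */
static bool pmos_conducts(double v_gate, double v_source)
{
    return (v_gate - v_source) <= V_TH;
}

/* Full chain: select -> NOT -> level shifter -> PMOS gate,
 * with the source tied to the high-side supply. */
bool high_side_on(bool select)
{
    double v_gate = level_shift(invert(select));
    return pmos_conducts(v_gate, V_SUPPLY);
}
```

An asserted select drives the gate to 0 V, so V_GS = -5 V and the transistor conducts. A deasserted select raises the gate to the supply, so V_GS = 0 V and the transistor is off. Without the inverter, the behavior would be reversed.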

TLC5940 Design 3 [j = 100, i = 10]


This next design supplants the entire high-side multiplexing regime with a much simpler and more elegant solution: shift registers, the same method used in our final design. The high side uses serial-in, parallel-out shift registers instead of decoders to select among the 100 columns. Two 8-bit shift registers are chained together to form a 16-bit register. Two such 16-bit registers each have 10 of their parallel outputs connected to the elements along one of two perpendicular dimensions of the array. These connections use the same NOT gate, level shifter, and MOSFET arrangement as before. The array is thus effectively multiplexed by the shift registers, and once again exactly one of the 100 columns can be selected at any time. This is achieved by keeping only one master-slave flip-flop in each extended shift register active at any given time. The other parallel outputs are all held low, so only the single column chosen by both shift registers together is activated. Though this design is essentially the same as the previous one, the addition of shift registers is noteworthy for many reasons, most of which have already been discussed in Section 2.1. We will not repeat them here, except to note that shift registers greatly simplify the microcontroller and software interface. Shift registers also save a significant amount of propagation delay compared with decoders. Moreover, only three pins per register are required, and a single clock pulse is enough to shift the register. The decoders, on the other hand, each require four address lines to be held at specific values in order to assert an output line. On the downside, every one of the issues inherent in the previous design is still present in this one. Both the iteration frequency and the power source demands hold little hope for upscaling and are not nearly as clean as we would like. The use of the TLC5940 also still greatly limits our expandability, so this solution is not elegant enough to be deemed the final design.
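The two-register selection scheme amounts to splitting a column index into two coordinates and asserting exactly one output on each register. The C sketch below makes two hypothetical assumptions. Columns are numbered 0 to 99, with the first register selecting the tens digit and the second the units digit. The function names are also hypothetical.

```c
#include <stdint.h>

#define LINES_PER_AXIS 10u /* outputs used on each 16-bit register */

/* One-hot word: only output `idx` of the extended register is high. */
static uint16_t one_hot(unsigned idx)
{
    return (uint16_t)(1u << idx);
}

/* Hypothetical numbering: column c = 10 * a + b, where register A
 * asserts line a and register B asserts line b. Only the column at the
 * intersection of both asserted lines is driven. */
void select_column(unsigned c, uint16_t *reg_a, uint16_t *reg_b)
{
    *reg_a = one_hot(c / LINES_PER_AXIS);
    *reg_b = one_hot(c % LINES_PER_AXIS);
}

/* Advancing to the next line is a single shift: the one-hot bit moves up
 * one output. In this model, the wrap from output 9 back to output 0
 * stands for shifting a fresh 1 into the register. */
uint16_t advance(uint16_t word)
{
    return (word == one_hot(LINES_PER_AXIS - 1u)) ? one_hot(0)
                                                  : (uint16_t)(word << 1);
}
```

For example, column 37 sets output 3 on register A and output 7 on register B.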

Premise of the Final Design


The final design seeks to overcome the frequency obstacle that has been present and inhibitive throughout all of the preliminary designs. It represents a shift of mindset from individual control to broad-scale control. The final design uses a single shift register to iterate through only one of the three dimensions of our array. The cross sections are then the layers of the array, each containing 100 elements (j = 10, i = 100). Each of these elements is given its own set of three TLC5940 PWM sinks. The design calls for chaining 19 TLC5940 ICs, but this is no more difficult than chaining two.
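The chip count follows from simple arithmetic: each of the i elements in a cross section needs three sinks, and each TLC5940 provides 16. A minimal C sketch with hypothetical function names:

```c
/* Number of PWM sink channels per TLC5940. */
#define TLC_CHANNELS 16u
/* Color channels per RGB LED. */
#define COLORS 3u

/* Chips needed so that each of the i elements in a cross section gets
 * one sink per color channel: ceil(3 * i / 16). */
unsigned tlc_chips_needed(unsigned i)
{
    return (COLORS * i + TLC_CHANNELS - 1u) / TLC_CHANNELS;
}

/* Outputs left unused on the last chip of the chain. */
unsigned tlc_spare_outputs(unsigned i)
{
    return tlc_chips_needed(i) * TLC_CHANNELS - COLORS * i;
}
```

Some example values:

- i = 10 (the earlier TLC5940 designs): 2 chips.
- i = 100 (the final design): 300 sinks, so 19 chips with 4 spare outputs.
- i = 9 (a 3x3x3 array): 2 chips.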


Figure 2.4: A summary of the entire design process.

2.3 Implementation

This section deals with our physical implementation of the solution to the array driver problem. It covers both the hardware used and the method by which the materials are put together to form a fully functional three-dimensional 10x10x10 RGB LED array. The physical product is essentially a proof of concept, confirming that the ideas developed throughout this paper, specifically the proposed array driver design, function as promised in the real world. Though every hardware specification is theoretically met, tangible proof is always beneficial, so we built an initial scaled-down prototype as well as the final full-fledged LED array. This section is divided into two major subsections, the Materials and the Construction. The Materials subsection breaks down each of the materials we use to construct our physical model. The Construction subsection is further subdivided into two parts, the Prototype and the Final Array. Note: the reader may skip this Implementation section entirely without significant loss of continuity.

The Materials
This section lists all of the equipment and materials used in our project. The list begins with the components that make up the array itself; detailed information on these is given in Section 1.1 rather than here.


1000x RGB Common Anode LEDs
19x TLC5940NT IC
10x Power P-Channel Enhancement Mode MOSFET
2x Level Shifter
2x Shift Register
1x External Power Supply
1x Stellaris EKK-LM3S1968 Microcontroller

The following items are defined within this section, as they are necessary to build the array but do not form part of it.

300x 1 mm Diameter Acrylic Rod
The rods in our array are each cut to 27 cm long to match the side length of our project, and are 1 mm in diameter. They serve only as support. They provide the infrastructure around which our LEDs are arranged, holding the LEDs permanently in place and providing a modicum of shock resistance. The essential requirements for these rods are that they be completely transparent and as thin as possible while still maintaining their structural integrity. This keeps them as invisible as possible in the overall design, so that they do not interfere with the visibility of array elements. Acrylic is the material of choice because of its low cost and ease of working. It also forms a relatively steady, if not strong, bond with hot glue, our adhesive of choice. The same cannot be said of many other materials, particularly metals.

6x Breadboard
The breadboards we use for our model are of standard dimensions, 7.25 x 3.75 inches. Each has 830 test points grouped into nodes and buses. The points in each node are spaced 0.1 inches apart, and five points together make up a single node. There are two horizontal bus lines, intended for use as high and low voltage references, which can easily be adapted for use as obscenely large nodes. The breadboard points are each formed by springy clips beneath the surface perforations, and are rated at an absolute maximum of one amp at five volts. If any node is to be supplied a greater potential or current than these limits, multiple channels should therefore be used in parallel to ensure that the limits are not exceeded. The breadboards also impose a limit of approximately 10 MHz on the signals passing through their test points. This is because the internal contacts within the board have a stray capacitance on the order of 10^-12 farads (1 pF), which begins to interfere significantly with signals oscillating faster than this. Unlike the current limit, this cannot be circumvented by splitting a channel into multiple parallel ones. For example, our GSCLK and SCLK lines need to run at 25 MHz or more, at least 2.5 times the rated maximum of 10 MHz, which could cause problems if a breadboard is used to convey the signal. The breadboard is therefore a necessary inconvenience; it will be used only for the Prototype and not in the final array.

6x Pre-Punched Blank Protoboards
The protoboards used for the final design measure 3 x 4.25 inches, and each contains 1224 contact points spaced 0.1 inches apart. This allows standard ICs to be placed on the board with ease. The protoboards are fully solderable, with copper as their conductive layer. Protoboards are chosen for their convenience in making permanent circuits, as opposed to breadboards, which are solderless and therefore impermanent. Another significant advantage of protoboards is that they effectively remove the frequency limitations inherent in breadboards. This is due chiefly to the much lower stray capacitance between adjacent points. Protoboards therefore offer little impedance at the frequencies at which our array will operate, and are viable media on which to construct our final circuitry.

1x 28 Gauge Wire
The wires used for our protoboard and breadboard connections are cut from a spool of 28 gauge monofilament copper core wire. This wire is insulated and is cut to the required lengths as necessary. We chose 28 gauge because it provides a suitable balance between stiffness and flexibility. Any thicker wire becomes too difficult to manipulate, and any thinner is too weak to be of robust use. A thick layer of insulation is also necessary because this wire will carry the high-voltage supply to the LED array, which requires complete isolation from all other circuitry lest valuable parts burn out. Issues of ampacity are discussed in the following item.

1x 28 Gauge Magnet Wire
The wire of choice for internal connections within the array is a 28 gauge gold-coated magnet wire. It is chosen solely for its near invisibility from a distance. At 28 gauge, its thickness is negligible, so it will be about as unobtrusive as possible in the overall array.
Gold was chosen as the color over red because it is closer to a neutral color. The only issue with such thin wire is its current-carrying capability. The following calculations show that the ampacity of the wire is sufficient for our purposes. The Standard Handbook for Electrical Engineers lists the following formula:

$$33\,S\left(\frac{I}{A}\right)^{2} = \log_{10}\left(\frac{T_m - T_a}{234 + T_a} + 1\right) \qquad (2.14)$$

where

$I$ = current in amperes
$A$ = area of the wire in circular mils = 160 cmil for 28 gauge wire
$S$ = time the current flows, in seconds
$T_m$ = melting point in Celsius = 1083 °C for copper
$T_a$ = ambient temperature in Celsius = 25 °C for room temperature

See pp. 4-74 to 4-79 of the 13th edition of the Handbook for more information [8]. Solving for $S$ as a function of the other variables, we obtain:

$$S = \frac{1}{33}\left(\frac{A}{I}\right)^{2}\log_{10}\left(\frac{T_m - T_a}{234 + T_a} + 1\right) \qquad (2.15)$$

Plugging in the constants above, we get $S$ as a function of $I$:

$$S \approx \frac{548}{I^{2}} \qquad (2.16)$$

Each LED requires a total of 60 mA for 1/10 of the refresh cycle. In the worst case, this means that any single column of the array must carry a maximum of 600 mA continuously. The amount of time that 28 gauge copper wire can sustain such a current is:

$$S \approx \frac{548}{0.6^{2}} \approx 1522\ \text{seconds} \approx 25\ \text{minutes} \qquad (2.17)$$

These calculations therefore show that such thin wire can easily sustain the required currents. The array can run at full power for up to 25 full minutes without failure due to the wires, and much longer with air cooling and other thermal management techniques.

1x Plexiglas Sheet
Approximately 2 square feet of standard Plexiglas sheet will be used in making our array. The sheet is approximately 0.25 inches thick. Plexiglas is the material of choice because it is easy to machine and flexible. It can take significant stress without fracturing and is relatively shock resistant. We use it only to make a single master template (the mold). This ensures that the template is versatile and can last through all ten castings. The sheet is also transparent, which helps with debugging, as explained later.

1x Permanent Marker
The marker of choice for our purposes is a standard black Sharpie brand marker. It will be used for marking on the Plexiglas sheet.

1x Hot Glue Gun + Refills
A standard hot glue gun and its hot glue refills will be used as the fixative of choice for our project. Though hot glue is not the most permanent of adhesives, its convenience far outweighs its limitations. We reason that the final LED array will never experience temperatures exceeding 400 °F, or significant stress or shock. This renders industrial-strength adhesives unnecessary.

1x Soldering Iron + Solder
Solder will be used to bind our electrical components together. The solder of choice is a standard 60/40 mix of tin and lead, respectively, cored with flux for ease of application.

1x Drill + Bit
The drill of choice for our purposes is a drill press with a 5 mm bit. A press is used instead of a standard hand drill because it reliably drills holes parallel to the surface normal. This property is key to our design, as explained below.

The Construction
The Prototype
The purpose of constructing a prototype is to have a small-scale model to begin with. Throughout our design process, we tested our ideas in various ways to ensure that what we were considering was feasible. Indeed, many of the issues we found and corrected were discovered only by building small-scale LED arrays and checking them for full functionality. This section could easily be filled with tens of preliminary implementations. To avoid unnecessary detail, however, we describe in depth only the last scaled design we built: a 3x3x3 RGB LED array that functions in every sense exactly as a full 10x10x10 RGB LED array does. There are a number of reasons to build such a model, chief among them the safety of equipment. LEDs, ICs, and microcontrollers are all valuable commodities, and mass failure must be guarded against. A 3x3x3 array requires only 27 LEDs and two TLC5940 ICs. If the design being tested contains a fatal flaw that could damage our LEDs or ICs, it is preferable to lose as few parts as possible. Small-scale models have therefore been used extensively in debugging our design concepts, the largest of which is the final model here deemed the Prototype. Like each of the models, the Prototype is built exclusively on breadboards, and as such is impermanent and exists solely for testing purposes. Its construction process can be summarized in a very straightforward manner as follows. The first step involves placement of the LEDs. The 27 RGB LEDs must be arranged in a manner that is both simple to wire on a breadboard and conducive to testing. We settled on separating the LEDs into 3 disjoint squares spread over a field of breadboards. Each square consists of 9 equally spaced LEDs.
Together, the three squares represent a 3x3x3 LED array, with each square modeling a single horizontal layer of the array. This emulates a small-scale LED array, scaled down from the full array by a factor of 0.3 in each dimension. The second step involves placement of the electrical components: a single shift register, two TLC5940 ICs, a single level shifter, three power MOSFETs, and the microcontroller. The circuit diagram shown in Figure 2.1 demonstrates the layout of each component in the prototype. Note that this layout matches that of our final design exactly. The microcontroller interfaces directly only with the shift register and the TLC5940 ICs. The entire power system circuit is connected to the microcontroller only via the main ground reference, so the two circuits are essentially disjoint. The Prototype is tested by the method of interchangeable parts. With our setup, the external power source is easily replaceable, and it is a simple matter to remove components and insert new ones. This allows for simple debugging as necessary. For example, to measure the current passing through a node, one only needs to open the node, bridge the gap with an ammeter, and read the current. If it becomes necessary to determine whether certain parts are working the way they should, they can easily be swapped out or replaced with similar components that perform the same function in a different way. Such was the case when we were deciding between BJT and MOSFET transistors. The software side is also ideal for testing the limits of our circuit, as it is a simple matter to make the software behave as though it were driving a full 10x10x10 array. This is a strong advantage of using a small-scale model. It can be treated as though it were full scale while retaining the debugging capabilities inherent in breadboards, chiefly the interchangeability mentioned above. The software-side tests performed on the Prototype include an HSL color sweep, scrolling text output, and a few computationally heavy algorithms. The complete success of these tests signified that we were ready to build the full-scale array.
The Final Array
Construction of the full-sized LED array differs from that of the Prototype, chiefly because the array is permanent, whereas the Prototype uses breadboards and other solderless fixtures. Construction begins by making each individual layer, whose elements are connected by their common anodes and by support rods laid out in a grid-like pattern. The individual layers are then assembled into a whole using a support rod for each column, which spaces the layers precisely as well as performing its namesake function. This process is simple enough conceptually that it belies the enormous effort required to solder every joint. The driver itself is then assembled exactly as described in Section 2.1. The physical layout of the driver matters less than the circuit paths it creates, as the driver is far less visible than the necessarily aesthetic LED array.

Casting of the Die
The first step involves constructing a mold for arranging single layers. The mold temporarily holds exactly 100 LEDs in the relative positions they will occupy in a single cross section of iteration. The arrangement forms an equally spaced square matrix, with each element separated from its neighbors by an exact distance. The LEDs are held upside-down within the mold so that their leads are accessible. The mold must be reusable, as it will be used to cast each of the ten layers of the array in succession. The mold itself is fashioned out of a simple Plexiglas sheet. Holes of exactly 5 mm diameter (the bulb diameter of our LEDs) are drilled into the material at regular intervals. Various drill bits of 5 mm diameter were tested, and we used trial and error to choose the one that created the snuggest hole for an LED to fit within. The intervals between adjacent LEDs are predetermined by drawing a grid with a spacing of three centimeters. Each intersection of two perpendicular gridlines represents the position of a single LED; in this way, near-perfect spacing of the LEDs can be achieved. Once the intersections are determined, they are marked with the permanent marker, which keeps the marks visible and resistant to accidental erasure. Finally, a hole is drilled at each mark, perpendicular to the plane of the Plexiglas sheet. This is of utmost importance. If a hole is at some angle to the surface normal, the resulting LED will be tilted by the same angle. Even if the LED is located correctly in space, its orientation relative to the other LEDs will produce nonuniform apparent brightness when the final array is viewed from various angles. This is unacceptable, as it could misrepresent the display of, say, a density distribution, even though everything is being done correctly at the hardware and software level. The resulting Plexiglas sheet, with a hole for each LED, acts as a general mold for any single layer of the array. A standard baseline for the layer design is essential to ensure that the corresponding components of each layer line up along the dimension perpendicular to the layers. Every column must be perfectly straight; otherwise any calculation performed to display frames would require correction to offset the displaced LEDs.
Not only would this create difficulty at the software level, but it would also decrease the visual appeal of any image displayed on the array. For example, if one wished to display the magnitude of a vector field, the display would mislead the human eye because points of comparison would not line up.

Anode Consolidation
During this step, we cast and create each of the layers of our array, making judicious use of the mold constructed in the previous step. Each layer is fabricated individually. The mold is filled with a hundred LEDs, and all of the LEDs are connected together using the Plexiglas rods, ten per dimension, to keep the layer fixed in place. The fastener of choice is hot glue, because it bonds plastics relatively solidly and is extremely convenient to use. Each rod is individually placed before being linked to its cross-members. Each rod is attached to the side of the LED head in as unobtrusive a manner as possible. Since each LED has a finite cone of projection, care is taken to ensure that the blob of hot glue does not interfere with the spread of light. After all of the supports are set into place and the adhesive has fully cured, connection of the anodes begins. Since all one hundred anodes must be linked together, an efficient scheme must be employed to avoid excessive soldering of awkwardly held joints. Magnet wire is used to connect the anodes, as the intention is for the internal circuitry to be as invisible as possible, minimizing disruption of LED visibility. The most effective way to do this is to cut exactly eleven segments of magnet wire approximately 30 cm in length. This is the full side length of our array plus an extra 3 cm to allow for human error. Ten of the lengths each connect one row of ten anodes in the layer, and the eleventh is run transverse to the others to link all of the rows together. The attachment of magnet wire is no easy feat, and thus deserves more specific instruction. Unlike thicker magnet wire, this wire cannot simply be sanded to remove its insulating coating, because its sheer thinness will cause it to snap. An alternative method of insulation removal is therefore necessary. The method we employed is an open flame. Holding the wire to the side of an open flame allows the insulation to burn off while leaving the copper core intact. This is a very efficient method, as it takes very little time to obtain precise results. Using this technique, each wire is stripped at intervals corresponding to solder joints. The bare wire is then soldered onto the fixture just like any normal wire.

Array Linkage
The individual layers are linked together to form the final cube by direct stacking. We begin with the base, to which we fasten all 100 of the remaining column supports. It is critical that the base here be the top layer of the actual array, so that assembly proceeds upside-down. In this way, the cathodes of the LEDs are more accessible for soldering than if assembly were to proceed normally. Just as when drilling the holes, all supports must be held along the surface normal of the base layer. This ensures that the resulting array is a right rectangular prism, not a skewed parallelepiped.
A skewed shape must be avoided because the resulting skew distortion would be applied to every image displayed on the array. If the supports are not all mutually parallel, another possible issue is an uneven distribution of damaging tension throughout the array. Over time, this could dislocate many of the joints. Because the linking wire is relatively thin, it could easily snap and wreak havoc in an inaccessible location within the cube. However, the bond of hot glue is fairly flexible, so it is not necessary to be extremely cautious when positioning the rods. As long as they are roughly parallel to the surface normal, the rest will take care of itself. The next layer is positioned on top of the base layer by feeding it carefully onto the field of rods. Care must be taken not to break the layer's wire bridges accidentally, as that would require immediate repair. Once the layer is at the right height on the vertical rods (the same 3 cm spacing used between LEDs), it is fastened in place with hot glue as before. This ensures that the structure is rigid and square. If any angle in the structure is found not to be a right angle, it must be repaired immediately, as it will be that much harder to correct once further layers are added. The R, G, and B channels of each corresponding LED in both layers are then chained together using the magnet wire. This links the columns electrically, so that each PWM sink can control an entire column using a single output per channel. The magnet wire attachment is somewhat trickier at this step than it was in manufacturing the individual layers. Once a layer is covered by many others, it becomes difficult to access. Shoving a soldering iron into a deep cavity to reach a joint is more likely to damage the parts along the inner walls of the cavity than to complete the job. We therefore need a new technique to connect the corresponding cathodes along each column. The method we used relies on the same insulation removal technique as earlier, namely burning it off. Three hundred wires are prepared in exactly the same way as before, stripped of insulation at all ten LED contact points. Then, the first exposed point of each wire is affixed to its designated location in the base layer. Each wire is carefully fed through the next layer as it is added on. This must be done carefully enough that none of the wires snaps at any point. If one did, it would be exceedingly difficult to remedy, for the reasons just described. This is certainly the most difficult task in constructing the entire cube, and as such must be done with extreme care. The process then continues iteratively, with each new layer added on until the final array is assembled. The final product will have a total of three hundred channels coming out of its base, corresponding to the three hundred color channels that will be sunk into the TLC5940s. Each layer will also have its own master common anode channel. This requires another length of magnet wire to be run down the side of the array nearest the power system.

Power System Implementation
This section will not explain the intricacies of the power system itself.
It will simply highlight how the theory developed in Section 2.1 is implemented in the physical LED array. This implementation begins with the master common anodes, each connected to the external power source via its own power transistor. The transistors are all placed on a single protoboard and connected directly to the leads running down the side of the array. The leads go into the drains of the transistors and are soldered in place permanently. The transistors are gated by the level shifter outputs, which are in turn controlled by the shift register outputs. All of these components are placed on the same protoboard. The outputs of the level shifters are connected to the gates of the transistors using the standard insulated wire that was also used on the breadboard. The reason is not power draw but the risk that thinner wire could be accidentally severed if mishandled. The registers are in turn controlled by the microcontroller, which stands alone. The appropriate pins of the microcontroller are connected to the registers using the same sturdier wire, for the same reasons. It is here also that the small number of pins needed to interface with a shift register, as opposed to a decoder, pays off: only three external wires need to run from the protoboard to the microcontroller. Not only does this avoid running large, stiff leads from part to part, but it also decreases the amount and cost of material required to make this design work. The high side is as easy to build as it is to think about; the same cannot be said for the low side. There are a total of three hundred channels that need to be fed into as many TLC5940 output pins. This is accomplished using the remaining protoboards, on which the TLC5940 ICs are all soldered in and chained together. The standard insulated wire would perform well here due to its robustness and insulation. However, when soldered to their respective places on the base of the array, three hundred such wires amount to a bundle of significant thickness, which detracts from the aesthetic appeal of the overall array. We have therefore determined that it is best to use magnet wire for this purpose as well. Though delicate, once bundled and held in place, it is in little danger of tearing. Suppose some catastrophic external event rips the bundle out of either end. In that case there will reasonably be more important components to worry about than some thin wires, which are easily replaceable in any case. The microcontroller must also be interfaced with the TLC5940 ICs, which is done via the regular 28 gauge wire, for all of the usual reasons. The last step involves installation of the external power source. This is wired directly to the transistors' sources via the same wire. It is also fed into the high-voltage input of each of the two level shifters. It is of critical importance that these connections be well insulated and flawless. Any accidental cross-connection could easily destroy vital components that would be difficult at best to replace once soldered in. Finally, all of the individual protoboards will be connected together in a stacked formation, with the one containing the power source on top. They will not be permanently affixed, however, in anticipation of wear and tear over time that would require replacing whichever parts have failed.

Conclusion
This completes our survey of the construction of the RGB LED array.
Though not the simplest task to perform, tackling the construction process step by step leads to far fewer errors and misalignments than, say, randomly affixing LEDs until the array is complete. Debugging the hardware is painful at best, which is why it is easiest to debug while construction is still in progress. Each LED can be tested after the anodes are connected to ensure that the anode bonds are solid. Furthermore, as each new layer is affixed to the array, each color channel can be tested individually to ensure that no unintended side connections have been made and that every required joint has been soldered properly. This is essential, because once the final array is produced it is practically impossible to reach any element deep within it without damaging the elements surrounding it.

2.4 Software

Interfacing the TLC5940


In this section we go into some detail on how the TLC5940 is interfaced at the hardware and software level using the Stellaris EKK-LM3S1968. The full system, from pin connections to source code, will be described, giving insight into some of the issues we faced and how we overcame them. Before that, however, it is vital to understand how the TLC5940 functions.

Logic Design of the TLC5940

The TLC5940 is internally organized as a collection of registers and comparators which ultimately drive the 16 outputs of the chip, known as its channels. At the first stage there are two input shift registers, each 96 bits wide. The two registers are chained so that they effectively act as one register of length 192. The registers are split to provide extra functionality which we do not use in this project, so it is more helpful to think of the two as a single unit. Each time the data clock pin, SCLK (serial clock), is pulsed, the input register takes in the bit on the data input line, SIN (serial in), as its first bit and shifts the existing bits one position down the chain.

This stage is also where the chainability of the TLC5940 comes from. The last bit of the input shift register is linked to an output pin, SOUT (serial out). Connecting SOUT to the SIN pin of another TLC5940 effectively joins the input registers of both chips and allows both to be controlled with only one data line and one clock line. The chain can be extended indefinitely, as the limit is set not by the hardware but by the software, as will be explained further on.

The data in the input shift register is treated in packets of 12 bits, which makes sense since 12 bits × 16 outputs per chip = 192 bits, the length of the register. These packets are sequentially organized, with bits 0-11 forming the first packet, bits 12-23 the second, and so on. When the TLC5940 detects a pulse on its XLAT pin, it immediately copies each packet into its corresponding Grayscale Register and keeps it there until the next XLAT pulse.
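The packet layout can be illustrated with a short host-side sketch that packs sixteen 12-bit grayscale values into the 24 bytes (192 bits) of one chip's input shift register image. It assumes, as an illustration rather than a description of this driver's code, that channel 15 is shifted in first and that each value is sent most significant bit first; `pack_grayscale` is a hypothetical name.

```c
#include <stdint.h>
#include <string.h>

#define TLC_CHANNELS    16
#define TLC_GS_BITS     12
#define TLC_FRAME_BYTES (TLC_CHANNELS * TLC_GS_BITS / 8) /* 192 bits = 24 bytes */

/* Packs 16 grayscale values (0-4095) into the bit stream for one TLC5940
 * input shift register. Assumed order: channel 15 first, MSB first.
 * Values above 4095 are truncated to their low 12 bits. */
void pack_grayscale(const uint16_t values[TLC_CHANNELS],
                    uint8_t out[TLC_FRAME_BYTES])
{
    int bit_pos = 0; /* index of the next bit in the output stream */

    memset(out, 0, TLC_FRAME_BYTES);
    for (int ch = TLC_CHANNELS - 1; ch >= 0; ch--) {
        uint16_t v = values[ch] & 0x0FFF;
        for (int b = TLC_GS_BITS - 1; b >= 0; b--) {
            if ((v >> b) & 1u) {
                /* set the bit, filling each byte from its MSB downwards */
                out[bit_pos / 8] |= (uint8_t)(0x80u >> (bit_pos % 8));
            }
            bit_pos++;
        }
    }
}
```

Each 12-bit packet occupies exactly one and a half bytes, so two consecutive channels always share a byte; setting channel 15 to 0xFFF, for example, fills the first byte and the upper nibble of the second.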
The values in the Grayscale Registers are then used to modulate the outputs of the chip. This latching is helpful for two reasons. First, the actual grayscale number (represented by the 12-bit packet) does not fluctuate while the input register is shifting. Second, the input register does not need to be updated continuously, as the values stored in the Grayscale Registers can be used until the next update (XLAT) is issued.

What we have just described could be called the data input process of the TLC5940. There is, however, a second, simultaneous process which handles the PWM control aspect of the device. This process is driven by two pins, the grayscale clock pin (GSCLK) and the blank pin (BLANK). Internally, every TLC5940 contains 16 Grayscale Counter Registers, one for each channel, each 12 bits wide. Each rising edge on the GSCLK pin increments all 16 counters at once; being 12 bits wide, they can take $2^{12} = 4096$ distinct values, from 0 to 4095. It should be stressed that all 16 counters are driven by the single GSCLK pin, as otherwise the logic would become excessively complex and inefficient. The BLANK pin is simply the master clear for all 16 Grayscale Counters: when it is pulsed, all 16 counters are reset to 0 regardless of their current values.

Both of these processes come together in the final stage of the operation of the TLC5940. Recall that each TLC5940 has 16 Grayscale Registers, containing the data written by the user, and 16 Grayscale Counters, which are regularly incremented by pulses on the GSCLK pin. The final stage is then a simple comparator. For each of the 16 outputs, the internal circuitry compares the values in the corresponding Grayscale Register and Counter. If the value in the Register is higher, the chip activates that channel and keeps it on until the Counter value reaches the Register value. The result is a system in which the on-time of each channel, and hence the average current it sinks, is controlled linearly by the input. An input of 0 means that the corresponding Grayscale Register is never greater than the Counter, so the channel sinks no current. Conversely, an input of 4095 means that the Grayscale Register is greater than the Counter for all but the last count of the cycle, so the channel sinks maximum current for essentially the whole period. Thus the current sunk by each channel can be varied from always off to fully on simply by setting its input between 0 and 4095. This is why the technique is called Pulse Width Modulation control: the width of the current pulse is modulated by the input.

This is all that is necessary to understand for our project's purposes. Much information about the more complex features of the TLC5940, such as dot correction, EEPROM usage and LED Open Detection, has been left out. Readers interested in these features should consult the TLC5940 datasheet [6].

General Algorithm Overview

Now that the workings of the TLC5940 are clear, we can discuss the process followed by the final device driver. Let us first examine the process flowchart in Figure 2.5 to get an idea of the general algorithm.


Figure 2.5: Programming flow chart provided by TI to interface the TLC5940. [4]

After initializing the interface lines, the first block of execution deals with setting up and editing the dot correction (DC) data. We do not use this feature and so skip this step entirely, which leads us to the block that deals with the actual logic of the chip. This block also goes through some initialization stages, which mainly isolate it from extra features such as dot correction and error checking. After the VPRG check, the block executes the following steps:

1. Two counters, GSCLK Counter and Data Counter, and an array called GSData are created. GSCLK Counter keeps track of the value of the Grayscale Counter Register inside the TLC5940. Data Counter is used as an index into GSData, which contains the actual input values that the user wishes to send to the TLC5940. GSData holds 192n bits, where n is the number of chained TLC5940s.
2. The BLANK pin is set low to allow the Grayscale Counter Register to increment. This also activates all outputs.
3. A nested loop is entered. On each iteration, GSCLK Counter is compared with 4095, the maximum value of the Grayscale Counter Register. If GSCLK Counter does not exceed 4095, two things happen. First, if Data Counter has not yet exceeded 192n − 1 (the maximum index of the GSData array), SIN is set to the value of the current data element, GSData[Data Counter], Data Counter is incremented, and SCLK is pulsed to shift the value into the TLC5940. Next, the GSCLK pin is pulsed, incrementing the Grayscale Counter Register, and GSCLK Counter is incremented to keep up.
4. When GSCLK Counter reaches 4096, BLANK is set high to reset the Grayscale Counter Register to 0, XLAT is pulsed to latch in the previously sent GSData, and the block is exited.

Notice that the entire algorithm is based on the GSCLK Counter, and thus, by equality, on the Grayscale Counter Register. This is a very important concept, as no TLC5940 driver implementation can function without it. The core idea is that one full cycle of the Grayscale Counter Register defines one PWM period. That is, the time it takes for the GCR to run from 0 to 4095, its maximum value, is the period of each refresh cycle. This value is extremely important because it directly limits the maximum refresh rate attainable by the system. Suppose that the PWM period in some implementation is 50 ms. Then each refresh cycle needs 50 ms simply to complete, as the counter must reach 4095 before another cycle can start without corrupting the data inside the input register. Assuming each refresh starts immediately after the previous one, $\frac{1000\,\text{ms}}{50\,\text{ms}} = 20$ refresh cycles are completed per second, giving a maximum update rate of 20 Hz. The shorter the PWM period, the higher the frame rate. However, as will be shown later, shortening the period also has significant consequences which must be dealt with.

Though the original algorithm works, it has a few characteristics that make it unattractive for the final implementation. First, the process is not automatic and runs entirely in the foreground, meaning that the CPU performs the refresh only when the user requests it. This is problematic because a high refresh rate is vital for the LED array, and making the update a foreground process effectively hands control of that rate to the user, making it highly volatile.
Because the user has no direct way of knowing when the current PWM period ends, he must either keep track of it manually or simply guess when the period is over and the refresh routine should be called. Both methods are unreliable, intrusive and inefficient. Another issue is the pulsing of the GSCLK and SCLK pins. In the initial algorithm, both lines must be pulsed manually by wiring them to GPIO pins and, on each iteration, setting the respective pin high and then immediately low. This method, known as bit-banging, is unfavorable because it offers no control over the speed of the pulses and is inefficient, using software for a task that dedicated hardware handles better. By far the largest problem with this procedure is that GSCLK and SCLK are tied: for every pulse on GSCLK there is a simultaneous pulse on SCLK (until, of course, GSData has been completely shifted in). This linkage, though it makes implementation simple, takes a significant amount of control over the TLC5940 away from the driver, because increasing the rate of transfer to the chip also shortens the PWM period, a side effect which causes insurmountable complications in a more complex algorithm.
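The period arithmetic behind the 20 Hz example generalizes easily. The sketch below assumes a PWM period of exactly 4096 GSCLK cycles, a duty cycle of value/4096 for a channel whose Grayscale Register holds `value`, and, looking ahead to the layered array, one complete PWM period per layer; the function names are illustrative only.

```c
#define GS_COUNTS 4096.0 /* GSCLK cycles per PWM period (counter 0..4095) */

/* Fraction of the PWM period during which a channel whose Grayscale
 * Register holds `value` is on (it conducts while counter < value). */
double duty_cycle(unsigned value)
{
    if (value > 4095u) {
        value = 4095u; /* the register is only 12 bits wide */
    }
    return value / GS_COUNTS;
}

/* Length of one PWM period, in seconds, for a given GSCLK frequency. */
double pwm_period_s(double gsclk_hz)
{
    return GS_COUNTS / gsclk_hz;
}

/* Upper bound on the full-array refresh rate when each of `layers`
 * cross sections needs one complete PWM period of its own. */
double max_refresh_hz(double period_s, unsigned layers)
{
    return 1.0 / (period_s * layers);
}
```

Under these assumptions, `max_refresh_hz(0.050, 1)` reproduces the 20 Hz bound above, and a 60 Hz refresh of a 10-layer array would require a GSCLK of at least 60 × 10 × 4096 = 2,457,600 Hz.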


Final Algorithm Overview

Our final interface builds on the original and uses the same methodology (though a vastly different implementation) as Matthew T. Pandina's article "Demystifying the TLC5940" [9], which allows it to overcome all the issues that plague the initial algorithm and makes it efficient and robust enough for our purposes. The final algorithm begins by solving the primary issue of tied clocks. Since GSCLK and SCLK serve completely different purposes and have no real reason to be joined, the driver decouples them and uses two different methods, both hardware-based, to handle each clock and its functions. To guarantee that GSCLK receives reliable, evenly timed pulses, the driver drives it with a PWM wave generated by the microcontroller. Though it may seem counterintuitive to use a modulated signal as a regular clock, such an effect can be achieved by fixing the duty cycle of the wave at 50%. This is illustrated in Figure 2.6.

Figure 2.6: Note that the middle wave perfectly resembles a clock signal.

Furthermore, there are some crucial benefits to using a PWM wave as a clock signal. Because PWM waves are generated purely in hardware on most microcontrollers, the CPU suffers no strain and its computation capability is unaffected. For similar reasons, it is very simple to alter the frequency and phase of a PWM wave through software, an ability whose usefulness will become clear further on.

Replacing SCLK is slightly trickier. This clock exists solely to shift the grayscale data into the input shift register: every time a rising edge is detected on the SCLK pin, the value on SIN at that moment is shifted into the register. Though tempting, using a clocking system similar to the PWM control process would result in disaster for a few reasons. The most pressing issue is that PWM waves are continuous once started, meaning that the SIN pin would have to be continuously updated with values even when the chip cannot yet act on them, which is obviously unacceptable. Another prohibitive difficulty is timing. Each rising edge of the PWM wave must be perfectly matched with the correct data value on the SIN pin to avoid sending inaccurate values, and though possible, this is a nightmare to implement in software. To solve the problem, the final driver therefore uses the Serial Peripheral Interface, or SPI, to drive SCLK and SIN. SPI is a communications protocol developed by Motorola that communicates efficiently with peripheral devices at high speed. Though SPI has a plethora of features, we use it in its simplest two-pin configuration, which will be detailed in the following section. The advantages of SPI over a PWM clock in this scenario are immense. Like PWM, SPI is hardware controlled, so there is no chance of a mis-clock occurring during a transaction. Writing data over SPI is also extremely simple, as it only involves writing to a single output register and letting the hardware take care of timing and pulsing. Finally, SPI can achieve very high baud rates, transferring data at up to half of the core's clock speed.
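With SPI configured for 12-bit words, each grayscale value can be written as one frame. The sketch below builds the frame sequence for a chain of TLC5940s and estimates the time to shift it out. The ordering rule (farthest chip's highest channel first), the 25 MHz example rate and the function names are assumptions for illustration, not taken from the driver code.

```c
#include <stddef.h>
#include <stdint.h>

#define TLC_CHANNELS 16
#define GS_BITS      12

/* Fills `frames` with the order in which 12-bit words would be written to
 * a 12-bit-wide SPI transmit register for a chain of `num_tlc` TLC5940s.
 * values[c * 16 + k] holds channel k of chip c, where chip 0 is wired
 * directly to the microcontroller. Each word pushes the earlier ones
 * further down the chain, so the farthest chip's highest channel goes
 * first. Returns the number of frames written. */
size_t build_spi_frames(const uint16_t *values, size_t num_tlc,
                        uint16_t *frames)
{
    size_t n = 0;

    for (size_t chip = num_tlc; chip-- > 0;) {
        for (int ch = TLC_CHANNELS - 1; ch >= 0; ch--) {
            frames[n++] = values[chip * TLC_CHANNELS + (size_t)ch] & 0x0FFF;
        }
    }
    return n;
}

/* Seconds needed to shift `num_frames` 12-bit frames at `bit_rate_hz`,
 * ignoring any idle time between frames. */
double spi_transfer_s(size_t num_frames, double bit_rate_hz)
{
    return (double)(num_frames * GS_BITS) / bit_rate_hz;
}
```

For example, 19 chained chips (the minimum for 300 channels, since 19 × 16 = 304) produce 304 frames, which at 25 MHz take roughly 146 µs to send.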

Figure 2.7: An SPI timing diagram (MOSI stands for master out, slave in, which is equivalent to the SIN line). Note how the SCLK pulses are perfectly aligned with the data pulses. [3]

Though decoupling the clocks frees up a large amount of CPU resources, it also creates a flow problem. Recall that in the original design, the value of the Grayscale Counter Register is counted by the software itself, so that the program knows when a PWM cycle has ended. When a PWM wave is used, however, the program has no direct way of knowing how many pulses have occurred, because the wave is generated by hardware in the background; algorithm flow can therefore no longer be controlled by software counters. We thus require two things: a method to count the pulses on the GSCLK waveform, and a way to alert the software when 4095 clock pulses have occurred so that it can perform the post-PWM-period routine.

We fulfill the first requirement using a feature known as input edge count mode. Available on most modern embedded processors, input edge count mode allows a microcontroller to detect rising and falling voltage edges on some of its GPIO pins. The setup process is simple yet powerful: the program sets the initial value of the edge counter and provides a target pulse count for the mechanism to match. During runtime, the hardware decrements the edge counter every time a pulse is detected until the target is met, at which point the processor sets a flag signifying that the desired count has been reached and stops the edge counter (see Figure 2.8). To use this mode, we split the PWM signal driving GSCLK and route the second branch, through a current-limiting resistor, to the input edge count pin. The processor then treats its own PWM wave as an external signal and tracks its pulses. By setting the counter target to 4095, we obtain an effective replacement for the GSCLK Counter of the original algorithm. Note that edge count mode begins to turn the driver into a background process, because unlike the initial algorithm, this method requires almost no software manipulation to achieve the same functionality. Once the program has initialized the edge counter and the corresponding registers, it is free to process other tasks, which takes a large load off the CPU.
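The count-down-and-match behavior of input edge count mode can be modeled in a few lines of host-side C. The sketch below is only a software stand-in for the hardware described above, not the Stellaris register interface; the `EdgeCounter` type and its functions are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Software model of a down-counting edge counter: loaded with a start
 * value, decremented once per detected edge, and stopped with a flag
 * raised when it reaches the match value. The load must exceed the match. */
typedef struct {
    uint32_t count;   /* current counter value */
    uint32_t match;   /* value that raises the flag */
    bool     flag;    /* set once count == match */
    bool     running; /* cleared when the match occurs */
} EdgeCounter;

void edge_counter_init(EdgeCounter *ec, uint32_t load, uint32_t match)
{
    ec->count = load;
    ec->match = match;
    ec->flag = false;
    ec->running = (load > match);
}

/* Called once per detected rising edge on the input pin. */
void edge_counter_on_edge(EdgeCounter *ec)
{
    if (!ec->running) {
        return; /* a stopped counter ignores further edges */
    }
    ec->count--;
    if (ec->count == ec->match) {
        ec->flag = true; /* in hardware this is where an interrupt could fire */
        ec->running = false;
    }
}
```

Loading 4095 and matching at 0 sets the flag on the 4095th edge; only the difference between the load and match values determines how many edges are counted.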

Figure 2.8: An input edge count mode progression diagram specific to the Stellaris LM3S1968. Note that on each rising and falling edge the count is decremented until it matches the target of 0x0006. [6]

The solution to the second requirement also comes as a benefit of the edge count functionality. Recall that the edge counter sets a flag when the target count is met. There are two ways to use this flag to notify the running program of the end of the PWM period. In the first approach, the program continuously polls the flag and executes the post-PWM-period routine when it finds the flag set. However, this technique reduces the driver to a foreground process, since active CPU time must be spent doing the check. To keep the driver in the background, our driver instead uses the hardware-supported edge count interrupt. In this setup, the hardware itself watches the flag and automatically calls a preset function in the software when the flag is set. When the interrupt fires, the program halts its current thread of execution, completes the corresponding interrupt routine, and then returns to where it left off. This method notifies the program just as accurately when a period is over, but spends no computation cycles on checking, making the driver practically invisible to the user.

This interrupt, called the edge counter interrupt, is an especially important part of the final interface because it links the PWM period process to the data input process. Remember that the original algorithm shifted in the grayscale data alongside the grayscale counter, as each GSCLK pulse was accompanied by a bit of data sent through SIN. In the new approach we do not have this luxury, since GSCLK is driven entirely by hardware, and the value of the Grayscale Counter Registers must be known to clock in data properly (if not all data is clocked in before the counters reset, the outputs are undefined). All the input data is therefore sent during the edge counter interrupt, as that is the only time the value of the Grayscale Counter Registers is known and we can be sure that no undefined behavior will occur.

With all the pieces in place, we can now discuss how the layer selection system works. Since the driving system addresses only one cross-sectional layer of the array at a time, the driver must iterate through all the layers that make up the array to control every element. Our project accomplishes this with a shift register, but again runs into a timing issue: if a layer change occurs during a PWM period, all color control is forfeited, because the very basis of the PWM grayscale cycle is invalidated. For example, suppose an LED on layer 0 is set to an input value of 2000, and an LED on layer 1 is set to a value of 500. The Grayscale Counter Registers have just been reset and layer 0 has been made active.
The LED on layer 0 therefore turns on, as intended. Now imagine that when the GCR reaches 1000, the active layer changes to layer 1. Two things happen. First, the LED on layer 0 immediately shuts off because its anode is no longer powered. This is clearly incorrect, as the LED was on for only 1000 GSCLK pulses instead of the desired 2000. Second, the LED on layer 1 never turns on, because the GCR has already passed its value of 500. Both LEDs now respond arbitrarily, and we have lost the ability to control them. This example makes it clear that, to maintain control over the outputs, layers must be switched precisely when a PWM period ends. Fortunately, the edge counter interrupt is triggered exactly when this happens, so switching the layer inside the interrupt as well fixes the timing issue.

To summarize the complete algorithm, all the problems facing the initial implementation are solved by separating the two processes needed to interface the TLC5940 properly. The steady incrementing of the Grayscale Counter Registers is handled by a 50% duty cycle PWM wave generated by the microcontroller. Another pin on the microcontroller uses the edge counting functionality to log the rising edges of this wave and triggers an interrupt whenever the count reaches 4095. The interrupt then resets the Grayscale Counter Registers, latches in the previous data, changes layers, and shifts in the next set of data via SPI. Note that the entire process is independent of the user and therefore precisely timed, achieving the maximum frame rate as well as efficiency and robustness.
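The layer-switching hazard in the example above can be reproduced with a small simulation. The sketch steps the grayscale counter through one period of 4096 ticks and counts the ticks during which an LED is actually lit, assuming a single layer change at a chosen tick; `lit_ticks` is a hypothetical helper, not part of the driver.

```c
/* Counts the GSCLK ticks (out of 4096) during which an LED is lit.
 * The LED sits on `led_layer` and has grayscale value `value`.
 * `first_layer` is active for ticks below `switch_tick`, and
 * `second_layer` from then on. The channel conducts while
 * counter < value, but the LED only lights when its layer's anode
 * is powered. */
unsigned lit_ticks(unsigned value, int led_layer, int first_layer,
                   int second_layer, unsigned switch_tick)
{
    unsigned lit = 0;

    for (unsigned counter = 0; counter < 4096u; counter++) {
        int active = (counter < switch_tick) ? first_layer : second_layer;
        if (active == led_layer && counter < value) {
            lit++;
        }
    }
    return lit;
}
```

With the values from the example, `lit_ticks(2000, 0, 0, 1, 1000)` returns 1000 instead of 2000 and `lit_ticks(500, 1, 0, 1, 1000)` returns 0, while switching at tick 4096, the end of the period, gives the intended 2000 ticks.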

Figure 2.9: Process chart of the improved algorithm. Note the dramatic shift from software to hardware.

Driver Implementation

We now discuss the specific implementation of the final algorithm written for the Stellaris LM3S1968 microcontroller on the EKK-LM3S1968 evaluation board. We will cover the writing of the driver from start to finish and also describe how the client interfaces with it.

Environment

The TLC5940 driver was written in C, using the Keil µVision4 integrated development environment to facilitate organization and testing. Some general knowledge of the C programming language, specifically pointers, arrays, the preprocessor and functions, is therefore necessary to fully understand the implementation.

Tools

The driver uses a few tools to further aid code flexibility and readability. We use the StellarisWare library provided by Texas Instruments to interface with the registers that control internal hardware functionality such as PWM generation, SPI and interrupts, because modifying these registers manually results in confusing and error-prone code. The StellarisWare library also checks its arguments, which helps guard against writes to invalid addresses and further improves robustness. A further benefit of the LM3S1968 is that the software can use the C standard library, a collection of utility files that provides a strong foundation for any program to build on. It includes a large number of mathematical routines (including trigonometric and exponential functions), functions for console input and output, and support for memory allocation and deallocation. The C standard library thus assists in all aspects of the driver, making it more compact and more efficient.

A Word on Generality

Please note that the following library, though it uses a general algorithm, is written specifically to drive our LED array. Because of this, parts of the code will have to be altered to allow the library to drive another system. However, our implementation should provide a basic framework to help others with their modifications, and the nature of the algorithm ensures that no drastic changes will be necessary.

Connections

Figure 2.4 highlights the IC connections relevant to the driver.

Writing the TLC5940 Driver

The library is contained primarily within two files: LibTLC.h and LibTLC.c. Note, however, that it accesses many other files, most of which are part of the StellarisWare toolkit and are therefore not explained here. We begin with the header file, which contains general declarations and supports the actual implementation in the definition file.

    #ifndef LIBTLC
    #define LIBTLC

    #include "hw_types.h"
    #include "hw_ints.h"
    #include "hw_ssi.h"
    #include "hw_memmap.h"
    #include "interrupt.h"
    #include "sysctl.h"
    #include "pwm.h"
    #include "timer.h"
    #include "gpio.h"
    #include "ssi.h"
    #include "inc/lm3s1968.h"

Here we include all the external files needed further on. As stated before, most of these files are needed to use the StellarisWare tools. The others, namely the hw_*.h files and inc/lm3s1968.h, provide macros for commonly used memory locations and values.

    // TLC5940 port definitions //
    #define GSCLK_PORT   GPIO_PORTG_BASE
    #define GSCLK_PIN    GPIO_PIN_2        // PG2 //
    #define SCLK_PORT    GPIO_PORTA_BASE
    #define SCLK_PIN     GPIO_PIN_2        // SSI0CLK //
    #define BLANK_PORT   GPIO_PORTF_BASE
    #define BLANK_PIN    GPIO_PIN_7        // PF7 //
    #define XLAT_PORT    GPIO_PORTF_BASE
    #define XLAT_PIN     GPIO_PIN_1        // IDX1 //

    // Shift register port definitions //
    #define SR_DATA_PORT GPIO_PORTD_BASE
    #define SR_DATA_PIN  GPIO_PIN_2        // U1Rx //
    #define SR_CLK_PORT  GPIO_PORTD_BASE
    #define SR_CLK_PIN   GPIO_PIN_3        // U1Tx //
    #define SR_CLR_PORT  GPIO_PORTG_BASE
    #define SR_CLR_PIN   GPIO_PIN_3        // PG3 //

Now we define the ports and pins used by all our interface lines. Doing this has a few benefits: the actual port names are abstracted and thus made clearer, and it becomes trivial to change any of these ports, since one only has to reassign them here and the change carries over to the entire program. SCLK and GSCLK should not be reassigned, as only specific pins support those functions.

    // Port functions //
    #define setLow(port, pin)      (GPIOPinWrite((port), (pin), 0x00))
    #define setHigh(port, pin)     (GPIOPinWrite((port), (pin), (pin)))
    #define pulse(port, pin)       do {                        \
                                       setHigh((port), (pin)); \
                                       setLow((port), (pin));  \
                                   } while (0)
    #define outputState(port, pin) (GPIOPinRead((port), (pin)))

These macro definitions are adapted from Pandina's article [9], and they further abstract and simplify pin operations. With them, the user can set, pulse or check the state of any defined pin. The definitions themselves are self-explanatory (the GPIO functions are provided by StellarisWare).

    void TLCSetup(int num_TLC, int num_Layers);
    void TLCSetLED(unsigned char layer, unsigned char x, unsigned char y,
                   unsigned short *color);
    void TLCInterrupt(void);

    #endif

We now declare the three functions, with their parameters, that control all driver functionality. This concludes the header file, so let us now discuss the definition file, LibTLC.c.

    #include "../include/LibTLC.h"
    #include <stdlib.h>

    // Number of TLCs in chain //
    unsigned short numTLC;
    unsigned short numChannels;

    // Number of layers in array //
    unsigned short numLayers;

    // Element properties //
    unsigned short numElementsPerLayer;
    unsigned short numElementsPerDimension;

    // Counters //
    int TLCCounter, TLCCounter2;
    int currentLayer = 0;

    // Main data variable //
    unsigned short **colorData;

This section declares all the global variables used by the driver functions. Their uses are as follows:

numTLC - Stores how many TLC5940s the user is chaining together.
numChannels - Stores how many channels in total will be controlled by the driver.
numLayers - Stores the number of layers in the LED array. This is equivalent to the variable j in the established convention.
numElementsPerLayer - Stores the number of elements per layer, following the same convention.
numElementsPerDimension - Stores the number of elements in a single dimension of a single layer.
TLCCounter, TLCCounter2 - General-purpose counters used in constructs such as for loops.
currentLayer - Keeps track of the array layer that the TLC5940s are currently coloring.
colorData - Points to an array containing the color data for the entire array.

TLCSetup

We now delve into the first function of the TLC5940 library, TLCSetup. This function initializes all the system hardware used to drive the TLC5940 and also sets up the interface between the user and the driver.

    void TLCSetup(int num_TLC, int num_Layers)
    {
        // Assign TLC number //
        numTLC = num_TLC;

        // Calculate number of channels //
        numChannels = numTLC * 16;

        // Assign layer number //
        numLayers = num_Layers;

        // Figure out number of elements //
        numElementsPerLayer = (numLayers) * (numLayers) * 3;
        numElementsPerDimension = (numLayers) * 3;

The function takes two arguments, num_TLC and num_Layers, which correspond to their non-underscored counterparts numTLC and numLayers. numTLC and numLayers are initialized, and numChannels is set to numTLC × 16 because each TLC5940 has 16 channels. The next two assignments determine how many elements are present in each layer and in each dimension of a layer. Note that these assignments are specific to our project, as they assume that the output devices are arranged in a cube.

numElementsPerLayer - The number of elements in a single cross section of the array, calculated by squaring numLayers and multiplying by 3 (one element per color channel).
numElementsPerDimension - Dropping to a single dimension, the number of elements in a single line of LEDs, calculated as numLayers × 3.

        // Allocate color map and initialize //
        colorData = (unsigned short **) malloc(sizeof(unsigned short *) * numLayers);
        for (TLCCounter = 0; TLCCounter < numLayers; TLCCounter++)
            colorData[TLCCounter] = (unsigned short *) malloc(sizeof(unsigned short) * numElementsPerLayer);

        for (TLCCounter = 0; TLCCounter < numLayers; TLCCounter++)
            for (TLCCounter2 = 0; TLCCounter2 < numElementsPerLayer; TLCCounter2++)
                colorData[TLCCounter][TLCCounter2] = 0;

Now that we know the total number of elements in the structure, we allocate space to hold all of them. Each element is an unsigned short because the largest value the TLC5940 accepts, 4095, needs 12 bits, more than an unsigned char can hold. The for loops simply initialize all values to 0 so that there are no unexpected outputs. The actual internal layout of colorData will be explained along with the TLCSetLED function.
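The allocation above does not check the pointers returned by malloc. A minimal variant with the same two-level layout is sketched below; it uses calloc, which zero-fills and so replaces the explicit initialization loops, and it frees any partial allocation on failure. The helpers alloc_color_data and free_color_data are hypothetical and not part of LibTLC.

```c
#include <stdlib.h>

/* Allocates num_layers rows of elements_per_layer zeroed unsigned shorts.
 * Returns NULL, after freeing any partial allocation, on failure. */
unsigned short **alloc_color_data(unsigned short num_layers,
                                  unsigned short elements_per_layer)
{
    unsigned short **data = calloc(num_layers, sizeof *data);

    if (data == NULL) {
        return NULL;
    }
    for (unsigned short layer = 0; layer < num_layers; layer++) {
        data[layer] = calloc(elements_per_layer, sizeof **data);
        if (data[layer] == NULL) {
            while (layer-- > 0) {
                free(data[layer]); /* release rows allocated so far */
            }
            free(data);
            return NULL;
        }
    }
    return data;
}

/* Releases everything allocated by alloc_color_data. */
void free_color_data(unsigned short **data, unsigned short num_layers)
{
    if (data == NULL) {
        return;
    }
    for (unsigned short layer = 0; layer < num_layers; layer++) {
        free(data[layer]);
    }
    free(data);
}
```

For the 10-layer cube, numElementsPerLayer is 10 × 10 × 3 = 300, so `alloc_color_data(10, 300)` would reserve the same 3000 values as TLCSetup.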
        // Enable peripherals //
        SysCtlPeripheralEnable(SYSCTL_PERIPH_GPIOA);
        SysCtlPeripheralEnable(SYSCTL_PERIPH_GPIOB);
        SysCtlPeripheralEnable(SYSCTL_PERIPH_GPIOD);
        SysCtlPeripheralEnable(SYSCTL_PERIPH_GPIOG);
        SysCtlPeripheralEnable(SYSCTL_PERIPH_GPIOF);
        SysCtlPeripheralEnable(SYSCTL_PERIPH_PWM0);
        SysCtlPeripheralEnable(SYSCTL_PERIPH_SSI0);
        SysCtlPeripheralEnable(SYSCTL_PERIPH_TIMER0);

The function enables all the peripherals it needs: the GPIO ports, PWM, SSI (which encapsulates SPI), and the hardware timer (which provides the edge counting functionality).


        // Setup shift register ports //
        GPIOPinTypeGPIOOutput(SR_DATA_PORT, SR_DATA_PIN);
        GPIOPinTypeGPIOOutput(SR_CLK_PORT, SR_CLK_PIN);
        GPIOPinTypeGPIOOutput(SR_CLR_PORT, SR_CLR_PIN);

        // Seed shift register with initial 1 //
        pulse(SR_CLR_PORT, SR_CLR_PIN);
        setHigh(SR_DATA_PORT, SR_DATA_PIN);
        pulse(SR_CLK_PORT, SR_CLK_PIN);

The driver now begins to initialize each component of the system, starting with the shift register. After declaring the lines as GPIO outputs, the function pulses the CLR pin of the IC to make sure all the outputs are low. Then a single high bit is shifted into the register to signify that the first layer (layer 0 in our convention) is active.

        // Setup SSI //
        GPIOPinTypeSSI(GPIO_PORTA_BASE, GPIO_PIN_5 | GPIO_PIN_4 | GPIO_PIN_3 | GPIO_PIN_2);
        SSIConfigSetExpClk(SSI0_BASE, SysCtlClockGet(), SSI_FRF_MOTO_MODE_0,
                           SSI_MODE_MASTER, SysCtlClockGet() / 2, 12);

Note: SSI and SPI are used synonymously in this section, although they are not the same in all cases. Here we set up the SSI module with the following arguments:

SSI0_BASE - The base address of the first SSI module (the LM3S1968 has two), which is the one we use.
SysCtlClockGet() - The rate of the system clock that supplies the SSI module. In our implementation this value is 50,000,000 (50 MHz).
SSI_FRF_MOTO_MODE_0 - The SSI module offers several frame formats, each with its own start and end signature and pulsing pattern. For the TLC5940, the Motorola SPI (MOTO) format in mode 0 is necessary for proper operation. For information on the other formats, please refer to the Stellaris datasheet [7].
SSI_MODE_MASTER - Each SSI module can be configured as either a master or a slave, which helps a single controller drive multiple chips. The details are beyond the scope of this document.
We do, however, place the LM3S1968 in master mode because the master is the provider of the clock in the SSI protocol. 54

SysCtlClockGet() / 2 - This argument, known as the baud rate, is the rate of transfer between the master and slave (and vice versa). As a physical limitation, SSI can only transmit data at as much as half the cores clock speed, which is the reason for the division by 2. For our project this number is 25,000,000. 12 - The nal argument tells the hardware how many bits of data will be sent per transaction. Though this number can be anywhere from 4 to 16, we choose 12 as it matches the bit resolution of the TLC5940 and consequently makes sending values much more straightforward. // Setup TIMER0A in edge count mode // GPIOPinTypeTimer ( GPIO_PORTB_BASE , GPIO_PIN_0 ) ; TimerConfigure ( TIMER0_BASE , TIMER_CFG_SPLIT_PAIR | TIMER_CFG_A_CAP_COUNT ) ; // Configure timer0 as input capture that captures positive edges // TimerControlEvent ( TIMER0_BASE , TIMER_A , TIMER_EVENT_POS_EDGE ) ; TimerControlStall ( TIMER0_BASE , TIMER_A , true ) ; TimerLoadSet ( TIMER0_BASE , TIMER_A , 4095) ; // Initialize timer to 4095 // TimerIntEnable ( TIMER0_BASE , TIMER_CAPA_MATCH ) ; // enable capture A interrupts // TimerMatchSet ( TIMER0_BASE , TIMER_A , 0) ; // Timer interrupts at 0 // TimerIntRegister ( TIMER0_BASE , TIMER_A , TLCInterrupt ) ; // Set Timer0 interrupt function // IntEnable ( INT_TIMER0A ) ; The edge counter is now congured on Timer0A. TimerControlEvent(TIMER0 BASE, TIMER A, TIMER EVENT POS EDGE) - Congures the timer to only count rising edges, instead of falling or both edges. This is to mimic the TLC itself increments the GCRs on each rising edge. TimerLoadSet(TIMER0 BASE, TIMER A, 4095) - Seed the timer with an initial value of 4095, since the edge counter decrements on each detected event. TimerMatchSet(TIMER0 BASE, TIMER A, 0) - Set the value at which the timer will throw an interrupt. We use 0 because 4095 0 = 4095, the number of edges we wish to count. Note that the load and match values do not matter as long as their dierence is the correct value. 
TimerIntRegister(TIMER0 BASE, TIMER A, TLCInterrupt) - Register TLCInterrupt as the function to call when the interrupt is thrown.


// Setup PWM0 Generator //
GPIOPinTypePWM(GSCLK_PORT, GSCLK_PIN);
PWMGenConfigure(PWM0_BASE, PWM_GEN_0, PWM_GEN_MODE_DOWN);
PWMGenPeriodSet(PWM0_BASE, PWM_GEN_0, SysCtlClockGet() / 2500000);
PWMPulseWidthSet(PWM0_BASE, PWM_OUT_0, SysCtlClockGet() / 2500000 / 2);
PWMOutputState(PWM0_BASE, PWM_OUT_0_BIT, true);

Here the PWM generator that drives GSCLK is set up:

PWMGenConfigure(PWM0_BASE, PWM_GEN_0, PWM_GEN_MODE_DOWN) - The generator is configured in count-down mode, which produces left/right-aligned pulses instead of center-aligned pulses; the latter are unnecessarily complex for our purposes.
PWMGenPeriodSet(PWM0_BASE, PWM_GEN_0, SysCtlClockGet() / 2500000) - This call sets the period of each PWM cycle, in core clock ticks. The final argument is derived as follows. The array has 1000 LEDs laid out as 10 layers of 100. Each layer must receive one full grayscale PWM cycle to be color accurate, as described in Section 2.4. Thus, 10 grayscale cycles are necessary to update the entire array once. We want the array to be updated 60 times a second, so at least 10 × 60 = 600 grayscale cycles are needed per second. Furthermore, each grayscale cycle must contain 4096 GSCLK pulses so that the GCRs run through their full 12-bit range. Thus, the generated signal must contain 10 × 60 × 4096 = 2,457,600 pulses per second, a frequency of about 2.46 MHz. Since the PWM clock is derived from the core clock, the period of each PWM cycle in core clock ticks is SysCtlClockGet() / 2457600. We round the divisor up to 2,500,000 for some headroom in the frame rate.
PWMPulseWidthSet(PWM0_BASE, PWM_OUT_0, SysCtlClockGet() / 2500000 / 2) - The pulse width is set to half of the period, producing a 50% duty cycle square wave that serves as a true clock signal.
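The divisor arithmetic above can be checked on a host machine. The sketch below is not part of the driver; the helper names are hypothetical, and the inputs (50 MHz core clock, 10 layers, 60 Hz target, 4096 GSCLK pulses per grayscale cycle) are the values quoted in this section.

```c
/* Host-side sketch of the GSCLK period calculation (hypothetical helpers). */

#define GS_STEPS 4096UL   /* GSCLK pulses per grayscale PWM cycle */

/* Minimum GSCLK frequency (Hz) for a given layer count and refresh rate. */
unsigned long required_gsclk_hz(unsigned long layers, unsigned long refresh_hz)
{
    return layers * refresh_hz * GS_STEPS;
}

/* PWM period in core clock ticks, as passed to PWMGenPeriodSet(). */
unsigned long pwm_period_ticks(unsigned long core_hz, unsigned long gsclk_hz)
{
    return core_hz / gsclk_hz;   /* integer division, as in the driver */
}

/* Refresh rate actually achieved once the period is a whole number of ticks. */
double achieved_refresh_hz(unsigned long core_hz, unsigned long period_ticks,
                           unsigned long layers)
{
    double gsclk = (double)core_hz / (double)period_ticks;
    return gsclk / ((double)GS_STEPS * (double)layers);
}
```

With these inputs the period comes out to 20 core clock ticks whether the exact figure (2,457,600 Hz) or the rounded one (2,500,000 Hz) is used, because integer division truncates 20.3 to 20. GSCLK therefore runs at 2.5 MHz and the cube refreshes at roughly 61 Hz.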
// Setup TLC5940 //
GPIOPinTypeGPIOOutput(BLANK_PORT, BLANK_PIN);
GPIOPinTypeGPIOOutput(XLAT_PORT, XLAT_PIN);
setLow(GSCLK_PORT, GSCLK_PIN);
setLow(XLAT_PORT, XLAT_PIN);
setHigh(BLANK_PORT, BLANK_PIN);

Because GSCLK, SCLK, and SIN are driven by other peripherals (the PWM generator and the SSI module), only BLANK and XLAT are set up as ordinary GPIO outputs. They are then set to their starting states as dictated by Figure 2.5.


// Enable SSI //
SSIEnable(SSI0_BASE);

// Enable Edge Counter //
TimerEnable(TIMER0_BASE, TIMER_A);

// Enable PWM generator //
PWMGenEnable(PWM0_BASE, PWM_GEN_0);

Now that the necessary ports, pins, and hardware peripherals are configured, the TLCSetup function ends by enabling SSI, the edge counter, and the PWM generator, which effectively activates the TLC5940.

User Driver Interface
Before discussing the second driver method, TLCSetLED, we would first like to describe the interface between the TLC5940 driver and its client. Our goal, as hinted at throughout this section, is to make the driver invisible to the user. That is, the user should not have to know how the driver works, nor should he have to accommodate it in any way when writing his program. Ultimately, we want the user to access the LED array as he would any two-dimensional screen: by setting pixel colors and letting background processes translate them to the physical construct.

The interface is simple to implement and is achieved using a single array, called colorData in our driver, which contains all the data needed to color the LED cube. The user writes to this array whenever he wants to modify the color of a pixel, and the library indexes the same array when passing values to the TLC5940. Although the cube does not update instantaneously, since a change becomes visible only when the driver next refreshes the corresponding layer, a sufficiently high refresh rate makes the update appear instantaneous to a human observer. Modern display screens use a similar rendering technique.

TLCSetLED
The TLCSetLED function lets the user alter the colorData array and thus control the output of the cube. The method takes a color value and assigns it to the LED at the given user coordinates by placing it in the proper colorData location.

void TLCSetLED(unsigned char layer, unsigned char x, unsigned char y, unsigned short *color)
{
    // Determine index //
    unsigned short index = (y * numElementsPerDimension) + (x * 3);

    // Fill in data //
    colorData[layer][index] = color[0];     // Red intensity //
    colorData[layer][index + 1] = color[1]; // Green intensity //
    colorData[layer][index + 2] = color[2]; // Blue intensity //
}

colorData is organized as a two-dimensional array to simplify indexing. The first index selects the layer to be modified and the second selects the value within that layer. Each LED occupies three consecutive entries (red, green, blue), which is why numElementsPerDimension is three times the number of LEDs in a row. The y dimension is of higher order than x, meaning that as the second index iterates from 0 to numElementsPerLayer - 1, the array cycles through all the x values for one y coordinate before changing the y coordinate:

Figure 2.10: Internal layout of colorData. All numbers in the top-right box are indexes.

The second index is calculated according to the above layout, and the final array location is then filled with the 3 color values specified by the user.
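The indexing scheme is easy to verify away from the hardware. The sketch below reproduces the index formula from TLCSetLED for a 10-LED row and adds the inverse mapping; LEDS_PER_ROW, led_index, and index_to_led are hypothetical names used only for illustration.

```c
/* Sketch of the colorData layout used by TLCSetLED (hypothetical helpers). */

#define LEDS_PER_ROW 10                               /* 10x10x10 cube */
#define ELEMENTS_PER_DIMENSION (LEDS_PER_ROW * 3)     /* R, G, B per LED */

/* Index of the red entry for LED (x, y) within one layer of colorData. */
unsigned short led_index(unsigned char x, unsigned char y)
{
    return (unsigned short)(y * ELEMENTS_PER_DIMENSION + x * 3);
}

/* Inverse mapping: recover (x, y) and the channel (0 = R, 1 = G, 2 = B). */
void index_to_led(unsigned short index, unsigned char *x, unsigned char *y,
                  unsigned char *channel)
{
    *y = (unsigned char)(index / ELEMENTS_PER_DIMENSION);
    *x = (unsigned char)((index % ELEMENTS_PER_DIMENSION) / 3);
    *channel = (unsigned char)(index % 3);
}
```

For example, LED (2, 3) starts at entry 3 × 30 + 2 × 3 = 96, and the last LED, (9, 9), occupies entries 297 to 299, so one layer holds exactly numElementsPerLayer = 300 values.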

TLCInterrupt
Now we analyze the final function and the driving force behind the TLC5940 driver, TLCInterrupt. Being an interrupt handler, this function can neither take arguments nor return a value. Keep in mind that when this function is called, the Grayscale Counter Registers have just reached a count of 4095.

// Counter to track FIFO progress //
unsigned char fillFIFO = 0;


The Stellaris LM3S1968 contains a hardware first-in, first-out (FIFO) transmit queue that can hold several data frames for the SSI module, letting the module shift them out at the comparatively slow baud rate while the CPU moves on. This lets the interrupt avoid busy-waiting after each transmission, which greatly improves its time efficiency. However, the queue has a limited depth, so a variable, fillFIFO, keeps track of how many slots have been filled.

// Reset timer //
TimerIntClear(TIMER0_BASE, TIMER_CAPA_MATCH);
TimerDisable(TIMER0_BASE, TIMER_A);
TimerLoadSet(TIMER0_BASE, TIMER_A, 4095);
TimerEnable(TIMER0_BASE, TIMER_A);

This block resets the edge counter, since the hardware automatically disables it when it detects a match. The timer is reloaded with 4095 because its current value is 0.

// Reset GCRs and disable all outputs //
setHigh(BLANK_PORT, BLANK_PIN);

To prepare for the layer change, we first set BLANK high, which does two things: it resets the Grayscale Counter Registers and forces all outputs off. All outputs must be off before changing layers to avoid ghosting effects.

// Change layers //
if (currentLayer == 0)
{
    pulse(SR_CLR_PORT, SR_CLR_PIN);
    setHigh(SR_DATA_PORT, SR_DATA_PIN);
    pulse(SR_CLK_PORT, SR_CLK_PIN);
}
else
{
    setLow(SR_DATA_PORT, SR_DATA_PIN);
    pulse(SR_CLK_PORT, SR_CLK_PIN);
}

Here, the layer-select shift registers are driven to produce the correct outputs on their parallel pins. If currentLayer is 0, the registers are cleared and reseeded with an initial 1, turning on the first layer. Otherwise, the register outputs are moved over by one position by shifting in a 0 bit, which activates the next layer.

// Latch in input shift register data //
pulse(XLAT_PORT, XLAT_PIN);

// Resume new PWM cycle //
setLow(BLANK_PORT, BLANK_PIN);

Now that the layer has been changed, we can safely latch the data from the Input Shift Register into the Grayscale Data Registers. Once that is done, the PWM cycle is restarted by driving BLANK low.

// Calculate next layer //
currentLayer = (currentLayer + 1) % numLayers;

Here we calculate the index of the next layer using modular arithmetic, so that the index wraps back to 0 after the last layer. This index is needed before shifting in new data because, as shown in Figure 2.9, the algorithm requires that the interrupt transfer the next layer's data.

// Bitflood TLC (queue is 8 deep) //
for (TLCCounter = numElementsPerLayer - 1; TLCCounter >= 0; TLCCounter--)
{
    SSIDataPut(SSI0_BASE, colorData[currentLayer][TLCCounter]);
    fillFIFO++;
    if (fillFIFO == 8)
    {
        fillFIFO = 0;
        while (SSIBusy(SSI0_BASE));   // wait for the transmit FIFO to drain //
    }
}

The final block of the TLCInterrupt function, this for loop initiates and completes the actual transfer of data to the TLC5940 through the SSI module. Note that the loop begins at the last index, numElementsPerLayer - 1, and counts down to 0 instead of counting up. This is because of the nature of shifting: as bits are pushed further down the chain, whatever was shifted first ends up last in the chain. Shifting the values in reverse order therefore leaves them in the correct layout in the Input Shift Register. On each iteration the SSI module is handed a new frame; if the module is still busy, the frame waits in the transmit queue until the module is ready. Although each colorData element is 16 bits wide, only its lower 12 bits are transferred because of the 12-bit frame configuration. The hardware queue holds only 8 frames at once, so we track the occupied slots with fillFIFO and stall until the module drains whenever all 8 are full.
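The reverse-order transfer can be checked with a small model of the daisy chain. This is a hypothetical host-side sketch, not the driver: it assumes that chain position 0 is the channel closest to the microcontroller's SIN connection and that colorData index p is meant for position p, which is the layout the reversed loop is designed to produce. CHAIN_LEN = 300 matches numElementsPerLayer for our cube.

```c
/* Hypothetical model of the TLC5940 input shift register chain. */
#include <string.h>

#define CHAIN_LEN 300   /* assumed: numElementsPerLayer for a 10x10 layer */

unsigned short chain[CHAIN_LEN];   /* chain[0] = position nearest SIN */

/* Shift one 12-bit word into the input end; existing words move down. */
void shift_word(unsigned short word)
{
    memmove(&chain[1], &chain[0], (CHAIN_LEN - 1) * sizeof chain[0]);
    chain[0] = (unsigned short)(word & 0x0FFFu);   /* 12-bit frames */
}

/* Send one layer the way TLCInterrupt does: last element first. */
void send_layer_reversed(const unsigned short *layer, int n)
{
    for (int i = n - 1; i >= 0; i--)
        shift_word(layer[i]);
}
```

After send_layer_reversed, chain[p] holds layer[p] for every p; sending in ascending order instead would leave the whole layer mirrored along the chain.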


Figure 2.11: Data flows between multiple interrupts. The order of variable and pin changes is crucial. Also note that the first layer is given undefined data during the very first refresh cycle, an acceptable side effect of this iteration scheme.

This completes the implementation of the TLC5940 driver. To use the library, the user simply includes its header, TLCLib.h, and calls the initialization function, TLCSetup, with the proper values. Any changes the user then makes using TLCSetLED will automatically be reflected on the output array. Once again, remember that this is a relatively project-specific library and will need to be modified for different applications.
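Because the layer sequencing is split between TLCSetup and TLCInterrupt, it can be easier to follow in a host-side model. The sketch below is a hypothetical simulation with no Stellaris calls: sr_bits stands in for the layer-select shift register outputs, and two integers record which layer's data sits in the input shift register and which has been latched by XLAT. It assumes our 10-layer cube.

```c
/* Hypothetical host-side model of the layer sequencing. */

#define NUM_LAYERS 10

unsigned int sr_bits;        /* bit n set: layer n is powered */
int current_layer = 0;       /* mirrors currentLayer in the driver */
int shifted_layer = -1;      /* layer whose data is in the input shift register */
int latched_layer = -1;      /* layer whose data is in the grayscale registers */

void sr_clear(void)    { sr_bits = 0; }
void sr_clock(int bit) { sr_bits = (sr_bits << 1) | (unsigned int)(bit & 1); }

/* Returns the single powered layer, or -1 if the register is not one-hot. */
int active_layer(void)
{
    for (int n = 0; n < NUM_LAYERS; n++)
        if (sr_bits == (1u << n))
            return n;
    return -1;
}

/* What TLCSetup does to the shift register. */
void model_setup(void)
{
    sr_clear();
    sr_clock(1);                       /* seed: layer 0 active */
    current_layer = 0;
    shifted_layer = latched_layer = -1;
}

/* The order of operations in TLCInterrupt. */
void model_interrupt(void)
{
    if (current_layer == 0) { sr_clear(); sr_clock(1); }   /* reseed */
    else                    { sr_clock(0); }               /* advance one layer */
    latched_layer = shifted_layer;                         /* XLAT pulse */
    current_layer = (current_layer + 1) % NUM_LAYERS;
    shifted_layer = current_layer;                         /* bitflood next layer */
}
```

Calling model_setup() followed by repeated model_interrupt() calls shows that the very first interrupt latches undefined data (latched_layer stays -1), as noted for Figure 2.11, and that from then on the powered layer always matches the latched data.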

The Final Code
TLCLib.h

#ifndef LIBTLC
#define LIBTLC

#include "hw_types.h"
#include "hw_ints.h"
#include "hw_memmap.h"
#include "interrupt.h"
#include "sysctl.h"
#include "pwm.h"
#include "timer.h"
#include "gpio.h"
#include "ssi.h"
#include "inc/lm3s1968.h"

/*
TLC5940 Connections
GSCLK -> PG2, CCP0
SCLK  -> S0CLK
SIN   -> S0Tx
BLANK -> PF7
DCPRG -> +5 V
VPRG  -> GND
XLAT  -> IDX1
10 k resistor from VCC to PF7
Current control resistor from IREF to ground
*/

/*
Shift register connections
MR  -> PG3
CP  -> U1Tx
DS1 -> U1Rx
DS2 -> +5 V
*/

// TLC5940 port definitions //
#define GSCLK_PORT   GPIO_PORTG_BASE
#define GSCLK_PIN    GPIO_PIN_2      // PG2 //
#define SCLK_PORT    GPIO_PORTA_BASE
#define SCLK_PIN     GPIO_PIN_2      // SSI0CLK //
#define BLANK_PORT   GPIO_PORTF_BASE
#define BLANK_PIN    GPIO_PIN_7      // PF7 //
#define XLAT_PORT    GPIO_PORTF_BASE
#define XLAT_PIN     GPIO_PIN_1      // IDX1 //

// Shift register port definitions //
#define SR_DATA_PORT GPIO_PORTD_BASE
#define SR_DATA_PIN  GPIO_PIN_2      // U1Rx //
#define SR_CLK_PORT  GPIO_PORTD_BASE
#define SR_CLK_PIN   GPIO_PIN_3      // U1Tx //
#define SR_CLR_PORT  GPIO_PORTG_BASE
#define SR_CLR_PIN   GPIO_PIN_3      // PG3 //

// Port functions //
#define setLow(port, pin)  (GPIOPinWrite(port, pin, 0x00))
#define setHigh(port, pin) (GPIOPinWrite(port, pin, pin))
#define pulse(port, pin) do { \
    setHigh((port), (pin)); \
    setLow((port), (pin)); \
} while (0)
#define outputState(port, pin) (GPIOPinRead(port, pin))

void TLCSetup(int num_TLC, int num_Layers);
void TLCSetLED(unsigned char layer, unsigned char x, unsigned char y, unsigned short *color);
void TLCInterrupt(void);

#endif

TLCLib.c

#include "../include/tlc.h"
#include <stdlib.h>

// Number of TLCs in chain //
unsigned short numTLC;
unsigned short numChannels;

// Number of layers in array //
unsigned short numLayers;

// Element properties //
unsigned short numElementsPerLayer;
unsigned short numElementsPerDimension;

// Main data variable //
unsigned short **colorData;

// Counters //
int TLCCounter, TLCCounter2;
int currentLayer = 0;

void TLCSetup(int num_TLC, int num_Layers)
{
    // Assign TLC number //
    numTLC = num_TLC;

    // Calculate number of channels //
    numChannels = numTLC * 16; // REMEMBER TO CHANGE THIS WHEN YOU DON'T WANT ALL 16 ON EVERY CHIP //

    // Assign layer number //
    numLayers = num_Layers;

    // Figure out number of elements //
    numElementsPerLayer = numLayers * numLayers * 3;
    numElementsPerDimension = numLayers * 3;

    // Allocate color map and initialize //
    colorData = (unsigned short **) malloc(sizeof(unsigned short *) * numLayers);
    for (TLCCounter = 0; TLCCounter < numLayers; TLCCounter++)
        colorData[TLCCounter] = (unsigned short *) malloc(sizeof(unsigned short) * numElementsPerLayer);
    for (TLCCounter = 0; TLCCounter < numLayers; TLCCounter++)
        for (TLCCounter2 = 0; TLCCounter2 < numElementsPerLayer; TLCCounter2++)
            colorData[TLCCounter][TLCCounter2] = 0;

    // Enable peripherals //
    SysCtlPeripheralEnable(SYSCTL_PERIPH_GPIOA);
    SysCtlPeripheralEnable(SYSCTL_PERIPH_GPIOB);
    SysCtlPeripheralEnable(SYSCTL_PERIPH_GPIOD);
    SysCtlPeripheralEnable(SYSCTL_PERIPH_GPIOG);
    SysCtlPeripheralEnable(SYSCTL_PERIPH_GPIOF);
    SysCtlPeripheralEnable(SYSCTL_PERIPH_PWM0);
    SysCtlPeripheralEnable(SYSCTL_PERIPH_SSI0);
    SysCtlPeripheralEnable(SYSCTL_PERIPH_TIMER0);

    // Setup shift register ports //
    GPIOPinTypeGPIOOutput(SR_DATA_PORT, SR_DATA_PIN);
    GPIOPinTypeGPIOOutput(SR_CLK_PORT, SR_CLK_PIN);
    GPIOPinTypeGPIOOutput(SR_CLR_PORT, SR_CLR_PIN);

    // Seed shift register with initial 1 //
    pulse(SR_CLR_PORT, SR_CLR_PIN);
    setHigh(SR_DATA_PORT, SR_DATA_PIN);
    pulse(SR_CLK_PORT, SR_CLK_PIN);

    // Setup SSI //
    GPIOPinTypeSSI(GPIO_PORTA_BASE, GPIO_PIN_5 | GPIO_PIN_4 | GPIO_PIN_3 | GPIO_PIN_2);
    SSIConfigSetExpClk(SSI0_BASE, SysCtlClockGet(), SSI_FRF_MOTO_MODE_0, SSI_MODE_MASTER, SysCtlClockGet() / 2, 12);

    // Setup TIMER0A in edge count mode //
    GPIOPinTypeTimer(GPIO_PORTB_BASE, GPIO_PIN_0);
    TimerConfigure(TIMER0_BASE, TIMER_CFG_SPLIT_PAIR | TIMER_CFG_A_CAP_COUNT);
    // Configure timer0 as input capture that captures positive edges //
    TimerControlEvent(TIMER0_BASE, TIMER_A, TIMER_EVENT_POS_EDGE);
    TimerControlStall(TIMER0_BASE, TIMER_A, true);
    TimerLoadSet(TIMER0_BASE, TIMER_A, 4095);             // Initialize timer to 4095, since we will be counting down //
    TimerIntEnable(TIMER0_BASE, TIMER_CAPA_MATCH);         // enable capture A interrupts //
    TimerMatchSet(TIMER0_BASE, TIMER_A, 0);                // Timer interrupts at 0 //
    TimerIntRegister(TIMER0_BASE, TIMER_A, TLCInterrupt);  // Set Timer0 interrupt function //
    IntEnable(INT_TIMER0A);

    // Setup PWM0 Generator //
    GPIOPinTypePWM(GSCLK_PORT, GSCLK_PIN);
    PWMGenConfigure(PWM0_BASE, PWM_GEN_0, PWM_GEN_MODE_DOWN);
    PWMGenPeriodSet(PWM0_BASE, PWM_GEN_0, SysCtlClockGet() / 2500000); // Calculation shown in document //
    PWMPulseWidthSet(PWM0_BASE, PWM_OUT_0, SysCtlClockGet() / 2500000 / 2);
    PWMOutputState(PWM0_BASE, PWM_OUT_0_BIT, true);

    // Setup TLC5940 //
    GPIOPinTypeGPIOOutput(BLANK_PORT, BLANK_PIN);
    GPIOPinTypeGPIOOutput(XLAT_PORT, XLAT_PIN);
    setLow(GSCLK_PORT, GSCLK_PIN);
    setLow(XLAT_PORT, XLAT_PIN);
    setHigh(BLANK_PORT, BLANK_PIN);

    // Enable SSI //
    SSIEnable(SSI0_BASE);

    // Enable Edge Counter //
    TimerEnable(TIMER0_BASE, TIMER_A);

    // Enable PWM generator //
    PWMGenEnable(PWM0_BASE, PWM_GEN_0);
}

void TLCSetLED(unsigned char layer, unsigned char x, unsigned char y, unsigned short *color)
{
    // Determine index //
    unsigned short index = (y * numElementsPerDimension) + (x * 3);

    // Fill in data //
    colorData[layer][index] = color[0];     // Red intensity //
    colorData[layer][index + 1] = color[1]; // Green intensity //
    colorData[layer][index + 2] = color[2]; // Blue intensity //
}

void TLCInterrupt(void)
{
    // Counter to track FIFO progress //
    unsigned char fillFIFO = 0;

    // Reset timer //
    TimerIntClear(TIMER0_BASE, TIMER_CAPA_MATCH);
    TimerDisable(TIMER0_BASE, TIMER_A);
    TimerLoadSet(TIMER0_BASE, TIMER_A, 4095);
    TimerEnable(TIMER0_BASE, TIMER_A);

    // Reset GCRs and disable all outputs //
    setHigh(BLANK_PORT, BLANK_PIN);

    // Change layers //
    if (currentLayer == 0)
    {
        pulse(SR_CLR_PORT, SR_CLR_PIN);
        setHigh(SR_DATA_PORT, SR_DATA_PIN);
        pulse(SR_CLK_PORT, SR_CLK_PIN);
    }
    else
    {
        setLow(SR_DATA_PORT, SR_DATA_PIN);
        pulse(SR_CLK_PORT, SR_CLK_PIN);
    }

    // Latch in input shift register data //
    pulse(XLAT_PORT, XLAT_PIN);

    // Resume new PWM cycle //
    setLow(BLANK_PORT, BLANK_PIN);

    // Calculate next layer //
    currentLayer = (currentLayer + 1) % numLayers;

    // Bitflood TLC (queue is 8 deep) //
    for (TLCCounter = numElementsPerLayer - 1; TLCCounter >= 0; TLCCounter--)
    {
        SSIDataPut(SSI0_BASE, colorData[currentLayer][TLCCounter]);
        fillFIFO++;
        if (fillFIFO == 8)
        {
            fillFIFO = 0;
            while (SSIBusy(SSI0_BASE));
        }
    }
}


The Refresh Rate
Recall that the driver specifically sets the frame rate to 60 Hz. However, we may wish to increase or decrease this rate depending on the application. With a full understanding of the TLC5940 driver implementation, let us analyze how changing the refresh rate affects the rest of the system. Suppose one wishes to drive an array that consists of $n$ layers with $m$ elements in each layer at a target refresh rate $f_{\mathrm{target}}$. Since each layer requires one grayscale PWM cycle, the generator must complete $n \, f_{\mathrm{target}}$ PWM cycles per second to achieve the desired frame rate. Each cycle contains 4096 GSCLK pulses, so the frequency of GSCLK is:

$$f_{\mathrm{GSCLK}} = 4096 \, n \, f_{\mathrm{target}} \tag{2.18}$$

The period of each GSCLK pulse is then $1/f_{\mathrm{GSCLK}}$. Recall that the edge counter throws an interrupt every 4096 pulses, so a new interrupt is generated with the following period:

$$t_{\mathrm{interrupt}} = \frac{4096}{f_{\mathrm{GSCLK}}} = \frac{1}{n \, f_{\mathrm{target}}} \tag{2.19}$$

By design, the user program is halted until the interrupt has been serviced. This creates a computational diversion, because it reduces the time the CPU can spend on the user program. The amount of CPU time available to the foreground program is thus a function of two variables: the rate at which the interrupt is called and the duration of the interrupt itself.

Figure 2.12: The interrupt essentially stalls the CPU for the user and thus takes time away from the main thread.

The most significant delay in the interrupt occurs during the SPI transfer, as large quantities of bits must be shifted at a comparatively low speed. The total number of bits to be sent per layer, $b$, is given by:

$$b = 12 \, m \tag{2.20}$$

Let $f_{\mathrm{SCLK}}$ be the maximum baud rate of the SPI module; then the time to shift $b$ bits is given by:

$$t_{\mathrm{shift}} = \frac{b}{f_{\mathrm{SCLK}}} \tag{2.21}$$

This is the absolute minimum amount of time spent in each interrupt, since we are ignoring all other instructions. Assuming that $t_{\mathrm{shift}}$ is roughly equal to the total time spent in the interrupt, the fraction of CPU time left free can then be calculated as:

$$S_{\mathrm{free}} = \frac{t_{\mathrm{interrupt}} - t_{\mathrm{shift}}}{t_{\mathrm{interrupt}}} = 1 - \frac{t_{\mathrm{shift}}}{t_{\mathrm{interrupt}}} = 1 - \frac{b \, n \, f_{\mathrm{target}}}{f_{\mathrm{SCLK}}} \tag{2.22}$$

We can now see the overall effect of changing $f_{\mathrm{target}}$. With all other variables held constant, increasing the target frame rate decreases the amount of CPU time available to the user process. As expected, decreasing the target rate has the opposite effect.

Figure 2.13: The $S_{\mathrm{free}}$ vs. $f_{\mathrm{update}}$ line for our 10x10x10 LED array.

Also note that the number of chained TLCs strongly affects this quantity. Each TLC5940 adds 16 channels of 12 bits, so adding $x$ TLC5940s grows $b$ by $16 \cdot 12 \cdot x = 192x$ bits, significantly reducing free computation time. This is why it is not feasible to chain an unlimited number of TLC5940s: eventually the time required to shift all the data will exceed the time between interrupts, at which point the user program will receive essentially no CPU time.
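Equations 2.18 to 2.22 are easy to tabulate for other configurations. The sketch below plugs in the figures quoted in this chapter (n = 10 layers, m = 300 elements per layer, 12-bit words, and the 25 MHz SPI clock configured in TLCSetup); the function names are hypothetical, and like Eq. 2.22 it ignores every instruction in the interrupt except the SPI shift.

```c
/* Sketch of Eqs. 2.19-2.22 (hypothetical helper names). */

#define BITS_PER_ELEMENT 12.0

/* Fraction of CPU time left to the user program (Eq. 2.22). */
double free_cpu_fraction(double layers, double elements_per_layer,
                         double f_target_hz, double f_sclk_hz)
{
    double b = elements_per_layer * BITS_PER_ELEMENT;   /* Eq. 2.20 */
    double t_interrupt = 1.0 / (layers * f_target_hz);  /* Eq. 2.19 */
    double t_shift = b / f_sclk_hz;                     /* Eq. 2.21 */
    return 1.0 - t_shift / t_interrupt;
}

/* Refresh rate at which the SPI transfer alone fills every interrupt period. */
double max_refresh_hz(double layers, double elements_per_layer, double f_sclk_hz)
{
    return f_sclk_hz / (elements_per_layer * BITS_PER_ELEMENT * layers);
}
```

At 60 Hz the interrupt fires every 1/600 s and about 8.6% of the CPU goes to shifting ($S_{\mathrm{free}} \approx 0.914$); the SPI transfer alone would consume every interrupt period at roughly 694 Hz, which is only a loose ceiling because the other instructions in the interrupt are ignored.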

Benchmarking with the FFT


To test the ability of the RGB LED array driven by the Stellaris to render real-time animations, we implemented a simple FFT that operates on audio samples received from a microphone. The microphone is connected to an ADC, which takes samples and fills an array with N elements, where N is a power of 2. Our implementation of the FFT is a slightly improved version of the Cooley-Tukey algorithm implemented by literateprograms [10]. Because of the recursive nature of the Cooley-Tukey algorithm (primarily the Danielson-Lanczos Lemma), our implementation does not include any techniques to unroll the recursion. While recursion slightly decreases efficiency, efficiency was not our goal here: we wished to test the limits of our components, and so chose not to optimize beyond the simple, easy-to-understand recursive algorithm.

Introduction to the Cooley-Tukey Algorithm
The Discrete Fourier Transform (DFT) transforms a set of discrete data from the time domain to the frequency domain, breaking a signal into its constituent frequencies along with their amplitudes and phase shifts. The transformation is given by:
$$X_k = \sum_{n=0}^{N-1} x_n \, e^{-\frac{2\pi i}{N} n k} \tag{2.23}$$

The first step in the development of the Cooley-Tukey algorithm is to split the DFT into the sum of its even-indexed and odd-indexed terms. After re-indexing, each of the two sums requires only N/2 terms. Here is the separated formula for the DFT:
$$X_k = \sum_{m=0}^{N/2-1} x_{2m} \, e^{-\frac{2\pi i}{N}(2m)k} + \sum_{m=0}^{N/2-1} x_{2m+1} \, e^{-\frac{2\pi i}{N}(2m+1)k} \tag{2.24}$$

which can be reduced to

$$X_k = E_k + e^{-\frac{2\pi i}{N} k} \, O_k \tag{2.25}$$

where $E_k = \sum_{m=0}^{N/2-1} x_{2m} \, e^{-\frac{2\pi i}{N/2} m k}$ is the DFT of the even-indexed terms and $O_k$ is the corresponding DFT of the odd-indexed terms. Both are periodic in $k$ with period N/2, and $e^{-\frac{2\pi i}{N}(k + N/2)} = -e^{-\frac{2\pi i}{N} k}$, so $X_{k+N/2} = E_k - e^{-\frac{2\pi i}{N} k} \, O_k$: only half of the outputs have to be computed directly. The algorithm is completed by recursively dividing the set of sample points as described above, which reduces the complexity from $O(N^2)$ to $O(N \log N)$. For a visualization of the algorithm, also called the butterfly method, see Figure 2.14.


Figure 2.14: The symmetry that the Cooley-Tukey algorithm takes advantage of. [1]

Breakdown of Code

typedef struct complex
{
    float re;
    float im;
} complex_t;

To begin with, there is a data type, complex_t, which is referenced throughout the code. It simply contains two floating-point values that hold the real and imaginary components of a complex number.

complex_t polar(float r, float theta_radians);
complex_t add(complex_t left, complex_t right);
complex_t sub(complex_t left, complex_t right);
complex_t mult(complex_t left, complex_t right);


The implementations of these functions are not discussed here, as they are direct analogs of the operations defined on the complex numbers. For the actual code, see Section 2.4 below. Note: for simplicity, $e^{ix}$ is calculated from polar form (as $\cos x + i \sin x$), which is why we wrote a polar conversion function.

FFT
void FFT(complex_t *x, unsigned short int N, complex_t *X)

complex_t *temp = (complex_t *) malloc(sizeof(complex_t) * N);
complex_t *coefficients = (complex_t *) malloc(sizeof(complex_t) * N / 2);
unsigned short int k;

for (k = 0; k < N / 2; k++)
{
    coefficients[k] = polar(1.0f, -2.0f * PI * k / N);
}

The FFT begins by allocating a temporary buffer to hold the partitioned values and a buffer to store the coefficients. One optimization over the standard algorithm is to precalculate the coefficients, and only half of them are needed because of periodicity. Since the same coefficients are used many times, precalculating them trades memory for speed.

FFT_Calculate(x, N, 1, X, temp, coefficients);

This line calls the recursive function that actually calculates the FFT. The parameters are as follows:

x: the signal data
N: the number of samples in the signal (must be a power of 2)
1: the offset step size (skip), discussed later
X: the buffer to be filled with the transformed data
temp: a temporary buffer used during partitioning, discussed later
coefficients: the coefficients of the transformation

free(temp);
free(coefficients);

These lines free the buffers to prevent memory leaks, which would otherwise eventually exhaust the heap.


FFT_Calculate
void FFT_Calculate(complex_t *x, unsigned short int N, unsigned char skip, complex_t *X, complex_t *O, complex_t *coefficients)

complex_t *E = O + N / 2;
unsigned short int k;

Two local variables are declared: E points into the second half of the scratch buffer O, so that the sub-transforms of the even-indexed and odd-indexed samples can be stored side by side, and k is used as a loop iterator.

if (N == 1)
{
    X[0] = x[0];
    return;
}

This is the base case: the recursion stops when only one element is left in the partition.

FFT_Calculate(x, N / 2, skip * 2, E, X, coefficients);
FFT_Calculate(x + skip, N / 2, skip * 2, O, X, coefficients);

The partitioning is done in this step. The signal is split into its even-indexed and odd-indexed halves and skip is doubled. Because each call reads every skip-th element of x, starting at x for the even samples and at x + skip for the odd samples, the halves never have to be copied into separate arrays; the same stride also selects every skip-th precomputed coefficient, which is exactly the set of factors needed at that level. The output buffer X is reused as scratch space by the recursive calls. The advantage of this technique is that it avoids allocating two separate arrays at every level of the recursion.

for (k = 0; k < N / 2; k++)
{
    O[k] = mult(O[k], coefficients[k * skip]);
    X[k] = add(E[k], O[k]);
    X[k + N / 2] = sub(E[k], O[k]);
}

Finally, the parts are combined. Using the symmetry of the DFT, only N/2 iterations are needed to assemble the entire sequence, since each iteration produces both X[k] and X[k + N/2].

The Final Code

#include <math.h>
#include <stdlib.h>

#ifndef PI
#define PI 3.14159265358979f   // assumed value; the listing uses PI without showing its definition //
#endif

typedef struct complex
{
    float re;
    float im;
} complex_t;

complex_t polar(float r, float theta_radians)
{
    complex_t result;
    result.re = r * cos(theta_radians);
    result.im = r * sin(theta_radians);
    return result;
}

complex_t add(complex_t left, complex_t right)
{
    complex_t result;
    result.re = left.re + right.re;
    result.im = left.im + right.im;
    return result;
}

complex_t sub(complex_t left, complex_t right)
{
    complex_t result;
    result.re = left.re - right.re;
    result.im = left.im - right.im;
    return result;
}

complex_t mult(complex_t left, complex_t right)
{
    complex_t result;
    result.re = left.re * right.re - left.im * right.im;
    result.im = left.re * right.im + left.im * right.re;
    return result;
}

void FFT_Calculate(complex_t *x, unsigned short int N, unsigned char skip,
                   complex_t *X, complex_t *O, complex_t *coefficients);

/*
 * Calculates the DFT of the signal in x, which is
 * of length N (a power of 2). The result is stored
 * in X.
 *
 * Precondition: X should already be allocated with
 * size (sizeof(complex_t) * N)
 */
void FFT(complex_t *x, unsigned short int N, complex_t *X)
{
    complex_t *temp = (complex_t *) malloc(sizeof(complex_t) * N);
    complex_t *coefficients = (complex_t *) malloc(sizeof(complex_t) * N / 2);
    unsigned short int k;

    for (k = 0; k < N / 2; k++)
    {
        coefficients[k] = polar(1.0f, -2.0f * PI * k / N);
    }

    FFT_Calculate(x, N, 1, X, temp, coefficients);

    free(temp);
    free(coefficients);
}

void FFT_Calculate(complex_t *x, unsigned short int N, unsigned char skip,
                   complex_t *X, complex_t *O, complex_t *coefficients)
{
    complex_t *E = O + N / 2;
    unsigned short int k;

    if (N == 1)
    {
        X[0] = x[0];
        return;
    }

    FFT_Calculate(x, N / 2, skip * 2, E, X, coefficients);
    FFT_Calculate(x + skip, N / 2, skip * 2, O, X, coefficients);

    for (k = 0; k < N / 2; k++)
    {
        O[k] = mult(O[k], coefficients[k * skip]);
        X[k] = add(E[k], O[k]);
        X[k + N / 2] = sub(E[k], O[k]);
    }
}


Chapter 3 Extensions
3.1 Applications

The driver for our LED array is constructed so that any individual element can be controlled. In our case, with RGB LEDs that display 36 bits of color, this means we can toggle or change the color of any LED on demand. This opens up a broad range of applications including, but not limited to, using the LED array for visualization.

Applications of the RGB LED Array


Take, for instance, the problem of efficiently visualizing population models. In a three-dimensional RGB LED array, each LED in a layer can represent a small region of a field, and the population density in that patch can be mapped to a color, for example by stepping the hue in the HSL color space or by using any other color gradient. This mapping alone requires only a two-dimensional array. The third dimension can then represent changes in the population over time, with the z-axis serving as the time domain. A system like this has the advantage of being simple as well as effective. The numbers are converted into something much easier to visualize, and the ability to show multiple instances in time, or even to change the visualization dynamically over time, highlights major aspects of the population being modeled while keeping the details intact.

The RGB LED array can also serve other visualization purposes, such as showing a fluid in an enclosed space. As with populations, the enclosure could be discretized at the resolution of the array, and the color of each LED would move along a gradient based on the local density of the fluid. A physical representation of a fluid field has an advantage over an on-screen rendering because the fluid does not have to pass through the geometry pipeline, leaving more computation time for the simulation itself. A physical visualization can also be viewed from multiple angles simultaneously.

Another application of LED arrays is in presentations. Given high enough resolution, 3D models of objects could be rendered on the array, which could be placed at the center of a conference table where all members can view it from their seats. Rendering a rotating model would be even more beneficial, since participants could see every side of the model without changing position or straining their eyes.

A final, more obvious use of an LED array is as a piece of art. The array can be programmed to cycle continuously through a sequence of appealing animations, or to use external sensors to adapt an animation to its environment; a pulse to the beat of music or a change in color based on the temperature of the room are two simple examples. Either way, the LED array can be a beautiful and elegant addition to any setting.
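The density-to-color mapping described for the population and fluid examples can be sketched in C as follows. This is an illustration under assumptions of our own: density is normalized to [0, 1], the gradient runs from a hue of 240° (blue, sparse) to 0° (red, dense), saturation and lightness are fixed at S = 1 and L = 0.5, and each channel is scaled to 12 bits. hue_to_rgb12 and density_to_rgb12 are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

#define CHANNEL_MAX 4095.0 /* 12 bits per channel (assumed scaling) */

typedef struct { uint16_t r, g, b; } Rgb12;

/* HSL -> RGB for the special case S = 1, L = 0.5 (fully saturated hues),
 * scaled to 12-bit channel values. */
Rgb12 hue_to_rgb12(double hue_deg) {
    while (hue_deg >= 360.0) hue_deg -= 360.0;
    while (hue_deg < 0.0) hue_deg += 360.0;

    double hp = hue_deg / 60.0;  /* position on the color wheel, [0, 6) */
    int sector = (int)hp;        /* which 60-degree sector */
    if (sector > 5) sector = 5;  /* guard against rounding just below 360 */
    assert(sector >= 0);
    double frac = hp - sector;
    double x = (sector % 2 == 0) ? frac : 1.0 - frac; /* middle component */

    double r = 0.0, g = 0.0, b = 0.0;
    switch (sector) {
    case 0: r = 1.0; g = x; break;   /* red -> yellow */
    case 1: r = x; g = 1.0; break;   /* yellow -> green */
    case 2: g = 1.0; b = x; break;   /* green -> cyan */
    case 3: g = x; b = 1.0; break;   /* cyan -> blue */
    case 4: r = x; b = 1.0; break;   /* blue -> magenta */
    default: r = 1.0; b = x; break;  /* magenta -> red */
    }
    Rgb12 out = {
        (uint16_t)(r * CHANNEL_MAX + 0.5),
        (uint16_t)(g * CHANNEL_MAX + 0.5),
        (uint16_t)(b * CHANNEL_MAX + 0.5)
    };
    return out;
}

/* Map a normalized density in [0, 1] to a hue from 240 degrees (blue, sparse)
 * down to 0 degrees (red, dense). */
Rgb12 density_to_rgb12(double density) {
    if (density < 0.0) density = 0.0;
    if (density > 1.0) density = 1.0;
    return hue_to_rgb12(240.0 * (1.0 - density));
}
```

With these choices, density_to_rgb12(0.0) gives pure blue (0, 0, 4095), 0.5 gives pure green, and 1.0 gives pure red. Fixing L = 0.5 keeps every color fully saturated, so only the hue carries information; any other gradient could be substituted by replacing density_to_rgb12.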

Applications of the Driver


While the LED array itself has many applications, the truly expandable part is the design of the driver. The driver provides a new approach to controlling many high-powered devices arranged in an array of arbitrary dimension, as opposed to driving a large number of low-power devices. The expandability of our driver comes from iterating through only one dimension, regardless of the dimension of the array. This is achieved by iterating through cross sections rather than individual elements. Since an iteration consists of pulsing a shift register, each iteration activates an entire cross section while keeping each of its i elements individually addressable. Thus a full refresh of the array takes exactly j iterations, one per cross section, for any k: the number of iterations is O(j) and does not depend on the number of dimensions. The only effect of increasing i by some number n is that n more PWM sinks are required. In our current design, the TLC5940s provide 16 PWM sinks per IC and can be daisy-chained. This allows a large number of PWM sinks to operate in parallel, which makes scaling i reasonable. Increasing j, on the other hand, is slightly more costly. Each increase requires another cross section to iterate through, which means one more transistor and one more shift register bit. Since transistors are easy to insert into a design and shift registers can also be chained, these changes have little effect on the overall operation. However, increasing j also lengthens the interval between refreshes of each cross section, since each cross section is powered for only 1/j of every refresh period. Depending on the current state of the design, compensating for this could involve a larger power supply or a microcontroller with a higher clock speed, but in most cases this too will be irrelevant. Therefore, by virtue of its design, the array can be expanded straightforwardly in any dimension. Nor is the driver limited to RGB LEDs: any high-power device controlled by PWM can be used.
One such device is a motor. An array of motors can be used for many things, most notably to perform many small tasks in parallel, or the same task on multiple objects. So far, all of the examples mentioned have involved multiple individual devices. The array is not limited to this, however, because it can also control multiple inputs of a single larger device.
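To make the bookkeeping concrete, here is a minimal C sketch of how a k-dimensional coordinate could be split into a cross section (the frame in which the high-side circuit powers it) and an element index within that cross section (the low-side sink that controls it). It assumes, purely for illustration, that the last axis is the dimension of iteration and that elements within a cross section are numbered in row-major order; map_coordinate and tlc5940_count are hypothetical names. The second helper applies the stated figure of 16 PWM sinks per TLC5940.

```c
#include <assert.h>
#include <stddef.h>

#define SINKS_PER_TLC5940 16 /* PWM sinks per TLC5940, as stated in the text */

typedef struct {
    size_t frame;   /* cross section, 0 .. j-1: selected by the high side */
    size_t element; /* position within the cross section, 0 .. i-1: low side */
} Address;

/* Split a k-dimensional coordinate into (frame, element).
 * Assumption: the last axis is the dimension of iteration, so j = dims[k-1]
 * and i is the product of the remaining k-1 extents. */
Address map_coordinate(const size_t *coord, const size_t *dims, size_t k) {
    assert(k >= 1);
    assert(coord[k - 1] < dims[k - 1]);
    Address a = { coord[k - 1], 0 };
    for (size_t d = 0; d + 1 < k; d++) {
        assert(coord[d] < dims[d]);
        a.element = a.element * dims[d] + coord[d]; /* row-major index */
    }
    return a;
}

/* TLC5940s needed for one cross section of i devices with `inputs` PWM inputs each. */
size_t tlc5940_count(size_t i, size_t inputs) {
    size_t sinks = i * inputs;
    return (sinks + SINKS_PER_TLC5940 - 1) / SINKS_PER_TLC5940; /* round up */
}
```

For a 10 × 10 × 10 cube (k = 3, j = 10, i = 100), the coordinate (2, 3, 7) lands in frame 7 at element 23, and tlc5940_count(100, 3) rounds 300 sinks up to 19 chips. These figures follow from the assumed layout rather than from the project's parts list; the point is that the loop over frames has j steps whatever the value of k.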


The Choice of a Dimension Independent Array


A question yet to be answered is why we made our design dimension-independent instead of restricting it to three dimensions, which would have been the simpler choice. There are a few reasons for this, the most significant being that many problems can be reduced to simpler ones by adding dimensions. Adding a dimension does not imply moving into a dimension that humans cannot perceive; another dimension simply means another way of addressing elements. In fact, driving an array of RGB LEDs can be reduced to constructing a driver with one more dimension than the array itself, in which the extra dimension has three elements controlling the red, green, and blue pins respectively. Representing multiple vector spaces is another problem that reduces to adding a dimension. Using a three-dimensional array, multiple spaces in R³ can be represented simultaneously by using a fourth dimension. Returning to the example of a fluid, a project might require modeling multiple fluids simultaneously. These fluids could be stored in different three-dimensional data structures and switched on the fly. This could of course be done at the software level, but with our driver it can also be done at the hardware level, which is faster and more efficient.
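To illustrate the point that another dimension is simply another way of addressing elements, the sketch below stores several three-dimensional fields in one flat buffer addressed as [s][z][y][x][c], where s selects which R³ space is shown and c selects the red, green, or blue channel. It is a software analogy written for this explanation, not the project's hardware mechanism; NUM_SPACES, CUBE_N, store, flat_index, and channel_value are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_SPACES 2 /* assumed number of stored 3-D fields */
#define CUBE_N 10    /* assumed edge length of the cube */
#define CHANNELS 3   /* red, green, blue */

/* One flat buffer of 12-bit channel values, addressed as [s][z][y][x][c]. */
static uint16_t store[NUM_SPACES * CUBE_N * CUBE_N * CUBE_N * CHANNELS];

/* Each added axis is one more multiply-and-add in the address calculation. */
size_t flat_index(size_t s, size_t z, size_t y, size_t x, size_t c) {
    assert(s < NUM_SPACES && z < CUBE_N && y < CUBE_N && x < CUBE_N && c < CHANNELS);
    return (((s * CUBE_N + z) * CUBE_N + y) * CUBE_N + x) * CHANNELS + c;
}

/* Read one channel of one LED from whichever field is currently selected. */
uint16_t channel_value(size_t active_space, size_t z, size_t y, size_t x, size_t c) {
    return store[flat_index(active_space, z, y, x, c)];
}
```

Switching between fluids amounts to changing active_space; the spatial and color indices are unchanged. For example, flat_index(1, 0, 0, 0, 0) is 3000, the first entry of the second field.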

3.2 Conclusion

Our major goal in approaching the problem of driving multiple high-power devices was to create a design that is both efficient at the hardware level and expandable. We conclude that these two goals were not only met, but surpassed. Traditionally, multiplexed drivers iterate through each device to create the illusion of simultaneous operation. In our design, however, we have constructed a driver that truly operates a large number of devices simultaneously while iterating through a much smaller number of cross sections. Although we trade some ease of addressability for a dramatic increase in efficiency, a simple assembly subroutine taking very few clock cycles can resolve this. As for expandability, we have not only constructed a driver that can be scaled up tremendously, but have also generalized it to an arbitrary number of dimensions. Thus our design can be applied to a variety of situations as a thorough and complete solution.

TI Analog Design Contest


Throughout the design of our K-Dimensional LED Array, we could choose from hundreds of parts serving the same purposes. Each choice had to meet various specifications, and four products manufactured by Texas Instruments met our requirements perfectly: the Stellaris LM3S1968 Microcontroller, TLC5940 LED Driver, CD54HC164E Shift Register, and CD4505B Level Shifter. The following sections explain in depth why we chose these components.


Stellaris LM3S1968 Microcontroller


The Stellaris, an integral part of our project, was chosen for many reasons, the most significant of which is its clock speed. The TLC5940, which was our choice for driving LEDs, can clock data into its registers at a rate of at most 30 MHz. Since the serial clock speed can be at most half of the core clock speed, a microcontroller with a clock speed of 60 MHz would be ideal. The Stellaris comes close with a clock speed of 50 MHz, giving a 25 MHz serial clock. This ensures that the TLC5940 receives data at nearly its peak speed while avoiding the possibility of losing data due to a baud rate too fast for it to handle. Another major reason we chose the Stellaris over other microcontrollers is that it uses an ARM Cortex-M3 processor. A processor from a company as widely used as ARM is beneficial because it is well documented and support is not an issue. Also, the ARM instruction set is used in many processors, so experience with it carries over even though the microcontroller itself is specific. While not as essential, a nice aspect of the Stellaris is that it can use the floating-point functions defined in the C standard library (implemented in software, since the Cortex-M3 has no hardware floating-point unit). This simplifies the software level and gives us functionality for future expansion. Finally, we chose the Stellaris because we will be using this board in a future class, and it would be advantageous to have a grasp of the microcontroller beforehand.
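The clock-rate reasoning for the Stellaris can be checked with a little arithmetic. The sketch assumes, as stated in the text, a 50 MHz core clock, a serial clock of at most half the core clock, and a 30 MHz limit on the TLC5940's serial clock, together with 16 channels of 12-bit grayscale data per TLC5940. The function names are hypothetical, and the chip count passed in is an illustrative input rather than a figure from our build. It models only the time to shift one cross section's grayscale data; each cross section also needs a full 4096-step grayscale PWM cycle, which is not included.

```c
#include <assert.h>

#define SYSCLK_HZ 50000000.0           /* Stellaris LM3S1968 core clock */
#define TLC5940_SCLK_MAX_HZ 30000000.0 /* TLC5940 serial clock limit */
#define GS_BITS_PER_CHANNEL 12         /* 12-bit grayscale per channel */
#define CHANNELS_PER_TLC5940 16

/* Fastest usable serial clock: half the core clock, capped by the TLC5940 limit. */
double serial_clock_hz(void) {
    double ssi = SYSCLK_HZ / 2.0;
    return ssi < TLC5940_SCLK_MAX_HZ ? ssi : TLC5940_SCLK_MAX_HZ;
}

/* Seconds needed to shift grayscale data into a chain of `chips` TLC5940s.
 * Ignores latching, software overhead, and the grayscale PWM cycle itself. */
double gs_shift_time_s(unsigned chips) {
    assert(chips > 0);
    double bits = (double)chips * CHANNELS_PER_TLC5940 * GS_BITS_PER_CHANNEL;
    return bits / serial_clock_hz();
}
```

serial_clock_hz() evaluates to 25 MHz, and shifting data into 19 chained TLC5940s (3648 bits) would take about 146 µs per cross section.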

TLC5940 LED Driver


The Stellaris may be the brains of our project, but the fact that a full refresh requires only one iteration per cross section, for any dimension and size, relies heavily on having numerous PWM sinks. Without that many PWM sinks at our disposal, the design would have to rely on the limited number of PWM pins on the Stellaris board, and all LEDs would have to share the same few PWM sinks. The TLC5940, on the other hand, allowed us to give every LED within a layer its own sinks, so that sinks are shared only between LEDs in different layers rather than across the entire array. The TLC5940 also provides useful features that we used in our project: Dot Correction, LED Open Detection, and the IREF current-setting resistor. Dot Correction enabled us to control brightness easily through software; LED Open Detection allowed us to detect whether a pin on the TLC5940 had burned out or some other hardware issue prevented a PWM sink from operating correctly; and finally, the IREF resistor let us limit the current sunk by each output, keeping the TLC5940 from sinking too much current and the LEDs from burning out.
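The current-limiting role of the IREF resistor and the effect of Dot Correction can be expressed directly. The sketch below uses the relations given in the TLC5940 datasheet: the maximum output current is 31.5 × V_IREF / R_IREF with V_IREF = 1.24 V, the 6-bit dot-correction value DC scales it by DC/63, and the 12-bit grayscale value GS sets the PWM duty cycle to GS/4095. The function names are hypothetical; verify the constants against the datasheet before choosing component values.

```c
#include <assert.h>

#define V_IREF 1.24    /* volts, TLC5940 reference voltage */
#define IREF_GAIN 31.5 /* multiplier from IREF current to each output's maximum */
#define DC_MAX 63u     /* dot correction is 6 bits */
#define GS_MAX 4095u   /* grayscale is 12 bits */

/* Maximum constant current per output (A) for a given IREF resistor (ohms). */
double imax_from_riref(double r_iref_ohms) {
    return V_IREF / r_iref_ohms * IREF_GAIN;
}

/* IREF resistor (ohms) that sets a desired maximum output current (A). */
double riref_for_imax(double imax_amps) {
    return V_IREF * IREF_GAIN / imax_amps;
}

/* Output current (A) while the channel is on, after dot correction. */
double dc_current(double imax_amps, unsigned dc) {
    assert(dc <= DC_MAX);
    return imax_amps * dc / DC_MAX;
}

/* Average current (A) once the grayscale PWM duty cycle is applied. */
double average_current(double imax_amps, unsigned dc, unsigned gs) {
    assert(gs <= GS_MAX);
    return dc_current(imax_amps, dc) * gs / GS_MAX;
}
```

For a 20 mA maximum, riref_for_imax(0.02) gives roughly 1953 Ω; a standard 2 kΩ resistor gives imax_from_riref(2000.0) ≈ 19.5 mA, and lowering DC in software reduces the current further without touching the hardware.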

CD54HC164E Shift Register


Shift registers play an important role in our design because they spare us from using software as the iteration mechanism, an option that is easier to control but requires many more clock cycles. The shift register iterates through the layers of LEDs, instead of an alternative such as a decoder driven by pins from the Stellaris. This is advantageous for two primary reasons: the propagation delay is significantly smaller, allowing faster refresh rates, and it requires only three pins regardless of the number of layers.
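A small software model makes this iteration mechanism concrete. It assumes a serial-in, parallel-out register chain in which each clock pulse moves every bit one stage along and loads the data input into the first stage, as the CD54HC164 does; shifting in a single 1 at the start of a refresh and 0s afterwards leaves exactly one output high, so each pulse selects the next cross section. ShiftChain, clock_pulse, and select_frame are hypothetical names, and the model ignores the clear input and the circuitry between the register outputs and the MOSFETs.

```c
#include <assert.h>
#include <stdint.h>

/* Software model of a chain of serial-in, parallel-out shift registers
 * (up to 32 outputs). Bit n corresponds to output n of the chain. */
typedef struct {
    uint32_t outputs;
    unsigned length; /* number of outputs actually present, 1..32 */
} ShiftChain;

/* One clock pulse: every bit moves one stage along; `data` enters stage 0. */
void clock_pulse(ShiftChain *sr, int data) {
    assert(sr->length >= 1 && sr->length <= 32);
    uint32_t mask = (sr->length == 32) ? 0xFFFFFFFFu : ((1u << sr->length) - 1u);
    sr->outputs = ((sr->outputs << 1) | (data ? 1u : 0u)) & mask;
}

/* Advance to cross section `frame` (0 .. j-1): a 1 is clocked in at frame 0
 * and a 0 on every later frame, so the single 1 walks down the chain. */
void select_frame(ShiftChain *sr, unsigned frame) {
    clock_pulse(sr, frame == 0);
}
```

Because each CD54HC164 has eight outputs, more than eight cross sections would use chained registers; the model simply treats the chain as one longer register of up to 32 stages.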

CD4505B Level Shifter


The final TI component used in our project is the CD4505B Level Shifter. It serves the sole purpose of raising the control signals to 15 V, which our MOSFETs require to operate properly.



