Attorney Docket No. TESLP002 / P913-INUS

APPLICATION FOR UNITED STATES PATENT

VECTOR COMPUTATIONAL UNIT

By Inventor(s):
Debjit Das Sarma, San Jose, CA
Emil Talpes, San Mateo, CA
Peter Bannon, Woodside, CA

Assignee: Tesla, Inc.

VAN PELT, YI & JAMES LLP
10050 N. Foothill Blvd., Suite 200
Cupertino, CA 95014
Telephone (408) 973-2585

VECTOR COMPUTATIONAL UNIT

CROSS REFERENCE TO OTHER APPLICATIONS

[0001] This application claims priority to U.S. Provisional Patent Application No. 62/625,251 entitled VECTOR COMPUTATIONAL UNIT filed February 1, 2018, and claims priority to U.S. Provisional Patent Application No. 62/536,399 entitled ACCELERATED MATHEMATICAL ENGINE filed July 24, 2017, and is a continuation-in-part of co-pending U.S. Patent Application No. 15/710,433 entitled ACCELERATED MATHEMATICAL ENGINE filed September 20, 2017, which claims priority to U.S. Provisional Patent Application No. 62/536,399 entitled ACCELERATED MATHEMATICAL ENGINE filed July 24, 2017, all of which are incorporated herein by reference for all purposes.

BACKGROUND OF THE INVENTION

[0002] Processing for machine learning and artificial intelligence typically requires performing mathematical operations on large sets of data and often involves solving multiple convolution layers and pooling layers. Machine learning and artificial intelligence techniques typically utilize matrix operations and non-linear functions such as activation functions. Applications of machine learning include self-driving and driver-assisted automobiles. In some scenarios, computer processors are utilized to perform machine learning training and inference. Traditional computer processors are able to perform a single mathematical operation very quickly but typically can only operate on a limited amount of data simultaneously. As an alternative, graphical processing units (GPUs) may be utilized and are capable of performing the same mathematical operations but on a larger set of data in parallel.
By utilizing multiple processor cores, GPUs may perform multiple tasks in parallel and are typically capable of completing large graphics processing tasks that utilize parallelism faster than a traditional computer processor. However, neither GPUs nor traditional computer processors were originally designed for machine learning or artificial intelligence operations. Machine learning and artificial intelligence operations often rely on the repeated application of a set of specific machine learning processor operations over very large datasets. Therefore, there exists a need for a microprocessor system that supports performing machine learning and artificial intelligence specific processing operations on large datasets in parallel without the overhead of multiple processing cores for each parallel operation.