Half-Precision Floating Point Format

Musings of an OS Plumber: Half-Precision Floating Point Format
1/31/12 6:31 PM
Musings of an OS Plumber
My blog has moved! You should be redirected in about 4 seconds. If not, visit http://blog.fpmurphy.com and update your bookmarks.

Half precision floating point is a 16-bit binary floating-point interchange format. It was not part of the original ANSI/IEEE 754 Standard for Binary Floating-Point Arithmetic published in 1985 but is included in the current version of the standard, IEEE 754-2008 (previously known as IEEE 754r), which was published last August. See this Wikipedia article for background information.
Floating point formats defined by this standard are classified as either interchange or non-interchange. In the standard, storage formats are narrow interchange formats, i.e. the set of floating point values that can be stored by the specified binary encoding is a proper subset of the values representable in wider floating point formats such as the 32-bit float and 64-bit double. In particular, the standard defines the encodings for one binary storage floating-point format of 16 bits length and radix 2, and one decimal storage floating-point format of 32 bits length. Note that these two formats are for storage only and are not defined for in-memory arithmetic operations. The remainder of this post is about the 16-bit binary storage format, which we will refer to as half precision.

Half precision (also known as a 1.5.10 or s10e5 minifloat) was added to the standard because it is the de facto storage format for certain floating-point values frequently used in modern graphics processing units (GPUs), where minimizing memory usage and bus traffic is a major challenge and priority. It is used in several computer graphics environments including OpenGL and OpenEXR, and by hardware in MP3 decoders and NVIDIA graphics cards. This format became popular because it can store a larger range of values than an int16 without requiring the bandwidth and storage space of a float type. Typically this increased range of numbers is used to preserve more highlight and shadow detail. The smallest positive (subnormal) value is 2^-24, approximately 5.96 x 10^-8, and the largest finite value is 65504.

I only know of one C or C++ compiler which supports half precision, i.e. Sourcery G++ Lite. It uses an __fp16 type to represent half precision, with a number of limitations. Whether __fp16 becomes part of ISO C remains to be seen. The latest version of the C++ ABI also provides some support for name mangling of half-precision types. The GNU debugger appears to have some limited support.
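As a rough illustration of the 1.5.10 layout and the range discussed above, the following Python sketch decodes a 16-bit pattern by hand. The function name decode_half is hypothetical, and the sketch assumes the usual binary16 rules: an exponent bias of 15, subnormal values when the exponent field is zero, and infinities or NaNs when it is all ones.

```python
def decode_half(bits):
    """Return the value encoded by a 16-bit half-precision bit pattern."""
    sign = -1.0 if (bits >> 15) & 0x1 else 1.0
    exponent = (bits >> 10) & 0x1F   # 5 bits, bias 15
    fraction = bits & 0x3FF          # 10 bits

    if exponent == 0:
        # Zero or subnormal: no implicit leading 1, fixed exponent of -14
        return sign * (fraction / 1024.0) * 2.0 ** -14
    if exponent == 0x1F:
        # All-ones exponent encodes infinity (fraction 0) or NaN
        return sign * float("inf") if fraction == 0 else float("nan")
    # Normal number: implicit leading 1
    return sign * (1.0 + fraction / 1024.0) * 2.0 ** (exponent - 15)


smallest_subnormal = decode_half(0x0001)  # 2**-24
smallest_normal = decode_half(0x0400)     # 2**-14
largest_finite = decode_half(0x7BFF)      # 65504.0
one = decode_half(0x3C00)                 # 1.0

print(smallest_subnormal, smallest_normal, largest_finite, one)
```

Working from the three fields directly, rather than from a library call, makes it easy to see why 65504 is the upper limit: the largest usable exponent field is 30 and all ten fraction bits are set.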
Ruby supports half precision (IEEE_binary16) using the float-formats package (but only for little endian platforms according to the float-formats README). The Python struct module, which is the logical home for half-precision support, does not currently support this format. A cursory search of CPAN did not reveal any modules with support for half precision in Perl.

The following is a small C program which demonstrates how to encode and decode the half precision binary format. It is based on some code from an OGRE (Object-oriented Graphics Rendering Engine) header.
    /*
    **  This program is free software; you can redistribute it and/or modify it under
    **  the terms of the GNU Lesser General Public License, as published by the Free
    **  Software Foundation; either version 2 of the License, or (at your option) any
    **  later version.
    **
    **  IEEE 754-2008 Half-precision Floating Point Format
    **  --------------------------------------------------
    **
    **  | Field    | Last | First | Note
    **  |----------|------|-------|----------
    **  | Sign     | 15   | 15    |
    **  | Exponent | 14   | 10    | Bias = 15
    **  | Fraction | 9    | 0     |
    */

    #include <stdio.h>
    #include <inttypes.h>

    typedef uint16_t HALF;

    /* ----- prototypes ------ */
    float HALFToFloat(HALF);
    HALF floatToHALF(float);
    static uint32_t halfToFloatI(HALF);
    static HALF floatToHalfI(uint32_t);

    float
    HALFToFloat(HALF y)
    {
        union { float f; uint32_t i; } v;

        v.i = halfToFloatI(y);
        return v.f;
    }

    static uint32_t
    halfToFloatI(HALF y)
    {
        /* unsigned fields so that s << 31 is well defined */
        uint32_t s = (y >> 15) & 0x00000001;    // sign
        int      e = (y >> 10) & 0x0000001f;    // exponent
        uint32_t f = y & 0x000003ff;            // fraction

        // need to handle 7c00 INF and fc00 -INF?
        if (e == 0) {
            // need to handle +-0 case f==0 or f=0x8000?
            if (f == 0)                         // Plus or minus zero
                return s << 31;
            else {                              // Denormalized number -- renormalize it
                while (!(f & 0x00000400)) {
                    f <<= 1;
                    e -= 1;
                }
                e += 1;
                f &= ~0x00000400u;
            }
        } else if (e == 31) {
            if (f == 0)                         // Inf
                return (s << 31) | 0x7f800000;
            else                                // NaN
                return (s << 31) | 0x7f800000 | (f << 13);
        }

        e = e + (127 - 15);
        f = f << 13;

        return (s << 31) | ((uint32_t)e << 23) | f;
    }

    HALF
    floatToHALF(float i)
    {
        union { float f; uint32_t i; } v;

        v.f = i;
        return floatToHalfI(v.i);
    }

    static HALF
    floatToHalfI(uint32_t i)
    {
        uint32_t s = (i >> 16) & 0x00008000;                      // sign
        int      e = (int)((i >> 23) & 0x000000ff) - (127 - 15);  // exponent
        uint32_t f = i & 0x007fffff;                              // fraction

        // need to handle NaNs and Inf?
        if (e <= 0) {
            if (e < -10) {
                if (s)                          // handle -0.0
                    return 0x8000;
                else
                    return 0;
            }
            f = (f | 0x00800000) >> (1 - e);
            return (HALF)(s | (f >> 13));
        } else if (e == 0xff - (127 - 15)) {
            if (f == 0)                         // Inf
                return (HALF)(s | 0x7c00);
            else {                              // NAN
                f >>= 13;
                return (HALF)(s | 0x7c00 | f | (f == 0));
            }
        } else {
            if (e > 30)                         // Overflow
                return (HALF)(s | 0x7c00);
            // the low 13 fraction bits are dropped (truncation, no rounding)
            return (HALF)(s | ((uint32_t)e << 10) | (f >> 13));
        }
    }

    int
    main(void)
    {
        float f1, f2;
        HALF h;

        printf("Please enter a floating point number: ");
        if (scanf("%f", &f1) != 1)
            return 1;

        h = floatToHALF(f1);
        f2 = HALFToFloat(h);

        printf("Results are: %f %f %#x\n", f1, f2, (unsigned int)h);
        return 0;
    }
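One detail of floatToHalfI is that it shifts the low 13 fraction bits away, so values are truncated rather than rounded to the nearest half. The sketch below is a Python port of that routine written for illustration only; the name float_to_half_bits is not from the original code. It assumes Python 3.6 or later, where the struct module accepts the 'e' (binary16) format code, and uses that as a round-to-nearest reference.

```python
import struct


def float_to_half_bits(x):
    """Truncating float32 -> half conversion, mirroring floatToHalfI."""
    (i,) = struct.unpack("<I", struct.pack("<f", x))
    s = (i >> 16) & 0x8000                   # sign
    e = ((i >> 23) & 0xFF) - (127 - 15)      # exponent, rebiased for half
    f = i & 0x007FFFFF                       # fraction

    if e <= 0:
        if e < -10:
            return s                         # too small: signed zero
        f = (f | 0x00800000) >> (1 - e)      # becomes a subnormal half
        return s | (f >> 13)
    if e == 0xFF - (127 - 15):
        if f == 0:
            return s | 0x7C00                # infinity
        f >>= 13
        return s | 0x7C00 | f | int(f == 0)  # NaN, fraction kept non-zero
    if e > 30:
        return s | 0x7C00                    # overflow to infinity
    return s | (e << 10) | (f >> 13)         # low 13 bits are dropped


def struct_half_bits(x):
    """Reference conversion using struct's own binary16 support."""
    (bits,) = struct.unpack("<H", struct.pack("<e", x))
    return bits


for value in (1.0, 0.1, 0.7, 65504.0):
    print(value, hex(float_to_half_bits(value)), hex(struct_half_bits(value)))
```

For many inputs the two columns agree. They differ by one unit in the last place when the discarded bits amount to more than half a step, as happens for 0.7.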
The following example shows one way to read a single floating point number (1.0), encoded as a two-byte half precision binary string, into a float using Python.
    import struct

    def HalfToFloat(h):
        s = int((h >> 15) & 0x00000001)    # sign
        e = int((h >> 10) & 0x0000001f)    # exponent
        f = int(h & 0x000003ff)            # fraction

        if e == 0:
            if f == 0:
                return int(s << 31)
            else:
                while not (f & 0x00000400):
                    f <<= 1
                    e -= 1
                e += 1
                f &= ~0x00000400
        elif e == 31:
            if f == 0:
                return int((s << 31) | 0x7f800000)
            else:
                return int((s << 31) | 0x7f800000 | (f << 13))

        e = e + (127 - 15)
        f = f << 13

        return int((s << 31) | (e << 23) | f)

    if __name__ == "__main__":
        # a half precision binary floating point string (1.0, little endian)
        FP16 = b'\x00\x3c'

        v = struct.unpack('<H', FP16)
        x = HalfToFloat(v[0])

        # hack to coerce to float: reinterpret the 32-bit pattern as a float
        packed = struct.pack('<I', x)
        f = struct.unpack('<f', packed)

        # print the resulting floating point
        print(f[0])
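Python releases newer than this post make the manual conversion optional: starting with Python 3.6, the struct module accepts an 'e' format code for IEEE 754 binary16. The following short sketch, which assumes such an interpreter, decodes the same two-byte string directly and can serve as a cross-check for the hand-written routine.

```python
import struct

FP16 = b"\x00\x3c"                    # 1.0 in little-endian half precision

(value,) = struct.unpack("<e", FP16)  # 'e' selects the 16-bit binary format
print(value)

# Packing the value again should give back the same two bytes
round_trip = struct.pack("<e", value)
print(round_trip == FP16)
```

The explicit '<' prefix fixes the byte order, so the result does not depend on the endianness of the machine running the script.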
I hope this post provided you with some useful information on the interesting subject of the half precision floating point binary format. I suspect that within a few years most compilers and scripting languages will support it.
2 comments: Anonymous said... This is a great blog, helped a lot. thanks

June 8, 2009 4:43 AM
Aabhas said... This is awesome! I've been trolling around the internet all day searching for something like this! Wonderful! thanks!
April 23, 2010 11:05 AM
Copyright 2005-2009. All Rights Reserved.