In computer programming, whitespace is any character or series of characters that represents horizontal or vertical space in typography. When rendered, a
whitespace character does not correspond to a visible mark, but typically does occupy an area on a page. For example, the common whitespace symbol
U+0020 SPACE (also ASCII 32) represents a blank space punctuation character in text, used as a word divider in Western scripts.
Contents
Overview
Definition and ambiguity
Unicode
Substitutes
Overview
With many keyboard layouts, a horizontal whitespace character may be entered using the spacebar.
Horizontal whitespace may also be entered on many keyboards using the Tab ↹ key, although the length
of the space may vary. Vertical whitespace is encoded in more varied ways, but the most obvious in typing
is the result of the ↵ Enter key, which creates a 'newline' code sequence in application programs. Older keyboards
might instead say Return, abbreviating the typewriter keyboard meaning 'Carriage-Return', which generated an
electromechanical return to the left stop (CR, ASCII hex 0D) and a line feed or move to the next line (LF, ASCII
hex 0A). In some applications these codes were used independently to draw text-cell-based displays on monitors or to
print on tractor-guided printers, which might also accept reverse-motion and positioning code sequences that allowed
text-based output devices to achieve more sophisticated output. Many early computer games used such codes to draw a
screen (e.g. Kingdom of Kroz), and word processing software would use them to produce printed effects such as bold,
underline, and strikeout.
[Figure: Relative widths of various spaces in Unicode]
The term "whitespace" is based on the resulting appearance on ordinary paper. Regardless of how they are coded inside an
application, whitespace characters can be processed in the same way as any other character code, and programs can take
whatever action is defined for the context in which they occur.
Unicode
The table below lists the twenty-five characters defined as whitespace ("WSpace=Y", "WS") characters in the Unicode Character Database.[1] Seventeen use
a definition of whitespace consistent with the algorithm for bidirectional writing ("Bidirectional Character Type=WS") and are known as "Bidi-WS"
characters. The remaining characters may also be used, but are not of this "Bidi" type.
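The two counts quoted above can be checked with a short Python sketch. The `WHITE_SPACE` list is an assumption of this example: it was transcribed by hand from the White_Space entries of PropList.txt rather than read from the file, and the helper name `bidi_ws` is hypothetical. Python's standard `unicodedata` module does not expose the White_Space property itself, but it does report the general category and the bidirectional class.

```python
import unicodedata

# Code points with White_Space=Yes, transcribed by hand from PropList.txt.
# This list is an assumption of the sketch; verify it against the Unicode
# version you actually target.
WHITE_SPACE = (
    list(range(0x0009, 0x000E))        # TAB, LF, VT, FF, CR
    + [0x0020, 0x0085, 0x00A0, 0x1680]
    + list(range(0x2000, 0x200B))      # EN QUAD through HAIR SPACE
    + [0x2028, 0x2029, 0x202F, 0x205F, 0x3000]
)


def bidi_ws(code_points):
    """Return the code points whose bidirectional class is WS."""
    return [cp for cp in code_points
            if unicodedata.bidirectional(chr(cp)) == "WS"]


if __name__ == "__main__":
    for cp in WHITE_SPACE:
        ch = chr(cp)
        # Control characters have no name, so fall back to a placeholder.
        print(f"U+{cp:04X}  {unicodedata.category(ch)}  "
              f"{unicodedata.bidirectional(ch):<3} "
              f"{unicodedata.name(ch, '<control>')}")
    print(len(WHITE_SPACE), "White_Space characters,",
          len(bidi_ws(WHITE_SPACE)), "of them Bidi-WS")
```

The result depends on the Unicode database bundled with the Python build in use, so the figures should be read as a check against that version rather than against the standard itself.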
Note: Depending on the browser and fonts used to view the following table, not all spaces may be displayed properly.
Unicode characters with White_Space property[a][b]

Name | Code point | Decimal | May break? | In IDN? | Script | Block | General category | Notes
--- | --- | --- | --- | --- | --- | --- | --- | ---
CHARACTER TABULATION | U+0009 | 9 | Yes | No | Common | Basic Latin | Other, control | HT, Horizontal Tab. HTML/XML named entity: &Tab;, LaTeX: '\tab'

Related Unicode characters (property White_Space=no)

Name | Code point | Decimal | May break? | In IDN? | Script | Block | General category | Notes
--- | --- | --- | --- | --- | --- | --- | --- | ---
ZERO WIDTH NON-JOINER | U+200C | 8204 | Yes | Context-dependent[7] | ? | General Punctuation | Other, Format | ZWNJ, zero-width non-joiner. When placed between two characters that would otherwise be connected, a ZWNJ causes them to be printed in their final and initial forms, respectively. HTML/XML named entity: &zwnj;
ZERO WIDTH JOINER | U+200D | 8205 | Yes | Context-dependent[8] | ? | General Punctuation | Other, Format | ZWJ, zero-width joiner. When placed between two characters that would otherwise not be connected, a ZWJ causes them to be printed in their connected forms. HTML/XML named entity: &zwj;
WORD JOINER | U+2060 | 8288 | No | No | ? | General Punctuation | Other, Format | WJ, word joiner. Similar to U+200B, but not a point at which a line may be broken. HTML/XML named entity: &NoBreak;
Substitutes
Unicode also provides some visible characters that can be used to represent whitespace:
U+237D ⍽ SHOULDERED OPEN BOX (decimal 9085; block: Miscellaneous Technical): used to indicate a NBSP
Non-space blanks
The Braille Patterns Unicode block contains U+2800 ⠀ BRAILLE PATTERN BLANK (HTML &#10240;), a Braille pattern with no dots raised.
Some fonts display the character as a fixed-width blank; however, the Unicode standard explicitly states that it does not act as a space.
Exact space
The Cambridge Z88 provided a special "exact space" (code point 160, i.e. 0xA0), invokable by the key shortcut ⌑ + SPACE[13] and displayed as
"…" by the operating system's display driver.[14][15] It was therefore also known as "dot space" in conjunction with BBC BASIC.[14][15]
Under code point 224 (0xE0), the computer also provided a special three-character-cells-wide SPACE symbol "SPC" (analogous to
Unicode's single-cell-wide U+2420).[14][15]
On-screen display
Text editors, word processors, and desktop publishing software differ in how they represent whitespace on the screen, and how they represent spaces at the
ends of lines longer than the screen or column width. In some cases, spaces are shown simply as blank space; in other cases they may be represented by an
interpunct or other symbols. Many different characters (described below) could be used to produce spaces, and non-character functions (such as margins and
tab settings) can also affect whitespace.
In addition to this general-purpose space, it is possible to encode a space of a specific width. See the table below for a complete list.
Computing applications
Programming languages
In programming language syntax, spaces are frequently used to explicitly separate tokens. In most languages multiple whitespace characters are treated the
same as a single whitespace character (outside of quoted strings); such languages are called free-form. In a few languages, including Haskell, occam, ABC,
and Python, whitespace and indentation are used for syntactical purposes. In the satirical language called Whitespace, whitespace characters are the only valid
characters for programming, while any other characters are ignored.
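The contrast between free-form spacing and significant indentation can be seen in Python itself. The following is a minimal sketch using the standard `ast` module; the helper names `same_syntax` and `parses` are hypothetical and exist only for this illustration.

```python
import ast


def same_syntax(a: str, b: str) -> bool:
    """True if two source strings parse to the same syntax tree."""
    # ast.dump omits line and column attributes by default, so only the
    # structure of the program is compared.
    return ast.dump(ast.parse(a)) == ast.dump(ast.parse(b))


def parses(source: str) -> bool:
    """True if the source is syntactically valid Python."""
    try:
        ast.parse(source)
        return True
    except SyntaxError:  # IndentationError is a subclass of SyntaxError
        return False


if __name__ == "__main__":
    # Extra spaces between tokens do not change the program.
    print(same_syntax("total=price*2", "total   =   price  *  2"))
    # Leading whitespace does: the second snippet has no indented block.
    print(parses("if ready:\n    go()\n"))
    print(parses("if ready:\ngo()\n"))
    # Indentation also decides which statements belong to a block.
    print(same_syntax("if a:\n    b()\n    c()\n", "if a:\n    b()\nc()\n"))
```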
Excessive use of whitespace, especially trailing whitespace at the end of lines, is generally considered a nuisance. However, correct use of
whitespace can make code easier to read and help group related logic.
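Trailing whitespace is usually removed mechanically. The sketch below is one possible approach, not a standard tool: the function name is hypothetical, and it assumes that only spaces and tabs count as trailing whitespace and that existing line endings should be preserved.

```python
def strip_trailing_whitespace(text: str) -> str:
    """Remove spaces and tabs at the end of each line, keeping line endings."""
    cleaned = []
    for line in text.splitlines(keepends=True):
        body = line.rstrip("\r\n")      # the line without its CR/LF ending
        ending = line[len(body):]       # "", "\n", "\r" or "\r\n"
        cleaned.append(body.rstrip(" \t") + ending)
    return "".join(cleaned)


if __name__ == "__main__":
    sample = "first  \nsecond\t\r\nthird"
    print(repr(strip_trailing_whitespace(sample)))
```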
Most languages only recognize ASCII characters as whitespace, or in some cases Unicode newlines as well, but not most of the characters listed above. The C
language defines whitespace characters to be "space, horizontal tab, new-line, vertical tab, and form-feed".[17] The HTTP network protocol requires different
types of whitespace to be used in different parts of the protocol, such as: only the space character in the status line, CRLF at the end of a line, and "linear
whitespace" in header values.[18]
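The gap between an ASCII-only definition and a Unicode-aware one is easy to observe. This sketch hard-codes the five characters from the C definition quoted above (the name `C_WHITESPACE` and the helper are hypothetical) and compares them with Python's `str.isspace` and with the `\s` class of the `re` module, with and without the `re.ASCII` flag.

```python
import re

# The five characters named in the C definition quoted above:
# space, horizontal tab, new-line, vertical tab, form-feed.
C_WHITESPACE = " \t\n\v\f"


def is_c_whitespace(ch: str) -> bool:
    """True if ch is a single character from the C whitespace set."""
    return len(ch) == 1 and ch in C_WHITESPACE


if __name__ == "__main__":
    for ch in ["\t", "\r", "\u00a0", "\u2003"]:
        print(
            f"U+{ord(ch):04X}",
            "C set:", is_c_whitespace(ch),
            "str.isspace:", ch.isspace(),
            "re \\s:", re.fullmatch(r"\s", ch) is not None,
            "re \\s (ASCII):", re.fullmatch(r"\s", ch, re.ASCII) is not None,
        )
```

U+2003 EM SPACE, for example, is whitespace for `str.isspace` and for the default `\s`, but not for the C set or for `\s` under `re.ASCII`.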
Web markup languages such as XML and HTML treat whitespace characters specially, including space characters, for programmers' convenience. One or
more space characters read by conforming display-time processors of those markup languages are collapsed to 0 or 1 space, depending on their semantic
context. For example, double (or more) spaces within text are collapsed to a single space, and spaces which appear on either side of the "=" that separates an
attribute name from its value have no effect on the interpretation of the document. Element end tags can contain trailing spaces, and empty-element tags in
XML can contain spaces before the "/>". In these languages, unnecessary whitespace increases the file size, and so may slow network transfers. On the other
hand, unnecessary whitespace can also inconspicuously mark code, in a way similar to, but less obvious than, comments in code. Such marking can help prove
an infringement of license or copyright that was committed by copying and pasting.
In XML attribute values, each whitespace character (tab, line feed, or carriage return) is replaced by a space when the document is read by a parser; for
attributes declared with a type other than CDATA, leading and trailing spaces are also removed and runs of spaces are collapsed to a single space.[19]
Whitespace in XML element content is not changed in this way by the parser, but an application receiving information from the parser may choose to apply
similar rules to element content. An XML document author can use the xml:space="preserve" attribute on an element to signal that downstream applications
should not alter whitespace in that element's content.
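The difference between attribute values and element content can be observed with Python's standard `xml.etree.ElementTree` module. This is a minimal sketch: the sample document and the helper name `read_note` are invented for illustration, and the attribute is undeclared, so a non-validating parser treats it as CDATA and replaces each tab or line break with one space without collapsing runs.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the attribute value contains a line feed and a tab,
# and the element content has leading, inner, and trailing whitespace.
DOC = '<note title="line one\n\tline two" xml:space="preserve">  keep \n this  </note>'


def read_note(xml_text: str):
    """Return (title attribute, element text) as the parser reports them."""
    elem = ET.fromstring(xml_text)
    return elem.get("title"), elem.text


if __name__ == "__main__":
    title, text = read_note(DOC)
    print(repr(title))  # line feed and tab each arrive as a space
    print(repr(text))   # element content arrives unchanged
```

ElementTree itself does nothing with xml:space; the attribute is simply passed along, and honouring it is left to whatever application consumes the tree.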
In most HTML elements, a sequence of whitespace characters is treated as a single inter-word separator, which may manifest as a single space character
when rendering text in a language that normally inserts such space between words.[20] Conforming HTML renderers are required to apply a more literal
treatment of whitespace within a few prescribed elements, such as the pre tag and any element for which CSS has been used to apply pre-like whitespace
processing. In such elements, space characters will not be "collapsed" into inter-word separators.
In both XML and HTML, the non-breaking space character, along with other non-"standard" spaces, is not treated as collapsible "whitespace", so it is not
subject to the rules above.
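The collapsing behaviour described above can be approximated in a few lines of Python. This is only a sketch of the idea, not a rendering engine: it assumes the collapsible set is space, tab, line feed, form feed, and carriage return, and the function name is hypothetical. The non-breaking space U+00A0 is deliberately left out of the set, so runs of it survive.

```python
import re

# Assumed collapsible set for this sketch: space, tab, LF, FF, CR.
# U+00A0 NO-BREAK SPACE is intentionally not included.
_COLLAPSIBLE = re.compile(r"[ \t\n\f\r]+")


def collapse_whitespace(text: str) -> str:
    """Replace each run of collapsible whitespace with a single space."""
    return _COLLAPSIBLE.sub(" ", text)


if __name__ == "__main__":
    sample = "one  \n\t two\u00a0\u00a0three"
    print(repr(collapse_whitespace(sample)))
```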
File names
Using a visible substitute for a space is similar to the practice in multiword file names written for operating systems and applications that are
confused by embedded space codes; such file names instead use an underscore (_) as a word separator, as_in_this_phrase.
Another such symbol was U+2422 ␢ BLANK SYMBOL. This was used in the early years of computer programming when writing on coding forms. Keypunch
operators immediately recognized the symbol as an "explicit space".[10] It was used in BCDIC,[10] EBCDIC,[10] and ASCII-1963.[10]
See also
Carriage return
Form feed
Indent style
Line feed
Newline
Programming style
Prosigns for Morse code
Regular expression#Character classes for the white-space character class.
Space bar
Space (punctuation)
Tab key
Trimming (computer programming)
Whitespace (programming language)
Zero-width space
References
1. "The Unicode Standard" (http://unicode.org/versions/latest/). Unicode Consortium.
2. "Character design standards – space characters" (https://web.archive.org/web/20100314135826/https://www.microsoft.com/typography/developers/fdsspec/spaces.htm). Character design standards. Microsoft. 1998–1999. Archived from the original (http://www.microsoft.com/typography/developers/fdsspec/spaces.htm) on August 23, 2000. Retrieved 2009-05-18.
3. The Unicode Standard 5.0, printed edition, p.205
4. "General Punctuation" (https://www.unicode.org/charts/PDF/U2000.pdf) (PDF). The Unicode Standard 5.1. Unicode Inc. 1991–2008.
Retrieved 2009-05-13.
5. Sargent, Murray III (2006-08-29). "Unicode Nearly Plain Text Encoding of Mathematics (Version 2)" (https://www.unicode.org/notes/tn28/tn28-2.html). Unicode Technical Note #28. Unicode Inc. pp. 19–20. Retrieved 2009-05-19.
6. Gillam, Richard (2002). Unicode Demystified: A Practical Programmer's Guide to the Encoding Standard. Addison-Wesley. ISBN 0-201-70052-2.
7. Faltstrom, P., ed. (August 2010). "Zero Width Non-Joiner" (https://tools.ietf.org/html/rfc5892#appendix-A.1). The Unicode Code Points and Internationalized Domain Names for Applications (IDNA) (https://tools.ietf.org/html/rfc5892). IETF. sec. A.1. doi:10.17487/RFC5892 (https://doi.org/10.17487%2FRFC5892). RFC 5892. Retrieved September 4, 2019.
8. Faltstrom, P., ed. (August 2010). "Zero Width Joiner" (https://tools.ietf.org/html/rfc5892#appendix-A.2). The Unicode Code Points and Internationalized Domain Names for Applications (IDNA) (https://tools.ietf.org/html/rfc5892). IETF. sec. A.2. doi:10.17487/RFC5892 (https://doi.org/10.17487%2FRFC5892). RFC 5892. Retrieved September 4, 2019.
9. "Unicode Standard Annex #44, Unicode Character Database" (http://www.unicode.org/reports/tr44/#White_Space).
10. Mackenzie, Charles E. (1980). Coded Character Sets, History and Development (https://books.google.com/books?id=6-tQAAAAMAAJ). The Systems Programming Series (1 ed.). Addison-Wesley Publishing Company, Inc. pp. 41, 47, 52, 102–103, 117, 119, 130, 132, 141, 148, 150–151, 212, 424. ISBN 978-0-201-14460-4. LCCN 77-90165 (https://lccn.loc.gov/77-90165). Retrieved 2016-05-22. [1] (https://web.archive.org/web/20160526172151/https://textfiles.meulie.net/bitsaved/Books/Mackenzie_CodedCharSets.pdf)
11. "American Standard Code for Information Interchange, ASA X3.4-1963" (http://worldpowersystems.com/archives/codes/X3.4-1963/index.html). American Standards Association (ASA). 1963-06-17. Archived (https://web.archive.org/web/20160526195837/http://worldpowersystems.com/archives/codes/X3.4-1963/index.html) from the original on 2016-05-26. Retrieved 2014-05-23.
12. Niklaus Wirth, Programming in Modula-2 (https://link.springer.com/content/pdf/bfm%3A978-3-642-83565-0%2F1.pdf)
13. "Cambridge Z88 User Guide" (https://cambridgez88.jira.com/wiki/display/UG/The+keyboard). 4.7 (4th ed.). Cambridge Computer Limited. 2016 [1987]. Basic concepts - The keyboard. Archived (https://web.archive.org/web/20161212173159/https://cambridgez88.jira.com/wiki/display/UG/The+keyboard) from the original on 2016-12-12. Retrieved 2016-12-12.
14. "Cambridge Z88 User Guide" (https://cambridgez88.jira.com/wiki/display/UG40/Appendix+D+-+Character+set). 4.0 (4th ed.). Cambridge Computer Limited. 1987. Appendix D. Archived (https://web.archive.org/web/20161212173345/https://cambridgez88.jira.com/wiki/display/UG40/Appendix+D+-+Character+set) from the original on 2016-12-12. Retrieved 2016-12-12.
15. "Cambridge Z88 User Guide" (https://cambridgez88.jira.com/wiki/display/UG/Appendix+D+-+Character+set). 4.7 (4th ed.). Cambridge Computer Limited. 2015 [1987]. Appendix D. Archived (https://web.archive.org/web/20161212173256/https://cambridgez88.jira.com/wiki/display/UG/Appendix+D+-+Character+set) from the original on 2016-12-12. Retrieved 2016-12-12.
16. Usage of the different dash types is illustrated, e.g., in The Chicago Manual of Style, §§ 6.80, 6.83–6.86
17. http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1548.pdf Section 6.4, paragraph 3
18. Fielding, R.; et al., "2.2 Basic Rules", Hypertext Transfer Protocol—HTTP/1.1, RFC 2616 (https://tools.ietf.org/html/rfc2616)
19. "3.3.3 Attribute-Value Normalization" (http://www.w3.org/TR/REC-xml/#AVNormalize). Extensible Markup Language (XML) 1.0 (Fifth
Edition). World Wide Web Consortium.
20. "9.1 Whitespace" (http://www.w3.org/TR/html4/struct/text.html#h-9.1). W3C HTML 4.01 Specification. World Wide Web Consortium.
External links
Property List of Unicode Character Database (http://unicode.org/Public/UNIDATA/PropList.txt)