
Typography and TV Captioning

Closed captions represent a significant technological advance, but the quality of the type design has lagged behind. Here is a state-of-the-art report

Typography, like many of the graphic arts, is a rather static medium. By its very nature, the printed page tends to limit the liveliness of type. But that isn’t so on television, where technology helps letters come alive in ways the printed page can’t reproduce. That’s especially true in one particular video art – television captioning, or subtitling programs for the hearing-impaired.

Everyone knows that TV is a tremendous source of information and entertainment, but until recently deaf persons knew this only intellectually. Unable to hear part or, often, all of the soundtrack, the hearing-impaired viewer is cut off from much of the meaningfulness of television. The impact of TV depends largely on sound, and captioning is the best way to represent that component for a deaf audience.

The concept of captioning films for deaf viewers is decades old – British producer J. Arthur Rank exhibited a film with offscreen captions in London in 1949 – but captioning did not become popular in North America until 1977, when WGBH-TV, the Public Broadcasting Service station in Boston, telecast an episode of The French Chef with captions. The captions, which looked much like the subtitles of a foreign-language movie, were usually placed at the bottom of the screen and were set in a font resembling a cross between Helvetica and Franklin Gothic. This experimental captioned show was a success and led to the establishment of the Caption Center at WGBH, North America’s first company in the business of captioned TV.

The respected work of the Caption Center – its nightly rebroadcast of the ABC evening news attracted a large following – set a real precedent for captioned programming, and soon the deaf community began to lobby for more and more captioning. But there were problems. Not only was the Caption Center unable to accommodate the demand, but also many hearing viewers complained that the captions were distracting. At that time, captions were “open,” or visible on every set. Open captions had many advantages – different fonts and colours were available, for example, and anyone with a TV set could watch them – but their “openness” was their undoing. Unwilling to alienate the mainstream audience, broadcasters widely refused to air open-captioned programs. A better way had to be found, one which could accommodate both hearing and deaf viewers with equal ease.

Technology came to the rescue in the form of “closed” captions, computer codes transmitted along with the picture which become captions only if the viewer connects a special “decoder” to his or her set. Similar to a cable converter, a decoder translates the caption codes into characters which then appear on the TV screen. Through closed captioning, a program can be enjoyed by deaf viewers with captions and by hearing viewers without.
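To make the idea concrete: the caption codes travel in an otherwise-unused line of the picture (line 21 of the vertical blanking interval), two bytes with each frame of video, each byte carrying a seven-bit character protected by an odd-parity bit. The sketch below is only a toy illustration with invented byte values, not a working decoder; it shows the sort of parity check and masking a real decoder performs before a character ever reaches the screen.

```python
# A toy illustration of the character decoding described above; byte values are invented.

def decode_byte(raw):
    """Strip the odd-parity bit from a raw caption byte and return the character,
    or None if the parity check fails or the byte is a control code rather than text."""
    data = raw & 0x7F                       # the low seven bits carry the character
    parity_bit = (raw >> 7) & 1             # the high bit makes the count of 1 bits odd
    if (bin(data).count("1") + parity_bit) % 2 != 1:
        return None                         # parity error: a real decoder discards the byte
    if data < 0x20:
        return None                         # values below 0x20 are control codes, not printable text
    return chr(data)

# Two bytes arrive with every frame of video; here we fake a short stream spelling "HELLO".
raw_stream = [0xC8, 0x45, 0x4C, 0x4C, 0x4F]
print("".join(c for c in map(decode_byte, raw_stream) if c))
```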

Closed captioning held such promise that the nonprofit National Captioning Institute (NCI) was formed in 1981 specifically to closed-caption TV programming. NCI was soon followed by a northern counterpart, the Canadian Captioning Development Agency (CCDA), which executes most Canadian captioning, as well as by a few small firms in both countries. (Caption companies do nothing but create the captions for prerecorded programs; they are not responsible for other aspects of a program’s content and production.)

All these businesses caption prerecorded programming in about the same way. Armed with a special videocassette and working from a script or transcript of the program, the captioner breaks up the dialogue and other text of the show into captions of up to about 30 words each. Then the captioner decides, based on the standards of the captioning firm where he or she works and the needs of the program, where each caption will be located on the screen, what it will say, and when it should appear and disappear. With all this information entered into a computer, a new master version of the program is created with the caption data “encoded” into a special portion of the TV signal, one that is invisible on a normally adjusted home TV set without a decoder. Captioning a one-hour program can require up to 40 hours of work.
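Each caption, in other words, is a small bundle of decisions: the text itself, where it sits on the screen, and when it comes and goes. A minimal sketch of such a record follows; the field names and the 30-word check are illustrative, not any firm’s actual file format.

```python
from dataclasses import dataclass

@dataclass
class Caption:
    """One caption as a captioner might enter it: text, screen position, and timing.
    Field names are illustrative, not any captioning firm's actual file format."""
    text: str          # the dialogue or commentary, already broken into a caption-sized piece
    row: int           # screen row the caption starts on
    indent: int        # column the caption starts at
    appear: float      # time (seconds into the program) at which the caption pops on
    disappear: float   # time at which it is erased

    def __post_init__(self):
        if len(self.text.split()) > 30:      # the rough 30-word ceiling mentioned above
            raise ValueError("caption too long; split it into two captions")
        if self.disappear <= self.appear:
            raise ValueError("a caption must disappear after it appears")

# Example: a two-second caption near the bottom left of the screen.
cap = Caption(text="I knew you had it in you.", row=14, indent=0,
              appear=61.5, disappear=63.5)
```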

While closed captioning has been a big success, it has nevertheless engendered certain typographic compromises. Here, because caption design offers clues to meaning in unique ways, type and layout aren’t just a matter of æsthetics. Unfortunately, the potential of caption design lies unexploited because of the low standards at most captioning firms.

Home decoders are what actually generate the characters that captioning viewers read. Built into each decoder is one and only one uniformly-spaced dot-matrix captioning typeface with italics, underlining, and a very few special and accented characters. There have been several generations of decoder fonts, with more-recent decoders offering a somewhat wider range of characters and a less ragged look. But because there is only one font at a captioner’s disposal, issues of typographic design in captioning have more to do with layout than font selection.

Captioning fonts, for many reasons, are far from perfect. Because of the lack of a complete set of accents, the fonts can’t do justice to languages other than English. Captions consist of light characters on a black background, quite the opposite of what readers are used to. Worse yet, captioners are virtually forced to caption in upper case because the letters j, q, y, p, and g in the captioning fonts’ lower case have no descenders. In these ways, closed-captioning contradicts a basic tenet of text design: For extended text, use dark-on-light type in upper and lower case.

Responsibility for these problems lies with the original North American design engineers, who have admitted that font quality was dictated by a desire to cut costs. If anything, the design of the captioning typeface should have been paramount, since good typography has everything to do with good captions. If caption companies are serious about their type, they will contract with typographic design firms to design future generations of captioning typefaces. A good example to follow is the British Broadcasting Corporation, whose subtitling and captioning fonts, designed by the Department of Typography at the University of Reading, are a paragon of legibility, parsimony, and suitability for the medium.

Generally speaking, TV captioning has three basic responsibilities: Speaker identification (since the deaf viewer can’t necessarily rely on voices), faithful rendition of the audio, and accurate timing. Of these, speaker IDs are the hardest to get right. Unlike subtitling in foreign films, it’s part of the captioning idiom to move captions around to denote who is speaking. Alignment of a caption is a basic way to identify a speaker. The very logical open captioning produced in the ’70s by the Caption Center helped establish the convention that centred captions suggest a speaker above the axis of symmetry, while flush-left and flush-right captions suggest speakers at screen left and right, respectively.

All well and good, but closed-captioning technology gets in the way of clear IDs. Captions can’t go just anywhere; they are limited to four lines, each up to 32 characters wide, at the top of the screen and four more at the bottom. Furthermore, captions meant for the original decoder model* – as captions typically are – can be positioned only at four-character intervals on each line. Most captions are flush left; centring is possible, but only in four-space increments. Right justification is practically impossible. With these constraints, captions alone frequently fail to make it clear who is speaking, particularly when the speaker is at screen right or part of a group.
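The arithmetic behind that approximation is simple enough to show. With 32-character rows and starting positions only every four characters, “centring” amounts to choosing the four-character tab stop nearest the true centre, so a caption can sit a character or two off-axis. The sketch below is an illustration of that constraint, not any firm’s placement routine.

```python
ROW_WIDTH = 32      # characters per caption row
TAB_STOP = 4        # captions can start only at every fourth character position

def centred_start(text):
    """Return the start column the original decoder could actually use for a
    'centred' caption, plus its offset from true centring. Illustrative only."""
    true_start = (ROW_WIDTH - len(text)) / 2           # where exact centring would put it
    snapped = round(true_start / TAB_STOP) * TAB_STOP  # nearest permissible tab stop
    snapped = max(0, min(snapped, ROW_WIDTH - len(text)))
    return snapped, snapped - true_start

for line in ["Who's there?", "I knew you had it in you."]:
    start, error = centred_start(line)
    print(f"{line!r}: starts at column {start} ({error:+.1f} characters off true centre)")
```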

Identifying a speaker involves a mix of philosophy and typography. The National Captioning Institute feels, without much evidence, that even hearing viewers cannot really tell who is speaking by voice alone; so, as Linda Carson, executive director of production at NCI, explains it, “we only show a change of speaker.” NCI captioners usually show such a change by moving successive captions to the left or right in rough relation to where the actors are situated in the frame. Such captions look like left-justified blocks moved en masse to the left or right. When the speaker is extremely ambiguous, NCI places the name of the speaker in brackets – [DIANE], for example – and usually on a different line.

NCI’s somewhat rudimentary design works well most of the time, though when captions move quickly, the practice of setting off commentary and speaker IDs with brackets just isn’t distinctive enough. The Caption Center, on the other hand, operates under a different philosophy; for them, it’s important to clarify not only that the speaker has changed, but who the new speaker is. The Caption Center is much more apt to use an explicit speaker ID, and their typography is more elegant; for example, Diane: in upper and lower case on its own line, with a colon for extra clarity.

By comparison, the Canadian Captioning Development Agency’s standards are truly bizarre. Like NCI, CCDA believes it’s necessary to show only a change of speaker, though CCDA feels there’s enough difference between a left-justified block of caption text and a centred block to do the job. But since centring occurs only in four-character increments, it is approximate at best, and many centred captions look like left-justified captions. On the whole, CCDA’s captions make it far too difficult to tell who is speaking from the caption alone, a double failure of typography and philosophy. CCDA also disregards æsthetics (not to mention the alignment conventions of all the other captioners) when it blithely sets captions shaped like a parallelogram – that is, not centred and with ragged margins on both ends. To make matters worse, CCDA’s explicit speaker IDs look like [ DIANE ]: with extra spaces inside the brackets and an unnecessary colon thereafter. CCDA doesn’t always have the good sense to set such an ID on its own line, choosing instead to run the ID and the text together on the same line. With analogues neither in print nor in other captioning, CCDA’s captions are a typographic disaster.

As if everyday speaker IDs weren’t complicated enough, offscreen speakers need their own typographic protocol. Narration is commonly captioned in italics, as are the words of a speaker who is hidden (behind a closed door, for example). Using italics for hidden speakers is a typographic custom unique to captioning. Usually, though, offscreen speakers have to be identified by name, as do sound and voice effects, part of the second responsibility of captioning. Many sounds – phone ringing, thunder, knock on door – are pertinent to the story, so they too must be captioned.

Situations like these give the captioner a chance to test the limits of closed-captioning typography. NCI captions sound effects and other commentary the same way it captions explicit speaker IDs – in capitals, between brackets. A thunderclap would come out as [THUNDER] in the NCI framework. CCDA goes one worse with [ THUNDER ] – there are those strange spaces again. The Caption Center’s conventions, on the other hand, are more sophisticated, with all commentary in lower-case italic between parentheses: ( thunder ). (The decoder forces a blank space around italic text, so the spaces inside the parentheses are unavoidable.) The Caption Center is freer in its use of commentary and more eloquent in such text, giving a hearing-impaired person an experience more nearly equivalent to that of a hearing viewer. Its sense of layout – its care to position captions accurately, to disambiguate speakers, and to notate audio effects cleverly – sets it apart from the other firms. The Caption Center’s work is by far the best in North American captioning: a model of economy, style, logic, and lucidity.
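Laid side by side, the three houses’ notation for the same speaker and the same sound effect is easy to compare. The sketch below simply reproduces the conventions described above (italics can’t be shown in plain text, so the Caption Center’s italic commentary appears here only as its parentheses and lower case); it is a summary of the styles, not any firm’s software.

```python
def speaker_id(name, style):
    """Format an explicit speaker ID in each house style described above."""
    if style == "NCI":                 # capitals between brackets, on its own line
        return f"[{name.upper()}]"
    if style == "CCDA":                # extra spaces inside the brackets, plus a colon
        return f"[ {name.upper()} ]:"
    if style == "Caption Center":      # upper and lower case with a colon, on its own line
        return f"{name.title()}:"
    raise ValueError(style)

def sound_effect(effect, style):
    """Format a sound effect or other commentary in each house style described above."""
    if style == "NCI":
        return f"[{effect.upper()}]"
    if style == "CCDA":
        return f"[ {effect.upper()} ]"
    if style == "Caption Center":      # lower-case italic between parentheses on a real decoder
        return f"( {effect.lower()} )"
    raise ValueError(style)

for firm in ("NCI", "CCDA", "Caption Center"):
    print(f"{firm:>14}: {speaker_id('Diane', firm):12} {sound_effect('thunder', firm)}")
```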

Timing, the third responsibility of captioning, is quite a dilemma. Since captions use the written word to represent the spoken, they reside in a puzzling never-never land of language. Speech is faster than reading, so some editing and timing changes have to occur to let the viewer read the caption in about the time the character takes to speak. The challenge is then to edit the speaker’s words and retain both the meaning and the flavour. Where necessary, sentences are excised and terms are rearranged; occasionally, misguided captioners delete individual words under the incorrect assumption that people read word-by-word. (Does it really make sense to abbreviate the traditional wedding vow to “for better, worse, for richer, poorer,” as NCI once did? Is CCDA’s change from “I knew you had it in you” to “I knew it was in you” worth it to save one word?)
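The arithmetic behind those decisions is roughly this: a caption has only about as long on screen as the character takes to speak the line, so the captioner weighs the time the words need to be read against the time available and trims when the first exceeds the second. The reading speed in the sketch below is an illustrative figure, not an industry standard.

```python
READING_SPEED_WPM = 160   # assumed comfortable reading speed; illustrative, not a standard

def needs_editing(caption_text, seconds_available):
    """Rough check: can this caption be read in the time the line takes to speak?"""
    words = len(caption_text.split())
    seconds_needed = words / (READING_SPEED_WPM / 60)
    return seconds_needed > seconds_available, seconds_needed

line = "for better, for worse, for richer, for poorer, in sickness and in health"
too_long, needed = needs_editing(line, seconds_available=4.0)
print(f"needs {needed:.1f} s to read comfortably; "
      f"{'edit or retime it' if too_long else 'it fits as spoken'}")
```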

Captioning problems are often multi-layered – most prominently in the case of music. The convention is to bracket phrases of music with cute characters called staff notes (♪), which resemble eighth notes in traditional music transcription. However, for copyright reasons, songs usually can’t be edited at all. Furthermore, music relies on tempo, something difficult to render given the slow transmission rate of captions and the limits of human reading comprehension. Songs, a special form of language, need their own special captions, but to date no captioner has made a special effort to represent singing. Though it means extra work, there would be real benefit in fine-tuning the points at which captions appear and disappear to suit the speed of the song.

In fact, music captioning reveals one way captioners have adapted conventions of typography to suit the medium rather than adopting a more innovative approach. NCI and CCDA both believe that musical phrases do not need end punctuation, though both firms will caption a question mark or an exclamation point if it’s essential. Only the Caption Center punctuates songs as if they were everyday sentences. Moreover, the Caption Center notates mood music quite intelligently: Jazzy background music comes out as (♪ soft jazz ♪). But strangely, the Caption Center never sets a comma at the end of a caption – not even before a quotation mark – because, as Caption Center director Mardi Loeterman puts it, “in the decoder font, the comma looks like a period. When a comma is absolutely necessary for the meaning, we use an em dash.” (By that logic, most commas could be mistaken for periods. So why not eliminate them altogether?) For no good reason, NCI and CCDA end the last caption of a song with two staff notes (♪♪), ostensibly to show that a song has ended. (Then why not start and finish each song with two staff notes for symmetry’s sake?)

Some other typographic innovations are very sensible. Consider an actor reading aloud from a book. NCI devised a clever means of notating that the text is a quotation: If the quotation spans more than one caption, all captions but the last have a quotation mark at the beginning but none at the end. The last caption in the quotation reverses the procedure, with a quotation mark at the end but not the beginning. (The Caption Center follows suit in its captioning, while CCDA senselessly surrounds every caption with quotation marks, making each caption look like a discrete quotation of its own.) The paradigm is an effective expansion of the convention, familiar in print, of writing all but the last paragraph of a long quotation with a quotation mark only at the beginning.
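As a quick illustration of that rule (NCI’s and the Caption Center’s version, not CCDA’s), the sketch below adds the marks to an already-segmented quotation; the function and its names are illustrative, not anyone’s production code.

```python
def mark_quotation(captions):
    """Apply the multi-caption quotation convention described above:
    every caption but the last opens with a quotation mark and has none at the end;
    the last caption closes with one and opens with none."""
    if len(captions) == 1:
        return [f'"{captions[0]}"']
    marked = [f'"{text}' for text in captions[:-1]]
    marked.append(f'{captions[-1]}"')
    return marked

for cap in mark_quotation(["It was the best of times,",
                           "it was the worst of times,",
                           "it was the age of wisdom."]):
    print(cap)
```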

Like the exception that proves the rule, one category of programming – commercials – often forces captioners to flout captioning customs. By convention, captions for commercials must be verbatim, but they must also avoid covering up the product, onscreen titles, copyright lines, and logos. These constraints are considered more important than standard speaker identification, so captions can go wherever they fit. Captions may be omitted only when a title and the narration are identical, although often (maddeningly) the two differ by only one word when the product and logo fill most of the screen. Then, a caption designer captions only the extra word followed by an ellipsis.

Because commercials reproduce very rapid speech, captions appear and disappear faster and can dance around from corner to corner and top to bottom in a visual antiphony. Commercials are hard to caption, but watching a well-captioned spot is a rewarding experience. A rapid-fire commercial – with captions that are long enough to be sensible but short enough to avoid covering too much of the action, timed to exactly correspond with scene changes, and set so that it’s easy to figure out who is saying what and how – is a rare example of the union of typographic art and science. It’s here that the distinctive pleasure of captioned TV is most intense, combining art, communication, and living language in a one-of-a-kind medium which, for once, makes TV and reading more than just passive activities.


* There are several models of closed-caption decoder on the market. The newer models are compatible with the older, but not vice-versa. The original decoder contained the smallest character set and offered the most primitive caption-placement capabilities. Even though more-recent decoders are more sophisticated, almost all captions are designed for the lowest common denominator of the original decoder.


Originally published in the “XLIII:I” (January/February 1989) issue of Print

Updated 2003.08.17

