unicode | Arabesque

Vim will show you the decimal, octal, and hex index of the character under the cursor if you type ga in normal mode. Keying this on an ASCII a character yields the following in the status bar:

<a>  97,  Hex 61,  Octal 141

This information can be useful, but it’s worth it to extend it to include some other relevant information, including the Unicode point and name of the character, its HTML entity name (if applicable), and any digraph entry method. This can be done by installing the characterize plugin by Tim Pope.

With this plugin installed, pressing ga over a yields a bit more information:

<a> 97, \141, U+0061 LATIN SMALL LETTER A

This really shines however when inspecting characters that are available as HTML entities, or as Vim digraphs, particularly commonly used characters like an EM DASH:

<—> 8212, U+2014 EM DASH, ^K-M, &mdash;

Or a COPYRIGHT SYMBOL:

<©> 169, \251, U+00A9 COPYRIGHT SIGN, ^KCo, ^KcO, :copyright:, &copy;

Or as one of the eyes in a look of disapproval:

<ಠ> 3232, U+0CA0 KANNADA LETTER TTHA

Note that ga shows you all the Unicode information for the character, along with any methods to type it as a digraph, and an appropriate HTML entity if applicable.

If you work with multibyte characters a lot, whether for internationalization reasons or for typographical correctness in web pages, this may be very useful to you.

Particularly when editing documents for human consumption rather than code, it’s often necessary to enter special characters into a document that can’t otherwise be produced by a single key press:

Letters with diacritical marks like ä, é, and ô — Vim refers to these as digraphs — a particular problem when using a US keyboard layout
Unicode characters like typographic dashes — or copyright symbols ©, or other symbols from the multi-byte portion of the UTF-8 character set (including foreign languages)
Literal control characters like <Tab>

Vim has a method for inserting each of these within the editor, rather than having to copy-paste them from another document. We won’t discuss Vim’s alternative multibyte input methods here, and will assume that you’re using a keyboard with a US or UK layout, and predominantly type in English — apologies to international readers, but I do not have another type of keyboard to test this out!

Some of the following assumes that you’re using Vim in a UTF-8 capable terminal, and with the encoding option in your .vimrc set to utf-8, which is highly recommended for the vast majority of editing requirements:

set encoding=utf-8

It also assumes that your font is capable of displaying all of the characters concerned; monospace fonts with workable symbol coverage include Consolas, Inconsolata, and Ubuntu Mono.

Digraphs

Vim has a special shorthand for entering characters with diacritical marks. If you need some familiar variant of a Latin alphabet character with a diacritical mark or embellishment, it’s likely you’ll be able to input it with the digraph system. It also has support for some other sometimes-needed characters like thorn Þ and eszett ß, and Cyrillic characters.

Digraph input is started in insert or command mode (but not normal mode) by pressing Ctrl-k, then two printable characters in succession; the first is often the “base” form of the letter, and the second denotes the appropriate embellishment.

Some simple examples that might occasionally be needed for English speakers to correctly type one of the language’s many “loan words”:

Ctrl-k c , -> ç
Ctrl-k e ' -> é
Ctrl-k o ^ -> ô
Ctrl-k a ! -> à
Ctrl-k u : -> ü
Ctrl-k = e -> €

This is just a small sample; Vim has support for a great many digraphs. Take a look at the relevant section of the documentation for a complete treatment of the feature. You can also type :digraphs within Vim to get a complete list of digraphs — several screenfuls of them!

Note that you can enter all of these characters using the Unicode mode discussed later in this article as well; two-character mnemonic digraphs simply happen to be easier to remember than four-digit codes.

Unicode characters

For characters not covered in the digraph set, you can also enter unicode characters by referring to their code page number. In insert or command mode (but not normal mode) this is done by typing Ctrl-v and then u, followed by the hexadecimal number. Some potentially useful examples:

Ctrl-v u 2018 -> ‘, a LEFT SINGLE QUOTATION MARK
Ctrl-v u 2019 -> ’, a RIGHT SINGLE QUOTATION MARK
Ctrl-v u 2014 -> —, an EM DASH

These are handy in some cases when writing HTML documents, as an alternative to using HTML entities like — or ©. An exhaustive summary of these characters and their codes is available on the Unicode website.

Other non-printable characters

The unicode character input method is actually a specialised case of inputting literal characters with a Ctrl-v prefix. We can input other non-printable and control characters using this prefix:

Ctrl-v <Enter> -> ^M
Ctrl-v <Tab> -> ^I

This is sometimes handy when conforming to someone else’s tab style, and can also be handy when searching for characters literally in searches.

Systems, Tools, and Terminal Science

Tag Archives: unicode

Vim character info

Special characters in Vim

Digraphs

Unicode characters

Other non-printable characters