Show UTF-8 input as UTF-8 output

.. by doing the stupid "convert to unicode value and back" model.

This actually populates the 'struct video' array with the unicode
values, so UTF8 input actually shows correctly.  In particular, the nice
test-file (UTF-8-demo.txt) shows up not as garbage, but as the UTF-8 it is.

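As a rough sketch of the "convert to unicode value" half (this is not the
code from this commit: utf8_decode() is a made-up name, and treating a
malformed sequence as a single Latin1 byte is an assumption):

```c
#include <stddef.h>

/*
 * Hypothetical helper, not taken from the commit.
 *
 * Decode one UTF-8 sequence at buf[0..len-1] into *res and return the
 * number of bytes consumed.  Anything that is not a well-formed
 * sequence is passed through as one byte, i.e. treated as Latin1
 * (that fallback is an assumption of this sketch).  Surrogates and
 * values above 0x10ffff are not rejected here.
 */
static size_t utf8_decode(const unsigned char *buf, size_t len,
			  unsigned int *res)
{
	unsigned int value, min;
	size_t need, i;
	unsigned char c;

	if (len == 0)
		return 0;		/* nothing to decode, *res untouched */

	c = buf[0];
	*res = c;			/* default: the byte is the value */
	if (c < 0x80)
		return 1;		/* plain ASCII */

	if ((c & 0xe0) == 0xc0) {
		need = 2; value = c & 0x1f; min = 0x80;
	} else if ((c & 0xf0) == 0xe0) {
		need = 3; value = c & 0x0f; min = 0x800;
	} else if ((c & 0xf8) == 0xf0) {
		need = 4; value = c & 0x07; min = 0x10000;
	} else {
		return 1;		/* stray continuation or bad lead byte */
	}

	if (len < need)
		return 1;		/* truncated sequence */

	for (i = 1; i < need; i++) {
		if ((buf[i] & 0xc0) != 0x80)
			return 1;	/* not a continuation byte */
		value = (value << 6) | (buf[i] & 0x3f);
	}

	if (value < min)
		return 1;		/* overlong encoding */

	*res = value;
	return need;
}
```

A display loop would call this repeatedly, storing each decoded value in
one screen cell and advancing by the returned byte count.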

Since the *editing* doesn't know about UTF-8, and considers it just a
stream of bytes, the end result is not actually a usable UTF-8 editor.
So don't get too excited yet: this is just a partial step to "actually
edit utf8 data".

NOTE NOTE NOTE! If the character buffer contains Latin1, we will
transform that Latin1 to unicode, and then output it as UTF8.  And we
will edit it correctly, as character-by-character data.  Also, we
still do the "UTF8 to Latin1" translation on *input*, so with this
commit we can actually continue to *edit* Latin1 text.
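
To make the Latin1 output path concrete: a Latin1 byte is already its own
unicode value, so only the "and back" half is needed.  A minimal sketch
(utf8_encode() is a hypothetical name, not the function in this commit):

```c
#include <stddef.h>

/*
 * Hypothetical helper, not taken from the commit.
 *
 * Encode the unicode value c as UTF-8 into out[], which must have room
 * for 4 bytes.  Returns the number of bytes written, or 0 if c is
 * above 0x10ffff.  Surrogate values are not rejected in this sketch.
 */
static size_t utf8_encode(unsigned int c, unsigned char *out)
{
	if (c < 0x80) {
		out[0] = (unsigned char)c;
		return 1;
	}
	if (c < 0x800) {
		out[0] = (unsigned char)(0xc0 | (c >> 6));
		out[1] = (unsigned char)(0x80 | (c & 0x3f));
		return 2;
	}
	if (c < 0x10000) {
		out[0] = (unsigned char)(0xe0 | (c >> 12));
		out[1] = (unsigned char)(0x80 | ((c >> 6) & 0x3f));
		out[2] = (unsigned char)(0x80 | (c & 0x3f));
		return 3;
	}
	if (c <= 0x10ffff) {
		out[0] = (unsigned char)(0xf0 | (c >> 18));
		out[1] = (unsigned char)(0x80 | ((c >> 12) & 0x3f));
		out[2] = (unsigned char)(0x80 | ((c >> 6) & 0x3f));
		out[3] = (unsigned char)(0x80 | (c & 0x3f));
		return 4;
	}
	return 0;
}
```

So the Latin1 byte 0xe9 (e-acute) goes out as the two bytes 0xc3 0xa9,
while anything below 0x80 goes out unchanged.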

Signed-off-by: Linus Torvalds <>
1 file changed