CSI: Visual Studio – Unable to translate Unicode character at index X to specified code page

[Originally Posted By]: http://www.hanselman.com/blog/CSIVisualStudioUnableToTranslateUnicodeCharacterAtIndexXToSpecifiedCodePage.aspx

A crazy internal error from Visual Studio

A customer emailed me a weird one. I tend to have a sense for when something is up and when an obscure thing will turn into something interesting.

The person says:

…mysteriously most of my projects refuse to build.  “The build stopped unexpectedly because of an internal failure… something about unicode… blah blah”

There are a few messages out there on the web about it — even a really old hot fix.  What’s the best way to proceed with the VS team / MS?  Is there anyone actively interested in glitches like this?

My spidey-sense is tingling. First, when something says “internal failure” it means some fundamental expectation wasn’t met. Garbage in perhaps? He says “most of my projects” which implies it’s not a specific project. There’s also the sense that this is a “suddenly things stopped working” type thing. Presumably it worked before.

I say:

“Have you checked all the source files to make sure one isn’t filled with Unicode nulls or something?”

He says no, but sends a call stack (which is always nice when it’s sent FIRST, but still):

Error    1    The build stopped unexpectedly because of an internal failure.
System.Text.EncoderFallbackException: Unable to translate Unicode character \uD97C at index 1321 to specified code page.
   at System.Text.EncoderExceptionFallbackBuffer.Fallback(Char charUnknown, Int32 index)
   at System.Text.EncoderFallbackBuffer.InternalFallback(Char ch, Char*& chars)
   at System.Text.UTF8Encoding.GetByteCount(Char* chars, Int32 count, EncoderNLS baseEncoder)
   at System.Text.UTF8Encoding.GetByteCount(String chars)
   at System.IO.BinaryWriter.Write(String value)
   at Microsoft.Build.BackEnd.NodePacketTranslator.NodePacketWriteTranslator.TranslateDictionary(Dictionary`2& dictionary, IEqualityComparer`1 comparer)
   at Microsoft.Build.Execution.BuildParameters.Microsoft.Build.BackEnd.INodePacketTranslatable.Translate(INodePacketTranslator translator)
   at Microsoft.Build.BackEnd.NodePacketTranslator.NodePacketWriteTranslator.Translate[T](T& value, NodePacketValueFactory`1 factory)
   at Microsoft.Build.BackEnd.NodeConfiguration.Translate(INodePacketTranslator translator)
   at Microsoft.Build.BackEnd.NodeProviderOutOfProcBase.NodeContext.SendData(INodePacket packet)
   ...

OK, so it doesn’t like a character. But a character in WHAT? Well, we’d assume a source file, but it’s important to remember that there are other pieces of input to a compiler, like path names, environment variables, commands passed to the compiler as switches, etc.

It says Index 1321 which seems pretty far into a string before it gets mad. I asked a few people inside and Sara Joiner says:

It looks like the only place in BuildParameters that we call TranslateDictionary is when transferring the state of the environment [variables] across the wire.

Ah, so this is splitting up name-value pairs that are the environment variables! David Kean says “ask him what his PATH looks like.” I ask and I get almost 2000 bytes of PATH! It’s a HUGE path, it looks like it may even have been duplicated and appended to itself a few times.
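If you want to check your own machine for the same kind of problem, here is a minimal sketch in Python (not what MSBuild itself does). It assumes the failure comes from a character that strict UTF-8 encoding rejects, such as an unpaired surrogate, and it reports the index of the first such character in each variable, much like the exception message does. The function name find_unencodable is made up for this example.

```python
import os


def find_unencodable(env):
    """Return (name, index, char) for the first character in each variable
    that strict UTF-8 encoding rejects, such as an unpaired surrogate."""
    problems = []
    for name, value in env.items():
        try:
            value.encode("utf-8")  # strict error handling is the default
        except UnicodeEncodeError as exc:
            # exc.start is the index of the first offending character
            problems.append((name, exc.start, value[exc.start]))
    return problems


if __name__ == "__main__":
    # Scan the current process environment and print any hits.
    for name, index, char in find_unencodable(dict(os.environ)):
        print(f"{name!r}: U+{ord(char):04X} at index {index}")
```

An empty result only means every variable encodes cleanly as UTF-8; it says nothing about whether the PATH entries themselves are sensible.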

Here’s just a bit of the PATH in question. See anything?

\;C:\PROGRA~1\DISKEE~1\DISKEE~1\;C:\Program Files (x86)\Windows Kits\8.0\Windows
Performance Toolkit\;C:\Program Files\Microsoft SQL
Server\110\Tools\Binn\;C:\Program Files\Microsoft\Web Platform
Installer\;C:\Program Files\TortoiseSVN\binVN\???p??;C:\Program
Files\TortoiseSVN\bin;C:\PHP\;C:\progra~1\NVIDIA
Corporation\PhysX\Common;C:\progra~2\Common Files\Microsoft Shared\Windows
Live;C:\progra~1\Common Files\Microsoft Shared\Windows
Live;C:\q\w32;C:\Windows\system32;C:\Windows;C:\Windows\System32\Wbem;
C:\Windows\System32\WindowsPowerShell\v1.0\;C:\progra~2\WIDCOMM\Bluetooth
Software\;C:\progra~2\WIDCOMM\Bluetooth

See those ??? marks? Those don’t feel like real question marks to me. I open the result of “SET > env.txt” as a binary file in Visual Studio and the bytes are 3Fs, which are literal ? marks.

I think the text file was converted to ANSI

This makes me think that there’s Unicode goo in the PATH that was converted to ANSI when it was piped to the file. Phrased differently, this text file isn’t reality.
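To see why a redirected text file can hide the real problem, here is a small Python sketch of a lossy conversion to an ANSI code page. It is an analogy for what likely happened, not the actual code path inside cmd.exe. The sample string reuses the corrupt fragment as it appeared in the Windows UI, and I am assuming the first unprintable character is the \uD97C from the error message; the code page cp1252 and the name to_ansi_lossy are also assumptions for illustration.

```python
# Corrupt fragment as shown in the Windows UI; the leading \ud97c is an
# assumption based on the character named in the exception message.
SAMPLE = "binVN\\\ud97c侱ᤣp䥠؉"


def to_ansi_lossy(text, codepage="cp1252"):
    """Encode text to a single-byte code page, substituting '?' (0x3F)
    for every character the code page cannot represent."""
    return text.encode(codepage, errors="replace")


if __name__ == "__main__":
    print(to_ansi_lossy(SAMPLE))  # every unrepresentable character is now 0x3F
```

Once the substitution has happened there is no way back: every unrepresentable character collapses to the same 0x3F byte, so the file cannot tell you what was originally there.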

However, elsewhere in the Windows UI his PATH variable looks different.

C:\Program Files\TortoiseSVN\binVN\�侱ᤣp䥠؉;

Sometimes corruption in the path looks like this and you might assume it’s Chinese. No, it’s corruption that’s getting interpreted as Unicode. Interestingly, the error said the naughty character was 0xD97C (rendered here as �), which implies to me that something got stripped out at some point in processing and turned into the Unicode equivalent of ‘uh…’ Regardless, it’s wrong and it needs to be removed.
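For anyone facing a similar PATH, the cleanup can be sketched roughly like this in Python. This is a hypothetical helper (clean_path is my name, not a Windows or Visual Studio API) and it makes two assumptions: an entry that fails strict UTF-8 encoding is corrupt and can be dropped, and entries that differ only by case or a trailing slash are duplicates. It only returns a cleaned string; you would still need to review the result and update the variable yourself.

```python
def clean_path(path, sep=";"):
    """Return path with corrupt and duplicate entries removed,
    preserving the order of first occurrences."""
    seen = set()
    kept = []
    for entry in path.split(sep):
        entry = entry.strip()
        if not entry:
            continue  # skip empty segments such as ';;'
        try:
            entry.encode("utf-8")  # strict: rejects unpaired surrogates
        except UnicodeEncodeError:
            continue  # assumed corrupt, drop it
        # Windows paths are case-insensitive; ignore trailing slashes too.
        key = entry.rstrip("\\/").lower()
        if key in seen:
            continue
        seen.add(key)
        kept.append(entry)
    return sep.join(kept)


if __name__ == "__main__":
    raw = "C:\\Windows;C:\\bin\\\ud97c;C:\\windows\\;;C:\\PHP\\"
    print(clean_path(raw))
```

Dropping an entry outright is the blunt option. If the corrupt entry used to point somewhere real, it is better to work out what it was and restore it by hand.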

I ask him if cleaning his PATH worked and the customer just sent me a one-line response via email…the best kind of response:

========== Build: 12 succeeded, 0 failed, 0 up-to-date, 0 skipped ==========

Yay! I hope this helps the next person who goes aGoogling for the answer and thought they were alone. Thanks to David Kean, Sara Joiner and Srinivas Nadimpalli for looking at the call stack and guessing at solutions with me!
