I’ve tried a few different things from TOF
And I’ve tried converting the encoding as well, and making sure I decompose things to the right form
So far nothing works
The really weird part is that in simple tests the name of the file is actually NOT as it seems
In Finder lists it shows as
Hauptmen.frm
but when I do ls -al in Terminal it’s
-rwxr-xr-x@ 1 npalardy staff 16373 8 Oct 2020 Hauptmen?.frm
and if I just run some code to grab the parent and then list all the names, it appears not to have those special characters … unless I check the binary and then it’s
Ran into this trying to transmit binary from a language that insists strings be utf-8.
I had to go out of my way to prevent binary data from being converted. A hex dump was full of extra 0xC2 plus another wrong byte.
In the first hex dump, 0xC2 is the UTF-8 lead byte for values 0x80 to 0xBF (values 0xC0 to 0xFF get 0xC3 instead).
UTF-8 0xC281 decodes to U+0081, a non-printing control character.
The second dump is ‘latin1’ / ISO-8859-1 (8-bit binary): 0xFC = ü
I’m not sure if this gives you any more idea about where to look for the ‘helpful translation’ section.
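The two decodings described above can be checked with a few lines of Python (used here only for illustration; the thread's own code is Xojo). This is a minimal sketch that assumes the bytes are exactly the ones quoted from the hex dumps.

```python
# Bytes quoted in the thread's two hex dumps.
utf8_pair = bytes([0xC2, 0x81])
latin1_byte = bytes([0xFC])

# 0xC2 0x81 is the two-byte UTF-8 form of a single code point.
from_utf8 = utf8_pair.decode("utf-8")
print(f"U+{ord(from_utf8):04X}")    # a C1 control character, nothing visible

# In ISO-8859-1 (Latin-1) the single byte 0xFC is the letter u with diaeresis.
from_latin1 = latin1_byte.decode("latin-1")
print(f"U+{ord(from_latin1):04X}")
```

So the two dumps do not describe the same character: one is a control character, the other is ü.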
yeah this is a puzzler
the zip file when unzipped gives me those odd names
and the one thing that is in there is a VB6 project manifest file which contains a line like
Form=Hauptmenü.frm
in WindowsANSI
But so far I can’t figure out how to take that ANSI and actually get the file which IS on disk
It’s like macOS doesn’t see it, or tries to turn the ANSI-encoded string into a UTF-16 one which won’t match, and so it says “no such file”
Isn’t this a two-characters vs. single-character trick?
Like you have é alone but can also have e and ´ separated (proven by the fact I just wrote them). IIRC, the two can be combined so they display as a “single character”, and that combined pair has different code points than the all-in-one version.
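The composed vs. decomposed distinction being asked about can be sketched in Python with the standard `unicodedata` module. The example uses é as in the post; note that the combining accent is U+0301, not the standalone ´ character typed above.

```python
import unicodedata

composed = "\u00e9"       # é as one precomposed code point
decomposed = "e\u0301"    # e followed by COMBINING ACUTE ACCENT

# The two strings render the same but compare as different.
same_before = composed == decomposed

# Normalizing both to one form (NFC or NFD) makes them comparable.
nfc = unicodedata.normalize("NFC", decomposed)
nfd = unicodedata.normalize("NFD", composed)
print(same_before, nfc == composed, nfd == decomposed)
```

If the mismatch were only this, normalizing both names to the same form before comparing would be enough.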
I wish it was
The VB6 manifest has it one way - and the file system actually has it unzipped another way
Still trying to sort out why and how I can deal with this
Well this is really screwy
The manifest is encoded in WindowsAnsi
This manifest contains the name of the file
And in that the string contains
Hauptmenü.frm
and the ü is encoded as a single byte &hFC
The zip file, when uncompressed on macOS, gives me a file whose name has that single &hFC in an encoding I don’t recognize, and it’s 2 bytes: &hC2 &h81
So far I haven’t found any combination of convert encoding / compose / decompose etc. that turns that &hFC byte into &hC2 &h81
What I’ve landed on that DOES work is along these lines
f is the dir that contains the unzipped manifest & vb files
Dim value As String = ConvertEncoding("Hauptmenü.frm", Encodings.WindowsANSI)
Dim newvalue As String = value.ReplaceAll(ChrB(&hFC), ChrB(&hC2)+ChrB(&h81))
Dim fl1 As folderitem = f.child(newvalue)
Dim newUrl As String = f.URLPath + "/" + EncodeURLComponent(newvalue)
Dim fl2 As folderitem = GetFolderItem(newUrl, FolderItem.PathTypeURL)
Break
In this, fl1 will not be Nil but Exists will be False
But fl2, fetched by URL, will not be Nil AND Exists will be True
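For reference, this is roughly what the percent-encoded name in such a URL looks like, sketched in Python with `urllib.parse`. It assumes Xojo's `EncodeURLComponent` yields the same UTF-8 percent-encoding for this name, which has not been verified here, and it does not explain why the `Child` lookup fails.

```python
from urllib.parse import quote, unquote

# The on-disk name: "Hauptmen", then code point U+0081, then ".frm".
name = "Hauptmen\u0081.frm"

# quote() percent-encodes the UTF-8 bytes of any non-ASCII character.
encoded = quote(name)
print(encoded)

# Decoding the URL form gives back the original name.
round_trip = unquote(encoded)
```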
The 0xC2 0x81 pair is a ‘utf8’ escaped 0x81. In the DOSLatinUS encoding, 0x81 is ü.
Public Function utf8escape(ss as String) as String
Dim ret As String
Dim s() As String
Dim i,c As Integer
s = SplitB(ss,"")
While i <= UBound(s)
c = AscB(s(i))
If c < 128 Then
ret = ret + s(i)
Else
// Two-byte UTF-8: lead byte is 0xC0 + (c >> 6), continuation byte is 0x80 + (c And 0x3F)
ret = ret + ChrB(192 + Bitwise.ShiftRight(c,6)) + ChrB(128 + (c And 63))
End If
i = i + 1
Wend
Return ret
End Function
Dim value As String = ConvertEncoding("Hauptmenü.frm", Encodings.DOSLatinUS)
value = utf8escape(value)
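The same escaping idea can be cross-checked in Python. The function below is a hypothetical Python counterpart (the name `utf8_escape` is made up for this sketch) of standard two-byte UTF-8 encoding for byte values 0x80 to 0xFF; `cp437` is Python's name for the DOS Latin US code page.

```python
def utf8_escape(data: bytes) -> bytes:
    """Encode each byte as if it were the code point of the same value."""
    out = bytearray()
    for b in data:
        if b < 0x80:
            out.append(b)                  # ASCII passes through unchanged
        else:
            out.append(0xC0 | (b >> 6))    # lead byte: top two bits of b
            out.append(0x80 | (b & 0x3F))  # continuation byte: low six bits
    return bytes(out)

# "Hauptmenü.frm" in DOS Latin US has ü as the single byte 0x81.
dos_name = "Hauptmen\u00fc.frm".encode("cp437")
escaped = utf8_escape(dos_name)
print(escaped.hex(" "))
```

For every byte value this gives the same result as decoding the bytes as Latin-1 and re-encoding as UTF-8, which is a handy way to test the function.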
oooooo !!! this might work even better than the hacky way I have it now !
EDIT - once I figure out what the right conversion from WindowsANSI to DOSLatinUS is, this could help
it’s that part that’s tripping this whole mess up
value1 is what I start with - it is Windows ANSI as best I can tell
and the file system seems to have it in that weird DOS Latin form
so I need to figure the right conversion from one to the other
so far I’m not quite there
Dim value1 As String = DefineEncoding( chrb(&hFC) + ".frm" , Encodings.WindowsANSI)
Dim newvalue1 As String = utf8escape(value1)
Dim value2 As String = ConvertEncoding(value1, Encodings.DOSLatinUS)
Dim newvalue2 As String = utf8escape(value2)
Break
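The conversion being looked for can be sketched end to end in Python. This assumes, as the thread suggests but does not confirm, that the zip stored the name in DOS Latin US (`cp437`) and that the unzip tool turned each stored byte into the code point of the same value. The helper name `disk_name_from_manifest` is hypothetical.

```python
def disk_name_from_manifest(raw: bytes) -> str:
    """Map a Windows ANSI file name from the manifest to the name on disk."""
    text = raw.decode("cp1252")     # manifest bytes are Windows ANSI
    dos = text.encode("cp437")      # assumed encoding of the name in the zip
    return dos.decode("latin-1")    # each byte becomes the same-valued code point

manifest_name = b"Hauptmen\xfc.frm"  # ü is the single byte 0xFC
disk_name = disk_name_from_manifest(manifest_name)
on_disk_bytes = disk_name.encode("utf-8")
print(on_disk_bytes.hex(" "))
```

A character with no DOS Latin US equivalent would raise `UnicodeEncodeError` in the `encode("cp437")` step, so this only covers names the zip tool could have stored that way.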