I am currently messing about with splitting up a long string of data and would like to measure (very approximately) how long various parts of it take to execute, using a super simple method that will not itself distort the timing.
I am not bothered about absolute times, only relative ones.
Make start_time and end_time Doubles.
some_way_to_count_time_now = the Xojo Microseconds function
Dim start_time As Double = Microseconds
// code does its thing
Dim end_time As Double = Microseconds
// end_time is taken after start_time, so subtracting this way round gives a positive value
Dim run_time As Double = end_time - start_time
ok Norman, silly forum code mistake, thanks for the code, I am astonished I missed the ‘microseconds’ function, and there it is in Dash as plain as day, DOH!
I often use the wonderful expletive brought to the wider world through Father Ted, feck feck feck right off!!!
so feck is my go-to textual word for many a situation that requires a little more impact than ‘oh gosh!’; even my very old and most prim Mum does not appear to balk at this wonderful word.
I am sure that it, and some other adult words, may be offensive to some readers…but (yes, you guessed it) feck right off! It's the real world, get with the fun and smile at the joke.
perhaps a new topic about one's favourite not-swear word would be appropriate, having read the forum guidelines recently, which apparently only nine of the membership have actually done! I have, apparently, got a badge to prove it, jajaja
For me the Split function is actually working way faster than I thought it might (although I am using a reasonably nimble machine).
For a 1.2 MB file, splitting on the end-of-line character, I got 20,000+ array elements in just over 4 ms.
I tried to upload the text file so anyone could test, but it's not allowed for some reason.
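For anyone wanting to repeat this kind of relative measurement outside Xojo, here is a rough sketch in Python (not Xojo). The `time_call` helper and the synthetic `sample` string are made up for illustration; the sample is only a stand-in of about the same size as the file described above, not the actual file, so the numbers will differ.

```python
import time

def time_call(func, repeats=5):
    """Return the best wall-clock time in milliseconds over several runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        best = min(best, elapsed_ms)
    return best

# Synthetic stand-in: 20,000 lines of 59 characters plus a newline each,
# which comes to 1,200,000 characters.
sample = ("x" * 59 + "\n") * 20000

# Time two ways of cutting the text into lines; only the ratio is meaningful.
split_ms = time_call(lambda: sample.split("\n"))
splitlines_ms = time_call(lambda: sample.splitlines())

print(f"split:      {split_ms:.3f} ms")
print(f"splitlines: {splitlines_ms:.3f} ms")
```

Taking the best of a few runs is just one way to reduce noise from whatever else the machine is doing; for a purely relative comparison a single run of each is often enough.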
Careful with SplitB on UTF-8, as it can split multi-byte characters incorrectly (unless you want to handle all the bytes yourself).
On ASCII or other single-byte encodings it's fine.
As long as the UTF-8 data is correctly formed there should be no problem using B functions with multi-byte data. If it isn’t correctly formed you are probably screwed as the standard string functions won’t work correctly either.
Here is the section on self-synchronisation for UTF-8: “The leading bytes and the continuation bytes do not share values (continuation bytes start with 10 while single bytes start with 0 and longer lead bytes start with 11). This means a search will not accidentally find the sequence for one character starting in the middle of another character.”
We have also tested this and are using it in production with text from several different multi-byte scripts, and it has yet to fail.
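The three byte classes from that quote can be seen directly. This is a small sketch in Python (not Xojo), with a made-up helper `byte_kind` that looks only at the top bits of each byte. It assumes well-formed UTF-8 and does not try to reject byte values that never occur in valid data.

```python
def byte_kind(b):
    """Classify one byte of well-formed UTF-8 by its leading bits."""
    if b < 0x80:
        return "single"        # 0xxxxxxx: a complete one-byte character
    if b < 0xC0:
        return "continuation"  # 10xxxxxx: second or later byte of a character
    return "lead"              # 11xxxxxx: first byte of a multi-byte character

# 'a' is one byte, Greek gamma is two bytes, Katakana 'ku' is three bytes.
raw = "a\u03b3\u30af".encode("utf-8")
kinds = [byte_kind(b) for b in raw]
print(kinds)
```

Because a continuation byte can never be mistaken for a single byte or a lead byte, a byte-wise search for a whole character cannot match starting in the middle of another character.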
I’m actually puzzled now (again). From what the doc has to say about Split and SplitBytes (or SplitB), I don’t understand what the difference between all these is. Either you split into characters (or strings, on a string), or into bytes. What is the point of SplitB (or SplitBytes)?
SplitB / ReplaceB etc… are lower level functions which look for matching sequences of bytes. These functions will work with ASCII and UTF-8 multi-byte characters in most situations.
They work with multi-byte UTF-8 characters in most situations because the first (lead) byte of a multi-byte character and the continuation bytes that follow it use different ranges of values. This means you cannot get a false match that turns out to be part of one multi-byte character and part of the next multi-byte character.
Example:
Dim s As String
Dim a(-1) As String
Dim s2 As String
s = "abc" + Chr(9) + "γρεεκ" + Chr(9) + "кгышшфт" + Chr(9) + "ોવમ્કતોે્ઠ" + Chr(9) + "ะ้ฟร" + Chr(9) + "クォあsdklじゃsぢ王wq" + Chr(9) + "شقشزهذ"
a = SplitB(s, Chr(9))
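The same behaviour can be checked outside Xojo. Below is a rough Python sketch (not Xojo) that splits the raw UTF-8 bytes on a tab and compares the result with a character-level split. The variable names are made up, and only some of the sample words from the example are reused.

```python
fields = ["abc", "γρεεκ", "кгышшфт", "クォあ", "شقشزهذ"]
text = "\t".join(fields)

# Work on the raw bytes, as a byte-oriented split function would.
raw = text.encode("utf-8")
byte_split = [part.decode("utf-8") for part in raw.split(b"\t")]

# Character-level split for comparison.
char_split = text.split("\t")

print(byte_split == char_split)

# A multi-byte delimiter is also safe, because its byte sequence can only
# match at a character boundary in well-formed UTF-8.
greek = "γρεεκ".encode("utf-8")
greek_parts = [p.decode("utf-8") for p in greek.split("ρ".encode("utf-8"))]
print(greek_parts)
```

Every piece decodes cleanly, so no character was cut in half by the byte-level split.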
However, if you are splitting on an empty string to convert a string into an array of individual characters then SplitB will only work for ASCII and not UTF-8 multi-byte characters.
Example:
Dim s As String
Dim a(-1) As String
Dim b(-1) As String
Dim s2 As String
s = "abc" + Chr(9) + "γρεεκ" + Chr(9) + "кгышшфт" + Chr(9) + "ોવમ્કતોે્ઠ" + Chr(9) + "ะ้ฟร" + Chr(9) + "クォあsdklじゃsぢ王wq" + Chr(9) + "شقشزهذ"
'does not work for multi-byte text: the string is cut up byte by byte
a = SplitB(s, "")
'does work: the string is cut into characters
b = Split(s, "")
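To show why the byte-wise version falls over, here is a small Python sketch (not Xojo) that cuts one of the sample words into characters and into single bytes. The helper `decodes_alone` is made up for this illustration.

```python
word = "γρεεκ"

# Cut into characters (code points): five pieces.
chars = list(word)

# Cut into single bytes: each Greek letter here takes two bytes in UTF-8.
raw = word.encode("utf-8")
single_bytes = [raw[i:i + 1] for i in range(len(raw))]

def decodes_alone(piece):
    """Return True if this byte string is valid UTF-8 on its own."""
    try:
        piece.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

usable = [piece for piece in single_bytes if decodes_alone(piece)]
print(len(chars), len(single_bytes), len(usable))
```

None of the single-byte pieces is a valid character by itself, whereas an ASCII-only string would give the same result from both approaches.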
It is worth noting that even the standard Xojo string functions don’t handle all situations correctly, such as emojis, since Xojo string functions don’t work with user-perceived characters.
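A quick way to see the gap between code points and user-perceived characters is an emoji with a skin-tone modifier. This is a Python sketch (not Xojo); Python strings, like the character-level split above, work on code points, so the same effect shows up.

```python
# Thumbs up (U+1F44D) followed by a medium skin tone modifier (U+1F3FD).
# On screen this is normally drawn as one emoji.
thumbs = "\U0001F44D\U0001F3FD"

code_points = list(thumbs)          # splitting into "characters"
utf8_bytes = thumbs.encode("utf-8")

print(len(code_points))  # number of code points
print(len(utf8_bytes))   # number of UTF-8 bytes
```

Splitting this string into individual code points separates the hand from its skin tone, even though a reader would count it as a single character.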
Personally I would have expected .SplitB etc to split on bytes, but no matter as I don’t use them. So far I’ve not had any issues with UTF-8 such as emojis but then I don’t try to split them.
Certainly the way UTF-8 is constructed is very useful. I have a method to verify that a string is valid UTF-8, and to replace wrong bytes with the replacement character. It works fine even with characters such as OLD PERSIAN SIGN AURAMAZDAAHA (𐏊), which is F0 90 8F 8A.
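This is not the poster's method, which is not shown in the thread, but the general idea can be sketched in Python (not Xojo) using the built-in decoder. The function names `is_valid_utf8` and `repair_utf8` are made up, and U+103CA is assumed to be the code point of the Old Persian sign mentioned above.

```python
def is_valid_utf8(data):
    """Return True if the bytes decode as strict, well-formed UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

def repair_utf8(data):
    """Decode, substituting U+FFFD (the replacement character) for bad bytes."""
    return data.decode("utf-8", errors="replace")

# A four-byte character (assumed code point U+103CA) passes validation.
good = "\U000103CA".encode("utf-8")
print(is_valid_utf8(good))

# 0xFF never occurs in UTF-8, so this input is invalid and gets repaired.
bad = b"abc\xffdef"
print(is_valid_utf8(bad))
print(repair_utf8(bad))
```

Dropping the last byte of the four-byte sequence also fails validation, which is the kind of truncated input a check like this is meant to catch.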
WOW!
Thanks for all the replies. I know my case is going to be ASCII only (as valid characters), but I did forget to ensure the encoding, which I always do in serial port code but for some reason ignored here.
Everything mentioned here will be thought about and tested; many thanks to all contributors.