TOF : Split performance test - Xojo vs VB6

Oh, I can try that and concatenate the strings in memory.

Oh, that's true, it needs seven seconds then.

On the same PC as the PyPy 3.10 test above:

So, comparing like for like, Java was slower than both PyPy and JS.

So 42 to 53 seconds is 11 seconds for me… no, Python is slower. Man. Nice try. Just for information: Java would be done at 28:49 instead of 28:53.

Correct, that's how long Java took on the same computer the PyPy test ran on.
I guess you missed that those 11 seconds were for Java and not Python.

Which Java version was it? Because here Python is definitely slower. And which one is it?

This is the test I did in B4J (which transpiles to Java). The first one is using regex (as this is the only command available to do a split in B4J, but for fun I also wrote a ‘Split’ for a second test).

Windows 11, i7, Java 11

Dim i As Int
Dim j As Int
Dim kk As Int
Dim sTagsArray() As String
	
Dim startTime As Long = DateTime.now
Dim endTime As Long

Dim sNewString As String 'ignore
Dim myTestString As String
	
myTestString = "1230,345,456,6780,789,901,1240,346,457,5680,679,345,4560,678,789,9010,124,346,4570,568,679"
For i = 0 To 10000000
	sTagsArray = Regex.Split(",",myTestString)
	kk = sTagsArray.Length - 1
	For j = 0 To kk
		sNewString = sTagsArray(j)
	Next
Next
	
endTime = DateTime.Now
Log("Test1 (regex): " & ((endTime - startTime) / 1000) & " secs")

Results:
Debug: Test1 (regex): 4.625 secs
Build: Test1 (regex): 4.33 secs
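
For anyone who wants to see the regex-versus-plain-split difference outside B4J, here is a minimal sketch in Python. It is not the code from the B4J test. The names `TEST_STRING`, `COMMA` and `RUNS` are made up for illustration, and the run count is deliberately much smaller than the 10,000,000 used in this thread.

```python
import re
import timeit

TEST_STRING = "1230,345,456,6780,789,901,1240,346,457,5680,679,345,4560,678,789,9010,124,346,4570,568,679"
COMMA = re.compile(",")  # compiled once, outside the timed code

# Hypothetical run count, far below the thread's 10,000,000 iterations.
RUNS = 20_000

# Both approaches must yield the same tokens before timing them makes sense.
same_tokens = COMMA.split(TEST_STRING) == TEST_STRING.split(",")

regex_secs = timeit.timeit(lambda: COMMA.split(TEST_STRING), number=RUNS)
plain_secs = timeit.timeit(lambda: TEST_STRING.split(","), number=RUNS)

print(f"same tokens: {same_tokens}")
print(f"regex split: {regex_secs:.3f} s, plain split: {plain_secs:.3f} s")
```

The absolute numbers will differ per machine and interpreter, so only the ratio between the two lines is worth looking at.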

For the second test, with a handwritten ‘Split’ method:

Dim i As Int
Dim j As Int
Dim kk As Int
Dim sTagsList As List
	
Dim startTime As Long = DateTime.now
Dim endTime As Long
	
Dim sNewString As String 'ignore
Dim myTestString As String
	
myTestString = "1230,345,456,6780,789,901,1240,346,457,5680,679,345,4560,678,789,9010,124,346,4570,568,679"
For i = 0 To 10000000
	sTagsList = Split(",",myTestString)
	kk = sTagsList.Size - 1
	For j = 0 To kk
		sNewString = sTagsList.get(j)
	Next
Next
	
endTime = DateTime.Now
Log("Test2 (helper): " & ((endTime - startTime) / 1000) & " secs")

The handwritten Split method:

Sub Split(pattern As String, text As String) As List
	Dim values As List
	values.Initialize
	Dim startIndex, endIndex As Int
	Dim c As Char
	Dim com As Char = Asc(pattern)
       
	For endIndex = 0 To text.Length - 1
		c = text.CharAt(endIndex)		
		If c = com Then
			values.Add(text.SubString2(startIndex, endIndex))
			startIndex = endIndex + 1
		End If		
	Next
	values.Add(text.SubString2(startIndex, endIndex ))
	Return values
End Sub

Results:
Debug: Test2 (helper): 2.051 secs
Build: Test2 (helper): 2.022 secs
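
The same single-pass idea as the handwritten Split above, sketched in Python for readers who do not use B4J. `split_on_char` is a hypothetical name, and it assumes a one-character delimiter just like the B4J version.

```python
def split_on_char(delimiter, text):
    """Split `text` on a one-character delimiter with a single scan."""
    values = []
    start = 0
    for end, ch in enumerate(text):
        if ch == delimiter:
            values.append(text[start:end])  # field between the last two delimiters
            start = end + 1
    values.append(text[start:])  # trailing field after the last delimiter
    return values


tokens = split_on_char(",", "1230,345,456")
print(tokens)  # ['1230', '345', '456']
```

Like the B4J Sub, it keeps empty fields, so `"a,,b"` gives three entries.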

For me this comes as no real surprise. Java has been faster than Xojo in every test I’ve ever come across, even if it has a ‘RAD’ layer like B4X on top of it. And those values are the worst values I got when running the tests several times (not in a loop, but by starting/stopping the app). In some cases Test 2 was below 1.5 secs.

The argument will be: Java automatically detects that the work is unused and skips it. That’s why it is so fast. Hmm, with a counter in the loop I can see that Java does do the work, but the arguments will come up again and again.

I can only say: …

Small hint: I tried that on several systems with the same Python code you used. Sorry, on ALL systems it was slower. #magic system at your site.

By the way, I missed your answers. Like always.

Your Python source code on macOS:

That means 24 seconds. What do you want to tell me with that? And one thing: don't forget to print the counter afterwards, otherwise Python does not concatenate correctly… unused variable. Nice try, nice trap. Analyzing the memory usage of the running program shows exactly that. So guess what: what you said is not really true.

import time

start_time = time.time()

myTestString = "1230,345,456,6780,789,901,1240,346,457,5680,679,345,4560,678,789,9010,124,346,4570,568,679,"
i = 0
while i <= 10000000:
    sTagsArray = myTestString.split(",")
    kk = len(sTagsArray)
    j = 0
    while j < kk:
        sNewString = sTagsArray[j]
        j += 1
    i += 1

end_time = time.time()

print("total time taken: ", end_time - start_time)
print(i)
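
On the "unused work" argument: one way to take it off the table is to fold every token into a value that gets printed at the end, so no runtime can skip the split. A minimal sketch under that assumption follows. `run_benchmark` and the small iteration count are illustrative and not part of the script above.

```python
import time


def run_benchmark(text, iterations):
    """Split `text` repeatedly and return (elapsed_seconds, checksum)."""
    checksum = 0
    start = time.perf_counter()  # monotonic clock, better suited to timing than time.time()
    for _ in range(iterations):
        for token in text.split(","):
            checksum += len(token)  # consume every token so the work is observable
    return time.perf_counter() - start, checksum


sample = "1230,345,456,6780"
elapsed, checksum = run_benchmark(sample, 1000)
print(f"{elapsed:.4f} s, checksum {checksum}")
```

If the checksum matches across languages, at least everyone agrees that the same amount of splitting was done.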



That makes it simple. This code ran on the same machine as my Java tests.

I did this with much bigger strings a long time ago. Guess what: Python was never faster than Java.

Absolutely nothing magic about it. I detailed the spec above; it’s a very modest mini PC. Despite the fact that I clearly stated I ran the script using PyPy 3.10, I’m guessing you used CPython.

Sorry, I missed the bit where you were claiming there was something wrong with the i variable.

So in debug mode it is done in 4.271 seconds.

Starting the jar file results in:

So 3.540 seconds.

I would not consider PyPy, because too many of the libs we need for Python do not work with it, and many require CPython. So maybe you are on the right track, being close to C++, but without the entire space of possibilities of Python. But you are right, with PyPy I get a result of 4.65 seconds. Blazing fast.

The compiled Java version (compiled with a Java native compiler) makes it even nicer: 3.15 seconds. So I guess I have no need for PyPy. For us it is not even usable. If it is for you: bingo, you are nearly as fast as C++.

But that is NOT running Python. That is running a PyPy compilation, which means: not Python anymore. Python by itself is interpreted and made exactly for that. By the way, your Java config may somehow have too little heap space or something like that. Java was faster than PyPy here.

It is Python. Python is a language specification. Sure, CPython is the default implementation of that specification, but PyPy is another.
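
Since part of the confusion here is which implementation each of us actually ran, a small sketch that makes a script report it. Only standard-library calls are used. Printing this next to the timing is just a suggestion.

```python
import platform
import sys

# Name of the running implementation, e.g. "CPython" or "PyPy".
implementation = platform.python_implementation()

print(f"{implementation} {platform.python_version()}")
print(sys.version)  # full build string, which on PyPy also names the PyPy release
```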

PyPy compiles to machine language. It is programmed in Python. Java runs as Java. That is the difference. But hey, what are we fighting about? Both are blazing fast and that’s it. If you do not need professional GUI features or libs that are not supported by the PyPy compiler, compiling your code is the best solution. No question.

But Python code normally runs as interpreted code. Compiled, it runs perfectly as machine code. Only sometimes CPython is needed, when things are not supported. That makes a good path for Python programmers. I would not choose it for big applications, but that is the programmer's choice. In terms of speed it is in the upper class of languages. It would be interesting to see what happens with Delphi and C# code.

As far as I can see, it’s all about the Split function and memory management. In every language it would be more efficient (faster) to use pointers, a.k.a. MemoryBlocks in Xojo. Therefore this benchmark does not show anything.
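
To illustrate the pointer idea without Xojo MemoryBlocks, here is a rough Python sketch that walks a byte buffer and reports field offsets instead of allocating a new string per field. `field_spans` is a hypothetical helper. Whether this is actually faster depends on the language and runtime, and in Python it may well not be, so take it as a picture of the technique only.

```python
def field_spans(data, delimiter=b","):
    """Yield (start, end) offsets of each field without copying any bytes."""
    if not delimiter:
        raise ValueError("delimiter must not be empty")
    start = 0
    while True:
        end = data.find(delimiter, start)
        if end == -1:
            yield start, len(data)  # last field runs to the end of the buffer
            return
        yield start, end
        start = end + len(delimiter)


data = b"1230,345,456"
spans = list(field_spans(data))
print(spans)  # [(0, 4), (5, 8), (9, 12)]

# A field is only materialized when it is really needed.
view = memoryview(data)
first = bytes(view[spans[0][0]:spans[0][1]])
print(first)  # b'1230'
```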

In FreeBASIC it takes between 21 and 23 seconds (i5, Linux).

The problem in FreeBASIC: there’s no Split function. I found some inspiration for a Split function on the FreeBASIC Portal.

#include "datetime.bi"

Function Split( DELIMIT As String, TEXT As String, RET() As String) As Integer

  dim As integer DMAX=0
  dim RES() As ZString Ptr
  dim As integer I1 , I2, ini, fini
  dim As ZString Ptr p , p1 , p2, p3
  dim As integer LDelimit = Len(DELIMIT), LT= Len(TEXT)
  dim As integer Posi()

  if LT=0 Or LDelimit > LT Then
    ReDim RET(1)
    RET(0) = "0"
    DMAX=0
    Return DMAX
    Exit Function
  endIf

  p = StrPtr(TEXT)
  p1=p
  if LDelimit>0 Then
    do While *p
      I2=0
      If *p = DELIMIT[0] Then
        p3=p
        if LDelimit>1 Then
           for I1 = 1 To LDelimit-1
             I2=0
             p+=1
             If *p <> DELIMIT[I1] Then Exit For
             I2=1
           next
        else
           I2=1
        endIf
        if I2=1 Then
           If p= StrPtr(TEXT)+LT-1 Then fini=1
           DMAX+=1
           ReDim Preserve Posi(0 To DMAX-1)
           Posi(DMAX-1)=p3 - p1 +1
        endIf
      endIf
      p+=1
    loop

    if DMAX=0 And ini=0 Then

      DMAX=1
      ReDim RET(2)
      RET(0) = "1"
      RET(1) = TEXT
      Return DMAX
      Exit Function

    elseIf DMAX=0 And ini=1 Then
        
      DMAX=1
      ReDim RET(2)
      RET(0) = "1"
      RET(1) = Mid(TEXT,LDelimit+1)
      Return DMAX
      Exit Function
        
    endif

    if fini=0 Then DMAX+=1
    reDim RET(0 To DMAX)
    reDim RES(0 To DMAX-1)
    RES(0) = Allocate(Len(TEXT)+1)
    *RES(0) = TEXT

    for I1 = 0 To DMAX-2
      p2= p + Posi(I1)-1
      *p2 = 0
      RES(I1+1) = p2 + LDelimit
      RET(I1+1)=*RES(I1)
    next
    
    if fini=1 Then
        p2= p+Posi(DMAX-1)-1
        'p2= p - LDelimit + 1
        *p2 = 0
    endIf
    
    RET(DMAX)=*RES(DMAX-1)
    RET(0) = Str$(DMAX)
    Deallocate RES(0)
    
  else
  
     ReDim RET(2)
     RET(0) = "1"
     RET(1) = TEXT
     DMAX=1

  endIf
  return DMAX
  
End Function



' Timer returns elapsed seconds as a Double. second(now) only gives the
' seconds field (0-59) and wraps around at every full minute.
dim startTime As Double = Timer
dim endTime As Double

dim i as Integer
dim j as Integer
dim kk as Integer
dim sTagsArray() as String
	
dim sNewString As String 'ignore
dim myTestString As String
	
myTestString = "1230,345,456,6780,789,901,1240,346,457,5680,679,345,4560,678,789,9010,124,346,4570,568,679"
for i = 0 To 10000000
  if Split(",",myTestString,sTagsArray()) then
    kk = ubound(sTagsArray) - 1
    for j = 0 To kk
      sNewString = sTagsArray(j)
    next
  endif
next
	
endTime = Timer
print "Test (handwritten Split): " & (endTime-startTime) & " secs"

It’s hard to compare string performance between programming languages that use different encodings, as handling ASCII or UTF-16 data is faster than handling UTF-8 data.

Just to mention: Java uses UTF-8.

I think internally Java uses UTF-16.

Only when you choose UTF-16. Simple as that.

Java docs talk about modified UTF-8 encoding, whatever that is…
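
For the encoding question, a small Python sketch of how the byte sizes differ between UTF-8 and UTF-16. The sample strings are arbitrary. As far as I know, Java's "modified UTF-8" is the variant used for serialized data and class files, which stores U+0000 as the two bytes C0 80 and encodes characters outside the BMP as two three-byte surrogate halves. It is not the in-memory representation of `String`.

```python
samples = ["1230,345,456", "é", "€"]

for text in samples:
    utf8 = text.encode("utf-8")
    utf16 = text.encode("utf-16-le")  # little-endian, no byte-order mark
    print(f"{text!r}: {len(text)} chars, {len(utf8)} bytes UTF-8, {len(utf16)} bytes UTF-16")

# Plain UTF-8 stores U+0000 as one zero byte. Java's modified UTF-8 would use C0 80 instead.
nul_utf8 = "\x00".encode("utf-8")
print(nul_utf8)  # b'\x00'
```

For the pure ASCII test string used in this thread, UTF-8 needs one byte per character and UTF-16 needs two.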