Validate numeric string

Below is a 8th library for an old school style state machine based validator for numeric strings. Using RegEx is not always the best tool for the job when simple state machine will do.

Includes words that test if string is a valid int or float and word that test if string is a valid number and also returns type of number.

\
\ validate.8th
\
\ Simple state machine based string validator for numbers.
\
ns?

ns: validate

0 constant INTEGER
1 constant FLOAT

private

32 constant SPACE
09 constant TAB

\ status
0 constant EMPTY
1 constant PARTIAL-FAIL
2 constant PARTIAL-OK
3 constant OK
4 constant ERROR

\ state
0 constant S0
1 constant IPART
2 constant FPART
3 constant ESIGN
4 constant EPART

: make-dword  \ lword hword -- dword
  0xffff n:band 16 n:shl swap
  0xffff n:band
  n:bor ;

: s1
  IPART "state" t:!
  PARTIAL-FAIL "status" t:! ;

: s2
  IPART "state" t:!
  OK "status" t:! ;

: s3
  FPART "state" t:!
  PARTIAL-FAIL "status" t:! ;

: s4
  ESIGN "state" t:!
  PARTIAL-OK "status" t:! ;

: s5
  OK "status" t:! ;

: s6
  FPART "state" t:!
  PARTIAL-FAIL "status" t:! ;

: s7
  ESIGN "state" t:!
  PARTIAL-OK "status" t:! ;

: s8
  OK "status" t:! ;

: s9
  ESIGN "state" t:!
  PARTIAL-OK "status" t:! ;

: s10
  EPART "state" t:!
  PARTIAL-FAIL "status" t:! ;

: s11
  EPART "state" t:! ;

: s12
  OK "status" t:! ;

[ ( ` SPACE S0 make-dword ` "dword" t:@ n:= ), ' noop,
  ( ` TAB   S0 make-dword ` "dword" t:@ n:= ), ' noop,
  ( ` '+    S0 make-dword ` "dword" t:@ n:= ), ' s1,
  ( ` '-    S0 make-dword ` "dword" t:@ n:= ), ' s1,
  ( ` '0    S0 make-dword ` "dword" t:@ n:= ), ' s2,
  ( ` '1    S0 make-dword ` "dword" t:@ n:= ), ' s2,
  ( ` '2    S0 make-dword ` "dword" t:@ n:= ), ' s2,
  ( ` '3    S0 make-dword ` "dword" t:@ n:= ), ' s2,
  ( ` '4    S0 make-dword ` "dword" t:@ n:= ), ' s2,
  ( ` '5    S0 make-dword ` "dword" t:@ n:= ), ' s2,
  ( ` '6    S0 make-dword ` "dword" t:@ n:= ), ' s2,
  ( ` '7    S0 make-dword ` "dword" t:@ n:= ), ' s2,
  ( ` '8    S0 make-dword ` "dword" t:@ n:= ), ' s2,
  ( ` '9    S0 make-dword ` "dword" t:@ n:= ), ' s2,
  ( ` '.    S0 make-dword ` "dword" t:@ n:= ), ' s3,
  ( ` 'e    S0 make-dword ` "dword" t:@ n:= ), ' s4,
  ( ` 'E    S0 make-dword ` "dword" t:@ n:= ), ' s4,
  ( ` '0 IPART make-dword ` "dword" t:@ n:= ), ' s5,
  ( ` '1 IPART make-dword ` "dword" t:@ n:= ), ' s5,
  ( ` '2 IPART make-dword ` "dword" t:@ n:= ), ' s5,
  ( ` '3 IPART make-dword ` "dword" t:@ n:= ), ' s5,
  ( ` '4 IPART make-dword ` "dword" t:@ n:= ), ' s5,
  ( ` '5 IPART make-dword ` "dword" t:@ n:= ), ' s5,
  ( ` '6 IPART make-dword ` "dword" t:@ n:= ), ' s5,
  ( ` '7 IPART make-dword ` "dword" t:@ n:= ), ' s5,
  ( ` '8 IPART make-dword ` "dword" t:@ n:= ), ' s5,
  ( ` '9 IPART make-dword ` "dword" t:@ n:= ), ' s5,
  ( ` '. IPART make-dword ` "dword" t:@ n:= ), ' s6,
  ( ` 'e IPART make-dword ` "dword" t:@ n:= ), ' s7,
  ( ` 'E IPART make-dword ` "dword" t:@ n:= ), ' s7,
  ( ` '0 FPART make-dword ` "dword" t:@ n:= ), ' s8,
  ( ` '1 FPART make-dword ` "dword" t:@ n:= ), ' s8,
  ( ` '2 FPART make-dword ` "dword" t:@ n:= ), ' s8,
  ( ` '3 FPART make-dword ` "dword" t:@ n:= ), ' s8,
  ( ` '4 FPART make-dword ` "dword" t:@ n:= ), ' s8,
  ( ` '5 FPART make-dword ` "dword" t:@ n:= ), ' s8,
  ( ` '6 FPART make-dword ` "dword" t:@ n:= ), ' s8,
  ( ` '7 FPART make-dword ` "dword" t:@ n:= ), ' s8,
  ( ` '8 FPART make-dword ` "dword" t:@ n:= ), ' s8,
  ( ` '9 FPART make-dword ` "dword" t:@ n:= ), ' s8,
  ( ` 'e FPART make-dword ` "dword" t:@ n:= ), ' s9,
  ( ` 'E FPART make-dword ` "dword" t:@ n:= ), ' s9,
  ( ` '+ ESIGN make-dword ` "dword" t:@ n:= ), ' s10,
  ( ` '- ESIGN make-dword ` "dword" t:@ n:= ), ' s10,
  ( ` '0 ESIGN make-dword ` "dword" t:@ n:= ), ' s11,
  ( ` '1 ESIGN make-dword ` "dword" t:@ n:= ), ' s11,
  ( ` '2 ESIGN make-dword ` "dword" t:@ n:= ), ' s11,
  ( ` '3 ESIGN make-dword ` "dword" t:@ n:= ), ' s11,
  ( ` '4 ESIGN make-dword ` "dword" t:@ n:= ), ' s11,
  ( ` '5 ESIGN make-dword ` "dword" t:@ n:= ), ' s11,
  ( ` '6 ESIGN make-dword ` "dword" t:@ n:= ), ' s11,
  ( ` '7 ESIGN make-dword ` "dword" t:@ n:= ), ' s11,
  ( ` '8 ESIGN make-dword ` "dword" t:@ n:= ), ' s11,
  ( ` '9 ESIGN make-dword ` "dword" t:@ n:= ), ' s11,
  ( ` '0 EPART make-dword ` "dword" t:@ n:= ), ' s12,
  ( ` '1 EPART make-dword ` "dword" t:@ n:= ), ' s12,
  ( ` '2 EPART make-dword ` "dword" t:@ n:= ), ' s12,
  ( ` '3 EPART make-dword ` "dword" t:@ n:= ), ' s12,
  ( ` '4 EPART make-dword ` "dword" t:@ n:= ), ' s12,
  ( ` '5 EPART make-dword ` "dword" t:@ n:= ), ' s12,
  ( ` '6 EPART make-dword ` "dword" t:@ n:= ), ' s12,
  ( ` '7 EPART make-dword ` "dword" t:@ n:= ), ' s12,
  ( ` '8 EPART make-dword ` "dword" t:@ n:= ), ' s12,
  ( ` '9 EPART make-dword ` "dword" t:@ n:= ), ' s12,
  ( ERROR "status" t:! ) ] var, state-table

: number
  S0 "state" t:!
  EMPTY "status" t:!
  ( "status" t:@ ERROR n:= if
      break
      2drop
    else
      nip "state" t:@ make-dword "dword" t:! state-table @ a:when
    then ) s:each

  "status" t:@ "state" t:@ make-dword
  [ ( ` OK IPART make-dword ` over n:= ),         ( validate:INTEGER ),
    ( ` OK FPART make-dword ` over n:= ),         ( validate:FLOAT ),
    ( ` PARTIAL-OK FPART make-dword ` over n:= ), ( validate:FLOAT ),
    ( ` PARTIAL-OK ESIGN make-dword ` over n:= ), ( validate:FLOAT ),
    ( ` OK EPART make-dword ` over n:= ),         ( validate:FLOAT ),
    ( ` PARTIAL-OK EPART make-dword ` over n:= ), ( validate:FLOAT ),
    ( null ) ] a:when nip ;

public

: int?  \ s -- T
  #p:number null? if
    drop false
  else
    INTEGER n:=
  then ;

: float?  \ s -- T
  #p:number null? if
    drop false
  else
    FLOAT n:=
  then ;

: num?  \ s -- n|null T
  #p:number null? if
    false
  else
    true
  then ;

ns

Awful lot of code for something that should be so simple…

In Swift

func isNumeric() -> Bool {
        guard let _ = Float(self) else { return false }
        return true
    }
2 Likes

Like Dave, I’m not sure why so much code, but I do agree that regular expressions are too many dev’s go-to for such things.

Regexes, no matter how carefully crafted or precompiled, are notoriously slow for one thing so, I am very careful about using them on hot paths.

How slow? Slow enough that this:

if (name.IndexOf(‘/’) > -1 && Regex.IsMatch(name,@“/\d+$”)
… is often noticeably faster overall than just the Regex, so if you expect only a small percentage of comparisons to match, a redundant “guard” test like that for some portion of what the Regex is looking for is usually a performance win if inside any sort of loop; that way the Regex is seldom actually executed, due to short circuiting by the compiler. In a heavy string processing environment I got a 30% performance lift from such pre-tests. Make sure the string is a candidate for the Regex; only then let the Regex take a closer look.

1 Like

Yes, one could use a function that tries to parse a number but my version goes thru string in one pass handling all valid forms like scientific notation without converting string to number. It’s fast as you just keep track of status and state doing table lookups .You also directly know the number type based on status and state. Also some programming languages happily convert badly formed numeric string to number.

FSM’s are VERY useful

Did you benchmark it?

I have not done any benchmarks but it should be fast enough. Code enclosed inside backticks is evaluated at compile time and not evaluated at runtime. Routine just builds a dword from current character and current state, does the lookup of the dword value and executes a word that updates current state and current status when dword value is found. If dword value is not found, we know the input is bad.