February 29, 2024

VBA String data type and memory layout

Last modified: Tue Mar 12 22:22:31 UTC+0100 2024 © Attila Tarpai

VBA Strings, memory layout, string functions

VBA Strings are Unicode BSTR structures based on underlying COM/Automation. BSTR is a composite data type:

  • length prefix - 32-bit integer, number of BYTES in the data string excluding the terminating zeroes
    values of 0, 2, 4, ..
  • data string - string of Unicode 2-byte characters
  • terminator - NULL (0x0000) WCHAR

A VBA String variable is a BSTR POINTER: it points to the first character of the data string:

                               +-----------+
                            -4 | Length=10 |
+-----------+                  +--+--+--+--+--+--+--+--+--+--+--+--+
|  pointer  |  ------------>   |     |     |     |     |     |  0  |
+-----------+                  +--+--+--+--+--+--+--+--+--+--+--+--+
Dim str As String                 BSTR unicode 2-byte characters
str = "hello"

At address -4 we find the Length field in memory.

The default value for a VBA String variable and for functions returning String is zero, a "null-pointer", i.e. no allocated string buffer is associated with the pointer. The assignment str = vbNullString will also write zero in the String variable after freeing up the BSTR buffer:

null-string:

+-----------+                  +-----------+
|     0     |                  |     0     |
+-----------+                  +-----------+
Dim str As String              Dim str As String
                               str = "hello"
                               str = VBA.vbNullString

An empty-string "" points to this 6-byte BSTR structure:

empty-string:
                               +-----------+
                            -4 |     0     |
+-----------+                  +--+--+-----+
|  pointer  |  ------------>   |  0  |
+-----------+                  +--+--+
Dim str As String
str = ""

VBA.vbNullChar is not really a constant: str = VBA.vbNullChar allocates this new BSTR each time of assignment:

                               +-----------+
                            -4 | Length=2  |
+-----------+                  +--+--+--+--+
|  pointer  |  ------------>   |  0  |  0  |
+-----------+                  +--+--+--+--+
Dim str As String
str = VBA.vbNullChar

After a String variable is assigned to hold an address of a particular BSTR string, these two are inseparable. Another assignment first frees the memory holding the previous string, allocates a new BSTR and resets the pointer. This happens extensively when VBA code is running and working with strings; all string functions, concatenation allocates a new BSTR (there is only one exception, similar to arrays, assignment from Functions - see later).

No two pointers are allowed to point to the same BSTR memory address.

On null- and empty-string

  • Both null- and the empty-string tests True for vbNullString.
  • Both appears as "" in Debug windows.
  • Len()/LenB() returns zero for both.
  • AscW() gives Run time error 5 for both.

For true null-string test use:
If StrPtr(s) Then ...

Variant/String

Variant stores the BSTR pointer:

Dim v
v = "hello"

0800-000000000000-C8A3DAD238020000-0000000000000000 (64-bit VBA Variant)

+--+--+-----+-----+-----+-----------------------+-----------------------+
|08 00|    reserved     |      pointer          |           0           |   64-bit VBA Variant
+--+--+-----+-----+-----+-----------------------+-----------------------+
                              |
                              |        +--+--+--+--+--+--+--+--+-
                              +------> |     |     |     |     |
                                       +--+--+--+--+--+--+--+--+-
                                          BSTR unicode 2-byte characters

08 is vbString. Type mismatch when passed to functions expecting String. All other operations uses implicite conversion.

String assignment

String conversion in VBA from other data types is quite liberal, VBA easily converts any expression to a temp-BSTR (except vbNull).

Only the String and Variant data type can store a BSTR pointer and be lvalue in String assignment. The two behaves exactly the same in assignment and conversion is implicite.

Dim v
Dim str As String
v = str
str = v

When the rvalue is another String- or Variant/String variable, VBA String data type is treated as value type: it involves deep-copy i.e. a new BSTR allocated and the string characters are copied.

String assignment from Function does NOT create a new BSTR copy i.e. VBA performs a simple pointer-assignment. The behaviour is the same for both String and Variant/String type Functions - similar to Mid()/Mid$():

lvalue        rvalue                     lvalue        rvalue

Variable      Variable                   Variable      Function
type          type                       type          type

  String     String                        String     String
         \ /                                      \ /
          =                                        =
         / \                                      / \
 Variant     Variant/String               Variant     Variant/String


    BSTR deep copy                            NO BSTR copy

  s2      =       s1                       s2      =       f1()
   |               |                        |               |
   +---> BSTR2     +---> BSTR1              +----->---------+---> BSTR1

Len()/LenB() for String

Len() returns the number of characters in string i.e Lenght field / 2.
LenB() returns the number of bytes in string i.e. the Lenght field at -4.

Mid()/Mid$()

Mid()/Mid$() functions out of range return an allocated empty string "" and never null-string.

In VBA the only way to manipulate characters directly in a string is the Mid() statement: it writes directly into the BSTR buffer i.e. no new BSTR is created:

Mid(str, 1, 1) = "x"

Asc()/AscW()

AscW returns 16-bit signed Integer (-32768 to 32767): the unicode character as it is stored in the buffer. No translation.
Asc returns values 0 to 255: behaves differently based on current locale.

StrPtr()

The StrPtr() function was added in Office 2010 and is located in VBA._HiddenModule:

Function StrPtr(Ptr As String) As LongPtr

It returns LongPtr, a memory address, which simply compiles to a Long or LongLong integer on 32/64-bit VBA.

For example:

Dim str As String
str = "hello"

Debugging gives this output (VBA32):

Hex(StrPtr(str)): 207F2A84
mem dump:
207F2A84: 68 00 65 00 6C 00 6C 00|6F 00 00 00 6E 00 67 00  | h.e.l.l.o...n.g."

The returned value is the content of the String variable, i.e. a pointer to the first character of the data string of the BSTR structure. Can be null-pointer.

NB: StrPtr(var) converts other data types to a temp BSTR, like numerics, expressions and even Fix-length strings; then returns the temp-BSTR address. vbNull gives error as usual.

Passing both String- and Variant/String arguments to StrPtr() returns the true BSTR pointer stored in the variable, even if the variable is the element of an Array (either in String- or Variant Array):

String        Variant/String       In String Array        In Variant Array
 _________     ________________     _________              _________
|_->BSTR__|   |0800____->BSTR__|   |___PSA___|            |___PSA___|
                                        |_______String         |________Variant/String
                                        |_______String         |________Variant/String
                                        |_______String         |________Variant/String

                          StrPtr(true-BSTR-Ptr)
                          VarPtr(true-Ptr)

PSA is Pointer to SAFEARRAY.

StrPtr() uses temp BSTR in Variant of array type

The only exception I've found is when the array are stored in Variant: there is some conversion going on and a temp BSTR is created and passed to StrPtr(). It is the same case for String- and Variant type Arrays in Variant. It's unclear why a String in a String Array returns true BSTR pointer yet String in a Variant of array does not. Tried to see if there is some connection to VarPtr(), but.. it got more confusing:

In Variant of array of String     In Variant of array of Variant
 ________________                  ________________
|0820______PSA___|                |0C20______PSA___|
            |_______String                    |________Variant/String
            |_______String                    |________Variant/String
            |_______String                    |________Variant/String

    StrPtr(temp-BSTR-Ptr)            StrPtr(temp-BSTR-Ptr)    <-- this I do not
    VarPtr(temp-Ptr)                 VarPtr(true-Ptr)         <-- this I understand from Sub(v)

VarPtr()
- In Variant/String(): VarPtr(v(0)) returns an address of a temp variable (as if a temp VT_BYREF Variant holding the address was passed by caller.. and VarPtr() returned this address. Vague guess only)
- In Variant/Variant(): here VarPtr(v(0)) returns true address!

Code example:

Dim v

ReDim v(3) As String
v(0) = "v0"
Debug.Print StrPtr(v(0))  ' temp BSTR address - not BSTR stored in v(0)

ReDim v(3)
v(0) = "v0"
Debug.Print StrPtr(v(0))  ' temp BSTR address - not BSTR stored in v(0)

VBA Fixed-length Strings

There is also the (legacy?) fixed-length String:

Dim str As String * 4

Allocates 4 x 2 = 8 bytes for unicode chars on "stack". No trailing zero. Filled with zeroes.
Every operation creates a new BSTR from it, on write back truncated/padded with spaces. Even when function returns null-ptr.
Ergo: the run-time takes care of fixed-strings, converts them back and forth. Just an emulation? Inefficient?

Useful to limit string lenght automatically.

No comments:

Post a Comment