VBA Strings, memory layout, string functions
VBA Strings are Unicode BSTR structures based on underlying COM/Automation. BSTR is a composite data type:
- length prefix - 32-bit integer, number of BYTES in the data string excluding the terminating zeroes
values of 0, 2, 4, .. - data string - string of Unicode 2-byte characters
- terminator - NULL (0x0000) WCHAR
A VBA String variable is a BSTR POINTER: it points to the first character of the data string:
+-----------+ -4 | Length=10 | +-----------+ +--+--+--+--+--+--+--+--+--+--+--+--+ | pointer | ------------> | | | | | | 0 | +-----------+ +--+--+--+--+--+--+--+--+--+--+--+--+ Dim str As String BSTR unicode 2-byte characters str = "hello"
At address -4 we find the Length field in memory.
The default value for a VBA String variable and for functions returning String is zero, a "null-pointer", i.e. no allocated string buffer is associated with the pointer. The assignment str = vbNullString
will also write zero in the String variable after freeing up the BSTR buffer:
null-string: +-----------+ +-----------+ | 0 | | 0 | +-----------+ +-----------+ Dim str As String Dim str As String str = "hello" str = VBA.vbNullString
An empty-string ""
points to this 6-byte BSTR structure:
empty-string: +-----------+ -4 | 0 | +-----------+ +--+--+-----+ | pointer | ------------> | 0 | +-----------+ +--+--+ Dim str As String str = ""
VBA.vbNullChar
is not really a constant: str = VBA.vbNullChar
allocates this new BSTR each time of assignment:
+-----------+ -4 | Length=2 | +-----------+ +--+--+--+--+ | pointer | ------------> | 0 | 0 | +-----------+ +--+--+--+--+ Dim str As String str = VBA.vbNullChar
After a String variable is assigned to hold an address of a particular BSTR string, these two are inseparable. Another assignment first frees the memory holding the previous string, allocates a new BSTR and resets the pointer. This happens extensively when VBA code is running and working with strings; all string functions, concatenation allocates a new BSTR (there is only one exception, similar to arrays, assignment from Functions - see later).
No two pointers are allowed to point to the same BSTR memory address.
On null- and empty-string
- Both null- and the empty-string tests True for
vbNullString
. - Both appears as
""
in Debug windows. - Len()/LenB() returns zero for both.
- AscW() gives Run time error 5 for both.
For true null-string test use:If StrPtr(s) Then ...
Variant/String
Variant stores the BSTR pointer:
Dim v
v = "hello"
0800-000000000000-C8A3DAD238020000-0000000000000000 (64-bit VBA Variant)
+--+--+-----+-----+-----+-----------------------+-----------------------+
|08 00| reserved | pointer | 0 | 64-bit VBA Variant
+--+--+-----+-----+-----+-----------------------+-----------------------+
|
| +--+--+--+--+--+--+--+--+-
+------> | | | | |
+--+--+--+--+--+--+--+--+-
BSTR unicode 2-byte characters
08 is vbString. Type mismatch when passed to functions expecting String. All other operations uses implicite conversion.
String assignment
String conversion in VBA from other data types is quite liberal, VBA easily converts any expression to a temp-BSTR (except vbNull
).
Only the String and Variant data type can store a BSTR pointer and be lvalue in String assignment. The two behaves exactly the same in assignment and conversion is implicite.
Dim v Dim str As String v = str str = v
When the rvalue is another String- or Variant/String variable, VBA String data type is treated as value type: it involves deep-copy i.e. a new BSTR allocated and the string characters are copied.
String assignment from Function does NOT create a new BSTR copy i.e. VBA performs a simple pointer-assignment. The behaviour is the same for both String and Variant/String type Functions - similar to Mid()/Mid$():
lvalue rvalue lvalue rvalue Variable Variable Variable Function type type type type String String String String \ / \ / = = / \ / \ Variant Variant/String Variant Variant/String BSTR deep copy NO BSTR copy s2 = s1 s2 = f1() | | | | +---> BSTR2 +---> BSTR1 +----->---------+---> BSTR1
Len()/LenB() for String
Len() returns the number of characters in string i.e Lenght field / 2.
LenB() returns the number of bytes in string i.e. the Lenght field at -4.
Mid()/Mid$()
Mid()/Mid$() functions out of range return an allocated empty string "" and never null-string.
In VBA the only way to manipulate characters directly in a string is the Mid() statement: it writes directly into the BSTR buffer i.e. no new BSTR is created:
Mid(str, 1, 1) = "x"
Asc()/AscW()
AscW returns 16-bit signed Integer (-32768 to 32767): the unicode character as it is stored in the buffer. No translation.
Asc returns values 0 to 255: behaves differently based on current locale.
StrPtr()
The StrPtr() function was added in Office 2010 and is located in VBA._HiddenModule:
Function StrPtr(Ptr As String) As LongPtr
It returns LongPtr, a memory address, which simply compiles to a Long or LongLong integer on 32/64-bit VBA.
For example:
Dim str As String str = "hello"
Debugging gives this output (VBA32):
Hex(StrPtr(str)): 207F2A84 mem dump: 207F2A84: 68 00 65 00 6C 00 6C 00|6F 00 00 00 6E 00 67 00 | h.e.l.l.o...n.g."
The returned value is the content of the String variable, i.e. a pointer to the first character of the data string of the BSTR structure. Can be null-pointer.
NB: StrPtr(var) converts other data types to a temp BSTR, like numerics, expressions and even Fix-length strings; then returns the temp-BSTR address. vbNull gives error as usual.
Passing both String- and Variant/String arguments to StrPtr() returns the true BSTR pointer stored in the variable, even if the variable is the element of an Array (either in String- or Variant Array):
String Variant/String In String Array In Variant Array _________ ________________ _________ _________ |_->BSTR__| |0800____->BSTR__| |___PSA___| |___PSA___| |_______String |________Variant/String |_______String |________Variant/String |_______String |________Variant/String StrPtr(true-BSTR-Ptr) VarPtr(true-Ptr)
PSA is Pointer to SAFEARRAY.
StrPtr() uses temp BSTR in Variant of array type
The only exception I've found is when the array are stored in Variant: there is some conversion going on and a temp BSTR is created and passed to StrPtr(). It is the same case for String- and Variant type Arrays in Variant. It's unclear why a String in a String Array returns true BSTR pointer yet String in a Variant of array does not. Tried to see if there is some connection to VarPtr(), but.. it got more confusing:
In Variant of array of String In Variant of array of Variant ________________ ________________ |0820______PSA___| |0C20______PSA___| |_______String |________Variant/String |_______String |________Variant/String |_______String |________Variant/String StrPtr(temp-BSTR-Ptr) StrPtr(temp-BSTR-Ptr) <-- this I do not VarPtr(temp-Ptr) VarPtr(true-Ptr) <-- this I understand from Sub(v)
VarPtr()
- In Variant/String(): VarPtr(v(0)) returns an address of a temp variable (as if a temp VT_BYREF Variant holding the address was passed by caller.. and VarPtr() returned this address. Vague guess only)
- In Variant/Variant(): here VarPtr(v(0)) returns true address!
Code example:
Dim v ReDim v(3) As String v(0) = "v0" Debug.Print StrPtr(v(0)) ' temp BSTR address - not BSTR stored in v(0) ReDim v(3) v(0) = "v0" Debug.Print StrPtr(v(0)) ' temp BSTR address - not BSTR stored in v(0)
VBA Fixed-length Strings
There is also the (legacy?) fixed-length String:
Dim str As String * 4
Allocates 4 x 2 = 8 bytes for unicode chars on "stack". No trailing zero. Filled with zeroes.
Every operation creates a new BSTR from it, on write back truncated/padded with spaces. Even when function returns null-ptr.
Ergo: the run-time takes care of fixed-strings, converts them back and forth. Just an emulation? Inefficient?
Useful to limit string lenght automatically.