Last modified: Sun Mar 10 23:40:49 UTC+0100 2024 © Attila Tarpai
The VBA Array assignment BUG
I bumped into something weird and erroneous when I was trying to find a way to return arrays from functions without array copy.
Sample code in this GitHub repo.
Abstract
UDF functions returning arrays in a VBA Standard Module after ReDim hold an address of an incomplete SAFEARRAY structure without Feature Flags.
This could be fatal if the Array returned is used later in another array assignment and the array is of String or Variant type. Excel may crash and exit unexpectedly.
Tested in VBA 7.1 with 32-bit Office 2016 and 64-bit Office 365.
Intro
A common way to work with functions returning Array is to Dim or ReDim a local array in the function, fill with values and use array-assignment to the function before exit:
Private Function return_arr_string() As String()
Dim arr(2) As String
arr(0) = "hi"
return_arr_string = arr
End Function
But array-assignment can be expensive. Arrays are value types in VBA and array assignment copies the whole array content. F. ex. for string arrays this involves several string copies. The new array allocates new BSTR for every element and copies the strings one by one.. something we would like to avoid here.
Function returning array
In VBA the function return value is a hidden local variable, as if it was declared as the very first local variable of the function. Functions returning array is no exception: a hidden Pointer to SAFEARRAY (PSA) is allocated (functions can return only Dynamic Arrays). Initially, before ReDim or assignment, a null-ptr and the usage, syntax is the same as with other Dynamic Array variables.
Private Function return_arr_string() As String()
[Dim return_arr_string() As String] <- as if declared here
When this function returns, its PSA is available for the caller. When the value is used as rvalue in a subsequent assignment, no array copy is made only a simple pointer assignment. VBA is smart enough to recognize that the return value variable goes out of scope, thus avoiding unnecessary array copies.
OK, so lets ReDim the function variable itself, being a dynamic array variable, what can go wrong:
Private Function return_arr_string() As String()
ReDim return_arr_string(2)
return_arr_string(0) = "hi"
End Function
OOPS, that does not compile. There is an unlucky VBA language construct: both array-indexing and function call uses parenthesis (), and the line above interpreted as function call.
OK, no problem, lets pass the array to another procedure to work on. Passing arrays is always ByRef in VBA (it has to be) with only a pointer passed:
Private Function return_arr_string() As String()
ReDim return_arr_string(2)
fill_arr_string return_arr_string
End Function
Private Sub fill_arr_string(arr() As String)
arr(0) = "hi"
End Sub
Now this is nice, goal achieved, we can return a newly allocated and populated array without making array copies.
But wait.
The BUG
Here comes the error, bug, language feature or something weird.
Lets see the code first with String Arrays then analyze it:
Private Sub array_assignment_bug_string_array()
Dim arr() As String, arr2() As String
arr = return_arr_string
arr2 = arr
Debug.Print Hex(StrPtr(arr(0)))
Debug.Print Hex(StrPtr(arr2(0)))
End Sub
Private Function return_arr_string() As String()
ReDim return_arr_string(2)
fill_arr_string return_arr_string
End Function
Private Sub fill_arr_string(arr() As String)
arr(0) = "not me"
End Sub
The code is simple - the explanation is not. We re-use the above construct for a local array, but after that make an array assignment to another local dynamic array.
It prints the following:
StrPtr(arr(0)): 27CC53F1C48
StrPtr(arr2(0)): 27CC53F1C48
This must be wrong, arr(0)
and arr2(0)
are 2 different String variables in 2 different Arrays holding the same BSTR memory address! (Test was run in 64-bit VBA7.1)
This was not supposed to happen:
arr2 = arr
|___ String --> BSTR1 |___ String --> BSTR1
|___ String --> BSTR2 |___ String --> BSTR2
This was supposed to happen. Array assignment with proper deep-copy:
arr2 = arr
|___ String --> BSTR3 |___ String --> BSTR1
|___ String --> BSTR4 |___ String --> BSTR2
So now what? We have two pointers into the BSTR buffer with the same memory address. What if we change one of them? Will the other reflect the change? What if one of them gets de-allocated? Will the other one point to a de-allocated memory location? What if after some time strings come and go in the BSTR buffer, what will be at that memory address one of them still holding?
Lets go gentle first and change only one character with the Mid()
statement, in arr(0)
. The Mid() statement is the only way to manipulate BSTR characters directly without de- and re-allocation in VBA. StrPtr() will not change:
Private Sub array_assignment_bug_string_array()
Dim arr() As String, arr2() As String
arr = return_arr_string
arr2 = arr
Mid(arr(0), 1, 1) = "!"
Debug.Print "arr(0): ", arr(0)
Debug.Print "arr2(0): ", arr2(0)
End Sub
It is really not good, arr2(0)
has changed too, the output:
arr(0): !ot me
arr2(0): !ot me
Assigning arr(0) to another string, again, and again and then print them sometimes
- shows no change (probably re-allocation never used the memory area again)
- shows random string fragments
- I also managed to crash Excel once completely in F8-step mode
Clearly a corrupted BSTR buffer.
Analysis
The following analysis was done with vbadump.bas (on GitHub). We dump these arrays at various points of execution calling dump_safearray <array>, 1
(1 is for printing array elements as well).
The test was run in 32-bit VBA7.1 (Office 2016) but the result is exactly the same on 64-bit.
In return_arr_string()
after ReDim:
dump_safearray return_arr_string, 1
Addr of SAFEARRAY = 0C450A38
SAFEARRAY STRUCTURE:
Offs Name Value Hex
0 cDims 0001
2 fFeatures 0000
4 cbElements 00000004
8 cLocks 00000000
12 pvData 09600330
16 cElements 00000003
20 lLbound 00000000
Flags: NONE
09600330: 0C4B71C4 String
09600334: 00000000 String
09600338: 00000000 String
In the caller after arr = return_arr_string
assignment:
dump_safearray arr, 1
Addr of SAFEARRAY = 0C450A38
SAFEARRAY STRUCTURE:
Offs Name Value Hex
0 cDims 0001
2 fFeatures 0000
4 cbElements 00000004
8 cLocks 00000000
12 pvData 09600330
16 cElements 00000003
20 lLbound 00000000
Flags: NONE
09600330: 0C4B71C4 String
09600334: 00000000 String
09600338: 00000000 String
The array assignment from function arr = return_arr_string
operates as expected, it's a simple pointer assignment of PSA:
- Addr of SAFEARRAY is the same
- address of elements in pvData is the same, so are the elements
And now for arr2 after arr2 = arr
array assignment:
dump_safearray arr2, 1
Addr of SAFEARRAY = 2067AC78
SAFEARRAY STRUCTURE:
Offs Name Value Hex
0 cDims 0001
2 fFeatures 0000
4 cbElements 00000004
8 cLocks 00000000
12 pvData 096005E8
16 cElements 00000003
20 lLbound 00000000
Flags: NONE
096005E8: 0C4B71C4 String
096005EC: 00000000 String
096005F0: 00000000 String
StrPtr(arr(0)): C4B71C4
StrPtr(arr2(0)): C4B71C4
The array assignment arr2 = arr
went almost well:
- a new SA allocated for arr2: Addr of SAFEARRAY is different
- a new block of data allocated for elements of arr2: pvData is different
- but elements has the same BSTR pointer!
Obviously only a simple memcopy happened without BSTR copies. But why?
Because these SAFEARRAY structures have no Features Flags set. The safearray returned by return_arr_string()
is incomplete, has no flags, NONE.
Here is a proper String Array (well, it takes a while to write this article so now I'm at my VBA64 computer again):
ReDim arr(2) As String
dump_safearray arr, 1
Addr of SAFEARRAY = 0000027CC4CC9750
SAFEARRAY STRUCTURE:
Offs Name Value Hex
-4 VbVarType 00000008
0 cDims 0001
2 fFeatures 0180
4 cbElements 00000008
8 cLocks 00000000
16 pvData 0000027CC517AFA0
24 cElements 00000003
28 lLbound 00000000
Flags: FADF_BSTR|FADF_HAVEVARTYPE
0000027CC517AFA0: 0000000000000000 String
0000027CC517AFA8: 0000000000000000 String
0000027CC517AFB0: 0000000000000000 String
The Feature Flags are properly set:
-
FADF_HAVEVARTYPE
: there is a type at -4 (as most of Arrays in VBA has it)
-
FADF_BSTR
: array contains BSTR pointers
The latter flag is the missing flag: COM array copy functions has to know this to perform String-by-String deep copy of the elements if set. Otherwise its simply a memcopy of the memory pointed by pvData.
So the question is, is this normal or a serious error in VBA?
Compile Error: Invalid ReDim
One remark here I read somewhere: implicitly declared Variants cannot hold an array, i.e. cannot ReDim-ed to be an array and there might be a connection here. The following function fv()
returns Variant. Variable v
ReDim is successful, as it is explicitly declared, but for the function variable itself raises Invalid ReDim:
Function fv() As Variant
Dim v As Variant
ReDim v(5) ' OK
ReDim fv(2) ' ERROR: Invalid ReDim
End Function
Well, our UDF function variable is sort of implicitely declared. Do all UDF functions has this feature? I do not know the internal operation of Win32/COM/Automation VBA uses.
Function returning Array of Variant
A reminder how Variant stores Strings and Arrays: it stores a pointer. Variant/String a BSTR pointer, Variant/Array() a Pointer to SAFEARRAY (PSA):
+--+--+-----+-----+-----+-----------------------+-----------------------+
|08 00| reserved | pointer | 0 | 64-bit VBA Variant
+--+--+-----+-----+-----+-----------------------+-----------------------+
|
| +--+--+--+--+--+--+--+--+-
+------> | | | | |
+--+--+--+--+--+--+--+--+-
BSTR unicode 2-byte characters
+--+--+-----+-----+-----+-----------------------+-----------------------+
|0C 20| reserved | PSA | 0 | 64-bit VBA Variant
+--+--+-----+-----+-----+-----------------------+-----------------------+
|
| +---------------+
+------> | SAFEARRAY |
| STRUCTURE |
+---------------+
The same bug happens when the Function returning Array is of type Variant. We do exactly the same as with the String array, but also put another Array in the second element.
The test code:
Private Sub fill_arr_variant(v())
v(0) = "string"
v(1) = Array("a", 5)
End Sub
Private Function return_arr_variant() As Variant()
ReDim return_arr_variant(2)
fill_arr_variant return_arr_variant
End Function
Private Sub array_assignment_bug_variant_array()
Dim varr(), varr2()
varr = return_arr_variant
dump_safearray varr, 1
varr2 = varr
dump_safearray varr2, 1
End Sub
The results are the same and worse. Not even BSTR pointers are the same but the SAFEARRAY pointers in two different Variant/Array variable are also the same: two variables hold the same memory address of an Array. Erase one or set the Variant to something else will definitely crash the application due to corrupted heap! I could do it several times during tests.
varr2 = varr
|____ Variant/String --> BSTR1 |____ Variant/String --> BSTR1
|____ Variant/Array --> SA1 |____ Variant/Array --> SA1
|___ Variant/String |___ Variant/String
|___ Variant/Integer |___ Variant/Integer
The remedy
What if we specify the type for ReDim?
Private Function return_arr_string() As String()
ReDim return_arr_string(2) As String ' still NO flags..
fill_arr_string return_arr_string
End Function
Nope. I got the same result, a SA without Flags.
The only solution, which helped:
Avoid ReDim the array variable of Functions returning Array in the function itself
If we move ReDim into the worker Sub, it creates a proper SAFEARRAY structure. Which is weird.. how come, fill_arr_string()
gets an address of a null-pointer, what is the difference for ReDim? There is something not clear going on.
Private Function return_arr_string() As String()
fill_arr_string return_arr_string
End Function
Private Sub fill_arr_string(arr() As String)
ReDim arr(2) ' OK
arr(0) = "not me"
End Sub
But now we are slowly loosing the whole point of a function returning Array.. our main procedure could just do this:
Dim arr() As String, arr2() As String
fill_arr_string arr
arr2 = arr
It is also a solution: avoid Functions returning Array - when String or Variant and later the array would be assigned to another array.