March 10, 2024

VBA Array assignment BUG

Last modified: Sun Mar 10 23:40:49 UTC+0100 2024 © Attila Tarpai

The VBA Array assignment BUG

I bumped into something weird and erroneous when I was trying to find a way to return arrays from functions without array copy.

Sample code in this GitHub repo.

Abstract

UDF functions returning arrays in a VBA Standard Module after ReDim hold an address of an incomplete SAFEARRAY structure without Feature Flags.

This could be fatal if the Array returned is used later in another array assignment and the array is of String or Variant type. Excel may crash and exit unexpectedly.

Tested in VBA 7.1 with 32-bit Office 2016 and 64-bit Office 365.

Intro

A common way to work with functions returning Array is to Dim or ReDim a local array in the function, fill with values and use array-assignment to the function before exit:

Private Function return_arr_string() As String()
    Dim arr(2) As String
    arr(0) = "hi"
    return_arr_string = arr
End Function

But array-assignment can be expensive. Arrays are value types in VBA and array assignment copies the whole array content. F. ex. for string arrays this involves several string copies. The new array allocates new BSTR for every element and copies the strings one by one.. something we would like to avoid here.

Function returning array

In VBA the function return value is a hidden local variable, as if it was declared as the very first local variable of the function. Functions returning array is no exception: a hidden Pointer to SAFEARRAY (PSA) is allocated (functions can return only Dynamic Arrays). Initially, before ReDim or assignment, a null-ptr and the usage, syntax is the same as with other Dynamic Array variables.

Private Function return_arr_string() As String()
    [Dim return_arr_string() As String]           <- as if declared here 

When this function returns, its PSA is available for the caller. When the value is used as rvalue in a subsequent assignment, no array copy is made only a simple pointer assignment. VBA is smart enough to recognize that the return value variable goes out of scope, thus avoiding unnecessary array copies.

OK, so lets ReDim the function variable itself, being a dynamic array variable, what can go wrong:

Private Function return_arr_string() As String()
    ReDim return_arr_string(2)
    return_arr_string(0) = "hi"
End Function

OOPS, that does not compile. There is an unlucky VBA language construct: both array-indexing and function call uses parenthesis (), and the line above interpreted as function call.

OK, no problem, lets pass the array to another procedure to work on. Passing arrays is always ByRef in VBA (it has to be) with only a pointer passed:

Private Function return_arr_string() As String()
    ReDim return_arr_string(2)
    fill_arr_string return_arr_string
End Function

Private Sub fill_arr_string(arr() As String)
    arr(0) = "hi"
End Sub

Now this is nice, goal achieved, we can return a newly allocated and populated array without making array copies.

But wait.

The BUG

Here comes the error, bug, language feature or something weird.

Lets see the code first with String Arrays then analyze it:

Private Sub array_assignment_bug_string_array()
    Dim arr() As String, arr2() As String
    arr = return_arr_string
    arr2 = arr
    Debug.Print Hex(StrPtr(arr(0)))
    Debug.Print Hex(StrPtr(arr2(0)))
End Sub

Private Function return_arr_string() As String()
    ReDim return_arr_string(2)
    fill_arr_string return_arr_string
End Function

Private Sub fill_arr_string(arr() As String)
    arr(0) = "not me"
End Sub

The code is simple - the explanation is not. We re-use the above construct for a local array, but after that make an array assignment to another local dynamic array.

It prints the following:

StrPtr(arr(0)):             27CC53F1C48
StrPtr(arr2(0)):            27CC53F1C48

This must be wrong, arr(0) and arr2(0) are 2 different String variables in 2 different Arrays holding the same BSTR memory address! (Test was run in 64-bit VBA7.1)

This was not supposed to happen:

arr2                     =    arr
 |___ String --> BSTR1        |___ String --> BSTR1
 |___ String --> BSTR2        |___ String --> BSTR2

This was supposed to happen. Array assignment with proper deep-copy:

arr2                     =    arr
 |___ String --> BSTR3        |___ String --> BSTR1
 |___ String --> BSTR4        |___ String --> BSTR2

So now what? We have two pointers into the BSTR buffer with the same memory address. What if we change one of them? Will the other reflect the change? What if one of them gets de-allocated? Will the other one point to a de-allocated memory location? What if after some time strings come and go in the BSTR buffer, what will be at that memory address one of them still holding?

Lets go gentle first and change only one character with the Mid() statement, in arr(0). The Mid() statement is the only way to manipulate BSTR characters directly without de- and re-allocation in VBA. StrPtr() will not change:

Private Sub array_assignment_bug_string_array()
    Dim arr() As String, arr2() As String
    arr = return_arr_string
    arr2 = arr
    Mid(arr(0), 1, 1) = "!"
    Debug.Print "arr(0): ", arr(0)
    Debug.Print "arr2(0): ", arr2(0)
End Sub

It is really not good, arr2(0) has changed too, the output:

arr(0):       !ot me
arr2(0):      !ot me

Assigning arr(0) to another string, again, and again and then print them sometimes

  • shows no change (probably re-allocation never used the memory area again)
  • shows random string fragments
  • I also managed to crash Excel once completely in F8-step mode

Clearly a corrupted BSTR buffer.

Analysis

The following analysis was done with vbadump.bas (on GitHub). We dump these arrays at various points of execution calling dump_safearray <array>, 1 (1 is for printing array elements as well).

The test was run in 32-bit VBA7.1 (Office 2016) but the result is exactly the same on 64-bit.

In return_arr_string() after ReDim:

dump_safearray return_arr_string, 1

Addr of SAFEARRAY = 0C450A38
SAFEARRAY STRUCTURE:
Offs          Name          Value Hex
 0            cDims         0001
 2            fFeatures     0000
 4            cbElements    00000004
 8            cLocks        00000000
 12           pvData        09600330
 16           cElements     00000003
 20           lLbound       00000000
Flags: NONE

09600330: 0C4B71C4 String
09600334: 00000000 String
09600338: 00000000 String

In the caller after arr = return_arr_string assignment:

dump_safearray arr, 1

Addr of SAFEARRAY = 0C450A38
SAFEARRAY STRUCTURE:
Offs          Name          Value Hex
 0            cDims         0001
 2            fFeatures     0000
 4            cbElements    00000004
 8            cLocks        00000000
 12           pvData        09600330
 16           cElements     00000003
 20           lLbound       00000000
Flags: NONE

09600330: 0C4B71C4 String
09600334: 00000000 String
09600338: 00000000 String

The array assignment from function arr = return_arr_string operates as expected, it's a simple pointer assignment of PSA:
- Addr of SAFEARRAY is the same
- address of elements in pvData is the same, so are the elements

And now for arr2 after arr2 = arr array assignment:

dump_safearray arr2, 1

Addr of SAFEARRAY = 2067AC78
SAFEARRAY STRUCTURE:
Offs          Name          Value Hex
 0            cDims         0001
 2            fFeatures     0000
 4            cbElements    00000004
 8            cLocks        00000000
 12           pvData        096005E8
 16           cElements     00000003
 20           lLbound       00000000
Flags: NONE

096005E8: 0C4B71C4 String
096005EC: 00000000 String
096005F0: 00000000 String

StrPtr(arr(0)):             C4B71C4
StrPtr(arr2(0)):            C4B71C4

The array assignment arr2 = arr went almost well:

  • a new SA allocated for arr2: Addr of SAFEARRAY is different
  • a new block of data allocated for elements of arr2: pvData is different
  • but elements has the same BSTR pointer!

Obviously only a simple memcopy happened without BSTR copies. But why?

Because these SAFEARRAY structures have no Features Flags set. The safearray returned by return_arr_string() is incomplete, has no flags, NONE.

Here is a proper String Array (well, it takes a while to write this article so now I'm at my VBA64 computer again):

ReDim arr(2) As String
dump_safearray arr, 1

Addr of SAFEARRAY = 0000027CC4CC9750
SAFEARRAY STRUCTURE:
Offs          Name          Value Hex
-4            VbVarType     00000008
 0            cDims         0001
 2            fFeatures     0180
 4            cbElements    00000008
 8            cLocks        00000000
 16           pvData        0000027CC517AFA0
 24           cElements     00000003
 28           lLbound       00000000
Flags: FADF_BSTR|FADF_HAVEVARTYPE

0000027CC517AFA0: 0000000000000000 String
0000027CC517AFA8: 0000000000000000 String
0000027CC517AFB0: 0000000000000000 String

The Feature Flags are properly set:

  • FADF_HAVEVARTYPE: there is a type at -4 (as most of Arrays in VBA has it)
  • FADF_BSTR: array contains BSTR pointers

The latter flag is the missing flag: COM array copy functions has to know this to perform String-by-String deep copy of the elements if set. Otherwise its simply a memcopy of the memory pointed by pvData.

So the question is, is this normal or a serious error in VBA?

Compile Error: Invalid ReDim

One remark here I read somewhere: implicitly declared Variants cannot hold an array, i.e. cannot ReDim-ed to be an array and there might be a connection here. The following function fv() returns Variant. Variable v ReDim is successful, as it is explicitly declared, but for the function variable itself raises Invalid ReDim:

Function fv() As Variant
    Dim v As Variant
    ReDim v(5)       ' OK
    ReDim fv(2)      ' ERROR: Invalid ReDim
End Function

Well, our UDF function variable is sort of implicitely declared. Do all UDF functions has this feature? I do not know the internal operation of Win32/COM/Automation VBA uses.

Function returning Array of Variant

A reminder how Variant stores Strings and Arrays: it stores a pointer. Variant/String a BSTR pointer, Variant/Array() a Pointer to SAFEARRAY (PSA):

+--+--+-----+-----+-----+-----------------------+-----------------------+
|08 00|    reserved     |      pointer          |           0           |   64-bit VBA Variant
+--+--+-----+-----+-----+-----------------------+-----------------------+
                              |
                              |        +--+--+--+--+--+--+--+--+-
                              +------> |     |     |     |     |
                                       +--+--+--+--+--+--+--+--+-
                                          BSTR unicode 2-byte characters


+--+--+-----+-----+-----+-----------------------+-----------------------+
|0C 20|    reserved     |         PSA           |           0           |   64-bit VBA Variant
+--+--+-----+-----+-----+-----------------------+-----------------------+
                              |
                              |        +---------------+
                              +------> |   SAFEARRAY   |
                                       |   STRUCTURE   |
                                       +---------------+

The same bug happens when the Function returning Array is of type Variant. We do exactly the same as with the String array, but also put another Array in the second element.

The test code:

Private Sub fill_arr_variant(v())
    v(0) = "string"
    v(1) = Array("a", 5)
End Sub

Private Function return_arr_variant() As Variant()
    ReDim return_arr_variant(2)
    fill_arr_variant return_arr_variant
End Function

Private Sub array_assignment_bug_variant_array()
    Dim varr(), varr2()
    varr = return_arr_variant
    dump_safearray varr, 1
    varr2 = varr
    dump_safearray varr2, 1
End Sub

The results are the same and worse. Not even BSTR pointers are the same but the SAFEARRAY pointers in two different Variant/Array variable are also the same: two variables hold the same memory address of an Array. Erase one or set the Variant to something else will definitely crash the application due to corrupted heap! I could do it several times during tests.

varr2                                        =   varr
 |____ Variant/String --> BSTR1                   |____ Variant/String --> BSTR1
 |____ Variant/Array  --> SA1                     |____ Variant/Array  --> SA1
                          |___ Variant/String                              |___ Variant/String
                          |___ Variant/Integer                             |___ Variant/Integer

The remedy

What if we specify the type for ReDim?

Private Function return_arr_string() As String()
    ReDim return_arr_string(2) As String  ' still NO flags..
    fill_arr_string return_arr_string
End Function

Nope. I got the same result, a SA without Flags.

The only solution, which helped:

Avoid ReDim the array variable of Functions returning Array in the function itself

If we move ReDim into the worker Sub, it creates a proper SAFEARRAY structure. Which is weird.. how come, fill_arr_string() gets an address of a null-pointer, what is the difference for ReDim? There is something not clear going on.

Private Function return_arr_string() As String()
    fill_arr_string return_arr_string
End Function

Private Sub fill_arr_string(arr() As String)
    ReDim arr(2)       ' OK
    arr(0) = "not me"
End Sub

But now we are slowly loosing the whole point of a function returning Array.. our main procedure could just do this:

    Dim arr() As String, arr2() As String
    fill_arr_string arr
    arr2 = arr

It is also a solution: avoid Functions returning Array - when String or Variant and later the array would be assigned to another array.


No comments:

Post a Comment