PowerShell: Working With Arrays
Posted on: December 16th, 2024
Handling arrays in PowerShell is one of those really simple things that you probably don't give a lot of thought to. I never gave it much thought either until relatively recently, when run times started to become something I needed to address. Rethinking how I worked with arrays was the first thing that came to mind.
What Is An Array?
Let's start from the beginning. An array is a collection of values or objects grouped together in a single container that can be referenced, filtered, exported, and so on. There are several ways to create an array, either empty or with data already in it. We'll start there before moving on to adding and removing items.
### Create an empty array
### Standard Array
$array = @()
### ArrayList
$array = [System.Collections.ArrayList]::new()
### Generic List, data type of string
$array = [System.Collections.Generic.List[string]]::new()
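Each of these can also be created with data already in it. Here is a minimal sketch; the names are made-up sample values and the variable names are only for illustration:

```powershell
# Standard Array with two values
$array = @('John', 'Bill')

# ArrayList built by casting an existing array
$arrayList = [System.Collections.ArrayList]@('John', 'Bill')

# Generic List of strings, seeded from a typed string array
$genericList = [System.Collections.Generic.List[string]]::new([string[]]@('John', 'Bill'))

# Each collection holds the same two items
Write-Host "Counts: $($array.Count), $($arrayList.Count), $($genericList.Count)"
```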
What's the difference between these, and why might one be preferable to another? Using @() is easy, so why do it any other way? The answer is exactly what made me want to cover this topic.
A Standard Array, @(), has a fixed size. Once it is created, its length cannot change. That may be confusing. We can add and remove things from it, right? Technically, yes, but not in place. Every time you add to a Standard Array with +=, PowerShell creates a new, larger array and copies every existing element into it, and removing means building a new array from a filter. That starts to mean something when you're dealing with thousands, tens of thousands, or millions of individual values or objects.
An ArrayList and a Generic List both support dynamic sizing, meaning you can add and remove objects in a manner closer to what you may be imagining. The entire array does not need to be rebuilt at each step. The difference between the two is data type flexibility. An ArrayList does not care what you throw into it. You can have an integer, 5, a Boolean, $false, and a string, "test", all in a single array. A Generic List is restricted to the data type declared within the brackets. That can be any of the standard data types (int, string, char, datetime, bool, etc.) or any other .NET type.
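The differences described above can be checked directly. This is a small sketch, not a complete tour; the variable names are illustrative, and IsFixedSize is the standard .NET property exposed by these collection types:

```powershell
$standard  = @(1, 2, 3)
$arrayList = [System.Collections.ArrayList]::new()

# A Standard Array reports a fixed size, an ArrayList does not
Write-Host "Standard Array fixed size: $($standard.IsFixedSize)"
Write-Host "ArrayList fixed size: $($arrayList.IsFixedSize)"

# An ArrayList accepts mixed types ([void] hides the index that Add returns)
[void]$arrayList.Add(5)
[void]$arrayList.Add($false)
[void]$arrayList.Add('test')

# A Generic List[int] rejects a value that cannot be converted to int
$ints = [System.Collections.Generic.List[int]]::new()
$ints.Add(5)
try {
    $ints.Add('test')
    $rejected = $false
} catch {
    $rejected = $true
}
Write-Host "String 'test' rejected by List[int]: $rejected"
```

Keep in mind that PowerShell converts values where it can, so adding the string '5' to a List[int] would be accepted and stored as the integer 5.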
Adding To And Subtracting From Arrays
How you add to or remove from your array depends on what kind of array you are using. If you are using a Standard Array, adding a value or object is as easy as "$array += $thing". This triggers a full rebuild of the array, which can cause long processing times if you're adding a large number of items in a foreach loop. Removing from a Standard Array can only be done by recreating it, typically by piping it to Where-Object and assigning the result back.
### Create a Standard Array with users in it
$array = @(
    [PSCustomObject]@{
        givenName = "John"
        Surname   = "Smith"
    },
    [PSCustomObject]@{
        givenName = "Charles"
        Surname   = "Miner"
    }
)
### Add a new user to the array
$array += [PSCustomObject]@{
    givenName = "Bill"
    Surname   = "Brown"
}
### Remove Charles (keep every entry that is not givenName Charles and Surname Miner)
### Wrapping in @() keeps the result an array even if only one entry remains
$array = @($array | Where-Object {!($_.givenName -eq 'Charles' -and $_.Surname -eq 'Miner')})
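To see the rebuild happen, you can keep a second reference to the array before using +=. This is a minimal sketch with illustrative variable names; it relies on the .NET method [object]::ReferenceEquals to check whether two variables point at the same object:

```powershell
$array  = @(1, 2, 3)
$before = $array          # second reference to the same array object

$array += 4               # builds a new four-element array and reassigns $array

# The original three-element array is untouched; $array now points elsewhere
$sameObject = [object]::ReferenceEquals($before, $array)
Write-Host "Same object after +=: $sameObject"
Write-Host "Old count: $($before.Count), new count: $($array.Count)"
```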
Working with ArrayLists and Generic Lists is just as easy, with minor differences. The main thing to keep in mind is that Add and Remove each handle one item at a time. Once you have your array created, $array.Add($value) adds to it and $array.Remove($value) removes from it. Note that an ArrayList's Add returns the index of the new item, so pipe it to Out-Null if you don't want that number in your output. The $value when adding can be anything (mind the data type if using a Generic List), though it has to be the exact object when removing. You can get that object by piping the array to Where-Object, either directly within the Remove call or separately in a variable. See below for examples:
### Create ArrayList
$users = [System.Collections.ArrayList]::new()
### Add two users
$users.Add([PSCustomObject]@{
    givenName = "John"
    Surname   = "Smith"
}) | Out-Null
$users.Add([PSCustomObject]@{
    givenName = "Bill"
    Surname   = "Brown"
}) | Out-Null
### Remove John, query within Remove
$users.Remove(($users | where {$_.givenName -eq "John"}))
### Create Generic List
$users2 = [System.Collections.Generic.List[PSCustomObject]]::new()
### Add two users
$users2.Add([PSCustomObject]@{
    givenName = "John"
    Surname   = "Smith"
})
$users2.Add([PSCustomObject]@{
    givenName = "Bill"
    Surname   = "Brown"
})
### Remove John, use variable for Remove
$john = $users2 | where {$_.givenName -eq "John"}
$users2.Remove($john)
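When several items need to go in or out at once, the underlying .NET types offer bulk methods as well: AddRange on both ArrayList and Generic List, and RemoveAll on Generic List only. Below is a rough sketch with made-up names; the variable names are illustrative, and the [Predicate[string]] cast turns the script block into the filter that RemoveAll expects:

```powershell
$names = [System.Collections.Generic.List[string]]::new()

# Add several values in one call
$names.AddRange([string[]]@('John', 'Bill', 'Charles', 'Charles'))

# Remove every entry that matches the filter; RemoveAll returns how many were removed
$removedCount = $names.RemoveAll([Predicate[string]]{ param($name) $name -eq 'Charles' })

Write-Host "Removed $removedCount, remaining: $($names -join ', ')"
```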
What Was That About Saving Time?
I regularly work with sets of thousands of users, sometimes tens of thousands. Your scenario may be similar, or tens of thousands of client systems, or hundreds of thousands of logs on your domain controllers. Depending on your situation, you may want a more efficient way of managing arrays. For me it was streamlining my Azure runbooks, where you're billed for every minute of run time. I could see a good portion of my run time was building out the arrays. If you're like me, you learned @() to create the array, use a foreach loop to make your transformations, and use += to add or filter with Where-Object to remove. Easy. Why fix what isn't broken? Let's run an experiment to address that.
If you have a few minutes to spare, copy and run the script below. You will start with a dataset of integers, 1 through 100,000. Simple data, nothing transformative. You will create a Standard Array and add all values to it individually, then do the same with an ArrayList, and then with a Generic List. Each portion is timed and will display the time in seconds. Your mileage may vary, and each run may take a slightly different amount of time, but you will likely come to the same conclusion each time. Standard Arrays are horribly inefficient for repetitive changes.
$testNumbers = @(1..100000)
### Standard Array test
$arrTest = @()
$start1 = Get-Date
foreach ($number in $testNumbers) {
    $arrTest += $number
}
$stopwatch1 = ((Get-Date)-$start1)
$time1 = $stopwatch1.TotalSeconds
Write-Host "Standard Array - time to complete additions in seconds: $time1" -ForegroundColor Green
### ArrayList Test
$arrTest = [System.Collections.ArrayList]::new()
$start2 = Get-Date
foreach ($number in $testNumbers) {
    $arrTest.Add($number) | Out-Null
}
$stopwatch2 = ((Get-Date)-$start2)
$time2 = $stopwatch2.TotalSeconds
Write-Host "ArrayList - time to complete additions in seconds: $time2" -ForegroundColor Green
### Generic List Test
$arrTest = [System.Collections.Generic.List[int]]::new()
$start3 = Get-Date
foreach ($number in $testNumbers) {
    $arrTest.Add($number) # List[int].Add returns nothing, so Out-Null is not needed
}
$stopwatch3 = ((Get-Date)-$start3)
$time3 = $stopwatch3.TotalSeconds
Write-Host "Generic List - time to complete additions in seconds: $time3" -ForegroundColor Green
I've run this a handful of times while writing this article to get an average and prove this is consistent. Standard Arrays are averaging around 220 seconds, ArrayLists around 1 second, and Generic Lists around 10 seconds. That makes them roughly 220 times and 22 times faster than the Standard Array, or about a 99.5% and 95.5% reduction in processing time, respectively. The difference is huge. If you are not concerned with the type of data being thrown in your arrays, ArrayLists can drastically cut down on processing time. If you want some peace of mind knowing only your desired data type is being picked up, Generic Lists may be for you. Standard Arrays are still great if you are declaring a static grouping of data, but you may want to avoid them if you are doing a lot of adding or removing.
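If you would rather not manage start times by hand, the same kind of comparison can be sketched with the built-in Measure-Command cmdlet. This version assumes a smaller dataset of 10,000 integers so it finishes quickly; the variable names are illustrative, and the timings will differ by machine and PowerShell version:

```powershell
$testNumbers = 1..10000

# Time += on a Standard Array
$arraySeconds = (Measure-Command {
    $arr = @()
    foreach ($number in $testNumbers) { $arr += $number }
}).TotalSeconds

# Time Add on a Generic List
$listSeconds = (Measure-Command {
    $list = [System.Collections.Generic.List[int]]::new()
    foreach ($number in $testNumbers) { $list.Add($number) }
}).TotalSeconds

Write-Host "Standard Array: $arraySeconds seconds" -ForegroundColor Green
Write-Host "Generic List: $listSeconds seconds" -ForegroundColor Green
```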