Feature | split() | concat() | arrays_zip() | explode() |
---|---|---|---|---|
Description | Splits a string column into an array of substrings based on a delimiter. | Concatenates multiple arrays or strings into a single array or string. | Zips multiple arrays element-wise into a single array of structs. | Flattens an array into multiple rows, with one row per element in the array. |
Input Type | String | Arrays/Strings | Arrays | Array |
Output Type | Array of Strings | Array or String | Array of Structs | Multiple Rows (with original columns) |
Key Use Cases | Splitting strings based on delimiters (e.g., splitting comma-separated values). | Merging multiple arrays into one, or multiple strings into one. | Aligning data from multiple arrays element-wise, treating each set of elements as a row (struct). | Flattening arrays for row-by-row processing (e.g., after zipping or concatenating arrays). |
Example | split(col("string_col"), ",") → ["a", "b", "c"] | concat(col("array1"), col("array2")) → ["a", "b", "x", "y"] | arrays_zip(col("array1"), col("array2")) → [{'a', 1}, {'b', 2}] | explode(col("array_col")) → one row per array element. |
Handling Different Lengths | Not applicable. | Length does not matter: the arrays are appended end to end, in argument order. | If input arrays have different lengths, the shorter ones are padded with null. | Not applicable. Converts each element into a separate row, regardless of length. |
Handling null values | Returns null when the input string is null. Consecutive delimiters produce empty strings in the output array. | Returns null if any input column is null. Null elements inside the input arrays are kept in the result. | Inserts null values into the struct where an input array has null at the corresponding index. | Preserves null elements as rows containing null. A null or empty array produces no rows; explode_outer() keeps such rows. |

Breakdown:

- split() is used to break a single string into an array of substrings.
- concat() merges arrays or strings, resulting in a single array or string.
- arrays_zip() aligns elements from multiple arrays, creating an array of structs.
- explode() takes an array and converts it into multiple rows, one for each array element.
Please do not hesitate to contact me if you have any questions at William . chen @ mainri.ca
(remove all spaces from the email address 😊)