Larry Steinle

January 30, 2011

String.Split On Steroids

Filed under: RegEx,VS.Net — Larry Steinle @ 9:25 am

The String.Split function uses a separator to divide a string into an array of strings. Unfortunately, String.Split does not support text qualifiers, so when the separator appears inside a text-qualified block of characters, that block is split as well.
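To make the problem concrete, here is a small, self-contained sketch that uses only the built-in String.Split. The module name SplitProblemDemo and the helper NaiveSplit are illustrative names and are not part of the code developed in this article.

```vbnet
Imports System

Module SplitProblemDemo
  'Splits on every comma with the built-in String.Split; text qualifiers are ignored.
  Public Function NaiveSplit(ByVal value As String) As String()
    Return value.Split(","c)
  End Function

  Sub Main()
    'The comma inside 'Test,2b' belongs to the data, but String.Split still splits on it,
    'so this prints four lines: Test1, 'Test, 2b' and Test3.
    For Each part As String In NaiveSplit("Test1,'Test,2b',Test3")
      Console.WriteLine(part)
    Next
  End Sub
End Module
```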

In this article we will create a new extension method, FullSplit, that implements the same basic functionality as String.Split with added support for text qualifiers and assignment operators. When an assignment operator is used, the return value is a StringDictionary in which the left side of the operator becomes the DictionaryEntry.Key property and the right side becomes the DictionaryEntry.Value property.

We conclude the article by updating the Collapse function from a previous post, Extending IEnumerable, so that it supports dictionary entries by separating each key/value pair with an operator.

This article is the second edition of an article I previously wrote and posted on The Code Project titled "Split Function that Supports Text Qualifiers." In this version I am simply refactoring the code into a format that is cleaner and easier to maintain. Extension methods didn't exist when the original article was written, and a quick comparison of the two articles shows how much easier they can make life!

To give credit where credit is due, I would like to thank Abishek Bellamkonda once again for helping with the regular expression used in both the original article and today's post.

Creating the Interface

Our interface needs to support the following capabilities:

  • Split a value into a string array even when text-qualified blocks contain the separator.
  • Split a value that contains key/value pairs into a StringDictionary. The key cannot contain the separator.
  • Remove the enclosing text qualifiers from split values so that the raw text block can be accessed easily.
  • Compress an array or list into a single text value optionally adding text qualifiers to the beginning and end of each value.
  • Compress a Dictionary into a single text value where the key/value pair is preserved by separating the pair with an operator character and optionally adding the text qualifier around the value.
  • Compress a collection of objects in the same manner as a Dictionary by specifying a key property name and value property name.

In order to achieve these objectives we will need to create a new module with the following routines:

Public Module ParseExtensions
  'The Extension attribute lets these be called as value.FullSplit(...) and value.Collapse(...).
  <Extension()> Public Function FullSplit(ByVal value As String, ByVal separator As Char, ByVal qualifier As Char) As String()
    Throw New NotImplementedException("FullSplit")
  End Function

  <Extension()> Public Function FullSplit(ByVal value As String, ByVal separator As Char, ByVal qualifier As Char, ByVal assignmentOperator As Char) As StringDictionary
    Throw New NotImplementedException("FullSplit")
  End Function

  <Extension()> Public Function Collapse(ByVal value As IEnumerable, ByVal separator As Char, ByVal qualifier As Char, ByVal assignmentOperator As Char, ByVal keyPropertyName As String, ByVal valuePropertyName As String) As String
    Throw New NotImplementedException("Collapse")
  End Function
End Module

To make these functions easier to call, we add a few convenience overloads:

Public Module ParseExtensions
  'Passing Nothing for a Char parameter means "not used" (it becomes ChrW(0)).
  <Extension()> Public Function FullSplit(ByVal value As String, ByVal separator As Char) As String()
    Return FullSplit(value, separator, Nothing)
  End Function

  <Extension()> Public Function Collapse(ByVal value As IEnumerable, ByVal separator As Char) As String
    Return Collapse(value, separator, Nothing, Nothing, Nothing, Nothing)
  End Function

  <Extension()> Public Function Collapse(ByVal value As IEnumerable, ByVal separator As Char, ByVal propertyName As String) As String
    Return Collapse(value, separator, Nothing, Nothing, Nothing, propertyName)
  End Function

  <Extension()> Public Function Collapse(ByVal value As IEnumerable, ByVal separator As Char, ByVal qualifier As Char) As String
    Return Collapse(value, separator, qualifier, Nothing, Nothing, Nothing)
  End Function

  <Extension()> Public Function Collapse(ByVal value As IEnumerable, ByVal separator As Char, ByVal qualifier As Char, ByVal propertyName As String) As String
    Return Collapse(value, separator, qualifier, Nothing, Nothing, propertyName)
  End Function

  <Extension()> Public Function Collapse(ByVal value As IEnumerable, ByVal separator As Char, ByVal qualifier As Char, ByVal assignmentOperator As Char) As String
    Return Collapse(value, separator, qualifier, assignmentOperator, Nothing, Nothing)
  End Function
End Module

Why Support an Assignment Operator?

In case you are wondering why the split function supports assignment operators, consider command line arguments. Environment.GetCommandLineArgs returns the arguments as a string array, using spaces as separators, while Environment.CommandLine returns the entire command line as a single string.

Neither function returns the arguments as key/value pairs. With our new FullSplit function we can build a dictionary of key/value pairs from the command line. When an argument has no assignment operator, it is simply stored as both the key and the value. You can then test for on/off switches with StringDictionary.ContainsKey and read a value through the StringDictionary.Item property, which makes command line argument parsing much easier!
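As a sketch of the lookup side, the following module builds by hand the StringDictionary that FullSplit is expected to produce for the hypothetical arguments /verbose /out="C:\My Files\log.txt" (split on spaces, with a double quote as the qualifier and = as the operator). The argument names and the demo module are assumptions for illustration. Note that StringDictionary stores its keys in lowercase, so lookups are case-insensitive.

```vbnet
Imports System
Imports System.Collections.Specialized

Module CommandLineLookupDemo
  'Hypothetical: the dictionary FullSplit(" "c, """"c, "="c) should return for
  '  /verbose /out="C:\My Files\log.txt"
  'It is built by hand here, not captured from the real function.
  Public Function SampleArguments() As StringDictionary
    Dim args As New StringDictionary()
    args.Add("/verbose", "/verbose")          'no operator: key and value are the same
    args.Add("/out", "C:\My Files\log.txt")   'qualifier already removed from the value
    Return args
  End Function

  Sub Main()
    Dim args As StringDictionary = SampleArguments()

    'On/off switch: just test whether the key is present.
    If args.ContainsKey("/verbose") Then Console.WriteLine("Verbose output enabled")

    'Valued argument: read it through the default Item property.
    If args.ContainsKey("/out") Then Console.WriteLine("Writing log to " & args("/out"))

    'Keys are lowercased by StringDictionary, so this prints True.
    Console.WriteLine(args.ContainsKey("/VERBOSE"))
  End Sub
End Module
```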

The FullSplit Function

Before getting into the code, remember to add Imports statements for the required namespaces:

Imports System.Collections 'DictionaryEntry located here
Imports System.Collections.Specialized 'StringDictionary located here
Imports System.Text.RegularExpressions 'RegEx located here
Imports System.Reflection 'PropertyInfo located here
Imports System.Runtime.CompilerServices 'Extension attribute located here


Our custom FullSplit function is simple and straightforward. We begin by building a regular expression from the separator and the qualifier. Then we apply the expression to the value. Finally, we remove the qualifier from the beginning and end of each result.
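The pattern deserves a closer look. For a comma separator and an apostrophe qualifier it becomes ,(?=(?:[^']*'[^']*')*(?![^']*')). A comma matches only if the rest of the string consists of complete pairs of qualifiers followed by no unmatched qualifier; a comma inside a qualified block is followed by an odd number of qualifiers, so it is skipped. The sketch below, with the illustrative names QualifiedSplitPatternDemo and BuildPattern, rebuilds the same pattern and applies Regex.Split directly, before any qualifiers are removed.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module QualifiedSplitPatternDemo
  'Builds the same lookahead pattern that FullSplit uses.
  Public Function BuildPattern(ByVal separator As Char, ByVal qualifier As Char) As String
    Return String.Format("{0}(?=(?:[^{1}]*{1}[^{1}]*{1})*(?![^{1}]*{1}))",
                         Regex.Escape(separator.ToString()), Regex.Escape(qualifier.ToString()))
  End Function

  Sub Main()
    Dim pattern As String = BuildPattern(","c, "'"c)
    Console.WriteLine(pattern)   'Prints ,(?=(?:[^']*'[^']*')*(?![^']*'))

    'Prints Test1, 'Test2a', 'Test,2b' and Test3 on separate lines; the qualifiers are
    'still attached because removing them is a separate step in FullSplit.
    For Each part As String In Regex.Split("Test1,'Test2a','Test,2b',Test3", pattern)
      Console.WriteLine(part)
    Next
  End Sub
End Module
```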

Because there are two versions of FullSplit, one returning a string array and the other returning a StringDictionary, I've moved the logic that removes the text qualifier into separate routines. If needed, these helper routines can be made Public and turned into extensions as well.

  <Extension()> Public Function FullSplit(ByVal value As String, ByVal separator As Char, ByVal qualifier As Char) As String()
    'Match the separator only when an even number of qualifiers follows it,
    'that is, when the separator is not inside a qualified block of text.
    Dim regExPattern As String = String.Format("{0}(?=(?:[^{1}]*{1}[^{1}]*{1})*(?![^{1}]*{1}))", Regex.Escape(separator.ToString), Regex.Escape(qualifier.ToString))
    'MultiLine has no effect (the pattern has no ^ or $ anchors), and IgnoreCase is omitted
    'so that a letter used as a separator or qualifier only matches its exact case.
    Dim results As String() = Regex.Split(value, regExPattern, RegexOptions.Compiled)
    Return RemoveQualifier(results, qualifier)
  End Function

  Private Function RemoveQualifier(ByVal value As IEnumerable, ByVal qualifier As Char) As String()
    Return RemoveQualifier(value, qualifier, Nothing)
  End Function

  Private Function RemoveQualifier(ByVal value As IEnumerable, ByVal qualifier As Char, ByVal propertyName As String) As String()
    Dim results As New ArrayList
    Dim instanceValue As Object

    For Each item As Object In value
      If propertyName IsNot Nothing AndAlso propertyName.Trim.Length > 0 Then
        'Read the text from the named property of each object.
        Dim propertyItem As PropertyInfo = item.GetType.GetProperty(propertyName)
        If propertyItem Is Nothing Then Throw New ArgumentException("Property not found: " & propertyName, "propertyName")
        instanceValue = propertyItem.GetValue(item, Nothing)
      Else
        instanceValue = item
      End If

      If instanceValue Is Nothing Then
        results.Add(Nothing)
      Else
        results.Add(RemoveQualifier(instanceValue.ToString, qualifier))
      End If
    Next

    Return CType(results.ToArray(GetType(String)), String())
  End Function

  Private Function RemoveQualifier(ByVal value As String, ByVal qualifier As Char) As String
    Dim result As String = value
    'Compare characters directly, skip the check when no qualifier was supplied, and
    'require two characters so that a lone qualifier does not break Substring.
    If qualifier <> Nothing AndAlso result IsNot Nothing AndAlso result.Length >= 2 _
       AndAlso result(0) = qualifier AndAlso result(result.Length - 1) = qualifier Then
      result = result.Substring(1, result.Length - 2)
    End If
    Return result
  End Function

The next routine converts a string of key/value pairs into a StringDictionary. It reuses the previous FullSplit function to get a string array of key/value pairs, splits each pair on the assignment operator, and adds the result to the StringDictionary. Again, the beginning and ending text qualifiers are removed before the value is added to the collection. If a pair contains no assignment operator, the original text is saved as both the key and the value.
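The pair-splitting rule can be tried in isolation. SplitPair below is a hypothetical helper that mirrors the loop body of the routine that follows, assuming a single pair and leaving out the separator cleanup. It shows that only the first operator separates key from value (so a value may itself contain the operator) and that a pair without an operator maps to itself.

```vbnet
Imports System

Module PairSplitDemo
  'Hypothetical helper for illustration, not part of ParseExtensions.
  'Returns {key, value} for one pair.
  Public Function SplitPair(ByVal pair As String, ByVal assignmentOperator As Char, ByVal qualifier As Char) As String()
    Dim keyName As String = pair
    Dim keyValue As String = pair
    Dim index As Integer = pair.IndexOf(assignmentOperator)

    'Split on the first operator only.
    If index > 0 Then
      keyName = pair.Substring(0, index)
      keyValue = pair.Substring(index + 1)
    End If

    'Strip one pair of enclosing qualifiers from the value.
    If keyValue.Length >= 2 AndAlso keyValue(0) = qualifier AndAlso keyValue(keyValue.Length - 1) = qualifier Then
      keyValue = keyValue.Substring(1, keyValue.Length - 2)
    End If

    Return New String() {keyName, keyValue}
  End Function

  Sub Main()
    Console.WriteLine(String.Join(" | ", SplitPair("Filter='a=b'", "="c, "'"c)))   'Prints Filter | a=b
    Console.WriteLine(String.Join(" | ", SplitPair("/verbose", "="c, "'"c)))       'Prints /verbose | /verbose
  End Sub
End Module
```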

  <Extension()> Public Function FullSplit(ByVal value As String, ByVal separator As Char, ByVal qualifier As Char, ByVal assignmentOperator As Char) As StringDictionary
    Dim results As New StringDictionary

    For Each pair As String In FullSplit(value, separator, qualifier)
      Dim indexOfOperator As Integer = pair.IndexOf(assignmentOperator)
      Dim keyName As String = pair
      Dim keyValue As String = pair

      'Split on the first operator only, so the value itself may contain the operator.
      If indexOfOperator > 0 Then
        keyName = pair.Substring(0, indexOfOperator)
        keyValue = pair.Substring(indexOfOperator + 1)
      End If

      keyName = keyName.Replace(separator.ToString, String.Empty)
      keyValue = RemoveQualifier(keyValue, qualifier)

      'Assign through the Item property rather than calling Add, so a repeated key
      'replaces the earlier value instead of throwing an ArgumentException.
      results(keyName) = keyValue
    Next

    Return results
  End Function

Updating the Collapse Function to Support Assignment Operators

We finish by modifying the Collapse extension for the IEnumerable interface: replace the propertyName parameter with keyPropertyName and valuePropertyName parameters. The modified routine still supports collections of simple values as well as collections of objects.

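For a collection of objects, specify which property holds the value to save by passing its name in either the keyPropertyName parameter or the valuePropertyName parameter. To save a key/value pair, pass the name of the property that contains the key in keyPropertyName and the name of the property that contains the value in valuePropertyName. When an assignment operator is supplied and neither name is given, DictionaryEntry items (such as those from a StringDictionary) automatically use their Key and Value properties.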
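The property lookup that Collapse performs is ordinary reflection: Type.GetProperty finds a property by name and PropertyInfo.GetValue reads it from an instance. The sketch below uses a hypothetical Setting class and a simplified, hypothetical JoinPairs helper (no qualifiers and no DictionaryEntry handling) to show how a key property name and a value property name turn objects into key=value text.

```vbnet
Imports System
Imports System.Collections
Imports System.Collections.Generic
Imports System.Reflection

'Hypothetical item type used only for this illustration.
Public Class Setting
  Public Property Name As String
  Public Property Value As String
End Class

Module PropertyPairDemo
  'Hypothetical, simplified helper: reads the key and value of each item by
  'property name through reflection and joins them as key=value pairs.
  Public Function JoinPairs(ByVal items As IEnumerable, ByVal keyPropertyName As String, ByVal valuePropertyName As String) As String
    Dim parts As New List(Of String)
    For Each item As Object In items
      Dim itemType As Type = item.GetType()
      Dim keyProperty As PropertyInfo = itemType.GetProperty(keyPropertyName)
      Dim valueProperty As PropertyInfo = itemType.GetProperty(valuePropertyName)
      parts.Add(String.Format("{0}={1}", keyProperty.GetValue(item, Nothing), valueProperty.GetValue(item, Nothing)))
    Next
    Return String.Join(",", parts.ToArray())
  End Function

  Sub Main()
    Dim settings As Setting() = New Setting() {
      New Setting With {.Name = "Timeout", .Value = "30"},
      New Setting With {.Name = "Retries", .Value = "3"}
    }
    Console.WriteLine(JoinPairs(settings, "Name", "Value"))   'Prints Timeout=30,Retries=3
  End Sub
End Module
```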

Because the Collapse routine is a little more complicated, inline comments explain how the logic implements the functional requirements.


Public Module ParseExtensions
  <Extension()> Public Function Collapse(ByVal value As IEnumerable, ByVal separator As Char, ByVal qualifier As Char, ByVal assignmentOperator As Char, ByVal keyPropertyName As String, ByVal valuePropertyName As String) As String
    Dim unitedValue As String = String.Empty
    Dim instanceKey As Object
    Dim instanceValue As Object

    For Each item As Object in value
      'Check for DictionaryEntry
      If assignmentOperator <> Nothing _
      AndAlso (keyPropertyName Is Nothing OrElse keyPropertyName.Trim.Length = 0) _
      AndAlso (valuePropertyName Is Nothing OrElse valuePropertyName.Trim.Length = 0) _
      AndAlso TypeOf item Is DictionaryEntry Then
        keyPropertyName = "Key"
        valuePropertyName = "Value"
      End If

      'Get Key from Object
      If keyPropertyName IsNot Nothing AndAlso keyPropertyName.Trim.Length > 0 Then
        Dim propertyItem As PropertyInfo = item.GetType.GetProperty(keyPropertyName)
        instanceKey = propertyItem.GetValue(item, Nothing)
      Else
        instanceKey = item
      End If

      'Get Value from Object
      If valuePropertyName IsNot Nothing AndAlso valuePropertyName.Trim.Length > 0 Then
        Dim propertyItem As PropertyInfo = item.GetType.GetProperty(valuePropertyName)
        instanceValue = propertyItem.GetValue(item, Nothing)
      Else
        instanceValue = item
      End If

      If assignmentOperator <> Nothing Then
        'When uniting a key/value pair we have to have a key name.
        'In the event that a key is not specified use the value.
        If keyPropertyName IsNot Nothing AndAlso keyPropertyName.Trim.Length > 0 Then
          If instanceKey IsNot Nothing Then unitedValue &= instanceKey.ToString.Replace(separator, String.Empty)
        Else
          If instanceValue IsNot Nothing Then unitedValue &= instanceValue.ToString.Replace(separator, String.Empty)
        End If
        unitedValue &= assignmentOperator
      End If

      'Append the value, wrapped in qualifiers when a qualifier was supplied.
      'If no value property is named, fall back to instanceKey, which holds either the
      'key property's value or, when no key property is named either, the item itself.
      If qualifier <> Nothing Then unitedValue &= qualifier
      If valuePropertyName IsNot Nothing AndAlso valuePropertyName.Trim.Length > 0 Then
        If instanceValue IsNot Nothing Then unitedValue &= instanceValue.ToString
      Else
        If instanceKey IsNot Nothing Then unitedValue &= instanceKey.ToString
      End If
      If qualifier <> Nothing Then unitedValue &= qualifier
      If separator <> Nothing Then unitedValue &= separator
    Next

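    'Drop the separator appended after the last item (compare characters, not strings).
    If separator <> Nothing AndAlso unitedValue.Length > 0 AndAlso unitedValue(unitedValue.Length - 1) = separator Then unitedValue = unitedValue.Substring(0, unitedValue.Length - 1)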

    Return unitedValue
  End Function
End Module

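To demonstrate that the functions work, use the following test code. Remember that StringDictionary does not guarantee the order of its entries and that it stores keys in lowercase, so the collapsed key/value string may list the pairs in a different order, with keys such as key1 instead of Key1.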

Dim collapsedValue As String
Dim parsedValue() As String
Dim parsedKeyValue As StringDictionary

'parsedValue becomes {"Test1", "Test2a", "Test,2b", "Test3"}
collapsedValue = "Test1,'Test2a','Test,2b',Test3"
parsedValue = collapsedValue.FullSplit(","c, "'"c)
'Collapse qualifies every value: 'Test1','Test2a','Test,2b','Test3'
collapsedValue = parsedValue.Collapse(","c, "'"c)

'parsedKeyValue holds the keys key1, key2a, key2b and key3 (lowercased by StringDictionary)
collapsedValue = "Key1=Test1,Key2a='Test2a',Key2b='Test,2b',Key3=Test3"
parsedKeyValue = collapsedValue.FullSplit(","c, "'"c, "="c)
'Produces pairs such as key1='Test1' and key2b='Test,2b' in no guaranteed order
collapsedValue = parsedKeyValue.Collapse(","c, "'"c, "="c)
