Parse (split) or Tokenize a String in C++ Using Delimiter

In this tutorial, you will learn how to parse a string or in other words how to split a string in C++ using a string delimiter.

Tokenizing, parsing, or splitting a string refers to the process of dividing a string into smaller pieces (tokens) according to delimiters. There are numerous approaches to tokenizing or splitting a string. std::string has no built-in split member function, so you have to combine other facilities of the C++ standard library to perform this task.

I will discuss several ways to split a string in C++, each with a code example, so you can use whichever method best suits your application.

Method 1: Using the find and substr Functions of the String Library to Parse a String

You can use the std::string::find() function to get the index of the delimiter at which you need to split the string, and then use the std::string::substr() function to extract the token that ends at that index.

Let us see these functions at work in the code example below. You need to include the <string> header to use them.

#include <iostream>
#include <string>
using namespace std;

int main(){

    // Initializing the string to be parsed or tokenized.
    string givenString = "Split or Parse or Tokenize this String";

    // Initializing the delimiter and the position of the next match
    string requiredDelimiter = " ";

    size_t tokenPos = 0;

    // Holds each token extracted with substr
    string tokenizedString;

    while((tokenPos = givenString.find(requiredDelimiter)) != string::npos){
        tokenizedString = givenString.substr(0, tokenPos);

        cout<<tokenizedString<<endl;

        // Removing the printed token and the whole delimiter from the original string
        givenString.erase(0, tokenPos + requiredDelimiter.length());
    }
    // Whatever remains after the last delimiter is the final token
    cout<<givenString<<endl;

    return 0;
}

Output:

Split
or
Parse
or
Tokenize
this
String

As you can see in the above code, we keep looping for as long as we can find the delimiter in the string, and we print the token that precedes it. On each pass we also shrink the original string by erasing the token that was just split out, so the next search starts at the following token. Note that this destroys the original string, so work on a copy if you still need it afterwards.

Instead of a space, you can use any other delimiter you have in your string. For example, let’s say your string uses @ as the delimiter. The modified example below shows how to get the tokens with this different delimiter.

#include <iostream>
#include <string>
using namespace std;

int main(){

    // Initializing the string to be parsed or tokenized.
    string givenString = "Split@or@Parse@or@Tokenize@this@String";

    // Initializing the delimiter and the position of the next match
    string requiredDelimiter = "@";

    size_t tokenPos = 0;

    // Holds each token extracted with substr
    string tokenizedString;

    while((tokenPos = givenString.find(requiredDelimiter)) != string::npos){
        tokenizedString = givenString.substr(0, tokenPos);

        cout<<tokenizedString<<endl;

        // Removing the printed token and the whole delimiter from the original string
        givenString.erase(0, tokenPos + requiredDelimiter.length());
    }
    // Whatever remains after the last delimiter is the final token
    cout<<givenString<<endl;

    return 0;
}

Output:

Split
or
Parse
or
Tokenize
this
String

Method 2: Using Regex to Split or Parse the String Using Delimiter

You can use regex to split the string or get it tokenized easily. In the below example code I will show you how you can implement this regex solution to parse strings in C++.

#include <algorithm>
#include <iostream>
#include <regex>
#include <string>
#include <vector>
using namespace std;

vector<string> splitUsingRegex(const string& givenString, const regex& delimiter)
{
    // The -1 selects the text between matches, i.e. the tokens
    sregex_token_iterator it{ givenString.begin(), givenString.end(), delimiter, -1 };
    vector<string> tokenizedString{ it, {} };

    // Removing empty strings (for example when the input starts with a delimiter)
    tokenizedString.erase(
        remove_if(
            tokenizedString.begin(), tokenizedString.end(), [](string const& s) {
                return s.empty();
            }
        ),
        tokenizedString.end());

    return tokenizedString;
}


int main(){

    // Initializing the string to be parsed or tokenized.
    string givenString = "Split@or@Parse@or@Tokenize@this@String Using Regex";

    // One or more whitespace characters or '@' act as a single delimiter
    const regex requiredDelimiter(R"([\s@]+)");

    const vector<string> tokenized = splitUsingRegex(givenString, requiredDelimiter);

    for(const auto& token : tokenized)
        cout<<token<<endl;

    return 0;
}

Output:

Split
or
Parse
or
Tokenize
this
String
Using
Regex

As you can see in the above code, regex lets us use multiple delimiters at once: the string is split on both “@” and spaces. Although this is arguably the most intuitive solution to the problem, std::regex has a noticeable cost, because the pattern has to be compiled and then matched against the input.

For that reason, regex will usually be slower than the std::string::find() and std::string::substr() approach from the first method.

Method 3: Using String Stream to Split the String Using Delimiter

You can use the string stream classes from the <sstream> header. An input string stream (std::istringstream) lets you read the string as a stream, and std::getline then splits it by returning one token each time the delimiter character is found in the given string.

In the below code example we are implementing the string split or tokenization in C++ using a string stream. This method usually runs faster than the regex method discussed above, and it needs far fewer lines of code.

#include <iostream>
#include <sstream>
#include <string>
using namespace std;


int main(){

    // Initializing the string to be parsed or tokenized.
    string givenString = "Split@or@Parse@or@Tokenize@this@String";

    // Initializing the delimiter; getline accepts a single character
    char requiredDelimiter = '@';

    string tokenString;

    // Wrapping the string in an input string stream
    istringstream newStreamer(givenString);

    // Reading one token per call until the stream is exhausted
    while(getline(newStreamer, tokenString, requiredDelimiter)){

        cout<<tokenString<<endl;
    }
    return 0;
}

Output:

Split
or
Parse
or
Tokenize
this
String

Conclusion

That is all for this post on how to parse (split) a string in C++. As you saw in the code above, there are several different ways to achieve it.

Let me know in the comment section if you would like more methods added to this tutorial, or if you know of any code that is much faster than the methods mentioned above.

