How to count characters in groups in a String – Python for ML

If you were to solve this in C/C++, you would have to rely on basic language features only. Python has many features that help us solve this problem easily.

In this post I’ll explain how we can count the characters in a given string and output the result in an expected format.

The Problem

Input : aaabbccccd

Output : a3b2c4d1

Understanding the Problem

Input to the program is a string or text

The text is not of fixed length, which means your program should be able to handle input of any length

Your task is to count the number of characters in each group and output the counts in the same order as the groups appear.

The Approach

This is a simple problem and there are multiple ways to solve this in Python.

First, our program should be able to read each character in the input string. This can be achieved through a for loop.

Second, we need to count the number of occurrences of each character and store the running count somewhere as we loop. No, we can’t use four separate variables for this, because the characters differ from one input to another. For example, if the input is zzzzxxxyy then the output should be z4x3y2.

We’ll use a ‘dictionary’ in Python to do this job. A dictionary is a collection of key-value pairs in which every key is unique, which makes it suitable for our job. Since Python 3.7 a dictionary also remembers the order in which keys were inserted, and that is what keeps our output in the same order as the input. You can read more about it here.

Note : You can also use the Counter class from ‘collections’. Since Python 3.7 it keeps keys in the order they were first seen, just like a plain dictionary, so it would work too. We’ll use a plain dictionary in our example so that every step is visible.

Third, the dictionary stores the result as key-value pairs. Hence we’ll have to re-format it to suit our needs.
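For comparison, here is a minimal sketch of the Counter variant. It assumes Python 3.7 or later, where Counter (a dict subclass) iterates in first-insertion order. The names text and counts are illustrative only and are not part of the program below.

```python
from collections import Counter

text = "aaabbccccd"
counts = Counter(text)  # counts every character in one call

# On Python 3.7+, items() follows the order in which characters were first seen.
output = "".join(f"{char}{count}" for char, count in counts.items())
print(output)  # a3b2c4d1
```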

Show me the Code

''' Code to count the number of characters within each group, in a string '''

input = "aaabbccccd"
dictionary = {}

for i in range(len(input)):
    dictionary[input[i]] = dictionary.get(input[i], 0)+1

output = ""
for key in dictionary:
    output = output + f'{key}{dictionary[key]}'

print(output)

# output : a3b2c4d1
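One thing to keep in mind: the dictionary counts every occurrence of a character in the whole string, so it matches the expected output only when each character appears in a single contiguous group, as in our inputs. If a character could come back later (for example aabaa), a sketch like the following, using itertools.groupby from the standard library, counts each run separately. The function name count_groups is an illustrative choice and not part of the program above.

```python
from itertools import groupby

def count_groups(text):
    """Return each run of identical characters followed by the run's length."""
    # groupby yields (character, iterator over that consecutive run)
    return "".join(f"{char}{sum(1 for _ in run)}" for char, run in groupby(text))

print(count_groups("aaabbccccd"))  # a3b2c4d1
print(count_groups("aabaa"))       # a2b1a2
```

For aabaa the dictionary approach would print a4b1 instead, because both groups of ‘a’ share one key.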

Explanation

Variables : We have declared three variables:

input – to hold the input value

dictionary – to store the count for each character in the input string

output – to store the formatted string required


for i in range(len(input)):

The for loop visits every position in the string one by one, and input[i] gives the character at that position. There are two functions being called here.

len() – returns the actual length of the input string; in our case it is 10.

range() – is a built-in Python function that returns a range object, which is iterable. A common use of range() is to loop over the indices of a sequence type (list, string, etc.) with a for loop.

Note that you can’t simply write for i in len(input), because len() returns an integer and an integer is not iterable.
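A small sketch of why the bare len() call fails, and of the shortcut Python offers. The names text, message and characters are illustrative only.

```python
text = "aaabbccccd"

# len() returns a plain integer, which a for loop cannot iterate over.
try:
    for i in len(text):
        pass
except TypeError as error:
    message = str(error)
    print(message)  # 'int' object is not iterable

# A string is itself iterable, so the index can be skipped entirely.
characters = [char for char in text]
print(characters[:4])  # ['a', 'a', 'a', 'b']
```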


Let’s see the line inside the for loop.

dictionary[input[i]] = dictionary.get(input[i], 0)+1

A dictionary object has a method called ‘get()’ that returns the value for the key that is provided. It can be made to return any other value if the key is unavailable. This helps us assign a value to a key the first time we see it.
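The behaviour of get() can be tried on its own with a tiny dictionary. This is only a sketch; the dictionary counts and its contents are made up for the illustration.

```python
counts = {"a": 2}

print(counts.get("a", 0))  # 2, the stored value
print(counts.get("b", 0))  # 0, the fallback because "b" is missing
print(counts.get("b"))     # None, the fallback when no second argument is given

# get() never adds the key by itself; the assignment does that.
counts["b"] = counts.get("b", 0) + 1
print(counts)  # {'a': 2, 'b': 1}
```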

Iteration 1:

Input value is ‘a’. The dictionary is empty.

dictionary.get(input[i], 0) will return zero because ‘a’ is unavailable.

Hence we add 1 to that and store ‘1’ in key ‘a’

dictionary = {‘a’:1}

Iteration 2:

Input value is again ‘a’.

dictionary.get(input[i], 0) will return ‘1’ because there’s already a value.

Hence we add 1 to that and store ‘2’ in key ‘a’

dictionary = {‘a’:2}

Iteration 3:

……..

Iteration 4:

Input value now is ‘b’.

dictionary.get(input[i], 0) will return zero because ‘b’ is unavailable.

Hence we add 1 to that and store ‘1’ in key ‘b’

dictionary = {‘a’:3, ‘b’:1}

This will continue and after 10 iterations, the dictionary will be {‘a’:3, ‘b’:2, ‘c’:4, ‘d’:1}
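To watch these iterations yourself, a sketch like the one below prints the dictionary after every step. It iterates over the string directly with enumerate() instead of range(len(...)), and the names text, counts and snapshots are illustrative only.

```python
text = "aaabbccccd"
counts = {}
snapshots = []  # a copy of the dictionary after every iteration

for position, char in enumerate(text, start=1):
    counts[char] = counts.get(char, 0) + 1
    snapshots.append(dict(counts))
    print(f"Iteration {position}: {char!r} -> {counts}")
```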


Next task is to print the dictionary in required format.

output = ""
for key in dictionary:
    output = output + f'{key}{dictionary[key]}'

We declare an empty string ‘output’ to store the final string.

The for loop iterates over each key in our dictionary (a, b, c, d), in the order the keys were inserted. In each iteration, we take the key and its value from the dictionary, join them without any prefix/suffix, and append the result to our ‘output’ variable.

We use f-strings to format the output.

F-strings are the string formatting mechanism introduced by PEP 498 in Python 3.6. F-strings provide a convenient way to embed Python expressions inside string literals.

Finally, the expected output will be available in the variable ‘output’
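A few standalone f-string examples, as a sketch. The variables key and count hold made-up values for the illustration.

```python
key = "a"
count = 3

plain = f"{key}{count}"                # variables placed side by side
expression = f"{key}{count + 1}"       # any expression is evaluated in place
labelled = f"{key.upper()} x {count}"  # method calls work too

print(plain, expression, labelled)  # a3 a4 A x 3
```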

print(output)
#output : a3b2c4d1

Conclusion

Feel free to debug this code on your own. Change the input and see if you get the expected output. Use print statements inside every loop to understand the variables and what they do.
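To make trying different inputs easier, the same logic can be wrapped in a function. This is one possible sketch: the name count_characters is hypothetical, and it uses str.join() instead of repeated string concatenation to build the result.

```python
def count_characters(text):
    """Count each character with a dictionary and format the counts as key+count pairs."""
    counts = {}
    for char in text:
        counts[char] = counts.get(char, 0) + 1
    # items() yields (key, value) pairs in insertion order on Python 3.7+
    return "".join(f"{key}{value}" for key, value in counts.items())

for sample in ["aaabbccccd", "zzzzxxxyy", ""]:
    print(repr(sample), "->", repr(count_characters(sample)))
```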

I hope this explanation helps you to understand the program better.