A short post about itertools.count() (an infinite counter).
What does itertools.count() do?
itertools.count(start=0, step=1) creates an iterator that returns evenly spaced numbers, beginning at start and advancing by step on every call to next():
>>> import itertools
>>> foo = itertools.count()
>>> foo
count(0)
>>> next(foo)
0
>>> next(foo)
1
>>> next(foo)
2
The snippet above uses the default values for start and step, but you can adjust them if needed:
>>> foo_with_start = itertools.count(5)
>>> next(foo_with_start)
5
>>> next(foo_with_start)
6
>>> foo_with_start_and_step = itertools.count(5, 10)
>>> next(foo_with_start_and_step)
5
>>> next(foo_with_start_and_step)
15
>>> next(foo_with_start_and_step)
25
It’s not mandatory to use integers as arguments; non-integer values work fine too:
>>> foo_floats = itertools.count(1.0, .25)
>>> next(foo_floats)
1.0
>>> next(foo_floats)
1.25
>>> next(foo_floats)
1.5
>>> next(foo_floats)
1.75
>>> next(foo_floats)
2.0
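One thing to keep in mind with float steps: 0.25 is exactly representable in binary, but a step such as 0.1 is not, so repeatedly adding it can accumulate rounding error. The itertools documentation suggests multiplicative code of the form (start + step * i for i in count()) for better accuracy. A small sketch comparing the two forms, with start and step picked only for illustration:

```python
import itertools

start, step = 0.0, 0.1  # illustrative values; 0.1 has no exact binary representation

# Additive form: count() keeps adding step to the running float.
additive = list(itertools.islice(itertools.count(start, step), 11))

# Multiplicative form: count integers and scale each one,
# so every value is computed independently of the previous ones.
multiplicative = list(
    itertools.islice((start + step * i for i in itertools.count()), 11)
)

# Compare the last value of each form; they may differ in the final digits.
print(additive[-1], multiplicative[-1])
```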
Usage examples
Here are some situations where you can leverage itertools.count().
Adding a counter to existing data
Let’s say you have a list of data points and you need to zip it with a counter. This can be easily done with itertools.count():
>>> data = [1.245, 1.123, 1.023, 1.231, 1.052]
>>> list(zip(itertools.count(1), data))
[
(1, 1.245),
(2, 1.123),
(3, 1.023),
(4, 1.231),
(5, 1.052)
]
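For a step of 1, the built-in enumerate() with its start argument gives the same pairs. count() is the more flexible option when the counter needs a different step. A quick sketch (the step of 10 is an arbitrary example):

```python
import itertools

data = [1.245, 1.123, 1.023, 1.231, 1.052]

with_count = list(zip(itertools.count(1), data))
with_enumerate = list(enumerate(data, start=1))
print(with_count == with_enumerate)  # True

# enumerate() always steps by 1; count() can label items 10, 20, 30, ...
labelled = list(zip(itertools.count(10, 10), data))
print(labelled[:2])  # [(10, 1.245), (20, 1.123)]
```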
Emulating a database sequence object
When writing scripts that generate fake data for various kinds of tests, I’d often write a generator function that emulates a database sequence object:
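from typing import Iterator

def db_sequence() -> Iterator[int]:
    # A generator function returns an iterator of ints, not a single int.
    num = 1
    while True:
        yield num
        num += 1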
When passed to next(), this generator object returns consecutive integers:
>>> seq = db_sequence()
>>> next(seq)
1
>>> next(seq)
2
>>> next(seq)
3
A final example script for generating fake records could look like this:
from random import choice
from typing import Any, Dict, Iterator, List

FIRST_NAMES = ["John", "Adam", "Philip"]
LAST_NAMES = ["Doe", "Sandler", "Glass"]

def db_sequence() -> Iterator[int]:
    num = 1
    while True:
        yield num
        num += 1

# Module-level sequence shared by every call to generate_records().
db_seq = db_sequence()

def generate_records(batch_size: int = 1) -> List[Dict[str, Any]]:
    return [
        {
            "id": next(db_seq),
            "first_name": choice(FIRST_NAMES),
            "last_name": choice(LAST_NAMES),
        }
        for _ in range(batch_size)
    ]
Now, when generate_records() is called multiple times, the id value keeps increasing with every record:
>>> generate_records(5)
[{'id': 1, 'first_name': 'Adam', 'last_name': 'Doe'},
{'id': 2, 'first_name': 'Philip', 'last_name': 'Glass'},
{'id': 3, 'first_name': 'John', 'last_name': 'Doe'},
{'id': 4, 'first_name': 'John', 'last_name': 'Glass'},
{'id': 5, 'first_name': 'John', 'last_name': 'Sandler'}]
>>> generate_records(5)
[{'id': 6, 'first_name': 'John', 'last_name': 'Sandler'},
{'id': 7, 'first_name': 'John', 'last_name': 'Sandler'},
{'id': 8, 'first_name': 'Philip', 'last_name': 'Sandler'},
{'id': 9, 'first_name': 'Philip', 'last_name': 'Sandler'},
{'id': 10, 'first_name': 'Adam', 'last_name': 'Glass'}]
The whole definition of the db_sequence() function can be removed, and the db_seq variable can be assigned itertools.count(1) instead:
import itertools
from random import choice
from typing import Any, Dict, List

FIRST_NAMES = ["John", "Adam", "Philip"]
LAST_NAMES = ["Doe", "Sandler", "Glass"]

db_seq = itertools.count(1)

def generate_records(batch_size: int = 1) -> List[Dict[str, Any]]:
    return [
        {
            "id": next(db_seq),
            "first_name": choice(FIRST_NAMES),
            "last_name": choice(LAST_NAMES),
        }
        for _ in range(batch_size)
    ]
There’s no difference in behaviour between these two versions, and the latter is a bit shorter.
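One way to convince yourself is to compare the first few values from both sources. A minimal sketch, with the sample size of 5 chosen arbitrarily (it only checks a finite prefix, which is all you can check on an infinite sequence):

```python
import itertools
from typing import Iterator


def db_sequence() -> Iterator[int]:
    # Hand-written counter, as in the first version of the script.
    num = 1
    while True:
        yield num
        num += 1


from_generator = list(itertools.islice(db_sequence(), 5))
from_count = list(itertools.islice(itertools.count(1), 5))
print(from_generator == from_count)  # True
```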
Conclusion
This post described one of the infinite iterators available in the itertools module (count()). I’ll describe the other two (cycle() and repeat()) in separate posts.
Best Regards,
Kuba
(6/52) This is the 6th post from my blogging challenge (publishing 52 posts in 2024).