Files
30-seconds-of-code/blog_posts/code-anatomy-performant-python.md
2021-06-12 18:03:28 +03:00

64 lines
3.2 KiB
Markdown

---
title: Code Anatomy - Writing high performance Python code
type: story
tags: python,list,performance
authors: maciv,chalarangelo
cover: blog_images/walking-on-top.jpg
excerpt: Writing short, efficient Python code is not always straightforward. Read how we optimize our list snippets to increase performance using a couple of simple tricks.
---
Writing short and efficient Python code is not always easy or straightforward. However, it's often that we see a piece of code and we don't realize the thought process behind the way it was written. We will be taking a look at the [difference](/python/s/difference) snippet, which returns the difference between two iterables, in order to understand its structure.
Based on the description of the snippet's functionality, we can naively write it like this:
```py
def difference(a, b):
return [item for item in a if item not in b]
```
The above implementation may work well enough, but doesn't account for duplicates in `b`, making the code take more time than necessary in cases with many duplicates in the second list. To solve this issue, we can make use of the `set()` method, which will only keep the unique values in the list:
```py
def difference(a, b):
return [item for item in a if item not in set(b)]
```
This version, while it seems like an improvement, may actually be slower than the previous one. If you look closely, you will see that `set()` is called for every `item` in `a` causing the result of `set(b)` to be evaluated every time. Here's an example where we wrap `set()` with another method to better showcase the problem:
```py
def difference(a, b):
return [item for item in a if item not in make_set(b)]
def make_set(itr):
print('Making set...')
return set(itr)
print(difference([1, 2, 3], [1, 2, 4]))
# Making set...
# Making set...
# Making set...
# [3]
```
The solution to this issue is to call `set()` once before the list comprehension and store the result to speed up the process:
```py
def difference(a, b):
_b = set(b)
return [item for item in a if item not in _b]
```
Another option worth mentioning when analyzing performance for this snippet is the use of a list comprehension versus using something like `filter()` and `list()`. Implementing the same code using the latter option would result in something like this:
```py
def difference(a, b):
_b = set(b)
return list(filter(lambda item: item not in _b, a))
```
Using `timeit` to analyze the performance of the last two code examples, it's pretty clear that using list comprehension can be up to ten times faster than the alternative, as it's a native language feature that works very similar to a simple `for` loop without the overhead of the extra function calls. This explains why we prefer it, apart from readability.
This pretty much applies to most mathematical list operation snippets, such as [difference](/python/s/difference), [symmetric_difference](/python/s/symmetric-difference) and [intersection](/python/s/intersection).
**Image credit:** [Kalen Emsley](https://unsplash.com/@kalenemsley?utm_source=unsplash&utm_medium=referral&utm_content=creditCopyText) on [Unsplash](https://unsplash.com?utm_source=unsplash&utm_medium=referral&utm_content=creditCopyText)