Fluent Python Chapter 10. Sequence Hacking, Hashing, and Slicing

jordan.bae 2022. 1. 31. 18:22

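As I mentioned in the Introduction of the Chapter 1 post, over the past five years I have studied and used a variety of languages and tools to build software. Having spread myself across so many things, I found it hard to say with confidence that I use Python well, even though it is a language I use often. That is why I am studying Fluent Python and writing up notes as I go. New technologies are great, but this year I hope to dig deeper into the languages, frameworks, and software I already use so that I can use them better. I'm looking forward to the day I finish the notes through Chapter 21 and celebrate with something delicious!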

Posts on previous chapters

Fluent Python Chapter 1. The Python Data Model (Feat. a consistent language)

Fluent Python Chapter 2-1. Sequence (sequence categories, listcomp, generator, tuple, slicing)

Fluent Python Chapter 2-2. Sequence (bisect, deque, memoryview)

Fluent Python Chapter 3. Dictionary (feat. hashtable)

Fluent Python Chapter 4. Text and Bytes (feat. please don't get garbled..)

Fluent Python Chapter 5. First-Class Functions

Fluent Python Chapter 6. Design Patterns with First-Class Functions

Fluent Python Chapter 7. Function Decorators and Closures (feat. metaprogramming)

Fluent Python Chapter 8. Object References, Mutability, and Recycling

Fluent Python Chapter 9. A Pythonic Object

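Chapter 10 - Introduction

I've finally made it to Chapter 10! The title of Chapter 10 is Sequence Hacking, Hashing, and Slicing.

We extend the two-dimensional vector class from Chapter 9 into an N-dimensional one. Along the way we implement the sequence protocol and look at slice objects, and we implement hashing that takes every element's value into account, which gives us a chance to revisit the reduce function.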

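N-dimensional Vector class

The book implements it using composition rather than inheriting from the two-dimensional vector class built earlier. I think the reason is to explain duck typing and to explore a few more magic methods. Python also seems to favor objects implementing a protocol only partially (that is, only the parts they need). In addition, the book notes that inheriting is not a good idea because the constructors are incompatible.

First, let's look at the code for the Vector class extended to N dimensions.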

from array import array
import reprlib
import math

class Vector:
    typecode = 'd'
    
    def __init__(self, components):
        # The leading underscore marks this as a protected instance attribute.
        self._components = array(self.typecode, components)
        
    def __iter__(self):
        return iter(self._components)
    
    # String representation for developers
    def __repr__(self):
        # reprlib.repr() abbreviates long arrays with '...'
        components = reprlib.repr(self._components)
        # Drop the leading "array('d', " and the trailing ")"
        components = components[components.find('['):-1]
        return 'Vector({})'.format(components)
    
    # String representation for users
    def __str__(self):
        return str(tuple(self))
    
    def __bytes__(self):
        return (bytes([ord(self.typecode)]) + bytes(self._components))
    
    def __eq__(self, other):
        return tuple(self) == tuple(other)
    
    def __abs__(self):
        return math.sqrt(sum(x*x for x in self))
    
    def __bool__(self):
        return bool(abs(self))
    
    @classmethod
    def frombytes(cls, octets):
        # The first byte is the typecode; the rest is the raw component data.
        typecode = chr(octets[0])
        memv = memoryview(octets[1:]).cast(typecode)
        return cls(memv)
        
# Testing __init__() and __repr__()
vector = Vector([1.2, 7.1, 4.3, 2.4])

vector
Vector([1.2, 7.1, 4.3, 2.4])


vector2 = Vector(range(10))

vector2
Vector([0.0, 1.0, 2.0, 3.0, 4.0, ...])

repr(vector2)
'Vector([0.0, 1.0, 2.0, 3.0, 4.0, ...])'

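Some methods are unchanged from the two-dimensional vector class, and some have changed.

Let's look at just a few.

- repr() is used for debugging, and for a large object it can dump too much data to the console. To keep the output to a limited length, the reprlib module is used to produce an abbreviated representation.

- The components are stored in self._components, where the leading underscore marks a protected instance attribute.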
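
To see what reprlib does on its own, here is a small sketch using only the standard library. The variable names (components, short, inner) are just for illustration; the slicing expression is the same one used in Vector.__repr__ above.

```python
from array import array
import reprlib

components = array('d', range(10))

# reprlib.repr() abbreviates long containers: for an array it shows the
# first few items followed by '...'.
short = reprlib.repr(components)
print(short)  # array('d', [0.0, 1.0, 2.0, 3.0, 4.0, ...])

# Keep only the bracketed part: from the first '[' up to, but not
# including, the final ')'.
inner = short[short.find('['):-1]
print('Vector({})'.format(inner))  # Vector([0.0, 1.0, 2.0, 3.0, 4.0, ...])
```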

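Protocols and duck typing

Chapter 1 explained that to create a class for sequence-like objects supporting the sequence protocol, you do not need inheritance; you only need to implement the methods the sequence protocol calls for. In object-oriented programming, a protocol is an informal interface, defined only in documentation and not in code. For example, the sequence protocol we are going to implement in Python entails just the __len__() and __getitem__() methods. What matters is not which class was inherited from, but that the required methods are implemented.

This mechanism is called duck typing. If an object behaves like a certain type, it is treated as that type.

Because a protocol is informal (defined in documentation, not in code), you can implement only part of it as needed. If you only want to support iteration, implementing __getitem__() is enough; there is no need to implement __len__().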
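
As a minimal sketch of a partially implemented protocol, the class below defines only __getitem__(). Countdown is a made-up class for illustration and does not appear in the book. Python falls back to calling __getitem__() with 0, 1, 2, ... until IndexError is raised, so iteration and the in operator work even though there is no __iter__() or __len__().

```python
class Countdown:
    """Hypothetical example class: implements __getitem__ and nothing else."""

    def __init__(self, start):
        self._start = start

    def __getitem__(self, index):
        # Raising IndexError is what tells the fallback iteration to stop.
        if not 0 <= index <= self._start:
            raise IndexError(index)
        return self._start - index


c = Countdown(3)
print(list(c))  # [3, 2, 1, 0]
print(2 in c)   # True

try:
    len(c)      # __len__ was never implemented
except TypeError as exc:
    print(type(exc).__name__)  # TypeError
```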

Slicing

First, to support the sequence protocol, we implement the __len__() and __getitem__() methods.

class Vector:
    # ... omitted
    
    def __len__(self):
        return len(self._components)
    
    def __getitem__(self, index):
        # Delegate both indexing and slicing to the underlying array
        return self._components[index]
        
# Sequence-related operations
vector = Vector(range(10))

len(vector)
10

vector[0], vector[1]
(0.0, 1.0)

vector[1:]
array('d', [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0])

Slicing works, but it would be better if a slice of a Vector were itself a Vector. To get that, instead of delegating the slicing operation to the array's slicing, we need to modify __getitem__.

First, here is how slicing works.

class MySeq:
    def __getitem__(self, index):
        return index
    
s = MySeq()

s[1]
1

s[1:4:1], s[1:2], type(s[1:])
(slice(1, 4, 1), slice(1, 2, None), slice)

dir(slice)
['__class__',
 '__delattr__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__le__',
 '__lt__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 'indices',
 'start',
 'step',
 'stop']
 
help(slice.indices)
Help on method_descriptor:

indices(...)
    S.indices(len) -> (start, stop, stride)
    
    Assuming a sequence of length len, calculate the start and stop
    indices, and the stride length of the extended slice described by
    S. Out of bounds indices are clipped in a manner consistent with the
    handling of normal slices.

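From the code above we can learn the following.

- Using slice notation creates a slice object, which is passed to __getitem__() (MySeq simply returns it).

- slice has the start, stop, and step attributes.

- slice is a built-in type.

- Looking at the indices method with help(), it takes a length len and computes start, stop, and stride.

Testing the indices method gives the following.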

# For a sequence of length 5, slice(None, 10, 2) is normalized to 0:5:2.
# That is, [1,2,3,4,5][:10:2] == [1,2,3,4,5][:5:2]
slice(None, 10, 2).indices(5)
(0, 5, 2)

[1,2,3,4,5][:5:2], [1,2,3,4,5][:10:2]
([1, 3, 5], [1, 3, 5])
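
A few more cases make the normalization easier to see. This is only a sketch: sliced_positions is a hypothetical helper written for this note, not something from the book or the standard library. It expands a slice into the concrete positions it selects for a given length, which is roughly what a sequence that cannot delegate to an underlying array would need to do.

```python
def sliced_positions(s, length):
    """Return the positions slice `s` selects in a sequence of `length` items."""
    start, stop, step = s.indices(length)
    return list(range(start, stop, step))


# Negative and missing bounds are resolved against the length.
print(slice(-3, None).indices(5))                  # (2, 5, 1)
print(sliced_positions(slice(-3, None), 5))        # [2, 3, 4]

# A negative step walks backwards from the last position.
print(slice(None, None, -1).indices(5))            # (4, -1, -1)
print(sliced_positions(slice(None, None, -1), 5))  # [4, 3, 2, 1, 0]

# The positions agree with what built-in slicing selects.
data = 'ABCDE'
print([data[i] for i in sliced_positions(slice(-3, None), 5)])  # ['C', 'D', 'E']
```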

Now that we have seen how slicing works, let's change the implementation of __getitem__() so that slicing returns a Vector object.

from array import array
import reprlib
import math
import numbers

class Vector:
    typecode = 'd'
    
    # ... omitted
    
    def __getitem__(self, index):
        cls = type(self)
        if isinstance(index, slice):
            return cls(self._components[index])
        elif isinstance(index, numbers.Integral):
            return self._components[index]
        else:
            msg = '{cls.__name__} indices must be integers'
            raise TypeError(msg.format(cls=cls))

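If index is a slice object, a new Vector is created from the sliced array and returned. If it is an integer, the single component is returned, and anything else raises TypeError.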
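
To try the slice-aware __getitem__() end to end, here is a trimmed-down, self-contained version of the class. It keeps only the methods needed for this demonstration (the other methods shown earlier are left out), and the variable name v7 is just for illustration.

```python
from array import array
import numbers
import reprlib


class Vector:
    typecode = 'd'

    def __init__(self, components):
        self._components = array(self.typecode, components)

    def __repr__(self):
        components = reprlib.repr(self._components)
        components = components[components.find('['):-1]
        return 'Vector({})'.format(components)

    def __len__(self):
        return len(self._components)

    def __getitem__(self, index):
        cls = type(self)
        if isinstance(index, slice):
            # Slicing the array and wrapping the result keeps the type.
            return cls(self._components[index])
        elif isinstance(index, numbers.Integral):
            return self._components[index]
        else:
            msg = '{cls.__name__} indices must be integers'
            raise TypeError(msg.format(cls=cls))


v7 = Vector(range(7))
print(v7[-1])        # 6.0
print(repr(v7[1:4])) # Vector([1.0, 2.0, 3.0])
print(repr(v7[-1:])) # Vector([6.0])

try:
    v7[1, 2]         # a tuple index is neither a slice nor an integer
except TypeError as exc:
    print(exc)       # Vector indices must be integers
```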

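Dynamic attribute access

Given the expression my_obj.x, Python works as follows.

1. It checks whether the my_obj instance has an attribute named x.

2. If not, it searches the object's class (my_obj.__class__).

3. It then continues up the inheritance graph.

4. If x is still not found, Python calls the __getattr__() method defined in my_obj's class, passing self and the attribute name as a string.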
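
The lookup order above can be sketched with a toy class hierarchy. Base and Demo are hypothetical names made up for this illustration, and the sketch follows the simplified four-step description (it does not cover descriptors). The point it shows is that __getattr__() runs only as a last resort.

```python
class Base:
    inherited = 'found on a superclass'


class Demo(Base):
    class_level = 'found on the class'

    def __init__(self):
        self.instance_level = 'found on the instance'

    def __getattr__(self, name):
        # Reached only after the instance, the class, and the
        # superclasses have all been searched without success.
        return 'computed by __getattr__ for {!r}'.format(name)


d = Demo()
print(d.instance_level)  # found on the instance
print(d.class_level)     # found on the class
print(d.inherited)       # found on a superclass
print(d.missing)         # computed by __getattr__ for 'missing'
```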

Let's implement direct access to the first four components of a Vector through the attributes x, y, z, and t.

class Vector:
    typecode = 'd'
    shortcut_names = "xyzt"
    
    def __getattr__(self, name):
        cls = type(self)
        if len(name) == 1:
            pos = cls.shortcut_names.find(name)
            if 0 <= pos < len(self._components):
                return self._components[pos]
        
        msg = '{.__name__!r} object has no attribute {!r}'
        raise AttributeError(msg.format(cls, name))


vector = Vector([1.2, 7.1, 4.3, 2.4])

# Dynamic access
print(vector.x, vector.y)

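Implementing only __getattr__() can cause a problem. Running vector.x = 10 adds a new attribute named x to the vector object. From then on, vector.x returns 10 because the instance attribute is found first and __getattr__() is never called, while the components themselves stay unchanged. As we saw above, v.x and v.__getattr__('x') are not the same thing.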

vector = Vector([1.2, 7.1, 4.3, 2.4])

print(vector.x, vector.y)
1.2 7.1

# This does not modify the first element of vector; it adds a new x attribute to the vector object.
vector.x = 10

print(vector)
(1.2, 7.1, 4.3, 2.4)

To solve this problem, we need to implement __setattr__().

    def __setattr__(self, name, value):
        cls = type(self)
        if len(name) == 1:
            if name in cls.shortcut_names:
                error = "readonly attribute {attr_name!r}"
            elif name.islower():
                error = "can't set attributes 'a' to 'z' in {cls_name!r}"
            else:
                error = ''
            if error:
                msg = error.format(cls_name=cls.__name__, attr_name=name)
                raise AttributeError(msg)
        super().__setattr__(name, value)


# test

vector.x = 10

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Input In [50], in <module>
      1 vector = Vector([1.2, 7.1, 4.3, 2.4])
      3 print(vector.x, vector.y)
----> 5 vector.x = 10
      7 print(vector)

Input In [49], in Vector.__setattr__(self, name, value)
     29     if error:
     30         msg = error.format(cls_name=cls.__name__, attr_name=name)
---> 31         raise AttributeError(msg)
     32 super().__setattr__(name, value)

AttributeError: readonly attribute 'x'
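
To exercise every branch of __setattr__(), here is a self-contained sketch that pairs it with __getattr__() on a stripped-down Vector (only the methods needed for the demonstration). The attribute names X and label used at the end are arbitrary examples, not names from the book.

```python
from array import array


class Vector:
    typecode = 'd'
    shortcut_names = 'xyzt'

    def __init__(self, components):
        # '_components' is longer than one character, so __setattr__ lets it through.
        self._components = array(self.typecode, components)

    def __getattr__(self, name):
        cls = type(self)
        if len(name) == 1:
            pos = cls.shortcut_names.find(name)
            if 0 <= pos < len(self._components):
                return self._components[pos]
        msg = '{.__name__!r} object has no attribute {!r}'
        raise AttributeError(msg.format(cls, name))

    def __setattr__(self, name, value):
        cls = type(self)
        if len(name) == 1:
            if name in cls.shortcut_names:
                error = 'readonly attribute {attr_name!r}'
            elif name.islower():
                error = "can't set attributes 'a' to 'z' in {cls_name!r}"
            else:
                error = ''
            if error:
                msg = error.format(cls_name=cls.__name__, attr_name=name)
                raise AttributeError(msg)
        super().__setattr__(name, value)


v = Vector([1.2, 7.1, 4.3, 2.4])

for name in ('x', 'a'):
    try:
        setattr(v, name, 10)
    except AttributeError as exc:
        print(exc)
# readonly attribute 'x'
# can't set attributes 'a' to 'z' in 'Vector'

# Names that are not single lowercase letters are stored normally.
v.X = 10
v.label = 'velocity'
print(v.x, v.X, v.label)  # 1.2 10 velocity
```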

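Hashing and a faster == (eq)

When hashing needs to reflect the values of all elements, the reduce function can combine multiple values into a single one.

Also, for equality comparison, returning False as soon as a differing pair is found makes the operation efficient.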

import operator
import functools


class Vector:
    typecode = 'd'
    shortcut_names = "xyzt"
    
    def __hash__(self):
        hashes = (hash(x) for x in self._components)
        # Writing this as hashes = map(hash, self._components) makes the map step more visible.
        return functools.reduce(operator.xor, hashes, 0)
    
    # zip() stops at the shortest operand, so the lengths must be compared first for the result to be correct.
    def __eq__(self, other):
        return len(self) == len(other) and all(a == b for a, b in zip(self, other))
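
The two ideas can also be tried in isolation, outside the class. This is a minimal sketch on plain lists: same_items is a hypothetical helper that mirrors the body of __eq__ above, and the sample values are arbitrary.

```python
import functools
import operator

# reduce() folds the per-item hashes into one value with XOR.
hashes = [hash(x) for x in (1.0, 2.0, 3.0)]
combined = functools.reduce(operator.xor, hashes, 0)
print(combined == hashes[0] ^ hashes[1] ^ hashes[2])  # True

# The third argument (0) is the starting value, so an empty input is safe.
print(functools.reduce(operator.xor, [], 0))  # 0


def same_items(a, b):
    """Compare lengths first, then stop at the first differing pair."""
    return len(a) == len(b) and all(x == y for x, y in zip(a, b))


# zip() alone stops at the shorter input and would wrongly report equality.
print(all(x == y for x, y in zip([1, 2, 3], [1, 2])))  # True
print(same_items([1, 2, 3], [1, 2]))                   # False
print(same_items([1, 2, 3], [1, 2, 3]))                # True
```

all() with a generator expression is lazy, so the comparison stops at the first pair that differs instead of building two full tuples as the earlier tuple(self) == tuple(other) version did.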

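Summary

Just as we built a Pythonic object in Chapter 9, in Chapter 10 we looked at how standard Python objects behave and implemented that behavior ourselves. By implementing __getattr__() and __setattr__(), we saw how Python accesses object attributes dynamically. And by implementing __len__() and __getitem__() to support the sequence protocol, we also took a look at slice objects.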

Reference

- Fluent Python Chapter 10

- https://github.com/fluentpython/example-code
