{ "cells": [ { "cell_type": "markdown", "metadata": { "nbsphinx": "hidden" }, "source": [ "[prev: Ecosystem overview](intro.ipynb) | [home](../index.ipynb) | [next: Scipy](scipy.ipynb)" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "# Numpy\n" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "## The basic structure: the *array*" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "Numpy's main contribution is an efficient implementation of homogeneous multi-dimensional arrays: the `array`." ] },
{ "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "# import the numpy package.\n", "# its name is very commonly abbreviated to 'np'\n", "import numpy as np" ] },
{ "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[1., 0., 3.],\n", " [0., 1., 5.]])" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# The array is a container that can be initialized\n", "# from a list, a list of lists, a list of lists of lists, ...\n", "# the nesting depth determines the number of dimensions of the array.\n", "x = np.array([[1, 0.0, 3], [0, 1, 5]])\n", "x" ] },
{ "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "2" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# 'ndim' is the number of dimensions of the array\n", "x.ndim" ] },
{ "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(2, 3)" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# 'shape' gives the size of each dimension\n", "# In this example, x contains 2 lists of 3 elements.\n", "x.shape" ] },
{ "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "dtype('float64')" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Unlike 'classic' containers, all the elements of an array must have the same type.\n", "# In this example, floats.\n", "x.dtype" ] },
{ "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(nan, inf)" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Numpy provides special floating-point values: \"Not a Number\" and \"Infinity\".\n", "# (the lowercase names are used here; the np.NAN, np.NaN and np.Inf aliases were removed in Numpy 2.0)\n", "np.nan, np.inf" ] },
{ "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "float64\n" ] }, { "data": { "text/plain": [ "array([nan, 2.])" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# These values are floats, so they can coexist with numeric values in a float array\n", "y = np.array([np.nan, 2], dtype=float)\n", "print(y.dtype)\n", "y" ] },
{ "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[0, 0, 0],\n", " [0, 0, 0]])" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Functions exist to create arrays with particular fill values.\n", "# An array of 0s\n", "np.zeros((2, 3), dtype=int)" ] },
{ "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([1., 1., 1., 1., 1.])" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# An array of 1s\n", "np.ones(5)" ] },
{ "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[-1, -1, 0, 0],\n", " [ 0, 0, 0, 0],\n", " [ 0, 0, 0, 0],\n", " [ 0, 0, 0, 0],\n", " [ 0, 0, 0, 0]])" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# An array with uninitialized content, to be filled in later\n", "# (the initial content of the array depends on whatever is in memory, but let's not dwell on that)\n", "np.empty((5,4), dtype=int)" ] },
{ "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([0, 1, 2, 3, 4, 5])" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Beyond arrays, numpy offers several convenience functions\n", "# The equivalent of Python's range(), but returning an array\n", "np.arange(6)" ] },
{ "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([0. , 1.5, 3. , 4.5, 6. ])" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# The counterpart of np.arange, where you give the number of values instead of the step\n", "# (the end value is included)\n", "np.linspace(0,6,5)" ] },
{ "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [], "source": [ "# And much more..." ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "## Indexing\n", "\n", "Accessing the elements of an `array` is more flexible than with the built-in containers."
] },
{ "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "int32\n" ] }, { "data": { "text/plain": [ "array([[ 1, 2, 3, 4],\n", " [ 5, 6, 7, 8],\n", " [ 9, 10, 11, 12]])" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Create an array of integers\n", "x = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])\n", "print(x.dtype)\n", "x" ] },
{ "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([1, 2, 3, 4])" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# The first index selects rows, ...\n", "x[0]" ] },
{ "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "3" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# ... and the second index selects columns (and so on for higher-dimensional arrays)\n", "x[0][2] # works, but could be better" ] },
{ "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "3" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# The same element, with both indices given in a single pair of brackets\n", "x[0, 2] # there, that's cleaner" ] },
{ "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([1, 5, 9])" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Columns can be accessed by using a slice on the first index.\n", "x[:, 0]" ] },
{ "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[-1, 2, 3, 4],\n", " [ 5, 6, 7, 8],\n", " [ 9, 10, 11, 12]])" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# The content of an array can be modified\n", "x[0, 0] = -1\n", "x" ] },
{ "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 0, 2, 3, 4],\n", " [ 0, 6, 7, 8],\n", " [ 0, 10, 11, 12]])" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Several elements can be replaced by the same value in one go.\n", "x[:, 0] = 0\n", "x" ] },
{ "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 0, 3],\n", " [ 0, 7],\n", " [ 0, 11]])" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# All slice features are available: arr[start:stop:step]\n", "x[:, ::2]" ] },
{ "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[0.10193638, 0.31366494, 0.58893349, 0.62615365],\n", " [0.69863562, 0.22738459, 0.85689817, 0.20049676],\n", " [0.73153055, 0.7271929 , 0.74053103, 0.70424826],\n", " [0.07807063, 0.90004515, 0.83373539, 0.57301106],\n", " [0.99646386, 0.19844358, 0.83802383, 0.63492711]])" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# A *very* useful kind of access: indexing with a boolean array\n", "a = np.random.random((5,4))\n", "a" ] },
{ "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ True, True, False, False],\n", " [False, True, False, True],\n", " [False, False, False, False],\n", " [ True, False, False, False],\n", " [False, True, False, False]])" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Suppose we want to set the values below 0.5 to zero.\n", "# We start by creating a \"mask\"\n", "small = a < 0.5\n", "small" ] },
{ "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[0. , 0. , 0.58893349, 0.62615365],\n", " [0.69863562, 0. , 0.85689817, 0. ],\n", " [0.73153055, 0.7271929 , 0.74053103, 0.70424826],\n", " [0. , 0.90004515, 0.83373539, 0.57301106],\n", " [0.99646386, 0. , 0.83802383, 0.63492711]])" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# We index the array with the \"mask\"...\n", "a[small] = 0\n", "# ... and that's it!\n", "a" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "## Arithmetic\n", "\n", "Arithmetic operations on an `array` are applied element by element, following the vector conventions of linear algebra for addition and scalar multiplication (and are therefore more intuitive)." ] },
{ "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([0, 1, 2, 3, 4])" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Let's create an array\n", "x = np.arange(5)\n", "x" ] },
{ "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([ 0. , 2.5, 5. , 7.5, 10. 
])" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Operations between an array and a number are applied to every element of the array\n", "# Example with multiplication:\n", "# (as a reminder, in Python int * list repeats the list, and float * list raises a TypeError)\n", "2.5 * x" ] },
{ "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [], "source": [ "# Operations between arrays of the same size are performed element by element.\n", "y = np.array([10, 11, 12, 13, 14])" ] },
{ "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([10, 12, 14, 16, 18])" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Element-wise addition\n", "x + y" ] },
{ "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([ 0, 11, 24, 39, 56])" ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Element-wise multiplication\n", "x * y" ] },
{ "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([3.16227766, 3.31662479, 3.46410162, 3.60555128, 3.74165739])" ] }, "execution_count": 30, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Numpy has many mathematical functions: trigonometry, log, exp, ...\n", "# Numpy functions can be called on arrays, in which case the operation is applied to every element.\n", "np.sqrt(y)" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "All available functions: http://docs.scipy.org/doc/numpy/reference/ufuncs.html#available-ufuncs" ] },
{ "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 0, 10, 20, 30, 40],\n", " [ 0, 11, 22, 33, 
44],\n", " [ 0, 12, 24, 36, 48],\n", " [ 0, 13, 26, 39, 52],\n", " [ 0, 14, 28, 42, 56]])" ] }, "execution_count": 31, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# One last remark: numpy can handle arithmetic operations between arrays of different shapes,\n", "# this is called \"broadcasting\".\n", "# Example: computing an outer (tensor) product:\n", "# Reshape x into a row vector.\n", "x = x.reshape((1,5))\n", "# Reshape y into a column vector.\n", "y = y.reshape((5,1))\n", "# Their product is an array of shape (5,5).\n", "x*y" ] },
{ "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [ { "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "execution_count": 32, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from IPython.display import Image\n", "Image(width=600, url='https://scipy-lectures.github.io/_images/numpy_broadcasting.png')\n", "# source: http://scipy-lectures.github.io" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "## Changing the shape\n", "\n", "The shape of an array can be changed without making a copy (though not always)." ] },
{ "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([0, 1, 2, 3, 4, 5])" ] }, "execution_count": 33, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x = np.arange(6)\n", "x" ] },
{ "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[0, 1, 2],\n", " [3, 4, 5]])" ] }, "execution_count": 34, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# the content of x can be viewed as a 2d array\n", "y = x.reshape((2, 3))\n", "y" ] },
{ "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[-1 1 2 3 4 5]\n", "[[-1 1 2]\n", " [ 3 4 5]]\n" ] } ], "source": 
[ "# the data is shared, not copied: x and y are different 'views' of the same data.\n", "# modifying the content of x affects y\n", "x[0] = -1\n", "print(x)\n", "print(y)" ] },
{ "cell_type": "code", "execution_count": 36, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[-1],\n", " [ 1],\n", " [ 2],\n", " [ 3],\n", " [ 4],\n", " [ 5]])" ] }, "execution_count": 36, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# indexing can also be used to add dimensions\n", "x[:, np.newaxis]" ] },
{ "cell_type": "code", "execution_count": 37, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[0, 1, 2, 3, 4],\n", " [1, 2, 3, 4, 5],\n", " [2, 3, 4, 5, 6]])" ] }, "execution_count": 37, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# this combines nicely with broadcasting\n", "a = np.arange(3)\n", "b = np.arange(5)\n", "a[:, np.newaxis] + b[np.newaxis, :]" ] },
{ "cell_type": "code", "execution_count": 38, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[-1, 1],\n", " [ 2, 3],\n", " [ 4, 5]])" ] }, "execution_count": 38, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# the shape of an array can be modified in place\n", "# (the Numpy documentation discourages this and recommends x.reshape instead)\n", "x.shape = (3, 2)\n", "x" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "## Operations on arrays\n", "\n", "A few useful functions among others: np.where(), np.sum(), np.maximum(), np.minimum()" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "### np.where(): \"mixing\" two arrays according to a condition" ] },
{ "cell_type": "code", "execution_count": 39, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])" ] }, "execution_count": 39, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x = np.arange(10)\n", "x" ] },
{ "cell_type": "code", "execution_count": 40, "metadata": {}, "outputs": [ { "data": { "text/plain": 
[ "array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])" ] }, "execution_count": 40, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# 0 where x < 5, 1 elsewhere\n", "np.where(x<5, 0, 1)" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "### np.sum(): summing an array along an axis" ] },
{ "cell_type": "code", "execution_count": 41, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 1, 2, 3, 4],\n", " [ 5, 6, 7, 8],\n", " [ 9, 10, 11, 12]])" ] }, "execution_count": 41, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])\n", "x" ] },
{ "cell_type": "code", "execution_count": 42, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([15, 18, 21, 24])" ] }, "execution_count": 42, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# axis=0: sum over the rows, giving one value per column\n", "np.sum(x, axis=0)" ] },
{ "cell_type": "code", "execution_count": 43, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([10, 26, 42])" ] }, "execution_count": 43, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# axis=1: sum over the columns, giving one value per row\n", "np.sum(x, axis=1)" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "### np.maximum(a, b): builds an array of the element-wise maximum of a and b (with broadcasting)" ] },
{ "cell_type": "code", "execution_count": 44, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])" ] }, "execution_count": 44, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x = np.arange(10)\n", "x" ] },
{ "cell_type": "code", "execution_count": 45, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([4, 4, 4, 4, 4, 5, 6, 7, 8, 9])" ] }, "execution_count": 45, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.maximum(x, 4)" ] },
{ "cell_type": "code", "execution_count": 46, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([0, 1, 2, 3, 4, 4, 4, 4, 4, 4])" ] }, "execution_count": 46, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.minimum(x, 4)" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "## Exercises" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.12" } }, "nbformat": 4, "nbformat_minor": 4 }