Deep learning: Convolutional neural networks
Ruxandra Stoean
rstoean@inf.ucv.ro
http://inf.ucv.ro/~rstoean
Definitions

"For most flavors of the old generations of learning algorithms performance will plateau. deep learning is the first class of algorithms that is scalable. performance just keeps getting better as you feed them more data" - Andrew Ng, ZDNet

"The hierarchy of concepts allows the computer to learn complicated concepts by building them out of simpler ones. If we draw a graph showing how these concepts are built on top of each other, the graph is deep, with many layers. For this reason, we call this approach to AI deep learning." - Ian Goodfellow et al, Deep Learning, MIT Press, 2016, http://www.deeplearningbook.org/

"Deep learning [is] a pipeline of modules all of which are trainable. deep because [has] multiple stages in the process of recognizing an object and all of those stages are part of the training" - Yann LeCun

"At which problem depth does Shallow Learning end, and Deep Learning begin? Discussions with DL experts have not yet yielded a conclusive response to this question. [...], let me just define for the purposes of this overview: problems of depth > 10 require Very Deep Learning." - Jurgen Schmidhuber

Source: https://machinelearningmastery.com/what-is-deep-learning/
Convolutional Neural Networks (CNN)
- Automatic feature extraction: from low-level to high-level features
- Important applications in computer vision:
  - Classification: there is a building in this image
  - Semantic segmentation: these are the pixels of the building
  - Object detection: there are buildings in this image
  - Instance segmentation: there are buildings in this image, and these are the pixels of each one
Architecture
Layers:
- Feature learning: Convolution, ReLU, Pooling
- Classification: Fully connected
https://medium.com/@raghavprabhu/understanding-of-convolutional-neural-networkcnn-deep-learning-99760835f148
Convolution
- Input: a volume of shape Width x Height x Depth
- Kernel (or filter): a shared set of weights
- Forward pass: convolution between the filter and the input volume
- Four hyperparameters:
  - Kernel size
  - Kernel depth, i.e. the number of filters
  - Stride
  - Zero-padding
http://cs231n.github.io/convolutional-networks/
https://www.youtube.com/watch?v=aqirpkraydg
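To make the role of kernel size, stride and zero-padding concrete, here is a minimal base R sketch. It assumes the usual output-size rule (W - F + 2P) / S + 1 along each spatial dimension, and it handles a single input channel and a single square filter only; a real convolutional layer also sums over the input depth and applies several filters. The function names `conv_output_size` and `conv2d_naive` are illustrative, not part of any library.

```r
# Output size along one spatial dimension: (W - F + 2P) / S + 1
conv_output_size <- function(w, f, stride = 1, pad = 0) {
  size <- (w - f + 2 * pad) / stride + 1
  stopifnot(size == floor(size))  # the hyperparameters must tile the input exactly
  size
}

# Naive single-channel convolution (cross-correlation, the operation CNN layers compute)
conv2d_naive <- function(input, kernel, stride = 1, pad = 0) {
  f <- nrow(kernel)  # assumes a square kernel
  # surround the input with `pad` rows/columns of zeros
  padded <- matrix(0, nrow(input) + 2 * pad, ncol(input) + 2 * pad)
  padded[pad + seq_len(nrow(input)), pad + seq_len(ncol(input))] <- input
  out_h <- conv_output_size(nrow(input), f, stride, pad)
  out_w <- conv_output_size(ncol(input), f, stride, pad)
  out <- matrix(0, out_h, out_w)
  for (i in seq_len(out_h)) {
    for (j in seq_len(out_w)) {
      row0 <- (i - 1) * stride + 1
      col0 <- (j - 1) * stride + 1
      patch <- padded[row0:(row0 + f - 1), col0:(col0 + f - 1)]
      out[i, j] <- sum(patch * kernel)  # the same weights are reused at every position
    }
  }
  out
}

input  <- matrix(1:16, nrow = 4, byrow = TRUE)  # toy 4 x 4 "image"
kernel <- matrix(1, nrow = 3, ncol = 3)         # toy 3 x 3 filter of ones
result <- conv2d_naive(input, kernel, stride = 1, pad = 1)
dim(result)  # 4 4: a 3 x 3 kernel with padding 1 and stride 1 preserves the size
```

With a 28 x 28 input, `conv_output_size(28, 5, stride = 1, pad = 2)` gives 28, which is the size-preserving case for a 5 x 5 kernel.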
ReLU & Pooling
- Rectified Linear Unit (ReLU): a transfer layer that adds nonlinearity
- (Max) Pooling: shrinks the volume
  - Window size
  - Stride
http://cs231n.github.io/convolutional-networks/
https://www.youtube.com/watch?v=aqirpkraydg
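The two operations can be sketched in a few lines of base R. This is an illustration on a made-up 4 x 4 feature map, not the keras implementation; `relu` and `max_pool` are names chosen here.

```r
# ReLU: negative activations become 0, positive ones pass through unchanged
relu <- function(x) pmax(x, 0)

# Max pooling: keep the largest value in each window
max_pool <- function(x, window = 2, stride = 2) {
  out_h <- (nrow(x) - window) %/% stride + 1
  out_w <- (ncol(x) - window) %/% stride + 1
  out <- matrix(0, out_h, out_w)
  for (i in seq_len(out_h)) {
    for (j in seq_len(out_w)) {
      row0 <- (i - 1) * stride + 1
      col0 <- (j - 1) * stride + 1
      out[i, j] <- max(x[row0:(row0 + window - 1), col0:(col0 + window - 1)])
    }
  }
  out
}

# made-up feature map, for illustration only
feature_map <- matrix(c(-1,  2,  0, -3,
                         4, -5,  6,  1,
                         0,  7, -2, -8,
                         3, -4,  5,  9), nrow = 4, byrow = TRUE)

activated <- relu(feature_map)   # same 4 x 4 shape, no negative values
pooled    <- max_pool(activated) # 2 x 2 matrix: rows (4, 6) and (7, 9)
```

A 2 x 2 window with stride 2 halves each spatial dimension, so the volume passed to the next layer is four times smaller.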
In practice
- Choosing a suitable architecture
- The parameters depend on the problem
- Long running time
- High computational power required
- Real-world datasets are small
- Overfitting
- Models are difficult to interpret
- They are not plug & play!
Parametrization
- Convolutional kernels: size, depth, strides
- Pooling layers: window size, strides
- Dropout rate for the layers that control overfitting
- Batch size for batch normalization, aimed at speeding up learning
- Learning rate
- Number of units in the fully connected layers
- Number of epochs
- Initial weights
- Topology
- Optimizers
- Activation functions
- Loss functions
Parametrization can be:
- Manual
- Automatic (through a heuristic?)
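One simple way to automate the choice is random search, sketched below in base R. The slides do not name a specific heuristic, so random search is swapped in here purely as an example. The candidate values in `search_space` are arbitrary, and `validation_score` is a HYPOTHETICAL stand-in: in a real run it would train the CNN with the given configuration and return its validation accuracy.

```r
# Candidate values per hyperparameter (arbitrary choices, for illustration)
search_space <- list(
  kernel_size   = c(3, 5, 7),
  filters       = c(8, 16, 32),
  dropout_rate  = c(0.25, 0.5),
  learning_rate = c(1e-2, 1e-3, 1e-4),
  batch_size    = c(25, 50, 100)
)

# Draw one value per hyperparameter, uniformly at random
sample_config <- function(space) {
  lapply(space, function(values) values[sample.int(length(values), 1)])
}

# HYPOTHETICAL stand-in for "train the model and return the validation score".
# It only makes the sketch runnable; it says nothing about real CNN behavior.
validation_score <- function(config) {
  -abs(log10(config$learning_rate) + 3) - abs(config$dropout_rate - 0.25)
}

# Random search: try n_trials configurations and keep the best-scoring one
random_search <- function(space, score_fn, n_trials = 20) {
  best <- NULL
  best_score <- -Inf
  for (trial in seq_len(n_trials)) {
    config <- sample_config(space)
    score <- score_fn(config)
    if (score > best_score) {
      best <- config
      best_score <- score
    }
  }
  list(config = best, score = best_score)
}

set.seed(1)  # reproducible draws
result <- random_search(search_space, validation_score)
str(result$config)
```

Each trial stands for one full training run, so the number of trials is bounded in practice by the running time noted earlier.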
Overfitting
- Occurs when the model is too complex for the data
- Real-world problems come with small samples
- Countermeasures:
  - Data augmentation: flipping, rotation, scaling, cropping, translation, Gaussian noise
  - Generative adversarial networks (GAN): one network generates, another evaluates
  - Dropout layers
  - Regularization: L1 and L2 weight penalties
  - Early stopping
  - Checkpoints that save the model at every epoch, then choosing the best candidate from the validation results after the last epoch
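The last two countermeasures can be contrasted on a vector of validation losses. The sketch below is base R with made-up loss values; `early_stopping` is an illustrative function that uses a patience counter, which is one common way to decide when to stop and is assumed here rather than taken from the slides.

```r
# Early stopping: stop once the validation loss has not improved for `patience` epochs
early_stopping <- function(val_loss, patience = 2) {
  best_epoch <- 1
  stop_epoch <- length(val_loss)  # default: training ran to the end
  for (epoch in seq_along(val_loss)) {
    if (val_loss[epoch] < val_loss[best_epoch]) {
      best_epoch <- epoch
    } else if (epoch - best_epoch >= patience) {
      stop_epoch <- epoch
      break
    }
  }
  list(best_epoch = best_epoch, stop_epoch = stop_epoch)
}

# made-up validation losses for 7 epochs
val_loss <- c(0.90, 0.60, 0.45, 0.47, 0.50, 0.44, 0.43)

stopped <- early_stopping(val_loss, patience = 2)
stopped$stop_epoch  # 5: two epochs without improvement after epoch 3
stopped$best_epoch  # 3: the model that would be kept

# Checkpoint at every epoch, then pick the best one after the last epoch
best_checkpoint <- which.min(val_loss)  # 7
```

On these numbers early stopping saves two epochs of training but misses the later improvement, while per-epoch checkpoints find it at the cost of running all epochs.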
Transfer learning
- CNNs need:
  - Big data
  - Large computational resources
- The parameters are taken from a network already trained on a large dataset
  - Large datasets: ImageNet, CIFAR
  - Pre-trained models: VGG, Inception, AlexNet, ResNet
- The initial layers learn general features
- The final layers are trained for the current problem, so the problem-specific features are learned
- As a result:
  - The data problem is solved
  - There are fewer parameters to train
Deep learning implementation frameworks
https://towardsdatascience.com/deep-learning-framework-powerscores-2018-23607ddf297a
Installing Keras in R
Keras is a high-level neural network API written in Python that uses TensorFlow as its backend.
To work with Keras under R, the latest versions of the following must be installed on the computer:
- R
- RStudio
- Anaconda (to install Python)
Then install the keras package from RStudio (this installs both Keras and TensorFlow).
The MNIST digit recognition problem
We will apply a CNN to recognize the MNIST digits (http://yann.lecun.com/exdb/mnist/).

    library("keras")

    # load the MNIST dataset
    mnist <- dataset_mnist()
    c(x_train, y_train) %<-% mnist$train
    c(x_test, y_test) %<-% mnist$test

    # dimensions of the training and test inputs
    rbind(dim(x_train), dim(x_test))
Converting the input data to the CNN format

    # convert to the CNN format (Height x Width x Number of color channels)
    # keep the original arrays for the visualization at the end
    x_train_original <- x_train
    x_test_original <- x_test

    # MNIST images are 28 x 28 grayscale, hence a single channel
    x_train <- array_reshape(x_train, c(nrow(x_train), 28, 28, 1))
    x_test <- array_reshape(x_test, c(nrow(x_test), 28, 28, 1))
    input_shape <- c(28, 28, 1)
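What the reshape does can be seen on a toy array in base R. The sketch assumes two made-up 3 x 3 "images" instead of MNIST and uses `dim<-` rather than `array_reshape`. As far as I know `array_reshape` fills in row-major order while base R is column-major, but when only a trailing dimension of size 1 is appended, both should leave every pixel in place.

```r
# 2 toy "images" of 3 x 3 pixels, shaped (samples, height, width)
x <- array(1:18, dim = c(2, 3, 3))

# append a single channel dimension: (samples, height, width, channels)
x4 <- x
dim(x4) <- c(dim(x), 1)

dim(x4)                # 2 3 3 1
all(x4[, , , 1] == x)  # TRUE: the pixel values and their positions are unchanged
```

The extra dimension carries no new information for grayscale images; it only matches the 4-dimensional input that 2D convolutional layers expect.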
Converting the classes to binary encoding

    # convert the classes, encoded as the digits 0-9, to one-hot binary encoding
    # keep the original labels for the visualization at the end
    y_train_original <- y_train
    y_test_original <- y_test

    y_train <- to_categorical(y_train, 10)
    y_test <- to_categorical(y_test, 10)
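For readers unfamiliar with one-hot encoding, the sketch below reproduces the idea in base R on three made-up labels. `one_hot` is an illustrative helper, not the keras function; it only assumes that labels are integers starting at 0.

```r
# One row per sample, one column per class, a single 1 in the column of the label
one_hot <- function(labels, num_classes) {
  encoded <- matrix(0, nrow = length(labels), ncol = num_classes)
  # labels start at 0, R column indices start at 1
  encoded[cbind(seq_along(labels), labels + 1)] <- 1
  encoded
}

labels  <- c(5, 0, 4)  # made-up digit labels
encoded <- one_hot(labels, 10)
encoded[1, ]           # 0 0 0 0 0 1 0 0 0 0

# decoding: the position of the 1 (minus one) recovers the digit
decoded <- apply(encoded, 1, which.max) - 1
```

The same decoding step, applied to the softmax outputs of the network instead of exact 0/1 rows, yields the predicted digit.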
Building the CNN architecture

    # a convolutional layer with 8 filters of size 3 x 3, followed by ReLU;
    # a second convolutional layer with 16 filters of size 5 x 5, followed by ReLU;
    # a max pooling layer of size 2 x 2 and stride 2;
    # a dropout layer with rate 0.25;
    # a flatten step, then a fully connected layer with 100 units and ReLU activation;
    # another dropout with rate 0.5;
    # a final fully connected layer tied to the 10 output classes, with softmax activation
    model <- keras_model_sequential() %>%
      layer_conv_2d(filters = 8, kernel_size = c(3, 3), activation = 'relu',
                    padding = "same", input_shape = input_shape) %>%
      layer_conv_2d(filters = 16, kernel_size = c(5, 5), activation = 'relu',
                    padding = "same") %>%
      layer_max_pooling_2d(pool_size = c(2, 2)) %>%
      layer_dropout(rate = 0.25) %>%
      layer_flatten() %>%
      layer_dense(units = 100, activation = 'relu') %>%
      layer_dropout(rate = 0.5) %>%
      layer_dense(units = 10, activation = 'softmax')

    summary(model)
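The shapes and parameter counts of this architecture can be checked by hand. The base R sketch below assumes the standard counting rules (each filter has kernel height x kernel width x input depth weights plus one bias; each dense unit has one weight per input plus one bias); the helper names are illustrative. If the arithmetic is right, `summary(model)` should report the same totals.

```r
# weights + biases of a convolutional layer with square k x k kernels
conv_params  <- function(k, in_depth, filters) k * k * in_depth * filters + filters
# weights + biases of a fully connected layer
dense_params <- function(inputs, units) inputs * units + units

# input: 28 x 28 x 1
p_conv1 <- conv_params(3, in_depth = 1, filters = 8)   # output 28 x 28 x 8  ("same" padding)
p_conv2 <- conv_params(5, in_depth = 8, filters = 16)  # output 28 x 28 x 16 ("same" padding)

# 2 x 2 max pooling with stride 2 halves height and width: 14 x 14 x 16
flattened <- 14 * 14 * 16                              # 3136 values after flatten

p_dense1 <- dense_params(flattened, 100)
p_dense2 <- dense_params(100, 10)

# pooling, dropout and flatten have no trainable parameters
total <- p_conv1 + p_conv2 + p_dense1 + p_dense2
c(conv1 = p_conv1, conv2 = p_conv2, dense1 = p_dense1, dense2 = p_dense2, total = total)
```

Almost all parameters (313700 of 318006) sit in the first fully connected layer, which is why pooling before the flatten step matters for model size.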
Compiling the CNN model

    # compile the model with the loss function, optimizer and metric to be used
    model %>% compile(
      loss = 'categorical_crossentropy',  # for multi-class classification
      optimizer = 'adam',                 # the optimizer
      metrics = c('accuracy')             # accuracy as the measure of model performance
    )
Training the model

    # train the model
    history <- model %>% fit(
      x_train, y_train,
      epochs = 10,             # number of epochs (10 complete passes over the training set)
      batch_size = 25,         # batch size
      validation_split = 0.25  # split the data into 75% training and 25% validation
    )
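The consequences of these settings can be worked out with simple arithmetic. The base R sketch below assumes the standard MNIST training set of 60000 images and relies on the documented Keras behavior that `validation_split` holds out the last fraction of the samples, before any shuffling.

```r
n_samples        <- 60000  # size of the MNIST training set
validation_split <- 0.25
batch_size       <- 25
epochs           <- 10

n_train <- n_samples * (1 - validation_split)  # 45000 samples used for weight updates
n_val   <- n_samples - n_train                 # 15000 samples used only for validation

# indices held out for validation: the last 25% of the data
val_idx <- (n_train + 1):n_samples

steps_per_epoch <- ceiling(n_train / batch_size)  # 1800 weight updates per epoch
total_updates   <- steps_per_epoch * epochs       # 18000 weight updates overall
```

Because the held-out part is taken from the end, data that is sorted by class should be shuffled before calling `fit`; MNIST as loaded here is not sorted, so this is only a general caution.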
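Visualizing the training and evaluating on the test set

    # plot loss and accuracy over the course of training
    plot(history)

    # evaluate the model on the test set
    model %>% evaluate(x_test, y_test)

    # predictions of the CNN model (class labels 0-9)
    # note: recent keras versions no longer provide predict_classes(); there, the class can be
    # taken as the argmax of the predicted probabilities, e.g.
    # apply(predict(model, x_test), 1, which.max) - 1
    y_test_predicted <- model %>% predict_classes(x_test)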
Training history
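Visualizing the test set

    # show test images together with the predictions found
    # (uncomment the next line to arrange the 36 digits in a 6 x 6 grid)
    #par(mfcol=c(6,6))
    par(mar = c(0, 0, 3, 0), xaxs = 'i', yaxs = 'i')
    for (idx in 1:36) {
      im <- x_test_original[idx, , ]
      im <- t(apply(im, 2, rev))  # rotate so that image() draws the digit upright
      # green title for a correct prediction, red for a wrong one
      if (y_test_predicted[idx] == y_test_original[idx]) {
        color <- '#008800'
      } else {
        color <- '#bb0000'
      }
      # title format: predicted (actual)
      image(1:28, 1:28, im, col = gray((0:255)/255), xaxt = 'n',
            main = paste0(y_test_predicted[idx], " (", y_test_original[idx], ")"),
            col.main = color)
    }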
Accuracy: 98.43%
Prediction on the test set. Visualization
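An accuracy figure like the one above is the share of test images whose predicted digit equals the true one. The base R sketch below shows the computation, plus a confusion matrix, on ten made-up labels; with the real data, `y_true` and `y_pred` would be `y_test_original` and `y_test_predicted`.

```r
# made-up labels for illustration: one of the ten predictions is wrong
y_true <- c(7, 2, 1, 0, 4, 1, 4, 9, 5, 9)
y_pred <- c(7, 2, 1, 0, 4, 1, 4, 9, 6, 9)

# accuracy: fraction of correct predictions
accuracy <- mean(y_true == y_pred)  # 0.9

# confusion matrix: rows are the actual digits, columns the predicted ones
confusion <- table(actual    = factor(y_true, levels = 0:9),
                   predicted = factor(y_pred, levels = 0:9))

# positions of the misclassified samples, useful for inspecting the red-titled images
misclassified <- which(y_true != y_pred)  # 9
```

The off-diagonal cells of the confusion matrix show which digits get mistaken for which, something a single accuracy number hides.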