Early experiences of noise-sensitivity performance analysis of a distributed deep learning framework

Rojas, Elvis; Knobloch, Michael; Daoud, Nour; Meneses, Esteban; Mohr, Bernd

dc.contributor.author	Rojas, Elvis
dc.contributor.author	Knobloch, Michael
dc.contributor.author	Daoud, Nour
dc.contributor.author	Meneses, Esteban
dc.contributor.author	Mohr, Bernd
dc.date.accessioned	2023-10-21T22:00:35Z
dc.date.available	2023-10-21T22:00:35Z
dc.date.issued	2022-10-18
dc.description.abstract	Deep Learning (DL) applications are used to solve complex problems efficiently. These applications require complex neural network models composed of millions of parameters and huge amounts of data for proper training. This is only possible when so-called distributed deep learning (DDL) frameworks parallelize the necessary computations over many GPUs spread across multiple nodes of an HPC cluster. These frameworks mostly utilize the compute power of the GPUs and use only a small portion of the available compute power of the CPUs in the nodes for I/O and inter-process communication, leaving many CPU cores idle. The more powerful the base CPU in the cluster nodes, the more compute resources are wasted. In this paper, we investigate how much of these unused compute resources could be used for executing other applications without lowering the performance of the DDL frameworks. In our experiments, we executed a noise-generation application, which generates a very high memory, network, or I/O load, in parallel with DDL frameworks, and used HPC profiling and tracing techniques to determine whether and how the generated noise affects the performance of the DDL frameworks. Early results indicate that it might be possible to use the idle cores for jobs of other users without degrading the performance of the DDL applications.	es_ES
dc.description.abstract	Las aplicaciones de aprendizaje profundo (Deep Learning, DL) se utilizan para resolver problemas complejos de forma eficiente. Estas aplicaciones requieren modelos de redes neuronales complejos compuestos por millones de parámetros y enormes cantidades de datos para su correcto entrenamiento. Esto solo es posible paralelizando los cálculos necesarios mediante los llamados marcos de aprendizaje profundo distribuido (DDL) en muchas GPU distribuidas en múltiples nodos de un clúster HPC. Estos marcos utilizan sobre todo la potencia de cálculo de las GPU y solo utilizan una pequeña parte de la potencia de cálculo disponible de las CPU en los nodos para E/S y comunicación entre procesos, dejando muchos núcleos de CPU ociosos y sin utilizar. Cuanto más potente sea la CPU base de los nodos del clúster, más recursos informáticos se desperdiciarán. En este artículo, investigamos cuántos de estos recursos informáticos no utilizados podrían emplearse para ejecutar otras aplicaciones sin reducir el rendimiento de los marcos DDL. En nuestros experimentos, ejecutamos una aplicación de generación de ruido, que genera una carga muy elevada de memoria, red o E/S, en paralelo con marcos DDL, y utilizamos técnicas de perfilado y rastreo HPC para determinar si el ruido generado afecta al rendimiento de los marcos DDL y de qué manera. Los primeros resultados indican que podría ser posible utilizar los núcleos inactivos para trabajos de otros usuarios sin afectar negativamente al rendimiento de las aplicaciones DDL.	es_ES
dc.description.procedence	Sede Regional Brunca, Campus Pérez Zeledón	es_ES
dc.description.sponsorship	Universidad Nacional, Costa Rica	es_ES
dc.description.sponsorship	Centro Nacional de Alta Tecnología, Costa Rica
dc.description.sponsorship	Instituto Tecnológico de Costa Rica
dc.description.sponsorship	Jülich Supercomputing Centre (JSC)
dc.identifier.doi	10.1109/CLUSTER51413.2022.00066
dc.identifier.uri	http://hdl.handle.net/11056/26730
dc.language.iso	eng	es_ES
dc.publisher	Institute of Electrical and Electronics Engineers (IEEE)	es_ES
dc.rights	Embargoed access	es_ES
dc.source	2022 IEEE International Conference on Cluster Computing (CLUSTER)	es_ES
dc.subject	APRENDIZAJE PROFUNDO DISTRIBUIDO	es_ES
dc.subject	PROCESAMIENTO ELECTRÓNICO DE DATOS	es_ES
dc.subject	RENDIMIENTO	es_ES
dc.subject	ANÁLISIS DE DATOS	es_ES
dc.subject	APLICACIONES DEL COMPUTADOR	es_ES
dc.subject	RUIDO	es_ES
dc.subject	DISTRIBUTED DEEP LEARNING	es_ES
dc.subject	ELECTRONIC DATA PROCESSING	es_ES
dc.subject	PERFORMANCE	es_ES
dc.subject	DATA ANALYSIS	es_ES
dc.subject	COMPUTER APPLICATIONS	es_ES
dc.subject	NOISE ENVIRONMENTS	es_ES
dc.title	Early experiences of noise-sensitivity performance analysis of a distributed deep learning framework	es_ES
dc.type	http://purl.org/coar/resource_type/c_8544	es_ES

Files

Original bundle

Name: Early Experiences of Noise Sensitivity Performance.pdf
Size: 853.31 KB
Format: Adobe Portable Document Format

Collections

Conference papers