Check out the Parallel Universe e-publication

Code source libre

Optimisation

Intel® Fortran Composer XE

Vectorisation

Intel® Composer XE

Intel® Parallel Composer

Intel® Inspector XE

Intel® VTune™ Amplifier

Intel® Parallel Studio XE Professional Edition

Intel® Parallel Inspector

Outil Intel® Platform Modeling with Machine Learning

Zone des thèmes:

IDZone

Inclure dans RSS:

↧

Diseño de aplicaciones para la tecnología Intel® RealSense™

March 20, 2015, 6:19 am

Latest and popular articles on Intel Technologies

≫ Next: The Internet of Things—Analytics: Using the Intel® IoT Analytics Website for Data Mining

≪ Previous: Check out the Parallel Universe e-publication

Por Ryan Clark, Chronosapien Interactive

Download PDF[PDF 571KB]

Introducción

Cuando diseñamos para tecnologías de medios emergentes (por ejemplo, control de gestos), nuestra meta como desarrolladores de aplicaciones es que la experiencia sea entretenida para el usuario, además de resultarle intuitiva y familiar, y que le despierte entusiasmo. El diseño de navegación debe ser lo suficientemente intuitivo como para que, cuando los usuarios inicien una aplicación por primera vez, comiencen de inmediato a explorar sus funcionalidades. En nuestros experimentos más recientes con la tecnología Intel® RealSense™, nos esforzamos por crear una aplicación en la cual el usuario pudiera meterse de lleno, e intercalar a la vez suficientes funcionalidades interesantes de gestos como para mantenerlo cautivado. En lugar de pensar en la tecnología Intel RealSense como reemplazo de la entrada estándar de información, nos concentramos en las ventajas de los gestos naturales y en las funcionalidades exclusivas que ofrece el Kit de Desarrollo de Software (SDK) de Intel RealSense. Nuestra primera aplicación, Space Between, se centra en el seguimiento de manos y rostro, mientras que nuestra segunda aplicación incorpora más exclusividades del SDK, incluidas la detección de emociones y la segmentación de usuarios. En nuestro trabajo, aprendimos varias lecciones que a los desarrolladores podrían resultarles útiles, como diseñar gestos para facilitar el uso, hacer corresponder gestos con la mecánica de juego diseñada, desarrollar interfaces que se vuelvan familiares para el usuario y crear menús fáciles de usar y entender.

Diseño de información de entrada para la tecnología Intel® RealSense™

Figura 1:Space Between, desarrollada para usar la tecnología Intel® RealSense™.

Cuando creamos nuestra primera aplicación con Intel RealSense, comenzamos el proceso de diseño con la plataforma en mente. En lugar de determinar cómo portaríamos un estilo de juego al control de gestos, pensamos en las interacciones singulares disponibles mediante el control de gestos y qué experiencias podríamos concebir en función de ellos. Como nuestro desarrollo comenzó con el SDK de Intel® Perceptual Computing (el predecesor de la tecnología Intel RealSense), dirigimos nuestra atención a la posición bidimensional de las manos y su apertura como principales interacciones del usuario, las cuales forman la base de nuestra mecánica de juego. Con solo estas dos interacciones simples, queríamos dar a los usuarios una amplia variedad de interacciones posibles de juego. La mayoría de los cambios en las interacciones provenían simplemente de modificar la orientación de la mano del usuario, lo cual daba una sensación diferente a los gestos, aunque los valores medidos eran los mismos.

La principal aplicación que desarrollamos con la tecnología Intel RealSense es Space Between. Se trata de un juego desarrollado en Unity* en el que el jugador controla diferentes criaturas para explorar las profundidades del océano [Fig. 1]. Está dividido en múltiples minijuegos (que siguen un orden) y cada uno se centra en una criatura y una modalidad de entrada de información diferente. Cada gesto se usa de una manera que imita el movimiento de la criatura correspondiente y dirige a los personajes en forma directa. Por lo general se usan “uno a uno”: la mano se orienta de modo tal que se alinea con la criatura y tiene un efecto inmediato en el movimiento del personaje, por eso los controles son fáciles de entender y aprender.

Al diseñar estos minijuegos, sabíamos que necesitábamos comenzar con la entrada de gestos en mente. A partir de allí, iteramos en cada uno hasta que se adecuara. Después de usar el seguimiento de manos, el seguimiento de rostro y el reconocimiento de voz del SDK de Intel Perceptual Computing, llegamos a la conclusión que el módulo de seguimiento de mano era lo que más nos entusiasmaba. Al hacer la transición al SDK de Intel RealSense, nos encontramos con que los mejores módulos eran los relacionados con el seguimiento de manos, aunque la fortaleza del SDK radica en la cantidad de módulos que ofrece. Los minijuegos comenzaron todos con el seguimiento de manos como control principal, y se usaba el seguimiento de cabeza para atenuar los problemas con los gestos prolongados (hablaremos de esto más adelante).

Figura 2:Movimiento de ola en la etapa The Sunlight Zone.

En nuestro primer minijuego, The Sunlight Zone, el jugador controla una tortuga de mar en la vista de perfil. El diseño del juego comenzó con la idea de usar un gesto que imitara sacar la mano por la ventanilla de un automóvil; es decir, mover con fluidez la mano hacia arriba y hacia abajo con un movimiento similar al de una ola [Fig. 2]. La tortuga imita el movimiento de la mano del jugador y gana velocidad con cada ola que se completa. Originalmente, la única información de entrada era la posición de la mano del usuario sobre el eje y en la ventana de visualización como meta del personaje que controlaba el jugador. Después de la etapa de elaboración de prototipos, pudimos lograr un gesto más preciso con el uso del ángulo de la mano. Con este método, podíamos hacer que la tortuga reaccionara según el ángulo de la mano del usuario, lo cual daba la sensación de que la interacción era más sensible. Para obtener el ángulo de la mano a partir de la orientación de la palma del módulo de seguimiento de manos, seleccionamos un eje [Fig. 3].

Figura 3:Ejemplo de código de selección de un eje a partir de los datos de la mano.

Este era un gesto fácil de enseñar a nuevos jugadores, pero después de las pruebas observamos que resultaba extenuante después de menos de un minuto. Por esta consecuencia física, aprendimos acerca de la “resistencia consumida” [Fig. 4], que es la medida de cuánto se cansan los brazos cuando están levantados mientras realizan gestos. El problema era que el codo debía levantarse de manera perpendicular al cuerpo, de modo que aquel no podía sostener el resto del brazo. Este resulta ser uno de los gestos más agotadores.

Figura 4:Fórmula de resistencia consumida (fuente: Consumed Endurance (CE) – Measuring Arm Fatigue during Mid-Air Interactions, de http://blog.jhincapie.com/projects/consumed-endurance-ce-measuring-arm-fatigue-during-mid-air-interactions/).

Nos seguía gustando el movimiento de ola para controlar el personaje, pero para jugar con comodidad (y durante períodos prolongados), los usuarios tenían que poder apoyar los codos. Agregamos al juego una velocidad crucero, en la cual el personaje no avanza tan rápido y se lo puede controlar exclusivamente con el ángulo de la mano. Esto permite seguir jugando sin sentirse penalizados ni con la obligación de hacer el gesto de movimiento de ola durante mucho tiempo seguido.

Incluso después de agregar el ángulo de la mano para reducir el cansancio, los jugadores seguían necesitando tiempo para recuperarse antes del siguiente minijuego, The Midnight Zone. La manera de darles un respiro fue agregar un minijuego que no usaba gestos de mano. Para controlar el personaje en The Twilight Zone, el jugador simplemente se inclina en cualquiera de las cuatro direcciones, imitando el movimiento del personaje (una ballena). En cuanto al código, el seguimiento de estos movimientos de inclinación proviene del seguimiento de la posición central de la cabeza, tanto la profundidad como la posición en x en la ventana de visualización.

Diseño de interfaces para la tecnología Intel® RealSense™

No nos llevó mucho tiempo darnos cuenta de que diseñar aplicaciones basadas en gestos no era un proceso directo. Para nuestra versión de demostración de Space Between, necesitábamos incluir controles simples para la selección de los minijuegos. El caso de uso para estos controles era el de botones simples: solo necesitábamos una manera de seleccionar una opción y aceptarla. Nuestra primera interfaz basada en gestos reemplazó al control por mouse [Fig. 5]. La posición de la mano se utilizó para seleccionar, mientras que el gesto de pulsar (y más adelante, pulgares arriba) se usó para aceptar, con control por voz como respaldo. Si bien se trataba de una solución rápida (y temporal), vimos que usar la mano para seleccionar de un menú con este método era difícil y tedioso. A menudo, al hacer un gesto el cursor cambiaba de posición, por lo que se necesitaban botones con áreas de selección grandes. Nuestra iteración de esto fue dividir la ventana de visualización en tres y, para seleccionar, solo usar la posición de la mano sobre el eje x.

Figura 5:Nuestro menú inicial para seleccionar minijuegos de la demo de Space Between.

Nuestra siguiente iteración fue agregar un gesto de deslizamiento horizontal a la derecha o la izquierda [Fig. 6] para cambiar de juegos mediante el giro de un menú circular. Con un gesto de pulsar (o pulgares arriba) se seleccionaba el menú activo. Fue una mejora desde el aspecto visual (realmente alentaba la interacción del usuario) y redujo los falsos positivos y las selecciones accidentales. Descubrimos que al diseñar interfaces para control por gestos, es importante hacer hincapié en la sensibilidad mediante respuestas visuales y sonoras. Esto ayudó a compensar la pérdida de respuesta táctil.

Figura 6:La versión siguiente del menú de selección de minijuegos de la demo de Space Between.

Cuando diseñamos interfaces intuitivas, con frecuencia aplicamos ideas de interfaces móviles más que de las usadas para entornos de PC. Deslizar horizontalmente y tocar son gestos simples ya familiares para los usuarios, por eso continuamos analizando maneras de convertirlos al medio de gestos. Algo que se debe tener en cuenta cuando se usa el SDK para Intel RealSense es que los deslizamientos horizontales hacen referencia a algo específico: el movimiento de cada mano en una dirección explícita [Fig. 7]. Pero los gestos de saludo no tienen una dirección definida. Si se quiere que una mano haga el movimiento de saludo en ambos sentidos, hay que seguir la posición de la mano y determinar su velocidad. La ventaja de hacer esto es que como la mano del usuario comienza por reconocerse como en movimiento de deslizamiento horizontal, el tiempo y la velocidad del deslizamiento se pueden determinar con exactitud. Esto posibilita agregar impulso a las selecciones, de manera similar a lo que los usuarios están acostumbrados a hacer en los dispositivos móviles.

Figura 7:De la documentación de Intel: gesto de deslizar horizontalmente y gesto de saludar.

Aunque estas soluciones funcionan bien para navegar menús, nos dimos cuenta de que a veces los menús se vuelven directamente innecesarios en nuestra aplicación. Cuando diseñamos nuestro juego, a menudo tomamos como referencia Journey. Para aquellos que no lo conocen, se trata de un espléndido juego de aventura artístico creado por esa empresa de juegos que recurre al minimalismo para que se destaquen los elementos del juego. La pantalla de inicio tiene de fondo un desierto y las palabras “Start: New Journey” (en castellano: “Iniciar: nuevo viaje”). Se usa la mínima cantidad posible de menús y los controles se enseñan mediante animaciones transparentes [Fig. 8]. Al diseñar la pantalla de inicio de Space Between, decidimos omitir por completo la selección de etapas y concentrarnos en ofrecer como primera interacción del usuario una experiencia de juego. Cuando se reconoce la mano del usuario, los movimientos comienzan a agitar el aire frente a ellos, formando ráfagas de viento. Mientras el usuario juega con esa escena sencilla que tiene enfrente, las ráfagas balancean el bote y se inicia la experiencia del juego. En lugar de obligar al jugador a seleccionar una etapa específica, se juega cada uno de los minijuegos uno detrás del otro.

Figura 8: Captura de pantalla del juego Journey que muestra el uso minimalista de la interfaz de usuario para las instrucciones.

Cuando se diseñan menús (o mecánicas de juego) que requieren gestos, es importante agregar representaciones gráficas. Parece algo obvio, pero permite al usuario interactuar rápido sin tener que aprender cada una de las opciones. Es especialmente importante cuando no siempre es posible usar un gesto intuitivo para seleccionar opciones de menú. Al enseñar al jugador cómo realizar los gestos necesarios para la mecánica de nuestro juego, mantuvimos las representaciones gráficas como hojas de sprites simples animadas [Fig. 9]. A partir de ellas, el jugador puede determinar la orientación de la mano (o la cabeza), qué mano usar (o, en algunos casos, ambas) y el movimiento necesario para hacerlos. Como nuestro juego comienza sin repercusiones, hacer que el usuario aprenda qué acciones manejan los gestos no fue un problema. Optamos por un enfoque explorador del juego, puesto de relieve por el peligro cada vez mayor de las etapas. Debido a que el jugador aprende los gestos en los primeros minijuegos, usamos los mismos íconos en los posteriores para que las interacciones continúen resultando familiares.

Figura 9:Instrucción de hoja de sprites para realizar un movimiento de ola en Space Between.

Como los usuarios no están familiarizados con la mayoría de las interacciones, la comunicación de respuestas a cada acción es importante. El reconocimiento de gestos no es perfecto, pero eso el usuario necesita saber cuando un movimiento no se ha reconocido. En nuestra versión de demostración de Space Between, estas respuestas eran evidentes para el usuario: se mostraban en la parte superior de la pantalla en todo momento [Fig. 10]. A medida que se reconocían las manos, la cabeza y algunos gestos, los íconos correspondientes se iban desvaneciendo. En la versión completa de la aplicación, nos decidimos por un enfoque más integrado. Cuando el usuario ya no proporciona información de entrada, las criaturas vuelven a un estado predeterminado. Como ejemplo, en Sunlight Zone, cuando no se reconoce la mano del usuario, la tortuga de mar que este controla gira hacia atrás para nadar recto y cambia su estado de animación. Diseñamos todos los personajes de manera que cuando los estuviera controlando el jugador resplandecieran con un color específico. Para los juegos que usan cursores, logramos hacer que se desvanecieran o se hicieran sólidos, y complementamos esto con indicaciones sonoras cuando se recibe o pierde la información de entrada.

Figura 10:Respuesta visual de detección de manos y cabeza en la versión demo de Space Between.

Al integrar menús complejos, hallamos que no siempre es necesario integrar gestos como control principal. Si el uso de la aplicación lo permite, recurrir al mouse y el teclado para los elementos más tediosos (controles deslizantes e ingreso de datos) es mucho menos frustrante para el usuario. Es cierto que los gestos funcionan bien para botones y para alternar estados, pero usar datos de posición que requieren de varios ejes puede ser difícil de controlar para el usuario. Una manera de remediarlo es implementar un modo de entrada de datos que use un eje de movimiento cuando se realice un gesto de agarre (apertura o pellizco), pero esto no resuelve el problema de base. Aunque la tecnología de gestos está mejorando mucho, la mayoría de los usuarios todavía no la han utilizado. Si no es posible tener modalidades comunes para el ingreso de datos principales, la mejor solución es hacer los menús grandes. Tener una modalidad común de ingreso de datos como plan b no es una mala opción.

Al tomar decisiones sobre gestos para controlar menús que no siempre se pueden mostrar, seleccionar el gesto es sumamente importante. Pero, como mencionamos antes, muchas de estas acciones todavía no tienen movimientos o gestos asociados en la base de conocimientos del usuario. Como estudio de caso, uno de los ejemplos más notables es un menú de pausa (u opciones). Mostrar un menú de pausa es importante en la mayoría de los juegos y debería ser uno de los gestos que más rápido pudiese realizar para el usuario y reconocer la aplicación. Pero esto trae muchos problemas de diseño. Los gestos de otros medios conocidos (aplicaciones de mouse y teclado, tabletas, dispositivos móviles) no tienen nada en común. En los teclados se usa la tecla “Escape”, mientras que en los smartphones tiende a deslizarse el dedo desde el borde izquierdo de la pantalla (pero incluso esto no es algo que valga siempre). Por lo general, en esta acción interviene el extremo superior izquierdo, pero aun así muchos usuarios lo relacionan con el botón “Cerrar” de las aplicaciones de escritorio y buscan el extremo superior derecho. Usar esquinas específicas de la pantalla o gestos de deslizar no da buen resultado, por que se pierde seguimiento y por la activación accidental, respectivamente. En aplicaciones Intel RealSense, Intel recomienda usar el signo “v” [Fig. 11] para que aparezca un menú principal. La justificación es que se trata de un gesto fácil de reconocer y con poca probabilidad de realizarlo por accidente. Si bien no es intuitivo ni resulta familiar a los usuarios, la respuesta parece ser apostar a que con el tiempo comience a hacerse la relación. Además de implementar este gesto para el menú de pausa, agregamos múltiples sistemas redundantes. Si se pierde el seguimiento (las manos del usuario salen de los límites de la cámara) durante una cantidad de tiempo específica, aparece un menú (y métodos familiares con mouse y teclado).

Figura 11:El signo “v” de la documentación de Intel sobre RealSense, sugerido para llamar menús.

Múltiples módulos con la tecnología Intel® RealSense™

Cuando se implementan varios módulos del SDK de Intel RealSense, hay que considerar algo más que la facilidad de uso y la familiaridad: el rendimiento también pasa a tener importancia. Al trabajar con varios módulos, es importante hacer pausas y esperar a inicializar los módulos. Para Space Between, intercambiamos de módulo activo durante los cambios de escena, para que el usuario no note una reducción en la velocidad de fotogramas ni pérdida de seguimiento. Antes de cargar la escena, revisamos si hay diferencias en cuanto a los módulos requeridos, y si las hay, ejecutamos la inicialización. Intercambiar módulos activos con el SDK es sencillo: se inicializan los nuevos módulos y se llama a la función SenseManager del SDK. En nuestra aplicación, pausamos módulos cuando ya los usamos (p. ej., reconocimiento facial) o cuando el usuario no tiene control sobre la aplicación (p. ej., desactivar el seguimiento de rostro cuando se muestra un menú).

Al trabajar con los módulos del SDK, en especial los que usan fuentes de cámara, hay que lograr un equilibrio entre la velocidad de fotogramas y la regularidad de los datos. Si se utiliza AcquireFrame para recopilar nuevos datos, apagar la espera de todos los módulos y ajustar el tiempo de espera máximo sirve para reducir las fluctuaciones generales y aumentar la velocidad de fotogramas, al costo de perder algunos datos si el tiempo de espera cae demasiado. Las computadoras lentas necesitan que se les dé más tiempo para procesar los datos de los fotogramas, mientras que las computadoras rápidas no necesitan tanto tiempo. En Unity, esto se puede simplificar de modo tal que las configuraciones de juego más rápidas (menor complejidad gráfica) deriven en la asignación de más tiempo para procesar datos, y lo opuesto para las configuraciones gráficas más complejas. Una herramienta para este fin es QualitySettings, que viene integrada en Unity [Fig. 12].

Figura 12:Ejemplo de código que muestra RealSense ejecutándose en el subproceso de Unity con el tiempo de espera que depende de los ajustes de calidad.

Conclusión

La tecnología de gestos todavía es muy nueva. Por esa razón, diseñar aplicaciones basadas en gestos requiere de mayor iteración de lo normal, aunque usar una aplicación con reconocimiento de gestos bien diseñada bien vale la pena. Hay que tener siempre presente los conocimientos del usuario y tomar elementos de aplicaciones y medios con los cuales aquel esté familiarizado. El uso de menús debe reducirse al mínimo. Y sobre todo, no hay que temer probar cosas nuevas, aunque después se termine cambiándolas.

Futuras mejoras y cambios para Space Between

Hemos aprendido mucho del desarrollo de la aplicación de demostración y la versión completa de Space Between, y lo emplearemos para continuar mejorándola. Si bien se invirtió mucho trabajo en lograr que la mecánica de juego fuese lo más intuitiva y fácil posible, todavía se pueden hacer cosas para mejorarla más. Por ejemplo, la demo tenía respuestas visuales en la interfaz de usuario cuando se detectaban las manos y la cabeza del usuario. Con la idea de que la interfaz de usuario tuviera un diseño aún más minimalista, las dejamos de lado, pero nunca llegamos a incluir su reemplazo: respuestas visuales integradas en los personajes y el entorno mismo. Nuestra idea era que en lugar de tener una interfaz gráfica de usuario fija en la parte superior de la pantalla, que se viera todo el tiempo, haríamos que se iluminaran partes de los personajes para indicar que el usuario ahora las controlaba. Esto resuelve el problema de informar al usuario que el sistema ha reconocido su información de entrada, sin abarrotar el juego y manteniendo el entorno como centro de la atención.

Además de las funcionalidades relacionadas con Intel RealSense, hay otras que no quedaron en la versión actual de Space Between. Cuando diseñamos la versión completa del juego, investigamos mucho sobre la vida marina, en especial en las grandes profundidades. Algo que nos cautivó fue el mundo de la bioluminiscencia y cuánto dependen de ella las criaturas de los océanos. Teníamos muchas ganas de incorporar esto al juego porque sentíamos que era necesario para contar la historia de los océanos, pero también porque era fabuloso. En la versión actual del juego, pueden verse algunos de nuestros intentos de integrar la bioluminiscencia al entorno: los puntos que se acumulan son representaciones vagas de ella, las anémonas de mar la liberan en The Midnight Zone, hay criaturas que la liberan al morir en The Trenches. Sin embargo, esto no llega a ser como la versión completa de bioluminiscencia que teníamos en el juego y no le hace justicia a su belleza en la naturaleza.

Acerca del autor

Ryan Clark es uno de los fundadores de Chronosapien Interactive, una empresa de Orlando. Chronosapien Interactive desarrolla software dirigido a medios interactivos y se especializa en tecnologías emergentes. Actualmente está trabajando en una demostración para The Risen, se segunda aplicación en usar la tecnología Intel RealSense. Se puede seguir a la empresa en chronosapien.reddit.com, o comunicarse con ella a theoracle@chronosapien.com.

Space Between

Chronosapien Interactive

Expérience utilisateur

Unity

SDK Intel® RealSense™

Zone des thèmes:

RealSense

↧

The Internet of Things—Analytics: Using the Intel® IoT Analytics Website for Data Mining

March 20, 2015, 4:18 pm

Latest and popular articles on Intel Technologies

≫ Next: CGCC Healthy Kids in Motion Hackathon

≪ Previous: Diseño de aplicaciones para la tecnología Intel® RealSense™

to be added

Zone des thèmes:

IDZone

Internet des objets

↧

CGCC Healthy Kids in Motion Hackathon

March 21, 2015, 2:17 pm

Latest and popular articles on Intel Technologies

≫ Next: Check out the Parallel Universe e-publication

≪ Previous: The Internet of Things—Analytics: Using the Intel® IoT Analytics Website for Data Mining

In one of our best hackathons yet, 55 students from Chandler-Gilbert Community College met on campus to create games teaching impressionable grade-school kids healthy lifestyle choices and information regarding fitness and wellness. Those students broke into 10 teams and created the proof-of-concept demos below from scratch over the course of this fast-paced 24-hour gamejam. Tutorials and breaks aside, these students had a mere 16 hours of development time.

Volunteers from Intel, faculty from CGCC, hackathon veteran student mentors, and subject matter experts guided these creative students – most having little to no previous experience with JavaScript – through the self-driven learning process, with many of them getting their games playable on mobile devices!

Huge thanks to all the volunteers who made this event a success:

Intel Volunteers
- David Baker – organizer, ideation leader
- Erica McEachern – project manager
- Ashish Datta – lead mentor, demo leader
- Shafiul Islam – room-specific mentor
- Gigi Marsden – logistics (and her son as a mentor)
- Sowmya Ravichandran – floating mentor
- Suresh Golwalkar – floating mentor
- Robert Alvarez – floating mentor
- Ed Langlois – floating mentor
CGCC Volunteers
- Patricia Baker – facilitator, gracious hostess
- Cindy Barnes Pharr – facilitator, facility queen
- Margie Gomez – fresh-perspective blogger
- Richard Woodward-Roth – room-specific mentor
- Mark Underwood – floating mentor
- Colton Riffel – student mentor
Other Volunteers
- Chris Moody – lead mentor, co-blogger
- Andrew Datta – feedback wrangler, co-blogger
- Fabian Hinojosa – photographer
- Sunny Liu – room-specific mentor
Subject Matter Experts
- Robin Sprouse – nutrition expert
- Amy Widmeyer – nutrition expert
- Dr. Greg Trone – fitness expert

Some pictures from the event are posted to the Facebook page.

The gallery below with playable versions of the demos is available as a zip file (download, unzip to a folder, run the cgccHKHgames.html file in the top directory – preferably in Google Chrome).

CGCC Healthy Kids Hackathon

Nov 21-22 2014

These games were created by students from Chandler Gilbert Community College for the purpose of helping children learn healthy habits in nutrition and fitness

"Froot"

This app is meant to teach children the importance of good foods and portion control. It's similar to Fruit Ninja where you slash the good items and disregard the bad items.

"Health RPG"

An RPG centered around healthy eating and living for kids 6-11 years old.

"RUN"

Shopping for healthy food give more energy

"PAC2"

Our idea was to have PacMan eating good foods for a buff, junk foods result in slower movement speed

"Unstoppable Weight Loss Tactics"

A simple fitness tracker aimed at a young audience, "Unstoppable" features a leveling system on the fitness tracker and progress saving.

"Food Facts"

Drag food onto character for points and information

"Little Chef's"

Game on picking healthy food on your plate for kids of 4 - 8 years old.The game displays healthy and unhealthy food choices. The player gets to pick the food choices. Sad and happy faces appear based on their choice.

"World of Storecraft"

Aid youth in benefits and consequences of their dietary choices

"Gone Bananas"

Our app is created to help children ages 6-11 to have fun learning about a and living a healthy lifestyle including the right diet and physical activity

"Food Ninja"

Catching healthy foods for points.

Brad Hill

Engineering Director of Student/Indie Hackathons

Intel – SSG-DRD Core Client Scale Engineering

richard.b.hill@intel.com

Image de l’icône:

Zone des thèmes:

HTML5

Code produit

Inclure dans RSS:

The Parallel Universe Archive

↧

Check out the Parallel Universe e-publication

March 18, 2015, 2:52 pm

Latest and popular articles on Intel Technologies

≫ Next: Using Intel® HAXM for Developing Android* Wear and TV Apps

≪ Previous: CGCC Healthy Kids in Motion Hackathon

The Parallel Universe is a quarterly publication devoted to exploring inroads and innovations in the field of software development, from high performance computing to threading hybrid applications.

Sign-up for future issues

Image de l’icône:

Data Center

Code source libre

Optimisation

Intel® Fortran Composer XE

Vectorisation

Intel® Composer XE

Intel® Parallel Composer

Intel® Inspector XE

Intel® VTune™ Amplifier

Intel® Parallel Studio XE Professional Edition

Intel® Parallel Inspector

Outil Intel® Platform Modeling with Machine Learning

Zone des thèmes:

IDZone

Inclure dans RSS:

https://play.google.com/store/apps/details?id=com.google.android.wearable.app

↧

Using Intel® HAXM for Developing Android* Wear and TV Apps

March 23, 2015, 11:39 am

Latest and popular articles on Intel Technologies

≫ Next: Intel(R) System Studio Developer Story : With XDB and Minnow board how to debug exception errors in the Android-Linux-Kernel.

≪ Previous: Check out the Parallel Universe e-publication

Android* has come a long way initially starting with phones, then tablets, Google TV*, Android Wear*, Android TV* (replaces Google TV), and Android Auto*. It can be challenging for developers to build and test their apps to run on all these device types. Add to this different device form factors and display resolutions, and it can quickly become a complex app verification and testing problem. We have Intel® HAXM to the rescue.

Intel® Hardware Accelerated Execution Manager (HAXM) is a hardware-assisted Android emulator with low-overhead, excellent performance, and low latency. You can learn more about it here: https://software.intel.com/en-us/android/articles/intel-hardware-accelerated-execution-manager

With Intel HAXM, developers can have multiple Android emulator instances running on their development system without having to worry too much about performance, load or latency issues. This can be very helpful in the iterative process of app development and testing, resulting in huge developer productivity.

Non-x86 Android emulator images can have slow start-up time and sluggish UI responsiveness. Unlike some third-party Android emulators, with Intel HAXM you can use all the latest Android API versions and platforms as soon as they are released.

For detailed instructions on using Intel HAXM please see https://software.intel.com/en-us/android/articles/speeding-up-the-android-emulator-on-intel-architecture

In this blog post we will look at how developers can take advantage of the Intel HAXM emulator when developing a universal Android app that targets different Android platforms like Android Wear and TV, and device variations.

Using the Universal Android Sample App

Google recently released a sample universal app to show how developers can target multiple form factors with the same code base. Please see the following link to learn more: https://github.com/googlesamples/android-UniversalMusicPlayer

This sample app showcases some of the best practices for targeting multiple form factors with the same code base. Follow the instructions in the above link to build the app. We will be using it to load x86 HAXM Emulator instances for TV, Wear and Phone in this article.

The project can be directly imported into Android Studio* and developers can take advantage of the integrated emulator features. If you prefer to use other IDEs, the following can be helpful.

If you are comfortable with cmd line, just invoke the gradle build script from sample source directory.

gradlew assembleDebug

The apk will be available at “mobile/build/outputs/apk/mobile-debug.apk” folder.

Create the AVDs for Android TV and Wear

We need to ensure we downloaded the latest Android SDK emulator images for TV and Wear, along with the standard Android image for phone/tablet.

Open the android SDK manager. You can invoke from cmd line (<Android-SDK>/tools folder should be in your path):

> android

Next, we need to create the emulator configurations (AVDs) to use the above images.

Open the Android Virtual Device manager. You can invoke from cmd line

> android avd

Android Wear Emulation

Create an Android Wear AVD configuration as shown.

Click “OK”, and start the Wear emulator by clicking “Start…” in the AVD Manager Window. A screenshot of the Wear emulator after first boot, is shown below.

Android Wear requires a companion app to be installed on your phone. This app is only available on the Google Play store, so it requires us to have a device with access to the store.

We need the device configured with adb debugging, and both Wear emulator and the device show up in adb devices list:

Finally, we need to forward tcp ports with,

> adb -d forward tcp:5601 tcp:5601

The Wear emulator should now be able to connect to your device. The below screenshots show the Wear emulator in connected and disconnected states.

Detailed instructions for creating Wear apps can be found at: http://developer.android.com/training/wearables/apps/creating.html

Like any android apk, we can manually install our sample app apk on the Wear emulator using the adb,

> adb -s emulator-5554 install -r mobile\build\outputs\apk\mobile-debug.apk

Verify that it is in fact installed and available on the Wear emulator using,

> adb -s emulator-5554 shell pm list packages | grep example

The package name for the sample app, com.example.android.uamp is listed.

We can even manually run the sample app on the Wear emulator using,

> adb -s emulator-5554 shell monkey -p com.example.android.uamp -c android.intent.category.LAUNCHER 1

We now have the sample app running on the Wear emulator device.

Android TV Emulation

Create an Android TV emulator configuration (AVD) as shown.

Click “OK”, and start the TV emulator by clicking “Start…” in the AVD Manager Window.

We can verify if the emulator is accessible from adb using

> adb devices

Note down the emulator id (eg: emulator-55xx), which you can use as the target for adb commands. Install the apk using

> adb -s emulator-55xx install -r mobile\build\outputs\apk\mobile-debug.apk

Finally, start the app on the Android TV emulator instance using,

> adb -s emulator-55xx shell monkey -p com.example.android.uamp -c android.intent.category.LAUNCHER 1

The sample app running on the Android TV emulator instance:

Developers can create and start as many emulator configurations/instances as needed.

Intel HAXM can be configured with appropriate memory size at installation time.

The below screenshot shows the Wear, TV and phone AVD configurations.

Here is the universal sample app running on all 3 (TV, Phone, and Wear) along with their CPU utilizations (notice the low CPU overhead):

Developers can tweak memory allocation for further optimization. We have barely scratched the surface of emulator features in this article. Please refer to http://developer.android.com/tools/help/emulator.html for all the available config options.

References

To ensure HAXM is properly installed, and if it’s being used, please see
- https://software.intel.com/en-us/blogs/2014/10/20/how-to-ensure-you-are-using-intel-haxm-for-android-emulator

*Other names and brands may be claimed as the property of others

#android #haxm #androidwear #androidtv

Image de l’icône:

Débogage

Outils de développement

Développement de jeu

Processeurs Intel® Atom™

Zone des thèmes:

IDZone

Inclure dans RSS:

Intel® 64 and IA-32 Architectures Software Developer’s Manual

↧

Intel(R) System Studio Developer Story : With XDB and Minnow board how to debug exception errors in the Android-Linux-Kernel.

March 23, 2015, 1:15 am

Latest and popular articles on Intel Technologies

≫ Next: Avoid frequency drop in GPU cores when executing applications in Heterogeneous mode

≪ Previous: Using Intel® HAXM for Developing Android* Wear and TV Apps

Intel(R) System Studio Developer Story : With XDB and Minnow board, how to debug exception errors in the Android-Linux-Kernel.

In this article, we can see how to debug and check the exception error in Android Linux Kernel in Intel ® X86 system with XDB JTAG debugger which is a part of tool Intel System Studio ® Tool Suite. In doing so, we are supposed to see what is the JTAG and XDB and some information of the exception handling of Intel ® X86 architecture as well.

1. JTAG overview

JTAG stands for Joint Test Action Group and is pronounced to jay-tag but, which is normally meaning IEEE std 1149.1-1990 IEEE Stadard Test Access Port and Boundary-Scan Architecture. This standard is to do debug and test SoC (System On Chip) and Microprocessor Software.

The configuration of a JTAG debugging is consist of three parts ; Debugger Software in a host machine, JTAG adapter and On chip debug(OCD) in SoC.

1.1 Debugger SW

It is getting addresses and data from JTAG adapter and showing it to user and user can send data and address to JTAG adapter via USB as vice versa. By using this tool, user can run control and do source line debug with loading symbol of the image which is downloaded to target system such as run, stop, step into, step over, set break point. And an accessing memory is possible as well. So user can easily do debugging the SW of target system and inspect system memory and registers. XDB is a host side debugger SW in Intel system Studio.

1.2 JTAG Adapter (Probe)

JTAG adapter is the HW box which converts JTAG signals to PC connectivity signals such as USB, parallel, RS-232, Ethernet. USB is most popular one and many of adapter is using the USB as a connection to host PC. While target side interface has many variation nevertheless there is minimal standard JTAG pin numbers, e.g. ARM 10-pin, ST 14-pin, OCDS 16-pin, ARM 20-pin. For XDB and Minnow Max configurations which is used in this article has 60-pin connection with a target. ITP-XDP3 (a.k.a. Intel Blue Box) is used for JTAG adapter of Minnow debugging. XDB is also compatible with some other JTAG debugger such as Macraigor® Systems usb2Demon® , OpenOCD.

1.3 On Chip Debug (Target SoC)

The main component of OCD is TAP (Test Access Point) and TDI(Test Data In) / TDO(Test Data Out). By using TAP we can reset or read/write register and bypass and with TDI/TDO we can do Boundary Scan (Click for more details and picture).

< Figure 1-1> Configuration of JTAG probe and target system - Lure is the small pin adapter for ITP-XDP3 and Minnow Board.

2. Overview of Exception in Intel Architecture

An exception is a synchronous event that is generated when the processor detects one or more predefined
conditions while executing an instruction. The IA-32 architecture specifies three classes of exceptions: faults,
traps, and aborts. Normally faults and traps are recoverable while abort does not allow a restart of the program. When there is exception, it is processed as same way as interrupt handling. Which means that after halting and save current process then system switches to the exception handler and comes back again once an exception handling is done.

< Table 2-1 > Protected-Mode Exceptions and Interrupts

3. Prepare the Minnow board and ITP-XDP3 with a host PC connection via USB

You need to set up Minnow board with Android OS. For this, please see the "Intel(R) System Studio Developer Story : How to configure, build and profile the Linux Kernel of Android by using the VTune" article (Please click the link). It has the introduction of Minnow board and how to set up / build / download Android OS in Minnow boards.

Connect Minnow board with the lure (which is small PCB with 60 pin JTAG connector) to ITP-XDP3 JTAG probe and ITP-XDP3 to a host PC via USB which has already been installed Intel System Studio first for USB driver of ITP-XDP3. You can check the device manager of your Windows host if the USB driver of XDP3 is installed correctly. And finally, run the XDB.

<Figure 3-1> Connections of Minnow target board, ITP-XDP3 JTAG probe and XDB on the host PC.

4. Using a XDB for exceptions of Android Kernel on the IA (Minnow board).

We see the step by step procedure of using XDB to check and debug the exception in a Kernel.

(1) Run XDB : Go to the Installed directory and run the batch file. (e.g. start_xdb_legacy_products.bat).

(2) Connect to the target : Go to the XDB menu - File - Connect and select ITP-XDP3 and Z3680, Z37xx.

(3) Load the symbol files and set the directory of source files. Go to the XDB menu - File - Load / Unload Symbol and set the symbol files. Per source files, go to the XDB menu - Options - Source Directories and set the rule and directories. Rule is to adjust files directory between current source path and path in the symbol file which recorded in compile time.

(4) Browse to the entry file which has exception handler : XDB menu - View - Source files and open the entry_64.S file.

(5) Set break point in the exception entry point : Go and find the ENTRY(error_entry) which is entry point of exception with an error code in rax register. And each exception handler is defined as zeroentry or errorentry macros, so you can set break point in the error_entry or some specific handler. In this article, we are using the "zeroentry invalid_op do_invalid_op" for testing.

ENTRY(error_entry)
	XCPT_FRAME
	CFI_ADJUST_CFA_OFFSET 15*8
	/* oldrax contains error code */
	cld
	movq_cfi rdi, RDI+8
	movq_cfi rsi, RSI+8
	movq_cfi rdx, RDX+8
	movq_cfi rcx, RCX+8
	movq_cfi rax, RAX+8
	movq_cfi  r8,  R8+8
	movq_cfi  r9,  R9+8
	movq_cfi r10, R10+8
	movq_cfi r11, R11+8
	movq_cfi rbx, RBX+8
	movq_cfi rbp, RBP+8
	movq_cfi r12, R12+8
	movq_cfi r13, R13+8
	movq_cfi r14, R14+8
	movq_cfi r15, R15+8
	xorl %ebx,%ebx
	testl $3,CS+8(%rsp)
	je error_kernelspace
error_swapgs:
	SWAPGS
error_sti:
	TRACE_IRQS_OFF
	ret

zeroentry divide_error do_divide_error
zeroentry overflow do_overflow
zeroentry bounds do_bounds
zeroentry invalid_op do_invalid_op
zeroentry device_not_available do_device_not_available
paranoiderrorentry double_fault do_double_fault
zeroentry coprocessor_segment_overrun do_coprocessor_segment_overrun
errorentry invalid_TSS do_invalid_TSS
errorentry segment_not_present do_segment_not_present
zeroentry spurious_interrupt_bug do_spurious_interrupt_bug
zeroentry coprocessor_error do_coprocessor_error
errorentry alignment_check do_alignment_check
zeroentry simd_coprocessor_error do_simd_coprocessor_error

(6) Examples : make an exception and check if the handler got it when we set break point : Set break point to the "zeroentry invalid_op do_invalid_op" and call the BUG() which makes the "Invalid Opcode" fault by ud2 instruction.

#define BUG()							\
do {								\
	asm volatile("ud2");					\
	unreachable();						\
} while (0)

< Call the BUG() >

< Stop at the Invalid_op of break point >

5. Conclusion

Some exceptions are critical error of system HW and SW, so it is important what / why / where these kind of exceptions occur. By using XDB, you can easily check it and can do more investigation of these issues. Because XDB provide power features like easily accessing the assembly code and source code and checking the call stack and registers.

6. References

jtag 101 ieee 1149.x and software debug

embedded system debugging, joint test action group

embedded system debugging

Développeurs

Développeurs Intel AppUp®

Débogueur Intel® JTAG

Intel® System Debugger

Produits intégrés

Intel® System Studio

Débogage

Outils de développement

Processeurs Intel® Atom™

Processeurs Intel® Core™

Processeurs Intel® Pentium®

Zone des thèmes:

IDZone

Android*

↧

Avoid frequency drop in GPU cores when executing applications in Heterogeneous mode

March 23, 2015, 12:15 pm

Latest and popular articles on Intel Technologies

≫ Next: Ejemplo de código de Intel® RealSense™: "Sketch"

≪ Previous: Intel(R) System Studio Developer Story : With XDB and Minnow board how to debug exception errors in the Android-Linux-Kernel.

Introduction

Intel(R) C++ Compiler 15.0 provides a feature which enables offloading general purpose compute kernels to processor graphics. This feature enables the processor graphics silicon area for general purpose computing. The key idea is to utilize the compute power of both CPU cores and GPU execution units in tandem for better utilization of available compute power.

Target OS requirements:

Windows* 32 and 64 bit support. Compute offload feature on Microsoft Windows 7* will only work with an active display (no locked screen). This restriction is imposed by DirectX 9 but relaxed in DirectX 11 (on Microsoft Windows 8* and Microsoft Windows 8.1*).

Linux 64 bit:

Ubuntu 12.04 (Linux kernel numbers: 3.2.0-41 for 3rd generation Intel® Core™ Processors and 3.8.0-23 for 4th generation Intel® Core™ Processors)
SUSE Linux Enterprise Server 11 SP3 (Linux kernel numbers: 3.0.76-11 for both 3rd and 4th generation Intel® Core™ Processors)

Heterogeneous mode

When executing an application in heterogeneous mode (CPU+GPU cores in action), the processor is running in full throttle. Every processor has an operating TDP limit and the power sharing algorithm implemented in hardware will take the necessary action to keep the processor within the TDP limits. Modern processors having both Intel(R) Turbo Boost Technology and Intel(R) HD Graphics Dynamic Frequency Technology. Intel(R) Turbo Boost Technology is used for increasing the frequency of the CPU cores when needed. Intel(R) HD Graphics Dynamic Frequency Technology is something similar to Turbo boost for GPU cores. When both CPU and GPU are in action simultaneously, that's when processor is hitting TDP limits relatively quicker. In these cases, for the default power settings on the system ("Maximize Performance" or "balanced" for CPU, Turbo Boost turned ON and "Balanced" for GPU), the power sharing algorithm gives preference to CPU core's frequency. Section 2.3 in http://www.intel.com/Assets/PDF/whitepaper/323324.pdf describes how to avoid drop in GPU core frequency. In short, do the following:

1. Turn off Intel(R) Turbo Boost Technology
2. Switch the power option of Graphics processor from "Balanced" to "Maximize Performance"

This user control helps in giving the priority for GPU cores for workloads which performs better on GPU.

This article applies to:
    Products: Intel® System Studio
    Host OS/Platform: Windows (IA-32 or Intel® 64); Linux (Intel® 64)
    Target OS/platform: Windows (IA-32 or Intel® 64); Ubuntu 12.04 (Intel® 64)

Intel system studio 2015

Linux*

Microsoft Windows* (XP, Vista, 7)

Descargar el ejemplo de código Sketch

Zone des thèmes:

IDZone

↧

Ejemplo de código de Intel® RealSense™: "Sketch"

March 25, 2015, 6:45 am

Latest and popular articles on Intel Technologies

≫ Next: Ejemplo de código de Intel® RealSense™: "Blockhead"

≪ Previous: Avoid frequency drop in GPU cores when executing applications in Heterogeneous mode

Sinopsis

En este ejemplo de código se utiliza el SDK de Intel® RealSense™ para Windows con el fin de crear una sencilla aplicación de dibujo virtual llamada Sketch. Esta aplicación de escritorio de Windows, desarrollada en C#/WPF, demuestra varias de las capacidades de seguimiento de manos y reconocimiento de gestos del SDK de Intel RealSense:

Obtención de coordenadas (globales) x-y-z de articulaciones de manos
Captura selectiva de datos de gestos
Adquisición de información de estado de alertas (es decir, detección de manos, calibración y estado de límites)

(Nota: para que la funcionalidad de esta aplicación de ejemplo sea total, se debe contar con una cámara 3D frontal).

MIRE aquí el resumen en video de Sketch.

Introducción a Sketch

Sketch es una aplicación de dibujo simple que permite al usuario simular que dibuja en un lienzo por medio de gestos y movimientos de las manos. En la Figura 1 se muestra la interfaz de usuario de Sketch (desarrollada en WPF/XAML).

Figura 1. Interfaz de usuario de Sketch

Hay habilitados tres gestos (que se representan en la pantalla con la acción que los acompaña) para interactuar con el lienzo virtual:

Pellizco (“Dibujar”): hace que el cursor se vuelva sólido y dibuje una línea sobre el lienzo. La posición del cursor en el lienzo es controlada por las coordenadas x e y de la punta del dedo mayor del usuario. El grosor de la línea lo controla el eje z de la punta del dedo mayor del usuario (si se la aleja de la cámara, la línea se vuelve más delgada, como si se redujera la presión sobre un lápiz o un pincel).

Separar los dedos (“Desplazarse”): desactiva el lápiz y hace que el cursor se convierta en un círculo vacío. Permite al lápiz ir a otras partes del lienzo sin dibujar ninguna línea. También permite elegir colores de la paleta situada a la derecha, simplemente con desplazarse sobre ellos.

Saludo (“Borrar”): limpia el lienzo de dibujo y lo deja listo para volver a dibujar.

Detalles

La aplicación Sketch simula el acto de dibujar sobre un lienzo cuando el usuario hace el gesto "two_fingers_pinch_open". Se eligió este gesto porque se aproxima a la postura que tendría la mano si sostuviera un lápiz o un pincel. Es el gesto que se muestra en la Figura 2.

Figura 2.Gesto para dibujar

Para determinar la posición del lápiz y el grosor del trazo, se hace un seguimiento de la punta del dedo mayor del usuario, lo cual podría parecer antiintuitivo si consideramos que el gesto para dibujar es el pellizco. La razón por la cual se sigue el dedo mayor es para evitar posibles oclusiones cuando se presiona el pulgar contra el índice. Cuando se sigue el dedo mayor en lugar del índice o el pulgar, se obtiene un mejor rendimiento.

La aplicación Sketch también demuestra cómo adquirir y mostrar información de estado de alerta de manos (en este caso, detección de manos, calibración y estado de límites). Este tipo de comunicación de respuesta ayuda a los usuarios, de una forma o de otra, a posicionar las manos correctamente frente a la cámara. Aunque la presentación de esta información es simplista en esta aplicación de ejemplo, alentamos a los desarrolladores a proporcionar indicaciones similares para mejorar la experiencia general del usuario.

A descargarla

Para experimentar con esta aplicación y conocer más sobre su funcionamiento, descárguela de aquí.

Acerca de la tecnología Intel® RealSense™

Para dar los primeros pasos y conocer más del SDK de Intel RealSense para Windows, vaya a https://software.intel.com/en-us/realsense/intel-realsense-sdk-for-windows.

Acerca del autor

Bryan Brown es ingeniero de aplicaciones de software de la División de Relaciones con los Desarrolladores de Intel. Su experiencia profesional es variada: en ingeniería de software, electrónica y de diseño de sistemas. Sus intereses técnicos se centran en aplicaciones de interacción natural y tecnologías de interfaz cerebro-computadora; participa activamente en varios programas de desarrollo alfa relacionados con diversas tecnologías emergentes en estas disciplinas.

Intel® RealSense™ Technology

Intel® RealSense™

Intel RealSense. RealSense SDK

Développeurs

Microsoft Windows* 8

SDK Intel® RealSense™

Descargar el ejemplo de código Blockhead

Zone des thèmes:

IDZone

↧

Ejemplo de código de Intel® RealSense™: "Blockhead"

March 25, 2015, 6:59 am

Latest and popular articles on Intel Technologies

≫ Next: Interview with James Reinders: future of Intel MIC architecture, parallel programming, education

≪ Previous: Ejemplo de código de Intel® RealSense™: "Sketch"

Sinopsis

Este ejemplo de código muestra cómo se usa el SDK de Intel® RealSense™ para Windows*, en una aplicación de escritorio C#/WPF. Se trata de una aplicación simple, llamada BlockHead, en la que se utilizan tres características interesantes del SDK para Intel RealSense:

Captura y muestra la transmisión en color de la cámara RGB.
Recupera datos de ubicación del rostro y postura aproximada de la cabeza.
Recupera y evalúa datos de expresión facial.

(Nota: para que la funcionalidad de esta aplicación de ejemplo sea total, se debe contar con una cámara 3D frontal).

MIRE aquí el resumen en video de BlockHead.

Introducción a Blockhead

Como se muestra en la Figura 1, la aplicación muestra la transmisión en color en un control de imagen WPF y superpone en tiempo real una imagen animada sobre el rostro real del usuario.

Superimposed cartoon image
Figura 1.Imagen animada superpuesta sobre el rostro del usuario

La imagen animada se manipula programáticamente en tiempo real a partir de datos adquiridos del SDK.

Modifica el tamaño para que coincida con el del rostro del usuario (se achica y agranda cuando el usuario se aleja de la cámara o se acerca) sobre la base de información del rectángulo de la cara.
Gira a la izquierda y a la derecha como respuesta a la orientación de la cabeza del usuario (alabeo).
Intercambia el contenido de control de la imagen a partir de la adquisición y la calificación de datos de expresión (ver Figura 2).

Expressions Detected in Real Time
Figura 2.Expresiones de sonrisa, lengua afuera, beso y boca abierta detectadas en tiempo real

Detalles

Para esta sencilla aplicación de ejemplo, los gráficos se crearon en un programa de dibujo y se capturaron como archivos de gráficos de red portátiles (.png). Estas imágenes se podrían reemplazar fácilmente con transparencias representadas artísticamente, o incluso con capturas de pantalla de amigos, caricaturas, etc., con el fin de lograr un efecto visual más atractivo.

Se aplican diferentes “transforms” (p. ej., ScaleTransform, RotateTransform) al objeto de imagen para posicionarlo en respuesta a entradas de seguimiento de cabeza del SDK de Intel RealSense. Estas entradas pueden ser ubicación del rostro, cálculo de postura y datos de reconocimiento de expresiones.

El SDK es capaz de capturar alrededor de 20 expresiones distintas que luego se pueden evaluar en una aplicación. Esta en particular se centra en las expresiones de la boca: EXPRESSION_KISS, EXPRESSION_MOUTH_OPEN, EXPRESSION_SMILE y EXPRESSION_TONGUE_OUT. Sin embargo, podría extenderse con facilidad al uso de información de expresiones de las cejas, los ojos y la cabeza.

A descargarla

Para aprender más sobre esta aplicación, estudiar el código y extenderlo a casos más interesantes que aprovechen el SDK de Intel RealSense, descárguela desde aquí.

Para dar los primeros pasos y conocer más del SDK de Intel RealSense para Windows, vaya a https://software.intel.com/en-us/realsense/intel-realsense-sdk-for-windows.

Acerca del autor

Intel® RealSense™ Technology

SDK Intel® RealSense™

Zone des thèmes:

RealSense

↧

Interview with James Reinders: future of Intel MIC architecture, parallel programming, education

March 25, 2015, 10:04 am

Latest and popular articles on Intel Technologies

≫ Next: Meshcentral - Mesh Agent v193 + Java API's and samples

≪ Previous: Ejemplo de código de Intel® RealSense™: "Blockhead"

During the conversation between James Reinders, the Director and Chief Evangelist at Intel Corporation, and Vadim Karpusenko, Principal HPC Research Engineer at Colfax International, recorded on January 30, 2015 at Colfax International in Sunnyvale, CA, we discussed the future of parallel programming and Intel MIC architecture products: Intel Xeon Phi coprocessors, Knights Landing (KNL), and forthcoming 3rd generation - Knights Hill (KNH). We also talked about how students can learn parallel programming and optimization of high performance applications.

Check out this YouTube video

Image de l’icône:

Zone des thèmes:

IDZone

Inclure dans RSS:

Microsoft Windows* (XP, Vista, 7)

↧

Meshcentral - Mesh Agent v193 + Java API's and samples

March 27, 2015, 12:32 pm

Latest and popular articles on Intel Technologies

≫ Next: Abaqus/Standard Performance Case Study on Intel® Xeon® E5-2600 v3 Product Family

≪ Previous: Interview with James Reinders: future of Intel MIC architecture, parallel programming, education

We regularly update the Mesh Agent with many new features, but this week Bryan Roe had quite an impact with a complete suite of new features that are being rolled out in Mesh Agent v193. For people using Meshcentral.com or for many running their own server, the agent update is automatic. The new agent provides added browser compatibility, features and security. It’s all the more impressive when you know that the agent is being released on so many platforms: Windows XP, Windows IT, Linux, OSX… In addition, Bryan Roe also opened up a completely new avenue for developers with new Mesh Agent API Java library. So, let’s break it all down. First, the what is new in Mesh Agent v193:

Latest WebRTC Microstack. The latest mesh agent has a significant upgrade of its WebRTC stack. The stack had not changed in almost a year and now, thanks for Bryan Roe’s work. The stack has better performance due to larger window size, better packet drop recovery, round trip time calculation and much more. The new stack can also both receive and initiate WebRTC and has TURN support, but the mesh agent does not use these two features yet. The new stack does allow the agent much more flexibility in what we can do and support moving forward.
Microstack Websocket support. The Mesh Agent tiny web server now has web socket support. In the past we used WebRTC for traffic between the browser and the mesh agent local web site (HTTPS port 16990). This works well, but work only on WebRTC compatible browsers. Now, we moved the local site to use web and get IE and Safari browser compatibility.
Microstack HTTP digest support. The tiny web server added HTTP digest support and we changed the local web site to use this system for authentication. This technique of authentication is a bit more secure since the browser, not the web application, gets to handle the password.
OpenSSL 1.0.2 branch. In this version of the agent, we switched to the latest OpenSSL branch. We are now using the latest OpenSSL 1.0.2a and will continue to follow the 1.0.2 branch moving forward. This also makes the agent support DTLS 1.2 which is used for WebRTC. The agent’s uses of the latest OpenSSL fixes for the latest vulnerabilities identified last week.

The new mesh agent is pretty amazing and it’s being releasing a many platforms all at once. It’s not all, Bryan Roe also released a new Java library for interacting with the Mesh Agent to do peer-to-peer messaging and application data storage. The new library comes with two samples applications, one with a GUI and one text-only. It’s all part of the latest Mesh Agent API package, available on info.meshcentral.com. This latest package is specifically targeted at IoT usages, where you can now have peer-to-peer discovery and messaging fully and automatically enabled. This continues the tradition of making Meshcentral an outstanding solution for embedded and IoT usages.

Questions and feedback appreciated,
Ylian Saint-Hilaire
info.meshcentral.com
Meshcentral on Twitter

The latest Mesh Agent v193 has HTTP digest authentication and websocket support. So the local
web site on HTTPS port 16990 is more secure and compatible with more browsers.

The Mesh Agent v193 has many improvements over the previous versions. It’s all the more impressive
when you know it runs on so many platforms: Windows XP, Windows IT, Linux, OSX, Android…

The all new Mesh Agent API Java Library allows developers to quickly build Java applications that make
use of the Mesh Agent’s peer-to-peer capability and application data storage system.

Image de l’icône:

Actualités

Outils de développement

Microsoft Windows* 8

Zone des thèmes:

IDZone

Inclure dans RSS:

Avancé

[2] http://www.3ds.com/products-services/simulia/

↧

Abaqus/Standard Performance Case Study on Intel® Xeon® E5-2600 v3 Product Family

March 27, 2015, 1:55 pm

Latest and popular articles on Intel Technologies

≫ Next: Easy SIMD through Wrappers

≪ Previous: Meshcentral - Mesh Agent v193 + Java API's and samples

Background

The whole point of simulation is to model the behavior of a design and potential changes against various conditions to determine whether we are getting an expected response; and simulation in software is far cheaper than building hardware and performing a physical simulation and modifying the hardware model each time.

Dassault Systèmes ^[1] through its SIMULIA^* brand, is creating a new paradigm to establish Finite Element Analysis and mulitphysics simulation software as an integral business process in the engineering value chain. More information about SIMULIA can be found here^[2].

The Abaqus^* Unified Finite Elements Analysis product suite, from Dassault Systèmes^* SIMULIA, offers powerful and complete solutions for both routine and sophisticated engineering problems covering a vast spectrum of industrial applications in Automotive, Aerospace, Consumer Packaged Goods, Energy, High Tech, Industrial Equipment and Life Sciences. As an example, automotive industry engineering work groups are able to consider full vehicle loads, dynamic vibration, multibody systems, impact/crash, nonlinear static, thermal coupling, and acoustic-structural coupling using a common model data structure and integrated solver technology.

What is Finite Element Analysis (FEA)?

FEA is a computerized method of simulating the behavior of engineering structures and components under a variety of conditions. It is the application of the Finite Element method (FEM)^{[3] [8]}. It works by breaking down an object into a large number of finite elements and each element is represented by an equation. By integrating all the element’s equations, the whole object can be mathematical modeled.

How Abaqus/Standard take advantage of Intel® AVX2

Abaqus/Standard is general purpose FEA. It includes many analysis capabilities. According to Dassault Systèmes web site, it “employs solution technology ideal for static and low-speed dynamic events where highly accurate stress solutions are critically important. Examples include sealing pressure in a gasket joint, steady-state rolling of a tire, or crack propagation in a composite airplane fuselage. Within a single simulation, it is possible to analyze a model both in the time and frequency domain. For example, one may start by performing a nonlinear engine cover mounting analysis including sophisticated gasket mechanics. Following the mounting analysis, the pre-stressed natural frequencies of the cover can be extracted, or the frequency domain mechanical and acoustic response of the pre-stressed cover to engine induced vibrations can be examined.” More information about Abaqus/Standard can be found at [9].

According to Dassault Systèmes web site, Abaqus/Standard uses Hilber-Hughes-Taylor time^[12] integration by default. The time integration is implicit, meaning that the operator matrix must be inverted and a set of simultaneous nonlinear dynamic equilibrium equations must be solved at each time increment. This solution is done iteratively using Newton’s [^13] method. This solution utilizes a function called DGEMM^[5] (Double-Precision General Matrix Multiplication) in the Intel® Math Kernel Libraries (Intel® MKL^[4]) to handle matrix multiplication involving double-precision values.

Analysis of Abaqus workloads using performance monitoring tools, such as Intel® VTune™, showed a significant number of them spend 40% to 50% of their runtime time in DGEMM. Further analysis of the DGEMM function showed that it makes extensively used of the multiply-add operation since DGEMM is, basically, matrix multiplication.

One of the new feature of the Intel® Xeon® E5-2600 v3 Product Family is the support of a new extension set called Intel AVX2 ^[7]. One of the new instructions in Intel AVX2 is the three-operand fused multiply-add (FMA3 ^[6]). By implementing the combined multiply-addition operation in the hardware, the speed of this operation is considerably improved.

Abaqus/Standard uses Intel® MKL’s DGEMM implementation. It should also be noted that in Intel MKL version 11 update 5, and later versions, DGEMM was optimized to use Intel AVX2 extensions, thus allowing DGEMM to run optimally on the Intel® Xeon® E5-2600 v3 Product Family.

Performance test procedure

To prove the performance improvement brought forth by using a newer DGEMM implementation that takes advantage of Intel AVX2, we performed tests on two platforms. One system was equipped with Intel Xeon E5-2697 v3 and the other with Intel Xeon E5-2697 v2. The duration of the tests were measured in seconds.

Performance test Benchmarks

The following four benchmarks from Abaqus/Standard were used: s2a, s3a, s3b and s4b.

Figure 1. S2a is a nonlinear static analysis of a flywheel with centrifugal loading.

Figure 2. S3 extracts the natural frequencies and mode shapes of a turbine impeller.

S3 has three versions.

S3a is a 360,000 degrees of freedom (DOF) using Lanczos Eigensolver ^[11] version.

S3b is a 1,100,000 degrees of freedom (DOF) using Lanczos Eigensolver version.

Figure 3. S4 is a benchmark that simulates the bolting of a cylinder head onto an engine block.

S4b is S4 version with 5,000,000 degrees of freedom (DOF) using direct solver version.

Note that these pictures are properties of Dassault Systèmes*. They are reprinted with the permission from Dassault Systèmes.

Test configurations

System equipped with Intel Xeon E5-2697 v3

System: Pre-production
Processors: Xeon E5-2697 v3 @2.6GHz
Memory: 128GB DDR4-2133MHz

System equipped with Intel Xeon E5-2697 v2

System: Pre-production
Processors: Xeon E5-2697 v2 @2.7GHz
Memory: 64GB DDR4-1866MHz

Operating System: Red Hat* Enterprise Linux Server release 6.4

Application: Abaqus/Standard benchmarks version 6.13-1

Note:

1) Although the system equipped with the Intel® Xeon® E5-2697 v3 processor has more memory, the memory capacity does not affect the tests results, as the largest workload only used 43GB of memory.

2) The duration was measured by wall-clock time in seconds.

Test Results

Figure 4. Comparison between Intel Xeon E5-2697 v3 and E5-2697 v2

Figure 4 shows the benchmarks running on a system equipped with Intel Xeon E5-2697 v3 and on a system equipped with E5-2697 v2. Performance improvement due to Intel AVX2 and hardware advantage ranging from 1.11X to 1.39X.

Figure 5. Comparison between benchmarks with Intel AVX2 enabled and disabled

Figure 5 shows the results of benchmarks with Intel AVX2 enabled and disabled on a system equipped with Intel Xeon E5-2697 v3. Using Intel AVX2 allows benchmarks to finish faster than without using Intel AVX2. The performance increase due to Intel AVX2 is ranging from 1.03X to 1.11X.

Note: Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark* and MobileMark*, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more information go to http://www.intel.com/performance

Conclusion

Simulation software performance is very critical since it can significantly reduce the model development and analysis time. Abaqus/Standard is well-known for FEA that relies on DGEMM for its solvers. As a result of the introduction of Intel® AVX2 in the Intel® Xeon® E5-2600 v3 Product Family, and as a result of the Intel MKL augmentation to take advantage of Intel AVX2, a simple change to the Abaqus/Standard to use the latest libraries yielded a considerable performance improvement.

References

[1] www.3ds.com

[3] http://en.wikipedia.org/wiki/Finite_element_method

[4] http://en.wikipedia.org/wiki/Math_Kernel_Library

[5] https://software.intel.com/en-us/node/429920

[6] http://en.wikipedia.org/wiki/FMA_instruction_set

[7] http://en.wikipedia.org/wiki/Advanced_Vector_Extensions

[8] http://people.maths.ox.ac.uk/suli/fem.pdf

[9] http://www.3ds.com/products-services/simulia/products/abaqus/abaqusstandard/

[10] http://www.simulia.com/support/v66/v66_performance.html#s2

[11] http://en.wikipedia.org/wiki/Lanczos_algorithm

[12] http://sbel.wisc.edu/People/schafer/mdexperiments/node13.html

[13] http://en.wikipedia.org/wiki/Newton%27s_method

Notices INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT. UNLESS OTHERWISE AGREED IN WRITING BY INTEL, THE INTEL PRODUCTS ARE NOT DESIGNED NOR INTENDED FOR ANY APPLICATION IN WHICH THE FAILURE OF THE INTEL PRODUCT COULD CREATE A SITUATION WHERE PERSONAL INJURY OR DEATH MAY OCCUR. Intel may make changes to specifications and product descriptions at any time, without notice. Designers must not rely on the absence or characteristics of any features or instructions marked "reserved" or "undefined." Intel reserves these for future definition and shall have no responsibility whatsoever for conflicts or incompatibilities arising from future changes to them. The information here is subject to change without notice. Do not finalize a design with this information. The products described in this document may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request. Contact your local Intel sales office or your distributor to obtain the latest specifications and before placing your product order. Copies of documents which have an order number and are referenced in this document, or other Intel literature, may be obtained by calling 1-800-548-4725, or go to: http://www.intel.com/design/literature.htm Any software source code reprinted in this document is furnished under a software license and may only be used or copied in accordance with the terms of that license. Intel, the Intel logo, Intel Core, and Intel Xeon are trademarks of Intel Corporation in the U.S. and/or other countries. Copyright © 2015 Intel Corporation. All rights reserved.

*Other names and brands may be claimed as the property of others.

Microsoft Windows* (XP, Vista, 7)

Amélioration des performances

Serveur

URL

Zone des thèmes:

IDZone

↧

Easy SIMD through Wrappers

March 27, 2015, 5:10 pm

Latest and popular articles on Intel Technologies

≫ Next: Videos - Parallel Programming with Intel Xeon Phi Coprocessors

≪ Previous: Abaqus/Standard Performance Case Study on Intel® Xeon® E5-2600 v3 Product Family

By Michael Kopietz

Download PDF

1. Introduction

This article aims to change your thinking on how SIMD programming can be applied in your code. By thinking of SIMD lanes as functioning similarly to CPU threads, you will gain new insights and be able to apply SIMD more often in your code.

Intel has been shipping CPUs with SIMD support for about twice as long as they have been shipping multi core CPUs, yet threading is more established in software development. One reason for this increased support is an abundance of tutorials that introduce threading in a simple “run this entry function n-times” manner, skipping all the possible traps. On the other side, SIMD tutorials tend to focus on achieving the final 10% speed up that requires you to double the size of your code. If these tutorials provide code as an example, you may find it hard to focus on all the new information and at the same time come up with your simple and elegant way of using it. Thus showing a simple, useful way of using SIMD is the topic of this paper.

First the basic principle of SIMD code: alignment. Probably all SIMD hardware either demands or at least prefers some natural alignment, and explaining the basics could fill a paper [1]. But in general, if you're not running out of memory, it is important for you to allocate memory in a cache friendly way. For Intel CPUs that means allocating memory on a 64 byte boundary as shown in Code Snippet 1.

inline void* operator new(size_t size)
{
	return _mm_malloc(size, 64);
}

inline void* operator new[](size_t size)
{
	return _mm_malloc(size, 64);
}

inline void operator delete(void *mem)
{
	_mm_free(mem);
}

inline void operator delete[](void *mem)
{
	_mm_free(mem);
}

Code Snippet 1: Allocation functions that respect cache-friendly 64 byte boundaries

2. The basic idea

The way to begin is simple: assume every lane of a SIMD register executes as a thread. In case of Intel® Streaming SIMD Extensions (Intel® SSE), you have 4 threads/lanes, with Intel® Advanced Ventor Extensions (Intel® AVX) 8 threads/lanes and 16 threads/lanes on Intel® Xeon-p Phi coprocessors .

To have a 'drop in' solution, the first step is to implement classes that behave mostly like primitive data types. Wrap 'int', 'float' etc. and use those wrappers as the starting point for every SIMD implementation. For the Intel SSE version, replace the float member with __m128, int and unsigned int with __m128i and implement operators using Intel SSE intrinsics or Intel AVX intrinsics as in Code Snippet 2.

// SEE 128-bit
inline	DRealF	operator+(DRealF R)const{return DRealF(_mm_add_ps(m_V, R.m_V));}
inline	DRealF	operator-(DRealF R)const{return DRealF(_mm_sub_ps(m_V, R.m_V));}
inline	DRealF	operator*(DRealF R)const{return DRealF(_mm_mul_ps(m_V, R.m_V));}
inline	DRealF	operator/(DRealF R)const{return DRealF(_mm_div_ps(m_V, R.m_V));}

// AVX 256-bit
inline	DRealF	operator+(const DRealF& R)const{return DRealF(_mm256_add_ps(m_V, R.m_V));}
inline	DRealF	operator-(const DRealF& R)const{return DRealF(_mm256_sub_ps(m_V, R.m_V));}
inline	DRealF	operator*(const DRealF& R)const{return DRealF(_mm256_mul_ps(m_V, R.m_V));}
inline	DRealF	operator/(const DRealF& R)const{return DRealF(_mm256_div_ps(m_V, R.m_V));}

Code Snippet 2: Overloaded arithmetic operators for SIMD wrappers

3. Usage Example

Now let’s assume you're working on two HDR images, where every pixel is a float and you blend between both images.

void CrossFade(float* pOut,const float* pInA,const float* pInB,size_t PixelCount,float Factor)

void CrossFade(float* pOut,const float* pInA,const float* pInB,size_t PixelCount,float Factor)
{
	const DRealF BlendA(1.f - Factor);
	const DRealF BlendB(Factor);
	for(size_t i = 0; i < PixelCount; i += THREAD_COUNT)
		*(DRealF*)(pOut + i) = *(DRealF*)(pInA + i) * BlendA + *(DRealF*)(pInB + i) + BlendB;
}

Code Snippet 3: Blending function that works with both primitive data types and SIMD data

The executable generated from Code Snippet 3 runs natively on normal registers as well as on Intel SSE and Intel AVX. It's not really the vanilla way you'd write it usually, but every C++ programmer should still be able to read and understand it. Let’s see whether it's the way you expect. The first and second line of the implementation initialize the blend factors of our linear interpolation by replicating the parameter to whatever width your SIMD register has.

The third line is nearly a normal loop. The only special part is “THREAD_COUNT”. It's 1 for normal registers, 4 for Intel SSE and 8 for Intel AVX, representing the count of lanes of the register, which in our case resembles threads.

The fourth line indexes into the arrays and both input pixel are scaled by the blend factors and summed. Depending on your preference of writing it, you might want to use some temporaries, but there is no intrinsic you need to look up, no implementation per platform.

4. Drop in

Now it's time to prove that it actually works. Let's take a vanilla MD5 hash implementation and use all of your available CPU power to find the pre-image. To achieve that, we'll replace the primitive types with our SIMD types. MD5 is running several “rounds” that apply various simple bit operations on unsigned integers as demonstrated in Code Snippet 4.

#define LEFTROTATE(x, c) (((x) << (c)) | ((x) >> (32 - (c))))
#define BLEND(a, b, x) SelectBit(a, b, x)

template<int r>
inline DRealU Step1(DRealU a,DRealU b,DRealU c,DRealU d,DRealU k,DRealU w)
{
	const DRealU f = BLEND(d, c, b);
	return b + LEFTROTATE((a + f + k + w), r);
}

template<int r>
inline DRealU Step2(DRealU a,DRealU b,DRealU c,DRealU d,DRealU k,DRealU w)
{
	const DRealU f = BLEND(c, b, d);
	return b + LEFTROTATE((a + f + k + w),r);
}

template<int r>
inline DRealU Step3(DRealU a,DRealU b,DRealU c,DRealU d,DRealU k,DRealU w)
{
	DRealU f = b ^ c ^ d;
	return b + LEFTROTATE((a + f + k + w), r);
}

template<int r>
inline DRealU Step4(DRealU a,DRealU b,DRealU c,DRealU d,DRealU k,DRealU w)
{
	DRealU f = c ^ (b | (~d));
	return b + LEFTROTATE((a + f + k + w), r);
}

Code Snippet 4: MD5 step functions for SIMD wrappers

Besides the type naming, there is really just one change that could look a little bit like magic — the “SelectBit”. If a bit of x is set, the respective bit of b is returned; otherwise, the respective bit of a; in other words, a blend. The main MD5 hash function is shown in Code Snippet 5.

inline void MD5(const uint8_t* pMSG,DRealU& h0,DRealU& h1,DRealU& h2,DRealU& h3,uint32_t Offset)
{
	const DRealU w0  =	Offset(DRealU(*reinterpret_cast<const uint32_t*>(pMSG + 0 * 4) + Offset));
	const DRealU w1  =	*reinterpret_cast<const uint32_t*>(pMSG + 1 * 4);
	const DRealU w2  =	*reinterpret_cast<const uint32_t*>(pMSG + 2 * 4);
	const DRealU w3  =	*reinterpret_cast<const uint32_t*>(pMSG + 3 * 4);
	const DRealU w4  =	*reinterpret_cast<const uint32_t*>(pMSG + 4 * 4);
	const DRealU w5  =	*reinterpret_cast<const uint32_t*>(pMSG + 5 * 4);
	const DRealU w6  =	*reinterpret_cast<const uint32_t*>(pMSG + 6 * 4);
	const DRealU w7  =	*reinterpret_cast<const uint32_t*>(pMSG + 7 * 4);
	const DRealU w8  =	*reinterpret_cast<const uint32_t*>(pMSG + 8 * 4);
	const DRealU w9  =	*reinterpret_cast<const uint32_t*>(pMSG + 9 * 4);
	const DRealU w10 =	*reinterpret_cast<const uint32_t*>(pMSG + 10 * 4);
	const DRealU w11 =	*reinterpret_cast<const uint32_t*>(pMSG + 11 * 4);
	const DRealU w12 =	*reinterpret_cast<const uint32_t*>(pMSG + 12 * 4);
	const DRealU w13 =	*reinterpret_cast<const uint32_t*>(pMSG + 13 * 4);
	const DRealU w14 =	*reinterpret_cast<const uint32_t*>(pMSG + 14 * 4);
	const DRealU w15 =	*reinterpret_cast<const uint32_t*>(pMSG + 15 * 4);

	DRealU a = h0;
	DRealU b = h1;
	DRealU c = h2;
	DRealU d = h3;

	a = Step1< 7>(a, b, c, d, k0, w0);
	d = Step1<12>(d, a, b, c, k1, w1);
	.
	.
	.
	d = Step4<10>(d, a, b, c, k61, w11);
	c = Step4<15>(c, d, a, b, k62, w2);
	b = Step4<21>(b, c, d, a, k63, w9);

	h0 += a;
	h1 += b;
	h2 += c;
	h3 += d;
}

Code Snippet 5: The main MD5 function

The majority of the code is again like a normal C function, except that the first lines prepare the data by replicating our SIMD registers with the parameter passed. In this case we load the SIMD registers with the data we want to hash. One specialty is the “Offset” call, since we don't want every SIMD lane to do exactly the same work, this call offsets the register by the lane index. It's like a thread-id you would add. See Code Snippet 6 for reference.

Offset(Register)
{
	for(i = 0; i < THREAD_COUNT; i++)
		Register[i] += i;
}

Code Snippet 6: Offset is a utility function for dealing with different register widths

That means, our first element that we want to hash is not [0, 0, 0, 0] for Intel SSE or [0, 0, 0, 0, 0, 0, 0, 0] for Intel AVX. Instead the first element is [0, 1, 2, 3] and [0, 1, 2, 3, 4, 5, 6, 7], respectively. This replicates the effect of running the function in parallel by 4 or 8 threads/cores, but in case of SIMD, instruction parallel.

We can see the results for our 10 minutes of hard work to get this function SIMD-ified in Table 1.

Table 1: MD5 performance with primitive and SIMD types

Type	Time	Speedup
x86 integer	379.389s	1.0x
SSE4	108.108s	3.5x
AVX2	51.490s	7.4x

5. Beyond Simple SIMD-threads

The results are satisfying, not linearly scaling, as there is always some non-threaded part (you can easily identify it in the provided source code). But we're not aiming for the last 10% with twice the work. As a programmer, you'd probably prefer to go for other quick solutions that maximize the gain. Some considerations always arise, like: Would it be worthwhile to unroll the loop?

MD5 hashing seems to be frequently dependent on the result of previous operations, which is not really friendly for CPU pipelines, but you could become register bound if you unroll. Our wrappers can help us to evaluate that easily. Unrolling is the software version of hyper threading, we emulate twice the threads running by repeating the execution of operations on twice the data than SIMD lanes available. Therefore create a duplicate type alike and implement unrolling inside by duplicating every operation for our basic operators as in Code Snippet 7.

struct __m1282
{
	__m128		m_V0;
	__m128		m_V1;
	inline		__m1282(){}
	inline		__m1282(__m128 C0, __m128 C1):m_V0(C0), m_V1(C1){}
};

inline	DRealF	operator+(DRealF R)const
	{return __m1282(_mm_add_ps(m_V.m_V0, R.m_V.m_V0),_mm_add_ps(m_V.m_V1, R.m_V.m_V1));}
inline	DRealF	operator-(DRealF R)const
	{return __m1282(_mm_sub_ps(m_V.m_V0, R.m_V.m_V0),_mm_sub_ps(m_V.m_V1, R.m_V.m_V1));}
inline	DRealF	operator*(DRealF R)const
	{return __m1282(_mm_mul_ps(m_V.m_V0, R.m_V.m_V0),_mm_mul_ps(m_V.m_V1, R.m_V.m_V1));}
inline	DRealF	operator/(DRealF R)const
	{return __m1282(_mm_div_ps(m_V.m_V0, R.m_V.m_V0),_mm_div_ps(m_V.m_V1, R.m_V.m_V1));}

Code Snippet 7: These operators are re-implemented to work with two SSE registers at the same time

That's it, really, now we can again run the timings of the MD5 hash function.

Table 2: MD5 performance with loop unrolling SIMD types

Type	Time	Speedup
x86 integer	379.389s	1.0x
SSE4	108.108s	3.5x
SSE4 x2	75.659s	4.8x
AVX2	51.490s	7.4x
AVX2 x2	36.014s	10.5x

The data in Table 2 shows that it's clearly worth unrolling. We achieve speed beyond the SIMD lane count scaling, probably because the x86 integer version was already stalling the pipeline with operation dependencies.

6. More complex SIMD-threads

So far our examples were simple in the sense that the code was the usual candidate to be vectorized by hand. There is nothing complex beside a lot of compute demanding operations. But how would we deal with more complex scenarios like branching?

The solution is again quite simple and widely used: speculative calculation and masking. Especially if you've worked with shader or compute languages, you'll likely have encountered this before. Let’s take a look at a basic branch of Code Snippet 8 and rewrite it to a ?: operator as in Code Snippet 9.

int a = 0;
if(i % 2 == 1)
	a = 1;
else
	a = 3;

Code Snippet 8: Calculates the mask using if-else

int a = (i % 2) ? 1 : 3;

Code Snippet 9: Calculates the mask with ternary operator ?:

If you recall our bit-select operator of Code Snippet 4, we can also use it to achieve the same with only bit operations in Code Snippet 10.

int Mask = (i % 2) ? ~0 : 0;
int a = SelectBit(3, 1, Mask);

Code Snippet 10: Use of SelectBit prepares for SIMD registers as data

Now, that might seem pointless, if we still have an ?: operator to create the mask, and the compare does not result in true or false, but in all bits set or cleared. Yet this is not a problem, because all bits set or cleared are actually what the comparison instruction of Intel SSE and Intel AVX return.

Of course, instead of assigning just 3 or 1, you can call functions and select the returned result you want. That might lead to performance improvement even in non-vectorized code, as you avoid branching and the CPU never suffers of branch misprediction, but the more complex the functions you call, the more a misprediction is possible. Even in vectorized code, we'll avoid executing unneeded long branches, by checking for special cases where all elements of our SIMD register have the same comparison result as demonstrated in Code Snippet 11.

int Mask = (i % 2) ? ~0 : 0;
int a = 0;
if(All(Mask))
	a = Function1();
else
if(None(Mask))
	a = Function3();
else
	a = BitSelect(Function3(), Function1(), Mask);

Code Snippet 11: Shows an optimized branchless selection between two functions

This detects the special cases where all of the elements are 'true' or where all are 'false'. Those cases run on SIMD the same way as on x86, just the last 'else' case is where the execution flow would diverge, hence we need to use a bit-select.

If Function1 or Function3 modify any data, you'd need to pass the mask down the call and explicitly bit select the modifications just like we've done here. For a drop-in solution, that's a bit of work, but it still results in code that’s readable by most programmers.

7. Complex example

Let's again take some source code and drop in our SIMD types. A particularly interesting case is raytracing of distance fields. For this, we'll use the scene from Iñigo Quilez's demo [2] with his friendly permission, as shown in Figure 1.

Figure 1: Test scene from Iñigo Quilez's raycasting demo

The “SIMD threading” is placed at a spot where you'd add threading usually. Every thread handles a pixel, traversing the world until it hits something, subsequently a little bit of shading is applied and the pixel is converted to RGBA and written to the frame buffer.

The scene traversing is done in an iterative way. Every ray has an unpredictable amount of steps until a hit is recognized. For example, a close up wall is reached after a few steps while some rays might reach the maximum trace distance not hit anything at all. Our main loop in Code Snippet 12 handles both cases using the bit select method we've discussed in the previous section.

DRealU LoopMask(RTrue);
for(; a < 128; a++)

{
      DRealF Dist             =     SceneDist(O.x, O.y, O.z, C);
      DRealU DistU            =     *reinterpret_cast<DRealU*>(&Dist) & DMask(LoopMask);
      Dist                    =     *reinterpret_cast<DRealF*>(&DistU);
      TotalDist               =     TotalDist + Dist;
      O                       +=    D * Dist;
      LoopMask                =     LoopMask && Dist > MinDist && TotalDist < MaxDist;
      if(DNone(LoopMask))
            break;
}

Code Snippet 12: Raycasting with SIMD types

The LoopMask variable identifies that a ray is active by ~0 or 0 in which case we are done with that ray. At the end of the loop, we test whether no ray is active anymore and in this case we break out of the loop.

In the line above we evaluate our conditions for the rays, whether we're close enough to an object to call it a hit or whether the ray is already beyond the maximum distance we want to trace. We logically AND it with the previous result, as the ray might be already terminated in one of the previous iterations.

“SceneDist” is the evaluation function for our tracing - It's run for all SIMD-lanes and is the heavy weighted function that returns the current distance to the closest object. The next line sets the distance elements to 0 for rays that are not active anymore and steps this amount further for the next iteration.

The original “SceneDist” had some assembler optimizations and material handling that we don't need for our test, and this function is reduced to the minimum we need to have a complex example. Inside are still some if-cases that are handled exactly the same as before. Overall, the “SceneDist” is quite large and rather complex and would take a while to rewrite it by hand for every SIMD-platform again and again. You might need to convert it all in one go, while typos might generate completely wrong results. Even if it works, you'll have only a few functions that you really understand, and maintenance is much higher. Doing it by hand should be the last resort. Compared to that, our changes are relatively minor. It's easy to modify and you are able to extend the visual appearance, without the need to worry about optimizing it again and being the only maintainer that understands the code, just like it would be by adding real threads.

But we've done that work to see results, so let’s check the timings in Table 3.

Table 3: Raytracing performance with primitive and SIMD types, including loop unrolling types

Type	FPS	Speedup
x86	0.992FPS	1.0x
SSE4	3.744FPS	3.8x
SSE4 x2	3.282FPS	3.3x
AVX2	6.960FPS	7.0x
AVX2 x2	5.947FPS	6.0x

You can clearly see the speed up is not scaling linearly with the element count, which is mainly because of the divergence. Some rays might need 10 times more iterations than others.

8. Why not let the compiler do it?

Compilers nowadays can vectorize to some degree, but the highest priority for the generated code is to deliver correct results, as you would not use 100 time faster binaries that deliver wrong results even 1% of the time. Some assumptions we make, like the data will be aligned for SIMD, and we allocate enough padding to not overwrite consecutive allocations, are out of scope for the compiler. You can get annotations from the Intel compiler about all opportunities it had to skip because of assumptions it could not guarantee, and you can try to rearrange code and make promises to the compiler so it'll generate the vectorized version. But that's work you have to do every time you modify your code and in more complex cases like branching, you can just guess whether it will result in branchless bit selection or serialized code.

The compiler has also no inside knowledge of what you intend to create. You know whether threads will be diverging or coherent and implement a branched or bit selecting solution. You see the point of attack, the loop that would make most sense to change to SIMD, whereas the compiler can just guess whether it will run 10times or 1 million times.

Relying on the compiler might be a win in one place and pain in another. It's good to have this alternative solution you can rely on, just like your hand placed thread entries.

9. Real threading?

Yes, real threading is useful and SIMD-threads are not a replacement — both are orthogonal. SIMD-threads are still not as simple to get running as real threading is, but you'll also run into less trouble about synchronization and seldom bugs. The really nice advantage is that every core Intel sells can run your SIMD-thread version with all the 'threads'. A dual core CPU will run 4 or 8 times faster just like your quad socket 15-core Haswell-EP. Some results for our benchmarks in combination with threading are summarized in Table 4 through Table 7.¹

Table 4: MD5 Performance on Intel® Core™ i7 4770K with both SIMD and threading

Threads	Type	Time	Speedup
1T	x86 integer	311.704s	1.00x
8T	x86 integer	47.032s	6.63x
1T	SSE4	90.601s	3.44x
8T	SSE4	14.965s	20.83x
1T	SSE4 x2	62.225s	5.01x
8T	SSE4 x2	12.203s	25.54x
1T	AVX2	42.071s	7.41x
8T	AVX2	6.474s	48.15x
1T	AVX2 x2	29.612s	10.53x
8T	AVX2 x2	5.616s	55.50x

Table 5: Raytracing Performance on Intel® Core™ i7 4770K with both SIMD and threading

Threads	Type	FPS	Speedup
1T	x86 integer	1.202FPS	1.00x
8T	x86 integer	6.019FPS	5.01x
1T	SSE4	4.674FPS	3.89x
8T	SSE4	23.298FPS	19.38x
1T	SSE4 x2	4.053FPS	3.37x
8T	SSE4 x2	20.537FPS	17.09x
1T	AVX2	8.646FPS	4.70x
8T	AVX2	42.444FPS	35.31x
1T	AVX2 x2	7.291FPS	6.07x
8T	AVX2 x2	36.776FPS	30.60x

Table 6: MD5 Performance on Intel® Core™ i7 5960X with both SIMD and threading

Threads	Type	Time	Speedup
1T	x86 integer	379.389s	1.00x
16T	x86 integer	28.499s	13.34x
1T	SSE4	108.108s	3.51x
16T	SSE4	9.194s	41.26x
1T	SSE4 x2	75.694s	5.01x
16T	SSE4 x2	7.381s	51.40x
1T	AVX2	51.490s	3.37x
16T	AVX2	3.965s	95.68x
1T	AVX2 x2	36.015s	10.53x
16T	AVX2 x2	3.387s	112.01x

Table 7: Raytracing Performance on Intel® Core™ i7 5960X with both SIMD and threading

Threads	Type	FPS	Speedup
1T	x86 integer	0.992FPS	1.00x
16T	x86 integer	6.813FPS	6.87x
1T	SSE4	3.744FPS	3.774x
16T	SSE4	37.927FPS	38.23x
1T	SSE4 x2	3.282FPS	3.31x
16T	SSE4 x2	33.770FPS	34.04x
1T	AVX2	6.960FPS	7.02x
16T	AVX2	70.545FPS	71.11x
1T	AVX2 x2	5.947FPS	6.00x
16T	AVX2 x2	59.252FPS	59.76x

¹Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark* and MobileMark*, are measured using specific computer systems, components, software, operations, and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more information go to http://www.intel.com/performance.

As you can see, the threading results vary depending on the CPU, the SIMD-thread results scale similar. But it's striking that you can reach speed up factors in the higher two digits if you combine both ideas. It makes sense to go for the 8x speed up on a dual core, but so does it make sense to go for an additional 8x speed up on highly expensive server hardware.

Join me, SIMD-ify your code!

About the Author

Michael Kopietz is Render Architect at Crytek's R&D and leads a team of engineers developing the rendering of CryEngine(R) and also guides students during their thesis. He worked among other things on the cross platform rendering architecture, software rendering and on highly responsive server, always high-performance and reusable code in mind. Prior, he was working on ship-battle and soccer simulation game rendering. Having his root in assembler programming on the earliest home consoles, he still wants to make every cycle count.

Code License

References

[1] Memory Management for Optimal Performance on Intel® Xeon Phi™ Coprocessor: Alignment and Prefetching https://software.intel.com/en-us/articles/memory-management-for-optimal-performance-on-intel-xeon-phi-coprocessor-alignment-and

[2] Rendering Worlds with Two Triangles by Iñigo Quilez http://www.iquilezles.org/www/material/nvscene2008/nvscene2008.htm

Extensions Intel® Streaming SIMD

Processeurs Intel® Atom™

Processeurs Intel® Core™

Memory Management for Optimal Performance on Intel® Xeon Phi™ Coprocessor: Alignment and Prefetching

URL:

Zone des thèmes:

Windows Zone

Software Tools for Intel Xeon Co-processors

Dernière modification par:

Debra G. (Intel)

↧

Videos - Parallel Programming with Intel Xeon Phi Coprocessors

March 30, 2015, 12:12 pm

Latest and popular articles on Intel Technologies

≫ Next: The Uniqueness of UUIDs

≪ Previous: Easy SIMD through Wrappers

Here is a list of recently published videos from Colfax International on Intel(R) Xeon Phi(TM) Coprocessors.

In this video we will discuss software tools needed and recommended for developing applications for Intel Xeon Phi coprocessors. We will begin with software that is necessary to boot coprocessors and to run pre-compiled executables on them.

Will My Application Benefit from the MIC architecture

In this video episode we will discuss the types of applications that perform well on Intel Xeon Phi coprocessors. I hope that this discussion will help you to answer the question “Will my application benefit from the MIC architecture?”

Knights Landing, the Next Manycore Architecture

In this video we will discuss the next generation MIC architecture, based on a chip codenamed Knight’s Landing or KNL, for short.

Here is a link to other videos on Parallel Programming and Optimization with Intel Xeon Phi Coprocessors from Colfax International.

Intel Xeon Phi Coprocessor

HPC

Image de l’icône:

Didacticiels

Informatique en cluster

Data Center

Optimisation

Interface de transmission de messages

Parallélisation

Vectorisation

Zone des thèmes:

IDZone

Inclure dans RSS:

↧

The Uniqueness of UUIDs

March 30, 2015, 6:13 pm

Latest and popular articles on Intel Technologies

≫ Next: Use cases benefiting from the optimization of small networking data packets using Intel® DPDK an open source solution

≪ Previous: Videos - Parallel Programming with Intel Xeon Phi Coprocessors

Documentation: System Management BIOS (SMBIOS) Reference 6 Specification
UUID Wiki Page
Follow Gael on Twitter: @GaelHof

In a managed environment, it is important that all hardware assets have a unique identifier so as to not have naming conflicts. This unique identifier is known as the UUID. UUID stands for: Universally Unique ID.

Everything works out fine as long as all of the systems you are managing actually do have unique UUIDs. However, not every OEM adheres to the UUID specifications and instead they may use the MAC address plus some type of padding. The thought here is that the MAC address is unique - it may or may not be, we don't really know. Ensuring systems have unique UUIDs is very important if an organization is using a technology such as Intel(r) Active Management Technology to discover and manage thousands of systems. If UUIDs are not unique, errors will be encountered that will be difficult to trace.

If the OEM follows the SMBIOS specifications then the systems will (most likely) have unique UUIDs. However, any implementation that has potential for implementation variations could result in duplicate UUIDs.

There are actual 5 different versions of UUIDs in use. See below for more details.

The UUID format as defined in the SMBIOS (System management BIOS) standard is as follows:

From the SMBIOS Spec:"One of the main reasons for using UUIDs is that no centralized authority is required to administer them (although one format uses IEEE 802 node identifiers, others do not.) As a result, generation on demand can be completely automated and used for a variety of purposes. The UUID generation algorithm described here supports very high allocation rates of up to 10 million per second per machine if necesarry (Carl Segan might enjoy this fact) so that they could even be used as transaction IDs. UUIDs are of a fixed size (128 bits) which is reasonably small compared to other alternatives. This lends itself well to sorting, ordering, and hashing of all sorts, storing in databases, simple allocation, and ease of programming in general. Since UUIDs are unique and persistent, they make excellent Uniform Resourse Names. The unique ability to generate a new UUID without a registration process allows for UUIDs to be one of the URNs with the lowest minting cost."

UUID Versions:

Version 1: MAC Address & Date-Time
Version 2: DCE Security
Version 3: MD5 Hash & Namespace
Version 4: Random
Version 5: SHA-1 Hash & Namespace

UUID

Intel AMT

Image de l’icône:

Technologie d’administration active Intel®

Zone des thèmes:

IDZone

Inclure dans RSS:

Intel® QuickAssist Technology

↧

Use cases benefiting from the optimization of small networking data packets using Intel® DPDK an open source solution

March 31, 2015, 10:03 am

Latest and popular articles on Intel Technologies

≫ Next: Use Edison SDK + chroot to build up performance application

≪ Previous: The Uniqueness of UUIDs

Intel® Data Plane Development Kit (Intel® DPDK) is a set of optimized data plane software libraries and drivers that can be used to accelerate packet processing on Intel® architecture. The performance of Intel DPDK scales with improvements in processor technology from Intel® Atom™ to Intel® Xeon® processors. In April 2013 6WIND established dpdk.org an Open Source Project where Intel DPDK is offered under the open source BSD* license. Whether using the open source solution or the Intel DPDK, developers now have the ability to accelerate network applications across a broad spectrum, including telecom, enterprise, and cloud applications. The advantages of combining Intel DPDK with Intel hardware include portability, scalability, and integration with other Intel hardware solutions for even more performance gains. This blog will cover various use cases including virtual switching, big data, and next-generation firewalls where Intel DPDK packet handling has been of value.

Network Function Virtualization

Intel DPDK can be very useful when incorporated within virtualized environments. For example, a recent trend in Software Defined Networks (SDN) is increasing demand for fast host-based packet handling and a move towards Network Functions Virtualization (NFV). NFV is a new way to provide network functions such as firewalls, domain name service, and network address translation as a fully virtualized infrastructure. One example of this is Open vSwitch*, which is an open source solution capable of providing virtual switching. Intel DPDK has been combined with Open vSwitch to provide an accelerated experience.

Telecommunications Industry

The telecommunications industry is increasingly moving towards virtualization in an effort to provide more agility, flexibility, and standardization within its network environments, which over time have traditionally grown in a more heterogeneous way. In the white paper “Carrier Cloud Telecoms – Exploring the Challenges of Deploying Virtualization and SDN in Telecoms Networks”, Tieto in collaboration with Intel showed a Cloud telecom’s use case that combines SDN, NFV, Intel DPDK, Openflow*, and Open vSwitch. They looked at multiple scenarios that include dynamic provisioning of 4G/LTE traffic and resources in a virtualized SDN environment, high-performance and energy-efficient packet processing, and protocol distribution using the Intel DPDK and the Tieto IP stack (TIP), 4G/LTE to 3G video stream handover scenario, and a Packet Data Network Gateway scenario where SDN is used for the handover of Internet traffic.

The white paper NEC* Virtualized EPC Innovation Powered by Multi Core Intel® Architecture Processors, discusses how NEC was able to deploy a virtualized Evolved Packet Core (vEPC), which is a framework for converging data and voice on 4G Long-Term Evolution (LTE) networks, on a common Intel architecture server platform with Intel DPDK and achieve carrier grade service. NEC adopted the Intel DPDK for its vEPC in order to significantly improve the data plane forwarding performance in a virtualization environment.

Next-generation Firewalls

The need for continued refinements in network security have led to improved implementations of firewalls, which is another growing segment that can benefit from Intel DPDK. These next-generation firewalls may also be part of an NFV solution. Basic firewalls that are used for simple packet filtering have evolved in recent years to perform more advanced applications such as intrusion detection and prevention (IPS), network antivirus, IPsec, SSL, application control, and more. These features all reside in the data plane and require deep packet inspection of the data streams, cryptographic and compression capabilities, and heavy processing of the packet contents. A next-generation firewall was designed using Wind River Network Acceleration Platform with Intel DPDK and Intel® QuickAssist Technology§. Intel provides the hardware to receive and transmit network traffic efficiently, along with fast CPUs and large caches, which are ideal for operating with these data-intensive applications. Intel DPDK provides the mechanisms that support high-performance alternatives to Linux* system calls, bypassing the generic issues of the Linux kernel. Finally, Wind River Network Acceleration Platform builds on the Intel infrastructure to provide acceleration of native Linux applications such as an Apache server and provide even higher acceleration for security applications ported onto the network acceleration engine. For more information on this use case see the white paper “Multi-Core Networking For The New Data Plane” and watch a live demonstration here, which provides an example of a next-generation firewall capable of analytics to monitor user traffic applications, and content inspection for malware. On a related note, Intel DPDK is combined with Hyperscan and other Intel technologies for a next-generation IPS solution, which is included as part of the Intel® Security Controller.

Big Data Analytics

For a use case involving Big Data analytics, Aspera and Intel investigated ultra-high-speed data transfer solutions built on Aspera’s fasp* transport technology and the Intel® Xeon® processor E5-2600 v3 product family. The solution was able to achieve predictable ultra-high WAN transfer speeds on commodity Internet connections on both bare metal and virtualized hardware platforms, including over networks with hundreds of milliseconds of round-trip time and several percentage points of packet loss characteristic of typical global-distance WANs. By using Intel DPDK, software engineers were able to reduce the number of memory copies needed to send and receive a packet. This enabled Aspera to boost single stream data transfer speeds to 37.75 Gbps on the tested system¹, which represents network utilization of 39 Gbps when Ethernet framing and IP packet headers are accounted for. The team also began preliminary investigation of the transfer performance on virtualized platforms by testing on a kernel-based virtual machine (KVM) hypervisor and obtained initial transfer speeds of 16.1 Gbps. The KVM solution was not yet NUMA or memory optimized, and thus the team expects to obtain even faster speeds as they apply these optimizations in the future. For details about performance findings, system specifications, software specifications, etc. see the white paper Big Data Technologies for Ultra-High-Speed data Transfer and Processing.

Power Grid

As the world’s largest electric utility company, the State Grid Corporation of China (SGCC) provides power to about 1.1 billion people. SGCC relies on a high performance computing cluster to ensure the power grid’s safe and stable operation. With the expansion of China’s power grid size, SGCC’s Advanced Digital Power System Simulator* (ADPSS*) had to be enhanced to meet the state’s increasing power supply demands. The white paper “An integrated Intel® architecture based solution for power grid simulation” explores these challenges. Intel DPDK was one of the key ingredients used to reduce the latency to within 50 microseconds for ADPSS. This was a requirement for creating a large scale power system simulation for 3,000 generators and 30,000 grid transmission lines.

Summary

Intel DPDK can help with a broad spectrum of use cases including NFV, next-generation firewalls, and big data across different industries such as telecommunications, energy, and information technology. It can provide optimization anytime you have high performance applications utilizing small (64 byte) networking data packets. It offers a simple software programming model that scales from Intel Atom processors to the latest Intel Xeon processors, providing flexible system configurations to meet any customer requirements for performance and scalable I/O. The benefits provided by Intel DPDK can be combined with other Intel technologies for additional improvements. These can include Intel QuickAssist Technology, a cryptographic accelerator and Hyperscan, a Deep Packet Inspection solution.

Resources

Open-Source Project
dpdk.org
Intel® DPDK: Overview
http://www.intel.com/content/www/us/en/intelligent-systems/intel-technology/dpdk-packet-processing-ia-overview-presentation.html
Intel® DPDK: Installation and Configuration Guide
http://www.intel.com/content/www/us/en/intelligent-systems/intel-technology/intel-dpdk-getting-started-guide.html.
Intel® DPDK: Programmer’s Guide
http://www.intel.com/content/www/us/en/intelligent-systems/intel-technology/intel-dpdk-programmers-guide.html.
Intel® DPDK: API Reference Documentation
http://www.intel.com/content/www/us/en/intelligent-systems/intel-technology/intel-dpdk-api-reference.html.
Intel DPDK: Sample Applications
http://www.intel.com/content/www/us/en/intelligent-systems/intel-technology/intel-dpdk-sample-applications-user-guide.html.
Intel® DPDK: Latest Source Code Packages for the Intel® DPDK Library
http://www.intel.com/content/www/us/en/intelligent-systems/intel-technology/dpdk-source-code.html

About the Author

David Mulnix has been a software engineer with Intel Corporation for over 15 years. His areas of focus have included software automation, server power and performance analysis, and cloud security.

Intel® DPDK

Intel® Xeon® processors

SDN

NFV

Open vSwitch*

Zone des thèmes:

IDZone

↧

Use Edison SDK + chroot to build up performance application

March 31, 2015, 10:38 pm

Latest and popular articles on Intel Technologies

≫ Next: Intel® System Studio - Solutions, Tips and Tricks

≪ Previous: Use cases benefiting from the optimization of small networking data packets using Intel® DPDK an open source solution

Extract more performance via Intel software techniques

Edison is a powerful IoT platform with dual-core 500MHz Atom CPU inside. By deploying Intel software techniques, more power can be extracted from it. LIke the following figure shows; it's a comparison test performed on Edison and the result clearly presents the performance benefits by using Intel optimized libraries and compiler software techniques. Can't wait to try it on your own Edison? Check it on http://software.intel.com/en-us/intel-system-studio and follow the steps in this article.

* detection time varies among different cases of pictures/video stream.
* bottom labels are for image resolution
* ipp stands for Intel^® Performance Primitives
* tbb stands for Intel^® Threading Building Block
* icc stands for Intel^® C++ Compiler

Edison SDK + chroot to facilitate the development works

In addition, Edison is designed for quick prototype/product development. Therefore, it provides its own SDK to facilitate the application development. Without using SDK, you can still build up libraries and sample application directly on the Edison target. But it would take half day long to get the job done.

Edison SDK provides cross-compile setup script and here we provide another tip by using chroot linux utility + Edison SDK to ease the development environment setup. The result shows it only takes less than 3 minutes on our Core i7@3.3GHz machine to build all opencv libs and applications. Check the figure below to review final built-out binaries.

Where to get Edison SDK? You can either download it from Edison support website (search “SDK”) or build your own one. BSP reference guide contains detailed information about how to generate Edison SDK and how to customize your Edison image. The following figure shows the final output of SDK image name, one single file combined the installation script and bzip format payloads.

Once you successfully installed the SDK by excuting it, you should find SDK located at the following directory structure.

By applying chroot to this specified root folder “…/coer2-32-poky-linux”, you will immediately have a Linux shell working in the same root directories structures as in Edison. It provide some convience for development works. For example, you could update certain library packages via opkg like update happens in Edison target. And you can build or even debug the codes in current environment first (with the running kernel of current developer machine) before deploying the test software to the real Edison target.

More information and detailed steps to build up Edison opencv application.

This attached txt file below contains the detailed steps to build up an opencv face detection application under Edison SDK + chroot setup. To use Intel^® C++ compiler, you have to install the latest version of Intel system studio via http://software.intel.com/en-us/intel-system-studio. Please be aware that Intel^®C++ compiler is part of Intel System Studio.

Download detailSteps.txt

Before getting this Edison SDK setup ready, you may want to know more information about Edison;
* Edison Hardware information
* How to assemble Edison boards – Video
* How to flash the image
* How to customize Edison kernel

Zone des thèmes:

IDZone

↧

Intel® System Studio - Solutions, Tips and Tricks

April 1, 2015, 9:59 am

Latest and popular articles on Intel Technologies

≫ Next: Controlling Philip Hue Lights using Intel Edison

≪ Previous: Use Edison SDK + chroot to build up performance application

Intel^® System Studio 2015
- Release Notes
- Support Matrix
- Installation Guides
- What's New?
  - General
  - Windows* Target Specific
- IDE Integration
- Specific Components and Platforms - Introductory Guides
- Intel^® System Debugger
- Analysis
- New Platform Support and Intel^® C++ Compiler Optimizations
- Intel^® Energy Profiler
- Intel^® Math Kernel Library (Intel^® MKL)
- GNU GDB* 7.7 Update 1
  - New Features and Highlights of the 7.7 Version
- Android* Development

Linux* and WindRiver Linux* Target Support - Intel System Studio Component Guides

Android* Target Support - Intel System Studio Component by Component Guides

Intel^® Atom™ Processor E3xxx, Z3xxx (code-named Baytrail) - Overview of Intel System Studio support
- Developing Intel Atom Processor E3xxx, Z3xxx applications - overview of Intel System Studio components
- Intel System Studio Intel Atom Processor Support - general overview of Intel System Studio support for all Intel Atom processors

Intel^® Quark SoC - Overview of Intel System Studio Support

Windows* Host - Linux* or Android* Target

Using Intel System Studio for Tizen* In-Vehicle Infotainment
- Case Study / Guide
- Using Intel Compiler for Tizen

Build and Design for Performance

Power and Performance Tuning

Debug, Verification, and Reliability

Additional Resources

Intel® Advanced Vector Extensions

Extensions Intel® Streaming SIMD

Rubriques de compilateurs

Recherche d’erreurs

Pour commencer

Amélioration des performances

Bibliothèques

Erreurs de mémoire

Développement multithread

Analyse de sécurité statique

Erreurs de thread

ISS-Learn

Learning Lab

URL:

Intel® System Studio Training Material

Zone des thèmes:

IDZone

↧

Controlling Philip Hue Lights using Intel Edison

April 2, 2015, 3:55 pm

Latest and popular articles on Intel Technologies

≫ Next: Процедурный рендеринг разреженного пространства

≪ Previous: Intel® System Studio - Solutions, Tips and Tricks

Have you ever wanted to control your light bulbs? Ever forget to turn off your lights before a long trip? Well Philip's has developed this smart light bulbs that allow you to control the lights via wifi called Philip Hue Lights. Philip's has a great smartphone app that let's you control the lights (setting timers, moods, etc.), but I wanted to try controlling the lights via my Intel Edison board via NodeJS.

Here is a link to the project I created on Hackster.io detailing the setups I took: http://www.hackster.io/ashishdatta/philip-hue-lights-intel-edison

Intel Edison Board

Image de l’icône:

Zone des thèmes:

IOT Zone

Inclure dans RSS: