Arm NN and TensorFlow


Accelerated inference on Arm microcontrollers with TensorFlow Lite for Microcontrollers and CMSIS-NN

February 10, 2021 — A guest post by Fredrik Knutsson of Arm


The MCU universe

Microcontrollers (MCUs) are the tiny computers that power our technological environment. There are over 30 billion of them manufactured every year, embedded in everything from household appliances to fitness trackers. If you’re in a house right now, there are dozens of microcontrollers all around you. If you drive a car, there are dozens riding with you on every drive. Using TensorFlow Lite for Microcontrollers (TFLM), developers can deploy TensorFlow models to many of these devices, enabling entirely new forms of on-device intelligence.

While ubiquitous, microcontrollers are designed to be inexpensive and energy efficient, which means they have small amounts of memory and limited processing power. A typical microcontroller might have a few hundred kilobytes of RAM, and a 32-bit processor running at less than 100 MHz. With advances in machine learning enabled by TFLM, it has become possible to run neural networks on these devices.

With minimal computational resources, it is important that microcontroller programs are optimized to run as efficiently as possible. This means making the most of the features of their microprocessor hardware, which requires carefully tuned application code.

Many of the microcontrollers used in popular products are built around Arm’s Cortex-M based processors, which are the industry leader in 32-bit microcontrollers, with more than 47 billion shipped. Arm’s open source CMSIS-NN library provides optimized implementations of common neural network functions that maximize performance on Cortex-M processors. This includes making use of DSP and M-Profile Vector Extension (MVE) instructions for hardware acceleration of operations such as matrix multiplication.
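
To make this concrete, here is a rough, illustrative sketch (not CMSIS-NN source code) of how the inner loop of such a matrix multiplication can use the DSP extension. The DotProductQ15 name is invented for this example, and the __SMLAD intrinsic and the ARM_MATH_DSP define are assumed to be provided by the CMSIS headers on a DSP-capable Cortex-M target:

#include <cstdint>
#include <cstring>
#include "arm_math.h"  // CMSIS-DSP header, assumed available; pulls in the core intrinsics

// Dot product of two q15 (int16) vectors, the basic building block of the
// matrix multiplications mentioned above.
int32_t DotProductQ15(const int16_t* a, const int16_t* b, uint32_t n) {
  int32_t acc = 0;
  uint32_t i = 0;
#if defined(ARM_MATH_DSP)
  // DSP path: __SMLAD performs two signed 16x16 multiply-accumulates per call.
  for (; i + 2u <= n; i += 2u) {
    uint32_t va, vb;
    std::memcpy(&va, &a[i], sizeof(va));  // pack two q15 values into one 32-bit word
    std::memcpy(&vb, &b[i], sizeof(vb));
    acc = static_cast<int32_t>(__SMLAD(va, vb, static_cast<uint32_t>(acc)));
  }
#endif
  // Scalar tail, and the portable fallback when the DSP extension is absent.
  for (; i < n; ++i) {
    acc += static_cast<int32_t>(a[i]) * static_cast<int32_t>(b[i]);
  }
  return acc;
}

MVE (Helium) takes the same idea further by operating on full vector registers per instruction.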

Benchmarks for key use cases

Arm’s engineers have worked closely with the TensorFlow team to develop optimized versions of the TensorFlow Lite kernels that use CMSIS-NN to deliver blazing fast performance on Arm Cortex-M cores. Developers using TensorFlow Lite can use these optimized kernels with no additional work, just by using the latest version of the library. Arm has made these optimizations in open source, and they are free and easy for developers to use today!
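
As a rough sketch of what that looks like from the application side, the code below is just the ordinary TFLM inference flow; nothing CMSIS-specific appears in it, because the optimized kernels are substituted when the library itself is built. Names such as model_data, the arena size and the registered ops are placeholders, and the exact MicroInterpreter constructor arguments vary between TFLM versions:

#include <cstring>

#include "tensorflow/lite/micro/micro_error_reporter.h"
#include "tensorflow/lite/micro/micro_interpreter.h"
#include "tensorflow/lite/micro/micro_mutable_op_resolver.h"
#include "tensorflow/lite/schema/schema_generated.h"

namespace {
constexpr int kArenaSize = 32 * 1024;          // placeholder: sized for a small model
alignas(16) uint8_t tensor_arena[kArenaSize];  // scratch memory for all tensors
}  // namespace

// Runs one inference on an int8 model compiled into `model_data`.
int RunOnce(const unsigned char* model_data, const int8_t* input, size_t input_len) {
  const tflite::Model* model = tflite::GetModel(model_data);

  // Register only the ops the model needs; when the library is built with the
  // CMSIS-NN kernels, these resolve to the optimized implementations.
  tflite::MicroMutableOpResolver<3> resolver;
  resolver.AddConv2D();
  resolver.AddFullyConnected();
  resolver.AddSoftmax();

  static tflite::MicroErrorReporter error_reporter;
  tflite::MicroInterpreter interpreter(model, resolver, tensor_arena, kArenaSize,
                                       &error_reporter);
  if (interpreter.AllocateTensors() != kTfLiteOk) return -1;

  std::memcpy(interpreter.input(0)->data.int8, input, input_len);
  if (interpreter.Invoke() != kTfLiteOk) return -1;

  // interpreter.output(0)->data.int8 now holds the quantized scores.
  return 0;
}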

The following benchmarks show the performance uplift when using CMSIS-NN optimized kernels versus reference kernels for several key use cases featured in the TFLM example applications. The tests have been performed on an Arm Cortex-M4 based FPGA platform:

Table showing performance uplift when using CMSIS-NN kernels

The Arm Cortex-M4 processor supports the DSP extension, which enables the processor to execute DSP instructions for faster inference. To improve inference performance even further, the new Arm Cortex-M55 processor supports MVE, also known as Helium technology.

Improving performance with CMSIS-NN

So far, the following optimized CMSIS-NN kernels have been integrated with TFLM:

 Table showing CMSIS-NN kernels integrated with TFLM

The CMSIS-NN library will be updated regularly to expand the set of optimized kernels; the key criterion for adding support is that an optimization should give a significant performance increase for a given use case. To discuss kernel optimizations, a good starting point is to raise a ticket on the TensorFlow or CMSIS GitHub repository describing your use case.

Most of the optimizations are implemented specifically for 8-bit quantized (int8) operations, and this will be the focus of future improvements.
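
For reference, TFLite's int8 scheme maps a quantized value q back to a real value as real = scale * (q - zero_point). The small helpers below are only meant to illustrate that mapping; the function names are hypothetical and are not part of CMSIS-NN or TFLM:

#include <algorithm>
#include <cmath>
#include <cstdint>

// real_value = scale * (q - zero_point), with q saturated to the int8 range.
int8_t QuantizeToInt8(float real, float scale, int32_t zero_point) {
  int32_t q = static_cast<int32_t>(std::lround(real / scale)) + zero_point;
  return static_cast<int8_t>(std::clamp(q, -128, 127));
}

float DequantizeFromInt8(int8_t q, float scale, int32_t zero_point) {
  return scale * static_cast<float>(q - zero_point);
}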

It’s easy to try the optimized kernels yourself by following the instructions that accompany the examples. For example, to build the person detection example for the SparkFun Edge with CMSIS-NN kernels, you can use the following command:

make -f tensorflow/lite/micro/tools/make/Makefile TARGET=sparkfun_edge OPTIMIZED_KERNEL_DIR=cmsis_nn person_detection_int8_bin

The latest version of the TensorFlow Lite Arduino library includes the CMSIS-NN optimizations along with all of the example applications, which are compatible with the Cortex-M4 based Arduino Nano 33 BLE Sense.

Next leap in neural processing

Looking ahead into 2021 we can expect a dramatic increase in neural processing from the introduction of devices including a microNPU (Neural Processing Unit) working alongside a microcontroller. These microNPUs are designed to accelerate ML inference within the constraints of embedded and IoT devices, with devices using the Arm Cortex-M55 MCU coupled with the new Ethos-U55 microNPU delivering up to a 480x performance increase compared to previous microcontrollers.

This unprecedented level of ML processing capability within smaller, power constrained devices will unlock a huge amount of innovation across a range of applications, from smart homes and cities to industrial, retail, and healthcare. The potential for innovation within each of these different areas is huge, with hundreds of sub segments and thousands of potential applications that will make a real difference to people’s lives.

Source: https://blog.tensorflow.org/2021/02/accelerated-inference-on-arm-microcontrollers-with-tensorflow-lite.html

Introduction

Arm NN is a key component of the machine learning platform, which is part of the Linaro Machine Intelligence Initiative.

The Arm NN SDK is a set of open-source software and tools that enables machine learning workloads on power-efficient devices. It provides a bridge between existing neural network frameworks and power-efficient Cortex-A CPUs, Arm Mali GPUs and Arm Ethos NPUs.

Arm NN SDK utilizes the Compute Library to target programmable cores, such as Cortex-A CPUs and Mali GPUs, as efficiently as possible. To target Ethos NPUs the NPU-Driver is utilized. We also welcome new contributors to provide their own driver and backend. Note, Arm NN does not provide support for Cortex-M CPUs.

The latest release supports models created with TensorFlow Lite (TfLite) and ONNX. Arm NN analyzes a given model and replaces the operations within it with implementations designed specifically for the hardware you want to execute it on. This results in a great boost in execution speed. How much faster your neural network can be executed depends on the operations it contains and the available hardware; the upstream README includes charts of the speedups measured for a few common networks.

Arm NN is written using portable C++14 and the build system uses CMake, therefore it is possible to build for a wide variety of target platforms, from a wide variety of host environments.

Getting started: Software tools overview

Depending on which framework (TensorFlow Lite or ONNX) you used to create your model, there are several software tools available within Arm NN that can serve your needs.

Generally, there is a parser available for each supported framework. Each parser allows you to run models from one framework; for example, the TfLite parser lets you run TfLite models. You can integrate these parsers into your own application to load, optimize and execute your model. We also provide Python bindings for our parsers and the Arm NN core, and we call the result PyArmNN. Your application can therefore be written either in C++ using the "original" Arm NN library or in Python using PyArmNN. You can find tutorials on how to set up and use our parsers in our doxygen documentation; the latest version can be found in the wiki section of this repository.
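
As a rough C++ sketch of that load/optimize/execute flow using the TfLite parser (details such as the binding API and tensor setup differ between Arm NN releases, and the model path and the "input"/"output" tensor names below are placeholders for whatever your network actually uses):

#include <armnn/ArmNN.hpp>
#include <armnnTfLiteParser/ITfLiteParser.hpp>

#include <vector>

int main() {
  // Parse the TfLite flatbuffer into an Arm NN network.
  auto parser = armnnTfLiteParser::ITfLiteParser::Create();
  armnn::INetworkPtr network = parser->CreateNetworkFromBinaryFile("model.tflite");

  // Binding info (layer id + tensor info) for the tensors we feed and read back.
  auto inputBinding  = parser->GetNetworkInputBindingInfo(0, "input");
  auto outputBinding = parser->GetNetworkOutputBindingInfo(0, "output");

  // Optimize for the preferred backends and load onto the runtime.
  armnn::IRuntime::CreationOptions options;
  armnn::IRuntimePtr runtime = armnn::IRuntime::Create(options);
  armnn::IOptimizedNetworkPtr optNet = armnn::Optimize(
      *network, {armnn::Compute::CpuAcc, armnn::Compute::CpuRef}, runtime->GetDeviceSpec());
  armnn::NetworkId netId;
  runtime->LoadNetwork(netId, std::move(optNet));

  // Run a single inference (assumes a float model for simplicity).
  std::vector<float> inputData(inputBinding.second.GetNumElements());
  std::vector<float> outputData(outputBinding.second.GetNumElements());
  armnn::InputTensors inputTensors{
      {inputBinding.first, armnn::ConstTensor(inputBinding.second, inputData.data())}};
  armnn::OutputTensors outputTensors{
      {outputBinding.first, armnn::Tensor(outputBinding.second, outputData.data())}};
  runtime->EnqueueWorkload(netId, inputTensors, outputTensors);
  return 0;
}

A PyArmNN script follows the same parse, optimize, load and enqueue sequence.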

Admittedly, building Arm NN and its parsers from source is not always easy. We are trying to improve usability by providing Arm NN as a Debian package. Our Debian package is the easiest way to install the Arm NN core, the TfLite parser and PyArmNN (support for more components is on the way): Installation via Apt Repository

The newest member of Arm NN's software toolkit is the TfLite Delegate. The delegate plugs into TfLite, which then hands the operations that Arm NN can accelerate over to Arm NN; every other operation is still executed by the usual TfLite runtime. This is our recommended way to accelerate TfLite models. As with our parsers, there are tutorials in our doxygen documentation that can be found in the wiki section.
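
As a rough sketch, assuming a build where the Arm NN delegate headers and library are available (the exact header name and DelegateOptions constructors vary by release), plugging the delegate into the standard TfLite C++ API looks something like this:

#include <armnn_delegate.hpp>

#include <memory>
#include <vector>

#include "tensorflow/lite/interpreter.h"
#include "tensorflow/lite/kernels/register.h"
#include "tensorflow/lite/model.h"

int main() {
  // Load the model and build a standard TfLite interpreter.
  auto model = tflite::FlatBufferModel::BuildFromFile("model.tflite");
  tflite::ops::builtin::BuiltinOpResolver resolver;
  std::unique_ptr<tflite::Interpreter> interpreter;
  tflite::InterpreterBuilder(*model, resolver)(&interpreter);

  // Create the Arm NN delegate, preferring the NEON-accelerated CPU backend.
  std::vector<armnn::BackendId> backends = {armnn::Compute::CpuAcc};
  armnnDelegate::DelegateOptions delegateOptions(backends);
  std::unique_ptr<TfLiteDelegate, decltype(&armnnDelegate::TfLiteArmnnDelegateDelete)>
      armnnDelegatePtr(armnnDelegate::TfLiteArmnnDelegateCreate(delegateOptions),
                       armnnDelegate::TfLiteArmnnDelegateDelete);

  // Hand supported operations to Arm NN; the rest stay on the TfLite runtime.
  interpreter->ModifyGraphWithDelegate(armnnDelegatePtr.get());

  interpreter->AllocateTensors();
  interpreter->Invoke();
  return 0;
}

The rest of the application code is unchanged; TfLite falls back to its own kernels for any operation the delegate does not claim.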

If you would like to use Arm NN on Android, you can follow this guide, which explains how to build Arm NN using the Android NDK. You might also want to take a look at another repository, Android-NN-Driver, which implements a hardware abstraction layer (HAL) for Android; when integrated into Android, it will automatically run neural networks with Arm NN.

Where to find more information

The section above introduces the most important tools that Arm NN provides. You can find a complete list in our doxygen documentation. The latest version can be found in the wiki section of our github repository.

For FAQs and troubleshooting advice, see FAQ.md or take a look at previous github issues.

Note

  1. The following tools have been removed in 21.05:

    • TensorFlow Parser
    • Caffe Parser
    • Quantizer
  2. Ubuntu Linux 16.04 LTS reached the end of its support period on April 30, 2021 and no longer receives security patches or other software updates. Consequently, from the 21.08 release at the end of August 2021, Arm NN is no longer officially supported on Ubuntu 16.04 LTS and is supported on Ubuntu 18.04 LTS instead.

How to get involved

If you would like to get involved but don't know where to start, a good place to look is in our Github Issues.

Feature requests without a volunteer to implement them are closed but carry the 'Help wanted' label; these can be found here. Once you find a suitable issue, feel free to re-open it and add a comment, so that other people know you are working on it and can help.

When the feature is implemented the 'Help wanted' label will be removed.

Contributions

The Arm NN project welcomes contributions. For more details on contributing to Arm NN see the Contributing page on the MLPlatform.org website, or see the Contributor Guide.

In particular, if you'd like to implement your own backend alongside our CPU, GPU and NPU backends, there are guides for backend development: the Backend development guide and the Dynamic backend development guide.

Disclaimer

The armnn/tests directory contains tests used during Arm NN development. Many of them depend on third-party IP, model protobufs and image files not distributed with Arm NN. The dependencies of some of the tests are available freely on the Internet, for those who wish to experiment, but they won't run out of the box.

License

Arm NN is provided under the MIT license. See LICENSE for more information. Contributions to this project are accepted under the same license.

Individual files contain the following tag instead of the full license text.
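
SPDX-License-Identifier: MIT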

This enables machine processing of license information based on the SPDX License Identifiers that are available here: http://spdx.org/licenses/

Third-party

Third party tools used by Arm NN:

Source: https://github.com/ARM-software/armnn

Arm NN

About Arm NN SDK

Arm NN SDK is a set of open-source Linux software and tools that enables machine learning workloads on power-efficient devices. It provides a bridge between existing neural network frameworks and power-efficient Cortex-A CPUs, Arm Mali GPUs and Arm Ethos NPUs.

Arm NN SDK utilizes the Compute Library to target programmable cores, such as Cortex-A CPUs and Mali GPUs, as efficiently as possible. Arm NN does not provide support for Cortex-M CPUs.

The latest release supports Caffe, TensorFlow, TensorFlow Lite, and ONNX. Arm NN takes networks from these frameworks, translates them to the internal Arm NN format and then, through the Compute Library, deploys them efficiently on Cortex-A CPUs, and, if present, Mali GPUs such as the Mali-G71 and Mali-G72.

In September 2018, Arm donated Arm NN to the Linaro Machine Intelligence Initiative, where it is now developed fully in open source. To find out more, visit mlplatform.org.

 

 

Source: https://developer.arm.com/ip-products/processors/machine-learning/arm-nn



3.15.5. Arm NN and Arm Compute Library

3.15.5.1. Introduction

Arm NN and Arm Compute Library, as a set of machine learning software, tools and libraries, enable Machine Learning on Arm.

For Sitara devices without accelerators such as C66x or EVE (AM3/4/6), you can use these Arm-provided libraries, which support inference-only tasks on Arm CPUs. Arm NN and Arm Compute Library can also be used on AM57xx devices, as a complementary approach to the TIDL API.

3.15.5.2. Supported versions

  • ARMNN 19.08
  • ARM Compute Library 19.08

3.15.5.3. Arm Compute Library

Arm Compute Library is a software library for computer vision and machine learning, optimized for the NEON SIMD architecture (Mali GPU OpenCL is not applicable to TI devices). The exact list of functions can be found at https://developer.arm.com/technologies/compute-library. The Arm Compute Library and sample executables are included in the PLSDK filesystem. AM3/4/5/6 devices can use this library to exploit the full potential of their Arm CPUs.

Sample NN related executables (using Arm Compute Library only):

/usr/bin/graph2tree
/usr/bin/graph_alexnet
/usr/bin/graph_googlenet
/usr/bin/graph_inception_v3
/usr/bin/graph_inception_v4
/usr/bin/graph_lenet
/usr/bin/graph_mobilenet
/usr/bin/graph_mobilenet_qasymm8
/usr/bin/graph_resnet50
/usr/bin/graph_resnext50
/usr/bin/graph_squeezenet
/usr/bin/graph_squeezenet_v1_1
/usr/bin/graph_vgg16
/usr/bin/graph_vgg19

3.15.5.4. Arm NN

Arm NN is a library built on top of the Arm Compute Library, leveraging its NEON-optimized kernels. It significantly simplifies importing Caffe, ONNX, TensorFlow, and TensorFlow Lite inference models. The library and executables are part of the AM3/4/5/6 target filesystem. More information can be found at: https://developer.arm.com/products/processors/machine-learning/arm-nn

Sample Arm NN executables using Caffe models:

/usr/bin/CaffeAlexNet-Armnn
/usr/bin/CaffeCifar10AcrossChannels-Armnn
/usr/bin/CaffeInception_BN-Armnn
/usr/bin/CaffeMnist-Armnn
/usr/bin/CaffeResNet-Armnn
/usr/bin/CaffeVGG-Armnn
/usr/bin/CaffeYolo-Armnn

Sample executables using ONNX models:

/usr/bin/OnnxMnist-Armnn
/usr/bin/OnnxMobileNet-Armnn

Sample executables using TensorFlow models:

/usr/bin/TfCifar10-Armnn
/usr/bin/TfInceptionV3-Armnn
/usr/bin/TfMnist-Armnn
/usr/bin/TfMobileNet-Armnn
/usr/bin/TfResNext-Armnn

Sample executables using TensorFlow Lite models:

/usr/bin/TfLiteInceptionV3Quantized-Armnn
/usr/bin/TfLiteInceptionV4Quantized-Armnn
/usr/bin/TfLiteMnasNet-Armnn
/usr/bin/TfLiteMobileNetQuantizedSoftmax-Armnn
/usr/bin/TfLiteMobileNetSsd-Armnn
/usr/bin/TfLiteMobilenetQuantized-Armnn
/usr/bin/TfLiteMobilenetV2Quantized-Armnn
/usr/bin/TfLiteResNetV2-50-Quantized-Armnn
/usr/bin/TfLiteResNetV2-Armnn
/usr/bin/TfLiteVGG16Quantized-Armnn

3.15.5.5. Arm NN MobileNet Demo

Upon boot, the Matrix GUI is started with multiple icons that launch many out-of-the-box demos. Under the sub-menu "Machine Learning", there are two icons to start the Arm NN demos:

  • Arm NN MobileNet Real Common Objects
  • Arm NN MobileNet Camera Input

These examples demonstrate deep learning ImageNet classification (1000 classes) with a MobileNet model on Arm. One example uses a pre-recorded real-world video clip and the other uses live camera input. The pre-recorded video clip (320x320 resolution) and live camera input (default 640x480 resolution) are scaled down and center-cropped at run time (using the OpenCV API) to 224x224. The result of this processing is a standard ImageNet classification output (a 1-D vector with 1000 elements).
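
The scale-down and center-crop step described above can be sketched in a few lines of OpenCV; this is an illustration of the preprocessing, not the demo's actual source, and PreprocessFrame is an invented name:

#include <algorithm>

#include <opencv2/opencv.hpp>

// Scale the frame so its shorter side is `side` pixels, then center-crop.
cv::Mat PreprocessFrame(const cv::Mat& frame, int side = 224) {
  const double scale =
      static_cast<double>(side) / std::min(frame.cols, frame.rows);
  cv::Mat resized;
  cv::resize(frame, resized, cv::Size(), scale, scale);
  const int x = (resized.cols - side) / 2;
  const int y = (resized.rows - side) / 2;
  return resized(cv::Rect(x, y, side, side)).clone();  // 224x224 network input
}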

Executable invoked from the Matrix-GUI demos above is /usr/bin/ArmnnExamples, and the TensorFlow Lite parser is used.

Options of ArmnnExamples are listed below:

-f [ --model-format ] arg        caffe-binary, caffe-text, onnx-binary, onnx-text, tflite-binary, tensorflow-binary or tensorflow-text. E.g.: -f tensorflow-binary
-m [ --model-path ] arg          Model name with full path. Supported model types: .caffemodel, .prototxt, .tflite, .onnx. E.g.: -m /usr/share/arm/armnn/models/mobilenet_v1_1.0_224_frozen.pb
-c [ --compute ] arg             The preferred order of devices to run layers on by default. Possible choices: CpuAcc, CpuRef, GpuAcc. E.g.: -c CpuAcc
-i [ --input-name ] arg          Identifier of the input tensor in the network. E.g.: -i input
-s [ --input-tensor-shape ] arg  The shape of the input tensor in the network as a flat array of integers separated by whitespace. This parameter is optional, depending on the network. E.g.: -s '1 224 224 3'
-d [ --input-tensor-data ] arg   Input test file name. It can be an image/video clip file name, or use 'camera_live_input' to select camera input. E.g.: -d /usr/share/arm/armnn/testvecs/camera_live_input
-o [ --output-name ] arg         Identifier of the output tensor in the network. E.g.: -o MobilenetV1/Predictions/Reshape_1
--number_frame arg (=1)          Number of frames to process. E.g.: --number_frame 100

Here is an example of classification using live camera input - stop at any time with mouse right-click on output image window.

ArmnnExamples -f tflite-binary -i input -s '1 224 224 3' -o MobilenetV2/Predictions/Reshape_1 -d /usr/share/arm/armnn/testvecs/camera_live_input -m /usr/share/arm/armnn/models/mobilenet_v2_1.0_224.tflite -c CpuAcc --number_frame 100

Here is an example of classification using video clip - stop at any time with mouse right-click on output image window.

ArmnnExamples -f tflite-binary -i input -s '1 224 224 3' -o MobilenetV2/Predictions/Reshape_1 -d /usr/share/arm/armnn/testvecs/test2.mp4 -m /usr/share/arm/armnn/models/mobilenet_v2_1.0_224.tflite -c CpuAcc --number_frame 100

Here is an example of classification using a JPG image - use "--number_frame" to select the number of runs.

ArmnnExamples -f tflite-binary -i input -s '1 224 224 3' -o MobilenetV2/Predictions/Reshape_1 -d /usr/share/arm/armnn/testvecs/baseball.jpg -m /usr/share/arm/armnn/models/mobilenet_v2_1.0_224.tflite -c CpuAcc --number_frame 10

ArmNN v20190800
Top(1) prediction is 430:baseball with confidence: 69.5592%
Top(2) prediction is 575:golf ball with confidence: 0.307349%
Top(3) prediction is 474:can opener, tin opener with confidence: 0.248897%
Top(4) prediction is 884:vase with confidence: 0.196634%
Top(5) prediction is 130:spoonbill with confidence: 0.191194%
Performance (FPS): 7.57426
Top(1) prediction is 430:baseball with confidence: 69.5592%
Top(2) prediction is 575:golf ball with confidence: 0.307349%
Top(3) prediction is 474:can opener, tin opener with confidence: 0.248897%
Top(4) prediction is 884:vase with confidence: 0.196634%
Top(5) prediction is 130:spoonbill with confidence: 0.191194%
Performance (FPS): 9.48181
Top(1) prediction is 430:baseball with confidence: 69.5592%
Top(2) prediction is 575:golf ball with confidence: 0.307349%
Top(3) prediction is 474:can opener, tin opener with confidence: 0.248897%
Top(4) prediction is 884:vase with confidence: 0.196634%
Top(5) prediction is 130:spoonbill with confidence: 0.191194%
Performance (FPS): 9.46633
Top(1) prediction is 430:baseball with confidence: 69.5592%
Top(2) prediction is 575:golf ball with confidence: 0.307349%
Top(3) prediction is 474:can opener, tin opener with confidence: 0.248897%
Top(4) prediction is 884:vase with confidence: 0.196634%
Top(5) prediction is 130:spoonbill with confidence: 0.191194%
Performance (FPS): 9.41803
Top(1) prediction is 430:baseball with confidence: 69.5592%
Top(2) prediction is 575:golf ball with confidence: 0.307349%
Top(3) prediction is 474:can opener, tin opener with confidence: 0.248897%
Top(4) prediction is 884:vase with confidence: 0.196634%
Top(5) prediction is 130:spoonbill with confidence: 0.191194%
Performance (FPS): 9.3029
Top(1) prediction is 430:baseball with confidence: 69.5592%
Top(2) prediction is 575:golf ball with confidence: 0.307349%
Top(3) prediction is 474:can opener, tin opener with confidence: 0.248897%
Top(4) prediction is 884:vase with confidence: 0.196634%
Top(5) prediction is 130:spoonbill with confidence: 0.191194%
Performance (FPS): 9.45797
Top(1) prediction is 430:baseball with confidence: 69.5592%
Top(2) prediction is 575:golf ball with confidence: 0.307349%
Top(3) prediction is 474:can opener, tin opener with confidence: 0.248897%
Top(4) prediction is 884:vase with confidence: 0.196634%
Top(5) prediction is 130:spoonbill with confidence: 0.191194%
Performance (FPS): 9.45416
Top(1) prediction is 430:baseball with confidence: 69.5592%
Top(2) prediction is 575:golf ball with confidence: 0.307349%
Top(3) prediction is 474:can opener, tin opener with confidence: 0.248897%
Top(4) prediction is 884:vase with confidence: 0.196634%
Top(5) prediction is 130:spoonbill with confidence: 0.191194%
Performance (FPS): 9.49093
Top(1) prediction is 430:baseball with confidence: 69.5592%
Top(2) prediction is 575:golf ball with confidence: 0.307349%
Top(3) prediction is 474:can opener, tin opener with confidence: 0.248897%
Top(4) prediction is 884:vase with confidence: 0.196634%
Top(5) prediction is 130:spoonbill with confidence: 0.191194%
Performance (FPS): 9.33742
Top(1) prediction is 430:baseball with confidence: 69.5592%
Top(2) prediction is 575:golf ball with confidence: 0.307349%
Top(3) prediction is 474:can opener, tin opener with confidence: 0.248897%
Top(4) prediction is 884:vase with confidence: 0.196634%
Top(5) prediction is 130:spoonbill with confidence: 0.191194%
Performance (FPS): 9.4193

Source: https://software-dl.ti.com/processor-sdk-linux/esd/docs/06_03_00_106/linux/Foundational_Components/Machine_Learning/armnn.html

Arm NN

Arm NN is an inference engine for CPUs, GPUs and NPUs. It bridges the gap between existing NN frameworks and the underlying IP. It enables efficient translation of existing neural network frameworks, such as TensorFlow and Caffe, allowing them to run efficiently, without modification, across Arm Cortex-A CPUs, GPUs (Arm Mali or any openCL 2.0) and Arm Ethos NPUs.

More information at https://developer.arm.com/ip-products/processors/machine-learning/arm-nn

Installation

Arm NN is packaged in the science:machinelearning OBS project and developed at https://github.com/ARM-software/armnn.

You can install ARM-NN from:

Current status

Current (2020-09-29) options enabled on aarch64 Arm NN:

  • openCL support (GPU) has been tested on openSUSE Tumbleweed with a HiKey960 board, which includes a Mali Bifrost G71 with openCL 2.x support. According to [1] and [2], it requires a GPU with openCL 1.2 + cl_arm_non_uniform_work_group_size (openCL 2.0 for better performance). Upstream tests are on Mali GPUs.

Known issues

openCL backend

  • If you do not have /usr/lib64/libOpenCL.so file, you must link it to libOpenCL.so.1 file:
sudo ln -s /usr/lib64/libOpenCL.so.1 /usr/lib64/libOpenCL.so
  • If you get the following error:
An error occurred when preparing the network workloads: in create_kernel src/core/CL/CLKernelLibrary.cpp:1087: Non uniform workgroup size is not supported!!

you have an openCL 1.x without cl_arm_non_uniform_work_group_size support, which is mandatory for ArmNN openCL.

Tools

ExecuteNetwork

The ExecuteNetwork program from Arm NN takes any model and any input tensor, and simply prints out the output tensor. Run it with no arguments to see command-line help.

ArmnnConverter

The ArmnnConverter program takes a model in any supported input format and produces a serialized model in the Arm NN format (*.armnn). This allows the model to be run without an ad-hoc parser, using just the native Arm NN format. Run it with no arguments to see command-line help. Note that this program can only convert models for which all operations are supported by the serialization tool src/armnnSerializer.

ArmnnQuantizer

The ArmnnQuantizer program takes a 32-bit float network and converts it into a quantized asymmetric 8-bit or quantized symmetric 16-bit network. Static quantization is supported by default, but dynamic quantization can be enabled if a CSV file of raw input tensors is specified. Run it with no arguments to see command-line help.

Tests/Examples

On all tests, you may select the compute mode with -c CpuRef (standard C++, slow) or -c CpuAcc (NEON accelerated) or -c GpuAcc (GPU accelerated, requires openCL) option.

SimpleSample

Run SimpleSample and enter a number when prompted (here 458):

Please enter a number: 458
Your number was 458

Caffe backend

CaffeInception_BN-Armnn

CaffeInception_BN-Armnn example uses a Caffe model on top of ARM-NN for image classification. You need to get the data and the model, so please download:

Arm NN is not able to use this model as is, so it must be converted:

  • batch size to be set to 1 (instead of 10)
  • Arm NN does not support all Caffe syntaxes, so some previous neural-network model files require updates to the latest Caffe syntax

So, you need to:

  • Copy deploy.prototxt to deploy_armnn.prototxt and update the file to set the batch size to 1:
--- models/deploy.prototxt	2019-10-01 13:25:13.502886667 +0000
+++ models/deploy_armnn.prototxt	2019-10-01 13:38:55.860972787 +0000
@@ -3,7 +3,7 @@
 layer {
   name: "data"
   type: "Input"
   top: "data"
-  input_param { shape: { dim: 10 dim: 3 dim: 224 dim: 224 } }
+  input_param { shape: { dim: 1 dim: 3 dim: 224 dim: 224 } }
 }
 layer {
  • and run the following convert.py script from the 'models/' folder (requires python3-caffe):
#!/usr/bin/python3
import caffe
net = caffe.Net('deploy.prototxt', 'Inception21k.caffemodel', caffe.TEST)
new_net = caffe.Net('deploy_armnn.prototxt', 'Inception21k.caffemodel', caffe.TEST)
new_net.save('Inception-BN-batchsize1.caffemodel')

Now, you can run CaffeInception_BN-Armnn --data-dir=data --model-dir=models :

ArmNN v20190800 = Prediction values for test #0 Top(1) prediction is 3694 with value: 0.255735 Top(2) prediction is 3197 with value: 0.0031263 Top(3) prediction is 1081 with value: 0.000757725 Top(4) prediction is 567 with value: 0.000526447 Top(5) prediction is 559 with value: 9.72124e-05 Total time for 1 test cases: 0.088 seconds Average time per test case: 88.260 ms Overall accuracy: 1.000 Runtime::UnloadNetwork(): Unloaded network with ID: 0

CaffeResNet-Armnn

CaffeResNet-Armnn example uses a Caffe model on top of ARM-NN for image classification. You need to get the data and the model, so please download:

And run CaffeResNet-Armnn --data-dir=data --model-dir=models :

ArmNN v20190800 = Prediction values for test #0 Top(1) prediction is 21 with value: 0.466987 Top(2) prediction is 7 with value: 0.000633067 Top(3) prediction is 1 with value: 2.17822e-06 Top(4) prediction is 0 with value: 6.27832e-08 = Prediction values for test #1 Top(1) prediction is 2 with value: 0.511024 Top(2) prediction is 0 with value: 2.7405e-07 Total time for 2 test cases: 0.205 seconds Average time per test case: 102.741 ms Overall accuracy: 1.000 Runtime::UnloadNetwork(): Unloaded network with ID: 0

CaffeMnist-Armnn

CaffeMnist-Armnn example uses a Caffe model on top of ARM-NN for handwritten digits recognition. In this example, this is number 7.

You need to get the data and the model, so please install arm-ml-example:

As CaffeMnist-Armnn requires slightly different file naming, you need to rename the files:

cp -r /usr/share/armnn-mnist/* /tmp/
mv /tmp/data/t10k-labels-idx1-ubyte /tmp/data/t10k-labels.idx1-ubyte
mv /tmp/data/t10k-images-idx3-ubyte /tmp/data/t10k-images.idx3-ubyte

And run CaffeMnist-Armnn --data-dir=/tmp/data/ --model-dir=/tmp/model/:

ArmNN v20190800 = Prediction values for test #0 Top(1) prediction is 7 with value: 1 Top(2) prediction is 0 with value: 0 = Prediction values for test #1 Top(1) prediction is 2 with value: 1 Top(2) prediction is 0 with value: 0 = Prediction values for test #5 Top(1) prediction is 1 with value: 1 Top(2) prediction is 0 with value: 0 = Prediction values for test #8 Top(1) prediction is 5 with value: 1 Top(2) prediction is 0 with value: 0 = Prediction values for test #9 Top(1) prediction is 9 with value: 1 Top(2) prediction is 0 with value: 0 Total time for 5 test cases: 0.008 seconds Average time per test case: 1.569 ms Overall accuracy: 1.000 Runtime::UnloadNetwork(): Unloaded network with ID: 0

MNIST Caffe example

MNIST Caffe example uses a Caffe model on top of ARM-NN for handwritten digits recognition. In this example, this is number 7.

You must install ARM ML examples and associated data from:

Go to the data folder:

cd /usr/share/armnn-mnist/

and run mnist_caffe:

Predicted: 7 Actual: 7

ONNX backend

OnnxMnist-Armnn

OnnxMnist-Armnn example uses an ONNX model on top of ARM-NN for handwritten digits recognition. In this example, this is number 7.

You need to get the data, so please install arm-ml-example:

And download the model from https://onnxzoo.blob.core.windows.net/models/opset_8/mnist/mnist.tar.gz

As OnnxMnist-Armnn requires slightly different file naming, you need to rename the files:

cp -r /usr/share/armnn-mnist/* /tmp/
mv /tmp/data/t10k-labels-idx1-ubyte /tmp/data/t10k-labels.idx1-ubyte
mv /tmp/data/t10k-images-idx3-ubyte /tmp/data/t10k-images.idx3-ubyte

For the model:

wget https://onnxzoo.blob.core.windows.net/models/opset_8/mnist/mnist.tar.gz
tar xzf mnist.tar.gz
cp mnist/model.onnx /tmp/model/mnist_onnx.onnx

And run OnnxMnist-Armnn --data-dir=/tmp/data/ --model-dir=/tmp/model/ -i 1:

ArmNN v20190800 = Prediction values for test #0 Top(1) prediction is 7 with value: 28.34 Top(2) prediction is 3 with value: 9.42895 Top(3) prediction is 2 with value: 8.64272 Top(4) prediction is 1 with value: 0.627583 Top(5) prediction is 0 with value: -1.25672 Total time for 1 test cases: 0.002 seconds Average time per test case: 2.278 ms Overall accuracy: 1.000 Runtime::UnloadNetwork(): Unloaded network with ID: 0

OnnxMobileNet-Armnn

OnnxMobileNet-Armnn example uses an ONNX model on top of ARM-NN for image classification. In this example, it will look for shark, dog and cat.

You need to get the mobilenetv2 model for ONNX, so:

For the data, you need to download:

  • an image of a shark (great white shark), rename it shark.jpg and place it in the data/ folder.
  • an image of a cat (tiger cat), rename it Cat.jpg and place it in the data/ folder.
  • an image of a dog (golden retriever), rename it Dog.jpg and place it in the data/ folder.


And run OnnxMobileNet-Armnn --data-dir=data --model-dir=models -i 3:

ArmNN v20190800 Performance test running in DEBUG build - results may be inaccurate. = Prediction values for test #0 Top(1) prediction is 273 with value: 16.4625 Top(2) prediction is 227 with value: 13.9884 Top(3) prediction is 225 with value: 11.6609 Top(4) prediction is 168 with value: 11.3706 Top(5) prediction is 159 with value: 9.35255 = Prediction values for test #1 Top(1) prediction is 281 with value: 16.7145 Top(2) prediction is 272 with value: 5.43621 Top(3) prediction is 271 with value: 5.3766 Top(4) prediction is 51 with value: 5.24998 Top(5) prediction is 24 with value: 2.50436 = Prediction values for test #2 Top(1) prediction is 2 with value: 21.4471 Top(2) prediction is 0 with value: 4.55977 Total time for 3 test cases: 0.164 seconds Average time per test case: 54.651 ms Overall accuracy: 0.667 Runtime::UnloadNetwork(): Unloaded network with ID: 0

TensorFlow backend

TensorFlow examples may print some errors (depending on your images), since your Cat image may be recognized as a 'Tabby Cat' (label 282) and not the expected 'Tiger Cat' (label 283); the same applies to the Dog and Shark images. See https://github.com/ARM-software/armnn/issues/165#issuecomment-538299546

TfInceptionV3-Armnn

The TfInceptionV3-Armnn example uses a TensorFlow model on top of ARM-NN for image classification. You need to get the data and the model, so please download:

Now, you can run TfInceptionV3-Armnn --data-dir=data --model-dir=models :

ArmNN v20190800 = Prediction values for test #0 Top(1) prediction is 208 with value: 0.918417 Top(2) prediction is 206 with value: 0.000891919 Top(3) prediction is 176 with value: 0.000658453 Top(4) prediction is 155 with value: 0.000206609 Top(5) prediction is 92 with value: 0.000192534 = Prediction values for test #1 Top(1) prediction is 283 with value: 0.544097 Top(2) prediction is 282 with value: 0.321364 Top(3) prediction is 198 with value: 0.000288878 Top(4) prediction is 179 with value: 0.000153869 Top(5) prediction is 146 with value: 0.000141289 = Prediction values for test #2 Top(1) prediction is 3 with value: 0.826077 Top(2) prediction is 0 with value: 0.000125644 Total time for 3 test cases: 0.365 seconds Average time per test case: 121.635 ms Overall accuracy: 1.000 Runtime::UnloadNetwork(): Unloaded network with ID: 0

TfResNext-Armnn

TfResNext-Armnn example uses a TensorFlow model on top of ARM-NN for image classification. You need to get the data and the model, so please download:

And run TfResNext-Armnn --data-dir=data --model-dir=models :

ArmNN v20190800 = Prediction values for test #0 Top(1) prediction is 209 with value: 0.856742 Top(2) prediction is 208 with value: 0.0588841 Top(3) prediction is 167 with value: 0.00553092 Top(4) prediction is 160 with value: 0.000479352 Top(5) prediction is 102 with value: 0.000265176 = Prediction values for test #1 Top(1) prediction is 283 with value: 0.344484 Top(2) prediction is 282 with value: 0.0748539 Top(3) prediction is 52 with value: 0.00447383 Top(4) prediction is 25 with value: 0.000883748 Top(5) prediction is 6 with value: 5.20586e-05 = Prediction values for test #2 Top(1) prediction is 3 with value: 0.588796 Top(2) prediction is 2 with value: 0.000818478 Top(3) prediction is 1 with value: 4.20274e-06 Top(4) prediction is 0 with value: 4.55538e-10 Total time for 3 test cases: 0.060 seconds Average time per test case: 19.954 ms Overall accuracy: 1.000 Runtime::UnloadNetwork(): Unloaded network with ID: 0

TfMnist-Armnn

TfMnist-Armnn example uses a TensorFlow model on top of ARM-NN for handwritten digits recognition. In this example, this is number 7.

You need to get the data and the model, so please install arm-ml-example:

As TfMnist-Armnn requires slightly different file naming, you need to rename the files:

cp -r /usr/share/armnn-mnist/* /tmp/
mv /tmp/data/t10k-labels-idx1-ubyte /tmp/data/t10k-labels.idx1-ubyte
mv /tmp/data/t10k-images-idx3-ubyte /tmp/data/t10k-images.idx3-ubyte

And run TfMnist-Armnn --data-dir=/tmp/data/ --model-dir=/tmp/model/:

ArmNN v20190800 = Prediction values for test #0 Top(1) prediction is 7 with value: 1 Top(2) prediction is 0 with value: 0 = Prediction values for test #1 Top(1) prediction is 2 with value: 1 Top(2) prediction is 0 with value: 0 = Prediction values for test #2 Top(1) prediction is 1 with value: 1 Top(2) prediction is 0 with value: 0 = Prediction values for test #3 Top(1) prediction is 0 with value: 1 = Prediction values for test #4 Top(1) prediction is 4 with value: 1 Top(2) prediction is 0 with value: 0 Total time for 5 test cases: 0.000 seconds Average time per test case: 0.045 ms Overall accuracy: 1.000 Runtime::UnloadNetwork(): Unloaded network with ID: 0

MNIST TensorFlow example

MNIST TensorFlow example uses a TensorFlow model on top of ARM-NN for handwritten digits recognition. In this example, this is number 7.

You must install ARM ML examples (and associated data) from:

Go to the data folder:

cd /usr/share/armnn-mnist/

and run mnist_tf:

Predicted: 7 Actual: 7

mnist-draw - Web app

MNIST Draw is a single-page website that enables users to hand-draw and classify digits between 0 and 9 using machine learning. A machine learning model trained against the MNIST dataset is used for classification.

The project is a modified version of mnist-draw, which uses the Arm NN SDK to perform inferences on an Arm Cortex-A CPU. The application runs on any ARM system and can be accessed over a network using a browser.

There is no RPM package yet, so you need to build it and run it manually.

Install dependencies:

zypper in armnn-devel python3-Pillow python3-numpy gcc gcc-c++ make

Compile from sources:

cd /tmp
git clone https://github.com/ARM-software/Tool-Solutions/
cd Tool-Solutions/ml-tool-examples/mnist-draw
make -C armnn-draw
chmod a+w . -R   # Fix permissions

If you want to use GpuAcc instead of CpuAcc, you can update cgi-bin/mnist.py by changing the mnist_tf_convol invocation from:

completed = subprocess.run(['./armnn-draw/mnist_tf_convol', '1', '1', 'image.txt'], stderr=subprocess.PIPE, check=True)

to:

completed = subprocess.run(['./armnn-draw/mnist_tf_convol', '2', '1', 'image.txt'], stderr=subprocess.PIPE, check=True)

Run it:

python3 -m http.server --cgi 8080

And access it from your web browser, e.g: http://192.168.0.4:8080 if your board has IP 192.168.0.4.


TensorFlow Lite backend

TensorFlow Lite examples may print some errors (depending on your images), since your Cat image may be recognized as a 'Tabby Cat' (label 282) and not the expected 'Tiger Cat' (label 283), see https://github.com/ARM-software/armnn/issues/165#issuecomment-538299546

To run TfLite*-Armnn examples, you need to download the models and extract them to models/ folder:

# Only the *.tflite files are needed, but more files are in the archives
wget http://download.tensorflow.org/models/tflite/mnasnet_1.3_224_09_07_2018.tgz
tar xzf mnasnet_*.tgz
mv mnasnet_*/ models
pushd models
wget http://download.tensorflow.org/models/tflite_11_05_08/inception_v3_quant.tgz
tar xzf inception_v3_quant.tgz
wget http://download.tensorflow.org/models/mobilenet_v1_2018_08_02/mobilenet_v1_1.0_224_quant.tgz
tar xzf mobilenet_v1_1.0_224_quant.tgz
wget http://download.tensorflow.org/models/tflite_11_05_08/mobilenet_v2_1.0_224_quant.tgz
tar xzf mobilenet_v2_1.0_224_quant.tgz
popd

You may also get the labels from the MobileNet V1 archive: https://storage.googleapis.com/download.tensorflow.org/models/tflite/mobilenet_v1_1.0_224_quant_and_labels.zip

For the data, you need to download:

  • an image of a shark (great white shark), rename it shark.jpg and place it in the data/ folder.
  • an image of a cat (tiger cat), rename it Cat.jpg and place it in the data/ folder.
  • an image of a dog (golden retriever), rename it Dog.jpg and place it in the data/ folder.

TfLiteInceptionV3Quantized-Armnn example

Once you have the models/ and data/ folders ready, you can run TfLiteInceptionV3Quantized-Armnn --data-dir=data --model-dir=models

TfLiteMnasNet-Armnn example

Once you have the models/ and data/ folders ready, you can run TfLiteMnasNet-Armnn --data-dir=data --model-dir=models

TfLiteMobilenetQuantized-Armnn example

Once you have the models/ and data/ folders ready, you can run TfLiteMobilenetQuantized-Armnn --data-dir=data --model-dir=models

TfLiteMobilenetV2Quantized-Armnn example

Once you have the models/ and data/ folders ready, you can run TfLiteMobilenetV2Quantized-Armnn --data-dir=data --model-dir=models

Additional (downstream) tests

Additional downstream tests are packaged separately, in the armnn-extratests package.

TI ArmnnExamples

TI provides an additional test, ArmnnExamples, which allows you to run models from all supported backends (Caffe, TensorFlow, TensorFlow Lite, ONNX) on images, video (filtered to .mp4, .mov and .avi, but you can change the extension to work around the filter and use .ogv and others), and live streams from a webcam. More information at http://software-dl.ti.com/processor-sdk-linux/esd/docs/latest/linux/Foundational_Components_ArmNN.html#arm-nn-mobilenet-demo

You need to download a number of files (tests files from tidl-api git repo, mobilenet model, and mobilenet labels):

git clone git://git.ti.com/tidl/tidl-api.git   # Can be skipped if you want to use your own test files
wget http://download.tensorflow.org/models/mobilenet_v1_2018_08_02/mobilenet_v1_1.0_224.tgz
tar xzf *.tgz
wget https://raw.githubusercontent.com/leferrad/tensorflow-mobilenet/master/imagenet/labels.txt
sudo mkdir -p /usr/share/arm/armnn/models/
sudo cp labels.txt /usr/share/arm/armnn/models/
sudo chmod 666 /usr/share/arm/armnn/models/labels.txt   # TO BE FIXED

Test with the baseball.jpg image from tidl-api/ :

ArmnnExamples -f tensorflow-binary -i input -s '1 224 224 3' -o MobilenetV1/Predictions/Reshape_1 -d ./tidl-api/examples/classification/images/baseball.jpg -m ./mobilenet_v1_1.0_224_frozen.pb -c CpuAcc --number_frame 10

Test with the test2.mp4 video clip from tidl-api/, it displays the video, with top match and FPS (requires h.264 decoder to be installed):

ArmnnExamples -f tensorflow-binary -i input -s '1 224 224 3' -o MobilenetV1/Predictions/Reshape_1 -d ./tidl-api/examples/classification/clips/test2.mp4 -m ./mobilenet_v1_1.0_224_frozen.pb -c CpuAcc --number_frame 100

You may also use warplane ogv file (just change the extension to .mp4)

Test with a live stream from a camera, it displays the video, with top match and FPS (camera_live_input0 is for /dev/video0, camera_live_input1 is for /dev/video1, etc.):

ArmnnExamples -f tensorflow-binary -i input -s '1 224 224 3' -o MobilenetV1/Predictions/Reshape_1 -d camera_live_input0 -m ./mobilenet_v1_1.0_224_frozen.pb -c CpuAcc --number_frame 100
Source: https://en.opensuse.org/Arm_NN


Introduction

This user manual describes the CMSIS NN software library, a collection of efficient neural network kernels developed to maximize the performance and minimize the memory footprint of neural networks on Cortex-M processor cores.

The library is divided into a number of functions each covering a specific category:

  • Convolution Functions
  • Activation Functions
  • Fully-connected Layer Functions
  • SVDF Layer Functions
  • Pooling Functions
  • Softmax Functions
  • Basic math Functions

The library has separate functions for operating on different weight and activation data types, including 8-bit integers (q7_t) and 16-bit integers (q15_t). The description of each kernel is included in its function documentation. The implementation details are also described in this paper [1].
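
As a minimal illustration of how these kernels are called (using arm_relu_q7, one of the simpler legacy q7 functions; check the current arm_nnfunctions.h before relying on the exact signature), an in-place activation over a small int8 buffer looks like this:

#include "arm_nnfunctions.h"  // CMSIS-NN kernel declarations; q7_t is an 8-bit integer

// Apply ReLU in place over a small q7 activation buffer.
void ReluExample(void) {
  q7_t activations[8] = {-128, -5, -1, 0, 1, 5, 64, 127};
  arm_relu_q7(activations, 8);  // negative entries become 0, the rest are unchanged
}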

Function Classification

The functions can be classified into two segments

  • Legacy functions supporting ARM's internal symmetric quantization(8 bits).
  • Functions that support TensorFlow Lite framework with symmetric quantization(8 bits).

The legacy functions can be identified by their _q7 or _q15 suffix; no new development is done there. The article in [2] describes in detail how to run a network using the legacy functions.

The functions supporting the TensorFlow Lite framework are identified by the _s8 suffix and can be invoked from TFL Micro. These functions are bit-exact to TensorFlow Lite. Refer to TensorFlow's documentation in [3] on how to run a TensorFlow Lite model using optimized CMSIS-NN kernels.

Block Diagram

(Block diagram: CMSIS-NN overview)

Examples

The library ships with a number of examples which demonstrate how to use the library functions.

Pre-processor Macros

Each library project has different pre-processor macros:

  • ARM_MATH_DSP: define this macro if the silicon supports DSP instructions (DSP extension).
  • ARM_MATH_MVEI: define this macro if the silicon supports the M-Profile Vector Extension.
  • ARM_MATH_AUTOVECTORIZE: used in conjunction with ARM_MATH_MVEI to let the compiler auto-vectorize the functions that use inline assembly. It does not affect functions that use C or intrinsics.
  • ARM_MATH_BIG_ENDIAN: define this macro to build the library for big endian targets. This is supported only for the legacy functions, i.e. functions targeted at TensorFlow Lite do not support big endianness. By default the library builds for little endian targets.
  • ARM_NN_TRUNCATE: define this macro to use floor instead of round-to-the-nearest-int for the computation.

Copyright Notice

Copyright (C) 2010-2019 Arm Limited. All rights reserved.

[1] CMSIS-NN: Efficient Neural Network Kernels for Arm Cortex-M CPUs https://arxiv.org/abs/1801.06601

[2] Converting a Neural Network for Arm Cortex-M with CMSIS-NN: https://developer.arm.com/solutions/machine-learning-on-arm/developer-material/how-to-guides/converting-a-neural-network-for-arm-cortex-m-with-cmsis-nn/single-page

[3] https://www.tensorflow.org/lite/microcontrollers/library

[4] https://github.com/ARM-software/CMSIS_5/tree/develop/CMSIS/NN#legacy-vs-tfl-micro-compliant-apis

Source: https://www.keil.com/pack/doc/CMSIS/NN/html/index.html

