Deep Learning

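Convert TensorFlow Lite Models to ONNX

ONNX aims to connect deep learning frameworks. TF2ONNX converts TensorFlow models to ONNX so that other deep learning systems can benefit from TensorFlow's functionality. However, TF2ONNX does not currently support quantization, and some models are released only in TensorFlow Lite format. This article introduces TFLite2ONNX, which converts quantized TFLite models to ONNX.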
Convert TensorFlow Lite Models to ONNX

ONNX aims to bridge deep learning frameworks. TF2ONNX was built to translate TensorFlow models to ONNX so that other deep learning systems can benefit from TensorFlow functionality. However, TF2ONNX currently doesn't support quantization. This article introduces TFLite2ONNX, which converts TensorFlow Lite models to ONNX and translates their quantization semantics.
Introducing TFLite Parser Python Package

I have been maintaining the TFLite parser Python package since September 2019. With it, people can parse TensorFlow Lite models with a single import tflite.
Boost Quantization Inference Performance

This article summarizes techniques used in convolution optimization for TVM. With auto-tuning powered by AutoTVM, carefully designed schedule algorithms and fine-grained computation manipulation can achieve impressive optimization results.
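Thoughts on Deep Learning Systems

Two years ago, when I started working on the development and optimization of deep learning software systems, an idea gradually took shape in my mind: the related software and hardware systems would converge into standards, or into one or two giants. Two years later, the deep learning ecosystem has changed a lot. This article analyzes the current state of deep learning systems, discusses some hot topics, and briefly shares my view of future trends.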
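Inside the QNNPACK Implementation

QNNPACK is an acceleration library developed by Facebook specifically for quantized neural network computation. As soon as it was open-sourced, its outstanding performance beat almost all publicly available acceleration algorithms.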
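Optimization Algorithms for Convolutional Neural Networks

As the field has developed, researchers have proposed many convolution optimization algorithms, including Im2col and Winograd. This article briefly introduces several common optimization methods and discusses some of the author's experience in this area.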
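To make the Im2col idea concrete, here is a minimal pure-Python sketch, assuming a single-channel input, stride 1 and no padding (all simplifications for illustration; the function names are hypothetical). Each kernel-sized patch is unfolded into one row, so convolving with several kernels becomes a single matrix product of the patch matrix with the transposed filter matrix, which is exactly the shape a tuned GEMM routine can then handle.

```python
def im2col(image, kh, kw):
    """Unfold each kh x kw patch of a 2-D single-channel image into one row.

    Assumes stride 1 and no padding, giving (H-kh+1)*(W-kw+1) patches.
    """
    h, w = len(image), len(image[0])
    return [
        [image[i + di][j + dj] for di in range(kh) for dj in range(kw)]
        for i in range(h - kh + 1)
        for j in range(w - kw + 1)
    ]


def conv2d_im2col(image, kernels):
    """Cross-correlate (as DL frameworks call "convolution") with several kernels via one matrix product."""
    kh, kw = len(kernels[0]), len(kernels[0][0])
    out_w = len(image[0]) - kw + 1
    patches = im2col(image, kh, kw)                             # P x K matrix, K = kh*kw
    filters = [[v for row in k for v in row] for k in kernels]  # F x K matrix
    # The whole convolution is the matrix product patches @ filters^T (P x F);
    # each column of that product is one flattened output feature map.
    outputs = []
    for f in filters:
        column = [sum(a * b for a, b in zip(p, f)) for p in patches]
        # Reshape the flattened column back into an out_h x out_w feature map.
        outputs.append([column[r:r + out_w] for r in range(0, len(column), out_w)])
    return outputs


image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(conv2d_im2col(image, [[[1, 0], [0, 1]], [[1, 1], [1, 1]]]))
```

The price of this layout is memory: every input pixel is copied up to kh*kw times into the patch matrix, which is one reason alternatives such as Winograd exist.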
Neural Network Pruning Papers

A list of papers on neural network pruning that I have read recently.
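Optimization Algorithms for General Matrix Multiplication (GEMM)

This article briefly introduces the basic concepts and methods of General Matrix Multiplication (GEMM) optimization, as well as optimization methods used in neural network quantization. It aims to help readers build some intuition for the concepts rather than offer any profound insights.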
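One of the most basic GEMM optimization ideas is loop tiling (blocking): computing C = A x B in small sub-blocks so that the pieces of A, B and C being worked on stay in cache. The pure-Python sketch below only illustrates the loop structure; Python itself will not show the cache benefit, and the tile size and function names are hypothetical choices for illustration. The tiled version produces the same result as the naive triple loop, only the iteration order changes.

```python
def gemm_naive(a, b):
    """Reference C = A x B with the textbook i-j-p triple loop."""
    m, k, n = len(a), len(b), len(b[0])
    c = [[0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            for p in range(k):
                c[i][j] += a[i][p] * b[p][j]
    return c


def gemm_tiled(a, b, tile=2):
    """C = A x B computed tile by tile (tile size is an illustrative assumption)."""
    m, k, n = len(a), len(b), len(b[0])
    c = [[0] * n for _ in range(m)]
    for i0 in range(0, m, tile):
        for j0 in range(0, n, tile):
            for p0 in range(0, k, tile):
                # Accumulate the product of one tile of A and one tile of B
                # into the corresponding tile of C; min() handles edge tiles.
                for i in range(i0, min(i0 + tile, m)):
                    for p in range(p0, min(p0 + tile, k)):
                        a_ip = a[i][p]  # reused across the whole j loop
                        for j in range(j0, min(j0 + tile, n)):
                            c[i][j] += a_ip * b[p][j]
    return c


print(gemm_tiled([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
```

Real libraries add further layers on top of this (packing tiles into contiguous buffers, register-level micro-kernels, SIMD), but they keep the same blocked decomposition.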
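Introduction to Neural Network Quantization

An introduction to the theory, arithmetic, mathematics, research and implementation of neural network quantization.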
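As a taste of the arithmetic involved, here is a minimal sketch of the widely used asymmetric (affine) scheme r = S * (q - Z), assuming unsigned 8-bit values in [0, 255]. The function names are hypothetical, and real toolchains differ in details such as rounding mode and per-channel scales. The range is widened to include 0.0 so that real zero maps exactly onto the integer zero point Z.

```python
def choose_qparams(rmin, rmax, qmin=0, qmax=255):
    """Pick scale S and zero point Z so that [rmin, rmax] maps onto [qmin, qmax]."""
    # Make sure 0.0 is inside the range so it is exactly representable.
    rmin, rmax = min(rmin, 0.0), max(rmax, 0.0)
    scale = (rmax - rmin) / (qmax - qmin)
    if scale == 0.0:
        scale = 1.0  # degenerate all-zero range; any positive scale works
    zero_point = int(round(qmin - rmin / scale))
    zero_point = max(qmin, min(qmax, zero_point))
    return scale, zero_point


def quantize(r, scale, zero_point, qmin=0, qmax=255):
    """q = clamp(round(r / S) + Z)."""
    q = int(round(r / scale)) + zero_point
    return max(qmin, min(qmax, q))


def dequantize(q, scale, zero_point):
    """r = S * (q - Z)."""
    return scale * (q - zero_point)


s, z = choose_qparams(-1.0, 1.0)
for x in (-1.0, -0.5, 0.3, 1.0):
    q = quantize(x, s, z)
    print(x, q, dequantize(q, s, z))
```

For inputs inside the chosen range, the round-trip error is at most half a quantization step (S / 2).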
Neural Network Quantization Resources

A list of resources on neural network quantization. Quantization is moving from research to industry (that is, real applications) nowadays (as of the beginning of 2019). I hope this list helps.
Neural Network Quantization Introduction

An introductory walk through the theory, arithmetic, mathematics, research and implementation of neural network quantization.
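Android Neural Networks API: Design and Implementation of an Intermediate Layer in Neural Network Software Systems

As deep learning develops further, "intermediate representations" that lower development costs for framework developers and hardware vendors have emerged in various forms: some use compilation techniques for graph optimization, some define model file formats, and some are intermediate layers in the operating system. This article focuses on Android Neural Networks API, a representative operating-system intermediate layer, describes its software architecture and how its internal modules interact, and discusses its design. Overall, Android Neural Networks API is concise and effective, and it follows sound software system design methodology.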
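Memory Synchronization Mechanisms in Machine Learning Frameworks

Modern machine learning frameworks can run on many kinds of hardware devices, and the memory of different devices is often managed independently, so at any given moment the data may be stored in different device memories. When different parts of a network run on different hardware devices, a device does not always hold the latest data, which raises the problem of synchronizing data between devices.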
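One common way to handle this is to keep one copy of a tensor per device together with a record of which copies are currently up to date, copying lazily only when a stale copy is accessed. The sketch below is a hypothetical illustration of that idea, not the mechanism of any particular framework: device memories are simulated with Python lists, and the class and method names (SyncedBuffer, read, mutable) are invented for the example.

```python
class SyncedBuffer:
    """Hypothetical sketch: one logical tensor with a copy per device plus validity tracking."""

    def __init__(self, devices, data):
        self.copies = {d: None for d in devices}
        self.copies[devices[0]] = list(data)
        self.valid = {devices[0]}  # devices whose copy holds the latest data
        self.transfers = 0         # number of simulated device-to-device copies

    def read(self, device):
        """Return an up-to-date copy on `device` for read-only use, syncing it if stale."""
        if device not in self.valid:
            src = next(iter(self.valid))                  # any current copy will do
            self.copies[device] = list(self.copies[src])  # simulated memcpy
            self.transfers += 1
            self.valid.add(device)
        return self.copies[device]

    def mutable(self, device):
        """Return a copy the caller may modify in place; all other copies become stale."""
        buf = self.read(device)
        self.valid = {device}
        return buf


buf = SyncedBuffer(["cpu", "gpu"], [1, 2, 3])
g = buf.mutable("gpu")  # copies cpu -> gpu, then marks the cpu copy stale
g[0] = 10               # an "operator" running on the gpu updates the data
print(buf.read("cpu"), buf.transfers)  # copies gpu -> cpu on demand
```

Copying lazily on access, rather than eagerly after every write, avoids transfers for data that the other device never reads again.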
Notes: Machine Learning by Andrew Ng

Records only the things I did not know, had overlooked, or had forgotten "before".