Activation functions play an important role in neural networks, and applying them appropriately can improve accuracy and speed up convergence. In this paper, we study the information loss caused by activation functions in lightweight networks and discuss how activation functions with negative values can address this issue. We propose a method that requires minimal changes to an existing network: ReLU is simply replaced with Swish at the appropriate positions of the lightweight network. We call this method enriching activation. Enriching activation works by using activation functions with negative values at the positions where ReLU causes information loss. We also propose a novel activation function for enriching activation, called (H)-SwishX, which adds a learnable maximal value to (H)-Swish. (H)-SwishX learns a suitable maximal value in each layer of the network to reduce the accuracy drop during lightweight-network quantization. We verify this enriching activation scheme on popular lightweight networks and, compared with the existing activation schemes adopted by these networks, demonstrate performance improvements on the CIFAR-10 and ImageNet datasets. We further demonstrate that enriching activation transfers well and measure its performance on MSCOCO object detection.
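As a rough illustration of the two ideas above, the following PyTorch sketch (our own illustrative names and parameters, not the authors' released code) shows an (H)-SwishX-style activation, i.e. Hard-Swish capped by a learnable per-layer maximum, together with a helper that "enriches" a network by swapping ReLU/ReLU6 for it. The paper replaces ReLU only at the positions where it loses information; this sketch simplifies that to replacing every ReLU.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class HSwishX(nn.Module):
    """(H)-SwishX-style activation: Hard-Swish with a learnable maximal value.

    `init_max` / `max_value` are illustrative names; a single learnable scalar
    ceiling per layer is a simplifying assumption intended to bound the
    activation range and ease quantization.
    """

    def __init__(self, init_max: float = 6.0):
        super().__init__()
        self.max_value = nn.Parameter(torch.tensor(init_max))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Hard-Swish keeps small negative responses instead of zeroing them like ReLU.
        y = F.hardswish(x)
        # Clamp to the learned per-layer maximum.
        return torch.minimum(y, self.max_value)


def enrich_activation(module: nn.Module) -> nn.Module:
    """Crude stand-in for enriching activation: swap every ReLU/ReLU6 for HSwishX."""
    for name, child in module.named_children():
        if isinstance(child, (nn.ReLU, nn.ReLU6)):
            setattr(module, name, HSwishX())
        else:
            enrich_activation(child)
    return module


# Usage sketch: enrich a small ReLU-based block and run a forward pass.
block = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(inplace=True),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(inplace=True),
)
block = enrich_activation(block)
out = block(torch.randn(1, 3, 32, 32))  # -> shape (1, 32, 32, 32)
```

Keeping the replacement at the module level (rather than editing layer definitions) reflects the paper's goal of minimal changes to the existing network.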