
Entropies of Overcomplete Kernel Dictionaries


Abstract:

In signal analysis and synthesis, linear approximation theory considers a linear decomposition of any given signal in a set of atoms, collected into a so-called dictionary. Relevant sparse representations are obtained by relaxing the orthogonality condition on the atoms, yielding overcomplete dictionaries with an extended number of atoms. Going beyond the linear decomposition, overcomplete kernel dictionaries provide an elegant nonlinear extension by defining the atoms through a mapping kernel function (e.g., the Gaussian kernel). Models based on such kernel dictionaries are used in neural networks, Gaussian processes, and online learning with kernels. The quality of an overcomplete dictionary is evaluated with a diversity measure: the distance, the approximation, the coherence, and the Babel measures. In this paper, we develop a framework to examine overcomplete kernel dictionaries with the entropy from information theory. Indeed, a higher value of the entropy is associated with a more uniform spread of the atoms over the space. For each of the aforementioned diversity measures, we derive lower bounds on the entropy. Several definitions of the entropy are examined, with an extensive analysis in both the input space and the mapped feature space.
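As a rough illustration of two of the diversity measures mentioned above (not a reproduction of the paper's derivations), the following sketch computes the coherence and the Babel measure of a small Gaussian-kernel dictionary. With the Gaussian kernel, every atom has unit norm in the feature space, so the coherence reduces to the largest off-diagonal entry of the kernel Gram matrix. All names, atom positions, and the bandwidth value are illustrative.

```python
import numpy as np

def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian kernel k(x, y) = exp(-||x - y||^2 / (2 sigma^2))."""
    return np.exp(-np.sum((x - y) ** 2) / (2.0 * sigma ** 2))

def gram(atoms, sigma=1.0):
    """Kernel Gram matrix of the dictionary atoms."""
    n = len(atoms)
    return np.array([[gaussian_kernel(atoms[i], atoms[j], sigma)
                      for j in range(n)] for i in range(n)])

def coherence(atoms, sigma=1.0):
    """Largest kernel value between two distinct atoms.

    Gaussian-kernel atoms are unit-norm in feature space, so no
    normalization by the diagonal is needed."""
    K = gram(atoms, sigma)
    np.fill_diagonal(K, 0.0)
    return K.max()

def babel(atoms, p, sigma=1.0):
    """Babel measure of order p: for each atom, sum its p largest
    cross-correlations with the other atoms, then take the maximum."""
    K = gram(atoms, sigma)
    np.fill_diagonal(K, 0.0)
    rows = -np.sort(-K, axis=1)          # each row sorted descending
    return rows[:, :p].sum(axis=1).max()

# Three one-dimensional atoms; well-separated atoms give low coherence.
atoms = [np.array([0.0]), np.array([1.0]), np.array([2.5])]
mu = coherence(atoms)    # exp(-0.5) for the closest pair at distance 1
b1 = babel(atoms, 1)     # Babel of order 1 coincides with the coherence
```

Spreading the atoms further apart drives the off-diagonal kernel values, and hence both measures, toward zero, which is consistent with the abstract's point that a more uniform spread of atoms (higher entropy) goes with better-behaved dictionaries.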

Info:

Periodical:
Bulletin of Mathematical Sciences and Applications (Volume 16)
Pages:
1-19
Citation:
P. Honeine, "Entropies of Overcomplete Kernel Dictionaries", Bulletin of Mathematical Sciences and Applications, Vol. 16, pp. 1-19, 2016
Online since:
August 2016
