class="download" HREF = "https://github.com/TreeLLi/Instance-adaptive-Smoothness-Enhanced-AT">Code</A><div class="download">Abstract</div><div class="abstract"> eep neural networks can be easily fooled into making incorrect predictions through corruption of the input by adversarial perturbations: human-imperceptible artificial noise. So far adversarial training has been the most successful defense against such adversarial attacks. This work focuses on improving adversarial training to boost adversarial robustness. We first analyze, from an instance-wise perspective, how adversarial vulnerability evolves during adversarial training. We find that during training an overall reduction of adversarial loss is achieved by sacrificing a considerable proportion of training samples to be more vulnerable to adversarial attack, which results in an uneven distribution of adversarial vulnerability among data. Such "uneven vulnerability", is prevalent across several popular robust training methods and, more importantly, relates to overfitting in adversarial training. Motivated by this observation, we propose a new adversarial training method: Instance-adaptive Smoothness Enhanced Adversarial Training (ISEAT). It jointly smooths both input and weight loss landscapes in an adaptive, instance-specific, way to enhance robustness more for those samples with higher adversarial vulnerability. Extensive experiments demonstrate the superiority of our method over existing defense methods. Noticeably, our method, when combined with the latest data augmentation and semi-supervised learning techniques, achieves state-of-the-art robustness against ℓ∞-norm constrained attacks on CIFAR10 of 59.32% for Wide ResNet34-10 without extra data, and 61.55% for Wide ResNet28-10 with extra data.
M. W. Spratling (2023) <B>Comprehensive assessment methods are key to progress in deep learning</B> [commentary]. <I><A HREF="https://www.cambridge.org/core/journals/behavioral-and-brain-sciences">Behavioral and Brain Sciences</A></I>, in press.
H. Guan and M. W. Spratling (2023) <b>Query semantic reconstruction for background in few-shot segmentation.</b><I><A HREF="https://doi.org/10.1007/s00371-023-02817-x">The Visual Computer</A></I>, in press.
Few-shot segmentation (FSS) aims to segment unseen classes using a few annotated samples.
Typically, a prototype representing the foreground class is extracted from annotated support image(s)
and is matched to features representing each pixel in the query image. However, models learnt in this
way are insufficiently discriminatory, and often produce false positives: misclassifying background
pixels as foreground. Some FSS methods try to address this issue by using the background in the
support image(s) to help identify the background in the query image. However, the backgrounds
of these images are often quite distinct, and hence, the support image background information is
uninformative. This article proposes a method, QSR, that extracts the background from the query
image itself, and as a result is better able to discriminate between foreground and background features
in the query image. This is achieved by modifying the training process to associate prototypes with
class labels including known classes from the training data and latent classes representing unknown
background objects. This class information is then used to extract a background prototype from
the query image. To successfully associate prototypes with class labels and extract a background
prototype that is capable of predicting a mask for the background regions of the image, the machinery
for extracting and using foreground prototypes is induced to become more discriminative between
different classes. Experiments achieve state-of-the-art results for both 1-shot and 5-shot FSS on the
PASCAL-5i and COCO-20i datasets. As QSR operates only during training, results are produced with
no extra computational complexity during testing.
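For readers unfamiliar with the prototype-matching pipeline that QSR builds on, the following is a minimal sketch of generic prototype extraction and matching, not QSR itself; the function names and the masked-average-pooling formulation are illustrative assumptions.
<pre>
import torch
import torch.nn.functional as F

def masked_average_pooling(feat, mask):
    # feat: (C, H, W) support features; mask: (H, W) binary foreground mask.
    # Average the features over foreground pixels to obtain a (C,) prototype.
    mask = mask.float().unsqueeze(0)
    return (feat * mask).sum(dim=(1, 2)) / (mask.sum() + 1e-6)

def match_prototype(query_feat, prototype):
    # query_feat: (C, H, W). Returns an (H, W) cosine-similarity map;
    # thresholding, or a softmax over several class prototypes, yields the mask.
    q = F.normalize(query_feat, dim=0)
    p = F.normalize(prototype, dim=0).view(-1, 1, 1)
    return (q * p).sum(dim=0)
</pre>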
L. Li and M. W. Spratling (2023) <b>Data augmentation alone can improve adversarial training.</b><I>Proceedings of the 11th <A HREF="https://openreview.net/forum?id=y4uc4NtTWaq">International Conference on Learning Representations (ICLR)</A></I>.
class="download"HREF ="https://github.com/TreeLLi/DA-Alone-Improves-AT">Code</A><divclass="download">Abstract</div><divclass="abstract">Adversarial training suffers from the issue of robust overfitting, which seriously impairs its generalization performance. Data augmentation, which is effective at preventing overfitting in standard training, has been observed by many previous works to be ineffective in mitigating overfitting in adversarial training. This work proves that, contrary to previous findings, data augmentation alone can significantly boost accuracy and robustness in adversarial training. We find that the hardness and the diversity of data augmentation are important factors in combating robust overfitting. In general, diversity can improve both accuracy and robustness, while hardness can boost robustness at the cost of accuracy within a certain limit and degrade them both over that limit. To mitigate robust overfitting, we first propose a new crop transformation Cropshift with improved diversity compared to the conventional one (Padcrop). We then propose a new data augmentation scheme, based on Cropshift, with much improved diversity and well-balanced hardness. Empirically, our augmentation method achieves the state-of-the-art accuracy and robustness for data augmentations in adversarial training. Furthermore, it matches, or even exceeds when combined with weight averaging, the performance of the best contemporary regularization methods for alleviating robust overfitting.
J. Ning, H. Guan and M. W. Spratling (2023) <b>Rethinking the backbone architecture for tiny object detection.</b><I>Proceedings of the 18th <A HREF="https://doi.org/10.5220/0011643500003417">International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISAPP)</A></I>, Volume 5, pp. 103-114.
Tiny object detection has become an active area of research because images with tiny targets are common in several important real-world scenarios. However, existing tiny object detection methods use standard deep neural networks as their backbone architecture. We argue that such backbones are inappropriate for detecting tiny objects as they are designed for the classification of larger objects, and do not have the spatial resolution to identify small targets. Specifically, such backbones use max-pooling or a large stride at early stages in the architecture. This produces lower resolution feature-maps that can be efficiently processed by subsequent layers. However, such low-resolution feature-maps do not contain information that can reliably discriminate tiny objects. To solve this problem we design "bottom-heavy" versions of backbones that allocate more resources to processing higher-resolution features without introducing any additional computational burden overall. We also investigate if pre-training these backbones on images of appropriate size, using CIFAR100 and ImageNet32, can further improve performance on tiny object detection. Results on TinyPerson and WiderFace show that detectors with our proposed backbones achieve better results than the current state-of-the-art methods.
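To make the bottom-heavy idea concrete, the sketch below removes the early downsampling from a standard torchvision ResNet stem so that more computation happens at high resolution. This is an illustrative assumption, not the authors' exact recipe; a complete design would also add stride later in the network to keep the total cost unchanged.
<pre>
import torch.nn as nn
from torchvision.models import resnet50

def make_bottom_heavy(model):
    # Replace the stride-2 stem conv with a stride-1 version and drop the
    # max-pool, so the early stages keep full spatial resolution.
    model.conv1 = nn.Conv2d(3, 64, kernel_size=7, stride=1, padding=3, bias=False)
    model.maxpool = nn.Identity()
    return model

backbone = make_bottom_heavy(resnet50())
</pre>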
L. Li and M. W. Spratling (2023) <b>Understanding and combating robust overfitting via input loss landscape analysis and regularization.</b><I><A HREF="https://doi.org/10.1016/j.patcog.2022.109229">Pattern Recognition</A></I>, 136 (109229).
Adversarial training is widely used to improve the robustness of deep neural networks to adversarial attack. However, adversarial training is prone to overfitting, and the cause is far from clear. This work sheds light on the mechanisms underlying overfitting through analyzing the loss landscape w.r.t. the input. We find that robust overfitting results from standard training, specifically the minimization of the clean loss, and can be mitigated by regularization of the loss gradients. Moreover, we find that robust overfitting becomes more severe during adversarial training partially because the gradient regularization effect of adversarial training becomes weaker due to the increase in the loss landscape's curvature. To improve robust generalization, we propose a new regularizer to smooth the loss landscape by penalizing the weighted logits variation along the adversarial direction. Our method significantly mitigates robust overfitting and achieves the highest robustness and efficiency compared to similar previous methods.
B. Gao and M. W. Spratling (2023) <b>Explaining away results in more robust visual tracking.</b><I><A HREF="https://doi.org/10.1007/s00371-022-02466-6">The Visual Computer</A></I>, 39:2081-95.
Many current trackers utilise an appearance model to localise the target object in each frame. However, such approaches often fail when there are similar looking distractor objects in the surrounding background, meaning that target appearance alone is insufficient for robust tracking. In contrast, humans consider the distractor objects as additional visual cues, in order to infer the position of the target. Inspired by this observation, this paper proposes a novel tracking architecture in which not only the appearance of the tracked object, but also the appearance of the distractors detected in previous frames, is taken into consideration, using a form of probabilistic inference known as explaining away. This mechanism increases the robustness of tracking by making it more likely that the target appearance model is matched to the true target, rather than similar-looking regions of the current frame. The proposed method can be combined with many existing trackers. Combining it with SiamFC, DaSiamRPN, Super DiMP and ARSuper DiMP all resulted in an increase in the tracking accuracy compared to that achieved by the underlying tracker alone. When combined with Super DiMP and ARSuper DiMP the resulting trackers produce performance that is competitive with the state-of-the-art on seven popular benchmarks.
B. Gao and M. W. Spratling (2022) <b>Shape-texture debiased training for robust template matching.</b><I><A HREF="https://doi.org/10.3390/s22176658">Sensors</A></I>, 22(17), 6658.
Finding a template in a search image is an important task underlying many computer vision applications. This is typically solved by calculating a similarity map using features extracted from the separate images. Recent approaches perform template matching in a deep feature-space, produced by a convolutional neural network (CNN), which is found to provide more tolerance to changes in appearance. Inspired by these findings, in this article we investigate if enhancing the CNN's encoding of shape information can produce more distinguishable features that improve the performance of template matching. By comparing features from the same CNN trained using different shape-texture training methods, we determined a feature-space which improves the performance of most template matching algorithms. When combining the proposed method with the Divisive Input Modulation (DIM) template matching algorithm, its performance is greatly improved, and the resulting method produces state-of-the-art results on a standard benchmark. To confirm these results we also create a new benchmark and show that the proposed method also outperforms existing techniques on this new dataset.
N. Manchev and M. W. Spratling (2022) <b>On the biological plausibility of orthogonal initialisation for solving gradient instability in deep neural networks.</b><I>Proceedings of the 9th <A HREF="https://doi.org/10.1109/ISCMI56532.2022.10068489">International Conference on Soft Computing and Machine Intelligence (ISCMI)</A></I>, pp. 47-55.
Initialising the synaptic weights of artificial neural
networks (ANNs) with orthogonal matrices is known to alleviate
vanishing and exploding gradient problems. A major objection
against such initialisation schemes is that they are deemed
biologically implausible as they mandate factorization techniques
that are difficult to attribute to a neurobiological process. This
paper presents two initialisation schemes that allow a network to
naturally evolve its weights to form orthogonal matrices, provides a
theoretical analysis showing that pre-training orthogonalisation always
converges, and empirically confirms that the proposed schemes
outperform randomly initialised recurrent and feedforward networks.
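The factorisation-based scheme that such initialisation conventionally relies on, and which the paper argues is biologically implausible, is the classical QR-based orthogonal initialisation, sketched here for reference:
<pre>
import numpy as np

def orthogonal_init(n_in, n_out, gain=1.0, rng=None):
    # Classical orthogonal initialisation: QR-decompose a random Gaussian
    # matrix and keep the orthonormal factor. This QR factorisation is the
    # step that is hard to attribute to a neurobiological process.
    rng = rng or np.random.default_rng()
    a = rng.standard_normal((max(n_in, n_out), min(n_in, n_out)))
    q, r = np.linalg.qr(a)
    q *= np.sign(np.diag(r))  # makes the distribution uniform over orthogonal matrices
    return gain * (q if n_in >= n_out else q.T)
</pre>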
C. Huang, H. Guan, A. Jiang, Y. Zhang, M. W. Spratling and Y.-F. Wang (2022) <b>Registration based few-shot anomaly detection.</b><I><A HREF="https://doi.org/10.1007/978-3-031-20053-3_18">European Conference on Computer Vision (ECCV)</A></I>. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds), Proceedings, Part XXIV, Lecture Notes in Computer Science, Volume 13684, pp. 303-19, Springer.
This paper considers few-shot anomaly detection (FSAD), a practical yet under-studied setting for anomaly detection (AD), where only a limited number of normal images are provided for each category at training. So far, existing FSAD studies follow the one-model-per-category learning paradigm used for standard AD, and the inter-category commonality has not been explored. Inspired by how humans detect anomalies, i.e., comparing an image in question to normal images, we here leverage registration, an image alignment task that is inherently generalizable across categories, as the proxy task, to train a category-agnostic anomaly detection model. During testing, the anomalies are identified by comparing the registered features of the test image and its corresponding support (normal) images. As far as we know, this is the first FSAD method that trains a single generalizable model and requires no re-training or parameter fine-tuning for new categories. Experimental results have shown that the proposed method outperforms the state-of-the-art FSAD methods by 3%-8% in AUC on the MVTec and MPDD benchmarks. Source code will be publicly available.
H. Guan and M. W. Spratling (2022) <b>CobNet: cross attention on object and background for few-shot segmentation.</b><I>Proceedings of the 26th <A HREF="https://doi.org/10.1109/ICPR56361.2022.9956070">International Conference on Pattern Recognition (ICPR)</A></I>, pp. 39-45.
Few-shot segmentation aims to segment images containing objects from previously unseen classes using only a few annotated samples. Most current methods focus on using object information extracted, with the aid of human annotations, from support images to identify the same objects in new query images. However, background information can also be useful to distinguish objects from their surroundings. Hence, some previous methods also extract background information from the support images. In this paper, we argue that such information is of limited utility, as the background in different images can vary widely. To overcome this issue, we propose CobNet which utilises information about the background that is extracted from the query images without annotations of those images. Experiments show that our method achieves a mean Intersection-over-Union score of 61.4% and 37.8% for 1-shot segmentation on PASCAL-5i and COCO-20i respectively, outperforming previous methods. It is also shown to produce state-of-the-art performance of 53.7% for weakly-supervised few-shot segmentation, where no annotations are provided for the support images.
B. Gao and M. W. Spratling (2022) <b>More robust object tracking via shape and motion cue integration.</b><I><A HREF="https://doi.org/10.1016/j.sigpro.2022.108628">Signal Processing</A></I>, 199 (108628).
Most current trackers utilise an appearance model to localise the target object in each frame. However, such approaches often fail when there are similar looking distractor objects in the surrounding background. This paper promotes an approach that can be combined with many existing trackers to tackle this issue and improve tracking robustness. The proposed approach makes use of two additional cues to target location: making the appearance model more sensitive to shape cues in offline training, and using the historical locations of the target to predict its future position during online inference. Combining these additional mechanisms with SiamFC, SiamFC++, Super DiMP and ARSuper DiMP all resulted in an increase in the tracking accuracy compared to that achieved by the corresponding underlying tracker alone. When combined with ARSuper DiMP the resulting tracker is shown to outperform all popular state-of-the-art trackers on three benchmark datasets (OTB-100, NFS, and LaSOT), and produce performance that is competitive with the state-of-the-art on the UAV123, Trackingnet, GOT-10K and VOT2020 datasets.
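As an illustration of the second cue, even a constant-velocity extrapolation from the track history gives a usable position prior. This is a hedged sketch; the paper's actual motion model may differ:
<pre>
def predict_next_position(history):
    # history: list of (x, y) target centres from previous frames.
    # Constant-velocity extrapolation of the next centre.
    (x1, y1), (x2, y2) = history[-2], history[-1]
    return (2 * x2 - x1, 2 * y2 - y1)
</pre>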
B. Gao and M. W. Spratling (2021) <b>Robust template matching via hierarchical convolutional features from a shape biased CNN.</b><I>Proceedings of the <A HREF="https://doi.org/10.1007/978-981-16-6963-7_31">International Conference on Image, Vision and Intelligent Systems (ICIVIS)</A></I>, Lecture Notes in Electrical Engineering, Volume 813. Springer, Singapore.
Finding a template in a search image is an important task underlying many computer vision applications. Recent approaches perform template matching in a deep feature-space, produced by a convolutional neural network (CNN), which is found to provide more tolerance to changes in appearance. In this article we investigate if enhancing the CNN's encoding of shape information can produce more distinguishable features that improve the performance of template matching. This investigation results in a new template matching method that produces state-of-the-art results on a standard benchmark. To confirm these results we also create a new benchmark and show that the proposed method also outperforms existing techniques on this new dataset.
M. W. Spratling (2020) <b>Explaining away results in accurate and tolerant template matching.</b><I><A HREF="https://doi.org/10.1016/j.patcog.2020.107337">Pattern Recognition</A></I>, 104 (107337).
Recognising and locating image patches or sets of image features is an important task underlying much work in computer vision. Traditionally this has been accomplished using template matching. However, template matching is notoriously brittle in the face of changes in appearance caused by, for example, variations in viewpoint, partial occlusion, and non-rigid deformations. This article tests a method of template matching that is more tolerant to such changes in appearance and that can, therefore, more accurately identify image patches. In traditional template matching the comparison between a template and the image is independent of the other templates. In contrast, the method advocated here takes into account the evidence provided by the image for the template at each location and the full range of alternative explanations represented by the same template at other locations and by other templates. Specifically, the proposed method of template matching is performed using a form of probabilistic inference known as "explaining away". The algorithm used to implement explaining away has previously been used to simulate several neurobiological mechanisms, and been applied to image contour detection and pattern recognition tasks. Here it is applied for the first time to image patch matching, and is shown to produce superior results in comparison to the current state-of-the-art methods.
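In outline, the explaining-away inference iterates divisive updates in which templates compete to explain each input element. The sketch below follows the published PC/BC-DIM update rules with normalisation details simplified; the variable names and constants are illustrative:
<pre>
import numpy as np

def dim_explaining_away(x, W, iterations=50, eps1=1e-6, eps2=1e-3):
    # x: (d,) input vector; W: (k, d) non-negative templates.
    # Templates compete to explain each input element, yielding sparse
    # responses y in which only the best explanation for each image
    # region remains strongly active.
    V = W / (W.max(axis=1, keepdims=True) + 1e-12)   # feedback weights
    Wn = W / (W.sum(axis=1, keepdims=True) + 1e-12)  # feedforward weights
    y = np.zeros(W.shape[0])
    for _ in range(iterations):
        r = V.T @ y            # reconstruction of the input from the responses
        e = x / (eps2 + r)     # divisive prediction error
        y = (eps1 + y) * (Wn @ e)
    return y
</pre>
Peaks in y then indicate which templates, at which locations, best explain the image.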
N. Manchev and M. W. Spratling (2020) <b>Target propagation in recurrent neural networks.</b><I><A HREF="http://www.jmlr.org/papers/v21/18-141.html">Journal of Machine Learning Research</A></I>, 21(7): 1-33.
Recent neurophysiological data showing the effects of locomotion on neural
activity in mouse primary visual cortex have been interpreted as providing strong
support for the predictive coding account of cortical function. Specifically,
this work has been interpreted as providing direct evidence that
prediction-error, a distinguishing property of predictive coding, is encoded in
cortex. This article evaluates these claims and highlights some of the
discrepancies between the proposed predictive coding model and the
neurobiology. Furthermore, it is shown that the model can be modified so as to
fit the empirical data more successfully.
I. E. Kartoglu and M. W. Spratling (2018) <b>Two collaborative filtering recommender systems based on sparse dictionary coding.</b><I><A HREF="http://dx.doi.org/10.1007/s10115-018-1157-2">Knowledge and Information Systems</A></I>, 57(3): 709-20.
This paper proposes two types of recommender systems based on sparse dictionary
coding. The first is a novel predictive recommender system that attempts to predict
a user's future rating of a specific item. The second is a top-n recommender system
which finds a list of items predicted to be most relevant for a given user. The
proposed methods are assessed using a variety of different metrics and are shown
to be competitive with existing collaborative filtering recommender
systems. Specifically, the sparse dictionary-based predictive recommender has
advantages over existing methods in terms of a lower computational cost and not
requiring parameter tuning. The sparse dictionary-based top-n recommender system
has advantages over existing methods in terms of the accuracy of the predictions
it makes and not requiring parameter tuning. Open-source software, implemented
and used for the evaluation in this paper, is also provided for reproducibility.
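As a hedged illustration of the predictive variant (not necessarily the paper's exact pipeline; scikit-learn's orthogonal matching pursuit stands in for whichever sparse coder was used), a user's known ratings can be encoded as a sparse combination of other users' rating vectors, and the prediction read off the reconstruction:
<pre>
import numpy as np
from sklearn.linear_model import OrthogonalMatchingPursuit

def predict_rating(R, user, item, n_nonzero=10):
    # R: (n_users, n_items) ratings matrix with 0 marking unrated entries.
    known = R[user] != 0
    D = np.delete(R, user, axis=0)          # dictionary: other users' ratings
    omp = OrthogonalMatchingPursuit(n_nonzero_coefs=n_nonzero)
    omp.fit(D[:, known].T, R[user, known])  # sparse code over the known items
    return float(omp.predict(D[:, item].reshape(1, -1)))
</pre>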
Q. Wang and M. W. Spratling (2018) <b>Contour detection refined by a sparse reconstruction-based discrimination method.</b><I><A HREF="http://dx.doi.org/10.1007/s11760-017-1147-y">Signal, Image and Video Processing</A></I>, 12(2): 207-14.
Predictive coding has been proposed as a model of the hierarchical
perceptual inference process performed in the cortex. However, results
demonstrating that predictive coding is capable of performing the complex
inference required to recognise objects in natural images have not
previously been presented.
This article proposes a hierarchical neural network based on predictive
coding for performing visual object recognition.
This network is applied to the tasks of categorising hand-written digits,
identifying faces, and locating cars in images of street scenes. It is shown
that image recognition can be performed with tolerance to position,
illumination, size, partial occlusion and within-category variation.
The current results, therefore, provide the first practical demonstration
that predictive coding (at least the particular implementation of predictive
coding used here, the PC/BC-DIM algorithm) is capable of performing accurate
visual object recognition.
D. Re, A. Gibaldi, S. P. Sabatini and M. W. Spratling (2017) <b>An integrated system based on binocular learned receptive fields for saccade-vergence on visually salient targets.</b><I>Proceedings of the 12th <A HREF="http://www.scitepress.org/DigitalLibrary/PublicationsDetail.aspx?ID=TyxypQ+XiJc=&t=1">International Conference on Computer Vision Theory and Applications (VISAPP)</A></I>, Volume 6, pp. 204-15.
The human visual system uses saccadic and vergence eye movements to foveate interesting objects with both eyes, and thus explore the visual scene. To mimic this biological behavior in active vision, we propose a bio-inspired integrated system able to learn a functional sensory representation of the environment, together with the motor commands for binocular eye coordination, directly by interacting with the environment itself. The proposed architecture, rather than sequentially combining different functionalities, is a robust integration of different modules that rely on a front-end of learned binocular receptive fields to specialize on different sub-tasks. The resulting modular architecture is able to detect salient targets in the scene and perform precise binocular saccadic and vergence movements towards them. The performance of the proposed approach has been tested on the iCub Simulator, providing a quantitative evaluation of the computational potential of the learned sensory and motor resources.
M. W. Spratling (2017) <b>A review of predictive coding algorithms.</b><I><A HREF="http://dx.doi.org/10.1016/j.bandc.2015.11.003">Brain and Cognition</A></I>, 112: 92-7.
Predictive coding is a
leading theory of how the brain performs probabilistic inference. However, there
are a number of distinct algorithms which are described by the term "predictive
coding". This article provides a concise review of these different predictive
coding algorithms, highlighting their similarities and differences. Five
algorithms are covered: linear predictive coding which has a long and
influential history in the signal processing literature; the first
neuroscience-related application of predictive coding to explaining the function
of the retina; and three versions of predictive coding that have been proposed
to model cortical function. While all these algorithms aim to fit a generative
model to sensory data, they differ in the type of generative model they employ,
in the process used to optimise the fit between the model and sensory data, and
in the way that they are related to neurobiology.
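As a concrete reference point, one simple member of this family of algorithms can be written as the following gradient-based sketch (a Rao-and-Ballard-style formulation; the reviewed algorithms differ precisely in such update rules):
<pre>
import numpy as np

def pc_inference(x, W, steps=100, lr=0.05):
    # x: (d,) sensory input; W: (d, k) generative weights.
    # Infer latent causes y so that the generative prediction W @ y
    # matches the input; e is the prediction error.
    y = np.zeros(W.shape[1])
    for _ in range(steps):
        e = x - W @ y          # prediction error
        y += lr * (W.T @ e)    # gradient descent on the squared error
    return y, e
</pre>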
W. Muhammad and M. W. Spratling (2017) <b>A neural model of coordinated head and eye movement control.</b><I><A HREF="http://dx.doi.org/10.1007/s10846-016-0410-8">Journal of Intelligent & Robotic Systems</A></I>, 85(1):107-26.
class="download">Abstract</div><divclass="abstract">Gaze shifts require the coordinated movement of both the eyes and the head in both animals and humanoid robots. To achieve this the brain and the robot control system needs to be able to perform complex non-linear sensory-motor transformations between many degrees of freedom and resolve the redundancy in such a system. In this article we propose a hierarchical neural network model for performing 3-D coordinated gaze shifts. The network is based on the PC/BC-DIM (Predictive Coding/Biased Competition with Divisive Input Modulation) basis function model. The proposed model consists of independent eyes and head controlled circuits with mutual interactions for the appropriate adjustment of coordination behaviour. Based on the initial eyes and head positions the network resolves redundancies involved in 3-D gaze shifts and produces accurate gaze control without any kinematic analysis or imposing any constraints. Furthermore the behaviour of the proposed model is consistent with coordinated eye and head movements observed in primates.</div>
Q. Wang and M. W. Spratling (2016) <b>Contour detection in colour images using a neurophysiologically inspired model.</b><I><A HREF="http://dx.doi.org/10.1007/s12559-016-9432-6">Cognitive Computation</A></I>, 8(6):1027-35.
The predictive coding/biased competition (PC/BC) model of V1 has previously been applied to locate boundaries defined by local discontinuities in intensity within an image. Here it is extended to perform contour detection for colour images. The proposed extensions are inspired by neurophysiological data from single neurons in macaque primary visual cortex (V1), and the behaviour of this extended model is consistent with the neurophysiological experimental results. Furthermore, when compared to methods used for contour detection in computer vision, the colour PC/BC model of V1 slightly outperforms some recently proposed algorithms which use more cues and/or require a complicated training procedure.
M. W. Spratling (2016) <b>A neural implementation of Bayesian inference based on predictive coding.</b><I>Connection Science</I>, 28(4): 346-83.
Previous work has shown that predictive coding can provide a detailed
explanation of a very wide range of low-level perceptual processes. It is also
widely believed that predictive coding can account for high-level, cognitive,
abilities. This article provides support for this view by showing that
predictive coding can simulate phenomena such as categorisation, the influence
of abstract knowledge on perception, recall and reasoning about conceptual
knowledge, context-dependent behavioural control, and naive physics. The
particular implementation of predictive coding used here (PC/BC-DIM) has previously
been used to simulate low-level perceptual behaviour and the neural mechanisms
that underlie it. This algorithm thus provides a single framework for
modelling both perceptual and cognitive brain function.
M. W. Spratling (2016) <b>A neural implementation of the Hough transform and the advantages of explaining away.</b><I><A HREF="http://dx.doi.org/10.1016/j.imavis.2016.05.001">Image and Vision Computing</A></I>, 52:15-24.
The Hough Transform (HT) is widely used for feature extraction and object
detection. However, during the HT individual image elements vote for many
possible parameter values. This results in a dense accumulator array and
problems identifying the parameter values that correspond to image
features. This article proposes a new method for implementing the voting process
in the HT. This method employs a competitive neural network algorithm to perform
a form of probabilistic inference known as "explaining away". This results in a
sparse accumulator array in which the parameter values of image features can be
more accurately identified. The proposed method is initially demonstrated using
the simple, prototypical task of straight line detection in synthetic
images. In this task it is shown to more accurately identify straight lines, and
the parameters of those lines, compared to the standard Hough voting process. The
proposed method is further assessed using a version of the implicit shape model
(ISM) algorithm applied to car detection in natural images. In this application
it is shown to more accurately identify cars, compared to using the standard
Hough voting process in the same algorithm, and compared to the original ISM algorithm.
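For reference, the standard Hough voting process that the proposed method replaces is sketched below (with an illustrative parameterisation); every edge point votes for all lines through it, which is what produces the dense accumulator described above:
<pre>
import numpy as np

def hough_vote(edge_points, img_diag, n_theta=180, n_rho=200):
    # Standard line-detection voting: each edge point (x, y) votes for all
    # (rho, theta) pairs satisfying rho = x*cos(theta) + y*sin(theta).
    thetas = np.linspace(0.0, np.pi, n_theta, endpoint=False)
    acc = np.zeros((n_rho, n_theta))
    for (x, y) in edge_points:
        rho = x * np.cos(thetas) + y * np.sin(thetas)
        idx = np.round((rho + img_diag) * (n_rho - 1) / (2 * img_diag)).astype(int)
        acc[idx, np.arange(n_theta)] += 1
    return acc  # peaks correspond to candidate lines
</pre>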
Q. Wang and M. W. Spratling (2016) <b>A simplified texture gradient method for improved image segmentation.</b><I><A HREF="http://dx.doi.org/10.1007/s11760-015-0794-0">Signal, Image and Video Processing</A></I>, 10(4):679-86.
Inspired by the probability of boundary (Pb) algorithm, a simplified texture gradient method has been developed to locate texture boundaries within grayscale images. Despite considerable simplification, the proposed algorithm's ability to locate texture boundaries is comparable with that of Pb's texture boundary method. The proposed texture gradient method is also integrated with a biologically inspired model, to enable boundaries defined by discontinuities in both intensity and texture to be located. The combined algorithm outperforms the current state-of-the-art image segmentation method (Pb) when this method is also restricted to using only local cues of intensity and texture at a single scale.
<br>W. Muhammad and M. W. Spratling (2015) <b>A neural model of binocular saccade planning and vergence control.</b><I>Adaptive Behavior</I>, 23(5): 265-82.
class="download">Abstract</div><divclass="abstract"> Representing signals as
linear combinations of basis vectors sparsely selected from an overcomplete
dictionary has proven to be advantageous for many applications in pattern
recognition, machine learning, signal processing, and computer vision. While
this approach was originally inspired by insights into cortical information
processing, biologically-plausible approaches have been limited to exploring the
functionality of early sensory processing in the brain, while more practical
applications have employed non-biologically-plausible sparse-coding
algorithms. Here, a biologically-plausible algorithm is proposed that can be
applied to practical problems. This algorithm is evaluated using standard
benchmark tasks in the domain of pattern classification, and its performance is
compared to a wide range of alternative algorithms that are widely used in
signal and image processing. The results show that, for the classification
tasks performed here, the proposed method is very competitive with the best of
the alternative algorithms that have been evaluated. This demonstrates that
classification using sparse representations can be performed in a
neurally-plausible manner, and hence, that this mechanism of classification
might be exploited by the brain.
<br>M. W. Spratling (2014) <b>A single functional model of drivers and modulators in cortex.</b><I><A HREF="http://dx.doi.org/10.1007/s10827-013-0471-7">Journal of Computational Neuroscience</A></I>, 36(1): 97-118.
A distinction is commonly made between synaptic connections capable of evoking a
response ("drivers") and those that can alter ongoing activity but not
initiate it ("modulators"). Here it is proposed that, in cortex, both drivers
and modulators are an emergent property of the perceptual inference performed by
cortical circuits. Hence, it is proposed that there is a single underlying
computational explanation for both forms of synaptic connection. This idea is
illustrated using a predictive coding model of cortical perceptual inference.
In this model all synaptic inputs are treated identically. However,
functionally, certain synaptic inputs drive neural responses while others have a
modulatory influence. This model is shown to account for driving and modulatory
influences in bottom-up, lateral, and top-down pathways, and is used to simulate
a wide range of neurophysiological phenomena including surround suppression,
contour integration, gain modulation, spatio-temporal prediction, and attention.
The proposed computational model thus provides a single functional explanation
for drivers and modulators and a unified account of a diverse range of
neurophysiological data.
<br>M. W. Spratling (2013) <b>Predictive coding.</b> In <I><A HREF="http://dx.doi.org/10.1007/978-1-4614-7320-6_509-6">Encyclopedia of Computational Neuroscience,</A></I> D. Jaeger and R. Jung (Eds.), Springer, New York.
<br>M. W. Spratling (2013) <B>Distinguishing theory from implementation in predictive coding accounts of brain function</B> [commentary]. <I><A HREF="http://dx.doi.org/10.1017/S0140525X12002178">Behavioral and Brain Sciences</A></I>, 36(3):231-2.
<br>M. W. Spratling (2013) <b>Image segmentation using a sparse coding model of cortical area V1.</b><I><A HREF="http://dx.doi.org/10.1109/TIP.2012.2235850">IEEE Transactions on Image Processing</A></I>, 22(4):1631-43.
PC/BC ("Predictive Coding/Biased Competition") is a simple computational model that has previously been shown to explain a very wide range of V1 response properties. This article extends work on the PC/BC model of V1 by showing that it can also account for V1 response properties measured using the reverse correlation methodology. Reverse correlation employs an experimental procedure that is significantly different from that used in more typical neurophysiological experiments, and measures some distinctly different response properties in V1. Despite these differences PC/BC successfully accounts for the data. The current results thus provide additional support for the PC/BC model of V1 and further demonstrate that PC/BC offers a unified explanation for the seemingly diverse range of behaviours observed in primary visual cortex.
<br>M. W. Spratling (2012) <B>Predictive coding as a model of the V1 saliency map hypothesis.</B><I>Neural Networks</I>, 26: 7-28.
The predictive coding/biased competition (PC/BC) model is a specific implementation of predictive coding theory that has previously been shown to provide a detailed account of the response properties of orientation tuned cells in primary visual cortex (V1). Here it is shown that the same model can successfully simulate psychophysical data relating to the saliency of unique items in search arrays, of contours embedded in random texture, and of borders between textured regions. This model thus provides a possible implementation of the hypothesis that V1 generates a bottom-up saliency map. However, PC/BC is very different from previous models of visual salience, in that it proposes that saliency results from the failure of an internal model of simple elementary image components to accurately predict the visual input. Saliency can therefore be interpreted as a mechanism by which prediction errors attract attention in an attempt to improve the accuracy of the brain's internal representation of the world.
<br>M. W. Spratling (2012) <B>Unsupervised learning of generative and discriminative weights encoding elementary image components in a predictive coding model of cortical function.</B><I>Neural Computation</I>, 24(1): 60-103.
The presence of a large number of inhibitory contacts at the soma and axon
initial segment of cortical pyramidal cells has inspired a large and influential
class of neural network model which use post-integration lateral inhibition as a
mechanism for competition between nodes. However, inhibitory synapses also
target the dendrites of pyramidal cells. The role of this dendritic inhibition
in competition between neurons has not previously been addressed. We
demonstrate, using a simple computational model, that such pre-integration
lateral inhibition provides networks of neurons with useful representational and
computational properties which are not provided by post-integration lateral inhibition.
<br>S. J. Grice, M. W. Spratling, A. Karmiloff-Smith, H. Halit, G. Csibra, M. de Haan and M. H. Johnson (2001) <B>Disordered visual processing and oscillatory brain activity in autism and Williams Syndrome.</B>