A Brief Introduction to Connectionism
Connectionist modelling exploits artificial neural networks that mimic some of the basic properties of neural processing in the brain. Densely connected webs of simple processing units propagate and transform complex patterns of activity. When exposed to a training environment, these networks undergo a process of self-organisation.
A simple pattern associator
Self-organisation yields information processing systems which support new forms of behaviour. The study of the dynamics of these systems and their learning capabilities may provide us with important clues as to the nature of the mechanisms underlying development in infants and young children.
Neural network modellers have investigated many different types of network architectures, such as auto-associators, Boltzmann machines, perceptrons, Hopfield nets, adaptive resonance systems and competitive networks. Many of these parallel, distributed processing (PDP) systems offer good approximations to the type of neural computations that occur in the brain. PDP models can be trained using a variety of learning algorithms to direct the dynamics of self-organisation.
These characteristics offer the opportunity to explore the learning capacities of a wide range of biologically plausible mechanisms. Eventually, it will be possible to catalogue the types of neural networks that are needed to perform specific linguistic and cognitive functions. It may even be possible to specify the properties of the basic genetic components that permit these systems to develop. This information will be invaluable to psychologists seeking to understand how the brain develops and how brain development relates to behavioural development.
General Properties of PDP Networks
PDP models share a number of computational properties that are crucial for the impact they have on our understanding of the development of human information processing.
- The simple computational units that make up a PDP network process information in parallel. Individual units may receive information from many sources simultaneously. Consequently, the computations carried out by units are usually context dependent: The impact of one unit upon another usually depends on the state of many other units in the network.
- Information is encoded in a network in a distributed fashion. The global pattern of activity in the network, rather than just a single part of it, is responsible for its behaviour. Consequently, individual units or connections in a network are not always semantically interpretable: It can be difficult to decipher what they mean.
- Networks can be trained to perform new tasks. Training proceeds by adapting the strength of the connections in the network. The necessary ingredients for training a neural network are a training environment and a learning algorithm. What the network learns depends upon the properties of these two ingredients.
- Interference with the units or connections in a network results in a graceful degradation of its performance: The greater the level of interference, the greater is the degree of degradation in performance. Networks also tend to be robust in the face of noisy input. Noisy versions of old inputs to the network will tend to produce the same response as the original inputs.
- PDP networks are analogy machines. Network responses to novel stimuli reflect the similarity of the new inputs to those on which the network has already been trained. Some types of networks form internal representations of their inputs. Internal representations transform the perceived similarity between input patterns and reorganise responses to novel stimuli to reflect the internal organisation of the network rather than the literal similarity between inputs.
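The graceful degradation and noise tolerance described above can be demonstrated with a small simulation sketch. All sizes, pattern counts and lesion levels below are illustrative choices, not drawn from any particular published model: a one-layer Hebbian associator stores a set of random pattern pairs, and increasing fractions of its connections are then severed.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sketch: a one-layer Hebbian associator stores 30 random
# bipolar pattern pairs. We then sever increasing fractions of its
# connections and test recall of the first stored pattern. Performance
# declines gradually rather than failing outright.
n_in, n_out, n_pairs = 200, 200, 30
xs = rng.choice([-1.0, 1.0], size=(n_pairs, n_in))
ts = rng.choice([-1.0, 1.0], size=(n_pairs, n_out))
W = sum(np.outer(t, x) for x, t in zip(xs, ts)) / n_in

accuracies = []
for frac in [0.0, 0.5, 0.9]:
    mask = rng.random(W.shape) >= frac          # sever a fraction of connections
    recalled = np.sign((W * mask) @ xs[0])      # recall from the lesioned network
    accuracies.append(np.mean(recalled == ts[0]))
    print(f"{frac:.0%} of connections severed: "
          f"{accuracies[-1]:.0%} of output units correct")
```

Even with ninety per cent of the connections removed, most output units still recover their stored values: the damage spreads its effect across the whole pattern instead of deleting any single stored item.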
The auto-associator consists of a set of units (represented by circles) with incoming and outgoing connections (represented by arrows). In addition, each unit possesses a set of connections to every other unit in the network (represented by small black circles). Activity entering the network along the input lines initiates a build-up of activity in the units, which is passed forward along the output lines and to the other units in the network. A reverberating cycle of activation is thereby launched in the network. If the strengths of the connections in the network are suitably chosen, the auto-associator will eventually stabilise to a state of equilibrium in the activity of the units. Usually, the pattern of activation achieved by the auto-associator is just the same pattern of activity that was used to initiate the cycle – hence the term auto-association. It may seem strange to build a network that just replicates the pattern of activity to which it is exposed. However, there are several desirable properties associated with networks of this type:
- The network can act as a store for many input patterns simultaneously, thereby functioning as a memory system.
- The network can be trained to reproduce new patterns by adapting the connections using a simple learning algorithm.
- If the auto-associator is presented with a noisy version of one of the patterns in its memory, the final stable state of the network will look more like the original pattern than the noisy input. The auto-associator performs pattern completion.
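These properties can be seen in a minimal Hopfield-style sketch of an auto-associator (the network size and patterns are illustrative). Two orthogonal bipolar patterns are stored with a Hebbian rule; when a corrupted copy of one of them initiates the cycle, the reverberating updates settle back to the original stored pattern.

```python
import numpy as np

rng = np.random.default_rng(0)

# Store two orthogonal bipolar patterns with the Hebbian outer-product
# rule; self-connections are removed.
patterns = np.array([
    [1, -1] * 8,          # pattern A
    [1, 1, -1, -1] * 4,   # pattern B (orthogonal to A)
], dtype=float)
n = patterns.shape[1]
W = sum(np.outer(p, p) for p in patterns)
np.fill_diagonal(W, 0)

def settle(state, sweeps=5):
    """Cycle activity through the network until it reaches equilibrium."""
    state = state.copy()
    for _ in range(sweeps):
        for i in rng.permutation(n):          # asynchronous unit updates
            state[i] = 1.0 if W[i] @ state >= 0 else -1.0
    return state

# A noisy version of pattern A: two of its sixteen units are flipped.
noisy = patterns[0].copy()
noisy[[0, 3]] *= -1
completed = settle(noisy)
print(np.array_equal(completed, patterns[0]))  # True: pattern completion
```

The final stable state is the original pattern A, not the noisy input: the network has performed pattern completion.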
There is a growing body of evidence indicating that neural networks in the hippocampus store episodic memories and that their computational/architectural structure resembles that of an auto-associator (Treves & Rolls, 1994).
The pattern associator receives an input pattern that is used to clamp the states of a set of input units. The activity at the input propagates along a set of connections to an array of output units which themselves are stimulated into activity. The precise pattern of activity produced at the output layer depends on the strength of the connections between the two layers of units. This pattern of output activity can be quite different from the pattern of activity received at the input. Hence, networks of this type are suited to transforming one pattern into another pattern.
The pattern associator can encode many transformations just like the auto-associator can reproduce many episodes. Furthermore, the pattern associator can be taught to learn new transformations, using a learning algorithm that adapts the strengths of the connections in the network to accommodate the desired transformation.
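A two-layer pattern associator of this kind can be sketched as follows. The "sight" and "smell" codes below are made-up illustrative vectors; with mutually orthogonal input patterns, the Hebbian outer-product rule stores each transformation exactly.

```python
import numpy as np

# Two hypothetical input patterns (e.g. the sight of two foods) paired
# with two output patterns (e.g. their smells). The inputs are mutually
# orthogonal, so Hebbian outer-product learning stores each mapping exactly.
inputs = np.array([
    [1.0,  1.0, -1.0, -1.0],
    [1.0, -1.0,  1.0, -1.0],
])
targets = np.array([
    [ 1.0, -1.0,  1.0],
    [-1.0,  1.0,  1.0],
])

# One connection from every input unit to every output unit.
W = sum(np.outer(t, x) for x, t in zip(inputs, targets)) / inputs.shape[1]

# Clamping a stored input on the input layer reproduces its paired output.
for x, t in zip(inputs, targets):
    print(W @ x)   # recovers the target pattern exactly
```

A single weight matrix thus encodes both transformations simultaneously, just as a single auto-associator can store several episodes.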
There is good reason to believe that transformations of neuronal patterns of activation occur between layered structures of the brain. For example, association areas in the brain link patterns of activity in the visual cortex to patterns of activity in the olfactory cortex. The connections between these two areas are responsible for performing a transformational or mapping function whereby sight of food can stimulate neural activity related to its smell. Pattern association plays an important role in brain processing.
The competitive learning network is more structured in its architecture than either the auto-associator or the pattern associator. The input layer to the network (Layer 1) consists of a collection of units with excitatory connections projecting to all of the units in Layer 2 of the network. Layer 2 contains multiple clusters of units. Clusters are not connected but all of the units within a cluster have inhibitory connections to each other. Stimulation of a cluster by a pattern of activity propagating from Layer 1 results in a competition between the units within a cluster. The computational properties of the interaction between units within a cluster are arranged so that only one unit remains active by the end of the competitive process – a Winner-Take-All system. The winning unit, together with all the other winning units from the other clusters in Layer 2, provides the input to a third layer of unit clusters which enter into a similar competition.
Learning takes place in the network by adjusting the strength of the excitatory connections between the layers of the network. The inhibitory connections between units within a cluster are not adjusted. The competitive learning algorithm adjusts excitatory connections by increasing the strength of the active connections feeding into the winning unit and decreasing the strength of the active connections feeding into all the losing units. If the same pattern of activity is presented at the input layer after this adjustment has been made, then the same unit in the cluster will win the competition. However, it will win the competition faster, since it receives stronger support from the input layer: adjusting connections to favour the winning unit accelerates the competitive process. Different input patterns may result in different units within a cluster winning the competitive process.
There are usually far fewer units within a cluster than there are distinguishable input patterns. Consequently, individual units will usually commandeer multiple patterns from the input set, i.e., they will win the competition for a subset of the total set of input patterns. The number of patterns captured by a particular unit depends partly on the number of units within the competitive cluster and partly on the way that the input patterns populate the input space.
Large numbers of units within an inhibitory cluster lead to individual units capturing a small proportion of the input patterns. Patterns that group together (are similar to one another) in the input space will tend to be captured by the same unit. Effectively, the process of competitive learning leads to a classification of the input space. The activation of a unit within a competitive cluster counts as an index of the category to which the current input pattern belongs. The number of categories into which the input space is divided is determined by the number of units in the inhibitory cluster.
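A minimal sketch of this classification process is given below. The data, learning rate and initialisation are illustrative, and the within-cluster inhibitory competition is abbreviated to an explicit selection of the best-matching unit; the winner's weights are moved towards the current input, a standard form of the competitive learning rule.

```python
import numpy as np

rng = np.random.default_rng(1)

# Two groups of input patterns in a two-dimensional input space
# (illustrative data). A single competitive cluster of two units learns
# to classify them.
group_a = rng.normal(loc=[0.0, 1.0], scale=0.05, size=(20, 2))
group_b = rng.normal(loc=[1.0, 0.0], scale=0.05, size=(20, 2))
patterns = np.vstack([group_a, group_b])

# Each row holds one unit's excitatory weight vector, initialised at a
# randomly chosen input pattern.
W = patterns[rng.choice(len(patterns), size=2, replace=False)].copy()
rate = 0.1

for _ in range(10):                               # training sweeps
    for x in rng.permutation(patterns):
        winner = np.argmin(np.sum((W - x) ** 2, axis=1))   # competition
        W[winner] += rate * (x - W[winner])       # favour the winning unit

# After training, each unit responds to one group of patterns: its
# activation indexes the category of the current input.
wins_a = {int(np.argmin(np.sum((W - x) ** 2, axis=1))) for x in group_a}
wins_b = {int(np.argmin(np.sum((W - x) ** 2, axis=1))) for x in group_b}
print(wins_a, wins_b)  # two different singleton sets
```

With two units in the cluster, the input space is divided into two categories; adding more units to the cluster would divide it more finely.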
The process of automatic grouping and classification of patterns that is observed in competitive learning networks closely resembles the manner in which topographic maps are formed in the brain. For example, the primary visual cortex in the primate brain contains orientation columns organised in a retinotopic manner. Von der Malsburg (1973) was able to simulate the formation of these columns using a competitive-style learning network.
Constraint satisfaction network
The constraint satisfaction network offers an example of how a neural network can offer alternative interpretations of a single input stimulus. The inputs to the network are the eight vertices of a Necker cube. Each vertex of the cube projects excitatory connections to two models of the cube. Each model represents a different interpretation of the Necker cube’s orientation. Excitatory connections link the nodes within a model. Inhibitory connections link nodes between models which are incompatible with each other. For example, BLL and FLL are nodes in different models that receive excitatory connections from the same input vertex. However, they represent competing hypotheses about the orientation of the cube so they are connected to each other by inhibitory links. Similarly, it makes sense that only one Front, Lower, Left node should be active – you only see one interpretation of the cube at any given moment. Consequently, both FLL nodes are also connected by inhibitory links. The pattern of connections in the network captures a set of constraints on the activations of the nodes corresponding to different interpretations of the Necker cube. Ideally, all the nodes in one model of the cube should end up fully active while all the nodes in the other model should end up fully dormant.
Node activations build up gradually in the network. The activity of a node varies with the amount of activation it receives from other nodes. If the net activity received from other nodes in the network is positive, then it will increase its level of activity. Otherwise, the node’s activity will decay. All nodes receive excitatory connections from the eight input vertices of the cube and from other nodes in the same model of the cube. Inhibitory connections come from the nodes in the alternative model. The net activity (positive or negative) entering a given node will depend on the relative amount of activity in the two models of the cube. If the node activities in one model are generally higher than node activities in the other model, then the former will tend to increase its level of activity still further while turning off the nodes in the other model – a rich get richer and poor get poorer effect. Usually, one of the models becomes fully active and the other entirely dormant though there are some stable configurations of activity in which nodes in both models remain active.
Why does one model come to dominate the other – after all, both models receive excitatory connections from all eight input vertices? The reason is that node activations are updated in an asynchronous fashion, i.e., only one node’s activation is updated at any point in time. If this node receives a preponderance of excitatory input, then its activity will increase before any other node activations are updated. Nodes compatible with this increase in activity will benefit when their turn comes to be updated whereas incompatible nodes will be hindered.
Hence, the final stable pattern of activity in the network is influenced by the order in which nodes are updated. Individual nodes in the network correspond to local interpretations of an input vertex. Consequently, the order of local interpretations of the input vertices influences the global interpretation of the cube. In other words, when you look at a Necker cube your interpretation of one of the vertices biases your interpretation of all the others.
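These settling dynamics can be sketched as follows. The weights, input strength and update schedule are all illustrative, and one simplification is made: every node inhibits every node of the rival model, since the two global interpretations are mutually exclusive, rather than only the pairwise incompatible nodes described above.

```python
import numpy as np

rng = np.random.default_rng(7)

# Sixteen binary nodes: nodes 0-7 form one model of the cube, nodes 8-15
# the other; node i and node i + 8 interpret the same input vertex.
# Excitatory links join nodes within a model; as a simplification, every
# node inhibits every node of the rival model. All weights illustrative.
n = 16
W = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        if i != j:
            W[i, j] = 0.3 if (i < 8) == (j < 8) else -0.6

external = 0.5          # excitation from the eight input vertices
state = np.zeros(n)     # all nodes start dormant

# Asynchronous updates: nodes are visited one at a time in random order,
# so the first node to become active biases the whole settling process.
for _ in range(5):
    for i in rng.permutation(n):
        state[i] = 1.0 if external + W[i] @ state > 0 else 0.0

print(state[:8].sum(), state[8:].sum())  # one model active, one dormant
```

Whichever node happens to be updated first switches on and thereafter suppresses the rival model: a rich-get-richer process in which the update order, not the input, selects the winning interpretation.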
This simple example of constraint satisfaction (adapted from Feldman 1981) involves a network in which the connections are hard-wired: The strengths of the excitatory and inhibitory connections are specified by the modeller. In essence, the network embodies the modeller’s theory about the nature of the constraints that need to be satisfied for interpreting the Necker cube. It is also possible to train networks to embody multiple constraint satisfaction using a network called a Boltzmann machine (Ackley, Hinton & Sejnowski, 1985). This approach is particularly useful when the constraints are complex and when the modeller has no specific reason to prefer one solution to the problem over another. In fact, the network’s solution may offer perspectives which the modeller had not imagined.
Unlike the three other types of network mentioned previously, no obvious biological analogue to a constraint satisfaction network has yet been discovered. Although the human cognitive system is continually performing tasks that involve the satisfaction of multiple and complex constraints, we do not understand the organisation of the neural mechanisms that implement this process. Constraint satisfaction networks are best thought of as representing more abstract characterisations of the mechanisms that support cognition. As in the case of the Necker cube, these systems can be used to generate and test hypotheses concerning human cognition.
Neural networks models should not be confused with the semantic networks that have been used to model human conceptual structure (Collins & Quillian, 1972). In a semantic network, individual nodes stand for concepts such as canary, bird or wings: Concepts are represented in a localist fashion. Nodes are connected by links which indicate the conceptual relation between the nodes. Links such as HAS and ISA can be used to represent the knowledge that a canary ISA bird or a bird HAS wings. Knowledge is represented in a transparent fashion and new facts can be added easily to the network simply by the addition of conceptual nodes or the creation of different types of links. Likewise, knowledge can be deleted from the network by removing links or nodes.
In a PDP network, patterns of activity across a collection of nodes can often be interpreted as representing a concept: Concepts are represented in a distributed fashion. Individual connections cannot be interpreted as representing a conceptual relation. The connections between the nodes in a PDP network are just excitatory or inhibitory and are usually defined by some real-valued number. The whole matrix of connections encodes what the network knows about a given problem. Change the connections and you change the knowledge. In contrast to a semantic network, it is impossible to add a completely new fact to the network’s knowledge base simply by the addition of new nodes or connections. Individual facts are represented by collections of nodes and connections, often involving the entire network. New information has to be integrated with old information. Conversely, it is usually impossible to remove a single fact from the network simply by the deletion of a node or a connection.
Evaluating the Poverty of the Stimulus
Plato’s Cave Allegory by Markus Maurer
In the development of an intelligent organism, there is a fundamental trade-off between information stored in its internal processing systems and information available in its natural environment. Information that is reliably available in the environment does not need to be stored since it can be accessed externally whenever the organism requires it. For the developing organism, there is a similar trade-off between the information processing systems that are needed to get learning started and the information available in the environment that can be used to construct new information processing systems. The nature-nurture debate concerns itself with the appropriate definition of this trade-off.
Nativists argue that environmental stimuli are too impoverished to account for the complexity of the human cognitive system and consequently they promote the importance of innate information processing capabilities. Empiricists argue that the rich structures available in the environment are underestimated by the nativists and that highly specialised innate information processing systems may not be necessary to get learning off the ground.
One of the great strengths of PDP networks is their ability to extract and represent the patterns of regularity inherent in a structured training environment. In this respect, they act like statistical inference machines. However, the information processing systems which they construct depend critically on their initial computational and architectural resources. The use of PDP networks to model development provides the researcher with a powerful tool for examining the trade-off between initial architectural/computational constraints on the one hand and environmental information on the other.
Modelling Brain Systems with Neural Networks
Our current state of knowledge concerning the brain mechanisms underlying human mental development is scant. It is likely that different neural systems embody different computational principles and it is becoming increasingly apparent that brain development is a far more malleable process than was previously imagined. The organisation of mature brain systems varies considerably from one individual to the next. Even in adulthood, the brain is still sufficiently malleable to permit the transfer of mental function to different cortical centres after lesion.
The wide variety of neural systems that emerge during development poses a real problem for understanding the basic principles underlying the computations performed. We need to understand the principles of neural computation if we are to decipher the solutions that the brain has constructed in the service of intelligent behaviour. It is improbable that we will understand brain function fully by studying the activity of single neurons.
Unfortunately, when we move to higher levels of neural organisation the complexity of the structures involved becomes daunting. Analysing the details of brain activity and neural connectivity poses practical and methodological problems that may prove intractable. Modelling offers a way forward in the study of the computational principles underlying the operation of these complex systems. In much the same way as an engineer studying fluid dynamics might use a computer model to test theories about fluid motion along a pipe, researchers can use artificial neural networks to study the learning properties of complex neural systems.
Armed with this knowledge, we are in a better position to identify tasks that require complex computational/architectural resources to get learning off the ground and tasks that require a relatively minimal pre-wiring of the system. Studying the properties of artificial learning systems can inform our understanding of the brain and its development, thereby contributing to our understanding of the mechanisms underlying the development and functioning of the mature brain.