"What can Visual Attention Tell us About Consciousness and Volition?"
January 29, 2008, 4-6PM
Note-taker: E. I.
Last Updated: April 18, 2008
[Q = Question, A = Answer, C = Comment, R = Response]
3:15
My primary interest is the visual search task: looking for a target in a field full of things you are not looking for. This ties into the question of what you experience when you are actually looking at the world.
If I tell you to find a tiger in this picture…looking at this, you have to search for the tiger and you will not know whether there is a tiger until you find it.
Problem can be divided into three parts:
1) preattentive: before finding the lion, what is going on experientially? It is not the case that there was a black hole…there was some visual experience beforehand. We’ll call that preattentive.
2) attentive: once attention is on lion and lion is recognized
3) postattentive status: what happens to the representation once attention is moved away from the lion.
6:42
To some extent I will address the question of “how do I measure the ‘speed of volition’”? What I will tell you is that the ability to deploy attention around the field is actually very fast, but your ability to do that under volitional control is much slower. I will show how you can measure that. What difference does it make once you have attended to something—does it change your perceptual properties in any way that is interesting?
Visual search in the lab: Usually you put an array of items on a computer screen, vary the set size, and measure how long people take to report whether an “x” is present. For some targets, the amount of time it takes people to find the target is independent of how many items are on the screen. It turns out that only a limited set of properties allows this kind of search: color, orientation, size, holes, line terminations, curvature, 3-D properties… We have some sort of access to these across the entire field in “one step”.
If I tell you to find the green X, you can use information about color and orientation to guide the search. This is Guided Search, which I have been working on for 20 years.
If we do not have these “one-step” features, we have to look at each item, i.e. search for it (e.g. finding a T among Ls). In this case, the greater the number of items, the longer it takes to complete the search (approx. 20 msec added per item). Eyes can only go to 3-4 items per second as opposed to
Q. Do you think about this in terms of gestaltist, pop-out? The Ls were in the same orientation.
A. Yes, we have to purposely change the orientations of each item so that people cannot select using the basic features (basic shapes etc).
Q. What is the difference between preattentive perception and gist perception?
A. I will get to that in more detail later, but the quick answer is: potentially not very much. Preattentive = what happens before something is selected by attention.
Gist = What you can do if you never select at all. Non-selective vision.
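The set-size manipulation described earlier in these notes is usually summarized as a search slope: the extra reaction time per additional item. The following is a minimal sketch of that calculation, not the speaker's analysis code. The set sizes, the 450 msec baseline, and the function name fit_search_slope are assumptions made for illustration; only the approximate 20 msec per item figure comes from the notes.

```python
def fit_search_slope(set_sizes, reaction_times_ms):
    """Least-squares fit of RT = intercept + slope * set_size.

    Returns (slope in msec per item, intercept in msec).
    """
    n = len(set_sizes)
    mean_x = sum(set_sizes) / n
    mean_y = sum(reaction_times_ms) / n
    sxx = sum((x - mean_x) ** 2 for x in set_sizes)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(set_sizes, reaction_times_ms))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical, noise-free data (assumed values, not measurements).
set_sizes = [4, 8, 16, 32]
BASE_RT_MS = 450.0  # assumed baseline reaction time

# "One-step" feature search: RT does not depend on set size.
feature_rts = [BASE_RT_MS for _ in set_sizes]
# Item-by-item search (e.g. T among Ls): about 20 msec added per item.
serial_rts = [BASE_RT_MS + 20.0 * n for n in set_sizes]

feature_slope, feature_intercept = fit_search_slope(set_sizes, feature_rts)
serial_slope, serial_intercept = fit_search_slope(set_sizes, serial_rts)

print(f"feature search slope: {feature_slope:.1f} msec/item")
print(f"T-among-Ls slope:     {serial_slope:.1f} msec/item")
```

A flat slope corresponds to the case where search time is independent of the number of items; a positive slope corresponds to the case where each added item costs time.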
14:30
Object identity turns out to be something that does not “jump out” at you, e.g. the sketch of a chicken. It is difficult to identify a particular chicken because the mutant distractor chickens share the same local properties with the chicken you are searching for, and because we do not know how these properties hang together until we attend to them. This is one version of the binding problem.
Binding problem (in neuroscience): Input from eyes -> Visual cortex with 8 million cortex specialized in color, shape, etc. How are all of these coordinated together to form a coherent image?
Before attention arrives, chickens with the same local features are identical in their preattentive representation. Until attention arrives, we do not know how basic features (e.g. color and orientation) hang together. E.g. finding a red-and-vertical item among figures that each have red, green, vertical and horizontal features is much more difficult than when each figure has either red or green and either a horizontal or a vertical feature.
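One way to picture the binding point above is to compare an unbound feature list with a bound one. The sketch below is only an illustration under simplifying assumptions: each item is modeled as a list of (color, orientation) parts, and the function names preattentive_representation and attended_representation are hypothetical, not terms from the talk.

```python
from collections import Counter


def preattentive_representation(item):
    """Unbound features: which colors and which orientations are present,
    without recording which color goes with which orientation."""
    colors = Counter(color for color, _ in item)
    orientations = Counter(orientation for _, orientation in item)
    return colors, orientations


def attended_representation(item):
    """Bound features: each part keeps its own (color, orientation) pair."""
    return Counter(item)


# Each item has two parts. Both items contain red, green, vertical and
# horizontal, but only the target contains a red-and-vertical part.
target = [("red", "vertical"), ("green", "horizontal")]
distractor = [("red", "horizontal"), ("green", "vertical")]

same_before_attention = (
    preattentive_representation(target)
    == preattentive_representation(distractor)
)
same_after_attention = (
    attended_representation(target) == attended_representation(distractor)
)

print("identical preattentively:", same_before_attention)
print("identical once attended: ", same_after_attention)
```

In this toy model the two items cannot be told apart from their unbound features alone, which is the sense in which the target does not “jump out”.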
18:15
Asking for the preattentive representation is difficult because as soon as you try to describe it, you would employ your attention.
Historical way of investigating it: Have subject attend to a particular place and ask what is going on in another location. However, not ideal as it results in endless debates whether it was truly the case that all of the subject’s attention was focused in the place she was asked to attend.
22:10
Now, how fast can we bind features with attention?
Impression/gist: we can recognize that the field is full of line-like entities and that they are green and red, but CANNOT recognize an overall pattern of combined features within the field, for example that one half of the field contains only red verticals whereas the other half contains green verticals.
How can we figure out how fast we can move around the field?
Many experiments, but one example. [Figure presented with letters arranged in a circle, as well as one located in the center] Focus on center and move attention clockwise on entities (letters) located around the field starting at the top. Locate first letter that is mirror-reversed and indicate whether this letter is a “P” or a “S”.
We will measure the “speed of volition” by changing where the mirror-reversed letter is located.
It should be intuitively clear to you that it took longer for you to find the mirror reversed image that was located farther away from the starting point, compared to if it was located closer to the starting point. We measure the rate by (reaction time/number of letters subjects had to consider before reaching the target letter)
We will compare this with the “anarchy case”, where there is one mirror-reversed letter and we do not specify how you should find it (i.e. you do not have to focus attention anywhere and do not have to consider the entities in any particular order). We vary the number of letters on the screen to compare the rate of identification (time it took to ID the letter/number of letters presented).
Volitional situation rate: 200msec/entity
Anarchic situation rate: at most 30msec/entity
i.e. search is much slower when you are telling yourself where to look (data suggest 3-10 times slower). Analogy: when analyzing an aerial picture, going grid by grid is slower than freely perusing the entire field.
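The arithmetic behind the two rates can be written out as a small sketch. The two per-entity figures are the ones reported in the notes; the example trial (1600 msec, target at the 8th letter) and the function name rate_ms_per_item are assumptions made for illustration.

```python
def rate_ms_per_item(reaction_time_ms, items_considered):
    """Rate as described in the notes: reaction time divided by the number
    of letters considered before reaching the target."""
    if items_considered <= 0:
        raise ValueError("items_considered must be positive")
    return reaction_time_ms / items_considered


# Figures reported in the notes.
VOLITIONAL_MS_PER_ITEM = 200.0
ANARCHIC_MS_PER_ITEM = 30.0  # reported as "at most" 30 msec

# Ratio of the two reported rates (about 6.7).
slowdown = VOLITIONAL_MS_PER_ITEM / ANARCHIC_MS_PER_ITEM

# The same rates expressed as items per second.
volitional_items_per_sec = 1000.0 / VOLITIONAL_MS_PER_ITEM
anarchic_items_per_sec = 1000.0 / ANARCHIC_MS_PER_ITEM

# Hypothetical trial: 1600 msec to reach a target at the 8th position.
example_rate = rate_ms_per_item(1600.0, 8)

print(f"volitional search is about {slowdown:.1f}x slower per item")
print(f"hypothetical trial rate: {example_rate:.0f} msec/item")
```

Because the anarchic figure is an upper bound, the ratio computed here is a lower bound for these two numbers; it falls inside the 3-10 times range mentioned in the notes.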
33:00
What is the status of the chicken after you have attended to it? Change-blindness type experiment where screen will show several ‘chickens’, will flash to another screen where one of the chickens may/may not fall apart, and will flash back to the original screen. Subjects must attend to the chickens, recognize and report back: Was there a chicken that fell apart? Results show that until I direct my attention to it, I do not know how the chicken binds together.
Analogy with face recognition: I cannot recognize Ricardo until I attend to him (until I attend to him, he is just a collection of “Ricardo bits”). Once I move my attention away from Ricardo, the chicken experiment suggests that he goes back to being “Ricardo bits” again.
The reaction time of finding a “fallen apart chicken” from a completely new set of chickens is the same as finding it from chickens that have previously been attended to.
Thus, we can infer that there is nothing about the act of binding (via attention) that survives perceptually in a way that tells you something has changed. Basically, there is no evidence that looking at the stable display helps you spot the fallen-apart chicken any faster.
Q. If you changed the basic features (e.g. had an array of red dots and changed the color of the dots), would the results stay the same?
A. Yes. Somewhat surprisingly, even with basic features the results are the same.
If we measure how long it takes to locate a particular letter in a screen (say “T”) and have the subject repeat that search task while presenting the “T” in the same location every time…500 trials later, the subject’s reaction time would not have gotten any faster.
If we change the location of the “T” every time…→ Same result! No change in speed of search after 500 trials.
You may think that you have visual short-term memory to hold items.
Why aren’t they using their memory to do this task? The reason people search the display again rather than use their memory is because searching their memory is much slower.
Therefore, there is nothing about living with that stimulus for an extended period of time, or having prior experience, that gives you anything special (special advantage for the next search).
46:47
In the real world, you do actually tend to use your memory (instead of doing a random search every time) in two senses.
1) If things are sufficiently complicated, you will go to your memory (e.g. where was that cranberry juice last time I saw it?). This access may give you a comparative advantage.
2) If there are 30 items on a screen, but you only ever ask about six of them, then repeated search will be a repeated search over the relevant six entities, and you learn not to search among the others.
The results presented here show only that there is nothing about the visual stimulus that is changed by the fact that you have attended to it. You do not gain a special property that you didn’t have before. As visual searches go, the first search and the nth search look the same. That is the critical piece.
[Change blindness demo with the Sistine Chapel image.]
Finding changes between two images is a difficult task (no matter how long you look at the first image). This demo also shows that changes in basic features can also be difficult to detect.
[Another change blindness demo with array of red and green dots. Was the cued dot a different color before?] Still difficult to detect (even though we are looking at basic features). With time, subjects tend to get better because they memorize a particular pattern or a few items (1-4 items) in the array, answer those accurately, and guess at everything else.
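The memorize-and-guess strategy described above implies a simple expected accuracy, sketched here under stated assumptions: the cued dot is equally likely to be any dot in the array, memorized dots are always answered correctly, and all other dots are answered at chance (0.5 for a yes/no question). The array size of 20 and the function name expected_accuracy are hypothetical and do not come from the talk.

```python
def expected_accuracy(array_size, memorized_items, guess_rate=0.5):
    """Expected proportion correct when `memorized_items` dots are known
    and every other cued dot is answered by guessing."""
    if array_size <= 0:
        raise ValueError("array_size must be positive")
    if not 0 <= memorized_items <= array_size:
        raise ValueError("memorized_items must be between 0 and array_size")
    p_memorized = memorized_items / array_size
    return p_memorized * 1.0 + (1.0 - p_memorized) * guess_rate


ARRAY_SIZE = 20  # hypothetical number of dots in the display

# Accuracy for 0 to 4 memorized items (the notes mention 1-4 items).
accuracies = {k: expected_accuracy(ARRAY_SIZE, k) for k in range(0, 5)}

for k, acc in accuracies.items():
    print(f"{k} memorized item(s): expected accuracy {acc:.3f}")
```

Under these assumptions, memorizing a handful of items lifts accuracy only modestly above chance, which fits the description of the task as still difficult.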
1:01:50
It is not the case that if you are attending to a particular chicken, the rest of the world disappears. (Except perhaps in very extreme cases where people get tunnel vision under extreme stress, e.g. mugging situations where people may report that they remember the weapon but nothing else about the experience.)
My current favorite way of describing this is that there are two pathways:
Selective pathway that does object recognition and a non-selective pathway that fills in the rest of the experience (gist-y experience).
Aude Oliva has done very interesting work on the gist of scenes. Her results show that if I take a particular scene and I take raw image statistics within a fraction of a second, that is enough to give me a semantic label (gist, categorical status) of the scene.
Non-selective pathway fills the world up with visual stuff/experience for you. In a preattentive sense, we’ll know something about broad image (e.g. bunches of “+” exist on the screen), and once we apply attention to each one of the “+”, we will be able to recognize whether or not it is a particular “+” that you were looking for.
Same with “coffee bean search”: You can tell immediately as you see the picture that it is a bunch of coffee beans. Once you get your attention to the right place, you will recognize that there is a guy’s head hidden in the picture. The non-selective pathway says “round, shiny, coffee bean.” Attentional deployment to each part of the picture confirms “yes, coffee bean, yes, coffee bean, they’re still all coffee beans”…and the guy is not a guy until you run it through this bottleneck.
1:12:00
Q. There seem to be situations where people seem to have the where representation unconsciously, whereas you seem to characterize that as a conscious representation. What makes you hold a theory of rich perception where all of our visual field has visual stuff in it? The two-pathway distinction does not seem to require that.
A. The what (temporal)/where (parietal) two-pathway model is popular in neuroscience. I have been using the where pathway, or the more theoretically neutral pathway, as an example of conscious experience because it helps explain why we see all of the peripheral “stuff” in the visual field. Now there are alternative hypotheses that the visual field is not full of stuff all the time, that we are not
I had a chance to study a Balint’s patient (bilateral parietal damage). Such patients behave as if they can only see one object at a time. These patients are not like agnosics, who cannot recognize things (e.g. faces) even when they see them. Balint’s patients are able to identify objects they attend to. It is very difficult, however, to understand what their preattentive visual experience is like. But to go back to your previous question…
Q (restated). There are situations where people could have the where-pathway unconsciously, but you seem to require that it is conscious.
A. That [the where pathway is conscious] is certainly not a requirement of the what and where stuff. I have been using the notion of this where-pathway (I prefer the term “non-selective” pathway) as an account of a conscious pathway because I want to use it to describe why it is that the visual field is full of stuff all the time. Now, there is an alternative position, which is that the visual field is not full of stuff all the time, that you are only aware of the current object of attention and everything else is a grand illusion. I can understand that this may be theoretically possible, but I do not think this is actually the case. If I am in a dark room and someone suddenly switches on the lights, I seem to experience immediately that the room is full of “stuff”. It is terribly difficult, however, for methodological reasons, to try to pin this down in the lab. This debate has been going on for at least fifteen years without any solid closure.
Q. Do you have the same intuitions for other (non-visual) senses—that it is always full of things? There was an inconclusive study done in California
A. My intuition is that visual experience is different at least from the chemical (smell, taste) senses. It may be, perhaps because the chemical senses adapt extremely quickly and in the absence of change, one cannot detect any smell/taste. For auditory experience my intuition is that it may be there continuously.
C. Auditory perception has been shown to be non-adaptive.
Q. Could there exist evolutionarily ancestral organisms with the where-pathway, without object recognition?
A. You would have to go pretty far down the evolutionary tree to find such organisms.
Q. What about blindsight effects that show that frogs (for example) avoid objects even without visual experience (i.e. when the front of the brain is removed)?
A. Blindsight effects can be found in humans too. There are clearly visual processes that occur outside of consciousness, but what is difficult to pin down is what the contents of visual consciousness may be.
Reading: http://www.neuphi.com/images/readings/IsWasVisCog06.pdf