Monday, September 17, 2012

Artificial Democracy

I am studying reinforcement learning, a branch of machine learning, for my master's thesis. Today I stumbled across a concept called "ensembles." There are many different algorithms that can be used to make a computer learn how to do something that may be difficult to describe explicitly. For example, how do you make a helicopter hover?

You could study the motion of the helicopter and formulate precise equations to control the blades. Reinforcement learning is interesting because instead you can teach the helicopter that it has blades and that it can spin them quickly or slowly, this way or that way. Then you teach the helicopter that falling or shaking a lot are bad. If it does either of these, you punish the helicopter. Over time, the helicopter can learn how to hover because it is trying to get the most "reward" that it can through its actions. There are several algorithms suited to different types of problems. We might choose one of these algorithms and program the helicopter to "learn" by that method.
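
To make the idea concrete, here is a tiny sketch of that reward-driven loop using tabular Q-learning, one of the standard algorithms. The states, actions, rewards, and fake "physics" below are all invented for illustration; a real helicopter controller would need continuous states and actions.

    import random
    from collections import defaultdict

    # A toy Q-learning loop: reward hovering, punish falling and shaking.
    ACTIONS = ["spin_faster", "spin_slower", "tilt_left", "tilt_right"]
    STATES = ["hovering", "shaking", "falling"]

    def reward(state):
        # Falling and shaking are "bad", so they are punished.
        return -10.0 if state in ("falling", "shaking") else 1.0

    def simulate(state, action):
        # Stand-in for the real dynamics: just picks the next state at random.
        return random.choice(STATES)

    Q = defaultdict(float)              # Q[(state, action)] = estimated long-term reward
    alpha, gamma, epsilon = 0.1, 0.9, 0.1

    state = "hovering"
    for _ in range(10000):
        # Usually take the action that currently looks best, sometimes explore.
        if random.random() < epsilon:
            action = random.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: Q[(state, a)])
        next_state = simulate(state, action)
        best_next = max(Q[(next_state, a)] for a in ACTIONS)
        # Nudge the estimate toward the reward plus the discounted future value.
        Q[(state, action)] += alpha * (reward(next_state) + gamma * best_next - Q[(state, action)])
        state = next_state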

The approach that I learned about today is like an artificial democracy. Instead of choosing one algorithm under the rules of which the helicopter must learn, you choose several algorithms at the same time. Each algorithm will come up with its own policy of how to fly the helicopter to maximize its own interpretation of the "reward."

Next, the helicopter listens to all of the algorithms at once, either by taking some kind of average or by biasing towards some advisers rather than others. It would be as if the helicopter had a group of policy advisers who would provide suggestions. I read a dissertation today which argued that ensembles of policies might actually be more robust than individual policies for certain problems. Perhaps ensembles could work well on problems which fall "in-between" the areas of specialization of the member algorithms, or on problems about which very little is known.
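
A rough sketch of the "panel of advisers" idea, assuming each member algorithm has already produced a policy that scores the available actions. The policies, scores, and trust weights below are invented, and majority voting and weighted averaging are only two of the possible ways to combine the advice.

    from collections import Counter

    # Hypothetical adviser policies: each maps a state to preference scores over actions.
    # In practice these would come from separately trained learners.
    def policy_a(state): return {"spin_faster": 0.7, "spin_slower": 0.1, "tilt_left": 0.2}
    def policy_b(state): return {"spin_faster": 0.4, "spin_slower": 0.5, "tilt_left": 0.1}
    def policy_c(state): return {"spin_faster": 0.3, "spin_slower": 0.6, "tilt_left": 0.1}

    ADVISERS = [policy_a, policy_b, policy_c]
    WEIGHTS = [2.0, 1.0, 1.0]           # trust the first adviser twice as much

    def majority_vote(state):
        # Each adviser casts one vote for its favourite action.
        votes = Counter()
        for policy in ADVISERS:
            prefs = policy(state)
            votes[max(prefs, key=prefs.get)] += 1
        return votes.most_common(1)[0][0]

    def weighted_average(state):
        # Average the preference scores, biased toward the more trusted advisers.
        totals = {}
        for policy, weight in zip(ADVISERS, WEIGHTS):
            for action, score in policy(state).items():
                totals[action] = totals.get(action, 0.0) + weight * score
        return max(totals, key=totals.get)

    print(majority_vote("hovering"))     # "spin_slower": two of the three advisers prefer it
    print(weighted_average("hovering"))  # "spin_faster": the trusted adviser tips the average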

This idea is very intuitive and seems closely related to how living systems and human-made systems (like democratic governments or ant colonies) work.

Saturday, September 8, 2012

The Girlfriend Problem

Anyone who has ever had a girlfriend knows about the problem of feedback attribution.

Perhaps you call your girlfriend to see if she wants to go on a date. When she answers, she seems upset about something, but refuses to tell you what specifically is upsetting her. A man somewhat experienced with women can (sometimes) figure out the reason why she is upset. He can think back over what he has done in the recent past and consider some of the things that perhaps he shouldn't have done. He is able to select a small group of behaviors or actions from the totality of his existence that probably made his girlfriend upset, and use this knowledge to determine how to apologize properly.

How is it that a man is able to figure out why a woman is mad when she refuses to tell him? Can a robot learn to do this too?

Perhaps Fred knows that his girlfriend, Sarah, usually finds out about his mischievous behavior after 14 days, with a standard deviation of 3 days. When trying to apologize to Sarah, he can assume that she is upset because of an action approximately 14 days in the past. Fred promises not to repeat any of his actions that took place between 11 and 17 days ago, hoping that she will then forgive him.

Fred can spread Sarah's disgust over his past actions in many ways to try to determine what it was that he did wrong. He can use any number of probability distributions to associate Sarah's frustration with the appropriate actions. Fred might also take into account the frequency of his actions and Sarah's usual disposition to produce a better guess.
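
As a sketch of how that spreading might look, assuming the delay really is roughly normal with a mean of 14 days and a standard deviation of 3 (the diary of past actions below is invented):

    import math

    def delay_weight(days_ago, mean=14.0, std=3.0):
        # Normal density over how long ago the offending action probably happened.
        return math.exp(-((days_ago - mean) ** 2) / (2 * std ** 2))

    # Hypothetical diary of (days_ago, action) pairs.
    past_actions = [
        (2, "forgot to call"),
        (12, "golf trip overlapped Sarah's birthday"),
        (14, "cancelled dinner"),
        (25, "lost her umbrella"),
    ]

    # Spread the blame in proportion to the delay distribution, then normalize.
    raw = {action: delay_weight(days) for days, action in past_actions}
    total = sum(raw.values())
    for action, weight in sorted(raw.items(), key=lambda kv: -kv[1]):
        print("%.2f  %s" % (weight / total, action))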

For example, Fred plays golf with his friends each week. Normally, Sarah is quite content during these weeks. Fred decides that, because of how often he golfs, Sarah is probably mad for a different reason. He notices that 12 days ago was the only time he can remember when one of his golfing excursions overlapped with Sarah's birthday. It is very unusual for these two events to overlap, so Fred concludes that there is a high probability that Sarah is upset about that, and decides that in the future he will either try to reschedule his golfing trip or Sarah's birthday.
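
Continuing the sketch above, Fred can also discount candidates by how routine they are, so the weekly golf game itself absorbs almost no blame while the rare birthday overlap stands out. The event frequencies below are invented:

    import math

    def delay_weight(days_ago, mean=14.0, std=3.0):
        # Same normal-density weighting over the delay as in the sketch above.
        return math.exp(-((days_ago - mean) ** 2) / (2 * std ** 2))

    # (days_ago, action, rough probability that an ordinary week contains this event)
    candidates = [
        (12, "played golf with friends", 0.95),               # routine; Sarah is normally content
        (12, "golf trip overlapped Sarah's birthday", 0.02),  # very unusual overlap
        (20, "cancelled dinner", 0.10),
    ]

    def blame(days_ago, frequency):
        # Weight by the delay distribution, then discount events that happen all the time.
        return delay_weight(days_ago) * (1.0 - frequency)

    scores = {action: blame(d, f) for d, action, f in candidates}
    total = sum(scores.values())
    for action, score in sorted(scores.items(), key=lambda kv: -kv[1]):
        print("%.2f  %s" % (score / total, action))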

An intelligent machine might function in a similar way. Consider a machine that is designed to play Diplomacy. As the game progresses, the machine chooses actions that benefit its own nation. Then the machine is attacked by Russia, which had previously been very peaceful. The machine must decide why this betrayal occurred. Did one of the machine's past moves offend Russia? Did a past move hurt one of Russia's allies?

The difficulty of the problem is that a machine does not know when its actions will be judged. An intelligent machine must be able to learn from the judgments (good or bad) that are brought upon it. Humans are able to ask themselves, "Why did this happen to me?" We are pretty good at determining what we did in the past to deserve our punishment or reward. From that knowledge, we learn whether to do it more or less.