Topic: RE: virus:Prisoners Dilemma (Read 1362 times)
Blunderov
Archon
Posts: 3160 Reputation: 8.63
"We think in generalities, we live in details"
RE: virus:Prisoners Dilemma
« on: 2004-12-02 16:35:04 »
rhinoceros Sent: 22 November 2004 06:38 PM
Same as the "Prisoner's dilemma" (http://en.wikipedia.org/wiki/Prisoner's_dilemma). We could discuss that one next.
[Blunderov] This one has been bothering me for some time. It's been like when a tune gets stuck in your head.
"Two suspects, you and another person, are arrested by the police. The police have insufficient evidence for a conviction, and having separated the both of you, visit each of you and offer the same deal: if you confess and your accomplice remains silent, he gets the full 10-year sentence and you go free. If he confesses and you remain silent, you get the full 10-year sentence and he goes free. If you both stay silent, all they can do is give you both 6 months for a minor charge. If you both confess, you each get 6 years."
http://en.wikipedia.org/wiki/Prisoner's_dilemma
It's interesting that the dilemma seems to vanish if you substitute the death sentence for the 10-year sentence. Seen in this light, it would be irrational not to confess every time. Not to do so would be to risk the death penalty, and I'm willing to bet most people would consider that an unacceptable risk.
Why is it that this slight substitution of terms brings such certainty with it? I suppose that 10 years is considered an acceptable risk whereas death is not.
I think it must be that the dynamic of acceptable versus unacceptable risk is intrinsic to the problem. And it invites the question: how would one prisoner know what an acceptable risk was to the other? If the risk was unacceptable to the other prisoner, then he would be certain to confess.
Seemingly a lot would depend on the actual, concrete nature of the particular relationship. It could make the equilibria quite different.
Best Regards.
rhinoceros
Archon
Posts: 1318 Reputation: 8.06
My point is ...
RE: virus:Prisoners Dilemma
« Reply #1 on: 2004-12-05 12:40:53 »
[rhinoceros] If the length of this reply makes you scroll quickly and move on to other things, then you should at least not miss this link, which I discuss later:
Indirect reciprocity, assessment hardwiring and reputation http://www.edge.org/3rd_culture/sigmund04/sigmund04_index.html
[Blunderov] This one has been bothering me for some time. It's been like when a tune gets stuck in your head.
"Two suspects, you and another person, are arrested by the police. The police have insufficient evidence for a conviction, and having separated the both of you, visit each of you and offer the same deal: if you confess and your accomplice remains silent, he gets the full 10-year sentence and you go free. If he confesses and you remain silent, you get the full 10-year sentence and he goes free. If you both stay silent, all they can do is give you both 6 months for a minor charge. If you both confess, you each get 6 years."
http://en.wikipedia.org/wiki/Prisoner's_dilemma
It's interesting that the dilemma seems to vanish if you substitute the death sentence for the 10-year sentence. Seen in this light, it would be irrational not to confess every time. Not to do so would be to risk the death penalty, and I'm willing to bet most people would consider that an unacceptable risk.
Why is it that this slight substitution of terms brings such certainty with it? I suppose that 10 years is considered an acceptable risk whereas death is not.
I think it must be that the dynamic of acceptable versus unacceptable risk is intrinsic to the problem. And it invites the question: how would one prisoner know what an acceptable risk was to the other? If the risk was unacceptable to the other prisoner, then he would be certain to confess.
Seemingly a lot would depend on the actual, concrete nature of the particular relationship. It could make the equilibria quite different.
[rhinoceros] Of course you always have to weigh the different possible outcomes before making a choice, so assigning an infinite value (death) to one of them gives it infinite weight. But death sentence or not, the rational choice in this game is the same: you must defect and tell on the other guy.
a) If you "defect" and tell on the other guy, then you either get 6 years in prison (if he defects too, and tells on you) or you go free (if he remains silent - "cooperates"). So, it is either 6 years or you go free.
b) If you "cooperate" (remain silent), you get either 10 years (if he defects and tells on you) or 6 months (if he cooperates and remains silent). So, it is either 10 years (or death) or 6 months. Either way, you are worse off than if you had defected.
What is odd is that, if the other guy is rational too, you should expect him to do the same, and the most likely outcome is 6 years in prison for both of you. On the other hand, you could both get only 6 months each if you cooperated, but that is not a "stable strategy" -- you would have to put your fate in his hands, even if you could talk and make an arrangement.
http://en.wikipedia.org/wiki/Nash_equilibrium <quote> The Prisoner's dilemma has one Nash equilibrium: when both players defect. However, "both defect" is inferior to "both cooperate", in the sense that the total jail time served by the two prisoners is greater if both defect. The strategy "both cooperate" is unstable, as a player could do better by defecting while their opponent still cooperates. Thus, "both cooperate" is not an equilibrium. As Ian Stewart put it, "sometimes rational decisions aren't sensible!" <end quote>
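Just to make the dominance argument concrete, here is a minimal sketch (Python; the payoffs are the ones from the quoted Wikipedia version, but the code itself is only my illustration, not taken from any of the linked pages) which computes each player's best response and confirms that mutual defection is the only Nash equilibrium.
<code>
from itertools import product

# Years in prison for (my_move, his_move); lower is better.
# "C" = stay silent (cooperate), "D" = confess (defect).
# Payoffs follow the quoted Wikipedia formulation above.
YEARS = {
    ("C", "C"): (0.5, 0.5),
    ("C", "D"): (10, 0),
    ("D", "C"): (0, 10),
    ("D", "D"): (6, 6),
}

def best_response(other_move):
    # My move(s) minimizing my own jail time, given the other's move.
    best = min(YEARS[(m, other_move)][0] for m in "CD")
    return {m for m in "CD" if YEARS[(m, other_move)][0] == best}

# A pair of moves is a Nash equilibrium if each is a best response to the other.
equilibria = [
    (me, him)
    for me, him in product("CD", repeat=2)
    if me in best_response(him) and him in best_response(me)
]

print("Best response if he stays silent:", best_response("C"))  # {'D'}
print("Best response if he confesses:   ", best_response("D"))  # {'D'}
print("Nash equilibria:", equilibria)                            # [('D', 'D')]
<end code>
Whatever the other guy does, defecting leaves you with less jail time, which is why the "both defect" outcome is the only stable one.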
Is there any good news for the "nice guys" then? Are they destined to be losers? Maybe not. The players of this game do not live in a real world. Their rationality is informed only by the rules of this particular game, and their choice has no consequences outside the game. No retribution is expected outside the prison, no frowning colleagues are going to quarantine them, no society (a Mafia perhaps) is waiting for them outside to enforce a morality, no honor system or religion is there to tell them what is right. Oddly, these factors could make them act in their actual best interest, to cooperate and get only 6 months each -- but they are not included in the game anyway.
The iterated versions of the game, where there are just money rewards and you can punish the other player for his past behavior, are a bit more realistic and very interesting. Wikipedia says that a winning strategy which has been tested in simulations is "Tit-for-Tat": you start by cooperating in good faith and then you do whatever your opponent did the last time. There is a danger of a deadlock here, because if one player ever defects you can both get stuck in an endless sequence of retaliatory defections. These rule-based imaginary players can hold grudges forever... But you can break the deadlock by being a bit generous: respond to a defection with cooperation 10% of the time.
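To see both the deadlock and the generosity fix in action, here is a rough sketch (Python; the point values and the single forced mistake are my own illustrative choices, not from Wikipedia) pitting plain Tit-for-Tat against itself and Generous Tit-for-Tat against itself when one accidental defection happens.
<code>
import random

# Points earned by me for (my_move, opponent_move); higher is better.
# These reward values are illustrative only.
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

def tit_for_tat(my_history, opp_history):
    # Cooperate first, then copy the opponent's last move.
    return opp_history[-1] if opp_history else "C"

def generous_tit_for_tat(my_history, opp_history, forgiveness=0.10):
    # Like Tit-for-Tat, but answer a defection with cooperation 10% of the time.
    if opp_history and opp_history[-1] == "D" and random.random() > forgiveness:
        return "D"
    return "C"

def play(strategy_a, strategy_b, rounds=200, mistake_round=5):
    hist_a, hist_b, score_a, score_b = [], [], 0, 0
    for r in range(rounds):
        a = strategy_a(hist_a, hist_b)
        b = strategy_b(hist_b, hist_a)
        if r == mistake_round:  # one accidental defection by player A
            a = "D"
        score_a += PAYOFF[(a, b)]
        score_b += PAYOFF[(b, a)]
        hist_a.append(a)
        hist_b.append(b)
    return score_a, score_b

print("TFT vs TFT, one mistake:  ", play(tit_for_tat, tit_for_tat))
print("GTFT vs GTFT, one mistake:", play(generous_tit_for_tat, generous_tit_for_tat))
<end code>
After the forced mistake the plain Tit-for-Tat pair keeps echoing the defection back and forth for the rest of the game, while the generous pair typically stumbles back into full cooperation and ends up with higher scores.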
So, is "Generous Tit-for-Tat" the way to go? It seems that this strategy has been defeated by another one: "Win-stay, Lose-shift".
Indirect reciprocity, assessment hardwiring and reputation http://www.edge.org/3rd_culture/sigmund04/sigmund04_index.html
This is a "must read"! You will find out about "Tit-for-Tat", "Win-Stay, Lose-Shift", and going to other people's funerals because otherwise they won't come to yours.
<snip> Later we found another even more robust strategy than generous Tit-for-Tat. This was later called Pavlov's strategy, a name that is not the best possible, but that has stuck. Pavlov's strategy says that you should cooperate if and only if in the previous round you and your co-player have done the same thing. According to this strategy:
1. If you both cooperated, then you cooperate.
2. If you have both defected, then you should also cooperate.
3. If you have cooperated and the other player has defected, then you should defect in the next round.
4. If you defected and the other player has cooperated, then you should again defect in the next round.
At first glance the strategy looks bizarre, but in our computer simulation it turned out that it always won in an environment where mistakes were likely. In the end, it was almost always the dominating strategy in the population. Almost everyone was playing Pavlov's strategy, and it was very stable; it was much better than Tit-for-Tat.
Later we understood that this strategy is actually not so strange. It is the simplest learning mechanism that you can imagine. This is a Win-Stay, Lose-Shift learning mechanism that has already been studied in animals — for training horses and so on — for a hundred years. <end snip>
[rhinoceros] Case #4 did not seem to me intuitively right at first. It says "If you defected and the other player has cooperated, then you should again defect in the next round." But it does have a point: If you defected and fucked over the opponent who cooperated, you brace yourself for his retribution. And talking about hardwired evolutionarily viable animal behavior, we often notice in a personal conflict that the one who was in the wrong often becomes testy and defensively hostile -- not submissive. Does the limbic brain take control and use an evolved viable behavior which used to work well for animals? Well, maybe I am reading too much into this...
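To make "Win-Stay, Lose-Shift" concrete along the same lines as the earlier sketch, here is a toy version (Python; the noise level and point values are again my own illustrative choices) of Pavlov's strategy playing against itself in a game where moves sometimes misfire.
<code>
import random

PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

def pavlov(my_history, opp_history):
    # Win-Stay, Lose-Shift: cooperate if and only if both players
    # did the same thing in the previous round.
    if not my_history:
        return "C"
    return "C" if my_history[-1] == opp_history[-1] else "D"

def average_score(strategy, rounds=1000, noise=0.05):
    hist_a, hist_b, score = [], [], 0
    for _ in range(rounds):
        a = strategy(hist_a, hist_b)
        b = strategy(hist_b, hist_a)
        # With some probability a move misfires (a mistake is made).
        if random.random() < noise:
            a = "D" if a == "C" else "C"
        if random.random() < noise:
            b = "D" if b == "C" else "C"
        score += PAYOFF[(a, b)]
        hist_a.append(a)
        hist_b.append(b)
    return score / rounds  # player A's average points per round

print("Pavlov vs Pavlov with 5% mistakes:", average_score(pavlov))
<end code>
When a mistake turns one round into cooperate-versus-defect, both Pavlov players have done different things, so they both defect in the next round; after that they have both done the same thing again and return to cooperation. The mistake is absorbed in a couple of rounds instead of echoing forever.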
But this is not the end of the story. According to what Sigmund says next, there are socially informed strategies specific to humans which go beyond "win-stay, lose-shift". The key concepts are "indirect reciprocity" and "reputation".
<snip> These ideas fed into our work on indirect reciprocity, a concept that was first introduced by Robert Trivers in a famous paper in the 1970s. I recall that he mentioned this idea obliquely when he wrote about something he called "general altruism". Here you give something back not to the person to whom you owe something, but to somebody else in society. He pointed out that this also works with regard to cooperation at a high level. Trivers didn't go into details, because at the time it was not really at the center of his thinking. He was mostly interested in animal behavior, and so far indirect reciprocity has not been proven to exist in animal behavior. It might exist in some cases, but ethologists are still debating the pros and cons.
In human societies, however, indirect reciprocity has a very striking effect. There is a famous anecdote about the American baseball player Yogi Berra, who said something to the effect of, "I make a point of going to other people's funerals because otherwise they won't come to mine." This is not as nonsensical as it seems. If a colleague of the university, for instance, goes faithfully to every faculty member's funeral, then the faculty will turn out strongly at his.
<snip>
From Trivers' work others derived models about indirect reciprocity, but they were the wrong types of models. People had been reading Axelrod and there were some abortive attempts at modeling indirect reciprocity and explaining it through game theory. Their conclusion was that reciprocity could not work except in groups of two, which have to interact for a long time. One idea was that the principle behind indirect reciprocity is that if I receive something, I'm more likely to give the next person who comes along. There might be something true about it, but there have been experiments showing that this principle by itself would not suffice with regard to explaining the stability of indirect reciprocity.
Then a famous scientist in Chicago named Richard Alexander, the director of the Museum of Natural History, wrote a book about the Darwinian evolution of morals. In this book he asked questions like, what is moral? And how do we start to form our ideas about what is good and bad? We look at what people do for society. We are always assessing the reputations of others, and are more likely to give to somebody who has a high reputation, someone who has, in her or his past, given help to others — not necessarily to me, though, but to somebody. If I only give to a person with a high reputation, I channel my help to those who have proved their value for cooperation.
<snip>
At the same time, though, there have been theoreticians who have said that this model cannot work for a very simple reason: If you are discriminating and see that a recipient is a defector who has not given, then you will not give to that person. But at the same time that you punish him by not giving your own score will be diminished. Even if this act of not giving is fully justified in your eyes, the next person who sees only whether you have given or not in the past will infer, "Ah ha! You have not given, and therefore you are a bad guy, and you will not get anything from me." By punishing somebody you lower your own score and therefore you risk not receiving a benefit from a third person in the next round. Punishing somebody is a costly business, because it costs you future benefits.
The theoreticians then said this cannot work. Why should you engage in this act of punishment when it costs you something? This has been called a social dilemma. Punishing others is altruistic in the sense that if you didn't have this possibility of punishment, cooperation would vanish from the group. But this altruism costs you something. Your own reasoning should tell you that it's better to always give, because then your score will always be at a maximum. Therefore, your chances of receiving something will also be maximal. <end snip>
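[rhinoceros] A side note: the scoring mechanism described above is easy to simulate. Here is a minimal sketch (Python; the population mix, the costs and benefits, and the simple score rule are arbitrary illustrative choices based on the description in the snip, not anyone's published model) of a donation game with image scoring, where discriminators give only to recipients in good standing and every refusal, justified or not, lowers the refuser's own score.
<code>
import random

COST, BENEFIT, ROUNDS = 1, 3, 10_000

# Each agent has a strategy and a public image score.
# "discriminator": gives only if the recipient's score is >= 0.
# "defector":      never gives.
# "cooperator":    always gives.
population = (
    [{"strategy": "discriminator", "score": 0, "payoff": 0} for _ in range(60)]
    + [{"strategy": "defector", "score": 0, "payoff": 0} for _ in range(20)]
    + [{"strategy": "cooperator", "score": 0, "payoff": 0} for _ in range(20)]
)

for _ in range(ROUNDS):
    donor, recipient = random.sample(population, 2)
    if donor["strategy"] == "cooperator":
        gives = True
    elif donor["strategy"] == "defector":
        gives = False
    else:  # discriminator: channel help to those with a good reputation
        gives = recipient["score"] >= 0
    if gives:
        donor["payoff"] -= COST
        recipient["payoff"] += BENEFIT
        donor["score"] += 1
    else:
        # Refusing -- even justified refusing -- lowers the donor's own image.
        donor["score"] -= 1

for strategy in ("discriminator", "defector", "cooperator"):
    members = [agent for agent in population if agent["strategy"] == strategy]
    average = sum(agent["payoff"] for agent in members) / len(members)
    print(f"{strategy:14s} average payoff: {average:.2f}")
<end code>
Running it shows the tension the theoreticians point to: discriminators do keep benefits away from the defectors, but every refusal also dents their own score and can cost them gifts from other discriminators later on.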
[rhinoceros] Next he discusses how all this may apply to e-commerce, specifically Amazon, eBay, and Google, and concludes:
<snip> I should stress that we have been talking here essentially about human nature. The more or less official idea that human beings are selfish and rational — an idea that nobody except economists really took seriously, and now even economists say that they never did — this idea has now been totally discredited. There are many experiments that show that spontaneous impulses like the tendency for fairness or acts of sympathy or generosity play a huge role in human life. <end snip>
hell-kite
Initiate
Posts: 73 Reputation: 5.03
feed me!
Re: virus:Prisoners Dilemma
« Reply #2 on: 2004-12-05 17:29:42 »
very good post btw, rhino
Othello. Thou dost conspire against thy friend, Iago, If thou but think'st him wrong'd, and mak'st his ear A stranger to thy thoughts.
simul
Adept
Posts: 614 Reputation: 7.53
I am a lama.
Re: virus:Prisoners Dilemma
« Reply #3 on: 2004-12-06 00:24:34 »
If you trust your partner deeply (think of a fractal dimension here), then 10 years won't seem as important as your relationship.
But police can always break apart trust. Trust is based entirely on past experience, cost and benefits, etc. It can be undermined by force/leverage. Love cannot be undermined by force. Love is unreasonable and thus yields superior results in prisoner's dilemma games (also in life, IMHO).
If you loved the other person, you would never confess - regardless of the risk.
First, read Bruce Sterling's "Distraction", and then read http://electionmethods.org.
rhinoceros
Archon
Posts: 1318 Reputation: 8.06
My point is ...
RE: virus:Prisoners Dilemma
« Reply #4 on: 2004-12-06 19:29:35 »
[simul] If you trust your partner deeply (think of a fractal dimension here), then 10 years won't seem as important as your relationship.
But police can always break apart trust. Trust is based entirely on past experience, cost and benefits, etc. It can be undermined by force/leverage. Love cannot be undermined by force. Love is unreasonable and thus yields superior results in prisoner's dilemma games (also in life, IMHO).
If you loved the other person, you would never confess - regardless of the risk.
[rhinoceros] I agree with the last one, and I would add that many things other than love, such as spite for those who hold you captive or "old skool gangsta" ethics, can keep you silent too. But let's get back to the game.
It is a fairly simple and interesting exercise to figure out this new version of the game. Love between the two prisoners is easier to model than those external moral considerations.
Let's see...
1. If you cooperate (don't talk) and he/she cooperates too, you each get 6 months in prison.
2. If you cooperate and he/she defects (tells on you), you get 20 years and he/she goes free.
3. If you defect and he/she cooperates, you go free and he/she gets 20 years.
4. If you both defect (tell on each other), you get 12 years each.
- Goal of the game: Maximize the opponent's (lover's) benefit, completely ignoring yours.
If you cooperate, the loved one gets either 6 months in prison or none, while if you defect the loved one gets either 12 years or 20 years. Obviously you choose to cooperate.
Now, if the loved one loves you too and has the same goal (to maximize your benefit and ignore his/hers), he/she will choose to cooperate too. So, this mutual unconditional love gets both of you 6 months in prison, the intuitively best result. But if the loved one does not love you, you get 20 years and he/she goes free -- which should make you even happier, of course.
Erik mentioned trust. Of course, by adding trust (knowledge of the loved one's choice), some of the four cases can be eliminated, making the game trivial, as if one player were playing both sides: you both just walk to the desired result (if it is the same for both of you).
On the other hand, in games with a subjective perspective such as this one, where the two players have the same goal but not necessarily the same desired results, trust does not tell you much because of the possible second-guessing.
Now, let's add a constraint. You have a minimum concern for yourself. Let's say you don't want the worst -- you don't want to go to prison for 20 years. Now... that would be a real dilemma. The only way to make sure of this is to defect, and then your loved one may get 20 years if he/she cooperates. This constraint makes your initial goal untenable...
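For anyone who wants to check the arithmetic, here is a quick sketch (Python; the "worst case for the partner" rule is just my reading of the goal stated above, so corrections apply to this as well) which picks a move by looking only at what the loved one risks, and then again with the added constraint of never risking 20 years yourself.
<code>
# Jail terms in years for (my_move, partner_move); "C" = stay silent, "D" = confess.
# The numbers follow the modified version above (6 months / 12 years / 20 years).
YEARS = {
    ("C", "C"): (0.5, 0.5),
    ("C", "D"): (20, 0),
    ("D", "C"): (0, 20),
    ("D", "D"): (12, 12),
}

def partner_outcomes(my_move):
    # The partner's possible jail times if I play my_move.
    return [YEARS[(my_move, p)][1] for p in "CD"]

def my_outcomes(my_move):
    # My own possible jail times if I play my_move.
    return [YEARS[(my_move, p)][0] for p in "CD"]

# Unconditional love: choose the move whose worst case for the partner is smallest.
loving = min("CD", key=lambda m: max(partner_outcomes(m)))
print("Loving choice:", loving, "-> partner risks", partner_outcomes(loving))

# Add the constraint: refuse any move that could cost me 20 years myself.
safe_moves = [m for m in "CD" if max(my_outcomes(m)) < 20]
constrained = min(safe_moves, key=lambda m: max(partner_outcomes(m)))
print("Constrained choice:", constrained, "-> partner risks", partner_outcomes(constrained))
<end code>
It prints cooperation for the purely loving goal and defection once the 20-year self-protection constraint is added, which is exactly the dilemma described above.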
Any better suggestion for additional constraints anyone?
Corrections to the logic are welcome. <preemptive excuse>I scribbled this in haste.</preemptive excuse>
the.bricoleur
Archon
Posts: 341 Reputation: 8.29
making sense of change
RE: virus:Prisoners Dilemma
« Reply #5 on: 2004-12-13 08:09:27 »
This just came in on the memetics list:
New Tack Wins Prisoner's Dilemma
Proving that a new approach can secure victory in a classic strategy game, a team from England's Southampton University has won the 20th-anniversary Iterated Prisoner's Dilemma competition, toppling the long-term winner from its throne.
....
David Lucifer
Archon
Posts: 2642 Reputation: 8.75
Enlighten me.
RE: virus:Prisoners Dilemma
« Reply #6 on: 2004-12-13 12:52:57 »
Quote from: Iolo Morganwg on 2004-12-13 08:09:27
This just came in on the memetics list:
New Tack Wins Prisoner's Dilemma
Proving that a new approach can secure victory in a classic strategy game, a team from England's Southampton University has won the 20th-anniversary Iterated Prisoner's Dilemma competition, toppling the long-term winner from its throne.
Interesting, but it looks to me like the Southampton team found a loophole in the rules that allowed them to cheat. (They entered 60 programs in the tournament that communicate with each other.)
rhinoceros
Archon
Posts: 1318 Reputation: 8.06
My point is ...
RE: virus:Prisoners Dilemma
« Reply #7 on: 2004-12-13 13:51:13 »
[Iolo Morganwg] This just came in on the memetics list:
New Tack Wins Prisoner's Dilemma http://www.wired.com/news/culture/0,1284,65317,00.html
Proving that a new approach can secure victory in a classic strategy game, a team from England's Southampton University has won the 20th-anniversary Iterated Prisoner's Dilemma competition, toppling the long-term winner from its throne.
[David Lucifer] Interesting, but it looks to me like the Southampton team found a loophole in the rules that allowed them to cheat. (They entered 60 programs in the tournament that communicate with each other.)
[rhinoceros] Interesting winning "cheat" for the iterated Prisoner's Dilemma. The winning strategy involved a "conspiracy", or should I just say "favoring your kin"? The "tit-for-tat" and "win-stay, lose-shift" strategies described in the article from edge.org which I posted recently were more "straight and honest", but I wonder whether they were as realistic.
Essentially, the Southampton team entered the competition with multiple agents which recognized each other by "signaling" through the moves they made, and then played for the team. I think this "cheat", which won $50 for the Southampton team, does have some significance.
The article concludes:
<snip> "What's interesting from our point of view," he said, "was to test some ideas we had about teamwork in general agent systems, and this detection of working together as a team is a quite fundamental problem. What was interesting was to see how many colluders you need in a population. It turns out we had far too many -- we would have won with around 20."
Jennings is also interested in testing the strategy on an evolutionary variant of the game in which each player plays only its neighbors on a grid. If your neighbors do better than you do, you adopt their strategy.
"Our initial results tell us that ours is an evolutionarily stable strategy -- if we start off with a reasonable number of our colluders in the system, in the end everyone will be a colluder like ours," he said. <end snip>