EECS 492 Topic 6: Adversarial Search!

You belong here. You may have heard of a growth mindset. It means that your brain really does physically change as you get better at hard tasks. Keep going to keep growing!

Please download these slides for class: Canvas course, "Google Drive" link

xkcd.com
Silver, David, et al. "Mastering the game of Go with deep neural networks and tree search." Nature 529.7587 (2016): 484-489.
The type of multiagent games we'll discuss
- 2-player (2 agents)
- Agents move in alternating turns
- Fully observable
- Deterministic
- Adversaries
    - Agents have opposing goals
    - Zero-sum game
Money Box!
- I choose a box.
- My opponent ("enemy") chooses a pile of their money in that box for me to take.
- I should choose the box with the "maximum minimum" value!

Boxes (two piles each): ($5, $2), ($10, $8), ($3, $15)
Money Box! Game Tree

[Figure: game tree. At the root, I (Max) choose a box; the enemy (Min) then chooses a money pile. Terminal values: ($5, $2), ($10, $8), ($3, $15). The Min nodes are worth $2, $8, $3, and the root is worth $8. Each level is labeled with the name of the player about to move.]

For each state, how much is it worth (to me, Max) to be there?

Idea:
- Create the complete game tree
- Get the utility of each terminal state
- Assume the opponent (Min) chooses the min-valued state
- I (Max) choose the max-valued state

Terminal states: value determined by the utility function, from Max's perspective.

Later, we'll consider: improving efficiency with pruning and a heuristic.
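The "maximum minimum" rule for Money Box can be checked in a few lines. This is a minimal sketch, assuming each box holds exactly the two piles shown on the slide; the variable names are made up for illustration.

```python
# Each inner list is one box; each number is a pile the enemy could hand over.
boxes = [[5, 2], [10, 8], [3, 15]]

# Min (the enemy) hands over the smallest pile in whichever box Max chooses.
box_values = [min(piles) for piles in boxes]

# Max chooses the box whose worst-case pile is largest.
best_value = max(box_values)
best_box = box_values.index(best_value)

print(box_values)             # [2, 8, 3]
print(best_box, best_value)   # 1 8
```

The middle box wins even though the third box contains the largest single pile, because the enemy would never hand over the $15.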
Adversarial Search Topics

For now: we'll pretend it's always practical to make a huge game tree for every move...
- Develop a new definition of "optimal move" that accounts for an adversary.

Later in these slides:
- Pruning: ignore less relevant portions of the search tree. Cut down on time and memory!
- Heuristic evaluation function: approximate the true utility of a state without a complete search (or with imperfect information).
Example: Full Game Tree
- Root is the current state.
- Max makes a decision by computing a minimax value for every state (not just successors) and picking the best successor to move to.
- To discuss later: well, not every state...
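Computing Minimax Values by Hand

[Figure: game tree with levels Max, Min, Max, Min, Terminal. Root A; then B, C; then D, E, F, G; then H through O; terminal states P through W and X, Y, Z, @, #, $, %, &. Minimax values for the left subtree (rooted at B) are already filled in.]

Exercise: Find the minimax values for the entire right subtree (rooted at C). Then, find the minimax value at A and choose Max's move.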
[Figure: the same tree with every minimax value filled in.]

Solution: Max should move A → B, and expects a value of 4 by the end of the game, assuming Min acts wisely.
Minimax Pseudocode

minimaxSearch(g, s) returns move
    p ← g.toMove(s)
    v, move ← maxValue(g, s, p)
    return move

maxValue(g, s, p) returns [v, move]
    if g.isTerminal(s)
        return [g.utility(s, p), null]
    v ← −∞
    for each a in g.actions(s) do
        v2, a2 ← minValue(g, g.result(s, a), p)
        if v2 > v
            v, move ← v2, a
    return [v, move]

minValue(g, s, p) returns [v, move]
    if g.isTerminal(s)
        return [g.utility(s, p), null]
    v ← ∞
    for each a in g.actions(s) do
        v2, a2 ← maxValue(g, g.result(s, a), p)
        if v2 < v
            v, move ← v2, a
    return [v, move]

minimaxSearch(g, s)
- Max is at the root: only call minimax when it's Max's turn.
- p represents Max (always!)
- Finds the best move and returns it.

maxValue(g, s, p)
- s is a state where Max must move.
- Returns [value of s, move], with the value from p's (Max's) perspective.
- If terminal, return utility (for Max).
- Else finds the value of s by getting the maximum (>) value among its children.

minValue(g, s, p)
- s is a state where Min must move.
- Returns [value of s, move], with the value from p's (Max's) perspective.
- If terminal, return utility (for Max).
- Else finds the value of s by getting the minimum (<) value among its children.

Utility is always from Max's perspective!
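The pseudocode translates almost line for line into Python. The sketch below is an illustration, not the course's reference implementation: it replaces the game object g with a hypothetical explicit tree (TREE, UTILITY), identifies each action with the child it leads to, and drops the p argument because utilities are stored from Max's perspective already. The leaf values are an assumed reconstruction of the exercise figure, chosen so that they reproduce the values the slides state (H = 3, I = 4, E = 6, C = 2, A = 4).

```python
import math

# Hypothetical explicit tree: internal states map to their children.
TREE = {
    "A": ["B", "C"],
    "B": ["D", "E"], "C": ["F", "G"],
    "D": ["H", "I"], "E": ["J", "K"], "F": ["L", "M"], "G": ["N", "O"],
    "H": ["P", "Q"], "I": ["R", "S"], "J": ["T", "U"], "K": ["V", "W"],
    "L": ["X", "Y"], "M": ["Z", "@"], "N": ["#", "$"], "O": ["%", "&"],
}
# Terminal utilities from Max's perspective (assumed reconstruction).
UTILITY = {"P": 5, "Q": 3, "R": 4, "S": 6, "T": 8, "U": 6, "V": 5, "W": 6,
           "X": 8, "Y": 7, "Z": 8, "@": 9, "#": 2, "$": 8, "%": 1, "&": 2}


def max_value(s):
    """Return (value, move) for a state where Max must move."""
    if s in UTILITY:                      # terminal test
        return UTILITY[s], None
    v, move = -math.inf, None
    for child in TREE[s]:
        v2, _ = min_value(child)
        if v2 > v:
            v, move = v2, child
    return v, move


def min_value(s):
    """Return (value, move) for a state where Min must move."""
    if s in UTILITY:
        return UTILITY[s], None
    v, move = math.inf, None
    for child in TREE[s]:
        v2, _ = max_value(child)
        if v2 < v:
            v, move = v2, child
    return v, move


def minimax_search(s):
    """Only call this when it is Max's turn at s."""
    _, move = max_value(s)
    return move


print(max_value("A"))        # (4, 'B')
print(minimax_search("A"))   # B
```

Because the recursion finishes one child completely before starting the next, the call order is a depth-first traversal of the tree.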
Tracing the pseudocode on the tree above (each slide marks the current state s and the current line of code):

1. minimaxSearch calls maxValue at the root A.
2. Action A → B: maxValue(A) calls minValue(B).
3. Action B → D: minValue(B) calls maxValue(D).
4. Action D → H: maxValue(D) calls minValue(H).
5. Action H → P: minValue(H) calls maxValue(P).
6. P is terminal, so its utility (5) is returned.
7. Back in minValue(H), after H → P: v2 = 5, so v goes from ∞ to 5. H → Q next...
8. Observe that minimax is a depth-first search!
9. After H → Q: v2 = 3, so v goes from 5 to 3. minValue(H) returns [3, H → Q].
10. Back in maxValue(D), after D → H: v2 = 3, so v goes from −∞ to 3.
11. Working through the I subtree similarly, minValue for I returns 4... After D → I: v2 = 4, so v goes from 3 to 4. maxValue(D) returns [4, D → I].
12. Back in minValue(B), after B → D: v2 = 4, so v goes from ∞ to 4.
13. Working through the E subtree similarly, maxValue for E returns 6... After B → E: v2 = 6, so v stays 4. minValue(B) returns [4, B → D].
14. Back in maxValue(A), after A → B: v2 = 4, so v goes from −∞ to 4.
15. Working through the C subtree similarly, minValue for C returns 2... After A → C: v2 = 2, so v stays 4. maxValue(A) returns [4, A → B].
16. minimaxSearch returns the move A → B.
Back to Tic Tac Toe...
- Max needs to make its 1st move.
- This is the game tree for just the first move of the game!
Back to Tic Tac Toe...
- Max needs to make its 2nd move.
- At each turn, look at the tree with the current state as root.
- We might cache some/all of the tree so we don't have to recompute...
- Soon we'll discuss heuristics and pruning to evaluate just part of the tree.
- For now: just pretend it's practical to recompute the entire tree from the current state for every move.
- Also note: we never need to compute the tree with Min at the root, because we're Max! Min makes its own decisions.
Properties

How efficient is minimax? Assume:
- Maximum depth: m
- Legal moves at each point: b

Minimax is a depth-first search.
- Time complexity: $O(b^m)$
- Space complexity:
    - $O(bm)$ if we generate all actions at once
    - $O(m)$ if we generate one action at a time

Not good! This will limit how far we can actually look ahead in the game tree.
Limitations

Chess:
- Branching factor of 35
- Games often include 50 moves/player
- Search space: $35^{50 \times 2} \approx 10^{154}$
- Traditional search infeasible

Problem: Under our current assumptions, minimax has to explore the entire tree!
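The size of the chess search space is easy to verify with logarithms. A quick check, using only the slide's numbers (branching factor 35, 50 moves per player):

```python
import math

b = 35                      # approximate branching factor of chess
moves_per_player = 50
m = moves_per_player * 2    # total depth in plies (both players)

# log10(b^m) = m * log10(b)
exponent = m * math.log10(b)
print(f"{b}^{m} is about 10^{exponent:.1f}")

# Python integers are arbitrary precision, so the digits can be counted directly.
digits = len(str(b ** m))
print(digits)
```

The exponent comes out just above 154, which matches the slide's estimate.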
What Can We Do?
- We can ignore moves that are guaranteed not to be useful.
- Prune the search tree: Alpha-Beta Pruning
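Example where pruning is possible

[Figure: two-level tree. Root A (MAX, value ?) has actions a1, a2, a3 leading to MIN nodes B (3), C (?), D (2). Leaves: b1 = 3, b2 = 12, b3 = 8 under B; c1 = 2, c2 = x, c3 = y under C; d1 = 14, d2 = 5, d3 = 2 under D.]

$$
\begin{aligned}
\text{MINIMAX}(root) &= \max(\min(3, 12, 8),\ \min(2, x, y),\ \min(14, 5, 2)) \\
&= \max(3,\ \min(2, x, y),\ 2) \\
&= \max(3,\ z,\ 2) \quad \text{where } z = \min(2, x, y) \le 2 \\
&= 3
\end{aligned}
$$

Decision independent of values of pruned leaves x and y!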
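The same example can be run with the unknown leaves left as placeholders, to confirm that they are never inspected. This is a minimal two-level sketch rather than full alpha-beta: it tracks only alpha (the best value Max has found so far), and the helper names are hypothetical.

```python
import math

# Max at the root, three Min nodes below. "x" and "y" are the unknown leaves.
TREE = [[3, 12, 8], [2, "x", "y"], [14, 5, 2]]

evaluated = []  # leaves actually looked at, in order


def leaf_value(leaf):
    evaluated.append(leaf)
    if isinstance(leaf, str):
        raise ValueError(f"leaf {leaf} should have been pruned")
    return leaf


def min_value(leaves, alpha):
    """Value of a Min node; stop as soon as it cannot beat alpha."""
    v = math.inf
    for leaf in leaves:
        v = min(v, leaf_value(leaf))
        if v <= alpha:      # Max already has something at least this good
            break
    return v


def root_value(tree):
    alpha = -math.inf       # best value Max can guarantee so far
    for leaves in tree:
        alpha = max(alpha, min_value(leaves, alpha))
    return alpha


result = root_value(TREE)
print(result)      # 3
print(evaluated)   # [3, 12, 8, 2, 14, 5, 2]
```

After seeing the leaf 2 under C, the search knows C is worth at most 2, which is worse than the 3 already guaranteed by B, so x and y are skipped.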
Alpha-Beta Pruning Bigger Example

[Figure sequence: alpha-beta pruning worked step by step on a larger tree with alternating Max and Min levels. Every node value starts as "?" and is tightened as leaves are examined, with bounds written as "8+" or "8-" before a value becomes exact.]

Conclusion: Go left, expecting value of 8.
Effectiveness is highly dependent on the order in which states are examined.

[Figure: the same tree with two subtrees swapped.]

What if we happened to switch the order in which we look at these two subtrees?
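The effect of ordering can be measured by counting how many leaves alpha-beta examines. The sketch below is a generic alpha-beta over nested lists, written for illustration only. The trees are hypothetical (not the tree in the figure): they hold the same leaves in different orders, so the minimax value is identical and only the amount of work changes.

```python
import math


def alphabeta(node, maximizing, alpha=-math.inf, beta=math.inf, counter=None):
    """Alpha-beta on a nested-list tree; leaves are plain numbers."""
    if not isinstance(node, list):
        if counter is not None:
            counter[0] += 1          # count every leaf we actually examine
        return node
    if maximizing:
        v = -math.inf
        for child in node:
            v = max(v, alphabeta(child, False, alpha, beta, counter))
            alpha = max(alpha, v)
            if alpha >= beta:        # Min above would never allow this
                break
        return v
    v = math.inf
    for child in node:
        v = min(v, alphabeta(child, True, alpha, beta, counter))
        beta = min(beta, v)
        if alpha >= beta:            # Max above already has something better
            break
    return v


def leaves_examined(tree):
    counter = [0]
    value = alphabeta(tree, True, counter=counter)
    return value, counter[0]


best_order = [[3, 12, 8], [2, 4, 6], [2, 5, 14]]   # strongest subtree first
ok_order = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
worst_order = [[2, 4, 6], [14, 5, 2], [3, 12, 8]]  # strongest subtree last

for tree in (best_order, ok_order, worst_order):
    print(leaves_examined(tree))
# (3, 5)
# (3, 7)
# (3, 9)
```

With the strongest subtree last, nothing is pruned and all 9 leaves are examined, exactly as plain minimax would.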
Alpha-Beta Overview
- See R&N for pseudocode
- Can be applied to trees of any depth
- Often can prune entire subtrees!
Time Complexity of Alpha-Beta

Assume:
- Maximal depth: m
- Legal moves at each point: b

Cases:
- Best case: $O(b^{m/2})$ (if we can identify the best successors first)
- Random case: $O(b^{3m/4})$
- Worst case: $O(b^m)$

Let b* be the effective branching factor when pruning is done (when the number of nodes to consider is reduced). In the best case (max branches pruned), we have $b^* = \sqrt{b}$.

For chess, branching decreases from 35 to ~6.
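The chess figure follows from rewriting the best-case bound so that the exponent is m again. This is a short derivation using the slide's b and m; the remark about depth is a standard consequence and not something the slide states.

```latex
\begin{aligned}
O\!\left(b^{m/2}\right) &= O\!\left(\left(\sqrt{b}\right)^{m}\right)
  \quad\Longrightarrow\quad b^{*} = \sqrt{b} \\
\sqrt{35} &\approx 5.92 \approx 6
\end{aligned}
```

Read the other way, $b^{m/2}$ nodes at depth m is the same budget as $b^{m'}$ nodes at depth $m' = m/2$, so with ideal ordering alpha-beta can look roughly twice as deep as plain minimax in the same time.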
Another example to try on your own
Running in "real-time"

How can we improve the efficiency of adversarial search?
Evaluation Functions

Minimax depends on a utility function applied to a terminal state.

Idea: Instead, use an evaluation function applied to any state, then prune and compute min/max values from there.

Evaluation function (a heuristic):
- For terminal states, should return the same as the utility function.
- For non-terminal states, should return a reasonable prediction of the final utility. Is Max winning by a lot? Just a little? Losing by a lot?
- Computation shouldn't take too long.

Example:
- Utility function: -1 for Max loss, +1 for Max win.
- Evaluation function: ... 0.9 for Max doing very well, 0.1 for Max just slightly ahead...
Evaluation Function via Features

One option: use properties of the environment (e.g., board positions).
- Features (aka attributes, measurements...) describe the current world state.
- Then, for example, we could use a weighted linear function of the features: $\text{Eval}(s) = w_1 f_1(s) + w_2 f_2(s) + \dots + w_n f_n(s)$
- This assumes independence of the features, which could lead to less accurate evaluations.

Developing a good Eval can be very tricky...
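A feature-based evaluation function can be as simple as a weighted sum. The sketch below is one hypothetical instance for chess: the features are material differences, the weights are the conventional piece values, and the state layout (a dict of piece counts per player) is invented for this example.

```python
# Hypothetical weights: conventional chess material values.
WEIGHTS = {"pawn": 1, "knight": 3, "bishop": 3, "rook": 5, "queen": 9}


def features(state):
    """Feature f_i = (Max's count of piece i) - (Min's count of piece i)."""
    return {piece: state["max"].get(piece, 0) - state["min"].get(piece, 0)
            for piece in WEIGHTS}


def evaluate(state):
    """Weighted linear evaluation: sum of w_i * f_i(state)."""
    f = features(state)
    return sum(WEIGHTS[piece] * f[piece] for piece in WEIGHTS)


# Made-up position: Max is up a rook but down a pawn and a knight.
state = {
    "max": {"pawn": 6, "rook": 2, "queen": 1},
    "min": {"pawn": 7, "rook": 1, "knight": 1, "queen": 1},
}
print(evaluate(state))   # 1
```

Each feature contributes on its own, regardless of the others, which is the independence assumption in practice: this function cannot express that, say, two bishops together are worth more than twice one bishop.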
Cutting Off the Search: H-Minimax
("H" for heuristic, i.e., the evaluation function)

Problem we are addressing: Alpha-beta pruning has to search all the way to terminal states...

- Approach #1: Use a depth-limited search, and apply Eval to states at the depth limit.
- Approach #2: Use iterative deepening until out of time for that turn...
- Approach #3: Use a more general cutoff test (not necessarily a depth limit) that determines when to not expand a node and instead apply Eval.
    - Quiescent search: don't expand quiescent states (unlikely to change much in near future)

Challenge: Horizon effect. Delaying an inevitable bad situation to beyond the "horizon" (e.g., the depth limit).

English vocab: quiescent: in a state or period of inactivity or dormancy
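Approach #1 can be sketched on a toy game. Everything below is an illustrative assumption rather than course code: the game is a hypothetical stone-taking game (remove 1 or 2 stones, whoever takes the last stone wins), a state is the pair (stones left, whether Max is to move), and the evaluation function is a hand-made guess based on the pile size modulo 3.

```python
import math


def actions(state):
    n, _ = state
    return [k for k in (1, 2) if k <= n]


def result(state, k):
    n, max_to_move = state
    return (n - k, not max_to_move)


def is_terminal(state):
    return state[0] == 0


def utility(state):
    # If it is Max's turn on an empty pile, Min took the last stone and won.
    return -1 if state[1] else 1


def evaluate(state):
    """Heuristic from Max's perspective; equals utility at terminal states."""
    if is_terminal(state):
        return utility(state)
    n, max_to_move = state
    score = -0.5 if n % 3 == 0 else 0.5   # guess for the player about to move
    return score if max_to_move else -score


def h_minimax(state, depth, limit):
    """Depth-limited minimax: apply evaluate() at the cutoff."""
    if is_terminal(state) or depth >= limit:   # cutoff test
        return evaluate(state), None
    max_to_move = state[1]
    best_v = -math.inf if max_to_move else math.inf
    best_a = None
    for a in actions(state):
        v, _ = h_minimax(result(state, a), depth + 1, limit)
        if (max_to_move and v > best_v) or (not max_to_move and v < best_v):
            best_v, best_a = v, a
    return best_v, best_a


print(h_minimax((10, True), 0, 2))    # (0.5, 1): look 2 plies ahead, then estimate
print(h_minimax((4, True), 0, 10))    # (1, 1): limit never reached, exact minimax
```

Swapping the `depth >= limit` condition for a different predicate gives Approach #3, and calling h_minimax with limit = 1, 2, 3, ... until time runs out gives Approach #2.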
Up Next

Constraint satisfaction!
Optional Section: Stochasticity in Games

Adding another layer of complexity...
Stochastic Games
- Uncertainty is also relevant for games
- Can extend the turn-playing model to include chance elements
- Bigger trees...
- E.g., backgammon
Backgammon in a Nutshell

Goal: Move all pieces off the board
- White moves towards 24
- Black moves towards 1

Movement: Based on roll of dice
- A piece can move to any position unless multiple opponents are present
- If one opponent is present, can capture it and force a restart
Why is this Hard?

What white knows:
- Its own legal moves

What white doesn't know:
- What black will do
- What black will roll (the new part...)

How we will deal with this: Chance nodes!
Expectiminimax

[Figure: game tree with Max, Min, and Chance levels above the cutoff / terminal states. Blue nodes: Max. Red nodes: Min. Black nodes: Chance (lower-case letters: possible events). Max "rolls" v, w, x, y, or z, then Max chooses a move; Min "rolls" b, c, or d, then Min chooses. The values in this example assume b, c, d are equally likely.]

Max now knows what move to make for each roll. Hope for z! But will have to see...
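Chance nodes add one case to minimax: take the probability-weighted average of the children. The sketch below uses a small hypothetical tree (not the one in the figure) in which Max chooses a move, Min then "rolls" one of three equally likely outcomes, and Min chooses between two cutoff states. The tuple-based node encoding is an assumption made for this example.

```python
from fractions import Fraction


def expectiminimax(node):
    """node is ("leaf", value), ("max", children), ("min", children),
    or ("chance", [(probability, child), ...])."""
    kind, payload = node
    if kind == "leaf":
        return payload
    if kind == "max":
        return max(expectiminimax(child) for child in payload)
    if kind == "min":
        return min(expectiminimax(child) for child in payload)
    # Chance node: expected value over the possible events.
    return sum(p * expectiminimax(child) for p, child in payload)


def min_node(*values):
    return ("min", [("leaf", v) for v in values])


third = Fraction(1, 3)   # three equally likely rolls, as with b, c, d

move1 = ("chance", [(third, min_node(2, 6)),
                    (third, min_node(3, 9)),
                    (third, min_node(10, 12))])
move2 = ("chance", [(third, min_node(4, 8)),
                    (third, min_node(7, 7)),
                    (third, min_node(2, 5))])
root = ("max", [move1, move2])

print(expectiminimax(move1))   # 5
print(expectiminimax(move2))   # 13/3
print(expectiminimax(root))    # 5
```

Fractions keep the averages exact. Note that move1 is preferred because of its average, even though its worst roll (value 2) is no better than move2's worst roll.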
Stepping Back

Assume:
- Maximal depth: m
- Legal moves at each point: b
- Number of chance outcomes: n

What if we knew the dice rolls?
- Just minimax!
- Time complexity: $O(b^m)$

We must consider all possible dice rolls.
- n = number of distinct chance outcomes (e.g., 6 for a single standard die)
- Time complexity: $O(b^m n^m)$
- Can't leverage alpha-beta pruning in the same way...