Something something computer something

EECS 492 Topic 6: Adversarial Search!

You belong here. You may have heard of a growth mindset. It means that your brain really does physically change as you get better at hard tasks. Keep going to keep growing!

Please download these slides for class: Canvas course, "Google Drive" link

xkcd.com

Silver, David, et al. "Mastering the game of Go with deep neural networks and tree search." Nature 529.7587 (2016): 484-489.

The type of multiagent games we'll discuss
• 2-player (2 agents)
• Agents move in alternating turns
• Fully observable
• Deterministic
• Adversaries
  • Agents have opposing goals
  • Zero-sum game

Money Box!
• I choose a box.
• My opponent ("enemy") chooses a pile of their money in that box for me to take.

Boxes: ($5, $2), ($10, $8), ($3, $15)

I should choose the box with the "maximum minimum" value!

Money Box! Game Tree
[Figure: game tree. At the root (value $8), I (Max) choose a box. At the next level (values $2, $8, $3), the enemy (Min) chooses a money pile. The terminal states are ($5, $2), ($10, $8), ($3, $15). Each node is labeled with the name of the player about to move.]

For each state, how much is it worth (to me, Max) to be there?

Idea:
• Create complete game tree
• Get utility of each terminal state
• Assume opponent (Min) chooses min-valued state
• I (Max) choose max-valued state

Terminal states: value determined by utility function from Max's perspective.

Later, we'll consider:
• Improving efficiency with pruning and a heuristic.
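The "maximum minimum" idea can be sketched in a few lines of Python. This is a minimal illustration using the Money Box values; the box names and the helper names `box_value` and `best_box` are assumptions made for the example.

```python
# Each box holds the piles the opponent (Min) may hand over, in dollars.
# The box names are hypothetical labels for this sketch.
boxes = {
    "box 1": [5, 2],
    "box 2": [10, 8],
    "box 3": [3, 15],
}

def box_value(piles):
    """Value of a box to Max: Min will hand over the smallest pile."""
    return min(piles)

def best_box(boxes):
    """Max picks the box whose worst-case pile is largest."""
    return max(boxes, key=lambda name: box_value(boxes[name]))

choice = best_box(boxes)
print(choice, box_value(boxes[choice]))  # box 2 8
```

The inner `min` plays the role of the Min level of the tree and the outer `max` plays the role of the root.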

Adversarial Search Topics
For now: we'll pretend it's always practical to make a huge game tree for every move…
• Develop a new definition for “optimal move” to account for an adversary.

Later in these slides…
• Pruning
  • Ignore less relevant portions of the search tree
  • Cut down on time and memory!
• Heuristic evaluation function
  • Approximate true utility of a state without a complete search (or with imperfect information)

Example: Full Game Tree

Root is current state.
Max makes a decision by computing a minimax value for every state (not just successors) and picking the best successor to move to.
To discuss later: Well, not every state…

Computing Minimax Values by Hand
[Figure: game tree rooted at A (Max), with alternating Max and Min levels above the terminal states.]
Exercise: Find the minimax values for the entire right subtree (rooted at C). Then, find the minimax value at A and choose Max's move.

[Figure: the same game tree with the minimax values filled in.]
Solution: Max should move A → B, and expects value 4 by the end of the game, assuming Min acts wisely.

Minimax Pseudocode

minimaxSearch(g, s) returns move
    p ← g.toMove(s)
    v, move ← maxValue(g, s, p)
    return move

maxValue(g, s, p) returns [v, move]
    if g.isTerminal(s)
        return [g.utility(s, p), null]
    v = -∞
    for each a in g.actions(s) do
        v2, a2 ← minValue(g, g.result(s, a), p)
        if v2 > v
            v, move ← v2, a
    return [v, move]

minValue(g, s, p) returns [v, move]
    if g.isTerminal(s)
        return [g.utility(s, p), null]
    v = ∞
    for each a in g.actions(s) do
        v2, a2 ← maxValue(g, g.result(s, a), p)
        if v2 < v
            v, move ← v2, a
    return [v, move]

minimaxSearch(g, s)
• Max is at the root – only call minimax when it's Max's turn.
• p represents Max (always!)
• Finds best move and returns it.

maxValue(g, s, p)
• s is a state where Max must move.
• Returns [value of s, move] with value from p (Max) perspective.
• If terminal, return utility (for Max)
• Else finds the value of s by getting the maximum (>) value among its children.

minValue(g, s, p)
• s is a state where Min must move.
• Returns [value of s, move] with value from p (Max) perspective.
• If terminal, return utility (for Max)
• Else finds the value of s by getting the minimum (<) value among its children.

Utility is always from Max's perspective!
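As a rough sketch, the pseudocode translates to Python as follows. The `TreeGame` class is a hypothetical stand-in for the game object `g`: it treats a state as either a number (a terminal utility for Max) or a list of child states, and an action as the index of a child. The tree used here is the Money Box game.

```python
import math

class TreeGame:
    """Hypothetical game over an explicit tree (an assumption for this sketch)."""

    def to_move(self, state):
        return "Max"  # minimax_search is only called when it is Max's turn

    def is_terminal(self, state):
        return not isinstance(state, list)

    def utility(self, state, player):
        return state  # leaves already store utility from Max's perspective

    def actions(self, state):
        return range(len(state))  # an action is the index of a child

    def result(self, state, action):
        return state[action]

def minimax_search(game, state):
    player = game.to_move(state)
    _, move = max_value(game, state, player)
    return move

def max_value(game, state, player):
    if game.is_terminal(state):
        return game.utility(state, player), None
    best, move = -math.inf, None
    for a in game.actions(state):
        v2, _ = min_value(game, game.result(state, a), player)
        if v2 > best:
            best, move = v2, a
    return best, move

def min_value(game, state, player):
    if game.is_terminal(state):
        return game.utility(state, player), None
    best, move = math.inf, None
    for a in game.actions(state):
        v2, _ = max_value(game, game.result(state, a), player)
        if v2 < best:
            best, move = v2, a
    return best, move

money_box = [[5, 2], [10, 8], [3, 15]]
print(minimax_search(TreeGame(), money_box))  # 1 (the second box)
```

Because `player` is fixed at the root and passed down unchanged, every utility is computed from Max's perspective, as the slide stresses.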

[Figure: trace of the minimax pseudocode on the game tree from the exercise, with the current state s and the current line of code highlighted.]

Back to Tic Tac Toe…
Max needs to make its 1st move.
This is the game tree for just the first move of the game!

Back to Tic Tac Toe…
Max needs to make its 2nd move.
At each turn, look at tree with current state as root.
We might cache some/all of the tree so we don't have to recompute…
Soon we'll discuss heuristics and pruning to evaluate just part of the tree.
For now: Just pretend it's practical to recompute the entire tree from the current state for every move.
Also note: We never need to compute the tree with Min at the root, because we're Max! Min makes its own decisions.

Properties
• How efficient is minimax? Assume:
  • Maximum depth: m
  • Legal moves at each point: b
• Minimax is a depth-first search
• Time complexity: O(b^m)
• Space complexity:
  • O(bm) – generate all actions at once
  • O(m) – generate one action at a time
Not good! Will limit how far we can actually look ahead in the game tree.

Limitations
• Chess
  • Branching factor of 35
  • Games often include 50 moves/player
  • Search space: 35^(50·2) ≈ 10^154
  • Traditional search infeasible
• Problem: Under our current assumptions, minimax has to explore the entire tree!
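The search-space estimate can be checked directly, since Python integers have arbitrary precision. This is only a back-of-the-envelope calculation using the slide's figures (branching factor 35, 50 moves per player).

```python
import math

branching = 35      # approximate legal moves per position
plies = 50 * 2      # 50 moves per player, two players

nodes = branching ** plies       # exact integer
exponent = math.log10(nodes)     # order of magnitude

print(f"35^100 is about 10^{exponent:.1f}")  # 35^100 is about 10^154.4
```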

What Can We Do?
• We can ignore moves that are guaranteed not to be useful.
• Prune the search tree: Alpha-Beta Pruning

Example where pruning is possible
[Figure: MAX root A (?) with MIN children B (3), C (?), D (2), reached by actions a1, a2, a3. B's leaves are 3, 12, 8 (b1, b2, b3); C's leaves are 2, x, y (c1, c2, c3); D's leaves are 14, 5, 2 (d1, d2, d3).]

MINIMAX(root) = max(min(3, 12, 8), min(2, x, y), min(14, 5, 2))
              = max(3, min(2, x, y), 2)
              = max(3, z, 2)    where z = min(2, x, y) ≤ 2
              = 3

Decision independent of values of pruned leaves x and y!
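A minimal alpha-beta sketch in Python reproduces this example. The nested-list tree encoding and the `visited` log are assumptions made for illustration, and `X` and `Y` are arbitrary placeholder values for the unknown leaves x and y.

```python
import math

visited = []  # leaves actually evaluated, in order

def alphabeta(node, maximizing, alpha=-math.inf, beta=math.inf):
    """Return the minimax value of node, skipping branches that cannot matter."""
    if not isinstance(node, list):       # terminal: utility for Max
        visited.append(node)
        return node
    if maximizing:
        value = -math.inf
        for child in node:
            value = max(value, alphabeta(child, False, alpha, beta))
            if value >= beta:            # Min above will never allow this
                break
            alpha = max(alpha, value)
        return value
    value = math.inf
    for child in node:
        value = min(value, alphabeta(child, True, alpha, beta))
        if value <= alpha:               # Max above already has something better
            break
        beta = min(beta, value)
    return value

X, Y = 100, -100  # placeholders for the unknown leaves x and y
tree = [[3, 12, 8], [2, X, Y], [14, 5, 2]]

print(alphabeta(tree, True))  # 3
print(visited)                # [3, 12, 8, 2, 14, 5, 2]
```

The placeholders never appear in `visited`: once C's first leaf returns 2, C can be worth at most 2, which is already worse for Max than B's value of 3.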

Alpha-Beta Pruning – Bigger Example
[Figure sequence: step-by-step alpha-beta trace on a tree with Max, Min, Max levels above the leaves. Unvisited nodes are shown as "?"; labels such as "8+" and "8-" mark a lower or upper bound on a node's value, and are replaced by exact values as the search proceeds.]
Conclusion: Go left, expecting value of 8.

Effectiveness is highly dependent on the order in which states are examined
[Figure: the tree from the bigger example, with two subtrees highlighted.]
What if we happened to switch the order in which we look at these two subtrees?

Alpha-Beta Overview
• See R&N for pseudocode
• Can be applied to trees of any depth
• Often can prune entire subtrees!

Time Complexity of Alpha-Beta
Maximal depth: m
Legal moves at each point: b
• Best case: O(b^(m/2)) (if we can identify the best successors first)
• Random case: O(b^(3m/4))
• Worst case: O(b^m)
• Let b* be the effective branching factor when pruning is done. (When number of nodes to consider is reduced.)
• In the best case (max branches pruned), we have: b* = √b, since O(b^(m/2)) = O((√b)^m)
• For chess – branching decreases from 35 to ~6
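The chess figure follows from the best-case bound: b^(m/2) = (√b)^m, so the effective branching factor is √b. A quick numeric check, where the depth m = 8 is an arbitrary value chosen only for illustration:

```python
import math

b, m = 35, 8   # chess-like branching factor; m = 8 is an arbitrary example depth

worst = b ** m           # no pruning: O(b^m)
best = b ** (m / 2)      # best successors examined first: O(b^(m/2))
b_star = math.sqrt(b)    # effective branching factor in the best case

print(f"b* = {b_star:.2f}")  # b* = 5.92
print(f"worst case ~ {worst:.2e} nodes, best case ~ {best:.2e} nodes")
```

Put differently, with perfect move ordering the same node budget reaches roughly twice the depth.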

Another example to try on your own

Running in “real-time”
How can we improve the efficiency of adversarial search?

Evaluation Functions
• Minimax depends on utility function applied to a terminal state.
• Idea:
  • Instead use an evaluation function applied to any state, then prune & compute min/max values from there
• Evaluation function (a heuristic):
  • For terminal states, should return same as utility function
  • For non-terminal states, should return a reasonable prediction for final utility
    • Is Max winning by a lot? Just a little? Losing by a lot?
  • Computation shouldn't take too long.
• Example:
  • Utility function: -1 for Max loss, +1 for Max win.
  • Evaluation function: … 0.9 for Max doing very well, 0.1 for Max just slightly ahead…

Evaluation Function via Features
• One option: use properties of the environment (e.g., board positions)
• Features (aka attributes, measurements…) – describe current world state
• Then, for example, we could use a weighted linear function: Eval(s) = w_1 f_1(s) + w_2 f_2(s) + … + w_n f_n(s)
  • Assumes independence of the features, which could lead to less accurate evaluations.
• Developing a good Eval can be very tricky…
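One common feature-based choice is a weighted linear combination of the features. The sketch below assumes that form; the features (piece-count differences), the weights, and the state layout are all hypothetical and chosen only to make the arithmetic concrete.

```python
# Hypothetical features and weights for a chess-like game; the values are
# illustrative and not taken from the slides.
WEIGHTS = {"pawns": 1.0, "knights": 3.0, "rooks": 5.0}

def features(state):
    """Each feature is (Max's count) - (Min's count) for one piece type."""
    return {k: state["max"][k] - state["min"][k] for k in WEIGHTS}

def evaluate(state):
    """Weighted linear evaluation: the sum of w_i * f_i(s)."""
    f = features(state)
    return sum(WEIGHTS[k] * f[k] for k in WEIGHTS)

state = {
    "max": {"pawns": 6, "knights": 2, "rooks": 1},
    "min": {"pawns": 7, "knights": 1, "rooks": 1},
}
print(evaluate(state))  # 2.0
```

Each feature contributes independently of the others, which is exactly the independence assumption the slide warns about.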

Cutting Off the Search: H-Minimax
"H" for heuristic (evaluation function)
• Problem we are addressing: Alpha-beta pruning has to search all the way to terminal states...
• Approach #1: Use a depth-limited search, and apply Eval to states at the depth limit.
• Approach #2: Use iterative deepening until out of time for that turn…
• Approach #3: Use a more general cutoff test (not necessarily a depth limit) that determines when to not expand a node and instead apply Eval.
  • Quiescence search – don't expand quiescent states (unlikely to change much in near future)
• Challenge: Horizon effect
  • Delaying inevitable bad situation to beyond the "horizon" (e.g., the depth limit)
English vocab: quiescent: in a state or period of inactivity or dormancy
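Approach #1 can be sketched as minimax with a depth counter. The `node` helper and the estimates below are made up for the example; each hypothetical state carries a heuristic estimate (its Eval) and a list of successors, and a state with no successors is treated as terminal.

```python
def node(estimate, kids=()):
    """Hypothetical state: a heuristic estimate plus successor states."""
    return {"estimate": estimate, "kids": list(kids)}

def h_minimax(state, depth, maximizing):
    """Depth-limited minimax: apply Eval at terminal states or at the cutoff."""
    if not state["kids"] or depth == 0:
        return state["estimate"]   # Eval (equals utility at terminal states)
    values = [h_minimax(k, depth - 1, not maximizing) for k in state["kids"]]
    return max(values) if maximizing else min(values)

root = node(0.0, [
    node(0.5, [node(-1), node(1)]),     # looks good at depth 1, but Min can force -1
    node(0.1, [node(0.3), node(0.2)]),
])

print(h_minimax(root, 1, True))  # 0.5
print(h_minimax(root, 2, True))  # 0.2
```

The two calls disagree because the depth-1 search cannot see the loss just beyond its limit, a small instance of the horizon effect. Iterative deepening (Approach #2) would call `h_minimax` with depth 1, 2, 3, … until time runs out and keep the latest answer.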

Up Next
• Constraint satisfaction!

Optional Section: Stochasticity in Games
Adding another layer of complexity…

Stochastic Games
• Uncertainty also relevant for games
• Can extend turn-playing model to include chance elements
  • → Bigger trees…
• E.g., backgammon

Backgammon in a Nutshell
• Goal:
  • Move all pieces off board
  • White moves towards 24
  • Black moves towards 1
• Movement
  • Based on roll of dice
  • Piece can move to any position unless multiple opponents are present
  • If only one opponent is present, can capture it and force it to restart
[Figure: backgammon board with positions labeled 24 and 1.]

Why is this Hard?
• What white knows:
  • Its own legal moves
• What white doesn’t know:
  • What black will do
  • What black will roll – the new part…
• How we will deal with this:
  • Chance nodes!

Expectiminimax
• Blue nodes: Max
• Red nodes: Min
• Black nodes: Chance (lower-case letters: possible events)
[Figure: expectiminimax tree. From the root down: Chance (Max "rolls" v, w, x, y, or z), Max (Max chooses move), Chance (Min "rolls" b, c, or d), Min (Min chooses), then cutoff / terminal states. In this example, b, c, d are assumed equally likely.]
Max now knows what move to make for each roll. Hope for z! But will have to see…
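A minimal expectiminimax sketch: Max and Min nodes work as in minimax, and a chance node returns the probability-weighted average of its children. The tuple encoding and all numbers below are assumptions for illustration, not the values from the slide's figure.

```python
def expectiminimax(node):
    """node is ("leaf", value), ("max", children), ("min", children),
    or ("chance", [(probability, child), ...])."""
    kind, payload = node
    if kind == "leaf":
        return payload
    if kind == "max":
        return max(expectiminimax(c) for c in payload)
    if kind == "min":
        return min(expectiminimax(c) for c in payload)
    # chance node: expected value over the possible events
    return sum(p * expectiminimax(c) for p, c in payload)

def leaf(v):
    return ("leaf", v)

# Max picks a move; Min then "rolls" (two equally likely events) and replies.
move_a = ("chance", [(0.5, ("min", [leaf(4), leaf(6)])),
                     (0.5, ("min", [leaf(2), leaf(8)]))])
move_b = ("chance", [(0.5, ("min", [leaf(5), leaf(7)])),
                     (0.5, ("min", [leaf(0), leaf(9)]))])
root = ("max", [move_a, move_b])

print(expectiminimax(move_a), expectiminimax(move_b))  # 3.0 2.5
print(expectiminimax(root))                            # 3.0
```

Max prefers the first move because its expected value is higher, even though the second move contains the single best outcome for one of the rolls.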

Stepping Back
Maximal depth: m
Legal moves at each point: b
Number of chance outcomes: n
• What if we knew the dice rolls?
  • Just minimax!
  • Time complexity: O(b^m)
• We must consider all possible dice rolls
  • n = number of distinct chance outcomes (e.g., 6 for a single standard die)
  • Time complexity: O(b^m n^m)
• Can’t leverage alpha-beta pruning in the same way…