This shows you the differences between two versions of the page.

Both sides previous revision Previous revision Next revision | Previous revision | ||

cs-236:optimizing-rule-evaluation [2017/11/29 13:56] brob144 [Rule Grouping and Evaluation Order] |
cs-236:optimizing-rule-evaluation [2017/12/22 21:21] (current) kylej13 [Rule Grouping and Evaluation Order] |
||
---|---|---|---|

Line 9: | Line 9: | ||

===Part 1: strongly connect components=== | ===Part 1: strongly connect components=== | ||

Implement a graph class to support the rule optimization algorithm. The class must be able to | Implement a graph class to support the rule optimization algorithm. The class must be able to | ||

- | * Report a topological ordering of all nodes in the graph (using post-order traversal) | + | * Report a topological ordering of all nodes in the graph (using post-order traversal, see Part 2 for more details) |

- | * Report all nodes reachable from a depth-first search started on a given node | + | |

* Invert the graph (reverse all the edges) | * Invert the graph (reverse all the edges) | ||

* Compute the set of all strongly connected components | * Compute the set of all strongly connected components | ||

Line 16: | Line 15: | ||

* '''Command line''': no command line arguments | * '''Command line''': no command line arguments | ||

* '''Input''': none | * '''Input''': none | ||

- | * '''Output''': A '''pass'''/'''fail''' report for each test | + | * '''Output''': A '''pass'''/'''fail''' report for each test, based on comparison of results with expected output |

The pass-off is based on the quality of tests and whether or not the solution passes. | The pass-off is based on the quality of tests and whether or not the solution passes. | ||

Line 30: | Line 29: | ||

1 passes: R0 | 1 passes: R0 | ||

</code> | </code> | ||

- | The input to output relation is given more precisely in the [[optimizing-rule-evaluation#Examples | example section]]. As before, order matters! See below for details. | + | The input to output relation is given more precisely in the [[optimizing-rule-evaluation#Examples | example section]]. As before, order matters! Be especially careful that when you have multiple rules in a SCC that you output them in least to greatest order numerically. See below for more details. |

* '''Command line''': a single argument indicating the input file | * '''Command line''': a single argument indicating the input file | ||

* '''Input''': a valid datalog program | * '''Input''': a valid datalog program | ||

Line 61: | Line 60: | ||

A strongly connected component is a maximal set of vertices such that every vertex in the set can reach every other vertex in the set through some path that never leaves the set. The SCCs in a graph partition the vertices into disjoint sets. The algorithm to computes SCCs proceeds in three distinct steps | A strongly connected component is a maximal set of vertices such that every vertex in the set can reach every other vertex in the set through some path that never leaves the set. The SCCs in a graph partition the vertices into disjoint sets. The algorithm to computes SCCs proceeds in three distinct steps | ||

# Create the reverse graph: all the edges are reversed | # Create the reverse graph: all the edges are reversed | ||

- | # Compute the topological ordering for the vertices using the reversed graph: the order of leaving the vertices in the depth-first search | + | # Compute the topological ordering for the vertices using the reversed graph (the order of leaving the vertices in the depth-first search) |

# Following the topological ordering on vertices from the last vertex the search left to the first vertex the search left, start a depth-first search on the original graph, and any vertex visited in that search belongs to the SCC. Repeat until all the vertices are visited. | # Following the topological ordering on vertices from the last vertex the search left to the first vertex the search left, start a depth-first search on the original graph, and any vertex visited in that search belongs to the SCC. Repeat until all the vertices are visited. | ||

Each step merits some further explanation. | Each step merits some further explanation. | ||

Line 67: | Line 66: | ||

Reversing the edges in step 1 is very direct. Returning to the example from the previous section, the reversed graph as an adjacency list is | Reversing the edges in step 1 is very direct. Returning to the example from the previous section, the reversed graph as an adjacency list is | ||

<code> | <code> | ||

- | Reverse Forest | + | Reverse Dependency Graph |

R0: R1 | R0: R1 | ||

R1: R0,R2 | R1: R0,R2 | ||

Line 74: | Line 73: | ||

R4: R4 | R4: R4 | ||

</code> | </code> | ||

- | The reverse graph is the graph used to compute the topological ordering. It is tempting to compute this ordering on the original forest, and in many cases it works, but such an approach does not work for any arbitrary graph (see section 3.4 of ''[http://www.cs.berkeley.edu/~vazirani/algorithms/chap3.pdf Algorithms]'' for an explanation). It is equally tempting to try to follow the edges backwards on the original graph, but why deal with such added complexity? | + | The reverse graph is the graph used to compute the topological ordering. It is tempting to compute this ordering on the original graph, and in many cases it works, but such an approach does not work for any arbitrary graph (see section 3.4 of ''[http://www.cs.berkeley.edu/~vazirani/algorithms/chap3.pdf Algorithms]'' for an explanation). It is equally tempting to try to follow the edges backwards on the original graph, but why deal with such added complexity? |

- | The topological ordering is computed by looping through the vertices (rules) in the order they appear in the Datalog program and performing a depth-first search starting from that node if it has not already been visited. During the post-order traversal, push each vertex onto a stack when the depth-first search '''leaves''' the vertex after visiting it. The topological ordering is based on the order of the vertices in the stack after all nodes have been visited. For the example, the topological ordering, having started the depth-first search from vertex ''R0'' is | + | The topological ordering is computed by looping through each vertex in the graph and performing a depth-first search starting from that vertex if it has not already been visited in a previous search. During post-order traversal, push each vertex onto a stack when the depth-first search ''leaves'' the vertex after visiting it. The topological ordering is based on the order of the vertices in the stack after all nodes have been visited. Be sure that the vertices are looped through in the same order that the rules appear in the Datalog program. For the example, the topological ordering, having started the depth-first search from vertex ''R0'' is |

<code> | <code> | ||

POTN(R2) = 0 | POTN(R2) = 0 | ||

Line 85: | Line 84: | ||

</code> | </code> | ||

- | The last step to compute SCCs starts a depth-first search from the greatest post-order traversal number (POTN) to the least POTN on the original graph. In the view of a stack, that is the top of the stack to the bottom of the stack. Any vertex visited in the search belongs to the same SCC. The process is repeated until everything is visited in the graph. For the running example, the first depth-first search is started from vertex ''R3'' to create the SCC of ''{R3}''. The ''R3'' vertex is removed from the stack. ''R4'' is now the top, the depth-first search creates the SCC ''{R4}'', and it is then removed from the stack as well. ''R0'' is now the top of the stack, so a depth-first search is started at ''R0''. '''The depth-first search must not reset the visited information between searches.''' The visited information is what prevents the search from moving into another already completed SCC. The second depth-first search from ''R0'' yields the SCC ''{R0, R1, R2}''. The vertex is popped from the stack leaving ''R1'' on top. ''R1'' has already been visited so it is popped. The same is true for ''R2''. | + | The last step to compute SCCs starts a depth-first search from the greatest post-order traversal number (POTN) to the least POTN on the ''original graph''. In the view of a stack, that is the top of the stack to the bottom of the stack. The process is repeated until everything is visited in the graph. This produces a DFS forest where each tree represents one SCC. For the running example, the first depth-first search is started from vertex ''R3'' to create the SCC of ''{R3}''. The ''R3'' vertex is removed from the stack. ''R4'' is now the top, the depth-first search creates the SCC ''{R4}'', and it is then removed from the stack as well. ''R0'' is now the top of the stack, so a depth-first search is started at ''R0''. '''The depth-first search must not reset the visited information between searches.''' The visited information is what prevents the search from moving into another already completed SCC. The second depth-first search from ''R0'' yields the SCC ''{R0, R1, R2}''. The vertex is popped from the stack leaving ''R1'' on top. ''R1'' has already been visited so it is popped. The same is true for ''R2''. If you printed out the strongly connected components in the order you discovered them it would look like: |

+ | <code> | ||

+ | Strongly Connected Components: | ||

+ | {R3} | ||

+ | {R4} | ||

+ | {R0, R1, R2} | ||

+ | </code> | ||

==Rule evaluation== | ==Rule evaluation== | ||

Rules are evaluated in groups defined by the SCCs, and the SCCs are processed in the order of their discovery from the algorithm (FIFO order). A given SCC is called ''trivial'' if it only contains a single rule and there is no self-loop on that rule in the dependency graph. A trivial SCC requires no iteration when evaluating the associated rule. A single rule evaluation is sufficient. A given non-trivial SCC must evaluate the associated rules in the SCC until a fix-point is reached. Both SCCs in the example are non-trivial and require iteration. All facts are known once the SCCs are processed. Evaluate rules in an SCC following the order of the Rule IDs. | Rules are evaluated in groups defined by the SCCs, and the SCCs are processed in the order of their discovery from the algorithm (FIFO order). A given SCC is called ''trivial'' if it only contains a single rule and there is no self-loop on that rule in the dependency graph. A trivial SCC requires no iteration when evaluating the associated rule. A single rule evaluation is sufficient. A given non-trivial SCC must evaluate the associated rules in the SCC until a fix-point is reached. Both SCCs in the example are non-trivial and require iteration. All facts are known once the SCCs are processed. Evaluate rules in an SCC following the order of the Rule IDs. | ||

Line 112: | Line 117: | ||

Yes. Please review the [[Project Standards|project standards]] for details. The time-bound is strictly enforced. | Yes. Please review the [[Project Standards|project standards]] for details. The time-bound is strictly enforced. | ||

+ | |||

+ | '''I thought my project is set up correctly, but the test cases are taking forever to run. What is happening?''' | ||

+ | |||

+ | Are you running your program in Visual Studio? Visual Studio in debug mode tends to slow down the running time of your project by a huge factor. Try running the test cases either outside of Visual Studio (in the command line) or set up Visual Studio to run in release mode ([[https://msdn.microsoft.com/en-us/library/wx0123s5.aspx|here]] are instructions from Microsoft on how to do that). | ||

+ | |||

+ | Other things that may be influencing your run time include what computer you are running your program on (run the program on the lab computers for the best approximation of how it will run on the TA computers) and using a vector to store the tuples in each Relation instead of a set (if you have to sort the vector manually, that can really slow down run time). | ||

==Submission== | ==Submission== | ||

Review the [[project-standards | project standards]] for creating a zip archive. | Review the [[project-standards | project standards]] for creating a zip archive. |