Monte Carlo Tree Search in Finding Feasible Solutions for Course Timetabling Problem

We address the course timetabling problem in this work. In a university, students can select their preferred courses each semester, so the general requirement is to let them attend lectures without clashes with other lectures. A feasible solution is one in which this and the other conditions are satisfied. Constructing reasonable solutions for the course timetabling problem is a hard task, and most existing methods fail to generate reasonable solutions for all cases. This is because the problem is heavily constrained, so an effective method is required to explore and exploit the search space. We utilize Monte Carlo Tree Search (MCTS) in finding feasible solutions for the first time. In MCTS, we build a tree incrementally and asymmetrically by sampling the decision space, and the tree is traversed in a best-first manner. We propose several enhancements to MCTS, such as heuristic-based simulation and heuristic-based tree pruning. The performance of MCTS is compared with methods based on graph coloring heuristics and Tabu search. We test the solution methodologies on the three most studied publicly available datasets. Overall, MCTS performs better than the method based on a graph coloring heuristic; however, it is inferior to the Tabu-based method. Experimental results are discussed.


I. INTRODUCTION
The course timetabling problem (CTP) involves allocating courses to time slots and rooms to produce a timetable that satisfies several constraints. CTP is a combinatorial optimization problem (COP). It is NP-complete and widely studied because of its practical importance to universities. A feasible course timetable prevents clashes between courses, allowing students to attend all the lectures of the courses they registered for. As a result, lecturers are not required to conduct replacement classes for clashing courses. A feasible timetable ensures that lecture rooms fulfill the requirements of courses by having enough seats and the required teaching equipment, such as an overhead projector, audio/video equipment or an Internet connection. With a feasible timetable, it is possible to conduct (possibly related) courses in a specific sequence. For instance, some courses may require that lectures be conducted before tutorials, or vice versa. Finally, a feasible course timetable allows busy lecturers to teach courses at certain preferred times.
MCTS is a comparatively new search methodology. It has caught the attention of the Artificial Intelligence (AI) community because of its success in games. The impact of MCTS on games inspired us to investigate its performance on CTP, a domain dominated by local search methods. We apply MCTS to finding feasible solutions (solutions that fulfill all the hard constraints) for the first time. We propose several enhancements to MCTS, such as heuristic-based simulation and heuristic-based tree pruning. The performance of MCTS is compared with that of the classical graph coloring approach as well as Tabu search.

A. Problem Description
We use publicly available datasets (standard benchmarks) in this research: Socha, which consists of 11 cases; ITC02, which consists of 20 cases; and ITC07, which consists of 24 cases. All the datasets share four hard constraints: a student can attend only one course at a time, a room must provide the features required by a course, a room must provide enough seats for a course, and only one course is allowed in each time slot and room. ITC07 has two extra hard constraints: a course must be assigned to one of its preset time slots, and a course may be required to appear in a certain sequence.
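The four hard constraints shared by all datasets can be illustrated with a small feasibility check. This is a minimal sketch, not the benchmark validator: the function name, the dictionary-based data layout and the way violations are counted are assumptions made for illustration, and the two additional ITC07 constraints are left out.

```python
from collections import defaultdict


def hard_constraint_violations(assignment, students, features_needed,
                               room_features, room_seats):
    """Count violations of the four shared hard constraints.

    assignment:      event -> (time_slot, room)   (hypothetical layout)
    students:        event -> set of student ids
    features_needed: event -> set of required room features
    room_features:   room  -> set of offered features
    room_seats:      room  -> number of seats
    """
    violations = 0
    by_slot = defaultdict(list)
    for event, (slot, room) in assignment.items():
        by_slot[slot].append((event, room))
        # The room must offer every feature the course requires.
        if not features_needed[event] <= room_features[room]:
            violations += 1
        # The room must have enough seats for the enrolled students.
        if len(students[event]) > room_seats[room]:
            violations += 1
    for placed in by_slot.values():
        for i in range(len(placed)):
            for j in range(i + 1, len(placed)):
                (e1, r1), (e2, r2) = placed[i], placed[j]
                # A student can attend only one course at a time.
                if students[e1] & students[e2]:
                    violations += 1
                # Only one course per time slot and room.
                if r1 == r2:
                    violations += 1
    return violations
```

Under these assumptions, a timetable is feasible exactly when the function returns 0 for a complete assignment.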

B. Related Work
MCTS-based programs are now comparable with the best human players [1], [2], [3]. MCTS is well known in game AI, but it is seldom used for COP. Examples of the application of MCTS to COP are job shop scheduling [4], a one-player puzzle [5], the reentrant scheduling problem [6] and production management problems [7]. To our knowledge, MCTS has never been applied to CTP.
Various approaches have been developed for finding feasible solutions for CTP. One common approach uses graph coloring heuristics. Sabar et al. employed a hybrid of graph coloring heuristics to construct an initial solution, in which events were randomly assigned to time slots and rooms after being sorted by heuristics such as largest degree, largest enrolment and saturation degree [8]. Interested readers may refer to [9] and [10] for further examples.
Another common approach is based on Tabu search. To find a feasible solution, Cambazard et al. performed a local search on randomly generated solutions [11]. To prevent an event from being repeatedly allocated to the same time slots, a Tabu list is maintained for a certain number of iterations. They used neighborhood structures such as transferring a course to a vacant place, exchanging two courses, exchanging two time slots, matching (courses are unassigned and reassigned within a time slot), transferring a course using matching, and the Hungarian move. Other examples of this approach can be found in [12] and [13]. Some authors combine graph coloring heuristics and Tabu search to find feasible solutions. Lewis and Thompson attained 100% feasibility for all the cases of ITC07 using constructive heuristics followed by the PARTIALCOL algorithm [14]. Unassigned events were handled using a Tabu mechanism [15]. Refer to [16] and [17] for other similar methods.

A. MCTS
In MCTS, every state is represented by a node in the tree, and every action is represented by a directed link leading to the resulting state. Every node contains a value and a visit count. MCTS comprises four key steps: selection, expansion, simulation and back-propagation. These steps are repeated for as long as the available computation resources allow. In the selection step, the tree is traversed from the root until a non-terminal node with an unvisited action is reached. In the expansion step, a new child node is appended for the unvisited action. In the simulation step, a playout is carried out from the child node to generate an outcome. In the back-propagation step, the traversed nodes (plus the child node) are updated with values from the outcome. Finally, the best child of the root node (the child with the greatest average value or visit count) is the selected move. The process is depicted in Fig. 1, from [18].
In our MCTS implementation, we assign events to time slots and use maximal matching for room assignment. The node and action classes are defined as shown in Fig. 2. Algorithm 1 shows the MCTS method. We initialize the best solution bestSol to the initial solution initSol, the list of unassigned events unassigned to the list of events E, and the best cost f(bestSol) to the number of events in E. We then create the root node rootNode.
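The node bookkeeping described above (a value and a visit count per node, and the choice of the best child of the root) can be sketched as follows. This is an illustrative sketch only: the field names, the running-total representation of the value and the tie-breaking by visit count are assumptions, not the class definitions of Fig. 2.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    # An action assigns `event` to `time_slot`; the root carries no action.
    event: Optional[int] = None
    time_slot: Optional[int] = None
    visits: int = 0
    total_reward: float = 0.0
    children: List["Node"] = field(default_factory=list)

    def update_visit(self) -> None:
        self.visits += 1

    def update_value(self, reward: float) -> None:
        self.total_reward += reward

    @property
    def value(self) -> float:
        # Average reward over all visits; 0.0 for an unvisited node.
        return self.total_reward / self.visits if self.visits else 0.0

    def is_leaf(self) -> bool:
        return not self.children


def best_child(root: Node) -> Node:
    # Greatest average value first, visit count as tie-breaker (assumption).
    return max(root.children, key=lambda c: (c.value, c.visits))
```

Storing the running total rather than the average keeps the update in back-propagation to two additions per node.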
The iteration stops when terminationCondition is true, that is, when either a feasible solution has been found or the elapsed time exceeds the execution time t. At the start of every iteration, we set the current solution curSol to initSol, the list of remaining events remaining to the events in E, and the list of visited nodes visitedNode to empty. We then append the root node to visitedNode, which keeps the nodes visited during tree traversal.

The TREEGROWTH method is given in Algorithm 2. We set currentNode to rootNode. While currentNode is not a leaf node, we select one of its children as the new current node using the SELECTION method (Algorithm 5) and append it to visitedNode. The event of currentNode is assigned to curSol according to the time slot of currentNode, and that event is removed from remaining. Once currentNode is a leaf node, we try to expand the tree from it. All the possible actions are collected in the list of actions A by the GETACTIONS method (Algorithm 6). If A is not empty, we append all actions in A as children of currentNode using the EXPANSION method (Algorithm 7). We randomly choose one of the children as the child node childNode and append it to visitedNode. The event of childNode is assigned to curSol according to the time slot of childNode, and that event is removed from remaining.

The SIMULATION method is presented in Algorithm 3. We assign the events in remaining to time slots in curSol. SELECTEVENT returns an event in remaining according to a heuristic, and SELECTTIMESLOT returns a time slot that is suitable for that event. Dynamic Search Rearrangement (DSR) is used for event and time slot selection because it was the most effective heuristic in our experience. The list unplaced keeps the events that have no compatible time slot. We calculate f(curSol) as the number of events in unplaced. bestSol, f(bestSol) and unassigned are updated if f(curSol) is lower than f(bestSol). The reward is defined as the ratio of assigned events to the total number of events, so the returned reward lies in the range 0 to 1.
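A playout of this kind can be sketched as below. It is a simplified illustration, not Algorithm 3 itself: only clash constraints between events are modeled (rooms, features and precedence are ignored), and the names `suitable_slots`, `simulate` and `conflicts` are hypothetical. Event selection picks randomly among the events with the fewest suitable time slots, and slot selection picks randomly among the suitable slots that fit the fewest remaining events.

```python
import random


def suitable_slots(event, placed, conflicts, num_slots):
    """Slots in which no event conflicting with `event` is already placed."""
    return [s for s in range(num_slots)
            if all(placed.get(other) != s for other in conflicts[event])]


def simulate(remaining, placed, conflicts, num_slots, total_events, rng=random):
    """Complete a partial timetable and return (placed, unplaced, reward)."""
    placed = dict(placed)
    remaining = set(remaining)
    unplaced = []
    while remaining:
        options = {e: suitable_slots(e, placed, conflicts, num_slots)
                   for e in remaining}
        # Event choice: random among events with the fewest suitable slots.
        fewest = min(len(v) for v in options.values())
        event = rng.choice(sorted(e for e, v in options.items()
                                  if len(v) == fewest))
        remaining.remove(event)
        if not options[event]:
            unplaced.append(event)  # no compatible time slot
            continue

        # Slot choice: random among slots that fit the fewest remaining events.
        def demand(slot):
            return sum(slot in options[o] for o in remaining)

        least = min(demand(s) for s in options[event])
        placed[event] = rng.choice([s for s in options[event]
                                    if demand(s) == least])
    cost = len(unplaced)  # f(curSol)
    reward = (total_events - cost) / total_events
    return placed, unplaced, reward
```

With three mutually clashing events and two time slots, for example, one event always ends up in `unplaced` and the reward is 2/3.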
Back-propagation then updates every visited node with the reward:

    for all node in visitedNode
        node.updateVisit()
        node.updateValue(reward)
    end for
end method

We describe the SELECTION, GETACTIONS and EXPANSION methods mentioned earlier in detail below. In the SELECTION method (Algorithm 5), we return the child with the highest UCB value among the children of currentNode. We balance search exploration and exploitation by adjusting the constant B: the higher B is set, the more priority is given to less frequently visited nodes. We set B to 0.0001.
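The effect of B on selection can be illustrated with a short sketch. It assumes the standard UCB1 form, in which the average value of a child is increased by B times sqrt(ln N / n_i), where N is the visit count of the parent and n_i that of the child; this exploration term and the function names are assumptions for illustration and may differ from the exact formula in Algorithm 5.

```python
import math


def ucb(avg_value, child_visits, parent_visits, b=0.0001):
    """UCB score of a child under the assumed UCB1 form."""
    if child_visits == 0:
        return math.inf  # unvisited children are tried first
    return avg_value + b * math.sqrt(math.log(parent_visits) / child_visits)


def select(children, parent_visits, b=0.0001):
    """children: list of (avg_value, visits) pairs; returns the chosen index."""
    return max(range(len(children)),
               key=lambda i: ucb(children[i][0], children[i][1],
                                 parent_visits, b))


# A well-explored child against a rarely visited one with a similar value.
children = [(0.90, 50), (0.89, 1)]
exploit = select(children, parent_visits=51, b=0.0001)
explore = select(children, parent_visits=51, b=1.0)
```

With a very small B the exploration term is negligible and the child with the higher average value is chosen, whereas a large B favors the rarely visited child.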

Algorithm 5
method SELECTION (currentNode)
    return arg max over i ∈ children of currentNode of value_i + B [...]
end method

The GETACTIONS method is given in Algorithm 6. To keep the tree from growing too wide, we apply several heuristics used in graph coloring to filter the actions. Effectively, we prune the tree based on heuristics. Recall that an action comprises an event and a time slot. GETEVENTS returns events in remaining based on heuristics, and GETTIMESLOTS returns the time slots that are suitable for event e.

Algorithm 6
    for all e in E
        TS ← GETTIMESLOTS (e, remaining, curSol)
        for all ts in TS
            A ← A ∪ action (e, ts)
        end for
    end for
    return A
end method

The EXPANSION method is shown in Algorithm 7. We append all actions in A as children of currentNode. Unlike the general implementation of MCTS, where a child node is appended whenever an unvisited action is encountered, we append multiple nodes at one time. Our intention is to save computation cost at the expense of some memory.

Algorithm 7
    append all actions in A as children of currentNode
end method
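The interplay between action filtering and bulk expansion can be sketched as follows. The sketch is illustrative only: `fewest_slots_first` is a hypothetical pruning rule standing in for GETEVENTS, `slots_for` is assumed to hold the precomputed suitable time slots per event, and nodes are plain dictionaries.

```python
def get_actions(remaining, slots_for, get_events):
    """Build (event, time slot) actions only for events kept by the heuristic."""
    actions = []
    for event in get_events(remaining, slots_for):
        for slot in slots_for[event]:
            actions.append((event, slot))
    return actions


def keep_all(remaining, slots_for):
    """No pruning: every remaining event contributes actions."""
    return sorted(remaining)


def fewest_slots_first(remaining, slots_for):
    """Hypothetical pruning rule: keep only the most constrained events."""
    fewest = min(len(slots_for[e]) for e in remaining)
    return [e for e in sorted(remaining) if len(slots_for[e]) == fewest]


def expand(node, actions):
    """Append all actions as children in one step."""
    node["children"].extend(
        {"action": a, "children": [], "visits": 0, "value": 0.0}
        for a in actions
    )


slots_for = {1: [0, 1, 2], 2: [0], 3: [1, 2]}
unpruned = get_actions({1, 2, 3}, slots_for, keep_all)
pruned = get_actions({1, 2, 3}, slots_for, fewest_slots_first)
root = {"action": None, "children": [], "visits": 0, "value": 0.0}
expand(root, pruned)
```

In this toy instance the unpruned expansion creates six children, while the pruned expansion creates one, which shows how filtering events limits the width of the tree.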

B. Benchmark: Graph Colouring Heuristic (GCH)
GCH is a classical approach to the graph coloring problem, and heuristics derived from graph coloring are often used in CTP. Usually, difficult events are assigned first, in the hope that the easier events can still be assigned later as the problem becomes more restricted. Algorithm 8 presents the GCH method. At every iteration, an event is selected and assigned to a selected time slot. It is a one-pass method. In our experiments, Dynamic Search Rearrangement (DSR) [19] is used. DSR is a heuristic often used in constraint satisfaction problems. It is dynamic because the next event is determined at every iteration. In DSR, we select an event randomly from the set E = {events with the least number of suitable time slots}. Next, we select a timeSlot randomly from the set P = {time slots that are suitable for event and fit the least number of remaining events}.

PARTIALCOL [14] was originally used for graph coloring problems, and [15], [16] and [20] adapted the algorithm to CTP. The Tabu search (TS) method we tested is based on PARTIALCOL. A neighbor move is a move of one event from unplaced to a time slot in curSol. At every iteration, we evaluate all the neighbor moves by considering all suitable non-Tabu time slots for all events in unplaced. To move an event e into a time slot feasibly, all events conflicting with e (through a precedence or clash constraint) are temporarily shifted from curSol to unplaced. Maximal matching is used sparingly for room assignment, as it is computationally expensive. If matching cannot find a room for a specific event, a room is selected randomly among the suitable rooms and the relevant event is shifted from curSol to unplaced. We assess solutions (curSol, canSol, bestSol) using the cost function f of Equation 1, which is based on the number of unplaced events. We record the neighbor move with the lowest candidate cost f(canSol) as bestEvent and bestSlot.

We move the events conflicting with bestEvent from curSol to unplaced, and then apply the best neighbor move by moving bestEvent from unplaced to bestSlot in curSol. If f(curSol) is lower than f(bestSol), then bestSol, f(bestSol) and unassigned are updated. We prevent the events conflicting with bestEvent from returning to their original time slots for some iterations by applying the Tabu tenure of Equation 2, which depends on |unplaced|, the number of unplaced events, and on a random element. We use the value 10 for the random element because the same value was used in [14], [15] and [20] and, more importantly, because it works well for all the datasets that we are working on. The value of the Tabu tenure determines the level of search exploration. When the tenure is set too high, most of the available moves become unreachable, which restricts the search. When it is set too low, cycling tends to occur, which may stall the search. The iteration stops when a feasible solution is found (unplaced is empty) or the elapsed time exceeds the execution time t.
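One iteration of such a neighborhood search can be sketched as follows. This is a simplified illustration and not the tested TS implementation: only clash constraints are modeled, room assignment by matching is omitted, the names `best_move`, `apply_move` and `conflicts` are hypothetical, and the tenure is passed in by the caller because the sketch does not reproduce Equation 2.

```python
def best_move(placed, unplaced, conflicts, num_slots, tabu, iteration):
    """Return (cost, event, slot) of the cheapest non-Tabu move, or None."""
    best = None
    for event in sorted(unplaced):
        for slot in range(num_slots):
            if tabu.get((event, slot), 0) > iteration:
                continue  # the move is still Tabu
            # Events that would have to leave `slot` to make room for `event`.
            evicted = sum(1 for e in conflicts[event] if placed.get(e) == slot)
            cost = len(unplaced) - 1 + evicted  # candidate cost f(canSol)
            if best is None or cost < best[0]:
                best = (cost, event, slot)
    return best


def apply_move(event, slot, placed, unplaced, conflicts, tabu, iteration, tenure):
    """Place `event` in `slot`, evicting conflicting events to unplaced."""
    evicted = [e for e in conflicts[event] if placed.get(e) == slot]
    for e in evicted:
        del placed[e]
        unplaced.add(e)
        # The evicted event may not return to its original slot for a while.
        tabu[(e, slot)] = iteration + tenure
    unplaced.discard(event)
    placed[event] = slot
    return evicted
```

The candidate cost counts the events that would remain unplaced after the move, so a move that evicts no event always reduces the cost by one.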

III. RESULTS AND DISCUSSION
We conducted the experiments on machines with an Intel Xeon (3.1 GHz) processor and 4 GB of RAM. We coded the algorithms in Java. The computation time limit for every execution, which is set by running a benchmark program, is T = 190 seconds.

A. Random Simulation vs. Heuristic Based Simulation
Domain knowledge is incorporated into playouts in order to make the simulation in MCTS more realistic [21], [22]. Here, the results of random simulation (random selection of events and time slots) and heuristic-based simulation (DSR) are compared. We attained 100% feasibility for the Socha and ITC02 cases (as shown in Tables I and II) when a heuristic is applied in the simulation phase of MCTS. As shown in Table III, MCTS with heuristic-based simulation is more effective than MCTS with random simulation for all the ITC07 cases. In some cases the algorithm ran out of heap memory in the tree growth phase of MCTS, so no result is available, as indicated by the dash symbols in Tables II and III. We had already extended the heap memory size from the default 256 MB to 1.5 GB, but the allotted 1.5 GB was exhausted during these executions, and an error message reported the problem. Note that the tree is expanded using all the possible actions (every action involves assigning an event to a time slot). Heuristic-based simulation improves on the results of random simulation for all the datasets considered.

B. Heuristic Based Tree Pruning
We attempt to prune the tree in MCTS to address the memory issue faced earlier. We expand the tree using a limited number of actions (an action involves assigning an event to a time slot) instead of all actions at one time, as in the previous section. Several heuristic-based pruning mechanisms are compared in this section. The idea is inspired by the work in [23], [24], [25], where the authors used domain knowledge for pruning. We present the descriptions of the heuristics, based on Algorithm 8, in Table IV. Note that simulation based on DSR is used here because of its effectiveness, as shown in the previous section. As shown in Tables V and VI, 100% feasibility is attained for the Socha and ITC02 cases regardless of the heuristic applied for tree pruning. The same result is achieved even without pruning, which shows that these datasets are not challenging.

Memory issues are no longer encountered when tree pruning is used on the ITC07 cases. The maximum heap memory commitment over 31 executions for every problematic case of the previous section is presented in Table VII. The maximum heap memory commitment is now well below the allotted 1.5 GB. Note that we measured the heap memory sizes with a tool supplied with the Java Development Kit (JDK), the Java Monitoring and Management Console. Among the pruning heuristics tested, MV-All is the most promising, as feasible solutions were found for all cases except cases 1, 2, 9, 10 and 22, as is evident in Table VIII. Notably, these results are consistent with those of a constraint programming approach [11], which also could not construct a feasible solution for cases 1, 2, 9 and 10. That study did not consider case 22, possibly because the case was still hidden by the competition organizers at that point in time. We observe better results when pruning is applied, whichever heuristic is used. With pruning, the tree size is considerably reduced. In effect, pruning eliminates poor choices and guides the search to spend more time on better options.

C. Comparing MCTS with GCH and TS
The performance of MCTS, GCH and TS in finding feasible solutions is compared in this section. The GCH considered here is based on the DSR heuristic. For MCTS, the MV-All and DSR heuristics are used for the tree pruning and simulation phases, respectively. As shown in Table IX, all three methods found feasible solutions for all Socha cases. Both MCTS and TS attained 100% feasibility.

E. Discussion
We faced a heap memory issue when the tree was expanded with all possible actions at one time. This design was intentional, as expanding the tree by one action at a time is computationally expensive. The decision is necessary because the CTP that we are working on is restricted by an execution time limit imposed by competition rules. The heap memory issue was addressed by pruning the tree based on heuristics. Tree pruning greatly reduced the number of nodes appended to the tree, and therefore the heap memory commitment. However, paths to good solutions may also be cut off. Computational experience shows that the results are affected by the value of B in the selection part of MCTS. For longer execution times, a higher value of B allows MCTS to explore the search space. For shorter execution times, a lower value of B is preferred so that MCTS can exploit the search space.
Unlike in games such as Go, MCTS did not work well for CTP. MCTS lacks the flexibility provided by local search methodologies such as TS. In every iteration of MCTS, events are assigned constructively; in other words, moves that have been made cannot be changed. This suits Go perfectly, but not timetabling, where events can be unassigned and reassigned at any time. As a result, the search space connectivity offered by MCTS is poorer than that of a local search. The effort to hybridize the algorithm with local search is also hampered by the rigid tree structure of MCTS. In fact, local search is the key to obtaining good results for similar learning-based algorithms such as Ant Colony Optimization (ACO). The use of learning-based algorithms such as MCTS is restricted by the time limit imposed on the CTP, since this type of algorithm usually requires substantial computational resources to perform effectively.

IV. CONCLUSION
Random and heuristic-based (DSR) simulation for MCTS were compared. Heuristic-based simulation appears to be superior to random simulation. We believe that heuristics make the simulation more realistic than random simulation. Several tree pruning heuristics, such as MV-All, LD-All, SD-All and DSR, were also tested. Tree pruning vastly improves the efficacy of MCTS in constructing feasible solutions, as measured by the average number of unassigned events. MV-All performed the best of the heuristics, as shown by the empirical results presented. Overall, heuristic-based simulation and tree pruning improved the performance of the basic MCTS for the CTP.
MCTS, GCH and TS were compared in finding feasible solutions. In terms of performance, MCTS was effective for the Socha and ITC02 cases but fell short on the ITC07 cases. Even with extended execution time, MCTS was unable to construct a feasible solution for cases 10 and 22 of ITC07. Overall, MCTS performed better than GCH but worse than TS in finding feasible solutions. MCTS requires time to perform competently and is well suited to games like Go, but not to the competitive, time-restricted CTP of the competitions. Meanwhile, TS shows excellent potential in finding feasible solutions for the CTP.