Developing Big Data Analytics Course for Non-ICT Major University Students

In the education era of the fourth industrial revolution, boundaries between majors and subjects are blurring, and it is common for university students to enroll in information and communication technology (ICT)-related courses as convergence education blends different disciplines. Today's job market is becoming more competitive and demands higher skills in ICT and computational thinking. Since non-ICT major students rarely gain programming experience or knowledge in their regular classes, teaching a big data analytics course to them is not easy. Thus, it is vital to develop a curriculum that comprises easy-to-follow and easy-to-understand modules. In this paper, we develop a big data analytics course for non-ICT major university students. The proposed course comprises two parts: (1) basic programming skill modules with step-by-step guidelines and (2) extensions to big data analytics modules with laboratory exercises, with the five principal programming modules based on the Python programming language. First, our investigation discusses the suggestions and limitations of the big data analytics course for non-ICT major university students. Then, we recommend programming languages, integrated development environments (IDEs), and useful tools that help learners perform programming exercises and milestone projects. The learning objectives and course design models are carefully selected based on Bloom's taxonomy with six thinking levels and five procedures.


I. INTRODUCTION
We are living in a digitized era, in which digital data were expected to reach 44 zettabytes (1 zettabyte = 1,000^7 bytes) by 2020 and 57 percent of the world's population used the Internet [1], [2]. Moreover, as new technologies such as 5G and the Internet of Things (IoT) emerge, storing, processing, and analyzing big data are becoming essential capabilities for job applicants [3], [4].
At the same time, many universities in Korea operate convergence education courses for all students. These courses include programming languages (such as C, Java, Python, and Scratch) and computational thinking [5], [6]. Since convergence education blends different disciplines, students majoring in humanities (as well as information and communication technology (ICT) major students) are encouraged to enroll in the convergence courses [7].
However, the convergence education courses, including programming languages and computational thinking, should be different from the courses for computer science (CS) major students [8], [9]. For non-ICT major students, the convergence education courses should be easy to follow and understandable. In this paper, we design and propose a big data analytics course for non-ICT major students to extend the convergence education courses.
The proposed big data analytics course for non-ICT major students is composed of two parts: basic programming skill modules with step-by-step guidelines and extensions to big data analytics modules with laboratory exercises. Since the expected attendees of the course rarely have programming language knowledge or experience, the basic programming skill modules are necessary. Furthermore, the course should include extensions to the big data analytics modules with appropriate laboratory exercises, which help students perform milestone projects.
The rest of this paper is organized as follows. Section II describes our learning objectives and related work on designing and implementing ICT-related courses in universities. Section III details our proposed big data analytics course for non-ICT major university students and summarizes our findings and recommendations about programming languages, integrated development environments (IDEs), and useful tools for the course. Finally, Section IV concludes the paper.

II. MATERIAL AND METHOD
This section presents the learning objectives of the proposed big data analytics course for non-ICT major students and reviews related work on ICT-related course development in universities.

A. Learning Objectives
The learning objectives of the proposed big data analytics course are to implement programming language practices and foster big data analytics skills. Fig. 1 shows the learning objectives based on Bloom's taxonomy [10]. There are six thinking skills in Bloom's taxonomy: remember, understand, apply, analyze, evaluate, and create.
Fig. 1 The learning objectives based on Bloom's taxonomy.
Level 1 (remember) covers basic programming knowledge such as grammar, syntax, variables and types, conditional flows (if, if-else, if-then-else clauses), concepts of functions and classes, operators (+, -, *, /, etc.), and comments. Level 2 is for understanding the programming language; it includes 'how to classify tasks of the problem,' 'state tasks in the programming language,' 'outline major parts of the tasks,' 'define the problem,' 'clarify important terms,' and 'identify the solution.' Level 3 is to apply previously learned thinking skills to real-world problems in the programming language; it consists of 'write program code,' 'collect a data set for the problem,' and 'extensions to big data analytics.' Once the code is written, the course continues to level 4, which analyzes the code. There are two kinds of analysis at this step: static program analysis and dynamic program analysis. Static program analysis covers control and data flows and type and model checking. Dynamic program analysis covers code module testing and monitoring.
The next thinking skill level is evaluate (level 5). It evaluates the program code based on development quality, ease of maintenance, reliability, safety, portability, efficiency, reusability, readability, and consistency. The final level is create (level 6). It is the highest order of thinking skills and is composed of 'brainstorm ideas,' 'write design documents,' 'build prototype programs,' 'assemble a project team,' 'comment on the program code,' 'testing,' and 'bug fixing.'
Based on the proposed learning objectives, we develop a big data analytics course for non-ICT major students. Because the expected attendees have no programming language knowledge or experience, it is vital to cover basic programming skills and big data analytics together.
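The distinction between the two kinds of analysis at level 4 can be illustrated with a minimal sketch. It is not taken from the course materials: the student submission in SOURCE and the helper names list_function_names and run_average are hypothetical, and only the Python standard library is assumed. The static step inspects the code without running it, while the dynamic step executes it on a test input.

```python
import ast

# Hypothetical student submission, kept as a string so that it can be
# inspected without being executed.
SOURCE = '''
def average(values):
    return sum(values) / len(values)

def unused_helper():
    pass
'''


def list_function_names(source):
    """Static analysis: parse the code into a syntax tree and read it, never run it."""
    tree = ast.parse(source)
    return sorted(
        node.name for node in ast.walk(tree) if isinstance(node, ast.FunctionDef)
    )


def run_average(source, values):
    """Dynamic analysis: execute the code and observe its behavior on a test input."""
    namespace = {}
    exec(source, namespace)  # acceptable here only because SOURCE is trusted
    return namespace["average"](values)


function_names = list_function_names(SOURCE)
test_result = run_average(SOURCE, [2, 4, 6])
print(function_names, test_result)
```

In a classroom setting, the static step roughly corresponds to reading and checking code structure, and the dynamic step to module testing.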

B. Related Work and Method
Dawson et al. [11] observed that the performance of non-CS students is worse than expected. In response, the authors developed a CS0.5 course for non-CS major students. Then, they evaluated the course in terms of total pass rates, the degree of students' satisfaction, and attitudes.
Mohamed [12] designed a CS1 programming course for a mixed class (CS major students and non-CS major students). To implement the course, the author used the flipped classroom model [13] and pair programming [14], which resulted in higher student engagement.
Ketenci et al. [15] used structural equation modeling to reveal the relationship between the characteristics of middle school students and their performance metrics (self-efficacy, learners' interest, computer and programming experience, and performance) in a computing course.
Although several studies address CS and programming courses, the design of a big data analytics course for non-ICT major students still has room for development. For example, although flipped learning or blended learning classes [16] could improve learners' satisfaction, we develop our big data analytics course by focusing on instructor-learner interactions, which help motivate and inspire university students.

III. RESULT AND DISCUSSION
In this section, we present our big data analytics course for non-ICT major university students. First, we introduce our course design model consisting of five procedures (glance, roadmap, design, teaching, and review). Then, we describe the details of the big data analytics course with our suggestions and recommendations.

A. Our Course Design Model
Making a good university course is not an easy task. A well-made course can enhance learning satisfaction while minimizing students' misunderstandings by setting firm expectations. To this end, a well-organized course design model supports building a solid university course. Fig. 2 shows our course design model, which is composed of the following five procedures.
- Glance: In this procedure, we identify expected attendees. We should identify grade levels, majors, characteristics, background, academic performance, and personal attributes of students. In addition, we should reflect on the review of previous courses.
- Roadmap: In this procedure, we outline course topics and modules based on learning objectives. In addition, we establish the teaching approach (e.g., instructor-learner interactions, specific educational methods, flipped learning, blended learning, and e-learning).
- Design: In this procedure, we create weekly course plans and develop learning activities and assignments in detail. When building the course detail, we reflect learning objectives with thinking skills.
- Teaching: In this procedure, we teach based on the developed course materials. The important thing for this procedure is that we should collect students' feedback for every course plan to reflect it next time. At the same time, when teaching, we should adjust learning progress based on students' overall academic performance and achievement.
- Review: In this procedure, we keep the records of examinations and assignments to review the difficulty of assessments and testing results. Another crucial point for this procedure is perceiving students' needs and knowledge by interacting with students under the curriculum.
Fig. 2 The proposed course design model.

B. Details of Course Design
In this subsection, we describe the proposed big data analytics course for non-ICT major university students. Table 1 shows the outlined course modules. The course has five modules: (1) programming environment setup, (2) computer programming basics, (3) functions, classes, and libraries, (4) programming for big data, and (5) mini-projects and exercises. The course outline of the big data analytics course for non-ICT university students is summarized in Table 2. We designed the course for 16 weeks based on the course modules described in Table 1. The first week covers the course introduction and the programming environment setup. Since the big data analytics course uses a step-by-step approach, we encourage students to attend the first week. The programming environment setup includes installing Python and basic directions for using Interactive Python (IPython) and Jupyter Notebooks.
The second to fourth weeks are designed to deliver computer programming basics. This includes variables and assignment statements, arithmetic operations, input and output statements (input and print functions), and control/iterative statements (if, while, for, and Boolean operators). Because the big data analytics course is designed for non-ICT major university students, this course module should be easy to follow while delivering enough background knowledge in a short period of time.
The fifth to seventh weeks are designed for the next level of programming skills in Python. In this module, we organize learning materials into three parts: (1) functions (built-in functions, defining functions, and functions with parameters), (2) classes (defining user-defined classes, controlling access to attributes, and case studies), and (3) libraries (NumPy, SciPy, Matplotlib, Pandas, SymPy, Seaborn, Bokeh, Pygal, and other big data-related ones).
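As an illustration of the topics listed for these weeks, the following minimal sketch touches each of them once. It is an assumed example rather than actual course material; the variable names and values are invented, and the input function is shown only in a comment so that the snippet runs without keyboard interaction.

```python
# Variables and assignment statements
price = 1200          # an integer
quantity = 3
discount_rate = 0.1   # a floating-point number

# Arithmetic operations
subtotal = price * quantity
total = subtotal * (1 - discount_rate)

# Output statement; input() could read a value from the keyboard instead,
# e.g. quantity = int(input("Quantity: "))
print("Total:", total)

# Conditional statement with a Boolean operator
if total > 3000 and quantity >= 3:
    shipping = "free"
else:
    shipping = "paid"

# for loop: add the even numbers from 1 to 10
even_sum = 0
for n in range(1, 11):
    if n % 2 == 0:
        even_sum += n

# while loop: count the halvings needed for 100 to drop below 1
value = 100
halvings = 0
while value >= 1:
    value = value / 2
    halvings += 1

print(shipping, even_sum, halvings)
```

Short scripts of this kind can be typed line by line in IPython or a Jupyter Notebook cell, which fits the step-by-step approach of the course.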
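The three parts of this module can be sketched in one short example. This is an assumption-laden illustration, not the course's own exercise: to_celsius and Thermometer are hypothetical names, and the standard-library statistics module stands in for the third-party libraries named above so that the snippet runs without extra installations.

```python
import statistics  # standard-library stand-in for NumPy/Pandas-style summaries


# (1) Functions: a user-defined function with parameters that calls the
# built-in function round()
def to_celsius(fahrenheit, digits=1):
    """Convert a Fahrenheit temperature to Celsius, rounded to `digits` places."""
    return round((fahrenheit - 32) * 5 / 9, digits)


# (2) Classes: a user-defined class that controls access to an attribute
class Thermometer:
    def __init__(self, celsius):
        self.celsius = celsius  # assignment goes through the setter below

    @property
    def celsius(self):
        return self._celsius

    @celsius.setter
    def celsius(self, value):
        if value < -273.15:
            raise ValueError("temperature below absolute zero")
        self._celsius = value


# (3) Libraries: reuse existing code instead of writing it from scratch
readings_f = [68, 77, 86]
readings_c = [to_celsius(f) for f in readings_f]
mean_c = statistics.mean(readings_c)
print(readings_c, mean_c)
```

The property-based setter is one common way to control attribute access in Python; the course may well use a different convention.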
Then, in the twelfth to fifteenth weeks, the students are asked to undertake programming projects for big data analytics based on the learning materials of the course. The project topics are natural language processing (preferably in Japanese, since the students are majoring in Japanese), data mining Twitter, IBM Watson APIs [27], [28], and machine learning.
In the other projects, as in the natural language processing project, the students are encouraged to use Japanese data sets. Finally, the final examination is scheduled for the sixteenth week. Note that in the examinations (mid-term and final), the students must write their programs on computers and then upload the source code to the online learning management system for grading. In other words, we conduct practical examinations rather than pencil-and-paper tests. Table 3 shows the details of laboratory exercises and projects corresponding to the course modules described in Table 2. Although we guide the students through the content of the first project in the first week, they perform the exercise at home as homework.
The details of the four mini-projects are as follows: (1) TextBlob, tokenizing text, parts-of-speech tagging, and word frequencies for the natural language processing project; (2) Twitter APIs, Tweepy, searching tweets, and tweet sentiment analysis for the data mining Twitter project; (3) Watson SDK, Watson cloud services, and language translator for the IBM Watson APIs project; and (4) scikit-learn, k-nearest neighbors, regression, and clustering for the machine learning and artificial intelligence project.
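To make two of the listed techniques concrete, the sketch below computes word frequencies and performs a k-nearest neighbors classification in plain Python. It deliberately does not use TextBlob or scikit-learn, so the functions tokenize, word_frequencies, and knn_predict are hypothetical stand-ins for what those libraries provide, and the sample text and toy data points are invented. The regular-expression tokenizer assumes space-delimited English text; Japanese text would need a dedicated tokenizer.

```python
import re
from collections import Counter


def tokenize(text):
    """Split text into lowercase word tokens (a simple stand-in for a library tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def word_frequencies(text, top_n=3):
    """Return the `top_n` most common words with their counts."""
    return Counter(tokenize(text)).most_common(top_n)


def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    `train` is a list of ((x, y), label) pairs; distance is squared Euclidean.
    """
    by_distance = sorted(
        train,
        key=lambda item: (item[0][0] - query[0]) ** 2 + (item[0][1] - query[1]) ** 2,
    )
    nearest_labels = [label for _, label in by_distance[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


sample_text = "Big data is big. Data analytics turns big data into insight."
top_words = word_frequencies(sample_text, top_n=2)

# Hypothetical toy data: two well-separated groups of 2-D points
training_points = [
    ((1.0, 1.0), "low"), ((1.5, 2.0), "low"), ((2.0, 1.5), "low"),
    ((8.0, 8.0), "high"), ((8.5, 9.0), "high"), ((9.0, 8.5), "high"),
]
predicted = knn_predict(training_points, (2.0, 2.0))
print(top_words, predicted)
```

Writing the algorithm by hand once may help learners see what a library call does before they switch to the library itself in the milestone projects.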

C. Discussion
In this subsection, we discuss the proposed big data analytics course for non-ICT university students. We use the Python programming language for the course because it poses few obstacles for learners with no previous programming experience or knowledge. Furthermore, the Python-related tools that support programming exercises are easy to use and, when needed, require no program installation; the only requirement is a web browser. Fig. 3 shows the Python shell in a web browser. With it, anyone who wants to learn Python programming can access a programming environment. However, for learning effectiveness, we teach the big data analytics course in a computer-equipped lecture room with Python and related tools installed.
In addition to the lecture slides, we provide Jupyter Notebooks to the students, which can also be easily accessed with a web browser. Fig. 4 shows an example of a Jupyter Notebook in a web browser. Like the Python shell in Fig. 3, Jupyter Notebooks can be used with or without installing any programs.
The above-mentioned Python-related tools are helpful since the students can use them on smart devices (e.g., tablets, smartphones, and laptops) as well as in PC-based environments. Furthermore, with an Internet-connected smart device, the students can study the course materials from anywhere.
We use a flexible turn-in policy for three milestone projects (the natural language processing, data mining Twitter, and IBM Watson APIs projects). In other words, for these three projects, the students are allowed to turn in their work late, as long as they do so before the final examination.
We use the flexible turn-in policy to encourage the students to practice as much as possible without the restriction of firm deadlines. However, the period for the last milestone project (the machine learning project) is one week. In other words, no late turn-in is allowed for the machine learning project since the following week is the final examination.
If we extended the deadline of the milestone projects into the final examination week, students who had not finished the milestone projects might suffer from examination nerves. Therefore, to relieve examination nerves and stress about the milestone projects, we do not deduct points for late turn-in.
We design the milestone projects as individual rather than team-based work to avoid the free-rider problem. The advantages of an individual approach are as follows: (1) the students build responsibility and ownership for the milestone projects; (2) no free-rider problem occurs; and (3) the students earn credit in proportion to their own progress.
The disadvantage of an individual approach is that the students do not gain the cooperation experience of a team-based approach. However, for the big data analytics course, the advantages outweigh the disadvantage since many of the students have no programming experience or knowledge before the semester.
As far as learning materials are concerned, we did not specify a textbook for the big data analytics course for two reasons. One is that we could not find a suitable textbook for a big data analytics course for non-ICT major university students. The other is that we do not want to limit the students' potential learning experiences. In other words, if we specified a textbook, the students' learning area might be limited by it.
Instead, we provide various learning materials to the students, such as lecture slides, Jupyter Notebooks, free online course URLs for Python and big data analytics, and source repositories. We conjecture that this approach helps extend the students' learning domains.
After the final examination and its grading, we provide the solutions to the milestone projects for self-grading and review. We observed that students struggle under a no-solution policy for laboratory exercises and projects. Specifically, some students want to review and check their solutions and troubleshoot their code.
The potential issue for providing solutions to laboratory exercises and projects is that students may share the solutions with juniors in the next year's class. Our resolution is to modify the project problems every semester by reflecting on recent technology trends and developments.

IV. CONCLUSION
In this paper, we developed a big data analytics course for non-ICT major university students. The learning objectives and course design models are carefully selected based on Bloom's taxonomy with six thinking levels and five procedures. The proposed course is composed of five principal modules. Since the course is designed for non-ICT major university students who have no previous programming experience or knowledge, the curriculum is developed with conciseness and lecture efficiency in mind. To this end, we adopted tools that help learners perform programming exercises and milestone projects effectively.