A Comprehensive Review on Big Data-Based Potential Applications in Marine Shipping Management

Abstract: Shipping is becoming a leading economic sector in coastal countries. Over 90% of goods and raw materials worldwide are transported by sea. However, newer generations of cargo ships are expected to offer more modern facilities, larger cargo space, higher speed, and flexible, faster cargo control from shore. As a result, demand for maritime digitalization is increasing, driven by the need for a flexible virtual environment and by pressure to reduce costs. Big Data plays an extremely important role in marine shipping because it helps companies decide how to exploit both traditional and non-traditional data for profit. Shipping companies often collect extremely large amounts of data from sources such as regular ship reports, sensors, GPS devices, RFID tags, and traffic management systems. Analyzing these data can improve the forecasting and avoidance of risks and reduce transport costs. This article first assesses the challenges and opportunities of Big Data in marine shipping and then comprehensively analyzes its applications in maritime transport, including online vessel decision support, vessel performance optimization, fleet operation optimization, and predictive analysis. The review gathers and highlights the many ways in which big data can improve the processes and operations of the shipping industry. The results can guide authorities and shipping companies in formulating and implementing effective policies to cope with the growing pressures of a highly competitive shipping market.


I. INTRODUCTION
Alongside containerization and diesel propulsion, big data technology is widely regarded as an important innovation in the shipping industry, since it is expected to affect patterns of shipping operations over the next 10-20 years [1]. Resource trade plays an important part in the growth of the world economy because it supports energy-consuming economic activities [2] [3]. Moreover, carriers need to understand global trade flows, because international bulk shipping, which depends mainly on the supply of and demand for energy, fluctuates considerably [4] [5] [6]. While international trade statistics such as the UN Comtrade database are readily accessible, they are normally restricted to the national level.
Shipping is now embarking on its fourth technological revolution, also referred to as cyber-shipping or Shipping 4.0. The first revolution was the move from sail to steam around 1800, the second was the move from steam to diesel engines around 1910, and the third occurred around 1970 with the appearance of computerized systems and automation [7]. The latest development concerns digital data in all facets of shipping activity and is comparable to Industry 4.0 on land [8]. Shipping 4.0 involves the wide use of novel technologies such as Cyber-Physical Systems, the Internet of Things, and the Internet of Services, which bring smarter embedded computers to onboard equipment and provide a variety of new information and data, along with a range of shore-side facilities that use those data [9] [10].
It is clear that we are living in the world of Big Data (BD), a point also emphasized by the United Nations, which stated that "the world is observing a revolution of data" [11], since big data affects all fields of human society, especially business. Businesses have gathered large amounts of data for decades, and data analysis is not a novel concept. However, "what distinguishes the present and the age of the big data is the change in behaviors encountered by governments, businesses and non-profit organizations... they want to use all of the data... to enhance the business" [12]. IBM and MIT conducted a comprehensive study which showed that high-performing businesses were more likely than lower-performing ones to be advanced users of analytics and to use analytics as a strategic differentiator [13]. Moreover, an MIT study of 179 large firms indicated that the output and productivity of enterprises making "data-driven decisions" were 5-6 percent higher than would be expected from their other investments and information technology usage. A 5% rise in output and productivity can significantly affect the chance of winning in the harsh competition of most sectors. Quantitative studies have shown that the successful use of big data helps an organization make decisions, work through information, and refine its processes [14] [15].
According to Oussous et al. [16], big data is characterized by high velocity, high volume, and high variety, so "conventional" data processing techniques are not sufficient to use such data efficiently for control, decision-making, and analytics. In terms of processing power and computing capacity, however, this is hardly the case in shipping today. Although vast sets of ship data are difficult or even impossible to handle with certain basic tools, current computer systems are not really challenged by them. Rather, the volume and sophistication of the information require novel tools and techniques so that users can interpret it correctly, and this is what can be described as the big data challenge [17]. Over the last few years, there has been strong debate about how advanced technologies and ideas could change the existing conditions of the marine ecosystem in response to these changes and challenges [18] [19]. Cutting-edge sensor systems, for instance, are becoming more popular on all types of boats and vessels. In addition to GPS and AIS for positioning and navigation, the shipping industry uses other sensors, including energy meters that relate engine output to cruising speed, humidity and temperature sensors, fish-tracking radars, equipment condition monitoring, vessel performance data streams, and other sonar-based hardware [20] [21].
BD analytics has become an unavoidable subject for industry players, along with investment in digital technology, new ideas, and changes in maritime organizations. This research thus provides a case study of the implementation of the proposed system in maritime organizations in order to investigate obstacles to BD analytics. The review of documents and the analysis of challenges show that, for the majority of maritime companies, big data analytics is still in its initial phase, with most research and applications concentrated in the military sector. Although the fourth shipping revolution offers great possibilities for online control and offline analysis, many issues emerge beyond the sheer size of the data sets. This paper describes several problems that arose in our big data work and suggests how to address them. It also covers the potential of big data analysis in marine shipping, so that the findings can help decision-makers promote BD analytics in the sector.

II. MATERIALS AND METHOD
Big data is a term used to describe massive, complicated datasets that are hard to analyze and process with conventional data processing methods and applications [22]. The term also covers the collection and subsequent analysis of any massive unstructured data in which hidden insights might exist [23]. Currently, big data is understood as vast and complicated sets of data of distinct types, including unstructured, structured, and semi-structured data. Almost anything can be a source of big data, yet conventional database systems cannot manage it. A 2001 report by META Group (now part of Gartner) described big data as data growing rapidly along three dimensions: volume, velocity, and variety [24]. Big data continues to grow day by day without apparent limit, and by 2020 the amount of data generated every year was expected to rise by 4300% [25]. Big data analytics means collecting and analyzing data in order to explore trends, hidden insights, and correlations, which can create a competitive advantage for any industry.
Big data includes information from numerous sources (sensors), so it is challenging to capture, sort, analyze, and manage. Its four key features are known as the 4Vs: Volume, Variety, Velocity, and Veracity [26]. Volume means the vast amount of data generated by sensors, measured in terabytes, petabytes, and beyond. Variety indicates the form of the data: datasets in big data are stored in various formats, and this variation distinguishes big data from traditional data. Velocity refers to the speed at which data are created and moved, as data are generated at different rates and need to be stored for processing. In general, an immense quantity of data can be produced in real time, and the speed of data flows is itself increasing quickly. Finally, veracity means the reliability and accuracy of data assets. Data from distinct sources might use different measurement scales for the same variable, which raises the problem of how to maintain data quality. Thus, veracity must be handled and maintained throughout the lifetime of the data [27].

A. Challenges of Big data in marine shipping
Big Data has gained popularity in the shipping industry, which requires an enormous amount of information to understand and improve logistics, energy consumption, emissions, and maintenance. However, the use of big data is also limited by factors such as satellite communication, technical barriers to aggregating and using big data, the quality and cost of onboard sensors, data ownership, and data acquisition systems. New procedural standards, including those covering the e-navigation sphere, have made it simpler to collect and organize data.
Apart from its advantages, the use of data also involves great difficulties in acquiring, managing, processing, storing, and analyzing it. Big data differs from traditional data in its high speed, high volume, and high diversity of sources, as well as in the need to integrate data before analysis. By contrast, traditional data management and analytical systems depend on structured, relational database systems [28], which are not suited to the vast volume and non-uniformity of big data. Data analysis and management clearly play a significant role in, and greatly affect, various industries. Nevertheless, many big data problems and uncertainties have emerged, some common to all industries and some specific to particular sectors [27]. Based on the reviews by LRF and DNV-GL, the challenges were grouped into four main perspectives: human resources, reasonable competitive conditions, security, and technology. Fig. 1 illustrates these classifications and the particular problems in each group. Based on this classification, each challenge in maritime big data is analyzed below to identify the core of the issue, and practical solutions to the identified problems are then discussed [29].
These difficulties can be viewed as one general issue: a lack of "game rules" balancing the responsibilities and rights of the data holder or acquirer and the secondary user. When no rule is set to handle this problem, many relevant parties may be unwilling to share or exchange big data because they are afraid of losing confidential information, business opportunities, or even profit. DNV-GL, for instance, considered well-founded data governance to be one of the basic big data problems, so suitable solutions need to be built into systems from the beginning of development in this area [30]. Establishing clear rules on responsibilities and rights regarding big data (as shown in the examples) should be considered a prerequisite for developing the use of maritime big data.
The necessity and significance of human resources are easy to understand, but handling the issues in this area is another story. It is becoming harder and harder to recruit well-qualified employees, and the problem is even more severe in the maritime industry. A good example is statistics, a typical field of expertise relevant to big data. Varian's example [31] shows that the maritime industry, along with other sectors exploiting big data, will want to attract more engineers in the big data domain. Koga argued that such trends explain the personnel shortage and that many industries will have difficulty finding data engineering specialists [32]. Moreover, these five factors share a common basis. Three of them, "collection biases", "correlation/causation", and "powerful tools", can be addressed together by creating novel, high-performance electronic equipment and upgrading information processing technology, while the two remaining factors can be studied afterwards. Such upgrades and inventions are significant for big data development, since data are analyzed, processed, and stored with these tools. Thus, the better the tools, the more innovation can be gained [33].
When measurements are taken from one device and used in another, the context in which the sensor data were originally acquired can be problematic. One example is knowing which sensors are of the same type, where they are located, and how they are related. A vessel typically has many position sensors, each with its own quality characteristics and local position, which might not be known outside the bridge system. Using raw position data from the navigation system can cause problems of consistency or accuracy, while links to the bridge and other data networks can pose a risk to safety or security. Any physical network or device link can be a vector for error propagation or attacks; therefore, flag state and class authorities frequently do not approve such connections. IEC work on bridge networks has led to safer firewall technologies and safer Ethernet links to outside networks [34]. Complicated phenomena (e.g. environmental effects on vessel performance) are difficult to calculate accurately, and caution is needed when applying such data in calculations. On most vessels, the total effect cannot be estimated from a single point of measurement, since waves, wind speed, speed through water, and other quantities vary greatly around the vessel. Experience shows that manual data entry into computer systems, reports, or AIS transceivers is a significant source of error. In AIS, this is especially evident in the destination, navigational status, and draught fields, and incorrect or illegitimate ship identities also appear frequently. AIS also has problems with the transmission of navigational data: rate of turn and true heading must be obtained from sensors outside the AIS unit, yet a significant number of AIS transmitters are not connected to such sensors and send either data derived from the internal position source or invalid data, which may be unavoidable.
Automatic data input is an obvious solution, yet it can be too expensive when physical links to other systems are required. Therefore, the most suitable short-term approach is an extensive validity review. Automated vessel reporting is nevertheless a priority of the e-navigation strategy implementation plan [35], so some improvements in this field can be expected in the coming years. Several reports sent from vessel to shore, for instance to the owner or charterer, have commercial implications. They can include speed, fuel consumption, and bunkering, as well as contractual performance or exemptions due to unfavorable weather or ship safety problems. Some of these figures become less credible because the operator has a strong economic interest in reporting erroneous results. Data from older vessel sensors and from rarely used sensors can be of questionable quality because of faulty, disconnected, or broken sensors and other issues. This is typical of data generated by alarm and automation systems. Regarding ownership, most parties agree that any data calculated or otherwise produced on board belong to the shipowner. Unless these data are readily accessible through standard interfaces, however, accessing them is likely to be expensive for operators or owners, because special software and hardware, along with additional service personnel, are normally required. Several systems are supplied with specific limitations on the use of their internal data. In some situations, condition-based maintenance and general monitoring are provided to a shipowner as a service in which the owner has no ownership of or access to the underlying data. Similar third-party services can also be used to track or optimize different ship or fleet functions, which can likewise limit data ownership.
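The validity review mentioned above can be illustrated with a minimal sketch of range and "not available" checks on decoded AIS position reports. The sentinel values follow the defaults we understand ITU-R M.1371 to define for position reports (latitude 91, longitude 181, speed over ground 102.3 knots, course over ground 360 degrees, heading 511). The `PositionReport` container, its field names, and the `validity_issues` function are hypothetical names introduced only for illustration, and the nine-digit MMSI test is a simplification that applies to ship stations only.

```python
from dataclasses import dataclass

# "Not available" defaults for AIS position reports, in engineering units
# after decoding (assumed from ITU-R M.1371, messages 1-3).
LAT_NA, LON_NA = 91.0, 181.0
SOG_NA_KN = 102.3
COG_NA_DEG = 360.0
HEADING_NA = 511


@dataclass
class PositionReport:
    """Hypothetical container for one decoded AIS position report."""
    mmsi: int
    lat: float
    lon: float
    sog_kn: float       # speed over ground, knots
    cog_deg: float      # course over ground, degrees
    heading_deg: int    # true heading, degrees


def validity_issues(r: PositionReport) -> list:
    """Return a list of human-readable problems found in one report."""
    issues = []
    # Simplification: ship-station MMSIs are treated as nine-digit integers.
    if not 100_000_000 <= r.mmsi <= 999_999_999:
        issues.append("mmsi: not a nine-digit identifier")
    if r.lat == LAT_NA or r.lon == LON_NA:
        issues.append("position: not available")
    elif not (-90.0 <= r.lat <= 90.0 and -180.0 <= r.lon <= 180.0):
        issues.append("position: out of range")
    if r.sog_kn >= SOG_NA_KN or r.sog_kn < 0.0:
        issues.append("sog: not available or out of range")
    if r.cog_deg >= COG_NA_DEG or r.cog_deg < 0.0:
        issues.append("cog: not available or out of range")
    if r.heading_deg == HEADING_NA:
        # Typical when the transmitter has no external heading sensor attached.
        issues.append("heading: not available")
    elif not 0 <= r.heading_deg <= 359:
        issues.append("heading: out of range")
    return issues


good = PositionReport(219000123, 55.5, 12.3, 12.4, 87.0, 90)
bad = PositionReport(12345, 91.0, 181.0, 102.3, 360.0, 511)
print(validity_issues(good))
print(validity_issues(bad))
```

Checks of this kind only catch values that are formally invalid; manually entered fields such as destination or draught would need cross-checks against other sources, which this sketch does not attempt.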
The shortage of open interface specifications is a major issue for several forms of data collection, most notably for interfaces to integrated control and automation systems. Navigational and bridge systems, by contrast, have established standards suited to open interfaces, and standard formats are used in several forms of ship reporting [36]. The lack of interface standards means that every data acquisition interface has to be designed for its specific use, and the interface design may vary between applications, even on similar ships. Cyber protection has not been seen as a significant factor in overall onboard data collection, but it is important to be aware that data can be jammed or spoofed through cyber-attacks. Examples show that even GPS signals can be faked. Thus, all data that rely on wireless communication from other vessels or from shore can be spoofed or jammed. Examples include AIS or radar targets acquired by ships and then sent to shore, as well as data obtained by satellite or shore systems [37]. In summary, the shipping industry creates a vast quantity of data from various sources in distinct formats, including weather data, cargo data, machinery data, and traffic data. As sensor technology spreads through the industry, the volume and variety of data increase each day. The data are normally gathered and analyzed remotely at high transmission rates. Big data analytics in the shipping industry is therefore still new, and many problems, including integration and adaptability, need to be addressed.

Fig. 2 Advantages and prospects of Big Marine Data Analysis
Several data-driven terms based on the implementation of digital technology have been introduced in the shipping industry, including autonomous surface and underwater vehicles, the smart ship, and the connected ship. Ship intelligence is expected to drive the industry's future [27], as big data has become a common concept in the maritime area. The transport industry generates vast amounts of data, leading to a shift towards large-scale data in which analyzing and managing these data becomes more and more important and has a major impact on the marine industries. High data transmission rates make it possible to gather more data in less time. There is considerable potential to improve onboard ship operations and maintenance by using different sensor types and converting their data into value. Fig. 2 depicts various maritime industry data sources and the advantages arising from data analysis. Ship data must be integrated for analysis, since data analysis improves the optimization, asset utilization, and efficiency of the vessel. The efficiency of operational scheduling can be increased through onboard data analytics for navigation, maintenance, and communication, linked to onshore and onboard decision support systems [27].

1) Online Ship Decision Support
Vessel data need to be integrated for analysis, since data analysis increases optimization, performance, and asset utilization, while the efficiency of operational scheduling can be improved through communication, navigation, and maintenance supported by onboard data analysis linked to both onshore and onboard decision support systems. Vessels can be supervised continuously from a distance, with data collected using remote sensor networks. For this, the shipping industry requires a strong wireless network with high transmission capacity. After entering the database, the real-time sensor data are delivered to stakeholders to provide them with the latest information on everything occurring on the vessels. Once meteorological data, ship performance, and the route have been analyzed, the ship operator can plan voyages based on ship performance on similar and different routes. Voyage planning also requires dependable forecasts of ocean and wind data. Data analytics thus makes it possible to determine the most effective route, an accurate estimated time of arrival, and alternative routes to avoid disturbance or delay [27].
Based on the analytics-as-a-service viewpoint of Demirkan and Delen [38], analytics can be divided into three categories: descriptive, predictive, and prescriptive analytics (shown in Fig. 3). (1) In descriptive analytics, information on "what is occurring or occurred" allows enterprises to determine pros and cons. (2) In predictive analytics, technologies such as web, text, and data mining are used to make probabilistic forecasts of future occurrences. (3) In prescriptive analytics, methods including expert systems, decision support, and simulation are used to investigate various alternatives and suggest actions to decision-makers.

Fig. 3. Categories of Analytics [38]

RFID techniques, from which the IoT concept stems, have been widely applied. Nevertheless, IoT spans various domains and stakeholders, so many viewpoints have emerged in academia and industry [39]. Essentially, there are three viewpoints [15]: (1) Things-oriented: it concentrates on enhancing object visibility, namely the traceability of an object and the understanding of its present location, status, and so on. (2) Internet-oriented: its goal is to improve network protocols such as the Internet Protocol, which is considered the network technology for connecting smart objects worldwide. (3) Semantic-oriented: its focus is on the problems of representing, storing, interconnecting, searching, and organizing the information created by an enormous number of smart objects.
Big data is a popular topic in the information systems and computer science literature [40], and applications of big data analysis have also attracted interest in the operations research (OR) field (Fig. 4). Choi, Lee, and Irani [41] presented a new approach that integrates a qualitative decision model with big data available on the internet to improve the public procurement process. Fang, Jiang, and Song [42] applied random forest regression to big data obtained from insurance firms in order to forecast the profitability of insurance customers. Song and Wang [43] used regression analysis on panel data of Chinese companies to show that businesses joining the international value chain tend to acquire a higher level of green technology. Psaraftis et al. reviewed the literature on dynamic vehicle routing problems, discussed the significant role of big data in improving decision making for vehicle routing, and suggested that future work concentrate on how big data can be utilized. Although these investigations processed an immense amount of data, the data considered in the following study were larger and more complicated. Its big data consisted of weather archives containing large volumes of weather observations at various times and sea locations. The archive format cannot be read directly by general-purpose programming tools, so pre-processing is required. Lee et al. [44] used such weather archive data to estimate a function of real fuel consumption in order to tackle speed optimization. In particular, the Copernicus data set was used as the big data source, and a data mining method was adopted to determine the influence of weather conditions along a specific voyage route. A metaheuristic optimization approach, particle swarm optimization, was then used to find Pareto-optimal solutions that minimize fuel consumption and maximize SLA. The utility of the presented solution was verified using actual data collected from a liner shipping business [44]. OR also plays an important role as a decision support tool. In existing research on AIS data applications, route planning is the only field that has adopted OR methods. Kim et al. [45] presented various efficient algorithms for route planning, such as the genetic algorithm, ant colony optimization, and other heuristic algorithms.
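To make the particle swarm idea concrete, the following is a minimal single-objective sketch, not the Pareto search of Lee et al. [44]. It scalarises the two goals into one cost: fuel consumption plus a penalty for arriving after a deadline, which stands in for the service-level objective. The cubic relationship between speed and fuel rate, the constants, and the function names `voyage_cost` and `pso_minimise` are assumptions made only for illustration.

```python
import random


def voyage_cost(speed_kn, distance_nm=1000.0, deadline_h=50.0,
                k=0.001, late_penalty=1000.0):
    """Illustrative cost: fuel (t) plus a penalty per hour of lateness.

    Assumes fuel rate = k * speed^3 tonnes per hour (a common rule of thumb).
    """
    hours = distance_nm / speed_kn
    fuel_t = k * speed_kn ** 3 * hours
    lateness_h = max(0.0, hours - deadline_h)
    return fuel_t + late_penalty * lateness_h


def pso_minimise(cost, lo, hi, n_particles=30, n_iter=200,
                 inertia=0.7, c1=1.5, c2=1.5, seed=0):
    """Basic global-best PSO over a single bounded variable."""
    rng = random.Random(seed)
    pos = [rng.uniform(lo, hi) for _ in range(n_particles)]
    vel = [0.0] * n_particles
    pbest = pos[:]                         # best position seen by each particle
    pbest_cost = [cost(x) for x in pos]
    g = min(range(n_particles), key=lambda i: pbest_cost[i])
    gbest, gbest_cost = pbest[g], pbest_cost[g]   # best position seen by the swarm

    for _ in range(n_iter):
        for i in range(n_particles):
            r1, r2 = rng.random(), rng.random()
            # Velocity update: inertia + pull to personal best + pull to global best.
            vel[i] = (inertia * vel[i]
                      + c1 * r1 * (pbest[i] - pos[i])
                      + c2 * r2 * (gbest - pos[i]))
            # Move and clamp the particle to the allowed speed range.
            pos[i] = min(hi, max(lo, pos[i] + vel[i]))
            c = cost(pos[i])
            if c < pbest_cost[i]:
                pbest[i], pbest_cost[i] = pos[i], c
                if c < gbest_cost:
                    gbest, gbest_cost = pos[i], c
    return gbest, gbest_cost


best_speed, best_cost = pso_minimise(voyage_cost, lo=10.0, hi=25.0)
print(f"best speed ~ {best_speed:.2f} kn, cost ~ {best_cost:.1f}")
```

With these assumed constants, the cheapest feasible choice is the slowest speed that still meets the deadline (1000 nm in 50 h, i.e. 20 knots), so the swarm should settle near that value. A Pareto formulation would instead keep fuel and service level as separate objectives and maintain an archive of non-dominated solutions.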

Fig. 4. Big Data applications in marine shipping
Decision analysis can be viewed as a response to the longstanding need of the business community to fully recognize the importance and relevance of modeling in business [27]. In this respect, IT and logistics are without doubt connected, as seen in developments in computational logistics, which uses up-to-date IT and information systems (IS) to design, plan, and control logistics networks and other complicated tasks. Owing to the very active development of logistics and supply chain management (SCM), the associated service networks have obtained better decision support. While state-of-the-art IS and IT systems are essential elements of logistics and supply chains, their effective management depends on smart and organized decision-making in the logistics network [46].
It should be noted that e-navigation is now defined as user-driven rather than system-driven. However, upgraded vessel navigation platforms capable of intelligent decision making [47], which limit the influence of subjective human judgment, are considered under the same conditions [48]. Such a system can provide a detailed overview of the vessel's efficiency. Both systems supply navigation information and ship performance data that can be used in integrated bridge systems (IBSs) to build decision support facilities. These facilities ultimately contribute to energy-efficient navigation strategies for the respective ship. To enhance decision support for fleet navigation schemes, machine intelligence (MI) applications for data analysis were incorporated into the corresponding data flow chart. A suitable MI application was included in every phase of the data flow chart. The pre- and post-processed data were used for different decision-making purposes, particularly in shipping applications concerning energy efficiency and system reliability. The purpose of that work was to develop a suitable methodology for managing navigation information and ship performance datasets. It also presented a marine engine-centered data flow chart to manage such large-scale datasets and to improve the efficiency of the respective navigation strategy. The data flow chart has two major parts, pre-processing and post-processing. The pre-processing part is an onboard application consisting of fault detection, data classification, and data compression stages, while the post-processing part is a shore-based application, for example at data centers, consisting of data expansion, integrity verification, and data regression stages.
Different MI techniques, including principal component analysis (PCA), auto-encoders, and Gaussian mixture models (GMMs) fitted with the expectation-maximization (EM) algorithm, were presented and applied in many parts of the data flow chart to manage large-scale datasets. These datasets were then used to build suitable navigation schemes for energy-efficient ship operating conditions. Such energy-efficient navigation schemes [49], with their capabilities for intelligent decision support, could eventually become part of e-navigation strategies globally and of the Ship Energy Efficiency Management Plan (SEEMP) locally, that is, on the ship itself [18].

2) Ship performance optimization
Anagnostopoulos presented a novel way of forecasting propulsion power using big data methods, with the aim of improving vessel performance assessment in order to reduce emissions and enable greener operation in the future [50]. Big data is a promising technological approach for improving vessel performance assessment by generating value from data. The study applied Machine Learning models to real data gathered from an LCTC M/V, in particular data relevant to hull performance. The features used to forecast propulsion power were speed through water, wind direction and intensity, course, speed over ground, pitch, heading, forward and aft draft, rudder angle, and roll. These data served as inputs to Machine Learning models that predict propulsive power. The models used were XGBoost and a multi-layer perceptron neural network, implemented with the Scikit-learn Python library [51]. The data were divided into voyages, and predictions were made for a subset of the voyages. The results were assessed with two Machine Learning metrics, the coefficient of determination (R²) and the Mean Absolute Error, with predictions accurate to within around 10%, depending on the voyage.
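For reference, the two metrics named above are defined as MAE = (1/n) Σ|y_i − ŷ_i| and R² = 1 − Σ(y_i − ŷ_i)² / Σ(y_i − ȳ)². The short sketch below computes both per voyage in plain Python. The power values are made-up illustrative numbers, not data from [50], and the voyage labels are hypothetical.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between measured and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Illustrative (made-up) propulsion power in kW: (measured, predicted) per voyage.
voyages = {
    "voyage_A": ([8000, 9000, 10000, 11000], [8200, 8900, 10100, 10800]),
    "voyage_B": ([7000, 7500, 8000, 8500], [7400, 7300, 8300, 8100]),
}

# Evaluating each voyage separately shows how prediction quality varies by voyage.
for name, (measured, predicted) in voyages.items():
    mae = mean_absolute_error(measured, predicted)
    r2 = r2_score(measured, predicted)
    print(f"{name}: MAE = {mae:.0f} kW, R2 = {r2:.3f}")
```

MAE is expressed in the same unit as the target (here kW), whereas R² is dimensionless, so reporting both gives complementary views of model quality.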
The navigational environment, such as wind direction and speed and water velocity and depth, is the main factor affecting the energy efficiency of inland vessels (Fig. 4). Because the inland navigational environment is complex, it is hard to identify the optimum speed that gives the best energy efficiency under varying environmental conditions. Dividing routes according to the characteristics of the environmental factors can offer a satisfactory way to optimize engine speed under different navigation conditions. Yan et al. [52] applied a distributed parallel k-means clustering algorithm, running on a self-developed big data analytics platform, to divide routes in detail by analyzing the corresponding environmental factors. A vessel energy efficiency optimization model that takes the various environmental factors into account was then established by analyzing energy transfer among the main engine, propeller, and hull. Next, the optimum engine speed was determined for the different segments of the voyage, and a case study on the Yangtze River was presented to validate the optimization approach. According to the results, the approach can effectively lower the energy consumption and carbon dioxide emissions of vessels.
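The route division step can be illustrated with a small, single-machine k-means sketch. Yan et al. [52] used a distributed parallel implementation on their own platform, which is not reproduced here. The segment features (water depth in metres, current speed in m/s), their values, and all function names are hypothetical and chosen only to show the mechanics.

```python
import math


def standardise(rows):
    """Scale each feature to zero mean and unit variance so no feature dominates."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [tuple((x - m) / s for x, m, s in zip(r, means, stds)) for r in rows]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_point_init(points, k):
    """Deterministic initialisation: repeatedly pick the point farthest from chosen centres."""
    centres = [points[0]]
    while len(centres) < k:
        centres.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centres)))
    return centres


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm: alternate assignment and centre update until labels stop changing."""
    centres = farthest_point_init(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centres[j])) for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centres[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centres


# Hypothetical route segments: (water depth in m, current speed in m/s).
segments = [
    (5.0, 0.50), (5.5, 0.60), (4.8, 0.55),      # shallow, slow water
    (12.0, 1.50), (11.5, 1.40), (12.5, 1.60),   # medium depth, fast water
    (20.0, 0.80), (21.0, 0.90), (19.5, 0.85),   # deep, moderate water
]
labels, _ = kmeans(standardise(segments), k=3)
print(labels)
```

Segments that share a label would then be assigned a common optimum engine speed. In practice the number of clusters and the feature set would need to be chosen from the data rather than fixed in advance.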
An intelligent vessel can gather large amounts of data on weather, machinery, and voyages, so big data analysis plays a significant role for smart vessels: it can be widely adopted to improve equipment life management, operational efficiency, and ship maintenance. Jeon et al. [53] proposed a precise regression model for the main engine's fuel consumption using an artificial neural network (ANN) together with big data analysis steps such as data collection, compression, expansion, and clustering. To obtain an accurate regression model, the authors tested different numbers of hidden layers and neurons and various kinds of activation functions, and examined their impact on the efficiency and accuracy of the regression analysis. The resulting ANN regression model predicted main engine fuel consumption more efficiently and accurately than support vector machine and polynomial regression.
Normally, only a very small portion of the operational data tracked on OSVs, such as environmental and sensor data, was actually used. Alongside design and construction data and equipment performance data, operational data contributed to a vast body of data with high diversity and veracity. In many instances, it was not well understood how to use this richness of data more effectively in design and operation. Very often, the final operational performance of a ship design solution was assessed in the initial design phase using models and model tests, which required considerable time and money. Abbasian et al. [54] argued that there was potential in integrating vessel lifetime data from distinct operation stages into big data storage for deliberate evaluation. Moreover, assessing the performance of real, comparable ships in the initial phases of the design process improved the performance criteria for later generations of ship designs. Developing know-how from such a data source required recent data mining techniques, including the clustering and big data concepts discussed in their article, to find useful trends and connections between actual fleet performance data and design parameters. The proposed analytics model covered all stages of knowledge discovery from data, namely preprocessing, processing, and post-processing.
Marine engine-centered data analytics were presented as part of the SEEMP, which implemented emission control measures to enhance vessel energy efficiency by taking ship performance and navigation data into account. The data analytics were developed on the engine-propeller combinator diagram, that is, for a propeller shaft driven directly by the main engine. According to Perera and Mo [55], an initial data analysis identified three operating regions in the combinator diagram, and the proposed analytics captured the shape of those regions. GMMs were used to classify the most frequent operating regions of the main engine, and the EM algorithm estimated the GMM parameters. With this iterative clustering algorithm, the operating regions of the main engine could be identified together with their respective mean vectors and covariance matrices; navigation conditions and ship performance could therefore be tracked with respect to the engine operating regions as part of the SEEMP. Moreover, it was expected that advanced mathematical models would be developed to monitor vessel performance within these operating regions, that is, within the data clusters.
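How EM fits a GMM to engine operating points can be sketched as follows. This is a simplified one-dimensional illustration: the analysis of Perera and Mo [55] works on the two-dimensional combinator diagram with full covariance matrices, whereas this sketch assumes a single variable (shaft speed). The function names, the initialization scheme, and the sample values are hypothetical.

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a univariate normal distribution."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def fit_gmm_1d(data, k=2, iterations=100, min_var=1e-6):
    """Fit a 1-D Gaussian mixture with EM; requires k >= 2."""
    n = len(data)
    lo, hi = min(data), max(data)
    # Initialization: means spread between min and max, shared variance, equal weights.
    means = [lo + (hi - lo) * j / (k - 1) for j in range(k)]
    overall_mean = sum(data) / n
    overall_var = sum((x - overall_mean) ** 2 for x in data) / n
    variances = [max(overall_var, min_var)] * k
    weights = [1.0 / k] * k
    for _ in range(iterations):
        # E-step: responsibility of each component for each observation.
        resp = []
        for x in data:
            dens = [weights[j] * gaussian_pdf(x, means[j], variances[j])
                    for j in range(k)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: re-estimate weight, mean, and variance of each component.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var_j, min_var)  # floor avoids a collapsed component
            weights[j] = nj / n
    return weights, means, variances

# Hypothetical shaft speeds (rpm) logged around two operating regions.
shaft_rpm = [58, 60, 62, 59, 61, 88, 90, 92, 89, 91]
weights, means, variances = fit_gmm_1d(shaft_rpm, k=2)
```

Each fitted component (weight, mean, variance) then stands for one frequently used operating region of the engine.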
With the global economy in a downturn and growing demands for energy saving and emission reduction, implementing effective energy management measures to save energy and reduce emissions was a major challenge for the development of shipping technology. Mining and analyzing big data, however, offered a new approach to studying the optimization of vessel energy efficiency management. Wang et al. [56] designed a big data analysis platform based on the widely used Hadoop architecture. Because the volume of data related to energy efficiency management exceeded the processing capabilities of conventional methods, big data analysis techniques were used to divide routes according to environmental conditions, laying a foundation for optimizing speed on the different segments of a route. Finally, a simple method for determining the optimum engine speed was presented based on the route division results, enhancing the energy efficiency of the vessel and reducing CO2 emissions.

3) Fleet optimization
Optimization methods were applied to maritime operations such as fleet management, vessel scheduling and routing, bunkering, and disruption handling [57] [58]. The literature on container liner shipping concerning network design, fleet management, and container routing was reviewed by Tran and Haasis [59], while Tang et al. [60] examined the available research on maritime operations from the viewpoint of decision support and sustainability. Speed optimization was one of the significant issues for sustainable maritime operations, since fuel consumption, which is determined by ship speed, directly influenced CO2 emissions. According to Niazian et al. [61] and Jensen et al. [62], early research on speed optimization assumed strict time windows and port times. These models constrained vessels to arrive within the contractual time windows at a 100 percent service level, which was too strong an assumption in reality: according to Lee et al. [44], only 55-89% of ships arrived at ports on time. Because travel times and port times could change with weather conditions, handling, and congestion [63], Aydin et al. [63] extended the speed optimization problem by considering uncertainties on routes and at ports. Additionally, Chen et al. [64] presented a ship scheduling model that minimized total fuel expense under uncertain port times and frequency requirements. In their formulation, the port time window constraints were relaxed, so ships were permitted to arrive at any time. The problem was further extended by Zhu et al. [63] to include bunkering decisions and time windows.
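The strict time window setting can be made concrete with a minimal sketch. It assumes the common approximation that hourly fuel consumption grows with the cube of speed, so fuel per leg grows with the square of speed and the slowest speed that still meets the window is the cheapest. The function names, the coefficient, and the leg data are hypothetical and are not taken from the cited models.

```python
def leg_fuel_tonnes(distance_nm, speed_kn, coeff=0.001):
    """Fuel for one leg, assuming hourly consumption = coeff * speed**3.

    coeff is a hypothetical ship-specific constant (tonnes per hour per knot cubed).
    """
    hours = distance_nm / speed_kn
    return coeff * speed_kn ** 3 * hours

def slowest_feasible_speed(distance_nm, window_h, v_min, v_max):
    """Lowest speed (knots) that arrives within a strict time window.

    Returns None when even v_max cannot meet the window.
    """
    required = distance_nm / window_h
    if required > v_max:
        return None
    return max(v_min, required)

# Hypothetical leg: 480 nautical miles, arrival required within 40 hours.
speed = slowest_feasible_speed(480, 40, v_min=10, v_max=20)
fuel = leg_fuel_tonnes(480, speed)
```

Under this assumption, relaxing the window (a longer window_h) can only lower the chosen speed and hence the fuel used, which is why formulations with relaxed windows tend to report lower fuel costs.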
Varelas and Plitsos [65] illustrated a case from the shipping industry that used big data for stream processing, vessel route optimization, analytics, alerting, and tracking. The case covered business process modeling, the management and control of the infrastructure, and the design and deployment of targeted services that required different parameterization and implementation for different stakeholders. Based on an analysis of user roles and domain requirements, BigDataStack was shown to facilitate, support, and incorporate all of these requirements. In the global merchant fleet, bulk carriers generally sailed from a loading port to a discharge port before running empty to the next loading port. Freight rates in the bulk trades varied significantly with supply and demand; hence, proper scheduling of bulk vessels had tremendous potential to increase the profitability and economic efficiency of shipping. Li, Qi, and Song [64] considered a decision support system for vessel scheduling optimization. The popular optimization models for vessel scheduling were briefly reviewed and their underlying concepts were classified. Next, a prototype model-based DSS for ship scheduling, called MoDiSS, was built on a PC with an appropriate GUI. The system was tested and assessed under different scheduling scenarios, and its efficacy was satisfactorily confirmed.
Finally, AI and big data were employed from various points of view in maritime shipping to attain better energy efficiency [66]. A large number of papers concentrated on ship speed optimization [44] [52], while others addressed vessel crane control [67] and route planning [68]. On a larger scale, Brouer et al. [69] jointly optimized the container shipping service network and sailing speed. To improve ship energy efficiency, slow steaming was often considered the best practice by liner firms. Because weather conditions were not considered in most of the available work on ship speed optimization, Lee et al. [44] used big data together with particle swarm optimization to minimize fuel consumption and maximize the SLA. Yan et al. [52], in turn, identified the optimum speed for inland vessels by adopting the distributed parallel k-means clustering algorithm, and their method also helped reduce ship energy consumption and CO2 emissions.
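Particle swarm optimization for speed selection can be sketched in a few lines. This is a generic textbook PSO in one dimension, not the formulation of Lee et al. [44]: the cost function (a cubic-law fuel term plus a lateness penalty standing in for the SLA), all coefficients, and the function names are hypothetical, and weather effects are left out.

```python
import random

def voyage_cost(speed_kn, distance_nm=480.0, window_h=40.0,
                fuel_coeff=0.001, late_penalty=100.0):
    """Toy cost: fuel (hourly use assumed = fuel_coeff * speed**3) plus a lateness penalty."""
    hours = distance_nm / speed_kn
    fuel = fuel_coeff * speed_kn ** 3 * hours
    lateness = max(0.0, hours - window_h)
    return fuel + late_penalty * lateness

def pso_minimize(objective, lower, upper, particles=20, iterations=100, seed=0):
    """Minimize a one-dimensional objective on [lower, upper] with basic PSO."""
    rng = random.Random(seed)
    inertia, c_personal, c_global = 0.7, 1.5, 1.5  # common textbook settings
    pos = [rng.uniform(lower, upper) for _ in range(particles)]
    vel = [0.0] * particles
    best_pos = pos[:]                       # best position seen by each particle
    best_val = [objective(p) for p in pos]
    g = min(range(particles), key=lambda i: best_val[i])
    g_pos, g_val = best_pos[g], best_val[g]  # best position seen by the swarm
    for _ in range(iterations):
        for i in range(particles):
            r1, r2 = rng.random(), rng.random()
            vel[i] = (inertia * vel[i]
                      + c_personal * r1 * (best_pos[i] - pos[i])
                      + c_global * r2 * (g_pos - pos[i]))
            # Clamp to the feasible speed range.
            pos[i] = min(upper, max(lower, pos[i] + vel[i]))
            val = objective(pos[i])
            if val < best_val[i]:
                best_pos[i], best_val[i] = pos[i], val
                if val < g_val:
                    g_pos, g_val = pos[i], val
    return g_pos, g_val

best_speed, best_cost = pso_minimize(voyage_cost, lower=8.0, upper=20.0)
```

With these toy numbers the cost is lowest at the speed that just meets the 40-hour window (12 knots), so the swarm should settle close to that value. A realistic objective would add weather-dependent resistance and several legs, which is where a population-based search becomes more useful than a closed-form rule.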

4) Predictive analysis
Although machine learning algorithms dominated, predictive analysis in the maritime industry covered a wide variety of applications, from predicting ship propulsion failures to forecasting toxic algal blooms in coastal waters. Modern vessels carried many sensors that gathered data on variables such as pressure, flow rate, and temperature, yet this enormous volume of data was not used in decision making (Fig. 4). Based on such data, Coraddu et al. [70] predicted hidden failures of the ship propulsion system and found that SVM outperformed RLS. Additionally, Coraddu et al. [71] used two unsupervised machine learning models to predict the condition of the ship hull, finding that OCSVM and GKNN both worked efficiently and achieved similar prediction accuracy.
Sensor technology was widely used in the shipping industry, enabling real-time monitoring and control of systems and processes. The various sensor systems on a vessel could generate an immense amount of data, which was difficult to process and became a burden for the industry. Big data analysis could facilitate tasks such as predicting ship performance and monitoring emissions. Moreover, an auto-mode detection system could automatically identify the operational mode of the ship from sensor data, namely GPS [72] and flow meter [73] readings. With such a system, the crew did not need to update the mode manually whenever the operational state of the vessel changed [73]. By analyzing real-time data on speed, distance, and fuel consumption, the auto-mode detection system worked without human involvement and summarized the energy consumption of each engine, the emissions in the various modes, and the vessel running hours. These details could be used by onshore and onboard staff to measure KPIs and ship operational performance. The system also allowed vessel operators to comply with the EU MRV regulations by monitoring fuel consumption and emissions in the various ship modes [74] [27]. Maintenance needs could be detected so that measurable and detectable potential failures could be avoided. Additionally, all data were recorded and the risk of breakdown could be indicated by synchronizing the relevant data (e.g. running hours, fuel consumption, and engine data); hence, the cost of broken parts was reduced and unscheduled downtime was minimized. The system relied on machine condition monitoring to reveal what required maintenance, and when, before a breakdown occurred, so that no additional time was needed for scheduling and maintenance activities.
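The auto-mode detection idea can be sketched with a simple rule-based classifier. The cited systems [72] [73] are not described in enough detail here to reproduce, so the mode names, the speed thresholds, and the sample readings below are all hypothetical; the sketch only shows how GPS speed and flow meter readings could be turned into per-mode running hours and fuel totals.

```python
from collections import defaultdict

# Hypothetical thresholds, chosen only for illustration.
SAILING_MIN_SPEED_KN = 3.0
MANEUVERING_MIN_SPEED_KN = 0.5

def detect_mode(speed_kn, fuel_flow_kg_h):
    """Classify one sample from GPS speed and flow meter reading."""
    if speed_kn >= SAILING_MIN_SPEED_KN:
        return "sailing"
    if speed_kn >= MANEUVERING_MIN_SPEED_KN:
        return "maneuvering"
    return "stationary, engine running" if fuel_flow_kg_h > 0 else "stationary, engine off"

def summarize_by_mode(samples, interval_h):
    """Accumulate running hours and fuel per mode.

    samples: iterable of (speed_kn, fuel_flow_kg_h) taken every interval_h hours.
    """
    summary = defaultdict(lambda: {"hours": 0.0, "fuel_kg": 0.0})
    for speed_kn, fuel_flow_kg_h in samples:
        mode = detect_mode(speed_kn, fuel_flow_kg_h)
        summary[mode]["hours"] += interval_h
        summary[mode]["fuel_kg"] += fuel_flow_kg_h * interval_h
    return dict(summary)

# Hypothetical half-hourly readings: (speed in knots, fuel flow in kg/h).
readings = [(12.0, 400.0), (11.5, 380.0), (1.0, 100.0), (0.0, 20.0), (0.0, 0.0)]
report = summarize_by_mode(readings, interval_h=0.5)
```

Multiplying the per-mode fuel totals by an emission factor would give per-mode emission figures of the kind needed for regulatory reporting.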
In the maritime industry, innovation was a slow, incremental process. In this sub-cluster, Lam and Zhang [75] presented customer-centric solutions and the obstacles to them. The concept of the IoV was considered by Tian et al. [76], in which all primary technologies were integrated into one platform. The IoV (analogous to the IoT) was a network of intelligent, interconnected ships and shore facilities built on a chain of digital entities. Zhang and Lam [77] then proposed seven design requirements and identified their impacts on customer value in liner shipping. Based on their findings, the three most effective solutions were container technology, eco vessels, and big data solutions for managing vessel information and system automation. It was also pointed out that the three major challenges for maritime companies adopting big data analytics were a lack of knowledge about how to improve business with analytics, a lack of executive sponsorship, and a lack of skills. Additionally, using historical information to assess the characteristics of new vessels or operating regulations was another important opportunity from the Big Data perspective. One important application was the virtual prototyping and simulation of novel vessel designs under proven, historically confirmed operating conditions [78] [79].
Although the maritime industry continued to rely primarily on a time-based, prescriptive maintenance approach, higher expectations and competitive requirements regarding ship availability and performance, together with the effect of the data revolution on ship operations, favored a properly organized onboard condition-based maintenance (CBM) regime. According to Raptodimos and Lazakis [80], ANNs could also be used to develop predictive maintenance strategies that helped decision-makers select suitable maintenance actions for critical vessel machinery. A NARX ANN was developed in their study to predict the exhaust gas outlet temperature of a marine main engine cylinder. A detailed analysis was carried out to check the robustness and performance of the NARX model on time series data, showing good forecasting performance and generalization capability as well as the suitability of the model for monitoring and prognostic applications. In another application, Mojtahedzadeh et al. [81] examined a robotic system in a real-life configuration that could safely remove items from randomly ordered containers. Where information on the goods was incomplete, a machine learning model based on a probabilistic approach functioned well (ibid). Berbić et al. [82] found that SVM was more accurate than the ANN in forecasting real-time wave heights. A random forest model was used to forecast the occurrence of the PSP toxic dinoflagellate Alexandrium minutum in the waters of the northwestern Adriatic Sea, with a prediction accuracy above 80 percent [83].
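The input structure of a NARX model can be shown with a short sketch. A NARX model predicts the current output from past outputs and past exogenous inputs, y(t) = f(y(t-1), ..., y(t-n_y), u(t-1), ..., u(t-n_u)); the sketch below only builds these lagged regressor rows and leaves out the neural network f itself. The function name, the lag orders, the choice of engine load as the exogenous input, and the sample values are hypothetical and do not reproduce the setup of Raptodimos and Lazakis [80].

```python
def narx_regressors(y, u, y_lags=2, u_lags=2):
    """Build NARX training pairs from an output series y and an exogenous series u.

    Each row is [y(t-1), ..., y(t-y_lags), u(t-1), ..., u(t-u_lags)] and the
    matching target is y(t).
    """
    if len(y) != len(u):
        raise ValueError("y and u must have the same length")
    start = max(y_lags, u_lags)  # first index with a full history
    rows, targets = [], []
    for t in range(start, len(y)):
        past_y = [y[t - i] for i in range(1, y_lags + 1)]
        past_u = [u[t - i] for i in range(1, u_lags + 1)]
        rows.append(past_y + past_u)
        targets.append(y[t])
    return rows, targets

# Hypothetical series: cylinder exhaust gas temperature (deg C) and engine load (%).
exhaust_temp = [300, 305, 310, 318, 325]
engine_load = [50, 55, 60, 70, 75]
rows, targets = narx_regressors(exhaust_temp, engine_load)
```

The rows and targets would then be passed to any nonlinear regressor, such as a feedforward ANN, to learn f. For multi-step forecasting, predicted outputs are fed back in place of the measured past values.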

IV. CONCLUSION
The Fourth Industrial Revolution is providing valuable opportunities for the shipping industry to digitize its core structure, management systems, and strategic policies. Big Data is emerging as an inevitable trend for improving freight capacity and reducing ocean freight rates under the pressure of strict emission policies from the IMO. This article describes the challenges and opportunities that the shipping industry faces when applying advances in the Internet of Things and Big Data. Arguably, the biggest obstacle for the shipping industry in using Big Data to digitize its management systems is the modernization of its infrastructure and fleet. Integrating information technology, with smart sensors and wireless networks, systematically into the core of shipping companies' management and control systems at the design stage is seen as a prerequisite for the successful deployment of maritime big data. By analyzing the results of recent publications, this review also clarifies the typical applications of Big Data, such as online ship decision making, optimization of operation and performance, optimal fleet management, and predictive analysis. An overall picture of the digitalization of the shipping industry is thereby reflected in the degree to which Big Data is applied to energy management systems, routes, speeds, freight rates, cargo capacity, and maritime security. As Big Data is deployed widely across the globe, maritime freight has a promising future in terms of cargo capacity, reliability, safety, and optimal energy use.