Using Variation in Weighting Criteria and String Size Matching on Hybrid Model Schema Matching

Edhy Sutanta, Erna Kumalasari Nurnawati, Rosalia Arum Kumalasanti

Abstract


Schema matching plays a vital role in integrating information from heterogeneous databases. In general, a schema matching process takes two databases as input (one as the source and the other as the target), matches similar attributes, and outputs a mapping of the attribute pairs judged to be similar. The user then assesses these attribute pairs to determine whether the results are correct or still need revision. Our previous study developed a model and software prototype for hybrid schema matching that combines a constraint-based method with an instance-based method. In this study, the model is improved by adding new features. This paper discusses the gain in effectiveness from adding features that customize the weights of the matching criteria and the string size used in matching. The hybrid model is most effective when the weight of instance is 0.286, type is 0.238, width is 0.190, nullable is 0.143, unique is 0.095, and domain is 0.048. Matching with a larger string size interval increases the model's effectiveness, with the highest precision of 97.66% obtained when the string size interval is between (length-100) and (length+100). The best combination of weight and string size variation obtains a precision of 97.66%, a recall of 99.90%, and an f-measure of 98.74%.
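
As a rough illustration of the two features described in the abstract, the sketch below combines per-criterion similarity scores using the reported weights and filters candidate attributes with a string size window of length plus or minus 100. The abstract does not give the paper's formulas, so the normalized weighted sum, the function names (weighted_similarity, within_size_window, f_measure), and the way the interval is applied are all assumptions made for illustration, not the authors' implementation.

```python
# Weights reported in the abstract for the best-performing configuration.
WEIGHTS = {
    "instance": 0.286,
    "type": 0.238,
    "width": 0.190,
    "nullable": 0.143,
    "unique": 0.095,
    "domain": 0.048,
}


def weighted_similarity(criterion_scores, weights=WEIGHTS):
    """Combine per-criterion scores (each in [0, 1]) into a single score.

    Assumption: a normalized weighted sum. The paper may combine the
    criteria differently. Missing criteria count as 0.0.
    """
    total_weight = sum(weights.values())
    weighted = sum(
        weight * criterion_scores.get(name, 0.0)
        for name, weight in weights.items()
    )
    return weighted / total_weight


def within_size_window(source_length, target_length, delta=100):
    """Hypothetical size filter: keep a target attribute only if its string
    size lies in [source_length - delta, source_length + delta]."""
    return source_length - delta <= target_length <= source_length + delta


def f_measure(precision, recall):
    """Standard F1: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Made-up scores for one candidate attribute pair.
    scores = {"instance": 0.9, "type": 1.0, "width": 1.0,
              "nullable": 1.0, "unique": 0.0, "domain": 1.0}
    print(round(weighted_similarity(scores), 3))
    print(within_size_window(50, 140), within_size_window(50, 151))
    print(round(f_measure(97.66, 99.90), 2))
```

The six reported weights sum to 1.000, so the normalization step only matters if a different weight set is supplied. The harmonic mean of the reported precision and recall comes to roughly 98.77, close to the reported f-measure of 98.74%; the small gap may come from rounding or from how the paper aggregates its results, which the abstract does not state.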


Keywords


criteria; effectiveness; hybrid schema matching; string size variation; weight variation





DOI: http://dx.doi.org/10.18517/ijaseit.11.1.6650




Published by INSIGHT - Indonesian Society for Knowledge and Human Development