Browsing by Author "Coskun, Mustafa"

Consensus embedding for multiple networks: Computation and applications
(CAMBRIDGE UNIV PRESS, 32 AVENUE OF THE AMERICAS, NEW YORK, NY 10013-2473, 2022) Li, Mengzhen; Coskun, Mustafa; Koyuturk, Mehmet; 0000-0002-2266-4313; AGÜ, Mühendislik Fakültesi, Bilgisayar Mühendisliği Bölümü; Coşkun, Mustafa
Machine learning applications on large-scale network-structured data commonly encode network information in the form of node embeddings. Network embedding algorithms map the nodes into a low-dimensional space such that the nodes that are “similar” with respect to network topology are also close to each other in the embedding space. Real-world networks often have multiple versions or can be “multiplex” with multiple types of edges with different semantics. For such networks, computation of consensus embeddings based on the node embeddings of individual versions can be useful for various reasons, including privacy, efficiency, and effectiveness of analyses. Here, we systematically investigate the performance of three dimensionality reduction methods in computing consensus embeddings on networks with multiple versions: singular value decomposition, variational auto-encoders, and canonical correlation analysis (CCA). Our results show that (i) CCA outperforms other dimensionality reduction methods in computing consensus embeddings, (ii) in the context of link prediction, consensus embeddings can be used to make predictions with accuracy close to that provided by embeddings of integrated networks, and (iii) consensus embeddings can be used to improve the efficiency of combinatorial link prediction queries on multiple networks by multiple orders of magnitude.
Developing a label propagation approach for cancer subtype classification problem
(TUBITAK SCIENTIFIC & TECHNICAL RESEARCH COUNCIL TURKEY, ATATURK BULVARI NO 221, KAVAKLIDERE, ANKARA 00000, TURKEY, 2022) Guner, Pinar; Bakir-Gungor, Burcu; Coskun, Mustafa; 0000-0001-5979-0375; 0000-0002-2272-6270; 0000-0003-4805-1416; AGÜ, Mühendislik Fakültesi, Bilgisayar Mühendisliği Bölümü; Guner, Pinar; Bakir-Gungor, Burcu; Coskun, Mustafa
Cancer is a disease in which abnormal cells grow uncontrollably and invade other tissues. Several types of cancer have various subtypes with different clinical and biological implications. Based on these differences, treatment methods need to be customized. The identification of distinct cancer subtypes is an important problem in bioinformatics, since it can guide future precision medicine applications. In order to design targeted treatments, bioinformatics methods attempt to discover the common molecular pathology of different cancer subtypes. Along this line, several computational methods have been proposed to discover cancer subtypes or to stratify cancer into informative subtypes. However, existing works do not consider the sparseness of the data (genes having low degrees) and result in an ill-conditioned solution. To address this shortcoming, in this paper, we propose an alternative unsupervised method to stratify cancer patients into subtypes using applied numerical algebra techniques. More specifically, we applied a label propagation-based approach to stratify somatic mutation profiles of colon, head and neck, uterine, bladder, and breast tumors. We evaluated the performance of our method by comparing it to the baseline methods. Extensive experiments demonstrate that our approach performs strongly on tumor classification tasks, largely outperforming state-of-the-art unsupervised and supervised approaches.
Fast computation of Katz index for efficient processing of link prediction queries
(SPRINGER, VAN GODEWIJCKSTRAAT 30, 3311 GZ DORDRECHT, NETHERLANDS, 2021) Coskun, Mustafa; Baggag, Abdelkader; Koyuturk, Mehmet; 0000-0003-4805-1416; AGÜ, Mühendislik Fakültesi, Bilgisayar Mühendisliği Bölümü; Coskun, Mustafa
Network proximity computations are among the most common operations in various data mining applications, including link prediction and collaborative filtering. A common measure of network proximity is the Katz index, which has been shown to be among the best-performing path-based link prediction algorithms. With the emergence of very large network databases, such proximity computations become an important part of query processing in these databases. Consequently, significant effort has been devoted to developing algorithms for efficient computation of the Katz index between a given pair of nodes or between a query node and every other node in the network. Here, we present LRC-Katz, an algorithm based on indexing and low-rank correction to accelerate Katz index-based network proximity queries. Using a variety of very large real-world networks, we show that LRC-Katz outperforms the fastest existing method, Conjugate Gradient, for a wide range of parameter values. Taking advantage of the acceleration in the computation of the Katz index, we propose a new link prediction algorithm that exploits the locality of networks that are encountered in practical applications. Our experiments show that the resulting link prediction algorithm drastically outperforms state-of-the-art link prediction methods based on the vanilla and truncated Katz.
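For readers unfamiliar with the measure, the Katz index sums the walks between two nodes over all lengths, damping walks of length l by beta^l; the series converges when beta is smaller than the reciprocal of the largest eigenvalue of the adjacency matrix. The sketch below computes the truncated form for a single query node in plain Python. It is only a baseline illustration and not the LRC-Katz algorithm; the function name, the toy graph, and the parameter values are assumptions made for this example.

```python
def truncated_katz(adj, source, beta=0.1, max_len=10):
    """Truncated Katz scores from `source` to every node.

    adj: dict mapping each node to a list of its neighbors (undirected graph).
    Only walks of length 1..max_len are counted.
    """
    # walks[v] holds beta^l * (number of length-l walks from source to v).
    walks = {v: 0.0 for v in adj}
    walks[source] = 1.0
    scores = {v: 0.0 for v in adj}
    for _ in range(max_len):
        nxt = {v: 0.0 for v in adj}
        for u, weight in walks.items():
            if weight == 0.0:
                continue
            for v in adj[u]:
                nxt[v] += beta * weight
        walks = nxt
        # Accumulate the contribution of walks of the current length.
        for v, weight in walks.items():
            scores[v] += weight
    return scores


# Hypothetical toy graph: the path 0 - 1 - 2 - 3.
graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
katz_scores = truncated_katz(graph, source=0, beta=0.1, max_len=10)
```

On this path graph the scores decay with distance from the query node, which is the locality that truncated variants rely on.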
Integrated querying and version control of context-specific biological networks
(OXFORD UNIV PRESS, GREAT CLARENDON ST, OXFORD OX2 6DP, ENGLAND, 2020) Cowman, Tyler; Coskun, Mustafa; Grama, Ananth; Koyuturk, Mehmet; AGÜ, Mühendislik Fakültesi, Bilgisayar Mühendisliği Bölümü;
Motivation: Biomolecular data stored in public databases is increasingly specialized to organisms, context/pathology and tissue type, potentially resulting in significant overhead for analyses. These networks are often specializations of generic interaction sets, presenting opportunities for reducing storage and computational cost. Therefore, it is desirable to develop effective compression and storage techniques, along with efficient algorithms and a flexible query interface capable of operating on compressed data structures. Current graph databases offer varying levels of support for network integration. However, these solutions do not provide efficient methods for the storage and querying of versioned networks. Results: We present VerTIoN, a framework consisting of novel data structures and associated query mechanisms for integrated querying of versioned context-specific biological networks. As a use case for our framework, we study network proximity queries in which the user can select and compose a combination of tissue-specific and generic networks. Using our compressed version tree data structure, in conjunction with state-of-the-art numerical techniques, we demonstrate real-time querying of large network databases. Conclusion: Our results show that it is possible to support flexible queries defined on heterogeneous networks composed at query time while drastically reducing response time for multiple simultaneous queries. The flexibility offered by VerTIoN in composing integrated network versions opens significant new avenues for the utilization of the ever-increasing volume of context-specific network data in a broad range of biomedical applications. Availability and Implementation: VerTIoN is implemented as a C++ library and is available at http://compbio.case.edu/omics/software/vertion and https://github.com/tjcowman/vertion Contact: tyler.cowman@case.edu
Intrinsic graph topological correlation for graph convolutional network propagation
(ELSEVIER, 2022) Coskun, Mustafa; AGÜ, Mühendislik Fakültesi, Bilgisayar Mühendisliği Bölümü; Coskun, Mustafa
Recently, Graph Convolutional Networks (GCNs) and their variants have become popular for learning graph-related tasks. These tasks include link prediction, node classification, and node embedding, among many others. In the node classification problem, the input is a graph with some labeled nodes and the features associated with these nodes, and the objective is to predict the labels of the unlabeled nodes. While GCNs have been successfully applied to this problem, some caveats that are inherited from classical deep learning remain unsolved. One such inherited caveat is that, during classification, GCNs only consider the nodes that are a few neighbors away from the labeled nodes. However, considering only nodes that are a few steps away cannot effectively exploit the underlying graph topological information. To remedy this problem, state-of-the-art methods leverage network diffusion approaches, such as personalized PageRank and its variants, to fully account for the graph topology. However, these approaches overlook the fact that network diffusion methods favour high-degree nodes in the graph, resulting in the propagation of the labels to the unlabeled hub nodes. To overcome this bias, in this paper, we propose to utilize a dimensionality reduction technique, which is conjugate with personalized PageRank. Testing on four real-world networks that are commonly used in benchmarking GCNs' performance for the node classification task, we systematically evaluate the performance of the proposed methodology and show that our approach outperforms existing methods for wide ranges of parameter values. Since our method requires only a few training epochs, it relieves the heavy training burden of GCNs. The source code of the proposed method is freely available at https://github.com/mustafaCoskunAgu/ScNP/blob/master/TRJMain.m.
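The degree bias mentioned in this abstract can be seen with a small personalized PageRank computation. The sketch below uses plain power iteration, repeatedly applying p = alpha * e_seed + (1 - alpha) * P^T p, where P is the row-normalized adjacency matrix. It illustrates the standard diffusion only, not the method proposed in the paper; the function name, the restart probability, and the toy star graph are assumptions for this example, and every node is assumed to have at least one neighbor.

```python
def personalized_pagerank(adj, seed, alpha=0.15, iters=100):
    """Personalized PageRank scores with restart to `seed`.

    adj: dict mapping each node to a list of its neighbors.
    alpha: restart probability (assumed value, not taken from the paper).
    """
    nodes = list(adj)
    p = {v: 0.0 for v in nodes}
    p[seed] = 1.0
    for _ in range(iters):
        nxt = {v: 0.0 for v in nodes}
        for u in nodes:
            # Each node spreads its non-restarting mass evenly to its neighbors.
            share = (1.0 - alpha) * p[u] / len(adj[u])
            for v in adj[u]:
                nxt[v] += share
        # The restart mass always returns to the seed node.
        nxt[seed] += alpha
        p = nxt
    return p


# Hypothetical star graph: hub 0 connected to leaves 1, 2 and 3.
star = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
ppr = personalized_pagerank(star, seed=1)
```

Even though the walk restarts at leaf 1, the hub ends up with a higher score than the other leaves, which is the kind of preference for high-degree nodes the abstract refers to.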
Linear vs. Non-Linear Embedding Methods in Recommendation Systems
(Institute of Electrical and Electronics Engineers Inc., 2022) Gurler, Kerem; Coskun, Mustafa; Karagenc, Safak; Orun, Gokhan; Pak, Burcu Kuleli; Gungor, Vehbi Cagri; 0000-0003-0803-8372; AGÜ, Mühendislik Fakültesi, Bilgisayar Mühendisliği Bölümü; Coskun, Mustafa; Pak, Burcu Kuleli; Gungor, Vehbi Cagri
Predicting customer interest in items is crucial in direct marketing, as it can potentially boost sales. Data mining techniques have been developed to predict which items a particular user might be interested in, based on their purchase history or explicit feedback in the form of ratings or comments. Recently, non-linear and linear methods have been developed for this purpose. In this study, we applied Neighborhood-based Collaborative Filtering (CF), Matrix Factorization (MF), Singular Value Decomposition (SVD), Neural Graph CF (NGCF) and Light Graph Convolutional Network (LightGCN) to explicit user product rating data acquired from the online gaming and mobile entertainment platform called HADI. We compared the results of the node embedding methods in terms of Precision@k, Recall@k and NDCG@k values. SVD and LightGCN showed the best test performance, and SVD was significantly superior to LightGCN in terms of training speed. To further increase the predictive performance of SVD, we applied classification with Logistic Regression and Deep Random Forest to the user and item embeddings created by the SVD.
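The three ranking metrics named in this abstract have standard definitions that can be sketched as follows, assuming binary relevance (an item is either relevant to the user or not). The function names and the example data are illustrative assumptions; the study's own evaluation code is not reproduced here.

```python
import math


def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k recommended items that are relevant."""
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / k


def recall_at_k(ranked, relevant, k):
    """Fraction of all relevant items that appear in the top k."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / len(relevant)


def ndcg_at_k(ranked, relevant, k):
    """Normalized discounted cumulative gain with binary relevance."""
    dcg = sum(
        1.0 / math.log2(rank + 2)
        for rank, item in enumerate(ranked[:k])
        if item in relevant
    )
    # Ideal ranking places all relevant items (up to k of them) first.
    ideal_hits = min(k, len(relevant))
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


# Hypothetical recommendation list and ground truth for one user.
ranked_items = ["a", "b", "c", "d"]
relevant_items = {"a", "c", "e"}
p3 = precision_at_k(ranked_items, relevant_items, 3)
r3 = recall_at_k(ranked_items, relevant_items, 3)
n3 = ndcg_at_k(ranked_items, relevant_items, 3)
```

Precision@k and Recall@k ignore the order within the top k, whereas NDCG@k rewards placing relevant items nearer the top of the list.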
Node similarity-based graph convolution for link prediction in biological networks
(OXFORD UNIV PRESS, GREAT CLARENDON ST, OXFORD OX2 6DP, ENGLAND, 2021) Coskun, Mustafa; Koyuturk, Mehmet; 0000-0003-4805-1416; AGÜ, Mühendislik Fakültesi, Bilgisayar Mühendisliği Bölümü; Coskun, Mustafa
Background: Link prediction is an important and well-studied problem in network biology. Recently, graph representation learning methods, including Graph Convolutional Network (GCN)-based node embedding, have drawn increasing attention in link prediction. Motivation: An important component of GCN-based network embedding is the convolution matrix, which is used to propagate features across the network. Existing algorithms use the degree-normalized adjacency matrix for this purpose, as this matrix is closely related to the graph Laplacian, capturing the spectral properties of the network. In parallel, it has been shown that GCNs with a single layer can generate more robust embeddings by reducing the number of parameters. Laplacian-based convolution is not well suited to single-layered GCNs, as it limits the propagation of information to immediate neighbors of a node. Results: Capitalizing on the rich literature on unsupervised link prediction, we propose using node similarity-based convolution matrices in GCNs to compute node embeddings for link prediction. We consider eight representative node-similarity measures (Common Neighbors, Jaccard Index, Adamic-Adar, Resource Allocation, Hub-Depressed Index, Hub-Promoted Index, Sorensen Index and Salton Index) for this purpose. We systematically compare the performance of the resulting algorithms against GCNs that use the degree-normalized adjacency matrix for convolution, as well as other link prediction algorithms. In our experiments, we use three link prediction tasks involving biomedical networks: drug-disease association prediction, drug-drug interaction prediction and protein-protein interaction prediction. Our results show that node similarity-based convolution matrices significantly improve the link prediction performance of GCN-based embeddings. Conclusion: As sophisticated machine-learning frameworks are increasingly employed in biological applications, historically well-established methods can be useful in making a head-start.
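The eight neighborhood-based measures listed in this abstract have widely used textbook definitions, sketched below for a single pair of nodes. This is only an illustration of the scores themselves; how the paper assembles them into a convolution matrix is not shown, and the function name, dictionary keys, and toy graph are assumptions for this example.

```python
import math


def similarity_scores(adj, u, v):
    """Neighborhood-based similarity scores for the node pair (u, v).

    adj: dict mapping each node to the set of its neighbors (undirected graph).
    """
    nu, nv = adj[u], adj[v]
    common = nu & nv
    union = nu | nv
    du, dv = len(nu), len(nv)
    cn = len(common)
    return {
        "common_neighbors": cn,
        "jaccard": cn / len(union) if union else 0.0,
        # Shared neighbors are weighted down by the log of their degree.
        "adamic_adar": sum(
            1.0 / math.log(len(adj[w])) for w in common if len(adj[w]) > 1
        ),
        # Shared neighbors are weighted down by their degree.
        "resource_allocation": sum(1.0 / len(adj[w]) for w in common),
        "hub_depressed": cn / max(du, dv) if max(du, dv) else 0.0,
        "hub_promoted": cn / min(du, dv) if min(du, dv) else 0.0,
        "sorensen": 2.0 * cn / (du + dv) if du + dv else 0.0,
        "salton": cn / math.sqrt(du * dv) if du * dv else 0.0,
    }


# Hypothetical toy graph with edges a-c, a-d, b-c, b-d, b-e, d-e.
toy = {
    "a": {"c", "d"},
    "b": {"c", "d", "e"},
    "c": {"a", "b"},
    "d": {"a", "b", "e"},
    "e": {"b", "d"},
}
scores_ab = similarity_scores(toy, "a", "b")
```

All eight scores depend only on the immediate neighborhoods of the two nodes, so they can be computed for every pair from the adjacency structure alone.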
OFFER: Referees Suggester for the Journal Editors
(IEEE, 345 E 47TH ST, NEW YORK, NY 10017 USA, 2019) Coskun, Mustafa; Hacilar, Hilal; Gezer, Cengiz; Gungor, Vehbi Cagri; AGÜ, Mühendislik Fakültesi, Bilgisayar Mühendisliği Bölümü
Assigning appropriate referees to a journal or conference paper is a vital task for many reasons, including enhancing the quality and reliability of the venue and ensuring fair judgement of the papers, among many others. When assigning referees to papers, the editors of a venue need to find suitable referees who are both related to the field of the given paper and have no conflict of interest with its authors. In editorial practice, this referee assignment process is carried out by hand, i.e., the editor needs to find the most suitable referees for the paper via a search engine and manually filter out all candidates who are unrelated to the paper or have a conflict of interest with its authors. Clearly, such a manual referee search is tedious and time-consuming for editors. In this paper, we present an alternative, automated approach to the referee assignment problem using an intrinsic random walk with restart proximity measure. In our experiments based on a comprehensive DBLP network, we show that our approach, called OFFER, significantly outperforms the state-of-the-art random walk with restart-based method.
Topological feature generation for link prediction in biological networks
(PEERJ INC, 2023) Temiz, Mustafa; Bakir-Gungor, Burcu; Sahan, Pinar Guner; Coskun, Mustafa; 0000-0002-5736-5495; 0000-0002-2272-6270; 0000-0001-5979-0375; AGÜ, Mühendislik Fakültesi, Bilgisayar Mühendisliği Bölümü; Temiz, Mustafa; Bakir-Gungor, Burcu; Sahan, Pinar Guner
Graph or network embedding is a powerful method for extracting missing or potential information from interactions between nodes in biological networks. Graph embedding methods learn representations of nodes and interactions in a graph with low-dimensional vectors, which facilitates research to predict potential interactions in networks. However, most graph embedding methods suffer from high computational costs in the form of high computational complexity of the embedding methods and learning times of the classifier, as well as the high dimensionality of complex biological networks. To address these challenges, in this study, we use the Chopper algorithm as an alternative approach to graph embedding, which accelerates the iterative processes and thus reduces the running time of the iterative algorithms for three different (nervous system, blood, heart) undirected protein-protein interaction (PPI) networks. Due to the high dimensionality of the matrix obtained after the embedding process, the data are transformed into a smaller representation by applying feature regularization techniques. We evaluated the performance of the proposed method by comparing it with state-of-the-art methods. Extensive experiments demonstrate that the proposed approach reduces the learning time of the classifier and performs better in link prediction. We have also shown that the proposed embedding method is faster than state-of-the-art methods on three different PPI datasets.
Traffic Light Management Systems Using Reinforcement Learning
(Institute of Electrical and Electronics Engineers Inc., 2022) Can, Sultan Kubra; Thahir, Adam; Coskun, Mustafa; Gungor, Vehbi Cagri; 0000-0003-0803-8372; AGÜ, Mühendislik Fakültesi, Bilgisayar Mühendisliği Bölümü; Can, Sultan Kubra; Thahir, Adam; Coskun, Mustafa; Gungor, Vehbi Cagri
Although they reduce traffic congestion and the number of traffic accidents at intersections, most traffic light management approaches cannot adapt well to fast-changing traffic dynamics and the growing demands placed on intersections by modern developments. To overcome this problem, adaptive traffic controllers have been developed, and detectors and sensors have been added to systems to enable adaptation and dynamism. Recently, reinforcement learning has shown its capability to learn the dynamics of complex environments, such as urban traffic. Although it has been studied in single-junction systems, one of the problems has been a lack of consistency with how real-world systems work. Most systems assume that the environment is fully observable or that actions can be freely executed using simulators. This study aims to merge the usefulness of reinforcement learning methods with real-world traffic constraints. Comparative performance evaluations show that the reinforcement learning algorithm (Advantage Actor-Critic (A2C)) converges well while staying stable under changing traffic dynamics.