Tomaž Hočevar (2017) *Counting small patterns in networks*. PhD thesis.


## Abstract

Networks are a widely used tool for visualizing and analyzing binary relationships: entities are represented as nodes and the relations between them as edges. In bioinformatics, one type of relation that is often modeled by networks is the interaction between pairs of proteins. Recent studies have focused on analyzing the local structure of such networks by observing small connected patterns consisting of 4 or 5 nodes, also known as graphlets. The nodes of graphlets are further divided into orbits by their "roles" or symmetries. The number of times a node from the network participates in each orbit forms a signature of the node's local network topology. Working under the assumption that a node's local topology is correlated with its function in the network, researchers have successfully used graphlets to predict new protein functions. The bottleneck of graphlet-based approaches is usually the time required to count the graphlets, and this restriction is becoming more pronounced as the amount of available data grows. This dissertation focuses on improving existing graphlet counting techniques, which are based on simple exhaustive enumeration. We present the algorithm Orca, which counts graphlets and their orbits instead of enumerating them. It exploits relations between orbit counts to construct a system of equations that can be set up efficiently. Orca achieves this by enumerating (k-1)-node graphlets to count k-node graphlets, effectively obtaining a speed-up by a factor proportional to the maximum degree of a node in the network. In practical terms, it counts graphlets in larger protein-protein interaction networks about 50-100 times faster. Orca was originally designed for counting node-orbits of graphlets with 4 and 5 nodes. We adapt the approach to count edge-orbits in addition to the original node-orbits, with the same gains in run time.
We also show that this approach can be generalized to graphlets of arbitrary size by identifying the necessary conditions and proving that they can be fulfilled even for larger graphlets. Finally, we consider the problem of generating random graphs with prescribed graphlet distributions. This motivated the adaptation of Orca to dynamic networks, in which edges can be added or removed. Such changes can be a consequence of the procedure for generating a random graph, or they can be inherent in the network and the process it models. The generated graphs closely match the desired graphlet counts and, as a consequence, approximate other structural measures as well. The developed algorithm is a valuable tool for graphlet-based network analysis and a significant stepping stone towards analyzing larger and denser networks. As the fastest graphlet counting method, it also provides a basis for further development of efficient pattern counting methods in graphs. This doctoral dissertation is based on three published papers, which together with a chapter containing some unpublished work form its core.
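The idea of deriving orbit counts from equations rather than enumerating every pattern can be illustrated on a much smaller case than the one the thesis treats. The following is a minimal sketch for 3-node graphlets only, not the Orca algorithm itself: the function name `three_node_orbits` and the adjacency-dictionary input format are assumptions made for this illustration. Triangles are counted directly, and the two path orbits are then obtained from node degrees and the triangle count without ever listing the paths.

```python
from itertools import combinations


def three_node_orbits(adj):
    """Per-node orbit counts for 3-node graphlets.

    adj maps each node to the set of its neighbours (undirected, no self-loops).
    Returns {node: (path_end, path_middle, triangle)}.
    Illustrative sketch only; this is not the Orca implementation.
    """
    orbits = {}
    for v, nbrs in adj.items():
        # Triangles through v: pairs of neighbours that are adjacent to each other.
        tri = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])

        # Equation 1: every pair of neighbours of v is either a triangle
        # or a path with v in the middle, so middle + tri = C(deg, 2).
        deg = len(nbrs)
        middle = deg * (deg - 1) // 2 - tri

        # Equation 2: walks v-u-w with w != v number sum(deg(u) - 1).
        # Each triangle through v is counted twice among them; the rest
        # are paths with v at one end.
        end = sum(len(adj[u]) - 1 for u in nbrs) - 2 * tri

        orbits[v] = (end, middle, tri)
    return orbits


# Example: a triangle 0-1-2 with a pendant node 3 attached to node 2.
example = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
for node, counts in sorted(three_node_orbits(example).items()):
    print(node, counts)
```

In this sketch the only pattern that is examined explicitly is the triangle; the remaining counts follow from two linear relations. The thesis applies the same principle at a larger scale, enumerating (k-1)-node graphlets to obtain k-node orbit counts.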

| Field | Value |
|---|---|
| Item Type | Thesis (PhD thesis) |
| Keywords | graphlets, orbits, network, graph, subgraph, pattern, counting |
| Number of Pages | 126 |
| Language of Content | English |
| Mentor / Comentors | |
| Link to COBISS | http://www.cobiss.si/scripts/cobiss?command=search&base=51012&select=(ID=1537692099) |
| Institution | University of Ljubljana |
| Department | Faculty of Computer and Information Science |
| Item ID | 4034 |
| Date Deposited | 09 Jan 2018 10:38 |
| Last Modified | 17 Jan 2018 09:59 |
| URI | http://eprints.fri.uni-lj.si/id/eprint/4034 |
