We are surrounded by continuously evolving financial, health, transportation, social, citation, and sensor networks. By modeling these networks as graphs whose vertices represent entities and whose edges represent relationships between entities, network evolution can be captured and harnessed for analytic purposes by recording periodic snapshots of these graphs. Data scientists can analyze these graph snapshots to discover trends and insights in many kinds of network data, including electronic health records, geolocation trackers, diagnostic data from connected devices (e.g., smart watches), and much more.
Graph snapshots allow us to analyze the evolution of a network over time by examining variations of certain features such as the distribution of vertex degrees and clustering coefficients, network density, size of connected components, shortest distance between pairs of vertices, and the centrality or eccentricity of vertices. Trends discovered by these analyses play a crucial role in sociopolitical science, marketing, security, transportation, epidemiology, and many other areas.
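As a rough illustration of this kind of analysis, the following Python sketch computes two of the features mentioned above, network density and the vertex degree distribution, for each graph in a small series. The snapshot data and the function names (`density`, `degree_distribution`) are made up for this example and are not part of G*.

```python
from collections import Counter


def density(vertices, edges):
    """Density of a simple undirected graph: edges present / edges possible."""
    n = len(vertices)
    if n < 2:
        return 0.0
    return 2 * len(edges) / (n * (n - 1))


def degree_distribution(vertices, edges):
    """Map each degree to the number of vertices that have that degree."""
    degree = {v: 0 for v in vertices}  # isolated vertices count as degree 0
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return Counter(degree.values())


# Hypothetical series of three snapshots over the same four vertices.
vertices = {"a", "b", "c", "d"}
snapshots = {
    "t1": {("a", "b"), ("b", "c")},
    "t2": {("a", "b"), ("b", "c"), ("c", "d")},
    "t3": {("a", "b"), ("b", "c"), ("c", "d"), ("a", "d")},
}

# Track how each feature varies from one snapshot to the next.
densities = {t: density(vertices, e) for t, e in snapshots.items()}
distributions = {t: degree_distribution(vertices, e) for t, e in snapshots.items()}

for t in sorted(snapshots):
    print(t, round(densities[t], 3), dict(distributions[t]))
```

In this toy series the density rises as edges are added, which is the sort of trend such an analysis would look for on a real network.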
G* (pronounced "jee star") compresses dynamic graph data based on commonalities among the graphs in the series for deduplicated storage on multiple servers. In addition to the obvious space-saving advantages, large-scale graph processing tends to be I/O bound, so faster reads from and writes to stable storage enable faster results. Unlike traditional database and graph processing systems, G* executes complex queries on large graphs using distributed operators to process graph data in parallel. It speeds up queries on multiple graphs by processing graph commonalities only once and sharing the results across relevant graphs.
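The two ideas in this paragraph, storing what graphs have in common only once and computing on it only once, can be illustrated with a small single-machine Python sketch. This is not G*'s storage format, operator model, or API. The `SnapshotStore` class and its methods are hypothetical, and the sketch assumes that a "commonality" is a vertex whose neighbor set is identical in several graphs.

```python
class SnapshotStore:
    """Toy deduplicated store: each distinct (vertex, neighbors) version is kept once."""

    def __init__(self):
        self._version_ids = {}  # (vertex, frozenset(neighbors)) -> version id
        self._records = []      # version id -> (vertex, frozenset(neighbors))
        self._graphs = {}       # graph id -> {vertex: version id}

    def add_graph(self, graph_id, adjacency):
        """Register a graph, reusing any vertex versions that are already stored."""
        index = {}
        for vertex, neighbors in adjacency.items():
            key = (vertex, frozenset(neighbors))
            if key not in self._version_ids:
                self._version_ids[key] = len(self._records)
                self._records.append(key)
            index[vertex] = self._version_ids[key]
        self._graphs[graph_id] = index

    def stored_versions(self):
        """Number of vertex versions physically stored across all graphs."""
        return len(self._records)

    def degrees(self):
        """Compute each version's degree once, then share it with every graph using it."""
        per_version = [len(neighbors) for _, neighbors in self._records]
        return {
            graph_id: {v: per_version[vid] for v, vid in index.items()}
            for graph_id, index in self._graphs.items()
        }


store = SnapshotStore()
store.add_graph("g1", {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}})
store.add_graph("g2", {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}})

# The two graphs contain 7 vertex entries in total, but "a" and "b" are
# unchanged between them, so only 5 versions are stored and processed.
print(store.stored_versions())
print(store.degrees())
```

The degree query touches each stored version once regardless of how many graphs reference it, so the saving grows with the overlap between snapshots.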
G* Studio is an interactive environment for G* available as an embeddable component or as a stand-alone application. Its features include the following: