1.6. Analyzing Results

Note

The following instructions are for analyzers with output type json. See analyzer.json to learn more about this and other output types.

Now that your analyzer has been run in the web UI (and completed successfully, we hope!) we’ll write some Python to load and graph the results.

When you run an analyzer on the r2c platform we store the results in a PostgreSQL database. To interact with this database, we’ll use a mix of sqlalchemy, psycopg2, pandas, and matplotlib Python libraries. Let’s install these in your local environment:

$ pip3 install sqlalchemy psycopg2-binary pandas matplotlib

Next, create a file in a scratch directory called minifinder-results.py:

$ touch minifinder-results.py

Copy the following code into minifinder-results.py and update the constants with your org information:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
import os
import pandas as pd
import psycopg2
from sqlalchemy import create_engine
import matplotlib.pyplot as plt

###################################
# EDIT THESE CONSTANTS
###################################

DB_PASSWORD = "DB-PASSWORD-FROM-EMAIL"
AUTHOR_NAME = "YOUR-NAME"

ANALYZER_NAME = f"beta/{AUTHOR_NAME}-minifinder"
ANALYZER_VERSION = "0.1.0"
CORPUS_NAME = "r2c-1000"

###################################
# END EDIT SECTION
###################################

# Canonical SQL query to get job-specific results back.
JOB_QUERY = """
SELECT *
FROM   result,
       commit_corpus
WHERE  result.commit_hash = commit_corpus.commit_hash
       AND analyzer_name = %(analyzer_name)s
       AND analyzer_version = %(analyzer_version)s
       AND corpus_name = %(corpus_name)s
"""

QUERY_PARAMS = {
    "corpus_name": CORPUS_NAME,
    "analyzer_name": ANALYZER_NAME,
    "analyzer_version": ANALYZER_VERSION,
}

# Connect to PostgreSQL host and query for job-specific results
engine = create_engine(f"postgresql://notebook_user:{DB_PASSWORD}@db.r2c.dev/postgres")
job_df = pd.read_sql(JOB_QUERY, engine, params=QUERY_PARAMS)

# Print pandas dataframe to stdout for debugging
print("Raw job dataframe:")
print(job_df[1:10])

# Helper method to compute % whitespace from the num_whitespace and size fields in our 'extra' column
def get_percent_whitespace(row):
    size = row.extra["size"]
    # Avoid 'division by zero' exceptions.
    return row.extra["num_whitespace"] / size if size else 0


# Add 'percent_whitespace' column
job_df["percent_whitespace"] = job_df.apply(get_percent_whitespace, axis=1)

# Create a histogram of our data using the `percent_whitespace` column
job_df.hist(column="percent_whitespace", bins=100)
plt.show()

Finally, run minifinder-results.py, which will produce a graph of results:

$ python3 minifinder-results.py
A histogram plot of minifinder results, bucketed by percent whitespace.

Success! You’ve gone from a question, “how much of each file in a project is whitespace”, to analyzing 1000 npm projects, to graphing the results!

Next up is writing your own analyzer, interrogating your data, and making the JavaScript ecosystem a better place. If you have any questions or concerns, please don’t hesitate to reach out to us at .