Develop a Data Analytics Web App in 3 Steps

Updated: Feb 24, 2023

Use Streamlit to Create Your First YouTube Analytics App

Most of the time, data science and data analytics projects end up delivering a static report or dashboard, which undersells the effort and thought put into the process. A web app, by contrast, is a great way to demonstrate your data analytics work, and it can be further expanded into a service through self-served, interactive platforms. However, as data scientists or data analysts, we are not trained to develop software or websites. In this article, I would like to introduce tools such as Streamlit and Plotly that allow us to easily develop and serve data analytics projects through a web app, in the following three steps:

  • Extract Data and Build Database

  • Define Data Analytics Process as Functions

  • Construct Web App Interface

By the end of this article, I hope you will be able to create a web app like the one below. If you would like to get the source code of this web app, please consider becoming our premium member and visit our code snippet page for many more notebooks.


Step 1. Extract Data and Build Database

In this exercise, we will use YouTube data, since it is relevant to our daily life. Feel free to try it out using your own dataset. The YouTube Data API allows us to get public YouTube data, such as video statistics (e.g. number of likes, views) or content details (e.g. tags, title, comments). To set up the YouTube API, you need to sign up for a Google Developer account and create an API key. Here are some resources I found helpful for getting started with the YouTube API.

These resources take us through how to create a YouTube API key and install the required library (google-api-python-client, which provides googleapiclient.discovery). Once these dependencies are in place, we set up the connection to the API in Python, passing our own API key as api_key:

from googleapiclient.discovery import build
youtube = build('youtube', 'v3', developerKey=api_key)
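The snippet above assumes api_key is already defined. Here is a minimal sketch of one way to supply it without hard-coding the key in the script, assuming the key is stored in an environment variable. The variable name YOUTUBE_API_KEY and the helper load_api_key are illustrative choices, not part of the YouTube API:

```python
import os


def load_api_key(env=os.environ, name="YOUTUBE_API_KEY"):
    """Read the API key from an environment mapping (the name is an assumed convention)."""
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {name} environment variable to your YouTube API key")
    return key


# Usage (the variable must be set before the script starts):
# api_key = load_api_key()
# youtube = build('youtube', 'v3', developerKey=api_key)
```

Keeping the key out of the source file also makes it safer to share the notebook or app code later.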

After establishing the connection, it's time to explore what data is available for your data science projects. To do this, take a look at the YouTube Data API documentation, which provides an overview of the different kinds of data that can be accessed.

We will use “Videos” as an example for this project. The list() method lets us request the “Video Resource” by passing the part parameter and several filters. The part parameter specifies which components of the Video Resource to fetch; here I am getting snippet, statistics, and contentDetails. Have a look at the documentation, which details all the fields you can get from the videos().list() method. We then specify the following filter parameters to limit the results returned by this request.

  • chart='mostPopular': get the most popular videos

  • regionCode='US': videos from US

  • videoCategoryId=1: get the videos from a specific video category (e.g. 1 is for Film & Animation; see the catalog of video category IDs)

  • maxResults=20: return a maximum number of 20 videos

video_request = youtube.videos().list(
    part='snippet,statistics,contentDetails',
    chart='mostPopular',
    regionCode='US',
    videoCategoryId=1,
    maxResults=20
)
response = video_request.execute()

We then execute the request using video_request.execute(), and the response is returned in JSON format, which typically looks like the snapshot below.

All the information is stored under the “items” key of the response. We extract that key and create the dataframe video_df by normalizing the JSON.

from pandas import json_normalize
video_df = json_normalize(response['items'])

As a result, we tidy up the output into a structure that is easier to manipulate.
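To make the flattening concrete, the sketch below mimics what json_normalize does to a single item: nested dictionaries become dot-separated column names, while lists such as tags stay intact. It uses only the standard library. The flatten helper is illustrative, and the sample item is a hand-written, heavily trimmed stand-in for a real API response, so its values are made up:

```python
def flatten(record, parent_key="", sep="."):
    """Flatten nested dicts into one level, joining keys with `sep`."""
    flat = {}
    for key, value in record.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            flat.update(flatten(value, full_key, sep))
        else:
            flat[full_key] = value  # scalars and lists are kept as-is
    return flat


# Hypothetical, trimmed item; real responses carry many more fields.
sample_item = {
    "id": "VIDEO_ID",
    "snippet": {"title": "Example title", "tags": ["trailer", "animation"]},
    "statistics": {"viewCount": "1000", "likeCount": "50"},
    "contentDetails": {"duration": "PT4M13S"},
}

row = flatten(sample_item)
print(sorted(row))
```

The dotted names match the columns used later in this article, such as contentDetails.duration and statistics.likeCount. Note that the counts in the sample are strings, which is also how they arrive from the API.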

To go a step further in working with JSON in Python, I recommend reading the article "How to Best Work with JSON in Python".



Step 2. Define Data Analytics Process as Functions

We can package multiple lines of code into one function, so that it can be executed repeatedly and easily embedded into other web app components at a later stage.


Define extractYouTubeData()

For instance, we can encapsulate the data extraction process above into a function, extractYouTubeData(youtube, categoryId), which takes a categoryId variable and returns the top 20 popular videos in that category as video_df. This way, we can take the user's choice of category, feed it into this function, and get the corresponding top 20 videos.

def extractYouTubeData(youtube, categoryId):
    video_request = youtube.videos().list(
        part='snippet,statistics,contentDetails',
        chart='mostPopular',
        regionCode='US',
        videoCategoryId=categoryId,
        maxResults=20
    )
    response = video_request.execute()
    video_df = json_normalize(response['items'])
    return video_df

We can use video_df.info() to get all columns in this dataframe.

With this valuable dataset, we can carry out a wide variety of analyses, such as exploratory data analysis, sentiment analysis, and topic modeling.

I would like to start by designing the app around some exploratory data analysis, displaying:

  • the video duration vs. the number of likes

  • the most frequently occurring tags among these most popular videos

In future articles, I will explore more techniques such as topic modeling and natural language processing to analyze video titles and comments. If you would like to read more of my articles and support my work, consider treating me to a coffee ☕️ by signing up for Premium Membership with a $10 one-off purchase.


Define plotVideoDurationStats()

I would like to know whether video duration has an obvious correlation with the number of likes for these popular videos. To find out, we need to convert “contentDetails.duration” from the ISO 8601 duration format (e.g. PT4M13S) into numeric values using isodate.parse_duration().total_seconds(). Then we can use a scatter plot to visualize video duration against like count. This is done with Plotly, which gives end users a more interactive experience. The code snippet below returns the Plotly figure that will be embedded into our web app.

import isodate
import pandas as pd
import plotly.express as px

def plotVideoDurationStats(video_df):
    # Convert ISO 8601 durations (e.g. 'PT4M13S') into seconds
    video_df['duration'] = video_df['contentDetails.duration'].astype(str).apply(
        lambda x: isodate.parse_duration(x).total_seconds())
    # The API returns counts as strings; cast them so the y-axis is numeric
    video_df['statistics.likeCount'] = pd.to_numeric(video_df['statistics.likeCount'], errors='coerce')
    fig = px.scatter(video_df, x='duration', y='statistics.likeCount',
                     color_discrete_sequence=px.colors.qualitative.Safe)
    return fig
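To show what the duration conversion produces, here is a standard-library sketch that handles the common day/hour/minute/second forms of an ISO 8601 duration. It is only an illustration of the idea, not a replacement for isodate.parse_duration(), and the duration_to_seconds helper is a hypothetical name:

```python
import re

# Covers the day/hour/minute/second parts of an ISO 8601 duration;
# years, months and weeks are deliberately not handled in this sketch.
_DURATION_RE = re.compile(
    r"^P(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?(?:(?P<seconds>\d+)S)?)?$"
)


def duration_to_seconds(text):
    """Convert a duration such as 'PT4M13S' to a number of seconds."""
    match = _DURATION_RE.match(text)
    if match is None or text == "P":
        raise ValueError(f"Unsupported duration: {text!r}")
    parts = {name: int(value or 0) for name, value in match.groupdict().items()}
    return float(
        parts["days"] * 86400
        + parts["hours"] * 3600
        + parts["minutes"] * 60
        + parts["seconds"]
    )


print(duration_to_seconds("PT4M13S"))   # 253.0
print(duration_to_seconds("PT1H2M3S"))  # 3723.0
```

These second counts are the values that end up on the x-axis of the scatter plot.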

Define plotTopNTags()

This function creates a figure of the top N tags in a given video category. First, we iterate through snippet.tags and collect all tags into one list. We then create tags_freq_df, which holds the counts of the N most frequent tags. Lastly, we use px.bar() to display the chart.

import pandas as pd

def plotTopNTags(video_df, topN):
    tags = []
    for i in video_df['snippet.tags']:
        # Videos without tags come through as NaN (a float), so keep lists only
        if isinstance(i, list):
            tags.extend(i)
    # Count each tag and keep the topN most frequent ones
    tags_freq_df = pd.Series(tags, dtype='object').value_counts().iloc[:topN].rename_axis('tag').reset_index(name='frequency')
    fig = px.bar(tags_freq_df, x='tag', y='frequency')
    return fig
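The counting step can also be checked in isolation. The sketch below reproduces it with collections.Counter from the standard library, on a small made-up list of tag lists standing in for the snippet.tags column; top_n_tags is a hypothetical helper name used only for this illustration:

```python
from collections import Counter


def top_n_tags(tag_lists, top_n):
    """Count tags across videos and return the top_n (tag, frequency) pairs."""
    counts = Counter()
    for tags in tag_lists:
        if isinstance(tags, list):  # videos without tags show up as NaN (a float)
            counts.update(tags)
    return counts.most_common(top_n)


# Hypothetical tag lists standing in for the video_df['snippet.tags'] column.
sample_tags = [["music", "live"], float("nan"), ["music", "cover"], ["live", "music"]]
print(top_n_tags(sample_tags, 2))  # [('music', 3), ('live', 2)]
```

Each (tag, frequency) pair corresponds to one bar in the chart.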

Step 3. Construct Web App Interface

We will use Streamlit to develop the web app interface. It is the easiest tool I have found so far for building web apps in Python. It saves us the hassle of handling HTTP requests, defining routes, or writing HTML and CSS.

Run "pip install streamlit" (or "!pip install streamlit" inside a notebook) to install Streamlit on your machine, or follow the documentation to install Streamlit in your preferred development environment.

Creating a web app component is very easy with Streamlit. For example, displaying a title is as simple as:

import streamlit as st
st.title('Trending YouTube Videos')

Here we need several components to build the web app.

  • input: a dropdown menu for users to select a video category

This code snippet creates a select box with the prompt “Select YouTube Video Category” and four options to choose from: 'Film & Animation', 'Music', 'Sports', and 'Pets & Animals'.

videoCategory = st.selectbox(
    'Select YouTube Video Category',
    ('Film & Animation', 'Music', 'Sports', 'Pets & Animals')
)

  • input: a slider for users to choose the number of tags

This defines the slider and specifies the slider range from 0 to 20.

topN = st.slider('Select the number of tags to display', 0, 20)

  • output: a figure that displays the video duration vs. number of likes

We first create videoCategoryDict to convert the category name into a categoryId, then pass the categoryId to the extractYouTubeData() function that we defined previously. Check out this page for a full list of the video categories and their corresponding categoryId values.

We then call the plotVideoDurationStats() function and display the Plotly chart using st.plotly_chart().

videoCategoryDict = {'Film & Animation': 1, 'Music': 10, 'Sports': 17, 'Pets & Animals': 15}
categoryId = videoCategoryDict[videoCategory]
video_df = extractYouTubeData(youtube, categoryId)
duration_fig = plotVideoDurationStats(video_df)
fig_title1 = 'Durations(seconds) vs Likes in Top ' + videoCategory + ' Videos'
st.subheader(fig_title1)
st.plotly_chart(duration_fig)

  • output: a figure that displays the top tags in that video category

The last component feeds the user's chosen number of tags into the plotTopNTags() function and creates the plot by calling it.

tag_fig = plotTopNTags(video_df, topN)
fig_title2 = 'Top '+ str(topN) + ' Tags in ' + videoCategory + ' Videos'
st.subheader(fig_title2)
st.plotly_chart(tag_fig)
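One optional refinement, which is not part of the original app: the select box options and videoCategoryDict list the same four category names twice, so a typo in either place would cause a KeyError at lookup time. The sketch below derives the options from the dictionary instead. category_options and category_id are hypothetical helper names, and the Streamlit call is shown only as a comment:

```python
# Mapping copied from this article; the IDs come from the YouTube category catalog.
videoCategoryDict = {'Film & Animation': 1, 'Music': 10, 'Sports': 17, 'Pets & Animals': 15}


def category_options(mapping):
    """Return the category names in dictionary order, ready for a select box."""
    return tuple(mapping)


def category_id(mapping, name):
    """Look up the ID for a selected name, failing with a readable message."""
    try:
        return mapping[name]
    except KeyError:
        raise ValueError(f"Unknown video category: {name!r}") from None


# In the app, the options would be passed to the widget, for example:
# videoCategory = st.selectbox('Select YouTube Video Category',
#                              category_options(videoCategoryDict))
# categoryId = category_id(videoCategoryDict, videoCategory)
```

With this arrangement, adding a new category only requires one extra entry in the dictionary.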

All of these code statements can be kept in a single Python file (e.g. YouTubeDataApp.py). Then we navigate to the command line and use "streamlit run YouTubeDataApp.py" to run the app in a web browser.


Take-Home Message

Developing a web app may sound intimidating to data analysts and data scientists. This post covers the following three steps to get hands-on with developing your first web app and to extend your data analytics projects into a self-served platform.

  • Extract Data and Build Database

  • Define Data Analytics Process as Functions

  • Construct Web App Interface

