Implementing linear regression with one variable to predict profits for a restaurant franchise
# Problem Statement
Suppose you are the CEO of a restaurant franchise and are considering different cities for opening a new outlet.
- You would like to expand your business to cities that may give your restaurant higher profits.
- The chain already has restaurants in various cities, and you have data on profits and populations from those cities.
- You also have data on cities that are candidates for a new restaurant.
- For these cities, you have the city population.
Can you use the data to help you identify which cities may potentially give your business higher profits?
import numpy as np
import matplotlib.pyplot as plt
from utils import *
import copy
import math
%matplotlib inline
First, let's run the cell above to import all the packages needed for this assignment.
- numpy is the fundamental package for working with matrices in Python.
- matplotlib is a popular library for plotting graphs in Python.
- utils.py contains helper functions for this assignment. You do not need to modify the code in this file.
def load_data():
    data = np.loadtxt("data/ex1data1.txt", delimiter=',')
    x_train = data[:, 0]
    y_train = data[:, 1]
    return x_train, y_train

x_train, y_train = load_data()
The load_data() function shown above loads the data into the variables x_train and y_train:
- x_train is the population of a city.
- y_train is the profit of a restaurant in that city. A negative value for profit indicates a loss.
- Both x_train and y_train are NumPy arrays.
# View the variables
Before starting on any task, it is useful to get more familiar with your dataset.
- A good place to start is to just print out each variable and see what it contains.
The code below prints the variable x_train and its type.
# print x_train
print("Type of x_train:",type(x_train))
print("First five elements of x_train are:\n", x_train[:5])
x_train is a NumPy array containing decimal values that are all greater than zero.
- These values represent the city population in units of 10,000.
- For example, 6.1101 means that the population for that city is 61,101.
Now, let’s print y_train
# print y_train
print("Type of y_train:",type(y_train))
print("First five elements of y_train are:\n", y_train[:5])
Similarly, y_train is a NumPy array of decimal values, some negative, some positive.
- These represent your restaurant's average monthly profits in each city, in units of $10,000.
- For example, 17.592 represents $175,920 in average monthly profits for that city.
- -2.6807 represents an average monthly loss of $26,807 for that city.
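Since both arrays use units of 10,000, a quick illustrative snippet (not part of the assignment) converts the first training example back to raw units:

# Convert the first training example back to raw units
print(f"Population: {x_train[0] * 10000:,.0f}")
print(f"Average monthly profit: ${y_train[0] * 10000:,.2f}")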
# Check the dimensions of your variables

Another useful way to get familiar with your data is to view its dimensions.
Please print the shape of x_train and y_train to see how many training examples you have in your dataset.
print ('The shape of x_train is:', x_train.shape)
print ('The shape of y_train is: ', y_train.shape)
print ('Number of training examples (m):', len(x_train))
# Visualize your data
It is often useful to understand the data by visualizing it.
- For this dataset, you can use a scatter plot to visualize the data, since it has only two properties to plot (profit and population).
- Many other problems that you will encounter in real life have more than two properties (for example, population, average household income, monthly profits, monthly sales). When you have more than two properties, you can still use a scatter plot to see the relationship between each pair of properties, as in the sketch below.
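Here is a minimal sketch of that pairwise idea, using made-up data and made-up property names rather than this assignment's dataset:

# Pairwise scatter plots for a hypothetical dataset with three properties
rng = np.random.default_rng(0)
features = rng.normal(size=(100, 3))        # made-up data
names = ['population', 'income', 'profit']  # made-up property names
fig, axes = plt.subplots(3, 3, figsize=(9, 9))
for i in range(3):
    for j in range(3):
        axes[i, j].scatter(features[:, j], features[:, i], s=5)
        if i == 2:
            axes[i, j].set_xlabel(names[j])
        if j == 0:
            axes[i, j].set_ylabel(names[i])
plt.show()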
# Create a scatter plot of the data. To change the markers to red "x",
# we used the 'marker' and 'c' parameters
plt.scatter(x_train, y_train, marker='x', c='r')
# Set the title
plt.title("Profits vs. Population per city")
# Set the y-axis label
plt.ylabel('Profit in $10,000')
# Set the x-axis label
plt.xlabel('Population of City in 10,000s')
plt.show()
Your goal is to build a linear regression model to fit this data.
- With this model, you can then input a new city's population and have the model estimate your restaurant's potential monthly profits for that city.
# Compute cost
Gradient descent involves repeated steps to adjust the values of the parameters (w, b) to gradually get a smaller and smaller cost J(w, b).
- At each step of gradient descent, it will be helpful for you to monitor your progress by computing the cost J(w, b) as (w, b) gets updated.
- In this section, you will implement a function to calculate J(w, b) so that you can check the progress of your gradient descent implementation.
# Cost function

As you may recall from the lecture, for one variable the cost function for linear regression J(w, b) is defined as

$$J(w,b) = \frac{1}{2m} \sum_{i=0}^{m-1} \left(f_{w,b}\left(x^{(i)}\right) - y^{(i)}\right)^2$$

where the model prediction for example $i$ is

$$f_{w,b}\left(x^{(i)}\right) = w x^{(i)} + b$$

and $m$ is the number of training examples.
# UNQ_C1
# GRADED FUNCTION: compute_cost
def compute_cost(x, y, w, b):
    """
    Computes the cost function for linear regression.
    Args:
        x (ndarray): Shape (m,) Input to the model (Population of cities)
        y (ndarray): Shape (m,) Label (Actual profits for the cities)
        w, b (scalar): Parameters of the model
    Returns
        total_cost (float): The cost of using w, b as the parameters for linear regression
        to fit the data points in x and y
    """
    # number of training examples
    m = x.shape[0]
    total_cost = 0
    ### START CODE HERE ###
    for i in range(m):
        f_wb_i = w * x[i] + b                           # prediction for example i (scalar)
        total_cost = total_cost + (f_wb_i - y[i]) ** 2  # accumulate the squared error
    total_cost = total_cost / (2 * m)                   # average over examples and halve
    ### END CODE HERE ###
    return total_cost
Let's explain this code. The compute_cost function takes in four parameters: x, y, w, and b.
- x is the input to the model (population of cities).
- y is the label (actual profits for the cities).
- w and b are the parameters of the model.
- It returns total_cost (float): the cost of using w, b as the parameters for linear regression to fit the data points in x and y.
- Prediction of the model: f_wb(x_i) = w * x_i + b
- Cost for a single example: cost_i = (f_wb(x_i) - y_i)²
- Total cost over all examples: J(w, b) = (1 / (2m)) * Σ cost_i
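Under these definitions, a vectorized NumPy version of the same computation makes a handy cross-check for the loop-based implementation above. This is a sketch, not part of the graded code, and compute_cost_vectorized is a name introduced here for illustration:

def compute_cost_vectorized(x, y, w, b):
    # Predictions for all examples at once via broadcasting
    f_wb = w * x + b
    # Half the mean of the squared errors
    return np.sum((f_wb - y) ** 2) / (2 * x.shape[0])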
# Compute cost with some initial values for parameters w, b
initial_w = 2
initial_b = 1
cost = compute_cost(x_train, y_train, initial_w, initial_b)
print(type(cost))
print(f'Cost at initial w: {cost:.3f}')
# Public tests
from public_tests import *
compute_cost_test(compute_cost)
# UNQ_C2
# GRADED FUNCTION: compute_gradient
def compute_gradient(x, y, w, b):
    # Number of training examples
    m = x.shape[0]
    # You need to return the following variables correctly
    dj_dw = 0
    dj_db = 0
    ### START CODE HERE ###
    for i in range(m):
        # Model prediction for example i
        f_wb = w * x[i] + b
        # Per-example gradients with respect to w and b
        dj_dw_i = (f_wb - y[i]) * x[i]
        dj_db_i = f_wb - y[i]
        # Accumulate the totals
        dj_db += dj_db_i
        dj_dw += dj_dw_i
    # Average over all examples
    dj_db = dj_db / m
    dj_dw = dj_dw / m
    ### END CODE HERE ###
    return dj_dw, dj_db
Here is an explanation of the code. compute_gradient computes the gradient of the cost for linear regression.
- x (ndarray): Shape (m,) Input to the model (Population of cities)
- y (ndarray): Shape (m,) Label (Actual profits for the cities)
- w, b (scalar): Parameters of the model
- Returns:
  - dj_dw (scalar): The gradient of the cost w.r.t. the parameter w
  - dj_db (scalar): The gradient of the cost w.r.t. the parameter b

For each example $i$, the prediction is $f_{w,b}(x^{(i)}) = w x^{(i)} + b$, and the total gradients average the per-example terms over all $m$ examples:

$$\frac{\partial J(w,b)}{\partial b} = \frac{1}{m} \sum_{i=0}^{m-1} \left(f_{w,b}\left(x^{(i)}\right) - y^{(i)}\right)$$

$$\frac{\partial J(w,b)}{\partial w} = \frac{1}{m} \sum_{i=0}^{m-1} \left(f_{w,b}\left(x^{(i)}\right) - y^{(i)}\right) x^{(i)}$$
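As with the cost, a vectorized sketch of the same gradient computation can serve as a cross-check. It is not part of the graded code, and compute_gradient_vectorized is a name introduced here for illustration:

def compute_gradient_vectorized(x, y, w, b):
    # Per-example errors f_wb - y, computed for all examples at once
    err = (w * x + b) - y
    dj_dw = np.mean(err * x)  # average of err_i * x_i
    dj_db = np.mean(err)      # average of err_i
    return dj_dw, dj_db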
Run the cells below to check your implementation of the compute_gradient function with two different initializations of the parameters w, b.
# Compute and display gradient with w initialized to zeroes
initial_w = 0
initial_b = 0
tmp_dj_dw, tmp_dj_db = compute_gradient(x_train, y_train, initial_w, initial_b)
print('Gradient at initial w, b (zeros):', tmp_dj_dw, tmp_dj_db)
compute_gradient_test(compute_gradient)
# Compute and display cost and gradient with non-zero w
test_w = 0.2
test_b = 0.2
tmp_dj_dw, tmp_dj_db = compute_gradient(x_train, y_train, test_w, test_b)
print('Gradient at test w, b:', tmp_dj_dw, tmp_dj_db)
# Learning parameters using batch gradient descent
You will now find the optimal parameters of a linear regression model by using batch gradient descent. Recall that batch refers to running all the examples in one iteration.
- You don’t need to implement anything for this part. Simply run the cells below.
- A good way to verify that gradient descent is working correctly is to look at the value of J(w, b) and check that it is decreasing with each step.
- Assuming you have implemented the gradient and computed the cost correctly, and you have an appropriate value for the learning rate alpha, J(w, b) should never increase and should converge to a steady value by the end of the algorithm.
def gradient_descent(x, y, w_in, b_in, cost_function, gradient_function, alpha, num_iters):
    """
    Performs batch gradient descent to learn w and b. Updates w and b by taking
    num_iters gradient steps with learning rate alpha
    Args:
        x : (ndarray): Shape (m,)
        y : (ndarray): Shape (m,)
        w_in, b_in : (scalar) Initial values of parameters of the model
        cost_function: function to compute cost
        gradient_function: function to compute the gradient
        alpha : (float) Learning rate
        num_iters : (int) number of iterations to run gradient descent
    Returns
        w : (scalar) Updated value of parameter of the model after
            running gradient descent
        b : (scalar) Updated value of parameter of the model after
            running gradient descent
    """
    # number of training examples
    m = len(x)
    # Arrays to store cost J and w at each iteration, primarily for graphing later
    J_history = []
    w_history = []
    w = copy.deepcopy(w_in)  # avoid modifying global w within function
    b = b_in
    for i in range(num_iters):
        # Calculate the gradient and update the parameters
        dj_dw, dj_db = gradient_function(x, y, w, b)
        # Update parameters using w, b, alpha and gradient
        w = w - alpha * dj_dw
        b = b - alpha * dj_db
        # Save cost J at each iteration
        if i < 100000:  # prevent resource exhaustion
            cost = cost_function(x, y, w, b)
            J_history.append(cost)
        # Print cost at intervals, 10 times over the run (or every iteration if num_iters < 10)
        if i % math.ceil(num_iters / 10) == 0:
            w_history.append(w)
            print(f"Iteration {i:4}: Cost {float(J_history[-1]):8.2f} ")
    return w, b, J_history, w_history  # return w, b and the J, w history for graphing
Now let’s run the gradient descent algorithm above to learn the parameters for our dataset.
# Initialize fitting parameters. For this single-feature problem, w and b are scalars.
initial_w = 0.
initial_b = 0.
# some gradient descent settings
iterations = 1500
alpha = 0.01
w, b, _, _ = gradient_descent(x_train, y_train, initial_w, initial_b,
                              compute_cost, compute_gradient, alpha, iterations)
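As a quick sanity check, you can verify that the recorded cost never increases. This sketch re-runs gradient descent to capture the cost history that the call above discards, so it prints the iteration log a second time:

# Re-run, this time keeping the returned cost history
_, _, J_history, _ = gradient_descent(x_train, y_train, initial_w, initial_b,
                                      compute_cost, compute_gradient, alpha, iterations)
# With an appropriate learning rate, successive costs should be non-increasing
print("Cost non-increasing:", np.all(np.diff(J_history) <= 1e-9))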
We will now use the final parameters from gradient descent to plot the linear fit.

Recall that we can get the prediction for a single example as $f(x^{(i)}) = w x^{(i)} + b$.

To calculate the predictions on the entire dataset, we can loop through all the training examples and calculate the prediction for each example. This is shown in the code block below.
m = x_train.shape[0]
predicted = np.zeros(m)

for i in range(m):
    predicted[i] = w * x_train[i] + b
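Because x_train is a NumPy array, the same predictions can also be computed in a single vectorized line, an equivalent alternative to the loop above:

predicted = w * x_train + b  # broadcasts the scalar parameters across all examples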
We will now plot the predicted values to see the linear fit.
# Plot the linear fit
plt.plot(x_train, predicted, c = "b")
# Create a scatter plot of the data.
plt.scatter(x_train, y_train, marker='x', c='r')
# Set the title
plt.title("Profits vs. Population per city")
# Set the y-axis label
plt.ylabel('Profit in $10,000')
# Set the x-axis label
plt.xlabel('Population of City in 10,000s')
plt.show()
Your final values of w, b can also be used to make predictions on profits. Let's predict what the profit would be in areas of 35,000 and 70,000 people.
- The model takes in population of a city in 10,000s as input.
- Therefore, 35,000 people can be translated into an input to the model as
np.array([3.5])
- Similarly, 70,000 people can be translated into an input to the model as
np.array([7.])
predict1 = 3.5 * w + b
print('For population = 35,000, we predict a profit of $%.2f' % (predict1*10000))
predict2 = 7.0 * w + b
print('For population = 70,000, we predict a profit of $%.2f' % (predict2*10000))
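If you have several candidate cities, the same arithmetic extends to a whole array at once. This is a sketch; candidate_populations here just repeats the two examples above, in units of 10,000:

candidate_populations = np.array([3.5, 7.0])  # populations in units of 10,000
predicted_profits = (w * candidate_populations + b) * 10000
for pop, profit in zip(candidate_populations, predicted_profits):
    print(f'For population = {pop * 10000:,.0f}, we predict a profit of ${profit:.2f}')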
Congratulations, you have completed the linear regression assignment!