EECS 442: Computer Vision (Winter 2024)

\( \newcommand{\EB}{\mathbf{E}} \newcommand{\FB}{\mathbf{F}} \newcommand{\IB}{\mathbf{I}} \newcommand{\KB}{\mathbf{K}} \newcommand{\MB}{\mathbf{M}} \newcommand{\RB}{\mathbf{R}} \newcommand{\XB}{\mathbf{X}} \newcommand{\pB}{\mathbf{p}} \newcommand{\tB}{\mathbf{t}} \newcommand{\zeroB}{\mathbf{0}} \)

Homework 6 – 3D Deep Learning

Instructions

This homework is due at 11:59 p.m. on Wednesday, April 17th, 2024.

The submission includes two parts:

  1. To Canvas: submit a zip file containing a single directory named after your uniqname that contains all your code and anything else asked for on the Canvas Submission Checklist. Don’t add unnecessary files or directories.

    - Questions that require you to do something in code are indicated in red. If Gradescope asks for it, also submit your code in the report using the formatting described below, and include the code in your Gradescope submission.

    Starter code is given to you on Canvas under the “Homework 6” assignment. You can also download it here. Clean up your submission to include only the necessary files. Pay close attention to filenames for autograding purposes.

    Submission Tip: Use the Tasks Checklist and Canvas Submission Checklist at the end of this homework.

  2. To Gradescope: submit a pdf file as your write-up, including your answers to all the questions and key choices you made.

    - Questions that require something in the report are indicated in green. For coding questions, please include the code in the report.

    The write-up must be electronic; no handwriting, including for plotting questions. \(\LaTeX\) is recommended but not mandatory.

    For including code, do not use screenshots. Generate a PDF using a tool like this or using this Overleaf LaTeX template. If this PDF contains only code, be sure to append it to the end of your report and match the questions carefully on Gradescope.


Python Environment

The autograder uses Python 3.7. Consider referring to the Python standard library docs when you have questions about Python utilities.

To make your life easier, we recommend installing the latest Anaconda for Python 3.7. This is a Python package manager that includes most of the modules you need for this course, and we will make extensive use of its packages.

[Figure 1: Epipolar lines for some of the datasets (temple, zrtrans, reallyInwards).]

Estimation of the Fundamental Matrix and Epipoles

Data: we give you a series of datasets bundled in the folder task1/. Each dataset contains two images, img1.png and img2.png, and a NumPy file data.npz containing several variables. The script task1.py shows how to load the data.

Credit: temple comes from Middlebury’s Multiview Stereo dataset.
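
If you want to poke around the data before writing any code, a minimal sketch like the following may help. The dataset folder name (temple) is taken from Figure 1, and the array names inside data.npz are whatever the starter bundle actually provides; this only lists them.

```python
import numpy as np

# Hypothetical inspection of one dataset bundle; "temple" is one of the
# dataset names from Figure 1, and the array names depend on the provided file.
data = np.load("task1/temple/data.npz")
for name in data.files:
    print(name, data[name].shape)
```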

Task 1: Estimating \(\FB\) and Epipoles

  1. (15 points) Fill in find_fundamental_matrix in task1.py. You should implement the eight-point algorithm covered in lecture. Remember to normalize the data and to reduce the rank of \(\FB\). For normalization, you can scale by the image size and center the data at 0. We want you to “estimate” the fundamental matrix here, so it is fine if your result is slightly off from the OpenCV implementation. A rough sketch of this step and of the epipole computation appears after this task list.

  2. (10 points) Fill in compute_epipoles. This should return the homogeneous coordinates of the epipoles – remember they can be infinitely far away! To compute the nullspace of \(\FB\), SVD is helpful.

  3. (5 points) Show epipolar lines for temple, reallyInwards, and another dataset of your choice.

  4. (5 points) Report the epipoles for reallyInwards and xtrans.
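
As a reference, here is a rough sketch of the normalized eight-point algorithm and the SVD-based epipole computation. The function names mirror find_fundamental_matrix and compute_epipoles, but the exact signatures, the normalization constants, and the helper normalization_transform are assumptions rather than the starter code's interface.

```python
import numpy as np

def normalization_transform(pts, shape):
    """Translate the points so they are centered at 0 and scale by the image size."""
    scale = 1.0 / max(shape[:2])
    tx, ty = pts.mean(axis=0)
    return np.array([[scale, 0.0, -scale * tx],
                     [0.0, scale, -scale * ty],
                     [0.0, 0.0, 1.0]])

def find_fundamental_matrix(shape, pts1, pts2):
    """Normalized eight-point algorithm; pts1, pts2 are (N, 2) correspondences."""
    T1 = normalization_transform(pts1, shape)
    T2 = normalization_transform(pts2, shape)
    ones = np.ones((pts1.shape[0], 1))
    p1 = (T1 @ np.hstack([pts1, ones]).T).T   # normalized homogeneous points
    p2 = (T2 @ np.hstack([pts2, ones]).T).T

    # Each correspondence gives one row of the linear system A f = 0,
    # derived from the epipolar constraint p2^T F p1 = 0.
    A = np.stack([
        p2[:, 0] * p1[:, 0], p2[:, 0] * p1[:, 1], p2[:, 0],
        p2[:, 1] * p1[:, 0], p2[:, 1] * p1[:, 1], p2[:, 1],
        p1[:, 0], p1[:, 1], np.ones(len(p1)),
    ], axis=1)

    # Least-squares solution: right singular vector with the smallest singular value.
    _, _, Vt = np.linalg.svd(A)
    F = Vt[-1].reshape(3, 3)

    # Enforce rank 2 by zeroing the smallest singular value.
    U, S, Vt = np.linalg.svd(F)
    S[2] = 0.0
    F = U @ np.diag(S) @ Vt

    # Undo the normalization.
    return T2.T @ F @ T1

def compute_epipoles(F):
    """Homogeneous epipoles: e1 spans the nullspace of F, e2 that of F^T."""
    _, _, Vt = np.linalg.svd(F)
    e1 = Vt[-1]
    _, _, Vt = np.linalg.svd(F.T)
    e2 = Vt[-1]
    return e1, e2
```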

3D Generation

Task 2: Neural radiance fields

We will fit a neural radiance field (NeRF) to a collection of photos (with their camera poses), and use it to render the scene from different (previously unseen) viewpoints. To estimate the color of a pixel, we will compute the 3D ray that exits the pixel. Then, we will walk along the ray and query the network at each sampled point. Finally, we will use volume rendering to obtain the pixel’s RGB color, thereby accounting for occlusion.

A NeRF is an MLP \(F_\Theta\) such that

\[F_\Theta(x, y, z, \theta, \phi) = (R, G, B, \sigma)\]

where \((x, y, z)\) is a 3D point in the scene, and \((\theta, \phi)\) is a viewing direction. It returns a color \((R, G, B)\) and a non-negative density \(\sigma\) that indicates whether this point in space is occupied.
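
For intuition, a minimal PyTorch sketch of such an MLP is shown below. The layer widths, the input dimension (which assumes a positional encoding of the 3D point with L_embed = 6), and the omission of the viewing direction are all illustrative choices, not the architecture used in the starter code.

```python
import torch
import torch.nn as nn

class TinyNeRF(nn.Module):
    """Hypothetical sketch: maps an encoded 3D point to (R, G, B, sigma)."""

    def __init__(self, in_dim=39, hidden=128):  # 39 = 3 * (1 + 2 * L_embed) with L_embed = 6
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 4),  # (R, G, B, sigma)
        )

    def forward(self, x):
        out = self.net(x)
        rgb = torch.sigmoid(out[..., :3])  # colors in [0, 1]
        sigma = torch.relu(out[..., 3])    # non-negative density
        return rgb, sigma
```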

  1. (10 points) Implement the function positional_encoder(x, L_embed = 6) that encodes the input x as \(\gamma(x) = (x, \sin(2^{0}x), \cos(2^{0}x), \ldots, \sin(2^{L_{embed}-1}x), \cos(2^{L_{embed}-1}x)).\)

  2. (10 points) Implement the code that samples 3D points along a ray in render. This will be used to march along the ray and query \(F_\Theta\).

  3. (10 points) After having walked along the ray and queried \(F_\Theta\) at each point in render, we will estimate the pixel’s color, represented as rgb_map. We will also compute depth_map, which indicates the depth of the nearest surface at this pixel. A sketch of these steps, together with the positional encoding and ray sampling, follows this task list.

  4. (10 points) Please implement part of the train(model, optimizer, n_iters) function. In the training loop, the model is trained to fit one image randomly picked from the dataset at each iteration. You need to tune the near and far parameters in get_rays to maximize the clarity of the RGB prediction image.

  5. (5 points) Please include the best pictures (after parameter tuning) of your RGB prediction, depth prediction, and ground truth figures for different viewpoints.
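
The sketch below combines the three pieces referenced in the task list: the positional encoding, uniform point sampling along a ray, and the volume-rendering accumulation that produces rgb_map and depth_map. Function names such as sample_points_along_rays and volume_render, as well as arguments like near, far, and N_samples, are hypothetical; the starter code organizes these steps inside render.

```python
import torch

def positional_encoder(x, L_embed=6):
    """gamma(x) = (x, sin(2^0 x), cos(2^0 x), ..., sin(2^(L-1) x), cos(2^(L-1) x))."""
    feats = [x]
    for i in range(L_embed):
        feats.append(torch.sin((2.0 ** i) * x))
        feats.append(torch.cos((2.0 ** i) * x))
    return torch.cat(feats, dim=-1)

def sample_points_along_rays(rays_o, rays_d, near, far, N_samples):
    """Uniformly spaced depths in [near, far]; points = origin + depth * direction."""
    z_vals = torch.linspace(near, far, N_samples, device=rays_o.device)
    pts = rays_o[..., None, :] + rays_d[..., None, :] * z_vals[..., :, None]
    return pts, z_vals

def volume_render(rgb, sigma, z_vals):
    """Accumulate colors along each ray into rgb_map and an expected depth_map."""
    dists = z_vals[..., 1:] - z_vals[..., :-1]
    dists = torch.cat([dists, torch.full_like(dists[..., :1], 1e10)], dim=-1)
    alpha = 1.0 - torch.exp(-sigma * dists)  # opacity of each segment
    # Transmittance: probability the ray reaches each sample without being blocked.
    trans = torch.cumprod(1.0 - alpha + 1e-10, dim=-1)
    trans = torch.cat([torch.ones_like(trans[..., :1]), trans[..., :-1]], dim=-1)
    weights = alpha * trans
    rgb_map = torch.sum(weights[..., None] * rgb, dim=-2)  # per-pixel color
    depth_map = torch.sum(weights * z_vals, dim=-1)        # expected depth
    return rgb_map, depth_map
```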

We can now render the NeRF from different viewpoints. The predicted image should be quite similar to the ground truth, though it may be less sharp.

Tasks Checklist

This section is meant to help you keep track of the many things that go in the report:

Canvas Submission Checklist

In the zip file you submit to Canvas, the directory named after your uniqname should include the following files: