2D to 3D Camera Calibration

How do you translate a GPS coordinate into a pixel on a webcam image? We explored camera calibration uncertainty using leave-one-out and Monte Carlo analysis on two real-world webcams — one urban, one on a remote harbor island.

Published: November 15, 2025

Topics: Computer Vision, OpenCV, Camera Calibration, Research

In this project, we explore the robustness and reliability of camera calibration by performing uncertainty analysis on real-world data. Camera calibration is the process of determining a camera's intrinsic parameters (like focal length) and extrinsic parameters (position and orientation in 3D space) using known reference points. But how reliable are these estimates? That's what we set out to discover.

The Original Calibration Code

The initial code (provided by our professor) performs a straightforward camera calibration using 6 known points with their GPS coordinates (latitude, longitude, altitude) and their corresponding pixel locations (x, y) in an image. The goal is to estimate the camera's intrinsic matrix K (containing focal length and principal point) and the camera's 3D location in GPS coordinates.

The calibration process uses OpenCV's calibrateCamera function, which minimizes the reprojection error between the observed pixel locations and where the 3D points would project given the estimated camera parameters. The original code successfully calibrated the camera with a reprojection error of approximately 39 pixels.
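To make "reprojection error" concrete, here is a minimal sketch of a pinhole projection and the RMS pixel distance between observed and projected points. It assumes the points are already expressed in the camera frame (no rotation or translation), and the focal length, principal point, and sample points are illustrative values, not the professor's data. The value calibrateCamera returns is an RMS error of this general form.

```python
import math

def project(point_cam, f, cx, cy):
    """Pinhole projection of a camera-frame point (X, Y, Z) to pixels (u, v)."""
    X, Y, Z = point_cam
    return (f * X / Z + cx, f * Y / Z + cy)

def rms_reprojection_error(points_cam, observed_px, f, cx, cy):
    """Root-mean-square distance between observed and projected pixels."""
    sq = 0.0
    for p, (u_obs, v_obs) in zip(points_cam, observed_px):
        u, v = project(p, f, cx, cy)
        sq += (u - u_obs) ** 2 + (v - v_obs) ** 2
    return math.sqrt(sq / len(points_cam))

# Illustrative values only (hypothetical, not from the calibration data).
f, cx, cy = 1000.0, 960.0, 540.0
points = [(0.0, 0.0, 10.0), (1.0, 0.0, 10.0)]
observed = [(963.0, 544.0), (1060.0, 540.0)]
error = rms_reprojection_error(points, observed, f, cx, cy)
print(f"RMS reprojection error: {error:.2f} px")
```

In this toy case the first point is off by 5 pixels and the second is exact, so the RMS lands between the two.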

Key outputs from the original calibration:

  • Focal length: ~1,328 pixels
  • Camera location: approximately 43.07°N, 89.41°W, at ~323m altitude
  • Horizontal FOV: 88.62°, Vertical FOV: 72.42°

Part (a): Leave-One-Out Analysis

To test the robustness of our calibration to individual points, we implemented leave-one-out cross-validation. This technique systematically removes one point at a time and recalibrates the camera using the remaining 5 points, generating 6 different camera location estimates.
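The leave-one-out loop itself is short. The sketch below shows its shape with a toy stand-in for the calibration step: `fit` is a hypothetical callable, and in the actual project it would wrap the OpenCV calibration and return the camera location instead of a simple average.

```python
from statistics import mean

def leave_one_out(points, fit):
    """Refit once per point, each time with that point held out."""
    results = []
    for i in range(len(points)):
        subset = points[:i] + points[i + 1:]
        results.append(fit(subset))
    return results

# Toy data: six scalar "measurements", the last one an outlier.
samples = [10.0, 11.0, 9.0, 10.5, 9.5, 40.0]
estimates = leave_one_out(samples, mean)

# A large spread across the held-out fits means some point dominates the result.
spread = max(estimates) - min(estimates)
print(len(estimates), round(spread, 2))
```

With 6 input points this yields 6 estimates, matching the 6 camera locations described above.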

Results

The leave-one-out analysis revealed good stability in the estimated camera location, with more spread in focal length:

  • Camera Location Variability:
    • Latitude std: 0.0001833° (~20 meters)
    • Longitude std: 0.0003581° (~27 meters)
    • Altitude std: 7.41 meters
  • Focal Length Variability:
    • Mean: 1,425.63 pixels
    • Std: 250.89 pixels
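The meter figures in parentheses come from converting degrees to ground distance. A rough sketch of that conversion follows, assuming a spherical Earth with a mean radius of 6,371 km (an assumption, since the original conversion constant isn't stated). Longitude degrees shrink with the cosine of the latitude, so the two axes need different factors.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius; spherical approximation (assumed)

def lat_deg_to_m(ddeg):
    """Meters spanned by `ddeg` degrees of latitude."""
    return math.radians(ddeg) * EARTH_RADIUS_M

def lon_deg_to_m(ddeg, at_lat_deg):
    """Meters spanned by `ddeg` degrees of longitude at a given latitude."""
    return math.radians(ddeg) * EARTH_RADIUS_M * math.cos(math.radians(at_lat_deg))

lat_std_m = lat_deg_to_m(0.0001833)
lon_std_m = lon_deg_to_m(0.0003581, 43.07)
print(f"lat std ~ {lat_std_m:.0f} m, lon std ~ {lon_std_m:.0f} m")
```

The results land in the same range as the quoted values. The exact figure depends on the constant and reference latitude used.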

All 6 estimated camera locations form a tight cluster in 3D space. The altitude distribution shows 6 distinct values ranging from ~312m to ~334m, while the focal length histogram reveals a bimodal distribution with most estimates clustering around 1,250 and 1,450 pixels, with one outlier at ~1,934 pixels.

The relatively small spread suggests the calibration is not overly dependent on any single point — no single point is an obvious outlier causing instability.

Part (b): Noise Addition Analysis

To simulate real-world measurement uncertainty, we added Gaussian noise to our data and ran 100 calibration trials:

  • 3D coordinate noise: Gaussian with 0 mean, 1-meter std (simulating GPS/altitude error)
  • 2D pixel coordinate noise: Gaussian with 0 mean, 1-pixel std (simulating image measurement error)
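The noise procedure can be sketched as below. This is a minimal illustration, not the project's code: `toy_fit` is a hypothetical stand-in for the calibration call, the sample points are made up, and the 3D points are assumed to already be in a metric local frame so that a 1-meter sigma makes sense.

```python
import random
from statistics import mean, pstdev

def perturb(points_3d, points_2d, sigma_m=1.0, sigma_px=1.0, rng=random):
    """Return noisy copies of the 3D (meters) and 2D (pixels) points."""
    noisy_3d = [tuple(c + rng.gauss(0.0, sigma_m) for c in p) for p in points_3d]
    noisy_2d = [tuple(c + rng.gauss(0.0, sigma_px) for c in p) for p in points_2d]
    return noisy_3d, noisy_2d

def monte_carlo(points_3d, points_2d, fit, trials=100, seed=0):
    """Run `fit` on `trials` independently perturbed copies of the data."""
    rng = random.Random(seed)  # seeded so runs are repeatable
    return [fit(*perturb(points_3d, points_2d, rng=rng)) for _ in range(trials)]

def toy_fit(p3d, p2d):
    """Stand-in for calibration: report the mean 'up' coordinate."""
    return mean(p[2] for p in p3d)

pts_3d = [(0.0, 0.0, 5.0), (10.0, 0.0, 7.0), (0.0, 10.0, 9.0)]  # hypothetical meters
pts_2d = [(100.0, 200.0), (300.0, 220.0), (120.0, 400.0)]       # hypothetical pixels
results = monte_carlo(pts_3d, pts_2d, toy_fit)
print(len(results), round(mean(results), 2), round(pstdev(results), 2))
```

Each trial perturbs the original data afresh instead of accumulating noise, so the trials stay independent.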

Results

The noise addition analysis revealed extreme sensitivity to measurement errors:

  • Camera Location Variability:
    • Latitude std: 0.0123° (~1,370 meters) — 67× larger than leave-one-out
    • Longitude std: 0.5564° (~42,500 meters) — 1,555× larger
    • Altitude std: 13,512 meters — 1,824× larger
  • Focal Length Variability:
    • Mean: 71,117 pixels, Median: 1,342 pixels
    • Std: 304,675 pixels — 1,214× larger than leave-one-out

While most camera locations still cluster around the original estimate (~43.07°N, 89.41°W), several outliers show dramatically different estimates, some with altitudes of -60,000m or +60,000m. The focal length histogram is even more revealing: ~95 of 100 trials produced reasonable focal lengths, but a handful produced absurdly large values (up to 1.5 million pixels), completely skewing the mean.
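The gap between mean and median is easy to reproduce. The values below are synthetic and chosen only to mimic the shape described (about 95 plausible trials and a few divergent ones). They are not the actual trial results.

```python
from statistics import mean, median

# Synthetic values for illustration only, not the real Monte Carlo output.
focal_px = [1342.0] * 95 + [1_500_000.0] * 5

print(f"mean   = {mean(focal_px):,.0f} px")
print(f"median = {median(focal_px):,.0f} px")
```

A few divergent trials drag the mean far from the typical value while the median barely moves, which is why the median is the safer summary when some trials fail.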

This reveals a critical insight: with only 6 points, small measurement errors can occasionally lead to catastrophically wrong solutions. The optimization can converge to local minima that produce geometrically valid but physically unrealistic results.

Looking at the combined comparison:

  1. Leave-one-out (blue) shows a tight, well-behaved cluster — calibration is structurally sound when data quality is good.
  2. Noise addition (red) shows both the tight cluster and extreme outliers.

Part 2: Taking It to the Real World

Part 2 is all about the real world. We wanted to take a static webcam image and build a model that could translate any 3D GPS coordinate into a 2D pixel. It sounds like sci-fi, but it's a classic computer vision problem.

This was a journey, and we learned a ton — including the major mistake that almost derailed the whole thing.

Image One: Northeastern University Webcam

For our first image, we selected the Northeastern University webcam in Boston.

Initially it was very hard to figure out exactly where the image was from. After digging into it and decoding some URLs, we found the source: the camera is part of a weather camera system, and we were able to see the exact data it records.

It also had a direct link to the camera's coordinates, elevation, and heading.

Camera details we uncovered:

  • Camera Coordinates: 42°20'09.2"N 71°05'17.8"W
  • Decimal: 42.33587843449229, -71.08826335575469

The next step was identifying 5–10 immovable landmarks to use as calibration points. Using a small script, we loaded the image, tagged 10 points, and extracted their 2D pixel coordinates.

We then found the real-world latitude, longitude, and altitude of each point using Google Earth, Google Maps, and Apple Maps. Working together, we found the most accurate representations possible.

img_x  img_y  label  map_lat    map_lng     map_alt
306    855    p1     42.347251  -71.088745  98
725    1045   p2     42.336671  -71.087771  43
678    779    p3     42.345504  -71.084017  229
735    808    p4     42.346976  -71.082614  226
827    867    p5     42.346431  -71.081467  160
1039   859    p6     42.3492    -71.075302  235
1378   847    p7     42.337178  -71.085517  75
1435   937    p8     42.336955  -71.085676  50
1769   1022   p9     42.336214  -71.085598  23
1387   1065   p10    42.336888  -71.085971  23

Our first attempt at calibration was a total disaster. We fed 10 points in and the results came back completely off — all test points were projecting onto wrong pixels. After investigating, we found the problem: the calibration function expected a 1920×1080 snapshot, but we had used a 960×540 image to get the 2D points. The resolution mismatch threw off the entire calibration.
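We fixed this by retagging, but an alternative worth noting is to rescale the existing tags to the target resolution. The sketch below shows that alternative. The sample coordinates are hypothetical, and the simple scaling ignores half-pixel center offsets.

```python
def rescale_points(points, src_size, dst_size):
    """Map pixel coordinates tagged on a src_size image onto a dst_size image."""
    sx = dst_size[0] / src_size[0]
    sy = dst_size[1] / src_size[1]
    return [(x * sx, y * sy) for x, y in points]

# Hypothetical tags made on a 960x540 preview.
tagged_small = [(153, 427), (480, 270)]
tagged_full = rescale_points(tagged_small, (960, 540), (1920, 1080))
print(tagged_full)
```

A cheap guard against this class of bug is to check that every tagged point lies inside the image size passed to the calibration function before running it.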

We fixed the issue by retagging the image at full resolution, and the points lined up perfectly:

Fixed calibration: retagged points at 1920x1080

Reference Origin

To establish a local East-North-Up (ENU) coordinate system, we set the origin near the camera:

  • Latitude: 42.3359°N
  • Longitude: 71.0883°W
  • Altitude: 44 meters
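A local ENU frame turns latitude, longitude, and altitude into meter offsets from the origin. Below is a minimal flat-Earth sketch of that conversion, assuming a spherical Earth radius. It is only reasonable within a few kilometers of the origin, and a full geodetic conversion would be more accurate. The function name is hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # spherical approximation (assumed)

def gps_to_enu(lat, lon, alt, lat0, lon0, alt0):
    """Approximate East-North-Up offsets in meters from the origin (lat0, lon0, alt0)."""
    east = math.radians(lon - lon0) * EARTH_RADIUS_M * math.cos(math.radians(lat0))
    north = math.radians(lat - lat0) * EARTH_RADIUS_M
    up = alt - alt0
    return east, north, up

# Reference origin from the list above (west longitude is negative).
ORIGIN = (42.3359, -71.0883, 44.0)

# The camera's published coordinates. Its altitude is set equal to the
# origin's here purely as a placeholder.
east, north, up = gps_to_enu(42.33587843449229, -71.08826335575469, 44.0, *ORIGIN)
print(f"east={east:.1f} m, north={north:.1f} m, up={up:.1f} m")
```

The camera lands within a few meters of the origin, consistent with the origin having been set near the camera.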

Initial Camera Calibration Results

Using OpenCV's calibrateCamera with all 10 calibration points:

Intrinsic Matrix (K):

[[1075.35    0.00   960.00]
 [   0.00 1075.35   540.00]
 [   0.00    0.00     1.00]]

Key parameters:

  • Focal length: 1,075.35 pixels
  • Horizontal FOV: 83.51°, Vertical FOV: 53.33°
  • Reprojection error: 48.35 pixels
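The field-of-view figures follow directly from the focal length and the image size via FOV = 2·atan(size / 2f). A quick check, assuming the 1920×1080 frame and a principal point at the image center:

```python
import math

def fov_deg(size_px, focal_px):
    """Full field of view in degrees along an axis of `size_px` pixels."""
    return math.degrees(2.0 * math.atan(size_px / (2.0 * focal_px)))

focal = 1075.35  # pixels, from the intrinsic matrix above
hfov = fov_deg(1920, focal)
vfov = fov_deg(1080, focal)
print(f"HFOV = {hfov:.2f} deg, VFOV = {vfov:.2f} deg")
```

This agrees with the reported values to within rounding, which is a useful sanity check that the focal length and image size are consistent with each other.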

Original vs. reprojected calibration points

The reprojection error of 48.35 pixels is reasonable given the challenges of precisely identifying ground truth locations in an urban environment with varying building heights and perspectives.

Part 2a: Leave-One-Out Uncertainty Analysis

We performed leave-one-out cross-validation: remove one point, recalibrate with the remaining 9, record camera location and focal length. Repeat for all 10 points.

Success rate: All 10 calibrations succeeded.

Camera Location Point Cloud (LOO Analysis) — 2D lat/lon scatter, altitude histogram, focal length distribution

Camera Location Statistics (GPS):

  • Mean position: Lat = 42.33525°N, Lon = 71.08857°W, Alt = 42.36m
  • Std: Lat = 0.000197° (~22m), Lon = 0.000280° (~21m), Alt = 22.21m

Focal Length Statistics:

  • Mean: 1,082.30 pixels
  • Std: 102.07 pixels (9.4% of mean)

3D visualization of LOO camera locations showing moderate clustering

Individual LOO Results

Point Removed  Camera Altitude  Focal Length  Reprojection Error
p1             47.75 m          1363.69 px    30.00 px
p2             27.92 m          1062.24 px    27.72 px
p3             52.14 m          1071.43 px    49.34 px
p4             52.40 m          1079.17 px    49.97 px
p5             -20.48 m         1065.29 px    116.50 px
p6             55.13 m          1099.74 px    48.61 px
p7             52.88 m          1023.94 px    46.84 px
p8             52.46 m          1041.25 px    47.18 px
p9             51.58 m          945.72 px     30.52 px
p10            51.87 m          1070.49 px    49.91 px

Key observation: Removing point 5 produces an outlier result — negative altitude (-20.48m, physically impossible for a rooftop camera) and the highest reprojection error (116.50 pixels). This flags p5 as either critical or problematic.
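The same flag can be raised automatically. The sketch below applies two simple rules to the table above: a negative altitude, or a reprojection error more than twice the median. The 2× cutoff is a heuristic chosen for illustration, not something the analysis prescribes.

```python
from statistics import mean, median, pstdev

# (point removed, camera altitude in m, focal length in px, reprojection error in px)
loo = [
    ("p1", 47.75, 1363.69, 30.00),
    ("p2", 27.92, 1062.24, 27.72),
    ("p3", 52.14, 1071.43, 49.34),
    ("p4", 52.40, 1079.17, 49.97),
    ("p5", -20.48, 1065.29, 116.50),
    ("p6", 55.13, 1099.74, 48.61),
    ("p7", 52.88, 1023.94, 46.84),
    ("p8", 52.46, 1041.25, 47.18),
    ("p9", 51.58, 945.72, 30.52),
    ("p10", 51.87, 1070.49, 49.91),
]

altitudes = [row[1] for row in loo]
errors = [row[3] for row in loo]
err_cutoff = 2.0 * median(errors)  # heuristic threshold (an assumption)

flagged = [name for name, alt, _, err in loo if alt < 0 or err > err_cutoff]
print("flagged:", flagged)
print(f"altitude mean = {mean(altitudes):.2f} m, std = {pstdev(altitudes):.2f} m")
```

Only p5 trips either rule. The altitude mean and population standard deviation computed here also match the summary statistics reported earlier.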

The LOO analysis shows good overall stability with one exception. Nine out of ten results have altitudes between 28–56 meters (realistic for an urban rooftop). The 22-meter lat/lon std indicates acceptable position stability.

Part 2b: Monte Carlo Noise Sensitivity Analysis

We ran 100 calibration trials with Gaussian noise added to all points:

  • 3D coordinate noise: σ = 1 meter (simulating GPS/altitude uncertainty)
  • 2D pixel noise: σ = 1 pixel (simulating click/identification uncertainty)

Success rate: 100/100 trials succeeded with physically reasonable solutions.

Camera Location Statistics:

  • Mean position: Lat = 42.33525°N, Lon = 71.08857°W, Alt = 42.36m
  • Std: Lat = 0.000024° (~2.7m), Lon = 0.000023° (~1.7m), Alt = 1.12m

Focal length histogram showing tight Monte Carlo distribution

Focal Length: Mean = 1,075.35 px, Std = 8.60 px (0.8%)

3D visualization showing extremely tight clustering of Monte Carlo camera locations

The NEU webcam excels in Monte Carlo analysis because 10 well-distributed urban points provide strong geometric constraints. Buildings and rooftops offer precise, stable landmarks. The good altitude variation (23m to 235m) constrains camera height well. Even with noise added to all points, the overdetermined system finds the correct solution reliably.

Comparison: Part 2a vs. Part 2b

Metric            Leave-One-Out      Monte Carlo        Ratio
Std Latitude      0.000197° (~22m)   0.000024° (~2.7m)  8.2×
Std Longitude     0.000280° (~21m)   0.000023° (~1.7m)  12.3×
Std Altitude      22.21 m            1.12 m             19.8×
Focal Length std  102.07 px          8.60 px            11.9×
Mean Altitude     42.36 m            42.36 m            Match
Outliers          1 (p5: -20.48m)    0

Combined comparison: LOO (blue) with one outlier vs. Monte Carlo (red) with tight clustering

Leave-one-out shows 8–20× more variation than Monte Carlo. This makes sense: removing a point changes the geometric configuration entirely, while adding noise to all points still preserves the overall structure. Both analyses agree on the camera location (~42.36m altitude), giving us confidence in the result.
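The ratios are just the leave-one-out standard deviation divided by the Monte Carlo one. A short sketch using the rounded values quoted in the comparison follows. Because the inputs are rounded, a recomputed ratio can differ from the table in the last digit.

```python
# Standard deviations from the comparison: (leave-one-out, Monte Carlo).
stds = {
    "latitude (deg)": (0.000197, 0.000024),
    "longitude (deg)": (0.000280, 0.000023),
    "altitude (m)": (22.21, 1.12),
    "focal length (px)": (102.07, 8.60),
}

ratios = {name: loo / mc for name, (loo, mc) in stds.items()}
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.1f}x")
```

A ratio well above 1 says the estimate depends more on which points are present than on how precisely each one was measured.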

Conclusions for Image One

We successfully calibrated the NEU webcam with excellent real-world robustness:

  • Focal length: 1,075.35 pixels, σ = 8.6 px (only 0.8%)
  • Camera location: 42.33525°N, 71.08857°W, 42.36m altitude
  • Position uncertainty: ~2 meters horizontal, ~1 meter vertical
  • Success rate: 100/100 under 1m/1px noise

Why does removing point 5 cause failure? When p5 is removed, the camera gets placed at -20.48m with 116.50 px reprojection error. Point 5 sits at 160m altitude, in the middle of our range (23m to 235m), so it likely provides critical altitude triangulation. Without it, the optimizer falls into a wrong but locally optimal solution. The fact that Monte Carlo succeeds 100/100 even with noise added to p5 suggests the point isn't fundamentally wrong; it is a critical anchor, not a bad measurement.

Image Two: Boston Harbor Islands Webcam

For the second image, we chose something completely different. We moved from an urban environment to a remote island on the outskirts of Boston, using the National Park Service's webcam overlooking the harbor.

We located the public record on the NPS website and found a Google Street View of the island itself. Since the camera is part of a lighthouse, we figured out the lighthouse height and made adjustments based on the camera's relative position.

Flat lighthouse in Google Earth 3D View

One complication: Google Earth's 3D satellite imagery for the area hasn't been updated since 2022. We could see two distinct islands from the camera's view, but the satellite shows them connected. This meant guessing some points in an already structurally difficult environment.

Point Selection in a Marine Environment

Calibrating a camera overlooking open water presents unique challenges compared to urban environments. We carefully selected 9 distinct landmarks — coastal features, islands, and man-made structures.

Unlike urban environments with well-defined building corners, identifying precise points on natural coastal features and distant islands proved difficult. Tidal variations, wave action, and atmospheric haze all contribute to measurement uncertainty.

img_x  img_y  label  map_lat    map_lng     map_alt
891    1980   p1     42.327871  -70.890574  5
344    1356   p2     42.32737   -70.891641  2
1724   1263   p3     42.328117  -70.891248  4
1315   1239   p4     42.327967  -70.892254  0.21
2744   998    p5     42.332095  -70.895926  3
2086   912    p6     42.346274  -70.954216  44
724    941    p7     42.327193  -70.924872  3
2717   1925   p8     42.32821   -70.890555  6
3646   971    p9     42.334612  -70.894444  22

Reference origin (ENU): 42.3279°N, 70.8901°W, 23m altitude

Initial Calibration Results

Using all 9 calibration points:

  • Focal length: 1,824.61 pixels
  • Horizontal FOV: 92.92°, Vertical FOV: 61.24°
  • Reprojection error: 204.01 pixels

The 204-pixel reprojection error is significantly higher than typical calibrations — a direct consequence of the difficulty in precisely identifying natural coastal landmarks.

Original vs. reprojected calibration points for harbor webcam

Part 2a: Leave-One-Out Analysis

Success rate: All 9 LOO calibrations succeeded.

Camera Location Point Cloud (LOO Analysis) — harbor webcam

Camera Location Statistics:

  • Mean position: Lat = 42.32790°N, Lon = 70.89011°W, Alt = 14.04m
  • Std: Lat = 0.000650° (~72m), Lon = 0.001802° (~137m), Alt = 46.32m

Focal Length: Mean = 1,607.48 px, Std = 332.31 px

Point 2 stands out as a critical outlier — removing it causes the altitude estimate to go to -116.54m (well below sea level). This is consistent with the challenges we noted in identifying precise coastal landmarks.

Part 2b: Monte Carlo Analysis

Despite the high reprojection error, Monte Carlo performs well:

Success rate: 100/100 trials with physically reasonable solutions

Camera Location Statistics:

  • Mean position: Lat = 42.32790°N, Lon = 70.89011°W, Alt = 28.25m
  • Std: Lat = 0.000011° (~1.2m), Lon = 0.000024° (~1.7m), Alt = 1.23m

3D visualization of LOO camera locations — harbor webcam

Focal Length: Mean = 1,826.07 px, Std = 27.72 px

When all 9 points are used together, even with noise, the consensus overwhelms individual measurement errors. Small Gaussian noise applied to every point doesn't drastically change the geometric configuration.

Focal length histogram showing tight distribution — harbor webcam Monte Carlo

Comparison: Part 2a vs. Part 2b

Metric            Leave-One-Out               Monte Carlo          Ratio
Std Latitude      0.000650° (~72m)            0.000011° (~1.2m)    59.1×
Std Longitude     0.001802° (~137m)           0.000024° (~1.7m)    75.1×
Std Altitude      46.32 m                     1.23 m               37.7×
Focal Length std  332.31 px                   27.72 px             12.0×
Mean Altitude     14.04 m (outlier affected)  28.35 m (realistic)

Camera Location Point Cloud (Monte Carlo) — harbor webcam showing very tight clustering

Combined comparison: LOO (blue) with wide scatter vs. Monte Carlo (red) with tight clustering

The extreme LOO/Monte Carlo disparity (roughly 38–75× more variation in position) reveals high sensitivity to geometric configuration changes. The system is fragile to geometric changes (removing a point changes the configuration drastically) but robust to measurement noise (small perturbations applied to every point don't break the consensus).

Final Comparison: Urban vs. Marine

Metric        NU Webcam (Urban)  Harbor Webcam (Marine)  Winner
LOO Lat Std   0.000197°          0.000650°               NU (3.3× better)
LOO Alt Std   22.21 m            46.32 m                 NU (2.1× better)
MC Lat Std    0.000024°          0.000011°               Harbor (2.2× better)
MC Alt Std    1.12 m             1.23 m                  NU (slightly better)
MC Focal Std  8.60 px            27.72 px                NU (3.2× better)
LOO/MC Ratio  8–20×              37–75×                  NU (much better)

Urban environments win for geometric stability. Well-defined building corners provide more reliable landmark identification than natural coastal features. The NU webcam comes out ahead on 5 of the 6 metrics above.

Both achieve ~1 meter altitude precision under realistic noise. Despite the NU webcam's advantages in LOO stability, both calibrations are robust to measurement uncertainty when all points are present — Monte Carlo alt std is 1.12m (NU) vs. 1.23m (Harbor).

The LOO/MC ratio tells the real story. NU's 8–20× ratio vs. Harbor's 37–75× indicates that marine calibrations are much more fragile to changes in geometric configuration. Coastal features are harder to pin down, and the calibration is correspondingly more dependent on any single point being present and accurate.

The key lesson: Camera calibration from webcam feeds is surprisingly achievable with enough carefully chosen landmarks, even without direct access to the camera's manufacturer specs. But the choice of environment — and the quality of point identification — dramatically affects reliability. When you can't control your landmarks, Monte Carlo analysis tells you how much to trust your estimates.