This project started with a simple question: if you know a few GPS points in a webcam image, how well can you recover the camera's position? Camera calibration estimates a camera's intrinsic parameters, like focal length, and extrinsic parameters, like position and orientation in 3D space. The math works on paper. We wanted to know how much it wobbles on real webcam data.

The original calibration code

The initial code (provided by our professor) performs a straightforward camera calibration using 6 known points with their GPS coordinates (latitude, longitude, altitude) and their corresponding pixel locations (x, y) in an image. The goal is to estimate the camera's intrinsic matrix K (containing focal length and principal point) and the camera's 3D location in GPS coordinates.

The calibration process uses OpenCV's calibrateCamera function, which minimizes the reprojection error between the observed pixel locations and where the 3D points would project given the estimated camera parameters. The original code calibrated the camera with a reprojection error of about 39 pixels.

Key outputs from the original calibration:

Focal length: ~1,328 pixels
Camera location: approximately 43.07°N, -89.41°W, at ~323m altitude
Horizontal FOV: 88.62°, Vertical FOV: 72.42°

Part (a): leave-one-out analysis

To see how much one point could sway the result, we used leave-one-out cross-validation. Remove one point, recalibrate with the remaining 5, repeat until every point has had a turn being left out.

Results

The leave-one-out analysis revealed good stability in the calibration:

Camera Location Variability:
- Latitude std: 0.0001833° (~20 meters)
- Longitude std: 0.0003581° (~27 meters)
- Altitude std: 7.41 meters
Focal Length Variability:
- Mean: 1,425.63 pixels
- Std: 250.89 pixels

All 6 estimated camera locations form a tight cluster in 3D space. The altitude distribution shows 6 distinct values ranging from ~312m to ~334m, while the focal length histogram reveals a bimodal distribution with most estimates clustering around 1,250 and 1,450 pixels, with one outlier at ~1,934 pixels.

The relatively small spread suggests the calibration is not overly dependent on any single point. No single point is an obvious outlier causing instability.

Part (b): noise addition analysis

To simulate real-world measurement uncertainty, we added Gaussian noise to our data and ran 100 calibration trials:

3D coordinate noise: Gaussian with 0 mean, 1-meter std (simulating GPS/altitude error)
2D pixel coordinate noise: Gaussian with 0 mean, 1-pixel std (simulating image measurement error)

Results

The noise addition analysis revealed extreme sensitivity to measurement errors:

Camera Location Variability:
- Latitude std: 0.0123° (~1,370 meters) — 67× larger than leave-one-out
- Longitude std: 0.5564° (~42,500 meters) — 1,555× larger
- Altitude std: 13,512 meters — 1,824× larger
Focal Length Variability:
- Mean: 71,117 pixels, Median: 1,342 pixels
- Std: 304,675 pixels — 1,214× larger than leave-one-out

While most camera locations still cluster around the original estimate (~43.07°N, -89.41°W), several outliers show dramatically different estimates. Some have altitudes of -60,000m or +60,000m. The focal length histogram is even more revealing: ~95 of 100 trials produced reasonable focal lengths, but a handful produced absurdly large values (up to 1.5 million pixels), completely skewing the mean.

This was the uncomfortable part: with only 6 points, small measurement errors can occasionally lead to catastrophically wrong solutions. The optimization can converge to local minima that look valid geometrically but make no physical sense.

Looking at the combined comparison:

Leave-one-out (blue) shows a tight, well-behaved cluster. Calibration is structurally sound when data quality is good.
Noise addition (red) shows both the tight cluster and extreme outliers.

Part 2: taking it into the real world

Part 2 is all about the real world. We needed to understand how to map a 3D GPS coordinate to a 2D pixel in a static webcam feed. It sounds like sci-fi, but it's a classic computer vision problem. We wanted to experiment with taking that static image and building a model that could translate any 3D GPS coordinate into a 2D pixel.

We also made one very basic mistake that almost derailed the whole thing.

Image one: Northeastern University webcam

For our first image, we selected the Northeastern University webcam in Boston.

Initially it was very hard to figure out exactly where the image was from. After digging into it and decoding some URLs, we found the source. This camera is part of a weather camera system. We were able to see the exact data it records:

It also had a direct link to the camera's coordinates, elevation, and heading:

Camera details we uncovered:

Camera Coordinates: 42°20'09.2"N 71°05'17.8"W
Decimal: 42.33587843449229, -71.08826335575469

The next step was identifying 5–10 immovable landmarks to use as calibration points. Using a small script, we loaded the image, tagged 10 points, and extracted their 2D pixel coordinates.

We then found the real-world latitude, longitude, and altitude of each point using Google Earth, Google Maps, and Apple Maps. Working together, we found the most accurate representations possible.

img_x	img_y	label	map_lat	map_lng	map_alt
306	855	p1	42.347251	-71.088745	98
725	1045	p2	42.336671	-71.087771	43
678	779	p3	42.345504	-71.084017	229
735	808	p4	42.346976	-71.082614	226
827	867	p5	42.346431	-71.081467	160
1039	859	p6	42.3492	-71.075302	235
1378	847	p7	42.337178	-71.085517	75
1435	937	p8	42.336955	-71.085676	50
1769	1022	p9	42.336214	-71.085598	23
1387	1065	p10	42.336888	-71.085971	23

Our first attempt at calibration was a total disaster. We fed 10 points in and the results came back completely off. All test points were projecting onto wrong pixels. After investigating, we found the problem: the calibration function expected a 1920×1080 snapshot, but we had used a 960×540 image to get the 2D points. The resolution mismatch threw off the entire calibration.

We fixed the issue by retagging the image at full resolution, and the points lined up perfectly:

Fixed calibration: retagged points at 1920x1080

Reference Origin

To establish a local East-North-Up (ENU) coordinate system, we set the origin near the camera:

Latitude: 42.3359°N
Longitude: 71.0883°W
Altitude: 44 meters

Initial camera calibration results

Using OpenCV's calibrateCamera with all 10 calibration points:

Intrinsic Matrix (K):

[[1075.35    0.00   960.00]
 [   0.00 1075.35   540.00]
 [   0.00    0.00     1.00]]

Key parameters:

Focal length: 1,075.35 pixels
Horizontal FOV: 83.51°, Vertical FOV: 53.33°
Reprojection error: 48.35 pixels

Original vs. reprojected calibration points

The reprojection error of 48.35 pixels is reasonable given the challenges of precisely identifying ground truth locations in an urban environment with varying building heights and perspectives.

Part 2a: Leave-One-Out Uncertainty Analysis

We performed leave-one-out cross-validation: remove one point, recalibrate with the remaining 9, record camera location and focal length. Repeat for all 10 points.

Success rate: All 10 calibrations succeeded.

Camera Location Point Cloud (LOO Analysis) — 2D lat/lon scatter, altitude histogram, focal length distribution

Camera Location Statistics (GPS):

Mean position: Lat = 42.33525°N, Lon = 71.08857°W, Alt = 42.36m
Std: Lat = 0.000197° (~22m), Lon = 0.000280° (~21m), Alt = 22.21m

Focal Length Statistics:

Mean: 1,082.30 pixels
Std: 102.07 pixels (9.4% of mean)

3D visualization of LOO camera locations showing moderate clustering

Individual LOO Results

Point Removed	Camera Altitude	Focal Length	Reprojection Error
p1	47.75 m	1363.69 px	30.00 px
p2	27.92 m	1062.24 px	27.72 px
p3	52.14 m	1071.43 px	49.34 px
p4	52.40 m	1079.17 px	49.97 px
p5	-20.48 m	1065.29 px	116.50 px
p6	55.13 m	1099.74 px	48.61 px
p7	52.88 m	1023.94 px	46.84 px
p8	52.46 m	1041.25 px	47.18 px
p9	51.58 m	945.72 px	30.52 px
p10	51.87 m	1070.49 px	49.91 px

Removing point 5 produces an outlier result: negative altitude (-20.48m, physically impossible for a rooftop camera) and the highest reprojection error (116.50 pixels). That makes p5 either critical or problematic.

The LOO analysis shows good overall stability with one exception. Nine out of ten results have altitudes between 28–56 meters (realistic for an urban rooftop). The 22-meter lat/lon std indicates acceptable position stability.

Part 2b: Monte Carlo Noise Sensitivity Analysis

We ran 100 calibration trials with Gaussian noise added to all points:

3D coordinate noise: σ = 1 meter (simulating GPS/altitude uncertainty)
2D pixel noise: σ = 1 pixel (simulating click/identification uncertainty)

Success rate: 100/100 trials succeeded with physically reasonable solutions.

Camera Location Statistics:

Mean position: Lat = 42.33525°N, Lon = 71.08857°W, Alt = 42.36m
Std: Lat = 0.000024° (~2.7m), Lon = 0.000023° (~1.7m), Alt = 1.12m

Focal length histogram showing tight Monte Carlo distribution

Focal Length: Mean = 1,075.35 px, Std = 8.60 px (0.8%)

3D visualization showing extremely tight clustering of Monte Carlo camera locations

The NEU webcam excels in Monte Carlo analysis because 10 well-distributed urban points provide strong geometric constraints. Buildings and rooftops offer precise, stable landmarks. The good altitude variation (23m to 235m) constrains camera height well. Even with noise added to all points, the overdetermined system finds the correct solution reliably.

Comparison: Part 2a vs. Part 2b

Metric	Leave-One-Out	Monte Carlo	Ratio
Std Latitude	0.000197° (~22m)	0.000024° (~2.7m)	8.2×
Std Longitude	0.000280° (~21m)	0.000023° (~1.7m)	12.3×
Std Altitude	22.21 m	1.12 m	19.8×
Focal Length std	102.07 px	8.60 px	11.9×
Mean Altitude	42.36 m	42.36 m	Match
Outliers	1 (p5: -20.48m)	0	—

Combined comparison: LOO (blue) with one outlier vs. Monte Carlo (red) with tight clustering

Leave-one-out shows 8–20× more variation than Monte Carlo. This makes sense: removing a point changes the geometric configuration entirely, while adding noise to all points still preserves the overall structure. Both analyses agree on the camera location (~42.36m altitude), giving us confidence in the result.

Conclusions for image one

We successfully calibrated the NEU webcam with excellent real-world robustness:

Focal length: 1,075.35 pixels, σ = 8.6 px (only 0.8%)
Camera location: 42.33525°N, 71.08857°W, 42.36m altitude
Position uncertainty: ~2 meters horizontal, ~1 meter vertical
Success rate: 100/100 under 1m/1px noise

Why does removing point 5 cause failure? When p5 is removed, the camera gets placed at -20.48m with 116.50 px reprojection error. Point 5 sits at 160m altitude — right in the middle of our range (23m to 235m). It likely provides critical altitude triangulation. Without it, the optimizer falls into a wrong but locally optimal solution. The fact that Monte Carlo succeeds 100/100 even with noise added to p5 proves the point isn't fundamentally wrong. It's a critical but marginal anchor.

Image two: Boston Harbor Islands webcam

For the second image, we chose something completely different. We moved from an urban environment to a remote island on the outskirts of Boston, using the National Park Service's webcam overlooking the harbor.

We located the public record on the NPS website and found a Google Street View of the island itself. Since the camera is part of a lighthouse, we figured out the lighthouse height and made adjustments based on the camera's relative position.

Flat lighthouse in Google Earth 3D View

One complication: Google Earth's 3D satellite imagery for the area hasn't been updated since 2022. We could see two distinct islands from the camera's view, but the satellite shows them connected. This meant guessing some points in an already structurally difficult environment.

Point Selection in a Marine Environment

Calibrating a camera over open water is much messier than calibrating a city webcam. We selected 9 distinct landmarks: coastal features, islands, and man-made structures.

Unlike urban environments with well-defined building corners, identifying precise points on natural coastal features and distant islands proved difficult. Tidal variations, wave action, and atmospheric haze all contribute to measurement uncertainty.

img_x	img_y	label	map_lat	map_lng	map_alt
891	1980	p1	42.327871	-70.890574	5
344	1356	p2	42.32737	-70.891641	2
1724	1263	p3	42.328117	-70.891248	4
1315	1239	p4	42.327967	-70.892254	0.21
2744	998	p5	42.332095	-70.895926	3
2086	912	p6	42.346274	-70.954216	44
724	941	p7	42.327193	-70.924872	3
2717	1925	p8	42.32821	-70.890555	6
3646	971	p9	42.334612	-70.894444	22

Reference origin (ENU): 42.3279°N, 70.8901°W, 23m altitude

Initial Calibration Results

Using all 9 calibration points:

Focal length: 1,824.61 pixels
Horizontal FOV: 92.92°, Vertical FOV: 61.24°
Reprojection error: 204.01 pixels

The 204-pixel reprojection error is much higher than a typical calibration. That tracks with the real problem here: natural coastal landmarks are hard to identify precisely.

Original vs. reprojected calibration points for harbor webcam

Part 2a: Leave-One-Out Analysis

Success rate: All 9 LOO calibrations succeeded.

Camera Location Point Cloud (LOO Analysis) — harbor webcam

Camera Location Statistics:

Mean position: Lat = 42.32790°N, Lon = 70.89011°W, Alt = 14.04m
Std: Lat = 0.000650° (~72m), Lon = 0.001802° (~137m), Alt = 46.32m

Focal Length: Mean = 1,607.48 px, Std = 332.31 px

Point 2 stands out as a critical outlier. Removing it causes the altitude estimate to go to -116.54m (well below sea level). This is consistent with the challenges we noted in identifying precise coastal landmarks.

Part 2b: Monte Carlo Analysis

Despite the high reprojection error, Monte Carlo performs well:

Success rate: 100/100 trials with physically reasonable solutions

Camera Location Statistics:

Mean position: Lat = 42.32790°N, Lon = 70.89011°W, Alt = 28.25m
Std: Lat = 0.000011° (~1.2m), Lon = 0.000024° (~1.7m), Alt = 1.23m

3D visualization of LOO camera locations — harbor webcam

Focal Length: Mean = 1,826.07 px, Std = 27.72 px

When all 9 points are used together, even with noise, the consensus overwhelms individual measurement errors. Small uniform noise doesn't drastically change the geometric configuration.

Focal length histogram showing tight distribution — harbor webcam Monte Carlo

Comparison: Part 2a vs. Part 2b

Metric	Leave-One-Out	Monte Carlo	Ratio
Std Latitude	0.000650° (~72m)	0.000011° (~1.2m)	59.1×
Std Longitude	0.001802° (~137m)	0.000024° (~1.7m)	75.1×
Std Altitude	46.32 m	1.23 m	37.7×
Focal Length std	332.31 px	27.72 px	12.0×
Mean Altitude	14.04 m (outlier affected)	28.35 m (realistic)	—

Camera Location Point Cloud (Monte Carlo) — harbor webcam showing very tight clustering

Combined comparison: LOO (blue) with wide scatter vs. Monte Carlo (red) with tight clustering

The extreme LOO/Monte Carlo disparity (58–115× more variation) shows high sensitivity to geometric configuration changes. The system is fragile to geometric changes (removing points changes the configuration drastically) but robust to measurement noise (small uniform perturbations don't break the consensus).

Final comparison: urban vs. marine

Metric	NU Webcam (Urban)	Harbor Webcam (Marine)	Winner
LOO Lat Std	0.000197°	0.000650°	NU (3.3× better)
LOO Alt Std	22.21 m	46.32 m	NU (2.1× better)
MC Lat Std	0.000020°	0.000011°	Harbor (1.8× better)
MC Alt Std	1.12 m	1.23 m	NU (slightly better)
MC Focal Std	6.19 px	27.72 px	NU (4.5× better)
LOO/MC Ratio	10–20×	37–75×	NU (much better)

Urban environments win for geometric stability. Well-defined building corners provide more reliable landmark identification than natural coastal features. The NU webcam shows better overall stability across 4 of 6 metrics.

Both achieve ~1 meter altitude precision under realistic noise. Despite the NU webcam's advantages in LOO stability, both calibrations are robust to measurement uncertainty when all points are present. Monte Carlo alt std is 1.12m (NU) vs. 1.23m (Harbor).

The LOO/MC ratio tells the real story. NU's 10–20× ratio vs. Harbor's 37–75× indicates that marine calibrations are much more fragile to changes in geometric configuration. Coastal features are harder to pin down, and the calibration is correspondingly more dependent on any single point being present and accurate.

The lesson: camera calibration from webcam feeds is surprisingly achievable with enough carefully chosen landmarks, even without access to the camera's manufacturer specs. But the environment matters a lot. When you can't control your landmarks, Monte Carlo analysis tells you how much to trust your estimates.