Semantics-Guided Zero-shot Metric Depth
Integrated vision foundation models with state-of-the-art depth estimators to produce scale-accurate metric depth for metrology. Designed semantics-guided geometric scaling for unconstrained environments.
Computer Vision Researcher & Engineer
York University
I am a Computer Vision Researcher & Engineer at the Elder Lab, York University, specializing in 3D scene understanding and metric depth estimation under the supervision of Prof. James Elder. My work bridges mathematical theory and production, integrating geometric reasoning with robust deep learning pipelines.
Prior to this, I completed my B.Sc. in Electrical Engineering at the University of Tehran (2021), where I conducted research on single-view 3D reconstruction of symmetrical objects at the Computational Audio-Vision Lab with Prof. Reshad Hosseini.
My research explores 3D scene understanding, combining classical geometric constraints with modern generative approaches. I develop semantics-guided zero-shot perception systems that use vision foundation models to recover scale-accurate metric depth from single images. In parallel, I am investigating how Gaussian Splatting and diffusion models can be combined for high-fidelity single-view reconstruction and real-time dense neural SLAM, with the aim of resolving scale ambiguity and improving mapping accuracy for autonomous systems.
A showcase of my engineering and applied research work.
Formulated a geometric reconstruction algorithm for symmetric objects from single images. Benchmarked against NeRF and Gaussian Splatting methods (One-2-3-45, DreamGaussian) on Pix3D.
Engineered a complete pipeline for football player detection. Fine-tuned YOLOv11, containerized with Docker, automated CI/CD via GitHub Actions, and deployed on AWS.
Built a CNN–Transformer framework for 3D orientation estimation and deterministic fiber tracking, applying transferable geometric modeling techniques for robust quantitative analysis.
Selected peer-reviewed publications.
Spencer et al., Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024.
S.M.H. Hosseini, S.M. Nasiri, R. Hosseini, H. Moradi, 2022.
S.M.H. Hosseini, M. Hassanpour, S. Masoudnia, S. Iraji, S. Raminfard, M. Nazem-Zadeh, Neuroscience Informatics, 2022.