Here's how Qualcomm plans to change mobile photography forever
Cameras are easily among the most important factors in a smartphone today, and can often be the tiebreaker when two phones are otherwise evenly matched. I certainly don't consider myself a professional photographer, but I value having the necessary tools to capture great-looking content at any given moment, even when I don't feel like packing and carrying my dedicated mirrorless camera.
That's why I was excited to sit in on a video call with Judd Heape, senior director of product management for Camera and Computer Vision at Qualcomm, who went into great detail answering my questions about Qualcomm's vision for future advancements in both photography and videography on smartphones.
Smartphone photography has really come a long way in the last few years, but video hasn't necessarily made as many strides. We have 8K video, for example, which is great, but the bitrate is still fairly limited, of course, because we're working with limited hardware. Is there any kind of big step forward we can expect on the video side in particular?
Heape: Video is a lot more challenging because of the data rates and the resolutions that are going through the ISP (image signal processor). For Snapshot, of course, you can always process things in the background or have a little bit of latency until the photo appears in the gallery, but with video you've got really strict timing deadlines that you have to meet for every single frame. It's important that video is done in hardware, and that it's power-optimized so the phone doesn't melt in your hand when you're trying to shoot 8K30 video, for example.
I've spoken previously about the merging of three cameras, where you can seamlessly zoom from ultra-wide to wide to telephoto. That will be improved to be much smoother and easier to control in future revisions. Of course, we also want to do a lot of work to improve the HDR experience in video so that the whole system can utilize the innovations that are coming out in image sensors to do quad CFA-based (color filter array) HDR, staggered HDR, multi-frame HDR ... those are really nice features that not only affect the quality of Snapshots, but also the video recorded stream and the viewfinder.
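To make the multi-frame HDR idea concrete, here's a minimal exposure-fusion sketch in Python, assuming bracketed frames are already aligned; real ISPs do this in hardware, with far more sophisticated weighting and ghost rejection than this toy hat-shaped weight:

```python
import numpy as np

def fuse_exposures(frames, exposures):
    """Naive multi-frame HDR merge: average bracketed frames in linear
    radiance space, weighting well-exposed pixels more heavily.

    frames:    list of float32 arrays in [0, 1], same shape (H, W, 3)
    exposures: relative exposure times, e.g. [0.25, 1.0, 4.0]
    """
    acc = np.zeros_like(frames[0], dtype=np.float64)
    wsum = np.zeros_like(acc)
    for img, t in zip(frames, exposures):
        # Hat-shaped weight: trust mid-tones, distrust near-black/near-white.
        w = 1.0 - np.abs(img - 0.5) * 2.0
        acc += w * (img / t)      # back-project to scene radiance
        wsum += w
    radiance = acc / np.maximum(wsum, 1e-6)
    # Simple global tone map back to display range.
    return radiance / (1.0 + radiance)
```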
In terms of pixel processing, we're going to be devoting more hardware to video in the future, as well. In the past, we've done things like local motion compensation so that you can handle noise, not just with panning and global moving objects, but also with objects that are moving locally within the frame. We're also capitalizing on our depth and motion engines to do things like bokeh in video, which can be done at any resolution, and in the more distant future, we'll be looking at understanding the content within a video and what each pixel is.
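As a toy illustration of video bokeh, here's a per-frame sketch assuming a depth map is already available from the depth engine; the blur kernel size and the depth-band falloff are invented for illustration:

```python
import cv2
import numpy as np

def apply_bokeh(frame, depth, focus_depth, tolerance=0.1):
    """Blur everything outside a depth band around the subject.

    frame:       uint8 BGR image (H, W, 3)
    depth:       float32 map in [0, 1], same H x W (0 = near, 1 = far)
    focus_depth: depth value of the subject to keep sharp
    """
    blurred = cv2.GaussianBlur(frame, (31, 31), 0)
    # Soft mask: 1.0 inside the in-focus band, falling off outside it.
    mask = np.clip(1.0 - (np.abs(depth - focus_depth) - tolerance) / tolerance,
                   0.0, 1.0)
    mask = mask[..., None]  # broadcast over color channels
    return (frame * mask + blurred * (1.0 - mask)).astype(np.uint8)
```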
I alluded to this before when Morpho was talking about semantic segmentation; when the pixels are understood by the camera, whether it's skin, fabric, grass, sky, etc., these are the types of understandings that help process those different pixels for factors like color, texture, and noise. In the future, we'll be doing this not just for Snapshot, but also for video.
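A hedged sketch of what segmentation-driven processing might look like at its simplest: the class IDs and per-class denoise strengths below are invented, and the label map would come from whatever segmentation model the vendor ships.

```python
import cv2
import numpy as np

# Hypothetical class IDs from a segmentation model.
SKY, SKIN, GRASS, FABRIC = 0, 1, 2, 3

# Invented per-class denoise strengths: smooth sky aggressively,
# be gentler on skin, preserve texture in grass and fabric.
DENOISE_STRENGTH = {SKY: 15, SKIN: 7, GRASS: 3, FABRIC: 3}

def segment_aware_denoise(image, labels):
    """Apply a different denoise strength to each semantic region.

    image:  uint8 BGR image (H, W, 3)
    labels: int array (H, W) of per-pixel class IDs
    """
    out = image.copy()
    for cls, strength in DENOISE_STRENGTH.items():
        mask = labels == cls
        if not mask.any():
            continue
        denoised = cv2.bilateralFilter(image, d=9,
                                       sigmaColor=strength * 5,
                                       sigmaSpace=strength)
        out[mask] = denoised[mask]
    return out
```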
I think the first video bokeh effect I saw was on the LG G8 last year. It may not have been quite to the level of taking a photo in portrait mode, but it was still impressive. Of course, Google's Pixel line is able to do incredible things like semantic segmentation and its various night modes, as well.
We want to move a lot of those features to video; it's the logical next step. But video is already a power problem, especially if you're shooting in, say, 8K30 or 4K120, so adding those features on top of an already saturated thermal budget is a challenge, but that's what we're working on in the future.
And on the flip side of that, what kind of advancements is Qualcomm working towards on the photography side of things in terms of features like portrait mode and other types of creative shooting modes?
We're really looking now at expanding our reach for the camera into heterogeneous computing, and making sure that the camera really interfaces and communicates seamlessly with the AI engine on Snapdragon. For photographs, what you'll see us doing more in the future is using AI for things like denoising; we can get really good performance removing noise while preserving detail in low light, beyond what you can do with traditional techniques like the standard types of filters that everybody uses.
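For contrast with those standard filters, a learned denoiser in its simplest form is a small residual network that predicts the noise and subtracts it. A minimal, untrained PyTorch sketch of that structure; real mobile models are quantized and run on the AI engine, not the CPU:

```python
import torch
import torch.nn as nn

class TinyDenoiser(nn.Module):
    """DnCNN-style residual denoiser: the network predicts the noise,
    which is subtracted from the input image."""

    def __init__(self, channels=3, features=32, depth=5):
        super().__init__()
        layers = [nn.Conv2d(channels, features, 3, padding=1),
                  nn.ReLU(inplace=True)]
        for _ in range(depth - 2):
            layers += [nn.Conv2d(features, features, 3, padding=1),
                       nn.ReLU(inplace=True)]
        layers.append(nn.Conv2d(features, channels, 3, padding=1))
        self.body = nn.Sequential(*layers)

    def forward(self, x):
        return x - self.body(x)  # input minus predicted noise

# Usage: denoised = TinyDenoiser()(noisy_batch)  # (N, 3, H, W) in [0, 1]
```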
Another area that I touched on with video is HDR. We'll be using the AI engine along with the captured photographs to pick the best parts of the scene. So one thing we might do with AI to automatically adjust the image is an intelligent retouch, where we're doing content-aware processing for tonal content, shadow content, highlights, and color.
That's something that we think will be really powerful; you won't have to worry about retouching your photos, the AI engine will make sure that they're completely optimized in all of those areas going forward.
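Mechanically, the simplest version of that kind of tonal retouch is a masked lift of shadows and cut of highlights. A toy sketch, assuming the strengths have already been chosen; a real content-aware system would derive them per region from scene understanding:

```python
import numpy as np

def retouch_tones(image, shadow_lift=0.3, highlight_cut=0.2):
    """Simple global tone adjustment: lift shadows, recover highlights.

    image: float32 RGB in [0, 1]. A content-aware system would compute
    shadow_lift / highlight_cut per region rather than globally.
    """
    luma = image.mean(axis=-1, keepdims=True)
    shadow_mask = np.clip(1.0 - luma * 2.0, 0.0, 1.0)     # strongest in dark areas
    highlight_mask = np.clip(luma * 2.0 - 1.0, 0.0, 1.0)  # strongest in bright areas
    out = image + shadow_lift * shadow_mask * (1.0 - image)
    out -= highlight_cut * highlight_mask * out
    return np.clip(out, 0.0, 1.0)
```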
A third area that people don't necessarily think about is face detection. In the past, we've used more traditional techniques when the camera is active to detect faces, and it's actually driving how the camera works. When the camera sees that there's a face in the image, it uses that face to manage some of the items in the 3A process. It can use the face to determine if you have the right exposure, or it can use the face to be the automatic point of autofocus.
In the future, I think we'll be using more of a deep learning approach, where we can use our AI engine to detect faces more accurately and with fewer false positives. We'll be able to be a lot more flexible in detecting faces in different orientations, at different distances, and so on.
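To illustrate how a detected face can drive the 3A process, here's a toy exposure-metering sketch; the face box would come from whatever detector is in use, and the target luminance and weighting values are made up:

```python
import numpy as np

def face_weighted_exposure_error(luma, face_box, target=0.45, face_weight=4.0):
    """Compute an exposure error that the AE loop would drive to zero,
    weighting the face region more heavily than the rest of the frame.

    luma:     float32 (H, W) luminance in [0, 1]
    face_box: (x, y, w, h) from a face detector
    """
    x, y, w, h = face_box
    weights = np.ones_like(luma)
    weights[y:y + h, x:x + w] = face_weight  # the face dominates metering
    mean_luma = (luma * weights).sum() / weights.sum()
    return target - mean_luma  # positive -> increase exposure
```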
On my Sony a7 III, there's a feature that lets you take a photo of a particular face and tell the camera to prioritize that person of interest for things like autofocus, even when other faces are in the shot. Is that something we could potentially see in a phone in the future?
You can do that pretty easily with AI without going deep into security and the things you have to do to recognize faces for things like payments and unlocking your phone. You can basically do this just in-camera, and know if it's face A or face B — not necessarily if it's the face that's supposed to unlock the phone, but just a face of interest. That's all possible, and will be possible with that upgraded engine that we'll be doing for deep learning face detection.
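A hedged sketch of that face A versus face B matching: compare embeddings from any face-recognition model against an enrolled face of interest, with no secure-unlock machinery involved. The embeddings and the similarity threshold here are assumptions:

```python
import numpy as np

def pick_face_of_interest(face_embeddings, enrolled, threshold=0.6):
    """Return the index of the detected face that best matches the
    enrolled person, or None if nobody is similar enough.

    face_embeddings: list of L2-normalized vectors, one per detected face
    enrolled:        L2-normalized embedding of the person to prioritize
    """
    best_idx, best_sim = None, threshold
    for i, emb in enumerate(face_embeddings):
        sim = float(np.dot(emb, enrolled))  # cosine similarity
        if sim > best_sim:
            best_idx, best_sim = i, sim
    return best_idx  # autofocus/exposure would then lock onto this face
```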
And I just have to ask. Canon's recently announced EOS R5 can obviously shoot 8K video, but more interesting to me is its ability to shoot oversampled 4K in-camera, which condenses information from 8K source footage to achieve sharper 4K video without needing to do it yourself in post and deal with the massive file sizes of 8K. Is that something we might see in phones at some point, or does this call back to limitations regarding heating and bitrates?
That's a good question. That's something our OEMs might do; of course, we offer native modes for shooting in 4K and 8K, but because 8K is also quite power-hungry, it's certainly viable to do either up- or down-conversion. One of the things — maybe the problem in reverse — we're also looking at doing is intelligent upscaling for video.
Today on the photo side, you can use multiple frames to create more pixels and get a more dense resolution image, but the same thing in video is also possible. You can shoot at a lower resolution and use the slight movements in the camera from frame to frame to even upconvert maybe as high as 3x without any perceptible degradation.
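That technique is essentially shift-and-add super-resolution: natural hand shake between frames lets samples accumulate on a finer grid. A naive sketch, assuming per-frame sub-pixel offsets have already been estimated by an alignment step:

```python
import numpy as np

def shift_and_add_sr(frames, offsets, scale=2):
    """Naive multi-frame super-resolution by accumulation.

    frames:  list of float32 grayscale arrays (H, W)
    offsets: per-frame (dy, dx) sub-pixel shifts from alignment
    scale:   upscaling factor (Heape suggests up to ~3x is feasible)
    """
    h, w = frames[0].shape
    acc = np.zeros((h * scale, w * scale))
    count = np.zeros_like(acc)
    ys, xs = np.mgrid[0:h, 0:w]
    for img, (dy, dx) in zip(frames, offsets):
        # Place each low-res sample at its sub-pixel position on the HR grid.
        hy = np.clip(np.round((ys + dy) * scale).astype(int), 0, h * scale - 1)
        hx = np.clip(np.round((xs + dx) * scale).astype(int), 0, w * scale - 1)
        np.add.at(acc, (hy, hx), img)
        np.add.at(count, (hy, hx), 1)
    filled = count > 0
    acc[filled] /= count[filled]
    return acc  # remaining holes would be interpolated in a real pipeline
```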
I also wanted to ask about the overhead when it comes to Qualcomm supporting so many different kinds of smartphones from different manufacturers, and meeting the various demands of each as companies try to differentiate themselves through unique camera features. Especially now that we're seeing multiple cameras on pretty much every phone, regardless of price — that's gotta be a lot to worry about.
It is! Because the camera is such an important feature, every OEM wants to differentiate on its cameras. So Qualcomm will release the hardware platform and the camera software, which has a plethora of capabilities, but then of course, one OEM wants to be different from another OEM. So they're choosing different lenses, different image sensors, they're arranging the sensors differently on the front and back, they're adding things like depth sensors or laser-assisted focus or macro cameras...
A lot of customers want to differentiate in the software, as well. Maybe they want to do their own algorithm, implement a specific function in the camera on their own, or slightly improve on the way that something like demosaicing is done.
So the challenge we have is servicing all of those customizations and differentiations, but we have a really good systems team and customer engineering team whose job 24/7 is to make sure that customers are successful and can integrate their own features.
One thing that really sets Qualcomm apart from other vendors that provide camera IP is that we have a really strong network of third-party providers that we really do nurture, and we want to make sure that when we have a third-party provider that might be working with a customer, we're all working together.
When we engage with an OEM and they're engaged with a third party like Morpho or ArcSoft, the third party is directly in touch with us as well. So if they want to do something with triple cameras or AI, we'll work with that third party to make sure that they have the latest and greatest development platforms, baseline software, and APIs, and that they have the ability to leverage our hardware blocks, both inside and outside of the camera.
Something the third party might do in the CPU, they might find that they can do it with lower power if they leverage some block in our ISP, or in our computer vision — our EVA engine. Maybe if they move the algorithm from CPU to DSP, like the HVX (Hexagon Vector Extensions) engine, they might get better performance and lower power. We're very closely in touch with every ISV (independent software vendor) in our third-party network to make sure that whatever solutions we're coming up with to help the OEM customize are as streamlined and low-power as possible.
Sort of an offshoot of that question, how do you balance Qualcomm's own feature sets and those of a given client? Coming back to Google, I'd love to see the Pixel 4's astrophotography mode come to other phones, but where do you draw the line and leave that sort of development up to the OEM?
It's a constant sort of thing we think about. How do we balance that? How do we let our OEMs and ISVs differentiate, versus what features are we going to come out with as baselines that may go out to everybody, and in turn remove that differentiation from some specific OEMs? I think our driving force is — it's two things. Anything that we feel is going to improve the camera experience and push the whole ecosystem forward, we want to approach that from a user experience perspective across the industry.
So if there's a certain feature that we believe is going to benefit everybody and really push the whole mobile camera system more towards something like a mirrorless camera, we'll integrate that. The other thing we look at is image quality. If it's something that specifically will impact image quality scores from, say, a third-party benchmarking house like DxOMark, for example, we want to have that capability in house. For things like zoom or noise reduction, better detection of faces, segmented processing, HDR, and so on, these are all things that are measured in the industry, so we want to make sure that the offering we provide to all of our customers has those areas as optimized as they can be.
So those are the two driving factors; we don't want to step on the toes of our customers and our third-party network who might be wanting to innovate, but on the other hand, if it really pushes the whole ecosystem forward or if it impacts something like a DxOMark score, we really want to try to offer that to everybody to move everything forward.
You mentioned earlier how Qualcomm is looking to improve the seamless transition between lenses as you zoom in and out. I just did a retrospective review of last year's Galaxy Note 10, and I was still impressed by how consistent the imaging is across each lens. There are slight differences, of course; the ultra-wide in particular is quicker to blow out highlights, but the colors are really spot-on, and while there's a split second of delay during the transition between lenses, it's very impressive. I'm excited to see that improve even further.
That's not easy. You have three different image sensors, and usually they're not even the same type of image sensor. You've got multiple lenses, you have to tune those cameras so that the color is spot on; that the focus transition and exposure are the same; the white balance is the same; the basic texture and noise tuning is the same ... otherwise, your eye is going to see it. It's really good at picking up these discontinuities.
We're trying to build more and more hooks into hardware so that can be done easily as you transition, and when you go from wide to ultra-wide, it's not just about matching those parameters. It's also about when you're at that transition point, where you said there's a slight delay; there's also fusion going on between those two images to make sure that the orientation and lineup of those images is dead-on, and that's actually done in real time with a hardware block in the ISP that manages the orientation and warping to make those two images line up perfectly.
There's a lot to that, especially in those really tiny transition regions where you want it to be ultra-smooth; there's a lot of hardware behind that that's making it happen.
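The fusion Heape describes runs in a dedicated ISP block, but the geometry is the same as a software homography alignment. A rough OpenCV stand-in that matches features between the two frames and warps one onto the other's geometry:

```python
import cv2
import numpy as np

def align_cameras(wide, ultrawide):
    """Warp the ultra-wide frame onto the wide frame's geometry so the
    two line up at the zoom transition point. A software stand-in for
    the hardware warping block described above."""
    orb = cv2.ORB_create(2000)
    k1, d1 = orb.detectAndCompute(ultrawide, None)
    k2, d2 = orb.detectAndCompute(wide, None)
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(d1, d2), key=lambda m: m.distance)[:200]
    src = np.float32([k1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([k2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    h, w = wide.shape[:2]
    return cv2.warpPerspective(ultrawide, H, (w, h))
```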