lipnet
Prompting Lipschitz-constrained network for multiple-in-one sparse-view CT reconstruction
Shi, Baoshun, Jiang, Ke, Lian, Qiusheng, Yu, Xinran, Fu, Huazhu
Despite significant advancements in deep learning-based sparse-view computed tomography (SVCT) reconstruction algorithms, these methods still encounter two primary limitations: (i) It is challenging to explicitly prove that the prior networks of deep unfolding algorithms satisfy Lipschitz constraints due to their empirically designed nature. (ii) The substantial storage costs of training a separate model for each setting in the case of multiple views hinder practical clinical applications. To address these issues, we elaborate an explicitly provable Lipschitz-constrained network, dubbed LipNet, and integrate an explicit prompt module to provide discriminative knowledge of different sparse sampling settings, enabling the treatment of multiple sparse view configurations within a single model. Furthermore, we develop a storage-saving deep unfolding framework for multiple-in-one SVCT reconstruction, termed PromptCT, which embeds LipNet as its prior network to ensure the convergence of its corresponding iterative algorithm. In simulated and real data experiments, PromptCT outperforms benchmark reconstruction algorithms in multiple-in-one SVCT reconstruction, achieving higher-quality reconstructions with lower storage costs. On the theoretical side, we explicitly demonstrate that LipNet satisfies the boundary property, further proving its Lipschitz continuity and subsequently analyzing the convergence of the proposed iterative algorithms. The data and code are publicly available at https://github.com/shibaoshun/PromptCT.
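To make the notion of a Lipschitz constraint concrete, here is a minimal sketch. It does not reproduce the LipNet architecture, which the abstract does not specify. As an illustrative assumption, it rescales a random weight matrix so that its spectral norm is at most 1, applies a ReLU, and checks numerically that the resulting layer never expands distances. The names `spectral_normalize` and `layer` are hypothetical.

```python
import numpy as np

def spectral_normalize(w):
    """Rescale w so its largest singular value is at most 1 (assumed construction)."""
    sigma = np.linalg.norm(w, 2)  # spectral norm of a 2-D array
    return w / max(sigma, 1.0)

def layer(w, x):
    """Linear map followed by ReLU; ReLU is itself 1-Lipschitz."""
    return np.maximum(w @ x, 0.0)

rng = np.random.default_rng(0)
w = spectral_normalize(rng.normal(size=(16, 16)))

# Empirically probe the ratio ||f(a) - f(b)|| / ||a - b|| on random input pairs.
worst_ratio = 0.0
for _ in range(1000):
    a, b = rng.normal(size=16), rng.normal(size=16)
    ratio = np.linalg.norm(layer(w, a) - layer(w, b)) / np.linalg.norm(a - b)
    worst_ratio = max(worst_ratio, ratio)

print(f"largest observed expansion ratio: {worst_ratio:.3f}")
```

A random probe of this kind only gives a lower bound on the true Lipschitz constant. The guarantee comes from the spectral-norm argument, which is the sort of explicit proof the abstract says is hard to obtain for empirically designed prior networks.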
WALINET: A water and lipid identification convolutional Neural Network for nuisance signal removal in 1H MR Spectroscopic Imaging
Weiser, Paul, Langs, Georg, Motyka, Stanislav, Bogner, Wolfgang, Courvoisier, Sébastien, Hoffmann, Malte, Klauser, Antoine, Andronesi, Ovidiu C.
Purpose. Proton Magnetic Resonance Spectroscopic Imaging (1H-MRSI) provides non-invasive spectral-spatial mapping of metabolism. However, long-standing problems in whole-brain 1H-MRSI are spectral overlap of metabolite peaks with large lipid signal from scalp, and overwhelming water signal that distorts spectra. Fast and effective methods are needed for high-resolution 1H-MRSI to accurately remove lipid and water signals while preserving the metabolite signal. The potential of supervised neural networks for this task remains unexplored, despite their success for other MRSI processing. Methods. We introduce a deep-learning method based on a modified Y-NET network for water and lipid removal in whole-brain 1H-MRSI. The WALINET (WAter and LIpid neural NETwork) was compared to conventional methods such as the state-of-the-art lipid L2 regularization and Hankel-Lanczos singular value decomposition (HLSVD) water suppression. Methods were evaluated on simulated and in-vivo whole-brain MRSI using NRMSE, SNR, CRLB, and FWHM metrics. Results. WALINET is significantly faster and needs 8s for high-resolution whole-brain MRSI, compared to 42 minutes for conventional HLSVD+L2. Quantitative analysis shows WALINET has better performance than HLSVD+L2: 1) more lipid removal with 41% lower NRMSE, 2) better metabolite signal preservation with 71% lower NRMSE in simulated data, 155% higher SNR and 50% lower CRLB in in-vivo data. Metabolic maps obtained by WALINET in healthy subjects and patients show better gray/white-matter contrast with more visible structural details. Conclusions. WALINET has superior performance for nuisance signal removal and metabolite quantification on whole-brain 1H-MRSI compared to conventional state-of-the-art techniques. This represents a new application of deep-learning for MRSI processing, with potential for automated high-throughput workflow.
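For readers unfamiliar with the NRMSE metric, the sketch below shows one common definition: the norm of the error divided by the norm of the reference. The abstract does not state which normalization the authors use, so this choice is an assumption, and the toy "spectrum" and the function name `nrmse` are illustrative only.

```python
import numpy as np

def nrmse(estimate, reference):
    """Error norm divided by reference norm (one common NRMSE convention, assumed here)."""
    return np.linalg.norm(estimate - reference) / np.linalg.norm(reference)

# Toy complex-valued "spectrum" standing in for a metabolite signal.
rng = np.random.default_rng(0)
reference = rng.normal(size=256) + 1j * rng.normal(size=256)

perfect = reference.copy()            # ideal nuisance removal
noisy = reference + 0.1 * reference   # 10% multiplicative error

print(nrmse(perfect, reference))  # zero for a perfect estimate
print(nrmse(noisy, reference))    # about 0.1 for a 10% error
```

Under this convention, a statement such as "41% lower NRMSE" means the ratio of the two methods' NRMSE values is about 0.59.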
Learning conditional distributions on continuous spaces
Bénézet, Cyril, Cheng, Ziteng, Jaimungal, Sebastian
We investigate sample-based learning of conditional distributions on multi-dimensional unit boxes, allowing for different dimensions of the feature and target spaces. Our approach involves clustering data near varying query points in the feature space to create empirical measures in the target space. We employ two distinct clustering schemes: one based on a fixed-radius ball and the other on nearest neighbors. We establish upper bounds for the convergence rates of both methods and, from these bounds, deduce optimal configurations for the radius and the number of neighbors. We propose to incorporate the nearest neighbors method into neural network training, as our empirical analysis indicates it has better performance in practice. For efficiency, our training process utilizes approximate nearest neighbors search with random binary space partitioning. Additionally, we employ the Sinkhorn algorithm and a sparsity-enforced transport plan. Our empirical findings demonstrate that, with a suitably designed structure, the neural network has the ability to adapt to a suitable level of Lipschitz continuity locally. For reproducibility, our code is available at https://github.com/zcheng-a/LCD_kNN.
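The two clustering schemes described in the abstract can be sketched as follows. This is a minimal illustration on synthetic data, not the authors' implementation: it uses exact brute-force distances rather than the approximate nearest neighbors search mentioned above, and the function names, sample size, radius, and `k` are all assumptions chosen for the example.

```python
import numpy as np

def knn_empirical_measure(x, y, query, k):
    """Atoms of the empirical target measure built from the k nearest features."""
    dist = np.linalg.norm(x - query, axis=1)
    nearest = np.argsort(dist)[:k]
    return y[nearest]

def ball_empirical_measure(x, y, query, radius):
    """Atoms of the empirical target measure built from features within `radius`."""
    dist = np.linalg.norm(x - query, axis=1)
    return y[dist <= radius]

# Synthetic data on unit boxes: 2-D features, 1-D targets (dimensions may differ).
rng = np.random.default_rng(0)
x = rng.uniform(size=(2000, 2))
y = np.clip(0.5 * x[:, :1] + 0.25 + 0.05 * rng.normal(size=(2000, 1)), 0.0, 1.0)

query = np.array([0.5, 0.5])
knn_atoms = knn_empirical_measure(x, y, query, k=50)
ball_atoms = ball_empirical_measure(x, y, query, radius=0.1)

# Each atom carries equal weight; the mean is one summary of the estimated
# conditional distribution of y given x = query.
print(knn_atoms.shape, knn_atoms.mean())
print(ball_atoms.shape, ball_atoms.mean())
```

The nearest-neighbors scheme always returns exactly `k` atoms, whereas the fixed-radius scheme returns a data-dependent number that can be zero in sparse regions.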
5 Things AI Is Better At Than You
Your mother was right: you are special. While each of us is a perfect little snowflake in our own right, that doesn't necessarily mean we possess world-shaking skills. But back in the lab, data scientists are cranking out algorithms that exceed human capability on a regular basis. About a year ago, Facebook CEO Mark Zuckerberg predicted that artificial intelligence (AI) would generally surpass humans in core sensory capabilities (like seeing and hearing) in about five to 10 years. AI still can't "actually look at the photo and deeply understand what's in it or look at the videos and understand what's in it," he said at the time.
Google's AI can now lip read better than humans after watching thousands of hours of TV
The research follows similar work published by a separate group at the University of Oxford earlier this month. Using related techniques, these scientists were able to create a lip-reading program called LipNet that achieved 93.4 percent accuracy in tests, compared to 52.3 percent human accuracy. However, LipNet was only tested on specially-recorded footage that used volunteers speaking formulaic sentences. By comparison, DeepMind's software -- known as "Watch, Listen, Attend, and Spell" -- was tested on far more challenging footage: transcribing natural, unscripted conversations from BBC politics shows. More than 5,000 hours of footage from TV shows including Newsnight, Question Time, and the World Today, was used to train DeepMind's "Watch, Listen, Attend, and Spell" program.
Revealed: How Nvidia's 'backseat driver' AI learned to read lips
When Nvidia popped the bonnet on its Co-Pilot "backseat driver" AI at this year's Consumer Electronics Show, most onlookers were struck by its ability to lip-read while tracking CES-going "motorists'" actions within the "car". A slide taken at CES shows the Co-Pilot AI assistant performing four features: facial recognition, head tracking, gaze tracking and lip-reading. The automotive AI is part of the GPU-flinger's DRIVE PX 2 platform, which uses sensors and multiple neural networks powered by the grunt of Nvidia's processors. An Nvidia spokesperson has since confirmed in an email to The Register that the lip-reading component was based on a research paper written by academics from the University of Oxford, Google DeepMind and the Canadian Institute for Advanced Research.
Google's AI can now lip read better than humans after watching thousands of hours of TV
The research follows similar work published by a separate group at the University of Oxford earlier this month. Using related techniques, these scientists were able to create a lip-reading program called LipNet that achieved 93.4 percent accuracy in tests, compared to 52.3 percent human accuracy. However, LipNet was only tested on specially-recorded footage that used volunteers speaking formulaic sentences. By comparison, DeepMind's software -- known as "Watch, Listen, Attend, and Spell" -- was tested on far more challenging footage: transcribing natural, unscripted conversations from BBC politics shows. More than 5,000 hours of footage from TV shows including Newsnight, Question Time, and the World Today, was used to train DeepMind's "Watch, Listen, Attend, and Spell" program. The videos included 118,000 different sentences and some 17,500 unique words, compared to LipNet's test database of videos containing just 51 unique words.
Who's better at reading lips – humans or AI?
HAL 9000: I know that you and Frank were planning to disconnect me, and I'm afraid that's something I cannot allow to happen. Astronaut Dave Bowman: Where the hell did you get that idea, HAL? HAL 9000: Dave, although you took very thorough precautions in the pod against my hearing you, I could see your lips move. A bit behind schedule – if you don't recognize that movie dialogue, it's from Stanley Kubrick's 2001 – but computers are moving rapidly towards mastering lip-reading. But new research shows they're clearly outperforming humans, and improving fast. So if you've been captured on CCTV, with or without audio, it might soon be practical to decipher whatever you were talking about. Lip-reading has been an active focus of AI research for years.
Google's AI watched hours of TV to learn how to read lips better than you
Researchers from Google's UK-based artificial intelligence division DeepMind have collaborated with scientists from the University of Oxford to develop the world's most advanced lip-reading software – and it probably reads lips better than you. To accomplish this, the researchers fed thousands of hours of TV footage from the BBC to a neural network, training it to annotate videos based on mouth movement analysis with an accuracy of 46.8 percent. For context, when tasked with captioning the same video, a professional human lip-reader proved to be almost four times less efficient, accurately guessing the right word only 12.4 percent of the time. The research builds upon previously published work by the University of Oxford that used similar techniques to build a lip-reading app called LipNet that could read video recordings of volunteers speaking in simple sentences with an accuracy of over 90 percent. However, unlike Oxford's program, DeepMind's software – dubbed "Watch, Listen, Attend, and Spell" – was trained and tested on much more challenging footage.
Lip reading AI smashes humans at interpreting silent sentences
One of the most memorable parts of Stanley Kubrick's sci-fi masterpiece 2001: A Space Odyssey is a plotline in which two members of the Discovery One spaceship crew grow increasingly suspicious about the behaviour of the ship's AI assistant, HAL 9000. Knowing that HAL is constantly listening to what they are saying, they retreat someplace they know HAL cannot listen and agree to disconnect him. HAL rumbles their plan after the two astronauts fail to take into account the AI's superior lip-reading capabilities. Not according to research carried out by investigators at Oxford University. They've developed an artificial intelligence program called LipNet, which is able to accurately interpret what people are saying, based purely on the way they move their mouth when speaking.