EgMM-Corpus: A Multimodal Vision-Language Dataset for Egyptian Culture
Gamil, Mohamed; Elsayed, Abdelrahman; Lila, Abdelrahman; Gad, Ahmed; Abdelgawad, Hesham; Aref, Mohamed; Fares, Ahmed
Despite recent advances in AI, culturally diverse multimodal datasets remain limited, particularly for the Middle East and Africa. In this paper, we introduce EgMM-Corpus, a multimodal dataset dedicated to Egyptian culture. By designing and running a new data collection pipeline, we collected over 3,000 images covering 313 concepts across landmarks, food, and folklore. Each entry in the dataset is manually validated for cultural authenticity and multimodal coherence. EgMM-Corpus aims to provide a reliable resource for evaluating and training vision-language models in an Egyptian cultural context. We further evaluate the zero-shot performance of Contrastive Language-Image Pre-training (CLIP) on EgMM-Corpus, where it achieves 21.2% Top-1 and 36.4% Top-5 classification accuracy. These results underscore the cultural bias present in large-scale vision-language models and demonstrate the importance of EgMM-Corpus as a benchmark for developing culturally aware models.
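The Top-1 and Top-5 figures above are the standard top-k accuracies for zero-shot classification: each image is scored against every concept by embedding similarity, and a prediction counts as correct if the true concept appears among the k highest-scoring ones. A minimal sketch of that metric, using toy vectors in place of real CLIP embeddings (the numbers and concept count here are illustrative, not from the dataset):

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))

def top_k_accuracy(image_embs, concept_embs, labels, k):
    """Fraction of images whose true concept index is among the
    k concepts with the highest cosine similarity."""
    correct = 0
    for emb, label in zip(image_embs, labels):
        scores = [cosine(emb, c) for c in concept_embs]
        top_k = sorted(range(len(scores)), key=lambda i: -scores[i])[:k]
        correct += label in top_k
    return correct / len(labels)

# Toy stand-ins for CLIP embeddings: 3 concepts, 2 images.
concepts = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
images = [[0.9, 0.1], [0.1, 0.9]]
labels = [0, 1]  # each image's true concept index
print(top_k_accuracy(images, concepts, labels, k=1))  # 1.0
```

In the real evaluation, the image and concept vectors would come from CLIP's image and text encoders, with each concept rendered as a text prompt.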
The search for Cleopatra's long-lost tomb leads to sunken seaport
A new documentary explores this 2,000-year-old mystery and a connection to the RMS Titanic. She's among the most famous leaders in world history, yet archaeologists still don't know the location of Egyptian Queen Cleopatra's tomb. Now, National Geographic Explorer and archaeologist Dr. Kathleen Martínez and her team have uncovered a major clue in their 20-year hunt: the remains of a port off Egypt's Mediterranean coast. The previously unknown ancient port could have been used to keep the Egyptian queen's remains out of Roman hands.
Assassin's Creed: Shadows – a historic frolic through feudal Japan
Japan, 1581: Iga province is burning down around you. You watch on, injured and helpless, as Oda Nobunaga – the warlord responsible for numerous civil wars and the eventual unification of the country – smirks from a nearby hill. You draw your katana, the blade shining in the flickering light of the flames. This is Assassin's Creed: Shadows – part exciting ninja game, part history lesson. It's an odd combination, but it comes together in a sprawling historical-fiction adventure full of discovery and deception.
IFFNeRF: Initialisation Free and Fast 6DoF pose estimation from a single image and a NeRF model
Bortolon, Matteo; Tsesmelis, Theodore; James, Stuart; Poiesi, Fabio; Del Bue, Alessio
We introduce IFFNeRF to estimate the six degrees-of-freedom (6DoF) camera pose of a given image, building on the Neural Radiance Fields (NeRF) formulation. IFFNeRF is specifically designed to operate in real time and eliminates the need for an initial pose guess close to the sought solution. IFFNeRF uses the Metropolis-Hastings algorithm to sample surface points from within the NeRF model. From these sampled points, we cast rays and deduce the color of each ray through pixel-level view synthesis. The camera pose can then be estimated as the solution to a least-squares problem by selecting correspondences between the query image and the resulting bundle. We facilitate this process through a learned attention mechanism that bridges the query image embedding with the embeddings of parameterized rays, thereby matching the rays pertinent to the image. In synthetic and real evaluation settings, we show that our method reduces angular and translation error by 80.1% and 67.3%, respectively, compared to iNeRF, while running at 34 fps on consumer hardware and requiring no initial pose guess.
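The paper's sampling step relies on Metropolis-Hastings, which draws samples from a distribution known only up to a normalizing constant by proposing a move and accepting it with probability min(1, p(x')/p(x)). The NeRF-specific target is not reproduced here; the following is a generic 1-D sketch with a random-walk proposal and an unnormalized Gaussian target, just to show the accept/reject mechanics:

```python
import math
import random

def metropolis_hastings(log_density, x0, n_steps, step=0.5, seed=0):
    """Random-walk Metropolis-Hastings over a 1-D state.

    Proposes x' = x + N(0, step) and accepts with probability
    min(1, p(x') / p(x)), computed in log space for stability.
    """
    rng = random.Random(seed)
    x = x0
    samples = []
    for _ in range(n_steps):
        proposal = x + rng.gauss(0.0, step)
        # Symmetric proposal, so the acceptance ratio is just p(x')/p(x).
        if math.log(rng.random()) < log_density(proposal) - log_density(x):
            x = proposal
        samples.append(x)
    return samples

# Unnormalized standard normal as the target density.
log_p = lambda x: -0.5 * x * x
samples = metropolis_hastings(log_p, x0=0.0, n_steps=5000)
mean = sum(samples) / len(samples)
```

In IFFNeRF the analogous target would score candidate 3-D points by how likely they are to lie on the scene surface encoded by the NeRF.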
How the Authors of the Bible Spun Triumph from Defeat
The Moshiach came to Madison Avenue this summer. All over a not particularly Jewish neighborhood, posters of the bearded, Rembrandtesque Rebbe Schneerson appeared, mucilaged to every light post and bearing the caption "Long Live the Lubavitcher Rebbe King Messiah forever!" This was, or ought to have been, trebly astonishing. First, the rebbe being urged to a longer life died in 1994, and the new insistence that he was nonetheless the Moshiach skirted, as his followers tend to do, the question of whether he might remain somehow alive. Second, the very concept of a messiah recapitulates a specific national hope of a small and oft-defeated nation from several thousand years ago, and originally spoke to the local Judaean dream of a warrior who would lead his people to victory over the Persians, the Greeks, and, latterly, the Roman colonizers.
Maximal Ordinal Two-Factorizations
Dürrschnabel, Dominik; Stumme, Gerd
Given a formal context, an ordinal factor is a subset of its incidence relation that forms a chain in the concept lattice, i.e., a part of the dataset that corresponds to a linear order. To visualize the data in a formal context, Ganter and Glodeanu proposed a biplot based on two ordinal factors. For the biplot to be useful, it is important that these factors cover as many data points as possible, i.e., that they cover a large part of the incidence relation. In this work, we investigate such ordinal two-factorizations. First, for formal contexts that admit ordinal two-factorizations, we investigate the disjointness of the two factors. Then, we show that deciding the existence of a two-factorization of a given size is an NP-complete problem, which makes computing maximal factorizations computationally expensive. Finally, we present Ord2Factor, an algorithm for computing large ordinal two-factorizations.
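One concrete way to read "forms a chain, i.e., corresponds to a linear order": restricted to the factor, the objects' attribute sets (intents) must be totally ordered by inclusion. A minimal sketch of that check, under the assumption that we represent a candidate factor simply as one attribute set per object (the set-of-sets encoding is ours, not the paper's):

```python
def is_ordinal_factor(intents):
    """Check whether a family of object intents (attribute sets) is
    totally ordered by set inclusion, i.e., forms a chain."""
    # If the sets are nested, sorting by size puts them in chain order;
    # any pair that then fails the subset test breaks the chain.
    ordered = sorted(intents, key=len)
    return all(a <= b for a, b in zip(ordered, ordered[1:]))

# Nested intents form a chain ...
chain = [{"a"}, {"a", "b"}, {"a", "b", "c"}]
# ... while incomparable intents do not.
not_chain = [{"a"}, {"b"}]
print(is_ordinal_factor(chain), is_ordinal_factor(not_chain))  # True False
```

The hard part addressed by the paper is not this check but finding two such chains that jointly cover as much of the incidence relation as possible, which is where the NP-completeness result bites.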
ChatGPT Needs to Get Way Better Before It Takes My Job
The rapid development of conversational AI, particularly ChatGPT, is both impressive and concerning. No one wants to worry about a bot taking their job. Employers weighing whether to adopt AI may want to think twice, though. After putting ChatGPT through its paces, we found that it has a lot to learn. It served up vague and sometimes flat-out wrong information on topics PCMag staffers know well.
ChatGPT Asks, BLIP-2 Answers: Automatic Questioning Towards Enriched Visual Descriptions
Zhu, Deyao; Chen, Jun; Haydarov, Kilichbek; Shen, Xiaoqian; Zhang, Wenxuan; Elhoseiny, Mohamed
Asking insightful questions is crucial for acquiring knowledge and expanding our understanding of the world. However, the importance of questioning has been largely overlooked in AI research, where models have been primarily developed to answer questions. With the recent advancements of large language models (LLMs) like ChatGPT, we discover their capability to ask high-quality questions when provided with a suitable prompt. This discovery presents a new opportunity to develop an automatic questioning system. In this paper, we introduce ChatCaptioner, a novel automatic-questioning method deployed in image captioning. Here, ChatGPT is prompted to ask a series of informative questions about images to BLIP-2, a strong visual question-answering model. By continually acquiring new visual information from BLIP-2's answers, ChatCaptioner is able to generate more enriched image descriptions. We conduct human-subject evaluations on common image caption datasets such as COCO, Conceptual Captions, and WikiArt, and compare ChatCaptioner with BLIP-2 as well as ground truth. Our results demonstrate that ChatCaptioner's captions are significantly more informative, receiving three times as many votes from human evaluators for providing the most image information. In addition, ChatCaptioner identifies 53% more objects within the image than BLIP-2 alone, as measured by WordNet synset matching. Code is available at https://github.com/Vision-CAIR/ChatCaptioner
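The core of the method is a simple alternation: a questioner model asks about the image conditioned on the dialogue so far, an answerer model replies, and the growing QA history feeds the next question. A schematic of that loop, with deterministic stand-ins for both models (the real system prompts ChatGPT and BLIP-2; the stub questions and answers below are invented for illustration):

```python
def chat_captioner(ask_fn, answer_fn, n_rounds):
    """Alternate between a questioner (ChatGPT in the paper) and an
    answerer (BLIP-2), accumulating the QA history each round."""
    history = []
    for _ in range(n_rounds):
        question = ask_fn(history)    # question conditioned on history
        answer = answer_fn(question)  # VQA model answers about the image
        history.append((question, answer))
    # The real system would prompt the LLM to summarize the dialogue
    # into a caption; here we just join the QA pairs.
    return "; ".join(f"{q} {a}" for q, a in history)

# Deterministic stand-ins for the two models.
questions = ["What is the main object?", "What color is it?", "Where is it?"]
answers = {
    "What is the main object?": "a bicycle.",
    "What color is it?": "red.",
    "Where is it?": "on a street.",
}
ask = lambda history: questions[len(history)]
caption = chat_captioner(ask, answers.get, n_rounds=3)
print(caption)
```

The point of the design is that each question can target information the previous answers left out, which is why the resulting descriptions end up richer than a single BLIP-2 caption.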