Information Retrieval
Instagram Rolls Out In-App Local Business Profile Pages - Search Engine Journal
Instagram is introducing a new way to showcase local businesses with in-app profile pages. Raj Nijjer alerted me to this feature while providing several screenshots. As you can see in the examples below, the pages look very much like Google local knowledge panels. They have the business address, hours, contact information, and website. Of course, a link to the business's Instagram profile is featured prominently at the top of the page.
Today's customer decision journey is so complex but AI can help - Search Engine Land
Myth: "The customer journey is not as complex as it's made out to be." One thing is for sure: the consumer decision journey (CDJ) is more complex than ever before. The average consumer now owns three to four devices and uses multiple online and offline channels throughout their shopping journeys. The game is changing as marketers turn to artificial intelligence, agencies and data to help them navigate new consumer behavior. Every marketer today needs to be addressing these challenges as the CDJ itself is disrupting the digital landscape.
A Novel Approach for Detection and Ranking of Trendy and Emerging Cyber Threat Events in Twitter Streams
Bose, Avishek, Behzadan, Vahid, Aguirre, Carlos, Hsu, William H.
We present a new machine learning and text information extraction approach to detection of cyber threat events in Twitter that are novel (previously non-extant) and developing (marked by significance with respect to similarity with a previously detected event). While some existing approaches to event detection measure novelty and trendiness, typically as independent criteria and occasionally as a holistic measure, this work focuses on detecting both novel and developing events using an unsupervised machine learning approach. Furthermore, our proposed approach enables the ranking of cyber threat events based on an importance score by extracting the tweet terms that are characterized as named entities, keywords, or both. We also impute influence to users in order to assign a weighted score to noun phrases in proportion to user influence and the corresponding event scores for named entities and keywords. To evaluate the performance of our proposed approach, we measure the efficiency and detection error rate for events over a specified time interval, relative to human annotator ground truth.
Real-Time Entity Resolution Made Accessible - Senzing
Knowing exactly who your customers are is an important task for security, fraud detection, marketing, and personalization. The proliferation of data sources and services has made entity resolution (ER) very challenging in the internet age. In addition, many applications now require near real-time entity resolution.
4 chilling lessons from a tech hotline scam
While we love our smartphones, they are vulnerable to hackers. Here are some ways to keep them hacker free. Some people think they're immune to cybercriminals. "I'm not even on their radar," they think. "What are the chances that I'll get targeted? It's not like I'm famous or have zillions of dollars."
Gathering Cyber Threat Intelligence from Twitter Using Novelty Classification
Le, Ba Dung, Wang, Guanhua, Nasim, Mehwish, Babar, Ali
Protecting organizations from cyber exploits requires timely intelligence about cyber vulnerabilities and attacks, referred to as threats. Cyber threat intelligence can be extracted from various sources, including social media platforms where users publish threat information in real time. Gathering cyber threat intelligence from social media sites is a time-consuming task for security analysts and can delay the response to emerging cyber threats. We propose a framework for automatically gathering cyber threat intelligence from Twitter by using a novelty detection model. Our model learns the features of cyber threat intelligence from the threat descriptions published in public repositories such as Common Vulnerabilities and Exposures (CVE) and classifies a new unseen tweet as either normal or anomalous to cyber threat intelligence. We evaluate our framework using a purpose-built data set of tweets from 50 influential cyber security-related accounts over twelve months (in 2018). Our classifier achieves an F1-score of 0.643 for classifying cyber threat tweets and outperforms several baselines, including binary classification models. Our analysis of the classification results suggests that cyber threat-relevant tweets on Twitter do not often include the CVE identifier of the related threats. Hence, it would be valuable to collect these tweets and associate them with the related CVE identifier for cyber security applications.
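As a rough illustration of the novelty-detection idea in the abstract above (learn only from threat descriptions, then judge whether a new tweet resembles them), here is a minimal sketch. It swaps in a simple bag-of-words centroid with a cosine-similarity cutoff, which is an assumption made for illustration and not the authors' model. The function names, the sample texts, and the 0.3 threshold are all hypothetical.

```python
import math
from collections import Counter
from typing import List


def bag_of_words(text: str) -> Counter:
    """Lowercase, whitespace-tokenized term counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def fit_centroid(threat_descriptions: List[str]) -> Counter:
    """Sum term counts over threat descriptions only (one-class training)."""
    centroid: Counter = Counter()
    for description in threat_descriptions:
        centroid.update(bag_of_words(description))
    return centroid


def is_threat_related(tweet: str, centroid: Counter, threshold: float = 0.3) -> bool:
    """Treat a tweet as threat-related when it is close enough to the centroid.

    The threshold is a hypothetical value; in practice it would be tuned.
    """
    return cosine(bag_of_words(tweet), centroid) >= threshold


# Hypothetical training texts standing in for CVE-style descriptions.
descriptions = [
    "buffer overflow in the kernel allows remote code execution",
    "sql injection vulnerability allows remote attackers to execute code",
]
centroid = fit_centroid(descriptions)

threat_tweet = "new remote code execution vulnerability in the kernel"
other_tweet = "great coffee this morning with friends"

threat_flag = is_threat_related(threat_tweet, centroid)
other_flag = is_threat_related(other_tweet, centroid)
```

Because the model sees only positive (threat) examples, it needs no labeled non-threat tweets, which is the practical appeal of a one-class setup over binary classification.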
Multi-Label Product Categorization Using Multi-Modal Fusion Models
Wirojwatanakul, Pasawee, Wangperawong, Artit
In this study, we investigated multi-modal approaches using images, descriptions, and titles to categorize e-commerce products on Amazon.com. Specifically, we examined late fusion models, where the modalities are fused at the decision level. Products were each assigned multiple labels, and the hierarchy in the labels was flattened and filtered. For our individual baseline models, we modified a CNN architecture to classify the description and title, and then modified Keras' ResNet-50 to classify the images, achieving F1 scores of 77.0%, 82.7%, and 61.0%, respectively. In comparison, our tri-modal late fusion model can classify products more accurately than single-modal models can, improving the F1 score to 88.2%. Each modality complemented the shortcomings of the other modalities, demonstrating that increasing the number of modalities can be an effective method for improving the accuracy of multi-label classification problems.
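To make "fused at the decision level" concrete, the sketch below combines per-label probabilities from three independently trained classifiers and only then thresholds them. Plain averaging is an assumption chosen for simplicity; the abstract does not say which fusion rule the authors used. The function name, label names, scores, and the 0.5 threshold are hypothetical.

```python
from typing import Dict, List


def late_fusion(per_modality_probs: List[Dict[str, float]],
                threshold: float = 0.5) -> List[str]:
    """Average each label's probability across modalities, then threshold.

    Each dict maps label -> probability from one modality's classifier.
    A label missing from a modality is treated as probability 0.0.
    """
    labels = set().union(*per_modality_probs)
    n = len(per_modality_probs)
    fused = {
        label: sum(probs.get(label, 0.0) for probs in per_modality_probs) / n
        for label in labels
    }
    # Multi-label output: every label whose fused score clears the threshold.
    return sorted(label for label, score in fused.items() if score >= threshold)


# Hypothetical classifier outputs for one product.
image = {"Electronics": 0.40, "Cameras": 0.70, "Toys": 0.20}
description = {"Electronics": 0.80, "Cameras": 0.60, "Toys": 0.10}
title = {"Electronics": 0.90, "Cameras": 0.30, "Toys": 0.05}

predicted = late_fusion([image, description, title])
```

In this made-up example the image model alone would miss "Electronics" and the title model alone would miss "Cameras", but the fused scores recover both, which mirrors the abstract's point that modalities can cover for one another.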
Internet of Things Search Engine
Advancements under the moniker of the Internet of Things (IoT) allow things to network and become the primary producers of data on the Internet [14]. IoT makes the state and interactions of the real world available to Web applications and information systems with minimal latency and complexity [25]. By enabling massive telemetry and individual addressing of "things," the IoT offers three prominent benefits: spatial and temporal traceability of individual real-world objects for theft prevention, counterfeit product detection and food safety via accessing their pedigree; enabling ambient data collection and analytics for optimizing crop planning, enabling telemedicine and assisted living; and supporting real-time reactive systems such as smart buildings, automatic logistics and self-driving, networked cars [11]. Realizing these benefits requires the ability to discover and resolve queries for content in the IoT. Offering these abilities is the responsibility of a class of software systems called Internet of Things search engines (IoTSE).
An Empirical Comparison of FAISS and FENSHSES for Nearest Neighbor Search in Hamming Space
Mu, Cun, Yang, Binwei, Yan, Zheng
In this paper, we compare the performance of FAISS and FENSHSES on nearest neighbor search in Hamming space, a fundamental task with ubiquitous applications in today's eCommerce. Comprehensive evaluations are made in terms of indexing speed, search latency and RAM consumption. This case study is conducted towards a better understanding of the fundamental trade-offs between nearest neighbor search systems implemented in main memory and those implemented in secondary memory, which are largely unaddressed in the literature.
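For readers unfamiliar with the task, nearest neighbor search in Hamming space means finding the stored binary codes that differ from a query code in the fewest bit positions. The sketch below is a brute-force linear scan over Python integers; it is only meant to define the problem and does not reflect how FAISS or FENSHSES index or search. The function names and the 8-bit sample codes are hypothetical.

```python
from typing import List, Tuple


def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which two binary codes differ."""
    return bin(a ^ b).count("1")


def nearest_neighbors(query: int, codes: List[int], k: int = 1) -> List[Tuple[int, int]]:
    """Return the k closest codes as (distance, index) pairs.

    Ties on distance are broken by the lower index.
    """
    scored = [(hamming_distance(query, code), idx) for idx, code in enumerate(codes)]
    scored.sort()
    return scored[:k]


# Hypothetical 8-bit codes standing in for hashed product embeddings.
codes = [0b10110010, 0b10110011, 0b01001101, 0b11110000]
query = 0b10110000

top2 = nearest_neighbors(query, codes, k=2)
```

A linear scan costs one XOR and one popcount per stored code, so its latency grows with collection size; dedicated systems trade indexing time and memory for faster lookups, which is the trade-off the comparison above measures.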
VisualData: A Search Engine for Computer Vision Datasets
Algorithms, computation and visual data are the three pillars of computer vision (CV). Researchers, institutions and open source communities have produced sophisticated algorithms and open-sourced code, while global tech giants' supercharged cloud platforms provide all the computational power CV researchers require. However, efficiently sourcing visual data, particularly images with high-quality annotations, remains a challenge. Building large datasets is a time-consuming and labor-intensive task that challenges entities with limited budgets. There are hundreds of open visual datasets out there, but searching across them and their millions of entries is not a simple task.