Consolidating and Exploring Open Textual Knowledge – Prof. Ido Dagan, Bar-Ilan University
Introduction to Language – Computational Processing of Human Language, with Prof. Ido Dagan (podcast on Spotify)
Start with NLP
Recommended textbook, available online:
It also provides great little introductions to many fields of linguistics before you hop into the computational part.
NLP Tutorials Part-I: From Basics to Advanced
Natural Language Processing with Python
100 ChatGPT terms explained from NLP to Entity Extraction
Natural Language Processing In Healthcare
Natural Language Processing Specialization
Hebrew NLP Resources
NNLP-IL Hebrew and Arabic NLP Resources
Data repositories and potential collaborations
Legal opinion: uses of copyright-protected content for machine learning
spaCy · Industrial-strength Natural Language Processing in Python
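As a quick taste of spaCy's pipeline API, here is a minimal sketch (assumes `pip install spacy`; the tokenizer below needs no model download, while the commented lines assume the `en_core_web_sm` model has been fetched separately):

```python
import spacy

# A blank English pipeline gives rule-based tokenization with no
# model download.
nlp = spacy.blank("en")
doc = nlp("Natural language processing is fun.")
print([token.text for token in doc])
# → ['Natural', 'language', 'processing', 'is', 'fun', '.']

# Tagging, parsing, and named-entity recognition need a trained
# pipeline (assumes: python -m spacy download en_core_web_sm):
# nlp = spacy.load("en_core_web_sm")
# doc = nlp("Prof. Ido Dagan teaches at Bar-Ilan University.")
# print([(ent.text, ent.label_) for ent in doc.ents])
```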
Stanza – A Python NLP Package for Many Human Languages
Created by the Stanford NLP Group
Open Source OCR
Speech Recognition - Whisper (OpenAI)
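A hedged sketch of transcribing audio with OpenAI's open-source Whisper package (assumes `pip install openai-whisper` and ffmpeg on PATH; the file name "lecture.mp3" is a placeholder). The heavy calls are shown as comments; the small helper below is runnable on its own:

```python
# Typical Whisper usage (commented out: downloads model weights and
# needs a real audio file):
#
#   import whisper
#   model = whisper.load_model("base")        # weights download on first use
#   result = model.transcribe("lecture.mp3")  # language is auto-detected
#   print(result["text"])
#
# result["segments"] holds per-segment timestamps; a tiny helper to
# render them as "MM:SS text" lines:

def format_segments(segments: list[dict]) -> list[str]:
    """Render Whisper-style segments as 'MM:SS text' lines."""
    lines = []
    for seg in segments:
        start = int(seg["start"])
        lines.append(f"{start // 60:02d}:{start % 60:02d} {seg['text'].strip()}")
    return lines

print(format_segments([{"start": 75.2, "text": " Hello class."}]))
# → ['01:15 Hello class.']
```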
Large language model (LLM)
Open LLMs List
Meta Llama 2
What’s before GPT-4? A deep dive into ChatGPT
GPT-4 Training process
Like previous GPT models, the GPT-4 base model was trained to predict the next word in a document, using publicly available data (such as internet data) as well as data OpenAI has licensed. The training set is a web-scale corpus that includes correct and incorrect solutions to math problems, weak and strong reasoning, self-contradictory and consistent statements, and a great variety of ideologies and ideas.
When prompted with a question, the base model can therefore respond in a wide variety of ways that might be far from the user's intent. To align it with user intent within guardrails, OpenAI fine-tunes the model's behavior using reinforcement learning from human feedback (RLHF).
Note that the model’s capabilities seem to come primarily from the pre-training process—RLHF does not improve exam performance (without active effort, it actually degrades it). But steering of the model comes from the post-training process—the base model requires prompt engineering to even know that it should answer the questions.
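The next-word-prediction objective described above can be illustrated with a deliberately tiny bigram model (a toy sketch in plain Python; real LLMs learn this objective with transformer networks over web-scale corpora):

```python
from collections import Counter, defaultdict

def train_bigram(corpus: list[str]) -> dict:
    """Count word -> next-word transitions in a toy corpus."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts

def predict_next(counts: dict, word: str) -> str:
    """Return the most frequent continuation seen in training."""
    return counts[word].most_common(1)[0][0]

counts = train_bigram([
    "the model predicts the next word",
    "the model learns from data",
])
print(predict_next(counts, "the"))
# → model  ("model" follows "the" twice; "next" only once)
```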
How Language-Neutral is Multilingual BERT?
AraBERT: Transformer-based Model for Arabic Language Understanding
LaBSE - Language-agnostic BERT sentence embedding model supporting 109 languages.
A PyTorch port of the LaBSE model; it can be used to map 109 languages to a shared vector space.
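Cross-lingual similarity with LaBSE can be sketched as below (the model call is commented out because it assumes `pip install sentence-transformers` and a large one-time weight download; the cosine-similarity helper is runnable with NumPy alone):

```python
import numpy as np

def cosine_sim(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Sketch of using LaBSE through the sentence-transformers wrapper
# (model name as published on the Hugging Face hub):
#
#   from sentence_transformers import SentenceTransformer
#   model = SentenceTransformer("sentence-transformers/LaBSE")
#   emb = model.encode(["Hello world", "שלום עולם"])  # English and Hebrew
#   print(cosine_sim(emb[0], emb[1]))  # high score: same meaning, shared space
```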
Claude is a large language model (LLM) built by Anthropic.
It's trained to be a helpful assistant in a conversational tone.
Jais - open-source Arabic Large Language Model (LLM)