Commit Graph

4 Commits

Author SHA1 Message Date
Nolan Tremelling 194a8017d6 Add migration script for user limits overrides (#1755)
* Add migration script for user limits overrides

* Bump packages

* Compose fix

* Fix links
2025-01-03 21:12:25 -06:00
emrgnt-cmplxty 51d258262e Release 3.1 (#1011)
* Feature/orchestration v0 (#1006)

* Feature/remove extra r2r abstraction (#996)

* moving kg construction to enrich-graph (#984)

* checkin

* up

* done

* formatting

* Feature/update ingestion issues (#985)

* update ingestion issues

* keep unbounded limit support, but default to bounded

* fix

* fmt

* removes an unnecessary abstraction

* sync changes

---------

Co-authored-by: Shreyas Pimpalgaonkar <shreyas.gp.7@gmail.com>

* first commit

* move towards orchestration

* tweaks

* check in working ingestion

* move

* kg enrichment

* update future, postgres compose

* hatchetize ingestion pipeline

* ready for prime time

* finish

---------

Co-authored-by: Shreyas Pimpalgaonkar <shreyas.gp.7@gmail.com>

* Feature/add update files workflow (#1010)

* add update files workflow

* rm ingestion pipeline

* Feature/add enrichment flow (#1013)

* add update files workflow

* rm ingestion pipeline

* v0 restructure orch

* Feature/merged enrichment flow (#1016)

* add update files workflow

* rm ingestion pipeline

* v0 restructure orch

* kg orchestration

* finish kg orchestration

* update service

* merge

* cleanups

* Rm graspologic (#1034)

* moving kg construction to enrich-graph (#984)

* checkin

* up

* done

* formatting

* Feature/update ingestion issues (#985)

* update ingestion issues

* keep unbounded limit support, but default to bounded

* fix

* fmt

* Add support for CharacterTextSplitter (#986)

* Add support for CharacterTextSplitter

Allows R2R client to override the text splitter. Example:

```python
ingestion_response = client.ingest_files(
    file_paths=[file_path],
    metadatas=metadata,
    # optionally override chunking settings at runtime
    chunking_settings={
        "provider": "r2r",
        "method": "character",
        "extra_fields": {
            "separator": "---"
        },
    },
)
```
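
For illustration only, the effect of a `"character"` method with a `"---"` separator can be sketched in plain Python. This is a minimal sketch, not R2R's implementation: `split_on_separator` is a hypothetical helper, and it assumes that chunks are trimmed and empty chunks are dropped.

```python
def split_on_separator(text: str, separator: str = "---") -> list[str]:
    """Split text on a literal separator, trimming whitespace and dropping empty chunks.

    Hypothetical helper for illustration; the real splitter may behave differently.
    """
    return [chunk.strip() for chunk in text.split(separator) if chunk.strip()]


document = "Intro paragraph.\n---\nSecond section.\n---\nClosing notes."
chunks = split_on_separator(document)
print(chunks)  # ['Intro paragraph.', 'Second section.', 'Closing notes.']
```

Each resulting chunk corresponds to one separator-delimited section of the input document.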

* fixup! Add support for CharacterTextSplitter

* fixup! fixup! Add support for CharacterTextSplitter

* Patch/ollama base cli (#992)

* Dev (#990)

* moving kg construction to enrich-graph (#984)

* checkin

* up

* done

* formatting

* Feature/update ingestion issues (#985)

* update ingestion issues

* keep unbounded limit support, but default to bounded

* fix

* fmt

* Add support for CharacterTextSplitter (#986)

* Add support for CharacterTextSplitter

Allows R2R client to override the text splitter. Example:

```python
ingestion_response = client.ingest_files(
    file_paths=[file_path],
    metadatas=metadata,
    # optionally override chunking settings at runtime
    chunking_settings={
        "provider": "r2r",
        "method": "character",
        "extra_fields": {
            "separator": "---"
        },
    },
)
```

* fixup! Add support for CharacterTextSplitter

* fixup! fixup! Add support for CharacterTextSplitter

---------

Co-authored-by: Shreyas Pimpalgaonkar <shreyas.gp.7@gmail.com>
Co-authored-by: Manuel R. Ciosici <manuelrciosici@gmail.com>

* fix ollama cli

---------

Co-authored-by: Shreyas Pimpalgaonkar <shreyas.gp.7@gmail.com>
Co-authored-by: Manuel R. Ciosici <manuelrciosici@gmail.com>

* Ingestion refactor (#991)

* fix test (#993)

* Increase Neo4j memory limits, add GDS plugin, and update LLM concurrency limit to 256.

* Update ingestion sample file, disable KG node extraction pipe, add community processing in clustering, and enhance graph clustering queries.

* Update runners (#1007)

* Refactor KG clustering process to simplify community processing and enhance entity-triple retrieval from Neo4j.

* Refactor Neo4j configuration for memory settings and update graph clustering logic in the KG provider.

* Fix pipeline by enabling node extraction and refactor community processing logic in KGClusteringPipe.

* hatchet works

* throw error if you run global search before enrichment

* Fix communities in local search

* turn off node desc embedding

* fix rag endpoint

* Increase hatchet msg size

* Update ingestion.py

* Refactor and clean up code formatting

* modified workflow

* Add graph creation functionality

* Refactor KG parameters and logging.

* review

* up

---------

Co-authored-by: emrgnt-cmplxty <68796651+emrgnt-cmplxty@users.noreply.github.com>
Co-authored-by: emrgnt-cmplxty <owen@algofi.org>
Co-authored-by: Manuel R. Ciosici <manuelrciosici@gmail.com>
Co-authored-by: Nolan Tremelling <34580718+NolanTrem@users.noreply.github.com>

* Feature/add hatchet api key setup rebased (#1040)

* add update files workflow

* rm ingestion pipeline

* v0 restructure orch

* kg orchestration

* finish kg orchestration

* update service

* merge

* cleanups

* add hatchet api key setup

* cleanup

* add hatchet api key setup (#1037)

* add hatchet api key setup

* cleanup

* fix merge

* cleanups

* Feature/nolan logs refactored (#1041)

* Update runners (#1007)

* Check in logs

---------

Co-authored-by: Nolan Tremelling <34580718+NolanTrem@users.noreply.github.com>

* Pull open PRs into dev (#1042)

* Pull in subnet and graph PR

* Add in templates

* Add python files for templates in cli (#1043)

* working hatchet integration (#1046)

* Update local_llm_neo4j_kg.toml

* Unstructured fixes (#1048)

* dockerfile

* Update ingestion file with new sample URL and enhance unstructured chunking configuration and error handling.

* clean up

* clean up dockerfile

* up

* Update sample file and clean code

* Add hatchet-sdk dependency in project.

* Update providers to include local option.

* Introduce File Provider (#1044)

* Draft of file provider

* Some cleanup

* Regenerate lock

* Stream it

* Use document_id as primary key

* Pydantic v2

* File provider finished

* Make 7272 the default port (#1045)

* Fix poetry.lock

* Precommit

* Enhance Dockerfile and add telemetry events (#1049)

* Fix File Provider (#1050)

* Fix

* Fix parsing pipeline

* working

* Feature/improve docs (#1051)

* improve documentation

* fix unstr

* add ingestion

* fix compose

* Add unstructured chunking configuration updates

* Revert "Add unstructured chunking configuration updates"

This reverts commit bae8c0b65f.

* Separate File Provider and Relational Database Provider (#1054)

* Move to self.execute_query

* Check in push

* Check in

* Get file provider running

* Actually use file provider

* Final touches

* undo changes in compose

* Patch/fix unstructured config rebased (#1059)

* fix unstr err

* tweak

* by_title default

* cleanups

* checkin

* merge

* Graph docs (#1058)

* Add document chunks and enrich graph endpoints.

* up

* Add KG creation and enrichment responses

* up

* Remove duplicate UnstructuredChunkingConfig entry.

* cleanup docs

* up

---------

Co-authored-by: Shreyas Pimpalgaonkar <shreyas.gp.7@gmail.com>

* Graph docs (#1060)

* fix unstr err

* tweak

* by_title default

* cleanups

* Add document chunks and enrich graph endpoints.

* up

* Add KG creation and enrichment responses

* checkin

* merge

* up

* Remove duplicate UnstructuredChunkingConfig entry.

* Remove unused kg_search settings.

* Refactor knowledge graph settings handling.

* Update image and clean up logs.

---------

Co-authored-by: emrgnt-cmplxty <owen@algofi.org>

* Remove duplicate method (#1061)

* update docs (#1064)

* rm extra prints

* fix img

* Fallback logic (#1062)

* fix unstr err

* tweak

* by_title default

* cleanups

* Add document chunks and enrich graph endpoints.

* up

* Add KG creation and enrichment responses

* checkin

* merge

* up

* Remove duplicate UnstructuredChunkingConfig entry.

* Remove unused kg_search settings.

* Refactor knowledge graph settings handling.

* Update image and clean up logs.

* Implement fallback parsing mechanism

* Fallback parser

* Refactor code for readability and formatting

* Refactor and enhance media parsers

* Update response types in router.

* Remove telemetry and add logging

* Refactor logging format in parsers

* Refactor image and movie parsers

* Fix formatting in movie_parser.py

* Remove debug logging statements

* Remove debug logging for chunking config

* Rename debug option to build.

---------

Co-authored-by: emrgnt-cmplxty <owen@algofi.org>

* Refactor response models for clarity

* Refactor response types in router.

* Feature/fix agent (#1065)

* ready for merge

* fix agent

* Patch/fix 123 (#1066)

* ready for merge

* fix agent

* fix import

* Feature/add orchestration draft (#1067)

* ready for merge

* fix agent

* fix import

* Fix some of the tests (#1068)

* Fix fallback parsing (#1069)

* Fix fallback parsing

* Fix

* Compose

* up

* Feature/iterate on docs (#1070)

* add orchestration docs

* docs iteration

* iterate

* add images

* add images

* Fix restructuring enum (#1071)

* Feature/formatting cleanup (#1072)

* add orchestration docs

* docs iteration

* iterate

* add images

* add images

* run pre-commit

* reclean

---------

Co-authored-by: Shreyas Pimpalgaonkar <shreyas.gp.7@gmail.com>
Co-authored-by: Manuel R. Ciosici <manuelrciosici@gmail.com>
Co-authored-by: Nolan Tremelling <34580718+NolanTrem@users.noreply.github.com>
2024-09-06 11:15:22 -07:00
Nolan Tremelling 3c06d11c83 Update Actions Runner (#973)
* Update docker build action

* Update runner

* Update compose
2024-08-23 16:22:55 -07:00
emrgnt-cmplxty 7cd3743630 Feature/merge dev and main (#968)
* Feature/merge graphrag group mgmt (#876)

* add group ids to document abstraction, first steps

* extend group permissions

* up

* add tests for new group features

* up

* fixup auth

* onboard extensive regression tests

* adding regression tests

* finish tests

* rm selenium

* test observability

* uncomment tests

* checkin first set of group tests

* modify search, passing vector tests

* checkin work

* full delete logic

* update search to use new filters

* check in

* Clean up

* Check in

* add search

* tests/test_end_to_end.py::test_ingest_txt_document passing

* cleanup logging

* make schemas explicit

* move to run logger abstraction

* cleanup some test workflows

* revive tests

* tweak to pass tests

* tweak rrf

* finish hybrid search cleanup

* fixup on regr tests, regen payloads

* refresh payloads

* refactor api model

* Feature/refactor api model (#868)

* cleanup imports

* flake and cleanup

* coherent global import / export structure

* add ingestion response models

* add management response models

* cleanups

* checkin work on routes

* remove request models

* last fixes

* merge

* add user / group gating

* working test groups

* updating client

---------

Co-authored-by: NolanTrem <34580718+NolanTrem@users.noreply.github.com>

* Clean up API (#878)

* Get running

* fixes in sdk

* Add in more fixes

* Feature/merge dev owen changes (#880)

* add group ids to document abstraction, first steps

* extend group permissions

* up

* add tests for new group features

* up

* fixup auth

* onboard extensive regression tests

* adding regression tests

* finish tests

* rm selenium

* test observability

* uncomment tests

* checkin first set of group tests

* modify search, passing vector tests

* checkin work

* full delete logic

* update search to use new filters

* check in

* Clean up

* Check in

* add search

* tests/test_end_to_end.py::test_ingest_txt_document passing

* cleanup logging

* make schemas explicit

* move to run logger abstraction

* cleanup some test workflows

* revive tests

* tweak to pass tests

* tweak rrf

* finish hybrid search cleanup

* fixup on regr tests, regen payloads

* refresh payloads

* refactor api model

* Feature/refactor api model (#868)

* cleanup imports

* flake and cleanup

* coherent global import / export structure

* add ingestion response models

* add management response models

* cleanups

* checkin work on routes

* remove request models

* last fixes

* merge

* add user / group gating

* working test groups

* updating client

* rename service to restructure

* add get documents for group endpoint

* fix client bugs

* return delete format

* merge cleanups

* merge

* finalize

---------

Co-authored-by: NolanTrem <34580718+NolanTrem@users.noreply.github.com>

* Shreyas/graphrag test (#881)

* add group ids to document abstraction, first steps

* extend group permissions

* up

* add tests for new group features

* up

* fixup auth

* onboard extensive regression tests

* adding regression tests

* finish tests

* rm selenium

* test observability

* uncomment tests

* checkin first set of group tests

* modify search, passing vector tests

* checkin work

* full delete logic

* update search to use new filters

* check in

* Clean up

* Check in

* add search

* tests/test_end_to_end.py::test_ingest_txt_document passing

* cleanup logging

* make schemas explicit

* move to run logger abstraction

* cleanup some test workflows

* revive tests

* tweak to pass tests

* tweak rrf

* finish hybrid search cleanup

* fixup on regr tests, regen payloads

* refresh payloads

* refactor api model

* Feature/refactor api model (#868)

* cleanup imports

* flake and cleanup

* coherent global import / export structure

* add ingestion response models

* add management response models

* cleanups

* checkin work on routes

* remove request models

* last fixes

* merge

* add user / group gating

* sync

* enrich

* up

* fix global search

* rag

* remove client.py

* rm configs

* rm configs

---------

Co-authored-by: emrgnt-cmplxty <owen@algofi.org>
Co-authored-by: NolanTrem <34580718+NolanTrem@users.noreply.github.com>
Co-authored-by: emrgnt-cmplxty <68796651+emrgnt-cmplxty@users.noreply.github.com>

* Feature/fix embedding pipe (#882)

* up

* fixup concurrency

* fix ollama embeddings

* fix batching with ollama

* checkin all cleanups

* rm kg cruft (#884)

* rm kg cruft

* tweaks

* tweak 2 (#885)

* Feature/fix retrieval endpoint cruft (#887)

* tweak 2

* fix retrieval endpoint descriptions

* Python SDK (#886)

Clean up Python SDK and routes

* Separate out SDK, add js and go sdk to monorepo (#888)

* Add r2r-js sdk

* Add go sdk

* Pull out python sdk

* remove venv

* Update packages

* Check in fixes

* Remove alembic dependencies

* Feature/merge w nolan (#894)

* cleanup hybrid search

* cleanups in

* Fix structure

* Make graspologic optional

* fix rag stream (#895)

* add py r2r (#896)

* Clean up (#897)

* fix agent (#898)

* define `RAGAgentResponse` (#899)

* Shreyas/unstructured (#900)

* api + oss lib

* rm pdb

* rm poetry lock

* update version

* fixes

* Feature/cleanup client obj logic (#901)

* define `RAGAgentResponse`

* cleanup client logic

* Shreyas/tests (#889)

* init

* tests

* rename service

* api model

* add

* merge

* rm restructure router

* print descriptions

* Refactor CLI (#903)

* Rm files readded by git (#904)

* Remove Execution Wrapper (#905)

* Rm files readded by git

* Fix merge botch

* Feature/fix auth revive tests rebased (#906)

* adding the client touch ups

* fix auth, revive tests

* add back tests

* uncomment run auth workflow

* decruft

* refresh test kg

* fixup toml (#908)

* Feature/fix ingestion update (#909)

* fixup toml

* fix update

* Fix CLI Tests (#912)

Fix CLI tests

* Shreyas/kg runtime cfg (#913)

add kg runtime config

* rename kgenrichmentresponse (#914)

* Feature/add nltk hybrid expansion rebased (#917)

* expand hybrid search with nltk

* cleanups

* cleanup hybrid search

* format

* add setup.py

* update

* add script (#918)

* Fix bug in document chunks (#921)

* Fix bug in update files (#923)

* Shreyas/unstructured (#922)

* fix dockerfiles

* adding config

* fix paths

* mv unstructured dep to docker

* clean

* Update docker_utils.py

* Update unstructured_parsing.py

* Update r2r_chunking.py

* Update app_entry.py

* Feature/repair logging (#925)

* fixing logs

* fix

* rm double logging (#929)

* Configs (#926)

* Fix config logic

* Update config

* Clean up cli entry point

* Disable SSL when installing nltk wordnet (#930)

* Fix analytics endpoint

* Update OpenAI sdk calls (#933)

* Feature/revive advanced rag (#932)

* rm double logging

* revive advanced rag examples

* merge (#934)

* sync model (#935)

* Feature/remove version from ingestion end pt (#936)

* sync model

* remove ability to set version

* tweak versions impl

* fix version bug

* Move docker (#938)

* Move docker

* remove from root

* Clean up sdk/restructure.py

* Fix js tests, completion scoring (#939)

* Shreyas/unstructured docker image (#940)

unstructured docker image

* Update JS (#941)

* Update models (#942)

* Feature/complete group logic (#945)

* fix group logic

* up

* Fix Dockerbuild, Symlink Readme (#944)

* Add back task prompt override and include title if available

* Fix docker, sym link readme

* Fix compose file path

* Shreyas/KG Search Result model (#937)

* return type to kg_search_result

* add model

* local and global results

* modify config

* refresh should not be gated by auth (#946)

* Linting sync (#947)

* Remove email from refresh (#948)

* Fix link to image

* Feature/rm print cruft rebase (#953)

* refresh should not be gated by auth

* rm print cruft

* black and sort

* merge

* rm

* update api return type

* Update Actions (#954)

* Update Github Actions (#956)

* Update Actions

* Update actions

* Shreyas/kgsearchresult model (#957)

* return type to kg_search_result

* add model

* local and global results

* modify config

* add models

* up

* fix config path

* fix models

* Login and refresh token bug (#959)

* Update Actions

* Fix bug in login with refresh token

* Point pytest to linux (#960)

* collection docs (#955)

* Feature/merge dev to main (#962)

* merge dev and main

* git rm

* add back collection fix

* fix docker builds (#963)

* Running unstructured docker + code cleanups (#964)

* Small bugfixes on prompts, return types (#965)

* Fix failing CLI tests

* NPM publish action

* remove tarball

* Feature/fix dev tests (#966)

* update auth tests

* fix tests

* black and sort

* decruft

* revert back to gpt-4o

---------

Co-authored-by: NolanTrem <34580718+NolanTrem@users.noreply.github.com>
Co-authored-by: Shreyas Pimpalgaonkar <shreyas.gp.7@gmail.com>
2024-08-23 15:17:15 -07:00